Solving Project Euler: Problem 063

May 13, 2017 | Project Euler | solving_project_euler | Vijay

Problem 63 is an easy one. Let’s get right to it.

The 5-digit number, 16807=7⁵, is also a fifth power. Similarly, the 9-digit number, 134217728=8⁹, is a ninth power.

How many n-digit positive integers exist which are also an nth power?

Nothing that cannot be solved in one (word-wrapped) line of Python code.

print(sum([1 for j in range(22) for i in range(1, 22)
           if len(str(i ** j)) == j]))
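
The bounds in the one-liner are generous. As a rough sketch of why they are safe (the function name below is my own, not part of the original solution): 10^n always has n + 1 digits, so only bases 1 through 9 can qualify, and for each base the digit count of base^n eventually falls behind n and never catches up again.

```python
def count_n_digit_nth_powers():
    """Count pairs (base, n) where base ** n has exactly n digits."""
    count = 0
    for base in range(1, 10):  # 10 ** n has n + 1 digits, so bases stop at 9
        n = 1
        # Once base ** n has fewer than n digits it stays behind for every
        # larger n, so the loop can stop at the first miss.
        while len(str(base ** n)) == n:
            count += 1
            n += 1
    return count


print(count_n_digit_nth_powers())
```

The largest exponent this loop reaches is 21 (for base 9), which is why `range(22)` is enough in the one-liner.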

Solving Project Euler: Problem 062

May 7, 2017 | Project Euler | solving_project_euler | Vijay

Problem 62 is yet another wonderful illustration of the limits of brute force approaches. Here is the problem description.

The cube, 41063625 (345³), can be permuted to produce two other cubes: 56623104 (384³) and 66430125 (405³). In fact, 41063625 is the smallest cube which has exactly three permutations of its digits which are also cube.

Find the smallest cube for which exactly five permutations of its digits are cube.

On the face of it, the approach seems evident: list out all cubes up to a certain upper bound. For each cube, check if the list of cubes has at least 4 other numbers formed by transposing its digits. The lowest number in that set is the result.

I will admit to taking this approach initially. And my program ran and ran and ran… and ran some more before I terminated it.

Why did this seemingly straightforward approach fail?

It failed because of the number of executions. To give you an idea, 123³ is 1860867. The number of permutations of this number is 5040, the factorial of the number of digits. So, just to loop through all permutations of cubes from 1 to 500, we would have over 23.3 million permutations to search through. If our upper bound was 1000 cubes, then the number shoots up to over 204 million permutations!

How can we solve this? Well, it turns out that if we change the order of steps in our original approach, we can solve this problem in O(n) time. As we generate cubes, we cache each cube under its sorted digits. Every new cube is checked against the cache to see whether it can be formed from a set of digits we have already seen. Once the number of cubes for a particular set of digits reaches five, we have our answer.

cubes = [str(i ** 3) for i in range(10000)]
d = {}

for cube in cubes:
    digits = tuple(sorted(cube))  # using tuples since lists are non-hashable
    if digits in d:
        d[digits] += ',' + cube
        if d[digits].count(',') == 4:
            print(d[digits].split(',')[0])
            break
    else:
        d[digits] = cube
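
One caveat: the loop above stops as soon as a fifth cube turns up for a digit set, which does not by itself confirm that there are exactly five. A more cautious sketch (my own variation, with a hypothetical function name and an assumed search limit of 10000 bases) groups all the cubes first and only then picks the smallest cube among the groups of the requested size.

```python
from collections import defaultdict


def smallest_cube_with_permutations(target, limit=10000):
    """Smallest cube whose digit set is shared by exactly `target` cubes.

    `limit` is an assumed upper bound on the bases to cube; every cube with
    up to 12 digits is covered when limit is 10000.
    """
    groups = defaultdict(list)
    for i in range(1, limit):
        cube = i ** 3
        # Cubes that are permutations of each other share the same sorted digits.
        groups["".join(sorted(str(cube)))].append(cube)

    # Cubes were appended in increasing order, so members[0] is the smallest.
    candidates = [members[0] for members in groups.values()
                  if len(members) == target]
    return min(candidates) if candidates else None


print(smallest_cube_with_permutations(5))
```

Calling it with a target of 3 reproduces the example from the problem statement, which is a handy sanity check.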

Solving Project Euler: Problem 061

April 30, 2017 | Project Euler | solving_project_euler | Vijay

Problem 61 deals with figurate numbers. Here is the problem statement.

Triangle, square, pentagonal, hexagonal, heptagonal, and octagonal numbers are all figurate (polygonal) numbers and are generated by the following formulae:

Triangle P_3,n=n(n+1)/2 1, 3, 6, 10, 15, …

Square P_4,n=n² 1, 4, 9, 16, 25, …

Pentagonal P_5,n=n(3n−1)/2 1, 5, 12, 22, 35, …

Hexagonal P_6,n=n(2n−1) 1, 6, 15, 28, 45, …

Heptagonal P_7,n=n(5n−3)/2 1, 7, 18, 34, 55, …

Octagonal P_8,n=n(3n−2) 1, 8, 21, 40, 65, …

The ordered set of three 4-digit numbers: 8128, 2882, 8281, has three interesting properties.

1. The set is cyclic, in that the last two digits of each number is the first two digits of the next number (including the last number with the first).

2. Each polygonal type: triangle (P_3,127=8128), square (P_4,91=8281), and pentagonal (P_5,44=2882), is represented by a different number in the set.

3. This is the only set of 4-digit numbers with this property.

Find the sum of the only ordered set of six cyclic 4-digit numbers for which each polygonal type: triangle, square, pentagonal, hexagonal, heptagonal, and octagonal, is represented by a different number in the set.

The phrase “only ordered set” at the end threw me off a bit, because I took it to mean that the six numbers would be triangular, square, pentagonal, and so on, in that order. There is no such set, so I had to go back to the drawing board.

It was then that point 2 in the problem statement dawned on me. In the example, 8128, 2882, and 8281 are cyclic, but the pentagonal number 2882 comes before the square 8281, so the polygonal types need not appear in order. This makes the problem more interesting: the set of six numbers could follow any permutation of the six figurate types. Thankfully, since we know that only one such set of six numbers exists, we can stop as soon as we find it.

from itertools import permutations


def is_cyclical(i, j):
    return str(i)[2:] == str(j)[:2]


def append_to_list(l, e):
    return_list = []
    for i in l:
        for j in e:
            if is_cyclical(i[-1], j[0]):
                k = i[:]
                k.append(j[0])
                return_list.append(k)

    return return_list


def sum_of_cyclical_figurate():
    perms = list(permutations([3, 4, 5, 6, 7, 8]))
    for p in perms:
        cy1 = append_to_list(figurates[p[0]], figurates[p[1]])
        cy2 = append_to_list(cy1, figurates[p[2]])
        cy3 = append_to_list(cy2, figurates[p[3]])
        cy4 = append_to_list(cy3, figurates[p[4]])
        cy5 = append_to_list(cy4, figurates[p[5]])
        for x in cy5:
            if is_cyclical(x[-1], x[0]):
                return sum(x)


figurates = [[]] * 9

figurates[3] = [[n] for n in [i * (i + 1) // 2 for i in range(3, 200)] if 999 < n < 10000]
figurates[4] = [[n] for n in [i ** 2 for i in range(3, 100)] if 999 < n < 10000]
figurates[5] = [[n] for n in [i * (3 * i - 1) // 2 for i in range(5, 100)] if 999 < n < 10000]
figurates[6] = [[n] for n in [i * (2 * i - 1) for i in range(6, 100)] if 999 < n < 10000]
figurates[7] = [[n] for n in [i * (5 * i - 3) // 2 for i in range(7, 100)] if 999 < n < 10000]
figurates[8] = [[n] for n in [i * (3 * i - 2) for i in range(8, 100)] if 999 < n < 10000]

print(sum_of_cyclical_figurate())
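
As an alternative to walking through all 720 orderings, the same search can be written as a depth-first search that extends a chain one number at a time and tracks which polygonal types are still unused. This is only a sketch of that different technique, not the approach used above; all function names are my own, and the general formula P(s, n) = n((s − 2)n − (s − 4))/2 is the standard one that reproduces the six formulae in the problem statement.

```python
def polygonal(sides, n):
    # General polygonal formula; sides=3 gives n(n+1)/2, sides=4 gives n**2, etc.
    return n * ((sides - 2) * n - (sides - 4)) // 2


def four_digit_polygonals(sides):
    values, n = [], 1
    while True:
        value = polygonal(sides, n)
        if value >= 10000:
            return values
        if value >= 1000:
            values.append(value)
        n += 1


def find_cycle(chain, remaining, by_type):
    """Extend `chain` with one number from each type left in `remaining`."""
    if not remaining:
        # All six types used: the last number must wrap around to the first.
        return chain if chain[-1] % 100 == chain[0] // 100 else None
    for sides in remaining:
        for value in by_type[sides]:
            # First two digits must match the last two digits of the chain's tail.
            if value // 100 == chain[-1] % 100 and value not in chain:
                found = find_cycle(chain + [value], remaining - {sides}, by_type)
                if found:
                    return found
    return None


def cyclic_figurate_sum():
    by_type = {s: four_digit_polygonals(s) for s in range(3, 9)}
    # A cycle can start anywhere, so fix the octagonal number as the start.
    for start in by_type[8]:
        cycle = find_cycle([start], frozenset(range(3, 8)), by_type)
        if cycle:
            return sum(cycle)
    return None


print(cyclic_figurate_sum())
```

Fixing the starting type avoids rediscovering the same cycle from six different starting points, and dead-end chains are abandoned as soon as no number fits.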

YouTube Spam Collection

April 9, 2017 | Machine Learning | machine_learning, uci_ml_repository | Vijay

[I recently moved the Jupyter notebooks for this problem to GitHub. If you want to skip the commentary, you may download the notebooks here: Part 1, Part 2]

This data set was posted to the UCI Machine Learning Repository a couple of weeks ago. It is a straight up spam/ham classification problem. The original source of the data is here.

The creators of this data set (Tiago, Tulio) collected 1,956 comments on five YouTube videos during a certain time period and classified each as spam or ham. We will attempt to build a machine learning model to learn the data set and test our model’s performance.

The data is spread across five similar, but distinct data sets. We will take two passes through this. In our first pass, we will consider only the first data set, based on a video by the artist Psy. We will build a simple model based on a Naive Bayes classifier.

In our second pass, we will merge all five data sets into one unified data set and build a model that learns from it and classifies comments as spam or ham. For this pass, we will try multiple classifiers, pick the one with the best accuracy score, and tune it further to improve performance.

As we have done in the past, we will follow our established workflow for building and testing machine learning models, namely, read the data, perform data cleanup where necessary, split the data set, transform the data set as necessary, select a model, train the model, test the model, and determine next steps.

1. The data set has five columns: comment_id, author, date, content, and class. Though not labeled explicitly, class 1 seems to represent spam and class 0 ham.
2. Of these, the only relevant feature is content, with class being the target.
3. The columns of interest do not have any missing data, so no data cleanup is necessary.
4. The spam/ham distribution is split nearly equally. This is good because if one class dominated the data set, the model might be skewed towards that class.
5. We will learn from 80% of the data set, and use the remaining 20% as our test set.
6. Since the feature column is text, we need to convert it to a numeric format. To do this, we will use the CountVectorizer class. The CountVectorizer builds a document-term matrix, which represents each row in the feature column as an array of per-word counts (by default), where a nonzero entry indicates the presence of a word. The result of this operation is a sparse matrix containing 1564 rows (80%) and 3810 columns (the number of unique words in the training set).
7. Next, we will build the model:
    1. For our first pass, we will build a model using the MultinomialNB class.
    2. For our second pass, we will take a more elaborate approach, choosing from 8 different models. We will use 10-fold cross validation and select the model with the best accuracy score. In this case, the DecisionTreeClassifier performed the best. We will further tune this model using a grid search to settle on the set of parameters that produces the highest accuracy.
8. Once the model is built, we will test it against the testing data set:
    1. For the Psy data set, our accuracy score was 97%.
    2. For the combined data set, our accuracy score was 93%.
9. As our final step, we can review the confusion matrix, and take a look at the false positives and false negatives.
10. If our model allows it, we can also list the words that were the most and least spammy, i.e. words that had the highest and lowest spam-to-ham ratio.
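
To make the document-term matrix concrete, here is a plain-Python illustration of the idea. It is not scikit-learn's implementation and not the notebook code; the function names and the three toy comments are made up for this example, and the tokenizer only roughly mirrors CountVectorizer's defaults (lowercasing, tokens of two or more word characters).

```python
import re


def tokenize(text):
    # Lowercase and keep tokens of two or more word characters.
    return re.findall(r"\b\w\w+\b", text.lower())


def build_vocabulary(documents):
    # Map each unique training word to a column index, in alphabetical order.
    words = sorted({word for doc in documents for word in tokenize(doc)})
    return {word: index for index, word in enumerate(words)}


def to_count_rows(documents, vocabulary):
    # One row per document, one column per vocabulary word.
    rows = []
    for doc in documents:
        row = [0] * len(vocabulary)
        for word in tokenize(doc):
            if word in vocabulary:  # words unseen during training are ignored
                row[vocabulary[word]] += 1
        rows.append(row)
    return rows


# Toy comments, invented for illustration only.
train_comments = ["check out my channel", "love this song", "check check this out"]
vocabulary = build_vocabulary(train_comments)
matrix = to_count_rows(train_comments, vocabulary)

for comment, row in zip(train_comments, matrix):
    print(row, comment)
```

The real matrix has the same shape logic: one row per training comment and one column per unique training word, which is where the 1564 by 3810 figure comes from. It is stored sparsely because most entries are zero.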

Solving Project Euler: Problem 058

April 2, 2017 | Project Euler | prime_numbers, solving_project_euler | Vijay

We encountered number spirals previously in problem 28, and now we revisit them in Problem 58. Here is the problem description.

Starting with 1 and spiralling anticlockwise in the following way, a square spiral with side length 7 is formed.

37 36 35 34 33 32 31
38 17 16 15 14 13 30
39 18 5 4 3 12 29
40 19 6 1 2 11 28
41 20 7 8 9 10 27
42 21 22 23 24 25 26
43 44 45 46 47 48 49

It is interesting to note that the odd squares lie along the bottom right diagonal, but what is more interesting is that 8 out of the 13 numbers lying along both diagonals are prime; that is, a ratio of 8/13 ≈ 62%.

If one complete new layer is wrapped around the spiral above, a square spiral with side length 9 will be formed. If this process is continued, what is the side length of the square spiral for which the ratio of primes along both diagonals first falls below 10%?

We will take the same approach here as we did for solving problem 28. For each new square layer that is added, the numbers at the four corners will be of the form n², n² – (n-1), n² – (n-1)*2, n² – (n-1)*3. We will therefore check if each of these values (except, of course, n²) is prime and increment a counter accordingly.

My first attempt at solving this problem used a prime sieve, but even an upper limit of 100,000,000 was not enough to reach the point where the ratio drops below 10%. So my revised attempt makes use of our trusty old is_prime() function.

from euler_commons import is_prime

prime_count = 0

i = 1
while True:
    i += 2
    n1 = i ** 2  ## never prime
    n2 = n1 - (i - 1)
    n3 = n2 - (i - 1)
    n4 = n3 - (i - 1)

    for n in [n2, n3, n4]:
        if is_prime(n):
            prime_count += 1

    prime_pct = prime_count / (2 * i - 1)

    if prime_pct < 0.1:
        break

print(i)
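
The `euler_commons` module is not shown in this post, so here is a self-contained sketch with a stand-in `is_prime()` based on trial division. It is an assumption about what such a helper might look like, not the actual implementation, and the wrapper function `first_side_below()` is likewise my own name. The threshold is a parameter so the logic can be checked against small spirals by hand.

```python
def is_prime(n):
    # Stand-in for euler_commons.is_prime: trial division by 2, 3, then 6k +/- 1.
    if n < 2:
        return False
    if n < 4:
        return True
    if n % 2 == 0 or n % 3 == 0:
        return False
    factor = 5
    while factor * factor <= n:
        if n % factor == 0 or n % (factor + 2) == 0:
            return False
        factor += 6
    return True


def first_side_below(ratio):
    """Side length at which the share of primes on the diagonals drops below ratio."""
    prime_count = 0
    side = 1
    while True:
        side += 2
        square = side * side  # bottom-right corner, never prime
        for k in (1, 2, 3):   # the other three corners of this layer
            if is_prime(square - k * (side - 1)):
                prime_count += 1
        # A spiral of this side length has 2 * side - 1 numbers on its diagonals.
        if prime_count / (2 * side - 1) < ratio:
            return side


print(first_side_below(0.5))
```

Calling `first_side_below(0.1)` runs the same search as the loop above; it takes noticeably longer because the corner values grow into the hundreds of millions.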

Solving Project Euler: Problem 057

March 26, 2017 | Project Euler | solving_project_euler | Vijay

Problem 57 is titled Square root convergents. Here is the problem statement.

It is possible to show that the square root of two can be expressed as an infinite continued fraction.

√ 2 = 1 + 1/(2 + 1/(2 + 1/(2 + … ))) = 1.414213…

By expanding this for the first four iterations, we get:

1 + 1/2 = 3/2 = 1.5
1 + 1/(2 + 1/2) = 7/5 = 1.4
1 + 1/(2 + 1/(2 + 1/2)) = 17/12 = 1.41666…
1 + 1/(2 + 1/(2 + 1/(2 + 1/2))) = 41/29 = 1.41379…

The next three expansions are 99/70, 239/169, and 577/408, but the eighth expansion, 1393/985, is the first example where the number of digits in the numerator exceeds the number of digits in the denominator.

In the first one-thousand expansions, how many fractions contain a numerator with more digits than denominator?

I found putting pen to paper very helpful in solving this problem. Obviously, the prospect of constructing a fraction nested 1000 levels deep should be enough to scare anyone away from a brute force approach. Let us write down the first few numbers in this series.

3/2
7/5
17/12
41/29
99/70
239/169

My first thought was that the numerators are prime, but seeing 99 in the list blew a hole in that notion. As we continue to look for patterns, we find that the next denominator in the series is merely the sum of the numerator and the denominator of the current number.

5 = 3 + 2; 12 = 7 + 5; 29 = 17 + 12, and so on

The numerators also follow a pattern. If you look at 3, 7, 17, 41, you will notice that the next numerator in the series is twice the current numerator plus the previous numerator.

17 = 7*2 + 3; 41 = 17*2 + 7; 99 = 41*2 + 17, and so on

Now that we have reduced the n^th number in the series to its constituent parts, we can set up two lists, one each for the numerator and the denominator, and count those instances where the numerator has more digits than the denominator.

numerators = [3, 7]
denominators = [2]

count = 0
while count <= 999:
    count += 1
    denominators.append(denominators[count - 1] + numerators[count - 1])
    numerators.append(numerators[count] * 2 + numerators[count - 1])

print(sum(1 for i, j in enumerate(numerators)
          if i < 1000 and len(str(j)) > len(str(denominators[i]))))
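
The two patterns can also be folded into a single step that needs no lists. If the current expansion is n/d, the next one is 1 + 1/(1 + n/d) = (n + 2d)/(n + d). The sketch below uses that update; the function name is my own.

```python
def count_long_numerators(expansions):
    """Count expansions of sqrt(2) whose numerator has more digits than the denominator."""
    numerator, denominator = 3, 2  # first expansion, 1 + 1/2
    count = 0
    for _ in range(expansions):
        if len(str(numerator)) > len(str(denominator)):
            count += 1
        # Next expansion: (n + 2d) / (n + d), e.g. 3/2 -> 7/5 -> 17/12.
        numerator, denominator = numerator + 2 * denominator, numerator + denominator
    return count


print(count_long_numerators(1000))
```

The new denominator n + d is the same rule found by inspection above, and n + 2d agrees with the "twice the current numerator plus the previous numerator" rule.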

The Airline Report, January 2017

March 26, 2017 | The Airline Report | airline_guru | Vijay

Download [727.31 KB]

Solving Project Euler: Problem 056

March 19, 2017 | Project Euler | solving_project_euler | Vijay

Problem 56 is titled Powerful digit sum. The problem statement is as follows.

A googol (10¹⁰⁰) is a massive number: one followed by one-hundred zeros; 100¹⁰⁰ is almost unimaginably large: one followed by two-hundred zeros. Despite their size, the sum of the digits in each number is only 1.

Considering natural numbers of the form, a^b, where a, b < 100, what is the maximum digital sum?

Once you break this problem down to its constituent parts, it becomes very easy to solve.

def get_digital_sum(s):
    return sum([int(n) for n in list(str(s))])


max_digital_sum = 0
for a in range(1, 100):
    for b in range(1, 100):
        max_digital_sum = max(get_digital_sum(a ** b), max_digital_sum)

print(max_digital_sum)

Solving Project Euler: Problem 055

March 12, 2017 | Project Euler | solving_project_euler | Vijay

Problem 55 introduces us to Lychrel numbers. Let’s take a look at the problem statement.

If we take 47, reverse and add, 47 + 74 = 121, which is palindromic.

Not all numbers produce palindromes so quickly. For example,

349 + 943 = 1292,
1292 + 2921 = 4213
4213 + 3124 = 7337

That is, 349 took three iterations to arrive at a palindrome.

Although no one has proved it yet, it is thought that some numbers, like 196, never produce a palindrome. A number that never forms a palindrome through the reverse and add process is called a Lychrel number. Due to the theoretical nature of these numbers, and for the purpose of this problem, we shall assume that a number is Lychrel until proven otherwise. In addition you are given that for every number below ten-thousand, it will either (i) become a palindrome in less than fifty iterations, or, (ii) no one, with all the computing power that exists, has managed so far to map it to a palindrome. In fact, 10677 is the first number to be shown to require over fifty iterations before producing a palindrome: 4668731596684224866951378664 (53 iterations, 28-digits).

Surprisingly, there are palindromic numbers that are themselves Lychrel numbers; the first example is 4994.

How many Lychrel numbers are there below ten-thousand?

NOTE: Wording was modified slightly on 24 April 2007 to emphasise the theoretical nature of Lychrel numbers.

It is very helpful to know that we could stop checking after 50 attempts, for if we did not have this information, this would be a never-ending quest.

In the code listing below, I get a count of all the non-Lychrel numbers under 10000, and subtract that from 10000 to get the right answer. (While the logic is right, it took me a few tries to get to the right answer, because I did not know what number to use as my lower bound. I started with 10, moved down to 1, and finally to 0 in order to get to the right answer.)

def get_palindrome(n):
    return int(str(n)[::-1])


def is_palindrome(n):
    if str(n) == str(n)[::-1]:
        return True
    return False


upper_limit = 10000
non_lychrel_count = 0

for i in range(upper_limit):
    current_number = i
    for k in range(50):
        total = current_number + get_palindrome(current_number)
        if is_palindrome(total):
            non_lychrel_count += 1
            break
        else:
            current_number = total

print(upper_limit - non_lychrel_count)
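
The lower-bound puzzle goes away if the Lychrel numbers are counted directly instead of by subtraction. Here is a sketch of that variation, with function names of my own choosing and the same 50-iteration cutoff as the listing above.

```python
def reverse_digits(n):
    return int(str(n)[::-1])


def is_lychrel(n, max_iterations=50):
    """Treat n as Lychrel if no palindrome appears within max_iterations reverse-and-add steps."""
    current = n
    for _ in range(max_iterations):
        current += reverse_digits(current)
        if str(current) == str(current)[::-1]:
            return False
    return True


lychrel_count = sum(1 for n in range(1, 10000) if is_lychrel(n))
print(lychrel_count)
```

Note that the palindrome check happens only after the first addition, which is why a palindrome such as 4994 can still count as a Lychrel number.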

Solving Project Euler: Problem 053

January 22, 2017 | Project Euler | factorials, solving_project_euler | Vijay

Problem 53 is titled Combinatoric selections, and its description is as follows.

There are exactly ten ways of selecting three from five, 12345:

123, 124, 125, 134, 135, 145, 234, 235, 245, and 345

In combinatorics, we use the notation, ⁵C₃ = 10.

In general,

ⁿC_r =
n! / (r!(n−r)!)
,where r ≤ n, n! = n×(n−1)×…×3×2×1, and 0! = 1.

It is not until n = 23, that a value exceeds one-million: ²³C₁₀ = 1144066.

How many, not necessarily distinct, values of ⁿC_r, for 1 ≤ n ≤ 100, are greater than one-million?

Interestingly, this problem only looks for the number of combinations, and the formula is listed in the problem statement itself. We will make use of this and check only values of n ≥ 23. For the given upper bound of n, it is sufficient to check values of r between 4 and n−4. This is because ¹⁰⁰C₃ = ¹⁰⁰C₉₇ = 161700 < 1000000, so no value with r ≤ 3 or r ≥ n−3 can exceed one million when n ≤ 100.

def factorial(n):
    if n in factorials:
        return factorials[n]

    product = 1
    for i in range(2, n + 1):
        product *= i

    factorials[n] = product
    return product


factorials = {0: 1, 1: 1}
# Integer division keeps the comparison exact; n! is always divisible by r!(n-r)!
count = sum([1 for n in range(23, 101) for r in range(4, n - 3)
             if factorial(n) // (factorial(r) * factorial(n - r)) > 1000000])

print(count)
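
On Python 3.8 or later, the standard library's `math.comb` computes ⁿC_r exactly, so the cached factorial helper is optional. The sketch below (the function name is mine) checks every r for every n rather than relying on the narrowed ranges, which makes it a useful cross-check.

```python
from math import comb  # available from Python 3.8


def count_large_combinations(max_n=100, threshold=1_000_000):
    """Count pairs (n, r) with 1 <= n <= max_n and 0 <= r <= n where nCr > threshold."""
    return sum(1
               for n in range(1, max_n + 1)
               for r in range(n + 1)
               if comb(n, r) > threshold)


print(count_large_combinations())
```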

Vijay Narayanan

labor ipse voluptas
