C Developer: Project Euler problem 5

Project Euler problem 5

What is the smallest number that has all the numbers from 1 to 20 as factors? Or, what is the Least Common Multiple (LCM) of 1 to 20?

Solution

The simplest solution would be to start from 1 and test each number for factors 1 to 20. This works, but is very expensive - we would need to check 20n factors, where n is the smallest number for which all numbers from 1 to 20 are factors. Although we don't know in advance what n will be, its upper limit is 20! (i.e. 20 x 19 x 18 ...), which is a huge number. Factorisation is also expensive, as it often involves a division instruction (some can be converted to shifts or multiplications, but not all, and this also depends on the compiler).

Our first question therefore is: how do we reduce the search space from 1..20! to a more manageable size? There are two main changes we can make.

Only check multiples of the highest factor. Only multiples of x will have x as a factor. In this particular example our largest factor is 20, so we only need to check every 20th number (20, 40, 60, ...). Any numbers in between will not have 20 as a factor, and therefore we do not need to test any of the other factors either. This simple observation reduces the search space by 95%, as we only need to check 5% (1 in every 20) of the numbers.

Only check the top half of the factors list. If we split the list of factors in half, we end up with (1 to 10) and (11 to 20). Each element in the first set is a factor of at least one element in the second set (2 is a factor of 12, 3 is a factor of 12, 4 is a factor of 16 etc.) If a number, f, is a factor of another number, f', then any number which has f' as a factor will also have f as a factor. For example, if we know that 16 is a factor of 32, then we know that 2, 4 and 8 are also factors of 32, because they are factors of 16. This means that we only have to check the second set of factors, reducing the number of checks by 50%.

If we combine these two methods, we reduce the number of calculations from 20n to n (95% reduction) and then to n/2 (50% reduction), or 2.5% of our simplest solution.

#include <stdlib.h>
#include <stdio.h>

int main() {
    int smallest_number = 0;
    const int highest_factor = 20;
    const int lowest_factor = 11;
    const int required_factor_count = 9;

    for (int current_number = step; smallest_number == 0; current_number += step) {
        int current_factor_count = 0;

        for (int current_factor = lowest_factor; current_factor < highest_factor; current_factor++) {
            if (current_number % current_factor == 0) {
                current_factor_count++;
            } else {
                break;
            }
        }

        if (current_factor_count == required_factor_count) {
            smallest_number = current_number;
        }
    }

    printf("%d\n", smallest_number);

    return EXIT_SUCCESS;
}

This runs in 0.08s on my machine, an acceptable result. The LCM is over 200mn, so our simple solution would have resulted in over 4000mn calculations, whereas by reducing the search space and factors we end up with closer to 100mn.

Alternative solution

If we want to avoid divisions (via the modulus operator), we can instead keep a running total of each of the top half of factors. Each time we increment the highest factor (20) by itself, we also increase the other target factors (11 to 19) by themselves, until they equal or exceed the running total of the highest factor. If all the running totals match, we have found the LCM.

#include <stdlib.h>
#include <stdio.h>

int main() {
    int factors[] = {
        0,
        1,
        2,
        3,
        4,
        5,
        6,
        7,
        8,
        9,
        10,
        11,
        12,
        13,
        14,
        15,
        16,
        17,
        18,
        19,
        20
    };

    const int highest_factor = 20;
    const int lowest_factor = 11;
    const int required_factor_count = 9;
    int smallest_number = 0;

    for (int current_number = highest_factor; smallest_number == 0; current_number += highest_factor) {
        int factor_count = 0;

        // Increase all factors until they equal or exceed the current number
        for (int current_factor = lowest_factor; current_factor < highest_factor; current_factor++) {
            while (factors[current_factor] < current_number) {
                factors[current_factor] += current_factor;
            }

            if (factors[current_factor] == current_number) {
                factor_count++;
            } else {
                break;
            }
        }

        if (factor_count == required_factor_count) {
            smallest_number = current_number;
        }
    }

    printf("%d\n", smallest_number);

    return EXIT_SUCCESS;
}

This takes 0.25s on my machine - still very fast but roughly 3x as long as the original solution.

Generic solution

Both of the above solutions are optimised for the specific problem, including where the list of target factors is an unbroken sequence of positive integers starting with 1. A more generic solution, to find the LCM of any given set of positive integers, requires a bit more thought. To implement this we first need to understand that:

lcm(a, b, c) = lcm(a, lcm(b, c))

This is useful because we only need to implement one function - finding the LCM of two integers. Any other combination can be calculated by calling that function multiple times with different values.

Our search space for lcm(a, b) is: min(a, b) to ab. This is because the lowest possible LCM is the smallest of a and b, because no number can be a factor of a smaller number (e.g. 4 cannot be a factor of 2). The highest possible LCM is ab, because ab is guaranteed to have a and b as factors.

Some examples before we move on to writing an LCM function in C:

a	b	Lower bound: min(a, b)	Upper bound: ab
2	4	2	8
10	15	10	150
19	20	19	380