Given this same piece of code, why is the python version 100x+ slower than C?

486 views Asked by At

I'm doing project euler Q14.

Which starting number, under one million, produces the longest collatz chain?

I was very surprised to see someone who could get a result in 0.7sec. More surprised when I see it is merely a naive brute force solution.

I was skeptical as I spent so much time to optimize my python version, getting the runtime down to about a minute. So I ran the code myself... OP wasn't lying.

I translated the same code to Python, it failed to terminate after 5 minutes.

What gives?

C version : http://codepad.org/VD9QJDkt

#include <stdio.h>
#include <time.h>

int main(int argc, char **argv)
{
  int longest = 0;
  int terms = 0;
  int i;
  unsigned long j;
  clock_t begin, end;
  double time_spent;

  begin = clock();

  for (i = 1; i <= 1000000; i++)
  {
    j = i;
    int this_terms = 1;

    while (j != 1)
    {
      this_terms++;

      if (this_terms > terms)
      {
        terms = this_terms;
        longest = i;
      }

      if (j % 2 == 0)
      {
        j = j / 2;
      }
      else
      {
        j = 3 * j + 1;
      }
    }
  }

  printf("longest: %d (%d)\n", longest, terms);

  end = clock();
  time_spent = (double)(end - begin) / CLOCKS_PER_SEC;
  printf("It took %f seconds\n", time_spent);
  return 0;
}

Python version: http://codepad.org/ZloPyEcz

import time

def iterative_brute_force(n):
    longest = 0
    terms = 0
    for i in range(1, n + 1):
        j = i
        this_terms = 1

    while j != 1:
        this_terms += 1
        if this_terms > terms:
            terms = this_terms
            longest = i

    if j % 2 == 0:
        j = j / 2
    else:
        j = 3 * j + 1

    return longest, terms

t0 = time.time()
print(iterative_brute_force(10 ** 6))
t1 = time.time()
total = t1-t0
print(total)
1

There are 1 answers

0
Leeor On BEST ANSWER

In short - it's not slower, it's just stuck.

The while loop in your python version is an infinite loop - your indentation doesn't include changing j so you'll never exit it. The fact that your program didn't just "take longer" but actually got completely stuck should have been a hint (don't feel bad though, I once waited 3 days before getting convinced about a similar scenario).

That's one thing, fixing that would make the program stop, but with incorrect results - that's because the outer for loop is also missing the indentation - you want to run the check for each number in the range.

Fixed code:

import time

def iterative_brute_force(n):
    longest = 0
    terms = 0
    for i in range(1, n + 1):
        j = i
        this_terms = 1

        while j != 1:
            this_terms += 1
            if this_terms > terms:
                terms = this_terms
                longest = i

            if j % 2 == 0:
                j = j / 2
            else:
                j = 3 * j + 1

    return longest, terms

t0 = time.time()
print(iterative_brute_force(10 ** 6))
t1 = time.time()
total = t1-t0
print(total)

Gives -

(837799, 525)
34.4885718822

while the c version gives -

longest: 837799 (525)
It took 0.600000 seconds

There, now everything makes sense again, python is indeed slower and we can get to the real question :)

On a side note though - this is far from being optimized, as you may repeat numbers you already visited. A better algorithm here would have been storing the results for these numbers in some handy lookup table.


Now, regarding the fundamental question here (which is relevant even after fixing the code as you can see) - execution time across languages is a tricky domain, even if you truly perform the same algorithm in your code, the actual implementation is affected by compiler or interpreter behavior - Python is interpreted and therefore your code has to perform through another layer of code that manages your program, while C just runs natively. This opens up potential language features (and maybe some optimizations), and it would depend on benchmarking each application to see how well it works, but it's probably safe to say that on most workloads you'll observe that python (or other interpreted languages) behaves 10-100x slower due to this overhead.

Furthermore - having compiled the c code in advance allows your compiler to produce a far better optimized code. It's possible to use JITting on python to mitigate that (and reduce the interpreter overhead a bit), but it's not available on all python implementations (at least not "pure" ones)

See also the discussion at - Why are Python Programs often slower than the Equivalent Program Written in C or C++?