Don't know how to optimize the code with Cython

101 views Asked by At

I'm new on trying to optimize with cython and I need some help because I know I can optimize more but I don't know how.

Firstly what I did was declare the variables in C, however a lot of them are objects, and I don't know how to declare that.

Then I have the class, in the HTML I can see every time the code wants to get an object of the class it is slow, so I wanna optimize that too, and I don't know how to do it :(

import math
import random

def class City:
  def __init__(self, city_name='NoName', posx=0, posy=0):
    self.name = city_name
    self.x = posx
    self.y = posy

def distance_between_cities( object C1, object C2 ):
    """Compute distance between two cities"""
    dx = C1.x - C2.x
    dy = C1.y - C2.y
    return math.sqrt( dx*dx + dy*dy )

def read_cities_from_file ( str filename ):
    """Read list of Cities from text file"""
    cdef list Cities= [] # List of Empty Cities
    cdef list R = []
    with open(filename) as file:
        for line in file:
            R= line.split()
            Cities.append ( City( R[0], float(R[1]), float(R[2]) ) )
    return Cities

def path_distance( object Cities ):
    """Compute total distance of the path traversing all cities in order"""
    cdef int i = 0
    cdef double D = 0
    for i in range( len(Cities) ):
        D = D + distance_between_cities ( Cities[i],
                                          Cities[i+1 if i+1<len(Cities) else 0])
    return D

cdef int closerCity( object City, object Cities ):
    """Compute position of the city C in the list of Cities that is closer to City"""
    cdef float minDist = 2*1000*1000
    cdef int minIndex= 0
    cdef int i = 0
    cdef double dx = 0
    cdef double dy = 0
    cdef double dist = 0
    for i in range(len(Cities)):
        c = Cities[i]
        dx = City.x - c.x
        dy = City.y - c.y
        dist = dx*dx + dy*dy
        if dist < minDist:
            minDist = dist
            minIndex= i
    return minIndex

def GoodPath( object Cities ):
    """Generate a path with small total distance using greedy algorithm"""
    cdef list NotVisited = []
    cdef int len_C = len(Cities)
    cdef int i = 0
    for i in range(len_C):
      NotVisited.append(Cities[i])
    Path= [ Cities[0] ] # Start path with first city
    del NotVisited[0]
    while len(NotVisited) > 0:
        pos = closerCity( Path[-1], NotVisited )
        Path.append( NotVisited[pos] )
        del NotVisited[pos]
    return Path

if __name__ == "__main__":
  import argparse as arg
  parser = arg.ArgumentParser(prog='ARGUMENTS', usage='%(prog)s [options]')
  parser.add_argument("input",  type=str, help="File containing list of Cities")
  args = parser.parse_args()
  
  ListOfCities= read_cities_from_file ( str(args.input) )
  GoodList    = GoodPath( ListOfCities )
  print ( "Initial Distance =", path_distance( ListOfCities ),
          "  Good Distance =",  path_distance( GoodList) )

This is what I get from the HTML when I cythonize: Part 1 of the HTML Part 2

Any help will be appreciated

Thank You!

1

There are 1 answers

0
Jérôme Richard On BEST ANSWER

The complexity of your algorithm is O(n^2) (with n the number of cities). Indeed, you walk through all the cities and for each city you iterate over most of the cities again (O(n) cities). Note also that item deletion is algo performed in O(n) time. The result can be computed with a more efficient O(n log n) algorithm:

You can build a K-d tree structure, or more specifically a Quad-tree structure in O(n log n) time and then find the closest city of any other city in O(log n) time with this structure. Additionally, it is better to change the type of NotVisited to a set rather than a list to avoid a slow list item deletion.

Apart from that, I advise you to replace object-typed variables with lower-level types (eg. arrays or tuples of double/int) so that Cython can generate a faster C code.

Note that your method will often not find the overall shortest path and may not even find a quite small path. There are better algorithms, but they are more expensive.