No speedup from increasing the number of threads in an OpenMP prange loop for a boids simulation (Cython)


My code is a bit of a spaghetti mess after adapting it from Python and trying so many different approaches, so apologies for that. My prange loop contains all of the velocity and position updates for each boid. I get a similar run time whether I set threads to 1, 2, 3 or 4 (I'm only running a 4-core machine without hyperthreading). I also have access to a 28-core cluster and see the same problem there. I really don't know why the number of threads has no effect here; I compile with GCC and get no errors. If anything, I actually see some slowdown as I increase the number of threads. Any suggestions?

I realise that I am updating the positions of the boids within the prange too, which I think might be the problem. If I move it outside of the loop I get unwanted simulation behaviours, the nature of the sim completely changes. This is my first attempt at parallelising a program, so any insights or guidance would be greatly appreciated. Thanks.
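To illustrate why moving the position update changes the simulation, here is a minimal sketch in plain Python (not my Cython code). It uses a made-up toy rule in which each point drifts toward the mean of its two ring neighbours; `step_in_place`, `step_buffered` and the rule itself are hypothetical and only stand in for the boid update. Updating in place means later iterations read positions that earlier iterations have already moved, so the result depends on iteration order, which is exactly what is undefined when threads run the iterations concurrently. Reading from a snapshot and writing to a separate buffer removes that dependence.

```python
def step_in_place(positions, order):
    """Update each point in place; later points see already-moved neighbours."""
    pos = list(positions)
    n = len(pos)
    for i in order:
        left = pos[(i - 1) % n]
        right = pos[(i + 1) % n]
        # Move halfway toward the mean of the two neighbours (toy rule).
        pos[i] += 0.5 * ((left + right) / 2.0 - pos[i])
    return pos


def step_buffered(positions, order):
    """Read only from the old state, write only to a new buffer."""
    old = list(positions)
    new = list(positions)
    n = len(old)
    for i in order:
        left = old[(i - 1) % n]
        right = old[(i + 1) % n]
        new[i] = old[i] + 0.5 * ((left + right) / 2.0 - old[i])
    return new


start = [0.0, 10.0, 20.0, 40.0]
forward = [0, 1, 2, 3]
backward = [3, 2, 1, 0]

# In-place results differ with iteration order; buffered results do not.
in_place_differs = step_in_place(start, forward) != step_in_place(start, backward)
buffered_matches = step_buffered(start, forward) == step_buffered(start, backward)
print(in_place_differs, buffered_matches)
```

This only shows why the two variants behave differently; it does not show whether the in-place update is what prevents the speedup.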

for frame_num in range(num_frames):        
    neighbour_list = GenerateNeighbours(x_values, y_values, num_boids)
        
    for boid_outer in prange(num_boids, nogil=True, num_threads=threads, schedule='dynamic', chunksize=500):
        
        n = neighbour_list[boid_outer, 0]
        #current_neighbours = neighbour_list[boid_outer, 1 : 1 + n]
        
        sum_x = 0.0
        sum_y = 0.0
        sum_vx = 0.0
        sum_vy = 0.0
        
        if n == 0:
            avg_x = 0.0
            avg_y = 0.0
            avg_vx = 0.0
            avg_vy = 0.0
            
        else:
            for neighbour in neighbour_list[boid_outer, 1 : 1 + n]:
                
                sum_x = sum_x + x_values[neighbour]
                sum_y = sum_y + y_values[neighbour]
                sum_vx = sum_vx + vx_values[neighbour]
                sum_vy = sum_vy + vy_values[neighbour]
                
    
            inv_count = 1.0 / n
            avg_x = sum_x * inv_count
            avg_y = sum_y * inv_count
            avg_vx = sum_vx * inv_count
            avg_vy = sum_vy * inv_count

        avoid_dx = 0.0
        avoid_dy = 0.0

    
        #neighbours = neighbour_list[boid_outer, 1:1+n]  # Fetch the correct neighbour list for each boid

        for other_boid in neighbour_list[boid_outer, 1 : 1 + n]:
            if (x_values[other_boid] - x_values[boid_outer])**2 + (y_values[other_boid] - y_values[boid_outer])**2 < min_distance**2:  # Use the distance calculated in the inner loop
                avoid_dx = avoid_dx + x_values[boid_outer] - x_values[other_boid]
                avoid_dy = avoid_dy + y_values[boid_outer] - y_values[other_boid]

        # Update velocities
        vx_values[boid_outer] = vx_values[boid_outer] + (avg_x - x_values[boid_outer]) * centering_factor \
            + (avg_vx - vx_values[boid_outer]) * matching_factor \
            + avoid_dx * avoid_factor

        vy_values[boid_outer] = vy_values[boid_outer] + (avg_y - y_values[boid_outer]) * centering_factor \
            + (avg_vy - vy_values[boid_outer]) * matching_factor \
            + avoid_dy * avoid_factor

        # Keep within bounds
        margin = 100
        turn_factor = 3

        if x_values[boid_outer] < margin:
            vx_values[boid_outer] += turn_factor

        if x_values[boid_outer] > width - margin:
            vx_values[boid_outer] -= turn_factor

        if y_values[boid_outer] < margin:
            vy_values[boid_outer] += turn_factor

        if y_values[boid_outer] > height - margin:
            vy_values[boid_outer] -= turn_factor

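        # Limit speed: `speed` holds the squared magnitude, so compare it with
        # speed_limit**2 and take the square root only when rescaling.
        speed = vx_values[boid_outer]**2 + vy_values[boid_outer]**2
        if speed > speed_limit**2:
            speed_factor = speed_limit / speed**0.5
            vx_values[boid_outer] *= speed_factor
            vy_values[boid_outer] *= speed_factor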
        # Update position based on velocity
        #with gil:
        x_values[boid_outer] += vx_values[boid_outer]
        y_values[boid_outer] += vy_values[boid_outer]