My code is a bit spaghetti after adapting it from Python and trying many different approaches, so apologies for that. My prange loop contains all of the velocity and position updates for each boid. I get similar timings whether I set threads to 1, 2, 3 or 4 (I'm only running a 4-core machine without hyperthreading). I also have access to a 28-core cluster and see the same problem there. I really don't know why the number of threads has no effect here; I compile with GCC and get no errors. If anything, I see some slowdown as I increase the number of threads. Any suggestions?
I realise that I am updating the positions of the boids within the prange too, which I think might be the problem. If I move it outside of the loop I get unwanted simulation behaviours, the nature of the sim completely changes. This is my first attempt at parallelising a program, so any insights or guidance would be greatly appreciated. Thanks.
for frame_num in range(num_frames):
    neighbour_list = GenerateNeighbours(x_values, y_values, num_boids)
    for boid_outer in prange(num_boids, nogil=True, num_threads=threads, schedule='dynamic', chunksize=500):
        n = neighbour_list[boid_outer, 0]
        #current_neighbours = neighbour_list[boid_outer, 1 : 1 + n]
        sum_x = 0.0
        sum_y = 0.0
        sum_vx = 0.0
        sum_vy = 0.0
        if n == 0:
            avg_x = 0.0
            avg_y = 0.0
            avg_vx = 0.0
            avg_vy = 0.0
        else:
            for neighbour in neighbour_list[boid_outer, 1 : 1 + n]:
                sum_x = sum_x + x_values[neighbour]
                sum_y = sum_y + y_values[neighbour]
                sum_vx = sum_vx + vx_values[neighbour]
                sum_vy = sum_vy + vy_values[neighbour]
            inv_count = 1.0 / n
            avg_x = sum_x * inv_count
            avg_y = sum_y * inv_count
            avg_vx = sum_vx * inv_count
            avg_vy = sum_vy * inv_count
        avoid_dx = 0.0
        avoid_dy = 0.0
        #neighbours = neighbour_list[boid_outer, 1:1+n] # Fetch the correct neighbour list for each boid
        for other_boid in neighbour_list[boid_outer, 1 : 1 + n]:
            if (x_values[other_boid] - x_values[boid_outer])**2 + (y_values[other_boid] - y_values[boid_outer])**2 < min_distance**2: # Use the distance calculated in the inner loop
                avoid_dx = avoid_dx + x_values[boid_outer] - x_values[other_boid]
                avoid_dy = avoid_dy + y_values[boid_outer] - y_values[other_boid]
        # Update velocities
        vx_values[boid_outer] = vx_values[boid_outer] + (avg_x - x_values[boid_outer]) * centering_factor \
            + (avg_vx - vx_values[boid_outer]) * matching_factor \
            + avoid_dx * avoid_factor
        vy_values[boid_outer] = vy_values[boid_outer] + (avg_y - y_values[boid_outer]) * centering_factor \
            + (avg_vy - vy_values[boid_outer]) * matching_factor \
            + avoid_dy * avoid_factor
        # Keep within bounds
        margin = 100
        turn_factor = 3
        if x_values[boid_outer] < margin:
            vx_values[boid_outer] += turn_factor
        if x_values[boid_outer] > width - margin:
            vx_values[boid_outer] -= turn_factor
        if y_values[boid_outer] < margin:
            vy_values[boid_outer] += turn_factor
        if y_values[boid_outer] > height - margin:
            vy_values[boid_outer] -= turn_factor
        # Limit speed
        speed = vx_values[boid_outer]**2 + vy_values[boid_outer]**2
        if speed > speed_limit**2:
            speed_factor = speed_limit / speed
            vx_values[boid_outer] *= speed_factor
            vy_values[boid_outer] *= speed_factor
        # Update position based on velocity
        #with gil:
        x_values[boid_outer] += vx_values[boid_outer]
        y_values[boid_outer] += vy_values[boid_outer]
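
To make the concern about updating positions inside the prange more concrete, here is a minimal pure-Python sketch. It is a simplified 1D toy model with only the centering rule, not the Cython code above, and the names `step_in_place`, `step_buffered` and the toy data are hypothetical. It shows that an in-place update gives a result that depends on the order in which boids are visited (which is not fixed once several threads are involved), while an update that reads only the old state and writes into separate arrays does not. This may also be why moving the position update out of the loop changes how the simulation behaves.

```python
def step_in_place(x, v, neighbours, centering, order):
    """Update each boid in place, visiting boids in the given order.

    Later boids read positions that earlier boids have already changed.
    """
    x, v = list(x), list(v)  # copy so the caller's lists are untouched
    for i in order:
        nb = neighbours[i]
        if nb:
            avg = sum(x[j] for j in nb) / len(nb)
            v[i] += (avg - x[i]) * centering
        x[i] += v[i]
    return x, v


def step_buffered(x, v, neighbours, centering, order):
    """Read only the old state (x, v) and write into fresh lists."""
    new_x, new_v = list(x), list(v)
    for i in order:
        nb = neighbours[i]
        if nb:
            avg = sum(x[j] for j in nb) / len(nb)  # old positions only
            new_v[i] = v[i] + (avg - x[i]) * centering
        new_x[i] = x[i] + new_v[i]
    return new_x, new_v


if __name__ == "__main__":
    # Toy data (assumed for illustration): three boids on a line.
    x = [0.0, 10.0, 20.0]
    v = [1.0, 0.0, -1.0]
    neighbours = [[1], [0, 2], [1]]
    forward, backward = [0, 1, 2], [2, 1, 0]

    print("in place, forward :", step_in_place(x, v, neighbours, 0.5, forward))
    print("in place, backward:", step_in_place(x, v, neighbours, 0.5, backward))
    print("buffered, forward :", step_buffered(x, v, neighbours, 0.5, forward))
    print("buffered, backward:", step_buffered(x, v, neighbours, 0.5, backward))
```

In this sketch the two buffered calls return the same state, while the two in-place calls differ. Whether a double-buffered update is acceptable for the real simulation is a separate question, since it changes the update rule compared with the sequential in-place version.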