Pathfinding with Unity Jobs and Burst is slower than without


I added pathfinding to my game in Unity based on a video tutorial and then improved on its code. To make it more efficient with multi-threading I wanted to move the work into Unity Jobs, but since this is my first time using them, I got some surprising results.

With the code below I get much worse performance than without Jobs and Burst, and I don't understand why. In the profiler (profiling in the Unity editor) it takes at least 14 ms to find a new path, which is far slower than it should be; without jobs the same search is much faster. That is with only two units pathfinding to the same target on a level with fewer than 1000 nodes. Judging by other code I have found on the same pathfinding topic, it should be much more performant: in the videos the cost stays below 1 ms even in comparable situations, and I found a GitHub repo with bigger maps and more than 100 agents that performs much better.

Maybe I misunderstand something about the job system, or made a mistake somewhere in the code, but I can't figure out what causes this poor performance.

First I have a Unit which calls a PathRequestManager every time the target object has moved a certain distance and a specified amount of time has passed. This creates a new PathRequest with the current position, the target position, a callback to invoke once the path has been calculated, and the requesting unit. A method then adds the requested positions to separate NativeLists; these hold the path request parameters that should be processed in the next Update().
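A simplified sketch of that request side (the PathRequest layout, field names and Persistent-allocated lists here are illustrative, not the real project code):

using System;
using Unity.Collections;
using Unity.Mathematics;
using UnityEngine;

// Hypothetical request type, inferred from the description above.
public struct PathRequest
{
    public float3 startPosition;
    public float3 targetPosition;
    public int unitID;
    public Action<Vector3[], bool> callback; // invoked on the main thread once the path is ready
}

public class PathRequestManagerSketch : MonoBehaviour
{
    NativeList<float3> nextStartPosList;
    NativeList<float3> nextEndPosList;
    NativeList<int> nextUnitIDs;
    bool shouldDoPathFinding;

    void Awake()
    {
        nextStartPosList = new NativeList<float3>(Allocator.Persistent);
        nextEndPosList = new NativeList<float3>(Allocator.Persistent);
        nextUnitIDs = new NativeList<int>(Allocator.Persistent);
    }

    // Units call this when their target has moved far enough and enough time has passed.
    public void RequestPath(PathRequest request)
    {
        nextStartPosList.Add(request.startPosition);
        nextEndPosList.Add(request.targetPosition);
        nextUnitIDs.Add(request.unitID);
        shouldDoPathFinding = true; // consumed in the next Update()
    }

    void OnDestroy()
    {
        if (nextStartPosList.IsCreated) nextStartPosList.Dispose();
        if (nextEndPosList.IsCreated) nextEndPosList.Dispose();
        if (nextUnitIDs.IsCreated) nextUnitIDs.Dispose();
    }
}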

In Update() I check a shouldDoPathFinding flag, which is set to true whenever a new path request is added. If it is true I call the StartFindPathJobifiedHash method, pass the required native arrays, hash maps and multi hash maps to the job, and schedule it.
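A rough sketch of that Update flow (only the scheduling part is described above; completing the handle in LateUpdate is an assumption for illustration):

void Update()
{
    if (shouldDoPathFinding)
    {
        shouldDoPathFinding = false;
        StartFindPathJobifiedHash(); // schedules FindPathJobParallel and stores the JobHandle
    }
}

void LateUpdate()
{
    // Assumption: the stored handle is completed later in the frame, so the job gets
    // time on worker threads before waypointsHash / pathSuccessHash are read back
    // and the unit callbacks are invoked.
    jobhandle.Complete();
}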

public void StartFindPathJobifiedHash()
{
    CopyNextAttributesToCurrent();    // copies the next pathfinding attributes to the current lists
    ClearNextPathfindingAttributes(); // clears the next pathfinding attributes

    // unitList contains the units which should do pathfinding in the current Update
    for (int i = 0; i < unitList.Count; i++)
    {
        NodeStruct startNode = grid.NodeFromWorldPointJobified(startPosList[i]);
        int startNodeID = startNode.nodeID;
        NodeStruct endNode = grid.NodeFromWorldPointJobified(endPosList[i]);
        int endNodeID = endNode.nodeID;

        // startNodeList and endNodeList are NativeLists
        startNodeList.Add(startNodeID);
        endNodeList.Add(endNodeID);
    }

    // node neighbours are calculated on the main thread when the grid is created
    NativeParallelMultiHashMap<int, int> neighboursMap = grid.neighboursMapJobified;

    // originalPathNodeArray: NativeArray containing all nodes in the grid, populated when the grid is built
    // neighboursMap: NativeParallelMultiHashMap with node IDs as keys and neighbour IDs as values
    // waypointsHash: NativeParallelMultiHashMap with unit IDs as keys and waypoint positions as values
    FindPathJobParallel findPathJob = new FindPathJobParallel
    {
        pathRequestIDList = pathRequestIDList,
        originalPathNodeArray = originalPathNodeArray,
        neighboursMap = neighboursMap,
        startNodeList = startNodeList,
        endNodeList = endNodeList,
        waypointsHash = waypointsHash,
        pathSuccessHash = pathSuccessHash,
        originalPathNodeArrayHash = originalPathNodeHash,
        unitIDs = unitIDs
    };

    JobHandle jobHandle = findPathJob.Schedule(unitList.Count, 1);
    jobhandle = jobHandle; // stored in a field so it can be completed later
}

Here is the job:

[BurstCompile]
public struct FindPathJobParallel : IJobParallelFor
{
    [ReadOnly, NativeDisableParallelForRestriction] public NativeArray<NodeStruct> originalPathNodeArray;
    [ReadOnly, NativeDisableParallelForRestriction] public NativeParallelHashMap<int, NodeStruct> originalPathNodeArrayHash;

    [ReadOnly, NativeDisableParallelForRestriction] public NativeList<int> pathRequestIDList;
    [ReadOnly, NativeDisableParallelForRestriction] public NativeList<int> startNodeList;
    [ReadOnly, NativeDisableParallelForRestriction] public NativeList<int> endNodeList;
    [ReadOnly, NativeDisableParallelForRestriction] public NativeList<int> unitIDs;

    [WriteOnly, NativeDisableParallelForRestriction] public NativeParallelMultiHashMap<int, float3> waypointsHash;
    [NativeDisableParallelForRestriction] public NativeParallelHashMap<int, bool> pathSuccessHash;
    [ReadOnly, NativeDisableParallelForRestriction] public NativeParallelMultiHashMap<int, int> neighboursMap;

    public void Execute(int i)
    {
        NativeList<NodeStruct> openList = new NativeList<NodeStruct>(Allocator.Temp);
        NativeList<NodeStruct> closedList = new NativeList<NodeStruct>(Allocator.Temp);

        NativeParallelMultiHashMap<int, int> openHashMap = new NativeParallelMultiHashMap<int, int>(100, Allocator.Temp);
        NativeParallelMultiHashMap<int, int> closedHashMap = new NativeParallelMultiHashMap<int, int>(1000, Allocator.Temp);

        NativeList<int> neighbourIndexes = new NativeList<int>(10, Allocator.Temp);

        NativeArray<NodeStruct> pathNodeArray = new NativeArray<NodeStruct>(originalPathNodeArray.Length, Allocator.Temp);
        NativeParallelHashMap<int, NodeStruct> pathNodeHash = new NativeParallelHashMap<int, NodeStruct>(originalPathNodeArray.Length, Allocator.Temp);

        NativeParallelHashMap<int, bool> successList = new NativeParallelHashMap<int, bool>(100, Allocator.Temp);

        NodeStruct startNodeStruct = originalPathNodeArrayHash[startNodeList[i]];
        startNodeStruct.parentID = startNodeStruct.nodeID;

        NodeStruct endNodeStruct = originalPathNodeArrayHash[endNodeList[i]];

        if (!pathNodeHash.ContainsKey(startNodeList[i]))
        {
            pathNodeHash.Add(startNodeList[i], startNodeStruct);
        }
        if (!pathNodeHash.ContainsKey(endNodeList[i]))
        {
            pathNodeHash.Add(endNodeList[i], endNodeStruct);
        }

        startNodeStruct.gCost = 0;

        openList.Add(startNodeStruct);
        openHashMap.Add(startNodeStruct.nodeID, startNodeStruct.nodeID);
     
        while (openList.Length > 0)
        {
            NodeStruct currentNodeStruct = GetLowestCostFNodeIndex(openList, openHashMap);
         
            closedList.Add(currentNodeStruct);
            closedHashMap.Add(currentNodeStruct.nodeID, currentNodeStruct.nodeID);
            if (currentNodeStruct.nodeID == endNodeStruct.nodeID)
            {
             
                pathSuccessHash[unitIDs[i]] = true;
                successList[i] = true;
                break;
            }
 
       
            // collect the neighbour IDs of the current node
            neighbourIndexes.Clear(); // reset between nodes so previously processed neighbours are not re-expanded
            var neighboursOnIndexEnumerator = neighboursMap.GetValuesForKey(currentNodeStruct.nodeID);

            while (neighboursOnIndexEnumerator.MoveNext())
            {
                neighbourIndexes.Add(neighboursOnIndexEnumerator.Current);
            }

            NodeStruct neighbourStruct = originalPathNodeArray[0];
         
            for (int neighbourIndex = 0; neighbourIndex < neighbourIndexes.Length; neighbourIndex++)
            {
             
                int currentNeighbourIndex = neighbourIndexes[neighbourIndex];
                neighbourStruct = originalPathNodeArray[currentNeighbourIndex];
 
 
                if (!neighbourStruct.walkable)
                {
                    continue;
                }
 
                if (closedHashMap.ContainsKey(currentNeighbourIndex))
                {
                    continue;
                }
 
 
                int newMovementCostToNeighbour = currentNodeStruct.gCost + (int)GetDistanceJobified(currentNodeStruct.worldPosition, neighbourStruct.worldPosition, currentNodeStruct.gridX, currentNodeStruct.gridY, neighbourStruct.gridX, neighbourStruct.gridY) + neighbourStruct.movementPenalty;
                if (newMovementCostToNeighbour < neighbourStruct.gCost || !openHashMap.ContainsKey(currentNeighbourIndex))
                {
                    NodeStruct newNodeStruct = new NodeStruct();
                    newNodeStruct.gridX = neighbourStruct.gridX;
                    newNodeStruct.gridY = neighbourStruct.gridY;
                    newNodeStruct.walkable = neighbourStruct.walkable;
                    newNodeStruct.worldPosition = neighbourStruct.worldPosition;
                    newNodeStruct.movementPenalty = neighbourStruct.movementPenalty;
                    newNodeStruct.gCost = newMovementCostToNeighbour;
                    newNodeStruct.hCost = (int)GetDistanceJobified(neighbourStruct.worldPosition, endNodeStruct.worldPosition, neighbourStruct.gridX, neighbourStruct.gridY, endNodeStruct.gridX, endNodeStruct.gridY);
                    newNodeStruct.nodeID = neighbourStruct.nodeID;
                    newNodeStruct.parentID = currentNodeStruct.nodeID;
                    newNodeStruct.angle = neighbourStruct.angle;
                    newNodeStruct.normal = neighbourStruct.normal;
                    newNodeStruct.modifiedWorldPosition = neighbourStruct.modifiedWorldPosition;
 
                    if (!pathNodeHash.ContainsKey(newNodeStruct.nodeID))
                    {
                        pathNodeHash.Add(newNodeStruct.nodeID, newNodeStruct);
                    }
                    else
                    {
                        pathNodeHash[newNodeStruct.nodeID] = newNodeStruct;
                    }
                 
                    if (!openHashMap.ContainsKey(currentNeighbourIndex))
                    {
                        openList.Add(neighbourStruct);
                        openHashMap.Add(neighbourStruct.nodeID, neighbourStruct.nodeID);
                     
                    }
                }
 
            }
 
        }
 
        if (pathSuccessHash[unitIDs[i]])
        {
            NativeList<float3> waypointsList = RetracePathJobifiedHash(originalPathNodeArrayHash[startNodeStruct.nodeID], pathNodeHash[endNodeStruct.nodeID], pathNodeHash);

            pathSuccessHash[unitIDs[i]] = waypointsList.Length > 0;

            for (int j = 0; j < waypointsList.Length; j++)
            {
                waypointsHash.Add(unitIDs[i], waypointsList[j]);
            }

            waypointsList.Dispose();
        }
     
 
        openList.Dispose();
        closedList.Dispose();
        pathNodeArray.Dispose();
        neighbourIndexes.Dispose();
 
    }
 
 
 
    public NodeStruct GetLowestCostFNodeIndex(NativeList<NodeStruct> openList, NativeParallelMultiHashMap<int, int> openHashMap)
    {
        NodeStruct lowestCostNode = openList[0];
 
        int lowestIndex = 0;
 
        for (int i = 0; i < openList.Length; i++)
        {
            NodeStruct currentCostNode = openList[i];
            if (currentCostNode.fCost < lowestCostNode.fCost)
            {
                lowestCostNode = currentCostNode;
                lowestIndex = i;
            }
        }
        openList.RemoveAt(lowestIndex);
        openHashMap.Remove(lowestCostNode.nodeID);
        return lowestCostNode;
 
    }
 
    float GetDistanceJobified(float3 nodeAPos, float3 nodeBPos, int nodeAGridX, int nodeAGridY, int nodeBGridX, int nodeBGridY)
    {
        int dstX = math.abs(nodeAGridX - nodeBGridX);
        int dstY = math.abs(nodeAGridY - nodeBGridY);
        float dstZ = math.abs(nodeAPos.y - nodeBPos.y) * 10; // height difference weighted into the cost

        if (dstX > dstY)
            return 14 * dstY + 10 * (dstX - dstY) + (int)dstZ;
        return 14 * dstX + 10 * (dstY - dstX) + (int)dstZ;
    }
 
    NativeList<float3> RetracePathJobifiedHash(NodeStruct startNode, NodeStruct endNode, NativeParallelHashMap<int, NodeStruct> pathNodeArray)
    {
        NativeList<NodeStruct> path = new NativeList<NodeStruct>(100, Allocator.Temp);
        NodeStruct currentNode = endNode;

        // walk back from the end node via parent IDs until the start node is reached
        while (currentNode.nodeID != startNode.nodeID)
        {
            path.Add(currentNode);
            currentNode = pathNodeArray[currentNode.parentID];
        }

        NativeList<float3> waypoints = new NativeList<float3>(path.Length, Allocator.Temp);
        for (int i = 0; i < path.Length; i++)
        {
            waypoints.Add(path[i].worldPosition);
        }

        path.Dispose();
        pathNodeArray.Dispose();
        return waypoints;
    }
 
 
}

The job is a simple A* pathfinding routine rewritten with native lists and arrays. I create the nodes of the grid on start and reuse them for every pathfinding request. To avoid overwriting nodes, whenever a new node is found I create a new node struct and use that for the rest of the current job. It works correctly, but I can't find out why a single job takes at least 14 ms to complete. In the profiler I can see that multiple threads are working.

Could you please take a look at the code and tell me where the problem might be? I tried to comment the code to make it more understandable, but if anything is unclear I'm happy to answer questions.

Edit: I'm using Unity 2021.3.16; turning off safety checks in the Burst options did not change the performance. When I turn off jobs, pathfinding completes in under 1 ms. The problem should be inside the FindPathJobParallel code, since that is what takes the most time in the profiler.
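One thing worth ruling out when profiling in the editor: Burst compiles jobs asynchronously by default, so the first few scheduled runs can execute as un-compiled managed code. Forcing synchronous compilation makes sure the profiler never measures that fallback:

// Block scheduling until Burst compilation has finished, so the first
// profiled runs are not the managed (non-Burst) fallback.
[BurstCompile(CompileSynchronously = true)]
public struct FindPathJobParallel : IJobParallelFor
{
    // ...
}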


1 Answer

Answered by Andrew Łukasik:

The Burst compiler does one thing: it optimizes math operations very well. But Burst has little to no effect on the data layouts and memory access patterns you designed. This means that 2 out of 3 CPU performance factors are in the programmer's hands here.

memory data layout

The root problem here is that NodeStruct is too big and too eclectic. It is at least 84 bytes, where even the chubby Matrix4x4 is only 64 bytes (!). This sabotages the whole program, because this struct is used by 6 out of the 17 data collections, which in turn means the CPU needlessly copies and moves these chunky structs around. The solution is to go SoA (structure of arrays): split this single struct into a collection of arrays (NativeArray<Cost> costs, NativeArray<ID> ids, NativeArray<float3> worldPositions, ...).
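A sketch of what that SoA split could look like for this job (the exact grouping and names are illustrative, based on the NodeStruct fields visible in the code above):

using Unity.Collections;
using Unity.Mathematics;

// Parallel arrays indexed by nodeID instead of one ~84-byte NodeStruct copied by value.
// Each phase of the search then only streams the fields it actually needs.
public struct GridSoA
{
    // static grid data, built once on the main thread
    public NativeArray<float3> worldPosition;
    public NativeArray<int2>   gridCoord;        // (gridX, gridY), used by the heuristic
    public NativeArray<bool>   walkable;
    public NativeArray<int>    movementPenalty;
    // angle, normal and modifiedWorldPosition would live in their own arrays,
    // read only when waypoints are built

    // per-search, per-node state, reset (or versioned) between searches
    public NativeArray<int>    gCost;
    public NativeArray<int>    hCost;
    public NativeArray<int>    parentID;
}

With this kind of layout the inner A* loop mostly touches a few small int arrays, and the fat fields are only read when a path is actually retraced.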

memory access pattern

Minimize cache misses. This is the top priority. A cache miss is bad because it keeps a powerful CPU busy doing nothing but waiting.

TL;DR:

  • Pathfinding is not math-heavy, so Burst won't help much here.

  • Reading data the CPU doesn't need at the moment (which fills the cache) plus a random memory access pattern (not reading data in linear order) will make any program slow (a minimal illustration follows after this list).

  • Pathfinding uses grids and graphs, which require reading data that is not clustered next to each other in memory. This makes cache misses the major bottleneck for most pathfinding routines.
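As a minimal illustration of the access-pattern point (a generic micro-benchmark, not taken from the code above): visiting the same array linearly versus in a shuffled order reads identical data, yet the shuffled version is typically several times slower purely because of cache misses.

using System;
using System.Diagnostics;

public static class CacheMissDemo
{
    public static void Main()
    {
        const int n = 1 << 22;                // ~4M ints (16 MB), larger than typical CPU caches
        var data = new int[n];
        var order = new int[n];
        var rng = new Random(1234);
        for (int i = 0; i < n; i++) { data[i] = i; order[i] = i; }
        for (int i = n - 1; i > 0; i--)       // Fisher-Yates shuffle of the visit order
        {
            int j = rng.Next(i + 1);
            (order[i], order[j]) = (order[j], order[i]);
        }

        long sum = 0;
        var sw = Stopwatch.StartNew();
        for (int i = 0; i < n; i++) sum += data[i];           // linear: prefetcher-friendly
        Console.WriteLine($"linear : {sw.ElapsedMilliseconds} ms (sum {sum})");

        sum = 0;
        sw.Restart();
        for (int i = 0; i < n; i++) sum += data[order[i]];    // random: mostly cache misses
        Console.WriteLine($"random : {sw.ElapsedMilliseconds} ms (sum {sum})");
    }
}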