Some cores never return a value using MPI


I'm working on a school project using MPI. I use MPICH2 and write my code in Fortran. I run it on my school's server, which has multiple computing slots, each consisting of several cores. To speed up the code, I distribute sub-jobs across the cores and gather the results using MPI_Gather. Some cores never return a value, as if they were trapped in an infinite loop, and some cores never reach the first MPI_Barrier. But there is no infinite loop in the sub-jobs, and a serial version of the code works fine. My code is below.

        call MPI_COMM_RANK(MP_LIBRARY_WORLD, rank, ierr) 
        call MPI_COMM_SIZE(MP_LIBRARY_WORLD, numtasks, ierr)             
        loop_min=int(rank*ceiling(float(point_num)/float(numtasks)))+1             
        loop_max=int((rank+1)*ceiling(float(point_num)/float(numtasks)))  

        do ind=loop_min,loop_max,1
            if (ind>point_num) then
                exit
            end if 
            current_wealth_dist=total_grid(:,ind)
            if (current_wealth_dist(1)==0.) then
                call X_init_aiyagarizero(sendbuf(2:))
            else
                call X_init_aiyagari(sendbuf(2:))
            end if
            sendbuf(1)=ind 
            call MPI_GATHER(sendbuf,number_plc_function+1,MPI_REAL,recvbuf,number_plc_function+1,MPI_REAL,0,MP_LIBRARY_WORLD,ierr)
            !print *, "Point", ind, "Finished"  
        end do
        print *,rank, "work finished"

        call MPI_BARRIER(MP_LIBRARY_WORLD,ierr)
        print *, "After the First Barrier"

        call MPI_Bcast(recvbuf,(number_plc_function+1)*point_num,MPI_REAL,0,MP_LIBRARY_WORLD,ierr)
        print *, rank, "Finish Broadcast"
        call MPI_BARRIER(MP_LIBRARY_WORLD,ierr)

        do iter=1,point_num
            do jter=1,4
                init_policy_functions(int(recvbuf(1,iter)),:,jter)=recvbuf(2:,iter)
            end do               
        end do
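
One thing worth checking in the loop above is how many times each rank actually reaches MPI_GATHER. It is a collective call, so every rank in the communicator has to make it the same number of times. The sketch below uses no MPI at all. It only replays the loop-bound arithmetic from the question for each rank and counts the iterations. The values point_num = 10 and numtasks = 4 are assumptions picked for illustration, not taken from the question.

```fortran
program partition_demo
    implicit none
    ! Hypothetical sizes, chosen only for illustration (not from the question).
    integer, parameter :: point_num = 10, numtasks = 4
    integer :: rank, chunk, loop_min, loop_max, ind, ncalls

    ! Same block size as in the question: ceiling(point_num / numtasks)
    chunk = ceiling(real(point_num) / real(numtasks))

    ! Replay the bounds each rank would compute and count its loop iterations.
    do rank = 0, numtasks - 1
        loop_min = rank * chunk + 1
        loop_max = (rank + 1) * chunk
        ncalls = 0
        do ind = loop_min, loop_max
            if (ind > point_num) exit   ! same early exit as in the question
            ncalls = ncalls + 1         ! one MPI_GATHER call would happen here
        end do
        print '(a,i0,a,i0,a,i0,a,i0)', 'rank ', rank, ': ', &
            loop_min, '..', loop_max, ', gather calls = ', ncalls
    end do
end program partition_demo
```

With these assumed sizes, ranks 0 to 2 each count three calls while rank 3 exits early and counts one. If the real point_num is not a multiple of numtasks, the same imbalance would occur, and the ranks that make extra MPI_GATHER calls would wait for ranks that have already moved on to the barrier. That would look like the hang described. One possible way around it, again only a sketch of the idea, is to have each rank store its results in a local buffer and make a single collective call after the loop, for example MPI_GATHERV with per-rank counts.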

There are 0 answers