I have a piece of pthread code listed as the function "thread" here. It basically creates a number of threads (usually 240 on Xeon Phi and 16 on CPU) and then join them.
If I call this thread() only once, it works perfectly on both CPU and Xeon Phi. If I call it one more time, it still works fine on CPU but the pthread_create() will report "error 22" which should be "invalid argument" every 60 threads.
For example, thread 0, thread 60, thread 120 and so on of the 2nd run of thread() which are also the 241, 301, 361 and so on threads ever created in the process would fail (error 22). But thread 1~59, 61~119, 121~240, and so on work perfectly.
Note that this problem happens only on Xeon Phi.
I have checked the stack sizes, and the argument themselves, but I didn't find the reason for this. The arguments are correct.
void thread()
{
...
int i, rv;
cpu_set_t set;
arg_t args[nthreads];
pthread_t tid[nthreads];
pthread_attr_t attr;
pthread_barrier_t barrier;
rv = pthread_barrier_init(&barrier, NULL, nthreads);
if(rv != 0)
{
printf("Couldn't create the barrier\n");
exit(EXIT_FAILURE);
}
pthread_attr_init(&attr);
for(i = 0; i < nthreads; i++)
{
int cpu_idx = get_cpu_id(i,nthreads);
DEBUGMSG(1, "Assigning thread-%d to CPU-%d\n", i, cpu_idx);
CPU_ZERO(&set);
CPU_SET(cpu_idx, &set);
pthread_attr_setaffinity_np(&attr, sizeof(cpu_set_t), &set);
args[i].tid = i;
args[i].ht = ht;
args[i].barrier = &barrier;
/* assing part of the relR for next thread */
args[i].relR.num_tuples = (i == (nthreads-1)) ? numR : numRthr;
args[i].relR.tuples = relR->tuples + numRthr * i;
numR -= numRthr;
/* assing part of the relS for next thread */
args[i].relS.num_tuples = (i == (nthreads-1)) ? numS : numSthr;
args[i].relS.tuples = relS->tuples + numSthr * i;
numS -= numSthr;
rv = pthread_create(&tid[i], &attr, npo_thread, (void*)&args[i]);
if (rv)
{
printf("ERROR; return code from pthread_create() is %d\n", rv);
printf ("%d %s\n", args[i].tid, strerror(rv));
//exit(-1);
}
}
for(i = 0; i < nthreads; i++)
{
pthread_join(tid[i], NULL);
/* sum up results */
result += args[i].num_results;
}
}
Here's a minimal example to reproduce your problem and show where your code most likely goes wrong:
As I speculated in the comments to your question,
pthread_attr_setaffinity_np
doesn't check if the cpu set is sane. Instead that error gets caught inpthread_create
. Since thecpu_get_id
functions in your code on github are obviously broken, that's where I'd start looking for the problem.Tested on Linux, but that's where
pthread_attr_setaffinity_np
comes from, so it's probably a safe assumption.