pthread_create() fails (invalid argument) every 60 threads on Xeon Phi


I have a piece of pthread code, shown below as the function thread(). It creates a number of threads (usually 240 on Xeon Phi and 16 on CPU) and then joins them.

If I call thread() only once, it works perfectly on both CPU and Xeon Phi. If I call it a second time, it still works fine on CPU, but on Xeon Phi pthread_create() fails for every 60th thread with error 22, which should be EINVAL ("invalid argument").

For example, in the second run of thread(), threads 0, 60, 120, and so on (the 241st, 301st, 361st, and so on threads ever created in the process) fail with error 22, while threads 1~59, 61~119, 121~179, and so on are created without problems.

Note that this problem happens only on Xeon Phi.

I have checked the stack sizes and the arguments themselves, and the arguments are correct, but I couldn't find the reason for the failure.

void thread()
{

...

int i, rv;
cpu_set_t set;
arg_t args[nthreads];
pthread_t tid[nthreads];
pthread_attr_t attr;
pthread_barrier_t barrier;

rv = pthread_barrier_init(&barrier, NULL, nthreads);
if(rv != 0)
{
    printf("Couldn't create the barrier\n");
    exit(EXIT_FAILURE);
}

pthread_attr_init(&attr);

for(i = 0; i < nthreads; i++)
{
    int cpu_idx = get_cpu_id(i,nthreads);

    DEBUGMSG(1, "Assigning thread-%d to CPU-%d\n", i, cpu_idx);

    CPU_ZERO(&set);
    CPU_SET(cpu_idx, &set);
    pthread_attr_setaffinity_np(&attr, sizeof(cpu_set_t), &set);

    args[i].tid = i;
    args[i].ht = ht;
    args[i].barrier = &barrier;

    /* assign part of the relR for next thread */
    args[i].relR.num_tuples = (i == (nthreads-1)) ? numR : numRthr;
    args[i].relR.tuples = relR->tuples + numRthr * i;
    numR -= numRthr;

    /* assign part of the relS for next thread */
    args[i].relS.num_tuples = (i == (nthreads-1)) ? numS : numSthr;
    args[i].relS.tuples = relS->tuples + numSthr * i;

    numS -= numSthr;

    rv = pthread_create(&tid[i], &attr, npo_thread, (void*)&args[i]);
    if (rv)
    {
        printf("ERROR; return code from pthread_create() is %d\n", rv);
        printf ("%d %s\n", args[i].tid, strerror(rv));
        //exit(-1);
    }

}

for(i = 0; i < nthreads; i++)
{
    pthread_join(tid[i], NULL);
    /* sum up results */
    result += args[i].num_results;
}
}

Answer by Art (best answer):

Here's a minimal example to reproduce your problem and show where your code most likely goes wrong:

#define _GNU_SOURCE
#include <pthread.h>
#include <err.h>
#include <stdio.h>
#include <string.h>

void *
foo(void *v)
{
        printf("foo\n");
        return NULL;
}

int
main(int argc, char **argv)
{
        pthread_attr_t attr;
        pthread_t thr;
        cpu_set_t set;
        void *v;
        int e;

        /* pthread functions return the error number; they do not set errno */
        if ((e = pthread_attr_init(&attr)))
                errx(1, "pthread_attr_init: %s", strerror(e));
        CPU_ZERO(&set);
        CPU_SET(255, &set);     /* a CPU assumed not to exist on the test machine */
        if ((e = pthread_attr_setaffinity_np(&attr, sizeof(set), &set)))
                errx(1, "pthread_attr_setaffinity_np: %s", strerror(e));

        /* the bogus affinity mask is only rejected here, with error 22 */
        if ((e = pthread_create(&thr, &attr, foo, NULL)))
                errx(1, "pthread_create: %d", e);

        if ((e = pthread_join(thr, &v)))
                errx(1, "pthread_join: %s", strerror(e));
        return 0;
}

As I speculated in the comments to your question, pthread_attr_setaffinity_np doesn't check whether the CPU set is sane. Instead, the invalid set only gets caught in pthread_create, which then returns error 22 (EINVAL). Since the CPU-ID mapping (get_cpu_id) in your code on GitHub is obviously broken, that's where I'd start looking for the problem.

I tested this on Linux rather than on a Xeon Phi, but Linux is where pthread_attr_setaffinity_np comes from, so it's probably safe to assume it behaves the same way there.
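
Because the bad CPU index only surfaces in pthread_create, one way to catch it earlier is to validate the index before putting it into the set. The following is a minimal sketch, not taken from your code: cpu_is_usable is a hypothetical helper name, and it assumes that checking against the calling thread's own affinity mask is good enough. That check is conservative, since it also rejects CPUs that exist but that the caller is currently not allowed to run on.

```c
#define _GNU_SOURCE
#include <sched.h>

/*
 * Hypothetical helper (not part of the original code): returns 1 if `cpu`
 * is in the affinity mask of the calling thread, 0 otherwise.
 */
static int
cpu_is_usable(int cpu)
{
        cpu_set_t allowed;

        /* CPU_SET/CPU_ISSET are only meaningful for 0 <= cpu < CPU_SETSIZE */
        if (cpu < 0 || cpu >= CPU_SETSIZE)
                return 0;

        /* pid 0 means "the calling thread" */
        CPU_ZERO(&allowed);
        if (sched_getaffinity(0, sizeof(allowed), &allowed) != 0)
                return 0;

        return CPU_ISSET(cpu, &allowed) ? 1 : 0;
}
```

In the loop from the question, calling cpu_is_usable(cpu_idx) right after get_cpu_id() and printing i and cpu_idx when it returns 0 should show quickly whether every 60th thread is being handed an index that the machine cannot satisfy.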