Do rep string instructions use the extra segment (ES) or not?

I'm investigating how rep string instructions work. According to the instruction reference, rep movsl, for example, is described as follows:

rep movsl   5+4*(E)CX   Move (E)CX dwords from [(E)SI] to ES:[(E)DI]

Here ES is the extra segment register, which should specify where the extra segment begins in memory.

The pseudocode of the operation looks like this:

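/* real mode, DF = 0, one byte per step (movsb); movsl moves 4 bytes and adds 4 */
while (CX != 0) {
        *(ES*16 + DI) = *(DS*16 + SI);   /* destination is always ES:DI */
        SI++;
        DI++;
        CX--;
}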

But it seems that in the flat memory model it is not true that rep string instructions operate on the extra segment.

For example, I've created a test that creates 2 threads, each of which copies an array into a TLS (thread-local storage) array using rep movs. Logically that should not work, as TLS data is kept in the GS segment, not in ES. But it works; at least I see the correct result when running the test. The Intel compiler produces the following code for the copy:

movl      %gs:0, %eax                                   #27.18
movl      $1028, %ecx                                   #27.18
movl      32(%esp), %esi                                #27.18
lea       TLS_data1@NTPOFF(%eax), %edi                  #27.18
movl      %ecx, %eax                                    #27.18
shrl      $2, %ecx                                      #27.18
rep                                                     #27.18
movsl                                                   #27.18
movl      %eax, %ecx                                    #27.18
andl      $3, %ecx                                      #27.18
rep                                                     #27.18
movsb                                                   #27.18

Here %edi points to the TLS array, and rep movs stores into it. If rep movs implicitly used the ES segment base (which I doubt), this code should not produce the correct result.

Am I missing something here?

Here is the test I created:

#define _MULTI_THREADED
#include <stdio.h>
#include <stdlib.h>
#include <pthread.h>

#define                 NUMTHREADS   2
#define                 N  257

typedef struct {
    int   data1[N];
} threadparm_t;

__thread threadparm_t TLS_data1;   /* one copy per thread */

void foo(void);

void *theThread(void *parm)
{
    threadparm_t     *gData;

    /* pthread_t is opaque; casting to unsigned long works on Linux/glibc */
    pthread_t self = pthread_self();
    printf("Thread %lu: Entered\n", (unsigned long)self);

    gData = (threadparm_t *)parm;

    /* struct assignment: this is the copy compiled to rep movsl/rep movsb */
    TLS_data1 = *gData;

    foo();
    return NULL;
}

void foo(void) {
    int i;
    pthread_t self = pthread_self();
    printf("\nThread %lu: foo()\n", (unsigned long)self / 1000000);
    for (i=0; i<N; i++) {
        printf("%d ", TLS_data1.data1[i]);
    }
    printf("\n\n");
}


int main(int argc, char **argv)
{
    pthread_t             thread[NUMTHREADS];
    int                   rc=0;
    int                   i,j;
    threadparm_t          gData[NUMTHREADS];

    printf("Enter Testcase - %s\n", argv[0]);

    printf("Create/start threads\n");
    for (i=0; i < NUMTHREADS; i++) {
        /* Create per-thread source data (every element is i+1) and pass it to the thread */
        for (j=0; j < N; j++) {
            gData[i].data1[j] = i+1;
        }
        rc = pthread_create(&thread[i], NULL, theThread, &gData[i]);
        if (rc != 0) {
            fprintf(stderr, "pthread_create failed: %d\n", rc);
            return EXIT_FAILURE;
        }
    }
    printf("Wait for the threads to complete, and release their resources\n");
    for (i=0; i < NUMTHREADS; i++) {
        rc = pthread_join(thread[i], NULL);
    }

    printf("Main completed\n");
    return 0;
}

Answer by caf:

What you are missing is that these ops:

movl      %gs:0, %eax
...
lea       TLS_data1@NTPOFF(%eax), %edi

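are computing the zero-based (linear) address of this thread's copy of TLS_data1 and loading it into %edi. On IA-32 Linux, %gs:0 holds the thread pointer itself, i.e. the flat address of the thread control block, and TLS_data1@NTPOFF is the variable's (negative) offset from that thread pointer. The lea therefore yields an ordinary flat address, which works fine with ES, because in the flat model ES has a base of zero.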