Why does Linux set the data segment to __USER_DS in the prologue of the exception handler?


I'm trying to read the Linux source code (2.6.11).

In the exception handler, in entry.S, at error_code:

movl $(__USER_DS), %ecx
movl %ecx, %ds
movl %ecx, %es

I don't understand why the user data segment is loaded here. This is the entry to the exception handler, which runs in kernel mode, so I would expect the selector to be __KERNEL_DS.

I checked other versions of the code, and they do the same thing at this point.


There are 2 answers

6
wallyk (best answer)

If the exception handler is entered with ds and es already holding the intended data segment selector, reloading them makes no difference except for maybe a microsecond of delay. Exception handlers don't usually need to be fast.

But what might cause going to the exception handler? Could it have been because a bad value was loaded into a segment register and then referenced? In such cases it is important for the code to establish a safe environment. cs is set by the exception invocation. To be bulletproof, ss and esp should be set up too.


Followup:

Looking at the 2.6.22.18 kernel for i386, I don't see exactly that:

error_code:   /* the function address is in %fs's slot on the stack */
     pushl  %es
     ...  pushes %ds, %eax, %ebp, %edi, %esi, %edx, %ecx, %ebx, %fs
     ...  along with pseudo-ops to manage stack frame layout
     movl  $(__KERNEL_PERCPU), %ecx
     movl  %ecx, %fs
     popl  %ecx   // retrieves saved %fs
     ... sets up registers for the exception function

The symbol __KERNEL_PERCPU is a macro defined (in include/asm-i386/segment.h) as 0 for non-SMP machines and (GDT_ENTRY_PERCPU * 8) for SMP machines. The 8 is the GDT entry size in bytes (I think), and GDT_ENTRY_PERCPU is the index of the per-CPU entry in the GDT. Its value is <base> + 15. The comments in that file do label GDT entry 15 as "default user DS", but as far as I can tell that is absolute index 15 (<base> + 3), not <base> + 15, so the per-CPU selector should not be assumed to be the same thing as __USER_DS.
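To make the selector arithmetic concrete, here is a small C sketch of how a selector value is built from a GDT index. The index constants are assumptions based on my reading of include/asm-i386/segment.h in 2.6.x kernels, so check them against the tree you are reading. The SELECTOR macro and the helper names are hypothetical and exist only for this illustration. With these assumed values, the per-CPU selector and the user data selector come out as different numbers.

```c
#include <assert.h>

/* An x86 selector is (descriptor index << 3) | table-indicator bit | RPL.
 * The table-indicator bit is 0 for the GDT, so it is omitted here. */
#define SELECTOR(index, rpl)  (((index) << 3) | (rpl))

/* ASSUMED values, believed to match include/asm-i386/segment.h in 2.6.x. */
#define GDT_ENTRY_KERNEL_BASE      12
#define GDT_ENTRY_KERNEL_DS        (GDT_ENTRY_KERNEL_BASE + 1)   /* entry 13 */
#define GDT_ENTRY_DEFAULT_USER_DS  (GDT_ENTRY_KERNEL_BASE + 3)   /* entry 15 */
#define GDT_ENTRY_PERCPU           (GDT_ENTRY_KERNEL_BASE + 15)  /* entry 27 */

enum {
    KERNEL_DS_SEL     = SELECTOR(GDT_ENTRY_KERNEL_DS, 0),        /* RPL 0 */
    USER_DS_SEL       = SELECTOR(GDT_ENTRY_DEFAULT_USER_DS, 3),  /* RPL 3 */
    KERNEL_PERCPU_SEL = SELECTOR(GDT_ENTRY_PERCPU, 0)            /* RPL 0 */
};

/* Hypothetical helpers: split a selector back into its fields. */
static inline unsigned selector_index(unsigned sel) { return sel >> 3; }
static inline unsigned selector_rpl(unsigned sel)   { return sel & 3u; }
```

The multiplication by 8 in the kernel macro is the same shift by 3: the low three bits of a selector hold the RPL and the table indicator, and each descriptor occupies 8 bytes.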

The kernel data segment is accessed through fs and ss. Much kernel data access is on the stack. By keeping the user mode descriptors accessed through ds, very little loading of segment registers is needed.
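One reason a user-mode descriptor can stay in ds while kernel code runs is the protected-mode rule for loading a data segment: the load is allowed when the descriptor's DPL is numerically at least max(CPL, RPL). The C model below is only a sketch of that single rule. The function names are hypothetical, and it ignores everything else the processor checks (segment type, present bit, limits).

```c
#include <assert.h>
#include <stdbool.h>

static inline unsigned max_u(unsigned a, unsigned b) { return a > b ? a : b; }

/* Model of the privilege check applied when a data segment selector is
 * loaded into ds or es: permitted when DPL >= max(CPL, RPL).
 * cpl: current privilege level, rpl: low two bits of the selector,
 * dpl: privilege level stored in the descriptor. */
static inline bool data_segment_load_ok(unsigned cpl, unsigned rpl, unsigned dpl)
{
    return dpl >= max_u(cpl, rpl);
}
```

Under this rule, ring 0 code loading a selector with RPL 3 and DPL 3 passes the check, while ring 3 code loading a DPL 0 descriptor does not.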

0
Holmes

In entry.S:

#define RESTORE_ALL \
    RESTORE_REGS \
    addl $4, %esp; \
1:  iret; \
.section .fixup,"ax"; \
2:  sti; \
    movl $(__USER_DS), %edx; \
    movl %edx, %ds; \
    movl %edx, %es; \
    movl $11,%eax; \
    call do_exit; \
.previous; \
.section __ex_table,"a"; \
    .align 4; \
    .long 1b,2b; \
.previous

This macro is invoked at the end of exception, interrupt, and system call handling. The fixup code sets ds and es to __USER_DS, which suggests that iret itself can raise an exception when the DPL of ds and es is not 3 (user privilege).

So Linux sets ds and es to __USER_DS at the very beginning of exception, interrupt, and system call handling to avoid this exception.
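For readers unfamiliar with the `.long 1b,2b` line in the macro: it records a pair of addresses, the instruction that may fault (label 1, the iret) and the place to resume if it does (label 2, the fixup code). The sketch below models that table in C with a simple linear search. The struct and function names are hypothetical, and the kernel's actual lookup code is not shown in this thread, so treat this as an illustration of the idea rather than the real implementation.

```c
#include <assert.h>
#include <stddef.h>

/* One record per ".long 1b,2b" pair: the address of an instruction that is
 * allowed to fault, and the address of the fixup code to jump to instead. */
struct ex_entry {
    unsigned long insn;
    unsigned long fixup;
};

/* Linear search kept deliberately simple for illustration.
 * Returns the fixup address for fault_ip, or 0 if the faulting
 * instruction has no registered fixup. */
static inline unsigned long find_fixup(const struct ex_entry *table, size_t n,
                                       unsigned long fault_ip)
{
    for (size_t i = 0; i < n; i++) {
        if (table[i].insn == fault_ip)
            return table[i].fixup;
    }
    return 0;
}
```

In this model, a fault handler would call find_fixup with the faulting instruction pointer and, on a nonzero result, resume execution at the returned address, which for RESTORE_ALL is the code that reloads ds and es and calls do_exit.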