I understand how segmentation works, and that paging is the preferred way for memory access in modern operating systems. But I am not sure in what sense the segment registers are unused:
- They just APPEAR unused, because they generally have base 0 and limit 0xFFFFFFFF. Note that in this case, they are still involved in physical address calculation, but are transparent and provide a flat memory model.
- They are not touched at all.
A funny combination of them, perhaps. What happens from a high-level (if it can be called high) perspective is that most segments are configured with a base of 0 and a limit of 0xFFFFFFFF (`fs` and `gs` may be used for special purposes though).
But configuring a segment with a non-zero base may have performance consequences. For example, on AMD K8 and K10, configuring the code segment to have a non-zero base increases the latency of branch mispredictions by two cycles, and a general address takes a cycle longer to compute if a segment with a non-zero base is involved. This may mean that the processor has a special fast path for segments with a base of zero, so that the base does not participate in the calculation of the address at all, rather than adding zero (which would still take time).
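To make the "transparent" flat model concrete, here is a minimal sketch of how a segment-relative offset turns into a linear address in 32-bit protected mode. It is a simplified model, not a description of any real hardware implementation: the `Segment` class and `linear_address` function are hypothetical names, and the sketch ignores expand-down segments, access size, privilege checks and the paging step that follows.

```python
from dataclasses import dataclass

MASK32 = 0xFFFFFFFF


@dataclass(frozen=True)
class Segment:
    """Hypothetical, simplified view of a 32-bit segment descriptor."""
    base: int   # linear address at which the segment starts
    limit: int  # highest valid byte offset within the segment


def linear_address(seg: Segment, offset: int) -> int:
    """Translate a segment-relative offset into a linear address."""
    if not 0 <= offset <= seg.limit:
        # Real hardware raises a fault here; an exception stands in for it.
        raise ValueError(f"offset {offset:#x} exceeds limit {seg.limit:#x}")
    # The base is added to the offset and the sum wraps at 32 bits.
    return (seg.base + offset) & MASK32


# Flat segment: base 0, limit 0xFFFFFFFF, as most operating systems set it up.
flat = Segment(base=0, limit=MASK32)
# Assumed example of a non-flat segment, purely for contrast.
nonflat = Segment(base=0x10000, limit=0xFFFF)

print(hex(linear_address(flat, 0x12345678)))  # 0x12345678: offset unchanged
print(hex(linear_address(nonflat, 0x1234)))   # 0x11234: base added to offset
```

With the flat segment, the addition and the limit check can never change the outcome, which is why a processor could plausibly skip them on a fast path when the base is zero.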
I could find no reference to this effect existing on any other µarchs, but it may not be fully explored because it is a relatively rare effect, especially in performance-sensitive code. In a quick test, a similar effect seems to exist on Haswell, with this code (skips some trivial set-up):
Running two cycles per iteration faster (5 cycles/iteration) than this code (7 cycles/iteration):
Possibly that means that more Intel µarchs are affected as well, though perhaps this is inaccurate since no segment is involved in the first code at all (since it's 64-bit code) and perhaps that is what mattered.