Assumptions about dwPageSize on different systems


Can we make any assumptions about SYSTEM_INFO's dwPageSize on different systems when targeting the same architecture (i.e. x86_64)?

I generate some custom native code that is loaded alongside a C++ application via VirtualAlloc. This code has three sections (code, cdata, dynamic static variables) that all need different protections (execute; read; read-write) via VirtualProtect, and thus need to be in different pages. Code references cdata and static variables via RIP-relative addressing.

I'm wondering whether, if dwPageSize is 4096 when building on x64, it will have that same value on other x64 systems (or at least a smaller one, but never a larger one)? If so, I can take the RIP-relative addresses as-is, because I can ensure that all data is placed at the same page-sized relative offsets. If pages on the target system could potentially be larger, I would need to fix up those offsets when loading the code.


There are 2 answers

CherryDT (BEST ANSWER)

Page size is usually dictated by the processor architecture itself, although sometimes a processor architecture may support multiple page sizes. Nonetheless, Microsoft decided on one particular page size per architecture, so it is indeed safe to assume that the page size will be the same for a given architecture (and bitness of Windows) regardless of which machine the code is running on.

In all of the currently supported architectures the page size is 4KB, by the way. (Some discontinued architectures used 8KB.) See here for a full list.

Especially with regard to x86-64, you can be sure it won't change, because the processor doesn't offer any choice other than 4KB (for small pages). Microsoft may theoretically change it on some architecture like ARM, but that would be quite unexpected to me, as it would cause a ton of compatibility issues with no discernible advantage.

Peter Cordes

There's no foreseeable way for x86-64 to have a different page-size in the future; that would require a different page-table format. (Or for OSes to only support 2M largepages, but that's too large as a minimum size for most uses.)

Other ISAs do have a range of page-sizes to choose from but I agree with CherryDT's guess that it's unlikely Windows would change to a different page-size: you are far from the only person with a use-case where baking in a compile-time-constant page size would be convenient, so there's lots of software in the wild that would break.


Apparently your use-case is a custom binary format with machine-code + some read-only data. That's convenient: read-only data still works perfectly fine if it's in a read+exec page instead of a purely read-only page, even if code also executes from that page. If one page fits all of your code+data anyway (so you're not requiring 2 dTLB and/or 2 iTLB entries by mixing code+data when 1 each was possible with separation), the only downside is that you have some data in a page that didn't need to be executable. This makes it a possible target for ROP and Spectre gadgets, which is usually not a big deal especially if it's at a random address every run.

So you have two good options for the unlikely hypothetical future case of running on a system with a larger page size:

  • Arrange things so one of the VirtualProtect / mprotect system calls fails, leaving everything read+exec. e.g. memory-map the whole file READ|EXEC, then attempt to VirtualProtect the last n * 4K to just READ. It will fail if the offset relative to the start of the mapping isn't a multiple of the page size.

    You can test this by building with page_size = 2048 or 1024 to test what happens if the run-time page-size is 2x or 4x what you built for. (With a mock-up binary blob I guess, not one that already has code and data separate by 4K).

    If you're not JITing new machine code or fixing up anything, it is best to just map the existing file so you have a file-backed mapping the OS can evict under memory pressure without paging out to the pagefile (like would happen with an anonymous allocation and then copying the file data into it.)

  • Or, the other strategy: Allocate more than 1 page of the actual runtime page-size, and load your binary data such that the split between its code and data falls on the split between pages. The first byte of the code won't be at the first byte of the allocation, so supporting that might require more tracking (two pointers instead of one), which is less desirable for something we expect never to happen anyway.

    For example, say 16K page size, and you're loading an 8K binary blob with 4K each of code and data: you could load it at 12K into a 32K allocation, so the split between code and data is across a 16K page boundary. This works for read+write data, not just constants.

    Having a read+write+exec page is possible if the OS doesn't enforce W^X (write exclusive-or exec). But on x86-64 hardware with self-modifying-code detection that nukes the pipeline if stores are too near code, you don't want read+write data in the same page as code. (Big-cores like Skylake have 64-byte cache-line granularity for SMC detection, but low-power cores might be coarser. Last I looked, Intel's optimization manual still recommended code and read+write data be in separate pages.)

    This split mapping scheme probably makes a file-backed mapping impossible, so that's another downside. For Linux mmap, the file offset for the start of a mapping has to be a multiple of the page-size, so the mapping can share pages with the pagecache. I assume Windows is the same.


My use-case is a custom compiler for a scripting-language backend. The language has a high-level bytecode interface, and gets compiled to native machine code as an optimization. This code gets shipped alongside a (game) executable. So that code I'm talking about cannot be linked into an existing executable (I also don't want to deal with creating actual DLLs with PE or something). I think the suggestion about letting VirtualProtect fail is a good one. I just need to make sure that in such a case, the exec-protection is not removed from the following code segment. Sounds doable.

I still don't get why you wouldn't compile to machine-code + data in a .o or .obj that could get linked normally, instead of this custom format that you have to write your own loader for. Maybe there's more going on that makes that not a good option for some reason. Like maybe you want to select the right binary optimized for the hardware. Or maybe you just didn't want to deal with object file format handling to generate relocation metadata in the .o / .obj files that a linker fills in when linking them into the main executable binary.

But ok, sure. And it's not a JIT so running on a new machine couldn't just re-compile from the bytecode with data+code distance large enough to let them be in separate pages.

You could generate your binary blobs with 32K granularity or something for start-of-section, so larger page sizes won't be a problem. But in a simple file format, that means padding. After mapping the whole file, you could unmap the unused pages between code and data (which fails if the page size is larger than you expect).

Or to avoid padding in the file, start with a 64K mapping with read|exec to let the OS pick a free region with 64K of contiguous virtual address-space. Then do the equivalent of mmap(start+32K, ..., MAP_FIXED|MAP_PRIVATE, PROT_READ|PROT_WRITE) to map the same part of the same file in the second region, so the distance between code and data in memory is always 32K larger than the distance between their file offsets.

This means the code bytes are there in the data page but you don't use them, and the data bytes are there in the code page but you don't use them. Mapping the same page of file data twice is something Linux executables used to do, before ld started padding sections to keep bytes out of executable pages if they didn't need to be there.

On systems with 4K pages like we expect, all the code can be unmapped from the data mapping and vice versa. Unless the code+data total is under 4K, in which case you have two mappings of the same page.