I want to extract a path from the following registry key:
HKEY_CURRENT_USER\SOFTWARE\Microsoft\Windows\CurrentVersion\Explorer\ComDlg32\OpenSavePidlMRU
I didn't find any documentation, but it seems that the value contains some IDLIST that can be converted with the GetPathFromIDList() function, which accepts an ITEMIDLIST and a flag. If I'm right, and trust me I'm not very confident, there is a way to convert bytes into an ITEMIDLIST, but which one?
This question comes from an answer to my earlier question "Binding a Pidl with function BindToObject?". I have read the documentation from this question: How to retrieve the last folder used by OpenFileDialog?.
Disclaimer: What the question is asking for is inherently unsafe. Indeed, it's so incredibly unsafe that you might as well just write it in C++ instead. This has the benefit that no one is going to assume, by accident, that the code is safe. Or sound. Or even correct.
⚠️The following is provided for educational purposes only. Do not use this code. Anywhere. Ever.⚠️
Addressing the literal question of how to convert a sequence of bytes into an `ITEMIDLIST` first: "Conversion" is not required; you just need to reinterpret the data. Assuming that you hold the data in a `Vec<u8>`, you can get a pointer to its first element and cast it to a pointer to the desired target type.
While getting a pointer and reinterpreting it is safe, handing it off to an API call that will eventually dereference it requires a bit more thought. If we were to pass this pointer and a buffer size, we could relax and lean back, knowing that the system can arrange to prevent out-of-bounds memory accesses.
But we will be passing just a pointer, and the system relies on the pointed-to data being formatted in a particular way, so that it knows when to stop reading.
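The reinterpretation described above might look roughly like the following sketch. `ItemIdList` is a stand-in declared here only so the example compiles without any Windows bindings, and `as_item_id_list` is a hypothetical helper name; real code would presumably use the `ITEMIDLIST` type from whichever bindings crate is in use.

```rust
/// Stand-in for the Windows `ITEMIDLIST` type. This is an assumption made so
/// that the sketch builds without Windows bindings; it is not the real type.
#[allow(dead_code)]
#[repr(C, packed)]
pub struct ItemIdList {
    cb: u16,
    ab_id: [u8; 1],
}

/// Reinterprets the start of `data` as a pointer to an item ID list.
/// No bytes are read or copied; only the pointee type changes.
pub fn as_item_id_list(data: &[u8]) -> *const ItemIdList {
    data.as_ptr().cast::<ItemIdList>()
}

fn main() {
    let data: Vec<u8> = vec![0x04, 0x00, 0xAA, 0xBB, 0x00, 0x00];
    let pidl = as_item_id_list(&data);
    // Same address as the buffer, just a different pointer type.
    assert_eq!(pidl as usize, data.as_ptr() as usize);
    println!("{pidl:p}");
}
```

Nothing here is `unsafe` yet. The danger only starts once the pointer is handed to code that dereferences it, and the pointer must not outlive `data`.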
<detour>
Item ID lists are encoded similarly to C-style strings: a pointer to the first element, with an otherwise unused value set aside to act as a terminator. A client receiving either one will continue reading elements until it discovers a value that matches the terminator, marking the end of the sequence.
A meaningful difference is that, while the individual elements in a C-style string are of fixed size, they are variably sized in the case of an item ID list, where the length of each element is encoded in the element itself.
For an `ITEMIDLIST`, the element is of type `SHITEMID`. A rough transliteration into C99 would be a struct with a `uint16_t cb` member followed by a flexible array member `uint8_t abID[]`.
In other words: a two-byte size prefix followed by a binary blob of arbitrary length. This encoding allows clients to determine element boundaries without having to know what the binary data means. Since `cb` includes the two bytes of the size prefix, any value less than `2` is invalid, and an `SHITEMID` with value `0x0000` is used as the terminator.
</detour>
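To make the layout from the detour concrete, here is a small sketch that builds an item ID list from raw payloads. `encode_id_list` is a hypothetical helper written for illustration only, and it assumes the size prefix is stored little-endian, as it would be on the usual Windows targets.

```rust
/// Encodes `payloads` as an item ID list: each element is a two-byte size
/// (`cb`, which counts its own two bytes) followed by the payload bytes,
/// and an element with a zero size field terminates the list.
/// Hypothetical helper; assumes little-endian byte order.
pub fn encode_id_list(payloads: &[&[u8]]) -> Option<Vec<u8>> {
    let mut out = Vec::new();
    for payload in payloads {
        // `cb` includes the size prefix, so the total must fit in a u16.
        let cb = u16::try_from(payload.len().checked_add(2)?).ok()?;
        out.extend_from_slice(&cb.to_le_bytes());
        out.extend_from_slice(payload);
    }
    // Terminator: a size field of 0x0000.
    out.extend_from_slice(&0u16.to_le_bytes());
    Some(out)
}

fn main() {
    let first: &[u8] = &[0xAA, 0xBB];
    let second: &[u8] = &[0xCC];
    let bytes = encode_id_list(&[first, second]).unwrap();
    assert_eq!(bytes, [0x04, 0x00, 0xAA, 0xBB, 0x03, 0x00, 0xCC, 0x00, 0x00]);
    println!("{bytes:02X?}");
}
```

Reading the result back means hopping from one size prefix to the next until the `0x0000` terminator shows up, which is exactly what a consumer holding only a pointer has to do.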
The crucial point is that unlike C-style strings, item ID lists have internal structure. Verifying the integrity of the internal structure requires knowing the overall size of the data, so we cannot delegate this to the system that only receives a pointer.
Since we cannot trust the data, the following function ensures that the segments can be iterated, so that the system can determine whether it is garbage without risking going out of bounds in doing so:
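A minimal sketch of such a check, under a few assumptions: the name `is_well_formed` is hypothetical, the size prefix is read as little-endian, and any bytes after the terminator are simply ignored.

```rust
/// Returns `true` if `data` starts with a structurally valid item ID list
/// whose terminator lies inside the buffer.
/// Hypothetical helper; it checks structure only, not meaning.
pub fn is_well_formed(data: &[u8]) -> bool {
    let mut rest = data;
    loop {
        // Every element, including the terminator, starts with two size bytes.
        let Some(prefix) = rest.get(..2) else { return false };
        let cb = u16::from_le_bytes([prefix[0], prefix[1]]) as usize;
        match cb {
            0 => return true,  // terminator found within bounds
            1 => return false, // smaller than the size prefix itself
            _ => {}
        }
        // Skip this element, failing if it would overrun the buffer.
        let Some(next) = rest.get(cb..) else { return false };
        rest = next;
    }
}

fn main() {
    assert!(is_well_formed(&[0x04, 0x00, 0xAA, 0xBB, 0x00, 0x00]));
    assert!(!is_well_formed(&[0x04, 0x00, 0xAA, 0xBB])); // no terminator
    assert!(!is_well_formed(&[0x10, 0x00, 0xAA, 0x00, 0x00])); // overrun
    println!("ok");
}
```

The loop always terminates, because every pass either returns or consumes at least two bytes of the remaining slice.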
This doesn't make the data any more trustworthy. It's still straight out of an arbitrary, untrusted data store. But at least the system can trust the structure behind the pointer.
That said, any solution is better than this.
Full sample code below.
Cargo.toml:
src/main.rs: