I'm failing to understand a specific scenario in which my C++ multi-threaded application (running on a Linux machine, Wind River 6.x) is facing a segmentation fault.
I know the concept of a segmentation fault, and I went over this post and also this one, but I could not find a scenario similar to mine or an answer to my question, so I'm posting it here.
My code that generates the segmentation fault is as follows (abbreviated and simplified):
// MyStruct* pMyStruct is a function argument that, at some point in time, is set to NULL
ASSERT_PTR_NE(pMyStruct, NULL); // this assertion is logged to my application log (meaning that, at this line, pMyStruct is NULL)
int someInt = pMyStruct->someIntOfMyStruct; // this line does NOT cause the segmentation fault
double someDouble = pMyStruct->someDoubleOfMyStruct; // this line ALSO does NOT cause the segmentation fault
ASSERT_NUM_EQ(pMyStruct->someIntOfMyStruct, SOME_INT_VALUE_TO_CHECK); // this line DOES cause the segmentation fault
As noted in the comment on the last code line, the 4th line of code is (I guess) the last line that my application executes: when examining the core file with GDB, frame 0 indicates that this is the line that causes the crash.
My questions, then, are:
How come the 2nd and 3rd lines of code of my application did not cause segmentation fault?
What exactly takes place, system-wise (in the OS and in the application), from the moment the NULL pointer is accessed until the application is terminated by the OS? Is it possible that the actual segmentation fault was raised by the 1st line, yet, for some reason, lines 2-4 were also executed before the OS decided to terminate the application, and on arriving at the 4th line the application raised a segmentation fault "again"?
Or, perhaps, is it possible that what actually took place here is an overrun of the pMyStruct variable - meaning, after the first line that does the assert (and prints info to the log file of the application) another thread set the pMyStruct to NON NULL value, thus "allowing" lines 2-3 to run WITHOUT causing a crash and then JUST before line 4 was executed the pMyStruct was "overrun" by another thread and was set to NULL thus, this time causing line 4 to crash?
Typically, an OS creates a segmentation fault after the CPU faults on an address. The CPU doesn't know why the fault happened. It might be that the memory is paged out to disk, but for this question we're assuming a bad pointer. The OS knows it's a bad pointer because the address doesn't correspond to any paged-out memory. Hence, the OS tells the CPU it is handling the situation, and tells the CPU to continue execution in the signal handler.
The C++ null pointer isn't special to the CPU. It just so happens that the OS by convention does not allocate RAM at this address.
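The fault-to-signal path described above can be observed from user space. The following is a minimal sketch, assuming a POSIX system such as Linux, where the signal in question is `SIGSEGV`. The names `read_faults`, `map_inaccessible_page`, and `last_fault_address` are hypothetical helpers invented for this illustration. A deliberately inaccessible page stands in for the bad pointer, and the read goes through a `volatile` pointer so that the compiler neither removes nor moves the access. The sketch relies on POSIX behavior, not on anything the C++ standard guarantees.

```cpp
#include <csetjmp>
#include <csignal>
#include <sys/mman.h>
#include <unistd.h>

// Where to resume after the handler runs, and the address the OS reported.
static sigjmp_buf g_resume_point;
static void* volatile g_fault_address = nullptr;

// The OS transfers control here after the CPU faults on a bad address.
static void on_segv(int, siginfo_t* info, void*) {
    g_fault_address = info->si_addr;  // faulting address, as reported by the OS
    siglongjmp(g_resume_point, 1);    // skip the faulting read instead of retrying it
}

// Hypothetical helper: the address reported by the most recent SIGSEGV.
void* last_fault_address() { return g_fault_address; }

// Hypothetical helper: one page that is mapped but allows no access at all.
void* map_inaccessible_page() {
    const long page_size = sysconf(_SC_PAGESIZE);
    return mmap(nullptr, static_cast<size_t>(page_size), PROT_NONE,
                MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
}

// Hypothetical helper: returns true if reading one byte at `address`
// makes the OS deliver SIGSEGV to this process.
bool read_faults(volatile char* address) {
    struct sigaction action {};
    struct sigaction previous {};
    action.sa_sigaction = on_segv;
    action.sa_flags = SA_SIGINFO;  // ask for the siginfo_t with si_addr
    sigemptyset(&action.sa_mask);
    sigaction(SIGSEGV, &action, &previous);

    if (sigsetjmp(g_resume_point, 1) != 0) {
        // We arrive here from the handler via siglongjmp.
        sigaction(SIGSEGV, &previous, nullptr);
        return true;
    }
    char value = *address;  // volatile read: the CPU faults here if the page is bad
    (void)value;
    sigaction(SIGSEGV, &previous, nullptr);
    return false;
}
```

Calling `read_faults` on the page returned by `map_inaccessible_page` should report a fault, with `last_fault_address()` equal to the page address, while calling it on the address of an ordinary local `char` should not. Without a handler installed, the default action for `SIGSEGV` terminates the process and produces the core file.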
By the C++ standard, your code has Undefined Behavior, and that allows "time travel". More accurately, to allow optimizations, compilers may shuffle code around on the assumption that Undefined Behavior does not happen. It would seem that lines 2 & 3 were shuffled to after line 4. You can't detect this kind of reordering in a correct C++ program.
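One way to stay out of Undefined Behavior here is to stop before the first dereference when the pointer is null, so the compiler is never given a null access to reason about. The following is a minimal sketch, assuming (as the log output in the question suggests) that `ASSERT_PTR_NE` only logs and lets execution continue. `MyStruct` and its two members come from the question. The `Snapshot` type and the `readFields` function are hypothetical names introduced for illustration, and `std::optional` requires C++17.

```cpp
#include <optional>

// Stand-in for the struct in the question; only the two members shown there.
struct MyStruct {
    int someIntOfMyStruct;
    double someDoubleOfMyStruct;
};

// Hypothetical value type holding copies of the fields that were read.
struct Snapshot {
    int someInt;
    double someDouble;
};

// Hypothetical helper: checks the pointer once and returns early on null,
// so no code path dereferences a null pointer.
std::optional<Snapshot> readFields(const MyStruct* pMyStruct) {
    if (pMyStruct == nullptr) {
        return std::nullopt;  // caller decides how to log or recover
    }
    return Snapshot{pMyStruct->someIntOfMyStruct,
                    pMyStruct->someDoubleOfMyStruct};
}
```

A caller would test the returned optional before using the values. Because the function works on its own copy of the pointer, a concurrent write to the caller's variable cannot change it between the check and the reads. This does not protect against another thread freeing or modifying the pointed-to object, which would need separate synchronization.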
This is not how a typical CPU sees it. Modern CPUs also shuffle instructions around internally, like compilers do, but when the CPU reports the fault to the OS it will pretend that all instructions happened in the right order.