Memory corruption in System.Move due to changed 8087CW mode (png + stretchblt)


I have a strange memory corruption problem. After many hours of debugging and experimenting, I think I have found something.

For example: I do a simple string assignment:

sTest := 'SET LOCK_TIMEOUT ';

However, the result sometimes becomes:

sTest = 'SET LOCK'#0'TIMEOUT '

So the _ gets replaced by a 0 byte.

I have seen this happen once (reproducing it is tricky and depends on timing) in the System.Move function, when it uses the FPU stack (fild, fistp) for a fast memory copy (when there are 9 to 32 bytes to move):

...
@@SmallMove: {9..32 Byte Move}
fild    qword ptr [eax+ecx] {Load Last 8}
fild    qword ptr [eax] {Load First 8}
cmp     ecx, 8
jle     @@Small16
fild    qword ptr [eax+8] {Load Second 8}
cmp     ecx, 16
jle     @@Small24
fild    qword ptr [eax+16] {Load Third 8}
fistp   qword ptr [edx+16] {Save Third 8}
...

Using the FPU view and two memory debug views (Delphi -> View -> Debug -> CPU -> Memory), I saw it go wrong once, but I could not reproduce it.

This morning I read something about the 8087CW mode, and yes, if it is changed to $27F I get memory corruption! Normally it is $133F:

The difference between $133F and $027F is that $027F sets up the FPU for less precise calculations (limiting to Double instead of Extended) and different infinity handling (which was used for older FPUs, but is not used any more).
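To make the difference between the two values concrete, here is a minimal sketch that pulls the relevant fields out of an x87 control word: bits 0..5 are the exception masks, bits 8..9 the precision control, and bit 12 the legacy infinity control. The type and function names are made up for illustration and are not part of the RTL.

```pascal
type
  // Order matches the 2-bit precision-control field (bits 8..9).
  TPrecisionControl = (pcSingle, pcReserved, pcDouble, pcExtended);

// Hypothetical helper: returns the precision-control field of a control word.
function PrecisionOf(CW: Word): TPrecisionControl;
begin
  Result := TPrecisionControl((CW shr 8) and 3);
end;

// Hypothetical helper: True when the legacy infinity-control bit (bit 12) is set.
function InfinityBitSet(CW: Word): Boolean;
begin
  Result := (CW and $1000) <> 0;
end;

// Hypothetical helper: returns the six exception mask bits (bits 0..5).
function ExceptionMasks(CW: Word): Word;
begin
  Result := CW and $3F;
end;
```

With these, PrecisionOf($133F) gives pcExtended and PrecisionOf($027F) gives pcDouble, while the infinity bit is set only in $133F.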

Okay, now I found why but not when!

I changed my AsmProfiler to perform a simple check (so every function is checked on entry and exit):

if Get8087CW = $27F then    //normally $1372?
  if MainThreadID = GetCurrentThreadId then  //only check mainthread
    DebugBreak;

I "profiled" some units and DLLs and bingo (see stack):

Windows.StretchBlt(3372289943,0,0,514,345,4211154027,0,0,514,345,13369376)
pngimage.TPNGObject.DrawPartialTrans(4211154027,(0, 0, 514, 345, (0, 0), (514, 345)))
pngimage.TPNGObject.Draw($7FF62450,(0, 0, 514, 345, (0, 0), (514, 345)))
Graphics.TCanvas.StretchDraw((0, 0, 514, 345, (0, 0), (514, 345)),$7FECF3D0)
ExtCtrls.TImage.Paint
Controls.TGraphicControl.WMPaint((15, 4211154027, 0, 0))

So it is happening in StretchBlt...

What to do now? Is it a fault of Windows, or a bug in PNG (included in D2007)? Or is the System.Move function not failsafe?

Note: simply trying to reproduce does not work:

  Set8087CW($27F);
  sSQL := 'SET LOCK_TIMEOUT ';

It seems to be more exotic... But by using the DebugBreak on 'Get8087CW = $27F' I could reproduce it on another string:

[Screenshots: FPU part 1, FPU part 2, FPU part 3, FPU Final: corrupt!]

Note 2: Maybe the FPU stack must be cleared in the System.Move?


There are 4 answers

2
Zoë Peterson (best answer)

I haven't seen this particular issue, but Move can definitely get messed up if the FPU is in a bad state. Cisco's VPN driver can screw things up horribly, even if you're not doing anything network related.

http://brianorr.blogspot.com/2006/11/intel-pentium-d-floating-point-unit.html [broken]

https://web.archive.org/web/20160601043520/http://www.dankohn.com/archives/343

http://blog.excastle.com/2007/08/28/delphi-bug-of-the-day-fpu-stack-leak/ (comments by Ritchie Annand)

In our case we detect the buggy VPN driver and swap out Move and FillChar with the Delphi 7 versions, replace IntToStr with a Pascal version (Int64-version uses the FPU), and, since we're using FastMM, we disable its custom fixed size move routines too, since they're even more susceptible than System.Move.
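As an illustration of what a Pascal replacement for the Int64 IntToStr might look like, here is a minimal sketch. The function name is hypothetical, and it assumes that the compiler's Int64 div and mod helpers do not touch the FPU, which is worth verifying for your Delphi version. It is not the code we actually ship.

```pascal
// Hypothetical helper: converts an Int64 to decimal using only integer
// div/mod and per-character stores.
function PascalIntToStr(Value: Int64): string;
var
  Digits: array[0..19] of Char;  // 19 digits plus an optional sign
  Count, I, Digit: Integer;
  Negative: Boolean;
begin
  Negative := Value < 0;
  Count := 0;
  repeat
    // mod keeps the sign of the dividend, so take the absolute value of the
    // single digit instead of negating Value (which overflows on Low(Int64)).
    Digit := Integer(Value mod 10);
    if Digit < 0 then
      Digit := -Digit;
    Digits[Count] := Chr(Ord('0') + Digit);
    Inc(Count);
    Value := Value div 10;
  until Value = 0;
  if Negative then
  begin
    Digits[Count] := '-';
    Inc(Count);
  end;
  // Digits were produced least significant first, so copy them in reverse
  // with a plain loop rather than SetString, which may go through Move.
  SetLength(Result, Count);
  for I := 1 to Count do
    Result[I] := Digits[Count - I];
end;
```

The digit buffer is filled backwards and copied character by character so that the routine does not depend on Move being healthy.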

0
Jeroen Wiert Pluimers

It might be a bug in your video driver that does not preserve the 8087 control word when it performs the StretchBlt operation.
In the past I have seen similar behaviour when using certain printer drivers. They think they own the 8087 CW and are wrong...
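If a driver is indeed the culprit, one possible workaround is to save the control word before the suspect call and restore it afterwards. The sketch below uses only Get8087CW and Set8087CW from the System unit; the wrapper, the procedural type and the simulated misbehaving call are hypothetical names made up for illustration.

```pascal
type
  // Hypothetical procedural type for "some call into code we do not trust".
  TExternalCall = procedure;

// Hypothetical wrapper: runs Call and puts the control word back if it changed.
procedure CallPreserving8087CW(Call: TExternalCall);
var
  SavedCW: Word;
begin
  SavedCW := Get8087CW;
  try
    Call;
  finally
    // Note: Set8087CW also updates the Default8087CW variable.
    if Get8087CW <> SavedCW then
      Set8087CW(SavedCW);
  end;
end;

// Stand-in for a driver call that leaves the FPU in 53-bit precision mode.
procedure SimulatedBadDriverCall;
begin
  Set8087CW($27F);
end;
```

In a real application the wrapped call would be the drawing code (for example the StretchDraw that ends up in StretchBlt), not the simulated procedure. This only hides the symptom on the calling thread; it does not fix the driver.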

Note that the default value of the 8087 CW in Delphi seems to be $1372. For a more detailed explanation of the CW values, see this article; it also explains a situation that Michael Justin described when his 8087CW got hosed.

--jeroen

3
JensG

For those still interested in this: There's yet another possible cause of problems:

Delphi Rio still ships with a broken ASM version of Move.

I had the pleasure of running into that bug today. Luckily, I had a reproducible test case. The issue is this piece of code:

(* ***** BEGIN LICENSE BLOCK *****
 *
 * The assembly function Move is licensed under the CodeGear license terms.
 *
 * The initial developer of the original code is Fastcode
 *
 * Portions created by the initial developer are Copyright (C) 2002-2004
 * the initial developer. All Rights Reserved.
 *
 * Contributor(s): John O'Harrow
 *
 * ***** END LICENSE BLOCK ***** *)

// ... some less interesting parts omitted ...

@@LargeMove:
        JNG     @@LargeDone {Count < 0}
        CMP     EAX, EDX
        JA      @@LargeForwardMove

        // the following overlap test is broken
        // when size>uint(destaddr), EDX underflows to $FFxxxxxx, in which case 
        // we jump to @@LargeForwardMove even if a backward loop would be appropriate
        // this will effectively shred everything at EDX + size
        SUB     EDX, ECX              // when this underflows ...
        CMP     EAX, EDX              // ... we also get CF=1 here (EDX is usually < $FFxxxxxx)
        LEA     EDX, [EDX+ECX]        // (does not affect flags)
        JNA     @@LargeForwardMove    // ... CF=1 so let's jump into disaster!

        SUB     ECX, 8 {Backward Move}
        PUSH    ECX
        FILD    QWORD PTR [EAX+ECX] {Last 8}
        FILD    QWORD PTR [EAX] {First 8}
        ADD     ECX, EDX
        AND     ECX, -8 {8-Byte Align Writes}
        SUB     ECX, EDX
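To show the wrap-around in isolation, here is a small Pascal model of the direction decision made by the flagged instructions, next to one possible overflow-safe variant. It is a sketch of the flag logic as I read it, not the RTL code, and both function names are made up. Overflow and range checks are switched off so that the subtraction wraps the way SUB does.

```pascal
{$Q-}{$R-}

// Model of the test above: returns True when the forward loop is chosen.
function BuggyUsesForward(Src, Dest, Size: Cardinal): Boolean;
begin
  if Src > Dest then
    Result := True                           // JA @@LargeForwardMove
  else
    // SUB EDX, ECX / CMP EAX, EDX / JNA: Dest - Size wraps when Size > Dest.
    Result := Src <= Cardinal(Dest - Size);
end;

// Hypothetical fix: Dest - Src cannot wrap here because Src <= Dest.
function FixedUsesForward(Src, Dest, Size: Cardinal): Boolean;
begin
  if Src > Dest then
    Result := True
  else
    Result := Cardinal(Dest - Src) >= Size;  // forward only if no overlap
end;
```

For example, with Src = $1000, Dest = $2000 and Size = $3000 the regions overlap and a backward copy is needed, but the first function picks the forward loop because Dest - Size wraps to $FFFFF000. The second function picks the backward loop.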


0
André

Just for your information (in case someone else has the same problem): we upgraded our software for a customer, and the whole touchscreen locked up when our application was started! Windows was completely frozen, and the PC had to be restarted (power off). It took some time to figure out the cause of the freeze.

Fortunately we had one (only 1!) stacktrace of an AV in FastMove.LargeSSEMove. I disabled the usage of SSE in fastmove, and the problem is gone.

By the way: the touchscreen has a VIA Nehemiah CPU with an S3 chipset.

So you can get not only memory corruption when using the FPU, but also a complete freeze!