Please bear with me: this question is not perfectly formed and may not contain enough data for you to pinpoint a cause. I am simply looking for ideas on how to continue solving this problem. Read it as a horror story.
Problem Description
I have a C# program that interacts with an operator through button clicks, talks over TCP/IP to a set of 4 barcode scanners, and runs some SQL. It is used in a somewhat-automated manufacturing setting. The barcode scanners come with a communications library that triggers barcode reading and aggregates the data from 4 (or more) scanners into a single data stream to a client (my C# program). Each scanner provides its scanner ID as well as the scanned data, for example: 001:111111;004:444444;003:333333;002:222222.... Here 001, 004, 003 and 002 are the scanner IDs, while 111111, 444444, 333333 and 222222 are the barcode data read at those scanners.
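To make the format concrete, here is a minimal sketch of how such an aggregated string could be split into per-scanner readings. The class and method names (ScanFrame, Parse) are hypothetical and not part of the vendor library, and the sketch assumes that fields are separated by ';' and that each field is exactly ID:data.

```csharp
using System;
using System.Collections.Generic;

public static class ScanFrame
{
    // Hypothetical helper: parses "001:111111;004:444444" into
    // { "001" -> "111111", "004" -> "444444" }.
    // Throws FormatException on a malformed field or a repeated scanner ID.
    public static Dictionary<string, string> Parse(string frame)
    {
        var result = new Dictionary<string, string>();
        var fields = frame.Split(new[] { ';' }, StringSplitOptions.RemoveEmptyEntries);
        foreach (var field in fields)
        {
            var parts = field.Split(':');
            if (parts.Length != 2 || parts[0].Length == 0 || parts[1].Length == 0)
                throw new FormatException("Malformed field: " + field);
            if (result.ContainsKey(parts[0]))
                throw new FormatException("Scanner ID reported twice: " + parts[0]);
            result.Add(parts[0], parts[1]);
        }
        return result;
    }
}
```

Rejecting a frame as soon as it is structurally wrong keeps obviously bad input away from the SQL step, although it cannot tell whether a well-formed barcode was attributed to the wrong scanner.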
I apologize for all these details, but they come into play.
We run this program about 1000 times a day, mostly successfully. But about 0.2% of the time, something unexpected happens.
Normal program flow (99.8%):
SQL Connection Open
User button press
Scanner trigger
Scanner returns data
SQL Operations (New code Registered)
Abnormal program flow (0.2%):
SQL Connection Open
User button press
Scanner trigger
Scanner returns incorrect data
SQL Operations
Program rewinds back to start
SQL Connection Open
User button press bypassed
Scanner trigger bypassed
Scanner returns GOOD data
SQL Operations
Here is a captured sequence of events from the log, with my comments in square brackets:
SQL Connection Open.
K-----e Scanner LF Connect success? True
K-----e Scanner RF Connect success? True
K-----e Scanner LB Connect success? True
K-----e Scanner RB Connect success? True
New code Registered: 785889<=>819345 [wrong data]
New code Registered: 917890<=>481899 [wrong data]
New code Registered: 249447<=>999731 [wrong data]
New code Registered: 967082<=>386511 [wrong data]
New code Registered: 794079<=>772860 [wrong data]
New code Registered: 349467<=>421658 [wrong data]
New code Registered: 810132<=>525941 [wrong data]
New code Registered: 879309<=>105578 [wrong data]
SQL Connection Open. [Rewind back to start of cycle, all without any user interaction]
K-----e Scanner LF Connect success? True
K-----e Scanner RF Connect success? True
K-----e Scanner LB Connect success? True
K-----e Scanner RB Connect success? True
785889 is not unique. [Data is good now; the DB operations behave correctly, since all scanned data was already inserted into the DB]
Already Exist 785889
819345 is not unique.
Already Exist 819345
917890 is not unique.
Already Exist 917890
525941 is not unique.
Already Exist 525941
249447 is not unique.
Already Exist 249447
105578 is not unique.
Already Exist 105578
967082 is not unique.
Already Exist 967082
481899 is not unique.
Already Exist 481899
794079 is not unique.
Already Exist 794079
421658 is not unique.
Already Exist 421658
349467 is not unique.
Already Exist 349467
772860 is not unique.
Already Exist 772860
810132 is not unique.
Already Exist 810132
386511 is not unique.
Already Exist 386511
879309 is not unique.
Already Exist 879309
999731 is not unique.
Already Exist 999731
Known Issues
After debugging (which is difficult given the 0.2% occurrence rate), the scanner communications library is implicated in the wrong (scrambled) data, e.g. 001:222222;004:111111;003:222222;002:333333. I am concerned that the data is bad, but I am much more concerned about the program rewind.
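One way to at least flag some scrambled frames before they reach the database is a sanity check on the parsed readings. The sketch below is only an illustration: FrameCheck and FindProblems are hypothetical names, and it assumes that every expected scanner must report and that no two scanners should ever report the same barcode in one cycle. It would catch the example above (222222 appears under both 001 and 003), but not a frame where distinct barcodes were merely swapped between scanners.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class FrameCheck
{
    // Hypothetical helper: returns human-readable problems found in one frame.
    // An empty list means this check detected nothing; it does not prove the data is right.
    public static List<string> FindProblems(
        IDictionary<string, string> readings, IEnumerable<string> expectedIds)
    {
        var problems = new List<string>();

        // Every expected scanner should have reported something.
        foreach (var id in expectedIds)
            if (!readings.ContainsKey(id))
                problems.Add("Missing scanner " + id);

        // Assumption: the same barcode should not be seen by two scanners in one cycle.
        foreach (var group in readings.GroupBy(r => r.Value).Where(g => g.Count() > 1))
        {
            var ids = group.Select(r => r.Key).OrderBy(k => k, StringComparer.Ordinal);
            problems.Add("Barcode " + group.Key + " reported by scanners " + string.Join(",", ids));
        }
        return problems;
    }
}
```

Logging the raw frame together with any reported problems would also give the scanner vendor something concrete to look at.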
Question
What mechanisms or conditions could cause code to execute repeatedly, triggered by an external library, in a C# Windows Forms application? How can I detect and trap such events?
Conclusion
My apologies for the long and incomplete description of this problem; I have included all the information I could gather. It is certainly abnormal to see this happen, and repeatedly. I hope your replies will help me diagnose or fix this problem further.
I have discussed this problem with my local scanner rep, but the software libraries are provided as-is. These scanners are $10K each, but this is now a problem that I have to solve.
Here's one idea: the barcode scans are probably event-driven. If the events occur too close together, there's no guarantee that one will complete its query before the next event triggers a different query. There's an easy way to make a thread wait for an action to complete, using a basic synchronization object, System.Threading.SemaphoreSlim. If an operation is already in progress, the new one won't begin until the first one completes and releases the semaphore. My suggestion would be to protect your critical sections in this manner and see if it helps.
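As a rough sketch of what that could look like, the class below wraps a scan cycle in a SemaphoreSlim with a single slot. ScanCycleGate, RunAsync, TryRunAsync and the cycle and log delegates are hypothetical names made up for illustration; nothing here comes from the scanner library. RunAsync makes a second cycle wait for the first, as described above. TryRunAsync is a variation that refuses to start a second cycle and logs the call stack instead, which may help show where the unexpected "rewind" is coming from. The async form is used because a blocking Wait() on the WinForms UI thread would freeze the form, and because SemaphoreSlim is not reentrant.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public sealed class ScanCycleGate
{
    // One slot: at most one scan cycle may be inside the gate at a time.
    private readonly SemaphoreSlim _gate = new SemaphoreSlim(1, 1);
    private int _rejected;

    // Number of cycle requests that arrived while another cycle was running.
    public int RejectedCount => Volatile.Read(ref _rejected);

    // Waits (without blocking the calling thread) until no cycle is running, then runs this one.
    public async Task RunAsync(Func<Task> cycle)
    {
        await _gate.WaitAsync();
        try { await cycle(); }
        finally { _gate.Release(); }
    }

    // Runs the cycle only if none is in progress; otherwise logs the caller's stack and returns false.
    public async Task<bool> TryRunAsync(Func<Task> cycle, Action<string> log)
    {
        // A zero timeout never waits: it returns false right away if the slot is taken.
        if (!await _gate.WaitAsync(0))
        {
            Interlocked.Increment(ref _rejected);
            log("Cycle requested while another is running:" + Environment.NewLine + Environment.StackTrace);
            return false;
        }
        try
        {
            await cycle();
            return true;
        }
        finally
        {
            _gate.Release();
        }
    }
}
```

In a button handler this might be called as await gate.TryRunAsync(RunScanCycleAsync, Log), where RunScanCycleAsync and Log stand in for your own code. If the log entry ever appears, the recorded stack trace should show whether the second cycle was started by a library callback, a queued UI event, or something else.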