Remove noise and extreme values from data?


I have a program that reads data over serial from an ADC on a PSoC.

The numbers are sent in the format <uint16>, inclusive of the '<' and '>' symbols, transmitted in binary format 00111100 XXXXXXXX XXXXXXXX 00111110 where the 'X's make up the 16 bit unsigned int.

Occasionally the read won't work very well and the program uses the binary data for the '>' symbol as part of its number resulting in the glitch as shown in this screenshot of 2500 samples (ignore the drop between samples 800 to 1500, that was me playing with the ADC input):

Screenshot

You can clearly see that the glitch causes the data to sample roughly the same value each time it happens.

The data is sent ten times a second, so what I was planning on doing was to take ten samples, remove any glitches (where the value is far away from the other samples) and then average the remaining values to smooth out the curve a bit. The output can go anywhere from 0 to 50000+ so I can't just remove values below a certain number.

I'm uncertain how to remove the values that are a long way out of the range of the other values in the 10-sample group, because there may be instances where there are two samples that are affected by this glitch. Perhaps there's some other way of fixing this glitchy data instead of just working around it!
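One way the "drop the far-away values, then average" idea could be sketched is to compare each sample against the median of the group rather than against the mean, since the median is not dragged around by a couple of glitched values. This is only a sketch: the class name, method name and the maxDeviation threshold are assumptions for illustration and would need tuning against real data.

```csharp
using System;
using System.Linq;

public static class GlitchFilter
{
    // Averages the samples that lie within maxDeviation of the group's median.
    // maxDeviation is a hypothetical tuning parameter, not something from the post.
    public static double AverageWithoutOutliers(int[] samples, int maxDeviation)
    {
        if (samples == null || samples.Length == 0)
            throw new ArgumentException("At least one sample is required.", nameof(samples));
        if (maxDeviation < 0)
            throw new ArgumentOutOfRangeException(nameof(maxDeviation));

        // Use an actual element as the median (upper middle for even counts),
        // so at least that element always survives the filter below.
        int[] sorted = samples.OrderBy(s => s).ToArray();
        int median = sorted[sorted.Length / 2];

        // Keep only the samples close to the median, then average them.
        return samples.Where(s => Math.Abs(s - median) <= maxDeviation).Average();
    }
}
```

With ten samples this tolerates two glitched values comfortably; it only breaks down once roughly half of the group is affected, because then the median itself can land on a glitch.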

What is the best way of doing this? Here's my code so far (this is inside the DataReceivedEvent method):

SerialPort sp = (SerialPort)sender; //set up serial port
byte[] spBuffer = new byte[4];
int indata = 0;

sp.Read(spBuffer, 0, 4);
indata = BitConverter.ToUInt16(spBuffer, 1);

object[] o = { numSamples, nudDutyCycle.Value, freqMultiplied, nudDistance.Value, pulseWidth, indata };
lock (dt)    //lock for multithread safety
{
    dt.Rows.Add(o); //add data to datatable
}

There are 3 answers

Matthew Watson (best answer)

I suspect your problem may be that you are reading fewer bytes from the serial port than you think you are.

For example, sp.Read(spBuffer, 0, 4); won't necessarily read 4 bytes. It could read 1, 2, 3 or 4 bytes (but never 0).

If you know you should be reading a certain number of bytes, try something like this:

public static void BlockingRead(SerialPort port, byte[] buffer, int offset, int count)
{
    while (count > 0)
    {
        // SerialPort.Read() blocks until at least one byte has been read, or SerialPort.ReadTimeout milliseconds
        // have elapsed. If a timeout occurs a TimeoutException will be thrown.
        // Because SerialPort.Read() blocks until some data is available this is not a busy loop,
        // and we do NOT need to issue any calls to Thread.Sleep().

        int bytesRead = port.Read(buffer, offset, count);
        offset += bytesRead;
        count -= bytesRead;
    }
}

If there's a timeout during the read, there should be a TimeoutException, so no need to put your own timeout in there.

Then change calls like this:

sp.Read(spBuffer, 0, 4);

To this:

BlockingRead(sp, spBuffer, 0, 4);
secret squirrel

A common method in engineering is to add a damping function. A damping function basically acts on the differential of a parameter, i.e. the difference between successive values. There are no hard and fast rules about how to choose a damping function; mostly they are tweaked until they produce a reasonable result.

So in your case that means comparing the latest value with the one before it. If the difference between them is greater than a certain amount, either default the latest value to the previous one or reduce the latest value by some fixed factor, say 10% or 1%. That way you don't lose information but also don't have sudden jumps and glitches.
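A minimal sketch of that comparison might look like the following. It reads "reduce by some fixed factor" as letting only a fraction of a large jump through, which is one possible interpretation; the names Damping, Damp, maxStep and factor are made up for illustration, and both thresholds would have to be tuned by hand.

```csharp
using System;

public static class Damping
{
    // Limits the jump between two successive samples.
    // maxStep and factor are hypothetical tuning values, not taken from the answer.
    public static int Damp(int previous, int latest, int maxStep, double factor)
    {
        int delta = latest - previous;

        // Normal change: pass the new value through untouched.
        if (Math.Abs(delta) <= maxStep)
            return latest;

        // Large jump: move only a fraction of the way toward the new value.
        // factor = 0.0 holds the previous value; factor = 0.1 follows 10% of the jump.
        return previous + (int)Math.Round(delta * factor);
    }
}
```

The trade-off is that a genuine step change in the input (such as deliberately moving the ADC input) is also slowed down, so with a non-zero factor the output takes several samples to catch up, and with a factor of zero it never does.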

Gediminas Masaitis

First of all, I would strongly suggest just fixing the parsing issue; then you won't have to worry about glitch values at all.
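As a rough idea of what fixing the parsing could involve: instead of trusting that every read starts exactly on a '<', collect incoming bytes in a buffer and only accept a value when both delimiters are where they should be, sliding forward a byte at a time otherwise. The FrameParser class below is a hypothetical helper, not part of the original code, and it assumes the low byte is sent first (which is how BitConverter.ToUInt16 reads it on a little-endian machine).

```csharp
using System;
using System.Collections.Generic;

public static class FrameParser
{
    const byte Start = 0x3C; // '<'
    const byte End = 0x3E;   // '>'

    // Extracts every complete <uint16> frame from pending and removes the consumed bytes.
    // Bytes belonging to an incomplete trailing frame stay in pending for the next call.
    public static List<ushort> Extract(List<byte> pending)
    {
        var values = new List<ushort>();
        int i = 0;
        while (i + 4 <= pending.Count)
        {
            if (pending[i] == Start && pending[i + 3] == End)
            {
                // Assumption: low byte first, then high byte.
                values.Add((ushort)(pending[i + 1] | (pending[i + 2] << 8)));
                i += 4;
            }
            else
            {
                i++; // out of sync: skip one byte and try again
            }
        }
        pending.RemoveRange(0, i);
        return values;
    }
}
```

In the DataReceived handler you would append whatever bytes are currently available to the pending list and then call Extract. Note that the payload bytes can themselves be 0x3C or 0x3E, so this check reduces false synchronisation rather than ruling it out completely.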

However, if you still decide to go down the route of fixing data afterwards: I see all the glitched data is around a certain value: ~16000. In fact, judging from the graph, I'd say it's almost identical every time. You could simply ignore the data which is in the glitched value range (you would have to do some testing to find the exact bounds), and use the last non-glitched value instead.
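A sketch of that workaround, with made-up names and bounds: the range 15872 to 16127 used in the usage note is only a guess, based on the '>' byte (0x3E) ending up as the high byte of the value, which gives 0x3E00 to 0x3EFF. The real bounds would have to come from testing, as mentioned above.

```csharp
using System;

public sealed class GlitchHold
{
    private readonly int low;   // lower bound of the suspected glitch range (assumed)
    private readonly int high;  // upper bound of the suspected glitch range (assumed)
    private int lastGood;

    public GlitchHold(int low, int high, int initial)
    {
        if (low > high) throw new ArgumentException("low must not exceed high.");
        this.low = low;
        this.high = high;
        lastGood = initial;
    }

    // Returns the sample unchanged unless it falls inside the glitch range,
    // in which case the last accepted value is returned instead.
    public int Filter(int sample)
    {
        if (sample >= low && sample <= high)
            return lastGood;

        lastGood = sample;
        return sample;
    }
}
```

For example, new GlitchHold(15872, 16127, 0) would pass 30000 through, then return 30000 again for a following 15900. The obvious downside is that genuine readings inside that range get thrown away too, which is another reason to prefer fixing the read itself.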