Analog data compression technique

Compression for large captures that include analog data takes an inordinate amount of time. If that is because noisy analog data is hard to compress, this one simple trick may help:

We are faced with the same sort of problem with our data collection system, which records analog physiological signals and can generate huge files. Our system collects 16-bit data. To compress it we run-length encode the high bit of each sample in 16K blocks (the blocking is due to our internal data management), then the next most significant bit, and so on down the bits until we reach a no-win threshold.

The rationale is that high-order bits change less frequently than low-order bits, so run-length encoding high-order bits across samples is very effective. As you move down the bit order RLE becomes less effective, until you reach a point where it's not worth doing.

Overall this can give compression ratios ranging from essentially 1 (no win at all) to about 10x or more (fantastic win), depending on the nature of the signal. The algorithm is fairly straightforward and runs quickly.
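
One way the no-win decision could be made is a simple per-bit-plane cost comparison. The run record size below is just an assumed number for illustration, not what our code actually uses:

#include <cstdint>

// Hypothetical cost model: each run is assumed to cost kRunRecordBits bits in
// the output, versus 1 bit per sample if the bit plane is stored raw.
constexpr int kRunRecordBits = 16;    // assumed size of one run record
constexpr int kBlockSamples  = 16384; // one 16K sample block

// Given the number of runs seen in a bit plane over one block, decide whether
// run-length encoding that plane actually saves space.
bool worthEncoding(int runsInPlane)
    {
    int rleBits = runsInPlane * kRunRecordBits; // cost if we RLE this plane
    int rawBits = kBlockSamples;                // cost if we store 1 bit per sample
    return rleBits < rawBits;
    }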


Thanks @P.Jaquiery for sharing that! I personally don't know exactly what compression algorithms we use, but I'll make sure your note reaches our software team.

That’s very interesting! Today we’re using zip, but with a fairly light compression effort level. We could probably find a faster compression implementation too.
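
To give a rough idea of what a light compression effort level means, the deflate side boils down to something like this. This is only a sketch using zlib's one-shot API, not necessarily how our code is structured, and "zip" for us also involves a container around the compressed stream:

#include <zlib.h>
#include <vector>

// Compress a raw capture buffer at a low effort level (Z_BEST_SPEED == level 1).
// Returns an empty vector on failure; real error handling and framing omitted.
std::vector<Bytef> compressLight(const std::vector<Bytef>& raw)
    {
    uLongf destLen = compressBound(raw.size());
    std::vector<Bytef> out(destLen);

    int status = compress2(out.data(), &destLen, raw.data(), raw.size(), Z_BEST_SPEED);
    if (status != Z_OK)
        return {};

    out.resize(destLen);
    return out;
    }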

This works fairly well until we’re looking at 100s of MBs of uncompressed data. Have you benchmarked your implementation at captures of that size?

I don’t have benchmarks, and without benchmarking against the same data compressed using your current code on the same hardware my results wouldn’t be very meaningful. However, the code runs in essentially linear time and consists of a tight loop that counts runs for each encoded bit. Something like:

#include <cstdint>
#include <cstring>

constexpr int kBufferSize   = 16384; // one 16K sample block
constexpr int kMinEncodeBit = 8;     // example no-win threshold; chosen per signal

int16_t samples[kBufferSize]; // filled in by the acquisition code
uint16_t lastBits = static_cast<uint16_t>(samples[0]); // Initialize lastBits to first sample's value
int runCount[16];

memset(runCount, 0, sizeof(runCount));

for (int index = 0; index < kBufferSize; ++index)
    {
    uint16_t delta = lastBits ^ static_cast<uint16_t>(samples[index]);

    for (int bitIndex = 15; bitIndex >= kMinEncodeBit; --bitIndex)
        {
        uint16_t mask = static_cast<uint16_t>(1u << bitIndex);

        if (~delta & mask)
            {
            // No change. Bump the count and continue
            ++runCount[bitIndex];
            continue;
            }

        // Bit state changed. End of run
        lastBits ^= mask;
        // Handle run count runCount[bitIndex] output
        ...
        runCount[bitIndex] = 1; // the changed sample starts the new run
        }
    }

Untested and incomplete code, obviously, but it should give some idea of how it might hang together.
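
If it helps, the decode side could look something like this, assuming the encoder flushes each run as a simple (bitIndex, length) record and the first sample of each block is stored raw so the decoder knows the starting bit values. Both of those assumptions are mine and aren't nailed down in the sketch above:

#include <cstdint>
#include <cstddef>

struct Run              // assumed record format: which bit plane, and for how many samples
    {
    uint8_t  bitIndex;
    uint16_t length;
    };

// Rebuild the encoded bit planes of one block. samples[] is expected to be
// zero-initialized; planes below the no-win threshold are presumed stored raw
// elsewhere and merged in afterwards.
void decodeRuns(const Run* runs, size_t runCount,
                uint16_t firstSample,       // raw first sample, gives the starting bit values
                uint16_t* samples, size_t sampleCount)
    {
    size_t   fillPos[16];  // next sample index to fill, per bit plane
    uint16_t bitValue[16]; // current bit value per plane

    for (int bit = 0; bit < 16; ++bit)
        {
        fillPos[bit]  = 0;
        bitValue[bit] = (firstSample >> bit) & 1u;
        }

    for (size_t i = 0; i < runCount; ++i)
        {
        int      bit  = runs[i].bitIndex;
        uint16_t mask = static_cast<uint16_t>(1u << bit);

        for (uint16_t n = 0; n < runs[i].length && fillPos[bit] < sampleCount; ++n)
            {
            if (bitValue[bit])
                samples[fillPos[bit]] |= mask;
            ++fillPos[bit];
            }

        bitValue[bit] ^= 1u; // the run ended because the bit flipped
        }
    }

In practice you would also need some framing per block, but that's bookkeeping rather than algorithm.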


FWIW, I kind of like the RLE-per-bit idea. That’s creative! I’m not sure whether it’d be faster or more space efficient, but it’s certainly an innovative strategy.

It seems there are two good options if compression speed is the most desirable thing → Google’s brotli and Facebook’s ZStandard. Both companies need to do a lot of compression and decompression, so they built algorithms tuned to be extremely fast. Both compress reasonably well too, but their chief claim to fame is speed. Both are open source with permissive licenses.
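
For anyone who wants to try ZStandard, the one-shot API is about this simple. This is only a sketch; buffer management, level choice, and error reporting would need real attention for large captures:

#include <zstd.h>
#include <vector>
#include <cstddef>

// Compress a raw capture buffer with zstd at a fast level (1).
// Returns an empty vector on failure in this sketch.
std::vector<char> zstdCompress(const std::vector<char>& raw)
    {
    size_t bound = ZSTD_compressBound(raw.size());
    std::vector<char> out(bound);

    size_t written = ZSTD_compress(out.data(), bound, raw.data(), raw.size(), 1);
    if (ZSTD_isError(written))
        return {};

    out.resize(written);
    return out;
    }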


The neat thing about the RLE bits technique is that it works well in a very domain-specific niche. For general compression it is rubbish, but for compressing typical analog signal data, where high-order bits tend to be the same for long stretches, it can work very well.

General-purpose or text-specific compression techniques look at groups of bytes to find frequent patterns and compress by replacing those patterns with a smaller representation. Noise in the low bits of signal data completely defeats that sort of byte- or character-level pattern matching.
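
A quick way to see this for yourself is to count how often each bit flips in a slowly varying signal with a little noise in the low bits. The waveform and noise here are synthetic, purely for illustration:

#include <cstdint>
#include <cstdio>
#include <cmath>
#include <random>

int main()
    {
    constexpr int kSamples = 16384;
    std::mt19937 rng(12345);
    std::uniform_int_distribution<int> noise(-8, 8); // a few counts of low-bit noise

    int flips[16] = {};
    uint16_t previous = 0;

    for (int i = 0; i < kSamples; ++i)
        {
        // Slowly varying waveform plus noise, offset into the unsigned 16-bit range
        double clean = 20000.0 * std::sin(2.0 * 3.14159265 * i / 4096.0);
        uint16_t sample = static_cast<uint16_t>(32768 + static_cast<int>(clean) + noise(rng));

        uint16_t delta = static_cast<uint16_t>(sample ^ previous);
        for (int bit = 0; bit < 16; ++bit)
            if (delta & (1u << bit))
                ++flips[bit];

        previous = sample;
        }

    // High bits flip rarely (long runs, cheap to RLE); low bits flip constantly.
    for (int bit = 15; bit >= 0; --bit)
        printf("bit %2d: %d flips\n", bit, flips[bit]);

    return 0;
    }

On data like that you typically see the top several bits flipping only a handful of times per block while the bottom few flip on nearly every sample, which is exactly the structure the per-bit RLE exploits and byte-level pattern matchers cannot see.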