Compression of large captures that include analog data takes an inordinate amount of time. If that is because noisy analog data is hard to compress, then this one simple trick may help:
We are faced with the same sort of problem in our data collection system, which records analog physiological signals and can generate huge files. Our system collects 16-bit data. To compress it, we run-length encode the high bit of each sample in 16K blocks (the blocking is due to our internal data management), then the next MSB, and so on down the bits until we reach a no-win threshold.
The rationale is that high-order bits change less frequently than low-order bits, so run-length encoding high-order bits across samples is very effective. As you move down the bit order, RLE becomes less effective until you reach a point where it's not worth doing.
Overall this can give compression ratios ranging from essentially 1 (no win at all) to about 10x or more (a fantastic win), depending on the nature of the signal. The algorithm is fairly straightforward and runs quickly.
I don’t have benchmarks, and without benchmarking against the same data compressed using your current code on the same hardware, my results wouldn’t be very meaningful. However, the code runs in essentially linear time and consists of a tight loop that counts runs for each encoded bit. Something like:
uint16_t lastBits = samples[0]; // Initialize lastBits to the first sample's value
memset(runCount, 0, sizeof(runCount));
for (int index = 0; index < kBufferSize; ++index) {
    uint16_t delta = lastBits ^ samples[index];
    uint16_t mask = 0x8000;
    for (int bitIndex = 15; bitIndex >= kMinEncodeBit; --bitIndex) {
        if (~delta & mask) {
            // No change. Bump the count and continue
            ++runCount[bitIndex];
        } else {
            // Bit state changed. End of run
            lastBits ^= mask;
            // Handle the run here: output runCount[bitIndex] for this bit
            runCount[bitIndex] = 0;
        }
        mask >>= 1;
    }
}
Obviously untested and incomplete code, but it should give some idea of how it might hang together.
FWIW, I kind of like the RLE per bit idea. That’s creative! I’m not sure whether it’d be faster or more space efficient but it’s certainly an innovative strategy.
It seems there are two good options if compression speed is the most desirable thing: Google's Brotli and Facebook's Zstandard. Both companies do an enormous amount of compression and decompression, so they built algorithms that waste as little time as possible. Both compress reasonably well too, but their chief claim to fame is speed. Both are open source under permissive licenses.
The neat thing about the RLE-per-bit technique is that it works well in a very domain-specific niche. For general compression it is rubbish, but for compressing typical analog signal data, where high-order bits tend to be the same for long stretches, it can work very well.
General-purpose or text-specific compression techniques look at groups of bytes to find frequent patterns and compress by replacing those patterns with a smaller representation. Noise in the low bits of signal data completely defeats byte- or character-pattern recognition styles of compression.