High disk usage using automation API and analyzers

I ran into the issue that the automation API functions “add_analyzer” and “add_high_level_analyzer” only return when they are finished, so I can only have one running while the capture is in progress. Also, the API offers no way to uncheck “show in data table”. My capture files are 450 s long (672 MB), but it takes about one hour to analyze one file. As I have two high-speed UART analyzers, the fact that “show in data table” is always enabled by automation makes the results stream into the tmp folder on my SSD, which makes the analysis time skyrocket, as my disk is slower than RAM.
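For context, the calls in question look like this with the logic2-automation Python package (a minimal sketch; the file name, channel, and bit rate are placeholders from my setup):

```python
from saleae import automation

with automation.Manager.connect(port=10430) as manager:
    with manager.load_capture('capture.sal') as capture:
        # add_analyzer() only returns once the analyzer has processed
        # the entire capture, and it takes no parameter to untick
        # "show in data table".
        uart = capture.add_analyzer(
            'Async Serial', label='UART 1',
            settings={'Input Channel': 4, 'Bit Rate (Bits/s)': 10_000_000})
```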

To speed the measurements up, I automated all of them, without any analyzer at all, and recorded the captures with the automation API, which was really convenient and quick. So I have a moderate number of .sal files. I then moved them to a computer with 160 GB of RAM.
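The recording side is simple; something along these lines (a sketch with placeholder device ID, channels, sample rate, and file count):

```python
from saleae import automation

with automation.Manager.connect(port=10430) as manager:
    device_config = automation.LogicDeviceConfiguration(
        enabled_digital_channels=[0, 1],   # placeholders
        digital_sample_rate=10_000_000,    # placeholder
    )
    capture_config = automation.CaptureConfiguration(
        capture_mode=automation.TimedCaptureMode(duration_seconds=450.0))
    for i in range(10):  # one iteration per measurement
        with manager.start_capture(
                device_id='YOUR-DEVICE-ID',  # placeholder
                device_configuration=device_config,
                capture_configuration=capture_config) as capture:
            capture.wait()  # returns when the timed capture completes
            capture.save_capture(filepath=f'measurement_{i:03}.sal')
```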

I ran 6 instances of Logic in parallel, each on a different port, each instance being controlled by a Python script that opens a capture, adds the analyzers (having to wait for each one to finish before adding the next), saves/exports a .csv, closes the capture, opens the next capture, and so on. Unfortunately, it filled 300 GB on my SSD to the point where I had nearly no room left and I had to stop the instances.
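Each instance’s script follows roughly this pattern (a sketch; the directory, channels, and bit rates are placeholders):

```python
import glob

from saleae import automation

with automation.Manager.connect(port=10430) as manager:  # one port per instance
    for path in glob.glob(r'D:\captures\*.sal'):  # placeholder directory
        with manager.load_capture(path) as capture:
            # Each call blocks until that analyzer has processed the
            # whole capture, so the analyzers run strictly one at a time.
            uart1 = capture.add_analyzer(
                'Async Serial', label='UART 1',
                settings={'Input Channel': 4, 'Bit Rate (Bits/s)': 10_000_000})
            uart2 = capture.add_analyzer(
                'Async Serial', label='UART 2',
                settings={'Input Channel': 5, 'Bit Rate (Bits/s)': 10_000_000})
            capture.export_data_table(
                filepath=path.replace('.sal', '.csv'),
                analyzers=[uart1, uart2])
        # leaving the with-block closes the capture before the next file
```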

I really don’t want to open each capture by hand, add the analyzers by hand (at least that way they can be added in parallel), then export the results by hand, as I have a great number of files. Is there a way to solve this, so it doesn’t take ages to analyze and doesn’t fill up my SSD?

Also, I tried to load an analyzer preset onto an open capture file in the GUI (hoping it would add all my analyzers at once, so I don’t have to set them up by hand), and it is not possible to do so.

@louled_97 Sorry about the insanely high disk usage, and for having to work around it in quite a cumbersome manner! Let me check in with the team here to see how this might be better optimized. We’ll keep you updated with our findings/recommendations.

Hi @louled_97, sorry for the trouble with this!

We’re going to need to discuss this internally with the engineering team here. Unfortunately I don’t have a quick fix I can suggest in the meantime.

We would like to try and reproduce this over here. Could you send us a few more details?

  • Are you only analyzing UART, or are there other protocol analyzers involved? If so, which ones, and what settings?
  • What’s the UART baud rate, and can you estimate how many UART bytes are stored in your typical capture?
  • Do you just have 2 UART signals recorded, or are there more than that in a single capture?

The one thing I can suggest would be a bit of a project for you, but it would dramatically speed up processing, if you’re up for it. Our analyzers are open source, and you can find the async-serial analyzer source code here: GitHub - saleae/serial-analyzer: Saleae Asynchronous Serial Analyzer

If you wanted to, you could strip out the code that actually stores the analyzer results, and instead just write the results directly to a file while it’s running. You could even add an export file path to the settings interface, so you can set the path from the GUI or from the automation API. By removing the creation of Frames and FrameV2s, you would bypass 100% of the result storage and data table indexing systems. The analyzer would then not consume any significant RAM, regardless of the number of results in the capture. Note that you would not be able to see any results in the Logic 2 software, but the analyzer would still report its progress, so you could tell when it’s complete. I’m not exactly sure how fast you could expect this to go, but it should provide a reasonable speedup to processing, and it would cut the memory usage for results completely.

Happy to help with this too.

I did solve the issue on my side, just using 4 instances of Python and Logic. I let my computer work for days, and with 4 instances I had enough SSD space for the tmp folder files.

If you want to reproduce, you can find a link here (valid for 31 days):
exemple_files

Just open Logic on the default port and run the Python script example.py from my .zip.
I’ve given 2 files, one of which is only 10 s long, so one can verify the script works rather quickly.

I used the following config when recording the linked example:

record duration: 467.336 s
memory buffer: 20 GB (when measuring, it fills just over 3 GB if I recall correctly)

digital fs: 50 MS/s, 3.3 V
7 digital channels: 1, 4, 5, 8, 9, 10, 11

analog fs: 6.25 MS/s
1 analog channel: 15
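In automation-API terms, that configuration corresponds to roughly the following (a sketch; I haven’t re-checked every field name against my actual script):

```python
from saleae import automation

device_config = automation.LogicDeviceConfiguration(
    enabled_digital_channels=[1, 4, 5, 8, 9, 10, 11],
    digital_sample_rate=50_000_000,    # 50 MS/s
    digital_threshold_volts=3.3,
    enabled_analog_channels=[15],
    analog_sample_rate=6_250_000,      # 6.25 MS/s
)
capture_config = automation.CaptureConfiguration(
    buffer_size_megabytes=20_000,      # 20 GB memory buffer
    capture_mode=automation.TimedCaptureMode(duration_seconds=467.336),
)
```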

channel descriptions:
ch1 toggles every 1 ms
ch4: UART, 36 bytes/ms
ch5: UART, 2x 36 bytes/ms
ch8, 9, 10, 11 are always low, sampled on every ch1 toggle, so every 1 ms
ch15: 1.8 V peak sine, 60 Hz

I then put a UART analyser on ch4, 10 MHz, default settings.
Then a second UART analyser on ch5, 10 MHz, default settings.
Then a Simple Parallel analyser, dual edge, where the clock is ch1 and the data lines are ch8, 9, 10, 11.
Then I add my own HL analyser that uses one UART output,
and a second HL analyser that uses the other UART output (a scripted equivalent is sketched below).
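Scripted, that setup corresponds to something like this (a sketch: the Simple Parallel setting names and the HLA extension path/name are guesses on my part, and add_high_level_analyzer needs a reasonably recent logic2-automation version):

```python
from saleae import automation

with automation.Manager.connect(port=10430) as manager:
    with manager.load_capture('example.sal') as capture:
        uart1 = capture.add_analyzer(
            'Async Serial', label='UART 1',
            settings={'Input Channel': 4, 'Bit Rate (Bits/s)': 10_000_000})
        uart2 = capture.add_analyzer(
            'Async Serial', label='UART 2',
            settings={'Input Channel': 5, 'Bit Rate (Bits/s)': 10_000_000})
        # The setting names below are assumptions: check the analyzer's
        # settings dialog in the GUI for the exact keys.
        parallel = capture.add_analyzer(
            'Simple Parallel', label='Relays',
            settings={'Clock': 1, 'D0': 8, 'D1': 9, 'D2': 10, 'D3': 11})
        hla1 = capture.add_high_level_analyzer(
            extension_directory=r'C:\path\to\my_hla',  # placeholder
            name='MyHla',                              # placeholder
            input_analyzer=uart1, label='HLA 1')
        hla2 = capture.add_high_level_analyzer(
            extension_directory=r'C:\path\to\my_hla',
            name='MyHla',
            input_analyzer=uart2, label='HLA 2')
```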

When using the GUI mode, the UARTs are relatively quick to reach “available in graph” and really, really slow to reach “available for search”.
When “available in graph” is green-ticked, disabling “show in data table” immediately marks the analyzer as fully green-ticked.
It seems my HL analyzers only need the UARTs to be “available in graph” in order to finish and export their results.

I export the Simple Parallel analyzer data table to a .csv,
both HL analyzer data tables to a single .csv,
and the raw analog signal, downsampled by 625, to a .csv (see the script form below).
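In script form, continuing with the capture and analyzer handles from the sketch above (the file names are the ones from my run):

```python
# Parallel analyzer data table -> relays.csv
capture.export_data_table(filepath='relays.csv', analyzers=[parallel])

# Both HLA data tables in a single file -> mcu1.csv
capture.export_data_table(filepath='mcu1.csv', analyzers=[hla1, hla2])

# Raw analog signal, downsampled by 625; this writes CSVs into a directory
capture.export_raw_data_csv(directory='analog_export',
                            analog_channels=[15],
                            analog_downsample_ratio=625)
```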

The initial .sal file is 909 MB.

When all my analyzers are finished, I can see 5 folders in C:\Users\foo.bar\AppData\Local\Temp, each named analyzer_db and each containing one analyzer’s results:

  1. UART1: 4.49 GB
  2. UART2: 9.02 GB
  3. Parallel: 193 MB
  4. HLA1: 1.94 GB
  5. HLA2: 2.12 GB

total: 17.7 GB

My 3 exported .csv files are the following sizes:
the parallel data, relays.csv: 12.3 MB
the UART data, mcu1.csv: 326 MB
analog.csv: 94.7 MB
total: 433 MB

Hi @louled_97, that helps a lot. Given the use of HLAs (high level analyzers), my suggestion of modifying the serial analyzer source would not have worked anyway, since that would have broken support for HLAs.

This is great feedback.

The first of the two progress percentages for analyzers shows the progress of processing the data and producing results. The second is for the indexing step required to make huge datasets searchable through the sidebar search box. Indexing is significantly more complex and slower than just producing the analyzer results.

I suspect you’re using the analyzer native export (the export from each analyzer’s menu), which does not require indexing. Exporting the data table does require indexing to complete; the data table export is the one that includes all analyzers (LLAs and HLAs) in a single file.

It’s true that high-level analyzers only require the LLA (low-level analyzer) to reach “available in graph” to finish producing results, but I’m not sure what you exported if you turned off “show in data table”. HLAs don’t have their own individual export option like LLAs do; they can only be exported through the data table. Can you check whether the exports you have actually contain all the data you expect? I would recommend looking at the timestamp of the last item in the export, then manually navigating to the last result in the display, and checking whether the timestamps match. (You can add a timing marker at the location of the last frame to easily get the complete timestamp.)
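If it helps, a quick script along these lines can pull the last timestamp out of a large export (a sketch; adjust the column index to wherever the start time lands in your export’s layout):

```python
import csv

def last_timestamp(path, column=0):
    """Return the given column of the final data row of a CSV export."""
    last_row = None
    with open(path, newline='') as f:
        reader = csv.reader(f)
        next(reader, None)  # skip the header row
        for last_row in reader:
            pass
    return last_row[column] if last_row else None

# Compare this against the last frame visible in the Logic 2 display:
print(last_timestamp('mcu1.csv'))
```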

If you look into this, please let me know if you find that your export files are complete or not. The data table export option is supposed to be disabled until processing is complete, so hopefully the export file is complete.

Once the example_files finishes downloading, I’ll take a look at your python file to get a better picture of what you’re trying to do.

Hullo @markgarrison

If I open my big example file in GUI mode, add only one UART analyser with “show in data table” unticked, and then add one HLA with “show in data table” ticked,
I have to wait for the HLA to finish to be able to export the data table to a .csv.


This .csv is fully complete: it contains every decoded UART frame to the very last one.

After the export is done, if I tick “show in data table” in the UART analyser, it starts at 35% or even less, and it takes ages to complete while filling the tmp folder further.

The issues when using the automation API are as follows:
I don’t have the option to add an analyzer with “show in data table” disabled.
I don’t have the option to add several analyzers in parallel, i.e. I have to add the first UART, wait while it fills the data table, and only then can my script proceed to add the next UART, etc.

The consequence is that long measurements take ages to decode.
Note that it doesn’t cause any immediate issue for me, since I was able to analyze everything I needed to. But one could consider these nice additions for the future.
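For example, something along these hypothetical lines would cover both points (to be clear, neither the parameter nor the function below exists in the current API; this is just how I imagine it, reusing a capture handle as in my earlier sketches):

```python
# HYPOTHETICAL sketch of the requested additions, not a real API:
uart1 = capture.add_analyzer(
    'Async Serial', label='UART 1',
    settings={'Input Channel': 4, 'Bit Rate (Bits/s)': 10_000_000},
    show_in_data_table=False)  # hypothetical flag

# A hypothetical non-blocking variant that returns at once, so several
# analyzers could process the capture concurrently:
pending = [
    capture.add_analyzer_async(  # hypothetical function
        'Async Serial', label='UART 1',
        settings={'Input Channel': 4, 'Bit Rate (Bits/s)': 10_000_000}),
    capture.add_analyzer_async(
        'Async Serial', label='UART 2',
        settings={'Input Channel': 5, 'Bit Rate (Bits/s)': 10_000_000}),
]
for job in pending:
    job.wait()  # hypothetical: block only when the results are needed
```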