Terminate called after throwing an instance of 'utf8::invalid_utf8'

Hi folks,

I am trying to debug an issue with our custom protocol analyzer.

The analyzer works fine under Logic 1.x

I have narrowed the issue down to something I pass through AddTabularText that triggers the exception in the Logic code.

I have tried debugging with GDB (ah, the good old 1.x days when you could do it without trickery…), but the exception is thrown from the Logic libraries.

Without debug symbols for that library, I do not know how I could figure out what triggers the exception.

I have tried to indirectly prevent the issue from our side, but so far I have not succeeded.

I have dumped all strings going through the AddTabularText call to a file and tried:

iconv -f UTF-8 -t UTF-8 dbg.txt -o /dev/null

but it does not report any invalid UTF-8 sequences.

I have also tried to validate the passed strings with utfcpp (GitHub: nemtrif/utfcpp, "UTF-8 with C++ in a Portable Way"), but it did not detect any invalid UTF-8 either.
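For reference, this kind of validation can also be done without a library. The sketch below is a minimal standalone UTF-8 checker (a hypothetical helper, not part of the Analyzer SDK or utfcpp) that rejects truncated sequences, bad continuation bytes, overlong encodings, surrogates, and out-of-range code points; it could be dropped in front of every AddTabularText call while hunting the bad frame.

```cpp
#include <cstdint>
#include <string>

// Minimal UTF-8 validity check (illustrative helper): walks the byte
// sequence and verifies sequence lengths, continuation bytes, overlong
// encodings, and the final code-point range.
bool IsValidUtf8(const std::string& s)
{
    size_t i = 0;
    while (i < s.size())
    {
        uint8_t b = static_cast<uint8_t>(s[i]);
        size_t len;
        uint32_t cp;
        if (b < 0x80)                { len = 1; cp = b; }
        else if ((b & 0xE0) == 0xC0) { len = 2; cp = b & 0x1F; }
        else if ((b & 0xF0) == 0xE0) { len = 3; cp = b & 0x0F; }
        else if ((b & 0xF8) == 0xF0) { len = 4; cp = b & 0x07; }
        else return false;                    // stray continuation or invalid lead byte
        if (i + len > s.size()) return false; // truncated sequence
        for (size_t j = 1; j < len; ++j)
        {
            uint8_t c = static_cast<uint8_t>(s[i + j]);
            if ((c & 0xC0) != 0x80) return false; // not a continuation byte
            cp = (cp << 6) | (c & 0x3F);
        }
        // Reject overlong encodings, surrogates, and out-of-range code points.
        static const uint32_t min_cp[5] = { 0, 0, 0x80, 0x800, 0x10000 };
        if (cp < min_cp[len]) return false;
        if (cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF)) return false;
        i += len;
    }
    return true;
}
```

Note that uninitialized or binary data almost always fails one of these checks, so a `false` result here pinpoints exactly which string would have triggered the exception.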

Hi @martonmiklos Sorry for the trouble with this!

This is a known crash, although as far as we knew it was only caused by actually invalid UTF-8 sequences.

I have a fix in progress, which handles the exception, however I haven’t decided exactly what we’re going to do about it (e.g. pop up a notification, ignore the frame, or insert an entry with some kind of error message)

If you are using FrameV2, then GetTabularText is not used. Instead, the values stored in the FrameV2 object are indexed in the database directly.

Specifically, only the string type FrameV2 members assume UTF-8 encoding. If you are using FrameV2 (by calling UseFrameV2() in your analyzer constructor, and then later adding FrameV2 entries) then strings you provide to AddString(...) need to be UTF-8 encoded. Also, all keys (column names) need to be valid UTF-8.

If you are not using FrameV2, then GetTabularText will be called, and it will need to provide valid UTF-8 data in all strings that it adds with AddTabularText, for all display radices.
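One defensive option on the analyzer side is to scrub every string before it reaches AddTabularText. The sketch below (the helper name is illustrative, not SDK API) replaces anything outside printable ASCII with '?'. That is stricter than full UTF-8 validation, but for typical tabular text (hex/decimal values, short labels) it guarantees the string can never trip the utf8::invalid_utf8 exception.

```cpp
#include <cstdint>
#include <string>

// Hypothetical defensive wrapper: replace any byte that is not plain
// printable ASCII with '?'. Stricter than UTF-8 validation, but safe
// for the numeric/label strings a data table usually contains.
std::string SanitizeForTabularText(const std::string& raw)
{
    std::string out;
    out.reserve(raw.size());
    for (char ch : raw)
    {
        uint8_t b = static_cast<uint8_t>(ch);
        out.push_back((b >= 0x20 && b < 0x7F) ? ch : '?');
    }
    return out;
}

// Possible usage inside GenerateFrameTabularText (assuming the usual
// mResults member of an AnalyzerResults subclass):
//   mResults->AddTabularText(SanitizeForTabularText(text).c_str());
```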

If you’re sure that the strings involved are all valid UTF-8 sequences, then perhaps something is corrupting the strings along the way. We can help troubleshoot in two possible ways:

  1. If you can send us a core dump, we could load it into gdb over here with symbols, in which case there is a chance we’ll be able to extract the offending string.
  2. If you send us the analyzer source and a saved file that contains the problem, we could load the whole thing up over here in a debug build and figure out which frame has the problem, and what the contents are of that frame.

Hi Mark,

Thanks for the quick feedback.

This is a known crash, although as far as we knew it was only caused by actually invalid UTF-8 sequences.

Well, this was my first thought too, but I have put some effort into validating that we do not output any invalid UTF-8.

I have a fix in progress, which handles the exception, however I haven’t decided exactly what we’re going to do about it (e.g. pop up a notification, ignore the frame, or insert an entry with some kind of error message)

I would vote for the last option. In the other cases, fixing analyzers could be quite difficult.

If you are using FrameV2, then GetTabularText is not used. Instead, the values stored in the FrameV2 object are indexed in the database directly.

We are using v1 frames. I have plans to move to FrameV2; I just need to find some time to implement it.

If you can send us a core dump, we could load it into gdb over here with symbols, in which case there is a chance we’ll be able to extract the offending string.

ZIP password is: saleae

  1. If you send us the analyzer source and a saved file that contains the problem, we could load the whole thing up over here in a debug build and figure out which frame has the problem, and what the contents are of that frame.

I have no objection to sharing the analyzer and test data privately; however, it is quite an awkward animal: it builds with qmake and depends on some internal code generation tools, which can be worked around. Let me know if the core dump is not sufficient!

@markgarrison Have you had a chance to look into the core dump I shared?

Yes, I was able to load that right up, but after hours of digging into it, I was not able to extract the problem string/byte sequence. (50% chance it’s not in there, 50% chance my gdb skills aren’t good enough to find it)

We have a build now that will log UTF-8 errors to the application’s log, including the failing frame index and the UTF-8 error details. There is no user facing indication yet, it just doesn’t hard crash.

It’s not a great solution, but given the extreme rarity of this issue (as measured by crash reports) I can’t prioritize a more comprehensive, user facing fix.

If you write into support we can email you an out-of-band build of the software that contains the fix, as well as details about how to locate and use the log.

AddFrame() returns the frame ID when a new frame is added, which can be used to cross reference the failing frames. Since this error only impacts the data table, and not the actual results storage, you will have several options to track down the offending frame:

  1. If you implement GenerateExportFile and include the frame IDs in your output file, you can right-click the analyzer and export. (This is independent of the data table export feature.)
  2. You could save a capture with the issue, take note of the frame ID, then go back into your source code and add more debugging for that specific frame ID, add a conditional breakpoint, etc., to debug further.
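For option 2, once the frame ID of the offending frame is known, a hex dump makes the bad byte immediately visible. A sketch (the frame-ID plumbing in the comment is illustrative; only the fact that AddFrame returns the frame ID comes from the SDK behavior described above):

```cpp
#include <cstdio>
#include <string>

// Debugging aid: render a string as space-separated hex bytes so a
// non-UTF-8 byte stands out at a glance.
std::string HexDump(const std::string& s)
{
    std::string out;
    char buf[4];
    for (unsigned char c : s)
    {
        std::snprintf(buf, sizeof(buf), "%02X ", c);
        out += buf;
    }
    return out;
}

// In the analyzer, something along these lines (hypothetical sketch):
//   uint64_t frame_id = mResults->AddFrame(frame);  // AddFrame returns the ID
//   if (frame_id == suspect_id)                     // ID taken from the log
//       std::fprintf(stderr, "frame bytes: %s\n", HexDump(text).c_str());
```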

Hi @markgarrison

I have downloaded the out-of-band build; it loads the recorded results, and the analyzer no longer crashes the app.

Where can I find the application logs? I have tried running with --trace-warnings, but nothing relevant gets reported to stdout.

Hi @martonmiklos

For reasons I have yet to understand, I think it’s impossible to get the logs to show up on the console on Linux. (I think there is a difference in how handles are inherited by sub-processes on that platform that I don’t completely understand)

Anyway, you can still get logs pretty easily, there will just be a few hoops to jump through.

On Linux, logs are written to ~/.config/Logic/logs. One directory is created each time the Logic app is launched, and the directory name includes the timestamp.

Inside each directory, there is one file named graphio.log, and up to 3 more files if the log file grew large enough to rotate.

To make things a little more annoying, those log files don’t flush very often, so you will likely want to flush them manually.

The easiest way to flush those log files is to click Help > Upload Logs. This will actually send us a copy of your log files, but it will also cause them to flush locally.

Unfortunately, flushing the logs also appends a “traceback” to the log file, which is basically the last X messages from another log source that’s otherwise too noisy to include in the regular logs. I don’t think this will cause you any trouble though if you search for “Skipping frame”. Just a heads up that other log lines past the backtrace start point might be duplicated and are no longer in time order.

Example UTF8 Error:

[2025-10-21 16:10:30.702742] [W] [tid  40780] [analyzer] [analyzer_node.cpp:1862] Skipping frame 1789 due to invalid UTF8: Invalid UTF-8

There are two possible error types: invalid UTF-8 and invalid code point. Invalid UTF-8 is probably much more likely; I’m not sure of the exact technical definition, but I think it just means the data doesn’t conform to the encoding standard. Invalid code point is a higher-level error, probably indicating that the data stream did parse into a series of code units, but either a single code unit didn’t make sense, or a sequence of code units involved in a complicated character didn’t make sense.

In any event, the frame ID is what you want. The frame ID should be stable as long as every time you run the analyzer it runs over the same data with the same settings.

Let me know if you have any trouble with it!

Hi @markgarrison

I have managed to find the source of the issue: I had a case where uninitialized data was being passed to AddTabularText. I have no idea why I was not able to catch it during my earlier investigations.
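For anyone hitting the same thing: a common shape of this bug class is a fixed stack buffer that is only written on some code paths, so leftover stack bytes (rarely valid UTF-8) ride along into the string. A hypothetical reconstruction, showing the zero-initialization fix:

```cpp
#include <cstdio>
#include <string>

// Illustrative reconstruction of the bug class: a fixed buffer that is
// only formatted on some code paths. Without the `= { 0 }`, skipping
// the snprintf leaves `buf` holding indeterminate stack bytes, which
// then flow into AddTabularText as garbage. Zero-initializing turns
// the failure into an empty (valid) string instead of a crash.
std::string FormatValue(bool have_value, unsigned value)
{
    char buf[32] = { 0 };            // the fix: always initialize
    if (have_value)
        std::snprintf(buf, sizeof(buf), "0x%02X", value);
    return std::string(buf);         // "" when have_value is false
}
```

Building with `-fsanitize=memory` (Clang) or running under Valgrind would also flag the read of the uninitialized buffer directly.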

Many thanks for your efforts, the analyzer works fine with the stable releases now!

Glad to hear it!

I’d like to make this easier to debug. At the moment, the database indexing is done on a different thread, so by the time the UTF-8 exception is thrown, any trace of the offending call stack is long gone. In the past, we’ve opted to avoid extra verification in the add-frame code paths, to avoid adding more overhead to the critical path of protocol analyzers, but given the difficulty in debugging, it may be nice to provide an optional checker that could be used to validate frames before adding them.

The #1 bug we get with protocol analyzers (at least that we hear about) is pushing back frames out of time order, something that doesn’t crash, but causes unusual, undefined behavior in the frame rendering system. This is easy to catch by just tracking the last frame’s interval and comparing it against the next frame. These kinds of checks would probably save users quite a bit of debugging.
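The check described above can be sketched in a few lines (illustrative code, not part of the SDK): remember the previous frame's ending sample, and reject any frame whose interval is inverted or that starts before the last accepted frame ended.

```cpp
#include <cstdint>

// Sketch of an out-of-order frame detector: track the previously
// accepted frame's ending sample and flag violations.
class FrameOrderChecker
{
public:
    // Returns true if the frame is in time order relative to the
    // previously accepted frame; false if it should be flagged.
    bool Check(uint64_t starting_sample, uint64_t ending_sample)
    {
        if (ending_sample < starting_sample)
            return false;                      // inverted interval
        if (have_last_ && starting_sample < last_end_)
            return false;                      // overlaps / precedes last frame
        have_last_ = true;
        last_end_ = ending_sample;
        return true;
    }

private:
    bool have_last_ = false;
    uint64_t last_end_ = 0;
};
```

An analyzer could run every frame through such a checker in a debug build and log or assert on the first violation, long before the rendering system shows anything odd.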