Detect world age boundary · helpdesk (published)

DrChainsaw (Jul 08 2022 at 14:33):

The context is that I'm generating struct definitions from serialized data. I was previously using Dicts, but users complained that it was too slow. It is workable as is, with the caveat that the user is responsible for either returning to top level or using Base.invokelatest, but the former is not very user friendly and the latter has significant overhead (e.g. 100 ns becomes 300 ns).

Ideally I would like to either hide the defined structs until the world age advances (or until some other trigger signals that they are available) or (preferably) have some hook which removes Base.invokelatest when the world age has advanced.
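To make the problem concrete, here is a minimal sketch of the world age boundary being described. The names `make_struct` and `define_and_construct` are made up for illustration and are not from the package under discussion. A struct defined with eval inside a running function cannot be constructed by a direct call from that same function, while Base.invokelatest can reach it.

```julia
# Hypothetical helper: builds a struct from (name => type) pairs at runtime.
function make_struct(name::Symbol, fields::Pair{Symbol,DataType}...)
    body = Expr(:block, (:($n::$T) for (n, T) in fields)...)
    Core.eval(@__MODULE__, Expr(:struct, false, name, body))
    # Look the freshly defined type up through eval, which runs in the latest world.
    return Core.eval(@__MODULE__, name)
end

function define_and_construct()
    T = make_struct(gensym(:Rec), :a => Int, :b => Float64)
    # A direct call dispatches in the world this function started in,
    # where the constructors of T do not exist yet.
    direct = try
        T(1, 2.0)
        :ok
    catch err
        err isa MethodError ? :too_new : rethrow()
    end
    # invokelatest dispatches in the newest world, so the constructor is found.
    latest = Base.invokelatest(T, 1, 2.0)
    return direct, latest
end

direct, rec = define_and_construct()
println(direct, " ", rec.a, " ", rec.b)
```

Once `define_and_construct` has returned to top level, direct calls to the new constructor work without invokelatest.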

Sukera (Jul 08 2022 at 14:34):

Sukera (Jul 08 2022 at 14:35):

Are you certain that you need to generate the definitions at runtime, not just parse them into existing structs?

DrChainsaw (Jul 08 2022 at 14:36):

DrChainsaw (Jul 08 2022 at 14:37):

So the approach is to take a pass over the data and generate the struct definitions.

Sukera (Jul 08 2022 at 14:37):

Sukera (Jul 08 2022 at 14:38):

is the format really undefined, or "just" arbitrarily nested, but consisting of some simple base blocks everything is built out of?

DrChainsaw (Jul 08 2022 at 14:40):

Nope, it is basically C structs (unions, dynamically sized arrays, the whole shebang) with the struct definition basically embedded in a header

Sukera (Jul 08 2022 at 14:42):

Sukera (Jul 08 2022 at 14:43):

DrChainsaw (Jul 08 2022 at 14:45):

Maybe dynamically sized is sloppy terminology. What I mean is that the size is not a hardcoded constant but is instead given as a member of the struct (which ofc must be placed before the array or else the struct would be unreadable even in C I guess).

Sukera (Jul 08 2022 at 14:46):

but is the array actually stored inline in the C struct or rather as a pointer to a block of memory of the correct size?

DrChainsaw (Jul 08 2022 at 14:46):

No idea, but the versions I have are serialized, so all data is in a consecutive stream

Sukera (Jul 08 2022 at 14:47):

that sounds like a serialization strategy, rather than actually how the data is laid out in memory in C

Sukera (Jul 08 2022 at 14:47):

you don't want to serialize pointers for someone else to consume - they can't do anything with your pointers after all

DrChainsaw (Jul 08 2022 at 14:48):

Writing the decoding was pretty straightforward. The main problem is that nested Dicts turned out to be too slow. I have considered NamedTuples, but I'd like to be able to dispatch on the result, so I thought I should go all the way and generate structs.
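For comparison, a rough sketch of the NamedTuple alternative mentioned here. `decode_record` and `describe` are hypothetical names. A NamedTuple type can be built from runtime field names without eval, so no world age boundary is involved, and dispatch is still possible on the tuple of field names, though not on a distinct nominal type.

```julia
# Hypothetical decoder output: field names and values are only known at runtime.
function decode_record(names::Vector{Symbol}, values::Vector)
    # Constructing the NamedTuple type needs no eval, so no new world is created.
    return NamedTuple{Tuple(names)}(Tuple(values))
end

# Dispatch on the field names; two records with the same names share this method.
describe(r::NamedTuple{(:id, :gain)}) = "gain record $(r.id)"
describe(r::NamedTuple) = "unknown record"

rec = decode_record([:id, :gain], Any[7, 0.5])
println(describe(rec))
```

The limitation is that two unrelated record kinds with identical field names are indistinguishable to dispatch, which is one reason to prefer generated structs.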

Sukera (Jul 08 2022 at 14:48):

DrChainsaw (Jul 08 2022 at 14:49):

Sukera (Jul 08 2022 at 14:49):

but julia does, since that's probably what you want to mimic for deserializing (albeit not 1:1, since it sounds like you want to parse an array of known length and just have a struct with a Vector typed field, instead of a Ptr)
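A small sketch of what that could look like, assuming a made-up record layout (a UInt32 count followed by that many Float64 values, in host byte order). `Samples` and `read_samples` are illustrative names only. The length member is read first and the array then lands in a Vector-typed field.

```julia
# Hypothetical record: a length member followed by that many values in the stream.
struct Samples
    count::UInt32
    values::Vector{Float64}
end

function read_samples(io::IO)
    n = read(io, UInt32)                # the size is stored before the array
    values = Vector{Float64}(undef, n)
    read!(io, values)                   # fill the vector from the consecutive stream
    return Samples(n, values)
end

# Build a small in-memory stream to decode.
io = IOBuffer()
write(io, UInt32(3))
write(io, [1.0, 2.0, 3.0])
seekstart(io)
s = read_samples(io)
println(s.count, " ", s.values)
```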

DrChainsaw (Jul 08 2022 at 14:50):

Yes, for a certain type of struct. There are different types and even different versions of the same type (e.g. one from rev A of the software and then the same struct in rev B of the software).

Sukera (Jul 08 2022 at 14:51):

DrChainsaw (Jul 08 2022 at 14:51):

Sukera (Jul 08 2022 at 14:51):

Sukera (Jul 08 2022 at 14:52):

what I'm saying is that you can generate the structs for each version ahead of time, since their sizes are known

DrChainsaw (Jul 08 2022 at 14:52):

Sure. For example, in the current code I cache the struct definitions based on revision and identity so the number of struct definitions is much smaller than the total number of structs I need to deserialize.
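A minimal sketch of such a cache, assuming the key is a (revision, identity) pair of strings and the field list is already known from the header. `STRUCT_CACHE` and `struct_for!` are hypothetical names, not the package's actual API.

```julia
# Hypothetical cache from (revision, identity) to the generated struct type.
const STRUCT_CACHE = Dict{Tuple{String,String},DataType}()

function struct_for!(revision::String, identity::String, fields)
    get!(STRUCT_CACHE, (revision, identity)) do
        name = gensym(identity)   # unique name, so revisions never clash
        body = Expr(:block, (:($n::$T) for (n, T) in fields)...)
        Core.eval(@__MODULE__, Expr(:struct, false, name, body))
        Core.eval(@__MODULE__, name)
    end
end

A = struct_for!("revA", "Header", [:id => Int, :gain => Float64])
B = struct_for!("revB", "Header", [:id => Int])
println(A === struct_for!("revA", "Header", [:id => Int, :gain => Float64]))
println(fieldnames(A), " ", fieldnames(B))
```

Only the first lookup for a given key runs eval, so the number of definitions stays far below the number of deserialized records.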

Sukera (Jul 08 2022 at 14:52):

as in, the bit sizes. That the arrays are of arbitrary length doesn't really matter, since they're (from what you told me) not stored inline with the struct itself

DrChainsaw (Jul 08 2022 at 14:54):

Yes, this is what I'm doing now. I'm investigating whether there is some way I can improve the user experience here, since I guarantee that the first thing a user will do when I release this improvement is to call both the struct generating call and the deserialization call in the same function.

Sukera (Jul 08 2022 at 14:54):

Why are you insisting on generating the struct when the user requests it, instead of ahead of time, before the user even gets your code?

Sukera (Jul 08 2022 at 14:55):

You seemingly know the library & the struct sizes that are possible, can't you generate the struct definitions way before then and just load them as regular julia source code?

DrChainsaw (Jul 08 2022 at 14:55):

There are way too many different possible structs and software revisions to do this practically.

Sukera (Jul 08 2022 at 14:56):

Are your users expected to work with two different revisions of the software at the same time?

DrChainsaw (Jul 08 2022 at 14:57):

Yes. In particular they are expected to get a blob of data to analyze without any knowledge about which version of the software generated it.

Sukera (Jul 08 2022 at 14:58):

short of generating a lookup table based on some version identifier early on in the stream, I can't think of how you'd even do that in any language

Sukera (Jul 08 2022 at 14:59):

dynamic languages like julia or python have an advantage in that they don't generally have to care about the layout of stuff in memory ("just type it Any and look it up at runtime"), but that leads to bad performance, as you found out

DrChainsaw (Jul 08 2022 at 15:00):

I agree. I guess Base.invokelatest still gives me about a 200 or so times speedup compared to the Dict approach so it is a decent fallback.

Sukera (Jul 08 2022 at 15:00):

What you're describing is more or less arbitrary ABI conversion, which in general is not possible without a stringent contract about the stream of data and being VERY CAREFUL not to have incompatible regressions.

Sukera (Jul 08 2022 at 15:01):

Are there any statically compiled libraries doing the same task you're trying to do?

Sukera (Jul 08 2022 at 15:01):

They'll generally have the same limitations as julia here, if you want performance.

DrChainsaw (Jul 08 2022 at 15:01):

Yes, but they are doing it at runtime as well and are about 1000 times slower than my package

Sukera (Jul 08 2022 at 15:01):

Either way, I'd suggest structuring your code in such a way that users don't have to call invokelatest or generate the types themselves.

Sukera (Jul 08 2022 at 15:02):

Then you're running into fundamentally unsolved problems in computer engineering, I'm afraid

Sukera (Jul 08 2022 at 15:02):

DrChainsaw (Jul 08 2022 at 15:04):

The thing that annoys me just a little bit is that 99% of users will run it like this from the REPL

julia> result = readfile("somefile.blob")

julia> plot(get(result, :thisOrThatStruct).somedata)

Where generating the structs during readfile but not using them until get is called would work without performance penalty.

DrChainsaw (Jul 08 2022 at 15:04):

Sukera (Jul 08 2022 at 15:05):

DrChainsaw (Jul 08 2022 at 15:05):

Sukera (Jul 08 2022 at 15:06):

there is no world age callback because the increased world age is only visible to existing code once you return to top level scope

Sukera (Jul 08 2022 at 15:06):

so even if you define new structs with eval, no existing code will ever be able to call that, unless you go the invokelatest route or return to top level scope
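A small sketch of that behaviour, assuming the internal `jl_get_tls_world_age` entry point (not a documented API) together with `Base.get_world_counter`. `worlds_around_eval` and `probe_helper` are made-up names. The world seen by the running function stays fixed across an eval, while the global counter moves on.

```julia
function worlds_around_eval()
    # World age the currently running code dispatches in (internal, undocumented call).
    before = ccall(:jl_get_tls_world_age, UInt, ())
    # Defining a method bumps the global world counter.
    Core.eval(@__MODULE__, :(probe_helper() = nothing))
    after = ccall(:jl_get_tls_world_age, UInt, ())
    latest = Base.get_world_counter()
    return before, after, latest
end

before, after, latest = worlds_around_eval()
println(before == after, " ", latest > after)
```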

Sukera (Jul 08 2022 at 15:08):

if we had a callback like that, we'd more or less have to allow arbitrary code to run at each eval.. which would be very scary and a great source of indeterminism

DrChainsaw (Jul 08 2022 at 15:10):

Yeah, I realize there might be some hard constraints here on what looks like a solvable problem from the outside. I mean, someone must know the world age or else it would not be possible to assert on it.

An ugly solution I have thought about is to just have a try catch, maybe come up with some way to cache the result of the try-catch from the entry point if possible.

Sukera (Jul 08 2022 at 15:16):

that would not help you - you'd have to return to the top level again or call invokelatest in the catch block (which is going to be slower than just calling invokelatest directly, since you have to try first, only offering a speedup when you have the struct already)
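For reference, the try/catch idea could be sketched roughly as follows. `call_with_fallback`, `late_hook` and `demo_fallback` are hypothetical names. As noted, the fallback path pays for a failed call and an exception before reaching invokelatest, so it only wins when the method is already visible.

```julia
# Try a direct call first; fall back to invokelatest if the method is too new.
function call_with_fallback(f, args...)
    try
        return f(args...)
    catch err
        # Only handle a failed dispatch on f itself; anything else is a real error.
        (err isa MethodError && err.f === f) || rethrow()
        return Base.invokelatest(f, args...)
    end
end

function late_hook end   # declared up front, method added later via eval

function demo_fallback()
    Core.eval(@__MODULE__, :(late_hook(x::Int) = x + 1))
    return call_with_fallback(late_hook, 1)   # direct call fails, fallback succeeds
end

println(demo_fallback())
```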

Sukera (Jul 08 2022 at 15:17):

if you want a taste of why this is a difficult problem and are not afraid of dissertations, I can recommend Jeff Bezanson's PhD thesis and another paper that formalized the world age mechanism into a calculus

DrChainsaw (Jul 08 2022 at 15:30):

It would help me in the 99% of cases when people do as above. For the remaining 1% I could print a warning that things might be a bit slower than they have to be. Fully aware that I can't cheat the world age here.

Anyway, I think I can work with what I have here. Maybe just having a manual command to clear out invokelatest will be good enough.

Sebastian Pfitzner (Jul 08 2022 at 17:15):

Sukera (Jul 08 2022 at 17:27):

DrChainsaw (Jul 08 2022 at 17:42):

This is indeed the main reason why I didn't go for them. I was also a bit worried about exploding compile times (I have seen structs which cover a few pages when fully unrolled) but TBH I don't know if structs really have an advantage over tuples w.r.t. this. I think the generation code I have can easily be converted to make a named tuple, so I should perhaps just give it a shot. That will have to be an exercise for Monday though...

Sukera (Jul 08 2022 at 17:43):

at some point, large structs do perform better than the equivalent tuple would, I think

DrChainsaw (Jul 11 2022 at 09:41):

    currentworld = ccall(:jl_get_tls_world_age, UInt, ())
    all(methods(f)) do fmethod
        fmethod.primary_world <= currentworld
    end

Context: Firstly, this was never about trying to cheat the world age boundary. It's just about providing convenience to end users so I don't need to educate them about world ages. Previously the cost of this was perpetually slow speed, and now it is me having to maintain things like the above shenanigans, which seems like a good tradeoff for the speed improvement I got.

The above code is only called for functions wrapped in another callable struct. Due to the nature of the problem, I maintain a cache mapping metadata to struct constructors, so as soon as the expression above returns true, the constructors in the cache are unwrapped. If the check fails, it does Base.invokelatest.
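A rough sketch of how such a wrapper might look. This is a guess at the shape, not the package's actual code: `WorldChecked` and `isvisible` are hypothetical names, a `ready` flag stands in for unwrapping the cache, and `primary_world` and `jl_get_tls_world_age` are Julia internals that may change between versions.

```julia
# Hypothetical wrapper: calls directly once the wrapped methods are visible,
# otherwise goes through Base.invokelatest.
mutable struct WorldChecked{F}
    f::F
    ready::Bool
end
WorldChecked(f) = WorldChecked(f, false)

# The check from above: are all methods of f old enough for the current world?
function isvisible(f)
    currentworld = ccall(:jl_get_tls_world_age, UInt, ())
    return all(m -> m.primary_world <= currentworld, methods(f))
end

function (w::WorldChecked)(args...)
    if w.ready || (w.ready = isvisible(w.f))
        return w.f(args...)              # fast path, no further checks once ready
    end
    return Base.invokelatest(w.f, args...)
end

function wrapped_hook end

function demo_wrapped()
    Core.eval(@__MODULE__, :(wrapped_hook(x::Int) = 2x))
    w = WorldChecked(wrapped_hook)
    return w, w(3)                       # still in the old world: invokelatest path
end

w, r = demo_wrapped()
println(r, " ", w(4))
```

Called again from the REPL after `demo_wrapped` has returned, the check passes and later calls skip it entirely.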

There is also a way to call the constructors in the wrapped struct so that the above check is bypassed. Some mock examples:

julia> result = read_horrible_struct_format_file("some.file") # this generates struct definitions

julia> plot(get(result.someData)) # Will do the above world age check, and unwrap all constructors in the cache

julia> plot(get(result.someOtherData)) # No checks performed, call constructor directly

julia> function readandplot(file)
             result = read_horrible_struct_format_file(file)
             plot(get(result.someData)) # Warns the user about slow speed, including some tips for mitigation, then does Base.invokelatest
             plot(get(NoWorldCheck(), result.someData)) # Just calls Base.invokelatest
       end

The package which does this is pretty much a "leaf" package meant for interactive analysis, so the first example above is how almost all users use it.
