Stream: helpdesk (published)

Topic: Detect world age boundary


view this post on Zulip DrChainsaw (Jul 08 2022 at 14:33):

Is there any way to detect or get a callback when the world age is advanced?

Context is that I'm generating struct definitions from serialized data. I was previously using Dicts but users complained it was too slow. It is workable as is, with the caveat that the user is responsible for either returning to top level or using Base.invokelatest, but the former is not so user friendly and the latter has significant overhead (e.g. 100 ns becomes 300 ns).

Ideally I would like to either hide the defined structs until the world age advances (or if there is some other trigger for when they are available) or (preferably) have some hook which removes Base.invokelatest when the world age has advanced.

view this post on Zulip Sukera (Jul 08 2022 at 14:34):

Not that I know of

view this post on Zulip Sukera (Jul 08 2022 at 14:35):

generating struct definitions from serialized data

Are you certain that you need to generate the definitions at runtime, not just parse them into existing structs?

view this post on Zulip DrChainsaw (Jul 08 2022 at 14:36):

Unfortunately yes :(

view this post on Zulip DrChainsaw (Jul 08 2022 at 14:36):

It is a horrible format where the struct definitions are embedded in the data

view this post on Zulip DrChainsaw (Jul 08 2022 at 14:37):

So the approach is to take a pass over the data and generate the struct definitions.

view this post on Zulip Sukera (Jul 08 2022 at 14:37):

I.. uh.. huh.

view this post on Zulip Sukera (Jul 08 2022 at 14:38):

I'd advise to run, but that doesn't sound like an option

view this post on Zulip Sukera (Jul 08 2022 at 14:38):

is the format really undefined, or "just" arbitrarily nested, but consisting of some simple base blocks everything is built out of?

view this post on Zulip DrChainsaw (Jul 08 2022 at 14:40):

Nope, it is basically C structs (unions, dynamically sized arrays, the whole shebang) with the struct definition basically embedded in a header

view this post on Zulip Sukera (Jul 08 2022 at 14:42):

dynamically sized arrays stored inline in the struct? hoh boy

view this post on Zulip Sukera (Jul 08 2022 at 14:43):

do you have a concrete example?

view this post on Zulip DrChainsaw (Jul 08 2022 at 14:45):

Maybe "dynamically sized" is sloppy terminology. What I mean is that the size is not a hardcoded constant but is instead given as a member of the struct (which ofc must be placed before the array or else the struct would be unreadable even in C I guess).

view this post on Zulip Sukera (Jul 08 2022 at 14:46):

but is the array actually stored inline in the C struct or rather as a pointer to a block of memory of the correct size?

view this post on Zulip DrChainsaw (Jul 08 2022 at 14:46):

No idea, but the versions I have are serialized, so all data is in a consecutive stream

view this post on Zulip Sukera (Jul 08 2022 at 14:47):

that sounds like a serialization strategy, rather than actually how the data is laid out in memory in C

view this post on Zulip Sukera (Jul 08 2022 at 14:47):

you don't want to serialize pointers for someone else to consume - they can't do anything with your pointers after all

view this post on Zulip DrChainsaw (Jul 08 2022 at 14:48):

Writing the decoding was pretty straightforward. Main problem is that nested Dicts turned out too slow. I have considered NamedTuples, but I'd like to be able to dispatch on the result so I thought I should go all the way and generate structs.

view this post on Zulip Sukera (Jul 08 2022 at 14:48):

ok, let me ask differently - are the number of fields in the C structs fixed?

view this post on Zulip Sukera (Jul 08 2022 at 14:48):

on a per-struct basis

view this post on Zulip DrChainsaw (Jul 08 2022 at 14:49):

that sounds like a serialization strategy, rather than actually how the data is laid out in memory in C

Yes, I don't care how the data is laid out in the C application.

view this post on Zulip Sukera (Jul 08 2022 at 14:49):

Yes, I don't care how the data is laid out in the C application.

but julia does, since that's probably what you want to mimic for deserializing (albeit not 1:1, since it sounds like you want to parse an array of known length and just have a struct with a Vector typed field, instead of a Ptr)

view this post on Zulip DrChainsaw (Jul 08 2022 at 14:50):

ok, let me ask differently - are the number of fields in the C structs fixed?

Yes, for a certain type of struct. There are different types and even different versions of the same type (e.g. one from rev A of the software and then the same struct in rev B of the software).

view this post on Zulip Sukera (Jul 08 2022 at 14:51):

ok, then that's all you need for the structs on the julia side, no?

view this post on Zulip DrChainsaw (Jul 08 2022 at 14:51):

but julia does, since that's probably what you want to mimic for deserializing

Absolutely, and that part I have gotten down already.

view this post on Zulip DrChainsaw (Jul 08 2022 at 14:51):

ok, then that's all you need for the structs on the julia side, no?

Yup, that is what the current code does.

view this post on Zulip Sukera (Jul 08 2022 at 14:51):

for any given revision, the number of fields and their sizes is known

view this post on Zulip Sukera (Jul 08 2022 at 14:52):

what I'm saying is that you can generate the structs for each version ahead of time, since their sizes are known

view this post on Zulip DrChainsaw (Jul 08 2022 at 14:52):

for any given revision, the number of fields and their sizes is known

Sure. For example, in the current code I cache the struct definitions based on revision and identity so the number of struct definitions is much smaller than the total number of structs I need to deserialize.

view this post on Zulip Sukera (Jul 08 2022 at 14:52):

as in, the bit sizes. That the arrays are of arbitrary length doesn't really matter, since they're (from what you told me) not stored inline with the struct itself

view this post on Zulip DrChainsaw (Jul 08 2022 at 14:54):

what I'm saying is that you can generate the structs for each version ahead of time, since their sizes are known

Yes, this is what I'm doing now. I'm investigating if there is some way I can improve the user experience here since I guarantee that the first thing a user will do when I release this improvement is to call both the struct generating call as well as the deserialization call in the same function.

view this post on Zulip Sukera (Jul 08 2022 at 14:54):

Why are you insisting on generating the struct when the user requests it, instead of ahead of time, before the user even gets your code?

view this post on Zulip Sukera (Jul 08 2022 at 14:55):

You seemingly know the library & the struct sizes that are possible, can't you generate the struct definitions way before then and just load them as regular julia source code?

view this post on Zulip DrChainsaw (Jul 08 2022 at 14:55):

Why are you insisting on generating the struct when the user requests it, instead of ahead of time, before the user even gets your code?

There are way too many different possible structs and software revisions to do this practically.

view this post on Zulip Sukera (Jul 08 2022 at 14:56):

Are your users expected to work with two different revisions of the software at the same time?

view this post on Zulip DrChainsaw (Jul 08 2022 at 14:57):

Yes. In particular they are expected to get a blob of data to analyze without any knowledge about which version of the software generated it.

view this post on Zulip Sukera (Jul 08 2022 at 14:58):

..Oo

view this post on Zulip Sukera (Jul 08 2022 at 14:58):

short of generating a lookup table based on some version identifier early on in the stream, I can't think of how you'd even do that in any language

view this post on Zulip Sukera (Jul 08 2022 at 14:59):

dynamic languages like julia or python have an advantage in that they don't generally have to care about the layout of stuff in memory ("just type it Any and look it up at runtime"), but that leads to bad performance, as you found out

view this post on Zulip DrChainsaw (Jul 08 2022 at 15:00):

I agree. I guess Base.invokelatest still gives me about a 200 or so times speedup compared to the Dict approach so it is a decent fallback.

view this post on Zulip Sukera (Jul 08 2022 at 15:00):

What you're describing is more or less arbitrary ABI conversion, which in general is not possible without a stringent contract about the stream of data and being VERY CAREFUL not to have incompatible regressions.

view this post on Zulip Sukera (Jul 08 2022 at 15:01):

Are there any statically compiled libraries doing the same task you're trying to do?

view this post on Zulip Sukera (Jul 08 2022 at 15:01):

They'll generally have the same limitations as julia here, if you want performance.

view this post on Zulip DrChainsaw (Jul 08 2022 at 15:01):

Yes, but they are doing it in runtime as well and are about 1000 times slower than my package

view this post on Zulip Sukera (Jul 08 2022 at 15:01):

Either way, I'd suggest structuring your code in such a way that users don't have to either call invokelatest or generate the types themselves.

view this post on Zulip Sukera (Jul 08 2022 at 15:02):

Yes, but they are doing it in runtime as well and are about 1000 times slower than my package

Then you're running into fundamentally unsolved problems in computer engineering, I'm afraid

view this post on Zulip Sukera (Jul 08 2022 at 15:02):

you can consider yourself lucky that the approach is already as fast as it is

view this post on Zulip DrChainsaw (Jul 08 2022 at 15:04):

I'm fully aware that there is no magic to get the types known at runtime.

The thing that annoys me just a little bit is that 99% users will run it like this from the REPL

julia> result = readfile("somefile.blob")

julia> plot(get(result, :thisOrThatStruct).somedata)

Where generating the structs during readfile but not using them until get is called would work without performance penalty.

view this post on Zulip DrChainsaw (Jul 08 2022 at 15:04):

I am very lucky indeed. Just as the Julia creators I'm also greedy :)

view this post on Zulip Sukera (Jul 08 2022 at 15:05):

is readfile your function?

view this post on Zulip Sukera (Jul 08 2022 at 15:05):

if not, I'd suggest putting the generation code there

view this post on Zulip DrChainsaw (Jul 08 2022 at 15:05):

Yes, bad name example maybe. Real function has a more obtuse name

view this post on Zulip Sukera (Jul 08 2022 at 15:06):

either way, you will not get around world age

view this post on Zulip Sukera (Jul 08 2022 at 15:06):

there is no world age callback because the increased world age is only visible to existing code once you return to top level scope

view this post on Zulip Sukera (Jul 08 2022 at 15:06):

so even if you define new structs with eval, no existing code will ever be able to call that, unless you go the invokelatest route or return to top level scope
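
A minimal sketch of the behaviour described in the message above. The names `Sandbox`, `Point` and `define_and_construct` are hypothetical and not taken from the thread. It assumes that a struct defined with `Core.eval` inside a running function gets its constructor methods in a newer world than the one the function runs in, so the direct call fails while `Base.invokelatest` succeeds.

```julia
# Hypothetical names throughout: `Sandbox`, `Point`, `define_and_construct`.
const Sandbox = Module(:Sandbox)

function define_and_construct()
    # Define a struct at runtime and get the new type back as a value.
    T = Core.eval(Sandbox, quote
        struct Point
            x::Int
            y::Int
        end
        Point
    end)

    # The constructor methods were added in a newer world than the one this
    # function is running in, so a direct call is expected to throw.
    direct = try
        T(1, 2)
    catch err
        err
    end

    # invokelatest dispatches in the latest world, so it can see the constructor.
    latest = Base.invokelatest(T, 1, 2)
    return T, direct, latest
end

T, direct, latest = define_and_construct()
println(direct isa MethodError)  # did the direct call inside the function fail?
println(latest)                  # instance built through invokelatest
println(T(3, 4))                 # back at top level, the direct call works
```

When this runs as a script or at the REPL, the last line executes at top level, after the world age has advanced, which is why the plain constructor call is fine there.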

view this post on Zulip Sukera (Jul 08 2022 at 15:08):

if we had a callback like that, we'd more or less have to allow arbitrary code to run at each eval.. which would be very scary and a great source of indeterminism

view this post on Zulip DrChainsaw (Jul 08 2022 at 15:10):

Yeah, I realize there might be some hard constraints here to what looks like a solvable problem from the outside. I mean, someone must know the world age or else it would not be possible to assert on it.

An ugly solution I have thought about is to just have a try catch, maybe come up with some way to cache the result of the try-catch from the entry point if possible.

view this post on Zulip Sukera (Jul 08 2022 at 15:16):

that would not help you - you'd have to return to the top level again or call invokelatest in the catch block (which is going to be slower than just calling invokelatest directly, since you have to try first, only offering a speedup when you have the struct already)
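
A rough sketch of the try/catch idea being discussed, with a hypothetical helper `call_with_fallback` that does not come from the thread's package. It tries the direct call first and only falls back to `Base.invokelatest` when a `MethodError` is raised for the callee itself. As noted above, the fallback path pays for the failed attempt on top of `invokelatest`, so this only helps when the direct call usually succeeds.

```julia
# Hypothetical helper, not part of the package discussed in the thread.
function call_with_fallback(f, args...)
    try
        return f(args...)
    catch err
        # Only a MethodError raised for `f` itself hints at a world age problem;
        # anything else is treated as a genuine error and rethrown.
        (err isa MethodError && err.f === f) || rethrow()
        return Base.invokelatest(f, args...)
    end
end

function demo()
    # The anonymous function is created in a newer world than `demo` runs in,
    # so the direct call inside `call_with_fallback` is expected to fail.
    g = Core.eval(@__MODULE__, :(x -> 2x))
    return call_with_fallback(g, 21)
end

println(demo())
println(call_with_fallback(sqrt, 4.0))  # already visible: direct call, no fallback
```

One caveat with this sketch: a `MethodError` for `f` can also mean the arguments are simply wrong, in which case `invokelatest` throws the same error a second time.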

view this post on Zulip Sukera (Jul 08 2022 at 15:17):

if you want a taste of why this is a difficult problem and are not afraid of dissertations, I can recommend Jeff Bezanson's PhD thesis and another paper that formalizes the world age mechanism as a calculus

view this post on Zulip DrChainsaw (Jul 08 2022 at 15:30):

It would help me in the 99% of cases when people do as above. For the remaining 1% I could print a warning that things might be a bit slower than they have to be. Fully aware that I can't cheat the world age here.

Anyways, I think I can work with what I have here. Maybe just have a manual command to clean out invokelatest will be just good enough.

view this post on Zulip Sebastian Pfitzner (Jul 08 2022 at 17:15):

Couldn't you also use named tuples everywhere?

view this post on Zulip Sukera (Jul 08 2022 at 17:27):

I imagine that would prevent convenient dispatch

view this post on Zulip DrChainsaw (Jul 08 2022 at 17:42):

I imagine that would prevent convenient dispatch

This is indeed the main reason why I didn't go for them. I was also a bit worried by exploding compile times (I have seen structs which cover a few pages when fully unrolled) but TBH I don't know if structs really have an advantage over tuples w.r.t. this. I think the generation code I have can be easily converted to making a named tuple so I should perhaps just give it a shot. That will have to be an exercise for Monday though...

view this post on Zulip Sukera (Jul 08 2022 at 17:43):

at some point, large structs do perform better than the equivalent tuple would, I think

view this post on Zulip DrChainsaw (Jul 11 2022 at 09:41):

Fwiw, I ended up doing this for now:

    currentworld = ccall(:jl_get_tls_world_age, UInt, ())
    all(methods(f)) do fmethod
        fmethod.primary_world <= currentworld
    end

Context: Firstly, this was never about trying to cheat the world age boundary. It's just providing convenience to end users so I don't need to educate them about world ages. Previously the cost of this was perpetual slow speed, and now it is me having to maintain things like the above shenanigans, which seems like a good tradeoff for the speed improvement I got.

The above code is only called for functions wrapped in another callable struct. Due to the nature of the problem, I maintain a cache of metadata to struct constructors so as soon as the expression above returns true, the constructors in the cache are unwrapped. If the check fails, it does Base.invokelatest.

There is also a way to call the constructors in the wrapped struct so that the above check is bypassed. Some mock examples:

julia> result = read_horrible_struct_format_file("some.file") # this generates struct definitions

julia> plot(get(result.someData)) # Will do the above world age check, and unwrap all constructors in the cache

julia> plot(get(result.someOtherData)) # No checks performed, call constructor directly

julia> function readandplot(file)
             result = read_horrible_struct_format_file(file)
             plot(get(result.someData)) # Warns the user about slow speed, including some tips for mitigation, then does Base.invokelatest
             plot(get(NoWorldCheck(), result.someData)) # Just calls Base.invokelatest
       end

The package which does this is pretty much a "leaf" package meant for interactive analysis, so the first example above is how almost all users use it.
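
The wrapper-and-cache scheme described in this post could look roughly like the sketch below. All names (`WorldChecked`, `isvisible`, `CONSTRUCTORS`, `construct`, `Sample`) are hypothetical, and the real package may be organised quite differently. The sketch assumes, as the snippet in the post does, that `methods` also lists methods newer than the current task's world and that `Method` has a `primary_world` field; both are Julia internals and may change between versions.

```julia
# Hypothetical names: `WorldChecked`, `isvisible`, `CONSTRUCTORS`, `construct`.

# Wrapper marking a constructor that may be too new to call directly.
struct WorldChecked{F}
    f::F
end

# True when every method of `f` was defined at or before the world age the
# current task runs in (the same check as the snippet in the post).
function isvisible(f)
    currentworld = ccall(:jl_get_tls_world_age, UInt, ())
    return all(m -> m.primary_world <= currentworld, methods(f))
end

# Cache from a metadata key to a constructor; entries start out wrapped.
const CONSTRUCTORS = Dict{Symbol,Any}()

function construct(key::Symbol, args...)
    c = CONSTRUCTORS[key]
    c isa WorldChecked || return c(args...)   # already unwrapped: direct call
    if isvisible(c.f)
        # The world age has advanced: unwrap every cached constructor that is
        # visible now, so later calls skip the check entirely.
        for k in collect(keys(CONSTRUCTORS))
            v = CONSTRUCTORS[k]
            if v isa WorldChecked && isvisible(v.f)
                CONSTRUCTORS[k] = v.f
            end
        end
        return c.f(args...)
    end
    return Base.invokelatest(c.f, args...)    # still too new: slow path
end

function read_and_construct()
    # Stand-in for generating a struct definition while reading a file.
    T = Core.eval(@__MODULE__, quote
        struct Sample
            value::Float64
        end
        Sample
    end)
    CONSTRUCTORS[:sample] = WorldChecked(T)
    # Same function that defined the struct, so this takes the invokelatest path.
    return construct(:sample, 1.5)
end

a = read_and_construct()
b = construct(:sample, 2.5)  # at top level the check can pass and unwrap the cache
println(a.value, " ", b.value)
```

The warning for the slow path and the `NoWorldCheck()` bypass mentioned in the post are left out to keep the sketch short.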


Last updated: Nov 22 2024 at 04:41 UTC