So we have this tricky issue in Automa. The short version is that Automa.jl works by creating _very_ large machine-generated functions by computing some Expr
objects, interpolating them in, and eval
'ing the function. Currently, some expression are being interpolated many times in the function, which leads to excessively large code. That's bad for compilation, cache usage, you name it.
I'd like to factor it out, and I can think of two ways of doing it.
callq
and popq
instructions. But then I would need to be able to specify "push the memory location of this expression onto the stack", and I don't think that is possible.Is this a solvable problem in Julia?
Maybe I'm misunderstanding something, but does RuntimeGeneratedFunctions.jl work for you?
I don't think so, sorry. We can construct the function alright (and we always do it at compiletime), the issue is code duplication within the function
Ah, I see. Then you're really after some kind of common subexpression elimination (CSE)?
Not quite. Here are the details:
Automa.jl is a package for generating parsers, lexers and similar programs. The user inputs some regexes, and some Julia code (Expr
objects) that should be executed at each regex. Automa then computes the optimal finite state machine that parses the given regexes, and then creates a function that parses the code and executes the input Expr objects.
The issue is that while the user may specify a regex input once, that regex may correspond to many states in the state machine, and the result is that the input Expr object is interpolated many times into the large function
If it's possible, I'd suggest avoiding updating the state by re-assigning to the variables in the outer scope (reading from outer scopes should be totally fine). That is to say, instead of
x = 0
f() = x += 1
f()
I suggest
x = 0
g(x) = x + 1
x = g(x)
if possible. It's a very efficient approach. Purity works great for the compiler. This is why foldl
/reduce
-based iteration that I use in Transducers.jl is so powerful. It wouldn't be possible to all that stuff if I modeled the iteration as foreach
(i.e., equivalent to how f
is used) instead of foldl
(i.e., how g
is used).
Maybe you could use Base.Experimental.@opaque
and hope that Keno will implement enough optimizations you need. But, if you control codegen, maybe it isn't worth the risk.
That would be nice, but the content of these functions (the otherwise duplicated content) are supplied by the user. So to wrap them in ordinary scoped functions, I would need to, given some input Expr
object, calculate all the variables accessed (for the function signature), and the variables modified (for the return statement), which I think is really hard, no?
Just to clarify my understanding: when you say "functions without scope" you mean those should just evaluate as if the body was in place of the function call, right? Or did you mean a function that has no access to any outer scope?
The former
Pretty sure Pluto.jl has all the requisite tooling for that, considering it can wrap cells in functions automatically
Maybe restrict somehow user interface? Force user to provide all necessary information for input and output variables.
You can use this function later, even when you would be able to get information automatically.
@Sebastian Pfitzner Pluto.jl - yeah. that looks right. I'll look into it.
I'm a bit wary of changing the interface, since that would break all the code ever written for Automa :sweat_smile:
This way it would be possible to focus on internal implementation details and get proof of concept that the whole restructuring make sense, get some profiling and what's not.
The former sounds like you could pull this off with the gotos you're already using. Like the stack thing you mentioned but you don't keep an actual function stack but you just label all the occasions where a "function" is called and insert some makeshift return-to-callside logic in each function. I don't know how easy that would be. Do you know all the places where each of those functions may be called before?
Idea:
# keep a stack of callsides
push!(stack, :fun_A_caller1)
@goto fun_A
@label fun_A_caller1
.
.
push!(stack, :fun_A_caller2)
@goto fun_A
@label fun_A_caller2
.
.
# function A
@label fun_A
... do work ...
# return logic that would have to be generated
caller = pop!(stack)
caller === :fun_A_caller1 && @goto fun_A_caller1
caller === :fun_A_caller2 && @goto fun_A_caller2
Although I have no clue if this would please the compiler or not.
Ah yes, I could totally do it with @goto
, a manual stack, and some symbols (or perhaps integers) for figuring out where to jump. It's not quite as efficient I suppose, but maybe it'll be fine.
Ah, but then the problem is that executing the function would lead to allocating a vector for the stack, and some people have already used Automa to build functions with runtimes that are a few tens of nanoseconds.
That's true, it'd be a huge overhead for those but alleviate the scaling problems for larger things.
You should probably consider ripping Pluto's internals out and bathing in it's glory before you try that out :D
Good luck
Last updated: Dec 28 2024 at 04:38 UTC