Scopeless functions OR manipulate the stack · helpdesk (published)

So we have this tricky issue in Automa. The short version is that Automa.jl works by creating _very_ large machine-generated functions by computing some Expr objects, interpolating them in, and eval'ing the function. Currently, some expression are being interpolated many times in the function, which leads to excessively large code. That's bad for compilation, cache usage, you name it.

Robbie Rosati (Mar 28 2021 at 16:20):

Jakob Nybo Nissen (Mar 28 2021 at 16:22):

I don't think so, sorry. We can construct the function alright (and we always do it at compiletime), the issue is code duplication within the function

Robbie Rosati (Mar 28 2021 at 16:24):

Ah, I see. Then you're really after some kind of common subexpression elimination (CSE)?

Jakob Nybo Nissen (Mar 28 2021 at 16:27):

Not quite. Here are the details:
Automa.jl is a package for generating parsers, lexers and similar programs. The user inputs some regexes, and some Julia code (Expr objects) that should be executed at each regex. Automa then computes the optimal finite state machine that parses the given regexes, and then creates a function that parses the code and executes the input Expr objects.
The issue is that while the user may specify a regex input once, that regex may correspond to many states in the state machine, and the result is that the input Expr object is interpolated many times into the large function

Takafumi Arakaki (tkf) (Mar 28 2021 at 21:34):

If it's possible, I'd suggest avoiding updating the state by re-assigning to the variables in the outer scope (reading from outer scopes should be totally fine). That is to say, instead of

x = 0
f() = x += 1
f()

x = 0
g(x) = x + 1
x = g(x)

if possible. It's a very efficient approach. Purity works great for the compiler. This is why foldl/reduce-based iteration that I use in Transducers.jl is so powerful. It wouldn't be possible to all that stuff if I modeled the iteration as foreach (i.e., equivalent to how f is used) instead of foldl (i.e., how g is used).

Maybe you could use Base.Experimental.@opaque and hope that Keno will implement enough optimizations you need. But, if you control codegen, maybe it isn't worth the risk.

Jakob Nybo Nissen (Mar 29 2021 at 08:12):

That would be nice, but the content of these functions (the otherwise duplicated content) are supplied by the user. So to wrap them in ordinary scoped functions, I would need to, given some input Expr object, calculate all the variables accessed (for the function signature), and the variables modified (for the return statement), which I think is really hard, no?

Florian Große (Mar 29 2021 at 09:28):

Just to clarify my understanding: when you say "functions without scope" you mean those should just evaluate as if the body was in place of the function call, right? Or did you mean a function that has no access to any outer scope?

Jakob Nybo Nissen (Mar 29 2021 at 09:32):

Sebastian Pfitzner (Mar 29 2021 at 09:34):

Pretty sure Pluto.jl has all the requisite tooling for that, considering it can wrap cells in functions automatically

Kwaku Oskin (Mar 29 2021 at 09:38):

Maybe restrict somehow user interface? Force user to provide all necessary information for input and output variables.

You can use this function later, even when you would be able to get information automatically.

Jakob Nybo Nissen (Mar 29 2021 at 09:39):

@Sebastian Pfitzner Pluto.jl - yeah. that looks right. I'll look into it.
I'm a bit wary of changing the interface, since that would break all the code ever written for Automa :sweat_smile:

Kwaku Oskin (Mar 29 2021 at 09:40):

This way it would be possible to focus on internal implementation details and get proof of concept that the whole restructuring make sense, get some profiling and what's not.

Florian Große (Mar 29 2021 at 09:41):

The former sounds like you could pull this off with the gotos you're already using. Like the stack thing you mentioned but you don't keep an actual function stack but you just label all the occasions where a "function" is called and insert some makeshift return-to-callside logic in each function. I don't know how easy that would be. Do you know all the places where each of those functions may be called before?

# keep a stack of callsides
push!(stack, :fun_A_caller1)
@goto fun_A
@label fun_A_caller1
.
.
push!(stack, :fun_A_caller2)
@goto fun_A
@label fun_A_caller2
.
.
# function A
@label fun_A
... do work ...
# return logic that would have to be generated
caller = pop!(stack)
caller === :fun_A_caller1 && @goto fun_A_caller1
caller === :fun_A_caller2 && @goto fun_A_caller2

Jakob Nybo Nissen (Mar 29 2021 at 09:43):

Ah yes, I could totally do it with @goto, a manual stack, and some symbols (or perhaps integers) for figuring out where to jump. It's not quite as efficient I suppose, but maybe it'll be fine.

Jakob Nybo Nissen (Mar 29 2021 at 09:46):

Ah, but then the problem is that executing the function would lead to allocating a vector for the stack, and some people have already used Automa to build functions with runtimes that are a few tens of nanoseconds.

Florian Große (Mar 29 2021 at 09:49):

That's true, it'd be a huge overhead for those but alleviate the scaling problems for larger things.
You should probably consider ripping Pluto's internals out and bathing in it's glory before you try that out :D

Stream: helpdesk (published)

Topic: Scopeless functions OR manipulate the stack

Jakob Nybo Nissen (Mar 28 2021 at 16:12):

Robbie Rosati (Mar 28 2021 at 16:20):

Jakob Nybo Nissen (Mar 28 2021 at 16:22):

Robbie Rosati (Mar 28 2021 at 16:24):

Jakob Nybo Nissen (Mar 28 2021 at 16:27):

Takafumi Arakaki (tkf) (Mar 28 2021 at 21:34):

Jakob Nybo Nissen (Mar 29 2021 at 08:12):

Florian Große (Mar 29 2021 at 09:28):

Jakob Nybo Nissen (Mar 29 2021 at 09:32):

Sebastian Pfitzner (Mar 29 2021 at 09:34):

Kwaku Oskin (Mar 29 2021 at 09:38):

Jakob Nybo Nissen (Mar 29 2021 at 09:39):

Kwaku Oskin (Mar 29 2021 at 09:40):

Florian Große (Mar 29 2021 at 09:41):

Jakob Nybo Nissen (Mar 29 2021 at 09:43):

Jakob Nybo Nissen (Mar 29 2021 at 09:46):

Florian Große (Mar 29 2021 at 09:49):

Florian Große (Mar 29 2021 at 09:49):