Stream: helpdesk (published)

Topic: [Tangent] (Un)Structured concurrency


Kwaku Oskin (Mar 07 2021 at 06:17):

Sorry for derailing this conversation, but why are you saying that one should never write "rogue" async operations? This contradicts the idea in https://julialang.org/blog/2019/07/multithreading/#how_to_use_it where the sort example shows how to use @spawn without @sync, and I suppose it cannot be written with one.

Takafumi Arakaki (tkf) (Mar 07 2021 at 06:31):

(I detached the tangent starting from my comment https://julialang.zulipchat.com/#narrow/stream/274208-helpdesk-%28published%29/topic/how.20does.20.60Threads.2ECondition.60.20work.3F/near/229137749)

Takafumi Arakaki (tkf) (Mar 07 2021 at 06:32):

You don't see @sync in the blog post because the core devs are not buying the idea of so-called structured concurrency https://en.wikipedia.org/wiki/Structured_concurrency

Takafumi Arakaki (tkf) (Mar 07 2021 at 06:32):

There is a discussion in this GitHub issue: https://github.com/JuliaLang/julia/issues/33248

Takafumi Arakaki (tkf) (Mar 07 2021 at 06:37):

It is perhaps fair to say I'm an extremist in the discussion of structured concurrency. But considering the "convergent evolution" in the space of concurrent programming (see Python, Kotlin, Java, etc.), it's just too interesting a topic to ignore.

Kwaku Oskin (Mar 07 2021 at 06:37):

Ah, I've seen that thread before, but I thought you were proposing structured concurrency as an additional optional layer, not as a complete switch from one paradigm to another.

It's easier to understand complex concepts with an example. Just to be more concrete (sorry if I am asking too much), how could the sort example be written in the structured concurrency paradigm? Presuming that we added the necessary keywords (they can be explained on the go).

Kwaku Oskin (Mar 07 2021 at 06:40):

Oh, I just read the first few sentences of the wiki article and I have an additional question. What should we do if a task should never complete? I am thinking of a web server and the producer/consumer pattern, where we spawn a bunch of workers and they continuously process requests coming from the external world.

Kwaku Oskin (Mar 07 2021 at 06:42):

Or even, to make it simple, some cron-like task that should run indefinitely and send signals to the external world.

Kwaku Oskin (Mar 07 2021 at 06:43):

Or to other parts of my application.

Takafumi Arakaki (tkf) (Mar 07 2021 at 07:09):

> how could the sort example be written in the structured concurrency paradigm

The relevant part of the sort example would be

@sync begin
    @spawn psort!(v, lo, mid)
    psort!(v, mid+1, hi)
end

To be fair, @sync allocates a Channel so you might see a performance penalty for it. But I'd say it's just a missed optimization opportunity, rather than a fundamental limitation. Also, sorting is an example where we can (potentially) optimize the code even more, by using a parallel programming model that is a strict subset of structured concurrency (Ref: my PR for it https://github.com/JuliaLang/julia/pull/39773)
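For reference, here is a self-contained sketch of how the whole function might look with @sync. It is adapted from the blog post's psort!; the cutoff keyword is an addition for illustration (the blog post hard-codes the threshold), so treat it as an assumption rather than part of the original example.

```julia
using Base.Threads: @spawn

# Sort v[lo:hi] in place. `cutoff` is an illustrative keyword (not in the
# blog post): ranges shorter than this are sorted serially.
function psort!(v, lo::Int=1, hi::Int=length(v); cutoff::Int=100_000)
    lo >= hi && return v                      # 0 or 1 elements: nothing to do
    if hi - lo < cutoff
        sort!(view(v, lo:hi), alg=MergeSort)  # small range: serial sort
        return v
    end

    mid = (lo + hi) >>> 1

    # Both halves are sorted before control leaves the @sync block, so no
    # task outlives this call and a failure in the child is rethrown here.
    @sync begin
        @spawn psort!(v, lo, mid; cutoff=cutoff)
        psort!(v, mid + 1, hi; cutoff=cutoff)
    end

    temp = v[lo:mid]                          # copy of the sorted lower half
    i, k, j = 1, lo, mid + 1
    while k < j <= hi                         # merge until one side runs out
        if v[j] < temp[i]
            v[k] = v[j]; j += 1
        else
            v[k] = temp[i]; i += 1
        end
        k += 1
    end
    while k < j                               # copy what is left of temp
        v[k] = temp[i]
        k += 1; i += 1
    end
    return v
end
```

Compared with the blog post's version, the only structural change is that the explicit wait on the spawned task is replaced by the @sync block.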

> What should we do if a task should never complete?

If you write "server-like" software, your server starts all the tasks that are required for it to function. So, there is no need to have unstructured concurrency that leaks tasks. When the server ends, no tasks need to exist afterward.
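As a rough sketch of that shape, assuming requests arrive on a Channel and that closing the channel is the shutdown signal: the names serve, handle, and nworkers are made up for illustration.

```julia
# Hypothetical request handler, standing in for real work.
handle(req::Int) = 2 * req

# Run `nworkers` workers until `requests` is closed and drained.
function serve(requests::Channel{Int}; nworkers::Int=4)
    results = Channel{Int}(Inf)
    @sync for _ in 1:nworkers
        # Iterating a Channel blocks for the next item and stops once the
        # channel is closed and empty, so each worker ends on shutdown.
        @async for req in requests
            put!(results, handle(req))
        end
    end
    # Every worker has finished here; none is left running in the background.
    close(results)
    return collect(results)
end

requests = Channel{Int}(Inf)
foreach(i -> put!(requests, i), 1:10)
close(requests)                 # the "shutdown" signal
served = serve(requests)
```

The workers can run for as long as the server does; what matters is that serve does not return while any of them is still alive.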

If you have a sub-component of your server that wants to leak tasks (e.g., a request handler of a web server initiating background work for a request), you can use the "nursery passing style." I'm not sure what the best resource is for explaining it in a language-agnostic manner. Here is the relevant part of Trio's (Python) documentation: https://trio.readthedocs.io/en/stable/reference-core.html#spawning-tasks-without-becoming-a-parent

Kwaku Oskin (Mar 08 2021 at 06:55):

This is very interesting reading, thank you. I am trying to understand how to apply it to real-world examples.

I am currently writing a reminder bot for Zulip, and I've run into exactly this sort of problem. Or maybe it's not a problem, or maybe it's not a problem yet. The code is here: https://github.com/Arkoniak/ZulipReminderBot.jl/blob/master/src/ZulipReminderBot.jl#L80-L133

The idea of the bot is simple: a user writes a message in chat, "remind me in 4 hours", and in 4 hours the bot sends a message. The solution (it is a prototype now, only a skeleton of everything that needs to be done) is quite simple. I have three workers (cron, messenger, and main).

As you can see from the code, I basically do it like this

function run()
  @async cron()
  @async messenger()
  main()
end

And they communicate through various channels.

Now the question is, how should it be structured according to the principles of structured concurrency, and what benefits do I get here?

Just rewriting everything as

function run()
  @sync begin
    @async cron()
    @async messenger()
    main()
  end
end

doesn't look like much of an improvement.

Takafumi Arakaki (tkf) (Mar 10 2021 at 00:07):

Without @sync, you don't notice exceptions unless you have other mechanisms implemented manually and consciously. I believe it makes bug finding, debugging, and error recovery hard. In sequential programs, you have to try hard to ignore an error. With unstructured concurrency, programs are ignoring errors by default.
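A small sketch of the difference, using only @async and @sync from Base. The function names are invented for the demonstration.

```julia
# Unstructured: the child's failure is stored in its Task and never surfaces.
function run_unstructured()
    @async error("worker crashed")
    sleep(0.1)                  # give the child time to fail
    return :finished
end

# Structured: @sync waits for the child and rethrows its failure.
function run_structured()
    @sync begin
        @async error("worker crashed")
    end
    return :finished
end

run_unstructured()              # returns normally; the error goes unnoticed

try
    run_structured()
catch err
    # On recent Julia versions this is a CompositeException wrapping the
    # child's TaskFailedException.
    println("caught: ", typeof(err))
end
```

One caveat: @sync rethrows only after the body of the block has finished, so if the body is itself a long-running loop, a failed child is reported when that loop returns rather than immediately.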

Kwaku Oskin (Mar 10 2021 at 12:00):

Don't get me wrong, I am not arguing with the idea, quite the contrary! I want to explore these uncharted regions and ReminderBot looks like a good toy, where I would be glad to apply it.

The only problem is that how to do it properly hasn't clicked in my head yet. I've read through https://vorpus.org/blog/notes-on-structured-concurrency-or-go-statement-considered-harmful/ and the Trio tutorial https://trio.readthedocs.io/en/stable/tutorial.html. I also looked at the PR you mentioned, but it's too much about IR, so it is beyond my understanding.

I do understand that these are more plans for the future, but if there is anything (even experimental) like Trio that can be used as a foundation for structured concurrency, I'll be glad to try it.

I have only some pieces of the puzzle in my head. First of all, it seems that @sync is not exactly the same as a nursery in Trio. I suppose that the equivalent code in Julia should look like

function child1()
   println("  child1: started! sleeping now...")
   sleep(1)
   println("  child1: exiting!")
end

function child2()
   println("  child2: started! sleeping now...")
   sleep(1)
   println("  child2: exiting!")
end

function parent()
  println("parent: started")
  open_nursery() do nursery
     println("parent: spawning child")
     start_soon(nursery, child1)

     println("parent: spawning child")
     start_soon(nursery, child2)
  end
  println("parent: all done!")
end

Where open_nursery and start_soon have all the magic.
But I do not quite understand how these functions should look internally.
And maybe there is no need to copy Python so literally, and @async and @sync are already enough, only they should be used properly?
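One very rough way these two functions could look, as a sketch only: the Nursery type is hypothetical, there is no cancellation (which is a large part of what Trio's nurseries provide), and the task list is not thread-safe, so this assumes children are started with @async rather than Threads.@spawn.

```julia
# Hypothetical nursery: just a list of the children started through it.
struct Nursery
    tasks::Vector{Task}
end

# Start f(args...) as a child of `nursery` and return immediately.
function start_soon(nursery::Nursery, f, args...)
    task = @async f(args...)
    push!(nursery.tasks, task)
    return task
end

# Run body(nursery), then wait for every child, including children that
# were added by other children while we were waiting.
function open_nursery(body)
    nursery = Nursery(Task[])
    errors = Any[]
    result = try
        body(nursery)
    catch err
        push!(errors, err)
        nothing
    end
    while !isempty(nursery.tasks)
        try
            wait(popfirst!(nursery.tasks))
        catch err
            push!(errors, err)  # keep waiting for the other children
        end
    end
    isempty(errors) || throw(CompositeException(errors))
    return result
end
```

With these definitions the parent() function above runs as written, and "parent: all done!" is printed only after both children have exited. Because the nursery is an ordinary value, it can also be handed to another function that calls start_soon on it and returns right away.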

Takafumi Arakaki (tkf) (Mar 10 2021 at 18:50):

(Ah, sorry if my comment sounded too harsh. I just wanted to write a concise reply.)

Takafumi Arakaki (tkf) (Mar 10 2021 at 18:52):

> But I do not quite understand how these functions should look internally.

Yeah, I think it'd be tricky to implement them. I've been thinking of implementing Concurrent ML, which provides composable synchronization (think Go's select, but extensible). Concurrent ML itself is not structured, but it looks like we can build robust cancellation like Trio's on top of it (https://doi.org/10.1145/996893.996849).

> And maybe there is no need to copy Python so literally, and @async and @sync are already enough, only they should be used properly?

Yeah, I don't think we need to change the surface syntax. @sync/@async sound perfectly fine to me.

Kwaku Oskin (Mar 10 2021 at 19:02):

The thing that bothers me about @sync/@async is that it hides the context inside. Both the Trio author and Elizarov in his article use the idea of a context. At least, a nursery looks like a context to me, given that you can pass a nursery around to take care of a child that you do not have time to take care of yourself.

Maybe I am wrong, but when I looked into the code of @sync, it uses the constant sync_varname, which looks very much like a context to me.

const sync_varname = gensym(:sync)

macro sync(block)
    var = esc(sync_varname)
    quote
        let $var = Channel(Inf)
            v = $(esc(block))
            sync_end($var)
            v
        end
    end
end

So, I was thinking, maybe using something like

macro sync(nursery, block)
    var = esc(nursery)
    quote
        let $var = Channel(Inf)
            v = $(esc(block))
            sync_end($var)
            v
        end
    end
end

gives the same notion of context? Of course, one needs much more than just passing channels around.

Takafumi Arakaki (tkf) (Mar 10 2021 at 19:24):

With your 2-arg @sync and maybe something like

macro getnursery()
    esc(sync_varname)
end

we can do

function f(nursery)
    @sync nursery begin
        @sync println("hello")
    end
end

function g()
    @sync begin
        f(@getnursery)
        f(@getnursery)
    end
end
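The snippets above rely on Base internals (sync_varname, sync_end). For experimenting with the same idea outside Base, here is a self-contained sketch that does not touch those internals. The names @nursery, spawn_into, and wait_all are hypothetical, and error handling is kept minimal: the first failed child aborts the wait.

```julia
# Wait for every task registered in `nursery`, including late additions.
function wait_all(nursery::Channel{Task})
    while isready(nursery)
        wait(take!(nursery))
    end
end

# Like the 2-arg @sync above, but the channel is bound to a name chosen by
# the caller, so it can be passed to other functions as an ordinary value.
macro nursery(name, block)
    var = esc(name)
    quote
        let $var = Channel{Task}(Inf)
            v = $(esc(block))
            wait_all($var)
            v
        end
    end
end

# Start f() as a child of `nursery` rather than of the calling function.
function spawn_into(f, nursery::Channel{Task})
    task = @async f()
    put!(nursery, task)
    return task
end

function f(nursery, out)
    spawn_into(nursery) do
        sleep(0.05)
        push!(out, "hello")
    end
    return nothing              # returns before the child has finished
end

function g()
    out = String[]
    @nursery n begin
        f(n, out)
        f(n, out)
    end
    return out                  # both children are done here
end
```

Here f returns immediately, but the tasks it starts belong to the nursery opened in g, so g does not return until they have finished.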

Last updated: Nov 22 2024 at 04:41 UTC