Sorry for derailing this conversation, but why are you saying that one should never write "rogue" async operations? This contradicts the idea in https://julialang.org/blog/2019/07/multithreading/#how_to_use_it. The sort example there shows how to use @spawn without @sync, and I suppose it cannot be written with one.
(I detached the tangent starting from my comment https://julialang.zulipchat.com/#narrow/stream/274208-helpdesk-%28published%29/topic/how.20does.20.60Threads.2ECondition.60.20work.3F/near/229137749)
You don't see @sync in the blog post because the core devs are not buying the idea of so-called structured concurrency: https://en.wikipedia.org/wiki/Structured_concurrency
There is a discussion in this GitHub issue: https://github.com/JuliaLang/julia/issues/33248
It is perhaps fair to say I'm an extremist in the discussion of structured concurrency. But considering the "convergent evolution" in the space of concurrent programming (ref Python, Kotlin, Java, etc.), it's just too interesting a topic to ignore.
Ah, I've seen that thread before, but I thought that you were proposing structured concurrency as an additional optional layer, not as a complete switch from one paradigm to another.
It's easier to understand complex concepts through an example. Just to be more concrete (sorry if I am asking too much), how could the sort example be written in the structured concurrency paradigm? Assume that we have added the necessary keywords (they can be explained on the go).
Oh, I just read the first few sentences from the wiki and I have an additional question. What should we do if a task should never complete? I am thinking of a web server and the producer/consumer pattern, where we spawn a bunch of workers and they continuously process requests coming from the external world.
Or even, to make it simple, some cron-like task, which should run indefinitely and produce signals to the external world.
Or to other parts of my application.
how sort example could be written in the structured concurrency paradigm
The relevant part of the sort example would be
@sync begin
    @spawn psort!(v, lo, mid)
    psort!(v, mid+1, hi)
end
To be fair, @sync allocates a Channel, so you might see a performance penalty for it. But I'd say it's just a missed optimization opportunity rather than a fundamental limitation. Also, sorting is an example where we can (potentially) optimize the code even more by using a parallel programming model that is a strict subset of structured concurrency (ref: my PR for it, https://github.com/JuliaLang/julia/pull/39773).
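For concreteness, here is a minimal sketch of what a complete structured version could look like. The helper name `merge_halves!` and the `basesize` keyword are assumptions made for this illustration, and the merge step is an ordinary two-way merge written for the sketch, not code taken from the blog post.

```julia
using Base.Threads: @spawn

# Hypothetical helper: merge the sorted ranges v[lo:mid] and v[mid+1:hi] in place.
function merge_halves!(v, lo, mid, hi)
    temp = v[lo:mid]                 # copy of the left half
    i, k, j = 1, lo, mid + 1         # i: read in temp, j: read in right half, k: write
    while k < j <= hi
        if v[j] < temp[i]
            v[k] = v[j]
            j += 1
        else
            v[k] = temp[i]
            i += 1
        end
        k += 1
    end
    while k < j                      # right half exhausted: copy the rest of temp
        v[k] = temp[i]
        k += 1
        i += 1
    end
    return v
end

# `basesize` (an assumed tuning knob) is the range length below which we sort serially.
function psort!(v, lo::Int=1, hi::Int=length(v); basesize::Int=1_000)
    lo >= hi && return v
    if hi - lo < basesize
        sort!(view(v, lo:hi))
        return v
    end
    mid = (lo + hi) >>> 1
    # The spawned task cannot outlive this block: @sync waits for it
    # and rethrows its failure, if any.
    @sync begin
        @spawn psort!(v, lo, mid; basesize=basesize)
        psort!(v, mid + 1, hi; basesize=basesize)
    end
    return merge_halves!(v, lo, mid, hi)
end

v = rand(100_000)
psort!(v)
```

The only structural difference from an explicit `wait` on the spawned task is that the child is tied to the lexical block, so the merge cannot start before both halves are done.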
What should we do if task should never complete?
If you write "server-like" software, your server starts all the tasks that are required for it to function. So, there is no need to have unstructured concurrency that leaks tasks. When the server ends, no tasks need to exist afterward.
If you have a sub-component of your server that wants to leak tasks (e.g., a request handler of a web server initiating background work on a request), you can use the "nursery passing style." I'm not sure what the best resource is for explaining it in a language-agnostic manner. Here is the relevant part of Trio's (Python) documentation: https://trio.readthedocs.io/en/stable/reference-core.html#spawning-tasks-without-becoming-a-parent
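A rough sketch of the nursery passing idea using only Base primitives might look like the following. Everything here is hypothetical and made up for illustration: `spawn_into!`, `handle_request`, `serve`, and the use of a `Channel{Task}` as the "nursery". It is not how Trio or Julia implements anything, and it has no cancellation.

```julia
# Hypothetical helper: start `f` as a task and register it with `scope`,
# so that whoever owns `scope` is responsible for waiting on it.
function spawn_into!(scope::Channel{Task}, f)
    t = @async f()
    put!(scope, t)
    return t
end

# The handler returns right away; its background work is owned by the
# server-level scope that was passed in, not leaked.
function handle_request(scope, results, id)
    spawn_into!(scope, () -> begin
        sleep(0.05)            # stand-in for slow background work
        put!(results, id)
    end)
    return "accepted $id"
end

function serve(requests)
    scope = Channel{Task}(Inf)     # plays the role of the nursery
    results = Channel{Int}(Inf)
    replies = [handle_request(scope, results, id) for id in requests]
    # Shutdown: wait for every task that any handler handed to the scope.
    # `wait` rethrows a child's failure as a TaskFailedException.
    while isready(scope)
        wait(take!(scope))
    end
    close(results)
    return replies, sort!(collect(results))
end

replies, finished = serve(1:3)
```

The point of the design is that the handler does not become the parent of the background task, yet the task still has an owner that waits for it and sees its errors.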
This is very interesting reading, thank you. I am trying to understand how to apply it to real-world examples.
I am currently writing a reminder bot for Zulip, and I ran into exactly this sort of problem. Or maybe it's not a problem, or maybe it's not a problem yet. The code is here: https://github.com/Arkoniak/ZulipReminderBot.jl/blob/master/src/ZulipReminderBot.jl#L80-L133
The idea of the bot is simple: a user writes a message in chat, "remind me in 4 hours", and in 4 hours the bot sends a message. The solution (it is a prototype now, only a skeleton of all the things that need to be done) is quite simple. I have three workers:
cron, which runs indefinitely and just checks whether any of the reminders are due.
messenger, which does the actual delivery.
main, which processes incoming messages and sends them to cron.
As you can see from the code, I basically do it like this
function run()
    @async cron()
    @async messenger()
    main()
end
And they communicate through various channels.
Now the question is, how should it be structured according to the principles of structured concurrency, and what benefits do I get?
Just rewriting everything as
function run()
    @sync begin
        @async cron()
        @async messenger()
        main()
    end
end
doesn't look like much of an improvement.
Without @sync, you don't notice exceptions unless you have other mechanisms implemented manually and consciously. I believe it makes bug finding, debugging, and error recovery hard. In sequential programs, you have to try hard to ignore an error. With unstructured concurrency, programs are ignoring errors by default.
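A small sketch of the difference, with a deliberately failing stand-in for one of the workers (the function names and the error message are made up for the illustration):

```julia
# Unstructured: the failure stays inside the Task object. Nothing is
# printed or thrown unless somebody explicitly waits on or inspects `t`.
function run_unstructured()
    t = @async error("cron crashed")
    sleep(0.1)                 # give the task time to run and fail
    return t
end

# Structured: @sync waits for the child and rethrows its failure, so the
# caller cannot miss it.
function run_structured()
    @sync begin
        @async error("cron crashed")
    end
end

t = run_unstructured()         # returns normally despite the crash
println("unstructured task failed silently: ", istaskfailed(t))

try
    run_structured()
catch err
    println("structured version raised: ", typeof(err))
end
```

In the structured version the error arrives wrapped in a `CompositeException`, since `@sync` collects the failures of all enclosed tasks.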
Don't get me wrong, I am not arguing with the idea, quite the contrary! I want to explore these uncharted regions, and ReminderBot looks like a good toy where I would be glad to apply it.
The only problem is that it hasn't clicked in my head yet how to do it properly. I've read through https://vorpus.org/blog/notes-on-structured-concurrency-or-go-statement-considered-harmful/ and the Trio tutorial https://trio.readthedocs.io/en/stable/tutorial.html. I also looked at the PR you mentioned, but it's too much about IR, so it is beyond my understanding.
I do understand that these are more plans for the future, but if there is anything (even experimental) like Trio that can be used as a foundation for structured concurrency, I'll be glad to try it.
I have only some pieces of the puzzle in my head. First of all, it seems that @sync is not exactly the same as a nursery in Trio. I suppose that equivalent code in Julia should look like
function child1()
    println(" child1: started! sleeping now...")
    sleep(1)
    println(" child1: exiting!")
end
function child2()
    println(" child2: started! sleeping now...")
    sleep(1)
    println(" child2: exiting!")
end
function parent()
    println("parent: started")
    open_nursery() do nursery
        println("parent: spawning child")
        start_soon(nursery, child1)
        println("parent: spawning child")
        start_soon(nursery, child2)
    end
    println("parent: all done!")
end
Where open_nursery and start_soon have all the magic.
But I do not quite understand how these functions should look internally.
And maybe there is no need to copy Python so literally, and @async and @sync are already enough, only they should be used properly?
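As a starting point, a naive sketch of the two functions could look like this. `Nursery`, `open_nursery`, and `start_soon` are hypothetical names borrowed from Trio and are not part of Base or any package I know of. The sketch only covers "the parent waits for its children and sees their errors"; it has no cancellation, which is a large part of what Trio's nurseries provide.

```julia
# Hypothetical type: a nursery is just a mailbox of child tasks.
struct Nursery
    tasks::Channel{Task}
end

# Start `f(args...)` as a child of `nursery`.
function start_soon(nursery::Nursery, f, args...)
    t = @async f(args...)
    put!(nursery.tasks, t)
    return t
end

# Run `body(nursery)` and do not return until every child has finished,
# including children started by other children.
function open_nursery(body)
    nursery = Nursery(Channel{Task}(Inf))
    try
        return body(nursery)
    finally
        while isready(nursery.tasks)
            # Rethrows a failed child's error as a TaskFailedException.
            wait(take!(nursery.tasks))
        end
    end
end

events = String[]
open_nursery() do nursery
    start_soon(nursery, () -> (sleep(0.1); push!(events, "child1 done")))
    start_soon(nursery, () -> (sleep(0.05); push!(events, "child2 done")))
    push!(events, "parent body done")
end
push!(events, "nursery closed")
```

Because the nursery is an ordinary value, it can also be passed to a function that wants to start work without waiting for it, which is the part that plain `@sync`/`@async` hides in a macro-generated variable.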
(Ah, sorry if my comment sounded too harsh. I just wanted to write a concise reply.)
But I do not quite understand how these functions should look internally.
Yeah, I think it'd be tricky to implement them. I've been thinking of implementing Concurrent ML, which provides composable synchronizations (think Go's select, but extensible). Concurrent ML itself is not structured, but it looks like we can build robust cancellation like Trio's on top of it (https://doi.org/10.1145/996893.996849).
And maybe there is no need to copy python so literally and @async and @sync is already enough, only they should be used properly?
Yeah, I don't think we need to change the surface syntax. @sync/@async sound perfectly fine to me.
The thing that bothers me about @sync/@async is that it hides the context inside. Both the Trio author and Elizarov in his article use the idea of a context. At least a nursery looks like a context to me, in light of the idea that you can pass the nursery around to take care of a child that you do not have time to take care of yourself.
Maybe I am wrong, but when I looked into the code of @sync, it uses the constant sync_varname, which looks very much like a context to me.
const sync_varname = gensym(:sync)
macro sync(block)
    var = esc(sync_varname)
    quote
        let $var = Channel(Inf)
            v = $(esc(block))
            sync_end($var)
            v
        end
    end
end
So, I was thinking, maybe using something like
macro sync(nursery, block)
    var = esc(nursery)
    quote
        let $var = Channel(Inf)
            v = $(esc(block))
            sync_end($var)
            v
        end
    end
end
would give the same notion of context? Of course, one needs much more than just passing more than one channel around.
With your 2-arg @sync and maybe something like
macro getnursery()
    esc(sync_varname)
end
we can do
function f(nursery)
    @sync nursery begin
        @sync println("hello")
    end
end
function g()
    @sync begin
        f(@getnursery)
        f(@getnursery)
    end
end
Last updated: Nov 22 2024 at 04:41 UTC