Stream: helpdesk (published)

Topic: How to preserve the collection type with `tcollect`?


view this post on Zulip Júlio Hoffimann (Feb 08 2024 at 21:20):

The Transducers.jl tcollect function is super useful, but it seems to always return a Vector regardless of the input type.

Is there a way to preserve the input type with Transducers.jl or Base.Threads?

julia> using Transducers

julia> using CircularArrays

julia> tcollect((1,2,3))
3-element Vector{Int64}:
 1
 2
 3

julia> tcollect([1,2,3])
3-element Vector{Int64}:
 1
 2
 3

julia> tcollect(CircularVector([1,2,3]))
3-element Vector{Int64}:
 1
 2
 3

view this post on Zulip Júlio Hoffimann (Feb 08 2024 at 21:22):

The underlying issue is that we have a couple of map(fun, iter) calls that we would like to parallelize, but replacing map with tcollect doesn't work as expected.
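
For concreteness, a minimal sketch of that pattern (fun here is a hypothetical stand-in for the real per-element work; it assumes CircularArrays preserves the wrapper under map via its similar method):

using Transducers
using CircularArrays

fun(x) = x^2                    # stand-in for the real per-element work

cv = CircularVector([1, 2, 3])

map(fun, cv)                    # returns a CircularVector (wrapper preserved)
tcollect(Map(fun), cv)          # returns a plain Vector (wrapper lost)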

view this post on Zulip Mason Protter (Feb 08 2024 at 22:26):

Yeah, this is something Transducers.jl doesn't really handle well. I think the right function here is tcopy since that doesn't try to follow the behaviour of collect

view this post on Zulip Mason Protter (Feb 08 2024 at 22:27):

but I tried your examples with tcopy, and none of them work either

view this post on Zulip Mason Protter (Feb 08 2024 at 22:27):

which I guess we should classify as bugs that need to be fixed.
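
For reference, tcopy is the variant where you ask for the output container type explicitly; a minimal sketch, assuming the three-argument form tcopy(xf, T, input) mirrors the documented copy(xf, T, input) example from the Transducers.jl docs:

using Transducers

# Request a Dict as the output container; threaded variant of the
# documented copy(Map(x -> x => x^2), Dict, ...) example.
tcopy(Map(x -> x => x^2), Dict, 1:3)    # expected: Dict(1 => 1, 2 => 4, 3 => 9)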

view this post on Zulip Mason Protter (Feb 08 2024 at 22:27):

The circular arrays one could potentially be a package extension

view this post on Zulip Mason Protter (Feb 08 2024 at 22:28):

but this is in general a hard problem

view this post on Zulip Mason Protter (Feb 08 2024 at 22:29):

I would suggest, though, that if your problem is time-consuming enough to require parallelism, then maybe it's not so bad to have to do a conversion afterwards, i.e.

julia> Tuple(tcollect((1,2,3)))
(1, 2, 3)

julia> CircularVector(tcollect(CircularVector([1,2,3])))
3-element CircularVector(::Vector{Int64}):
 1
 2
 3

view this post on Zulip Mason Protter (Feb 08 2024 at 22:29):

but I get that's also annoying
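
One way to cut the annoyance down is to wrap the convert-after pattern in a small helper (a hypothetical sketch; tmap_as is not part of any package, and the round trip only works for containers that can be rebuilt from a Vector):

using Transducers
using CircularArrays

# Hypothetical helper: do the threaded collect, then rebuild the desired
# container type from the resulting Vector.
tmap_as(C, f, xs) = C(tcollect(Map(f), xs))

tmap_as(Tuple, x -> x^2, (1, 2, 3))                           # (1, 4, 9)
tmap_as(CircularVector, x -> x^2, CircularVector([1, 2, 3]))  # CircularVector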

view this post on Zulip Júlio Hoffimann (Feb 08 2024 at 22:33):

Yes, we ended up following this route.

view this post on Zulip Júlio Hoffimann (Feb 08 2024 at 22:35):

It would be super nice if these things worked. Also, is it too big of a dream to imagine a future where we can just replace map by a pmap and choose the form of parallelism? Threads vs GPU threads vs processes...

view this post on Zulip Júlio Hoffimann (Feb 08 2024 at 22:35):

I wish we had a general pmap that performed all sorts of parallelism with keyword options

view this post on Zulip Mason Protter (Feb 08 2024 at 22:39):

Also, is it too big of a dream to imagine a future where we can just replace map by a pmap and choose the form of parallelism? Threads vs GPU threads vs processes...

That's what Transducers.jl / Folds.jl already do. They provide sequential, threaded, and distributed backends.
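
Concretely, with Folds.jl you pick the backend by passing an executor as the last argument (a minimal sketch; f is a placeholder for the real work, and the executor types come from Transducers.jl):

using Folds
using Transducers   # provides SequentialEx, ThreadedEx, DistributedEx

f(x) = x^2          # placeholder for the real per-element work

Folds.map(f, 1:10)                    # default executor (typically threaded)
Folds.map(f, 1:10, SequentialEx())    # sequential fallback
Folds.map(f, 1:10, ThreadedEx())      # thread-based parallelism
Folds.map(f, 1:10, DistributedEx())   # Distributed.jl worker processes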

view this post on Zulip Mason Protter (Feb 08 2024 at 22:39):

Taka worked on a GPU backend but it's broken now

view this post on Zulip Mason Protter (Feb 08 2024 at 22:39):

would be great if we could revive it

view this post on Zulip Júlio Hoffimann (Feb 08 2024 at 22:40):

Perhaps it is a documentation issue, then. I never saw the GPU case mentioned, for instance

view this post on Zulip Mason Protter (Feb 08 2024 at 22:41):

https://github.com/JuliaFolds/FoldsCUDA.jl

view this post on Zulip Mason Protter (Feb 08 2024 at 22:41):

It was always experimental though

view this post on Zulip Mason Protter (Feb 08 2024 at 22:42):

but yes, the idea is that Transducers can give us a very general way of doing parallelism that can be re-implemented for many different backends
