The Transducers.jl tcollect function is super useful, but it seems to always return a Vector regardless of the input type. Is there a way to preserve the input type with Transducers.jl or Base.Threads?
julia> using Transducers
julia> using CircularArrays
julia> tcollect((1,2,3))
3-element Vector{Int64}:
1
2
3
julia> tcollect([1,2,3])
3-element Vector{Int64}:
1
2
3
julia> tcollect(CircularVector([1,2,3]))
3-element Vector{Int64}:
1
2
3
The underlying issue is that we have a couple of map(fun, iter) calls that we would like to parallelize, but replacing map with tcollect doesn't work as expected.
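To make the mismatch concrete, here is a minimal sketch (using the Map transducer from Transducers.jl): map preserves the container type, while tcollect always materializes a Vector.
julia> map(x -> x^2, (1, 2, 3))            # map on a Tuple returns a Tuple
(1, 4, 9)

julia> tcollect(Map(x -> x^2), (1, 2, 3))  # tcollect always returns a Vector
3-element Vector{Int64}:
 1
 4
 9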
Yeah, this is something Transducers.jl doesn't really handle well. I think the right function here is tcopy, since that doesn't try to follow the behaviour of collect. But trying your examples with tcopy, none of them work either, which I guess we should classify as bugs that need to be fixed. The circular arrays one could potentially be handled by a package extension, but this is in general a hard problem.
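For the Base.Threads half of your question, here is a hedged sketch (tmap_similar is my own hypothetical helper, not part of any package): it uses similar to allocate a like-typed output array and fills it with threads. Whether the wrapper type survives depends entirely on how the array type implements similar.
julia> using Base.Threads

julia> function tmap_similar(f, xs::AbstractArray)
           # allocate an output of the same array type with the mapped eltype
           out = similar(xs, Base.promote_op(f, eltype(xs)))
           @threads for i in eachindex(xs, out)
               @inbounds out[i] = f(xs[i])
           end
           return out
       end
tmap_similar (generic function with 1 method)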
I would suggest, though, that if your problem is time-consuming enough to require parallelism, then maybe it's not so bad to have to convert afterwards, i.e.
julia> Tuple(tcollect((1,2,3)))
(1, 2, 3)
julia> CircularVector(tcollect(CircularVector([1,2,3])))
3-element CircularVector(::Vector{Int64}):
1
2
3
but I get that's also annoying
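If that conversion comes up a lot, one way to hide the round trip is a tiny helper (tmap_as is hypothetical, not part of Transducers.jl) where the caller supplies the constructor to convert back to:
julia> tmap_as(f, xs; to=Tuple) = to(tcollect(Map(f), xs))  # hypothetical helper
tmap_as (generic function with 1 method)

julia> tmap_as(x -> 2x, (1, 2, 3))
(2, 4, 6)

julia> tmap_as(x -> 2x, CircularVector([1, 2, 3]); to=CircularVector)
3-element CircularVector(::Vector{Int64}):
 2
 4
 6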
Yes, we ended up following this route.
It would be super nice if these things worked. Also, is it too big of a dream to imagine a future where we can just replace map by a pmap and choose the form of parallelism? Threads vs GPU threads vs processes...
I wish we had a general pmap that performed all sorts of parallelism with keyword options
> Also, is it too big of a dream to imagine a future where we can just replace map by a pmap and choose the form of parallelism? Threads vs GPU threads vs processes...
That's what Transducers.jl / Folds.jl already do: they provide sequential, threaded, and distributed backends, selected by passing an executor (sketch below). Taka worked on a GPU backend, but it's broken now; it would be great if we could revive it.
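For instance, a minimal sketch of the CPU backends, where Folds.map takes an optional executor argument that selects the backend (the executor types are exported by Transducers.jl):
julia> using Folds, Transducers

julia> Folds.map(x -> x^2, 1:3)                   # ThreadedEx() is the default
3-element Vector{Int64}:
 1
 4
 9

julia> Folds.map(x -> x^2, 1:3, SequentialEx())   # same call, sequential backend
3-element Vector{Int64}:
 1
 4
 9

julia> Folds.map(x -> x^2, 1:3, DistributedEx())  # same call, Distributed.jl processes
3-element Vector{Int64}:
 1
 4
 9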
Perhaps it is an issue of documentation, then. I never saw the GPU case, for instance.
https://github.com/JuliaFolds/FoldsCUDA.jl
It was always experimental, though. But yes, the idea is that Transducers can give us a very general way of doing parallelism that can be re-implemented for many different backends.
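As a concrete sketch of that idea: in Transducers.jl the same reduction can be retargeted just by swapping the fold entry point (foldxl sequential, foldxt threaded, foldxd distributed):
julia> using Transducers

julia> foldxl(+, Map(x -> x^2), 1:4)  # sequential left fold
30

julia> foldxt(+, Map(x -> x^2), 1:4)  # thread-based parallel fold
30

julia> foldxd(+, Map(x -> x^2), 1:4)  # Distributed.jl-based parallel fold
30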