#gripes multi-arg mapreduce
is just reduce(..., map(...
And so it allocates an intermediate array. I am trying to write a fast version of all(map(foo, array1, array2))
.
I can think to do
function _all(f, A, B)
for i = eachindex(A, B)
@inbounds a = A[i]
@inbounds b = A[i]
f(a, b) || return false
end
return true
end
It doesn't handle missing. I could make an iterator and use all(f, itr)
in Base. But I'd like a good primitive for this, that can be sped up with parallelism, for example.
Have you tried Transducers.jl for this? It’ll require zipping and then splatting the arrays, but it’ll give you parallelism.
I've been meaning to try that, but I haven't yet.
Let's suppose I'm doing reduce((a, b) -> map(foo, a, b), [array1, array2, ...])
So there's two " levels" where parallelism can come in: the map
, and the reduce
. Does Transducers.jl give me a way to not think about where to make operations parallel?
Does Transducers.jl give me a way to not think about where to make operations parallel?
Yes, absolutely
So here's what this'd look like using Transducers in the singlethreaded case:
tr_all(f, A, B) = foldl(&, zip(A, B) |> MapSplat(f) |> ReduceIf(x::Bool -> x == false))
Breaking this into parts, first look at
zip(A, B) |> MapSplat(f) |> ReduceIf(x::Bool -> x == false)
this is called an 'eduction', and you can think of it kinda like an iterator, except it's specifically designed for use with folds. Looking at it's parts,
zip(A, B) |> MapSplat(f)
is kinda like a lazy version of map(f, A, B)
. Basically, zip(A, B)
combines A
and B
into a single iterable container and then MapSplat(f)
applied to that iterator will take each element of the iterator ABi
and perform f(ABi...)
to it.
julia> let A = [1, 2, 3], B = [:a, :b, :c]
f(a, b)= (a=a, b=b)
collect(zip(A, B) |> MapSplat(f))
end
3-element Vector{NamedTuple{(:a, :b), Tuple{Int64, Symbol}}}:
(a = 1, b = :a)
(a = 2, b = :b)
(a = 3, b = :c)
The next ingredient is the |> ReduceIf(x::Bool -> x == false)
. This is important for early termination. It basically allows us to bail out of the fold
if we ever encounter an element that is false
(this is important for all
to be performant!).
Then finally we can look at the whole thing
foldl(&, zip(A, B) |> MapSplat(f) |> ReduceIf(x::Bool -> x == false))
This says "take the eduction zip(A, B) |> MapSplat(f) |> ReduceIf(x::Bool -> x == false)
and fold over it with &
. Transducers.jl has all sorts of clever ways to optimize these sorts of reductions better than naive iterators.
Looking at some benchmarks, it seems that there's really not a huge win here compared to allocating an intermediate array:
julia> let n = 500
A = rand(n) .+ 1
B = rand(n)
function f(x, y)
x > y
end
@btime all(map($f, $A, $B))
@btime tr_all($f, $A, $B)
end
612.653 ns (1 allocation: 624 bytes)
574.342 ns (0 allocations: 0 bytes)
true
and it seems there is still some overhead compared to the way all
works
julia> let n = 100_000
A = rand(n) .+ 1
B = rand(n)
function f(x, y)
exp(sin(x)) > exp(sin(y))
end
@btime all(map($f, $A, $B))
@btime tr_all($f, $A, $B)
end
2.497 ms (2 allocations: 97.77 KiB)
2.501 ms (0 allocations: 0 bytes)
true
Perhaps there is a better way I could have written the transducer.
Ahh! Thanks for taking the time. I was just trying to wrap my head around the documentation.
I would like to ensure that axes(A) == axes(B)
. I am guessing that zip(A, B)
doesn't do that for me.
Now lets look at multithreading. The nice thing about transducers is that they can be multithreaded pretty easily by just replacing foldl
with foldlxt
:
julia> tr_all_xt(f, A, B) = foldxt(&, zip(A, B) |> MapSplat(f) |> ReduceIf(x::Bool -> x == false))
tr_all_xt (generic function with 1 method)
Now let's see how it compares
julia> let n = 100_000
A = rand(n) .+ 1
B = rand(n)
function f(x, y)
x > y
end
@btime all(map($f, $A, $B))
@btime tr_all($f, $A, $B)
@btime tr_all_xt($f, $A, $B)
end
110.539 μs (2 allocations: 97.77 KiB)
114.949 μs (0 allocations: 0 bytes)
35.380 μs (64 allocations: 5.41 KiB)
true
julia> let n = 100_000
A = rand(n) .+ 1
B = rand(n)
function f(x, y)
exp(sin(x)) > exp(sin(y))
end
@btime all(map($f, $A, $B))
@btime tr_all($f, $A, $B)
@btime tr_all_xt($f, $A, $B)
end
2.490 ms (2 allocations: 97.77 KiB)
2.464 ms (0 allocations: 0 bytes)
638.910 μs (65 allocations: 5.44 KiB)
true
I'm trying to understand the intent of the name "Reduce If ". "Do reduce if predicate"? Doesn't seem to make sense to me.
Early return, in any case, makes sense. What about writing it just as ReduceIf(!)
?
Yes, you could just write ReduceIf(!)
. I figured it might be more clear what it was doing if I wrote x -> x == false
but I guess I was wrong
Gustavo Goretkin said:
I would like to ensure that
axes(A) == axes(B)
. I am guessing thatzip(A, B)
doesn't do that for me.
yeah that's right. You could just stick in an @assert axes(A) == axes(B)
in that case
Mason Protter [said](https://julialang.zulipchat.com/#narrow/stream/225540-gripes/topic/multi-arg.20.60mapreduce.60.20is.20just.20.60reduce(.2E.2E.2E.2C.20map(.2E.2E.2E.60/near/229728188):
Yes, you could just write
ReduceIf(!)
. I figured it might be more clear what it was doing if I wrotex -> x == false
but I guess I was wrong
I'm not sure you were wrong :)
What does it mean to just bail out of the fold? If we were expressing "any" we'd like to bail out as soon as we see true
, but by early-returning true
.
Yeah that's right. For any, we'd just write
tr_any_xt(f, A, B) = foldxt(|, zip(A, B) |> MapSplat(f) |> ReduceIf(x::Bool -> x == true))
Wait, so it returns early with whatever the value of x
is?
Yes
Oh, okay. Something about that seems surprising, but it's exactly what is needed for any / all.
julia> collect(1:10 |> ReduceIf(x -> x >= 5))
5-element Vector{Int64}:
1
2
3
4
5
Oh, I think I had misunderstood. This stops the "iteration".
ReduceIf
is indeed a bit of a confusing name.
it's not exactly an early return. But of course fold
will now return, because the iteration stopped.
e.g. StopIf(x -> false)
is an identity on ...eductions?
(Oops, I renamed ReduceIf
to StopIf
in my mind.)
I believe it's modelled off of writing
res = _initial_value
for x in iter
res = f(x, res)
if cond(res)
return res
end
end
Oh, for me it seems more like Iterators.takewhile
Yeah, ReduceIf(_ -> false)
would be a do-nothing operation
Mason Protter [said](https://julialang.zulipchat.com/#narrow/stream/225540-gripes/topic/multi-arg.20.60mapreduce.60.20is.20just.20.60reduce(.2E.2E.2E.2C.20map(.2E.2E.2E.60/near/229729171):
julia> collect(1:10 |> ReduceIf(x -> x >= 5)) 5-element Vector{Int64}: 1 2 3 4 5
since this returned not just 5
.
(Is there a bug in Zulip? Why is quote-and-reply broken? I saw you edited yours earlier to fix it, but I don't know what the syntax is)
That's because I used collect, not fold.
julia> foldl(right, 1:10 |> ReduceIf(x -> x >= 5))
5
Yeah, it seems there's some bug in the quote reply
But that's because you're using right
and not, e.g. +
, which would give 1 + 2 + 3 + 4 + 5, right?
julia> foldl(+, 1:10 |> ReduceIf(x -> x >= 5))
15
julia> sum(1:5)
15
In my notional machine, 1:10 |> ReduceIf(x -> x >= 5)
evaluates to equivalent to 1:5
, not to 5
. That's why it feels like ReduceIf
is like takewhile
yeah, it seems like it's basically iter |> ReduceIf(pred) ≈ takewhile(!pred, iter)
Not quite because there's different behaviour with the final element
Right, yeah, the predicate is negated. Maybe I'll be annoying and open an issue suggesting it be called StopIf
or StopWhen
. Reduce
seems like it is quite the wrong word in this context, since it approximately means "fold", and in fact is called reduce
and mapreduce
in Base
.
Yeah, I think @Takafumi Arakaki (tkf) would appreciate feedback on this
Mason Protter [said](https://julialang.zulipchat.com/#narrow/stream/225540-gripes/topic/multi-arg.20.60mapreduce.60.20is.20just.20.60reduce(.2E.2E.2E.2C.20map(.2E.2E.2E.60/near/229730907):
Not quite because there's different behaviour with the final element
Pesky boundary case. It's hard to encode that in a name...
By the way, do you mind if I transfer this thread to #helpdesk (published) or #helpdesk? I get that the base behaviour is gripe worthy, but I think there's information here that might be useful here to future readers.
That would be great. #helpdesk (published) is fine.
I wonder if [the link to the discussion in this thread](https://julialang.zulipchat.com/#narrow/stream/225540-gripes/topic/multi-arg.20.60mapreduce.60.20is.20just.20.60reduce(.2E.2E.2E.2C.20map(.2E.2E.2E.60) will still work after the discussion is moved.
Yeah, I'm not sure about the links either :confused: Just before I move it, would you mind editing the first line of your post to include what you wrote on the topic? I'm going to rename the topic to "nonallocating multiarg any
" if that's okay.
If I change it now, the first line might be a bit confusing
Changed!
This topic was moved here from #gripes > no subject by Mason Protter
For the record, the link broke.
:confused:
Oh! By the way, the link formatting wasn't working because my original subject contained unmatched parentheses. Hahaha! Perils of in-band signalling.
Gustavo Goretkin said:
Right, yeah, the predicate is negated. Maybe I'll be annoying and open an issue suggesting it be called
StopIf
orStopWhen
.Reduce
seems like it is quite the wrong word in this context, since it approximately means "fold", and in fact is calledreduce
andmapreduce
inBase
.
Thanks for the feedback :+1:. Yeah, I can see ReduceIf
might be confusing. A lot of terminologies are borrowed from Clojure while I was learning about it. So, there are some messes in the terminologies that I have to cleanup.
I wonder if we can communicate that the current element would be processed. Maybe StopAfter
?
Some other suggestions
FinishIf
ExitEarlyIf
StopIf
Do you think $(Stopish)If
is more appropriate? ATM I'm inclined to $(Stopish)After
or $(Stopish)At
. Maybe $(Stopish)AtIf
?
I don't really like After
, At
or AtIf
personally
I wonder if there are other ways to convey the difference to TakeWhile ∘ !
?
Another idea is to ditch TakeWhile
and have StopIf
and AbortIf
(which is TakeWhile ∘ !
), since "abort" sounds more immediate.
Then the transducer does one more step than is actually needed for the computation though right?
I think it's case-dependent that if you want to process the current item or not. It all depends on the function passed by the user. I think it makes sense to have both.
I also considered After
to convey the handling of the final element, and yet I also agree with @Mason Protter
re: abort, FWIW https://developers.google.com/style/word-list?hl=en#letter-a and https://developers.google.com/style/word-list?hl=en#letter-t
oh my, naming is hard
If you wanted to make a while loop whose body still ran once more after the condition is false, that is a do-while loop, right? As opposed to "while-do".
So maybe that's an argument for StopIf
. and then TakeWhile ∘ !
is like "IfStop"
Or maybe it's the opposite argument. Like you said, naming is hard :-/
(btw, just a quick glance, but the world list link looks pretty useful. thanks!)
the analogy to while-do makes me think that we can keep TakeWhile
and create ProcessWhile = ReduceIf ∘ !
wait, nah, it doesn't make sense :rolling_on_the_floor_laughing:
Out of curiosity, is the ¿eduction? (sorry if that's not the right term) analogous to TakeWhile ∘ !
useful in Transducers.jl?
I am trying to figure out if it is necessary to distinguish between the two ways, where the final element is or isn't "emitted".
Now I'm wondering about declarative vs imperative wording. "StopIf" is imperative. "MakeEnd" or "IsEnd" (interpreted as a constraint, not as a predicate) is declarative.
IsEnd and WasEnd for "This is the end" and "That was the end"... :-p
I think both are useful. For example, some reduce-based high-level APIs like findfirst
requires ReduceIf
and not implementable with TakeWhile
:
(there's the same code in Folds.jl, but would look more complicated)
Takafumi Arakaki (tkf) said:
I think both are useful. For example, some reduce-based high-level APIs like
findfirst
requiresReduceIf
and not implementable withTakeWhile
:(there's the same code in Folds.jl, but would look more complicated)
Right, but if all you had was ReduceIf
and you did not have TakeWhile
, would that be okay?
OTOH, I think it perfectly reasonable to want, e.g., "give me all first consecutive positive numbers"
xs |> TakeWhile(>(0)) |> collect
and I don't think you want to get non-positive numbers in the output in this case.
Ah, yeah, okay that answers my question. So it would be good to distinguish both.
yeah I think so
I think a similar example is eachline
in Base. it has keep
argument to control the inclusion of the last element (i..e, \n
)
oh yeah, I think I ran into that recently.
So, I am a big fan of using as few names as possible, so I'm tempted to suggest something like KeepIf{0}(predicate)
and KeepIf{1}(predicate)
where the number indicates how many elements to keep after the transition of the predicate.
(And then aliases could be introduced on top of that if it's too ugly for everyday usage. Or perhaps a default could be chosen when the type parameter is omitted)
I put it as a type parameter, but I suppose that one could rely on constant propagation. The point is that the two operations are quite, quite related, and there's just a "flag", like the keep
argument you mentioned, to choose behavior.
yeah, a flag like keep
might be the least confusing option
Last updated: Nov 22 2024 at 04:41 UTC