Stream: helpdesk (published)

Topic: nonallocating multiarg `any`


view this post on Zulip Gustavo Goretkin (Mar 10 2021 at 05:51):

#gripes multi-arg mapreduce is just reduce(..., map(...

And so it allocates an intermediate array. I am trying to write a fast version of all(map(foo, array1, array2)).

view this post on Zulip Gustavo Goretkin (Mar 10 2021 at 06:22):

I can think to do

function _all(f, A, B)
  for i = eachindex(A, B)
    @inbounds a = A[i]
    @inbounds b = A[i]
    f(a, b) || return false
  end
  return true
end

It doesn't handle missing. I could make an iterator and use all(f, itr) in Base. But I'd like a good primitive for this, that can be sped up with parallelism, for example.

view this post on Zulip Mason Protter (Mar 10 2021 at 06:43):

Have you tried Transducers.jl for this? It’ll require zipping and then splatting the arrays, but it’ll give you parallelism.

view this post on Zulip Gustavo Goretkin (Mar 10 2021 at 16:30):

I've been meaning to try that, but I haven't yet.

view this post on Zulip Gustavo Goretkin (Mar 10 2021 at 16:48):

Let's suppose I'm doing reduce((a, b) -> map(foo, a, b), [array1, array2, ...])

So there's two " levels" where parallelism can come in: the map, and the reduce. Does Transducers.jl give me a way to not think about where to make operations parallel?

view this post on Zulip Mason Protter (Mar 10 2021 at 17:08):

Does Transducers.jl give me a way to not think about where to make operations parallel?

Yes, absolutely

view this post on Zulip Mason Protter (Mar 10 2021 at 19:23):

So here's what this'd look like using Transducers in the singlethreaded case:

tr_all(f, A, B) = foldl(&, zip(A, B) |> MapSplat(f) |> ReduceIf(x::Bool -> x == false))

Breaking this into parts, first look at

zip(A, B) |> MapSplat(f) |> ReduceIf(x::Bool -> x == false)

this is called an 'eduction', and you can think of it kinda like an iterator, except it's specifically designed for use with folds. Looking at it's parts,

zip(A, B) |> MapSplat(f)

is kinda like a lazy version of map(f, A, B). Basically, zip(A, B) combines A and B into a single iterable container and then MapSplat(f) applied to that iterator will take each element of the iterator ABi and perform f(ABi...) to it.

julia> let A =  [1, 2, 3], B = [:a, :b, :c]
           f(a, b)= (a=a, b=b)
           collect(zip(A, B) |> MapSplat(f))
       end
3-element Vector{NamedTuple{(:a, :b), Tuple{Int64, Symbol}}}:
 (a = 1, b = :a)
 (a = 2, b = :b)
 (a = 3, b = :c)

The next ingredient is the |> ReduceIf(x::Bool -> x == false). This is important for early termination. It basically allows us to bail out of the fold if we ever encounter an element that is false (this is important for all to be performant!).

Then finally we can look at the whole thing

foldl(&, zip(A, B) |> MapSplat(f) |> ReduceIf(x::Bool -> x == false))

This says "take the eduction zip(A, B) |> MapSplat(f) |> ReduceIf(x::Bool -> x == false) and fold over it with &. Transducers.jl has all sorts of clever ways to optimize these sorts of reductions better than naive iterators.

Looking at some benchmarks, it seems that there's really not a huge win here compared to allocating an intermediate array:

julia> let n = 500
           A = rand(n) .+ 1
           B = rand(n)
           function f(x, y)
               x > y
           end
           @btime all(map($f, $A, $B))
           @btime tr_all($f, $A, $B)
       end
  612.653 ns (1 allocation: 624 bytes)
  574.342 ns (0 allocations: 0 bytes)
true

and it seems there is still some overhead compared to the way all works

julia> let n = 100_000
           A = rand(n) .+ 1
           B = rand(n)
           function f(x, y)
               exp(sin(x)) > exp(sin(y))
           end
           @btime all(map($f, $A, $B))
           @btime tr_all($f, $A, $B)
       end
  2.497 ms (2 allocations: 97.77 KiB)
  2.501 ms (0 allocations: 0 bytes)
true

Perhaps there is a better way I could have written the transducer.

view this post on Zulip Gustavo Goretkin (Mar 10 2021 at 19:27):

Ahh! Thanks for taking the time. I was just trying to wrap my head around the documentation.

view this post on Zulip Gustavo Goretkin (Mar 10 2021 at 19:29):

I would like to ensure that axes(A) == axes(B). I am guessing that zip(A, B) doesn't do that for me.

view this post on Zulip Mason Protter (Mar 10 2021 at 19:30):

Now lets look at multithreading. The nice thing about transducers is that they can be multithreaded pretty easily by just replacing foldl with foldlxt:

julia> tr_all_xt(f, A, B) = foldxt(&, zip(A, B) |> MapSplat(f) |> ReduceIf(x::Bool -> x == false))
tr_all_xt (generic function with 1 method)

Now let's see how it compares

julia> let n = 100_000
           A = rand(n) .+ 1
           B = rand(n)
           function f(x, y)
               x > y
           end
           @btime all(map($f, $A, $B))
           @btime tr_all($f, $A, $B)
           @btime tr_all_xt($f, $A, $B)
       end
  110.539 μs (2 allocations: 97.77 KiB)
  114.949 μs (0 allocations: 0 bytes)
  35.380 μs (64 allocations: 5.41 KiB)
true

julia> let n = 100_000
           A = rand(n) .+ 1
           B = rand(n)
           function f(x, y)
               exp(sin(x)) > exp(sin(y))
           end
           @btime all(map($f, $A, $B))
           @btime tr_all($f, $A, $B)
           @btime tr_all_xt($f, $A, $B)
       end
  2.490 ms (2 allocations: 97.77 KiB)
  2.464 ms (0 allocations: 0 bytes)
  638.910 μs (65 allocations: 5.44 KiB)
true

view this post on Zulip Gustavo Goretkin (Mar 10 2021 at 19:31):

I'm trying to understand the intent of the name "Reduce If ". "Do reduce if predicate"? Doesn't seem to make sense to me.
Early return, in any case, makes sense. What about writing it just as ReduceIf(!) ?

view this post on Zulip Mason Protter (Mar 10 2021 at 19:32):

Yes, you could just write ReduceIf(!). I figured it might be more clear what it was doing if I wrote x -> x == false but I guess I was wrong

view this post on Zulip Mason Protter (Mar 10 2021 at 19:32):

Gustavo Goretkin said:

I would like to ensure that axes(A) == axes(B). I am guessing that zip(A, B) doesn't do that for me.

yeah that's right. You could just stick in an @assert axes(A) == axes(B) in that case

view this post on Zulip Gustavo Goretkin (Mar 10 2021 at 19:34):

Mason Protter [said](https://julialang.zulipchat.com/#narrow/stream/225540-gripes/topic/multi-arg.20.60mapreduce.60.20is.20just.20.60reduce(.2E.2E.2E.2C.20map(.2E.2E.2E.60/near/229728188):

Yes, you could just write ReduceIf(!). I figured it might be more clear what it was doing if I wrote x -> x == false but I guess I was wrong

I'm not sure you were wrong :)

What does it mean to just bail out of the fold? If we were expressing "any" we'd like to bail out as soon as we see true, but by early-returning true.

view this post on Zulip Mason Protter (Mar 10 2021 at 19:35):

Yeah that's right. For any, we'd just write

tr_any_xt(f, A, B) = foldxt(|, zip(A, B) |> MapSplat(f) |> ReduceIf(x::Bool -> x == true))

view this post on Zulip Gustavo Goretkin (Mar 10 2021 at 19:37):

Wait, so it returns early with whatever the value of x is?

view this post on Zulip Mason Protter (Mar 10 2021 at 19:37):

Yes

view this post on Zulip Gustavo Goretkin (Mar 10 2021 at 19:38):

Oh, okay. Something about that seems surprising, but it's exactly what is needed for any / all.

view this post on Zulip Mason Protter (Mar 10 2021 at 19:38):

julia> collect(1:10 |> ReduceIf(x -> x >= 5))
5-element Vector{Int64}:
 1
 2
 3
 4
 5

view this post on Zulip Gustavo Goretkin (Mar 10 2021 at 19:39):

Oh, I think I had misunderstood. This stops the "iteration".

view this post on Zulip Mason Protter (Mar 10 2021 at 19:39):

ReduceIf is indeed a bit of a confusing name.

view this post on Zulip Gustavo Goretkin (Mar 10 2021 at 19:39):

it's not exactly an early return. But of course fold will now return, because the iteration stopped.

view this post on Zulip Gustavo Goretkin (Mar 10 2021 at 19:41):

e.g. StopIf(x -> false) is an identity on ...eductions?

view this post on Zulip Gustavo Goretkin (Mar 10 2021 at 19:41):

(Oops, I renamed ReduceIf to StopIf in my mind.)

view this post on Zulip Mason Protter (Mar 10 2021 at 19:42):

I believe it's modelled off of writing

res = _initial_value
for x in iter
    res = f(x, res)
    if cond(res)
      return res
   end
end

view this post on Zulip Gustavo Goretkin (Mar 10 2021 at 19:43):

Oh, for me it seems more like Iterators.takewhile

view this post on Zulip Mason Protter (Mar 10 2021 at 19:43):

Yeah, ReduceIf(_ -> false) would be a do-nothing operation

view this post on Zulip Gustavo Goretkin (Mar 10 2021 at 19:43):

Mason Protter [said](https://julialang.zulipchat.com/#narrow/stream/225540-gripes/topic/multi-arg.20.60mapreduce.60.20is.20just.20.60reduce(.2E.2E.2E.2C.20map(.2E.2E.2E.60/near/229729171):

julia> collect(1:10 |> ReduceIf(x -> x >= 5))
5-element Vector{Int64}:
 1
 2
 3
 4
 5

since this returned not just 5.

view this post on Zulip Gustavo Goretkin (Mar 10 2021 at 19:44):

(Is there a bug in Zulip? Why is quote-and-reply broken? I saw you edited yours earlier to fix it, but I don't know what the syntax is)

view this post on Zulip Mason Protter (Mar 10 2021 at 19:44):

That's because I used collect, not fold.

julia> foldl(right, 1:10 |> ReduceIf(x -> x >= 5))
5

view this post on Zulip Mason Protter (Mar 10 2021 at 19:45):

Yeah, it seems there's some bug in the quote reply

view this post on Zulip Gustavo Goretkin (Mar 10 2021 at 19:45):

But that's because you're using right and not, e.g. +, which would give 1 + 2 + 3 + 4 + 5, right?

view this post on Zulip Mason Protter (Mar 10 2021 at 19:46):

julia> foldl(+, 1:10 |> ReduceIf(x -> x >= 5))
15

julia> sum(1:5)
15

view this post on Zulip Gustavo Goretkin (Mar 10 2021 at 19:46):

In my notional machine, 1:10 |> ReduceIf(x -> x >= 5) evaluates to equivalent to 1:5, not to 5. That's why it feels like ReduceIf is like takewhile

view this post on Zulip Mason Protter (Mar 10 2021 at 19:48):

yeah, it seems like it's basically iter |> ReduceIf(pred) ≈ takewhile(!pred, iter)

view this post on Zulip Mason Protter (Mar 10 2021 at 19:49):

Not quite because there's different behaviour with the final element

view this post on Zulip Gustavo Goretkin (Mar 10 2021 at 19:50):

Right, yeah, the predicate is negated. Maybe I'll be annoying and open an issue suggesting it be called StopIf or StopWhen. Reduce seems like it is quite the wrong word in this context, since it approximately means "fold", and in fact is called reduce and mapreduce in Base.

view this post on Zulip Mason Protter (Mar 10 2021 at 19:51):

Yeah, I think @Takafumi Arakaki (tkf) would appreciate feedback on this

view this post on Zulip Gustavo Goretkin (Mar 10 2021 at 19:51):

Mason Protter [said](https://julialang.zulipchat.com/#narrow/stream/225540-gripes/topic/multi-arg.20.60mapreduce.60.20is.20just.20.60reduce(.2E.2E.2E.2C.20map(.2E.2E.2E.60/near/229730907):

Not quite because there's different behaviour with the final element

Pesky boundary case. It's hard to encode that in a name...

view this post on Zulip Mason Protter (Mar 10 2021 at 19:54):

By the way, do you mind if I transfer this thread to #helpdesk (published) or #helpdesk? I get that the base behaviour is gripe worthy, but I think there's information here that might be useful here to future readers.

view this post on Zulip Gustavo Goretkin (Mar 10 2021 at 19:57):

That would be great. #helpdesk (published) is fine.

I wonder if [the link to the discussion in this thread](https://julialang.zulipchat.com/#narrow/stream/225540-gripes/topic/multi-arg.20.60mapreduce.60.20is.20just.20.60reduce(.2E.2E.2E.2C.20map(.2E.2E.2E.60) will still work after the discussion is moved.

view this post on Zulip Mason Protter (Mar 10 2021 at 20:01):

Yeah, I'm not sure about the links either :confused: Just before I move it, would you mind editing the first line of your post to include what you wrote on the topic? I'm going to rename the topic to "nonallocating multiarg any" if that's okay.

If I change it now, the first line might be a bit confusing

view this post on Zulip Gustavo Goretkin (Mar 10 2021 at 20:06):

Changed!

view this post on Zulip Notification Bot (Mar 10 2021 at 20:06):

This topic was moved here from #gripes > no subject by Mason Protter

view this post on Zulip Gustavo Goretkin (Mar 10 2021 at 20:08):

For the record, the link broke.

view this post on Zulip Mason Protter (Mar 10 2021 at 20:09):

:confused:

view this post on Zulip Gustavo Goretkin (Mar 10 2021 at 20:10):

Oh! By the way, the link formatting wasn't working because my original subject contained unmatched parentheses. Hahaha! Perils of in-band signalling.

view this post on Zulip Takafumi Arakaki (tkf) (Mar 10 2021 at 21:08):

Gustavo Goretkin said:

Right, yeah, the predicate is negated. Maybe I'll be annoying and open an issue suggesting it be called StopIf or StopWhen. Reduce seems like it is quite the wrong word in this context, since it approximately means "fold", and in fact is called reduce and mapreduce in Base.

Thanks for the feedback :+1:. Yeah, I can see ReduceIf might be confusing. A lot of terminologies are borrowed from Clojure while I was learning about it. So, there are some messes in the terminologies that I have to cleanup.

view this post on Zulip Takafumi Arakaki (tkf) (Mar 10 2021 at 21:08):

I wonder if we can communicate that the current element would be processed. Maybe StopAfter?

view this post on Zulip Mason Protter (Mar 10 2021 at 21:14):

Some other suggestions

view this post on Zulip Takafumi Arakaki (tkf) (Mar 10 2021 at 21:20):

Do you think $(Stopish)If is more appropriate? ATM I'm inclined to $(Stopish)After or $(Stopish)At. Maybe $(Stopish)AtIf?

view this post on Zulip Mason Protter (Mar 10 2021 at 21:22):

I don't really like After, At or AtIf personally

view this post on Zulip Takafumi Arakaki (tkf) (Mar 10 2021 at 21:25):

I wonder if there are other ways to convey the difference to TakeWhile ∘ !?

view this post on Zulip Takafumi Arakaki (tkf) (Mar 10 2021 at 21:26):

Another idea is to ditch TakeWhile and have StopIf and AbortIf (which is TakeWhile ∘ !), since "abort" sounds more immediate.

view this post on Zulip Mason Protter (Mar 10 2021 at 21:42):

Then the transducer does one more step than is actually needed for the computation though right?

view this post on Zulip Takafumi Arakaki (tkf) (Mar 10 2021 at 21:46):

I think it's case-dependent that if you want to process the current item or not. It all depends on the function passed by the user. I think it makes sense to have both.

view this post on Zulip Gustavo Goretkin (Mar 10 2021 at 21:49):

I also considered After to convey the handling of the final element, and yet I also agree with @Mason Protter

view this post on Zulip Gustavo Goretkin (Mar 10 2021 at 21:51):

re: abort, FWIW https://developers.google.com/style/word-list?hl=en#letter-a and https://developers.google.com/style/word-list?hl=en#letter-t

view this post on Zulip Takafumi Arakaki (tkf) (Mar 10 2021 at 21:51):

oh my, naming is hard

view this post on Zulip Gustavo Goretkin (Mar 10 2021 at 21:54):

If you wanted to make a while loop whose body still ran once more after the condition is false, that is a do-while loop, right? As opposed to "while-do".

So maybe that's an argument for StopIf. and then TakeWhile ∘ ! is like "IfStop"

view this post on Zulip Gustavo Goretkin (Mar 10 2021 at 21:55):

Or maybe it's the opposite argument. Like you said, naming is hard :-/

view this post on Zulip Takafumi Arakaki (tkf) (Mar 10 2021 at 21:56):

(btw, just a quick glance, but the world list link looks pretty useful. thanks!)

view this post on Zulip Takafumi Arakaki (tkf) (Mar 10 2021 at 21:58):

the analogy to while-do makes me think that we can keep TakeWhile and create ProcessWhile = ReduceIf ∘ !

view this post on Zulip Takafumi Arakaki (tkf) (Mar 10 2021 at 21:59):

wait, nah, it doesn't make sense :rolling_on_the_floor_laughing:

view this post on Zulip Gustavo Goretkin (Mar 10 2021 at 22:03):

Out of curiosity, is the ¿eduction? (sorry if that's not the right term) analogous to TakeWhile ∘ ! useful in Transducers.jl?

view this post on Zulip Gustavo Goretkin (Mar 10 2021 at 22:04):

I am trying to figure out if it is necessary to distinguish between the two ways, where the final element is or isn't "emitted".

view this post on Zulip Gustavo Goretkin (Mar 10 2021 at 22:06):

Now I'm wondering about declarative vs imperative wording. "StopIf" is imperative. "MakeEnd" or "IsEnd" (interpreted as a constraint, not as a predicate) is declarative.

view this post on Zulip Gustavo Goretkin (Mar 10 2021 at 22:07):

IsEnd and WasEnd for "This is the end" and "That was the end"... :-p

view this post on Zulip Takafumi Arakaki (tkf) (Mar 10 2021 at 22:10):

I think both are useful. For example, some reduce-based high-level APIs like findfirst requires ReduceIf and not implementable with TakeWhile:

https://github.com/tkf/ThreadsX.jl/blob/3ff8264f1c4b92318e836a1994a38ffde0433553/src/reduce.jl#L89-L97

(there's the same code in Folds.jl, but would look more complicated)

view this post on Zulip Gustavo Goretkin (Mar 10 2021 at 22:12):

Takafumi Arakaki (tkf) said:

I think both are useful. For example, some reduce-based high-level APIs like findfirst requires ReduceIf and not implementable with TakeWhile:

https://github.com/tkf/ThreadsX.jl/blob/3ff8264f1c4b92318e836a1994a38ffde0433553/src/reduce.jl#L89-L97

(there's the same code in Folds.jl, but would look more complicated)

Right, but if all you had was ReduceIf and you did not have TakeWhile, would that be okay?

view this post on Zulip Takafumi Arakaki (tkf) (Mar 10 2021 at 22:12):

OTOH, I think it perfectly reasonable to want, e.g., "give me all first consecutive positive numbers"

xs |> TakeWhile(>(0)) |> collect

and I don't think you want to get non-positive numbers in the output in this case.

view this post on Zulip Gustavo Goretkin (Mar 10 2021 at 22:12):

Ah, yeah, okay that answers my question. So it would be good to distinguish both.

view this post on Zulip Takafumi Arakaki (tkf) (Mar 10 2021 at 22:12):

yeah I think so

view this post on Zulip Takafumi Arakaki (tkf) (Mar 10 2021 at 22:13):

I think a similar example is eachline in Base. it has keep argument to control the inclusion of the last element (i..e, \n)

view this post on Zulip Gustavo Goretkin (Mar 10 2021 at 22:14):

oh yeah, I think I ran into that recently.

view this post on Zulip Gustavo Goretkin (Mar 10 2021 at 22:15):

So, I am a big fan of using as few names as possible, so I'm tempted to suggest something like KeepIf{0}(predicate) and KeepIf{1}(predicate)

view this post on Zulip Gustavo Goretkin (Mar 10 2021 at 22:15):

where the number indicates how many elements to keep after the transition of the predicate.

view this post on Zulip Gustavo Goretkin (Mar 10 2021 at 22:16):

(And then aliases could be introduced on top of that if it's too ugly for everyday usage. Or perhaps a default could be chosen when the type parameter is omitted)

view this post on Zulip Gustavo Goretkin (Mar 10 2021 at 22:18):

I put it as a type parameter, but I suppose that one could rely on constant propagation. The point is that the two operations are quite, quite related, and there's just a "flag", like the keep argument you mentioned, to choose behavior.

view this post on Zulip Takafumi Arakaki (tkf) (Mar 10 2021 at 22:50):

yeah, a flag like keep might be the least confusing option


Last updated: Oct 02 2023 at 04:34 UTC