In some applications, we noticed that `reduce(op, map(f, v))` is faster, especially when the items of `v` have a complicated parametric type.
Is there a clear rule to choose between the two functions? When is one expected to outperform the other?
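For concreteness, here is a minimal sketch of the kind of comparison we are talking about; `f`, `op`, and `v` below are just placeholders, not the actual code:

```julia
using BenchmarkTools

# Illustrative stand-ins: in our case `f` returns values with a complicated
# parametric type; here everything is deliberately simple.
f(x) = x^2 + 1.0
op(a, b) = a + b
v = Any[rand() for _ in 1:10^5]   # container with a non-concrete eltype

@btime mapreduce($f, $op, $v)     # one pass; the accumulator type may not infer
@btime reduce($op, map($f, $v))   # two passes; the intermediate array gets a concrete eltype
```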
I don't know for sure, but I suspect you already nailed it. `mapreduce` will try to do a loop like `s = op(s, f(x))` over the elements `x` of `v`, and unless `f(x)` is something relatively simple it'll give up and type `s` as `Any`. When you do `reduce(op, map(f, v))`, it's essentially forcing the type of `s` to be that of `op(::T, ::T)` where `T = eltype(map(f, v))`. The compiler doesn't guarantee that the type inference is equivalent in these cases.
Is there a helper function in Base that can be used to tell when the compiler will give up, given `f` and `v`?
We have some algorithms to write, and it would be nice to know a priori when to use `mapreduce` or `reduce`.
There's undoubtedly a way to do it with internals, but you should not do that in package code. The best thing to do would be to assume no and to work around it.
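For development-time poking only (not package code), something along these lines can hint at whether inference will pin down the element type; note that `Base.promote_op` is an internal-ish helper and its behavior is not guaranteed across Julia versions:

```julia
f(x) = x^2 + 1.0
v = Any[rand() for _ in 1:100]

# Ask inference what `f` returns for the container's element type.
T = Base.promote_op(f, eltype(v))   # eltype(v) is Any here, so T is likely Any

# Rough heuristic: if T is not concrete, mapreduce's accumulator will
# probably end up boxed, and reduce(op, map(f, v)) may be the better bet.
isconcretetype(T)
```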
We wouldn't use it in package code. Just curious if there is tooling to help decide between the two during development.
There is a bit of general advice I can give that I'm not sure applies to you, but which I suspect may, having seen some of what's going on in Meshes.jl: Julia really, really, really does not like containers (or iterators) of mixed eltype, even if those types look to you like they are very similar (i.e. differ only by a single parameter or something). Yes, you can get miraculously good performance out of containers with small union types, but for one, the compiler does not narrow the containers to those types for you, you'd have to do it yourself, and for another, this holds fragile promise and can break type inference in more complicated code down the road.
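To illustrate the "compiler does not narrow for you" point, here is a small sketch (Float32/Float64 stand in for two similar parametric types): collecting mixed values gives you the typejoin of the element types, not a small `Union`, so the `Union`-typed container has to be built explicitly.

```julia
xs = [i % 2 == 0 ? 1.0f0 : 1.0 for i in 1:10]   # mix of Float32 and Float64

typeof(xs)                  # typically Vector{AbstractFloat}: the typejoin, an abstract eltype
typeof(map(identity, xs))   # still widens to the typejoin; no automatic narrowing to a Union

# Narrowing to a small Union has to be done explicitly:
ys = Vector{Union{Float32, Float64}}(xs)
typeof(ys)                  # Vector{Union{Float32, Float64}} — eligible for union-splitting
```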
Would be nice if VSCode could give hints to developers about `mapreduce`, for example.
The code in Meshes.jl assumes heterogeneous containers in general, and when the containers are homogeneous of the same geometry type, things run at top performance and inference works. We do a few checks here and there in the test suite for that.
Yeah I suspected that. You may want to seriously consider wrapper types that hold different objects in separate containers internally, otherwise it's pretty much a guarantee that you will repeatedly slam into type inference issues. For some things you can rely on small union types, but again, this is fragile, and if you hand over an object that's typed that way to a "user" it's easy for them to break it unless they understand very clearly what is happening.
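A rough sketch of that idea, with made-up geometry types and field names purely for illustration:

```julia
# One concretely-typed container per kind, instead of a single mixed Vector.
struct Segment2D
    a::NTuple{2,Float64}
    b::NTuple{2,Float64}
end

struct Triangle2D
    a::NTuple{2,Float64}
    b::NTuple{2,Float64}
    c::NTuple{2,Float64}
end

struct GeometryBuckets
    segments::Vector{Segment2D}
    triangles::Vector{Triangle2D}
end

# Inner loops then run over concretely-typed containers, so inference and
# dispatch stay cheap inside each loop; only the outer choice is dynamic.
seglength(s::Segment2D) = hypot(s.b[1] - s.a[1], s.b[2] - s.a[2])
total_seglength(g::GeometryBuckets) = sum(seglength, g.segments; init = 0.0)
```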
As an example, the `Grid` subtypes are heavily optimized because the grid elements are always of the same type. Same for a `SimpleMesh` of `Triangle` elements. In the case of a `GeometrySet`, there is nothing we can do: it can store heterogeneous containers. It still allows homogeneous containers though, as is the case with `PointSet`.
Expanding Man said:
> Yeah I suspected that. You may want to seriously consider wrapper types that hold different objects in separate containers internally, otherwise it's pretty much a guarantee that you will repeatedly slam into type inference issues. For some things you can rely on small union types, but again, this is fragile, and if you hand over an object that's typed that way to a "user" it's easy for them to break it unless they understand very clearly what is happening.
We have evidence that everything is working pretty well with rich parametric types. Only in a few places do we need to circumvent Julia's inference problems.
Again, the problem is that it's fragile. It can work great until it doesn't, and knowing where type inference is going to start completely giving up is really hard. This is an issue with Julia's type system in general that I think should be better warned about in the documentation. You can be really impressed with how it's doing with apparently dynamic types and then all of a sudden you hit a performance wall when everything in a complicated block of code starts showing up as `Any`.
I think that is part of the game. As a Julia developer you need to handle these. I agree it should be documented better in the official docs.
I'm not talking about manifestly type stable code that just happens to have really complicated type signatures, that should be fine (sometimes it isn't, but it really should be), but stuff like containers without a single concrete type can start to get real ugly.
Most container types in Meshes.jl have a concrete `eltype`. The `GeometrySet` is the one with the greatest potential for a non-concrete `eltype`. If you find issues with the package in practical applications, please feel free to report them. We will be happy to discuss improvements.
Now, coming back to the original issue: I wish VSCode could warn about `mapreduce` in some cases before running the function that uses it.
If we know the types of the function's arguments, then we can know the intermediate types of expressions inside the function. At that point, VSCode could say: "this mapreduce will not infer the type properly...".
There are tools like Cthulhu.jl and `@code_warntype`. As things stand right now, code has to get compiled (or lowered, or whatever `@code_warntype` does exactly) for you to know what's going to happen with type inference.
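For example, a quick REPL session along these lines (names are illustrative) makes the difference visible:

```julia
using InteractiveUtils   # already loaded in the REPL

f(x) = x^2 + 1.0
total(v) = mapreduce(f, +, v)

v_bad  = Any[rand() for _ in 1:10]   # non-concrete eltype
v_good = rand(10)                    # Vector{Float64}

@code_warntype total(v_bad)    # look for `Any` in the body / return type
@code_warntype total(v_good)   # should infer a concrete Float64 result

# With Cthulhu.jl you can descend interactively into the call tree:
# using Cthulhu
# @descend total(v_bad)
```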
Yes, I use `@code_warntype` in these situations, but sometimes we don't have time to launch a REPL and test these details.
VSCode can show type hints. I've got it disabled, so maybe I'm not up to date with how accurate it is. I don't think VSCode would really be suggesting things like not using `mapreduce` directly, though.
Sort of rephrasing and filling in: I think one thing that hasn't changed since I last used Julia frequently is that for `Vector{T}` with abstract `T`, the compiler can't (or rather, does not) make any assumptions about the elements. So any dispatch, say on the addition inside `sum`, happens at runtime. A `Union` is not concrete, but it is also not abstract. If `T` is a small union, then `sum` can be much faster.
In general, this is a good application for sum types (tagged unions). I don't know if it would work for you. In discussing this once on Slack, someone showed me that a plain old `Union` is just as good in many cases. This is easy to verify with simple examples.
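For instance, a quick sketch along these lines (assuming BenchmarkTools; exact numbers will vary) shows the gap between a concrete eltype, a small `Union`, and an abstract eltype:

```julia
using BenchmarkTools

vals = rand(10^6)

v_concrete = vals                                    # Vector{Float64}
v_union    = Vector{Union{Float32, Float64}}(vals)   # small Union: union-splitting applies
v_abstract = Vector{Real}(vals)                      # abstract eltype: boxed elements

@btime sum($v_concrete)   # fastest: fully inferred and SIMD-friendly
@btime sum($v_union)      # typically close to the concrete case
@btime sum($v_abstract)   # typically much slower: dynamic dispatch per element
```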
Aside from the type inference issue, this seems like an instance of the general fusion/fission question: sometimes you want `map(f, map(g, xs))` and sometimes you want `map(f ∘ g, xs)`.
@jar I'm confused, where is the reduce?
I'm just talking about doing two passes vs one pass over the data