I'm trying to make a recursive type for a bifurcating tree with edge lengths. I'm not sure which would be more efficient or easier to handle afterwards. I've come with two options:
mutable struct Tree
d1::Tree
d2::Tree
edge::Float64
Tree() = (x = new(); x.d1 = x.d2 = x,)
end
mutable struct Tree
d1::Union{Tree, Nothing}
d2::Union{Tree, Nothing}
edge::Float64
end
Any thoughts on which one is better?
In my experienceUnion{Tree, Nothing}
is better (note that you should use Nothing
as a type, not nothing
which is variable).
EDITED: This is wrong, as it was explained in the thread later.
It is better from performance point of view: isdefined(tree, :d1)
take longer, then tree.d1 === nothing
.
It is better from performance point of view:
isdefined(tree, :d1)
take longer, thentree.d1 === nothing
.
Are you sure about that? That would surprise me, I wouldn't expect any big performance differences between both approaches. It also doesn't seem like isdefined
will be used in the first example at all, but instead the end of the list is denoted by arriving at the original node again.
Well, some time ago I've benchmarked these things.
Maybe something has changed, idk.
And yes, you are right in this case there will be no validation for nothingness of the leaf, but as a rule of thumb, I prefer to follow one strategy always.
And anyway, working with types is more consistent, than switching between x.field
and isdefined(x, :field)
notation.
Hmmm... It's hard to measure properly, so probably my benchmarks are wrong.
struct Tree1
l::Tree1
r::Tree1
Tree1() = new()
end
struct Tree2
l::Union{Tree2, Nothing}
r::Union{Tree2, Nothing}
end
Tree2() = Tree2(nothing, nothing)
leftleaf(t::Tree1) = isdefined(t, :l)
leftleaf(t::Tree2) = t.l !== nothing
t1 = [Tree1() for _ in 1:10000]
t2 = [Tree2() for _ in 1:10000]
@btime leftleaf.($t1);
# 11.222 μs (3 allocations: 5.55 KiB)
@btime leftleaf.($t2);
# 8.423 μs (3 allocations: 5.55 KiB)
Well, probably it's just interaction between broadcasting and === nothing
operations, but in real world, we always have some complicated interaction. Overall, I saw that isdefined
is slower in real world applications in a situations when I have lots of operations with nodes.
I mean it's hard to see the difference with a single operation, since it take ~1ns and difference is lost in the noise.
Thanks so much for this! So just to add, I need to do many iterations like transversing the tree and so forth, and according to the following very simple example, it would seem faster to transverse over the undefined type to estimate, say, the tree length:
mutable struct Tree1
l::Tree1
r::Tree1
e::Float64
Tree1() = new()
Tree1(l::Tree1, r::Tree1, e::Float64) = new(l, r, e)
end
mutable struct Tree2
l::Union{Tree2, Nothing}
r::Union{Tree2, Nothing}
e::Float64
end
Tree2() = Tree2(nothing, nothing, 0.0)
using BenchmarkTools
t1 = Tree1(Tree1(), Tree1(), 0.5)
t1.l.e = 1.0
t1.r.e = 1.0
t2 = Tree2(Tree2(nothing, nothing, 1.0), Tree2(nothing, nothing, 1.0), 0.5)
function treelength(t::Tree1, l::Float64)
l += t.e
if isdefined(t,:l)
l = treelength(t.l, l)
end
if isdefined(t,:r)
l = treelength(t.r, l)
end
return l
end
treelength(t::Tree2) =
t.e + treelength(t.l) + treelength(t.r)
treelength(t::Nothing) = 0.0
@benchmark treelength($t1, 0.0)
# memory estimate: 0 bytes
# allocs estimate: 0
# --------------
# minimum time: 6.327 ns (0.00% GC)
# median time: 6.357 ns (0.00% GC)
# mean time: 6.731 ns (0.00% GC)
# maximum time: 91.463 ns (0.00% GC)
# --------------
# samples: 10000
# evals/sample: 1000
@benchmark treelength($t2)
# memory estimate: 0 bytes
# allocs estimate: 0
# --------------
# minimum time: 7.770 ns (0.00% GC)
# median time: 8.024 ns (0.00% GC)
# mean time: 8.700 ns (0.00% GC)
# maximum time: 113.810 ns (0.00% GC)
# --------------
# samples: 10000
# evals/sample: 999
Simeon Schaub said:
It is better from performance point of view:
isdefined(tree, :d1)
take longer, thentree.d1 === nothing
.Are you sure about that? That would surprise me, I wouldn't expect any big performance differences between both approaches. It also doesn't seem like
isdefined
will be used in the first example at all, but instead the end of the list is denoted by arriving at the original node again.
Yeah, so in the first case, one can test one arrived at leaf simply by ===
with one of the daughters?
One final question, how do you start a Constructor for the undefined Type where one creates a new object with undefined recursive types but an e = x
argument?
Something like Tree1(e::Float64) = new(l, r, e)
(but that works...)
struct Tree1
l::Tree1
r::Tree1
Tree1() = new()
end
I think that's kind of wild because you can't edit l
or r
without pointer stuff. Maybe you meant to say mutable
?
@Ignacio Quintero You might want to look at LinkedList.jl and DataStructures.jl One of them takes the approach of Nil nodes and the other defines a leaf by checking if the next
pointer is pointing to itself. Either way is fine.
Haha, you right! Of course it should be mutable
and then difference is not that big.
Anyway, it looks like I should reconsider my own advice :-) It seems that Union
version is worse than undef
or even better cycled pointers.
Last updated: Nov 22 2024 at 04:41 UTC