Stream: helpdesk (published)

Topic: Recursive Types


view this post on Zulip Ignacio Quintero (Jul 22 2021 at 09:01):

I'm trying to make a recursive type for a bifurcating tree with edge lengths. I'm not sure which would be more efficient or easier to handle afterwards. I've come with two options:

mutable struct Tree
  d1::Tree
  d2::Tree
  edge::Float64
  Tree() = (x = new(); x.d1 = x.d2 = x,)
end

mutable struct Tree
  d1::Union{Tree, Nothing}
  d2::Union{Tree, Nothing}
  edge::Float64
end

Any thoughts on which one is better?

view this post on Zulip Kwaku Oskin (Jul 22 2021 at 09:02):

In my experienceUnion{Tree, Nothing} is better (note that you should use Nothing as a type, not nothing which is variable).

EDITED: This is wrong, as it was explained in the thread later.

view this post on Zulip Kwaku Oskin (Jul 22 2021 at 09:03):

It is better from performance point of view: isdefined(tree, :d1) take longer, then tree.d1 === nothing.

view this post on Zulip Simeon Schaub (Jul 22 2021 at 09:13):

It is better from performance point of view: isdefined(tree, :d1) take longer, then tree.d1 === nothing.

Are you sure about that? That would surprise me, I wouldn't expect any big performance differences between both approaches. It also doesn't seem like isdefined will be used in the first example at all, but instead the end of the list is denoted by arriving at the original node again.

view this post on Zulip Kwaku Oskin (Jul 22 2021 at 09:17):

Well, some time ago I've benchmarked these things.
Maybe something has changed, idk.

And yes, you are right in this case there will be no validation for nothingness of the leaf, but as a rule of thumb, I prefer to follow one strategy always.

And anyway, working with types is more consistent, than switching between x.field and isdefined(x, :field) notation.

view this post on Zulip Kwaku Oskin (Jul 22 2021 at 09:27):

Hmmm... It's hard to measure properly, so probably my benchmarks are wrong.

struct Tree1
    l::Tree1
    r::Tree1

    Tree1() = new()
end

struct Tree2
    l::Union{Tree2, Nothing}
    r::Union{Tree2, Nothing}
end
Tree2() = Tree2(nothing, nothing)

leftleaf(t::Tree1) = isdefined(t, :l)
leftleaf(t::Tree2) = t.l !== nothing

t1 = [Tree1() for _ in 1:10000]
t2 = [Tree2() for _ in 1:10000]

@btime leftleaf.($t1);
#  11.222 μs (3 allocations: 5.55 KiB)

@btime leftleaf.($t2);
#  8.423 μs (3 allocations: 5.55 KiB)

Well, probably it's just interaction between broadcasting and === nothing operations, but in real world, we always have some complicated interaction. Overall, I saw that isdefined is slower in real world applications in a situations when I have lots of operations with nodes.

view this post on Zulip Kwaku Oskin (Jul 22 2021 at 09:29):

I mean it's hard to see the difference with a single operation, since it take ~1ns and difference is lost in the noise.

view this post on Zulip Ignacio Quintero (Jul 22 2021 at 10:16):

Thanks so much for this! So just to add, I need to do many iterations like transversing the tree and so forth, and according to the following very simple example, it would seem faster to transverse over the undefined type to estimate, say, the tree length:

mutable struct Tree1
    l::Tree1
    r::Tree1
    e::Float64
    Tree1() = new()
    Tree1(l::Tree1, r::Tree1, e::Float64) = new(l, r, e)
end

mutable struct Tree2
    l::Union{Tree2, Nothing}
    r::Union{Tree2, Nothing}
    e::Float64
end
Tree2() = Tree2(nothing, nothing, 0.0)

using BenchmarkTools

t1 = Tree1(Tree1(), Tree1(), 0.5)
t1.l.e = 1.0
t1.r.e = 1.0
t2 = Tree2(Tree2(nothing, nothing, 1.0), Tree2(nothing, nothing, 1.0), 0.5)

function treelength(t::Tree1, l::Float64)
  l += t.e
  if isdefined(t,:l)
    l = treelength(t.l, l)
  end
  if isdefined(t,:r)
   l = treelength(t.r, l)
  end
  return l
end

treelength(t::Tree2) =
  t.e + treelength(t.l) + treelength(t.r)
treelength(t::Nothing) = 0.0

@benchmark treelength($t1, 0.0)
  # memory estimate:  0 bytes
  # allocs estimate:  0
  # --------------
  # minimum time:     6.327 ns (0.00% GC)
  # median time:      6.357 ns (0.00% GC)
  # mean time:        6.731 ns (0.00% GC)
  # maximum time:     91.463 ns (0.00% GC)
  # --------------
  # samples:          10000
  # evals/sample:     1000

@benchmark treelength($t2)
  # memory estimate:  0 bytes
  # allocs estimate:  0
  # --------------
  # minimum time:     7.770 ns (0.00% GC)
  # median time:      8.024 ns (0.00% GC)
  # mean time:        8.700 ns (0.00% GC)
  # maximum time:     113.810 ns (0.00% GC)
  # --------------
  # samples:          10000
  # evals/sample:     999

view this post on Zulip Ignacio Quintero (Jul 22 2021 at 10:21):

Simeon Schaub said:

It is better from performance point of view: isdefined(tree, :d1) take longer, then tree.d1 === nothing.

Are you sure about that? That would surprise me, I wouldn't expect any big performance differences between both approaches. It also doesn't seem like isdefined will be used in the first example at all, but instead the end of the list is denoted by arriving at the original node again.

Yeah, so in the first case, one can test one arrived at leaf simply by === with one of the daughters?

view this post on Zulip Ignacio Quintero (Jul 22 2021 at 10:36):

One final question, how do you start a Constructor for the undefined Type where one creates a new object with undefined recursive types but an e = x argument?
Something like Tree1(e::Float64) = new(l, r, e)(but that works...)

view this post on Zulip Colin Caine (Jul 22 2021 at 13:11):

struct Tree1
    l::Tree1
    r::Tree1

    Tree1() = new()
end

I think that's kind of wild because you can't edit l or r without pointer stuff. Maybe you meant to say mutable?

@Ignacio Quintero You might want to look at LinkedList.jl and DataStructures.jl One of them takes the approach of Nil nodes and the other defines a leaf by checking if the next pointer is pointing to itself. Either way is fine.

view this post on Zulip Kwaku Oskin (Jul 22 2021 at 13:15):

Haha, you right! Of course it should be mutable and then difference is not that big.

Anyway, it looks like I should reconsider my own advice :-) It seems that Union version is worse than undef or even better cycled pointers.


Last updated: Nov 06 2024 at 04:40 UTC