Stream: helpdesk (published)

Topic: Diagnosing slow iterator


view this post on Zulip Timothy (Jan 05 2022 at 04:00):

I have a tree-like structure, composed of a mix of types, and I'm looking to iterate through each of them in a DFS-like manner. I've managed to get some code working, but it's much slower than I expected and I'm not sure why.

I use a wrapper type for the iterator, and the idea is that I keep the current "tree path" as a stack, if from the current node I can decent further the first child is added to the stack, and when the iterator at the end of the stack is exhausted, the stack is pop!ed.

Here's the full code for the iterator (for later: line 1 below corresponds to line 1 of the file)

import Base: iterate, length

struct OrgIterator
    o::Org
end

Base.IteratorSize(::OrgIterator) = Base.SizeUnknown()
iterate(it::OrgIterator) =
    if length(it.o.contents) > 0
        el, state = iterate(it.o)
        (el, Vector{Tuple}([(it.o, state), (el, nothing)]))
    end
iterate(it::OrgIterator, stack::Vector) =
    if length(stack) > 0
        next = if isnothing(stack[end][2])
            iterate(stack[end][1])
        else
            iterate(stack[end][1], stack[end][2])
        end
        if isnothing(next)
            pop!(stack)
            iterate(it, stack)
        else
            el, state = next
            stack[end] = (stack[end][1], state)
            if applicable(iterate, el)
                push!(stack, (el, nothing))
            end
            (el, stack)
        end
    end

This takes 1.2s to run on a structure with 26262 nodes. I have profiled Iterators.filter(c -> c isa FootnoteRef, OrgIterator(m)) |> collect (which just matches ~130 nodes) and looking at Profile.print() (full result: http://ix.io/3L3w) it appears that other than spending ages on poptask (which seems to always be the case in profiles?) the bulk of the time looks to be spent on line 26, i.e. if applicable(iterate, el).
However, benchmarking some call of applicable (I assume that the performance is the same for any type) shows it taking around 140ns per call, so calling that on every node should take ~4000000ns or 0.004s, which wouldn't explain the poor performance.

So, I'm somewhat perplexed. Might anyone have some idea as to what is going on and how I can improve this?


Last updated: Dec 28 2024 at 04:38 UTC