Stream: helpdesk (published)

Topic: Dictionary-based Priority Queues


view this post on Zulip Chad Scherrer (Jun 18 2021 at 16:36):

Hi,

I need a data structure for representing log-weighted observations. I was originally thinking of using a PriorityQueue from DataStructures.jl. But this stores an array of pairs and mutates this as it goes, maintaining heap ordering at each insert. In high dimensions, this is a lot of copying, so it will get very slow.

Also, this data structure stores each key twice: once in the array of pairs, and again in an Index::Dict{K,Int} field. For large collections, this seems like a waste of memory.

Finally, lots of functions will depend only on the values (the log-weights), so iterating through them needs to be fast.

I was thinking of adapting some ideas from @Andy Ferris 's Dictionaries.jl:

I know the ordering on indices is internal, but this would use it in a similar way to what Dictionary does.

What do you think? Or maybe there's an easier way to do this?

view this post on Zulip Chad Scherrer (Jun 19 2021 at 13:58):

I think I found a better way to do this:
https://github.com/cscherrer/PermutedArrays.jl

view this post on Zulip Andy Ferris (Jun 20 2021 at 10:34):

I think the idea of making the internals of Indices more abstract/replaceable is OK. I backed out of it because there's no real trait for resize! on arrays, and I wanted to get at least something out that worked and was easy to use. Certainly something you could implement via some copy-paste of the Dictionaries.jl code. If someone figures out how to make it robust we can probably upstream it.

Sort-ordered dictionaries are an area of interest for me (though I'm probably more thinking about fully-sorted dictionaries rather than heaps at the moment).

Maintaining summary stats for data structures is another really cool idea. For invertible, associative, commutative operations I think that this can be done easily enough with a wrapper that intercepts the mutations (assuming you don't want to track the deltas manually); for more general associative reductions it depends somewhat on the data structure (e.g. you can do this with a tree).

If PermutedArrays can do everything you want already, using that sounds like a great idea.

view this post on Zulip Chad Scherrer (Jun 20 2021 at 21:43):

Thanks for the feedback :smile:

If Indices can be made to take another type parameter, say replacing values::Vector{I} with values::V, this could be something like

struct MyDictionary{I,T,V,F,S} <: AbstractDictionary{I, T}
    inds::Indices{I,V}

    # keys.data === inds.values
    keys::PermutedVector{V}

    # This could be abstracted too, but no need to do everything at once
    values::Vector{T}

    stats::S            # e.g., OnlineStats.LogSumExp
    update_stats!::F    # e.g., OnlineStats.fit!
end

I had convinced myself this is important for my JuliaCon talk, but I think that's just because it sounds fun. So I should work on slides. I'll think about this some more in the background, and maybe we can talk some more in a week or two?

view this post on Zulip Chad Scherrer (Jul 08 2021 at 13:45):

With JuliaCon submitted (:sweat_smile: ), I have a little catch-up to do. Adding this to my to-do list, hope to get back to it soon :smile:


Last updated: Nov 22 2024 at 04:41 UTC