Dictionary-based Priority Queues · helpdesk (published)

Stream: helpdesk (published)

Topic: Dictionary-based Priority Queues

Chad Scherrer (Jun 18 2021 at 16:36):

Hi,

I need a data structure for representing log-weighted observations. I was originally thinking of using a PriorityQueue from DataStructures.jl. But this stores an array of pairs and mutates this as it goes, maintaining heap ordering at each insert. In high dimensions, this is a lot of copying, so it will get very slow.

Also, this data structure stores each key twice: once in the array of pairs, and again in an Index::Dict{K,Int} field. For large collections, this seems like a waste of memory.

Finally, lots of functions will depend only on the values (the log-weights), so iterating through them needs to be fast.

I was thinking of adapting some ideas from @Andy Ferris 's Dictionaries.jl:

A modified Indices with
- the values field changed to <:AbstractArray (with a type parameter). This way indices could be stored in a StructArray or TupleVector, which give better efficiency for many operations.
- fields for permutations, one for lookup and one for find. This would allow reordering without changing the original array.
A <: AbstractDictionary that's very similar to Dictionary, but maintains heap ordering similarly to DataStructures.PriorityQueue
A logtotal field, maybe using the new LogSumExp in OnlineStats

I know the ordering on indices is internal, but this would use it in a similar way to what Dictionary does.

What do you think? Or maybe there's an easier way to do this?

Chad Scherrer (Jun 19 2021 at 13:58):

I think I found a better way to do this:
https://github.com/cscherrer/PermutedArrays.jl

Andy Ferris (Jun 20 2021 at 10:34):

I think the idea of making the internals of Indices more abstract/replaceable is OK. I backed out of it because there's no real trait for resize! on arrays, and I wanted to get at least something out that worked and was easy to use. Certainly something you could implement via some copy-paste of the Dictionaries.jl code. If someone figures out how to make it robust we can probably upstream it.

Sort-ordered dictionaries are an area of interest for me (though I'm probably more thinking about fully-sorted dictionaries rather than heaps at the moment).

Maintaining summary stats for data structures is another really cool idea. For invertible, associative, commutative operations I think that this can be done easily enough with a wrapper that intercepts the mutations (assuming you don't want to track the deltas manually); for more general associative reductions it depends somewhat on the data structure (e.g. you can do this with a tree).

If PermutedArrays can do everything you want already, using that sounds like a great idea.

Chad Scherrer (Jun 20 2021 at 21:43):

Thanks for the feedback :smile:

If Indices can be made to take another type parameter, say replacing values::Vector{I} with values::V, this could be something like

struct MyDictionary{I,T,V,F,S} <: AbstractDictionary{I, T}
    inds::Indices{I,V}

    # keys.data === inds.values
    keys::PermutedVector{V}

    # This could be abstracted too, but no need to do everything at once
    values::Vector{T}

    stats::S            # e.g., OnlineStats.LogSumExp
    update_stats!::F    # e.g., OnlineStats.fit!
end

I had convinced myself this is important for my JuliaCon talk, but I think that's just because it sounds fun. So I should work on slides. I'll think about this some more in the background, and maybe we can talk some more in a week or two?

Chad Scherrer (Jul 08 2021 at 13:45):

With JuliaCon submitted (:sweat_smile: ), I have a little catch-up to do. Adding this to my to-do list, hope to get back to it soon :smile:

Last updated: Oct 02 2023 at 04:34 UTC