Hi,
I need a data structure for representing log-weighted observations. I was originally thinking of using a PriorityQueue from DataStructures.jl. But this stores an array of pairs and mutates this as it goes, maintaining heap ordering at each insert. In high dimensions, this is a lot of copying, so it will get very slow.
Also, this data structure stores each key twice: once in the array of pairs, and again in an Index::Dict{K,Int} field. For large collections, this seems like a waste of memory.
Finally, lots of functions will depend only on the values (the log-weights), so iterating through them needs to be fast.
I was thinking of adapting some ideas from @Andy Ferris 's Dictionaries.jl:
Indices withvalues field changed to <:AbstractArray (with a type parameter). This way indices could be stored in a StructArray or TupleVector, which give better efficiency for many operations.lookup and one for find. This would allow reordering without changing the original array.<: AbstractDictionary that's very similar to Dictionary, but maintains heap ordering similarly to DataStructures.PriorityQueuelogtotal field, maybe using the new LogSumExp in OnlineStatsI know the ordering on indices is internal, but this would use it in a similar way to what Dictionary does.
What do you think? Or maybe there's an easier way to do this?
I think I found a better way to do this:
https://github.com/cscherrer/PermutedArrays.jl
I think the idea of making the internals of Indices more abstract/replaceable is OK. I backed out of it because there's no real trait for resize! on arrays, and I wanted to get at least something out that worked and was easy to use. Certainly something you could implement via some copy-paste of the Dictionaries.jl code. If someone figures out how to make it robust we can probably upstream it.
Sort-ordered dictionaries are an area of interest for me (though I'm probably more thinking about fully-sorted dictionaries rather than heaps at the moment).
Maintaining summary stats for data structures is another really cool idea. For invertible, associative, commutative operations I think that this can be done easily enough with a wrapper that intercepts the mutations (assuming you don't want to track the deltas manually); for more general associative reductions it depends somewhat on the data structure (e.g. you can do this with a tree).
If PermutedArrays can do everything you want already, using that sounds like a great idea.
Thanks for the feedback :smile:
If Indices can be made to take another type parameter, say replacing values::Vector{I} with values::V, this could be something like
struct MyDictionary{I,T,V,F,S} <: AbstractDictionary{I, T}
inds::Indices{I,V}
# keys.data === inds.values
keys::PermutedVector{V}
# This could be abstracted too, but no need to do everything at once
values::Vector{T}
stats::S # e.g., OnlineStats.LogSumExp
update_stats!::F # e.g., OnlineStats.fit!
end
I had convinced myself this is important for my JuliaCon talk, but I think that's just because it sounds fun. So I should work on slides. I'll think about this some more in the background, and maybe we can talk some more in a week or two?
With JuliaCon submitted (:sweat_smile: ), I have a little catch-up to do. Adding this to my to-do list, hope to get back to it soon :smile:
Last updated: Nov 18 2025 at 04:43 UTC