I am in search of the appropriate data structure to fast retrieve the index of an object in an array.
I am currently using an one dimensional array to store some objects and the order matters.
Up until now I was just searching through a array objects, i.e. with complexity O(n), to find an object's index.
Since I do that often I would like to improve it to sublinear or even constant time complexity.
For sublinear I think I need to sort the array, which cannot happen because, as mentioned, the order matters.
For a constant time I can implement a hash table where each object of the array is mapped to its index.
I think the the OrderedDict
from the OrderedCollections.jl package could be used.
But when I do:
d = OrderedDict{Char,Int}()
for c in 'a':'e'
d[c] = c-'a'+1
end
then I cannot get the n-th index. I can only iterate through the dict.
e.g.
julia> d[1]
ERROR: KeyError: key 1 not found
Stacktrace:
[1] getindex(h::OrderedDict{Char, Int64}, key::Int64)
@ OrderedCollections ~/.julia/packages/OrderedCollections/PRayh/src/ordered_dict.jl:380
[2] top-level scope
@ REPL[6]:1
Ideally with d[1]
I would like to get the first object .i.e. 'a'.
and with d['a'] I would like to get the uint index .i.e. 1.
Do I miss an interface for OrderedDict
or do you hve any other data structure suggestions ?
The best data structure would probably be a hash table with the implementation of Dictionaries.jl
- they contain the ordered vector of values as part of the structure and is also often more efficient than the Dict
of Base
Another option could be just making your own bootleg data structure with a vector and a dict inside
Does anybody have any idea how expensive is actually hashing? I mean okey, it is constant cost but if I have a relative small array e.g. 1-100 elements it might be on average faster just to check all elements.
Jakob Nybo Nissen said:
The best data structure would probably be a hash table with the implementation of
Dictionaries.jl
- they contain the ordered vector of values as part of the structure and is also often more efficient than theDict
of Base
I 've heard of DIctionaries.jl before and I intended to give it a closer look. I guess the time has come.
It also depends on how expensive it is to compare with equality. For things like integers, I seem to recall the crossover point is somewhere around 30-60 elements (but don't rely on that, I can't quite remember)
In fact, I seem to recall that just a few days ago, there was a proposal to make Base's Dict
simply be an array if there were fewer than N elements in it.
(deleted)
Bijections.jl works pretty similar to what you want, but it doesn't seem to be very actively used/maintained, so I'm not sure how it does in terms of performance and reliability.
julia> b = Bijection{Char, Int}()
Bijection{Char,Int64} (with 0 pairs)
julia> for c in 'a':'e'
b[c] = c-'a'+1
end
julia> b['a']
1
julia> b(1)
'a': ASCII/Unicode U+0061 (category Ll: Letter, lowercase)
Last updated: Nov 22 2024 at 04:41 UTC