fast finding index in array · helpdesk (published)

I am in search of the appropriate data structure to fast retrieve the index of an object in an array.
I am currently using an one dimensional array to store some objects and the order matters.
Up until now I was just searching through a array objects, i.e. with complexity O(n), to find an object's index.

Since I do that often I would like to improve it to sublinear or even constant time complexity.
For sublinear I think I need to sort the array, which cannot happen because, as mentioned, the order matters.
For a constant time I can implement a hash table where each object of the array is mapped to its index.

I think the the OrderedDict from the OrderedCollections.jl package could be used.
But when I do:

d = OrderedDict{Char,Int}()
for c in 'a':'e'
    d[c] = c-'a'+1
end

julia> d[1]
ERROR: KeyError: key 1 not found
Stacktrace:
 [1] getindex(h::OrderedDict{Char, Int64}, key::Int64)
   @ OrderedCollections ~/.julia/packages/OrderedCollections/PRayh/src/ordered_dict.jl:380
 [2] top-level scope
   @ REPL[6]:1

Ideally with d[1] I would like to get the first object .i.e. 'a'.
and with d['a'] I would like to get the uint index .i.e. 1.

Do I miss an interface for OrderedDict or do you hve any other data structure suggestions ?

Jakob Nybo Nissen (Mar 17 2022 at 18:12):

The best data structure would probably be a hash table with the implementation of Dictionaries.jl - they contain the ordered vector of values as part of the structure and is also often more efficient than the Dict of Base

Jakob Nybo Nissen (Mar 17 2022 at 18:13):

Another option could be just making your own bootleg data structure with a vector and a dict inside

Filippos Christou (Mar 17 2022 at 18:19):

Does anybody have any idea how expensive is actually hashing? I mean okey, it is constant cost but if I have a relative small array e.g. 1-100 elements it might be on average faster just to check all elements.

Filippos Christou (Mar 17 2022 at 18:20):

I 've heard of DIctionaries.jl before and I intended to give it a closer look. I guess the time has come.

Jakob Nybo Nissen (Mar 17 2022 at 18:20):

It also depends on how expensive it is to compare with equality. For things like integers, I seem to recall the crossover point is somewhere around 30-60 elements (but don't rely on that, I can't quite remember)

Jakob Nybo Nissen (Mar 17 2022 at 18:22):

In fact, I seem to recall that just a few days ago, there was a proposal to make Base's Dict simply be an array if there were fewer than N elements in it.

Kwaku Oskin (Mar 17 2022 at 19:44):

Sundar R (Mar 19 2022 at 03:10):

Bijections.jl works pretty similar to what you want, but it doesn't seem to be very actively used/maintained, so I'm not sure how it does in terms of performance and reliability.

julia> b = Bijection{Char, Int}()
Bijection{Char,Int64} (with 0 pairs)

julia> for c in 'a':'e'
           b[c] = c-'a'+1
       end

julia> b['a']
1

julia> b(1)
'a': ASCII/Unicode U+0061 (category Ll: Letter, lowercase)

Stream: helpdesk (published)

Topic: fast finding index in array

Filippos Christou (Mar 17 2022 at 18:09):