Stream: helpdesk (published)

Topic: fast finding index in array


view this post on Zulip Filippos Christou (Mar 17 2022 at 18:09):

I am in search of the appropriate data structure to fast retrieve the index of an object in an array.
I am currently using an one dimensional array to store some objects and the order matters.
Up until now I was just searching through a array objects, i.e. with complexity O(n), to find an object's index.

Since I do that often I would like to improve it to sublinear or even constant time complexity.
For sublinear I think I need to sort the array, which cannot happen because, as mentioned, the order matters.
For a constant time I can implement a hash table where each object of the array is mapped to its index.

I think the the OrderedDict from the OrderedCollections.jl package could be used.
But when I do:

d = OrderedDict{Char,Int}()
for c in 'a':'e'
    d[c] = c-'a'+1
end

then I cannot get the n-th index. I can only iterate through the dict.
e.g.

julia> d[1]
ERROR: KeyError: key 1 not found
Stacktrace:
 [1] getindex(h::OrderedDict{Char, Int64}, key::Int64)
   @ OrderedCollections ~/.julia/packages/OrderedCollections/PRayh/src/ordered_dict.jl:380
 [2] top-level scope
   @ REPL[6]:1

Ideally with d[1] I would like to get the first object .i.e. 'a'.
and with d['a'] I would like to get the uint index .i.e. 1.

Do I miss an interface for OrderedDict or do you hve any other data structure suggestions ?

view this post on Zulip Jakob Nybo Nissen (Mar 17 2022 at 18:12):

The best data structure would probably be a hash table with the implementation of Dictionaries.jl - they contain the ordered vector of values as part of the structure and is also often more efficient than the Dict of Base

view this post on Zulip Jakob Nybo Nissen (Mar 17 2022 at 18:13):

Another option could be just making your own bootleg data structure with a vector and a dict inside

view this post on Zulip Filippos Christou (Mar 17 2022 at 18:19):

Does anybody have any idea how expensive is actually hashing? I mean okey, it is constant cost but if I have a relative small array e.g. 1-100 elements it might be on average faster just to check all elements.

view this post on Zulip Filippos Christou (Mar 17 2022 at 18:20):

Jakob Nybo Nissen said:

The best data structure would probably be a hash table with the implementation of Dictionaries.jl - they contain the ordered vector of values as part of the structure and is also often more efficient than the Dict of Base

I 've heard of DIctionaries.jl before and I intended to give it a closer look. I guess the time has come.

view this post on Zulip Jakob Nybo Nissen (Mar 17 2022 at 18:20):

It also depends on how expensive it is to compare with equality. For things like integers, I seem to recall the crossover point is somewhere around 30-60 elements (but don't rely on that, I can't quite remember)

view this post on Zulip Jakob Nybo Nissen (Mar 17 2022 at 18:22):

In fact, I seem to recall that just a few days ago, there was a proposal to make Base's Dict simply be an array if there were fewer than N elements in it.

view this post on Zulip Kwaku Oskin (Mar 17 2022 at 19:44):

(deleted)

view this post on Zulip Sundar R (Mar 19 2022 at 03:10):

Bijections.jl works pretty similar to what you want, but it doesn't seem to be very actively used/maintained, so I'm not sure how it does in terms of performance and reliability.

julia> b = Bijection{Char, Int}()
Bijection{Char,Int64} (with 0 pairs)

julia> for c in 'a':'e'
           b[c] = c-'a'+1
       end

julia> b['a']
1

julia> b(1)
'a': ASCII/Unicode U+0061 (category Ll: Letter, lowercase)

Last updated: Oct 02 2023 at 04:34 UTC