Stream: helpdesk (published)

Topic: efficient bulk insert for dictionary


Expanding Man (Feb 04 2021 at 15:51):

Is there a method for efficiently inserting a large number of elements into a dictionary?

Alex Ames (Feb 04 2021 at 19:18):

I think your best bet is merge!. push! with splatting has to turn every pair of d2 into a separate function argument, which is where the allocations below come from; merge! just iterates d2 and inserts in place:

julia> d2 = Dict(900:1100 .=> rand.());

julia> @btime push!(d1, $d2...) setup=(d1 = Dict(1:1000 .=> rand.()));
  456.000 μs (10716 allocations: 496.19 KiB)

julia> @btime merge!(d1, $d2) setup=(d1 = Dict(1:1000 .=> rand.()));
  2.711 μs (0 allocations: 0 bytes)
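
One detail the benchmark glosses over: the key ranges overlap (900:1000 appears in both dicts), and for shared keys merge! keeps the value from the later argument. If overlapping values should be combined rather than overwritten, Base also has mergewith! (Julia 1.5+), which takes a combiner function; a small sketch:

julia> d = Dict("a" => 1);

julia> merge!(d, Dict("a" => 10, "b" => 2));   # later value wins: d["a"] == 10

julia> mergewith!(+, d, Dict("a" => 5));       # combined instead: d["a"] == 15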

Expanding Man (Feb 04 2021 at 19:20):

Unless I'm missing something, the problem is that this requires you to build a second dictionary. I'm not entirely sure what happens on dictionary creation, but it's probably doing unnecessary work building that second dictionary. What I'm looking for is mass insertion of unhashed stuff without constructing an extra dict.
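
For reference, a minimal sketch of that kind of direct insertion using only Base functions: push! on a Dict accepts a single Pair, so an iterable of pairs can go straight into the existing table with no temporary Dict being built.

julia> d1 = Dict(1:1000 .=> rand.());

julia> for p in (900:1100 .=> rand.())
           push!(d1, p)   # equivalent to d1[first(p)] = last(p); no second Dict
       end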

Alex Ames (Feb 04 2021 at 20:00):

I think looping over the pairs and assigning d[k] = v is your best bet. The time to insert a single element is ~10 ns, and I'm not sure there's any way to escape linear scaling in the number of inserted elements.

julia> function mergepairs!(d, p)
           for (k, v) in p
               d[k] = v
           end
       end
mergepairs! (generic function with 1 method)

julia> d3 = 900:1100 .=> rand.();

julia> @btime mergepairs!(d1, d3) setup=(d1 = Dict(1:1000 .=> rand.()));
  2.256 μs (0 allocations: 0 bytes)
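
If the incoming collection has a known length, one further tweak worth sketching (not from the thread, and mergepairs_hinted! is a made-up name): sizehint! from Base reserves table capacity up front, so the loop does not have to grow and rehash the table incrementally as it inserts.

julia> function mergepairs_hinted!(d, p)
           sizehint!(d, length(d) + length(p))   # grow the table once, up front
           for (k, v) in p
               d[k] = v
           end
           return d
       end
mergepairs_hinted! (generic function with 1 method)

For a 201-element batch like d3 the difference is unlikely to be measurable, but it can help when the batch is large relative to the existing table.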

Rafael Fourquet (Feb 07 2021 at 11:12):

Yes, I'm pretty sure there is no optimised routine in Base for bulk insertion. And I'm not sure what one would look like... maybe you could do better when multiple keys hash to the same slot, but it's not clear to me that the resulting book-keeping would beat sequential insertion.

