Stream: helpdesk (published)

Topic: efficient bulk insert for dictionary


view this post on Zulip Expanding Man (Feb 04 2021 at 15:51):

Is there a method for efficiently inserting a large number of elements into a dictionary?

view this post on Zulip Alex Ames (Feb 04 2021 at 19:18):

I think your best bet is merge:

julia> d2 = Dict(900:1100 .=> rand.());

julia> @btime push!(d1, $d2...) setup=(d1 = Dict(1:1000 .=> rand.()));
  456.000 μs (10716 allocations: 496.19 KiB)

julia> @btime merge!(d1, $d2) setup=(d1 = Dict(1:1000 .=> rand.()));
  2.711 μs (0 allocations: 0 bytes)

view this post on Zulip Expanding Man (Feb 04 2021 at 19:20):

Unless I'm missing something here, I think the problem is that this requires you to build a second dictionary. I'm not entirely sure what happens on dictionary creation, but it probably is doing unnecessary work building that second dicitonary. What I'm looking for is a mass insertion of unhashed stuff without constructing an extra dict

view this post on Zulip Alex Ames (Feb 04 2021 at 20:00):

I think looping through d[k] = v is your best bet. The time to insert a single element is ~10 ns, and I'm not sure there's any way to escape linear scaling on the number of inserted elements.

julia> function mergepairs!(d, p)
           for (k, v) in p
               d[k] = v
           end
       end
mergepairs! (generic function with 1 method)

julia> d3 = 900:1100 .=> rand.();

julia> @btime mergepairs!(d1, d3) setup=(d1 = Dict(1:1000 .=> rand.()));
  2.256 μs (0 allocations: 0 bytes)

view this post on Zulip Rafael Fourquet (Feb 07 2021 at 11:12):

Yes I'm pretty sure there is no optimised routine implemented in Base for bulk insertion. And I'm not sure what that would look like... maybe doing better when multiple keys hash the same but it's not clear to me that the resulting book-keeping would be better than sequential insert.


Last updated: Nov 22 2024 at 04:41 UTC