Stream: helpdesk (published)

Topic: sortreduce a collection of sorted vectors


view this post on Zulip Mason Protter (Mar 15 2024 at 15:03):

If I have a vector of sorted vectors, do we have an efficient way to combine those into one sorted vector? i.e. basically mergesort them? cc @Lilith Hafner

view this post on Zulip Mason Protter (Mar 15 2024 at 15:40):

Or alternatively if it makes things easier, I could provide a v::Vector{T}, as well as a rs::Vector{<:UnitRange} such that for each r in rs, then v[r]` would already be sorted

view this post on Zulip Andy Dienes (Mar 15 2024 at 16:29):

I know this isn't really a helpful answer, but shouldn't it be fairly straightforward to implement a linear merge? like maintain a pointer to the current index in each subvector, and iterate calling findmin and push the value and incr appropriate pointer

view this post on Zulip Mason Protter (Mar 15 2024 at 16:38):

Sure. It just would surprise me a lot if there wasn't something already out there that was usable for this

view this post on Zulip Andy Dienes (Mar 15 2024 at 17:32):

the top answer here seems nice https://stackoverflow.com/questions/76577386/julia-merging-multiple-streams-in-sorted-order-using-iterators

view this post on Zulip Lilith Hafner (Mar 19 2024 at 15:13):

Or alternatively if it makes things easier, I could provide a v::Vector{T}, as well as...

I mean... is sort!(v) fast enough for you? If T is Float64, Int, or some other radixable type and length(rs) is greater than 5-10, then I'd guess there's not too much room for improvement over sort!. If you need every drop of perf, or length(v) is very long, length(rs) is short, and/or comparisons are more expensive, then you may have to DIY. The fastest approach I can think of is PagedMergeSort with a custom base case for in place, or a merge sort with scratch space if sorting repeatedly. Make sure to merge the shorter runs with eachother first.

Theoretically, this is exactly the sort of input that TimSort is best at. You could try sort!(v; alg=SortingAlgorithms.TimSort), or even dig into the timsort internals and provide rs to skip the initial run-detection phase.

view this post on Zulip jar (Mar 19 2024 at 18:36):

I would have thought Base.Sort.MergeSort contains an implementation of merge_sorted, no?

view this post on Zulip Lilith Hafner (Mar 19 2024 at 19:20):

No, it's all inlined into the function body

@less sort!([1,2,3], 0, 1, MergeSort, Base.Order.Forward)
function sort!(v::AbstractVector{T}, lo::Integer, hi::Integer, a::MergeSortAlg, o::Ordering,
        t0::Union{AbstractVector{T}, Nothing}=nothing) where T
    @inbounds if lo < hi
        hi-lo <= SMALL_THRESHOLD && return sort!(v, lo, hi, SMALL_ALGORITHM, o)

        m = midpoint(lo, hi)

        t = t0 === nothing ? similar(v, m-lo+1) : t0
        length(t) < m-lo+1 && resize!(t, m-lo+1)
        Base.require_one_based_indexing(t)

        sort!(v, lo,  m,  a, o, t)
        sort!(v, m+1, hi, a, o, t)

        i, j = 1, lo
        while j <= m
            t[i] = v[j]
            i += 1
            j += 1
        end

        i, k = 1, lo
        while k < j <= hi
            if lt(o, v[j], t[i])
                v[k] = v[j]
                j += 1
            else
                v[k] = t[i]
                i += 1
            end
            k += 1
        end
        while k < j
            v[k] = t[i]
            k += 1
            i += 1
...

view this post on Zulip jar (Mar 19 2024 at 20:36):

Could be useful to pull it out, similar to https://github.com/JuliaCollections/SortingAlgorithms.jl/issues/81 for the analogous issue for quicksort


Last updated: Nov 06 2024 at 04:40 UTC