If I have a vector of sorted vectors, do we have an efficient way to combine those into one sorted vector? i.e. basically mergesort them? cc @Lilith Hafner
Or alternatively if it makes things easier, I could provide a v::Vector{T}
, as well as a rs::Vector{<:UnitRange}
such that for each r
in rs
, then
v[r]` would already be sorted
I know this isn't really a helpful answer, but shouldn't it be fairly straightforward to implement a linear merge? like maintain a pointer to the current index in each subvector, and iterate calling findmin
and push the value and incr appropriate pointer
Sure. It just would surprise me a lot if there wasn't something already out there that was usable for this
the top answer here seems nice https://stackoverflow.com/questions/76577386/julia-merging-multiple-streams-in-sorted-order-using-iterators
Or alternatively if it makes things easier, I could provide a
v::Vector{T}
, as well as...
I mean... is sort!(v)
fast enough for you? If T
is Float64, Int, or some other radixable type and length(rs)
is greater than 5-10, then I'd guess there's not too much room for improvement over sort!
. If you need every drop of perf, or length(v)
is very long, length(rs)
is short, and/or comparisons are more expensive, then you may have to DIY. The fastest approach I can think of is PagedMergeSort with a custom base case for in place, or a merge sort with scratch space if sorting repeatedly. Make sure to merge the shorter runs with eachother first.
Theoretically, this is exactly the sort of input that TimSort is best at. You could try sort!(v; alg=SortingAlgorithms.TimSort)
, or even dig into the timsort internals and provide rs
to skip the initial run-detection phase.
I would have thought Base.Sort.MergeSort
contains an implementation of merge_sorted
, no?
No, it's all inlined into the function body
@less sort!([1,2,3], 0, 1, MergeSort, Base.Order.Forward)
function sort!(v::AbstractVector{T}, lo::Integer, hi::Integer, a::MergeSortAlg, o::Ordering,
t0::Union{AbstractVector{T}, Nothing}=nothing) where T
@inbounds if lo < hi
hi-lo <= SMALL_THRESHOLD && return sort!(v, lo, hi, SMALL_ALGORITHM, o)
m = midpoint(lo, hi)
t = t0 === nothing ? similar(v, m-lo+1) : t0
length(t) < m-lo+1 && resize!(t, m-lo+1)
Base.require_one_based_indexing(t)
sort!(v, lo, m, a, o, t)
sort!(v, m+1, hi, a, o, t)
i, j = 1, lo
while j <= m
t[i] = v[j]
i += 1
j += 1
end
i, k = 1, lo
while k < j <= hi
if lt(o, v[j], t[i])
v[k] = v[j]
j += 1
else
v[k] = t[i]
i += 1
end
k += 1
end
while k < j
v[k] = t[i]
k += 1
i += 1
...
Could be useful to pull it out, similar to https://github.com/JuliaCollections/SortingAlgorithms.jl/issues/81 for the analogous issue for quicksort
Last updated: Dec 28 2024 at 04:38 UTC