I have an algorithm that needs to allocate several integer vectors of random size (around 10,000 elements), mutate them in calculations, and then discard them.
Is it possible to make these vectors stack allocated? From reading through the thread on StrideArrays it seem like it might be.
Currently they represent a large fraction of my code's memory usage which I think is killing my multithreaded performance.
Here is a representative example:
function g() #would like to reduce allocations here
l = rand(9000:10_000) #large random integer
arr = Vector{Int}(undef,l) #array of length l
#do some stuff with arr
accum = 0
arr .= 1
for k in arr
accum += k + f(arr)
end
return accum #discard arr
end
function f(arr)
return arr[1]*2
end
@allocated g()
thanks!
Can you pre-allocate and re-use one single vector? You could pre-allocate a 10_000-long vector and keep track of the actual length separately.
Oh, maybe! Not one single vector, but a few of them sure. I could then use views of them that are dynamically sized?
Worth a try I'd guess :smile:
Yes, you can take dynamically sized views of arrays. Preallocating is probably your best approach here.
Currently StrideArrays.jl cannot make dynamically sized stack allocated arrays.
E.g.:
julia> function g!(_arr)
l = rand(9000:10_000) #large random integer
arr = @view _arr[1:l]
#do some stuff with arr
accum = 0
arr .= 1
for k in arr
accum += k + f(arr)
end
return accum
end
g! (generic function with 1 method)
julia> function f(arr)
return arr[1]*2
end
f (generic function with 1 method)
julia> let _arr = Vector{Int}(undef, 10_000)
@benchmark g!($_arr)
end
BechmarkTools.Trial: 10000 samples with 3 evaluations.
Time (median): 9.140 μs ┊ GC (median): 0.00%
Time (mean ± σ): 9.202 μs ± 479.145 ns ┊ GC (mean ± σ): 0.00% ± 0.00%
Range (min … max): 8.437 μs … 17.377 μs ┊ GC (min … max): 0.00% … 0.00%
▂▄▅▇███▇▆▅▃▂▁
▁▁▁▁▂▃▄▅▇███████████████▅▅▄▃▃▂▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁
8.44 μs Histogram: frequency by time 11.1 μs
Memory estimate: 0 bytes, allocs estimate: 0.
compared to
julia> @benchmark g()
BechmarkTools.Trial: 10000 samples with 1 evaluations.
Time (median): 13.290 μs ┊ GC (median): 0.00%
Time (mean ± σ): 16.383 μs ± 30.578 μs ┊ GC (mean ± σ): 6.87% ± 3.74%
Range (min … max): 9.490 μs … 1.189 ms ┊ GC (min … max): 0.00% … 96.94%
▃▅▇██▇▅▄▅▆▆▆▄▃▂▁▁▁▁▁ ▁▁▁▁▁ ▁▁
▅▅████████████████████████▇██████████▇▇▇▇▆███████▇▇▅▅▆▅▅▅▅▆▅▅▅▆
9.49 μs Histogram: log(frequency) by time 37.8 μs
Memory estimate: 70.39 KiB, allocs estimate: 2.
Yeah, I'd reccommend the views-of-a-huge vector approach.
But 10_000 Int
s isn't too large to be stack allocated either. If that's your upper limit, you could also stack allocate a vector that large, and then take a view of it.
Thanks all.
This makes me wonder, would it be possible to write a package that takes upper-bound for the memory usage of my code and then gives me all efficient allocations? I have nowhere near the understanding of memory management to write such a package, just curious.
Generally yes! Just make however many Array
s that are bigger than you need at the beginning, and then keep re-using them (or views into them) in-place. If you can get everything to be in-place after the initial allocation, you could in principle even disable the GC if you wanted.
Oh sorry, you mean a package that does all that for you.. @Lyndon White 's https://github.com/oxinabox/AutoPreallocation.jl is not too far off from that IIRC!
How do you get this histogram output with BenchmarkTools.jl?
https://github.com/JuliaCI/BenchmarkTools.jl/pull/217
https://discourse.julialang.org/t/ann-benchmarkhistograms-jl/61653
Last updated: Nov 06 2024 at 04:40 UTC