Suppose I have a script like this:
# loop with 10 datasets
for i in 1:10
    # load huge dataset from i-th file
    dataset = load("input$i.dat")
    # process the data somehow
    derived = func(dataset)
    # save results to disk
    write("output$i.dat", derived)
end
Each individual iteration works when I run the code manually in a session, but Julia's GC does not free the memory when the code is placed in the loop.
After the first iteration (i=1), the memory is almost full. In the second iteration (i=2), Julia crashes because it tries to load another huge dataset instead of reusing the memory held by the existing variable.
What is the most reliable way to make the GC free memory in this case? I want to run each iteration independently; there is no need to share memory across i=1,2,3,...,10.
I am considering wrapping the whole script in a function, but am not sure if that is the best approach.
I'd preallocate dataset and use Random.rand!, but that may not be possible in your actual use case.
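A minimal sketch of the preallocation idea, assuming the data can be generated in place as in the original rand-based example. The function name fill_and_sum, the buffer size, and the sum step are hypothetical stand-ins, not part of the thread.

```julia
using Random

# Hypothetical helper: allocate one buffer and refill it on every iteration.
function fill_and_sum(n, iterations)
    buffer = Vector{Float64}(undef, n)   # allocated once, before the loop
    sums = Float64[]
    for _ in 1:iterations
        rand!(buffer)                    # overwrite the same memory in place
        push!(sums, sum(buffer))         # stand-in for the real processing step
    end
    return sums
end

sums = fill_and_sum(1_000, 10)
```

Because the buffer is reused, the loop itself allocates no new large arrays, so nothing needs to be collected between iterations.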
Yes, that is not possible, I used rand as an example. Will replace the example...
I wonder if wrapping the loop body in a let block (creating a new scope) would make Julia re-use the memory.
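For illustration, the let idea might look roughly like the sketch below. load_data and process are hypothetical stand-ins for the load and func calls in the question, and whether this changes when the memory is reclaimed is not established in this thread.

```julia
# Hypothetical stand-ins for the thread's load and func.
load_data(i) = fill(Float64(i), 1_000)
process(data) = sum(data)

results = Float64[]
for i in 1:3
    let dataset = load_data(i)
        # dataset is local to this let block and unreachable after its end
        push!(results, process(dataset))
    end
end
```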
I updated the example above. My current attempt is to wrap the loop body into a function:
function simulate(i)
    # load huge dataset from i-th file
    dataset = load("input$i.dat")
    # process the data somehow
    derived = func(dataset)
    # save results to disk
    write("output$i.dat", derived)
end
for i in 1:10
    simulate(i)
end
It seems that the GC is doing a better job now. Let's see if it works this time.
I wonder if Julia (or VSCode) should do this by default, i.e., wrap code that was written in a script into a "virtual" function called "vscode_main" for better GC performance.
In that case, sending the script to the REPL for execution would be equivalent to wrapping the code in a function, and calling the function instead.
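Done by hand, that wrapping could look like the sketch below: the whole script body lives in a function and the file only calls it. The name main, the temporary output directory, and the rand/cumsum stand-ins are assumptions for the sake of a runnable example; this is not something Julia or the VSCode extension does automatically.

```julia
# Hypothetical entry point holding what would otherwise be top-level script code.
function main(outdir)
    for i in 1:3
        dataset = rand(1_000)        # stand-in for load("input$i.dat")
        derived = cumsum(dataset)    # stand-in for func(dataset)
        write(joinpath(outdir, "output$i.dat"), derived)
    end
    return nothing
end

outdir = mktempdir()   # temporary directory so the sketch leaves no files behind
main(outdir)
```

All intermediate variables are locals of main, so none of them stays bound as a global after the call returns.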
Wrapping the loop body into a function solved the problem so far. The code is in iteration 5 already.
Júlio Hoffimann has marked this topic as resolved.
FWIW I think Pluto does some automatic wrapping of code in functions, and it seems to work there. I think the VSCode extension tends to want to be less magical, though.
What about adding dataset = nothing; GC.gc() at the end of each iteration? Would that be reliable?
It is not reliable. Multiple threads on Discourse discuss this.
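For reference, the pattern being asked about would look something like this sketch, with rand standing in for the real load call and checks as a hypothetical bookkeeping vector. GC.gc() only requests a collection, and per the answer above it should not be relied on to bring memory usage down.

```julia
checks = Int[]
for i in 1:3
    dataset = rand(1_000)            # stand-in for load("input$i.dat")
    push!(checks, length(dataset))   # stand-in for processing and saving
    dataset = nothing                # drop this binding's reference to the array
    GC.gc()                          # request a full garbage collection
end
```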
Last updated: Nov 06 2024 at 04:40 UTC