Suppose I have a script like this:
# loop with 10 datasets
for i in 1:10
    # load huge dataset from i-th file
    dataset = load("input$i.dat")
    # process the data somehow
    derived = func(dataset)
    # save results to disk
    write("output$i.dat", derived)
end
Each individual iteration works when I run the code manually in a session, but Julia's GC does not free the memory when the code is placed in the loop.
After the first iteration (i=1), the memory is almost full. In the second iteration (i=2), Julia crashes because it tries to load another huge dataset instead of overwriting the existing variable.
What is the most reliable way to get the GC to free the memory in this case? Each iteration is independent, so there is no need to share memory across i=1,2,3,...,10.
I am considering wrapping the whole script in a function, but am not sure if that is the best approach.
I'd preallocate dataset and use Random.rand!, but that may not be possible in your actual use case.
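A minimal sketch of that preallocation idea, assuming the data can be generated in place (which only applies to the rand-based version of the example). The names buffer and sums and the array size are made up for illustration:

```julia
using Random

# Allocate the working array once, outside the loop (size is illustrative).
buffer = Vector{Float64}(undef, 1_000)
sums = Float64[]

for i in 1:3
    # rand! overwrites the existing array in place, so no new array is
    # allocated per iteration.
    rand!(buffer)
    push!(sums, sum(buffer))
end
```

Since the same array is reused in every iteration, there is nothing large for the GC to collect between iterations.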
Yes, that is not possible, I used rand as an example. Will replace the example...
I wonder if wrapping the loop body in a let (creating a new scope) would make Julia re-use the memory.
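For reference, the let idea would look roughly like the sketch below. load_data and process are hypothetical stand-ins for the real loading and processing steps, and nothing in this thread confirms that the extra scope changes what the GC does:

```julia
# Hypothetical stand-ins for the real load/func from the question.
load_data(i) = fill(Float64(i), 1_000)
process(x) = sum(x)

results = Float64[]
for i in 1:3
    # `dataset` is only visible inside the let block; once the block ends,
    # nothing in the script refers to the array anymore.
    let dataset = load_data(i)
        push!(results, process(dataset))
    end
end
```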
I updated the example above. My current attempt is to wrap the loop body into a function:
function simulate(i)
    # load huge dataset from i-th file
    dataset = load("input$i.dat")
    # process the data somehow
    derived = func(dataset)
    # save results to disk
    write("output$i.dat", derived)
end
for i in 1:10
    simulate(i)
end
It seems that the GC is doing a better job now. Let's see if it works this time.
I wonder if Julia (or VSCode) should do this by default, i.e., wrap code that was written in a script into a "virtual" function called "vscode_main" for better GC performance.
In that case, sending the script to the REPL for execution would be equivalent to wrapping the code in a function, and calling the function instead.
Wrapping the loop body into a function solved the problem so far. The code is in iteration 5 already.
Júlio Hoffimann has marked this topic as resolved.
FWIW, I think Pluto automatically wraps code in functions, and that seems to work there. I think the VSCode extension tends to want to be less magical, though.
What about adding dataset = nothing; GC.gc() at the end of each iteration? Would that be reliable?
It is not reliable. Multiple threads on Discourse discuss this.
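For concreteness, the pattern being asked about would look roughly like this. load_data and process are hypothetical stand-ins for the real steps. GC.gc() only requests a collection, and as said above, it should not be relied on to bring memory usage back down:

```julia
# Hypothetical stand-ins for the real load/func from the question.
load_data(i) = fill(Float64(i), 1_000)
process(x) = sum(x)

totals = Float64[]
for i in 1:3
    dataset = load_data(i)
    push!(totals, process(dataset))
    # Drop the only reference to the large array, then ask for a collection.
    # This is a request to the GC, not a guarantee that memory is returned.
    dataset = nothing
    GC.gc()
end
```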
Last updated: Oct 23 2025 at 04:41 UTC