Stream: helpdesk (published)

Topic: ✔ How to free memory reliably in a loop?


Júlio Hoffimann (Feb 15 2024 at 13:56):

Suppose I have a script like this:

```julia
# loop with 10 datasets
for i in 1:10
  # load huge dataset from i-th file
  dataset = load("input$i.dat")

  # process the data somehow
  derived = func(dataset)

  # save results to disk
  write("output$i.dat", derived)
end
```

Each iteration works on its own when I run it manually in a session, but the Julia GC does not free memory when the same code runs inside the loop.

After the first iteration (i=1), memory is almost full. In the second iteration (i=2), Julia crashes because it loads another huge dataset while the first one is still held in memory, instead of reusing the memory of the existing variable.

What is the most reliable way to get the GC to free this memory? I want to run each iteration independently; there is no need to share memory across i = 1, 2, 3, ..., 10.

I am considering wrapping the whole script in a function, but am not sure if that is the best approach.

mbaz (Feb 15 2024 at 14:17):

I'd preallocate dataset and use Random.rand!, but that may not be possible in your actual use case.
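A minimal sketch of the preallocation idea, assuming the data fits in a fixed-size Float64 array: the buffer is allocated once and overwritten in place with Random.rand! on every iteration, so no new large array is created per iteration. The function name process_preallocated, the buffer size, and using sum as the processing step are illustrative assumptions, not part of the thread.

```julia
using Random

# Sketch: allocate one buffer up front and refill it in place each
# iteration, instead of allocating a fresh array every time.
# `n`, `iterations`, and `sum` as the processing step are assumptions.
function process_preallocated(n::Int, iterations::Int)
    buffer = Vector{Float64}(undef, n)            # allocated once
    results = Vector{Float64}(undef, iterations)
    for i in 1:iterations
        rand!(buffer)                             # overwrite the same memory
        results[i] = sum(buffer)                  # stand-in for func(dataset)
    end
    return results
end

results = process_preallocated(1_000, 10)
```

This only helps when every dataset has the same size and element type, which is why it may not fit a real load-from-file workflow.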

Júlio Hoffimann (Feb 15 2024 at 14:17):

Yes, that is not possible; I only used rand as an example. I will replace the example...

mbaz (Feb 15 2024 at 14:18):

I wonder if wrapping the loop body in a let block (which creates a new scope) would make Julia reuse the memory.
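A sketch of what the let variant could look like, with hypothetical stand-ins make_dataset and summarize in place of load and func. Binding the variables in the let header guarantees they are fresh locals of the block; a plain assignment inside the let body may instead update an existing global of the same name when run in the REPL. The thread does not establish whether this alone lets the GC reclaim each dataset in time.

```julia
# Hypothetical stand-ins for the thread's `load` and `func`.
make_dataset(i) = fill(Float64(i), 1_000)
summarize(data) = sum(data)

totals = Float64[]
for i in 1:10
    # Variables bound in the `let` header are always new locals,
    # so no leftover global named `dataset` or `derived` is touched.
    let dataset = make_dataset(i), derived = summarize(dataset)
        push!(totals, derived)
    end
end
```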

Júlio Hoffimann (Feb 15 2024 at 14:19):

I updated the example above. My current attempt is to wrap the loop body in a function:

```julia
function simulate(i)
  # load huge dataset from i-th file
  dataset = load("input$i.dat")

  # process the data somehow
  derived = func(dataset)

  # save results to disk
  write("output$i.dat", derived)
end

for i in 1:10
  simulate(i)
end
```
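A self-contained, runnable version of this function-barrier approach, under stated assumptions: load_dataset (reads a file's raw bytes) and process_data (adds one to each byte) are hypothetical stand-ins for load and func, and the temporary directory with three fake input files exists only so the sketch runs. Inside simulate, dataset and derived are locals, so once each call returns nothing references them and they become eligible for collection before the next iteration.

```julia
# Hypothetical stand-ins for the thread's `load` and `func`.
load_dataset(path) = read(path)          # returns Vector{UInt8}
process_data(data) = data .+ 0x01        # new array; UInt8 wraps at 0xff

function simulate(dir, i)
    # `dataset` and `derived` are locals of this call only
    dataset = load_dataset(joinpath(dir, "input$i.dat"))
    derived = process_data(dataset)
    write(joinpath(dir, "output$i.dat"), derived)
    return nothing                       # don't hand large objects back to the caller
end

# Fake input files so the sketch is runnable.
dir = mktempdir()
for i in 1:3
    write(joinpath(dir, "input$i.dat"), fill(UInt8(i), 1_000))
end

for i in 1:3
    simulate(dir, i)
end
```

Returning nothing from simulate is a deliberate choice: the caller never ends up holding a reference to a large array.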

Júlio Hoffimann (Feb 15 2024 at 14:19):

It seems that the GC is doing a better job now. Let's see if it works this time.

Júlio Hoffimann (Feb 15 2024 at 14:20):

I wonder if Julia (or VSCode) should do this by default, i.e., wrap the code of a script in a "virtual" function called "vscode_main" for better GC performance.

Júlio Hoffimann (Feb 15 2024 at 14:21):

In that case, sending the script to the REPL for execution would be equivalent to wrapping the code in a function and calling that function instead.
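A sketch of doing this wrapping by hand: the whole script body, loop included, goes into a function (named main here by assumption) that is then called once. load_numbers and double are hypothetical stand-ins for load and func; the point is only that every variable becomes a local of main instead of a global of the session.

```julia
# Hypothetical stand-ins for the thread's `load` and `func`.
load_numbers(i) = collect(1.0:i)
double(data) = 2 .* data

function main()
    checksums = Float64[]
    for i in 1:10
        dataset = load_numbers(i)       # local to `main`, not a session global
        derived = double(dataset)
        push!(checksums, sum(derived))  # keep only a small summary
    end
    return checksums
end

checksums = main()
```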

Júlio Hoffimann (Feb 15 2024 at 14:37):

Wrapping the loop body in a function has solved the problem so far: the code is already at iteration 5.

Notification Bot (Feb 15 2024 at 14:38):

Júlio Hoffimann has marked this topic as resolved.

Eric Hanson (Feb 15 2024 at 18:09):

FWIW, I think Pluto does some automatic wrapping of code in functions, and it seems to work there. I think the VSCode extension tends to want to be less magical, though.

Leandro Martínez (Feb 17 2024 at 00:13):

What about adding dataset = nothing; GC.gc() at the end of each iteration? Would that be reliable?

Júlio Hoffimann (Feb 17 2024 at 02:01):

It is not reliable; multiple threads on Discourse discuss this.
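For reference, a sketch of what the dataset = nothing; GC.gc() suggestion looks like, with a hypothetical make_big standing in for load. It is placed inside a function only so it runs as-is. Per the reply above, this is not a reliable fix on its own, and GC.gc() forces a full collection by default, which is slow when called on every iteration.

```julia
# Hypothetical stand-in for `load`: about 8 MB per call.
make_big(i) = fill(Float64(i), 1_000_000)

function run_all(n)
    total = 0.0
    for i in 1:n
        dataset = make_big(i)
        total += sum(dataset)
        dataset = nothing   # drop this binding's reference to the array
        GC.gc()             # request a full collection (slow; use sparingly)
    end
    return total
end

total = run_all(3)
```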

Lilith Hafner (Feb 20 2024 at 19:13):

https://docs.julialang.org/en/v1/manual/performance-tips/#Performance-critical-code-should-be-inside-a-function


Last updated: Nov 22 2024 at 04:41 UTC