Friends, I've encapsulated a series of data transformations in functions and chained them in a pipe. At the end of the pipe, I get the final dataframe. I'm running this code in a notebook (for testing and tweaking). For some reason, every time I run the pipe, the memory used by the notebook increases, and I don't understand why. I thought that, since everything happens inside functions, it shouldn't keep taking up memory once the code has run. Here is the code:
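using CSV, DataFrames, DrWatson, Pipe   # datadir comes from DrWatson, @pipe from Pipe.jl

# Read the raw CSV (no header row) and name its four columns.
# `ntasks` is a global defined elsewhere.
function importmovements()
    movements = CSV.read(datadir("exp_pro") * "/csvs/movements.csv", DataFrame,
                         header = false, ntasks = ntasks)
    colsmovement = ["number", "date", "title", "text"]
    rename!(movements, colsmovement)
    return movements
end

# Add a sequential row id.
function adicionarid!(movements)
    movements[!, :id] = 1:nrow(movements)
    return movements
end

function adicionaretapasiniciofim!(movements)
    # Last row of each `number` group, tagged as the "inicio" stage
    movinit = combine(groupby(movements, :number),
                      [:date, :title, :text, :id] .=> last,
                      renamecols = false)
    movinit[!, :etapa] .= "inicio"
    movinit[!, :subetapa] .= "inicio"
    # This result is computed but never used (not returned, not displayed)
    combine(groupby(movinit, :title), nrow => :count)
    # Drop the matching rows, then append the tagged versions.
    # vcat needs both frames to have the same column names unless cols = :union is passed.
    movements = antijoin(movements, movinit, on = [:number, :date, :title])
    movements = vcat(movements, movinit)
    filter!(row -> !ismissing(row[:etapa]), movements)
    sort!(movements, [:id, :number, :date], rev = true)
    return movements
end

# formatardata! is defined elsewhere (not shown)
function runpipemovement()
    @pipe importmovements() |> unique!(_) |> adicionarid!(_) |> formatardata!(_) |> adicionaretapasiniciofim!(_)
end

runpipemovement()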
Are you using Pluto notebooks?
Jupyter
Then I have no clue :)
It’s probably because you’re printing the data frame each time
Jupyter stores the output of each evaluation in a dictionary called Out, so you can do empty!(Out) to free up that memory.
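Why an entry in `Out` keeps the data alive can be shown with a minimal, self-contained sketch. It does not use IJulia at all: `fake_out` and `run_cell!` are hypothetical stand-ins for the kernel's `Out` dictionary (keyed by execution count) and for a cell that returns a large result. A `WeakRef`, which does not keep its target alive, is used only to check whether the stored object has been collected. In a real IJulia session, ending the cell with a semicolon (`runpipemovement();`) should also keep the result from being displayed and stored in `Out` in the first place.

```julia
# Hypothetical stand-in for IJulia's `Out`, which maps execution counts to results.
const fake_out = Dict{Int,Any}()

# Simulate a cell whose result is recorded by the kernel.
@noinline function run_cell!(out::Dict{Int,Any}, n::Int)
    result = zeros(10^6)        # ~8 MB, standing in for a large DataFrame
    out[n] = result             # the dictionary now holds a strong reference
    return WeakRef(result)      # a weak reference does not keep `result` alive
end

tracker = run_cell!(fake_out, 1)

GC.gc()                                         # full collection
alive_while_stored = tracker.value !== nothing  # still reachable through fake_out

empty!(fake_out)                                # drop the only strong reference
GC.gc()
alive_after_empty = tracker.value !== nothing   # collected, so the weak ref is cleared

println("alive while stored: ", alive_while_stored)
println("alive after empty!: ", alive_after_empty)
```

The same reasoning applies to any global that still points at the final data frame: as long as something reachable refers to it, the garbage collector cannot free it.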
@Mason Protter, do I just run empty!(Out) in any cell? I've tried it here, but the memory is still high.
You’ll probably need to wait for the GC to run
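Dropping the references only makes the memory collectable; it is reclaimed when the garbage collector next runs. A sketch of forcing a collection with `GC.gc()` and checking the effect with `Base.gc_live_bytes()` (the bytes the GC currently considers live) follows; `holder` is a hypothetical stand-in for whatever keeps the data frame reachable. Note that the memory usage reported by the operating system for the kernel process may not drop right away even after a collection, because freed memory is not necessarily returned to the OS immediately.

```julia
# Hypothetical holder standing in for whatever keeps the result reachable (e.g. Out).
holder = Any[]

push!(holder, rand(10^7))        # keep ~80 MB of Float64 values reachable
GC.gc()                          # full collection so the baseline is stable
live_before = Base.gc_live_bytes()

empty!(holder)                   # the array is now unreachable
GC.gc()                          # force a full collection instead of waiting for one
live_after = Base.gc_live_bytes()

println("freed ≈ ", (live_before - live_after) ÷ 2^20, " MiB")
```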
I've checked the notebook file, and it's not storing the dataframe.
Just an idea, but did you check without using @pipe?
No. I'll try.
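`@pipe` from Pipe.jl is a macro that rewrites the chain into ordinary nested calls when the code is parsed, so it should not by itself hold on to any data. Because every step in `runpipemovement` takes the data frame as its only argument, the chain can also be written with Base's `|>` and no macro at all. The minimal sketch below uses hypothetical vector-based steps (`load`, `keepeven!`, `double!`) in place of the DataFrame functions to show that both spellings compute the same thing.

```julia
# Hypothetical stand-ins for importmovements, adicionarid!, etc.
load() = collect(1:10)
keepeven!(v) = filter!(iseven, v)      # filter! mutates v and returns it
double!(v) = (v .*= 2; v)              # in-place broadcast, then return v

# Plain nested calls: roughly what the @pipe chain is rewritten into
run_nested() = double!(keepeven!(load()))

# Base |> works when every step takes the data as its single argument
run_piped() = load() |> keepeven! |> double!

println(run_piped())
```

Applied to the original code, that would be `importmovements() |> unique! |> adicionarid! |> formatardata! |> adicionaretapasiniciofim!`, assuming `formatardata!` takes the data frame as its only argument, as the `formatardata!(_)` call suggests.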
The very last thing I would try is copying movements at the end of adicionaretapasiniciofim! and seeing if that solves the problem. The only other thing you seem to create is movinit, in case it is carried out of the function somehow, but I'm essentially brainstorming here :)
Davi Sales Barreira said:
I've checked the notebook file, and it's not storing the dataframe.
It’s not in the notebook file itself
It’s an object in the Jupyter kernel session
Oh, OK. Thanks, I'll try again. BTW, are there any tools to inspect this sort of thing? I've tried ProfileSVG.jl, but didn't quite understand the output.
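ProfileSVG.jl draws flame graphs from sampled profile data, which mainly shows where time is spent rather than what stays in memory. A few built-in helpers report memory instead: `Base.summarysize` (approximate bytes reachable from an object), `@allocated` (bytes allocated while evaluating an expression), `Base.gc_live_bytes()` (bytes the GC currently considers live), `Sys.maxrss()` (peak resident set size of the process) and `varinfo()` from InteractiveUtils (globals in a module and their sizes). The sketch below applies them to a hypothetical `result` vector standing in for the final data frame. Julia 1.8 and later also ship an allocation profiler, `Profile.Allocs`.

```julia
using InteractiveUtils   # provides varinfo(); already loaded in the REPL and in IJulia

# Hypothetical stand-in for the final data frame kept in the session
result = rand(10^6)                      # 10^6 Float64 values, about 8 MB

# Approximate number of bytes reachable from `result`
size_bytes = Base.summarysize(result)

# Bytes allocated while evaluating an expression (here: a temporary vector)
alloc_bytes = @allocated sum(rand(10^5))

println("summarysize:   ", size_bytes ÷ 2^20, " MiB")
println("allocated:     ", alloc_bytes, " bytes")
println("GC live bytes: ", Base.gc_live_bytes() ÷ 2^20, " MiB")
println("peak RSS:      ", Sys.maxrss() ÷ 2^20, " MiB")

# Table of global variables in Main and their sizes
display(varinfo())
```

Running `varinfo()` in the notebook after a few pipeline runs is a quick way to spot a large global (including `Out`) that is still holding data.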
Last updated: Nov 06 2024 at 04:40 UTC