Stream: helpdesk (published)

Topic: Identifying what is causing increase in memory.


Davi Sales Barreira (May 04 2022 at 16:40):

Friends, I've encapsulated a series of data transformations in functions and created a pipe to perform the transformations. By the end of the pipe, I get the final dataframe. Now, I'm running this code in a notebook (for testing and tweaking). For some reason, every time I run the pipe, the memory used by the notebook increases. I don't understand why. I thought that, since everything is inside functions, it should not keep taking up memory once the code has run. Here is the code:

function importmovements()
    movements = CSV.read(datadir("exp_pro") * "/csvs/movements.csv", DataFrame, header = false, ntasks = ntasks)
    colsmovement = ["number", "date", "title", "text"]
    rename!(movements, colsmovement)
    return movements
end

function adicionarid!(movements)
    movements[!,:id] = 1:size(movements)[1]
    return movements
end

function adicionaretapasiniciofim!(movements)
    movinit = combine(groupby(movements, :number),
        [:date, :title, :text, :id] .=> last,
        renamecols = false)
    movinit[!, :etapa] .= "inicio"
    movinit[!, :subetapa] .= "inicio"


    combine(groupby(movinit, :title), nrow => :count)
    movements = antijoin(movements, movinit, on = [:number, :date, :title])
    movements = vcat(movements, movinit)
    filter!(row -> !ismissing(row[:etapa]), movements)
    sort!(movements, [:id, :number, :date], rev = true)
    return movements
end


function runpipemovement()
    @pipe importmovements() |> unique!(_) |> adicionarid!(_) |> formatardata!(_) |> adicionaretapasiniciofim!(_) ;
end

runpipemovement()

Giovanni (May 04 2022 at 16:42):

Are you using Pluto notebooks?

Davi Sales Barreira (May 04 2022 at 16:42):

Jupyter

Giovanni (May 04 2022 at 16:49):

Then I have no clue :)

Mason Protter (May 04 2022 at 16:58):

It’s probably because you’re printing the data frame each time

Mason Protter (May 04 2022 at 16:58):

Jupyter stores the output of each evaluation in a dictionary called Out, which lives in the kernel session
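
A minimal sketch of one way to keep results from piling up between runs, assuming an IJulia kernel. The definition of runpipemovement below is a hypothetical stand-in so the snippet runs on its own; in the notebook it would be the function from the original post. The claim that a trailing semicolon keeps the value out of Out is an assumption about IJulia's behavior and worth checking with length(Out) in your own session.

```julia
# Hypothetical stand-in for the pipeline in the original post,
# defined here only so the sketch is self-contained.
runpipemovement() = rand(1_000, 4)

# Bind the result to one name and end the line with `;`.
# The semicolon suppresses display; as far as I know, IJulia then also
# skips storing the value in `Out` (assumption, verify in your session).
movements = runpipemovement();

# Re-running the cell rebinds `movements`, so the previous result
# becomes unreachable and can be garbage collected.
movements = runpipemovement();

# Look at a small summary instead of returning the whole table.
println(size(movements))
```

Reusing a single variable name means at most one copy of the result stays reachable from the notebook's global scope.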

Mason Protter (May 04 2022 at 16:59):

you can do empty!(Out) to free up that memory

Davi Sales Barreira (May 04 2022 at 17:09):

@Mason Protter, do I just run empty!(Out) in any cell? I've tried it here, but the memory usage is still high.

Mason Protter (May 04 2022 at 17:09):

You’ll probably need to wait for the GC to run
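
A rough sketch of the effect described here, runnable outside a notebook. The dictionary named history is a hypothetical stand-in for IJulia's Out, and make_table is a hypothetical stand-in for the pipeline; in a real notebook you would call empty!(Out) directly. GC.gc() asks for a collection right away instead of waiting for one, though the operating system may still report a high figure for the process afterwards.

```julia
# Hypothetical stand-in for IJulia's `Out` dictionary.
history = Dict{Int,Any}()

# Hypothetical stand-in for the pipeline; each call returns about 8 MB.
make_table() = rand(1_000_000)

# Mimic the kernel keeping the result of every cell evaluation.
for n in 1:5
    history[n] = make_table()
end
println("held before: ", Base.summarysize(history) ÷ 2^20, " MiB")

# Drop the references, then request a full garbage collection.
empty!(history)
GC.gc()
println("held after:  ", Base.summarysize(history) ÷ 2^20, " MiB")
```

Emptying the dictionary only removes the references; the memory itself is reclaimed when the collector runs.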

Davi Sales Barreira (May 04 2022 at 17:10):

I've checked the notebook file, and it's not storing the dataframe.

Giovanni (May 04 2022 at 17:18):

Just an idea, but did you check without using @pipe?

Davi Sales Barreira (May 04 2022 at 17:19):

No. I'll try.

Giovanni (May 04 2022 at 17:25):

The very last thing I would try is copying movements at the end of adicionaretapasiniciofim! to see if that solves the problem. The only other thing you seem to create is movinit, in case it is somehow carried out of the function. But I'm essentially brainstorming here :)

Mason Protter (May 04 2022 at 17:32):

Davi Sales Barreira said:

I've checked the notebook file, and it's not storing the dataframe.

It’s not in the notebook file itself

Mason Protter (May 04 2022 at 17:32):

It’s an object in the Jupyter kernel session

Davi Sales Barreira (May 04 2022 at 17:49):

Oh, ok. Thanks, I'll try again. BTW, are there any tools to inspect this sort of thing? I've tried ProfileSVG.jl, but didn't quite understand the output.
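
The thread ends without an answer to this question. As a hedged starting point, the sketch below uses only functions that ship with Julia to see which objects are large and how much memory the process holds. The variable big is a hypothetical stand-in for a large DataFrame. ProfileSVG.jl draws flame graphs of where time is spent, which is a different question from which objects are holding memory.

```julia
using InteractiveUtils   # provides varinfo(); already loaded in the REPL and in notebooks

# Hypothetical stand-in for a large DataFrame (about 8 MB).
big = rand(1_000_000)

# Size of one object, including everything reachable from it.
println("big: ", Base.summarysize(big) / 2^20, " MiB")

# Bytes allocated while evaluating a single expression.
bytes = @allocated rand(1_000)
println("allocated: ", bytes, " bytes")

# Peak resident memory of the Julia process, and bytes currently live in the GC heap.
println("max RSS: ", Sys.maxrss() / 2^20, " MiB")
println("GC live: ", Base.gc_live_bytes() / 2^20, " MiB")

# Table of the global variables in Main, sorted by size.
display(varinfo(sortby = :size))
```

In a notebook, Base.summarysize(Out) applied to the real Out dictionary would show how much the stored cell results account for.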


Last updated: Nov 06 2024 at 04:40 UTC