Stream: helpdesk (published)

Topic: Iterating Replacements Over Rows


QuBit (Feb 16 2021 at 23:45):

Hello:

I am trying to edit row values with a for loop, using DataFrames
in Pluto.jl:

begin
    dateformat = DateFormat("y-m")
    for i in eachrow(DF)
        i[:Name] = replace(i[:Name], "M"=>"-")
        i[:Name] = map(i-> Date(i, dateformat), i[:Name])
    end
end

The goal is to replace each "M" with "-" in the first column
and then parse that column as dates using the "y-m" format.

Any suggestions out there?

Expanding Man (Feb 17 2021 at 00:13):

Suggest you do

df[!, :Name] = map(df.Name) do n
    Date(replace(n, "M"=>"-"), dateformat)
end

This is pretty close to what you wrote. The main difference is that it is a purely column-wise operation and does not iterate over rows. Since DataFrames doesn't hold information on the types of its columns, it's much more efficient if you can avoid iterating over rows. To start out, a good rule of thumb is: if what you are doing only involves one column, use that column as you would any other AbstractVector.
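
As a small illustration of that rule of thumb, the same conversion can be tried on a plain vector with no DataFrame involved. This is only a sketch: the sample strings (such as "2020M01") are made up, since the thread does not show the actual data.

```julia
using Dates

# Hypothetical sample values; the real column contents are not shown in the thread.
names_col = ["2020M01", "2020M02", "2021M11"]

dateformat = DateFormat("y-m")

# Replace "M" with "-" and parse each element, as for any AbstractVector.
dates = map(names_col) do n
    Date(replace(n, "M" => "-"), dateformat)
end
```

Assigning such a result back with df[!, :Name] = dates replaces the whole column, which is what lets its element type change from String to Date.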

QuBit (Feb 17 2021 at 09:49):

Thank you, this worked! To paraphrase: iterate down a column
rather than across rows, because types can differ from one column
to the next, which makes row-wise iteration less efficient.
The complete code would be:

begin
    dateformat = DateFormat("y-m")
    SpendDF[!, :State] = map(SpendDF.State) do n
        Date(replace(n, "M"=>"-"), dateformat)
    end
end

QuBit (Mar 01 2021 at 09:10):

Hello Coders:

I am attempting to replace values in my dataframe using:

begin
    for col in eachcol(ED3)
        replace(col, "---" => NaN)
    end
    first(ED3, 5)
end

The code executes, but I do not see any changes
in the records. The idea is to convert every cell
containing "---" to NaN.

I also tried the following, without success:

ED3[ED3 .=> "---"] .= NaN

Any suggestions?
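
One likely reason the loop shows no change can be sketched on a plain vector: replace returns a new collection and leaves its argument untouched, so the result has to be stored somewhere. The vector below is made-up sample data, since the contents of ED3 are not shown in the thread.

```julia
# Hypothetical sample column; the values are stored as strings here.
col = ["1.5", "---", "2.0"]

# `replace` is non-mutating: it returns a new vector and `col` is unchanged.
cleaned = replace(col, "---" => NaN)

# The original still contains "---"; only `cleaned` holds the NaN.
still_has_dashes = "---" in col
```

Inside a loop over columns, the returned vector would therefore need to be assigned back to the data frame. The in-place replace! is not a drop-in fix when the column is a vector of strings, because a NaN cannot be stored in it.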

DrChainsaw (Mar 01 2021 at 09:33):

Try DataFrames.transform. In general, anything that is a common thing to do has a built-in function with (imo) very nice syntax.

QuBit (Mar 01 2021 at 09:35):

Hi Dr.,

Would you say:
df.transform(DF, "---" => "NaN"), to implement the change for the whole of DF?

QuBit (Mar 01 2021 at 09:38):

From the code:

df.transform(ED3[!,4:11], "---" .=> NaN)

The error message I get reads:

ArgumentError: Unrecognized column selector: "---" => NaN
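
The error is consistent with how transform reads its arguments: it expects pairs of the form source columns => function, and "---" => NaN does not fit that shape. A minimal sketch of the expected form follows, assuming DataFrames.jl is installed. The table below is a made-up stand-in for ED3, and the helper name fix is hypothetical.

```julia
using DataFrames

# Hypothetical stand-in for ED3; the real table is not shown in the thread.
ED3 = DataFrame(id = ["a", "b", "c"],
                x  = ["1.5", "---", "2.0"],
                y  = ["---", "3.0", "---"])

# Replace "---" with NaN cell by cell; isequal also tolerates missing values.
fix(v) = isequal(v, "---") ? NaN : v

# Each selected column is paired with the function. renamecols = false keeps
# the original column names instead of auto-generating new ones.
cleaned = transform(ED3, [:x, :y] .=> ByRow(fix); renamecols = false)
```

transform returns a new data frame and leaves ED3 as it was. To cover a range of columns such as 4:11, the selector could be written as names(ED3)[4:11] .=> ByRow(fix).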

Last updated: Dec 28 2024 at 04:38 UTC