Stream: helpdesk (published)

Topic: Iterating Replacements Over Rows


qu bit (Feb 16 2021 at 23:45):

Hello:

I am trying to edit row values with a for loop, using DataFrames
in Pluto.jl:

begin
    dateformat = DateFormat("y-m")
    for i in eachrow(DF)
        i[:Name] = replace(i[:Name], "M"=>"-")
        i[:Name] = map(i-> Date(i, dateformat), i[:Name])
    end
end

The goal is to replace each "M" with "-" in the first column
and then parse that column as dates in the "y-m" format.

Any suggestions out there?

Expanding Man (Feb 17 2021 at 00:13):

Suggest you do

df[!, :Name] = map(df.Name) do n
    Date(replace(n, "M"=>"-"), dateformat)
end

This is pretty close to what you wrote; the main difference is that it is a purely column-wise operation and does not iterate over rows. Since a DataFrame doesn't hold information on the types of its columns, it's much more efficient if you can avoid iterating over rows. To start out, a good rule of thumb is: if what you are doing only involves one column, you should use that column as you would any other AbstractVector.
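
For reference, the suggestion above can be tried end to end with a small sketch like the one below. The sample values ("2020M01" and so on) and the extra Value column are assumptions made up for illustration; the thread does not show the actual data.

```julia
using DataFrames, Dates

# Hypothetical sample data; the real values are not shown in the thread.
df = DataFrame(Name = ["2020M01", "2020M02", "2020M03"],
               Value = [1.0, 2.0, 3.0])

dateformat = DateFormat("y-m")

# Treat the column as an ordinary vector: build a new vector of Dates,
# then replace the whole column with it (df[!, :Name] = ...).
df[!, :Name] = map(df.Name) do n
    Date(replace(n, "M" => "-"), dateformat)
end
```

Assigning with `df[!, :Name] = ...` swaps in the new vector, so the column's element type can change from String to Date.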

qu bit (Feb 17 2021 at 09:49):

Thank you, this worked! To paraphrase: iterate down a column
rather than across rows, because types can differ from column to
column, which makes row-wise iteration less efficient. The complete
code would be:

begin
    dateformat = DateFormat("y-m")
    SpendDF[!, :State] = map(SpendDF.State) do n
        Date(replace(n, "M"=>"-"), dateformat)
    end
end

qu bit (Mar 01 2021 at 09:10):

Hello Coders:

I am attempting to replace values in my dataframe using:

begin
    for col in eachcol(ED3)
        replace(col, "---" => NaN)
    end
    first(ED3,5)
end

The code executes, but I am not seeing any changes
among the records. The idea is to convert all the cells
containing "---" to "NaN".

I also tried the following, without success:
    ED3[ED3 .=> "---"] .= NaN

Any suggestions?
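
One likely reason the loop above changes nothing is that `replace` returns a new vector and leaves `col` untouched, so the result is discarded on every iteration. A minimal sketch of assigning the result back to each column is shown below; the small `ED3` here is a hypothetical stand-in, since the real table is not shown in the thread.

```julia
using DataFrames

# Hypothetical stand-in for ED3; the real data is not shown in the thread.
ED3 = DataFrame(A = ["1.5", "---", "2.0"], B = ["---", "3", "4"])

# replace(...) returns a copy, so store it back into the column.
# The new column holds both strings and NaN, so its element type widens.
for name in names(ED3)
    ED3[!, name] = replace(ED3[!, name], "---" => NaN)
end

first(ED3, 5)
```

An in-place `replace!` would probably not help here, because a column that only holds strings cannot store a Float64 such as NaN.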

DrChainsaw (Mar 01 2021 at 09:33):

Try DataFrames.transform. In general, anything which is a common thing to do has a builtin function with (imo) very nice syntax.

qu bit (Mar 01 2021 at 09:35):

Hi Dr.,

Would that be something like
df.transform(DF, "---" => "NaN")
to apply the change to the whole of DF?

qu bit (Mar 01 2021 at 09:38):

From the code:
df.transform(ED3[!,4:11], "---" .=> NaN)

The error message I get reads:
ArgumentError: Unrecognized column selector: "---" => NaN
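
The error is consistent with how `transform` reads its arguments: each pair is taken as source column(s) => function (optionally => target name), so `"---" => NaN` is parsed as a column selector rather than a value replacement. Julia also uses `transform(df, ...)` rather than the method-style `df.transform(...)`. A minimal sketch of the pair form follows; the table and the column names `x` and `y` are hypothetical, chosen only for illustration.

```julia
using DataFrames

# Hypothetical stand-in for ED3 with two columns to clean.
ED3 = DataFrame(id = [1, 2], x = ["---", "7"], y = ["8", "---"])

cols = ["x", "y"]  # hypothetical names of the columns to process

# Each request has the form: source column => function => target column.
# Reusing the source name as the target overwrites that column in the result.
cleaned = transform(ED3, cols .=> (c -> replace(c, "---" => NaN)) .=> cols)
```

`transform` returns a new data frame and leaves `ED3` unchanged; `transform!` is the in-place variant.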


Last updated: Oct 02 2023 at 04:34 UTC