Hello:
I applied tryparse (via broadcasting) and coalesce to my dataframe.
Now the missing cells read 'nothing', so
the column no longer has a concrete element type.
I would like to convert the cells that read 'nothing' to 'NaN'.
How might I achieve this?
A very similar question came up on Slack last night.
using DataFrames
a = [1.0, nothing, 3.0]
b = [nothing, 20.0, 30.0]
df = DataFrame(a = a, b = b)
# ok, now df is your data frame with `nothings`
# and you want them to be NaNs
function nan_all_nothings(x)
    x[isnothing.(x)] .= NaN
    return
end
nan_all_nothings(df.a)
nan_all_nothings(df.b)
julia> df
3×2 DataFrame
 Row │ a       b
     │ Union…  Union…
─────┼────────────────
   1 │ 1.0     NaN
   2 │ NaN     20.0
   3 │ 3.0     30.0
Good Day Jeffrey,
This worked! I would like to streamline the
code by applying the function you
created to eachcol(df). I attempted:
nan_for_nothings(df[!,7:11])
But I am getting an index error. Any additional
tips?
Thanks again,
I guess this is related to your other question around `NaN`s as well - when working with DataFrames that have missing observations you should really be using `missing` instead of `nothing` or `NaN`. That's what `missing` is for, and why `passmissing` and `skipmissing` exist.
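To illustrate the point about `missing`, here is a minimal sketch that uses only Base Julia and the Statistics standard library. The vector `col` is a made-up stand-in for a data frame column and is not data from this thread.

```julia
using Statistics  # standard library, provides mean

# Hypothetical column with one absent observation
col = [1.0, missing, 3.0]

# Reductions propagate `missing` unless it is skipped explicitly
total_raw = sum(col)
total = sum(skipmissing(col))
avg = mean(skipmissing(col))

# Elementwise arithmetic propagates `missing` as well
doubled = col .* 2
```

The contrast with `NaN` is that `missing` is tracked in the element type (`Union{Missing, Float64}`), so functions can choose to skip it, whereas `NaN` is an ordinary Float64 value.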
Hi Nils,
I was able to implement:
for col in eachcol(ED4)
    replace!(col, NaN => 0)
end
This approach helped to address the NaN fill issue I was having when I applied the
describe method to the dataframe.
Yes that works, but again `missing`s would be more natural - they can be used with `coalesce`, which is exactly built for this use case.
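As a small sketch of the `coalesce` suggestion, assuming a column that already holds `missing` rather than `NaN` (the vector `col` and the fill value 0.0 are illustrative choices, not taken from the thread):

```julia
# Hypothetical column with one missing observation
col = [1.0, missing, 3.0]

# coalesce returns its first non-missing argument; broadcasting it over
# the column substitutes the fallback value wherever an entry is missing
filled = coalesce.(col, 0.0)
```

Because every result is a Float64, the broadcast yields a plain Float64 vector, so the column regains a concrete element type.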
@qu bit I agree with @Nils, use `missing`:
nothing_is_missing(x) = x[isnothing.(x)] .= missing
The dataframe's columns of interest need to allow values of type Missing.
Ask someone who knows about manipulating DataFrames.
map(nothing_is_missing, eachcol(df[!, 7:11]));
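One possible way around the element type restriction is to build new column vectors instead of assigning in place, since an in-place assignment of `missing` into a `Union{Nothing, Float64}` vector raises a conversion error. The sketch below assumes DataFrames is installed; the data frame, the helper name `nothing_to_missing`, and the column range 2:3 (standing in for 7:11) are all hypothetical.

```julia
using DataFrames

# Hypothetical data; columns 2:3 play the role of columns 7:11 in the thread
df = DataFrame(id = [1, 2, 3],
               a = [1.0, nothing, 3.0],
               b = [nothing, 20.0, 30.0])

# Non-mutating helper: map allocates a new vector, so the element type
# can become Union{Missing, Float64}
nothing_to_missing(col) = map(v -> isnothing(v) ? missing : v, col)

# Replace only the selected columns, leaving the others untouched
for name in names(df)[2:3]
    df[!, name] = nothing_to_missing(df[!, name])
end
```

Assigning with `df[!, name] = ...` swaps in the new vector, which is why the column's element type is allowed to change.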
Thank you Jeffrey. I attempted to apply your principle to converting
the same column set to Float64 using:
map(convert(DataFrame{Float64,1}, eachcol(ED4[!, 7:11])))
This returns the following error:
TypeError: in Type{...} expression, expected UnionAll, got Type{DataFrames.DataFrame}
If I use an Array{Float64,1} instead, the error message is:
MethodError: Cannot convert an object of type DataFrames.DataFrameColumns{DataFrames.DataFrame} to an object of type Array{Float64,1}
Any suggestions?
The dataframe's columns of interest need to allow values of type Missing.
Ask someone who knows about manipulating DataFrames.
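For the Float64 conversion asked about above, one option is to convert each column vector separately rather than the `eachcol` iterator as a whole. This is a rough sketch assuming DataFrames is installed; the data frame and its columns are made up for illustration, and the conversion only succeeds when a column contains no `missing` or `nothing` values.

```julia
using DataFrames

# Hypothetical columns whose element types are wider than their contents
df = DataFrame(a = Union{Missing, Float64}[1.0, 2.0, 3.0],
               b = Any[4, 5.5, 6])

# convert acts on one vector at a time; a column that still contains
# `missing` or `nothing` would throw here instead of converting
for name in names(df)
    df[!, name] = convert(Vector{Float64}, df[!, name])
end
```

A column that still holds absent values would first need them filled or dropped, for example with `coalesce`.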
Hi Jeffrey,
I was able to solve it. The 'read' method in the CSV module has a
parameter called missingstrings, which I set to '---'. Problem and coding
averted!
great -- meanwhile you can improve this (if it helps)
julia> using DataFrames
julia> a = [1.0, nothing, 3.0, 4.0];
julia> b= [1.0, nothing, 3.0, nothing];
julia> c= [1.0, 2.0, nothing, 4.0];
julia> df = DataFrame(a=a, b=b, c=c);
julia> nothing_is_missing(x) = map(y->(isnothing(y) ? missing : y), x);
julia> df2 = similar(df);
julia> for colidx in 1:size(df)[2]
           df2[!, colidx] = nothing_is_missing(df[!, colidx])
       end;
julia> df
4×3 DataFrame
 Row │ a       b       c
     │ Union…  Union…  Union…
─────┼────────────────────────
   1 │ 1.0     1.0     1.0
   2 │                 2.0
   3 │ 3.0     3.0
   4 │ 4.0             4.0
julia> df2
4×3 DataFrame
 Row │ a          b          c
     │ Float64?   Float64?   Float64?
─────┼─────────────────────────────────
   1 │       1.0        1.0        1.0
   2 │ missing    missing          2.0
   3 │       3.0        3.0  missing
   4 │       4.0  missing          4.0
Last updated: Nov 06 2024 at 04:40 UTC