Hello:
I applied `tryparse` (broadcast over the columns) and `coalesce` to my DataFrame. Now the missing cells read `nothing`, so the column no longer has a single numeric datatype.
I would like to convert the cells that read `nothing` to `NaN`.
How might I achieve this?
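One way to avoid the `nothing` cells in the first place is to give a failed `tryparse` a fallback value with Base's `something`, which returns its first argument that is not `nothing`. This is a minimal sketch assuming the raw cells are strings; `raw` is a hypothetical example column, not the poster's data.

```julia
# Hypothetical raw column: strings, one of which does not parse as a number.
raw = ["1.0", "---", "3.0"]

# tryparse returns `nothing` on failure; `something` skips `nothing`,
# so a failed parse falls through to the fallback value.
parsed = [something(tryparse(Float64, s), missing) for s in raw]  # fallback: missing
as_nan = [something(tryparse(Float64, s), NaN) for s in raw]      # fallback: NaN

println(eltype(parsed))  # Union{Missing, Float64}
println(eltype(as_nan))  # Float64
```

The `missing` fallback keeps the "absent" meaning; the `NaN` fallback gives a plain `Vector{Float64}` if that is really what is wanted.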
A very similar question came up on Slack last night.
```julia
using DataFrames

a = [1.0, nothing, 3.0]
b = [nothing, 20.0, 30.0]
df = DataFrame(a = a, b = b)
# ok, now df is your data frame with `nothings`
# and you want them to be NaNs

# Overwrite every `nothing` in the vector x with NaN, in place.
function nan_all_nothings(x)
    x[isnothing.(x)] .= NaN
    return
end

nan_all_nothings(df.a)
nan_all_nothings(df.b)
```
```
julia> df
3×2 DataFrame
 Row │ a       b
     │ Union…  Union…
─────┼────────────────
   1 │ 1.0     NaN
   2 │ NaN     20.0
   3 │ 3.0     30.0
```
Good day Jeffrey,
This worked! I would like to streamline the code by applying the function you created to each column (`eachcol(df)`). I attempted:
```julia
nan_for_nothings(df[!,7:11])
```
but I am getting an index error. Any additional tips?
Thanks again,
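`nan_all_nothings` indexes a single vector with a Boolean mask, while `df[!, 7:11]` is a whole DataFrame, so that indexing step fails. A minimal sketch of applying the helper column by column instead, assuming DataFrames.jl is installed; the small `df` stands in for the real data, and the helper here returns `x` rather than `nothing` (a small change from the reply above).

```julia
using DataFrames

# Helper as in the reply above, except it returns the mutated vector.
function nan_all_nothings(x)
    x[isnothing.(x)] .= NaN
    return x
end

df = DataFrame(a = [1.0, nothing, 3.0], b = [nothing, 20.0, 30.0])

# eachcol yields the column vectors themselves (no copy), so the in-place
# update reaches df. For a subset, iterate eachcol(df[!, 7:11]) instead.
foreach(nan_all_nothings, eachcol(df))
```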
I guess this is related to your other question about NaNs as well: when working with DataFrames that have missing observations, you should really be using `missing` instead of `nothing` or `NaN`. That's what `missing` is for, and why `passmissing` and `skipmissing` exist.
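A minimal sketch of the difference Nils describes, using only Base and the Statistics standard library: `missing` propagates through arithmetic, `skipmissing` drops it when a summary is wanted, and a `NaN` silently poisons the summary instead. `col` and `nan_col` are hypothetical columns.

```julia
using Statistics

# Hypothetical column with one missing observation.
col = [1.0, missing, 3.0]

propagated = sum(col)               # missing propagates through arithmetic
total = sum(skipmissing(col))       # 4.0
avg = mean(skipmissing(col))        # 2.0
nmissing = count(ismissing, col)    # 1

# The same data with NaN as the placeholder: the mean itself becomes NaN.
nan_col = [1.0, NaN, 3.0]
nan_mean = mean(nan_col)
```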
Hi Nils,
I was able to implement:
```julia
for col in eachcol(ED4)
    replace!(col, NaN => 0)
end
```
This approach fixed the NaN values I was getting when I applied the `describe` method to the DataFrame.
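`replace!` matches values with `isequal`, under which `NaN` equals `NaN`, so the loop above does catch every `NaN`. A minimal sketch contrasting it with marking those cells as `missing` instead, which needs a new vector because a `Vector{Float64}` cannot hold `missing`; `col` is hypothetical.

```julia
# Hypothetical Float64 column containing a NaN.
col = [1.0, NaN, 3.0]

# In place: works because replace! compares with isequal, and isequal(NaN, NaN) is true.
zeroed = replace!(copy(col), NaN => 0.0)

# Keep the value marked as absent instead: map builds a widened vector.
as_missing = map(v -> isnan(v) ? missing : v, col)
```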
Yes, that works, but again `missing`s would be more natural: they can be used with `coalesce`, which is built exactly for this use case.
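With `missing` in place, `coalesce` gives the same zero fill as the `replace!` loop, but only where a filled copy is needed, leaving the stored data untouched. A minimal sketch on a hypothetical column:

```julia
col = [1.0, missing, 3.0]

# coalesce returns its first non-missing argument; broadcasting applies it per cell.
filled = coalesce.(col, 0.0)
```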
@qu bit I agree with @Nils: use `missing`.
```julia
nothing_is_missing(x) = x[isnothing.(x)] .= missing
```
The DataFrame's columns of interest need to allow values of type `Missing`.
Ask someone who knows more about manipulating DataFrames.
```julia
map(nothing_is_missing, eachcol(df[!, 7:11]));
```
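Jeffrey's caveat in practice: a vector built from `nothing` and floats has element type `Union{Nothing, Float64}`, which cannot hold `missing`, so the in-place version throws. A minimal sketch (Base only, hypothetical `x`) of that failure and of a `map`-based alternative that builds a new vector whose element type admits `missing`:

```julia
x = [1.0, nothing, 3.0]   # eltype Union{Nothing, Float64}

# In-place assignment fails: missing cannot be converted to the element type.
threw_method_error = try
    x[isnothing.(x)] .= missing
    false
catch err
    err isa MethodError
end

# A new vector lets Julia pick an element type that admits missing.
y = map(v -> isnothing(v) ? missing : v, x)
```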
Thank you Jeffrey. I attempted to apply your principle to converting the same column set to Float64 using:
```julia
map(convert(DataFrame{Float64,1}, eachcol(ED4[!, 7:11])))
```
I get the following error:
```
TypeError: in Type{...} expression, expected UnionAll, got Type{DataFrames.DataFrame}
```
If I use `Array{Float64,1}` instead, the error message is:
```
MethodError: Cannot convert an object of type DataFrames.DataFrameColumns{DataFrames.DataFrame} to an object of type Array{Float64,1}
```
Any suggestions?
The DataFrame's columns of interest need to allow values of type `Missing`.
Ask someone who knows more about manipulating DataFrames.
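Both attempts above fail for the same underlying reason: `convert` takes a target type and a single value (here, one column vector), while `DataFrame` is not a parametric type and `eachcol(...)` is a collection of columns. A minimal sketch assuming DataFrames.jl, converting column by column once the columns contain only numbers; the column names `p` and `q` are hypothetical stand-ins for columns 7:11 of `ED4`.

```julia
using DataFrames

# Hypothetical data: numeric columns whose element type still allows nothing.
df = DataFrame(id = ["x", "y"],
               p = Union{Nothing, Float64}[1.0, 2.0],
               q = Union{Nothing, Float64}[3.0, 4.0])

# Convert each selected column on its own. This throws if a column still
# holds nothing or missing, since those are not Float64 values.
for name in ["p", "q"]
    df[!, name] = convert(Vector{Float64}, df[!, name])
end
```

If the columns allow `missing` instead (for example after reading the file with missing-value markers), DataFrames' `disallowmissing!(df, cols)` performs the same narrowing.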
Hi Jeffrey,
I was able to solve it. `CSV.read` (from CSV.jl) has a keyword argument, `missingstring`, which I set to `"---"`. Problem (and extra coding) averted!
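A minimal sketch of that fix, assuming CSV.jl and DataFrames.jl are installed; the inline data is hypothetical. CSV.jl's `missingstring` keyword accepts a string (or a vector of strings), and matching cells are read as `missing`.

```julia
using CSV, DataFrames

# Hypothetical file contents where "---" marks an absent value.
data = """
a,b
1.0,2.0
3.0,---
"""

# Cells equal to "---" become `missing`, so column b stays numeric.
df = CSV.read(IOBuffer(data), DataFrame; missingstring = "---")
```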
Great! Meanwhile, you can improve on this (if it helps):
```
julia> using DataFrames
julia> a = [1.0, nothing, 3.0, 4.0];
julia> b = [1.0, nothing, 3.0, nothing];
julia> c = [1.0, 2.0, nothing, 4.0];
julia> df = DataFrame(a=a, b=b, c=c);
julia> nothing_is_missing(x) = map(y->(isnothing(y) ? missing : y), x);
julia> df2 = similar(df);
julia> for colidx in 1:size(df)[2]
           df2[!, colidx] = nothing_is_missing(df[!, colidx])
       end;
julia> df
4×3 DataFrame
 Row │ a       b       c
     │ Union…  Union…  Union…
─────┼────────────────────────
   1 │ 1.0     1.0     1.0
   2 │                 2.0
   3 │ 3.0     3.0
   4 │ 4.0             4.0
julia> df2
4×3 DataFrame
 Row │ a          b          c
     │ Float64?   Float64?   Float64?
─────┼─────────────────────────────────
   1 │      1.0        1.0        1.0
   2 │  missing    missing        2.0
   3 │      3.0        3.0    missing
   4 │      4.0    missing        4.0
```
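The explicit index loop can also be written with DataFrames' `mapcols`, which calls a function on every column and returns a new DataFrame. A minimal sketch reusing `nothing_is_missing` from the session above, assuming DataFrames.jl is installed:

```julia
using DataFrames

nothing_is_missing(x) = map(y -> isnothing(y) ? missing : y, x)

df = DataFrame(a = [1.0, nothing, 3.0, 4.0],
               b = [1.0, nothing, 3.0, nothing],
               c = [1.0, 2.0, nothing, 4.0])

# mapcols applies the function to each column and collects the results.
df2 = mapcols(nothing_is_missing, df)
```

From there, `coalesce.(df2.b, 0.0)` fills the gaps wherever a plain `Float64` column is required.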
Last updated: Oct 24 2025 at 04:41 UTC