Hello Everyone,
Is there a more practical, workable
approach to normalizing all the elements of
a DataFrame?
x = CookieMonster[1:15,:]
y = CookieMonster[16:16,:]
So far I have found two approaches, and both
result in an error.
1.
X = (x .- mean(x, dims = 2)) ./ std(x, dims = 2)
Error: 'no method matching mean'
2.
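using DataFrames

# Min-max scale the selected columns to [0, 1] and return a new DataFrame
function normalize(input_df::DataFrame, cols::AbstractVector{<:Integer})
    norm_df = copy(input_df)          # copy, so input_df itself is not modified
    for i in cols
        col = input_df[!, i]          # df[!, i] gets column i (plain df[i] is no longer supported)
        lo, hi = extrema(col)
        norm_df[!, i] = (col .- lo) ./ (hi - lo)   # .- broadcasts the scalar over the column
    end
    norm_df
end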
Error: 'no method matching normalize'
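For comparison, both ideas above (z-scoring and min-max scaling) work directly on a plain numeric Matrix once Statistics, where mean and std live, is loaded. This is a minimal sketch with a small hypothetical matrix standing in for the CookieMonster data, laid out features × samples as in the FluxML snippet; in practice the matrix would come from something like Matrix(x), assuming every column is numeric.

```julia
using Statistics  # provides mean and std

# Hypothetical stand-in data: 3 features (rows) × 4 samples (columns)
x = [1.0  2.0  3.0  4.0;
     10.0 20.0 30.0 40.0;
     5.0  5.0  6.0  6.0]

# Idea 1: z-score each feature (row) across the samples
X = (x .- mean(x, dims = 2)) ./ std(x, dims = 2)

# Idea 2: min-max scale each feature (row) to [0, 1]
lo = minimum(x, dims = 2)
hi = maximum(x, dims = 2)
Xmm = (x .- lo) ./ (hi .- lo)
```

The only change from the first attempt is the input type: these broadcasts are defined for arrays, not for DataFrames.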
The normalize() method has been around since 2015,
but I have not seen any updates to it.
I am programming in Pluto.
Any tips?
Are you using LinearAlgebra, which is needed for mean?
I think mean is from Statistics; normalize may be in LinearAlgebra.
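Note that the two packages also mean different things by "normalize". A small sketch, with a hypothetical vector v, contrasting LinearAlgebra.normalize (rescale to unit norm) with a z-score built from Statistics:

```julia
using LinearAlgebra  # provides normalize and norm
using Statistics     # provides mean and std

v = [3.0, 4.0, 5.0]

# LinearAlgebra.normalize divides the whole vector by its 2-norm;
# it does not center the data or use the standard deviation.
u = normalize(v)

# A z-score instead shifts to mean 0 and scales to standard deviation 1.
z = (v .- mean(v)) ./ std(v)
```

So even for plain vectors, normalize is not the operation the FluxML line performs.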
Oh, you're right. Sry
Daniel Karrasch
Thank you very much. When I qualify it
as la.normalize(x), I get the following
error:
MethodError: no method matching normalize(::DataFrames.DataFrame)
Honestly, I'm not familiar with the specifics of the DataFrames package. I think you should consult its documentation to see what manipulation methods it provides. normalize is defined in the stdlib LinearAlgebra for some generic types, and DataFrame apparently is not a subtype of any of them. So it's no surprise that LinearAlgebra doesn't define a normalize method for DataFrames. Providing such a method would be the task of DataFrames.jl.
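One way around the missing normalize(::DataFrame) method is to apply a vector function to every column. A minimal sketch, assuming DataFrames.jl is installed and all columns are numeric; zscore_col is a hypothetical helper (StatsBase.jl also offers a zscore function):

```julia
using DataFrames   # third-party package: provides DataFrame and mapcols
using Statistics   # provides mean and std

# Hypothetical helper: z-score a single column (an ordinary vector)
zscore_col(c) = (c .- mean(c)) ./ std(c)

df = DataFrame(a = [1.0, 2.0, 3.0], b = [10.0, 20.0, 60.0])

# mapcols applies the function to each column and returns a new DataFrame,
# so no DataFrame-specific normalize method is required.
df_z = mapcols(zscore_col, df)
```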
Daniel Karrasch
X = (x .- mean(x, dims = 2)) ./ std(x, dims = 2)
is what I am trying to reproduce from the FluxML
example here.
When I qualify the methods, I get the same
error. As you might suspect, normalizing a DataFrame
directly is not really best practice; perhaps what they
presented there was pseudocode, because I do not
think it can work as written, even when I tried nesting
the eachrow() method.
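One detail worth checking when porting that line: Flux models conventionally take data as features × samples, which is why the FluxML snippet reduces with dims = 2, whereas a DataFrame usually stores samples as rows and features as columns. A sketch with a hypothetical table-shaped matrix showing the two equivalent options:

```julia
using Statistics

# Hypothetical table-shaped data: 4 samples (rows) × 2 features (columns)
tbl = [1.0 10.0;
       2.0 20.0;
       3.0 30.0;
       4.0 60.0]

# Option A: normalize each feature (column) directly with dims = 1
A = (tbl .- mean(tbl, dims = 1)) ./ std(tbl, dims = 1)

# Option B: transpose to features × samples first,
# then the FluxML expression with dims = 2 applies unchanged
x = permutedims(tbl)
B = (x .- mean(x, dims = 2)) ./ std(x, dims = 2)
```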
Daniel Karrasch
Okay, I tried something like this:
x2 = (x .- mean(Array(x), dims = 2)) ./ std(Array(x), dims = 2)
What do you think?
That might work, but I wonder if there are more efficient ways that don't allocate the extra array.
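A possible middle ground is to convert to a Matrix once and then normalize that matrix in place, so no second full-size result array is created. This is a sketch under the assumption that all columns are floating-point; zscore_columns! is a hypothetical helper, and the literal matrix stands in for Matrix(x):

```julia
using Statistics

# Z-score each column of M in place; only the 1×n rows of means and
# standard deviations are allocated besides M itself.
function zscore_columns!(M::AbstractMatrix{<:AbstractFloat})
    μ = mean(M, dims = 1)
    σ = std(M, dims = 1)
    M .= (M .- μ) ./ σ   # fused broadcast writes the result back into M
    return M
end

M = [1.0 10.0;
     2.0 20.0;
     3.0 60.0]   # stand-in for Matrix(x)
zscore_columns!(M)
```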
Daniel Karrasch
So far I am seeing some approaches in C++ and Java,
and some tips on Stack Overflow here:
std::vector<int> vi; // if the number of int-s are dynamic
std::array<int, 50> ai; // if the number of int-s are fixed
Is there a problem with a DataFrame x when you call mean(x, dims=2) or std(x, dims=2)? Do you need this Array(x)?
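Since each DataFrame column is an ordinary vector, per-column statistics can also be computed without building a Matrix by iterating over eachcol. A minimal sketch, assuming DataFrames.jl is installed; the data is hypothetical:

```julia
using DataFrames
using Statistics

df = DataFrame(a = [1.0, 2.0, 3.0], b = [10.0, 20.0, 60.0])

# mean and std are called on each column vector, not on the DataFrame
col_means = [mean(c) for c in eachcol(df)]
col_stds  = [std(c) for c in eachcol(df)]
```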
Daniel Karrasch
Yes. Without Array(), I get this error:
no method matches mean(DataFrames.DataFrame)
Aha, maybe that is because DataFrames commonly contain non-numeric columns? What does DataFrames.jl
recommend for operations like that? It should be a common problem.
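If non-numeric columns are the obstacle, one option is to select only the numeric ones before converting. A sketch, assuming DataFrames.jl is installed; names(df, Real) returns the names of columns whose element type is a subtype of Real, and the data is hypothetical:

```julia
using DataFrames

# Hypothetical mix of a text column and numeric columns
df = DataFrame(name = ["a", "b", "c"], h = [1.0, 2.0, 3.0], w = [10, 20, 60])

# Names of the columns whose element type is a subtype of Real
num_cols = names(df, Real)

# Only the numeric part is converted to a Float64 matrix for mean/std
M = Matrix{Float64}(df[:, num_cols])
```

Columns that may contain missing values have an element type such as Union{Missing, Float64}, which is not a subtype of Real, so they would need separate handling.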
Daniel Karrasch
I am not seeing anything specific to this error message here.
Recommended way to work with such operations described here: https://bkamins.github.io/julialang/2021/07/09/multicol.html
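In the spirit of that multi-column approach, a sketch that z-scores every numeric column with transform, assuming DataFrames.jl is installed; zscore_col is a hypothetical helper and the data is made up:

```julia
using DataFrames
using Statistics

zscore_col(c) = (c .- mean(c)) ./ std(c)   # hypothetical helper

df = DataFrame(name = ["a", "b", "c"], h = [1.0, 2.0, 3.0], w = [10.0, 20.0, 60.0])

# .=> builds one `column => zscore_col` pair per numeric column;
# renamecols = false keeps the original names, so those columns are
# replaced in the result while :name is carried over unchanged.
df_norm = transform(df, names(df, Real) .=> zscore_col; renamecols = false)
```

transform returns a new DataFrame; df itself is left as it was.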
QuBit has marked this topic as resolved.
Last updated: Nov 22 2024 at 04:41 UTC