Hello Julians:
Full Title: PosDefException - matrix is not positive definite; Cholesky factorization failed
Session: Pluto.jl
Julia Version: 1.6.3
Browser: Firefox
I am encountering the issue above when
I am building a logistic regression model
using the GLM package.
I was able to catch a reply from @tim.holy
that covered some functions that could
potentially address this issue HERE.
I am not sure what to consider when
attempting to apply the different
PositiveFactorizations.jl functions.
My DataFrame has the following structure/content:
using DataFrames
Teams = ["Jazz", "Heat", "Hawks"]
Rank = ["1st", "2nd", "3rd"]
Outcome = ["Win", "Loss"]
Season = DataFrame(Id = 1:50, Gate = rand(50:15:3000, 50),
                   Top3 = rand(Teams, 50),
                   Position = rand(Rank, 50),
                   Column = rand(Outcome .== "Win", 50))
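I applied the _onehot function from:
begin
    # One-hot encode column `symb`: add one Bool column per unique value.
    function _onehot(df, symb)
        out = copy(df)  # copy so the caller's DataFrame is not mutated
        for c in unique(out[!, symb])
            out[!, Symbol(c)] = out[!, symb] .== c
        end
        return out
    end
end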
Then, when I attempted to build the logistic regression
via:
fm = @formula(Column~ Top3 + Position + Gate + Jazz + Heat + Hawks + 1st + 2nd + 3rd+
Win + Loss)
logit = glm(fm, train, Binomial(), Probit())
I get the error in the subject line. Is there a way to
manipulate this data frame so that the encoded columns are
processed during the LogReg model build?
###########Considerations############
I am following a tutorial by Kabir, from machinelearningplus,
created October 2020, for Logistic Regressions with Julia.
I understand that adding the encoded columns could
lead to multicollinearity (explanatory variable confounding)
issues BUT in the tutorial, the author was able to generate
an output. PERHAPS the design is not agreeable with you...
I have attempted to run the GLM LogReg model WITHOUT
encoding, but I seem to have an index mapping issue: the
coefficient table lists only:
(Intercept)
Top3: Jazz
Top3: Heat
Position: 2nd
Position: 3rd
Gate
You will notice that 'Top3' is missing 'Hawks' (at index 3)
and that 'Position' is missing '1st' (at index 1).
Am I missing something(s) here?
QuBit
Solution:
Do not fall for the 'Dummy Variable Trap'
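To make the trap concrete, here is a minimal sketch that uses only the LinearAlgebra standard library. The six-row `teams` vector and the names `X_trap` and `X_ok` are hypothetical, chosen for illustration and not taken from the original data. It shows that an intercept plus an indicator column for every level gives a design matrix with fewer independent columns than it has columns, which is the kind of rank deficiency that can make a Cholesky-based fit fail.

```julia
using LinearAlgebra

# Hypothetical toy data: 6 observations of a 3-level factor.
teams = ["Jazz", "Heat", "Hawks", "Jazz", "Heat", "Hawks"]
lvls = unique(teams)

# Full one-hot encoding: one indicator column per level.
onehot = Float64[t == l for t in teams, l in lvls]

# Intercept plus ALL indicators: the indicators sum to the intercept column,
# so one column is redundant (the dummy variable trap).
X_trap = hcat(ones(length(teams)), onehot)

# Intercept plus indicators for all levels except the first (the reference level).
X_ok = hcat(ones(length(teams)), onehot[:, 2:end])

println("trap: columns = ", size(X_trap, 2), ", rank = ", rank(X_trap))
println("ok:   columns = ", size(X_ok, 2), ", rank = ", rank(X_ok))
```

In `X_trap` the rank is one less than the number of columns, while `X_ok` has full column rank. Dropping one level per categorical variable is what removes the redundancy.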
Approaches (while building the LogReg model):
Tip:
Question online sources if they do not account
for 'Multi-collinearity' during regression builds.
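One way to avoid the trap, sketched here under assumptions, is to pass the raw categorical columns to the formula and let the modelling package do the encoding. The sketch assumes DataFrames.jl and GLM.jl are installed. The random seed, the `Float64` response and the `LogitLink()` link are my choices for illustration, not details from the original post. With the default dummy coding, one level per categorical variable is treated as the reference level and absorbed into the intercept, which would explain why 'Hawks' and '1st' do not appear in the coefficient table quoted in the question.

```julia
using DataFrames, GLM, Random

Random.seed!(1)  # assumption: fixed seed so the sketch is reproducible

Teams = ["Jazz", "Heat", "Hawks"]
Rank = ["1st", "2nd", "3rd"]
Season = DataFrame(Id = 1:50, Gate = rand(50:15:3000, 50),
                   Top3 = rand(Teams, 50),
                   Position = rand(Rank, 50),
                   # assumption: response stored as 0.0/1.0 rather than Bool
                   Column = Float64.(rand(Bool, 50)))

# Only the raw columns go into the formula; no hand-made one-hot columns.
fm = @formula(Column ~ Top3 + Position + Gate)
model = glm(fm, Season, Binomial(), LogitLink())

# Each 3-level factor contributes 2 coefficients; the remaining level is the reference.
println(coefnames(model))
```

The coefficients for the listed levels are then read relative to the reference level, so no information about 'Hawks' or '1st' is lost.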
Last updated: Nov 06 2024 at 04:40 UTC