Stream: helpdesk (published)

Topic: ✔ PosDefException: matrix is not positive definite; Chole...


QuBit (Jul 30 2021 at 13:29):

Hello Julians:

Full Title: PosDefException - matrix is not positive definite; Cholesky factorization failed

Session: Pluto.jl
Julia Version: 1.6.3
Browser: Firefox

I am encountering the issue above when
I am building a logistic regression model
using the GLM package.

I was able to catch a reply from @tim.holy
that covered some functions that could
potentially address this issue HERE

I am not sure what to consider when
attempting to apply the different
PositiveFactorizations.jl functions.

My DataFrame has the structure/content

Teams = ["Jazz", "Heat", "Hawks"]
Rank = ["1st", "2nd", "3rd"]
Outcome = ["Win", "Loss"]

Season = DataFrame(Id = 1:50,
                   Gate = rand(50:15:3000, 50),
                   Top3 = rand(Teams, 50),
                   Position = rand(Rank, 50),
                   Column = rand(Outcome .== "Win", 50))

I applied this _onehot function:

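begin
    function _onehot(df, symb)
        # copy(df) creates a new DataFrame; `copy = df` only aliased the
        # original (so it was mutated) and shadowed the `copy` function.
        out = copy(df)
        for c in unique(out[!, symb])
            # One Boolean indicator column per distinct value.
            out[!, Symbol(c)] = out[!, symb] .== c
        end
        return out
    end
end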

Then I attempted to build the logistic regression model
via:

fm = @formula(Column~ Top3 + Position + Gate + Jazz + Heat + Hawks + 1st + 2nd + 3rd+
                         Win + Loss)
logit = glm(fm, train, Binomial(), ProbitLink())

I get the error in the subject line. Is there a way to
manipulate this data frame so that the encoded columns are
processed during the LogReg model build?
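
A minimal sketch of why this setup tends to fail, using only LinearAlgebra and a hypothetical six-row toy column rather than the Season data: when a model has an intercept and an indicator column for every level of a categorical variable, the indicators sum to the intercept column. The design matrix then lacks full column rank, and a Cholesky-based solver cannot factor the resulting normal equations.

```julia
using LinearAlgebra

# Hypothetical toy data: six rows, three team levels.
teams = ["Jazz", "Heat", "Hawks", "Jazz", "Heat", "Hawks"]
lvls = unique(teams)

# One indicator column per level (full one-hot encoding).
onehot = hcat([Float64.(teams .== l) for l in lvls]...)

# Design matrix: intercept column followed by all three indicators.
X = hcat(ones(length(teams)), onehot)

# The indicators sum to the intercept column, so the columns are
# linearly dependent.
@assert vec(sum(onehot, dims = 2)) == X[:, 1]

# Four columns, but only three of them are linearly independent.
println("columns = ", size(X, 2), ", rank = ", rank(X))
```

The same dependency appears when a formula includes both the original categorical column (Top3) and its hand-made indicator columns (Jazz, Heat, Hawks).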

QuBit (Jul 30 2021 at 13:34):

###########Considerations############

  1. I am following a tutorial by Kabir, from machinelearningplus,
    created October 2020, for Logistic Regressions with Julia.

  2. I understand that adding the encoded columns could
    lead to multicollinearity (explanatory variable confounding)
    issues BUT in the tutorial, the author was able to generate
    an output. PERHAPS the design is not agreeable with you...

  3. I have attempted to run the GLM LogReg model WITHOUT
    encoding, but I run into what looks like an index mapping issue.

The coefficient table shows:

(Intercept)
Top3: Jazz
Top3: Heat
Position: 2nd
Position: 3rd
Gate

You will notice that 'Top3' is missing 'Hawks' (at index 3)
and that 'Position' is missing '1st' (at index 1).

Am I missing something(s) here?
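
The missing levels are consistent with dummy (treatment) coding, where one level per categorical variable serves as the baseline and is absorbed into the intercept. The sketch below reproduces that by hand on a hypothetical toy column; it assumes the baseline is the first level in sorted order, which matches the output above ('Hawks' and '1st') but is not taken from the GLM source.

```julia
# Hypothetical toy column with three levels.
top3 = ["Jazz", "Heat", "Hawks", "Heat", "Jazz"]

lvls = sort(unique(top3))     # sorted levels: Hawks, Heat, Jazz
reference = first(lvls)       # baseline level, absorbed into the intercept
kept = lvls[2:end]            # levels that get their own coefficient

# One indicator column per non-reference level.
dummies = hcat([Int.(top3 .== l) for l in kept]...)
colnames = ["Top3: $l" for l in kept]

println("reference level: ", reference)
println("coefficient names: ", colnames)

# A row at the reference level is all zeros, so its effect is the intercept.
println("row for Hawks: ", dummies[3, :])
```

Under this coding, each reported coefficient is read as the difference from the baseline level rather than as a missing category.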

QuBit (Jul 30 2021 at 17:47):

Solution:
Do not fall for the 'Dummy Variable Trap'

Approaches (while building the LogReg model):

  1. Drop one of the groups (for one of your categorical attributes)
  2. Drop the intercept
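
Both approaches can be checked on a small example. This is a sketch on hypothetical toy data with a made-up response, and it uses ordinary least squares as a stand-in for the GLM fit: either approach restores full column rank, and because the two design matrices span the same column space, they give the same fitted values.

```julia
using LinearAlgebra

# Hypothetical toy data and a made-up numeric response.
teams = ["Jazz", "Heat", "Hawks", "Jazz", "Heat", "Hawks"]
y = [1.0, 0.0, 1.0, 1.0, 0.0, 0.0]

lvls = sort(unique(teams))
onehot = hcat([Float64.(teams .== l) for l in lvls]...)
n = length(teams)

# Approach 1: keep the intercept, drop the first level's indicator.
X_drop_level = hcat(ones(n), onehot[:, 2:end])

# Approach 2: keep every indicator, drop the intercept.
X_no_intercept = onehot

# Both design matrices now have full column rank.
@assert rank(X_drop_level) == size(X_drop_level, 2)
@assert rank(X_no_intercept) == size(X_no_intercept, 2)

# Least-squares fits (backslash on a tall matrix) agree on fitted values.
fit_drop_level = X_drop_level * (X_drop_level \ y)
fit_no_intercept = X_no_intercept * (X_no_intercept \ y)
@assert fit_drop_level ≈ fit_no_intercept
```

With more than one categorical predictor, dropping the intercept removes the redundancy for only one of them; the others still need a level dropped. In a formula, the intercept is typically removed by writing `0 +` on the right-hand side.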

Tip:
Question online sources if they do not account
for 'Multi-collinearity' during regression builds.


Last updated: Nov 06 2024 at 04:40 UTC