PosDefException: matrix is not positive definite; Cholesky f · helpdesk (published)

Stream: helpdesk (published)

Topic: PosDefException: matrix is not positive definite; Cholesky f

QuBit (Jul 30 2021 at 13:29):

Hello Julians:

Full Title: PosDefException - matrix is not positive definite; Cholesky factorization failed

Session: Pluto.jl
Julia Version: 1.6.3
Browser: Firefox

I am encountering the error above while building a logistic regression model with the GLM package.

I found a reply from @tim.holy covering some functions that could potentially address this issue here.

I am not sure what to consider when applying the different PositiveFactorizations.jl functions.

My DataFrame has the following structure/content:

using DataFrames

Teams = ["Jazz", "Heat", "Hawks"]
Rank = ["1st", "2nd", "3rd"]
Outcome = ["Win", "Loss"]

Season = DataFrame(Id = 1:50,
                   Gate = rand(50:15:3000, 50),           # 50 random gate values, not one repeated value
                   Top3 = rand(Teams, 50),
                   Position = rand(Rank, 50),
                   Column = rand(Outcome .== "Win", 50))  # Bool response: true = win

I used the _onehot function from:

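begin
    # Add one Bool indicator column per distinct value of column `symb`.
    function _onehot(df, symb)
        out = copy(df)    # work on a real copy; `copy = df` would alias (and mutate) df
        for c in unique(out[!, symb])
            out[!, Symbol(c)] = out[!, symb] .== c
        end
        return out
    end
end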
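On the `_onehot` function above: in Julia, a plain assignment such as `copy = df` binds a second name to the same object instead of copying it, so indicator columns added through that name also show up in the caller's DataFrame (and the local name shadows `Base.copy`). The rule is the same for every Julia object, so a minimal stdlib-only illustration with a `Vector` shows the behavior:

```julia
# Plain assignment binds a second name to the same object; it does not copy.
a = [1, 2, 3]
b = a          # `b` and `a` now refer to the same vector
push!(b, 4)    # mutating through `b` ...
println(a)     # ... is visible through `a`: prints [1, 2, 3, 4]

c = copy(a)    # `copy` makes an independent (shallow) copy
push!(c, 5)
println(a)     # `a` is unchanged: prints [1, 2, 3, 4]
println(b === a, " ", c === a)  # prints "true false"
```

For a DataFrame, `copy(df)` likewise returns a new DataFrame whose columns are copied, so adding columns to it leaves `df` untouched.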

Then, when I attempted to build the logistic regression model via:

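fm = @formula(Column ~ Top3 + Position + Gate + Jazz + Heat + Hawks + 1st + 2nd + 3rd +
                       Win + Loss)
# `train`: training subset of `Season` (assumed; the split is not shown here)
# Logistic regression = Binomial family with the logit link (GLM.jl names it LogitLink)
logit = glm(fm, train, Binomial(), LogitLink())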

I get the error in the subject line. Is there a way to manipulate this DataFrame so that the encoded columns are used when the logistic regression model is built?
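For context, a small stdlib-only sketch of one way this error arises. As the error message indicates, the fit goes through a Cholesky factorization of a matrix built from the design matrix as X'X (weighted, in a GLM), and Cholesky requires that matrix to be positive definite. If the design contains an intercept plus an indicator column for every level of a categorical variable, the indicators add up to the intercept column, X'X is singular, and the factorization fails. The data and column order below are hypothetical, chosen so that the floating-point arithmetic is exact:

```julia
using LinearAlgebra

# Hypothetical toy data: 12 games, each of the three teams appearing 4 times.
teams = repeat(["Hawks", "Heat", "Jazz"], 4)
team_levels = ["Hawks", "Heat", "Jazz"]

# Full one-hot encoding plus an intercept column (placed last here; the column
# order does not change the rank, it only keeps every intermediate value exact).
dummies = hcat([Float64.(teams .== l) for l in team_levels]...)
X_full = hcat(dummies, ones(length(teams)))

# The three indicator columns sum to the intercept column, so X_full has
# 4 columns but rank 3, and X'X is singular rather than positive definite.
println(size(X_full, 2), " columns, rank ", rank(X_full))   # 4 columns, rank 3

caught = try
    cholesky(Symmetric(X_full' * X_full))   # needs a positive definite matrix
    nothing
catch err
    err
end
println(caught isa PosDefException)          # true

# Dropping one level ("Hawks") as the reference category removes the dependency.
X_ref = X_full[:, 2:4]
F = cholesky(Symmetric(X_ref' * X_ref))      # succeeds
println(rank(X_ref))                         # 3
```

The same reasoning applies to any group of columns that sum to a constant, for example a pair of Win/Loss indicators.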

QuBit (Jul 30 2021 at 13:34):

###########Considerations############

Note that rand(50:15:3000) draws a single random value from the range that starts at 50, steps by 15, and stops at or before 3000; rand(50:15:3000, 50) is needed to generate a random vector of 50 such values.
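A quick illustrative check of what that range contains and how the second argument to `rand` changes the result. The range actually ends at 2990, because 3000 cannot be reached from 50 in steps of 15:

```julia
r = 50:15:3000
println(first(r), " ", last(r), " ", length(r))   # prints "50 2990 197"

x = rand(r)        # one Int drawn from r
v = rand(r, 50)    # a Vector of 50 draws from r
println(x isa Int, " ", length(v))                 # prints "true 50"
```
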
I am following a tutorial on logistic regression with Julia by Kabir at machinelearningplus (October 2020).
I understand that adding the encoded columns could lead to multicollinearity (linear dependence among the explanatory variables), BUT in the tutorial the author was able to generate an output. PERHAPS the design is not agreeable to you...
Assume that I have read Bogumil's DataFrames tutorials and have a solid understanding of statistics (particularly regression); I simply need a practical implementation or some one-to-one resources here.
I have also attempted to run the GLM logistic regression model WITHOUT the encoding, but I get what looks like an index-mapping issue. The coefficient table lists:

(Intercept)
Top3: Jazz
Top3: Heat
Position: 2nd
Position: 3rd
Gate

You will notice that 'Top3' is missing 'Hawks' (at index 3)
and that 'Position' is missing '1st' (at index 1).

Am I missing something here?
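The missing rows are consistent with the default dummy (treatment) coding that GLM.jl gets from StatsModels.jl: the levels of a categorical column are sorted, the first one becomes the reference (base) level absorbed into the intercept, and the other levels get coefficients relative to it. "Hawks" sorts before "Heat" and "Jazz", and "1st" sorts before "2nd" and "3rd". A minimal sketch of that rule, using a hypothetical `dummy_code` helper (not part of GLM.jl or StatsModels.jl):

```julia
# Hypothetical helper mimicking treatment coding: one indicator column per level,
# except the reference level (by default the first level in sorted order).
function dummy_code(x::AbstractVector; base = first(sort(unique(x))))
    kept = [l for l in sort(unique(x)) if l != base]   # levels that get a column
    X = hcat([Float64.(x .== l) for l in kept]...)      # one indicator column per kept level
    return kept, X
end

top3 = ["Jazz", "Heat", "Hawks", "Jazz", "Heat", "Hawks"]
kept, X = dummy_code(top3)
println(kept)     # ["Heat", "Jazz"]: "Hawks" is the reference level

kept2, _ = dummy_code(["1st", "2nd", "3rd", "1st"])
println(kept2)    # ["2nd", "3rd"]: "1st" is the reference level

kept3, _ = dummy_code(top3; base = "Jazz")
println(kept3)    # ["Hawks", "Heat"]
```

If a different reference level is wanted, StatsModels.jl lets you choose it through the `contrasts` keyword (for example `DummyCoding(base = "Jazz")` for `:Top3`); the StatsModels contrasts documentation is the authority on the exact syntax.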

Last updated: Oct 02 2023 at 04:34 UTC