Stream: helpdesk (published)

Topic: Aqua.jl usage questions


view this post on Zulip Nils (Nov 09 2023 at 14:03):

I've tried out Aqua.jl for the first time on a package I sometimes contribute to (FixedEffectModels) and ran into a few questions that hopefully someone can help with:

I will post some results of the ambiguity tests in the next message, depending on how things go those can be a different thread.

view this post on Zulip Nils (Nov 09 2023 at 14:12):

Some ambiguity warnings I don't understand:

Ambiguity #1
==(a::Integer, b::SentinelArrays.ChainedVectorIndex) @ SentinelArrays C:\Users\ngudat\.julia\packages\SentinelArrays\1kRo4\src\chainedvector.jl:208
==(x::BigInt, i::Integer) @ Base.GMP gmp.jl:722

Possible fix, define
  ==(::BigInt, ::SentinelArrays.ChainedVectorIndex)

I assume this is because

julia> FixedEffectModels.DataFrames.SentinelArrays.ChainedVectorIndex <: Integer
true

and

julia> BigInt <: Integer
true

Does this mean that SentinelArrays needs to define == for all subtypes of Integer in Base? And that there would be invalidation if another package that defines a <:Integer type gets loaded?

Then there is:

StatsBase.TestStat(v) @ StatsBase C:\Users\ngudat\.julia\packages\StatsBase\WLz8A\src\statmodels.jl:80
(::Type{T})(x::AbstractChar) where T<:Union{AbstractChar, Number} @ Base char.jl:50

Possible fix, define
  StatsBase.TestStat(::AbstractChar)

So the StatsBase.TestStat method is completely unconstrained, and would presumably always be less specific? I don't understand the ambiguity here at all. There's a similar one

StatsBase.TestStat(v) @ StatsBase C:\Users\ngudat\.julia\packages\StatsBase\WLz8A\src\statmodels.jl:80
(::Type{T})(z::Complex) where T<:Real @ Base complex.jl:44

Possible fix, define
  StatsBase.TestStat(::Complex)

I don't really understand most of the reported ambiguity but I think understanding those two will get me there for most of the others, so I'll leave it here.

view this post on Zulip Sukera (Nov 09 2023 at 14:14):

Not familiar with StatsBase, but my gut tells me TestStat <: Real, and hence there's an ambiguity

view this post on Zulip Sukera (Nov 09 2023 at 14:15):

yup, there it is

[sukera@tower StatsBase.jl]$ rg TestStat
src/statmodels.jl
79:struct TestStat <: Real

view this post on Zulip Sukera (Nov 09 2023 at 14:18):

the ambiguity comes in when you try to call e.g. TestStat(Complex(4.5)):

julia> StatsBase.TestStat(Complex(4.5))
ERROR: MethodError: StatsBase.TestStat(::ComplexF64) is ambiguous.

Candidates:
  StatsBase.TestStat(v)
    @ StatsBase ~/.julia/packages/StatsBase/WLz8A/src/statmodels.jl:80
  (::Type{T})(z::Complex) where T<:Real
    @ Base complex.jl:44

Possible fix, define
  StatsBase.TestStat(::Complex)

Stacktrace:
 [1] top-level scope
   @ REPL[5]:1

view this post on Zulip Sukera (Nov 09 2023 at 14:18):

whether that makes sense to do in the first place, I don't know :shrug:

view this post on Zulip Sukera (Nov 09 2023 at 14:20):

oh the ambiguity comes from a default constructor :thinking:

view this post on Zulip Nils (Nov 09 2023 at 14:26):

Ah people have reported this already: https://github.com/JuliaStats/StatsBase.jl/issues/861 and there's a PR to fix it: https://github.com/JuliaStats/StatsBase.jl/pull/866 but no consensus on how or even whether to do this. People seem to be uncertain about whether these ambiguities should be resolved at all if there are no reported problems with them - what's the situation here?

It feels to me that if there's no real need to address these than that test in Aqua becomes pretty meaningless for any non-trivial package as it will report dozens of ambiguities from dependencies.

view this post on Zulip Sukera (Nov 09 2023 at 14:27):

IMO TestStat just shouldn't <: Real :shrug:

view this post on Zulip Nils (Nov 09 2023 at 14:30):

Yes that's an option that the maintainers seem to consider, but more widely how worried should one be about ambiguities? I was under the impression that they introduce latency issues and therefore it would be an ecosystem win if everyone tidied them up in their own packages?

view this post on Zulip Sukera (Nov 09 2023 at 14:32):

https://discourse.julialang.org/t/aqua-jl-finds-many-ambiguities-in-core-base/103963?u=sukera

view this post on Zulip Sukera (Nov 09 2023 at 14:32):

an ambiguity that isn't ever a call candidate is not bad IMO

view this post on Zulip Sukera (Nov 09 2023 at 14:33):

that said, I think having TestStat <: Real still doesn't make sense, there is no + or * defined either

view this post on Zulip Sukera (Nov 09 2023 at 14:34):

and Real certainly doesn't mean "partially ordered"

view this post on Zulip Nils (Nov 09 2023 at 14:35):

Are you agreeing with Tim Holy there that I should be calling Test.detect_ambiguities and ignore the Aqua ambiguity tests?

view this post on Zulip Sukera (Nov 09 2023 at 14:35):

yeah

view this post on Zulip Nils (Nov 09 2023 at 14:35):

That's a bit unfortunate then that one of the core Julia packages meant to improve code quality and standards across the ecosystem starts off on a meaningless test that should be ignored :D

view this post on Zulip Nils (Nov 09 2023 at 14:36):

Next one (maybe I should have one thread per test?) is this:

julia> Aqua.test_all(FixedEffectModels; ambiguities=false)
Test Summary:    |Time
Method ambiguity | None  0.0s
Unbound type parameters detected:
[1] FixedEffectModels.Combination(A::AbstractVecOrMat{T}...) where T @ FixedEffectModels C:\Users\ngudat\.julia\packages\FixedEffectModels\1EZ6p\src\utils\basecol.jl:12

What's the problem with unbound type parameters? How bad is it? What should be done about it?

view this post on Zulip Nils (Nov 09 2023 at 14:37):

(I have read the Aqua docs and the h(x) example in this section but I don't feel confident enough in my understanding to change the package based on this)

view this post on Zulip Sukera (Nov 09 2023 at 14:38):

that parameter is unbound if you call FixedEffectModels.Combination()

view this post on Zulip Sukera (Nov 09 2023 at 14:38):

i.e., in the zero-arg method

view this post on Zulip Sukera (Nov 09 2023 at 14:39):

in this case, it indicates that you assume the vararg means "at least one of this type", but in reality it means "0 or more of this type"

view this post on Zulip Nils (Nov 09 2023 at 14:40):

Ok I'm not sure I see how a parameters that doesn't exist can be unbounded, but I'll accept it. Is this a real problem or can it be ignored? If it is a real problem, is the fix to define the zero-arg method implicitly?

view this post on Zulip Sukera (Nov 09 2023 at 14:40):

and if there are no arguments passed, the T isn't bound to anything, since there are no arguments it could be bound from

view this post on Zulip Sukera (Nov 09 2023 at 14:40):

how to fix it depends on the rest of your code - does it make sense to call this with 0 arguments?

view this post on Zulip Sukera (Nov 09 2023 at 14:41):

to illustrate the issue:

julia> f(x::Vector{T}...) where T = T
f (generic function with 1 method)

julia> f()
ERROR: UndefVarError: `T` not defined
Stacktrace:
 [1] f()
   @ Main ./REPL[1]:1
 [2] top-level scope
   @ REPL[2]:1

view this post on Zulip Nils (Nov 09 2023 at 14:42):

I can't see how - this is a package for regression modelling where Combination is the combination of two or more variables in the data set. Arguably even a one-argument version doesn't make sense.

view this post on Zulip Sukera (Nov 09 2023 at 14:42):

then I'd write this as f(a, b, c...) and combine the 2+ arguments accordingly in the method

view this post on Zulip Nils (Nov 09 2023 at 14:42):

But presumably changing to Combination(x::AbstractVecOrMat{T}, y::AbstractVecOrMat{S}...) where T where S would have the same problem?

view this post on Zulip Sukera (Nov 09 2023 at 14:43):

no, there T is always bound, because the T comes from the first argument

view this post on Zulip Sukera (Nov 09 2023 at 14:43):

an "unbound type variable" means that there is no type that could possibly be assigned, not even Any, because there is no place the type can come from - and it needs to come from somewhere

view this post on Zulip Nils (Nov 09 2023 at 14:44):

But wouldn't the S be unbound there if it's called with one arg?

view this post on Zulip Sukera (Nov 09 2023 at 14:44):

presumably you'd use the same type parameter for both of these, no?

view this post on Zulip Sukera (Nov 09 2023 at 14:45):

that would match your existing method

view this post on Zulip Nils (Nov 09 2023 at 14:45):

No, a combination could be a vector of strings and a vector of floats

view this post on Zulip Sukera (Nov 09 2023 at 14:45):

in the example with S, yes that would be unbound too in the 1-arg case

view this post on Zulip Nils (Nov 09 2023 at 14:45):

But you're right that must be a different method currently

view this post on Zulip Sukera (Nov 09 2023 at 14:46):

this is a bit similar to how all(<any predicate you want>, []) == true, because there is no counterexample that you can produce from []

view this post on Zulip mbaz (Nov 09 2023 at 14:47):

Regarding Project.toml, that test was removed recently: https://github.com/JuliaTesting/Aqua.jl/issues/208 https://github.com/JuliaTesting/Aqua.jl/pull/209

view this post on Zulip Nils (Nov 09 2023 at 15:11):

I don't really understand the unbound args error when looking at the package code. The relevant code is here:

struct Combination{T} <: AbstractMatrix{T}
    A::Tuple
    cumlength::Vector{Int}
end

function Combination(A::Union{AbstractVector{T}, AbstractMatrix{T}}...) where {T}
    Combination{T}(A, cumsum([size(x, 2) for x in A]))
end

So that already is the two arg version (indeed with the same type parameter as suggested), where does the error come from?

view this post on Zulip Sukera (Nov 09 2023 at 15:23):

that is a 0+ arg version

view this post on Zulip Sukera (Nov 09 2023 at 15:23):

A is typed Union

view this post on Zulip Nils (Nov 09 2023 at 15:25):

Ah I misread, sorry!

view this post on Zulip Nils (Nov 09 2023 at 15:27):

Would a workaround be to just define a 0-arg version explicitly or is that just ambiguous?

view this post on Zulip Sukera (Nov 09 2023 at 15:29):

that would override this method

view this post on Zulip Sukera (Nov 09 2023 at 15:29):

or rather, the 0-arg method defined there

view this post on Zulip Sukera (Nov 09 2023 at 15:30):

it should work, but again, it depends on your type whether that makes sense

view this post on Zulip Nils (Nov 09 2023 at 15:30):

Yes I mean just doing

Combination() = error("`Combination` needs to have at least one argument")

or something like that

view this post on Zulip Nils (Nov 09 2023 at 15:31):

What is the issue with these unbound type parameters in practice? Is this something I should be worried about or another Aqua test that can be ignored?

view this post on Zulip Sukera (Nov 09 2023 at 15:32):

it can't be ignored, since you use T in the method

view this post on Zulip Sukera (Nov 09 2023 at 15:32):

function Combination() where {T}
    Combination{T}(A, cumsum([size(x, 2) for x in A]))
end

This is what the compiler sees for the 0-arg method

view this post on Zulip Sukera (Nov 09 2023 at 15:32):

what should T be in this case?

view this post on Zulip Sukera (Nov 09 2023 at 15:33):

mind you, there is no A to get that T from

view this post on Zulip Sukera (Nov 09 2023 at 15:33):

so sooner or later, this code _will_ error

view this post on Zulip Sukera (Nov 09 2023 at 15:33):

(probably sooner)

view this post on Zulip Sukera (Nov 09 2023 at 15:34):

the issue is, the compiler can't know when this happens - so it must lug that T around, inserting UndefVarErrors everywhere that T is used

view this post on Zulip Sukera (Nov 09 2023 at 15:34):

that's a lot of work that could be prevented from happening by always making sure the T is bound to something

view this post on Zulip Nils (Nov 09 2023 at 15:38):

And this compiler work happens even though Combination() will never be called anywhere, ever?

view this post on Zulip Nils (Nov 09 2023 at 15:44):

So I've confirmed that both:

function Combination(A::Union{AbstractVector{T}, AbstractMatrix{T}},
    B::Union{AbstractVector{T}, AbstractMatrix{T}}...) where {T}
    Combination{T}((A, B...), cumsum([size(A, 2); [size(x, 2) for x in B]]))
end

and leaving the defintion untouched but doing:

Combination() = error("`Combination` requires at least one argument")

make the Aqua test pass. Is there one thing that's preferable from a compiler/latency perspective (assuming that I'm ambivalent between them from a "what makes sense for this use case" perspective)?

view this post on Zulip Sukera (Nov 09 2023 at 15:45):

no, both are fine

view this post on Zulip Sukera (Nov 09 2023 at 15:45):

one results in your custom error message, the other results in a MethodError

view this post on Zulip Sukera (Nov 09 2023 at 15:46):

from a UX perspective, the custom error message is preferable


Last updated: Nov 06 2024 at 04:40 UTC