I've tried out Aqua.jl for the first time on a package I sometimes contribute to (FixedEffectModels
) and ran into a few questions that hopefully someone can help with:
[deps]
aren't in alphabetical order? Is that right? Do they have to be? It would be great (although maybe non-trivial) if Aqua could show the diff for what exactly made the test failtest_all
it seems that the process stops once any error is detected, is there way to keep going other than setting each failing test to false
one by one?I will post some results of the ambiguity tests in the next message, depending on how things go those can be a different thread.
Some ambiguity warnings I don't understand:
Ambiguity #1
==(a::Integer, b::SentinelArrays.ChainedVectorIndex) @ SentinelArrays C:\Users\ngudat\.julia\packages\SentinelArrays\1kRo4\src\chainedvector.jl:208
==(x::BigInt, i::Integer) @ Base.GMP gmp.jl:722
Possible fix, define
==(::BigInt, ::SentinelArrays.ChainedVectorIndex)
I assume this is because
julia> FixedEffectModels.DataFrames.SentinelArrays.ChainedVectorIndex <: Integer
true
and
julia> BigInt <: Integer
true
Does this mean that SentinelArrays
needs to define ==
for all subtypes of Integer
in Base? And that there would be invalidation if another package that defines a <:Integer
type gets loaded?
Then there is:
StatsBase.TestStat(v) @ StatsBase C:\Users\ngudat\.julia\packages\StatsBase\WLz8A\src\statmodels.jl:80
(::Type{T})(x::AbstractChar) where T<:Union{AbstractChar, Number} @ Base char.jl:50
Possible fix, define
StatsBase.TestStat(::AbstractChar)
So the StatsBase.TestStat
method is completely unconstrained, and would presumably always be less specific? I don't understand the ambiguity here at all. There's a similar one
StatsBase.TestStat(v) @ StatsBase C:\Users\ngudat\.julia\packages\StatsBase\WLz8A\src\statmodels.jl:80
(::Type{T})(z::Complex) where T<:Real @ Base complex.jl:44
Possible fix, define
StatsBase.TestStat(::Complex)
I don't really understand most of the reported ambiguity but I think understanding those two will get me there for most of the others, so I'll leave it here.
Not familiar with StatsBase
, but my gut tells me TestStat <: Real
, and hence there's an ambiguity
yup, there it is
[sukera@tower StatsBase.jl]$ rg TestStat
src/statmodels.jl
79:struct TestStat <: Real
the ambiguity comes in when you try to call e.g. TestStat(Complex(4.5))
:
julia> StatsBase.TestStat(Complex(4.5))
ERROR: MethodError: StatsBase.TestStat(::ComplexF64) is ambiguous.
Candidates:
StatsBase.TestStat(v)
@ StatsBase ~/.julia/packages/StatsBase/WLz8A/src/statmodels.jl:80
(::Type{T})(z::Complex) where T<:Real
@ Base complex.jl:44
Possible fix, define
StatsBase.TestStat(::Complex)
Stacktrace:
[1] top-level scope
@ REPL[5]:1
whether that makes sense to do in the first place, I don't know :shrug:
oh the ambiguity comes from a default constructor :thinking:
Ah people have reported this already: https://github.com/JuliaStats/StatsBase.jl/issues/861 and there's a PR to fix it: https://github.com/JuliaStats/StatsBase.jl/pull/866 but no consensus on how or even whether to do this. People seem to be uncertain about whether these ambiguities should be resolved at all if there are no reported problems with them - what's the situation here?
It feels to me that if there's no real need to address these than that test in Aqua becomes pretty meaningless for any non-trivial package as it will report dozens of ambiguities from dependencies.
IMO TestStat
just shouldn't <: Real
:shrug:
Yes that's an option that the maintainers seem to consider, but more widely how worried should one be about ambiguities? I was under the impression that they introduce latency issues and therefore it would be an ecosystem win if everyone tidied them up in their own packages?
https://discourse.julialang.org/t/aqua-jl-finds-many-ambiguities-in-core-base/103963?u=sukera
an ambiguity that isn't ever a call candidate is not bad IMO
that said, I think having TestStat <: Real
still doesn't make sense, there is no +
or *
defined either
and Real
certainly doesn't mean "partially ordered"
Are you agreeing with Tim Holy there that I should be calling Test.detect_ambiguities
and ignore the Aqua ambiguity tests?
yeah
That's a bit unfortunate then that one of the core Julia packages meant to improve code quality and standards across the ecosystem starts off on a meaningless test that should be ignored :D
Next one (maybe I should have one thread per test?) is this:
julia> Aqua.test_all(FixedEffectModels; ambiguities=false)
Test Summary: |Time
Method ambiguity | None 0.0s
Unbound type parameters detected:
[1] FixedEffectModels.Combination(A::AbstractVecOrMat{T}...) where T @ FixedEffectModels C:\Users\ngudat\.julia\packages\FixedEffectModels\1EZ6p\src\utils\basecol.jl:12
What's the problem with unbound type parameters? How bad is it? What should be done about it?
(I have read the Aqua docs and the h(x)
example in this section but I don't feel confident enough in my understanding to change the package based on this)
that parameter is unbound if you call FixedEffectModels.Combination()
i.e., in the zero-arg method
in this case, it indicates that you assume the vararg means "at least one of this type", but in reality it means "0 or more of this type"
Ok I'm not sure I see how a parameters that doesn't exist can be unbounded, but I'll accept it. Is this a real problem or can it be ignored? If it is a real problem, is the fix to define the zero-arg method implicitly?
and if there are no arguments passed, the T
isn't bound to anything, since there are no arguments it could be bound from
how to fix it depends on the rest of your code - does it make sense to call this with 0 arguments?
to illustrate the issue:
julia> f(x::Vector{T}...) where T = T
f (generic function with 1 method)
julia> f()
ERROR: UndefVarError: `T` not defined
Stacktrace:
[1] f()
@ Main ./REPL[1]:1
[2] top-level scope
@ REPL[2]:1
I can't see how - this is a package for regression modelling where Combination
is the combination of two or more variables in the data set. Arguably even a one-argument version doesn't make sense.
then I'd write this as f(a, b, c...)
and combine the 2+ arguments accordingly in the method
But presumably changing to Combination(x::AbstractVecOrMat{T}, y::AbstractVecOrMat{S}...) where T where S
would have the same problem?
no, there T
is always bound, because the T
comes from the first argument
an "unbound type variable" means that there is no type that could possibly be assigned, not even Any
, because there is no place the type can come from - and it needs to come from somewhere
But wouldn't the S
be unbound there if it's called with one arg?
presumably you'd use the same type parameter for both of these, no?
that would match your existing method
No, a combination could be a vector of strings and a vector of floats
in the example with S
, yes that would be unbound too in the 1-arg case
But you're right that must be a different method currently
this is a bit similar to how all(<any predicate you want>, []) == true
, because there is no counterexample that you can produce from []
Regarding Project.toml
, that test was removed recently: https://github.com/JuliaTesting/Aqua.jl/issues/208 https://github.com/JuliaTesting/Aqua.jl/pull/209
I don't really understand the unbound args error when looking at the package code. The relevant code is here:
struct Combination{T} <: AbstractMatrix{T}
A::Tuple
cumlength::Vector{Int}
end
function Combination(A::Union{AbstractVector{T}, AbstractMatrix{T}}...) where {T}
Combination{T}(A, cumsum([size(x, 2) for x in A]))
end
So that already is the two arg version (indeed with the same type parameter as suggested), where does the error come from?
that is a 0+ arg version
A
is typed Union
Ah I misread, sorry!
Would a workaround be to just define a 0-arg version explicitly or is that just ambiguous?
that would override this method
or rather, the 0-arg method defined there
it should work, but again, it depends on your type whether that makes sense
Yes I mean just doing
Combination() = error("`Combination` needs to have at least one argument")
or something like that
What is the issue with these unbound type parameters in practice? Is this something I should be worried about or another Aqua test that can be ignored?
it can't be ignored, since you use T
in the method
function Combination() where {T}
Combination{T}(A, cumsum([size(x, 2) for x in A]))
end
This is what the compiler sees for the 0-arg method
what should T
be in this case?
mind you, there is no A
to get that T
from
so sooner or later, this code _will_ error
(probably sooner)
the issue is, the compiler can't know when this happens - so it must lug that T
around, inserting UndefVarError
s everywhere that T
is used
that's a lot of work that could be prevented from happening by always making sure the T
is bound to something
And this compiler work happens even though Combination()
will never be called anywhere, ever?
So I've confirmed that both:
function Combination(A::Union{AbstractVector{T}, AbstractMatrix{T}},
B::Union{AbstractVector{T}, AbstractMatrix{T}}...) where {T}
Combination{T}((A, B...), cumsum([size(A, 2); [size(x, 2) for x in B]]))
end
and leaving the defintion untouched but doing:
Combination() = error("`Combination` requires at least one argument")
make the Aqua test pass. Is there one thing that's preferable from a compiler/latency perspective (assuming that I'm ambivalent between them from a "what makes sense for this use case" perspective)?
no, both are fine
one results in your custom error message, the other results in a MethodError
from a UX perspective, the custom error message is preferable
Last updated: Nov 06 2024 at 04:40 UTC