Stream: helpdesk (published)

Topic: Multiple dispatch API design conundrum


Nils (Sep 18 2024 at 07:35):

I posted this on Slack helpdesk yesterday but I think Zulip lends itself better to slightly more open conceptual questions and I'll try and simplify the question (at the risk of creating an X-Y problem):

I'm trying to leverage multiple dispatch to write a hierarchy of functions for a package of mine (TreatmentPanels.jl) with the ultimate goal of providing a sort of "mini-DSL" to the user.

Basically the user can specify treatment units (identified by a String/Symbol identifier) and time periods in which they are treated (either TimeType or Int). I'm trying to write a dispatch hierarchy which breaks up more complex combinations of units and time periods into simple one unit -> one time period components with which I then call my "kernel" function.

This all works reasonably well but I'm hitting a wall with the most general case which is this:

julia> typeof(["b" => Date(1991), "d" => Date(1991) => Date(1992)])
Vector{Pair{String, Any}}

so if there are different units with some of them being assigned only one treatment period (which in my mini-DSL means "all periods after this date") and others a start/end range, type inference taps out and turns the right hand side of my Pairs into Any. That means my whole dispatch house of cards falls down, as I can't define a "fallback" method for Any on the right hand side because I need to distinguish between the Date(1991) and Date(1991) => Date(1992) case.

What do I do? Is this just a bad design/idea in general? Or should I fall back to if right_hand_side isa Date ... "manual" dispatch?
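
A minimal sketch of one way around this, assuming the kernel can be reached through a two-argument method: the element type collapsing to Pair{String, Any} does not erase the runtime type of each right-hand side, so unwrapping the pair lets ordinary dispatch tell the two cases apart. `describe` is a hypothetical stand-in for the kernel function, not part of TreatmentPanels.jl.

```julia
using Dates

# `describe` is a hypothetical stand-in for the real kernel function.
describe(unit, start::TimeType) = "$unit: $start onwards"
describe(unit, span::Pair{<:TimeType,<:TimeType}) = "$unit: $(span.first) to $(span.second)"

# Unwrap the pair so dispatch sees the runtime type of the right-hand side,
# even when the container's element type is Pair{String, Any}.
describe(spec::Pair) = describe(spec.first, spec.second)
describe(specs::AbstractVector) = [describe(spec) for spec in specs]

specs = ["b" => Date(1991), "d" => Date(1991) => Date(1992)]
foreach(println, describe(specs))
```

The dispatch here happens at runtime for each element, which costs a little per call but keeps the array-literal syntax unchanged for the user.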

Mason Protter (Sep 18 2024 at 13:28):

yeah that's an interesting problem. So basically you want a single date and a date range to be the same sort of object right?

Mason Protter (Sep 18 2024 at 13:29):

Does this hierarchy need to be extensible?

Expanding Man (Sep 18 2024 at 13:35):

I don't feel like I really understand what you're trying to do here, but the first thing that comes to mind is that you probably simply can't do anything like that with array construction syntax. Arrays will be forced to broaden the type and of course you will wind up with Any, that's just how arrays work in Julia. If you want to use array construction syntax you'll have to pipe stuff through functions which decide on the appropriate types first. At most you can use a macro to make this look like array construction syntax.

Nils (Sep 18 2024 at 14:08):

Yes I think ultimately my issue boils down to

julia> [1=>2, 1]
2-element Vector{Any}:
 1 => 2
   1

when I need

julia> Union{Pair{Int,Int}, Int}[1=>2, 1]
2-element Vector{Union{Int64, Pair{Int64, Int64}}}:
 1 => 2
   1
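
A rough sketch of getting from the first array to the second without asking the user to write the Union by hand, by piping the literal through a function that picks the element type. `narrow` is a hypothetical helper name, not an existing Base function.

```julia
# Hypothetical helper: rebuild a vector whose element type is the Union of
# the concrete element types actually present, instead of Any.
narrow(v::AbstractVector) = convert(Vector{Union{unique(map(typeof, v))...}}, v)

v = narrow([1 => 2, 1])
println(typeof(v))
```

The element type is computed at runtime, so `narrow` itself is not type stable; the assumption is that it runs once at the API boundary and the narrowed vector is then passed on to methods that dispatch on it.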

Nils (Sep 18 2024 at 14:09):

Because, Mason to your question, the underlying function I want to dispatch to works differently for String => Date vs String => Pair{Date, Date}

Nils (Sep 18 2024 at 14:10):

But maybe I just need to change the API to have multiple arguments for the multiple unit case.

Nils (Sep 18 2024 at 14:12):

And maybe I should have been clearer: my "mini-DSL" is supposed to allow the user to specify:

"A" => Date(2000) # single unit treated for all periods from 2000 onwards
"A" => Date(2000) => Date(2005) # single unit treated from 2000-2005
"A" => [Date(2000) => Date(2005), Date(2008) => Date(2010), Date(2015)] # single unit treated from 2000-2005, 2008-2010, and all periods from 2015 onwards
["A", "B"] => Date(2000) # two units treated from 2000 onwards

Expanding Man (Sep 18 2024 at 14:13):

You can't make that type stable (or even promote to the narrowest union type) using array construction syntax. You either have to use a tuple, make them arguments, or make a macro.

Nils (Sep 18 2024 at 14:14):

But currently I write the "multiple units, different treatment timings" case as:

["A" => [Date(2000) => Date(2005), Date(2008)], "B" => [Date(2001) => Date(2006), Date(2010)]]

Expanding Man (Sep 18 2024 at 14:15):

You may also be interested in IntervalSets.jl which has nice interval types with the convenient .. syntax.

Nils (Sep 18 2024 at 14:16):

Yes one can quarrel about the "represent from-to as a pair of dates", but the issue is the same if I use date ranges. I'll think about whether Intervals make sense (with dates things can get a bit tricky because often the interval needs to be specified (monthly, daily, weekly...) and so currently I just use start and end points and capture any dates in between, irrespective of the interval).

Nils (Sep 18 2024 at 14:28):

I'll probably have to consider writing the multiple unit case as separate arguments. My function currently takes an input matrix which it mutates and then the unit/time period specification as the second argument.

I suppose if I change

my_fun(X, ["A" => [2000 => 2005, 2008], "B" => [2001 => 2006, 2010]])

which doesn't work because of the resulting Vector{Any} to

my_fun(X, "A" => [2000 => 2005, 2008], "B" => [2001 => 2006, 2010])

I can get away with it, just not really sure how to write that (and whether slurping those multiple args will give me the same problem?)

Expanding Man (Sep 18 2024 at 14:32):

No, that will not have the same problem, that will result in a Tuple which has type slots for each component. The only problem with that is if you have a very large number of arguments it can get very expensive to compile. You can also use a tuple if you'd like, it is equivalent.
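
A sketch of what the slurping version could look like, under the assumption that each spec is a `unit => periods` pair. `apply!` is a hypothetical kernel, and `records` is a plain vector standing in for the matrix X that the real function mutates.

```julia
# Hypothetical kernel methods: record (unit, start, stop) in `records`.
apply!(records, unit, start::Int) = push!(records, (unit, start, nothing))
apply!(records, unit, span::Pair{Int,Int}) = push!(records, (unit, span.first, span.second))

# A vector of periods is broken up element by element; dispatch uses each
# element's runtime type, so a Vector{Any} is fine here.
function apply!(records, unit, periods::AbstractVector)
    foreach(p -> apply!(records, unit, p), periods)
    return records
end

# Slurped specs arrive as a Tuple with one type slot per argument.
function my_fun(records, specs::Pair...)
    for (unit, periods) in specs
        apply!(records, unit, periods)
    end
    return records
end

records = Tuple{String,Int,Union{Int,Nothing}}[]
my_fun(records, "A" => [2000 => 2005, 2008], "B" => [2001 => 2006, 2010])
foreach(println, records)
```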

Expanding Man (Sep 18 2024 at 14:33):

But you still have arrays which mix Pair{Int,Int} with Int, I'd change those to a tuple as well. They seem like they are guaranteed short, so there should be no problem using a tuple there.

Expanding Man (Sep 18 2024 at 14:34):

For example

julia> ("A" => (2000=>2005, 2008), "B" => (2001=>2006, 2010)) |> typeof
Tuple{Pair{String, Tuple{Pair{Int64, Int64}, Int64}}, Pair{String, Tuple{Pair{Int64, Int64}, Int64}}}

Expanding Man (Sep 18 2024 at 14:37):

There are annoyingly many foot-guns when it comes to actually iterating over tuples in Julia however. That's potentially a reason to convert them to (properly typed) arrays later, but I'm still not really clear on what you're trying to do so I'm not sure if that's appropriate here.
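
A small sketch of the "convert to a properly typed array later" step, assuming integer periods as in the examples above. `PeriodSpec` and `to_periods` are hypothetical names.

```julia
# Hypothetical alias for "a single period or a start => end pair".
const PeriodSpec = Union{Int, Pair{Int,Int}}

# Collect a tuple of period specs into one vector with a fixed element type,
# so tuples of different lengths and layouts end up with the same type.
to_periods(t::Tuple{Vararg{PeriodSpec}}) = PeriodSpec[t...]

a = to_periods((2000 => 2005, 2008))
b = to_periods((2000 => 2007, 2009 => 2010, 2012))
println(typeof(a) == typeof(b))
```

After this step a single method taking `Vector{PeriodSpec}` can handle both inputs, even though the original tuples have different types.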

Nils (Sep 18 2024 at 14:38):

I suppose an issue with the tuples is that in my use case, functionally [2000=>2005, 2008] and [2000=>2007, 2009=>2010, 2012] are equivalent, so dispatching on a Vector{Union{Date, Pair{Date, Date}}} is a useful thing to do, while in the tuple design those two things would have a different type?

Expanding Man (Sep 18 2024 at 14:43):

Yes they would be different, so you'd probably need some kind of post-processing step.

It's starting to sound to me a bit like what you want is to create an appropriate struct for the arguments. There are still potential type promotion issues with it, but if you can create one with appropriate union types you can still pull it off

struct Arg
    name::String
    inner::Vector{Union{Date,Pair{Date,Date}}}
end

or whatever (I think that's probably not quite what you're looking for but might give you an idea). You could then make your argument a Vector{Arg} or whatever it is you want.
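
A sketch of how such a struct could absorb the mixed array literal, relying on the default constructor converting each field to its declared type. The struct is repeated so the example runs on its own; the two extra constructors are hypothetical conveniences, not something proposed in the thread.

```julia
using Dates

struct Arg
    name::String
    inner::Vector{Union{Date,Pair{Date,Date}}}
end

# Hypothetical convenience constructors:
# wrap a single date or range in a vector, and accept `name => periods`.
Arg(name, when::Union{Date,Pair{Date,Date}}) = Arg(name, [when])
Arg(spec::Pair) = Arg(spec.first, spec.second)

# The Vector{Any} produced by the literal is converted to the field type.
specs = [Arg("A" => [Date(2000) => Date(2005), Date(2008)]), Arg("B" => Date(2001))]
println(typeof(specs))
```

An element that is neither a Date nor a Pair{Date,Date} fails at construction time during the conversion, which gives the user an early error instead of a dispatch failure deeper in the call chain.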

Simone Carlo Surace (Dec 19 2024 at 07:44):

This is coming late but I have a similar problem with time tables and I went with representing them all as structs with two fields which are a Union{T, Nothing} but at construction are enforced to have at most one field a Nothing. It worked quite well in my use case.
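
A guess at what such a struct could look like, since the post does not show the actual code. `Span`, its field names, and the outer constructor are all assumptions made for illustration.

```julia
# Hypothetical struct: either end may be `nothing` (open-ended), but not both.
struct Span{T}
    start::Union{T,Nothing}
    stop::Union{T,Nothing}
    function Span{T}(start, stop) where {T}
        if start === nothing && stop === nothing
            throw(ArgumentError("at most one of start and stop may be nothing"))
        end
        return new{T}(start, stop)
    end
end

# Convenience constructor: Span(2008) means "from 2008 onwards".
Span(start::T, stop::Union{T,Nothing}=nothing) where {T} = Span{T}(start, stop)

spans = [Span(2000, 2005), Span(2008)]
println(typeof(spans))
```

Because a bounded and an open-ended span share one concrete type, an array literal mixing them keeps a concrete element type rather than widening to Any.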


Last updated: Dec 28 2024 at 04:38 UTC