The following code fails with a segmentation fault:
import PyCall
so = PyCall.pyimport("scipy.optimize")

function f(x)
    (x - 2) * x * (x + 2)^2
end

function do_optimize(fn)
    result = so.minimize_scalar(fn)
end

let
    out = Vector{Any}(undef, 20)
    Threads.@threads for i in 1:20
        println(i)
        out[i] = do_optimize(f)
    end
end
What is the proper way to do this? If it is not possible, do I have to use multiprocessing with Distributed instead?
I also found an unresolved question about this on Discourse: https://discourse.julialang.org/t/using-pycall-from-threads/32742
Python doesn't have multithreading, so none of its data structures are thread-safe
Spinning up multiple Julia instances won't help, I think; instead, you need multiple Python instances that you're sending data to
I see. Looks like the only way to do it is to use Python's own multiprocessing module, which technically spawns independent processes (with identical setups)?
yep
PyCall might have its own way to spawn multiple Pythons, but I don't know
Thank you! I will investigate whether PyCall can spawn multiple Pythons.
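One mitigation sometimes tried before reaching for separate processes is to serialize every call into Python behind a single lock. Below is a minimal sketch of that idea (the PY_LOCK constant and do_optimize_locked helper are names invented here for illustration); note that even fully serialized calls may still crash, since PyCall does not manage Python's per-thread interpreter state, which is consistent with the advice above that separate Python processes are needed.

import PyCall
so = PyCall.pyimport("scipy.optimize")

f(x) = (x - 2) * x * (x + 2)^2

# A single lock guarding every entry into the Python interpreter.
const PY_LOCK = ReentrantLock()

function do_optimize_locked(fn)
    # Only one Julia thread at a time may call into Python here.
    lock(PY_LOCK) do
        so.minimize_scalar(fn)
    end
end

let
    out = Vector{Any}(undef, 20)
    Threads.@threads for i in 1:20
        out[i] = do_optimize_locked(f)
    end
    println(out)
end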
I tried Python's multiprocessing and still got a segmentation fault, which also slowed my laptop to a crawl. I managed to recover by running pkill julia in a non-X terminal (Ctrl-Alt-F6).
import PyCall
so = PyCall.pyimport("scipy.optimize")
mp = PyCall.pyimport("multiprocessing")

function f(x)
    (x - 2) * x * (x + 2)^2
end

function do_optimize(fn)
    result = so.minimize_scalar(fn)
end

let
    procs = []
    for x in 1:20
        println(x)
        proc = mp.Process(target=f, args=())
        proc.start()
        push!(procs, proc)
    end
    for p in procs
        p.join()
    end
end
My hypothesis is that the proc object gets duplicated indefinitely when passed over to Julia? (Note that I didn't even use the do_optimize function.)
EDIT: I forgot to add -t2 to the julia arguments. With threads enabled, the code below failed with a segmentation fault as well!
This solution works:
import PyCall
so = PyCall.pyimport("scipy.optimize")

function f(x)
    (x - 2) * x * (x + 2)^2
end

let
    out = Vector{Any}(undef, 20)
    Threads.@threads for i in 1:20
        function do_optimize(fn)
            result = so.minimize_scalar(fn)
        end
        out[i] = do_optimize(f)
    end
    println(out)
end
Not sure why.
EDIT: I forgot to add -t2 to the julia arguments. With threads enabled, the code below failed with a segmentation fault as well!
This also works:
import PyCall
so = PyCall.pyimport("scipy.optimize")

function f(x)
    (x - 2) * x * (x + 2)^2
end

function do_optimize(fn)
    result = so.minimize_scalar(fn)
end

let
    out = Vector{Any}(undef, 20)
    Threads.@threads for i in 1:20
        _do_optimize = do_optimize
        out[i] = _do_optimize(f)
    end
    println(out)
end
I have also tested the reverse direction: a Python script that uses Python's multiprocessing and calls Julia functions segfaults as well. Looks like my only viable path is either to go full Julia or to stay full Python. Too bad the only Python function I need is the LBFGSB code from scipy.optimize.
why don't you use a native lbfgs implementation?
LBFGS isn't bounded. It has to be wrapped in the generic Optim.Fminbox, and so is less efficient than LBFGSB, I think. See https://discourse.julialang.org/t/optim-jl-vs-scipy-optimize-once-again/61661/35
cool, I didn't know about lbfgs-B :)
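For context, box-constrained optimization in Optim.jl wraps an unconstrained inner optimizer in Fminbox, which is exactly the extra barrier-method layer that L-BFGS-B avoids. A minimal sketch for the 1-D objective from this thread, with hypothetical bounds chosen only for illustration:

using Optim

f(x) = (x[1] - 2) * x[1] * (x[1] + 2)^2

lower = [-4.0]  # hypothetical bounds, for illustration only
upper = [4.0]
x0 = [0.0]

# Fminbox wraps the unconstrained LBFGS inner optimizer in a barrier method.
result = optimize(f, lower, upper, x0, Fminbox(LBFGS()))
println(Optim.minimizer(result))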
@distributed with PyCall should work; just add @everywhere to the package loading and the function definitions
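A minimal sketch of that suggestion, assuming workers are added with addprocs (or by starting Julia with -p 4): each worker process loads its own copy of PyCall, and therefore runs its own Python interpreter, so no Python state is ever shared.

using Distributed
addprocs(4)  # or start Julia with -p 4

@everywhere begin
    import PyCall
    const so = PyCall.pyimport("scipy.optimize")
    f(x) = (x - 2) * x * (x + 2)^2
    # Return only the minimizer as a plain Float64 so that no
    # Python-owned object has to be serialized back to the master.
    do_optimize(fn) = convert(Float64, so.minimize_scalar(fn).x)
end

out = @distributed (vcat) for i in 1:20
    do_optimize(f)
end
println(out)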
@Paulito Palmes yes, I confirm that @distributed with PyCall works. Though by now I have already used LBFGSB.jl.
Using LBFGSB.jl results in fewer allocations than using SciPy's LBFGSB.
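For anyone landing on this thread, LBFGSB.jl usage looks roughly like the sketch below. The call signature and the 3-row bounds encoding (row 1 is the bound type, where 2 means both bounds are active; rows 2 and 3 are the lower and upper bounds) are based on the package's README example and may differ between versions, and the bounds here are hypothetical; double-check against the current docs.

using LBFGSB

# Objective and in-place gradient for f(x) = (x - 2) x (x + 2)^2 in 1-D.
func(x) = (x[1] - 2) * x[1] * (x[1] + 2)^2
function grad!(g, x)
    g[1] = (x[1] + 2) * (4x[1]^2 - 2x[1] - 4)  # derivative of the quartic above
end

optimizer = L_BFGS_B(1, 10)  # max problem dimension 1, max memory size 10
x0 = [0.0]
bounds = zeros(3, 1)
bounds[1, 1] = 2     # 2 => both lower and upper bounds are active
bounds[2, 1] = -4.0  # hypothetical lower bound
bounds[3, 1] = 4.0   # hypothetical upper bound

fout, xout = optimizer(func, grad!, x0, bounds, m=5, factr=1e7, pgtol=1e-5, iprint=-1)
println((fout, xout))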