Stream: helpdesk (published)

Topic: Check if file used by multiple processes.


view this post on Zulip Nathan Zimmerberg (Nov 04 2022 at 19:02):

Hello,

I am trying to automate running many multi-week-long simulations. I want these simulations to be restartable after a reboot, so I am storing checkpoints and logs in one output directory per job.

Normally each simulation job should have only one julia process running it at a time; however, I would like to automatically detect if multiple processes are trying to write to the same output directory at the same time.

I'm not sure if this is a good idea, but currently I am having each process repeatedly append a byte to a file (once per second) and then check that the file size is what it expects. If two processes are running with the same output directory, the file size will be larger than expected.

I initially tried this with open("file_name", "a"), but that didn't work: multiple processes just overwrote each other's data.
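
For reference, that first attempt looked roughly like this (a reconstruction from memory, using the same file name and one-second interval as the version further below):

function detect_via_append()
    io = open("detect-mult-runners", "a")       # buffered IOStream in append mode
    expected = filesize("detect-mult-runners")
    while true
        write(io, 0x41)                         # append one byte
        flush(io)
        expected += 1
        # if another process is also appending, the file should grow faster
        # than this process's own count
        if filesize("detect-mult-runners") != expected
            @error "multiple runners are running this job, exiting"
            exit()
        end
        sleep(1.0)
    end
end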

So now I am using Base.Filesystem.open like so:

function main()
    # Open (or create) the marker file in append-only mode so every write
    # lands at the end of the file, even with multiple writers.
    flags = Base.Filesystem.JL_O_APPEND | Base.Filesystem.JL_O_CREAT | Base.Filesystem.JL_O_WRONLY
    perm = Base.Filesystem.S_IROTH | Base.Filesystem.S_IRGRP | Base.Filesystem.S_IWGRP | Base.Filesystem.S_IRUSR | Base.Filesystem.S_IWUSR
    detect_mult_runners_f = Base.Filesystem.open("detect-mult-runners", flags, perm)
    detect_mult_runners_i = Ref(filesize(detect_mult_runners_f))
    # Once per second, append one byte and check that the file grew by
    # exactly the number of bytes this process has written.
    Timer(0.0; interval=1.0) do t
        write(detect_mult_runners_f, 0x41)
        flush(detect_mult_runners_f)
        detect_mult_runners_i[] += 1
        if filesize(detect_mult_runners_f) != detect_mult_runners_i[]
            # The file is larger than this process's own count, so another
            # process must also be appending to it.
            @error "multiple runners are running this job, exiting"
            exit()
        end
    end
    sleep(1000)
end

main()

Is there a simpler way to lock an output directory against use by multiple julia processes that is safe when a random power loss can occur?
Also, is my use of Base.Filesystem.open something that may break in a future julia release?
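
To make the question concrete, this mkdir-based sketch is the kind of directory lock I have in mind (the names are just illustrative); the part I don't know how to handle cleanly is a stale lock left behind after a power loss:

function acquire_dir_lock(outdir::AbstractString)
    lockdir = joinpath(outdir, "lock")
    try
        mkdir(lockdir)                          # atomic: fails if it already exists
    catch e
        e isa Base.IOError || rethrow()
        error("output directory $outdir is already locked")
    end
    write(joinpath(lockdir, "pid"), string(getpid()))   # record the owner
    return lockdir
end

release_dir_lock(lockdir::AbstractString) = rm(lockdir; recursive=true)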


Last updated: Oct 02 2023 at 04:34 UTC