Stream: helpdesk (published)

Topic: ✔ Check if file used by multiple processes.


Nathan Zimmerberg (Nov 04 2022 at 19:02):

Hello,

I am trying to automate running many multi-week simulations. I want these simulations to be restartable after a reboot, so each job stores its checkpoints and logs in its own output directory.

Normally each simulation job should have only one Julia process running it at a time. However, I would like to automatically detect when multiple processes are writing to the same output directory at the same time.

I'm not sure whether this is a good idea, but currently each process appends a byte to a file once per second and then checks whether the file size is as expected. If two processes are running with the same output directory, the file will be larger than expected.

I initially tried this with open("file_name", "a"), but this didn't work: multiple processes just overwrote each other's data.

So now I am using Base.Filesystem.open like so:

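```julia
function main()
    # Open write-only in append mode, creating the file if it does not exist.
    flags = Base.Filesystem.JL_O_APPEND | Base.Filesystem.JL_O_CREAT | Base.Filesystem.JL_O_WRONLY
    # Permissions rw-rw-r-- (before umask) for a newly created file.
    perm = Base.Filesystem.S_IROTH | Base.Filesystem.S_IRGRP | Base.Filesystem.S_IWGRP | Base.Filesystem.S_IRUSR | Base.Filesystem.S_IWUSR
    detect_mult_runners_f = Base.Filesystem.open("detect-mult-runners", flags, perm)
    # Expected size starts at whatever is already in the file (e.g. from before a restart).
    detect_mult_runners_i = Ref(filesize(detect_mult_runners_f))
    # Every second, append one byte and check that the file grew by exactly one byte.
    Timer(0.0; interval=1.0) do t
        write(detect_mult_runners_f, 0x41)  # append the byte 'A'
        flush(detect_mult_runners_f)
        detect_mult_runners_i[] += 1
        # Any extra growth means another process is appending to the same file.
        if filesize(detect_mult_runners_f) != detect_mult_runners_i[]
            @error "multiple runners are running this job, exiting"
            exit()
        end
    end
    sleep(1000)  # keeps the process alive; stand-in for the simulation work
end

main()
```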

Is there a simpler way to lock an output directory against multiple Julia processes that is safe when random power loss can occur?
Also, is my use of Filesystem.open something that may break in a future Julia release?

Nathan Zimmerberg (Oct 03 2023 at 16:32):

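Update: I am no longer using Filesystem.open with O_APPEND, because O_APPEND is very broken on Linux. The BUGS section at the end of https://man7.org/linux/man-pages/man2/pwrite.2.html notes that on Linux, pwrite() on a file opened with O_APPEND appends data to the end of the file regardless of the offset argument. I am instead using the Pidfile module from the FileWatching standard library: https://docs.julialang.org/en/v1/stdlib/FileWatching/#Pidfile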
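A minimal sketch of how Pidfile could guard an output directory, assuming Julia 1.9 or later (where Pidfile ships with FileWatching). The helper run_exclusively, the lock-file name job.pid, and the 60-second stale_age are hypothetical choices for illustration, not taken from the thread. It calls mkpidlock with wait=false so a second runner fails immediately instead of blocking, and it throws rather than calling exit(), leaving it to the caller to decide how to stop.

```julia
using FileWatching.Pidfile: mkpidlock

# Hypothetical helper: run `job()` only if no other live process holds the
# lock file inside `outdir`; otherwise fail fast instead of waiting.
function run_exclusively(job::Function, outdir::String; stale_age::Real = 60)
    mkpath(outdir)
    lockpath = joinpath(outdir, "job.pid")  # hypothetical lock-file name
    pidlock = try
        # wait=false: throw right away if the lock is already held.
        # stale_age > 0: enables removal of a lock judged stale (e.g. left by a
        # crashed run); while held, the lock's mtime is refreshed periodically.
        mkpidlock(lockpath; wait = false, stale_age = stale_age)
    catch err
        error("another runner appears to hold $(lockpath): $(sprint(showerror, err))")
    end
    try
        return job()
    finally
        close(pidlock)  # releases the lock by deleting the pidfile
    end
end
```

With stale_age > 0, a pidfile left behind by a crash or power loss should eventually be treated as stale (the Pidfile documentation describes the exact age and process-liveness rules), so a restarted job can reacquire the lock without manual cleanup.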

Notification Bot (Oct 03 2023 at 16:33):

Nathan Zimmerberg has marked this topic as resolved.


Last updated: Nov 06 2024 at 04:40 UTC