Stream: helpdesk (published)

Topic: Determining the disk usage of a folder


Timothy (Sep 25 2023 at 17:11):

I _almost_ have a du -B1 replacement in Julia, but in the odd directory it's slightly off (undercounting by a little bit), and I can't for the life of me work out why. Any help would be much appreciated.

"""
    diskusage(path::String)

Find the disk usage of `path`, in bytes (as `du -B1` does). This is almost
equivalent to [`filesize`](@ref) when applied to a file, and operates recursively
on directories.
"""
function diskusage(path::String)
    if isfile(path)
        st = stat(path)
        st.blocks * 512 # st.blocks counts 512-byte units, not st.blksize-sized blocks (historical reasons)
    elseif isdir(path)
        try
            subpaths = readdir(path, join=true)
            filter!(!islink, subpaths)
            st = stat(path)
            sum(diskusage, subpaths, init = st.blksize * (st.size ÷ st.blksize))
        catch err
            if err isa Base.IOError && err.code == -Base.Libc.EACCES # Permission denied
                printstyled(stderr, "[ Warning: ", color=Base.warn_color(), bold=true)
                println(stderr, "Couldn't read $path: Permission denied")
                0
            else
                rethrow()
            end
        end
    else
        0
    end
end
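
For comparison, here is a minimal sketch of a variant that stays closer to what `du` is generally understood to do: it uses `lstat` so symlinks are counted as themselves rather than filtered out, takes `st.blocks * 512` for directories as well as files, and counts each (device, inode) pair only once so hard links are not added twice. The name `du_sketch` and the `seen` argument are hypothetical, permission errors are not handled, and whether any of these differences explains the discrepancy discussed in this thread is an assumption, not something the thread confirms.

```julia
# Hypothetical du-like sketch; not the function from the post above.
# `seen` records (device, inode) pairs so hard-linked files count once.
function du_sketch(path::AbstractString, seen = Set{Tuple{UInt,UInt}}())
    st = lstat(path)                      # lstat: do not follow symlinks
    key = (UInt(st.device), UInt(st.inode))
    key in seen && return 0               # already counted via another link
    push!(seen, key)
    total = Int(st.blocks) * 512          # st.blocks is in 512-byte units
    if isdir(st)                          # false for a symlink to a directory
        for child in readdir(path; join = true)
            total += du_sketch(child, seen)
        end
    end
    return total
end
```

Calling `du_sketch("/some/dir")` returns a byte count that can be compared against `du -d0 -B1 /some/dir`.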

Sebastian Pfitzner (Sep 25 2023 at 17:15):

what's that st.blksize * (st.size ÷ st.blksize) about? Isn't that essentially floor(st.size), and if so, do you actually want it to be?

Timothy (Sep 25 2023 at 17:17):

When directories have many items, they seem to start taking up blocks. When st.size is say 400 I want 0, when it's say 9000 I want 4096 * 2. At least from my experiments so far, this seems about right.
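
The rounding described here can be checked in isolation. A minimal sketch, assuming a `blksize` of 4096 (the value implied by the "4096 * 2" example); the helper name `rounddown_to_blocks` is hypothetical:

```julia
# Round a size down to a whole number of blocks, mirroring the
# `st.blksize * (st.size ÷ st.blksize)` expression from the function above.
# The default blksize of 4096 is an assumed value for illustration.
rounddown_to_blocks(size, blksize = 4096) = blksize * (size ÷ blksize)

rounddown_to_blocks(400)   # 0
rounddown_to_blocks(9000)  # 8192, i.e. 4096 * 2
```

So the expression floors `st.size` to a multiple of the block size rather than to an integer.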

Timothy (Sep 25 2023 at 17:21):

Here's me being slightly off in practice:

julia> diskusage("/home/tec/Desktop/")
5505024

shell> du -d0 -B1 /home/tec/Desktop/
5505024    /home/tec/Desktop/

julia> diskusage("/home/tec/Documents/")
29516001280

shell> du -d0 -B1 /home/tec/Documents/
29517352960    /home/tec/Documents/

shell> du -d0 -B1 /home/tec/Videos/
171021414400    /home/tec/Videos/

julia> diskusage("/home/tec/Music/")
280397500416

shell> du -d0 -B1 /home/tec/Music/
280397402112    /home/tec/Music/

~/Videos is off by 4 blocks (about 0.00001%), ~/Documents is off by 330 blocks (about 0.005%), etc.
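
The block counts can be recomputed from the outputs quoted above. A small sketch, assuming 4096-byte blocks (the variable names are made up for illustration). Note that for ~/Music the Julia result is larger than the `du` result, so that directory is overcounted rather than undercounted:

```julia
blocksize = 4096                       # assumed block size

du_documents = 29517352960             # du -d0 -B1 ~/Documents
jl_documents = 29516001280             # diskusage("~/Documents")
du_music     = 280397402112            # du -d0 -B1 ~/Music
jl_music     = 280397500416            # diskusage("~/Music")

documents_missing = du_documents - jl_documents   # 1351680 bytes undercounted
documents_blocks  = documents_missing ÷ blocksize # 330 blocks

music_extra  = jl_music - du_music                # 98304 bytes overcounted
music_blocks = music_extra ÷ blocksize            # 24 blocks

# Relative error for ~/Documents, in percent
documents_percent = 100 * documents_missing / du_documents
```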

Sebastian Pfitzner (Sep 25 2023 at 17:25):

hm, makes sense I guess

Timothy (Sep 25 2023 at 17:26):

This is just guesswork. I'm struggling to find any documentation of the details here.

Timothy (Sep 25 2023 at 17:31):

Ah, I think I've found a minor error in the stat documentation. This seems wrong:

  blksize The file-system preferred block size for the file
  blocks  The number of such blocks allocated
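
One way to see the mismatch is to compare both readings of `st.blocks` on a freshly written file. This is a sketch under the assumption that `blocks` follows the Linux `stat(2)` convention of 512-byte units; the helper name `block_readings` is hypothetical, and the actual numbers depend on the filesystem (some allocate lazily or compress), so no particular output is claimed:

```julia
# Write `nbytes` to a temporary file and report its apparent size next to
# the two possible interpretations of `st.blocks`.
function block_readings(nbytes = 10_000)
    mktemp() do path, io
        write(io, zeros(UInt8, nbytes))
        flush(io)                                   # push the data to the OS before stat
        st = stat(path)
        (apparent   = st.size,                      # what `filesize` reports
         as_512     = st.blocks * 512,              # blocks read as 512-byte units
         as_blksize = st.blocks * st.blksize)       # blocks read as "such blocks", per the docstring
    end
end

block_readings()
```

If `as_512` lands near the apparent size while `as_blksize` is several times larger, the 512-byte reading is the plausible one.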

Timothy (Sep 25 2023 at 17:37):

https://github.com/JuliaLang/julia/issues/51447


Last updated: Oct 02 2023 at 04:34 UTC