I _almost_ have a du -B1 replacement in Julia, but in the odd directory it's slightly off (undercounting by a little bit), and I can't for the life of me work out why. Any help would be much appreciated.
"""
    diskusage(path::String)

Find the disk usage of `path`, in bytes (as `du -B1` does). This is almost
equivalent to [`filesize`](@ref) when applied to a file, and operates recursively
on directories.
"""
function diskusage(path::String)
    if isfile(path)
        st = stat(path)
        st.blocks * 512 # 512 not the blocksize for historical reasons
    elseif isdir(path)
        try
            subpaths = readdir(path, join=true)
            filter!(!islink, subpaths)
            st = stat(path)
            sum(diskusage, subpaths, init = st.blksize * (st.size ÷ st.blksize))
        catch err
            if err isa Base.IOError && err.code == -Base.Libc.EACCES # Permission denied
                printstyled(stderr, "[ Warning: ", color=Base.warn_color(), bold=true)
                println(stderr, "Couldn't read $path: Permission denied")
                0
            else
                rethrow()
            end
        end
    else
        0
    end
end
What's that st.blksize * (st.size ÷ st.blksize) about? Isn't that essentially floor(st.size), and if so, do you actually want it to be?
When directories have many items, they seem to start taking up blocks. When st.size is say 400 I want 0, when it's say 9000 I want 4096 * 2. At least from my experiments so far, this seems about right.
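The rounding described above can be checked in isolation. This is only a sketch of the arithmetic: the helper name `floor_to_block` and the fixed 4096-byte block size are assumptions for illustration, whereas the function above takes the block size from `stat(path).blksize`.

```julia
# Round a size in bytes down to a whole number of blocks.
# `floor_to_block` is a hypothetical helper, not part of Base.
floor_to_block(size::Integer, blksize::Integer) = blksize * (size ÷ blksize)

# Assuming a 4096-byte block size:
small = floor_to_block(400, 4096)    # less than one block, so 0
large = floor_to_block(9000, 4096)   # two whole blocks, so 4096 * 2 = 8192
println((small, large))
```

Note that this rounds down to a multiple of the block size rather than to an integer, so it is not quite the same operation as `floor(st.size)`.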
Here's me being slightly off in practice:
julia> diskusage("/home/tec/Desktop/")
5505024
shell> du -d0 -B1 /home/tec/Desktop/
5505024 /home/tec/Desktop/
julia> diskusage("/home/tec/Documents/")
29516001280
shell> du -d0 -B1 /home/tec/Documents/
29517352960 /home/tec/Documents/
shell> du -d0 -B1 /home/tec/Videos/
171021414400 /home/tec/Videos/
julia> diskusage("/home/tec/Music/")
280397500416
shell> du -d0 -B1 /home/tec/Music/
280397402112 /home/tec/Music/
~/Videos is off by 4 blocks (about 0.00001%), ~/Documents is off by 330 blocks (about 0.005%), etc.
hm, makes sense I guess
This is just guesswork. I'm struggling to find any documentation of the details here.
Ah, I think I've found a minor error in the stat documentation. This seems wrong:
blksize The file-system preferred block size for the file
blocks The number of such blocks allocated
https://github.com/JuliaLang/julia/issues/51447
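To see why that wording looks off, here is a small sketch that prints the relevant `stat` fields for a scratch file. It assumes a Linux-like system where `blocks` is counted in 512-byte units, as the comment in the function above suggests. The temporary file and the 10 000-byte payload are just for illustration, and the allocated figure depends on the filesystem (it may even read 0 right after the write if allocation is delayed).

```julia
# Compare a file's apparent size with its allocated size.
# Assumption: `blocks` counts 512-byte units, not units of `blksize`.
path = tempname()
write(path, rand(UInt8, 10_000))   # 10 000 bytes of content

st = stat(path)
apparent  = st.size                # bytes of content
allocated = st.blocks * 512        # bytes allocated, under the 512-byte assumption

println("size = $apparent, blocks = $(st.blocks), blksize = $(st.blksize)")
println("blocks * 512     = $allocated")
println("blocks * blksize = $(st.blocks * st.blksize)")

rm(path)
```

If `blocks` really were counted in units of `blksize`, the last line would be the allocated size. On a typical Linux filesystem it comes out several times larger than the file, whereas `blocks * 512` lands close to the apparent size rounded up to a whole block.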
Last updated: Nov 06 2024 at 04:40 UTC