What is f here:
str = "some long string possibly with Unicode characters"
trunc_str = f(str)
@assert textwidth(trunc_str) == X
x-ref https://github.com/JuliaLogging/LoggingFormats.jl/pull/1#discussion_r709108623
modify https://github.com/julia-vscode/julia-vscode/blob/6b43cb7203e0152d5a727947292b8b302715748c/scripts/packages/VSCodeServer/src/misc.jl#L87-L102 to i += textwidth(c), maybe?
Maybe something like this?
function widthtrunc(str, N)
    i = firstindex(str)
    stop = 0                    # index of the last character that fits; 0 if none fits
    w1 = 0
    while i <= ncodeunits(str)
        w1 += textwidth(str[i])
        w1 > N && break         # this character would exceed N columns
        stop = i
        i = nextind(str, i)
    end
    return str[1:stop]
end
julia> widthtrunc("h⛵αβ", 4)
"h⛵α"
I had recently done something like this using chop(str, tail = length(str) - N)
(maybe a -1 is needed too, I can't recall)
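For reference, a minimal sketch of the chop idea, assuming the goal is to keep the first n characters. chop takes its head and tail counts as keyword arguments and returns a SubString. The helper name chop_to and the max(..., 0) clamp are illustrative additions, not something from the thread:

```julia
# Hypothetical helper: keep the first n characters by chopping the rest off the tail.
# The clamp keeps `tail` non-negative when the string is already shorter than n.
chop_to(str, n) = chop(str; tail = max(length(str) - n, 0))

s = "h⛵αβ"
t = chop_to(s, 3)

println(t)           # the first three characters of s
println(typeof(t))   # chop returns a view into s, not a new String
println(chop_to("ab", 5))
```

Note that both length and the tail count are measured in characters, not in display columns.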
Thanks. Of course a loop is one option; I was just wondering whether there was something more convenient. I will suggest this:
function shorten_str(str, max_len)
    io = IOBuffer(; sizehint=max_len)
    len = 0
    for c in str
        len += textwidth(c)
        if len > max_len
            break
        end
        print(io, c)
    end
    return String(take!(io))
end
just iterate through the string directly, no reason to index
True, I had that version first but then I thought I needed i for something, but obviously not in this final version. Edited.
print is relatively slow.
julia> @btime widthtrunc("h⛵αβ", 4)
63.938 ns (1 allocation: 32 bytes)
"h⛵α"
julia> @btime shorten_str("h⛵αβ", 4)
161.731 ns (5 allocations: 256 bytes)
"h⛵α"
Of course, it's probably not a bottleneck function, so it doesn't really matter which one you choose.
(until the moment it becomes a bottleneck)
True, final version:
function shorten_str(str, max_len)
    len = 0
    ind = 0    # index of the last character that fits; 0 if none fits
    for i in eachindex(str)
        c = @inbounds str[i]
        len += textwidth(c)
        len > max_len && break
        ind = i
    end
    return str[1:ind]
end
Using chop for this is non-allocating and much faster for me.
But it doesn't respect textwidth
Also, doesn't chop allocate a new string as well? If you actually want a non-allocating solution then there's always
function shorten_str(str, max_len)
    len = 0
    ind = 0    # index of the last character that fits; 0 if none fits
    for i in eachindex(str)
        c = @inbounds str[i]
        len += textwidth(c)
        len > max_len && break
        ind = i
    end
    return SubString(str, 1:ind)    # a view into str, no copy
end
And if you don't care about textwidth then you should probably be using first(str, max_len) anyways
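To make the distinction concrete, here is a small sketch of what first(str, n) measures. The sample string is an illustrative assumption (not from the thread), chosen because each of its characters normally occupies two terminal columns:

```julia
# Illustrative string (an assumption): full-width characters only.
s = "日本語テキスト"

head = first(s, 3)    # keeps three characters, regardless of their display width

println(head)
println(length(head), " characters, ", textwidth(head), " columns")
```

So first gives a character budget, which only matches a column budget when every character is one column wide.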
chop does respect textwidth, at least the results are the same as shorten_str, and it seems to be faster than the SubString approach too.
julia> shorten(str, max_len) = chop(str, tail = length(str)-(max_len-1))
shorten (generic function with 1 method)
julia> @btime shorten_str("h⛵αβ", 4)
70.533 ns (0 allocations: 0 bytes)
"h⛵α"
julia> @btime shorten("h⛵αβ", 4)
26.924 ns (0 allocations: 0 bytes)
"h⛵α"
sure, but that's because that example string doesn't have any chars that are wider than 1 column
It does.
Huh, so it does
julia> textwidth('⛵')
2
It only accidentally works for that case though:
julia> shorten("aaaaa", 3)
"aa"
so no, it doesn't respect textwidth
Oh, that's because it calculates from the tail?
julia> chop("h⛵αβ", head=1, tail = 0)
"⛵αβ"
julia> chop("h⛵αβ", head=2, tail = 0)
"αβ"
julia> chop("h⛵αβ", head=3, tail = 0)
"β"
julia> chop("h⛵αβ", head=4, tail = 0)
""
Oh, I see the problem... the example used 4 as the max length, when it should have been 3?
julia> shorten(str, max_len) = chop(str, tail = length(str)-max_len)
shorten (generic function with 1 method)
julia> shorten("aaaaa", 3)
"aaa"
julia> shorten("h⛵αβ", 3)
"h⛵α"
Right, and the last case is actually 4 columns wide if you ask textwidth:
julia> textwidth("h⛵α")
4
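To summarize the difference, here is a hedged sketch that puts a column-based truncation next to the character-based chop variant and prints the resulting widths for a few inputs. The names truncate_to_width and truncate_to_length are hypothetical, and the extra inputs ("⛵⛵" and the empty string) are illustrative edge cases that were not tried in the thread:

```julia
# Hypothetical name; a sketch of width-based truncation under the assumption
# that the result must never be wider than max_width columns.
function truncate_to_width(str::AbstractString, max_width::Integer)
    width = 0
    last_fit = 0                       # 0 means "no character fits"
    for i in eachindex(str)
        width += textwidth(str[i])
        width > max_width && break     # this character would overflow the budget
        last_fit = i
    end
    return SubString(str, 1, last_fit) # empty view when last_fit == 0
end

# Character-count truncation via chop, clamped so `tail` is never negative.
truncate_to_length(str, n) = chop(str; tail = max(length(str) - n, 0))

for s in ("h⛵αβ", "aaaaa", "⛵⛵", "")
    by_width = truncate_to_width(s, 3)
    by_length = truncate_to_length(s, 3)
    println(repr(s), " => ", repr(by_width), " (", textwidth(by_width), " cols) vs ",
            repr(by_length), " (", textwidth(by_length), " cols)")
end
```

The two agree whenever every character is one column wide; with wide characters only the first one stays within the column budget.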
Last updated: Nov 22 2024 at 04:41 UTC