Stream: helpdesk (published)

Topic: Truncate string to X characters


Fredrik Ekre (Sep 15 2021 at 11:56):

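What is f here:

N = 20  # target width in terminal columns
str = "some long string possibly with Unicode characters"
trunc_str = f(str, N)
@assert textwidth(trunc_str) <= N  # at most N columns; exactly N is impossible if a 2-column character straddles the limit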
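For reference, length, textwidth and ncodeunits measure three different things, which is why truncating by character count is not the same as truncating by display width. A small check on the example string used later in this thread (this relies on Julia's textwidth reporting 2 columns for '⛵', as shown further down):

```julia
s = "h⛵αβ"
@assert length(s) == 4       # 4 characters
@assert textwidth(s) == 5    # '⛵' takes 2 terminal columns
@assert ncodeunits(s) == 8   # UTF-8 bytes: 1 + 3 + 2 + 2
```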

Sebastian Pfitzner (Sep 15 2021 at 13:18):

Modify https://github.com/julia-vscode/julia-vscode/blob/6b43cb7203e0152d5a727947292b8b302715748c/scripts/packages/VSCodeServer/src/misc.jl#L87-L102 to use i += textwidth(c), maybe?

Andrey Oskin (Sep 15 2021 at 13:31):

Maybe something like this?

function widthtrunc(str, N)
    i = 1
    w1 = 0
    while i < ncodeunits(str)
        c = str[i]
        w1 += textwidth(c)
        w1 >= N && break
        i = nextind(str, i)
    end

    i = i >= ncodeunits(str) ? prevind(str, i) : i

    return str[1:i]
end
julia> widthtrunc("h⛵αβ", 4)
"h⛵α"
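As posted, widthtrunc has two edge cases. The loop condition i < ncodeunits(str) can stop before the last character, so widthtrunc("ab", 10) returns "a". And because the break test is w1 >= N after the width has been added, a 2-column character that overshoots the limit is still kept: widthtrunc("abc⛵", 4) returns "abc⛵", which is 5 columns wide. A minimal corrected sketch that keeps the same while/nextind structure; widthtrunc_fixed is a hypothetical name, not code from the thread:

```julia
# Hypothetical corrected variant of widthtrunc; not the code posted above.
function widthtrunc_fixed(str::AbstractString, N::Integer)
    last_kept = 0                # byte index of the last character kept (0 = none)
    width = 0                    # accumulated display width in columns
    i = firstindex(str)
    while i <= lastindex(str)    # visit every character, including the last one
        width += textwidth(str[i])
        width > N && break       # stop before the character that would overflow
        last_kept = i
        i = nextind(str, i)
    end
    return str[1:last_kept]      # str[1:0] is "" when nothing fits
end
```

With this, widthtrunc_fixed("h⛵αβ", 4) still gives "h⛵α", while the two edge cases give "ab" and "abc".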

Keith Rutkowski (Sep 15 2021 at 14:18):

I had recently done something like this using chop(str, tail = length(str)-N) (maybe a -1 is needed too, I can't recall)

Fredrik Ekre (Sep 15 2021 at 14:53):

Thanks. Of course a loop is one option; I was just wondering if there was something more convenient. I will suggest this:

function shorten_str(str, max_len)
    io = IOBuffer(; sizehint=max_len)
    len = 0
    for c in str
        len += textwidth(c)
        if len > max_len
            break
        end
        print(io, c)
    end
    return String(take!(io))
end

Sebastian Pfitzner (Sep 15 2021 at 14:55):

just iterate through the string directly, no reason to index
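On indexing versus iterating: a Julia String is indexed by byte offset, not by character position, so only some integers are valid indices. Iterating yields the characters directly, and eachindex yields only the valid indices, which is why for c in str or a loop over eachindex(str) is safer than stepping an integer by 1:

```julia
s = "h⛵αβ"
@assert collect(s) == ['h', '⛵', 'α', 'β']      # iteration yields Chars
@assert collect(eachindex(s)) == [1, 2, 5, 7]   # valid byte indices only
@assert !isvalid(s, 3)                          # byte 3 is inside '⛵'; s[3] would throw a StringIndexError
```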

Fredrik Ekre (Sep 15 2021 at 14:59):

True, I had that version first, then thought I needed i for something, but obviously not in this final version. Edited.

Andrey Oskin (Sep 15 2021 at 15:07):

print is relatively slow.

julia> using BenchmarkTools

julia> @btime widthtrunc("h⛵αβ", 4)
  63.938 ns (1 allocation: 32 bytes)
"h⛵α"

julia> @btime shorten_str("h⛵αβ", 4)
  161.731 ns (5 allocations: 256 bytes)
"h⛵α"

Of course, this is probably not a bottleneck, so it doesn't really matter which one you choose.

(until the moment it becomes one)

Fredrik Ekre (Sep 15 2021 at 15:25):

True, final version:

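function shorten_str(str, max_len)
    len = 0      # accumulated display width in columns
    ind = 0      # index of the last character kept (0 = none, so str[1:0] == "")
    for i in eachindex(str)
        c = @inbounds str[i]
        len += textwidth(c)
        len > max_len && break
        ind = i
    end
    return str[1:ind]
end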

Keith Rutkowski (Sep 15 2021 at 17:40):

Using chop for this is non-allocating and much faster for me.

Jakob Nybo Nissen (Sep 15 2021 at 17:45):

But it doesn't respect textwidth

Sebastian Pfitzner (Sep 15 2021 at 18:07):

Also, doesn't chop allocate a new string as well? If you actually want a non-allocating solution then there's always

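function shorten_str(str, max_len)
    len = 0      # accumulated display width in columns
    ind = 0      # index of the last character kept (0 = none, giving an empty SubString)
    for i in eachindex(str)
        c = @inbounds str[i]
        len += textwidth(c)
        len > max_len && break
        ind = i
    end
    return SubString(str, 1:ind)  # a view into str, no copy
end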
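The key change here is the return value: str[1:ind] copies the bytes into a new String, while SubString is a view that shares memory with the parent string. A small illustration of those semantics on the thread's example string:

```julia
s = "h⛵αβ"
v = SubString(s, 1, 5)       # from byte 1 through the character starting at byte 5, i.e. "h⛵α"
@assert v == "h⛵α"
@assert v isa SubString{String}
@assert parent(v) === s      # no copy: v refers to the bytes of s
c = String(v)                # copy out when an independent String is needed
@assert c == "h⛵α" && c isa String
```

Because a SubString keeps its parent alive, converting with String(...) can be preferable if a short prefix of a very large string is stored long term.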

Sebastian Pfitzner (Sep 15 2021 at 18:10):

And if you don't care about textwidth then you should probably be using first(str, max_len) anyway
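Note that first(str, n) counts characters, not columns, so it can overshoot a width limit when wide characters are present:

```julia
s = "h⛵αβ"
@assert first(s, 3) == "h⛵α"          # the first 3 characters...
@assert textwidth(first(s, 3)) == 4    # ...occupy 4 columns
@assert first(s, 10) == s              # n larger than length(s) returns the whole string
```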

Keith Rutkowski (Sep 15 2021 at 18:38):

chop does respect textwidth; at least the results are the same as shorten_str, and it seems to be faster than the SubString approach too.

julia> shorten(str, max_len) = chop(str, tail = length(str)-(max_len-1))
shorten (generic function with 1 method)

julia> @btime shorten_str("h⛵αβ", 4)
  70.533 ns (0 allocations: 0 bytes)
"h⛵α"

julia> @btime shorten("h⛵αβ", 4)
  26.924 ns (0 allocations: 0 bytes)
"h⛵α"
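chop(str; head, tail) removes a given number of characters from each end and returns a SubString view, which is consistent with the 0 allocations above. Since it counts characters rather than columns, chop(str, tail = length(str) - k) is another way of keeping the first k characters:

```julia
s = "h⛵αβ"
t = chop(s, tail = 1)                  # drop 1 character from the end
@assert t == "h⛵α"
@assert t isa SubString{String}        # a view, no new string is allocated
@assert t == first(s, length(s) - 1)   # same characters as first(s, 3)
```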

Sebastian Pfitzner (Sep 15 2021 at 18:41):

sure, but that's because that example string doesn't have any chars that are wider than 1 column

Andrey Oskin (Sep 15 2021 at 18:45):

It does.

Sebastian Pfitzner (Sep 15 2021 at 18:46):

Huh, so it does

Andrey Oskin (Sep 15 2021 at 18:46):

julia> textwidth('⛵')
2
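The mismatch can also go the other way: combining marks have zero width, so a string can contain more characters than columns. A small example, assuming Julia's textwidth reports 0 for U+0301 COMBINING ACUTE ACCENT, as it does for combining marks generally:

```julia
# 'e' followed by U+0301 COMBINING ACUTE ACCENT renders as a single "é"
s = "e\u0301"
@assert length(s) == 2           # two characters (code points)...
@assert textwidth('\u0301') == 0 # the combining mark itself takes no column
@assert textwidth(s) == 1        # ...but only one terminal column
```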

Sebastian Pfitzner (Sep 15 2021 at 18:51):

It only accidentally works for that case though:

julia> shorten("aaaaa", 3)
"aa"

Sebastian Pfitzner (Sep 15 2021 at 18:51):

so no, it doesn't respect textwidth

Andrey Oskin (Sep 15 2021 at 18:53):

Oh, that's because it calculates from the tail?

Keith Rutkowski (Sep 15 2021 at 18:53):

julia> chop("h⛵αβ", head=1, tail = 0)
"⛵αβ"

julia> chop("h⛵αβ", head=2, tail = 0)
"αβ"

julia> chop("h⛵αβ", head=3, tail = 0)
"β"

julia> chop("h⛵αβ", head=4, tail = 0)
""

Keith Rutkowski (Sep 15 2021 at 18:57):

Oh, I see the problem... the example used 4 as the max length, when it should have been 3?

julia> shorten(str, max_len) = chop(str, tail = length(str)-max_len)
shorten (generic function with 1 method)

julia> shorten("aaaaa", 3)
"aaa"

julia> shorten("h⛵αβ", 3)
"h⛵α"

Sebastian Pfitzner (Sep 15 2021 at 18:58):

Right, and the last case is actually 4 columns wide if you ask textwidth:

julia> textwidth("h⛵α")
4
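Putting the last few posts together: the chop-based shorten limits the number of characters, while the eachindex loop limits the number of columns. A side-by-side sketch, where shorten is copied from the post above and shorten_width is a hypothetical name for the SubString loop version:

```julia
# chop-based version from the thread: keeps max_len characters
shorten(str, max_len) = chop(str, tail = length(str) - max_len)

# width-based loop (same idea as shorten_str above), returning a SubString view
function shorten_width(str, max_len)
    width, last_kept = 0, 0          # running column count, index of last kept character
    for i in eachindex(str)
        width += textwidth(str[i])
        width > max_len && break     # the next character would exceed the limit
        last_kept = i
    end
    return SubString(str, 1, last_kept)
end

s = "h⛵αβ"
@assert shorten(s, 3) == "h⛵α" && textwidth(shorten(s, 3)) == 4
@assert shorten_width(s, 3) == "h⛵" && textwidth(shorten_width(s, 3)) == 3
```

For a 3-column limit, only the width-based loop stays within 3 columns; it drops α because '⛵' already uses 2 of them.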

Last updated: Oct 02 2023 at 04:34 UTC