Stream: helpdesk (published)

Topic: Converting Array{Any,1} Column Data Types


view this post on Zulip Florian Große (Mar 02 2021 at 20:34):

Relate[!, 1] = map(convert(Int64, Relate[!, 1]))

In particular, convert(Int64, Relate[!, 1]) does exactly what the error says: it tries to convert the whole column into a single number.
Did you want to change the eltype to Int?

Also, what are all the maps supposed to do? What is the structure of Relate? Maybe we can work forward from there.
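
To illustrate the difference between converting the column as a whole and converting each element, here is a minimal sketch on a plain vector. The name `col` is a hypothetical stand-in for a column with eltype Any; it is not taken from the original data.

```julia
# Hypothetical stand-in for a column whose eltype is Any.
col = Any[1, 2, 3]

# convert(Int64, col) asks Julia to turn the whole vector into one Int64.
# No such conversion exists, so a MethodError is thrown.
whole_column_failed = try
    convert(Int64, col)
    false
catch err
    err isa MethodError
end

# The dotted form broadcasts convert over the elements instead.
ints = convert.(Int64, col)

# map expects a function as its first argument, for example:
ints_via_map = map(x -> convert(Int64, x), col)
```

Both `ints` and `ints_via_map` end up as vectors with a concrete integer eltype, which is usually what is wanted for a column.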

view this post on Zulip Florian Große (Mar 02 2021 at 21:06):

Suppose you have:

julia> df = DataFrame((as = Any[1:10...], bs = rand(10), names = Any["Col_$(i)" for i in 1:10]))
10×3 DataFrame
 Row  as   bs         names
      Any  Float64    Any
─────┼────────────────────────
   1  1    0.452431   Col_1
   2  2    0.965297   Col_2
   3  3    0.0673582  Col_3
   4  4    0.319082   Col_4
   5  5    0.53456    Col_5
   6  6    0.175433   Col_6
   7  7    0.131868   Col_7
   8  8    0.996933   Col_8
   9  9    0.275728   Col_9
  10  10   0.326396   Col_10

What about:

julia> df.as = convert.(Int, df.as);

julia> df.names = string.(df.names);

julia> df
10×3 DataFrame
 Row  as     bs         names
      Int64  Float64    String
─────┼──────────────────────────
   1      1  0.452431   Col_1
   2      2  0.965297   Col_2
   3      3  0.0673582  Col_3
   4      4  0.319082   Col_4
   5      5  0.53456    Col_5
   6      6  0.175433   Col_6
   7      7  0.131868   Col_7
   8      8  0.996933   Col_8
   9      9  0.275728   Col_9
  10     10  0.326396   Col_10
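
If a column may also contain missing values, convert.(Int, ...) would fail on the missing entries. A rough sketch of two ways around that, again on a plain vector (the name `col` is hypothetical, and this assumes every non-missing entry really is an integer):

```julia
# Hypothetical Any-typed column with a missing entry.
col = Any[1, missing, 3]

# Broadcasting identity lets Julia pick the narrowest eltype it can
# from the values that are actually present.
narrowed = identity.(col)

# Alternatively, state the target eltype explicitly, allowing missing.
explicit = convert.(Union{Missing, Int}, col)
```

The first form is convenient when the target type is not known ahead of time; the second fails loudly if an entry cannot be converted, which can be the safer choice.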

view this post on Zulip qu bit (Mar 02 2021 at 21:27):

@Florian Große, I was able to implement:

begin
Relate[!, 1:3] = convert.(Int, Relate[!, 1:3])
Relate[!, 4:6] = float.(Relate[!,4:6])
Relate[!, 7:9] = string.(Relate[!, 7:9])
first(Relate, 5)
end

When I hover over the dataframe in the output,
the float fields do not show the data type.

Any suggestion?

view this post on Zulip Florian Große (Mar 02 2021 at 21:32):

I don't really understand what you mean by "hover over". Is this code in a Pluto notebook?

view this post on Zulip qu bit (Mar 02 2021 at 21:43):

Florian Große said:

I don't really understand what you mean by "hover over". Is this code in a Pluto notebook?

@Florian Große , yes it is. v1.5.3

view this post on Zulip Florian Große (Mar 02 2021 at 21:53):

what does typeof.(df[!,name] for name in names(df)) yield for you after conversion?

view this post on Zulip qu bit (Mar 02 2021 at 22:05):

Florian Große said:

what does typeof.(df[!,name] for name in names(df)) yield for you after conversion?

@Florian Große , the following is returned

DataType
Array{Union{Missing, Float64},1}
Array{Union{Missing, Float64},1}
Array{Union{Missing, Float64},1}
Array{Union{Missing, Float64},1}
Array{Union{Missing, Float64},1}
Array{Union{Missing, Float64},1}
Array{Union{Missing, Float64},1}
Array{Union{Missing, Float64},1}
Array{Union{Missing, Float64},1}
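
For reference, the container type of a column and its eltype can be inspected separately. The sketch below uses a hypothetical NamedTuple of vectors as a stand-in for the DataFrame columns; with a real DataFrame, eltype.(eachcol(df)) should give the same per-column view.

```julia
# Hypothetical columns standing in for a DataFrame.
cols = (
    price = Union{Missing, Float64}[1.5, missing, 3.0],
    count = [1, 2, 3],
    label = ["a", "b", "c"],
)

# Type of each column container, e.g. a Vector of some eltype.
container_types = map(typeof, cols)

# Element type of each column; this is what a column header would report.
element_types = map(eltype, cols)

# Strip the Missing part to see the underlying value type.
price_value_type = nonmissingtype(element_types.price)
```

A column typed Union{Missing, Float64} still holds Float64 values; the union only records that missing entries are allowed.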

view this post on Zulip Florian Große (Mar 02 2021 at 22:07):

too bad, can't make it visible myself

view this post on Zulip Florian Große (Mar 02 2021 at 22:08):

thanks, so the types appear to be correct. Does it matter whether you see them or not?

view this post on Zulip Florian Große (Mar 02 2021 at 22:09):

If it does, it's probably something for the #pluto.jl channel

view this post on Zulip Florian Große (Mar 02 2021 at 22:09):

(assuming the problem is related to Pluto, I still don't exactly understand what you mean by hover over)

view this post on Zulip qu bit (Mar 02 2021 at 22:11):

Florian Große said:

If it does, it's probably something for the #pluto.jl channel

@Florian Große, thank you, I will reach out to them about it.

What I mean is, once you generate an output, in this case
a dataframe, you can move your mouse over the column
headings and see the eltype for each field.

view this post on Zulip Florian Große (Mar 02 2021 at 22:12):

I see, that's nice, I never tried that

view this post on Zulip Nils (Mar 03 2021 at 11:32):

Have you tried using infer_eltypes=true so that XLSX.jl just infers the types for you?

view this post on Zulip qu bit (Mar 03 2021 at 14:49):

Nils said:

Have you tried using infer_eltypes=true so that XLSX.jl just infers the types for you?

Excellent -- thank you @Nils , this worked.

I used

DeadAvenger = DataFrame(xl.readtable("Data.xlsx", "IronMan",
            infer_eltypes=true)...)

Might you know of a website that lists the parameters for
common methods? I posted the question to the general
board previously.

Thank you,

view this post on Zulip Nils (Mar 03 2021 at 16:36):

The answer that was given to you in the other thread stands: most packages have documentation, and while that is not always complete, it is pretty good for the packages you're dealing with (DataFrames, CSV, XLSX). Here it is for XLSX.readtable: https://felipenoris.github.io/XLSX.jl/dev/api/#XLSX.readtable


Last updated: Oct 02 2023 at 04:34 UTC