:confused: but there is no iloc or iat function in DataFrames...
Hmmm.,.
Can you start from the beginning and explain what you have and what you are trying to achieve?
May be small example of the csv file that you are trying to parse? How should result look like after all transformations? Is usage of Pandas is mandatory or it is ok to use only vanilla DataFrames?
I'm not at all sure what you are trying to do here, but it sounds to me a bit like you are looking for IndexedTables.jl. DataFrames are essentially a named tuples of like sized vectors, column values and indices are deliberately kept separate, although the library does have stack
and unstack
functions
You may also want to read here
well, so does dataframes, so I think the first thing you should do is read the above documentation section. However, if using column values as indices is something you want to do very often, then yes, IndexedTables.jl may be a better choice.
only if the thing you want to stack or unstack is an ID column
(deleted)
Is this what you're trying to do?
df = DataFrame(Time1=[now() + Second(i) for i ∈ 1:5],
Time2=[now() + Hour(i) for i ∈ 1:5],
Time3=[now() + Day(i) for i ∈ 1:5]
)
stack(df, [:Time1, :Time2, :Time3])
which gives
15×2 DataFrame
Row │ variable value
│ String DateTime
─────┼───────────────────────────────────
1 │ Time1 2021-02-16T12:50:57.435
2 │ Time1 2021-02-16T12:50:58.435
3 │ Time1 2021-02-16T12:50:59.435
4 │ Time1 2021-02-16T12:51:00.435
5 │ Time1 2021-02-16T12:51:01.435
6 │ Time2 2021-02-16T13:50:56.458
7 │ Time2 2021-02-16T14:50:56.458
8 │ Time2 2021-02-16T15:50:56.458
9 │ Time2 2021-02-16T16:50:56.458
10 │ Time2 2021-02-16T17:50:56.458
11 │ Time3 2021-02-17T12:50:56.483
12 │ Time3 2021-02-18T12:50:56.483
13 │ Time3 2021-02-19T12:50:56.483
14 │ Time3 2021-02-20T12:50:56.483
15 │ Time3 2021-02-21T12:50:56.483
Expanding Man
Sorry for the confusion. I was listing Time1 and Time2 as labels. I was not
attempting to have them represent timestamps, but this is useful in other
cases for sure.
Regardless, was a complete stacking aggregation what you were trying to do? Otherwise you can leave some columns unstacked as an extra column, e.g.
stack(df, [:Time1, :Time2], [:Time3])
which gives
10×3 DataFrame
Row │ Time3 variable value
│ DateTime String DateTime
─────┼────────────────────────────────────────────────────────────
1 │ 2021-02-17T13:07:03.609 Time1 2021-02-16T13:07:04.561
2 │ 2021-02-18T13:07:03.609 Time1 2021-02-16T13:07:05.561
3 │ 2021-02-19T13:07:03.609 Time1 2021-02-16T13:07:06.561
4 │ 2021-02-20T13:07:03.609 Time1 2021-02-16T13:07:07.561
5 │ 2021-02-21T13:07:03.609 Time1 2021-02-16T13:07:08.561
6 │ 2021-02-17T13:07:03.609 Time2 2021-02-16T14:07:03.585
7 │ 2021-02-18T13:07:03.609 Time2 2021-02-16T15:07:03.585
8 │ 2021-02-19T13:07:03.609 Time2 2021-02-16T16:07:03.585
9 │ 2021-02-20T13:07:03.609 Time2 2021-02-16T17:07:03.585
10 │ 2021-02-21T13:07:03.609 Time2 2021-02-16T18:07:03.585
That just about covers the DataFrames.jl stacking functionality. If that doesn't cover your use case, than I'm afraid I just don't know what you mean by stacking.
As for IndexedTables.jl, this doesn't offer any benefit in terms of stacking, but on the other hand, DataFrames.jl is really not designed to be indexed by value, so, if for example you wanted to index the above by :Time3
value, DataFrames.jl doesn't really give you any nice ways of doing that, though you can of course just do Dict(df.Time3 .=> 1:size(df,1))
to create a lookup table (at least in teh case of unique keys). IndexedTables.jl on the other hand is designed with this type of indexing in mind. IndexedTables.jl doesn't see nearly as much development or use as DataFrames.jl, so you probably want to stick with DataFrames, but again, if what you're really interested in is indexing it may not be the best choice.
Expanding Man
As for IndexedTables.jl, this doesn't offer any benefit in terms of stacking, but on the other hand, DataFrames.jl is really not designed to be indexed by value, so, if for example you wanted to index the above by :Time3
value, DataFrames.jl doesn't really give you any nice ways of doing that, though you can of course just do Dict(df.Time3 .=> 1:size(df,1))
to create a lookup table (at least in the case of unique keys). IndexedTables.jl on the other hand is designed with this type of indexing in mind. IndexedTables.jl doesn't see nearly as much development or use as DataFrames.jl, so you probably want to stick with DataFrames, but again, if what you're really interested in is indexing it may not be the best choice.
I appreciate this. I will experiment a little more with the stack method. I also know of a LAJuliaUtils unregistered library that has an explicit 'pivot' function that could perhaps help me simply flip columns for row values. I had trouble implementing it before however.
One way or another, DataFrames.jl should cover all of the standard relational database operations such as stacking and pivots. As a rule of thumb, you probably want to use DataFrames.jl for all tables except
Thank you for providing this explanation.
The original transformation method I provided was
able to switch the header and row 1 positions. However,
the issue is that the entire dataframe is shifted down one
unit. If there was a way to configure the code so that the
transformed R1 values were the headers and the rest of
the dataframe was shifted up one unit it would be the pivot
I was attempting to create.
It would be slightly easier if you can give example of the dataframe or at least part of csv file.
So, I am unsure, whether it is the structure that you have, but maybe you want something like this?
julia> df = DataFrame(:col => ["S1", "S2", "S3"], :T1 => [1, 2, 3], :T2 => [4, 5, 6], :T3 => [7,
8, 9])
3×4 DataFrame
Row │ col T1 T2 T3
│ String Int64 Int64 Int64
─────┼─────────────────────────────
1 │ S1 1 4 7
2 │ S2 2 5 8
3 │ S3 3 6 9
julia> df1 = stack(df, 2:4);
julia> df2 = unstack(df1, :variable, :col, :value)
3×4 DataFrame
Row │ variable S1 S2 S3
│ String Int64? Int64? Int64?
─────┼──────────────────────────────────
1 │ T1 1 2 3
2 │ T2 4 5 6
3 │ T3 7 8 9
Expanding Man said:
only if the thing you want to stack or unstack is an ID column
A fellow colleague suggested, I use CSV.read(.... transpose=true); this solved the problem.
Andrey Oskin said:
It would be slightly easier if you can give example of the dataframe or at least part of csv file.
So, I am unsure, whether it is the structure that you have, but maybe you want something like this?
julia> df = DataFrame(:col => ["S1", "S2", "S3"], :T1 => [1, 2, 3], :T2 => [4, 5, 6], :T3 => [7, 8, 9]) 3×4 DataFrame Row │ col T1 T2 T3 │ String Int64 Int64 Int64 ─────┼───────────────────────────── 1 │ S1 1 4 7 2 │ S2 2 5 8 3 │ S3 3 6 9 julia> df1 = stack(df, 2:4); julia> df2 = unstack(df1, :variable, :col, :value) 3×4 DataFrame Row │ variable S1 S2 S3 │ String Int64? Int64? Int64? ─────┼────────────────────────────────── 1 │ T1 1 2 3 2 │ T2 4 5 6 3 │ T3 7 8 9
Hi Andrey, Thank you for your explanation here. A colleague mentioned that I should use CSV.read() method setting the '
transpose' parameter to 'true'
So that's what you want, a transpose :D
Didn't came through clearly I suppose
Last updated: Nov 06 2024 at 04:40 UTC