Stream: helpdesk (published)

Topic: data processing


Plecra (Jan 11 2022 at 15:13):

lds.set_index("Area").filter(regex = "ER_.*").rename(columns = lambda s: int(s[3:])).loc[["South West", "London"]].T.plot()

Hiya! I'd like to start using Julia and Pluto instead of Jupyter, but I'm finding it tough to see how to do similar things.
How would I write code to do the same thing as this pandas/matplotlib code? It's selecting the rows with South West and London for their "Area", and graphing the values in the ER_{2005, 2006, ..., 2019} columns for each.
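
For readers who don't know pandas, the one-liner above can be unpacked step by step. This is only a sketch: `lds` was never shared in the thread, so the small frame below is a hypothetical stand-in, and the helper name `er_by_year` is made up for illustration.

```python
import pandas as pd

# Hypothetical stand-in for `lds`; the real data was not shared in the thread.
lds = pd.DataFrame({
    "Area": ["South West", "Edinburgh", "London"],
    "ER_2005": [71, 98, 39],
    "ER_2015": [83, 99, 15],
    "something": ["a", "b", "c"],
})


def er_by_year(frame, areas):
    """Return a year-indexed table with one column per requested area."""
    indexed = frame.set_index("Area")        # rows are now labelled by area
    er_only = indexed.filter(regex="ER_.*")  # keep only the ER_<year> columns
    by_year = er_only.rename(columns=lambda s: int(s[3:]))  # "ER_2005" -> 2005
    selected = by_year.loc[areas]            # pick rows by their area label
    return selected.T                        # transpose: years become the index


table = er_by_year(lds, ["South West", "London"])
print(table)
# Calling table.plot() would then draw one line per area, with the integer
# years on the x-axis (this needs matplotlib installed).
```

The transpose matters because `DataFrame.plot()` draws one line per column against the index, so the areas have to end up as columns and the years as the index.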

Ian Weaver (Jan 11 2022 at 18:55):

Hey! I think AlgebraOfGraphics.jl could be awesome for this. Could you share an example of what lds looks like and the corresponding plot you are trying to make?

Sundar R (Jan 11 2022 at 19:15):

Generally, DataFrames.jl is the pandas replacement, with helper packages like DataFramesMeta.jl (among others). Plots.jl is the general plots package, but StatsPlots.jl has a nice interface for working with DataFrames specifically.

I don't know pandas, but based on your description, something like this would be equivalent:

julia> using DataFrames, Random, StatsPlots

julia> df = DataFrame(area = ["South West", "Edinburgh", "London", "South West", "Morecombe"], ER_2005 = rand(1:100, 5), ER_2015 = rand(1:100, 5), something = map(_->randstring(), 1:5))
5×4 DataFrame
 Row │ area        ER_2005  ER_2015  something
     │ String      Int64    Int64    String
─────┼─────────────────────────────────────────
   1 │ South West       71       83  aZQ5TM4K
   2 │ Edinburgh        98       99  BLzqoIFg
   3 │ London           39       15  u43jnMLH
   4 │ South West       58       49  EtrFImPX
   5 │ Morecombe       100       84  1U773XwD

julia> @df df[in(["South West", "London"]).(df.area), r"ER_.*"] plot(cols())

Sundar R (Jan 11 2022 at 19:16):

See also: https://dataframes.juliadata.org/latest/man/comparisons/#Comparison-with-the-Python-package-pandas

Plecra (Jan 13 2022 at 12:06):

Yeah! These are the libraries I've been using, though I've been struggling to find alternatives to all of pandas' methods.

Plecra (Jan 13 2022 at 12:06):

It seems like the shape of the API is a little different, so translations probably won't be one-to-one.

Plecra (Jan 13 2022 at 12:09):

I've been finding that a lot of seemingly common operations in R and pylab are quite difficult in Julia though. Matrix transpositions are common enough for Python to have the short .T accessor for them, but there's no blessed implementation I could find.
Is this because I'm looking in the wrong places? Or does Julia expect more things to be manually implemented?

Plecra (Jan 13 2022 at 12:09):

Oh, tangential question: the postfix methods are quite a lot easier to chain. Is there a way to write them in the same order in Julia?

Plecra (Jan 13 2022 at 12:15):

And thanks for your example code @Sundar R ! It's nice to know I wasn't too far off; this is almost exactly what I was using already. My trouble is with extending it: I want the axes' values to be properly labelled from the column names (the Python code parses "ER_nnnn" to an int), and I'm plotting multiple rows in the line chart. That's matplot in R.
I think the ecosystem just isn't as mature, and I'll have to stick to Python's notebooks for now.

Andrey Oskin (Jan 13 2022 at 13:15):

It's not immature, it's just different.

I highly recommend reading Bogumil's blog on DataFrames to get a better understanding. For example, you can start with https://bkamins.github.io/julialang/2020/12/24/minilanguage.html It may be outdated, I don't know, but it still gives a very good introduction to the DataFrames workflow.

Also, there are packages that provide some syntactic sugar, for example Chain.jl, Pipe.jl, and Underscores.jl. Maybe you'll find them useful.

Ian Weaver (Jan 13 2022 at 18:25):

Couldn't agree more. Here's a quick example using some of those tools to start getting your plot off the ground; I just split out some of the parts for clarity. With Pluto's built-in package manager, running the notebook after you download it should just work™, but I might be pushing it with the Python example at the end :snake:

[Attached image: Screenshot-from-2022-01-13-13-27-13.png]


Last updated: Oct 02 2023 at 04:34 UTC