I want to scrape the source code for the Top 50 repos on JuliaHub. My goal is to look at common bigrams and optimize a keyboard symbol layer for Julia. What is the best way to download source files and put substrings into a DataFrame? Does Pkg provide a mechanism that could be used?
You can probably use the usual git clone? As a regular shell command.
Just create a workspace in /tmp (or its analogue on Windows, if you are using that) and do whatever you want.
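A minimal sketch of that clone-into-a-temp-workspace idea, driven from Julia rather than typed into a shell. The helper name clone_cmd is hypothetical, and the Example.jl URL is only a stand-in for whichever repos end up on the list. The clone itself is left commented out because it needs git and network access.

```julia
# Hypothetical helper: build a shallow `git clone` command for one repository.
clone_cmd(url::String, dest::String) = `git clone --depth 1 $url $dest`

# mktempdir() creates a fresh directory under the system temp location
# (so this also works on Windows) and removes it when Julia exits.
workspace = mktempdir()

# Stand-in repository; replace with the repos you actually want to scan.
url = "https://github.com/JuliaLang/Example.jl.git"
dest = joinpath(workspace, "Example.jl")

cmd = clone_cmd(url, dest)
println(cmd)

# Uncomment to perform the clone (requires git on PATH and network access):
# run(cmd)
```

Interpolating into a backtick command keeps each argument separate, so paths with spaces do not need manual quoting.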
Yeah, that would also do. I was just wondering whether this could be done in 100% Julia. But I guess Pkg also just calls Downloads.jl, which ends up using Curl.
You can use UrlDownload.jl, which uses HTTP.jl
But there is nothing wrong with Curl, and I think it works fine on Windows too.
Just Pkg.add everything, no? Should be faster than cloning
How can one get the path to the source code that results from the add command? I suppose it's something simple, but it is hidden somewhere inside Pkg.
pathof(Foo) gives you the path of the Foo package
Additionally, Base.find_package("Foo") doesn't require you to import the package
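As a rough sketch of how that could be used to collect source files: Base.find_package returns the path to a package's entry file (or nothing if the package is not found), so going up two directories should give the package root under the usual src/Foo.jl layout. The helper name package_source_files is made up for illustration, and the Test standard library is used only because it is available without installing anything.

```julia
# Hypothetical helper: list all .jl files under a package's root directory,
# without loading the package.
function package_source_files(name::String)
    entry = Base.find_package(name)        # path to the entry file, or nothing
    entry === nothing && return String[]
    root = dirname(dirname(entry))         # assumes the src/Name.jl layout
    files = String[]
    for (dir, _, names) in walkdir(root)
        for f in names
            endswith(f, ".jl") && push!(files, joinpath(dir, f))
        end
    end
    return files
end

# Test is a standard library, so it should be findable in any environment.
files = package_source_files("Test")
println(length(files), " source files found")
```

After a Pkg.add, the same call with the added package's name would point into the package depot instead of the stdlib directory.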
With analyze!, PackageAnalyzer can clone a bunch of packages to a directory. It's threaded and does a shallow clone, so it can be quite fast
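Whichever route gets the sources onto disk, the bigram-counting step from the original question might look roughly like the following. This is only a sketch: the function name count_bigrams! is hypothetical, skipping pairs that contain whitespace is an assumption about what matters for a symbol layer, and the DataFrame conversion is shown as a comment because it assumes DataFrames.jl is installed.

```julia
# Hypothetical helper: count adjacent character pairs (bigrams) in `text`,
# accumulating into `counts` so it can be called once per source file.
function count_bigrams!(counts::Dict{String,Int}, text::AbstractString)
    chars = collect(text)                  # safe for multi-byte Unicode
    for i in 1:length(chars)-1
        a, b = chars[i], chars[i+1]
        # Assumption: pairs spanning whitespace are not interesting here.
        (isspace(a) || isspace(b)) && continue
        key = string(a, b)
        counts[key] = get(counts, key, 0) + 1
    end
    return counts
end

counts = count_bigrams!(Dict{String,Int}(), "x -> x .+ 1")

# Most frequent bigrams first.
ranked = sort(collect(counts); by = last, rev = true)
println(ranked)

# With DataFrames.jl installed, the table could be built as:
# using DataFrames
# df = DataFrame(bigram = first.(ranked), count = last.(ranked))
```

For real files, calling count_bigrams!(counts, read(path, String)) in a loop over the collected paths would accumulate totals across a whole package.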
Thanks everyone! :smiley:
Last updated: Nov 06 2024 at 04:40 UTC