Julia_Academy/DataFrames/01__Environment_setup.ipynb
2021-06-26 18:57:09 +02:00

Environment setup for data frames tutorial

Bogumił Kamiński

Welcome to the DataFrames.jl introduction!

This set of Jupyter notebooks is intended to give you an overview of the functionality DataFrames.jl offers, based on practical examples.

You can find overviews of DataFrames.jl functionality (organized by task type rather than as exercises, unlike this tutorial) in the following locations:

We also assume that you have a basic knowledge of the Julia language and the Julia ecosystem. There are great tutorials on this topic in JuliaAcademy, so I encourage you to check them out.

As this is a hands-on tutorial, you can expect the examples to be implemented the way I would write them when doing an actual project.

The notebooks were prepared under Julia 1.5.3 and tested under Julia 1.6.1. If you have a different version of Julia installed, change the kernel using the Kernel/Change kernel option in the menu (as long as you are on Julia 1.x, all examples should work without a problem).

In [1]:
VERSION
Out[1]:
v"1.6.1"
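If you prefer to check the version programmatically instead of reading it off the output, the following is a minimal sketch. The variable names are only illustrative; the two version numbers are the ones quoted above.

```julia
# Versions quoted in the text above.
prepared_under = v"1.5.3"
tested_under = v"1.6.1"

# VERSION is a VersionNumber, so ordinary comparison operators work on it.
is_julia_1x = VERSION.major == 1
at_least_prepared = VERSION >= prepared_under

println("Running Julia ", VERSION)
println("Julia 1.x series: ", is_julia_1x)
println("At least ", prepared_under, ": ", at_least_prepared)
```

If the second line prints false, you are outside the Julia 1.x series that the notebooks assume.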

Jupyter Notebook automatically activates the project environment if one is found in the working directory.

So first let us check if we have Project.toml and Manifest.toml files present (they should be present if you cloned the repository of this tutorial).

In [2]:
isfile.(["Project.toml", "Manifest.toml"])
Out[2]:
2-element BitVector:
 1
 1

You should get 1 printed (meaning true) in both entries of the vector.

Now we are sure that you are going to use exactly the same versions of the packages that I use when running this tutorial.
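If you want a more explicit check, here is a small sketch that labels each file with the result of the test and prints which project is currently active. It relies on `Base.active_project()`, which is not exported from Base, so treat it as a convenience for inspection; the variable names are illustrative.

```julia
files = ["Project.toml", "Manifest.toml"]

# Pair each file name with the result of isfile, so the output is labelled.
present = Dict(f => isfile(f) for f in files)
for f in files
    println(rpad(f, 14), " found: ", present[f])
end

# Base.active_project() returns the path of the active Project.toml
# (or nothing if no project is active).
active = Base.active_project()
println("Active project: ", active)
```

When the notebook runs inside the cloned repository, the printed path should point at the Project.toml in the working directory.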

Let us check what packages (and in what versions) we will use.

In [3]:
] status

These notebooks should work with DataFrames versions 0.22 and 1.1.

If the command above gives a warning that some of the packages are not downloaded, run the instantiate instruction in the following cell.

In [4]:
] instantiate
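The `]` prefix switches to the package manager mode. If you would rather call the same operations as ordinary functions (for example from a script), a minimal sketch using the Pkg standard library looks like this:

```julia
using Pkg

# Functional equivalent of `] status`: list packages in the active environment.
Pkg.status()

# Functional equivalent of `] instantiate`: download the package versions
# recorded in Manifest.toml that are not yet installed locally.
Pkg.instantiate()
```

Both calls act on whichever environment is currently active, so run them from the directory containing the tutorial's Project.toml.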

As you see we will use the following packages:

| Package | Description |
|---------|-------------|
| DataFrames.jl | a core package that is the subject of this tutorial; it is used for data manipulation; as noted above, the notebooks should work with versions 0.22 and 1.1 of this package |
| CSV.jl | a package for reading/writing of CSV files |
| FreqTables.jl | a very useful package for creating frequency tables |
| GLM.jl | a package for fitting Generalized Linear Models (as no data science tutorial would be complete without building some predictive model) |
| PyPlot.jl | a package for plotting; there are many options in the Julia ecosystem to choose from; in this tutorial we use PyPlot.jl as it is based on Matplotlib, so if you have experience with the Python data science technology stack it should be familiar |
| Pipe.jl | a package that makes chaining of operations super powerful (which is something you probably know from %>% in R) |
| Arrow.jl | a package for working with data in Apache Arrow format |
| Unitful.jl | a package for working with physical units (like kg, cm, ...) |
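To cross-check this list against the active environment, the sketch below compares the package names from the table with the direct dependencies reported by `Pkg.dependencies()`. The `expected` vector is simply typed in from the table, and the output labels are illustrative.

```julia
using Pkg

# Package names taken from the table above (without the .jl suffix).
expected = ["DataFrames", "CSV", "FreqTables", "GLM",
            "PyPlot", "Pipe", "Arrow", "Unitful"]

# Names of packages listed directly in the active Project.toml.
direct = Set(info.name for info in values(Pkg.dependencies())
             if info.is_direct_dep)

for name in expected
    status = name in direct ? "listed" : "NOT listed"
    println(rpad(name, 12), status)
end
```

A package reported as "NOT listed" usually means the notebook is not running in the tutorial's project environment.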