Introduction

The demo walks through the typical steps of a data science workflow, and you'll see how FsLab helps with each of them. The example compares university enrollment in the European Union and the Czech Republic: we'll start by getting data about the countries from the World Bank, then we'll do a simple exploratory data analysis.

Accessing data with type providers

First, you need to download the FsLab template or package. Then, we reference the libraries that we need. Here, we use FSharp.Data for data access and Deedle for interactive data exploration:

#load "packages/FsLab/FsLab.fsx"

open Deedle
open FSharp.Data

Next, we connect to the World Bank and access the indicators for the European Union and the Czech Republic. When doing this yourself, change the names to your country and a region or country nearby!

let wb = WorldBankData.GetDataContext()
let cz = wb.Countries.``Czech Republic``.Indicators
let eu = wb.Countries.``European Union``.Indicators

When using an advanced F# editor (Xamarin Studio, Visual Studio, Emacs with F# mode, etc.), you'll get auto-completion after typing wb.Countries followed by a dot. This is the type provider at work: it exposes the World Bank's countries and indicators as ordinary properties, which makes external data sources easy to access.

Interactive data exploration

Just as we can easily find countries and regions, we can easily get interesting indicators about them. To compare university enrollment in the Czech Republic and the European Union, we just pick the relevant indicator and use the series function to create a Deedle time series:

let czschool = series cz.``School enrollment, tertiary (% gross)``
let euschool = series eu.``School enrollment, tertiary (% gross)``

With Deedle, you can apply numerical operations to an entire time series at once. Here, we calculate the difference between the CZ and EU data. Deedle automatically aligns the two series and matches corresponding years, so you do not have to align data from multiple sources by hand. We then take the absolute value and pick the 5 years with the largest differences:

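// Absolute per-year difference; Deedle matches the two series by year
abs (czschool - euschool)
// Sort by value (ascending), then reverse so the largest come first
|> Series.sort
|> Series.rev
// Keep the five years with the largest differences
|> Series.take 5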

Year    Absolute difference (percentage points)
1999    22.62
1998    22.15
2001    22.04
1997    21.42
2000    21.4
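Deedle performs the year alignment for you. To show what that saves, here is a minimal plain F# sketch that does the same steps by hand: it matches two year-keyed Maps by year, takes absolute differences and keeps the largest ones. It assumes made-up numbers rather than real World Bank data, and the names czData, euData, alignedDiff and topDiffs are hypothetical, not part of Deedle or FSharp.Data.

```fsharp
// Hypothetical enrollment figures (% gross) keyed by year; NOT real World Bank data.
let czData = Map.ofList [ 1997, 20.0; 1998, 21.0; 1999, 22.0; 2000, 23.5 ]
let euData = Map.ofList [ 1998, 40.0; 1999, 45.0; 2000, 44.0; 2001, 47.0 ]

/// Subtract two year-keyed maps, keeping only years present in both
/// (an inner join on the key; years found in just one map are dropped).
let alignedDiff (a: Map<int, float>) (b: Map<int, float>) =
    a
    |> Map.toList
    |> List.choose (fun (year, va) ->
        match Map.tryFind year b with
        | Some vb -> Some (year, va - vb)
        | None -> None)

/// The n years with the largest absolute differences, largest first.
let topDiffs n a b =
    alignedDiff a b
    |> List.map (fun (year, d) -> year, abs d)
    |> List.sortByDescending snd
    |> List.truncate n

topDiffs 5 czData euData
|> List.iter (fun (year, d) -> printfn "%d %.2f" year d)
```

With this made-up data only 1998, 1999 and 2000 appear in both maps, so fewer than five rows come back: 1999 (23.00), 2000 (20.50) and 1998 (19.00). Deedle's own handling of years that appear in only one series may differ, for instance by producing missing values instead of dropping them; this sketch simply drops them.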

With the FsLab journal template, you can easily embed the results of a computation into a report. In fact, this page has been generated using exactly that mechanism!

Summary

This short article showed how to get started with FsLab and walked through a demo of how FsLab simplifies three key tasks of working with data.

  • Type providers make it easier to access data and help you catch mistakes, such as misspelled country or indicator names, at compile time by integrating external data (like the World Bank) into the language and into your editor.

  • The Deedle library provides rich and easy-to-use tools for interactive data exploration using data frames, series and time series (and it can also integrate with R).

  • FsLab comes with visualization libraries that you can use to produce elegant HTML or LaTeX output.