Introduction

This demo walks through the typical steps of a data science cycle and shows how FsLab helps with each of them. The example compares university enrollment in the European Union and the Czech Republic. We'll start by getting data about both from the World Bank, then do a simple exploratory data analysis, and finish with a small visualization.

Accessing data with type providers

First, download the FsLab template or package. Then reference the libraries that we need. Here, we use FSharp.Data for data access, Deedle for interactive data exploration and Foogle for visualization:

#load "packages/FsLab/FsLab.fsx"

open Foogle
open Deedle
open FSharp.Data

Next, we connect to the World Bank and access the indicators for the European Union and the Czech Republic. When doing this yourself, change the names to your own country and a nearby region or country!

let wb = WorldBankData.GetDataContext()
let cz = wb.Countries.``Czech Republic``.Indicators
let eu = wb.Countries.``European Union``.Indicators

When using an advanced F# editor (Xamarin, Visual Studio, Emacs with F# mode, etc.), you'll get auto-completion after typing wb.Countries. (including the trailing dot). This is the type provider magic that makes it easy to access external data sources.

Interactive data exploration

Just as we can easily find countries and regions, we can easily get interesting indicators about them. To compare university enrollment in the Czech Republic and the European Union, we pick the relevant indicator and use the series function to create a Deedle time series:

let czschool = series cz.``School enrollment, tertiary (% gross)``
let euschool = series eu.``School enrollment, tertiary (% gross)``

When using Deedle, you can apply numerical operations to an entire time series. Here, we calculate the absolute difference between the CZ and EU data. Deedle automatically aligns the two time series and matches corresponding years, so you do not have to worry about aligning data from multiple sources. We then pick the 5 years with the largest differences:

abs (czschool - euschool)
|> Series.sort
|> Series.rev
|> Series.take 5

With the FsLab journal template, you can easily embed the results of a computation into a report. In fact, this page has been generated using exactly that mechanism!
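
To make the alignment step in the Deedle pipeline above more concrete, here is a minimal sketch in plain F# that does not use Deedle at all. It pairs up values by year, keeps only the years present in both inputs, and then picks the largest absolute differences. The names czSample, euSample, absDiffByYear and largestDiffs are hypothetical, and the numbers are made up for illustration; they are not World Bank figures, and this is not how Deedle is implemented internally.

```fsharp
// Illustrative values only: these are NOT real World Bank figures.
let czSample = Map.ofList [ 2008, 50.0; 2009, 55.0; 2010, 60.0; 2011, 62.0 ]
let euSample = Map.ofList [ 2009, 58.0; 2010, 61.0; 2011, 59.5; 2012, 64.0 ]

// Keep only the years present in both maps and compute the
// absolute difference for each matched year.
let absDiffByYear (a: Map<int, float>) (b: Map<int, float>) =
    a
    |> Map.toList
    |> List.choose (fun (year, x) ->
        b |> Map.tryFind year |> Option.map (fun y -> year, abs (x - y)))

// Largest differences first; List.truncate tolerates fewer than n items.
let largestDiffs n a b =
    absDiffByYear a b
    |> List.sortByDescending snd
    |> List.truncate n

let top2 = largestDiffs 2 czSample euSample
printfn "%A" top2
```

The years 2008 and 2012 drop out because only one of the two sample maps has a value for them, which is the kind of bookkeeping that a series library takes care of for you.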

Visualizing results

As a final step, we're going to create a chart that shows the two time series side by side. The following example uses the Foogle chart library, which is a lightweight wrapper over Google Charts. When used in F# Interactive, this opens a web browser with the chart, but we can also embed it into this page, just like the table above:

Chart.LineChart
 ([ for y in 1985 .. 2012 ->
     string y,
       [ cz.``School enrollment, tertiary (% gross)``.[y]
         eu.``School enrollment, tertiary (% gross)``.[y] ] ],
  Labels = ["CZ"; "EU"])
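
The chart call above passes a sequence of rows, each pairing a label string with a list of values (one value per line in the chart). As a rough sketch of that data shape, the following plain F# builds such rows from two year-indexed maps and skips any year for which either map has no value. The function chartRows and the sample maps are hypothetical, and the numbers are made up for illustration; the sketch does not call Foogle or the World Bank provider.

```fsharp
// Illustrative values only: these are NOT real World Bank figures.
let czValues = Map.ofList [ 2009, 55.0; 2010, 60.0 ]
let euValues = Map.ofList [ 2010, 61.0; 2011, 59.5 ]

// Build (label, [first; second]) rows for every year in the range
// where both maps contain a value; other years are skipped.
let chartRows (firstYear: int) (lastYear: int) (a: Map<int, float>) (b: Map<int, float>) =
    [ for year in firstYear .. lastYear do
        match Map.tryFind year a, Map.tryFind year b with
        | Some x, Some y -> yield string year, [ x; y ]
        | _ -> () ]

let rows = chartRows 2009 2011 czValues euValues
printfn "%A" rows
```

Skipping incomplete years is one possible design choice; depending on the chart, you might instead prefer to keep the row and leave a gap in one of the lines.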

Summary

This short article showed how to get started with FsLab and walked through a demo of how FsLab simplifies three tasks of working with data: accessing it, exploring it, and visualizing the results.

  • Type providers make it easier to access data and help you avoid issues by integrating external data (like the World Bank) into the language and into your editor.

  • The Deedle library provides rich and easy-to-use tools for interactive data exploration using data frames, series and time series (and it can also integrate with R).

  • FsLab comes with visualization libraries that you can use to produce elegant HTML or LaTeX output.