Quickstart: Using Statistical Packages

R is a programming language designed for statistics and data mining. The R community is strong, and has created an incredibly rich open source ecosystem of packages.

The F# R Type Provider enables you to use every single one of them from within the F# environment. You can manipulate data using F#, send it to R for computation, and extract the results back into F#.

Example: Linear Regression

Let's perform a simple linear regression from F# Interactive, using the R.lm function.

Assuming you installed the R Type Provider in your project from NuGet, you can reference the required libraries and packages this way:

#I "../packages/RProvider.1.0.11"
#load "RProvider.fsx"

open System
open RDotNet
open RProvider
open RProvider.graphics
open RProvider.stats

Once the libraries and packages have been loaded, we can start working with data. Imagine that our true model is

Y = 5.0 + 3.0 X1 - 2.0 X2 + noise

Let's generate a fake dataset that follows this model:

// Random number generator
let rng = Random()
let rand () = rng.NextDouble()

// Generate fake X1 and X2 
let X1s = [ for i in 0 .. 9 -> 10. * rand () ]
let X2s = [ for i in 0 .. 9 -> 5. * rand () ]

// Build Ys, following the "true" model
let Ys = [ for i in 0 .. 9 -> 5. + 3. * X1s.[i] - 2. * X2s.[i] + rand () ]

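Using linear regression on this dataset, we should be able to recover the slope coefficients 3.0 and -2.0, with some imprecision due to the "noise" part. Note that rand () returns values in [0, 1), so the noise has a mean of 0.5 rather than 0; the estimated intercept will therefore tend to land near 5.5 instead of 5.0.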

Let's first put our dataset into an R data frame; this allows us to name our vectors, and use these names in R formulas afterwards:

let dataset =
    namedParams [
        "Y", box Ys;
        "X1", box X1s;
        "X2", box X2s; ]
    |> R.data_frame

We can now use R to perform a linear regression. We call the R.lm function, passing it the formula we want to estimate. (See the R manual on formulas for more on their somewhat esoteric construction.)

let result = R.lm(formula = "Y~X1+X2", data = dataset)

Extracting Results from R to F#

The result we get back from R is an R expression. The R Type Provider tries as much as possible to keep data as R expressions, rather than converting back and forth between F# and R types. This limits translations between the two languages, which has performance benefits, and simplifies composing R operations. On the other hand, we need to extract the results from the R expression into F# types.

The R docs for lm describe what R.lm returns: an R list. We can now retrieve each element, accessing it by name (as defined in the documentation). For instance, let's retrieve the coefficients and residuals, which are both R vectors containing floats:

let coefficients = result.AsList().["coefficients"].AsNumeric()
let residuals = result.AsList().["residuals"].AsNumeric()
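As a sanity check on the coefficients returned by R, the same least-squares fit can be sketched in plain F# by solving the normal equations (X'X) b = X'y. The function name olsTwoPredictors below is hypothetical and not part of RProvider; it is a minimal sketch for the two-predictor case only, and R's lm uses a different, numerically more robust method (QR decomposition).

```fsharp
// Hypothetical helper (not part of RProvider): ordinary least squares
// for Y ~ X1 + X2, solving the 3x3 normal equations (X'X) b = X'y.
// Assumes the three lists have the same length and that the
// predictors are not collinear.
let olsTwoPredictors (x1s: float list) (x2s: float list) (ys: float list) =
    // Each row of the design matrix is [| 1; x1; x2 |]
    let rows = List.map2 (fun x1 x2 -> [| 1.0; x1; x2 |]) x1s x2s
    // Augmented 3x4 matrix [ X'X | X'y ]
    let a =
        Array2D.init 3 4 (fun i j ->
            if j < 3 then
                rows |> List.sumBy (fun r -> r.[i] * r.[j])
            else
                List.map2 (fun (r: float[]) y -> r.[i] * y) rows ys |> List.sum)
    // Gauss-Jordan elimination with partial pivoting
    for col in 0 .. 2 do
        let pivot = [ col .. 2 ] |> List.maxBy (fun r -> abs (a.[r, col]))
        for j in 0 .. 3 do
            let tmp = a.[col, j]
            a.[col, j] <- a.[pivot, j]
            a.[pivot, j] <- tmp
        for r in 0 .. 2 do
            if r <> col then
                let factor = a.[r, col] / a.[col, col]
                for j in col .. 3 do
                    a.[r, j] <- a.[r, j] - factor * a.[col, j]
    // The matrix is now diagonal: read off intercept, X1 and X2 coefficients
    [ for i in 0 .. 2 -> a.[i, 3] / a.[i, i] ]
```

Calling olsTwoPredictors X1s X2s Ys on the dataset generated earlier should give values close to the ones stored in coefficients, in the order intercept, X1, X2.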

We can also produce summary statistics about our model, like R^2, which measures goodness-of-fit: a value close to 0 indicates a very poor fit, and a value close to 1 a good fit. See the R docs for details on summary.

let summary = R.summary(result)
summary.AsList().["r.squared"].AsNumeric()
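To make the meaning of R^2 concrete, here is a minimal plain-F# sketch that computes it as 1 - SS_res / SS_tot from the observed values and the residuals. The function name rSquared is hypothetical and not part of RProvider; it assumes a model with an intercept, which is the case for the formula used above.

```fsharp
// Hypothetical helper (not part of RProvider): R^2 = 1 - SS_res / SS_tot.
// ys are the observed values, residuals the (observed - fitted) differences.
let rSquared (ys: float list) (residuals: float list) =
    let mean = List.average ys
    // Total sum of squares: variation of the observations around their mean
    let ssTot = ys |> List.sumBy (fun y -> (y - mean) * (y - mean))
    // Residual sum of squares: variation left unexplained by the model
    let ssRes = residuals |> List.sumBy (fun e -> e * e)
    1.0 - ssRes / ssTot
```

Assuming the residuals vector can be enumerated as a sequence of floats, rSquared Ys (List.ofSeq residuals) should agree with the r.squared value reported by R.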

Finally, we can directly pass result, which is an R expression, to R.plot, to produce some fancy charts describing our model:

R.plot result

That's it - while simple, we hope this example illustrates how you would go about using any existing R statistical package. While the details would differ, the general approach would remain the same. Happy modelling!
