R Type Provider


Quickstart: Using Statistical Packages

R is a programming language designed for statistics and data mining. The R community is strong and has created an incredibly rich open source ecosystem of packages.

The F# R Type Provider lets you use these packages from within the F# environment. You can manipulate data using F#, send it to R for computation, and extract the results back into F#.

Example: Linear Regression

Let's perform a simple linear regression from F# Interactive, using the R.lm function.

Assuming you installed the R Type Provider in your project from NuGet, you can reference the required libraries and packages this way:

#I "../packages/RProvider.1.0.11"
#load "RProvider.fsx"

open System
open RDotNet
open RProvider
open RProvider.graphics
open RProvider.stats

Once the libraries and packages have been loaded, we can set up the problem. Imagine that our true model is

Y = 5.0 + 3.0 X1 - 2.0 X2 + noise

Let's generate a fake dataset that follows this model:

// Random number generator
let rng = Random()
let rand () = rng.NextDouble()

// Generate fake X1 and X2 
let X1s = [ for i in 0 .. 9 -> 10. * rand () ]
let X2s = [ for i in 0 .. 9 -> 5. * rand () ]

// Build Ys, following the "true" model
let Ys = [ for i in 0 .. 9 -> 5. + 3. * X1s.[i] - 2. * X2s.[i] + rand () ]

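Using linear regression on this dataset, we should be able to estimate the coefficients 3.0 and -2.0, with some imprecision due to the "noise" part. Note that rand () is uniform on [0, 1), so the noise has mean 0.5 rather than 0, and the estimated intercept will tend to land near 5.5 rather than 5.0.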
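Because rng.NextDouble() returns values in [0, 1), subtracting 0.5 centers the noise on zero, which keeps the expected intercept at 5.0. The following is a minimal sketch of such a variant, not part of the original walkthrough: the fixed seed 42 and the names seededRng, centeredNoise, n, x1, x2 and y are illustrative choices.

```fsharp
open System

// Fixed seed (arbitrary choice) so the fake dataset is reproducible
let seededRng = Random(42)

// Uniform noise on [-0.5, 0.5), i.e. centered on zero
let centeredNoise () = seededRng.NextDouble() - 0.5

// Number of observations, matching the 10 rows used in this example
let n = 10

// Fake X1 and X2, on the same scales as before
let x1 = [ for _ in 1 .. n -> 10. * seededRng.NextDouble() ]
let x2 = [ for _ in 1 .. n -> 5. * seededRng.NextDouble() ]

// Same "true" model, but with zero-mean noise
let y = List.map2 (fun a b -> 5. + 3. * a - 2. * b + centeredNoise ()) x1 x2
```

These lists can be boxed into a data frame in the same way as X1s, X2s and Ys.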

Let's first put our dataset into an R data frame; this allows us to name our vectors and use these names in R formulas afterwards:

let dataset =
    namedParams [
        "Y", box Ys;
        "X1", box X1s;
        "X2", box X2s; ]
    |> R.data_frame

We can now use R to perform a linear regression. We call the R.lm function, passing it the formula describing the model we want to estimate, along with the dataset. (See the R manual on formulas for more on their somewhat esoteric construction.)

let result = R.lm(formula = "Y~X1+X2", data = dataset)

Extracting Results from R to F#

The result we get back from R is an R expression. The R Type Provider tries as much as possible to keep data as R expressions, rather than converting back and forth between F# and R types. This limits translations between the two languages, which has performance benefits and simplifies composing R operations. On the other hand, it means we need to extract the results from the R expression into F# types ourselves.

The R docs for lm describe what R.lm returns: an R list. We can now retrieve each element, accessing it by name (as defined in the documentation). For instance, let's retrieve the coefficients and residuals, which are both R vectors containing floats:

let coefficients = result.AsList().["coefficients"].AsNumeric()
let residuals = result.AsList().["residuals"].AsNumeric()
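Once the values are on the F# side, they can be handled with ordinary F# code. As a rough sketch, the two helpers below work on plain F# values; describeFit and sumOfSquares are hypothetical names introduced here, not part of RProvider, and describeFit assumes the coefficients arrive in the order intercept, X1, X2 (the order R uses for the formula Y~X1+X2).

```fsharp
/// Formats the fitted equation.
/// Expects the coefficients in the order: intercept, X1, X2.
let describeFit (coefs: float[]) =
    sprintf "Y = %.2f + %.2f X1 + %.2f X2" coefs.[0] coefs.[1] coefs.[2]

/// Sum of squared residuals, computed on the F# side.
let sumOfSquares (xs: seq<float>) =
    xs |> Seq.sumBy (fun r -> r * r)
```

In the session above, describeFit (Seq.toArray coefficients) and sumOfSquares residuals should apply these helpers to the R results, since a NumericVector can be enumerated as a sequence of floats.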

We can also produce summary statistics about our model, like R^2, which measures goodness of fit: a value close to 0 indicates a very poor fit, and a value close to 1 a good fit. See the R docs for details on summary.

let summary = R.summary(result)
summary.AsList().["r.squared"].AsNumeric()
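To make the meaning of R^2 concrete, it can also be recomputed on the F# side from the observed values and the residuals, as 1 minus the ratio of the residual sum of squares to the total sum of squares. This is a sketch under the assumption that the model includes an intercept, as Y~X1+X2 does; rSquared is a hypothetical helper, not an RProvider function.

```fsharp
/// R^2 = 1 - SS_res / SS_tot, for a model that includes an intercept.
/// ys: observed values; residuals: observed minus fitted values.
let rSquared (ys: float list) (residuals: float list) =
    let mean = List.average ys
    // Total sum of squares: spread of the observations around their mean
    let ssTot = ys |> List.sumBy (fun y -> (y - mean) * (y - mean))
    // Residual sum of squares: what the model leaves unexplained
    let ssRes = residuals |> List.sumBy (fun r -> r * r)
    1. - ssRes / ssTot
```

With the values from this example, rSquared Ys (List.ofSeq residuals) should agree with the r.squared element of the summary, up to floating-point rounding.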

Finally, we can directly pass result, which is an R expression, to R.plot, to produce some fancy charts describing our model:

R.plot result

That's it. While simple, we hope this example illustrates how you would go about using any existing R statistical package. The details will differ, but the general approach remains the same. Happy modelling!
