Quickstart: Using Statistical Packages

A strong R community has contributed over 20,000 packages to CRAN, R's central package registry. The F# R Type Provider enables you to use every single one of them from within the F# environment.

Using RProvider, you can orchestrate R workflows and manipulate R data, pass in F# values, and extract R values back to F#.

For this example, we simply demonstrate some basic RProvider concepts using the built-in stats package.

Example: Linear Regression

Let's perform a simple linear regression from F# Interactive, using the R.lm function.

Once you have referenced RProvider's NuGet package in your script, library, or app, you can open the required libraries and packages this way:

open RProvider
open RProvider.Operators

open RProvider.graphics
open RProvider.stats
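
In a script or an F# Interactive session, the NuGet reference mentioned above is usually made with an #r directive placed before the open lines. A minimal sketch, assuming the package id is RProvider and that R itself is already installed and discoverable on the machine (no version is pinned here; pin one if you need reproducible scripts):

```fsharp
// Assumption: the NuGet package id is "RProvider"; R must be installed separately.
#r "nuget: RProvider"
```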

With the libraries and packages loaded, imagine that our true model is:

Y = 5.0 + 3.0 * X1 - 2.0 * X2 + noise

Let's generate a fake dataset using F# that follows this model:

// Random number generator
let rng = System.Random()
let rand () = rng.NextDouble()

// Generate fake X1 and X2 
let X1s = [ for i in 0 .. 9 -> 10. * rand () ]
let X2s = [ for i in 0 .. 9 -> 5. * rand () ]

// Build Ys, following the "true" model
let Ys = [ for i in 0 .. 9 -> 5. + 3. * X1s.[i] - 2. * X2s.[i] + rand () ]

Using linear regression on this dataset, we should be able to estimate the coefficients 5.0, 3.0 and -2.0, with some imprecision due to the "noise" part.
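
One detail worth checking: rand () returns values in [0, 1), so the noise term above averages 0.5 rather than 0, and the fitted intercept will tend to sit nearer 5.5 than 5.0. If you would rather have estimates centered on the true coefficients, a sketch with zero-mean noise could look like this. The seed 42 and the names seeded, noise, x1s, x2s and ysCentered are illustrative assumptions, not part of the original example.

```fsharp
// Assumption: a fixed seed (42) is used only to make the run repeatable.
let seeded = System.Random(42)

// Zero-mean noise, uniform on [-0.5, 0.5)
let noise () = seeded.NextDouble() - 0.5

// Generate the predictors as before
let x1s = [ for _ in 0 .. 9 -> 10. * seeded.NextDouble() ]
let x2s = [ for _ in 0 .. 9 -> 5. * seeded.NextDouble() ]

// Build the responses from the "true" model, now with centered noise
let ysCentered = [ for i in 0 .. 9 -> 5. + 3. * x1s.[i] - 2. * x2s.[i] + noise () ]
```

These lists can be placed in a data frame exactly like the original ones.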

Let's first put our dataset into an R data frame; this allows us to name our vectors and use these names in R formulas afterwards:

let dataset = [ 
    "Y" => Ys
    "X1" => X1s
    "X2" => X2s ] |> R.data_frame

We can now use R to perform a linear regression. We call the R.lm function, passing it the formula we want to estimate and the data frame to estimate it on. (See the R manual on formulas for more on their somewhat esoteric construction.)

let result = R.lm(formula = "Y~X1+X2", data = dataset)

Extracting Results from R to F#

The result we get back from R is an R expression. The R Type Provider tries as much as possible to keep data as R expressions, rather than converting back and forth between F# and R types. This limits translations between the two languages, which has performance benefits, and simplifies composing R operations. On the other hand, we need to extract the results from the R expression into F# types.

The R docs for lm describe what R.lm returns: an R list. We can now retrieve each element, accessing it by name (as defined in the documentation). For instance, let's retrieve the coefficients and residuals, which are both R vectors containing floats:

let coefficients = result?coefficients.AsVector().AsReal()
let residuals = result?residuals.AsVector().AsReal()

We can also produce summary statistics about our model, like R^2, which measures goodness of fit: a value close to 0 indicates a very poor fit, and a value close to 1 a good fit. See the R docs for details on summary.

let summary = R.summary result

summary?``r.squared``.AsScalar()
NumericS { Sexp = { ptr = 4857322896n } }
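
To make the definition of R^2 concrete: for a model with an intercept, it can be written as 1 - SS_res / SS_tot, where SS_res is the sum of squared residuals and SS_tot is the sum of squared deviations of Y from its mean. The sketch below computes that ratio in plain F# on ordinary float lists. The function name rSquared and the sample numbers are illustrative; it does not use RProvider and does not read anything from the summary object.

```fsharp
// Hypothetical helper: R^2 = 1 - SS_res / SS_tot, on plain F# lists
let rSquared (ys: float list) (residuals: float list) =
    let mean = List.average ys
    // Total sum of squares: spread of the observations around their mean
    let ssTot = ys |> List.sumBy (fun y -> (y - mean) * (y - mean))
    // Residual sum of squares: the part the model failed to explain
    let ssRes = residuals |> List.sumBy (fun r -> r * r)
    1.0 - ssRes / ssTot

// A perfect fit has zero residuals
let perfect = rSquared [ 1.0; 2.0; 3.0 ] [ 0.0; 0.0; 0.0 ]   // 1.0

// Residuals equal to the deviations from the mean: no better than predicting the mean
let meanOnly = rSquared [ 1.0; 2.0; 3.0 ] [ -1.0; 0.0; 1.0 ] // 0.0
```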

Finally, we can directly pass result, which is an R expression, to R.plot, to produce some fancy charts describing our model:

Graphics.svg 8 4 (fun _ -> R.plot result)
[Figure: "Residuals vs Leverage" diagnostic plot. x-axis: Leverage; y-axis: Standardized residuals; contours: Cook's distance.]

That's it. While simple, we hope this example illustrates how you would go about using any existing R statistical package. While the details would differ, the general approach would remain the same. Happy modelling!
