Basics

Summary: This tutorial gives an overview of how to compute some basic statistical measures with FSharp.Stats.

Central tendency

A central tendency (or measure of central tendency) is a central or typical value for a probability distribution. It may also be called a center or location of the distribution. Colloquially, measures of central tendency are often called averages.

Mean

For a data set, the arithmetic mean, also called the expected value or average, is the central value of a discrete set of numbers: specifically, the sum of the values divided by the number of values:

\(\bar{x} = \frac{1}{n}\left (\sum_{i=1}^n{x_i}\right ) = \frac{x_1+x_2+\cdots +x_n}{n}\)

mean is available as a sequence (and other collections) extension, as well as meanBy, which takes an additional converter function:

open FSharp.Stats

let mean1 = 
    [10; 2; 19; 24; 6; 23; 47; 24; 54; 77;]
    |> Seq.meanBy float
28.6
let mean2 = 
    [10.; 2.; 19.; 24.; 6.; 23.; 47.; 24.; 54.; 77.;]
    |> Seq.mean
28.6
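
As a cross-check of the formula above, the same value can be obtained by hand in plain F# without any FSharp.Stats call. This is only an illustrative sketch; the names ints and meanByHand are made up for this example.

```fsharp
// Illustrative only: sum of the values divided by the number of values.
let ints = [10; 2; 19; 24; 6; 23; 47; 24; 54; 77]

// Convert to float before dividing, otherwise integer division would truncate.
let meanByHand = float (List.sum ints) / float ints.Length
// 286 / 10 = 28.6
```

The explicit conversion plays the same role as the converter function passed to meanBy.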

Truncated mean

The truncated (trimmed) mean is the arithmetic mean computed after a given proportion of the highest values and the same proportion of the lowest values have been discarded. In total, twice the given proportion is discarded.

meanTruncated is available as a sequence (and other collections) extension, as well as meanTruncatedBy, which takes an additional converter function:

let truncMean1 = 
    [10.; 2.; 19.; 24.; 6.; 23.; 47.; 24.; 54.; 77.;]
    |> Seq.meanTruncated 0.2
24.5
let truncMean2 = 
    [10; 2; 19; 24; 6; 23; 47; 24; 54; 77;]
    |> Seq.meanTruncatedBy float 0.2
34.75
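
To make the truncation step concrete, the following minimal sketch in plain F# sorts the data, drops the same number of values from each end, and averages the rest. The helper truncatedMean is hypothetical and not part of FSharp.Stats, and rounding the cut count down is an assumption; the library may round differently.

```fsharp
// Hypothetical helper, not the FSharp.Stats implementation.
// Assumes a non-empty list and a proportion below 0.5.
let truncatedMean (proportion: float) (data: float list) =
    let sorted = List.sort data
    // Number of values to drop from EACH end (assumption: rounded down).
    let k = int (proportion * float sorted.Length)
    sorted
    |> List.skip k
    |> List.take (sorted.Length - 2 * k)
    |> List.average

let truncatedByHand =
    truncatedMean 0.2 [10.; 2.; 19.; 24.; 6.; 23.; 47.; 24.; 54.; 77.]
// keeps 10, 19, 23, 24, 24, 47 -> 147 / 6 = 24.5
```

This agrees with truncMean1. The value 34.75 shown for truncMean2 equals 278 / 8, which is the mean of the data without the two smallest values 2 and 6 only, so that output is worth re-checking against the library version in use.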

Median

The median is a value separating the higher half from the lower half of a data sample, a population, or a probability distribution. For a data set, it may be thought of as "the middle" value: if you sort the values of a collection by size, the median is the value in the central position, so as many values lie above the median as below it. If there is an even number of observations, there is no single middle value; the median is then usually defined as the mean of the two middle values.

median is available as a sequence (and other collections) extension:

let median1 = 
    [10; 2; 19; 24; 6; 23; 47; 24; 54; 77;]
    |> Seq.median
23
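
The value 23 above has type int: with integer input the result cannot be fractional, although the mean of the two middle values 23 and 24 is 23.5. The sketch below, in plain F# with a hypothetical helper medianOfFloats (not the library implementation), shows the even-count rule on the same data after converting to float.

```fsharp
// Hypothetical helper, not the FSharp.Stats implementation.
// Assumes a non-empty list.
let medianOfFloats (data: float list) =
    let sorted = data |> List.sort |> List.toArray
    let n = sorted.Length
    if n % 2 = 1 then
        sorted.[n / 2]                                  // single middle value
    else
        (sorted.[n / 2 - 1] + sorted.[n / 2]) / 2.0     // mean of the two middle values

let medianAsFloat =
    [10; 2; 19; 24; 6; 23; 47; 24; 54; 77]
    |> List.map float
    |> medianOfFloats
// sorted: 2 6 10 19 23 24 24 47 54 77 -> (23 + 24) / 2 = 23.5

// The same two middle values averaged with integer division:
let medianAsInt = (23 + 24) / 2
// 23
```

Converting the data to float first is the simplest way to avoid losing the fractional part.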

Harmonic mean

The harmonic mean can be expressed as the reciprocal of the arithmetic mean of the reciprocals of the given set of observations. It is typically appropriate for situations when the average of rates is desired.

\(H = \frac{n}{\frac1{x_1} + \frac1{x_2} + \cdots + \frac1{x_n}} = \frac{n}{\sum\limits_{i=1}^n \frac1{x_i}} = \left(\frac{\sum\limits_{i=1}^n x_i^{-1}}{n}\right)^{-1}.\)

meanHarmonic is available as a sequence (and other collections) extension, as well as meanHarmonicBy, which takes an additional converter function:

let harmonicMean1 = 
    [10.; 2.; 19.; 24.; 6.; 23.; 47.; 24.; 54.; 77.;]
    |> Seq.meanHarmonic
10.01109262
let harmonicMean2 = 
    [10; 2; 19; 24; 6; 23; 47; 24; 54; 77;]
    |> Seq.meanHarmonicBy float
10.01109262
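
The "average of rates" case can be illustrated with a small sketch in plain F#. The helper harmonicMean is hypothetical and simply follows the formula above; the speeds are made-up example values.

```fsharp
// Hypothetical helper: reciprocal of the arithmetic mean of the reciprocals.
// Assumes a non-empty list without zeros.
let harmonicMean (data: float list) =
    let meanOfReciprocals = data |> List.averageBy (fun x -> 1.0 / x)
    1.0 / meanOfReciprocals

// Two legs of equal length, driven at 60 and 40 (same speed unit):
let averageSpeed = harmonicMean [60.0; 40.0]
// approximately 48.0, whereas the arithmetic mean would give 50.0
```

The harmonic mean is the right average here because the slower leg takes more time and therefore weighs more.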

Geometric mean

The geometric mean indicates the central tendency or typical value of a set of numbers by using the product of their values (as opposed to the arithmetic mean which uses their sum). The geometric mean is defined as the nth root of the product of n numbers:

\(\left(\prod_{i=1}^n x_i\right)^\frac{1}{n} = \sqrt[n]{x_1 x_2 \cdots x_n}\)

meanGeometric is available as a sequence (and other collections) extension, as well as meanGeometricBy, which takes an additional converter function:

let geometricMean1 = 
    [10.; 2.; 19.; 24.; 6.; 23.; 47.; 24.; 54.; 77.;]
    |> Seq.meanGeometric
18.9280882
let geometricMean2 = 
    [10; 2; 19; 24; 6; 23; 47; 24; 54; 77;]
    |> Seq.meanGeometricBy float
18.9280882
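
Multiplying many values can overflow, so the geometric mean is often computed as the exponential of the arithmetic mean of the logarithms, which is mathematically equivalent for positive values. The following plain F# sketch uses a hypothetical helper geometricMean to show this; it is not taken from the library source.

```fsharp
// Hypothetical helper: exp of the mean of the logs.
// Assumes a non-empty list of strictly positive values.
let geometricMean (data: float list) =
    data
    |> List.averageBy (fun x -> log x)
    |> exp

let gm = geometricMean [2.0; 8.0]
// approximately 4.0, i.e. the square root of 2 * 8
```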

Dispersion

Dispersion (also called variability, scatter, or spread) is the extent to which a distribution is stretched or squeezed.

Range

The range of a set of data is the difference between the largest and smallest values.

range is available as a sequence (and other collections) extension, as well as rangeBy, which takes an additional converter function:

Note: instead of returning the absolute difference between the maximum and minimum value, these functions return an interval with these values as boundaries.

let range1 = 
    [10.; 2.; 19.; 24.; 6.; 23.; 47.; 24.; 54.; 77.;]
    |> Seq.range
Closed (2.0, 77.0)
let range2 = 
    [10; 2; 19; 24; 6; 23; 47; 24; 54; 77;]
    |> Seq.rangeBy float
Closed (2.0, 77.0)
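
If the range is needed as a single number, as in the definition above, it can be computed directly from the largest and smallest value. The sketch below uses only F# core functions; the names values and rangeWidth are illustrative.

```fsharp
let values = [10.; 2.; 19.; 24.; 6.; 23.; 47.; 24.; 54.; 77.]

// Scalar range: largest value minus smallest value.
let rangeWidth = List.max values - List.min values
// 77.0 - 2.0 = 75.0
```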

Variance and Standard Deviation

The variance

\(s_N^2 = \frac{1}{N} \sum_{i=1}^N \left(x_i - \bar{x}\right)^2\)

and the standard deviation

\(s_N = \sqrt{\frac{1}{N} \sum_{i=1}^N \left(x_i - \bar{x}\right)^2}\)

are measures of how widely the values of a collection are spread around their mean. The standard deviation has the same unit as the values of the collection, whereas the variance has the squared unit.

varPopulation and stDevPopulation are available as sequence (and other collections) extensions, as well as varPopulationBy and stDevPopulationBy, which take an additional converter function:

let data = [|1.;3.;5.;4.;2.;8.|]

let varPopulation = Seq.varPopulation data
5.138888889
let stdPopulation = Seq.stDevPopulation data
2.266911751

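If the data are only a sample rather than the full population, the mean has to be estimated from the same data, which costs one degree of freedom. In that case the Bessel-corrected version of the calculation, dividing by N - 1 instead of N, has to be used; it results in slightly higher values: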

\(s^2 = \frac{1}{N - 1} \sum_{i=1}^N \left(x_i - \bar{x}\right)^2\) for the unbiased variance estimation, and

\(s = \sqrt{\frac{1}{N-1} \sum_{i=1}^N \left(x_i - \bar{x}\right)^2}\) for the corrected standard deviation.

var and stDev are available as sequence (and other collections) extensions, as well as varBy and stDevBy, which take an additional converter function:

let varSample = Seq.var data
6.166666667
let stdSample = Seq.stDev data
2.483277404
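
The difference between the two denominators can be reproduced by hand. The following plain F# sketch applies the formulas above to the same six values; all names are illustrative and no FSharp.Stats function is used.

```fsharp
let observations = [1.; 3.; 5.; 4.; 2.; 8.]
let n = float observations.Length
let xBar = List.average observations            // 23 / 6

// Sum of squared deviations from the mean: 185 / 6, approximately 30.8333.
let sumSq = observations |> List.sumBy (fun x -> (x - xBar) ** 2.0)

let populationVariance = sumSq / n              // approximately 5.138889
let sampleVariance     = sumSq / (n - 1.0)      // approximately 6.166667
let populationStDev    = sqrt populationVariance // approximately 2.266912
let sampleStDev        = sqrt sampleVariance     // approximately 2.483277
```

Both versions share the same sum of squared deviations; only the divisor changes, which is why the corrected values are always the larger ones.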

Coefficient of variation

The coefficient of variation is the mean-normalized standard deviation:

\(\widehat{c_{\rm v}} = \frac{s}{\bar{x}}\)

It describes the ratio of the standard deviation to the mean and helps to compare the variability of measurements that differ in magnitude. Use it only if the data is measured on a ratio scale (meaningful zero value and meaningful intervals).

cv is available as a sequence (and other collections) extension, as well as cvBy, which takes an additional converter function:

let sample1 =   [1.;4.;2.;6.;5.;3.;2.;]
let sample2 =   [13.;41.;29.;8.;52.;34.;25.;]

let cvSample1 = Seq.cv sample1
0.5476650327
let cvSample2 = Seq.cv sample2
0.5313890073
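
Why the coefficient of variation allows comparisons across magnitudes can be seen from a small sketch in plain F#: multiplying every value by a constant scales the standard deviation and the mean alike, so their ratio stays the same. The helper coefficientOfVariation is hypothetical and uses the Bessel-corrected standard deviation as in the formula above.

```fsharp
// Hypothetical helper: sample standard deviation divided by the mean.
// Assumes at least two values and a non-zero mean.
let coefficientOfVariation (data: float list) =
    let xBar = List.average data
    let n = float data.Length
    let sampleVar = (data |> List.sumBy (fun x -> (x - xBar) ** 2.0)) / (n - 1.0)
    sqrt sampleVar / xBar

let cvOriginal = coefficientOfVariation [1.; 4.; 2.; 6.; 5.; 3.; 2.]
// approximately 0.547665

// Same data multiplied by 10: the result is unchanged up to rounding.
let cvScaled = coefficientOfVariation [10.; 40.; 20.; 60.; 50.; 30.; 20.]
```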