Basics
Summary: this tutorial gives an overview of how to compute some basic statistical measures with FSharp.Stats.
Central tendency
A central tendency (or measure of central tendency) is a central or typical value for a probability distribution. It may also be called a center or location of the distribution. Colloquially, measures of central tendency are often called averages.
Mean
For a data set, the arithmetic mean, also called the expected value or average, is the central value of a discrete set of numbers: specifically, the sum of the values divided by the number of values:
\(\bar{x} = \frac{1}{n}\left (\sum_{i=1}^n{x_i}\right ) = \frac{x_1+x_2+\cdots +x_n}{n}\)
mean is available as a Sequence (and other collections) extension, as well as meanBy,
which takes an additional converter function:
open FSharp.Stats

// Mean of an int list; meanBy converts each element with float first.
let mean1 =
    [10; 2; 19; 24; 6; 23; 47; 24; 54; 77;]
    |> Seq.meanBy float

// Mean of a float list.
let mean2 =
    [10.; 2.; 19.; 24.; 6.; 23.; 47.; 24.; 54.; 77.;]
    |> Seq.mean
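The formula above can also be checked by hand. The following is a minimal sketch in plain F# that does not use FSharp.Stats; `arithmeticMean` is a hypothetical helper name introduced here purely for illustration, not a library function.

```fsharp
// Hand-rolled arithmetic mean: sum of the values divided by their count.
// `arithmeticMean` is an illustrative helper, not part of FSharp.Stats.
let arithmeticMean (xs: float list) =
    List.sum xs / float (List.length xs)

let values = [10.; 2.; 19.; 24.; 6.; 23.; 47.; 24.; 54.; 77.]

// The ten values sum to 286, so the mean is 286 / 10 = 28.6.
let handMean = arithmeticMean values
printfn "%.1f" handMean   // prints 28.6
```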
Truncated mean
The truncated (trimmed) mean is the mean computed after a given proportion of the highest values and the same proportion of the lowest values have been discarded. In total, twice the given proportion is discarded.
meanTruncated is available as a Sequence (and other collections) extension, as well as meanTruncatedBy,
which takes an additional converter function:
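// Discards 20 % of the values at each end before averaging.
let truncMean1 =
    [10.; 2.; 19.; 24.; 6.; 23.; 47.; 24.; 54.; 77.;]
    |> Seq.meanTruncated 0.2

// Same computation on an int list, converted with float.
let truncMean2 =
    [10; 2; 19; 24; 6; 23; 47; 24; 54; 77;]
    |> Seq.meanTruncatedBy float 0.2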
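To make the trimming step concrete, here is a hand-rolled sketch in plain F#. It assumes that floor(proportion * n) values are dropped from each end of the sorted data; FSharp.Stats may round the cut-off differently, so treat `truncatedMean` as a hypothetical illustration rather than the library's implementation.

```fsharp
// Hand-rolled truncated mean (illustrative, not the FSharp.Stats code).
// Assumption: floor(proportion * n) values are dropped from each end.
// Requires proportion < 0.5 so that at least one value remains.
let truncatedMean (proportion: float) (xs: float list) =
    let sorted = List.sort xs
    let n = List.length sorted
    let k = int (floor (proportion * float n))   // values dropped per end
    sorted
    |> List.skip k
    |> List.take (n - 2 * k)
    |> List.average

let values = [10.; 2.; 19.; 24.; 6.; 23.; 47.; 24.; 54.; 77.]

// With proportion 0.2 and ten values, 2 and 6 are dropped at the low end
// and 54 and 77 at the high end; the remaining six values sum to 147.
let trimmed = truncatedMean 0.2 values
printfn "%.1f" trimmed   // prints 24.5
```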
Median
The median is a value separating the higher half from the lower half of a data sample, a population, or a probability distribution. For a data set, it may be thought of as "the middle" value: if you sort the values of a collection by size, the median is the value in central position. Therefore, there are as many bigger values as smaller values than the median in the collection. If there is an even number of observations, then there is no single middle value; the median is then usually defined to be the mean of the two middle values.
median is available as a sequence (and other collections) extension:
let median1 =
    [10; 2; 19; 24; 6; 23; 47; 24; 54; 77;]
    |> Seq.median
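The even-count rule described above (take the mean of the two middle values) can be sketched in plain F# as follows. `medianOf` is a hypothetical helper written for this illustration; it works on floats so that averaging the two middle values is not affected by integer division.

```fsharp
// Hand-rolled median on floats (illustrative, not the FSharp.Stats code).
let medianOf (xs: float list) =
    let sorted = xs |> List.sort |> List.toArray
    let n = sorted.Length
    if n = 0 then nan
    elif n % 2 = 1 then sorted.[n / 2]                    // odd count: single middle value
    else (sorted.[n / 2 - 1] + sorted.[n / 2]) / 2.0      // even count: mean of the two middle values

let values = [10.; 2.; 19.; 24.; 6.; 23.; 47.; 24.; 54.; 77.]

// Sorted: 2 6 10 19 23 24 24 47 54 77, so the middle values are 23 and 24.
let handMedian = medianOf values
printfn "%.1f" handMedian   // prints 23.5
```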
Harmonic mean
The harmonic mean can be expressed as the reciprocal of the arithmetic mean of the reciprocals of the given set of observations. It is typically appropriate for situations when the average of rates is desired.
\(H = \frac{n}{\frac1{x_1} + \frac1{x_2} + \cdots + \frac1{x_n}} = \frac{n}{\sum\limits_{i=1}^n \frac1{x_i}} = \left(\frac{\sum\limits_{i=1}^n x_i^{-1}}{n}\right)^{-1}.\)
meanHarmonic is available as a sequence (and other collections) extension, as well as meanHarmonicBy,
which takes an additional converter function:
let harmonicMean1 =
    [10.; 2.; 19.; 24.; 6.; 23.; 47.; 24.; 54.; 77.;]
    |> Seq.meanHarmonic

// Same computation on an int list, converted with float.
let harmonicMean2 =
    [10; 2; 19; 24; 6; 23; 47; 24; 54; 77;]
    |> Seq.meanHarmonicBy float
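A small worked case shows why the harmonic mean suits rates. The sketch below is plain F# with a hypothetical helper `harmonicMeanOf`; the speeds are made-up illustration values. Driving two legs of equal distance at 60 and 40 km/h gives an average speed of 48 km/h, not the arithmetic mean of 50 km/h.

```fsharp
// Hand-rolled harmonic mean: n divided by the sum of reciprocals.
// Illustrative helper, not part of FSharp.Stats.
let harmonicMeanOf (xs: float list) =
    float (List.length xs) / List.sumBy (fun x -> 1.0 / x) xs

// Two legs of equal distance at 60 and 40 km/h (made-up example values):
// 2 / (1/60 + 1/40) = 48.
let avgSpeed = harmonicMeanOf [60.; 40.]
printfn "%.1f" avgSpeed   // prints 48.0
```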
Geometric mean
The geometric mean indicates the central tendency or typical value of a set of numbers by using the product of their values (as opposed to the arithmetic mean which uses their sum). The geometric mean is defined as the nth root of the product of n numbers:
\(\left(\prod_{i=1}^n x_i\right)^\frac{1}{n} = \sqrt[n]{x_1 x_2 \cdots x_n}\)
meanGeometric is available as a sequence (and other collections) extension, as well as meanGeometricBy,
which takes an additional converter function:
let geometricMean1 =
    [10.; 2.; 19.; 24.; 6.; 23.; 47.; 24.; 54.; 77.;]
    |> Seq.meanGeometric

// Same computation on an int list, converted with float.
let geometricMean2 =
    [10; 2; 19; 24; 6; 23; 47; 24; 54; 77;]
    |> Seq.meanGeometricBy float
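The nth-root definition can be evaluated without forming a potentially huge product by averaging logarithms instead. The sketch below is plain F#; `geometricMeanOf` is a hypothetical helper, it assumes strictly positive inputs, and it is not claimed to match how FSharp.Stats computes the value internally.

```fsharp
// Hand-rolled geometric mean via logarithms: exp(mean(log x)).
// Assumption: all inputs are strictly positive. Illustrative helper only.
let geometricMeanOf (xs: float list) =
    xs |> List.averageBy log |> exp

// The product 1 * 2 * 4 = 8 has cube root 2.
let g = geometricMeanOf [1.; 2.; 4.]
printfn "%.4f" g   // prints 2.0000
```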
Dispersion
Dispersion (also called variability, scatter, or spread) is the extent to which a distribution is stretched or squeezed.
Range
The range of a set of data is the difference between the largest and smallest values.
range is available as a sequence (and other collections) extension, as well as rangeBy,
which takes an additional converter function:
Note: instead of returning the absolute difference between the maximum and minimum values, these functions return an interval with these values as boundaries.
// The data are already floats, so range needs no converter.
let range1 =
    [10.; 2.; 19.; 24.; 6.; 23.; 47.; 24.; 54.; 77.;]
    |> Seq.range

// rangeBy converts each int with float first.
let range2 =
    [10; 2; 19; 24; 6; 23; 47; 24; 54; 77;]
    |> Seq.rangeBy float
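If the scalar range (largest minus smallest value) is needed rather than the boundaries, it can be derived from the minimum and maximum. The sketch below uses plain F# tuples instead of the library's interval type; `minMax` is a hypothetical helper introduced for illustration.

```fsharp
// Returns the smallest and largest value as a tuple (illustrative helper;
// FSharp.Stats returns an interval type instead).
let minMax (xs: float list) = List.min xs, List.max xs

let values = [10.; 2.; 19.; 24.; 6.; 23.; 47.; 24.; 54.; 77.]

let lo, hi = minMax values
let width = hi - lo   // the range as a single number
printfn "[%.1f, %.1f], width %.1f" lo hi width   // prints [2.0, 77.0], width 75.0
```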
Variance and Standard Deviation
The variance
\(s_N^2 = \frac{1}{N} \sum_{i=1}^N \left(x_i - \bar{x}\right)^2\)
and the standard deviation
\(s_N = \sqrt{\frac{1}{N} \sum_{i=1}^N \left(x_i - \bar{x}\right)^2}\)
are measures of how widely the values of a collection are dispersed around the mean. The standard deviation has the same unit as the values of the collection, whereas the variance has that unit squared.
varPopulation and stDevPopulation are available as sequence (and other collections) extensions, as well as varPopulationBy and stDevPopulationBy,
which take an additional converter function:
let data = [|1.;3.;5.;4.;2.;8.|]

// Population versions (denominator N).
let varPopulation = Seq.varPopulation data

let stdPopulation = Seq.stDevPopulation data
If only a sample of the population is available, one degree of freedom is lost to estimating the mean from the same data, so the Bessel-corrected version of the calculation has to be used (which yields higher values):
\(s^2 = \frac{1}{N - 1} \sum_{i=1}^N \left(x_i - \bar{x}\right)^2\) for the unbiased variance estimation, and
\(s = \sqrt{\frac{1}{N-1} \sum_{i=1}^N \left(x_i - \bar{x}\right)^2}\) for the corrected standard deviation.
var and stDev are available as sequence (and other collections) extensions, as well as varBy and stDevBy,
which take an additional converter function:
// Sample versions with Bessel's correction (denominator N - 1).
let varSample = Seq.var data

let stdSample = Seq.stDev data
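The difference between the two denominators can be seen by evaluating both formulas by hand on the same data. This is a plain F# sketch; `sumSqDev`, `varN`, and `varNminus1` are hypothetical helper names, not FSharp.Stats functions.

```fsharp
// Sum of squared deviations from the mean, shared by both variance formulas.
let sumSqDev (xs: float[]) =
    let m = Array.average xs
    xs |> Array.sumBy (fun x -> (x - m) ** 2.0)

// Population variance (denominator N) and Bessel-corrected sample variance
// (denominator N - 1). Illustrative helpers only.
let varN (xs: float[]) = sumSqDev xs / float xs.Length
let varNminus1 (xs: float[]) = sumSqDev xs / float (xs.Length - 1)

let data = [|1.; 3.; 5.; 4.; 2.; 8.|]

// Mean is 23/6; the squared deviations sum to about 30.8333.
printfn "%.4f %.4f" (varN data) (sqrt (varN data))               // prints 5.1389 2.2669
printfn "%.4f %.4f" (varNminus1 data) (sqrt (varNminus1 data))   // prints 6.1667 2.4833
```

Dividing by N - 1 instead of N always gives the larger value, and the gap shrinks as the number of observations grows.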
Coefficient of variation
The coefficient of variation is the mean-normalized standard deviation:
\(\widehat{c_{\rm v}} = \frac{s}{\bar{x}}\)
It describes the ratio of the standard deviation to the mean, which makes it possible to compare the variability of measurements whose magnitudes differ. Use it only if the data are measured on a ratio scale (meaningful zero values and meaningful intervals).
cv is available as a sequence (and other collections) extension, as well as cvBy,
which takes an additional converter function:
let sample1 = [1.;4.;2.;6.;5.;3.;2.;]
let sample2 = [13.;41.;29.;8.;52.;34.;25.;]

let cvSample1 = Seq.cv sample1

let cvSample2 = Seq.cv sample2
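Because the standard deviation is divided by the mean, rescaling all values by a constant factor leaves the coefficient of variation unchanged. The plain F# sketch below illustrates this with made-up values; `coefficientOfVariation` is a hypothetical helper that uses the Bessel-corrected standard deviation, in line with the formula above.

```fsharp
// Hand-rolled coefficient of variation: sample standard deviation
// (denominator N - 1) divided by the mean. Illustrative helper only.
let coefficientOfVariation (xs: float list) =
    let m = List.average xs
    let n = float (List.length xs)
    let sampleVar = (xs |> List.sumBy (fun x -> (x - m) ** 2.0)) / (n - 1.0)
    sqrt sampleVar / m

let small = [1.; 2.; 3.; 4.; 5.]
let large = small |> List.map (fun x -> x * 10.0)   // same shape, ten times the magnitude

let cvSmall = coefficientOfVariation small
let cvLarge = coefficientOfVariation large
printfn "%.5f %.5f" cvSmall cvLarge   // prints 0.52705 0.52705
```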