FSharp.Stats


Summary

Short documentation of very basic statistical operations

Central Tendency

Mean

"Mean" here stands for the arithmetic mean (also called the average): the sum of the numbers in a collection divided by the count of those numbers.
The mean function is usually located in the module of the respective collection:

open FSharp.Stats

let v = vector [|1.;2.;5.|]
let mean = Vector.mean v
2.666666667
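
As a cross-check of the definition, the same value can be reproduced with the F# core library alone. This is a minimal sketch; `meanOf` is a hypothetical helper introduced for illustration and is not part of FSharp.Stats.

```fsharp
// Hypothetical helper (not an FSharp.Stats function):
// arithmetic mean = sum of the values / number of values.
let meanOf (xs: float list) =
    List.sum xs / float (List.length xs)

let m = meanOf [1.; 2.; 5.]
printfn "%.9f" m // prints 2.666666667
```

For an empty list this sketch evaluates 0.0 / 0.0 and therefore returns NaN.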

Median

If you sort the values of a collection by size, the median is the value in the central position, so the collection contains as many values larger than the median as values smaller than it. If the collection has an even number of values, the median is the mean of the two central values. The median function is usually located in the module of the respective collection:

let arr = [|1.;3.;5.;4.;2.;8.|]
let median = Array.median arr
3.5
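
The sort-and-pick-the-middle description can be sketched in plain F# as follows. `medianOf` is a hypothetical helper for illustration only; returning NaN for empty input is an assumption of this sketch, not documented library behavior.

```fsharp
// Hypothetical helper (not an FSharp.Stats function).
let medianOf (xs: float[]) =
    if Array.isEmpty xs then nan // assumption: no median for empty input
    else
        let sorted = Array.sort xs // returns a sorted copy, input is untouched
        let n = sorted.Length
        if n % 2 = 1 then
            sorted.[n / 2] // odd count: the single central value
        else
            // even count: mean of the two central values
            (sorted.[n / 2 - 1] + sorted.[n / 2]) / 2.0

let med = medianOf [|1.; 3.; 5.; 4.; 2.; 8.|]
printfn "%.1f" med // prints 3.5
```

With six values the two central values are 3 and 4, which gives the 3.5 shown above.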

Truncated/Trimmed mean

Computes the truncated (trimmed) mean, in which a given percentage of the highest values and the same percentage of the lowest values are discarded. In total, twice the given percentage is discarded.

let values = seq [1.;3.;5.;4.;2.;8.]
let truMean = Seq.meanTruncated 0.2 values
3.5
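
The trimming step can be sketched in plain F# as shown below. `meanTruncatedOf` is a hypothetical helper, and the rule for how many values are cut (the percentage times the count, rounded down, from each end) is an assumption of this sketch. It reproduces the result above but may differ from the library for other inputs.

```fsharp
// Hypothetical helper (not an FSharp.Stats function).
let meanTruncatedOf (percent: float) (xs: float[]) =
    let sorted = Array.sort xs
    let n = sorted.Length
    // Assumption: floor(percent * n) values are dropped at each end.
    let k = int (floor (percent * float n))
    // Keep the central part and average it.
    Array.average sorted.[k .. n - 1 - k]

let tm = meanTruncatedOf 0.2 [|1.; 3.; 5.; 4.; 2.; 8.|]
printfn "%.1f" tm // prints 3.5
```

With six values and 0.2, one value is dropped at each end (1 and 8), and the mean of 2, 3, 4 and 5 is 3.5. `Array.average` throws on an empty array, so this sketch is only meaningful for percentages below 0.5 on non-empty input.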

Dispersion

Variance/Standard Deviation

The variance and the standard deviation measure the dispersion of the values in a collection. While the standard deviation has the same unit as the values of the collection, the variance has the squared unit. If the data is only a sample and not the full population, the mean is estimated from the same data and one degree of freedom is lost, so the Bessel-corrected version of the calculation (dividing by n - 1 instead of n) has to be used, which results in higher values.

let data =          [|1.;3.;5.;4.;2.;8.|]
let varSample =     Seq.var data
let varPopulation = Seq.varPopulation data
let stdSample =     Seq.stDev data
let stdPopulation = Seq.stDevPopulation data
stdSample:     2.483
stdPopulation: 2.267
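
The difference between the two versions is only the divisor, which the following plain F# sketch makes explicit. The helper names `sumSqDev`, `varSampleOf` and `varPopulationOf` are hypothetical and introduced for illustration; they are not FSharp.Stats functions.

```fsharp
// Hypothetical helpers (not FSharp.Stats functions).
// Sum of squared deviations from the mean.
let sumSqDev (xs: float[]) =
    let m = Array.average xs
    xs |> Array.sumBy (fun x -> (x - m) ** 2.0)

// Bessel-corrected (sample) variance: divide by n - 1.
let varSampleOf (xs: float[]) = sumSqDev xs / float (xs.Length - 1)

// Population variance: divide by n.
let varPopulationOf (xs: float[]) = sumSqDev xs / float xs.Length

let obs = [|1.; 3.; 5.; 4.; 2.; 8.|]
printfn "stdSample:     %.3f" (sqrt (varSampleOf obs))     // prints 2.483
printfn "stdPopulation: %.3f" (sqrt (varPopulationOf obs)) // prints 2.267
```

Because the numerator is identical and n - 1 is smaller than n, the sample version is always the larger of the two for data with any spread.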

Coefficient of variation

The coefficient of variation is the mean-normalized standard deviation: the ratio of the standard deviation to the mean. It helps to compare the variability of measurements that differ in magnitude. Use it only if the data is measured on a ratio scale (meaningful zero value and meaningful intervals).

let sample1 =   [1.;4.;2.;6.;5.;3.;2.;]
let sample2 =   [13.;41.;29.;8.;52.;34.;25.;]
let cvSample1 = Seq.cv sample1
let cvSample2 = Seq.cv sample2

//use if data is complete (whole population was measured)
//let cvPopulation = Seq.cvPopulation data
cvSample1: 0.548
cvSample2: 0.531
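
To illustrate why the ratio is useful for comparing measurements of different magnitude, here is a minimal plain F# sketch. `cvOf` is a hypothetical helper (not an FSharp.Stats function) that uses the sample standard deviation, and `scaled` is an illustrative data set that is not part of the original example.

```fsharp
// Hypothetical helper (not an FSharp.Stats function):
// sample standard deviation divided by the mean.
let cvOf (xs: float list) =
    let n = float (List.length xs)
    let m = List.sum xs / n
    let sd = sqrt (List.sumBy (fun x -> (x - m) ** 2.0) xs / (n - 1.0))
    sd / m

let s1 = [1.; 4.; 2.; 6.; 5.; 3.; 2.]
// Illustrative data: the same measurements on a 10 times larger scale.
let scaled = s1 |> List.map (fun x -> x * 10.0)

printfn "cv original: %.3f" (cvOf s1)     // prints 0.548
printfn "cv scaled:   %.3f" (cvOf scaled) // prints 0.548
```

Multiplying every value by a constant scales the standard deviation and the mean by the same factor, so the coefficient of variation stays the same.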