Clustering with FSharp.Stats I: k-means

Back to the index

BinderScriptNotebook

Clustering with FSharp.Stats I: k-means

Summary: This tutorial demonstrates k means clustering with FSharp.Stats and how to visualize the results with Plotly.NET.

Introduction

Clustering methods can be used to group elements of a huge data set based on their similarity. Elements sharing similar properties cluster together and can be reported as coherent group. k-means clustering is a frequently used technique, that segregates the given data into k clusters with similar elements grouped in each cluster, but high variation between the clusters. The algorithm to cluster a n-dimensional dataset can be fully described in the following 4 steps:

  1. Initialize k n-dimensional centroids, that are randomly distributed over the data range.
  2. Calculate the distance of each point to all centroids and assign it to the nearest one.
  3. Reposition all centroids by calculating the average point of each cluster.
  4. Repeat step 2-3 until convergence.

Centroid initiation

Since the random initiation of centroids may influences the result, a second initiation algorithm is proposed (cvmax), that extract a set of medians from the dimension with maximum variance to initialize the centroids.

Distance measure

While several distance metrics can be used (e.g. Manhattan distance or correlation measures) it is preferred to use Euclidean distance. It is recommended to use a squared Euclidean distance. To not calculate the square root does not change the result but saves computation time.


For demonstration of k-means clustering, the classic iris data set is used, which consists of 150 records, each of which contains four measurements and a species identifier.

Referencing packages

// Packages hosted by the Fslab community
#r "nuget: Deedle"
#r "nuget: FSharp.Stats"
// third party .net packages 
#r "nuget: Plotly.NET, 2.0.0-preview.12"
#r "nuget: Plotly.NET.Interactive, 2.0.0-preview.12"
#r "nuget: FSharp.Data"

Loading data

open FSharp.Data
open Deedle

// Retrieve data using the FSharp.Data package and read it as dataframe using the Deedle package
let rawData = Http.RequestString @"https://raw.githubusercontent.com/fslaborg/datasets/main/data/iris.csv"
let df = Frame.ReadCsvString(rawData)

df.Print()
sepal_length sepal_width petal_length petal_width species    
0   -> 5.5          2.4         3.8          1.1         versicolor 
1   -> 4.9          3.1         1.5          0.1         setosa     
2   -> 7.6          3           6.6          2.1         virginica  
3   -> 5.6          2.8         4.9          2           virginica  
4   -> 6.1          3           4.9          1.8         virginica  
5   -> 6.3          3.4         5.6          2.4         virginica  
6   -> 6.2          2.8         4.8          1.8         virginica  
7   -> 7.2          3.2         6            1.8         virginica  
8   -> 6.9          3.2         5.7          2.3         virginica  
9   -> 4.9          3           1.4          0.2         setosa     
10  -> 5.4          3.9         1.7          0.4         setosa     
11  -> 7            3.2         4.7          1.4         versicolor 
12  -> 6.1          3           4.6          1.4         versicolor 
13  -> 5.4          3.7         1.5          0.2         setosa     
14  -> 7.2          3.6         6.1          2.5         virginica  
:      ...          ...         ...          ...         ...        
135 -> 7.7          3.8         6.7          2.2         virginica  
136 -> 5.7          3           4.2          1.2         versicolor 
137 -> 6.4          2.7         5.3          1.9         virginica  
138 -> 5.8          2.7         3.9          1.2         versicolor 
139 -> 5            2.3         3.3          1           versicolor 
140 -> 6.7          3.1         4.4          1.4         versicolor 
141 -> 6.3          3.3         6            2.5         virginica  
142 -> 4.9          2.4         3.3          1           versicolor 
143 -> 5.8          2.8         5.1          2.4         virginica  
144 -> 5            3.3         1.4          0.2         setosa     
145 -> 7.7          2.6         6.9          2.3         virginica  
146 -> 5.7          2.6         3.5          1           versicolor 
147 -> 5.9          3           5.1          1.8         virginica  
148 -> 6.8          3.2         5.9          2.3         virginica  
149 -> 5            3.6         1.4          0.2         setosa

Let's take a first look at the data with heatmaps using Plotly.NET. Each of the 150 records consists of four measurements and a species identifier. Since the species identifier occur several times (Iris-virginica, Iris-versicolor, and Iris-setosa), we create unique labels by adding the rows index to the species identifier.

open Plotly.NET

let colNames = ["sepal_length";"sepal_width";"petal_length";"petal_width"]

// isolate data as float [] []
let data = 
    Frame.dropCol "species" df
    |> Frame.toJaggedArray

//isolate labels as seq<string>
let labels = 
    Frame.getCol "species" df
    |> Series.values
    |> Seq.mapi (fun i s -> sprintf "%s_%i" s i)

let dataChart = 
    Chart.Heatmap(data,ColNames=colNames,RowNames=labels)
    // required to fit the species identifier on the left side of the heatmap
    |> Chart.withMarginSize(Left=100.)
    |> Chart.withTitle "raw iris data"

// required to fit the species identifier on the left side of the heatmap

Clustering

The function that performs k-means clustering can be found at FSharp.Stats.ML.Unsupervised.IterativeClustering.kmeans. It requires four input parameters:

  1. Centroid initiation method
  2. Distance measure (from FSharp.Stats.ML.DistanceMetrics)
  3. Data to cluster as float [] [], where each entry of the outer array is a sequence of coordinates
  4. k, the number of clusters that are desired
open FSharp.Stats
open FSharp.Stats.ML
open FSharp.Stats.ML.Unsupervised

// For random cluster initiation use randomInitFactory:
let rnd = System.Random()
let randomInitFactory : IterativeClustering.CentroidsFactory<float []> = 
    IterativeClustering.randomCentroids<float []> rnd

// For assisted cluster initiation use cvmaxFactory:
//let cvmaxFactory : IterativeClustering.CentroidsFactory<float []> = 
//    IterativeClustering.intitCVMAX

let distanceFunction = DistanceMetrics.euclideanNaNSquared
  
let kmeansResult = 
    IterativeClustering.kmeans distanceFunction randomInitFactory data 4

After all centroids are set, the affiliation of a datapoint to a cluster can be determined by minimizing the distance of the respective point to each of the centroids. A function realizing the mapping is integrated in the kmeansResult.

let clusteredIrisData =
    Seq.zip labels data
    |> Seq.map (fun (species,dataPoint) -> 
        let clusterIndex,centroid = kmeansResult.Classifier dataPoint
        clusterIndex,species,dataPoint)

// Each datapoint is given associated with its cluster index, species identifier, and coordinates.
"4, "versicolor_0", [|5.5; 2.4; 3.8; 1.1|]
2, "setosa_1", [|4.9; 3.1; 1.5; 0.1|]
3, "virginica_2", [|7.6; 3.0; 6.6; 2.1|]
1, "virginica_3", [|5.6; 2.8; 4.9; 2.0|]
1, "virginica_4", [|6.1; 3.0; 4.9; 1.8|]
3, "virginica_5", [|6.3; 3.4; 5.6; 2.4|]
1, "virginica_6", [|6.2; 2.8; 4.8; 1.8|]
 ... "

Visualization of the clustering result as heatmap

The datapoints are sorted according to their associated cluster index and visualized in a combined heatmap.

let clusterChart =
    clusteredIrisData
    //sort all data points according to their assigned cluster number
    |> Seq.sortBy (fun (clusterIndex,label,dataPoint) -> clusterIndex)
    |> Seq.unzip3
    |> fun (_,labels,d) -> 
        Chart.Heatmap(d,ColNames=colNames,RowNames=labels)
        // required to fit the species identifier on the left side of the heatmap
        |> Chart.withMarginSize(Left=100.)
        |> Chart.withTitle "clustered iris data (k-means clustering)"

To visualize the result in a three-dimensional chart, three of the four measurements are isolated after clustering and visualized as 3D-scatter plot.

let clusterChart3D =
    //group clusters
    clusteredIrisData
    |> Seq.groupBy (fun (clusterIndex,label,dataPoint) -> clusterIndex)
    //for each cluster generate a scatter plot
    |> Seq.map (fun (clusterIndex,cluster) -> 
        cluster
        |> Seq.unzip3
        |> fun (clusterIndex,label,data) -> 
            let clusterName = sprintf "cluster %i" (Seq.head clusterIndex)
            //for 3 dimensional representation isolate sepal length, petal length, and petal width
            let truncData = data |> Seq.map (fun x -> x.[0],x.[2],x.[3]) 
            Chart.Scatter3d(truncData,mode=StyleParam.Mode.Markers,Name = clusterName,Labels=label)
        )
    |> Chart.combine
    |> Chart.withTitle "isolated coordinates of clustered iris data (k-means clustering)"
    |> Chart.withXAxisStyle colNames.[0]
    |> Chart.withYAxisStyle colNames.[2]
    |> Chart.withZAxisStyle colNames.[3]

Optimal cluster number

The identification of the optimal cluster number k in terms of the average squared distance of each point to its centroid can be realized by performing the clustering over a range of k's multiple times and taking the k according to the elbow criterion. Further more robust and advanced cluster number determination techniques can be found here.

let getBestkMeansClustering bootstraps k =
    let dispersions =
        Array.init bootstraps (fun _ -> 
            IterativeClustering.kmeans distanceFunction randomInitFactory data k
            )
        |> Array.map (fun clusteringResult -> IterativeClustering.DispersionOfClusterResult clusteringResult)
    Seq.mean dispersions,Seq.stDev dispersions

let iterations = 10

let maximalK = 10

let bestKChart = 
    [2 .. maximalK] 
    |> List.map (fun k -> 
        let mean,stdev = getBestkMeansClustering iterations k
        k,mean,stdev
        )
    |> List.unzip3
    |> fun (ks,means,stdevs) -> 
        Chart.Line(ks,means)
        |> Chart.withYErrorStyle(stdevs)
        |> Chart.withXAxisStyle "k"
        |> Chart.withYAxisStyle "average dispersion"
        |> Chart.withTitle "iris data set average dispersion per k"

Limitations

  1. Outlier have a strong influence on the positioning of the centroids.
  2. Determining the correct number of clusters in advance is critical. Often it is chosen according to the number of classes present in the dataset which isn't in the spirit of clustering procedures.

Notes

  • Please note that depending on what data you want to cluster, a column wise z-score normalization may be required. In the presented example differences in sepal width have a reduced influence because the absolute variation is low.

References

  • FSharp.Stats documentation, fslaborg, https://fslab.org/FSharp.Stats/Clustering.html
  • Shraddha and Saganna, A Review On K-means Data Clustering Approach, International Journal of Information & Computation Technology, Vol:4 No:17, 2014
  • Moth'd Belal, A New Algorithm for Cluster Initialization, International Journal of Computer and Information Engineering, Vol:1 No:4, 2007
  • Singh et al., K-means with Three different Distance Metrics, International Journal of Computer Applications, 2013, DOI:10.5120/11430-6785
  • Kodinariya and Makwana, Review on Determining of Cluster in K-means Clustering, International Journal of Advance Research in Computer Science and Management Studies, 2013

Further reading

Examples are taken from FSharp.Stats documentation that covers various techniques for an optimal cluster number determination.

The next article in this series covers hierarchical clustering using FSharp.Stats.

Multiple items
namespace FSharp

--------------------
namespace Microsoft.FSharp
Multiple items
namespace FSharp.Data

--------------------
namespace Microsoft.FSharp.Data
namespace Deedle
val rawData : string
type Http = private new : unit -> Http static member private AppendQueryToUrl : url:string * query:(string * string) list -> string static member AsyncRequest : url:string * ?query:(string * string) list * ?headers:seq<string * string> * ?httpMethod:string * ?body:HttpRequestBody * ?cookies:seq<string * string> * ?cookieContainer:CookieContainer * ?silentHttpErrors:bool * ?silentCookieErrors:bool * ?responseEncodingOverride:string * ?customizeHttpRequest:(HttpWebRequest -> HttpWebRequest) * ?timeout:int -> Async<HttpResponse> static member AsyncRequestStream : url:string * ?query:(string * string) list * ?headers:seq<string * string> * ?httpMethod:string * ?body:HttpRequestBody * ?cookies:seq<string * string> * ?cookieContainer:CookieContainer * ?silentHttpErrors:bool * ?silentCookieErrors:bool * ?customizeHttpRequest:(HttpWebRequest -> HttpWebRequest) * ?timeout:int -> Async<HttpResponseWithStream> static member AsyncRequestString : url:string * ?query:(string * string) list * ?headers:seq<string * string> * ?httpMethod:string * ?body:HttpRequestBody * ?cookies:seq<string * string> * ?cookieContainer:CookieContainer * ?silentHttpErrors:bool * ?silentCookieErrors:bool * ?responseEncodingOverride:string * ?customizeHttpRequest:(HttpWebRequest -> HttpWebRequest) * ?timeout:int -> Async<string> static member private EncodeFormData : query:string -> string static member private InnerRequest : url:string * toHttpResponse:(string -> int -> string -> string -> 'a0 option -> Map<string,string> -> Map<string,string> -> Stream -> Async<'a1>) * ?query:(string * string) list * ?headers:seq<string * string> * ?httpMethod:string * ?body:HttpRequestBody * ?cookies:seq<string * string> * ?cookieContainer:CookieContainer * ?silentHttpErrors:bool * ?silentCookieErrors:bool * ?responseEncodingOverride:'a0 * ?customizeHttpRequest:(HttpWebRequest -> HttpWebRequest) * ?timeout:int -> Async<'a1> static member Request : url:string * ?query:(string * string) list * ?headers:seq<string * string> * ?httpMethod:string * ?body:HttpRequestBody * ?cookies:seq<string * string> * ?cookieContainer:CookieContainer * ?silentHttpErrors:bool * ?silentCookieErrors:bool * ?responseEncodingOverride:string * ?customizeHttpRequest:(HttpWebRequest -> HttpWebRequest) * ?timeout:int -> HttpResponse static member RequestStream : url:string * ?query:(string * string) list * ?headers:seq<string * string> * ?httpMethod:string * ?body:HttpRequestBody * ?cookies:seq<string * string> * ?cookieContainer:CookieContainer * ?silentHttpErrors:bool * ?silentCookieErrors:bool * ?customizeHttpRequest:(HttpWebRequest -> HttpWebRequest) * ?timeout:int -> HttpResponseWithStream static member RequestString : url:string * ?query:(string * string) list * ?headers:seq<string * string> * ?httpMethod:string * ?body:HttpRequestBody * ?cookies:seq<string * string> * ?cookieContainer:CookieContainer * ?silentHttpErrors:bool * ?silentCookieErrors:bool * ?responseEncodingOverride:string * ?customizeHttpRequest:(HttpWebRequest -> HttpWebRequest) * ?timeout:int -> string
<summary> Utilities for working with network via HTTP. Includes methods for downloading resources with specified headers, query parameters and HTTP body </summary>
Multiple items
static member Http.RequestString : url:string * ?query:(string * string) list * ?headers:seq<string * string> * ?httpMethod:string * ?body:HttpRequestBody * ?cookies:seq<string * string> * ?cookieContainer:System.Net.CookieContainer * ?silentHttpErrors:bool * ?silentCookieErrors:bool * ?responseEncodingOverride:string * ?customizeHttpRequest:(System.Net.HttpWebRequest -> System.Net.HttpWebRequest) * ?timeout:int -> string

--------------------
static member Http.RequestString : url:string * ?query:(string * string) list * ?headers:seq<string * string> * ?httpMethod:string * ?body:HttpRequestBody * ?cookies:seq<string * string> * ?cookieContainer:System.Net.CookieContainer * ?silentHttpErrors:bool * ?silentCookieErrors:bool * ?responseEncodingOverride:string * ?customizeHttpRequest:(System.Net.HttpWebRequest -> System.Net.HttpWebRequest) * ?timeout:int -> string
val df : Frame<int,string>
Multiple items
module Frame from Deedle

--------------------
type Frame = static member ReadCsv : location:string * hasHeaders:Nullable<bool> * inferTypes:Nullable<bool> * inferRows:Nullable<int> * schema:string * separators:string * culture:string * maxRows:Nullable<int> * missingValues:string [] * preferOptions:bool -> Frame<int,string> + 1 overload static member ReadReader : reader:IDataReader -> Frame<int,string> static member CustomExpanders : Dictionary<Type,Func<obj,seq<string * Type * obj>>> static member NonExpandableInterfaces : ResizeArray<Type> static member NonExpandableTypes : HashSet<Type>

--------------------
type Frame<'TRowKey,'TColumnKey (requires equality and equality)> = interface IDynamicMetaObjectProvider interface INotifyCollectionChanged interface IFsiFormattable interface IFrame new : rowIndex:IIndex<'TRowKey> * columnIndex:IIndex<'TColumnKey> * data:IVector<IVector> * indexBuilder:IIndexBuilder * vectorBuilder:IVectorBuilder -> Frame<'TRowKey,'TColumnKey> + 1 overload member AddColumn : column:'TColumnKey * series:seq<'V> -> unit + 3 overloads member AggregateRowsBy : groupBy:seq<'TColumnKey> * aggBy:seq<'TColumnKey> * aggFunc:Func<Series<'TRowKey,'a>,'b> -> Frame<int,'TColumnKey> member Clone : unit -> Frame<'TRowKey,'TColumnKey> member ColumnApply : f:Func<Series<'TRowKey,'T>,ISeries<'TRowKey>> -> Frame<'TRowKey,'TColumnKey> + 1 overload member DropColumn : column:'TColumnKey -> unit ...

--------------------
new : names:seq<'TColumnKey> * columns:seq<ISeries<'TRowKey>> -> Frame<'TRowKey,'TColumnKey>
new : rowIndex:Indices.IIndex<'TRowKey> * columnIndex:Indices.IIndex<'TColumnKey> * data:IVector<IVector> * indexBuilder:Indices.IIndexBuilder * vectorBuilder:Vectors.IVectorBuilder -> Frame<'TRowKey,'TColumnKey>
static member Frame.ReadCsvString : csvString:string * ?hasHeaders:bool * ?inferTypes:bool * ?inferRows:int * ?schema:string * ?separators:string * ?culture:string * ?maxRows:int * ?missingValues:string [] * ?preferOptions:bool -> Frame<int,string>
static member FrameExtensions.Print : frame:Frame<'K,'V> -> unit (requires equality and equality)
static member FrameExtensions.Print : frame:Frame<'K,'V> * printTypes:bool -> unit (requires equality and equality)
namespace Plotly
namespace Plotly.NET
val colNames : string list
val data : float [] []
Multiple items
module Frame from Deedle

--------------------
type Frame = inherit DynamicObj new : unit -> Frame

--------------------
type Frame<'TRowKey,'TColumnKey (requires equality and equality)> = interface IDynamicMetaObjectProvider interface INotifyCollectionChanged interface IFsiFormattable interface IFrame new : rowIndex:IIndex<'TRowKey> * columnIndex:IIndex<'TColumnKey> * data:IVector<IVector> * indexBuilder:IIndexBuilder * vectorBuilder:IVectorBuilder -> Frame<'TRowKey,'TColumnKey> + 1 overload member AddColumn : column:'TColumnKey * series:seq<'V> -> unit + 3 overloads member AggregateRowsBy : groupBy:seq<'TColumnKey> * aggBy:seq<'TColumnKey> * aggFunc:Func<Series<'TRowKey,'a>,'b> -> Frame<int,'TColumnKey> member Clone : unit -> Frame<'TRowKey,'TColumnKey> member ColumnApply : f:Func<Series<'TRowKey,'T>,ISeries<'TRowKey>> -> Frame<'TRowKey,'TColumnKey> + 1 overload member DropColumn : column:'TColumnKey -> unit ...

--------------------
new : unit -> Frame

--------------------
new : names:seq<'TColumnKey> * columns:seq<ISeries<'TRowKey>> -> Frame<'TRowKey,'TColumnKey>
new : rowIndex:Indices.IIndex<'TRowKey> * columnIndex:Indices.IIndex<'TColumnKey> * data:IVector<IVector> * indexBuilder:Indices.IIndexBuilder * vectorBuilder:Vectors.IVectorBuilder -> Frame<'TRowKey,'TColumnKey>
val dropCol : column:'C -> frame:Frame<'R,'C> -> Frame<'R,'C> (requires equality and equality)
val toJaggedArray : frame:Frame<'R,'C> -> float [] [] (requires equality and equality)
val labels : seq<string>
val getCol : column:'C -> frame:Frame<'R,'C> -> Series<'R,'V> (requires equality and equality)
Multiple items
module Series from Deedle

--------------------
type Series = static member ofNullables : values:seq<Nullable<'a0>> -> Series<int,'a0> (requires default constructor and value type and 'a0 :> ValueType) static member ofObservations : observations:seq<'c * 'd> -> Series<'c,'d> (requires equality) static member ofOptionalObservations : observations:seq<'K * 'a1 option> -> Series<'K,'a1> (requires equality) static member ofValues : values:seq<'a> -> Series<int,'a>

--------------------
type Series<'K,'V (requires equality)> = interface IFsiFormattable interface ISeries<'K> new : index:IIndex<'K> * vector:IVector<'V> * vectorBuilder:IVectorBuilder * indexBuilder:IIndexBuilder -> Series<'K,'V> + 3 overloads member After : lowerExclusive:'K -> Series<'K,'V> member Aggregate : aggregation:Aggregation<'K> * keySelector:Func<DataSegment<Series<'K,'V>>,'TNewKey> * valueSelector:Func<DataSegment<Series<'K,'V>>,OptionalValue<'R>> -> Series<'TNewKey,'R> (requires equality) + 1 overload member AsyncMaterialize : unit -> Async<Series<'K,'V>> member Before : upperExclusive:'K -> Series<'K,'V> member Between : lowerInclusive:'K * upperInclusive:'K -> Series<'K,'V> member Compare : another:Series<'K,'V> -> Series<'K,Diff<'V>> member Convert : forward:Func<'V,'R> * backward:Func<'R,'V> -> Series<'K,'R> ...

--------------------
new : pairs:seq<System.Collections.Generic.KeyValuePair<'K,'V>> -> Series<'K,'V>
new : keys:seq<'K> * values:seq<'V> -> Series<'K,'V>
new : keys:'K [] * values:'V [] -> Series<'K,'V>
new : index:Indices.IIndex<'K> * vector:IVector<'V> * vectorBuilder:Vectors.IVectorBuilder * indexBuilder:Indices.IIndexBuilder -> Series<'K,'V>
val values : series:Series<'K,'T> -> seq<'T> (requires equality)
Multiple items
module Seq from Plotly.NET

--------------------
module Seq from Microsoft.FSharp.Collections
<summary>Contains operations for working with values of type <see cref="T:Microsoft.FSharp.Collections.seq`1" />.</summary>
val mapi : mapping:(int -> 'T -> 'U) -> source:seq<'T> -> seq<'U>
<summary>Builds a new collection whose elements are the results of applying the given function to each of the elements of the collection. The integer index passed to the function indicates the index (from 0) of element being transformed.</summary>
<param name="mapping">A function to transform items from the input sequence that also supplies the current index.</param>
<param name="source">The input sequence.</param>
<returns>The result sequence.</returns>
<exception cref="T:System.ArgumentNullException">Thrown when the input sequence is null.</exception>
val i : int
val s : string
val sprintf : format:Printf.StringFormat<'T> -> 'T
<summary>Print to a string using the given format.</summary>
<param name="format">The formatter.</param>
<returns>The formatted result.</returns>
val dataChart : GenericChart.GenericChart
type Chart = static member Area : x:seq<#IConvertible> * y:seq<#IConvertible> * ?Name:string * ?ShowMarkers:bool * ?ShowLegend:bool * ?MarkerSymbol:MarkerSymbol * ?Color:Color * ?Opacity:float * ?Labels:seq<#IConvertible> * ?TextPosition:TextPosition * ?TextFont:Font * ?Dash:DrawingStyle * ?Width:float -> GenericChart + 1 overload static member Bar : values:seq<#IConvertible> * ?Keys:seq<#IConvertible> * ?Name:string * ?ShowLegend:bool * ?Color:Color * ?PatternShape:PatternShape * ?MultiPatternShape:seq<PatternShape> * ?Pattern:Pattern * ?Base:#IConvertible * ?Width:'a3 * ?MultiWidth:seq<'a3> * ?Opacity:float * ?MultiOpacity:seq<float> * ?Text:'a4 * ?MultiText:seq<'a4> * ?TextPosition:TextPosition * ?MultiTextPosition:seq<TextPosition> * ?TextFont:Font * ?Marker:Marker -> GenericChart (requires 'a3 :> IConvertible and 'a4 :> IConvertible) + 1 overload static member BoxPlot : ?x:seq<#IConvertible> * ?y:seq<#IConvertible> * ?Name:string * ?ShowLegend:bool * ?Text:'a2 * ?MultiText:seq<'a2> * ?Fillcolor:Color * ?MarkerColor:Color * ?OutlierColor:Color * ?OutlierWidth:int * ?Opacity:float * ?WhiskerWidth:float * ?BoxPoints:BoxPoints * ?BoxMean:BoxMean * ?Jitter:float * ?PointPos:float * ?Orientation:Orientation * ?Marker:Marker * ?Line:Line * ?AlignmentGroup:string * ?Offsetgroup:string * ?Notched:bool * ?NotchWidth:float * ?QuartileMethod:QuartileMethod -> GenericChart (requires 'a2 :> IConvertible) + 1 overload static member Bubble : x:seq<#IConvertible> * y:seq<#IConvertible> * sizes:seq<int> * ?Name:string * ?ShowLegend:bool * ?MarkerSymbol:MarkerSymbol * ?Color:Color * ?Opacity:float * ?Labels:seq<#IConvertible> * ?TextPosition:TextPosition * ?TextFont:Font * ?StackGroup:string * ?Orientation:Orientation * ?GroupNorm:GroupNorm * ?UseWebGL:bool -> GenericChart + 1 overload static member Candlestick : open:seq<#IConvertible> * high:seq<#IConvertible> * low:seq<#IConvertible> * close:seq<#IConvertible> * x:seq<#IConvertible> * ?Increasing:Line * ?Decreasing:Line * ?WhiskerWidth:float * ?Line:Line * ?XCalendar:Calendar -> GenericChart + 1 overload static member Column : values:seq<#IConvertible> * ?Keys:seq<#IConvertible> * ?Name:string * ?ShowLegend:bool * ?Color:Color * ?Pattern:Pattern * ?PatternShape:PatternShape * ?MultiPatternShape:seq<PatternShape> * ?Base:#IConvertible * ?Width:'a3 * ?MultiWidth:seq<'a3> * ?Opacity:float * ?MultiOpacity:seq<float> * ?Text:'a4 * ?MultiText:seq<'a4> * ?TextPosition:TextPosition * ?MultiTextPosition:seq<TextPosition> * ?TextFont:Font * ?Marker:Marker -> GenericChart (requires 'a3 :> IConvertible and 'a4 :> IConvertible) + 1 overload static member Contour : data:seq<#seq<'a1>> * ?X:seq<#IConvertible> * ?Y:seq<#IConvertible> * ?Name:string * ?ShowLegend:bool * ?Opacity:float * ?Colorscale:Colorscale * ?Showscale:'a4 * ?zSmooth:SmoothAlg * ?ColorBar:'a5 -> GenericChart (requires 'a1 :> IConvertible) static member Funnel : x:seq<#IConvertible> * y:seq<#IConvertible> * ?Name:string * ?ShowLegend:bool * ?Opacity:float * ?Labels:seq<#IConvertible> * ?TextPosition:TextPosition * ?TextFont:Font * ?Color:Color * ?Line:Line * ?x0:'a3 * ?dX:float * ?y0:'a4 * ?dY:float * ?Width:float * ?Offset:float * ?Orientation:Orientation * ?Alignmentgroup:string * ?Offsetgroup:string * ?Cliponaxis:bool * ?Connector:FunnelConnector * ?Insidetextfont:Font * ?Outsidetextfont:Font -> GenericChart static member Heatmap : data:seq<#seq<'b>> * ?ColNames:seq<#IConvertible> * ?RowNames:seq<#IConvertible> * ?Name:string * ?ShowLegend:bool * ?Opacity:float * ?Colorscale:Colorscale * ?Showscale:'e * ?Xgap:'f * ?Ygap:'g * ?zSmooth:SmoothAlg * ?ColorBar:'h * ?UseWebGL:bool -> GenericChart (requires 'b :> IConvertible) static member Histogram : ?X:seq<#IConvertible> * ?Y:seq<#IConvertible> * ?Orientation:Orientation * ?Name:string * ?ShowLegend:bool * ?Opacity:float * ?Text:'a2 * ?MultiText:seq<'a2> * ?HistFunc:HistFunc * ?HistNorm:HistNorm * ?AlignmentGroup:string * ?OffsetGroup:string * ?NBinsX:int * ?NBinsY:int * ?BinGroup:string * ?XBins:Bins * ?YBins:Bins * ?MarkerColor:Color * ?Marker:Marker * ?Line:Line * ?ErrorX:Error * ?ErrorY:Error * ?Cumulative:Cumulative * ?HoverLabel:Hoverlabel -> GenericChart (requires 'a2 :> IConvertible) + 1 overload ...
static member Chart.Heatmap : data:seq<#seq<'b>> * ?ColNames:seq<#System.IConvertible> * ?RowNames:seq<#System.IConvertible> * ?Name:string * ?ShowLegend:bool * ?Opacity:float * ?Colorscale:StyleParam.Colorscale * ?Showscale:'e * ?Xgap:'f * ?Ygap:'g * ?zSmooth:StyleParam.SmoothAlg * ?ColorBar:'h * ?UseWebGL:bool -> GenericChart.GenericChart (requires 'b :> System.IConvertible)
static member Chart.withMarginSize : ?Left:'a * ?Right:'b * ?Top:'c * ?Bottom:'d * ?Pad:'e * ?Autoexpand:'f -> (GenericChart.GenericChart -> GenericChart.GenericChart)
static member Chart.withTitle : title:string * ?TitleFont:Font -> (GenericChart.GenericChart -> GenericChart.GenericChart)
module GenericChart from Plotly.NET
<summary> Module to represent a GenericChart </summary>
val toChartHTML : gChart:GenericChart.GenericChart -> string
<summary> Converts a GenericChart to it HTML representation. The div layer has a default size of 600 if not specified otherwise. </summary>
namespace FSharp.Stats
namespace FSharp.Stats.ML
namespace FSharp.Stats.ML.Unsupervised
val rnd : System.Random
namespace System
Multiple items
type Random = new : unit -> unit + 1 overload member Next : unit -> int + 2 overloads member NextBytes : buffer: byte [] -> unit + 1 overload member NextDouble : unit -> float member Sample : unit -> float
<summary>Represents a pseudo-random number generator, which is an algorithm that produces a sequence of numbers that meet certain statistical requirements for randomness.</summary>

--------------------
System.Random() : System.Random
System.Random(Seed: int) : System.Random
val randomInitFactory : IterativeClustering.CentroidsFactory<float []>
module IterativeClustering from FSharp.Stats.ML.Unsupervised
type CentroidsFactory<'a> = 'a array -> int -> 'a array
Multiple items
val float : value:'T -> float (requires member op_Explicit)
<summary>Converts the argument to 64-bit float. This is a direct conversion for all primitive numeric types. For strings, the input is converted using <c>Double.Parse()</c> with InvariantCulture settings. Otherwise the operation requires an appropriate static conversion method on the input type.</summary>
<param name="value">The input value.</param>
<returns>The converted float</returns>


--------------------
[<Struct>] type float = System.Double
<summary>An abbreviation for the CLI type <see cref="T:System.Double" />.</summary>
<category>Basic Types</category>


--------------------
type float<'Measure> = float
<summary>The type of double-precision floating point numbers, annotated with a unit of measure. The unit of measure is erased in compiled code and when values of this type are analyzed using reflection. The type is representationally equivalent to <see cref="T:System.Double" />.</summary>
<category index="6">Basic Types with Units of Measure</category>
val randomCentroids : rng:System.Random -> sample:'a array -> k:int -> 'a []
val distanceFunction : (seq<float> -> seq<float> -> float)
module DistanceMetrics from FSharp.Stats.ML
<summary> Functions for computing distances of elements or sets </summary>
val euclideanNaNSquared : s1:seq<float> -> s2:seq<float> -> float
<summary> Squared Euclidean distance of two coordinate float sequences (ignores nan) </summary>
val kmeansResult : IterativeClustering.KClusteringResult<float []>
val kmeans : dist:DistanceMetrics.Distance<float []> -> factory:IterativeClustering.CentroidsFactory<float []> -> dataset:float [] array -> k:int -> IterativeClustering.KClusteringResult<float []>
val clusteredIrisData : seq<int * string * float []>
Multiple items
module Seq from FSharp.Stats
<summary> Module to compute common statistical measure </summary>

--------------------
module Seq from Plotly.NET

--------------------
module Seq from Microsoft.FSharp.Collections
<summary>Contains operations for working with values of type <see cref="T:Microsoft.FSharp.Collections.seq`1" />.</summary>
val zip : source1:seq<'T1> -> source2:seq<'T2> -> seq<'T1 * 'T2>
<summary>Combines the two sequences into a list of pairs. The two sequences need not have equal lengths: when one sequence is exhausted any remaining elements in the other sequence are ignored.</summary>
<param name="source1">The first input sequence.</param>
<param name="source2">The second input sequence.</param>
<returns>The result sequence.</returns>
<exception cref="T:System.ArgumentNullException">Thrown when either of the input sequences is null.</exception>
val map : mapping:('T -> 'U) -> source:seq<'T> -> seq<'U>
<summary>Builds a new collection whose elements are the results of applying the given function to each of the elements of the collection. The given function will be applied as elements are demanded using the <c>MoveNext</c> method on enumerators retrieved from the object.</summary>
<remarks>The returned sequence may be passed between threads safely. However, individual IEnumerator values generated from the returned sequence should not be accessed concurrently.</remarks>
<param name="mapping">A function to transform items from the input sequence.</param>
<param name="source">The input sequence.</param>
<returns>The result sequence.</returns>
<exception cref="T:System.ArgumentNullException">Thrown when the input sequence is null.</exception>
val species : string
val dataPoint : float []
val clusterIndex : int
val centroid : float []
IterativeClustering.KClusteringResult.Classifier: float [] -> int * float []
<summary> Classifier function returns cluster index and centroid </summary>
val printClusters : string
val take : count:int -> source:seq<'T> -> seq<'T>
<summary>Returns the first N elements of the sequence.</summary>
<remarks>Throws <c>InvalidOperationException</c> if the count exceeds the number of elements in the sequence. <c>Seq.truncate</c> returns as many items as the sequence contains instead of throwing an exception.</remarks>
<param name="count">The number of items to take.</param>
<param name="source">The input sequence.</param>
<returns>The result sequence.</returns>
<exception cref="T:System.ArgumentNullException">Thrown when the input sequence is null.</exception>
<exception cref="T:System.ArgumentException">Thrown when the input sequence is empty.</exception>
<exception cref="T:System.InvalidOperationException">Thrown when count exceeds the number of elements in the sequence.</exception>
val a : int
val b : string
val c : float []
module String from Microsoft.FSharp.Core
<summary>Functional programming operators for string processing. Further string operations are available via the member functions on strings and other functionality in <a href="http://msdn2.microsoft.com/en-us/library/system.string.aspx">System.String</a> and <a href="http://msdn2.microsoft.com/library/system.text.regularexpressions.aspx">System.Text.RegularExpressions</a> types. </summary>
<category>Strings and Text</category>
val concat : sep:string -> strings:seq<string> -> string
<summary>Returns a new string made by concatenating the given strings with separator <c>sep</c>, that is <c>a1 + sep + ... + sep + aN</c>.</summary>
<param name="sep">The separator string to be inserted between the strings of the input sequence.</param>
<param name="strings">The sequence of strings to be concatenated.</param>
<returns>A new string consisting of the concatenated strings separated by the separation string.</returns>
<exception cref="T:System.ArgumentNullException">Thrown when <c>strings</c> is null.</exception>
val x : string
val clusterChart : GenericChart.GenericChart
val sortBy : projection:('T -> 'Key) -> source:seq<'T> -> seq<'T> (requires comparison)
<summary>Applies a key-generating function to each element of a sequence and yield a sequence ordered by keys. The keys are compared using generic comparison as implemented by <see cref="M:Microsoft.FSharp.Core.Operators.compare" />.</summary>
<remarks>This function returns a sequence that digests the whole initial sequence as soon as that sequence is iterated. As a result this function should not be used with large or infinite sequences. The function makes no assumption on the ordering of the original sequence and uses a stable sort, that is the original order of equal elements is preserved.</remarks>
<param name="projection">A function to transform items of the input sequence into comparable keys.</param>
<param name="source">The input sequence.</param>
<returns>The result sequence.</returns>
<exception cref="T:System.ArgumentNullException">Thrown when the input sequence is null.</exception>
val label : string
val unzip3 : input:seq<'a * 'b * 'c> -> seq<'a> * seq<'b> * seq<'c>
<summary> Splits a sequence of triples into three sequences </summary>
val d : seq<float []>
val clusterChart3D : GenericChart.GenericChart
val groupBy : projection:('T -> 'Key) -> source:seq<'T> -> seq<'Key * seq<'T>> (requires equality)
<summary>Applies a key-generating function to each element of a sequence and yields a sequence of unique keys. Each unique key contains a sequence of all elements that match to this key.</summary>
<remarks>This function returns a sequence that digests the whole initial sequence as soon as that sequence is iterated. As a result this function should not be used with large or infinite sequences. The function makes no assumption on the ordering of the original sequence.</remarks>
<param name="projection">A function that transforms an element of the sequence into a comparable key.</param>
<param name="source">The input sequence.</param>
<returns>The result sequence.</returns>
val cluster : seq<int * string * float []>
val clusterIndex : seq<int>
val label : seq<string>
val data : seq<float []>
val clusterName : string
val head : source:seq<'T> -> 'T
<summary>Returns the first element of the sequence.</summary>
<param name="source">The input sequence.</param>
<returns>The first element of the sequence.</returns>
<exception cref="T:System.ArgumentNullException">Thrown when the input sequence is null.</exception>
<exception cref="T:System.ArgumentException">Thrown when the input does not have any elements.</exception>
val truncData : seq<float * float * float>
val x : float []
static member Chart.Scatter3d : xyz:seq<#System.IConvertible * #System.IConvertible * #System.IConvertible> * mode:StyleParam.Mode * ?Name:string * ?ShowLegend:bool * ?MarkerSymbol:StyleParam.MarkerSymbol * ?Color:Color * ?Opacity:float * ?Labels:seq<#System.IConvertible> * ?TextPosition:StyleParam.TextPosition * ?TextFont:Font * ?Dash:StyleParam.DrawingStyle * ?Width:float -> GenericChart.GenericChart
static member Chart.Scatter3d : x:seq<#System.IConvertible> * y:seq<#System.IConvertible> * z:seq<#System.IConvertible> * mode:StyleParam.Mode * ?Name:string * ?ShowLegend:bool * ?MarkerSymbol:StyleParam.MarkerSymbol * ?Color:Color * ?Opacity:float * ?Labels:seq<#System.IConvertible> * ?TextPosition:StyleParam.TextPosition * ?TextFont:Font * ?Dash:StyleParam.DrawingStyle * ?Width:float -> GenericChart.GenericChart
module StyleParam from Plotly.NET
type Mode = | None | Lines | Lines_Markers | Lines_Text | Lines_Markers_Text | Markers | Markers_Text | Text member Convert : unit -> obj override ToString : unit -> string static member convert : (Mode -> obj) static member toString : (Mode -> string)
union case StyleParam.Mode.Markers: StyleParam.Mode
static member Chart.combine : gCharts:seq<GenericChart.GenericChart> -> GenericChart.GenericChart
static member Chart.withXAxisStyle : title:string * ?TitleFont:Font * ?MinMax:(float * float) * ?ShowGrid:bool * ?ShowLine:bool * ?Side:StyleParam.Side * ?Overlaying:StyleParam.LinearAxisId * ?Id:StyleParam.SubPlotId * ?Domain:(float * float) * ?Position:float * ?Zeroline:bool * ?Anchor:StyleParam.LinearAxisId -> (GenericChart.GenericChart -> GenericChart.GenericChart)
static member Chart.withYAxisStyle : title:string * ?TitleFont:Font * ?MinMax:(float * float) * ?ShowGrid:bool * ?ShowLine:bool * ?Side:StyleParam.Side * ?Overlaying:StyleParam.LinearAxisId * ?Id:StyleParam.SubPlotId * ?Domain:(float * float) * ?Position:float * ?ZeroLine:bool * ?Anchor:StyleParam.LinearAxisId -> (GenericChart.GenericChart -> GenericChart.GenericChart)
static member Chart.withZAxisStyle : title:string * ?TitleFont:Font * ?MinMax:(float * float) * ?ShowGrid:bool * ?ShowLine:bool * ?Domain:(float * float) * ?Anchor:StyleParam.LinearAxisId -> (GenericChart.GenericChart -> GenericChart.GenericChart)
val getBestkMeansClustering : bootstraps:int -> k:int -> float * float
val bootstraps : int
val k : int
val dispersions : float []
Multiple items
module Array from FSharp.Stats
<summary> Module to compute common statistical measure on array </summary>

--------------------
module Array from Microsoft.FSharp.Collections
<summary>Contains operations for working with arrays.</summary>
<remarks> See also <a href="https://docs.microsoft.com/dotnet/fsharp/language-reference/arrays">F# Language Guide - Arrays</a>. </remarks>
val init : count:int -> initializer:(int -> 'T) -> 'T []
<summary>Creates an array given the dimension and a generator function to compute the elements.</summary>
<param name="count">The number of elements to initialize.</param>
<param name="initializer">The function to generate the initial values for each index.</param>
<returns>The created array.</returns>
<exception cref="T:System.ArgumentException">Thrown when count is negative.</exception>
val map : mapping:('T -> 'U) -> array:'T [] -> 'U []
<summary>Builds a new array whose elements are the results of applying the given function to each of the elements of the array.</summary>
<param name="mapping">The function to transform elements of the array.</param>
<param name="array">The input array.</param>
<returns>The array of transformed elements.</returns>
<exception cref="T:System.ArgumentNullException">Thrown when the input array is null.</exception>
val clusteringResult : IterativeClustering.KClusteringResult<float []>
val DispersionOfClusterResult : kmeansResult:IterativeClustering.KClusteringResult<'a> -> float
<summary> Calculates the average squared distance from the data points to the cluster centroid (also refered to as error) </summary>
val mean : items:seq<'T> -> 'U (requires member ( + ) and member get_Zero and member DivideByInt and member ( / ))
<summary> Computes the population mean (Normalized by N) </summary>
<param name="items">The input sequence.</param>
<remarks>Returns default value if data is empty or if any entry is NaN.</remarks>
<returns>population mean (Normalized by N)</returns>
val stDev : items:seq<'T> -> 'U (requires member ( - ) and member get_Zero and member DivideByInt and member ( + ) and member ( * ) and member ( + ) and member ( / ) and member Sqrt)
<summary> Computes the sample standard deviation </summary>
<param name="items">The input sequence.</param>
<remarks>Returns NaN if data is empty or if any entry is NaN.</remarks>
<returns>standard deviation of a sample (Bessel's correction by N-1)</returns>
val iterations : int
val maximalK : int
val bestKChart : GenericChart.GenericChart
Multiple items
module List from FSharp.Stats
<summary> Module to compute common statistical measure on list </summary>

--------------------
module List from Microsoft.FSharp.Collections
<summary>Contains operations for working with values of type <see cref="T:Microsoft.FSharp.Collections.list`1" />.</summary>
<namespacedoc><summary>Operations for collections such as lists, arrays, sets, maps and sequences. See also <a href="https://docs.microsoft.com/dotnet/fsharp/language-reference/fsharp-collection-types">F# Collection Types</a> in the F# Language Guide. </summary></namespacedoc>


--------------------
type List<'T> = | ( [] ) | ( :: ) of Head: 'T * Tail: 'T list interface IReadOnlyList<'T> interface IReadOnlyCollection<'T> interface IEnumerable interface IEnumerable<'T> member GetReverseIndex : rank:int * offset:int -> int member GetSlice : startIndex:int option * endIndex:int option -> 'T list static member Cons : head:'T * tail:'T list -> 'T list member Head : 'T member IsEmpty : bool member Item : index:int -> 'T with get ...
<summary>The type of immutable singly-linked lists.</summary>
<remarks>Use the constructors <c>[]</c> and <c>::</c> (infix) to create values of this type, or the notation <c>[1;2;3]</c>. Use the values in the <c>List</c> module to manipulate values of this type, or pattern match against the values directly. </remarks>
<exclude />
val map : mapping:('T -> 'U) -> list:'T list -> 'U list
<summary>Builds a new collection whose elements are the results of applying the given function to each of the elements of the collection.</summary>
<param name="mapping">The function to transform elements from the input list.</param>
<param name="list">The input list.</param>
<returns>The list of transformed elements.</returns>
val mean : float
val stdev : float
val unzip3 : list:('T1 * 'T2 * 'T3) list -> 'T1 list * 'T2 list * 'T3 list
<summary>Splits a list of triples into three lists.</summary>
<param name="list">The input list.</param>
<returns>Three lists of split elements.</returns>
val ks : int list
val means : float list
val stdevs : float list
static member Chart.Line : xy:seq<#System.IConvertible * #System.IConvertible> * ?Name:string * ?ShowMarkers:bool * ?ShowLegend:bool * ?MarkerSymbol:StyleParam.MarkerSymbol * ?Color:Color * ?Opacity:float * ?Labels:seq<#System.IConvertible> * ?TextPosition:StyleParam.TextPosition * ?TextFont:Font * ?Dash:StyleParam.DrawingStyle * ?Width:float * ?StackGroup:string * ?Orientation:StyleParam.Orientation * ?GroupNorm:StyleParam.GroupNorm * ?UseWebGL:bool -> GenericChart.GenericChart
static member Chart.Line : x:seq<#System.IConvertible> * y:seq<#System.IConvertible> * ?Name:string * ?ShowMarkers:bool * ?ShowLegend:bool * ?MarkerSymbol:StyleParam.MarkerSymbol * ?Color:Color * ?Opacity:float * ?Labels:seq<#System.IConvertible> * ?TextPosition:StyleParam.TextPosition * ?TextFont:Font * ?Dash:StyleParam.DrawingStyle * ?Width:float * ?StackGroup:string * ?Orientation:StyleParam.Orientation * ?GroupNorm:StyleParam.GroupNorm * ?UseWebGL:bool -> GenericChart.GenericChart
static member Chart.withYErrorStyle : ?Array:seq<#System.IConvertible> * ?Arrayminus:seq<#System.IConvertible> * ?Symmetric:bool * ?Color:Color * ?Thickness:float * ?Width:float -> (GenericChart.GenericChart -> GenericChart.GenericChart)