Summary: This tutorial demonstrates an example workflow using different FsLab libraries. The aim is to check the quality of replicate measurements by clustering the samples.
In biology and other sciences, experimental procedures are often repeated several times under the same conditions. The resulting samples are called replicates.
Replicates are especially useful for checking the reproducibility of results and for increasing their reliability.
One criterion for the quality of the measurements is simple in principle: samples obtained by the same procedure should yield similar measurements.
Therefore, simply checking whether replicates are more similar to each other than to other samples can already give the experimenter some indication of the quality of their samples.
This is especially useful because the ground truth is usually unknown, which makes reliability difficult to measure directly.
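In this tutorial, a simple workflow is presented for visualizing the clustering of replicates in an experiment. For this, FSharp.Data and three FsLab libraries are used: Deedle to hold the data in a frame, FSharp.Stats to impute missing values and cluster the samples, and Cyjs.NET to visualize the result.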
In this tutorial, an in silico generated dataset is used.
FSharp.Data and Deedle are used to load the data into F# Interactive (fsi).
open FSharp.Data
open Deedle

// Load the data
let rawData = Http.RequestString @"https://raw.githubusercontent.com/fslaborg/datasets/main/data/InSilicoGeneExpression.csv"

// Create a deedle frame and index the rows with the values of the "Key" column.
let rawFrame : Frame<string,string> =
    Frame.ReadCsvString(rawData)
    |> Frame.indexRows "Key"
Missing values in the data would disturb the distance calculations used for clustering later on. To tackle this, missing values can be substituted in a step called imputation. Several approaches exist for this. Here a k-nearest neighbour imputation is shown, which works as follows:
For each observation with missing values, the k most similar other observations are chosen. Each missing value of this observation is then replaced by the mean of the corresponding values in these neighbouring observations.
open FSharp.Stats
open FSharp.Stats.ML

// Select the imputation method: kNearestImpute where the 2 nearest observations are considered
let kn : Impute.MatrixBaseImputation<float[],float> = Impute.kNearestImpute 2

// Impute the missing values using the "imputeBy" function. The values of the deedle frame are first transformed into the input type of this function.
let imputedData =
    rawFrame
    |> Frame.toJaggedArray
    |> Impute.imputeBy kn Ops.isNan

// Creating a new frame from the old keys and the new imputed data
let imputedFrame =
    Frame.ofJaggedArray imputedData
    |> Frame.indexRowsWith rawFrame.RowKeys
    |> Frame.indexColsWith rawFrame.ColumnKeys
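To make the idea behind k-nearest neighbour imputation more concrete, the following is a minimal sketch in plain F# without any library. It is not the FSharp.Stats implementation: the helper names `distanceIgnoringMissing` and `imputeKNearest`, the choice to measure similarity only over positions present in both rows, and the toy data are all assumptions made for illustration.

```fsharp
// Hypothetical helper (assumption, not part of FSharp.Stats):
// Euclidean distance over the positions where both rows have a value.
let distanceIgnoringMissing (a: float[]) (b: float[]) =
    Array.zip a b
    |> Array.filter (fun (x, y) -> not (System.Double.IsNaN x || System.Double.IsNaN y))
    |> Array.sumBy (fun (x, y) -> (x - y) ** 2.0)
    |> sqrt

// Hypothetical sketch: replace every NaN in a row by the mean of that column
// in the k most similar other rows that have a value in this column.
// Array.averageBy throws if no such row exists.
let imputeKNearest (k: int) (data: float[][]) : float[][] =
    data
    |> Array.mapi (fun i row ->
        row
        |> Array.mapi (fun j value ->
            if System.Double.IsNaN value then
                data
                |> Array.indexed
                |> Array.filter (fun (i2, other) -> i2 <> i && not (System.Double.IsNaN other.[j]))
                |> Array.sortBy (fun (_, other) -> distanceIgnoringMissing row other)
                |> Array.truncate k
                |> Array.averageBy (fun (_, other) -> other.[j])
            else
                value))

// Made-up toy data: the first row misses its third value
let toyData =
    [| [| 1.0; 2.0; nan |]
       [| 1.1; 2.1; 3.0 |]
       [| 0.9; 1.9; 3.2 |]
       [| 9.0; 9.0; 9.0 |] |]

// With k = 2 the two neighbours of the first row are rows two and three,
// so the missing value becomes the mean of 3.0 and 3.2.
let toyImputed = imputeKNearest 2 toyData
```

Rows without missing values pass through unchanged, so the function can be applied to the whole matrix at once.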
To rank how close the samples are to each other, we perform a hierarchical clustering. Details about this can be found here and here.
open FSharp.Stats.ML.Unsupervised

// Retrieve the sample columns from the frame
let samples =
    imputedFrame
    |> Frame.getNumericCols
    |> Series.observations
    |> Seq.map (fun (k,vs) -> k, vs |> Series.values)

// Run the hierarchical clustering on the samples
// The clustering is performed on labeled samples (name,values) so that these labels later appear in the cluster tree
let clustering =
    HierarchicalClustering.generate
        (fun (name1,values1) (name2,values2) -> DistanceMetrics.euclidean values1 values2) // perform the distance calculation only on the values, not the labels
        HierarchicalClustering.Linker.wardLwLinker
        samples
    |> HierarchicalClustering.mapClusterLeaftags fst // only keep the labels in the cluster tree
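As a rough illustration of what agglomerative hierarchical clustering does, here is a small library-free sketch. It is an assumption-laden toy, not the FSharp.Stats algorithm: the `ToyCluster` type and all function names are hypothetical, and average linkage is swapped in for the Ward linkage used above because it is shorter to write down.

```fsharp
// Hypothetical toy tree type; FSharp.Stats uses its own Cluster<'T> type.
type ToyCluster =
    | ToyLeaf of string * float[]                // label, values
    | ToyNode of float * ToyCluster * ToyCluster // merge distance, left, right

let toyEuclidean (a: float[]) (b: float[]) =
    Array.map2 (fun x y -> (x - y) ** 2.0) a b |> Array.sum |> sqrt

// All sample vectors contained in a (sub)tree
let rec members cluster =
    match cluster with
    | ToyLeaf (_, values) -> [ values ]
    | ToyNode (_, left, right) -> members left @ members right

// Average linkage: mean pairwise distance between the members of two clusters
// (assumption: used here instead of Ward linkage for brevity)
let averageLinkage c1 c2 =
    List.allPairs (members c1) (members c2)
    |> List.averageBy (fun (a, b) -> toyEuclidean a b)

// Repeatedly merge the two closest clusters until a single tree remains
let rec agglomerate (clusters: ToyCluster list) =
    match clusters with
    | [] -> failwith "no samples given"
    | [ single ] -> single
    | _ ->
        let arr = List.toArray clusters
        let (i, j), dist =
            [ for i in 0 .. arr.Length - 1 do
                for j in i + 1 .. arr.Length - 1 do
                    yield (i, j), averageLinkage arr.[i] arr.[j] ]
            |> List.minBy snd
        let rest =
            clusters
            |> List.indexed
            |> List.filter (fun (idx, _) -> idx <> i && idx <> j)
            |> List.map snd
        agglomerate (ToyNode (dist, arr.[i], arr.[j]) :: rest)

let rec leafLabels cluster =
    match cluster with
    | ToyLeaf (label, _) -> [ label ]
    | ToyNode (_, left, right) -> leafLabels left @ leafLabels right

// Made-up samples: two conditions with two replicates each
let toyTree =
    agglomerate
        [ ToyLeaf ("Condition0_1", [| 1.0; 1.0 |])
          ToyLeaf ("Condition0_2", [| 1.2; 0.9 |])
          ToyLeaf ("Condition1_1", [| 5.0; 5.0 |])
          ToyLeaf ("Condition1_2", [| 5.1; 5.3 |]) ]
```

In this toy example the two replicates of each condition are merged first, so each of the two subtrees below the root contains the replicates of exactly one condition.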
Finally, the clustering results can be visualized to check for replicate clustering. For this we use Cyjs.NET, an FsLab library which makes use of the Cytoscape.js network visualization tool.
Further information about styling the graphs can be found here.
open Cyjs.NET

// Function for flattening the cluster tree to an edgelist
let hClustToEdgeList (f : int -> 'T) (hClust : HierarchicalClustering.Cluster<'T>) =
    let rec loop (d,nodeLabel) cluster =
        match cluster with
        | HierarchicalClustering.Node (id,dist,_,c1,c2) ->
            let t = f id
            loop (dist,t) c1
            |> List.append (loop (dist,t) c2)
            |> List.append [nodeLabel,t,d]
        | HierarchicalClustering.Leaf (_,_,label) -> [(nodeLabel,label,d)]
    loop (0., f 0) hClust

let rawEdgeList = hClustToEdgeList (string) clustering

// The styled vertices. Samples are coloured based on the condition they belong to, so replicates of one condition have the same colour
let cytoVertices =
    rawEdgeList
    |> List.collect (fun (v1,v2,w) -> [v1;v2])
    |> List.distinct
    |> List.map (fun v ->
        let label,color,size =
            match v.Split '_' with
            | [|"Condition0";_|] -> "Condition0","#6FB1FC","40"
            | [|"Condition1";_|] -> "Condition1","#EDA1ED","40"
            | [|"Condition2";_|] -> "Condition2","#F5A45D","40"
            | _ -> "","#DDDDDD","10"

        let styling = [CyParam.label label; CyParam.color color; CyParam.width size]
        Elements.node (v) styling)

// Helper function to transform the distances between samples to weights
let distanceToWeight =
    let max = rawEdgeList |> List.map (fun (a,b,c) -> c) |> List.max
    fun distance -> 1. - (distance / max)

// Styled edges
let cytoEdges =
    rawEdgeList
    |> List.mapi (fun i (v1,v2,weight) ->
        let styling = [CyParam.weight (distanceToWeight weight)]
        Elements.edge ("e" + string i) v1 v2 styling)

// Resulting cytograph
let cytoGraph =
    CyGraph.initEmpty ()
    |> CyGraph.withElements cytoVertices
    |> CyGraph.withElements cytoEdges
    |> CyGraph.withStyle "node"
        [
            CyParam.content =. CyParam.label
            CyParam.shape =. CyParam.shape
            CyParam.color =. CyParam.color
            CyParam.width =. CyParam.width
        ]
    |> CyGraph.withLayout (Layout.initCose (id))

// Send the cytograph to the browser
cytoGraph
|> CyGraph.show
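The two data transformations in this step, flattening a binary tree into an edge list and turning distances into weights, can be tried out in isolation. The sketch below uses a hypothetical `SketchTree` type and made-up distances instead of the FSharp.Stats cluster type, so it runs without any library. Unlike the function above, it does not emit an extra edge for the root.

```fsharp
// Hypothetical stand-in for a cluster tree (assumption, not the FSharp.Stats type)
type SketchTree =
    | SketchLeaf of string
    | SketchNode of int * float * SketchTree * SketchTree // id, merge distance, left, right

// Flatten the tree into (parent, child, distance) edges.
// Every edge carries the merge distance of its parent node.
let treeToEdges (tree: SketchTree) =
    let rec children (parentLabel: string, parentDist: float) subtree =
        match subtree with
        | SketchLeaf label -> [ parentLabel, label, parentDist ]
        | SketchNode (nodeId, dist, left, right) ->
            let nodeLabel = string nodeId
            [ yield (parentLabel, nodeLabel, parentDist)
              yield! children (nodeLabel, dist) left
              yield! children (nodeLabel, dist) right ]
    match tree with
    | SketchLeaf _ -> []
    | SketchNode (nodeId, dist, left, right) ->
        children (string nodeId, dist) left @ children (string nodeId, dist) right

// Scale distances to weights between 0 and 1: the largest distance gets weight 0.
// If all distances are 0, the division yields NaN.
let toWeights (edges: (string * string * float) list) =
    let maxDist = edges |> List.map (fun (_, _, d) -> d) |> List.max
    edges |> List.map (fun (a, b, d) -> a, b, 1.0 - d / maxDist)

// Made-up tree with two conditions and two replicates each
let sketchTree =
    SketchNode (3, 4.0,
        SketchNode (1, 1.0, SketchLeaf "Condition0_1", SketchLeaf "Condition0_2"),
        SketchNode (2, 2.0, SketchLeaf "Condition1_1", SketchLeaf "Condition1_2"))

let sketchEdges = treeToEdges sketchTree
let sketchWeights = toWeights sketchEdges
```

Small merge distances translate into large weights, so edges between closely related samples are the ones a weight-aware layout or styling would emphasise.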
As can be seen in the graph, replicates of one condition cluster together. This is a good sign for the quality of the experiment.
If one replicate of a condition does not behave this way, it can be considered an outlier.
If the replicates don't cluster together at all, there might be some problems with the experiment.
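Besides inspecting the graph by eye, a very rough numeric check along the same lines is possible. The following sketch is not part of the tutorial's workflow and swaps in a simple nearest-neighbour test for the clustering: it flags every sample whose closest other sample belongs to a different condition. The function names and toy values are hypothetical, and sample labels are assumed to follow the `Condition0_1` pattern used above.

```fsharp
let sampleDistance (a: float[]) (b: float[]) =
    Array.map2 (fun x y -> (x - y) ** 2.0) a b |> Array.sum |> sqrt

// Assumption: the condition is the part of the label before the underscore,
// e.g. "Condition0_1" -> "Condition0"
let conditionOf (label: string) = label.Split('_').[0]

// Labels of samples whose nearest neighbour belongs to a different condition.
// Needs at least two samples, otherwise List.minBy throws.
let suspiciousSamples (samples: (string * float[]) list) =
    samples
    |> List.filter (fun (label, values) ->
        let nearestLabel, _ =
            samples
            |> List.filter (fun (other, _) -> other <> label)
            |> List.minBy (fun (_, otherValues) -> sampleDistance values otherValues)
        conditionOf nearestLabel <> conditionOf label)
    |> List.map fst

// Made-up data: "Condition1_2" lies close to the Condition0 replicates
let toySamples =
    [ "Condition0_1", [| 1.0; 1.0 |]
      "Condition0_2", [| 1.1; 0.9 |]
      "Condition1_1", [| 5.0; 5.0 |]
      "Condition1_2", [| 1.0; 1.2 |] ]

let flagged = suspiciousSamples toySamples
```

In this toy example only `Condition1_2` is flagged, which matches the interpretation above of a single replicate behaving like an outlier.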