When comparing e.g. intensity measurements of elements between samples, the samples often have to be normalized to allow a valid comparison.
Samples may vary in their average intensity simply because of the technical nature of the measurement itself,
thereby shifting or distorting the underlying correct/real distribution. The presented normalization strategies assume that global changes across samples are caused by unwanted technical variability and that only
a small number of elements are dysregulated (Zhao et al., 2020).
To correct for batch effects or technical measurement differences, the median of ratios normalization (MOR) can be applied.
As explained above, it expects that the majority of genes/proteins/elements do not differ between the conditions that are compared (Love et al., 2014).
First, a reference sample is determined by calculating the geometric mean of each of the n elements across all m samples. Defining the geometric mean as
the mth root of the product of the m values, $\sqrt[m]{\prod_{j=1}^{m} x_j}$ (1), may seem odd, but when it is written as the exponential of the mean of the log-transformed data, $\exp\left(\frac{1}{m}\sum_{j=1}^{m}\ln x_j\right)$ (2), it becomes clear that the geometric
mean is just an outlier-insensitive, robust average, which is an intuitive choice when dealing with biological data or, in general, data with some extreme outliers.
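The equivalence of (1) and (2), and the robustness against a single extreme value, can be checked with a few lines of plain F# (a minimal sketch; the function names `geomMeanProd` and `geomMeanLog` are only illustrative):

```fsharp
// geometric mean via the root of the product, formulation (1)
let geomMeanProd (xs: float[]) =
    (Array.fold (*) 1.0 xs) ** (1.0 / float xs.Length)

// geometric mean via the mean of the logarithms, formulation (2)
let geomMeanLog (xs: float[]) =
    exp (Array.averageBy log xs)

// both formulations give the same result ...
geomMeanProd [|101.; 131.; 31.|] // ≈ 74.30
geomMeanLog  [|101.; 131.; 31.|] // ≈ 74.30

// ... and the geometric mean is pulled far less towards an extreme outlier than the arithmetic mean
Array.average [|41.; 51.; 2100.|] // ≈ 730.7
geomMeanLog   [|41.; 51.; 2100.|] // ≈ 163.8
```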
After the reference value for each row is determined, a correction factor towards this reference is calculated for each entry, so that each sample ends up with n associated correction factors.
By taking the median of these individual correction factors, an outlier-insensitive estimate of the true correction factor is obtained. Choosing the median assumes that, in theory, >=50% of all
measured elements do not change between the measured samples.
No prior log transform has to be applied before normalizing the data with this method. If required, a log transform can be applied to the normalized values to restore homoscedasticity.
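Such a subsequent log transform could look like the following sketch (assuming `Matrix.map` from FSharp.Stats; the matrix values and the +1 pseudocount are only illustrative and guard against taking the log of zero intensities):

```fsharp
open FSharp.Stats

// hypothetical matrix of already normalized intensities
let normedExample = matrix [ [84.; 78.; 66.]; [0.; 30.; 0.] ]

// log transform after normalization to restore homoscedasticity;
// +1. prevents log 0 for empty intensities
let logNormedExample = normedExample |> Matrix.map (fun x -> log (x + 1.))
```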
In the following, a step-by-step introduction is given, before the Signal.Normalization.medianOfRatios function is finally applied to the given data.
1. Raw intensities for m=3 samples and n=6 elements:
|elementID|sample1|sample2|sample3|
|---|---|---|---|
|g00|100|130|30|
|g01|80|200|30|
|g02|0|50|0|
|g03|40|50|20|
|g04|50|45|25|
|g05|40|50|15|
2. Zero intensities cannot be processed during the MOR normalization.
Either imputation, row filtering, or a data transformation must be performed. In this example a single count is added to each intensity.
Note that this is not necessarily reasonable for every analysis! The reference sample is determined by calculating the geometric mean of every row (see the sketch after the table).
|elementID|sample1|sample2|sample3|reference|
|---|---|---|---|---|
|g00|101|131|31|74.30|
|g01|81|201|31|79.62|
|g02|1|51|1|3.71|
|g03|41|51|21|35.28|
|g04|51|46|26|39.36|
|g05|41|51|16|32.22|
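The reference column can be reproduced with a small plain-F# sketch (names are illustrative; the geometric mean is computed via the log form shown earlier):

```fsharp
// +1 shifted intensities, rows = elements, columns = samples
let shifted =
    [|
        [|101.; 131.; 31.|]
        [| 81.; 201.; 31.|]
        [|  1.;  51.;  1.|]
        [| 41.;  51.; 21.|]
        [| 51.;  46.; 26.|]
        [| 41.;  51.; 16.|]
    |]

// row-wise geometric mean = reference value of each element
let reference = shifted |> Array.map (fun row -> exp (Array.averageBy log row))
// ≈ [|74.30; 79.62; 3.71; 35.28; 39.36; 32.22|]
```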
3. For every entry the correction factor is determined by dividing the intensity by the reference value of its row.
|elementID|sample1|sample2|sample3|reference|corr1|corr2|corr3|
|---|---|---|---|---|---|---|---|
|g00|101|131|31|74.30|1.359|1.763|0.417|
|g01|81|201|31|79.62|1.017|2.525|0.389|
|g02|1|51|1|3.71|0.270|13.752|0.270|
|g03|41|51|21|35.28|1.162|1.446|0.595|
|g04|51|46|26|39.36|1.296|1.169|0.660|
|g05|41|51|16|32.22|1.272|1.583|0.497|
It becomes apparent that sample1 and sample2 are overrepresented and sample3 is underrepresented. g02 and g03 are obvious outliers that differ between the samples.
These elements should not influence the correction factors. By determining the median of each sample's correction column, an outlier-insensitive approximation of the true correction
factor is obtained (a small sketch follows the table in step 4).
4. The median of each correction factor column determines the final correction.
|elementID|sample1|sample2|sample3|reference|corr1|corr2|corr3|
|---|---|---|---|---|---|---|---|
|g00|101|131|31|74.30|1.359|1.763|0.417|
|g01|81|201|31|79.62|1.017|2.525|0.389|
|g02|1|51|1|3.71|0.270|13.752|0.270|
|g03|41|51|21|35.28|1.162|1.446|0.595|
|g04|51|46|26|39.36|1.296|1.169|0.660|
|g05|41|51|16|32.22|1.272|1.583|0.497|
|Median:| | | | |1.217|1.673|0.457|
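A minimal sketch of this step in plain F# (reusing `shifted` and `reference` from the sketch above; `Seq.median` comes with FSharp.Stats):

```fsharp
open FSharp.Stats

// ratio of each entry to its row reference, collected per sample (column)
let corrColumns =
    [|0 .. 2|]
    |> Array.map (fun sampleIdx ->
        Array.map2 (fun (row: float[]) refValue -> row.[sampleIdx] / refValue) shifted reference)

// the median of each column is the final correction factor of that sample
let correctionFactors = corrColumns |> Array.map Seq.median
// ≈ [|1.217; 1.673; 0.457|]
```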
5. Apply the correction factors to the original (untransformed) data (a sketch follows the table below).
|elementID|sample1|sample2|sample3|reference|corr1|corr2|corr3|normedSample1|normedSample2|normedSample3|
|---|---|---|---|---|---|---|---|---|---|---|
|g00|101|131|31|74.30|1.359|1.763|0.417|82.15|77.71|65.66|
|g01|81|201|31|79.62|1.017|2.525|0.389|65.72|119.55|65.66|
|g02|1|51|1|3.71|0.270|13.752|0.270|0.00|29.89|0.00|
|g03|41|51|21|35.28|1.162|1.446|0.595|32.86|29.89|43.77|
|g04|51|46|26|39.36|1.296|1.169|0.660|41.08|26.90|54.72|
|g05|41|51|16|32.22|1.272|1.583|0.497|32.86|29.89|32.83|
|Median:| | | | |1.217|1.673|0.457| | | |
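Continuing the sketch, the normalization itself is just a column-wise division of the raw (untransformed) intensities by the correction factor of the respective sample (names are again only illustrative):

```fsharp
// raw (untransformed) intensities, rows = elements, columns = samples
let raw =
    [|
        [|100.; 130.; 30.|]
        [| 80.; 200.; 30.|]
        [|  0.;  50.;  0.|]
        [| 40.;  50.; 20.|]
        [| 50.;  45.; 25.|]
        [| 40.;  50.; 15.|]
    |]

// divide every intensity by the correction factor of its sample (column)
let normedManually =
    raw |> Array.map (fun row -> Array.map2 (/) row correctionFactors)
```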
The normed intensities of the different elements now match a theoretical common intensity level.
```fsharp
open Plotly.NET
open FSharp.Stats

let rawData =
    [|
        [|100.;130.;30.|]
        [|80.;200.;30.|]
        [|0.;50.;0.|]
        [|40.;50.;20.|]
        [|50.;45.;25.|]
        [|40.;50.;15.|]
    |]
    |> matrix

// visualization of the raw data
let rawDataChart =
    rawData.Transpose
    |> Matrix.toJaggedArray
    |> Array.mapi (fun sampleID sample ->
        let sampleIntensities = sample |> Array.mapi (fun gID intensity -> gID, intensity)
        Chart.Column(sampleIntensities, Name = sprintf "raw sample%i" (sampleID + 1)))
    |> Chart.combine
    |> Chart.withTemplate ChartTemplates.lightMirrored
    |> Chart.withXAxisStyle "gID"
    |> Chart.withYAxisStyle "raw intensity"
    |> Chart.withTitle "raw data"
```
In the figure above you can see that the green sample is underrepresented and the orange sample is overrepresented when compared to the blue one.
Now the MOR normalization is applied. If no transformation (+1) is required, you can use Signal.Normalization.medianOfRatios instead.
```fsharp
/// actual median of ratios normalization. 1 is added to each intensity to get rid of the empty intensities.
let mor = Signal.Normalization.medianOfRatiosBy (fun x -> x + 1.0) rawData

/// Matrix of normalized data in the same order as the input matrix
let morNormedData = mor.NormedData

/// Correction factors to assess the strength of the applied normalization (1 indicates no normalization)
let corrFactors =
    let cf1 = Seq.item 0 mor.CorrFactors
    let cf2 = Seq.item 1 mor.CorrFactors
    let cf3 = Seq.item 2 mor.CorrFactors
    sprintf "Correction factor for sample1 is %.3f\nCorrection factor for sample2 is %.3f\nCorrection factor for sample3 is %.3f" cf1 cf2 cf3
```
Correction factor for sample1 is 1.217
Correction factor for sample2 is 1.673
Correction factor for sample3 is 0.457
```fsharp
// visualization of the normed data
let normedDataChart =
    morNormedData.Transpose
    |> Matrix.toJaggedArray
    |> Array.mapi (fun sampleID sample ->
        let sampleIntensities = sample |> Array.mapi (fun gID intensity -> gID, intensity)
        Chart.Column(sampleIntensities, Name = sprintf "MOR normed sample%i" (sampleID + 1)))
    |> Chart.combine
    |> Chart.withTemplate ChartTemplates.lightMirrored
    |> Chart.withXAxisStyle "gID"
    |> Chart.withYAxisStyle "MOR intensity"
```
The normed data now shows high similarity of the individual element intensities across samples.
To compensate for technical variance you can also perform a quantile normalization. It is a technique for making two or more
distributions identical in their statistical properties and was originally developed for gene expression microarrays. It sees widespread use and is a standard part
of analysis pipelines for high-throughput data.
You can either quantile normalize data according to a given reference distribution (e.g. Gamma or Normal distribution) or create your own reference distribution out of your samples.
The same dataset as above is used and a quantile normalization is performed. Since no log transform is applied, there is no need to add a constant to each intensity:
1. Raw intensities for m=3 samples and n=6 elements:
|elementID|sample1|sample2|sample3|
|---|---|---|---|
|g00|100|130|30|
|g01|80|200|30|
|g02|0|50|0|
|g03|40|50|20|
|g04|50|45|25|
|g05|40|50|15|
2. Intensities are indexed with their original row position and sorted in ascending order within each sample. The row average intensity (mean across samples at each rank) is determined.
|sample1|sample2|sample3|row average|
|---|---|---|---|
|3 -> 0|5 -> 45|3 -> 0|15|
|4 -> 40|3 -> 50|6 -> 15|35|
|6 -> 40|4 -> 50|4 -> 20|36.7|
|5 -> 50|6 -> 50|5 -> 25|41.7|
|2 -> 80|1 -> 130|1 -> 30|80|
|1 -> 100|2 -> 200|2 -> 30|110|
3. Every intensity is replaced by its row average intensity, while the original index is retained.
|sample1|sample2|sample3|row average|
|---|---|---|---|
|3;15|5;15|3;15|15|
|4;35|3;35|6;35|35|
|6;36.7|4;36.7|4;36.7|36.7|
|5;41.7|6;41.7|5;41.7|41.7|
|2;80|1;80|1;80|80|
|1;110|2;110|2;110|110|
4. Finally, the columns are resorted to their original order using the indices, and the indices are removed (the whole procedure is sketched after the table).
|elementID|normedSample1|normedSample2|normedSample3|
|---|---|---|---|
|g00|110|80|80|
|g01|80|110|110|
|g02|15|35|15|
|g03|35|36.7|36.7|
|g04|41.7|15|41.7|
|g05|36.7|41.7|35|
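The three steps above can be reproduced in plain F# (a minimal sketch of the idea using jagged arrays; the library function shown below is the one to prefer in practice):

```fsharp
// rows = elements, columns = samples
let rawQ =
    [|
        [|100.; 130.; 30.|]
        [| 80.; 200.; 30.|]
        [|  0.;  50.;  0.|]
        [| 40.;  50.; 20.|]
        [| 50.;  45.; 25.|]
        [| 40.;  50.; 15.|]
    |]

let nRows = rawQ.Length
let nCols = rawQ.[0].Length

// step 2: per sample, remember the original row index of every intensity and sort ascending
let sortedPerSample =
    [|0 .. nCols - 1|]
    |> Array.map (fun col ->
        Array.init nRows (fun row -> row, rawQ.[row].[col])
        |> Array.sortBy snd)

// step 2: mean over samples at every rank position ("row average")
let rankMeans =
    [|0 .. nRows - 1|]
    |> Array.map (fun rank -> sortedPerSample |> Array.averageBy (fun sorted -> snd sorted.[rank]))

// steps 3 and 4: replace every intensity by the mean of its rank, then restore the original row order
let quantileNormedManually =
    sortedPerSample
    |> Array.map (fun sorted ->
        sorted
        |> Array.mapi (fun rank (row, _) -> row, rankMeans.[rank])
        |> Array.sortBy fst
        |> Array.map snd)
// quantileNormedManually.[0] corresponds to normedSample1 in the table above
```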
```fsharp
let quantileNorm = Signal.Normalization.quantile rawData

// visualization of the normed data
let normedDataQuantileChart =
    quantileNorm.Transpose
    |> Matrix.toJaggedArray
    |> Array.mapi (fun sampleID sample ->
        let sampleIntensities = sample |> Array.mapi (fun gID intensity -> gID, intensity)
        Chart.Column(sampleIntensities, Name = sprintf "quantile normed sample%i" (sampleID + 1)))
    |> Chart.combine
    |> Chart.withTemplate ChartTemplates.lightMirrored
    |> Chart.withXAxisStyle "gID"
    |> Chart.withYAxisStyle "qNorm intensity"
```
As seen in the normalized column chart, the final samples all consist of the same set of values, just assigned to different row indices according to the rank within each sample.
For data with a low row count this normalization is not appropriate, because the intensity values are severely disturbed. The more data is incorporated (more rows), the lower
the influence of a single value becomes.
You can assess the quality of your normalization by performing a PCA prior to and after normalization and by comparing different normalization strategies. While the
correction factors of MOR can be inspected directly, quantile normalization is a non-linear transformation, since every value is replaced by the mean of its rank row.