Signal

Summary: this tutorial demonstrates multiple ways of signal processing with FSharp.Stats.

Outliers

Tukey's fences

A common approach for outlier detection is Tukey's fences-method. It determines the interquartile range (IQR) of the data and adds a fraction of it to the third quartile (Q3) or subtracts it from the first quartile (Q1) respectively. An often-used fraction of the IQR is k=1.5 for outliers and k=3 for points 'far out'.

In the generation of box plots the same method determines the whiskers and outliers of a sample.

Reference:

• Tukey, JW. Exploratory data analysis. Addison-Wesely, 1977
open FSharp.Stats
open FSharp.Collections

let sampleO1 = [|45.;42.;45.5;43.;47.;51.;34.;45.;44.;46.;48.;37.;46.;|]

let outlierBordersO1 = Signal.Outliers.tukey 1.5 sampleO1

let lowerBorderO1 = Intervals.getStart outlierBordersO1
// result: 37.16667

let upperBorderO1 = Intervals.getEnd outlierBordersO1
// result: 51.83333

open Plotly.NET

//some axis styling
let myAxis title = Axis.LinearAxis.init(Title=title,Mirror=StyleParam.Mirror.All,Ticks=StyleParam.TickOptions.Inside,Showgrid=false,Showline=true,Zeroline=true)
let myAxisRange name range = Axis.LinearAxis.init(Title=name,Range=StyleParam.Range.MinMax(range), Mirror=StyleParam.Mirror.All,Ticks=StyleParam.TickOptions.Inside,Showgrid=false,Showline=true)
let styleChart x y chart = chart |> Chart.withX_Axis (myAxis x) |> Chart.withY_Axis (myAxis y)
let styleChartRangeY x y mi ma chart = chart |> Chart.withX_Axis (myAxis x) |> Chart.withY_Axis (myAxisRange y (mi,ma))

let (inside,outside) =
sampleO1
|> Array.partition (fun x -> Intervals.liesInInterval x outlierBordersO1)

let tukeyOutlierChart =
[
Chart.Point(inside |> Seq.map (fun x -> 1,x),"sample")
Chart.Point(outside |> Seq.map (fun x -> 1,x),"outliers")
]
|> Chart.Combine
|> Chart.withShapes(
[
Shape.init(StyleParam.ShapeType.Line,0.5,1.5,lowerBorderO1,lowerBorderO1,Line=Line.init(Dash=StyleParam.DrawingStyle.Dash,Color="grey"))
Shape.init(StyleParam.ShapeType.Line,0.5,1.5,upperBorderO1,upperBorderO1,Line=Line.init(Dash=StyleParam.DrawingStyle.Dash,Color="grey"))
]
)
|> styleChartRangeY "" "" 30. 60.
|> Chart.withTitle "Tukey's fences outlier borders"



Filtering

Savitzgy-Golay description is coming soon.

open FSharp.Stats

// Savitzky-Golay low-pass filtering
let t  = [|-4. ..(8.0/500.).. 4.|]
let dy  = t |> Array.map (fun t -> (-t**2.) + (Distributions.Continuous.Normal.Sample 0. 0.5) )
let dy' = t |> Array.map (fun t -> (-t**2.))

let dysg = Signal.Filtering.savitzkyGolay  31 4 0 1 dy

let savitzgyChart =
[
Chart.Point(t, dy, Name="data with noise");
Chart.Point(t, dy', Name="data without noise");
Chart.Point(t, dysg, Name="data sg");
]
|> Chart.Combine
|> styleChart "" ""


If convolution operations should be performed on a signal trace, it often is necessary to extend (pad) the data with artificial data points. There are several padding methods:

• Zero: Data points with y-value=zero are introduced. This often is useful when analyzing spectra with sparse data because areas without any data measured are assumed to have zero intensity.

• Random: When the baseline of the measured signal is nonzero like in chromatograms, it is necessary to insert data points with random y-values taken from the original data set.

• Delete: No datapoints are inserted.

• Linear interpolation: When a linear relationship is assumed in the range between two adjacent data points, the padding points should lie on the straight line between those points.

Three regions can be defined where padding points could be introduced:

1. In the beginning and end of the data set artificial data points have to be added to analyse the start- and end-regions of the data. Therefore, random data points are chosen from the original data set.
2. If the data is not measured in discrete intervals, the region between two adjacent values have to be padded to ensure sufficient coverage for convolution.
3. If the gap between two adjacent points is too large, another padding method than in case 2. may be chosen.
open FSharp.Stats.Signal

//get raw data
let data =
|> Array.map (fun x ->
let tmp = x.Split([|'\t'|])
float tmp.[0],float tmp.[1])

///interpolate data point y-values when small gaps are present

///take random data point y-values when huge gaps are between data points

///padd the start and end of the signal with random data points

///the maximal distance that is allowed between data points is the minimum spacing divided by 2
let minDistance =

//maximal allowed gap between datapoints where internalPaddingMethod can be applied.
//if internalPaddingMethod = hugeGapPaddingMethod, then it does not matter which value is chosen
let maxSpacing = 10.

//since were dealing with floats the functions are (-) and (+)

//number of datapoints the dataset gets expanded to the left and to the rigth

//if a gap is greater than 10. the HugeGapPaddingMethod is applied

let myYAxis() =
Axis.LinearAxis.init(
Title="Temperature",Mirror=StyleParam.Mirror.All,Ticks=StyleParam.TickOptions.Inside,Showline=true)
let myXAxis() =
Axis.LinearAxis.init(
Title="Time",Mirror=StyleParam.Mirror.All,Ticks=StyleParam.TickOptions.Inside,Showline=true)
[
Chart.Area (data,Name = "rawData")
]
|> Chart.Combine
|> Chart.withX_Axis (myXAxis())
|> Chart.withY_Axis (myYAxis())
|> Chart.withSize(900.,450.)


Example for a linear interpolation as huge gap padding method

//get the padded data
//if a gap is greater than 10. the LinearInterpolation padding method is applied

let axis() = Axis.LinearAxis.init(Mirror=StyleParam.Mirror.All,Ticks=StyleParam.TickOptions.Inside,Showline=true)
let axisWithTitle title= Axis.LinearAxis.init(Title=title,Mirror=StyleParam.Mirror.All,Ticks=StyleParam.TickOptions.Inside,Showline=true)

[
Chart.Area (data,Name = "rawData")
]
|> Chart.Combine
|> Chart.withX_Axis (axisWithTitle "Time")
|> Chart.withY_Axis (axisWithTitle "Temperature")
|> Chart.withSize(900.,450.)


Wavelet

Continuous Wavelet

The Continuous Wavelet Transform (CWT) is a multiresolution analysis method to gain insights into frequency components of a signal with simultaneous temporal classification. Wavelet in this context stands for small wave and describes a window function which is convoluted with the original signal at every position in time. Many wavelets exist, every one of them is useful for a certain application, thereby 'searching' for specific patterns in the data. By increasing the dimensions (scale) of the wavelet function, different frequency patterns are studied.

In contrast to the Fourier transform, that gives a perfect frequency resolution but no time resolution, the CWT is capable of mediating between the two opposing properties of time resolution and frequency resolution (Heisenberg's uncertainty principle).

For further information please visit The Wavelet Tutorial.

open FSharp.Stats
open StyleParam

///Array containing wavelets of all scales that should be investigated. The propagated frequency corresponds to 4 * Ricker.Scale
let rickerArray =
[|2. .. 10.|] |> Array.map (fun x -> Wavelet.createRicker (x**1.8))

///the data already was padded with 1000 additional datapoints in the beginning and end of the data set (see above).
///Not it is transformed with the previous defined wavelets.
let transformedData =
rickerArray
|> Array.map (fun wavelet -> ContinuousWavelet.transform paddedData (-) 1000 wavelet)

///combining the raw and transformed data in one chart
let combinedChart =
//CWT-chart
let heatmap =
let colNames =
transformedData.[0] |> Array.map fst
transformedData
|> JaggedArray.map snd
|> fun x -> Chart.Heatmap(x,ColNames=colNames,Showscale=false)
|> Chart.withAxisAnchor(X=1)
|> Chart.withAxisAnchor(Y=1)

//Rawchart
let rawChart =
Chart.Area (data,Color = "#1f77b4",Name = "rawData")
|> Chart.withAxisAnchor(X=2)
|> Chart.withAxisAnchor(Y=2)