Signal

Binder

Summary: this tutorial demonstrates multiple ways of signal processing with FSharp.Stats.

Table of contents

Outliers

Tukey's fences

A common approach for outlier detection is Tukey's fences-method. It determines the interquartile range (IQR) of the data and adds a fraction of it to the third quartile (Q3) or subtracts it from the first quartile (Q1) respectively. An often-used fraction of the IQR is k=1.5 for outliers and k=3 for points 'far out'.

In the generation of box plots the same method determines the whiskers and outliers of a sample.

Reference:

  • Tukey, JW. Exploratory data analysis. Addison-Wesely, 1977
open FSharp.Stats
open FSharp.Collections

let sampleO1 = [|45.;42.;45.5;43.;47.;51.;34.;45.;44.;46.;48.;37.;46.;|]

let outlierBordersO1 = Signal.Outliers.tukey 1.5 sampleO1

let lowerBorderO1 = Intervals.getStart outlierBordersO1
// result: 37.16667

let upperBorderO1 = Intervals.getEnd outlierBordersO1
// result: 51.83333

open Plotly.NET

//some axis styling
let myAxis title = Axis.LinearAxis.init(Title=title,Mirror=StyleParam.Mirror.All,Ticks=StyleParam.TickOptions.Inside,Showgrid=false,Showline=true,Zeroline=true)
let myAxisRange name range = Axis.LinearAxis.init(Title=name,Range=StyleParam.Range.MinMax(range), Mirror=StyleParam.Mirror.All,Ticks=StyleParam.TickOptions.Inside,Showgrid=false,Showline=true)
let styleChart x y chart = chart |> Chart.withX_Axis (myAxis x) |> Chart.withY_Axis (myAxis y)
let styleChartRangeY x y mi ma chart = chart |> Chart.withX_Axis (myAxis x) |> Chart.withY_Axis (myAxisRange y (mi,ma))

let (inside,outside) =
    sampleO1 
    |> Array.partition (fun x -> Intervals.liesInInterval x outlierBordersO1)

let tukeyOutlierChart =
    [
        Chart.Point(inside |> Seq.map (fun x -> 1,x),"sample")
        Chart.Point(outside |> Seq.map (fun x -> 1,x),"outliers")
    ]
    |> Chart.Combine
    |> Chart.withShapes(
        [
            Shape.init(StyleParam.ShapeType.Line,0.5,1.5,lowerBorderO1,lowerBorderO1,Line=Line.init(Dash=StyleParam.DrawingStyle.Dash,Color="grey"))
            Shape.init(StyleParam.ShapeType.Line,0.5,1.5,upperBorderO1,upperBorderO1,Line=Line.init(Dash=StyleParam.DrawingStyle.Dash,Color="grey"))
        ]
        )
    |> styleChartRangeY "" "" 30. 60.
    |> Chart.withTitle "Tukey's fences outlier borders"
   

Filtering

Savitzgy-Golay description is coming soon.

open FSharp.Stats

// Savitzky-Golay low-pass filtering
let t  = [|-4. ..(8.0/500.).. 4.|]
let dy  = t |> Array.map (fun t -> (-t**2.) + (Distributions.Continuous.Normal.Sample 0. 0.5) )
let dy' = t |> Array.map (fun t -> (-t**2.))

let dysg = Signal.Filtering.savitzkyGolay  31 4 0 1 dy

let savitzgyChart =
    [
        Chart.Point(t, dy, Name="data with noise");
        Chart.Point(t, dy', Name="data without noise");
        Chart.Point(t, dysg, Name="data sg");
    ]
    |> Chart.Combine
    |> styleChart "" ""

Padding

If convolution operations should be performed on a signal trace, it often is necessary to extend (pad) the data with artificial data points. There are several padding methods:

  • Zero: Data points with y-value=zero are introduced. This often is useful when analyzing spectra with sparse data because areas without any data measured are assumed to have zero intensity.

  • Random: When the baseline of the measured signal is nonzero like in chromatograms, it is necessary to insert data points with random y-values taken from the original data set.

  • Delete: No datapoints are inserted.

  • Linear interpolation: When a linear relationship is assumed in the range between two adjacent data points, the padding points should lie on the straight line between those points.

Three regions can be defined where padding points could be introduced:

  1. In the beginning and end of the data set artificial data points have to be added to analyse the start- and end-regions of the data. Therefore, random data points are chosen from the original data set.
  2. If the data is not measured in discrete intervals, the region between two adjacent values have to be padded to ensure sufficient coverage for convolution.
  3. If the gap between two adjacent points is too large, another padding method than in case 2. may be chosen.
open FSharp.Stats.Signal

//get raw data
let data = 
    System.IO.File.ReadAllLines(__SOURCE_DIRECTORY__ + "/data/waveletData.txt")
    |> Array.map (fun x -> 
        let tmp = x.Split([|'\t'|])
        float tmp.[0],float tmp.[1])

///interpolate data point y-values when small gaps are present
let innerPadMethod = Padding.InternalPaddingMethod.LinearInterpolation

///take random data point y-values when huge gaps are between data points
let hugeGapPadMethod = Padding.HugeGapPaddingMethod.Random

///padd the start and end of the signal with random data points
let borderPadMethod = Padding.BorderPaddingMethod.Random

///the maximal distance that is allowed between data points is the minimum spacing divided by 2
let minDistance = 
    (Padding.HelperFunctions.getMinimumSpacing data (-)) / 2.

//maximal allowed gap between datapoints where internalPaddingMethod can be applied.
//if internalPaddingMethod = hugeGapPaddingMethod, then it does not matter which value is chosen
let maxSpacing = 10.

//since were dealing with floats the functions are (-) and (+)
let getDiffFu = Padding.HelperFunctions.Float.getDiffFloat      //(-)
let addXValue = Padding.HelperFunctions.Float.addToXValueFloat  //(+)

//number of datapoints the dataset gets expanded to the left and to the rigth
let borderpadding = 1000

//get the paddedDataSet
let paddedData =
    //if a gap is greater than 10. the HugeGapPaddingMethod is applied
    Padding.pad data minDistance maxSpacing getDiffFu addXValue borderpadding borderPadMethod innerPadMethod hugeGapPadMethod

let paddedDataChart=
    let myYAxis() =
            Axis.LinearAxis.init(
                Title="Temperature",Mirror=StyleParam.Mirror.All,Ticks=StyleParam.TickOptions.Inside,Showline=true)
    let myXAxis() =
            Axis.LinearAxis.init(
                Title="Time",Mirror=StyleParam.Mirror.All,Ticks=StyleParam.TickOptions.Inside,Showline=true)
    [
    Chart.Line (paddedData,Name="paddedData")
    Chart.Area (data,Name = "rawData")
    ]
    |> Chart.Combine
    |> Chart.withX_Axis (myXAxis())
    |> Chart.withY_Axis (myYAxis())
    |> Chart.withSize(900.,450.)

Example for a linear interpolation as huge gap padding method

//get the padded data
let paddedDataLinear =
    //if a gap is greater than 10. the LinearInterpolation padding method is applied
    Padding.pad data minDistance maxSpacing getDiffFu addXValue borderpadding borderPadMethod innerPadMethod Padding.HugeGapPaddingMethod.LinearInterpolation

let axis() = Axis.LinearAxis.init(Mirror=StyleParam.Mirror.All,Ticks=StyleParam.TickOptions.Inside,Showline=true)
let axisWithTitle title= Axis.LinearAxis.init(Title=title,Mirror=StyleParam.Mirror.All,Ticks=StyleParam.TickOptions.Inside,Showline=true)

let paddedDataLinearChart=
    [
    Chart.Line (paddedDataLinear,Name="paddedData")
    Chart.Area (data,Name = "rawData")
    ]
    |> Chart.Combine
    |> Chart.withX_Axis (axisWithTitle "Time")
    |> Chart.withY_Axis (axisWithTitle "Temperature")
    |> Chart.withSize(900.,450.)

Wavelet

Continuous Wavelet

The Continuous Wavelet Transform (CWT) is a multiresolution analysis method to gain insights into frequency components of a signal with simultaneous temporal classification. Wavelet in this context stands for small wave and describes a window function which is convoluted with the original signal at every position in time. Many wavelets exist, every one of them is useful for a certain application, thereby 'searching' for specific patterns in the data. By increasing the dimensions (scale) of the wavelet function, different frequency patterns are studied.

In contrast to the Fourier transform, that gives a perfect frequency resolution but no time resolution, the CWT is capable of mediating between the two opposing properties of time resolution and frequency resolution (Heisenberg's uncertainty principle).

For further information please visit The Wavelet Tutorial.

open FSharp.Stats
open StyleParam

///Array containing wavelets of all scales that should be investigated. The propagated frequency corresponds to 4 * Ricker.Scale
let rickerArray = 
    [|2. .. 10.|] |> Array.map (fun x -> Wavelet.createRicker (x**1.8))

///the data already was padded with 1000 additional datapoints in the beginning and end of the data set (see above). 
///Not it is transformed with the previous defined wavelets.
let transformedData = 
    rickerArray
    |> Array.map (fun wavelet -> ContinuousWavelet.transform paddedData (-) 1000 wavelet)

///combining the raw and transformed data in one chart
let combinedChart =
    //CWT-chart
    let heatmap =
        let colNames = 
            transformedData.[0] |> Array.map fst
        transformedData
        |> JaggedArray.map snd
        |> fun x -> Chart.Heatmap(x,ColNames=colNames,Showscale=false)
        |> Chart.withAxisAnchor(X=1)
        |> Chart.withAxisAnchor(Y=1)

    //Rawchart
    let rawChart = 
        Chart.Area (data,Color = "#1f77b4",Name = "rawData")
        |> Chart.withAxisAnchor(X=2)
        |> Chart.withAxisAnchor(Y=2) 

    //combine the charts and add additional styling
    Chart.Combine([heatmap;rawChart])
    |> Chart.withX_AxisStyle("Time",Side=Side.Bottom,Id=2,Showgrid=false)
    |> Chart.withX_AxisStyle("", Side=Side.Top,Showgrid=false, Id=1,Overlaying=AxisAnchorId.X 2)
    |> Chart.withY_AxisStyle("Temperature", MinMax=(-25.,30.), Side=Side.Left,Id=2)
    |> Chart.withY_AxisStyle(
        "Correlation", MinMax=(0.,19.),Showgrid=false, Side=Side.Right,
        Id=1,Overlaying=AxisAnchorId.Y 2)
    |> Chart.withLegend true
    |> Chart.withX_Axis (axis())
    |> Chart.withY_Axis (axis())
    //|> Chart.withSize(900.,700.)