Summary: This tutorial demonstrates how to access a public dataset for temperature data with FSharp.Data, how to smooth the data points with
the Savitzky-Golay filter from FSharp.Stats and finally how to visualize the results with Plotly.NET.
The Savitzky-Golay filter is a type of low-pass filter that is particularly suited for smoothing noisy data. The main idea is to perform, for each point, a
least-squares polynomial fit over an odd-sized window centered at that point. One advantage of the Savitzky-Golay filter is that high-frequency portions
of the signal are not simply cut off, but are preserved by the polynomial regression. This allows the filter to preserve properties of the distribution
such as relative maxima, minima, and dispersion, which conventional methods such as the moving average usually distort by flattening or shifting them.
This is useful when trying to identify general trends in highly fluctuating data sets, or when smoothing out noise to make minima and maxima of the data trend easier to find.
To showcase this, we will plot a temperature dataset from the "Deutscher Wetterdienst",
a German organization for climate data. We will do this for both the original data points and a smoothed version.
Image: the moving window for polynomial regression used in the Savitzky-Golay filter (source: Wikipedia).
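The difference between a moving average and a local polynomial fit can be sketched without any packages. The following is a minimal illustration, not the FSharp.Stats implementation: it uses the well-known Savitzky-Golay weights for a 5-point window with a quadratic fit, (-3, 12, 17, 12, -3) / 35, and an invented parabolic peak as input. The helper name `applyKernel` and the signal are assumptions made for this example.

```fsharp
// Apply a symmetric kernel to the interior points of a signal.
// Points too close to the edges are copied unchanged to keep the sketch short.
let applyKernel (kernel: float[]) (data: float[]) =
    let half = kernel.Length / 2
    data
    |> Array.mapi (fun i x ->
        if i < half || i >= data.Length - half then
            x
        else
            kernel |> Array.mapi (fun k w -> w * data.[i - half + k]) |> Array.sum)

// Plain moving average: every point in the window gets the same weight.
let movingAverage5 = Array.create 5 (1.0 / 5.0)

// Savitzky-Golay weights for a 5-point window and a quadratic (order 2) fit.
let savitzkyGolay5 = [| -3.0; 12.0; 17.0; 12.0; -3.0 |] |> Array.map (fun w -> w / 35.0)

// Hypothetical signal: a parabolic peak y = 10 - x^2 sampled at x = -3 .. 3.
let signal = [| for x in -3 .. 3 -> 10.0 - float (x * x) |]

let averaged = applyKernel movingAverage5 signal
let smoothed = applyKernel savitzkyGolay5 signal

printfn "peak original:       %.2f" signal.[3]   // 10.00
printfn "peak moving average: %.2f" averaged.[3] // 8.00
printfn "peak Savitzky-Golay: %.2f" smoothed.[3] // 10.00
```

Because the quadratic fit reproduces a parabola exactly, the peak height survives the Savitzky-Golay weights, while the moving average flattens it from 10 to 8. On real, noisy data the preservation is only approximate.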
// Packages hosted by the Fslab community
#r "nuget: FSharp.Stats"
// third party .net packages
#r "nuget: FSharp.Data"
#r "nuget: Plotly.NET, 2.0.0-preview.16"
#r "nuget: Plotly.NET.Interactive, 2.0.0-preview.12"
We will start by retrieving the data. This is done with the FSharp.Data package
and will return a single string in the original format.
// Get data from Deutscher Wetterdienst
// Explanation for Abbreviations: https://www.dwd.de/DE/leistungen/klimadatendeutschland/beschreibung_tagesmonatswerte.html
let rawData = FSharp.Data.Http.RequestString @"https://raw.githubusercontent.com/fslaborg/datasets/main/data/WeatherDataAachen-Orsbach_daily_1year.txt"

// print first 1000 characters to console.
rawData.[..1000] |> printfn "%s"
Currently the data set is not in a format that is easy to parse. Normally you would try to use
the Deedle package to read the data into a Deedle data frame. As this is not possible here, we will do some ugly formatting.
open System
open System.Text.RegularExpressions

/// Tuple of 4 data arrays representing the measured temperature for over a year.
let processedData =
    // First separate the huge string in lines
    rawData.Split([|'\n'|], StringSplitOptions.RemoveEmptyEntries)
    // Skip the first 5 rows until the real data starts, also skip the last row (length-2) to remove a "</pre>" at the end
    |> fun arr -> arr.[5..arr.Length-2]
    |> Array.map (fun data ->
        // Regex pattern that will match groups of whitespace
        let whitespacePattern = @"\s+"
        // This is needed to tell regex to replace hits with a tabulator
        let matchEval = MatchEvaluator(fun _ -> @"\t")
        // The original data columns are separated by different amounts of whitespace.
        // Therefore, we need a flexible string parsing option to replace any amount of whitespace with a single tabulator.
        // This is done with the regex pattern above and the fsharp core library "System.Text.RegularExpressions"
        let tabSeparated = Regex.Replace(data, whitespacePattern, matchEval)
        tabSeparated
        // Split each row by tabulator will return rows with an equal amount of values, which we can access.
        |> fun dataStr -> dataStr.Split([|@"\t"|], StringSplitOptions.RemoveEmptyEntries)
        |> fun dataArr ->
            // Second value is the date of measurement, which we will parse to the DateTime type
            DateTime.ParseExact(dataArr.[1], "yyyyMMdd", Globalization.CultureInfo.InvariantCulture),
            // 5th value is minimal temperature at that date.
            float dataArr.[4],
            // 6th value is average temperature over 24 timepoints at that date.
            float dataArr.[5],
            // 7th value is maximal temperature at that date.
            float dataArr.[6]
    )
    // Sort by date
    |> Array.sortBy (fun (day, tn, tm, tx) -> day)
    // Unzip the array of value tuples, to make the different values easier accessible
    |> fun arr ->
        arr |> Array.map (fun (day, tn, tm, tx) -> day.ToShortDateString()),
        arr |> Array.map (fun (day, tn, tm, tx) -> tm),
        arr |> Array.map (fun (day, tn, tm, tx) -> tx),
        arr |> Array.map (fun (day, tn, tm, tx) -> tn)
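To see what the whitespace handling does to a single row, here is a small self-contained sketch. The sample row is invented (it only imitates the whitespace-aligned layout of the raw file), and instead of the regex-and-tab round trip it splits directly on spaces and tabs, which is one possible simplification rather than the approach used above.

```fsharp
open System

// Hypothetical row in the same whitespace-aligned layout as the raw data;
// the station id and the measured values are invented for illustration.
let sampleRow = "  15000 20200115   10    9.1    1.3    4.2    7.8"

// Splitting on spaces and tabs while dropping empty entries collapses
// any run of whitespace, so every row yields the same number of fields.
let fields = sampleRow.Split([| ' '; '\t' |], StringSplitOptions.RemoveEmptyEntries)

// Same column positions as in the tutorial code: date, min, mean, max.
let day = DateTime.ParseExact(fields.[1], "yyyyMMdd", Globalization.CultureInfo.InvariantCulture)
let tn, tm, tx = float fields.[4], float fields.[5], float fields.[6]

let dayText = day.ToString("yyyy-MM-dd", Globalization.CultureInfo.InvariantCulture)
printfn "%s: min %.1f, mean %.1f, max %.1f" dayText tn tm tx
// prints "2020-01-15: min 1.3, mean 4.2, max 7.8"
```

Either way, the important property is that every row ends up with the same number of fields, so the values can be addressed by index.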
Next we define a chart-creation function with Plotly.NET to produce a visual representation of our data set.
open Plotly.NET
open Plotly.NET.LayoutObjects

// Because our data set is already rather wide we want to move the legend from the right side of the plot
// to the right center. As this function is not defined for fsharp we will use the underlying js bindings (https://plotly.com/javascript/legend/#positioning-the-legend-inside-the-plot).
// Declarative style in F# using underlying DynamicObj
// https://plotly.net/#Declarative-style-in-F-using-the-underlying
let legend =
    let tmp = Legend()
    tmp?yanchor <- "top"
    tmp?y <- 0.99
    tmp?xanchor <- "left"
    tmp?x <- 0.5
    tmp

/// This function will take 'processedData' as input and return a range chart with a line for the average temperature
/// and a different colored area for the range between minimal and maximal temperature at that date.
let createTempChart (days, tm, tmUpper, tmLower) =
    Chart.Range(
        // data arrays
        days, tm, tmUpper, tmLower,
        StyleParam.Mode.Lines_Markers,
        Color = Color.fromString "#3D1244",
        RangeColor = Color.fromString "#F99BDE",
        // Name for line in legend
        Name = "Average temperature over 24 timepoints each day",
        // Name for lower point when hovering over chart
        LowerName = "Min temp",
        // Name for upper point when hovering over chart
        UpperName = "Max temp"
    )
    // Configure the chart with the legend from above
    |> Chart.withLegend legend
    // Add name to y axis
    |> Chart.withYAxisStyle("daily temperature [°C]")
    |> Chart.withSize (1000., 600.)

/// Chart for original data set
let rawChart =
    processedData
    |> createTempChart
As you can see, the data looks chaotic and is difficult to analyze. Trends are hidden in the daily
temperature fluctuations, and correlating events with temperature becomes difficult. So next we want to
smooth the data to see the temperature trends clearly.
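As a rough sketch of that smoothing step, the snippet below calls the `savitzkyGolay` function from FSharp.Stats, which takes `windowSize`, `order`, `deriv`, `rate` and the data array. It assumes the function is reachable as `Signal.Filtering.savitzkyGolay` after `open FSharp.Stats`. The window size of 31 and polynomial order of 4 are hypothetical choices, and the input is synthetic noisy data rather than the temperature values, so that the snippet stands on its own.

```fsharp
#r "nuget: FSharp.Stats"

open System
open FSharp.Stats

// Synthetic stand-in for the mean temperature column: one year of a seasonal
// sine curve plus pseudo-random daily noise (fixed seed for repeatability).
let rnd = Random(42)
let noisy =
    Array.init 365 (fun i ->
        10.0 + 8.0 * sin (2.0 * Math.PI * float i / 365.0) + 4.0 * (rnd.NextDouble() - 0.5))

// Hypothetical parameter choices: a 31-point window and a 4th-order polynomial.
// deriv = 0 requests plain smoothing (no derivative); rate is left at 1.
let windowSize, order, deriv, rate = 31, 4, 0, 1

// Assumption: the function is exposed as Signal.Filtering.savitzkyGolay.
let smoothed = Signal.Filtering.savitzkyGolay windowSize order deriv rate noisy

printfn "points in: %i, points out: %i" noisy.Length smoothed.Length
```

The window size has to be odd, in line with the description of the filter at the top. A larger window gives a smoother curve, while a higher polynomial order follows the data more closely.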