Deedle.Math — MathNet.Numerics integration
Deedle.Math is a separate NuGet package that extends Deedle with linear algebra,
advanced statistics, PCA, and financial time-series functions via the
MathNet.Numerics library.
Installation
dotnet add package Deedle.Math
Then reference it in an F# script or notebook and open the namespaces:

#r "nuget: Deedle.Math"

open Deedle
open Deedle.Math
Frame and Series ↔ Matrix conversions
Deedle.Math defines Frame and Series helper types in the Deedle.Math namespace
that provide toMatrix / ofMatrix and toVector / ofVector conversions. Opening
Deedle.Math after opening Deedle makes these available without qualification.
// Build a simple 3×3 frame
let df =
frame [ "A" => series [ 1 => 1.0; 2 => 4.0; 3 => 7.0 ]
"B" => series [ 1 => 2.0; 2 => 5.0; 3 => 8.0 ]
"C" => series [ 1 => 3.0; 2 => 6.0; 3 => 9.0 ] ]
// Convert frame to a MathNet DenseMatrix
let m : Matrix<float> = Frame.toMatrix df
m
// Convert matrix back to a frame with named rows and columns
Frame.ofMatrix [1;2;3] ["A";"B";"C"] m
Series ↔ Vector works the same way:
let s = series [ "x" => 1.0; "y" => 2.0; "z" => 3.0 ]
let v : Vector<float> = Series.toVector s
v
Series.ofVector ["x";"y";"z"] v
There is also a Matrix type with explicit ofFrame / toFrame helpers and dot-product
overloads that let you multiply frames, series, and vectors directly:
// Frame × Frame matrix multiply (column keys of left must equal row keys of right)
Matrix.dot df df
// Frame × Vector
Matrix.dot df v
// Series (as row vector) × Frame
Matrix.dot s df
Linear algebra on frames
LinearAlgebra provides matrix operations that accept and return Frame<'R,'C> values
directly. All operations convert to/from Matrix<float> internally.
// Transpose (faster than generic Frame.transpose for numeric frames)
LinearAlgebra.transpose df
// Matrix inverse
let sq = frame [ "A" => series [1=>4.0;2=>7.0]; "B" => series [1=>3.0;2=>6.0] ]
LinearAlgebra.inverse sq
Other available operations on LinearAlgebra (each wrapping the corresponding
MathNet.Numerics matrix member):

- Moore–Penrose pseudo-inverse
- Scalar determinant
- Scalar trace
- Integer rank
- Frobenius norm (float)
- Vector of row norms
- Vector of column norms
- Condition number
- Nullity
- Kernel (null space)
- Boolean symmetry test
- Cholesky decomposition
- LU decomposition
- QR decomposition
- SVD decomposition
- Eigenvalues and eigenvectors
Descriptive statistics
Deedle.Math.Stats extends the base Deedle.Stats with richer descriptive statistics
from MathNet.Numerics.
let air = Frame.ReadCsv(root + "airquality.csv", separators=";")
let ozone = air?Ozone |> Series.dropMissing
// Median (uses MathNet's exact median algorithm)
Stats.median ozone
// 25th and 75th percentile
Stats.quantile(ozone, 0.25), Stats.quantile(ozone, 0.75)
// Ranks (average rank for ties by default)
ozone |> Stats.ranks |> Series.take 6
All three functions also work on entire frames:
// Median of each numeric column
Stats.median air
// 90th percentile of each numeric column
Stats.quantile(air, 0.90)
Correlation and covariance
// Use a small subset of air quality numeric columns
let numAir = air |> Frame.sliceCols ["Ozone";"Solar.R";"Wind";"Temp"] |> Frame.dropSparseRows
// Pearson correlation matrix (default)
Stats.corr numAir
// Spearman rank correlation
Stats.corr(numAir, CorrelationMethod.Spearman)
// Covariance frame
Stats.cov numAir
To correlate two individual series:
Stats.corr(air?Ozone, air?Temp)
Converting between correlation and covariance
Stats.cov2Corr decomposes a covariance matrix into a standard-deviation series and a
correlation frame; Stats.corr2Cov inverts that operation:
let stdDevs, corrFrame = Stats.cov2Corr (Stats.cov numAir)
let recoveredCov = Stats.corr2Cov(stdDevs, corrFrame)
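The arithmetic behind this pair of functions is simple: corr[i][j] = cov[i][j] / (σᵢ·σⱼ), where σᵢ is the square root of the i-th diagonal entry, and corr2Cov multiplies the scales back in. A language-agnostic sketch of the same math in plain Python (illustration only, not Deedle.Math code):

```python
import math

def cov2corr(cov):
    """Split a covariance matrix into std-devs and a correlation matrix."""
    n = len(cov)
    std = [math.sqrt(cov[i][i]) for i in range(n)]
    corr = [[cov[i][j] / (std[i] * std[j]) for j in range(n)] for i in range(n)]
    return std, corr

def corr2cov(std, corr):
    """Inverse operation: scale the correlation matrix back up."""
    n = len(std)
    return [[corr[i][j] * std[i] * std[j] for j in range(n)] for i in range(n)]

cov = [[4.0, 1.2], [1.2, 9.0]]
std, corr = cov2corr(cov)     # std = [2.0, 3.0], corr[0][1] = 1.2 / 6 = 0.2
recovered = corr2cov(std, corr)
```

The round trip recovers the original covariance matrix exactly, which is the invariant Stats.cov2Corr and Stats.corr2Cov preserve.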
Exponentially weighted moving statistics
The Stats and Finance types provide a full suite of exponentially weighted moving
(EWM) statistics. The decay rate can be specified via one of four mutually exclusive
parameters:
| Parameter | Meaning |
|---|---|
| com | Center of mass: α = 1 / (1 + com), com ≥ 0 |
| span | Span: α = 2 / (span + 1), span ≥ 1 |
| halfLife | Half-life: α = 1 − exp(ln(0.5) / halfLife), halfLife > 0 |
| alpha | Direct smoothing factor: 0 < α ≤ 1 |
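The conversions are easy to verify by hand. A quick Python sketch of the three derived parameterisations (same formulas as above, independent of Deedle.Math):

```python
import math

def alpha_from_com(com):
    return 1.0 / (1.0 + com)

def alpha_from_span(span):
    return 2.0 / (span + 1.0)

def alpha_from_halflife(hl):
    return 1.0 - math.exp(math.log(0.5) / hl)

print(alpha_from_span(5.0))    # 2/6 = 0.333...
print(alpha_from_com(5.0))     # 1/6 = 0.1666...

# A half-life of h means the weight decays by half after h steps:
a = alpha_from_halflife(10.0)
print((1.0 - a) ** 10)         # ≈ 0.5
```

This is why the four parameters are mutually exclusive: each is just a different way of writing the single decay factor α.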
// Sample daily returns series
let returns =
series [ for i in 1..20 -> i => Math.Sin(float i * 0.3) * 0.02 ]
// EWM mean with span=5
Stats.ewmMean(returns, span=5.0)
// EWM mean on a whole frame (applied column by column)
Stats.ewmMean(numAir, span=10.0)
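For intuition, an exponentially weighted mean with smoothing factor α weights the observation i steps back by (1 − α)^i and renormalises. Below is a pure-Python sketch of this standard "adjusted" EWM formula; it illustrates the weighting scheme only, and Deedle.Math's exact initialisation and edge handling may differ:

```python
def ewm_mean(xs, alpha):
    """Adjusted EWM: weighted average with weights (1 - alpha)^i, newest first."""
    out = []
    for t in range(len(xs)):
        weights = [(1.0 - alpha) ** (t - i) for i in range(t + 1)]
        total = sum(w * x for w, x in zip(weights, xs[: t + 1]))
        out.append(total / sum(weights))
    return out

# span=5 corresponds to alpha = 2 / (5 + 1) = 1/3
print(ewm_mean([1.0, 2.0, 3.0], 1.0 / 3.0))  # second value: (2/3·1 + 1·2) / (5/3) = 1.6
```

Recent observations dominate: with α = 1/3, the newest point always carries weight 1 before normalisation, the previous point 2/3, and so on.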
Moving statistics on frames
Stats.movingStdDevParallel, Stats.movingVarianceParallel, and
Stats.movingCovarianceParallel compute rolling window standard deviation, variance,
and covariance matrices over a frame using parallel evaluation:
// Rolling 10-day standard deviation of each column
let rollingStd = Stats.movingStdDevParallel 10 numAir
// Rolling 10-day covariance matrix (returns Series<rowKey, Matrix<float>>)
let rollingCov = Stats.movingCovarianceParallel 10 numAir
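Conceptually, each output element just recomputes the sample statistic over the trailing n observations; the Parallel variants distribute those windows across cores. The per-window math, sketched in plain Python for two columns (illustration only):

```python
def sample_cov(xs, ys):
    """Unbiased sample covariance of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (n - 1)

def rolling_cov(xs, ys, window):
    """One covariance value per full trailing window."""
    return [sample_cov(xs[i - window:i], ys[i - window:i])
            for i in range(window, len(xs) + 1)]

xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 6.0, 8.0]
# ys = 2·xs, so each window's covariance is 2 · var(xs window) = 2.0
print(rolling_cov(xs, ys, 3))  # [2.0, 2.0]
```

In the frame version this scalar becomes a full covariance matrix per window, hence the Series of Matrix<float> return type.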
Financial time-series: EWM volatility and covariance
Finance (in Deedle.Math) provides exponentially weighted volatility and covariance
functions that are common in quantitative finance.
let prices =
series [ for i in 1..30 -> i => 100.0 * Math.Exp(Math.Sin(float i * 0.2) * 0.1) ]
let dailyReturns = prices.Diff(1) / prices.Shift(1)
// Mean-corrected EWM volatility (standard deviation form) with half-life of 10 days
Finance.ewmVolStdDev(dailyReturns, halfLife=10.0)
Finance.ewmVolRMS computes the same quantity using root-mean-square (no mean correction),
which is appropriate for already-centred return series:
Finance.ewmVolRMS(dailyReturns, span=20.0)
Note: the older Finance.ewmVol is deprecated. Use ewmVolStdDev or ewmVolRMS
depending on whether you want mean correction.
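The RMS form has a simple recursive definition: σ²ₜ = (1 − α)·σ²ₜ₋₁ + α·rₜ². A pure-Python sketch of that recursion (the StdDev form additionally subtracts an EWM mean; seeding with the first squared return is an assumption of this sketch, not necessarily what Finance.ewmVolRMS does):

```python
import math

def ewm_vol_rms(returns, alpha):
    """EWM volatility without mean correction: sigma2 <- (1-a)*sigma2 + a*r^2."""
    sigma2 = returns[0] ** 2          # seed with the first squared return
    vols = [math.sqrt(sigma2)]
    for r in returns[1:]:
        sigma2 = (1.0 - alpha) * sigma2 + alpha * r * r
        vols.append(math.sqrt(sigma2))
    return vols

print(ewm_vol_rms([0.01, -0.02, 0.015], alpha=0.1))
```

This is the classic RiskMetrics-style volatility update: a large return raises the estimate immediately, and its influence then decays geometrically at rate (1 − α).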
EWM variance
// Scalar EWM variance per time step
Finance.ewmVar(dailyReturns, com=5.0)
EWM covariance and correlation on frames
// Returns Series<rowKey, Frame<colKey,colKey>> — one covariance frame per row
let ewmCovFrames = Finance.ewmCov(numAir, span=20.0)
// Returns Series<rowKey, Frame<colKey,colKey>> — one correlation frame per row
let ewmCorrFrames = Finance.ewmCorr(numAir, span=20.0)
Principal Component Analysis (PCA)
The PCA module provides a simple API for principal component analysis. It normalises
the columns by z-score internally and returns a record containing the eigenvalues and
eigenvectors in descending order of explained variance.
// z-score normalisation happens inside pca, but is also exposed directly
let normed = PCA.normalizeColumns numAir
// Run PCA on the numeric air quality columns
let result = PCA.pca numAir
// Eigenvalues (proportion of variance explained by each PC)
result.EigenValues
// Eigenvectors (loadings): rows = original variables, columns = PC1, PC2, …
result.EigenVectors
Access the fields via the helper functions PCA.eigenValues and PCA.eigenVectors:
let ev = PCA.eigenValues result // Series<string, float>
let vecs = PCA.eigenVectors result // Frame<colKey, string>
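At its core, PCA on z-scored data is the eigen-decomposition of the correlation matrix. For a 2×2 correlation matrix [[1, ρ], [ρ, 1]] the eigenvalues are 1 ± ρ, which makes a nice sanity check in plain Python (a math illustration, not the PCA module's implementation):

```python
def eig_2x2_sym(m):
    """Eigenvalues of a symmetric 2x2 matrix, largest first."""
    a, b, d = m[0][0], m[0][1], m[1][1]
    mean = (a + d) / 2.0
    delta = (((a - d) / 2.0) ** 2 + b * b) ** 0.5
    return [mean + delta, mean - delta]

corr = [[1.0, 0.5], [0.5, 1.0]]
print(eig_2x2_sym(corr))  # [1.5, 0.5]: PC1 explains 1.5 / 2 = 75% of the variance
```

The eigenvalues sum to the number of variables (the trace of the correlation matrix), which is why each one can be read as a share of total variance.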
Linear regression
LinearRegression.ols fits an ordinary-least-squares model from columns in a frame:
open Deedle.Math
// Fit: Ozone ~ Solar.R + Wind + Temp (with intercept)
let fit = LinearRegression.ols ["Solar.R"; "Wind"; "Temp"] "Ozone" true numAir
The returned Fit.t record provides:
// Regression coefficients (Intercept, Solar.R, Wind, Temp)
LinearRegression.Fit.coefficients fit
// Fitted values (ŷ)
LinearRegression.Fit.fittedValues fit |> Series.take 6
// Residuals (y − ŷ)
LinearRegression.Fit.residuals fit |> Series.take 6
For a full summary including the t-table and R²:
let summary = LinearRegression.Fit.summary fit
printfn "%O" summary
// Formula: Ozone ~ Solar.R + Wind + Temp
// Min: 1Q: Median: 3Q Max:
// ...
// R^2: 0.606, Adj. R^2: 0.596
To fit without an intercept pass false as the third argument to ols.
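In general, OLS solves the normal equations (XᵀX)β = Xᵀy; with a single regressor and an intercept this collapses to the closed form slope = cov(x, y) / var(x), intercept = ȳ − slope·x̄. A pure-Python sketch of that one-variable case (an illustration of the math, not LinearRegression.ols itself):

```python
def ols_1d(xs, ys):
    """Closed-form simple linear regression with intercept."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return my - slope * mx, slope   # (intercept, slope)

# y = 3 + 2x exactly, so the fit recovers intercept 3 and slope 2
print(ols_1d([0.0, 1.0, 2.0, 3.0], [3.0, 5.0, 7.0, 9.0]))  # (3.0, 2.0)
```

Fitting without an intercept corresponds to dropping the column of ones from X, which is what passing false to ols requests.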
Tips and common patterns
Working with the full pipeline
A typical quantitative pipeline combines Deedle frame operations with Deedle.Math:
open Deedle
open Deedle.Math
// 1. Load data
let prices = Frame.ReadCsv("prices.csv") |> Frame.indexRowsDate "Date"
// 2. Compute daily log-returns
let logReturns = log prices - log (Frame.shift 1 prices) |> Frame.dropSparseRows
// 3. Rolling 60-day correlation matrix (one frame per row)
let rollingCorr = Finance.ewmCorr(logReturns, span=60.0)
// 4. Latest correlation frame
let latestCorr = rollingCorr |> Series.lastValue
Missing values
LinearRegression.ols will raise an error if any input column has missing values.
Use Frame.dropSparseRows or Frame.fillMissingWith to clean data first:
let cleanDf = numAir |> Frame.dropSparseRows
let fit = LinearRegression.ols ["Solar.R";"Wind";"Temp"] "Ozone" true cleanDf
Similarly, Stats.corrMatrix / Stats.corr treat NaN as 0 in the covariance step
(matching MATLAB semantics). Use Frame.dropSparseRows if you want listwise deletion.
Performance note
LinearAlgebra.transpose on a purely numeric frame is significantly faster than
Frame.transpose because it bypasses the generic object boxing layer and works
directly in float[] space.
<summary> A function for constructing data frame from a sequence of name - column pairs. This provides a nicer syntactic sugar for `Frame.ofColumns`. </summary>
<example> To create a simple frame with two columns, you can write: <code> frame [ "A" => series [ 1 => 30.0; 2 => 35.0 ] "B" => series [ 1 => 30.0; 3 => 40.0 ] ] </code></example>
<category>Frame construction</category>
<summary> Create a series from a sequence of key-value pairs that represent the observations of the series. This function can be used together with the `=>` operator to create key-value pairs. </summary>
<example> // Creates a series with squares of numbers let sqs = series [ 1 => 1.0; 2 => 4.0; 3 => 9.0 ] </example>
type Matrix = static member dot: df: Frame<'R,'C> * m2: Matrix<float> -> Frame<'R,'R> (requires equality and equality) + 11 overloads static member ofFrame: df: Frame<'a,'b> -> Matrix<float> (requires equality and equality) static member toFrame: rows: 'R seq -> cols: 'C seq -> m: Matrix<float> -> Frame<'R,'C> (requires equality and equality)
<summary> Matrix conversions and operators between Frame and Series <category>Matrix conversions and operators</category> </summary>
--------------------
type Matrix<'T (requires default constructor and value type and 'T :> IEquatable<'T> and 'T :> IFormattable and 'T :> ValueType)> = interface IFormattable interface IEquatable<Matrix<'T>> interface ICloneable member Add: scalar: 'T -> Matrix<'T> + 3 overloads member Append: right: Matrix<'T> -> Matrix<'T> + 1 overload member AsArray: unit -> 'T array2d member AsColumnArrays: unit -> 'T array array member AsColumnMajorArray: unit -> 'T array member AsRowArrays: unit -> 'T array array member AsRowMajorArray: unit -> 'T array ...
<summary> Defines the base class for <c>Matrix</c> classes. </summary>
<summary> Defines the base class for <c>Matrix</c> classes. </summary>
<typeparam name="T">Supported data types are <c>double</c>, <c>single</c>, <see cref="N:MathNet.Numerics.LinearAlgebra.Complex" />, and <see cref="N:MathNet.Numerics.LinearAlgebra.Complex32" />.</typeparam>
<summary> Defines the base class for <c>Matrix</c> classes. </summary>
<summary> Defines the base class for <c>Matrix</c> classes. </summary>
val float: value: 'T -> float (requires member op_Explicit)
--------------------
type float = Double
--------------------
type float<'Measure> = float
module Frame from Deedle
<summary> The `Frame` module provides an F#-friendly API for working with data frames. The module follows the usual desing for collection-processing in F#, so the functions work well with the pipelining operator (`|>`). For example, given a frame with two columns representing prices, we can use `Frame.pctChange` to calculate daily returns like this: let df = frame [ "MSFT" => prices1; "AAPL" => prices2 ] let rets = df |> Frame.pctChange 1 rets |> Stats.mean Note that the `Stats.mean` operation is overloaded and works both on series (returning a number) and on frames (returning a series). You can also use `Frame.diff` if you need absolute differences rather than relative changes. The functions in this module are designed to be used from F#. For a C#-friendly API, see the `FrameExtensions` type. For working with individual series, see the `Series` module. The functions in the `Frame` module are grouped in a number of categories and documented below. Accessing frame data and lookup ------------------------------- Functions in this category provide access to the values in the fame. You can also add and remove columns from a frame (which both return a new value). - `addCol`, `replaceCol` and `dropCol` can be used to create a new data frame with a new column, by replacing an existing column with a new one, or by dropping an existing column - `cols` and `rows` return the columns or rows of a frame as a series containing objects; `getCols` and `getRows` return a generic series and cast the values to the type inferred from the context (columns or rows of incompatible types are skipped); `getNumericCols` returns columns of a type convertible to `float` for convenience. - You can get a specific row or column using `get[Col|Row]` or `lookup[Col|Row]` functions. The `lookup` variant lets you specify lookup behavior for key matching (e.g. find the nearest smaller key than the specified value). 
There are also `[try]get` and `[try]Lookup` functions that return optional values and functions returning entire observations (key together with the series). - `sliceCols` and `sliceRows` return a sub-frame containing only the specified columns or rows. Finally, `toArray2D` returns the frame data as a 2D array. Grouping, windowing and chunking -------------------------------- The basic grouping functions in this category can be used to group the rows of a data frame by a specified projection or column to create a frame with hierarchical index such as <c>Frame<'K1 * 'K2, 'C></c>. The functions always aggregate rows, so if you want to group columns, you need to use `Frame.transpose` first. The function `groupRowsBy` groups rows by the value of a specified column. Use `groupRowsBy[Int|Float|String...]` if you want to specify the type of the column in an easier way than using type inference; `groupRowsUsing` groups rows using the specified _projection function_ and `groupRowsByIndex` projects the grouping key just from the row index. More advanced functions include: `aggregateRowsBy` which groups the rows by a specified sequence of columns and aggregates each group into a single value; `pivotTable` implements the pivoting operation [as documented in the tutorials](../frame.html#pivot). The `melt` and `unmelt` functions turn the data frame into a single data frame containing columns `Row`, `Column` and `Value` containing the data of the original frame; `unmelt` can be used to turn this representation back into an original frame. The `stack` and `unstack` functions implement pandas-style reshape operations. `stack` converts `Frame<'R,'C>` to a long-format `Frame<'R*'C, string>` where each cell becomes a row keyed by `(rowKey, colKey)` with a single `"Value"` column. `unstack` promotes the inner row-key level to column keys, producing `Frame<'R1, 'C*'R2>` from `Frame<'R1*'R2,'C>`. 
A simple windowing functions that are exposed for an entire frame operations are `window` and `windowInto`. For more complex windowing operations, you currently have to use `mapRows` or `mapCols` and apply windowing on individual series. Sorting and index manipulation ------------------------------ A frame is indexed by row keys and column keys. Both of these indices can be sorted (by the keys). A frame that is sorted allows a number of additional operations (such as lookup using the `Lookp.ExactOrSmaller` lookup behavior). The functions in this category provide ways for manipulating the indices. It is expected that most operations are done on rows and so more functions are available in a row-wise way. A frame can alwyas be transposed using `Frame.transpose`. Index operations: The existing row/column keys can be replaced by a sequence of new keys using the `indexColsWith` and `indexRowsWith` functions. Row keys can also be replaced by ordinal numbers using `indexRowsOrdinally`. The function `indexRows` uses the specified column of the original frame as the index. It removes the column from the resulting frame (to avoid this, use overloaded `IndexRows` method). This function infers the type of row keys from the context, so it is usually more convenient to use `indexRows[Date|String|Int|...]` functions. Finally, if you want to calculate the index value based on multiple columns of the row, you can use `indexRowsUsing`. Sorting frame rows: Frame rows can be sorted according to the value of a specified column using the `sortRows` function; `sortRowsBy` takes a projection function which lets you transform the value of a column (e.g. to project a part of the value). The functions `sortRowsByKey` and `sortColsByKey` sort the rows or columns using the default ordering on the key values. The result is a frame with ordered index. 
Expanding columns: When the frame contains a series with complex .NET objects such as F# records or C# classes, it can be useful to "expand" the column. This operation looks at the type of the objects, gets all properties of the objects (recursively) and generates multiple series representing the properties as columns. The function `expandCols` expands the specified columns while `expandAllCols` applies the expansion to all columns of the data frame. Frame transformations --------------------- Functions in this category perform standard transformations on data frames including projections, filtering, taking some sub-frame of the frame, aggregating values using scanning and so on. Projection and filtering functions such as `[map|filter][Cols|Rows]` call the specified function with the column or row key and an <c>ObjectSeries<'K></c> representing the column or row. You can use functions ending with `Values` (such as `mapRowValues`) when you do not require the row key, but only the row series; `mapRowKeys` and `mapColKeys` can be used to transform the keys. You can use `reduceValues` to apply a custom reduction to values of columns. Other aggregations are available in the `Stats` module. You can also get a row with the greaterst or smallest value of a given column using `[min|max]RowBy`. The functions `take[Last]` and `skip[Last]` can be used to take a sub-frame of the original source frame by skipping a specified number of rows. Note that this does not require an ordered frame and it ignores the index - for index-based lookup use slicing, such as `df.Rows.[lo .. hi]`, instead. Finally the `shift` function can be used to obtain a frame with values shifted by the specified offset. This can be used e.g. to get previous value for each key using `Frame.shift 1 df`. The `diff` function calculates difference from previous value using `df - (Frame.shift offs df)`. 
Processing frames with exceptions --------------------------------- The functions in this group can be used to write computations over frames that may fail. They use the type <c>tryval<'T></c> which is defined as a discriminated union with two cases: Success containing a value, or Error containing an exception. Using <c>tryval<'T></c> as a value in a data frame is not generally recommended, because the type of values cannot be tracked in the type. For this reason, it is better to use <c>tryval<'T></c> with individual series. However, `tryValues` and `fillErrorsWith` functions can be used to get values, or fill failed values inside an entire data frame. The `tryMapRows` function is more useful. It can be used to write a transformation that applies a computation (which may fail) to each row of a data frame. The resulting series is of type <c>Series<'R, tryval<'T>></c> and can be processed using the <c>Series</c> module functions. Missing values -------------- This group of functions provides a way of working with missing values in a data frame. The category provides the following functions that can be used to fill missing values: * `fillMissingWith` fills missing values with a specified constant * `fillMissingUsing` calls a specified function for every missing value * `fillMissing` and variants propagates values from previous/later keys We use the terms _sparse_ and _dense_ to denote series that contain some missing values or do not contain any missing values, respectively. The functions `denseCols` and `denseRows` return a series that contains only dense columns or rows and all sparse rows or columns are replaced with a missing value. The `dropSparseCols` and `dropSparseRows` functions drop these missing values and return a frame with no missing values. Joining, merging and zipping ---------------------------- The simplest way to join two frames is to use the `join` operation which can be used to perform left, right, outer or inner join of two frames. 
When the row keys of the frames do not match exactly, you can use `joinAlign` which takes an additional parameter that specifies how to find matching key in left/right join (e.g. by taking the nearest smaller available key). Frames that do not contian overlapping values can be combined using `merge` (when combining just two frames) or using `mergeAll` (for larger number of frames). Tha latter is optimized to work well for a large number of data frames. Finally, frames with overlapping values can be combined using `zip`. It takes a function that is used to combine the overlapping values. A `zipAlign` function provides a variant with more flexible row key matching (as in `joinAlign`) Hierarchical index operations ----------------------------- A data frame has a hierarchical row index if the row index is formed by a tuple, such as <c>Frame<'R1 * 'R2, 'C></c>. Frames of this kind are returned, for example, by the grouping functions such as <c>Frame.groupRowsBy</c>. The functions in this category provide ways for working with data frames that have hierarchical row keys. The functions <c>applyLevel</c> and <c>reduceLevel</c> can be used to reduce values according to one of the levels. The <c>applyLevel</c> function takes a reduction of type <c>Series<'K, 'T> -> 'T</c> while <c>reduceLevel</c> reduces individual values using a function of type <c>'T -> 'T -> 'T</c>. The functions <c>nest</c> and <c>unnest</c> can be used to convert between frames with hierarchical indices (<c>Frame<'K1 * 'K2, 'C></c>) and series of frames that represent individual groups (<c>Series<'K1, Frame<'K2, 'C>></c>). The <c>nestBy</c> function can be used to perform group by operation and return the result as a series of frems. </summary>
<category>Frame and series operations</category>
--------------------
type Frame = static member ofMatrix: rowKeys: 'R seq -> colKeys: 'C seq -> m: Matrix<'T> -> Frame<'R,'C> (requires equality and equality and default constructor and value type and 'T :> IEquatable<'T> and 'T :> IFormattable and 'T :> ValueType) static member toMatrix: df: Frame<'R,'C> -> Matrix<float> (requires equality and equality)
<summary> Frame to matrix conversion <category>Matrix conversions and operators</category> </summary>
--------------------
type Frame<'TRowKey,'TColumnKey (requires equality and equality)> = interface IDynamicMetaObjectProvider interface INotifyCollectionChanged interface IFrameFormattable interface IFsiFormattable interface IFrame new: rowIndex: IIndex<'TRowKey> * columnIndex: IIndex<'TColumnKey> * data: IVector<IVector> * indexBuilder: IIndexBuilder * vectorBuilder: IVectorBuilder -> Frame<'TRowKey,'TColumnKey> + 1 overload member AddColumn: column: 'TColumnKey * series: 'V seq -> unit + 3 overloads member AggregateRowsBy: groupBy: 'TColumnKey seq * aggBy: 'TColumnKey seq * aggFunc: Func<Series<'TRowKey,'a>,'b> -> Frame<int,'TColumnKey> member Clone: unit -> Frame<'TRowKey,'TColumnKey> member ColumnApply: f: Func<Series<'TRowKey,'T>,ISeries<'TRowKey>> -> Frame<'TRowKey,'TColumnKey> + 1 overload ...
<summary> A frame is the key Deedle data structure (together with series). It represents a data table (think spreadsheet or CSV file) with multiple rows and columns. The frame consists of row index, column index and data. The indices are used for efficient lookup when accessing data by the row key `'TRowKey` or by the column key `'TColumnKey`. Deedle frames are optimized for the scenario when all values in a given column are of the same type (but types of different columns can differ). </summary>
<remarks><para>Joining, zipping and appending:</para><para> More info </para></remarks>
<category>Core frame and series types</category>
--------------------
new: names: 'TColumnKey seq * columns: ISeries<'TRowKey> seq -> Frame<'TRowKey,'TColumnKey>
new: rowIndex: Indices.IIndex<'TRowKey> * columnIndex: Indices.IIndex<'TColumnKey> * data: IVector<IVector> * indexBuilder: Indices.IIndexBuilder * vectorBuilder: Vectors.IVectorBuilder -> Frame<'TRowKey,'TColumnKey>
type Vector = static member ofOptionalValues: data: 'T option seq -> IVector<'T> + 2 overloads static member ofValues: data: 'T array -> IVector<'T> + 1 overload
<summary> Type that provides a simple access to creating vectors represented using the built-in `ArrayVector` type that stores the data in a continuous block of memory. </summary>
--------------------
type Vector<'T (requires default constructor and value type and 'T :> IEquatable<'T> and 'T :> IFormattable and 'T :> ValueType)> = interface IFormattable interface IEquatable<Vector<'T>> interface IList interface ICollection interface IEnumerable interface IList<'T> interface ICollection<'T> interface IEnumerable<'T> interface ICloneable override AbsoluteMaximum: unit -> 'T ...
<summary> Defines the generic class for <c>Vector</c> classes. </summary>
<typeparam name="T">Supported data types are double, single, <see cref="N:MathNet.Numerics.LinearAlgebra.Complex" />, and <see cref="N:MathNet.Numerics.LinearAlgebra.Complex32" />.</typeparam>
module Series from Deedle
<summary> The `Series` module provides an F#-friendly API for working with data and time series. The API follows the usual design for collection-processing in F#, so the functions work well with the pipelining (<c>|></c>) operator. For example, given a series with ages, we can use `Series.filterValues` to filter outliers and then `Stats.mean` to calculate the mean: ages |> Series.filterValues (fun v -> v > 0.0 && v < 120.0) |> Stats.mean The module provides comprehensive set of functions for working with series. The same API is also exposed using C#-friendly extension methods. In C#, the above snippet could be written as: [lang=csharp] ages .Where(kvp => kvp.Value > 0.0 && kvp.Value < 120.0) .Mean() For more information about similar frame-manipulation functions, see the `Frame` module. For more information about C#-friendly extensions, see `SeriesExtensions`. The functions in the `Series` module are grouped in a number of categories and documented below. Accessing series data and lookup -------------------------------- Functions in this category provide access to the values in the series. - The term _observation_ is used for a key value pair in the series. - When working with a sorted series, it is possible to perform lookup using keys that are not present in the series - you can specify to search for the previous or next available value using _lookup behavior_. - Functions such as `get` and `getAll` have their counterparts `lookup` and `lookupAll` that let you specify lookup behavior. - For most of the functions that may fail, there is a `try[Foo]` variant that returns `None` instead of failing. - Functions with a name ending with `At` perform lookup based on the absolute integer offset (and ignore the keys of the series) Series transformations ---------------------- Functions in this category perform standard transformations on series including projections, filtering, taking some sub-series of the series, aggregating values using scanning and so on. 
Projection and filtering functions generally skip over missing values, but there are variants `filterAll` and `mapAll` that let you handle missing values explicitly. Keys can be transformed using `mapKeys`. When you do not need to consider the keys, and only care about values, use `filterValues` and `mapValues` (which is also aliased as the `$` operator). Series supports standard set of folding functions including `reduce` and `fold` (to reduce series values into a single value) as well as the `scan[All]` function, which can be used to fold values of a series into a series of intermeidate folding results. The functions `take[Last]` and `skip[Last]` can be used to take a sub-series of the original source series by skipping a specified number of elements. Note that this does not require an ordered series and it ignores the index - for index-based lookup use slicing, such as `series.[lo .. hi]`, instead. Finally the `shift` function can be used to obtain a series with values shifted by the specified offset. This can be used e.g. to get previous value for each key using `Series.shift 1 ts`. The `diff` function calculates difference from previous value using `ts - (Series.shift offs ts)`. Processing series with exceptions --------------------------------- The functions in this group can be used to write computations over series that may fail. They use the type <c>tryval<'T></c> which is defined as a discriminated union with two cases: Success containing a value, or Error containing an exception. The function `tryMap` lets you create <c>Series<'K, tryval<'T>></c> by mapping over values of an original series. You can then extract values using `tryValues`, which throws `AggregateException` if there were any errors. Functions `tryErrors` and `trySuccesses` give series containing only errors and successes. You can fill failed values with a constant using `fillErrorsWith`. 
Hierarchical index operations ----------------------------- When the key of a series is tuple, the elements of the tuple can be treated as multiple levels of a index. For example <c>Series<'K1 * 'K2, 'V></c> has two levels with keys of types <c>'K1</c> and <c>'K2</c> respectively. The functions in this cateogry provide a way for aggregating values in the series at one of the levels. For example, given a series `input` indexed by two-element tuple, you can calculate mean for different first-level values as follows: input |> applyLevel fst Stats.mean Note that the `Stats` module provides helpers for typical statistical operations, so the above could be written just as `input |> Stats.levelMean fst`. Grouping, windowing and chunking -------------------------------- This category includes functions that group data from a series in some way. Two key concepts here are _window_ and _chunk_. Window refers to (overlapping) sliding windows over the input series while chunk refers to non-overlapping blocks of the series. The boundary behavior can be specified using the `Boundary` flags. The value `Skip` means that boundaries (incomplete windows or chunks) should be skipped. The value `AtBeginning` and `AtEnding` can be used to define at which side should the boundary be returned (or skipped). For chunking, `AtBeginning ||| Skip` makes sense and it means that the incomplete chunk at the beginning should be skipped (aligning the last chunk with the end). The behavior may be specified in a number of ways (which is reflected in the name): - `dist` - using an absolute distance between the keys - `while` - using a condition on the first and last key - `size` - by specifying the absolute size of the window/chunk The functions ending with `Into` take a function to be applied to the window/chunk. The functions `window`, `windowInto` and `chunk`, `chunkInto` are simplified versions that take a size. There is also `pairwise` function for sliding window of size two. 
For reference, the complete set of `Matrix.dot` overloads (all require `float` values):

* `Frame * Frame -> Frame` — matrix product of two frames (the column keys of the left operand must match the row keys of the right)
* `Frame * Series -> Series` and `Series * Frame -> Series`
* `Frame * Vector -> Series` and `Vector * Frame -> Vector`
* `Matrix * Series -> Vector` and `Series * Matrix -> Vector`
* `Series * Vector -> float` and `Vector * Series -> float` (inner product)
Advanced statistics

Deedle.Math extends the `Stats` type with MathNet-backed statistics that accept both series and frames (frame overloads apply the statistic to each column and return a `Series<'C,float>`):

* `Stats.median` — median of a series, or per-column medians of a frame
* `Stats.quantile` — quantile at probability `tau`, with an optional MathNet `QuantileDefinition`
* `Stats.ranks` — ranks of the values in a series, with an optional `RankDefinition`
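As a sketch of how these are used (the frame `df` is the 3×3 numeric frame built earlier; the series values here are made up for illustration):

```fsharp
open Deedle
open Deedle.Math

let returns = series [ 1 => 0.01; 2 => -0.02; 3 => 0.03; 4 => 0.005 ]

// Median of a series
let med = Stats.median returns

// 25% quantile, using MathNet's default quantile definition
let q25 = Stats.quantile (returns, 0.25)

// Per-column medians of a frame come back as a Series<'C,float>
let colMedians = Stats.median df
```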
* `Stats.corr` — correlation of two series, or the full correlation matrix of a frame (`Frame<'C,'C>`); the optional `CorrelationMethod` argument selects Pearson (the default) or Spearman
* `Stats.cov` — covariance of two series, or the covariance matrix of a frame
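A minimal sketch of the correlation functions on two aligned series (the data is made up for illustration):

```fsharp
open Deedle
open Deedle.Math

let s1 = series [ 1 => 1.0; 2 => 2.0; 3 => 3.0; 4 => 4.0 ]
let s2 = series [ 1 => 2.0; 2 => 1.0; 3 => 4.0; 4 => 3.0 ]

// Pearson correlation (the default method)
let pearson = Stats.corr (s1, s2)

// Spearman rank correlation via the optional method argument
let spearman = Stats.corr (s1, s2, CorrelationMethod.Spearman)

// Correlation matrix of a numeric frame, e.g. the df built earlier:
// Stats.corr df returns a Frame<'C,'C>
```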
Financial time-series functions

`Stats.ewmMean` computes an exponentially weighted moving average, and `Finance.ewmVar` / `Finance.ewmVolStdDev` compute the exponentially weighted moving variance and volatility (standard deviation). The decay is specified through one of the optional parameters `com`, `span`, `halfLife` or `alpha`.
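A small sketch of the exponentially weighted functions; the input series is synthetic, and the decay parameters are arbitrary:

```fsharp
open Deedle
open Deedle.Math

// A synthetic series of ten observations
let xs = series [ for i in 1 .. 10 -> i => float i + 0.1 * float (i % 3) ]

// Exponentially weighted moving average with a half-life of 3 observations
let ewma = Stats.ewmMean (xs, halfLife = 3.0)

// Exponentially weighted moving variance and volatility with span 5
let var = Finance.ewmVar (xs, span = 5.0)
let vol = Finance.ewmVolStdDev (xs, span = 5.0)
```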
PCA and linear regression

Deedle.Math also provides column normalization, principal component analysis and ordinary least squares regression on frames:

* Normalization rescales each column to its z-score, (X - mean) / (std. dev).
* PCA computes the principal components PC1 .. PCn, ordered so that PC1 explains the most variance; the result carries the eigenvalues and eigenvectors.
* Linear regression takes the column keys of the independent variables (`xCols`), the column key of the dependent variable (`yCol`) and an optional key for the intercept (`fitIntercept`; pass `None` to fit without an intercept). The fit exposes the regression `coefficients` (a series keyed by column), the `fitted` values, and the `residuals` (y - yHat).
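A sketch of how a regression might be set up. The module and function names used in the comments (`LinearRegression.ols`, `coefficients`, `residuals`) are assumptions based on the parameter docs above, not confirmed entry points — check the Deedle.Math API reference for the exact names:

```fsharp
open Deedle
open Deedle.Math

// Build a frame with two regressors and a dependent variable
let data =
    frame [ "x1" => series [ for i in 1 .. 20 -> i => float i ]
            "x2" => series [ for i in 1 .. 20 -> i => float (i * i) ]
            "y"  => series [ for i in 1 .. 20 ->
                               i => 1.0 + 2.0 * float i + 0.5 * float (i * i) ] ]

// Hypothetical calls -- names are assumptions, see the lead-in above:
// let fit    = LinearRegression.ols ["x1"; "x2"] "y" (Some "const") data
// let coeffs = LinearRegression.coefficients fit   // Series<string,float>
// let resid  = LinearRegression.residuals fit      // y - yHat
```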