Creating lazily loaded series
When loading data from an external data source (such as a database), you might want to create a virtual time series that represents the data source but does not actually load the data until it is needed. If you apply some range restriction (like slicing) to the series before using the values, then it is not necessary to load the entire data set into memory.
Deedle supports lazy loading through the DelayedSeries.FromValueLoader method. It returns an ordinary data series of type Series<K, V> that has a delayed internal representation.
Creating lazy series
We will not use a real database in this tutorial, but let's say that you have the following function which loads data for a given day range:
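The original listing is not reproduced here. The sketch below is a hedged reconstruction from its description: given a time range, it generates random values for dates (at 12:00 AM), starting with the day of the first date and ending with the day after the second date, so that the requested range is always covered. The name generate and the use of Random.Next for the values are assumptions.

```fsharp
open System
open System.Collections.Generic

/// Hypothetical stand-in for a database query: given a time range, generates
/// random values for dates (at 12:00 AM) starting with the day of the first
/// date and ending with the day after the second date.
let generate (low: DateTime) (high: DateTime) : seq<KeyValuePair<DateTime, int>> =
  let rnd = Random()
  // Whole days between the two dates, plus one extra day past 'high'
  let days = int ((high.Date - low.Date).TotalDays) + 1
  seq { for d in 0 .. days ->
          KeyValuePair(low.Date.AddDays(float d), rnd.Next()) }
```

For example, generate (DateTime(2012, 1, 1)) (DateTime(2012, 1, 3)) yields four values keyed January 1 to January 4, one day more than was requested.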
Using random numbers as the source in this example is not entirely correct, because we will get different values each time a new sub-range of the series is requested, but it will suffice for the demonstration.
Now, to create a lazily loaded series, we need to open the Indices namespace, specify the minimal and maximal value of the series and use DelayedSeries.FromValueLoader:
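A minimal sketch of such a call, assuming the whole of 2012 as the key range and a random day-by-day loader (redefined here as generate so the snippet stands alone; both are assumptions, not the original listing). It also records each request in a hypothetical queries list, which makes it easy to check when loading actually happens. The #r "nuget: Deedle" line assumes the script is run with dotnet fsi.

```fsharp
#r "nuget: Deedle"
open System
open System.Collections.Generic
open Deedle
open Deedle.Indices

// Hypothetical random loader: one value per day at 12:00 AM,
// from the day of 'low' up to the day after 'high'
let generate (low: DateTime) (high: DateTime) =
  let rnd = Random()
  let days = int ((high.Date - low.Date).TotalDays) + 1
  seq { for d in 0 .. days -> KeyValuePair(low.Date.AddDays(float d), rnd.Next()) }

// Log of requested ranges (for diagnostics only)
let queries = ResizeArray<DateTime * DateTime>()

// Lazy series covering 2012; creating it does not call the loader
let ls =
  DelayedSeries.FromValueLoader(
    DateTime(2012, 1, 1),
    DateTime(2012, 12, 31),
    fun (lo: DateTime, lob) (hi: DateTime, hib) -> async {
      printfn "Query: %A - %A" (lo, lob) (hi, hib)
      queries.Add((lo, hi))
      return generate lo hi })
```

Accessing the keys or values (for example, Seq.length ls.Keys) is what first triggers the loader and the "Query: ..." message.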
To make diagnostics easier, the loader prints the requested range whenever a request is made. After running this code, you should not see any output yet, because no data has been requested.
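The last parameter of DelayedSeries.FromValueLoader (the loader) is a function that takes 4 arguments, passed as two pairs (lo, lob) and (hi, hib):
- lo and hi specify the low and high boundaries of the range. Their type is the type of the key (e.g. DateTime in our example).
- lob and hib are values of type BoundaryBehavior and can be either Inclusive or Exclusive. They specify whether the boundary value should be included or not.
The function returns an asynchronous computation (Async) that produces the loaded values as a sequence of KeyValuePair<K, V>.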
Our sample function does not handle boundaries precisely: it always includes the boundary (and possibly more values). This is not a problem, because the lazy loader automatically skips over such values. But if you want, you can use the lob and hib parameters to build a more efficient SQL query.
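As a sketch of that idea, the hypothetical rangeQuery function below turns the two BoundaryBehavior values into the matching comparison operators of a parameterized SQL query. The table DailyValues, its columns Day and Value, and the @lo/@hi parameter names are assumptions for illustration; a real loader would bind lo and hi to those parameters and run the query with its database client.

```fsharp
#r "nuget: Deedle"
open Deedle
open Deedle.Indices

/// Hypothetical helper: builds a query that respects the requested boundaries,
/// so the database returns exactly the keys in the range.
let rangeQuery (lob: BoundaryBehavior) (hib: BoundaryBehavior) =
  // Inclusive lower bound => '>=', exclusive => '>'
  let lowOp =
    match lob with
    | BoundaryBehavior.Inclusive -> ">="
    | _ -> ">"
  // Inclusive upper bound => '<=', exclusive => '<'
  let highOp =
    match hib with
    | BoundaryBehavior.Inclusive -> "<="
    | _ -> "<"
  sprintf "SELECT Day, Value FROM DailyValues WHERE Day %s @lo AND Day %s @hi" lowOp highOp
```

For a request (lo, Inclusive) - (hi, Exclusive), rangeQuery BoundaryBehavior.Inclusive BoundaryBehavior.Exclusive produces a query filtering on Day >= @lo AND Day < @hi, so no extra rows need to be skipped afterwards.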
Using unevaluated series
Let's now have a look at the operations that we can perform on an unevaluated series. Any operation that actually accesses values or keys of the series (such as Series.observations or a lookup for a specific key) will force the evaluation of the series. However, we can use range restrictions before accessing the data:
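A sketch of this scenario follows. The 2012 key range, the random generate loader, the queries log and the name ls are assumptions; janHalf matches the name used in the text. Slicing with ls.[lo .. hi] restricts the series to the keys between the two dates without loading anything.

```fsharp
#r "nuget: Deedle"
open System
open System.Collections.Generic
open Deedle
open Deedle.Indices

// Hypothetical random loader: one value per day at 12:00 AM
let generate (low: DateTime) (high: DateTime) =
  let rnd = Random()
  let days = int ((high.Date - low.Date).TotalDays) + 1
  seq { for d in 0 .. days -> KeyValuePair(low.Date.AddDays(float d), rnd.Next()) }

// Log of requested ranges, so we can see when the loader runs
let queries = ResizeArray<DateTime * DateTime>()

let ls =
  DelayedSeries.FromValueLoader(
    DateTime(2012, 1, 1),
    DateTime(2012, 12, 31),
    fun (lo: DateTime, lob) (hi: DateTime, hib) -> async {
      printfn "Query: %A - %A" (lo, lob) (hi, hib)
      queries.Add((lo, hi))
      return generate lo hi })

// Restrict to the first half of January; this does not load anything yet
let janHalf = ls.[DateTime(2012, 1, 1) .. DateTime(2012, 1, 15)]
let queriesAfterSlicing = queries.Count

// The first lookup loads the whole restricted range (one query)
let jan2 = janHalf.[DateTime(2012, 1, 2)]
// A second lookup in the same range is served from the loaded data
let jan3 = janHalf.[DateTime(2012, 1, 3)]
printfn "Queries after two lookups: %d" queries.Count

// Keys outside the restricted range are not available in janHalf
let jan20 = janHalf.TryGet(DateTime(2012, 1, 20))
printfn "January 20 available: %b" jan20.HasValue
```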
As you can see from the printed query, the series loaded data only for the 15-day range that we created by restricting the original series. When we requested another value within the specified range, it was already available and was returned immediately. Note that janHalf is restricted to the specified 15-day range, so we cannot access values outside of that range. Also, when you access a single value, the entire (restricted) series is loaded. The motivation is that you probably need to access multiple values, so it is likely cheaper to load the whole series at once.
Another operation that can be performed on an unevaluated series is to add it to a data frame with some existing key range:
When adding a lazy series to a data frame, the series has to be evaluated (so that the values can be properly aligned), but it is first restricted to the key range of the data frame. In the above example, only one month of data is loaded.
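A sketch of adding a lazy series to a frame, assuming a frame with one existing column whose row keys are the 31 days of January 2012. The column names Day and Values, the random generate loader and the queries log are hypothetical; the check on queries reflects the behavior described above (only the frame's key range is requested).

```fsharp
#r "nuget: Deedle"
open System
open System.Collections.Generic
open Deedle
open Deedle.Indices

// Hypothetical random loader: one value per day at 12:00 AM
let generate (low: DateTime) (high: DateTime) =
  let rnd = Random()
  let days = int ((high.Date - low.Date).TotalDays) + 1
  seq { for d in 0 .. days -> KeyValuePair(low.Date.AddDays(float d), rnd.Next()) }

// Log of requested ranges, so we can see what gets loaded
let queries = ResizeArray<DateTime * DateTime>()

let ls =
  DelayedSeries.FromValueLoader(
    DateTime(2012, 1, 1),
    DateTime(2012, 12, 31),
    fun (lo: DateTime, lob) (hi: DateTime, hib) -> async {
      printfn "Query: %A - %A" (lo, lob) (hi, hib)
      queries.Add((lo, hi))
      return generate lo hi })

// A frame with an existing column whose row keys are the 31 days of January 2012
let janDays = [ for d in 1 .. 31 -> DateTime(2012, 1, d) ]
let jan = frame [ "Day" => Series.ofObservations [ for d in janDays -> d, d.Day ] ]

// Adding the lazy series aligns it with the frame's rows; Deedle restricts
// it to the frame's key range before loading
jan.AddColumn("Values", ls)

let values = jan.GetColumn<int>("Values")
printfn "Loaded %d values using %d query(ies)" (Seq.length values.Keys) queries.Count
```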