A scatter plot (also called a scatterplot, scatter graph, scatter chart, scattergram, or scatter diagram) is a type of plot or mathematical diagram using Cartesian coordinates to display values for typically two variables for a set of data.
Scatter plots are one of the fundamental graphs in my day-to-day work. In their most basic form, they plot two variables on a 2D plane as points. The two coordinates on that plane are often referred to as pairs of (X,Y) coordinates, and the position scales are called X and Y axes.
// 63 sample points from 0.1 to 6.3, roughly one full period of the sine function
var x = Enumerable.Range(1, 63).Select(i => i * 0.1);
var y = x.Select(v => System.Math.Sin(v));
#r "nuget: Plotly.NET, 3.0.0"
#r "nuget: Plotly.NET.Interactive, 3.0.0"
#r "nuget: Plotly.NET.CSharp"
Plotly.NET offers many high-level abstractions for graphs with its Chart API. Chart.Scatter creates a scatter plot. It needs at least x and y arguments, and a mode argument that defines how the coordinates are visualized.
For our first example, let's use Mode.Markers to create simple points:
using Plotly.NET.CSharp;
Chart.Scatter<double, double, string>(
    x: x,
    y: y,
    mode: Plotly.NET.StyleParam.Mode.Markers
)
Chart.Point is just Chart.Scatter with mode set to markers. If we just want to plot points, it is the method of choice. Additionally, let's use some chart styling functions to add titles to the axes:
Chart.Point<double, double, string>(
    x: x,
    y: y,
    Name: "y = sin(x)",
    ShowLegend: true
)
.WithXAxisStyle<string, string, string>(TitleText: "x")
.WithYAxisStyle<string, string, string>(TitleText: "y")
When the order of our points has a meaning (for example, our x could be time, and y could be an observation we make as a function of time), it can make sense to connect them via lines.
We could use Mode.Lines with Chart.Scatter, but there is also Chart.Line, which does just that:
Chart.Line<double, double, string>(
    x: x,
    y: y
)
.WithXAxisStyle<string, string, string>(TitleText: "time [s]")
.WithYAxisStyle<string, string, string>(TitleText: "some observation value")
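As mentioned above, the same line chart can be produced with Chart.Scatter by setting the mode to Mode.Lines. The following is a minimal sketch of that equivalent call, assuming the same package references and sine data used earlier in this notebook (both are repeated here so the cell runs on its own). The variable name lineChart is introduced only for illustration.

```csharp
#r "nuget: Plotly.NET, 3.0.0"
#r "nuget: Plotly.NET.Interactive, 3.0.0"
#r "nuget: Plotly.NET.CSharp"

using System.Linq;
using Plotly.NET.CSharp;

// Same sine data as in the earlier examples
var x = Enumerable.Range(1, 63).Select(i => i * 0.1).ToArray();
var y = x.Select(v => System.Math.Sin(v)).ToArray();

// Mode.Lines connects consecutive points with lines and draws no markers
var lineChart =
    Chart.Scatter<double, double, string>(
        x: x,
        y: y,
        mode: Plotly.NET.StyleParam.Mode.Lines
    )
    .WithXAxisStyle<string, string, string>(TitleText: "time [s]")
    .WithYAxisStyle<string, string, string>(TitleText: "some observation value");
```

In a notebook, putting lineChart on the last line of the cell should render the chart. Chart.Line remains the shorter option when lines are all we need.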
As pointed out above, connecting our data points via lines only makes sense if there is some inherent meaning to the connection, such as indicating observations occurring in sequence.
Scatter plots can also be used to investigate the distribution of data points across two dimensions (often leading to a 'point cloud').
If we connect the points on such a plot, where the succession of values has no inherent meaning, the connecting lines carry no information.
Let's take a look at this side-by-side, using Chart.Grid for a multi-chart layout:
var rnd = new System.Random(69);
// ToArray() draws the random values once; a lazy Select would produce new values
// on every enumeration, so the two charts below would not show the same data
var rnd_x = Enumerable.Range(0, 200).Select(_ => rnd.NextDouble()).ToArray();
var rnd_y = Enumerable.Range(0, 200).Select(_ => rnd.NextDouble()).ToArray();
Chart.Grid(
    nRows: 1,
    nCols: 2,
    gCharts: new Plotly.NET.GenericChart.GenericChart [] {
        Chart.Point<double, double, string>(
            x: rnd_x,
            y: rnd_y,
            Name: "Point cloud"
        )
        .WithXAxisStyle<string, string, string>(TitleText: "x")
        .WithYAxisStyle<string, string, string>(TitleText: "y"),
        Chart.Line<double, double, string>(
            x: rnd_x,
            y: rnd_y,
            Name: "lines"
        )
        .WithXAxisStyle<string, string, string>(TitleText: "x")
        .WithYAxisStyle<string, string, string>(TitleText: "y")
    }
)
.WithSize(Width: 1000, Height: 600)
In the previous examples, all points are of the same size, and the size does not have any associated dimension.
Bubble charts change that by associating a third variable with the point size.
Imagine the following scenario:
We have two imaginary countries, ALand and BLand, which we want to compare across their population and GDP over time.
Using Chart.Bubble, we can add one dimension (here, I choose GDP) to the point size:
// ALand
var gdp_a = new int [] {20, 25, 35, 40, 50};
var population_a = new int [] {500, 530, 520, 500, 510};
// BLand
var gdp_b = new int [] {30, 30, 33, 31, 35};
var population_b = new int [] {400, 500, 600, 700, 800};
var times = new int [] {1900, 1910, 1920, 1930, 1940};
Chart.Combine(
    gCharts: new Plotly.NET.GenericChart.GenericChart [] {
        Chart.Bubble<int, int, int>(
            x: times,
            y: population_a,
            sizes: gdp_a,
            Name: "ALand",
            MultiText: gdp_a, // show gdp values as text in addition to bubble size
            TextPosition: Plotly.NET.StyleParam.TextPosition.Auto // set textposition to make gdp values visible
        ),
        Chart.Bubble<int, int, int>(
            x: times,
            y: population_b,
            sizes: gdp_b,
            Name: "BLand",
            MultiText: gdp_b, // show gdp values as text in addition to bubble size
            TextPosition: Plotly.NET.StyleParam.TextPosition.Auto // set textposition to make gdp values visible
        )
    }
)
.WithXAxisStyle<string, string, string>(TitleText: "Time [y]")
.WithYAxisStyle<string, string, string>(TitleText: "Population [Million]")
What can we see here?
ALand has a pretty stagnant population size (y value), while its GDP (bubble size) is increasing with time, meaning the wealth of individuals is rising over time.
BLand, in contrast, has a rising population, while having a stagnant GDP, indicating that individual wealth is decreasing.
Since we have a time axis, we can also connect the bubbles via lines to further emphasize the time evolution.
As Chart.Bubble has no arguments for that, we can fall back on using Chart.Scatter like this:
using Plotly.NET.TraceObjects;
Chart.Combine(
    gCharts: new Plotly.NET.GenericChart.GenericChart [] {
        Chart.Scatter<int, int, int>(
            x: times,
            y: population_a,
            mode: Plotly.NET.StyleParam.Mode.Lines_Markers_Text,
            Name: "ALand",
            MultiText: gdp_a,
            TextPosition: Plotly.NET.StyleParam.TextPosition.Auto,
            Marker: Marker.init(MultiSize: gdp_a) // the marker object controls the style of the individual points
        ),
        Chart.Scatter<int, int, int>(
            x: times,
            y: population_b,
            mode: Plotly.NET.StyleParam.Mode.Lines_Markers_Text,
            Name: "BLand",
            MultiText: gdp_b,
            TextPosition: Plotly.NET.StyleParam.TextPosition.Auto,
            Marker: Marker.init(MultiSize: gdp_b) // the marker object controls the style of the individual points
        )
    }
)
.WithXAxisStyle<string, string, string>(TitleText: "Time [y]")
.WithYAxisStyle<string, string, string>(TitleText: "Population [Million]")
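The reading of the bubble charts above (individual wealth rising in ALand, falling in BLand) can be checked numerically by dividing GDP by population for each year. The following is a minimal sketch using only the illustrative values from this example. The names gdpPerCapitaA and gdpPerCapitaB are introduced here for illustration, and since the GDP values have no stated unit, only the trend of the ratio is meaningful.

```csharp
using System;
using System.Linq;

// Illustrative values from the example above, repeated so the snippet runs on its own
var gdp_a = new int[] {20, 25, 35, 40, 50};
var population_a = new int[] {500, 530, 520, 500, 510};
var gdp_b = new int[] {30, 30, 33, 31, 35};
var population_b = new int[] {400, 500, 600, 700, 800};
var times = new int[] {1900, 1910, 1920, 1930, 1940};

// GDP divided by population, element by element (cast to double to avoid integer division)
var gdpPerCapitaA = gdp_a.Zip(population_a, (g, p) => (double)g / p).ToArray();
var gdpPerCapitaB = gdp_b.Zip(population_b, (g, p) => (double)g / p).ToArray();

// Print one line per year with both ratios
for (var i = 0; i < times.Length; i++)
{
    Console.WriteLine($"{times[i]}: ALand {gdpPerCapitaA[i]:F3}, BLand {gdpPerCapitaB[i]:F3}");
}
```

With these values, ALand's ratio rises from 0.040 in 1900 to roughly 0.098 in 1940, while BLand's falls from 0.075 to roughly 0.044, which matches what the bubble sizes and y positions suggest visually.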
In general, scatter plots are used to visualize the relationship of two variables on a 2D plane. Examples include the x dimension being time and the y dimension being any other variable that is observed over time.
Depending on the type of visualization, it can make sense to connect data points with lines.
You can add another dimension to a scatter plot by changing the point size, leading to a bubble chart.
Plotly.NET offers several easy ways of creating different scatter plots, such as Chart.Point, Chart.Line, and Chart.Bubble.
Plotly.NET is a feature-rich graphing library for .NET programming languages.
Check out the source repository on GitHub and the in-depth F# docs!