API

Statistical functions

distimate.stats.mean()[source]

Estimate mean from a histogram.

The approximated mean is for sanity checks only. Estimating a mean from a histogram is inefficient and imprecise.

Return NaN for distributions with no samples.

  • Inner bins are represented by their midpoint (assume that samples are evenly distributed in bins).
  • The left outer bin is represented by the leftmost edge (assume that there are no samples below the supported range).
  • Return NaN if the rightmost bin is not empty (because we cannot approximate outliers).
Parameters:
  • edges – 1-D array-like, ordered histogram edges
  • hist – 1-D array-like, one item longer than edges
Returns:

float number
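
The rules above can be illustrated with a minimal pure-Python sketch. It is not the library's implementation; the name `approx_mean` is hypothetical, and the bin layout (one left outer bin, inner bins between consecutive edges, one right outer bin) is inferred from the description.

```python
import math


def approx_mean(edges, hist):
    """Approximate the mean of a histogram that has len(edges) + 1 bins."""
    total = sum(hist)
    if total == 0 or hist[-1] != 0:
        # No samples, or outliers above the last edge that cannot be approximated.
        return math.nan
    # The left outer bin is represented by the leftmost edge.
    weighted = hist[0] * edges[0]
    # Inner bin i lies between edges[i - 1] and edges[i]; use its midpoint.
    for i in range(1, len(edges)):
        midpoint = (edges[i - 1] + edges[i]) / 2
        weighted += hist[i] * midpoint
    return weighted / total


print(approx_mean([0, 10, 20], [1, 2, 1, 0]))  # (1*0 + 2*5 + 1*15) / 4
```

The midpoint assumption is what makes the estimate imprecise: samples clustered near one side of a wide bin shift the true mean without changing the histogram.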

class distimate.stats.PDF[source]

Probability density function (PDF).

Callable object with .x and .y attributes. The attributes can be used for plotting, or the function can be called to estimate a PDF value at an arbitrary point. The callable accepts a single value or an array-like.

The returned callable takes inputs from a distribution domain and returns outputs between 0 and 1 (inclusive).

The PDF values provide relative likelihoods of various distribution values. The PDF is computed from a histogram by normalizing relative frequencies by bucket widths.

  • For inputs less than the first edge, the PDF will always return zero.
  • For inputs equal to the first edge (typically zero), the PDF returns zero or NaN, depending on whether the first histogram bucket is empty. This is because the PDF is not defined for discrete distributions.
  • For inputs in each of the inner histogram buckets (which are left-open), one value is returned. On a plot, this forms a staircase. To plot this non-continuous function, x-values are duplicated.
  • For inputs greater than the last edge, the PDF returns either zero or NaN, depending on whether the last histogram bucket is empty.
Parameters:
  • edges – 1-D array-like, ordered histogram edges
  • hist – 1-D array-like, one item longer than edges
__call__(v)

Compute function value at the given point.

Parameters: v – scalar value or NumPy array-like
Returns: scalar value or NumPy array, depending on v
x

Return numpy.array of x-values for plotting

y

Return numpy.array of y-values for plotting
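
The edge cases listed for the PDF can be sketched for a single scalar input as follows. This is an illustration of the documented behaviour, not the library's code. The function name `pdf_at` is hypothetical, and returning NaN inside the supported range of an empty distribution is an assumption.

```python
import bisect
import math


def pdf_at(edges, hist, v):
    """Evaluate a histogram-based PDF at a single point v."""
    if v < edges[0]:
        return 0.0
    if v == edges[0]:
        # Samples equal to the first edge form a discrete spike.
        return 0.0 if hist[0] == 0 else math.nan
    if v > edges[-1]:
        # Samples above the last edge have no known bucket width.
        return 0.0 if hist[-1] == 0 else math.nan
    total = sum(hist)
    if total == 0:
        return math.nan  # assumption: no density for an empty distribution
    # Left-open buckets: find i such that edges[i - 1] < v <= edges[i].
    i = bisect.bisect_left(edges, v)
    width = edges[i] - edges[i - 1]
    # Relative frequency normalized by the bucket width.
    return hist[i] / total / width


print(pdf_at([0, 10, 20], [0, 3, 1, 0], 5))
```

Every input inside one bucket yields the same value, which is why a plot of the result looks like a staircase.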

class distimate.stats.CDF[source]

Cumulative distribution function (CDF).

Callable object with .x and .y attributes. The attributes can be used for plotting, or the function can be called to estimate a CDF value at an arbitrary point. The callable accepts a single value or an array-like.

The returned callable takes inputs from a distribution domain and returns outputs between 0 and 1 (inclusive).

cdf(x) returns the probability that a distribution value will be less than or equal to x.

  • For inputs less than the first edge, the CDF will always return zero.
  • The function returns exact values for inputs equal to histogram edges. Values inside histogram buckets are interpolated.
  • The CDF of the first edge gives the share of samples that were equal to that edge (typically zero).
  • For inputs greater than the last edge, the CDF returns either one or NaN, depending on whether the last histogram bucket is empty.
Parameters:
  • edges – 1-D array-like, ordered histogram edges
  • hist – 1-D array-like, one item longer than edges
__call__(v)

Compute function value at the given point.

Parameters: v – scalar value or NumPy array-like
Returns: scalar value or NumPy array, depending on v
x

Return numpy.array of x-values for plotting

y

Return numpy.array of y-values for plotting
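
A minimal sketch of the CDF rules for a scalar input is shown below. It is not the library's implementation. The name `cdf_at` is hypothetical, linear interpolation inside buckets is an assumption (the text only says values are interpolated), and so is the NaN result for an empty distribution.

```python
import bisect
import math
from itertools import accumulate


def cdf_at(edges, hist, v):
    """Evaluate a histogram-based CDF at a single point v."""
    if v < edges[0]:
        return 0.0
    total = sum(hist)
    if total == 0:
        return math.nan  # assumption: undefined for an empty distribution
    if v > edges[-1]:
        # Without outliers everything lies below v; otherwise the value is unknown.
        return 1.0 if hist[-1] == 0 else math.nan
    # cumulative[i] is the exact CDF value at edges[i].
    cumulative = [c / total for c in accumulate(hist[:-1])]
    i = bisect.bisect_left(edges, v)
    if v == edges[i]:
        return cumulative[i]
    # Assumed linear interpolation inside the bucket (edges[i - 1], edges[i]).
    fraction = (v - edges[i - 1]) / (edges[i] - edges[i - 1])
    return cumulative[i - 1] + fraction * (cumulative[i] - cumulative[i - 1])


print(cdf_at([0, 10, 20], [1, 2, 1, 0], 5))
```

In this example one of four samples sits at the first edge, so the CDF at that edge is 0.25 rather than zero.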

class distimate.stats.Quantile[source]

Create a quantile function.

Returns a callable object with .x and .y attributes. The attributes can be used for plotting, or the function can be called to estimate a quantile value at an arbitrary point. The function accepts a single value or an array-like.

The returned callable takes inputs from a range between 0 and 1 (inclusive) and returns outputs from a distribution domain.

quantile(q) returns the smallest x for which cdf(x) >= q.

  • If the first histogram bucket is not empty, the quantile value can return the first edge for many inputs.
  • If an inner histogram bucket is empty, then the quantile value can be ambiguous. In that case, duplicate x-values will be plotted. When called, the quantile function will return the middle of the possible values.
  • The function returns NaN for values outside of the [0, 1] range.
  • When called with zero, returns the left edge of the smallest non-empty bucket. If the first bucket is not empty, returns the first edge.
  • When called with one, returns the right edge of the greatest non-empty bucket. If the last bucket is not empty, returns NaN.
__call__(v)

Compute function value at the given point.

Parameters: v – scalar value or NumPy array-like
Returns: scalar value or NumPy array, depending on v
x

Return numpy.array of x-values for plotting

y

Return numpy.array of y-values for plotting
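
The bullet rules for the quantile function can be sketched by inverting a piecewise-linear CDF. The code below is an illustration under stated assumptions, not the library's implementation: the name `quantile_at` is hypothetical, interpolation is assumed to be linear, and an empty distribution is assumed to yield NaN.

```python
import bisect
import math
from itertools import accumulate


def quantile_at(edges, hist, q):
    """Evaluate a histogram-based quantile function at a single point q."""
    total = sum(hist)
    if total == 0 or not 0 <= q <= 1:
        return math.nan
    # cumulative[i] is the CDF value at edges[i].
    cumulative = [c / total for c in accumulate(hist[:-1])]
    if q > cumulative[-1]:
        # The quantile falls among outliers above the last edge.
        return math.nan
    i = bisect.bisect_left(cumulative, q)   # first edge with CDF >= q
    j = bisect.bisect_right(cumulative, q)  # first edge with CDF > q
    if i == j:
        if i == 0:
            # A non-empty first bucket maps many inputs to the first edge.
            return edges[0]
        # q lies strictly inside a rising segment: invert the interpolation.
        fraction = (q - cumulative[i - 1]) / (cumulative[i] - cumulative[i - 1])
        return edges[i - 1] + fraction * (edges[i] - edges[i - 1])
    # The CDF equals q on all of edges[i] .. edges[j - 1].
    low, high = edges[i], edges[j - 1]
    if q == 0:
        return high  # left edge of the smallest non-empty bucket
    if q == 1:
        return low   # right edge of the greatest non-empty bucket
    return (low + high) / 2  # middle of the ambiguous range


print(quantile_at([0, 10, 20, 30], [0, 2, 0, 2, 0], 0.5))
```

In the example the bucket between 10 and 20 is empty, so any value in that range is a valid median and the sketch returns the middle.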

Distributions

class distimate.distributions.Distribution(edges, values=None)[source]

Statistical distribution represented by its histogram.

Provides an object interface on top of a histogram array. Supports distribution merging and comparison. Implements approximation of common statistical functions.

Parameters:
  • edges – 1-D array-like, ordered histogram edges
  • values – 1-D array-like, histogram, one item longer than edges
__eq__(other)[source]

Return whether distribution histograms are equal.

__add__(other)[source]

Combine this distribution with another distribution.

__iadd__(other)[source]

Combine this distribution with another distribution in place.
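
As a rough mental model, combining two histogram-backed distributions can be pictured as adding their bucket weights. The sketch below assumes that both operands must share the same edges and that combining is an element-wise sum. Neither detail is confirmed by this page, and `merge_histograms` is a hypothetical helper, not part of the library.

```python
def merge_histograms(edges_a, hist_a, edges_b, hist_b):
    """Combine two histograms that are defined over the same edges."""
    if list(edges_a) != list(edges_b):
        # Assumption: histograms over different edges cannot be merged.
        raise ValueError("histograms must share the same edges")
    # Element-wise sum of bucket weights.
    return [a + b for a, b in zip(hist_a, hist_b)]


print(merge_histograms([0, 10], [1, 2, 0], [0, 10], [0, 1, 3]))
```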

edges

Edges of the underlying histogram.

Returns: 1-D numpy.array, ordered histogram edges
values

Values of the underlying histogram.

Returns: 1-D numpy.array, histogram values
classmethod from_samples(edges, samples, weights=None)[source]

Create a distribution from a list of values.

Parameters:
  • edges – 1-D array-like, ordered histogram edges
  • samples – 1-D array-like
  • weights – optional scalar or 1-D array-like with the same length as samples
Returns:

a new Distribution
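
How samples might be sorted into buckets can be sketched in pure Python. The bucket convention used here (the first bucket holds samples less than or equal to the first edge, inner buckets are left-open, the last bucket holds samples above the last edge) is inferred from the PDF description and is an assumption. `histogram_from_samples` is a hypothetical helper, not the library's code.

```python
import bisect


def histogram_from_samples(edges, samples, weights=None):
    """Count weighted samples into len(edges) + 1 buckets."""
    if weights is None:
        weights = [1] * len(samples)
    elif isinstance(weights, (int, float)):
        # A scalar weight applies to every sample.
        weights = [weights] * len(samples)
    hist = [0] * (len(edges) + 1)
    for sample, weight in zip(samples, weights):
        # bisect_left puts a sample equal to an edge into the bucket ending there.
        hist[bisect.bisect_left(edges, sample)] += weight
    return hist


print(histogram_from_samples([0, 10, 20], [0, 5, 10, 15, 25]))
```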

classmethod from_histogram(edges, histogram)[source]

Create a distribution from a histogram.

Parameters:
  • edges – 1-D array-like, ordered histogram edges
  • histogram – 1-D array-like, one item longer than edges
Returns:

a new Distribution

classmethod from_cumulative(edges, cumulative)[source]

Create a distribution from a cumulative histogram.

Parameters:
  • edges – 1-D array-like, ordered histogram edges
  • cumulative – 1-D array-like, one item longer than edges
Returns:

a new Distribution

to_histogram()[source]

Return a histogram of this distribution as a NumPy array.

Returns: 1-D numpy.array
to_cumulative()[source]

Return a cumulative histogram of this distribution as a NumPy array.

Returns: 1-D numpy.array
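
The relationship between the two representations can be sketched as follows, assuming that a cumulative histogram is the running sum of the histogram values. The helper names are hypothetical and operate on plain lists rather than Distribution objects.

```python
from itertools import accumulate


def histogram_to_cumulative(hist):
    """Running sum of bucket weights."""
    return list(accumulate(hist))


def cumulative_to_histogram(cumulative):
    """Differences between consecutive cumulative values."""
    hist = []
    previous = 0
    for value in cumulative:
        hist.append(value - previous)
        previous = value
    return hist


cumulative = histogram_to_cumulative([1, 2, 1, 0])
print(cumulative, cumulative_to_histogram(cumulative))
```

Under this assumption the two conversions are inverses of each other, so a distribution can be stored in either form without loss.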
add(value, weight=None)[source]

Add a new item to this distribution.

Parameters:
  • value – item to add
  • weight – optional item weight
update(values, weights=None)[source]

Add multiple items to this distribution.

Parameters:
  • values – items to add, 1-D array-like
  • weights – optional scalar or 1-D array-like with the same length as values
weight

Return the total weight of samples in this distribution.

Returns: float number
mean

Estimate mean of this distribution.

The approximated mean is for sanity checks only. Estimating a mean from a histogram is inefficient and imprecise.

See mean() for details.

Returns: float number
pdf

Probability density function (PDF) of this distribution.

See PDF for details.

Returns: a PDF instance
cdf

Cumulative distribution function (CDF) of this distribution.

See CDF for details.

Returns: a CDF instance
quantile

Quantile function of this distribution.

See Quantile for details.

Returns: a Quantile instance
class distimate.types.DistributionType(edges)[source]

Factory for creating distributions with constant histogram edges.

Parameters: edges – 1-D array-like, ordered histogram edges
edges

Edges of the underlying histogram.

Returns: 1-D numpy.array, ordered histogram edges
empty()[source]

Create an empty distribution.

Returns: a new Distribution
from_samples(samples, weights=None)[source]

Create a distribution from a list of values.

Parameters:
  • samples – 1-D array-like
  • weights – optional 1-D array-like
Returns:

a new Distribution

from_histogram(histogram)[source]

Create a distribution from a histogram.

Parameters: histogram – 1-D array-like
Returns: a new Distribution
from_cumulative(cumulative)[source]

Create a distribution from a cumulative histogram.

Parameters: cumulative – 1-D array-like
Returns: a new Distribution

Pandas integration

class distimate.pandasext.DistributionAccessor(series)[source]

Implements .dist accessor on pandas.Series.

Allows Distribution methods to be called easily on all instances in a pandas Series:

df[col] = pd.Series.dist.from_histogram(dist_type, histograms)
median = df[col].dist.quantile(0.5)
static from_histogram(dist_type, histograms, *, name=None)[source]

Construct a new pandas.Series from histograms.

This is a static method that can be accessed as pd.Series.dist.from_histogram().

Parameters:
  • dist_type – DistributionType or 1-D array-like with histogram edges
  • histograms – pandas.DataFrame or 2-D array-like
  • name – optional name of the series
Returns:

pandas.Series

static from_cumulative(dist_type, cumulatives, *, name=None)[source]

Construct a new pandas.Series from cumulative histograms.

This is a static method that can be accessed as pd.Series.dist.from_cumulative().

Parameters:
  • dist_type – DistributionType or 1-D array-like with histogram edges
  • cumulatives – pandas.DataFrame or 2-D array-like
  • name – optional name of the series
Returns:

pandas.Series

to_histogram()[source]

Convert pandas.Series of Distribution instances to histograms.

Returns: pandas.DataFrame with histogram values
to_cumulative()[source]

Convert pandas.Series of Distribution instances to cumulative histograms.

Returns: pandas.DataFrame with cumulative values
pdf(v)[source]

Compute PDF for pandas.Series of Distribution instances.

Parameters: v – input value, or list of them
Returns: pandas.Series
cdf(v)[source]

Compute CDF for pandas.Series of Distribution instances.

Parameters: v – input value, or list of them
Returns: pandas.Series
quantile(v)[source]

Compute quantile function for pandas.Series of Distribution instances.

Parameters: v – input value, or list of them
Returns: pandas.Series
values

Values of the underlying histograms.

Returns: 2-D numpy.array