API
Statistical functions
distimate.stats.mean(edges, hist)
Estimate mean from a histogram.
The approximated mean is intended for sanity checks only; estimating a mean from a histogram is inefficient and imprecise.
Returns NaN for distributions with no samples.
- Inner bins are represented by their midpoints (this assumes that samples are evenly distributed within each bin).
- The left outer bin is represented by the leftmost edge (this assumes that there are no samples below the supported range).
- Returns NaN if the rightmost bin is not empty, because outliers cannot be approximated.
Parameters: - edges – 1-D array-like, ordered histogram edges
- hist – 1-D array-like, one item longer than edges
Returns: float number
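The three rules above can be sketched in plain Python. This is a hypothetical helper, `approx_mean`, not distimate's implementation; it assumes that `hist[0]` counts samples at the first edge, `hist[1:-1]` holds the inner buckets between consecutive edges, and `hist[-1]` counts samples above the last edge.

```python
import math


def approx_mean(edges, hist):
    """Approximate the mean of a histogram (hypothetical helper, not distimate's code).

    Assumed layout: hist[0] counts samples at the first edge, hist[1:-1] are the
    inner buckets between consecutive edges, hist[-1] counts samples above the last edge.
    """
    if len(hist) != len(edges) + 1:
        raise ValueError("hist must be one item longer than edges")
    total = sum(hist)
    if total == 0:
        return math.nan  # no samples
    if hist[-1] != 0:
        return math.nan  # outliers above the last edge cannot be approximated
    # The left outer bin is represented by the leftmost edge.
    weighted_sum = hist[0] * edges[0]
    # Each inner bin is represented by its midpoint.
    for left, right, count in zip(edges[:-1], edges[1:], hist[1:-1]):
        weighted_sum += count * (left + right) / 2
    return weighted_sum / total


print(approx_mean([0, 10, 20], [0, 2, 2, 0]))  # 10.0
```

Two samples near 5 and two near 15 give an estimate of 10.0; a single sample above the last edge would turn the result into NaN.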
class distimate.stats.PDF(edges, hist)
Probability density function (PDF).
Callable object with .x and .y attributes. The attributes can be used for plotting, or the object can be called to estimate a PDF value at an arbitrary point. The callable accepts a single value or an array-like. It takes inputs from the distribution domain and returns non-negative density values or NaN (see below); unlike CDF values, densities are not bounded by 1.
PDF values provide relative likelihoods of distribution values. The PDF is computed from a histogram by dividing relative frequencies by bucket widths.
- For inputs less than the first edge, the PDF always returns zero.
- For inputs equal to the first edge (typically zero), the PDF returns zero or NaN, depending on whether the first histogram bucket is empty. This is because a density is not defined at a point that carries a discrete probability mass.
- For inputs within an inner histogram bucket (buckets are left-open), the PDF returns a single constant value, so a plot forms a staircase. To plot this discontinuous function, x-values at bucket edges are duplicated.
- For inputs greater than the last edge, the PDF returns either zero or NaN, depending on whether the last histogram bucket is empty.
Parameters: - edges – 1-D array-like, ordered histogram edges
- hist – 1-D array-like, one item longer than edges
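A minimal pure-Python sketch of the PDF rules above for a single scalar input. The helper `pdf_value` is hypothetical and not part of distimate; it assumes that `hist[0]` holds the samples at the first edge, `hist[-1]` the samples above the last edge, and that an empty histogram yields NaN (the text above does not cover that case).

```python
import bisect
import math


def pdf_value(edges, hist, v):
    """Evaluate a histogram-based PDF at a scalar v (hypothetical helper)."""
    total = sum(hist)
    if total == 0:
        return math.nan  # assumption: an empty histogram has no defined density
    if v < edges[0]:
        return 0.0
    if v == edges[0]:
        # Samples at the first edge form a point mass, which has no density.
        return 0.0 if hist[0] == 0 else math.nan
    if v > edges[-1]:
        # The spread of outliers above the last edge is unknown.
        return 0.0 if hist[-1] == 0 else math.nan
    # Inner buckets are left-open: hist[i] covers (edges[i - 1], edges[i]].
    i = bisect.bisect_left(edges, v)
    width = edges[i] - edges[i - 1]
    # Relative frequency divided by bucket width gives the density.
    return hist[i] / total / width


print(pdf_value([0, 0.5], [0, 1, 0], 0.25))  # 2.0
```

The usage line shows why densities are not limited to [0, 1]: one sample in a bucket of width 0.5 yields a density of 2.0.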
__call__(v)
Compute the function value at the given point.
Parameters: v – scalar value or NumPy array-like
Returns: scalar value or NumPy array, depending on v
x
Return a numpy.array of x-values for plotting.
y
Return a numpy.array of y-values for plotting.
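The staircase described for the inner buckets can be sketched as follows. The helper `pdf_staircase` and its point layout are hypothetical (distimate's .x and .y arrays may also cover the outer regions); it only shows how duplicating edges shared by neighbouring buckets turns per-bucket densities into a step plot.

```python
def pdf_staircase(edges, hist):
    """Return (xs, ys) plotting the inner-bucket PDF as a staircase (hypothetical helper)."""
    total = sum(hist)
    xs, ys = [], []
    for i in range(1, len(edges)):
        left, right = edges[i - 1], edges[i]
        density = hist[i] / total / (right - left)
        # Each bucket adds a horizontal segment; an edge shared by two buckets
        # therefore appears twice, once at each bucket's height.
        xs.extend([left, right])
        ys.extend([density, density])
    return xs, ys


xs, ys = pdf_staircase([0, 10, 20], [0, 1, 3, 0])
print(xs)  # [0, 10, 10, 20]
```

Passing `xs` and `ys` to any line-plotting routine draws vertical jumps at the duplicated x-values.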
class distimate.stats.CDF(edges, hist)
Cumulative distribution function (CDF).
Callable object with .x and .y attributes. The attributes can be used for plotting, or the object can be called to estimate a CDF value at an arbitrary point. The callable accepts a single value or an array-like. It takes inputs from the distribution domain and returns outputs between 0 and 1 (inclusive).
cdf(x) returns the probability that a distribution value is less than or equal to x.
- For inputs less than the first edge, the CDF always returns zero.
- The function returns exact values for inputs equal to histogram edges. Values inside histogram buckets are interpolated.
- The CDF at the first edge (typically zero) gives the fraction of samples equal to that edge.
- For inputs greater than the last edge, the CDF returns either one or NaN, depending on whether the last histogram bucket is empty.
Parameters: - edges – 1-D array-like, ordered histogram edges
- hist – 1-D array-like, one item longer than edges
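A minimal sketch of these CDF rules for a scalar input. The helper `cdf_value` is hypothetical, and linear interpolation inside buckets is an assumption: the text above only says that values inside buckets are interpolated. An empty histogram is assumed to yield NaN.

```python
import bisect
import itertools
import math


def cdf_value(edges, hist, x):
    """Evaluate a histogram-based CDF at a scalar x (hypothetical helper).

    Exact at the edges; linear interpolation inside buckets is an assumption.
    """
    total = sum(hist)
    if total == 0:
        return math.nan  # assumption: no samples, no CDF
    # cumulative[i] is the fraction of samples <= edges[i].
    cumulative = [c / total for c in itertools.accumulate(hist[:-1])]
    if x < edges[0]:
        return 0.0
    if x > edges[-1]:
        return 1.0 if hist[-1] == 0 else math.nan
    i = bisect.bisect_left(edges, x)
    if edges[i] == x:
        return cumulative[i]  # exact value at an edge
    # Interpolate linearly between the CDF values at the bucket edges.
    left, right = edges[i - 1], edges[i]
    t = (x - left) / (right - left)
    return cumulative[i - 1] + t * (cumulative[i] - cumulative[i - 1])


print(cdf_value([0, 10, 20], [1, 1, 2, 0], 5))  # 0.375
```

Here one of four samples sits at the first edge, so `cdf_value(..., 0)` is 0.25, matching the rule that the CDF at the first edge gives the fraction of samples equal to it.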
__call__(v)
Compute the function value at the given point.
Parameters: v – scalar value or NumPy array-like
Returns: scalar value or NumPy array, depending on v
x
Return a numpy.array of x-values for plotting.
y
Return a numpy.array of y-values for plotting.
class distimate.stats.Quantile
Quantile function.
Callable object with .x and .y attributes. The attributes can be used for plotting, or the object can be called to estimate a quantile value at an arbitrary point. The callable accepts a single value or an array-like. It takes inputs from the range between 0 and 1 (inclusive) and returns outputs from the distribution domain.
quantile(q) returns the smallest x for which cdf(x) >= q.
- If the first histogram bucket is not empty, the quantile function returns the first edge for a whole range of inputs, because that bucket holds a discrete mass at the first edge.
- If an inner histogram bucket is empty, the quantile value can be ambiguous because the CDF is flat over that bucket. In that case, duplicate x-values are plotted, and when called, the quantile function returns the middle of the possible values.
- The function returns NaN for inputs outside of the [0, 1] range.
- When called with zero, returns the left edge of the smallest non-empty bucket. If the first bucket is not empty, returns the first edge.
- When called with one, returns the right edge of the greatest non-empty bucket. If the last bucket is not empty, returns NaN.
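The rules above can be combined into a sketch that inverts a linearly interpolated CDF. The helper `quantile_value` is hypothetical, and the interpolation scheme is an assumption rather than distimate's documented behaviour; the special cases for 0, 1, empty inner buckets, and out-of-range inputs follow the list above.

```python
import bisect
import itertools
import math


def quantile_value(edges, hist, q):
    """Evaluate a histogram-based quantile function at q (hypothetical helper).

    Inverts a linearly interpolated CDF; the interpolation scheme is an assumption.
    """
    total = sum(hist)
    if total == 0 or not 0 <= q <= 1:
        return math.nan
    # cumulative[i] is the fraction of samples <= edges[i].
    cumulative = [c / total for c in itertools.accumulate(hist[:-1])]
    if q > cumulative[-1]:
        return math.nan  # q falls among outliers above the last edge
    if q == 0:
        # Left edge of the leftmost non-empty bucket.
        k = bisect.bisect_right(cumulative, 0) - 1
        return edges[max(k, 0)]
    if q == 1:
        # Right edge of the rightmost non-empty bucket.
        return edges[bisect.bisect_left(cumulative, 1)]
    i = bisect.bisect_left(cumulative, q)  # first edge with cdf >= q
    if cumulative[i] == q:
        # The CDF is flat at q from edges[i] to edges[k]; return the middle.
        k = bisect.bisect_right(cumulative, q) - 1
        return (edges[i] + edges[k]) / 2
    if i == 0:
        return edges[0]  # q lies within the mass at the first edge
    # Invert the linear segment between edges[i - 1] and edges[i].
    left, right = edges[i - 1], edges[i]
    lo, hi = cumulative[i - 1], cumulative[i]
    return left + (q - lo) / (hi - lo) * (right - left)


print(quantile_value([0, 10, 20, 30], [0, 2, 0, 2, 0], 0.5))  # 15.0
```

In the usage line the bucket (10, 20] is empty, so every x in [10, 20] has cdf(x) = 0.5, and the sketch returns the middle of that interval.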
__call__(v)
Compute the function value at the given point.
Parameters: v – scalar value or NumPy array-like
Returns: scalar value or NumPy array, depending on v
x
Return a numpy.array of x-values for plotting.
y
Return a numpy.array of y-values for plotting.
Distributions
class distimate.distributions.Distribution(edges, values=None)
Statistical distribution represented by its histogram.
Provides an object interface on top of a histogram array. Supports distribution merging and comparison. Implements approximation of common statistical functions.
Parameters: - edges – 1-D array-like, ordered histogram edges
- values – 1-D array-like, histogram, one item longer than edges
edges
Edges of the underlying histogram.
Returns: 1-D numpy.array, ordered histogram edges
values
Values of the underlying histogram.
Returns: 1-D numpy.array, histogram values
classmethod from_samples(edges, samples, weights=None)
Create a distribution from a list of values.
Parameters: - edges – 1-D array-like, ordered histogram edges
- samples – 1-D array-like
- weights – optional scalar or 1-D array-like with same length as samples.
Returns: a new Distribution
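Binning samples into the histogram layout used throughout this page can be sketched with `bisect`. The helper `histogram_from_samples` is hypothetical, not distimate's code; it assumes left-open inner buckets (edges[i-1], edges[i]] and lumps samples below the first edge into `hist[0]`, consistent with the assumption in mean() that there are none.

```python
import bisect


def histogram_from_samples(edges, samples, weights=None):
    """Bin samples into a histogram one item longer than edges (hypothetical helper).

    hist[0] collects samples <= edges[0], hist[i] collects samples in the
    left-open bucket (edges[i - 1], edges[i]], and hist[-1] collects samples
    above the last edge.
    """
    if weights is None:
        weights = 1
    if isinstance(weights, (int, float)):
        weights = [weights] * len(samples)  # a scalar weight applies to every sample
    if len(weights) != len(samples):
        raise ValueError("weights must have the same length as samples")
    hist = [0] * (len(edges) + 1)
    for sample, weight in zip(samples, weights):
        # bisect_left returns i with edges[i - 1] < sample <= edges[i].
        hist[bisect.bisect_left(edges, sample)] += weight
    return hist


print(histogram_from_samples([0, 10, 20], [0, 5, 10, 15, 25]))  # [1, 2, 1, 1]
```

The sample 10 lands in the bucket (0, 10] because inner buckets are closed on the right.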
classmethod from_histogram(edges, histogram)
Create a distribution from a histogram.
Parameters: - edges – 1-D array-like, ordered histogram edges
- histogram – 1-D array-like, one item longer than edges
Returns: a new Distribution
classmethod from_cumulative(edges, cumulative)
Create a distribution from a cumulative histogram.
Parameters: - edges – 1-D array-like, ordered histogram edges
- cumulative – 1-D array-like, one item longer than edges
Returns: a new Distribution
to_histogram()
Return a histogram of this distribution as a NumPy array.
Returns: 1-D numpy.array
to_cumulative()
Return a cumulative histogram of this distribution as a NumPy array.
Returns: 1-D numpy.array
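A histogram and its cumulative histogram have the same length and carry the same information: running sums go one way, differences of neighbouring totals go back. The helpers below are hypothetical sketches of that conversion, not distimate's methods. The last cumulative value equals the total weight of all samples.

```python
import itertools


def histogram_to_cumulative(hist):
    """Running totals of histogram values (hypothetical helper)."""
    return list(itertools.accumulate(hist))


def cumulative_to_histogram(cumulative):
    """Invert histogram_to_cumulative by differencing neighbouring totals."""
    cumulative = list(cumulative)
    return cumulative[:1] + [b - a for a, b in zip(cumulative, cumulative[1:])]


print(histogram_to_cumulative([1, 2, 0, 3]))  # [1, 3, 3, 6]
```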
add(value, weight=None)
Add a new item to this distribution.
Parameters: - value – item to add
- weight – optional item weight
update(values, weights=None)
Add multiple items to this distribution.
Parameters: - values – items to add, 1-D array-like
- weights – optional scalar or 1-D array-like with the same length as values.
weight
Return the total weight of samples in this distribution.
Returns: float number
mean
Estimate the mean of this distribution.
The approximated mean is intended for sanity checks only; estimating a mean from a histogram is inefficient and imprecise.
See distimate.stats.mean() for details.
Returns: float number
pdf
Probability density function (PDF) of this distribution.
See PDF for details.
Returns: a PDF instance
class distimate.types.DistributionType(edges)
Factory for creating distributions with constant histogram edges.
Parameters: edges – 1-D array-like, ordered histogram edges
edges
Edges of the underlying histogram.
Returns: 1-D numpy.array, ordered histogram edges
from_samples(samples, weights=None)
Create a distribution from a list of values.
Parameters: - samples – 1-D array-like
- weights – optional 1-D array-like
Returns: a new Distribution
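Fixing the edges in a factory means every histogram it produces is directly comparable, which is what makes merging possible. A minimal sketch, assuming that merging two distributions with identical edges means summing their bucket counts; `FixedEdgesHistograms` and `merge` are hypothetical names, not distimate's API.

```python
import bisect


class FixedEdgesHistograms:
    """Hypothetical factory that bins every sample list with the same edges."""

    def __init__(self, edges):
        self.edges = tuple(edges)

    def from_samples(self, samples):
        # hist[0]: samples <= first edge; hist[-1]: samples above the last edge.
        hist = [0] * (len(self.edges) + 1)
        for sample in samples:
            hist[bisect.bisect_left(self.edges, sample)] += 1
        return hist


def merge(hist_a, hist_b):
    """Merge histograms built with identical edges by summing bucket counts."""
    if len(hist_a) != len(hist_b):
        raise ValueError("histograms must share the same edges")
    return [a + b for a, b in zip(hist_a, hist_b)]


factory = FixedEdgesHistograms([0, 10, 20])
merged = merge(factory.from_samples([5, 15]), factory.from_samples([0, 25]))
print(merged)  # [1, 1, 1, 1]
```

Because both histograms share the factory's edges, the merged result equals the histogram of the combined samples.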
Pandas integration
class distimate.pandasext.DistributionAccessor(series)
Implements the .dist accessor on pandas.Series.
Allows Distribution methods to be called easily on all instances in a Pandas Series:
    df[col] = pd.Series.dist.from_histogram(dist_type, histograms)
    median = df[col].dist.quantile(0.5)
static from_histogram(dist_type, histograms, *, name=None)
Construct a new pandas.Series from histograms.
This is a static method that can be accessed as pd.Series.dist.from_histogram().
Parameters:
- dist_type – DistributionType or 1-D array-like with histogram edges
- histograms – pandas.DataFrame or 2-D array-like
- name – optional name of the series.
Returns: pandas.Series
static from_cumulative(dist_type, cumulatives, *, name=None)
Construct a new pandas.Series from cumulative histograms.
This is a static method that can be accessed as pd.Series.dist.from_cumulative().
Parameters:
- dist_type – DistributionType or 1-D array-like with histogram edges
- cumulatives – pandas.DataFrame or 2-D array-like
- name – optional name of the series.
Returns: pandas.Series
to_histogram()
Convert a pandas.Series of Distribution instances to histograms.
Returns: pandas.DataFrame with histogram values.
to_cumulative()
Convert a pandas.Series of Distribution instances to cumulative histograms.
Returns: pandas.DataFrame with cumulative values.
pdf(v)
Compute PDF for a pandas.Series of Distribution instances.
Parameters: v – input value, or list of them
Returns: pandas.Series
cdf(v)
Compute CDF for a pandas.Series of Distribution instances.
Parameters: v – input value, or list of them
Returns: pandas.Series
quantile(v)
Compute the quantile function for a pandas.Series of Distribution instances.
Parameters: v – input value, or list of them
Returns: pandas.Series
values
Values of the underlying histograms.
Returns: 2-D numpy.array