# Miscellaneous Statistical Functions – catsim.stats

Miscellaneous statistical functions

catsim.stats.bincount(x)

Count the number of occurrences of each integer in a list or 1-D numpy.ndarray. The result covers every integer from the smallest to the largest value in the input; integers in gaps between the observed values are reported with a count of 0.

>>> bincount(numpy.array([-4, 0, 1, 1, 3, 2, 1, 5]))
array([1, 0, 0, 0, 1, 3, 1, 1, 0, 1], dtype=int32)
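The behavior described above can be approximated with plain NumPy. This is a minimal sketch, not the catsim implementation: the name `bincount_sketch` is hypothetical, and shifting the input by its minimum is an assumed way of handling negative integers, since `numpy.bincount` only accepts non-negative values.

```python
import numpy


def bincount_sketch(x):
    """Count occurrences of every integer from min(x) to max(x), inclusive.

    Hypothetical helper for illustration; assumes a non-empty 1-D input.
    """
    x = numpy.asarray(x, dtype=int)
    # numpy.bincount rejects negative values, so shift the data so that the
    # smallest value maps to index 0.
    return numpy.bincount(x - x.min())


counts = bincount_sketch([-4, 0, 1, 1, 3, 2, 1, 5])
print(counts.tolist())  # [1, 0, 0, 0, 1, 3, 1, 1, 0, 1]
```

Position `i` of the result holds the count for the integer `min(x) + i`, which is why the three gap values between -4 and 0 appear as zeros.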

catsim.stats.coef_variation(x: numpy.ndarray, axis: int = 0) → numpy.ndarray

Calculates the coefficient of variation of the rows or columns of a matrix. The coefficient of variation is the standard deviation of a variable divided by its mean:

$\frac{\sigma}{\mu}$

Parameters
• x – the data matrix

• axis – 0 to calculate for columns, 1 for rows

Returns

a vector containing the coefficients of variation along the chosen axis
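As a rough illustration of the formula, the ratio can be computed directly with NumPy. The function name `coef_variation_sketch` is hypothetical, and the use of the population standard deviation (`ddof=0`, NumPy's default) is an assumption; the library may normalize differently.

```python
import numpy


def coef_variation_sketch(x, axis=0):
    """Standard deviation divided by mean along the given axis.

    Hypothetical helper; assumes the population standard deviation (ddof=0)
    and non-zero means.
    """
    x = numpy.asarray(x, dtype=float)
    # axis=0 reduces over rows and yields one value per column;
    # axis=1 reduces over columns and yields one value per row.
    return numpy.std(x, axis=axis) / numpy.mean(x, axis=axis)


data = numpy.array([[1.0, 10.0],
                    [3.0, 30.0]])
cv_columns = coef_variation_sketch(data, axis=0)  # one value per column
cv_rows = coef_variation_sketch(data, axis=1)     # one value per row
```

Both columns of `data` have the same coefficient of variation (0.5) even though their scales differ by a factor of ten, which is the property that makes the measure useful for comparing variables with different units.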

catsim.stats.covariance(x: numpy.ndarray, minus_one: bool = True)

Calculates the covariance matrix of a data matrix.

Parameters
• x – a data matrix

• minus_one – whether to subtract one from the total number of observations when normalizing, that is, to divide by n − 1 rather than n

>>> from sklearn.datasets import load_iris
>>> x = load_iris().data
>>> print(numpy.array_equal(covariance(x), numpy.cov(x.T)))
True
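The computation can be sketched in a few lines of NumPy. This is an illustrative reconstruction rather than the library's code: `covariance_sketch` is a hypothetical name, and it assumes that rows are observations, columns are variables, and `minus_one` switches the divisor between n − 1 and n.

```python
import numpy


def covariance_sketch(x, minus_one=True):
    """Covariance matrix of a data matrix with observations in rows.

    Hypothetical helper; the meaning of minus_one is an assumption
    based on the parameter description.
    """
    x = numpy.asarray(x, dtype=float)
    n = x.shape[0]
    # Center every column (variable) on its mean.
    centered = x - x.mean(axis=0)
    # n - 1 gives the unbiased sample covariance; n gives the population one.
    divisor = n - 1 if minus_one else n
    return centered.T @ centered / divisor


rng = numpy.random.default_rng(0)
sample = rng.normal(size=(50, 3))
cov_unbiased = covariance_sketch(sample)
cov_biased = covariance_sketch(sample, minus_one=False)
```

Under these assumptions, `cov_unbiased` agrees with `numpy.cov(sample.T)` and `cov_biased` with `numpy.cov(sample.T, bias=True)` up to floating-point error. The transpose is needed because `numpy.cov` expects variables in rows by default.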

catsim.stats.scatter_matrix(data: numpy.ndarray) → numpy.ndarray

Calculates the scatter matrix of a data matrix. The scatter matrix is an unnormalized version of the covariance matrix: the observations are centered on their mean, but the sum of outer products is not divided by the number of observations.

The calculation follows this equation:

$S = \sum_{j=1}^{n} (\mathbf{x}_j - \overline{\mathbf{x}})(\mathbf{x}_j - \overline{\mathbf{x}})^{T} = \sum_{j=1}^{n} (\mathbf{x}_j - \overline{\mathbf{x}}) \otimes (\mathbf{x}_j - \overline{\mathbf{x}}) = \left(\sum_{j=1}^{n} \mathbf{x}_j \mathbf{x}_j^{T}\right) - n\,\overline{\mathbf{x}}\,\overline{\mathbf{x}}^{T}$

Parameters

data – the data matrix

Returns

the scatter matrix of the given data matrix
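The equivalence between the first and last forms of the equation can be checked numerically. The sketch below is illustrative only: `scatter_matrix_sketch` is a hypothetical name, and it assumes observations are stored in rows and variables in columns.

```python
import numpy


def scatter_matrix_sketch(data):
    """Sum of outer products of the mean-centered observations.

    Hypothetical helper; assumes one observation per row.
    """
    data = numpy.asarray(data, dtype=float)
    centered = data - data.mean(axis=0)
    # centered.T @ centered sums the outer products of all centered rows.
    return centered.T @ centered


points = numpy.array([[1.0, 2.0],
                      [3.0, 6.0],
                      [5.0, 4.0]])
s = scatter_matrix_sketch(points)

# Last form of the equation: sum of x_j x_j^T minus n times the outer
# product of the mean vector with itself.
n = points.shape[0]
mean = points.mean(axis=0)
s_alt = points.T @ points - n * numpy.outer(mean, mean)

print(s.tolist())  # [[8.0, 4.0], [4.0, 8.0]]
```

Dividing `s` by `n - 1` gives the sample covariance matrix of `points`, which is the normalization step the scatter matrix omits.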