Miscellaneous Statistical Functions – catsim.stats

Miscellaneous statistical functions


Count the number of occurrences of each integer in a list or 1-D numpy.ndarray. If there are gaps between the numbers, the integers in those gaps are reported with 0 occurrences.

>>> bincount(numpy.array([-4, 0, 1, 1, 3, 2, 1, 5]))
array([1, 0, 0, 0, 1, 3, 1, 1, 0, 1], dtype=int32)
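
The counting behaviour shown above can be approximated with plain NumPy. This is a minimal sketch, not catsim's implementation: the helper name `bincount_with_gaps` is hypothetical, and it assumes that the first position of the result corresponds to the smallest value in the input, as the example output suggests.

```python
import numpy as np


def bincount_with_gaps(values):
    """Count occurrences of every integer from min(values) to max(values).

    Hypothetical helper for illustration; assumes integer input.
    """
    values = np.asarray(values, dtype=int)
    # numpy.bincount rejects negative numbers, so shift the data so that
    # the smallest value maps to index 0.
    lowest = values.min()
    return np.bincount(values - lowest)


counts = bincount_with_gaps([-4, 0, 1, 1, 3, 2, 1, 5])
print(counts)  # one entry per integer from -4 to 5
```

Position `i` of the result holds the count for the integer `min(values) + i`, so integers absent from the input appear as zeros.
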
catsim.stats.coef_variation(x: numpy.ndarray, axis: int = 0) → numpy.ndarray[source]

Calculates the coefficient of variation of the rows or columns of a matrix. The coefficient of variation is given by the standard deviation divided by the mean of a variable:

\[c_{v}=\frac{\sigma}{\mu}\]

Return type: ndarray
Parameters:
  • x (ndarray) – the data matrix
  • axis (int) – 0 to calculate for columns, 1 for rows
Returns: a vector containing the coefficients of variation along the chosen axis
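
As a rough illustration of the definition above, the following sketch divides the standard deviation by the mean along the chosen axis. The function name `coef_variation_sketch` is hypothetical, and the use of the population standard deviation (NumPy's default, `ddof=0`) is an assumption; the documentation does not say which estimator catsim uses.

```python
import numpy as np


def coef_variation_sketch(x, axis=0):
    """Standard deviation divided by the mean along `axis`.

    Hypothetical helper; assumes the population standard deviation (ddof=0).
    """
    x = np.asarray(x, dtype=float)
    return x.std(axis=axis) / x.mean(axis=axis)


data = np.array([[1.0, 10.0],
                 [3.0, 10.0]])

cv_columns = coef_variation_sketch(data, axis=0)  # one value per column
cv_rows = coef_variation_sketch(data, axis=1)     # one value per row
print(cv_columns)
print(cv_rows)
```

The second column is constant, so its coefficient of variation is zero. A mean of zero along the chosen axis makes the ratio undefined.
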

catsim.stats.covariance(x: numpy.ndarray, minus_one: bool = True)[source]

Calculates the covariance matrix of a data matrix

Parameters:
  • x (ndarray) – a data matrix
  • minus_one (bool) – whether to subtract one from the total number of observations, dividing by n − 1 instead of n

>>> from sklearn.datasets import load_iris
>>> x = load_iris()['data']
>>> print(numpy.array_equal(covariance(x), numpy.cov(x.T)))
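
The role of `minus_one` can be sketched with NumPy alone. This is an illustrative sketch under stated assumptions, not catsim's code: the name `covariance_sketch` is hypothetical, rows are taken to be observations and columns variables (consistent with the comparison against `numpy.cov(x.T)` above), and random data stands in for the iris dataset.

```python
import numpy as np


def covariance_sketch(x, minus_one=True):
    """Covariance of the columns of `x`, with rows as observations.

    Hypothetical helper: divides by n - 1 when `minus_one` is True, else by n.
    """
    x = np.asarray(x, dtype=float)
    n = x.shape[0]
    centered = x - x.mean(axis=0)  # subtract each column's mean
    divisor = n - 1 if minus_one else n
    return centered.T @ centered / divisor


rng = np.random.default_rng(0)
sample = rng.normal(size=(50, 3))

# numpy.cov expects variables in rows, hence the transpose.
matches_unbiased = np.allclose(covariance_sketch(sample), np.cov(sample.T))
matches_biased = np.allclose(
    covariance_sketch(sample, minus_one=False), np.cov(sample.T, bias=True)
)
print(matches_unbiased, matches_biased)
```

The comparison uses `numpy.allclose` rather than exact equality, because two different orders of floating-point operations need not agree bit for bit.
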
catsim.stats.scatter_matrix(data: numpy.ndarray) → numpy.ndarray[source]

Calculates the scatter matrix of a data matrix. The scatter matrix is an unnormalized version of the covariance matrix: the mean of each variable is subtracted from the observations, but the result is not divided by the number of observations.

The scatter matrix is computed according to the following equation:

\[S=\sum_{j=1}^{n}(\mathbf{x}_{j}-\overline{\mathbf{x}})(\mathbf{x}_{j}-\overline{\mathbf{x}})^{T}=\sum_{j=1}^{n}(\mathbf{x}_{j}-\overline{\mathbf{x}})\otimes(\mathbf{x}_{j}-\overline{\mathbf{x}})=\left(\sum_{j=1}^{n}\mathbf{x}_{j}\mathbf{x}_{j}^{T}\right)-n\overline{\mathbf{x}}\,\overline{\mathbf{x}}^{T}\]
Return type: ndarray
Parameters: data (ndarray) – the data matrix
Returns: the scatter matrix of the given data matrix
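
The equation for S can be checked numerically. The sketch below is not catsim's implementation: `scatter_matrix_sketch` is a hypothetical name, and it assumes that each row of the data matrix is one observation. It computes the first form of the equation and compares it with the last form.

```python
import numpy as np


def scatter_matrix_sketch(data):
    """Sum of outer products of the mean-centered observations (rows).

    Hypothetical helper for illustration only.
    """
    data = np.asarray(data, dtype=float)
    centered = data - data.mean(axis=0)
    # centered.T @ centered equals the sum over j of the outer products
    # (x_j - mean)(x_j - mean)^T.
    return centered.T @ centered


sample = np.array([[1.0, 2.0],
                   [3.0, 6.0],
                   [5.0, 4.0]])

scatter = scatter_matrix_sketch(sample)

# Last form of the equation: sum of x_j x_j^T minus n * mean * mean^T.
n = sample.shape[0]
mean = sample.mean(axis=0)
alternative = sample.T @ sample - n * np.outer(mean, mean)

print(scatter)
print(np.allclose(scatter, alternative))
```

Dividing `scatter` by `n - 1` gives the sample covariance matrix, which is the sense in which the scatter matrix is unnormalized.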