Statistics


1. Average

  • Average is an umbrella term for a single number that best represents a set of data, usually one of the measures of central tendency.

1.1. Central Tendency

1.1.1. Via Solution to Variational Problems

Dispersion precedes location

Several measures of central tendency can be characterized as solving a variational problem, in the sense of the calculus of variations. However, this center may not be unique.

| \(L^p\)      | dispersion                 | central tendency |
|--------------|----------------------------|------------------|
| \(L^0\)      | variation ratio            | mode             |
| \(L^1\)      | average absolute deviation | median           |
| \(L^2\)      | standard deviation         | mean             |
| \(L^\infty\) | maximum deviation          | midrange         |

The dispersion of a vector \(\mathbf{x} = (x_1, \dots, x_n)\) about a point \(\mathbf{c} = (c,c,\dots, c)\) is the distance between them in the sense of the (normalized) \(p\)-norm: \[ \|\mathbf{x} - \mathbf{c}\|_p := \left(\frac{1}{n}\sum_{i=1}^n |x_i - c |^p\right)^{1/p}. \] Minimizing this over \(c\) yields the corresponding measure of central tendency, as in the sketch below.
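
A minimal numerical sketch (assuming NumPy and SciPy are available; the helper name `dispersion` is made up for illustration) that minimizes the \(L^p\) dispersion over \(c\) and recovers the centers from the table above:

```python
import numpy as np
from scipy.optimize import minimize_scalar

x = np.array([1.0, 2.0, 2.0, 3.0, 10.0])

def dispersion(c, p):
    """Normalized p-norm distance between x and the constant vector (c, ..., c)."""
    return np.mean(np.abs(x - c) ** p) ** (1 / p)

for p, expected in [(1, np.median(x)), (2, x.mean())]:
    res = minimize_scalar(dispersion, args=(p,), bounds=(x.min(), x.max()), method="bounded")
    print(f"L^{p} minimizer ~ {res.x:.4f}, expected {expected:.4f}")

# The limiting cases: L^0 gives the mode, L^inf the midrange.
print("midrange =", (x.min() + x.max()) / 2)
```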

1.2. Mean

See mean.

1.3. Mode

Most frequent element

1.4. Median

1.4.1. Definition

  • The middle value of a data set: it separates the lower half from the upper half.

1.4.2. Calculation

1.4.2.1. Set
  • For an ordered set \(\{a_1, a_2, \dots, a_n\}\) in which \(a_1\le a_2\le \dots\le a_n\), the median is
    • \(a_{(n+1)/2}\) when \(n\) is odd
    • \((a_{n/2}+a_{n/2+1})/2\) when \(n\) is even.
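
A minimal plain-Python sketch of this case split:

```python
def median(sorted_values):
    """Median of an already-sorted sequence."""
    n = len(sorted_values)
    if n % 2 == 1:                       # odd: a_{(n+1)/2} in 1-indexed terms
        return sorted_values[n // 2]
    mid = n // 2                         # even: average the two middle elements
    return (sorted_values[mid - 1] + sorted_values[mid]) / 2

print(median([1, 2, 3]))      # 2
print(median([1, 2, 3, 4]))   # 2.5
```
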
1.4.2.2. Function

The median of a function \(f: [a,b]\to \mathbb{R}\) is \(t_0\) which minimizes \[ g(t)=\int_a^b|f(x)-t|\,dx. \]

1.5. Geometric Median

  • Spatial Median, Euclidean Minisum Point, Torricelli Point, 1-Median

1.5.1. Definition

\[ \operatorname*{arg\,min}_{y\in \mathbb{R}^n}\sum_{i=1}^m\| x_i - y\|_2 \]
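
The geometric median has no closed form in general; a standard iterative scheme is Weiszfeld's algorithm, an iteratively re-weighted average of the points. A minimal NumPy sketch (glossing over the degenerate case where an iterate coincides with a data point):

```python
import numpy as np

def geometric_median(points, iters=200, eps=1e-9):
    """Weiszfeld's algorithm: repeatedly average the points with weights
    inversely proportional to their distance from the current estimate."""
    y = points.mean(axis=0)                       # start from the centroid
    for _ in range(iters):
        d = np.linalg.norm(points - y, axis=1)
        d = np.maximum(d, eps)                    # guard against division by zero
        w = 1.0 / d
        y_new = (w[:, None] * points).sum(axis=0) / w.sum()
        if np.linalg.norm(y_new - y) < eps:
            break
        y = y_new
    return y

pts = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
print(geometric_median(pts))   # the Torricelli (Fermat) point of this triangle
```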

1.6. Midrange

  • The arithmetic mean of the maximum and minimum

1.7. Midhinge

  • The arithmetic mean of the first and third quartiles

2. Mean

2.1. Intuition

  • A mean is defined relative to the notion of a total appropriate to the specific problem.
  • A mean is a single value such that, if it replaced every data point, the total would come out the same.
    • This is the idea of the Chisini mean (see 2.20).
  • The arithmetic mean and its variations arise as solutions to optimization problems over measures of (statistical) variation.

2.2. Minimum and Maximum

2.3. Arithmetic Mean

  • Often simply, Mean
  • \[ m_1 = \frac{x_1 + x_2 + \cdots + x_n}{n} \]

2.4. Geometric Mean

2.4.1. Properties

  • It is equal to the arithmetic-harmonic mean, the common limit of the sequences \(a_i\) and \(h_i\):
    • \[ a_0 = x, h_0 = y\\ a_{n+1} = \mathrm{AM}(a_n, h_n), h_{n+1} = \mathrm{HM}(a_n,h_n) \]
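
A quick plain-Python sketch verifying that the AM–HM iteration converges to \(\sqrt{xy}\):

```python
def am(a, b): return (a + b) / 2
def hm(a, b): return 2 * a * b / (a + b)

def arithmetic_harmonic_mean(x, y, iters=30):
    a, h = x, y
    for _ in range(iters):
        a, h = am(a, h), hm(a, h)
    # The product a * h is invariant under the iteration (AM * HM = a * h),
    # so the common limit is sqrt(x * y).
    return a

print(arithmetic_harmonic_mean(2.0, 8.0))   # 4.0 == sqrt(2 * 8)
```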

2.5. Spherical Mean

2.5.1. Definition

  • For a continuous function \(u\colon U \to \mathbb{F}\), with \(U\) an open subset of the Euclidean space \(\mathbb{R}^n\) and \(\mathbb{F}\) either the real or complex numbers,
  • the spherical mean over the sphere of radius \(r\) centered at \(x\) is defined by:
    • \[ \frac{1}{\omega_{n-1}(r)}\int_{\partial B(x,r)} u(y)\,\mathrm{d}^{n-1}y \]
      • where \(B(x,r)\subset U\), \(\mathrm{d}^{n-1}y\) is the spherical measure, and \(\omega_{n-1}(r)\) is the surface area of the \((n-1)\)-sphere of radius \(r\).
  • The spherical mean is often denoted as:
    • \[ \raisebox{.4em}{\underline{\smash{\raisebox{-.4em}{\displaystyle\int}}}}\raisebox{-1em}{\scriptstyle\partial B(x,r)} u(y)\,\mathrm{d}S(y) \]
    • This notation is sometimes also used for the Cauchy principal value.

2.6. Arithmetic-Geometric Mean

  • AGM

2.6.1. Definition

  • It is the common limit of the sequences \(a_i\) and \(g_i\):
    • \[ a_0 = x, g_0 = y\\ a_{n+1} = \mathrm{AM}(a_n, g_n), g_{n+1} = \mathrm{GM}(a_n,g_n) \]
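
A minimal sketch of the iteration (plain Python); convergence is quadratic, so a handful of steps suffice:

```python
import math

def agm(x, y, tol=1e-12):
    a, g = x, y
    while abs(a - g) > tol:
        a, g = (a + g) / 2, math.sqrt(a * g)   # AM and GM of the previous pair
    return a

print(agm(1.0, 2.0))   # ~1.456791...
```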

2.7. Quadratic Mean

  • Root Mean Square (RMS)

2.8. Relations

  • \[ \min(\mathbf{x}) \le \mathrm{HM}(\mathbf{x}) \le \mathrm{GM}(\mathbf{x}) \le \mathrm{LM}(\mathbf{x}) \le \mathrm{AM}(\mathbf{x}) \le \mathrm{QM}(\mathbf{x}) \le \mathrm{CM}(\mathbf{x}) \le \max(\mathbf{x}), \] where equality holds if and only if all the variables are equal.

[Figure: visual proof of the QM–AM–GM–HM inequality]

  • \[ \mathrm{AM}(a,b)\cdot \mathrm{HM}(a,b) = \mathrm{GM}(a,b)^2 \]
  • \[ \mathrm{GM}(\mathrm{AM}(a,b), \mathrm{HM}(a,b)) = \mathrm{GM}(a,b) \]
  • \[ \mathrm{AM}(\mathrm{HM}(a,b), \mathrm{CM}(a,b)) = \mathrm{AM}(a,b) \]

2.9. Elementary Symmetric Mean

For a sequence of nonnegative real numbers \((a_i)_{i=1}^n\), the elementary symmetric means \(S_k\) are given by: \[ S_k = \frac{e_k}{\binom{n}{k}}. \]

The numerator is the elementary symmetric polynomial \(e_k\), and the denominator \(\binom{n}{k}\) is the number of its terms.
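
A brute-force sketch (Python standard library only) computing \(S_k\) directly from the definition:

```python
from itertools import combinations
from math import comb, prod

def elementary_symmetric_mean(a, k):
    e_k = sum(prod(c) for c in combinations(a, k))   # elementary symmetric polynomial
    return e_k / comb(len(a), k)                     # divided by its number of terms

a = [1, 2, 3, 4]
print([elementary_symmetric_mean(a, k) for k in range(1, 5)])
# S_1 is the arithmetic mean; S_n is the product, so S_n ** (1/n) is the geometric mean.
```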

2.10. Newton's Inequalities

2.10.1. Statement

  • \[ S_{k-1}S_{k+1} \le S_k^2 \]
  • with equality if and only if all the numbers \(a_i\) are equal.

2.11. Maclaurin's Inequality

2.11.1. Statement

  • \[ S_1 \ge \sqrt{S_2} \ge \sqrt[3]{S_3} \ge \cdots \ge \sqrt[n]{S_n} \]
  • with equality if and only if all the \(a_i\) are equal.
  • The case \(n=2\) is the AM–GM inequality.

2.12. Bernoulli's Inequality

2.12.1. Statement

  • \((1+x)^r \ge 1+rx\)
    1. for every integer \(r\ge 1\) and real number \(x\ge -1\), with strict inequality if \(x\neq 0\) and \(r\ge 2\);
    2. for every integer \(r\ge 0\) and real number \(x\ge -2\);
    3. for every even integer \(r\ge 0\) and every real number \(x\);
    4. for every real number \(r\ge 1\) and \(x\ge -1\), with strict inequality if \(x\neq 0\) and \(r\neq 1\);
    5. with the inequality reversed (\((1+x)^r \le 1+rx\)) for every real number \(0\le r\le 1\) and \(x\ge -1\).

2.13. Harmonic Mean

2.13.1. Definition

\[ m_{-1} = \frac{n}{\dfrac{1}{x_1}+\dfrac{1}{x_2}+\cdots +\dfrac{1}{x_n}} \]

2.13.2. Interpretation

2.13.2.1. Using Graph

2.13.2.2. Using Average

An average of averages that share the same numerator. For example, consider the average speed over a distance \(2d\), where the first \(d\) is traveled at speed \(v_1\) and the remaining \(d\) at speed \(v_2\); the overall average speed is \[ \bar{v} = \frac{2}{\dfrac{1}{v_1}+\dfrac{1}{v_2}}. \]

On the other hand, the average of averages with the same denominator would just be the arithmetic mean.
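
A two-line numerical check of the speed example (plain Python):

```python
v1, v2, d = 30.0, 60.0, 100.0            # the two speeds and the half-distance
total_time = d / v1 + d / v2
print(2 * d / total_time)                 # 40.0: average speed over the whole trip
print(2 / (1 / v1 + 1 / v2))              # 40.0: harmonic mean of v1 and v2
```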

2.14. Pythagorean Mean

2.14.1. Para-Axioms

  • First-Order Homogeneity: \(\mathrm{M}(bx_1, \dots, bx_n) = b\mathrm{M}(x_1,\dots, x_n)\)
  • Total Symmetry: \(\mathrm{M}(\dots, x_i,\dots,x_j,\dots) = \mathrm{M}(\dots, x_j,\dots,x_i,\dots)\)
  • Monotonicity (in all variables): \(a\le b \implies \mathrm{M}(a,x_2,\dots,x_n) \le \mathrm{M}(b,x_2,\dots, x_n)\)
  • Idempotence: \(\forall x, \mathrm{M}(x,x,\dots,x) = x\)

2.15. Contraharmonic Mean

2.15.1. Definition

  • For positive real numbers \(x_1, \dots, x_n\),
  • \[ \mathrm{C}(x_1, \dots,x_n) = \frac{\frac1n(x_1^2 + \cdots + x_n^2)}{\frac1n(x_1+\cdots + x_n)} \]

2.15.2. Properties

  • \[ \mathrm{AM}(\mathrm{HM}(a,b), \mathrm{CM}(a,b)) = \mathrm{AM}(a,b) \]

2.16. Fréchet Mean

2.16.1. Fréchet Variance

  • For a complete metric space \((M, d)\) and points \(x_1, \dots, x_N \in M\), the Fréchet variance is:
    • \[ \Psi(p) := \sum_{i=1}^N d(p,x_i)^2 \]

2.16.2. Definition

  • Karcher Means
    • \[ m = \operatorname*{arg\,min}_{p\in M}\sum_{i=1}^Nd(p,x_i)^2 \]
  • If there is a unique \(m\) that strictly minimizes \(\Psi\), then it is the Fréchet mean.

2.17. Generalized Mean

  • Power Mean

2.17.1. Definition

  • For a nonzero real number \(p\) and positive real numbers \(x_1,\dots, x_n\), the generalized mean with exponent \(p\) is:
    • \[ m_p(\mathbf{x}) = \left(\frac{1}{n}\sum_{i=1}^n x_i^p\right)^{1/p} \]
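
A small sketch (assuming NumPy) computing \(m_p\) for several exponents; the values increase with \(p\), illustrating the inequality in the next subsection. The case \(p=0\) is taken as the limiting geometric mean.

```python
import numpy as np

def power_mean(x, p):
    x = np.asarray(x, dtype=float)
    if p == 0:                             # limit p -> 0: geometric mean
        return np.exp(np.mean(np.log(x)))
    return np.mean(x ** p) ** (1 / p)

x = [1.0, 4.0, 16.0]
for p in (-1, 0, 1, 2):                    # harmonic, geometric, arithmetic, quadratic
    print(p, power_mean(x, p))
```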

2.17.2. Special Cases

2.17.3. General Mean Inequality

  • If \(p < q\), then \(m_p(\mathbf{x}) \le m_q(\mathbf{x})\), with equality if and only if all the \(x_i\) are equal.

2.18. Quasi-Arithmetic Mean

  • Generalized \(f\)-Mean, Kolmogorov–Nagumo–de Finetti Mean, Kolmogorov Mean

2.18.1. Definition

For an injective continuous function \(f\colon I\to \mathbb{R}\) defined on an interval \(I\), \[ M_f(\mathbf{x}) = f^{-1}\left(\frac{1}{n}\sum_{i=1}^nf(x_i)\right). \]

2.18.2. LogSumExp

  • RealSoftMax(LSE), Multivariable Softplus

The quasi-arithmetic mean with \( f = \exp \) (up to an additive \(\log n\)); it smoothly approximates the maximum function: \[ \mathrm{LSE}(x_1,\dots,x_n) := \log(\exp(x_1)+\cdots+\exp(x_n)) \]
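
In floating point, the defining formula overflows for large inputs; the standard remedy is to factor out the maximum. A minimal NumPy sketch:

```python
import numpy as np

def logsumexp(x):
    x = np.asarray(x, dtype=float)
    m = x.max()                                   # shift so the exponentials stay bounded
    return m + np.log(np.sum(np.exp(x - m)))

x = np.array([1000.0, 1000.5, 999.0])
print(logsumexp(x))                               # finite, slightly above max(x)
# np.log(np.sum(np.exp(x))) would overflow to inf here.
```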

2.18.3. Special Cases

2.18.4. Properties

  • If \( f \) is convex and increasing, Jensen's inequality \( f(\mathrm{AM}(\mathbf{x})) \le \mathrm{AM}(f(\mathbf{x})) \) gives \( M_f(\mathbf{x}) \ge \mathrm{AM}(\mathbf{x}) \).

2.19. Heronian Mean

2.19.1. Definition

  • \[ H = \frac{1}{3}(a + \sqrt{ab} + b) \]

2.19.2. Properties

  • The volume of a frustum is the product of its height and the Heronian mean of the areas of the two opposing parallel faces.

2.20. Chisini Mean

  • Substitution Mean

2.20.1. Definition

  • A function \(f\) of \(n\) variables gives rise to a Chisini mean \(M\) if, for every vector \((x_1,\dots, x_n)\), there exists a unique \(M\) such that:
    • \[ f(M, M,\cdots, M) = f(x_1, x_2, \dots, x_n). \]

2.20.2. Special Cases

2.21. Lehmer Mean

2.21.1. Definition

  • \[ L_p(\mathbf{x}) = \frac{\sum_{k=1}^nx_k^p}{\sum_{k=1}^nx_k^{p-1}} \]

2.21.2. Weighted Lehmer Mean

  • \[ L_{p,w}(\mathbf{x}) = \frac{\sum_{k=1}^nw_kx_k^p}{\sum_{k=1}^nw_kx_k^{p-1}} \]

2.21.3. Special Cases

2.22. Heinz Mean

2.22.1. Definition

  • For two non-negative numbers \(a, b\),
  • \[ \mathrm{H}_x(a,b) = \frac{a^xb^{1-x} + a^{1-x}b^x}{2} \]
    • with \(0\le x\le 1/2\).

2.22.2. Properties

  • It interpolates between the arithmetic (\(x=0\)) and geometric (\(x=1/2\)) means.

2.23. a-Mean

2.23.1. Definition

  • For any real vector \(\mathbf{a} = (a_1,\dots, a_n)\), the \(\mathbf{a}\)-mean \([a]\) of positive real numbers \(x_1, \dots, x_n\) is:
    • \[ [a] := \frac{1}{n!}\sum_{\sigma\in S_n}x_{\sigma(1)}^{a_1}\cdots x_{\sigma(n)}^{a_n} \]

2.23.2. Special Cases

2.23.3. Muirhead's Inequality

\[ [a] \le [b] \iff a=Pb \] where \(P\) is some doubly stochastic matrix (equivalently, \(a\) is majorized by \(b\)). Equality holds if and only if \(a=b\) or all the \(x_i\) are equal.

2.24. Logarithmic Mean

2.24.1. Definition

  • For two positive real numbers \(x, y\): \[ \mathrm{LM}(x,y) := \lim_{(\xi,\eta)\to (x,y)} \frac{\eta - \xi}{\ln\eta - \ln\xi} \]
    • The limit form extends the expression \(\frac{y-x}{\ln y - \ln x}\) continuously to \(x = y\).

2.24.2. Properties

  • \(\mathrm{GM} \le \mathrm{LM} \le \mathrm{AM}\)

2.25. Identric Mean

  • For two positive real numbers \(x,y\):
  • \[ I(x,y) := \frac{1}{e}\cdot \lim_{(\xi,\eta)\to (x,y)}\left(\frac{\xi^\xi}{\eta^\eta}\right)^{\frac{1}{\xi-\eta}} = \lim_{(\xi,\eta)\to (x,y)}\exp\left(\frac{\xi\ln\xi - \eta\ln\eta}{\xi-\eta}-1\right) \]
  • Considering the function \(x\mapsto x\ln x\), this amounts to taking the slope of a secant line and applying the inverse of the derivative.

2.26. Stolarsky Mean

2.26.1. Definition

  • For \(0 < a < b\), \[ S_p(a,b) = \left(\frac{b^p - a^p}{p(b-a)}\right)^{\frac{1}{p-1}} \]

2.26.2. Properties

  • \[ S_p(a,b)=f'^{-1}\left(\frac{f(b)-f(a)}{b-a}\right) \]
    • where \(f(x)=x^p\). Here, \(S_p(a,b)\) is guaranteed to be in \((a,b)\) by the mean value theorem.
    • That is: compute the average rate of change of \(x^p\) on the interval \((a,b)\), then find the point whose instantaneous rate of change equals it.
  • \[ S_2(a,b)=\frac{a+b}{2}. \]
  • \[ S_{-1}(a,b) = \sqrt{ab}\, . \]

2.26.3. Special Cases

2.27. Circular Mean

2.27.1. Definition

\[ \bar{\alpha} = \operatorname{atan2}\left(\frac{1}{n}\sum_{j=1}^n\sin\alpha_j, \frac{1}{n}\sum_{j=1}^n\cos\alpha_j\right) \]
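
A direct transcription into Python, with the classic wrap-around example where the naive arithmetic mean fails:

```python
import math

def circular_mean(angles):
    s = sum(math.sin(a) for a in angles) / len(angles)
    c = sum(math.cos(a) for a in angles) / len(angles)
    return math.atan2(s, c)

# Naively averaging 350 deg and 10 deg gives 180 deg; the circular mean gives ~0 deg.
angles = [math.radians(350), math.radians(10)]
print(math.degrees(circular_mean(angles)))   # ~0.0
```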

2.28. Cesàro Summation

  • Cesàro Mean, Cesàro Limit

2.28.1. Definition

For a series \(\sum_{n=1}^\infty a_n\) with partial sums \(s_k = \sum_{n=1}^k a_n\), the series is called Cesàro summable if the arithmetic mean of its first \(n\) partial sums tends to a finite limit: \[ \lim_{n\to \infty} \frac{1}{n}\sum_{k=1}^ns_k = A < \infty \]
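
The standard example is Grandi's series \(1 - 1 + 1 - 1 + \cdots\), which diverges but is Cesàro summable to \(1/2\). A quick numerical sketch:

```python
n_terms = 10_000
s = 0            # running partial sum s_k
total = 0.0      # running sum of the partial sums
for k in range(1, n_terms + 1):
    s += (-1) ** (k + 1)
    total += s
print(total / n_terms)   # -> 0.5
```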

2.29. Hölder Summation

For a series with partial sums \(s_n\), let \(H_n^0 := s_n\), and define \(H_n^{k+1}\) as the arithmetic mean of the first \(n\) terms of \((H_m^k)_m\): \[ H_n^{k+1} := \frac{1}{n}\sum_{m=1}^n H_m^k. \] If the limit \[ \lim_{n\to \infty}H_n^k \] exists for some \(k\), it is called the Hölder sum, or the \((H,k)\) sum, of the series.

The series is called Hölder summable if it is \((H,k)\) summable for some \(k\).

3. Expectation

  • Expected Value, Expectancy, Expectation Operator, Mathematical Expectation, Mean, Expectation Value, First Moment
  • Generalization of weighted average

3.1. Definition

For a random variable \(X\) defined on a probability space \((\Omega, \Sigma, \mathrm{P})\): \[ \operatorname{E}[X] := \int_\Omega X\,\mathrm{dP}. \]

3.2. Properties

  • \(\mathrm{E}[X+Y] = \mathrm{E}[X]+\mathrm{E}[Y]\)
  • \(\mathrm{E}[aX] = a\mathrm{E}[X]\)
  • \(\mathrm{E}[XY] = \mathrm{E}[X]\mathrm{E}[Y] + \mathrm{Cov}[X,Y]\)
  • \[ \mathrm{E}[XY] = \mathrm{E}[\mathrm{E}[XY\mid Y]] = \mathrm{E}[Y\cdot\mathrm{E}[X\mid Y]] \]

3.3. Law of Total Expectation

  • Law of Iterated Expectations (LIE), Adam's Law, Tower Rule, Smoothing Property of Conditional Expectation

For two random variables \( X, Y \) on the same probability space, given that \( \mathrm{E}[X] \) exists, \[ \mathrm{E}[X] = \mathrm{E}[\mathrm{E}[X|Y]] \] where, on the right-hand side, the inner expectation is taken over \( X \) conditioned on \( Y \), and the outer expectation is taken over \( Y \).

3.4. Jensen's Inequality

For a convex function \(f:\mathbb{R} \to \mathbb{R}\), and a random variable \(X\) with finite expectation, \[ f(\mathrm{E}[X]) \le \mathrm{E}[f(X)]. \]

4. Variance

4.1. Law of Total Variance

  • Variance Decomposition Formula, Conditional Variance Formula, Law of Iterated Variances

For two random variables \( X, Y \) on the same probability space, with \( Y \) having a finite variance, \[ \mathrm{Var}[Y] = \mathrm{E}[\mathrm{Var}[Y | X]] + \mathrm{Var}[\mathrm{E}[Y|X]]. \]

Intuitively, one can think of the observable \( Y \) being chunked up around each discrete \( X \), so that the total variance of \( Y \) is the sum of

  • the average variance within the chunks, and
  • the variance in the position of chunks.
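
A simulation sketch (assuming NumPy; the group structure is made up for illustration) checking the decomposition on data grouped by a discrete \( X \):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1_000_000
x = rng.integers(0, 3, size=n)                    # discrete group label X
y = rng.normal(loc=2.0 * x, scale=1.0 + x)        # Y: mean and spread depend on X

# The three groups are (almost) equally likely, so unweighted averages suffice.
within = np.mean([y[x == g].var() for g in range(3)])     # E[Var[Y | X]]
between = np.var([y[x == g].mean() for g in range(3)])    # Var[E[Y | X]]
print(y.var(), within + between)                          # approximately equal
```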

5. Covariance

5.1. Definition

The covariance of two random variables \(X, Y\) is: \[ \operatorname{Cov}[X, Y] := \operatorname{E}[(X - \operatorname{E}[X])(Y - \operatorname{E}[Y])]. \]

5.2. Properties

  • \[ \operatorname{Var}[X+Y] = \operatorname{Var}[X]+\operatorname{Var}[Y] + 2\operatorname{Cov}[X, Y]. \]
  • Covariance is zero if the random variables are independent; the converse does not hold in general.

5.3. Law of Total Covariance

  • Covariance Decomposition Formula, Conditional Covariance Formula

For random variables \( X, Y, Z \) defined on the same probability space, with the \( \mathrm{Cov}[X,Y] \) being finite, \[ \mathrm{Cov}[X,Y] = \mathrm{E}[\mathrm{Cov}[X,Y|Z]] + \mathrm{Cov}[\mathrm{E}[X|Z], \mathrm{E}[Y|Z]]. \]

5.4. Covariance Matrix

  • Auto-Covariance Matrix, Dispersion Matrix, Variance Matrix, Variance-Covariance Matrix
  • \(\mathbf{K}_{\mathbf{XX}}\), \(\Sigma\), \(S\)

For a vector of random variables \(\mathbf{X} = (X_1, X_2, \dots, X_n)\) called random vector, the auto-covariance matrix is given by \[ \Sigma_{ij} = (\mathbf{K}_{\mathbf{X}\mathbf{X}})_{ij} := \operatorname{Cov}[X_i, X_j]. \]

5.5. Cross-Covariance Matrix

Given two random vectors \( \mathbf{X}, \mathbf{Y} \), the cross-covariance matrix of \( \mathbf{X} \) and \( \mathbf{Y} \) is given by: \( (\mathbf{K}_{\mathbf{XY}})_{ij} := \operatorname{Cov}[X_i, Y_j] = \operatorname{E}[(X_i - \operatorname{E}[X_i])(Y_j - \operatorname{E}[Y_j])]\).

5.6. Principal Component Analysis

  • PCA

The eigenvectors of the covariance matrix give the principal components (the orthogonal directions of greatest variance), and the corresponding eigenvalues give the variance along each of them. They can be used to analyze the shape of the joint probability distribution \( f_{X,Y}(x,y) \).

Being a linear method, it generally does not capture nonlinear structure in the data; a sketch follows.
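
A minimal sketch (assuming NumPy) of PCA as an eigendecomposition of the sample covariance matrix:

```python
import numpy as np

rng = np.random.default_rng(1)
data = rng.normal(size=(500, 2)) @ np.array([[2.0, 1.5], [0.0, 0.5]])  # correlated 2-D cloud

cov = np.cov(data, rowvar=False)            # sample covariance matrix
eigvals, eigvecs = np.linalg.eigh(cov)      # eigh returns ascending eigenvalues
order = np.argsort(eigvals)[::-1]
print("variance along each component:", eigvals[order])
print("principal directions (columns):")
print(eigvecs[:, order])
```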

6. Principal Component Analysis

7. Penalized Least Squares Criterion

Given data points \( \{(x_i, y_i)\}_i \), penalized least squares (PLS) looks for the function \( \hat{f} \) within a Hilbert space \( \mathcal{H} \) that best fits the data: \[ \hat{f} := \operatorname*{arg\,min}_{f\in \mathcal{H}} \left[ \frac{1}{n}\sum_i(y_i - f(x_i))^{2} + P(\| f\|^2) \right]. \]

7.1. Kimeldorf-Wahba Representer Theorem

The solution to the PLS problem can be written in terms of the kernel function \( K \): \[ \hat{f}(x) = \sum_{i=1}^n \beta_i K(x,x_i) \] where \( K \) is the reproducing kernel of the reproducing kernel Hilbert space (RKHS) \( \mathcal{H} \), which satisfies the reproducing property \[ \langle f, K(\cdot, x_i)\rangle = f(x_i). \]
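
As a concrete instance (a sketch under specific assumptions, not the general theorem): with the ridge penalty \( P(\|f\|^2) = \lambda\|f\|^2 \) and a Gaussian kernel, the coefficients solve the linear system \( (K + n\lambda I)\beta = y \), i.e. kernel ridge regression. The kernel choice and \( \gamma \) here are illustrative:

```python
import numpy as np

def rbf_kernel(a, b, gamma=1.0):
    """Gaussian kernel matrix between two sets of 1-D points."""
    return np.exp(-gamma * (a[:, None] - b[None, :]) ** 2)

rng = np.random.default_rng(2)
x = np.linspace(0, 2 * np.pi, 40)
y = np.sin(x) + 0.1 * rng.normal(size=x.size)

lam = 1e-2
K = rbf_kernel(x, x)
beta = np.linalg.solve(K + x.size * lam * np.eye(x.size), y)  # (K + n*lambda*I) beta = y

x_test = np.linspace(0, 2 * np.pi, 5)
print(rbf_kernel(x_test, x) @ beta)    # f(x) = sum_i beta_i K(x, x_i), close to sin(x_test)
```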

8. Kelly Criterion

How to size a sequence of bets so as to maximize the long-term growth rate of wealth.

8.1. Heuristic Proof

Start with wealth \( W_0 \), and stake a fraction \( f \) of the current wealth on each bet, hoping to gain \( bf \) of the wealth at the risk of losing \( af \). The wealth after \( n \) bets with \( k \) wins is \[ W_n = W_0(1+fb)^k(1-af)^{n-k}. \]

Now comes the heuristic part, where we maximize the expected value of the logarithm of the geometric growth rate \( r \):

\begin{align*} \mathrm{E}[\log(r)] &= \mathrm{E} \left[ \frac{1}{n} \log \frac{W_n}{W_0} \right] \\ &= \mathrm{E} \left[ \frac{k}{n} \log (1 + bf) + \left( 1 - \frac{k}{n} \right) \log (1- af) \right] \\ &= p\log (1+ bf) + q \log (1 - af). \end{align*}

where \( q = 1 - p\) is the probability of losing. Taking the derivative with respect to \( f \) and setting it equal to zero, \[ \frac{pb}{1+bf^{*}} - \frac{qa}{1-af^{ *}} = 0 \] for the optimal value \( f^{*} \). After rearranging, we obtain \[ f^{*} = \frac{p}{a} - \frac{q}{b}. \]
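
A quick numerical check (assuming NumPy) that \( f^{*} = p/a - q/b \) maximizes the expected log growth:

```python
import numpy as np

p, q = 0.6, 0.4           # win and loss probabilities
b, a = 1.0, 1.0           # fraction gained on a win, fraction lost on a loss
f_star = p / a - q / b    # Kelly fraction: 0.2

def expected_log_growth(f):
    return p * np.log(1 + b * f) + q * np.log(1 - a * f)

for f in (0.1, f_star, 0.3):
    print(f, expected_log_growth(f))   # the middle value attains the maximum
```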

9. Reference

Created: 2025-06-18 Wed 02:21