Statistics


1. Average

  • Average is an umbrella term for a single number that best represents a set of data, usually one of the measures of central tendency.

1.1. Central Tendency

1.1.1. Via Solution to Variational Problems

Dispersion precedes location

Several measures of central tendency can be characterized as solving a variational problem, in the sense of the calculus of variations. However, this center may not be unique.

| \(L^p\)      | dispersion                 | central tendency |
|--------------|----------------------------|------------------|
| \(L^0\)      | variation ratio            | mode             |
| \(L^1\)      | average absolute deviation | median           |
| \(L^2\)      | standard deviation         | mean             |
| \(L^\infty\) | maximum deviation          | midrange         |

The dispersion of a vector \(\mathbf{x} = (x_1, \dots, x_n)\) about a point \(\mathbf{c} = (c,c,\dots, c)\) is the distance between them in the sense of the (normalized) \(p\)-norm: \[ \|\mathbf{x} - \mathbf{c}\|_p := \left(\frac{1}{n}\sum_{i=1}^n |x_i - c |^p\right)^{1/p}. \] Minimizing this over \(c\) yields the corresponding measure of central tendency, as in the sketch below.
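
A minimal numerical sketch (assuming NumPy and SciPy are available; the helper name `dispersion` is made up for illustration) that minimizes the \(L^p\) dispersion over \(c\) and recovers the centers from the table above:

```python
import numpy as np
from scipy.optimize import minimize_scalar

x = np.array([1.0, 2.0, 2.0, 3.0, 10.0])

def dispersion(c, p):
    """Normalized p-norm distance between x and the constant vector (c, ..., c)."""
    return np.mean(np.abs(x - c) ** p) ** (1 / p)

for p, expected in [(1, np.median(x)), (2, x.mean())]:
    res = minimize_scalar(dispersion, args=(p,), bounds=(x.min(), x.max()), method="bounded")
    print(f"L^{p} minimizer ~ {res.x:.4f}, expected {expected:.4f}")

# The limiting cases: L^0 gives the mode, L^inf the midrange.
print("midrange =", (x.min() + x.max()) / 2)
```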

1.2. Mean

See mean.

1.3. Mode

Most frequent element

1.4. Median

1.4.1. Definition

  • The middle value of a data set: it separates the lower half from the upper half.

1.4.2. Calculation

1.4.2.1. Set
  • For an ordered set \(\{a_1, a_2, \dots, a_n\}\) in which \(a_1\le a_2\le \dots\le a_n\), the median is
    • \(a_{(n+1)/2}\) when \(n\) is odd
    • \((a_{n/2}+a_{n/2+1})/2\) when \(n\) is even.
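
A minimal plain-Python sketch of this case split:

```python
def median(sorted_values):
    """Median of an already-sorted sequence."""
    n = len(sorted_values)
    if n % 2 == 1:                       # odd: a_{(n+1)/2} in 1-indexed terms
        return sorted_values[n // 2]
    mid = n // 2                         # even: average the two middle elements
    return (sorted_values[mid - 1] + sorted_values[mid]) / 2

print(median([1, 2, 3]))      # 2
print(median([1, 2, 3, 4]))   # 2.5
```
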
1.4.2.2. Function

The median of a function \(f: [a,b]\to \mathbb{R}\) is \(t_0\) which minimizes \[ g(t)=\int_a^b|f(x)-t|\,dx. \]

1.5. Geometric Median

  • Spatial Median, Euclidean Minisum Point, Torricelli Point, 1-Median

1.5.1. Definition

\[ \operatorname*{arg\,min}_{y\in \mathbb{R}^n}\sum_{i=1}^m\| x_i - y\|_2 \]
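
The geometric median has no closed form in general; a standard iterative scheme is Weiszfeld's algorithm, an iteratively re-weighted average of the points. A minimal NumPy sketch (glossing over the degenerate case where an iterate coincides with a data point):

```python
import numpy as np

def geometric_median(points, iters=200, eps=1e-9):
    """Weiszfeld's algorithm: repeatedly average the points with weights
    inversely proportional to their distance from the current estimate."""
    y = points.mean(axis=0)                       # start from the centroid
    for _ in range(iters):
        d = np.linalg.norm(points - y, axis=1)
        d = np.maximum(d, eps)                    # guard against division by zero
        w = 1.0 / d
        y_new = (w[:, None] * points).sum(axis=0) / w.sum()
        if np.linalg.norm(y_new - y) < eps:
            break
        y = y_new
    return y

pts = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
print(geometric_median(pts))   # the Torricelli (Fermat) point of this triangle
```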

1.6. Midrange

  • The arithmetic mean of the maximum and minimum

1.7. Midhinge

  • The arithmetic mean of the first and third quartiles

2. Mean

2.1. Intuition

  • A mean is defined relative to the notion of a total appropriate to the specific problem.
  • A mean is a single value such that, if it replaced every data point, the total would come out the same.
    • This is the idea of the Chisini mean (see 2.20).
  • The arithmetic mean and its variations arise as solutions to optimization problems over measures of (statistical) variation.

2.2. Minimum and Maximum

2.3. Arithmetic Mean

  • Often simply, Mean
  • \[ m_1 = \frac{x_1 + x_2 + \cdots + x_n}{n} \]

2.4. Geometric Mean

2.4.1. Properties

  • It is equal to the arithmetic-harmonic mean, the common limit of the sequences \(a_i\) and \(h_i\):
    • \[ a_0 = x, h_0 = y\\ a_{n+1} = \mathrm{AM}(a_n, h_n), h_{n+1} = \mathrm{HM}(a_n,h_n) \]
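
A quick plain-Python sketch verifying that the AM–HM iteration converges to \(\sqrt{xy}\):

```python
def am(a, b): return (a + b) / 2
def hm(a, b): return 2 * a * b / (a + b)

def arithmetic_harmonic_mean(x, y, iters=30):
    a, h = x, y
    for _ in range(iters):
        a, h = am(a, h), hm(a, h)
    # The product a * h is invariant under the iteration (AM * HM = a * h),
    # so the common limit is sqrt(x * y).
    return a

print(arithmetic_harmonic_mean(2.0, 8.0))   # 4.0 == sqrt(2 * 8)
```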

2.5. Spherical Mean

2.5.1. Definition

  • For a continuous function \(u\colon U \to \mathbb{F}\), with \(U\) an open subset of the Euclidean space \(\mathbb{R}^n\) and \(\mathbb{F}\) either the real or complex numbers,
  • the spherical mean over the sphere of radius \(r\) centered at \(x\) is defined by:
    • \[ \frac{1}{\omega_{n-1}(r)}\int_{\partial B(x,r)} u(y)\,\mathrm{d}^{n-1}y \]
      • where \(B(x,r)\subset U\), \(\mathrm{d}^{n-1}y\) is the spherical measure, and \(\omega_{n-1}(r)\) is the surface area of the \((n-1)\)-sphere of radius \(r\).
  • The spherical mean is often denoted as:
    • \[ \raisebox{.4em}{\underline{\smash{\raisebox{-.4em}{\displaystyle\int}}}}\raisebox{-1em}{\scriptstyle\partial B(x,r)} u(y)\,\mathrm{d}S(y) \]
    • This notation is sometimes also used for the Cauchy principal value.

2.6. Arithmetic-Geometric Mean

  • AGM

2.6.1. Definition

  • It is the common limit of the sequences \(a_i\) and \(g_i\):
    • \[ a_0 = x, g_0 = y\\ a_{n+1} = \mathrm{AM}(a_n, g_n), g_{n+1} = \mathrm{GM}(a_n,g_n) \]
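
A minimal sketch of the iteration (plain Python); convergence is quadratic, so a handful of steps suffice:

```python
import math

def agm(x, y, tol=1e-12):
    a, g = x, y
    while abs(a - g) > tol:
        a, g = (a + g) / 2, math.sqrt(a * g)   # AM and GM of the previous pair
    return a

print(agm(1.0, 2.0))   # ~1.456791...
```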

2.7. Quadratic Mean

  • Root Mean Square (RMS)

2.8. Relations

  • \[ \min(\mathbf{x}) \le \mathrm{HM}(\mathbf{x}) \le \mathrm{GM}(\mathbf{x}) \le \mathrm{LM}(\mathbf{x}) \le \mathrm{AM}(\mathbf{x}) \le \mathrm{QM}(\mathbf{x}) \le \mathrm{CM}(\mathbf{x}) \le \max(\mathbf{x}), \] where equality holds if and only if all the variables are equal.

[Figure: visual proof of the QM–AM–GM–HM inequality]

  • \[ \mathrm{AM}(a,b)\cdot \mathrm{HM}(a,b) = \mathrm{GM}(a,b)^2 \]
  • \[ \mathrm{GM}(\mathrm{AM}(a,b), \mathrm{HM}(a,b)) = \mathrm{GM}(a,b) \]
  • \[ \mathrm{AM}(\mathrm{HM}(a,b), \mathrm{CM}(a,b)) = \mathrm{AM}(a,b) \]

2.9. Elementary Symmetric Mean

For a sequence of nonnegative real numbers \((a_i)_{i=1}^n\), the elementary symmetric means \(S_k\) are given by: \[ S_k = \frac{e_k}{\binom{n}{k}}. \]

The numerator is the elementary symmetric polynomial \(e_k\), and the denominator \(\binom{n}{k}\) is the number of its terms.
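
A brute-force sketch (Python standard library only) computing \(S_k\) directly from the definition:

```python
from itertools import combinations
from math import comb, prod

def elementary_symmetric_mean(a, k):
    e_k = sum(prod(c) for c in combinations(a, k))   # elementary symmetric polynomial
    return e_k / comb(len(a), k)                     # divided by its number of terms

a = [1, 2, 3, 4]
print([elementary_symmetric_mean(a, k) for k in range(1, 5)])
# S_1 is the arithmetic mean; S_n is the product, so S_n ** (1/n) is the geometric mean.
```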

2.10. Newton's Inequalities

2.10.1. Statement

  • \[ S_{k-1}S_{k+1} \le S_k^2 \]
  • with equality if and only if all the numbers \(a_i\) are equal.

2.11. Maclaurin's Inequality

2.11.1. Statement

  • \[ S_1 \ge \sqrt{S_2} \ge \sqrt[3]{S_3} \ge \cdots \ge \sqrt[n]{S_n} \]
  • with equality if and only if all the \(a_i\) are equal.
  • The case \(n=2\) is the AM–GM inequality.

2.12. Bernoulli's Inequality

2.12.1. Statement

  • \((1+x)^r \ge 1+rx\)
    1. for every integer \(r\ge 1\) and real number \(x\ge -1\), with strict inequality if \(x\neq 0\) and \(r\ge 2\);
    2. for every integer \(r\ge 0\) and real number \(x\ge -2\);
    3. for every even integer \(r\ge 0\) and every real number \(x\);
    4. for every real number \(r\ge 1\) and \(x\ge -1\), with strict inequality if \(x\neq 0\) and \(r\neq 1\);
    5. with the inequality reversed (\((1+x)^r \le 1+rx\)) for every real number \(0\le r\le 1\) and \(x\ge -1\).

2.13. Harmonic Mean

2.13.1. Definition

\[ m_{-1} = \frac{n}{\dfrac{1}{x_1}+\dfrac{1}{x_2}+\cdots +\dfrac{1}{x_n}} \]

2.13.2. Interpretation

2.13.2.1. Using Graph

2.13.2.2. Using Average

An average of averages that share the same numerator. For example, consider the average speed over a distance \(2d\), where the first \(d\) is traveled at speed \(v_1\) and the remaining \(d\) at speed \(v_2\); the overall average speed is \[ \bar{v} = \frac{2}{\dfrac{1}{v_1}+\dfrac{1}{v_2}}. \]

On the other hand, the average of averages with the same denominator would just be the arithmetic mean.
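
A two-line numerical check of the speed example (plain Python):

```python
v1, v2, d = 30.0, 60.0, 100.0            # the two speeds and the half-distance
total_time = d / v1 + d / v2
print(2 * d / total_time)                 # 40.0: average speed over the whole trip
print(2 / (1 / v1 + 1 / v2))              # 40.0: harmonic mean of v1 and v2
```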

2.14. Pythagorean Mean

2.14.1. Para-Axioms

  • First-Order Homogeneity: \(\mathrm{M}(bx_1, \dots, bx_n) = b\mathrm{M}(x_1,\dots, x_n)\)
  • Total Symmetry: \(\mathrm{M}(\dots, x_i,\dots,x_j,\dots) = \mathrm{M}(\dots, x_j,\dots,x_i,\dots)\)
  • Monotonicity (in all variables): \(a\le b \implies \mathrm{M}(a,x_2,\dots,x_n) \le \mathrm{M}(b,x_2,\dots, x_n)\)
  • Idempotence: \(\forall x, \mathrm{M}(x,x,\dots,x) = x\)

2.15. Contraharmonic Mean

2.15.1. Definition

  • For positive real numbers \(x_1, \dots, x_n\),
  • \[ \mathrm{C}(x_1, \dots,x_n) = \frac{\frac1n(x_1^2 + \cdots + x_n^2)}{\frac1n(x_1+\cdots + x_n)} \]

2.15.2. Properties

  • \[ \mathrm{AM}(\mathrm{HM}(a,b), \mathrm{CM}(a,b)) = \mathrm{AM}(a,b) \]

2.16. Fréchet Mean

2.16.1. Fréchet Variance

  • For a complete metric space \((M, d)\) and points \(x_1, \dots, x_N \in M\), the Fréchet variance is:
    • \[ \Psi(p) := \sum_{i=1}^N d(p,x_i)^2 \]

2.16.2. Definition

  • Karcher Means
    • \[ m = \operatorname*{arg\,min}_{p\in M}\sum_{i=1}^Nd(p,x_i)^2 \]
  • If there is a unique \(m\) that strictly minimizes \(\Psi\), then it is the Fréchet mean.

2.17. Generalized Mean

  • Power Mean

2.17.1. Definition

  • For a nonzero real number \(p\) and positive real numbers \(x_1,\dots, x_n\), the generalized mean with exponent \(p\) is:
    • \[ m_p(\mathbf{x}) = \left(\frac{1}{n}\sum_{i=1}^n x_i^p\right)^{1/p} \]
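
A small sketch (assuming NumPy) computing \(m_p\) for several exponents; the values increase with \(p\), illustrating the inequality in the next subsection. The case \(p=0\) is taken as the limiting geometric mean.

```python
import numpy as np

def power_mean(x, p):
    x = np.asarray(x, dtype=float)
    if p == 0:                             # limit p -> 0: geometric mean
        return np.exp(np.mean(np.log(x)))
    return np.mean(x ** p) ** (1 / p)

x = [1.0, 4.0, 16.0]
for p in (-1, 0, 1, 2):                    # harmonic, geometric, arithmetic, quadratic
    print(p, power_mean(x, p))
```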

2.17.2. Special Cases

2.17.3. General Mean Inequality

  • If \(p < q\), then \(m_p(\mathbf{x}) \le m_q(\mathbf{x})\), with equality if and only if all the \(x_i\) are equal.

2.18. Quasi-Arithmetic Mean

  • Generalized \(f\)-Mean, Kolmogorov–Nagumo–de Finetti Mean, Kolmogorov Mean

2.18.1. Definition

For an injective continuous function \(f\colon I\to \mathbb{R}\) defined on an interval \(I\), \[ M_f(\mathbf{x}) = f^{-1}\left(\frac{1}{n}\sum_{i=1}^nf(x_i)\right). \]

2.18.2. LogSumExp

  • RealSoftMax(LSE), Multivariable Softplus

The quasi-arithmetic mean with \( f = \exp \) (up to an additive \(\log n\)); it smoothly approximates the maximum function: \[ \mathrm{LSE}(x_1,\dots,x_n) := \log(\exp(x_1)+\cdots+\exp(x_n)) \]
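
In floating point, the defining formula overflows for large inputs; the standard remedy is to factor out the maximum. A minimal NumPy sketch:

```python
import numpy as np

def logsumexp(x):
    x = np.asarray(x, dtype=float)
    m = x.max()                                   # shift so the exponentials stay bounded
    return m + np.log(np.sum(np.exp(x - m)))

x = np.array([1000.0, 1000.5, 999.0])
print(logsumexp(x))                               # finite, slightly above max(x)
# np.log(np.sum(np.exp(x))) would overflow to inf here.
```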

2.18.3. Special Cases

2.18.4. Properties

  • If \( f \) is convex and increasing, Jensen's inequality \( f(\mathrm{AM}(\mathbf{x})) \le \mathrm{AM}(f(\mathbf{x})) \) gives \( M_f(\mathbf{x}) \ge \mathrm{AM}(\mathbf{x}) \).

2.19. Heronian Mean

2.19.1. Definition

  • \[ H = \frac{1}{3}(a + \sqrt{ab} + b) \]

2.19.2. Properties

  • The volume of a frustum is the product of its height and the Heronian mean of the areas of the two opposing parallel faces.

2.20. Chisini Mean

  • Substitution Mean

2.20.1. Definition

  • A function \(f\) of \(n\) variables gives rise to a Chisini mean \(M\) if, for every vector \((x_1,\dots, x_n)\), there exists a unique \(M\) such that:
    • \[ f(M, M,\cdots, M) = f(x_1, x_2, \dots, x_n). \]

2.20.2. Special Cases

2.21. Lehmer Mean

2.21.1. Definition

  • \[ L_p(\mathbf{x}) = \frac{\sum_{k=1}^nx_k^p}{\sum_{k=1}^nx_k^{p-1}} \]

2.21.2. Weighted Lehmer Mean

  • \[ L_{p,w}(\mathbf{x}) = \frac{\sum_{k=1}^nw_kx_k^p}{\sum_{k=1}^nw_kx_k^{p-1}} \]

2.21.3. Special Cases

2.22. Heinz Mean

2.22.1. Definition

  • For two non-negative numbers \(a, b\),
  • \[ \mathrm{H}_x(a,b) = \frac{a^xb^{1-x} + a^{1-x}b^x}{2} \]
    • with \(0\le x\le 1/2\).

2.22.2. Properties

  • It interpolates between the arithmetic (\(x=0\)) and geometric (\(x=1/2\)) means.

2.23. a-Mean

2.23.1. Definition

  • For any real vector \(\mathbf{a} = (a_1,\dots, a_n)\), the \(\mathbf{a}\)-mean \([a]\) of positive real numbers \(x_1, \dots, x_n\) is:
    • \[ [a] := \frac{1}{n!}\sum_{\sigma\in S_n}x_{\sigma(1)}^{a_1}\cdots x_{\sigma(n)}^{a_n} \]

2.23.2. Special Cases

2.23.3. Muirhead's Inequality

\[ [a] \le [b] \iff a=Pb \] where \(P\) is some doubly stochastic matrix (equivalently, \(a\) is majorized by \(b\)). Equality holds if and only if \(a=b\) or all the \(x_i\) are equal.

2.24. Logarithmic Mean

2.24.1. Definition

  • For two positive real numbers \(x, y\): \[ \mathrm{LM}(x,y) := \lim_{(\xi,\eta)\to (x,y)} \frac{\eta - \xi}{\ln\eta - \ln\xi} \]
    • The limit form extends the expression \(\frac{y-x}{\ln y - \ln x}\) continuously to \(x = y\).

2.24.2. Properties

  • \(\mathrm{GM} \le \mathrm{LM} \le \mathrm{AM}\)

2.25. Identric Mean

  • For two positive real numbers \(x,y\):
  • \[ I(x,y) := \frac{1}{e}\cdot \lim_{(\xi,\eta)\to (x,y)}\left(\frac{\xi^\xi}{\eta^\eta}\right)^{\frac{1}{\xi-\eta}} = \lim_{(\xi,\eta)\to (x,y)}\exp\left(\frac{\xi\ln\xi - \eta\ln\eta}{\xi-\eta}-1\right) \]
  • Considering the function \(x\mapsto x\ln x\), this amounts to taking the slope of a secant line and applying the inverse of the derivative.

2.26. Stolarsky Mean

2.26.1. Definition

  • For \(0 < a < b\), \[ S_p(a,b) = \left(\frac{b^p - a^p}{p(b-a)}\right)^{\frac{1}{p-1}} \]

2.26.2. Properties

  • \[ S_p(a,b)=f'^{-1}\left(\frac{f(b)-f(a)}{b-a}\right) \]
    • where \(f(x)=x^p\). Here, \(S_p(a,b)\) is guaranteed to be in \((a,b)\) by the mean value theorem.
    • That is: compute the average rate of change of \(x^p\) on the interval \((a,b)\), then find the point whose instantaneous rate of change equals it.
  • \[ S_2(a,b)=\frac{a+b}{2}. \]
  • \[ S_{-1}(a,b) = \sqrt{ab}\, . \]

2.26.3. Special Cases

2.27. Circular Mean

2.27.1. Definition

\[ \bar{\alpha} = \operatorname{atan2}\left(\frac{1}{n}\sum_{j=1}^n\sin\alpha_j, \frac{1}{n}\sum_{j=1}^n\cos\alpha_j\right) \]
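
A direct transcription into Python, with the classic wrap-around example where the naive arithmetic mean fails:

```python
import math

def circular_mean(angles):
    s = sum(math.sin(a) for a in angles) / len(angles)
    c = sum(math.cos(a) for a in angles) / len(angles)
    return math.atan2(s, c)

# Naively averaging 350 deg and 10 deg gives 180 deg; the circular mean gives ~0 deg.
angles = [math.radians(350), math.radians(10)]
print(math.degrees(circular_mean(angles)))   # ~0.0
```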

2.28. Cesàro Summation

  • Cesàro Mean, Cesàro Limit

2.28.1. Definition

For a series \(\sum_{n=1}^\infty a_n\) with partial sums \(s_k = \sum_{n=1}^k a_n\), the series is called Cesàro summable if the arithmetic mean of its first \(n\) partial sums tends to a finite limit: \[ \lim_{n\to \infty} \frac{1}{n}\sum_{k=1}^ns_k = A < \infty \]
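
The standard example is Grandi's series \(1 - 1 + 1 - 1 + \cdots\), which diverges but is Cesàro summable to \(1/2\). A quick numerical sketch:

```python
n_terms = 10_000
s = 0            # running partial sum s_k
total = 0.0      # running sum of the partial sums
for k in range(1, n_terms + 1):
    s += (-1) ** (k + 1)
    total += s
print(total / n_terms)   # -> 0.5
```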

2.29. Hölder Summation

For a series with partial sums \(s_n\), let \(H_n^0 := s_n\), and define \(H_n^{k+1}\) as the arithmetic mean of the first \(n\) terms of \((H_m^k)_m\): \[ H_n^{k+1} := \frac{1}{n}\sum_{m=1}^n H_m^k. \] If the limit \[ \lim_{n\to \infty}H_n^k \] exists for some \(k\), it is called the Hölder sum, or the \((H,k)\) sum, of the series.

The series is called Hölder summable if it is \((H,k)\) summable for some \(k\).

3. Expectation

  • Expected Value, Expectancy, Expectation Operator, Mathematical Expectation, Mean, Expectation Value, First Moment
  • Generalization of weighted average

3.1. Definition

For a random variable \(X\) defined on a probability space \((\Omega, \Sigma, \mathrm{P})\): \[ \operatorname{E}[X] := \int_\Omega X\,\mathrm{dP}. \]

3.2. Properties

  • \(\mathrm{E}[X+Y] = \mathrm{E}[X]+\mathrm{E}[Y]\)
  • \(\mathrm{E}[aX] = a\mathrm{E}[X]\)
  • \(\mathrm{E}[XY] = \mathrm{E}[X]\mathrm{E}[Y] + \mathrm{Cov}[X,Y]\)
  • \[ \mathrm{E}[XY] = \mathrm{E}[\mathrm{E}[XY\mid Y]] = \mathrm{E}[Y\cdot\mathrm{E}[X\mid Y]] \]

3.3. Law of Total Expectation

  • Law of Iterated Expectations (LIE), Adam's Law, Tower Rule, Smoothing Property of Conditional Expectation

For two random variables \( X, Y \) on the same probability space, given that \( \mathrm{E}[X] \) exists, \[ \mathrm{E}[X] = \mathrm{E}[\mathrm{E}[X|Y]] \] where, on the right-hand side, the inner expectation is taken over \( X \) conditioned on \( Y \), and the outer expectation is taken over \( Y \).

3.4. Jensen's Inequality

For a convex function \(f:\mathbb{R} \to \mathbb{R}\), and a random variable \(X\) with finite expectation, \[ f(\mathrm{E}[X]) \le \mathrm{E}[f(X)]. \]

4. Variance

4.1. Law of Total Variance

  • Variance Decomposition Formula, Conditional Variance Formula, Law of Iterated Variances

For two random variables \( X, Y \) on the same probability space, with \( Y \) having a finite variance, \[ \mathrm{Var}[Y] = \mathrm{E}[\mathrm{Var}[Y | X]] + \mathrm{Var}[\mathrm{E}[Y|X]]. \]

Intuitively, one can think of the observable \( Y \) being chunked up around each discrete \( X \), so that the total variance of \( Y \) is the sum of

  • the average variance within the chunks, and
  • the variance in the position of chunks.
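
A simulation sketch (assuming NumPy; the group structure is made up for illustration) checking the decomposition on data grouped by a discrete \( X \):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1_000_000
x = rng.integers(0, 3, size=n)                    # discrete group label X
y = rng.normal(loc=2.0 * x, scale=1.0 + x)        # Y: mean and spread depend on X

# The three groups are (almost) equally likely, so unweighted averages suffice.
within = np.mean([y[x == g].var() for g in range(3)])     # E[Var[Y | X]]
between = np.var([y[x == g].mean() for g in range(3)])    # Var[E[Y | X]]
print(y.var(), within + between)                          # approximately equal
```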

5. Covariance

5.1. Definition

The covariance of two random variables \(X, Y\) is: \[ \operatorname{Cov}[X, Y] := \operatorname{E}[(X - \operatorname{E}[X])(Y - \operatorname{E}[Y])]. \]

5.2. Properties

  • \[ \operatorname{Var}[X+Y] = \operatorname{Var}[X]+\operatorname{Var}[Y] + 2\operatorname{Cov}[X, Y]. \]
  • Covariance is zero if the random variables are independent; the converse does not hold in general.

5.3. Law of Total Covariance

  • Covariance Decomposition Formula, Conditional Covariance Formula

For random variables \( X, Y, Z \) defined on the same probability space, with the \( \mathrm{Cov}[X,Y] \) being finite, \[ \mathrm{Cov}[X,Y] = \mathrm{E}[\mathrm{Cov}[X,Y|Z]] + \mathrm{Cov}[\mathrm{E}[X|Z], \mathrm{E}[Y|Z]]. \]

5.4. Covariance Matrix

  • Auto-Covariance Matrix, Dispersion Matrix, Variance Matrix, Variance-Covariance Matrix
  • \(\mathbf{K}_{\mathbf{XX}}\), \(\Sigma\), \(S\)

For a vector of random variables \(\mathbf{X} = (X_1, X_2, \dots, X_n)\) called random vector, the auto-covariance matrix is given by \[ \Sigma_{ij} = (\mathbf{K}_{\mathbf{X}\mathbf{X}})_{ij} := \operatorname{Cov}[X_i, X_j]. \]

5.5. Cross-Covariance Matrix

Given two random vectors \( \mathbf{X}, \mathbf{Y} \), the cross-covariance matrix of \( \mathbf{X} \) and \( \mathbf{Y} \) is given by: \( (\mathbf{K}_{\mathbf{XY}})_{ij} := \operatorname{Cov}[X_i, Y_j] = \operatorname{E}[(X_i - \operatorname{E}[X_i])(Y_j - \operatorname{E}[Y_j])]\).

5.6. Principal Component Analysis

  • PCA

The eigenvectors of the covariance matrix give the principal components (the orthogonal directions of greatest variance), and the corresponding eigenvalues give the variance along each of them. They can be used to analyze the shape of the joint probability distribution \( f_{X,Y}(x,y) \).

Being a linear method, it generally does not capture nonlinear structure in the data; a sketch follows.
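
A minimal sketch (assuming NumPy) of PCA as an eigendecomposition of the sample covariance matrix:

```python
import numpy as np

rng = np.random.default_rng(1)
data = rng.normal(size=(500, 2)) @ np.array([[2.0, 1.5], [0.0, 0.5]])  # correlated 2-D cloud

cov = np.cov(data, rowvar=False)            # sample covariance matrix
eigvals, eigvecs = np.linalg.eigh(cov)      # eigh returns ascending eigenvalues
order = np.argsort(eigvals)[::-1]
print("variance along each component:", eigvals[order])
print("principal directions (columns):")
print(eigvecs[:, order])
```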

6. Principal Component Analysis

7. Penalized Least Squares Criterion

Given data points \( \{(x_i, y_i)\}_i \), penalized least squares (PLS) looks for the function \( \hat{f} \) within a Hilbert space \( \mathcal{H} \) that best fits the data: \[ \hat{f} := \operatorname*{arg\,min}_{f\in \mathcal{H}} \left[ \frac{1}{n}\sum_i(y_i - f(x_i))^{2} + P(\| f\|^2) \right]. \]

7.1. Kimeldorf-Wahba Representer Theorem

The solution to the PLS problem can be written in terms of the kernel function \( K \): \[ \hat{f}(x) = \sum_{i=1}^n \beta_i K(x,x_i) \] where \( K \) is the reproducing kernel of the reproducing kernel Hilbert space (RKHS) \( \mathcal{H} \), which satisfies the reproducing property \[ \langle f, K(\cdot, x_i)\rangle = f(x_i). \]
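
As a concrete instance (a sketch under specific assumptions, not the general theorem): with the ridge penalty \( P(\|f\|^2) = \lambda\|f\|^2 \) and a Gaussian kernel, the coefficients solve the linear system \( (K + n\lambda I)\beta = y \), i.e. kernel ridge regression. The kernel choice and \( \gamma \) here are illustrative:

```python
import numpy as np

def rbf_kernel(a, b, gamma=1.0):
    """Gaussian kernel matrix between two sets of 1-D points."""
    return np.exp(-gamma * (a[:, None] - b[None, :]) ** 2)

rng = np.random.default_rng(2)
x = np.linspace(0, 2 * np.pi, 40)
y = np.sin(x) + 0.1 * rng.normal(size=x.size)

lam = 1e-2
K = rbf_kernel(x, x)
beta = np.linalg.solve(K + x.size * lam * np.eye(x.size), y)  # (K + n*lambda*I) beta = y

x_test = np.linspace(0, 2 * np.pi, 5)
print(rbf_kernel(x_test, x) @ beta)    # f(x) = sum_i beta_i K(x, x_i), close to sin(x_test)
```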

8. Kelly Criterion

How to size a sequence of bets so as to maximize the long-term growth rate of wealth.

8.1. Heuristic Proof

Start with wealth \( W_0 \), and stake a fraction \( f \) of the current wealth on each bet, hoping to gain \( bf \) of the wealth at the risk of losing \( af \). The wealth after \( n \) bets with \( k \) wins is \[ W_n = W_0(1+fb)^k(1-af)^{n-k}. \]

Now comes the heuristic part, where we maximize the expected value of the logarithm of the geometric growth rate \( r \):

\begin{align*} \mathrm{E}[\log(r)] &= \mathrm{E} \left[ \frac{1}{n} \log \frac{W_n}{W_0} \right] \\ &= \mathrm{E} \left[ \frac{k}{n} \log (1 + bf) + \left( 1 - \frac{k}{n} \right) \log (1- af) \right] \\ &= p\log (1+ bf) + q \log (1 - af). \end{align*}

where \( q = 1 - p\) is the probability of losing. Taking the derivative with respect to \( f \) and setting it equal to zero, \[ \frac{pb}{1+bf^{*}} - \frac{qa}{1-af^{ *}} = 0 \] for the optimal value \( f^{*} \). After rearranging, we obtain \[ f^{*} = \frac{p}{a} - \frac{q}{b}. \]
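
A quick numerical check (assuming NumPy) that \( f^{*} = p/a - q/b \) maximizes the expected log growth:

```python
import numpy as np

p, q = 0.6, 0.4           # win and loss probabilities
b, a = 1.0, 1.0           # fraction gained on a win, fraction lost on a loss
f_star = p / a - q / b    # Kelly fraction: 0.2

def expected_log_growth(f):
    return p * np.log(1 + b * f) + q * np.log(1 - a * f)

for f in (0.1, f_star, 0.3):
    print(f, expected_log_growth(f))   # the middle value attains the maximum
```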

9. Reference

Created: 2025-06-18 Wed 02:21