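\section{Fisher Linear Discriminant}

A classic problem in statistics is finding a \textit{linear discriminant}, that is, a linear combination of the features of a datapoint whose \textit{sign} classifies it into one of two (or more) categories. For example, a datapoint $\vec{X}$ might represent a particular tissue sample, the features $(x_1, \ldots, x_n)$ the expression levels of various genes, and the classes a clinical outcome. We seek a $\vec{w}$ such that
$$\vec{w}'\vec{X} \quad\text{is}\quad \left\{ \begin{array}{ll} > 0 & \forall\, \vec{X} \in \text{Class A} \\ < 0 & \forall\, \vec{X} \in \text{Class B} \end{array} \right.$$
or as close to this as can be found.

Fisher's measure of the discriminating ability of such a $\vec{w}$ is
$$J(\vec{w}) = \frac{(\tilde{\mu}_A - \tilde{\mu}_B)^2}{\tilde{s}^2_A + \tilde{s}^2_B},$$
where $\tilde{\mu}_A = \vec{w}'\vec{\mu}_A$ is the mean of class A after projection onto $\vec{w}$, and $\tilde{s}^2_A = \sum_{\vec{x} \in A} (\vec{w}'\vec{x} - \tilde{\mu}_A)^2 = n_A \sigma^2_A$ is its projected scatter ($n_A$ times the projected variance $\sigma^2_A$), and likewise for class B. That is, $J$ is the ratio of the squared distance between the projected class means to the total projected spread within the classes, both determined by the choice of $\vec{w}$.

If $X_A$ is the $n_A \times n$ matrix whose rows are the \textit{centered} datapoints of class A, define the \textit{scatter matrix} of A as
$$S_{\text{Class A}} := X_A' X_A = \sum_{\vec{x}_i \in A} (\vec{x}_i - \vec{\mu}_A)(\vec{x}_i - \vec{\mu}_A)' = n_A \Sigma_A,$$
i.e., simply the covariance matrix $\Sigma_A$ of the class (with the $1/n_A$ normalization) weighted by the number $n_A$ of elements in the class. Define $S_{\text{Class B}}$ in the same way.

We then define the \textit{scatter-within} matrix $S_W = S_{\text{Class A}} + S_{\text{Class B}}$ to represent the scatter within the classes, and the \textit{scatter-between} matrix relating the two classes as $S_B = (\vec{\mu}_A - \vec{\mu}_B)(\vec{\mu}_A - \vec{\mu}_B)'$. Since $(\tilde{\mu}_A - \tilde{\mu}_B)^2 = \vec{w}' S_B \vec{w}$ and $\tilde{s}^2_A + \tilde{s}^2_B = \vec{w}' S_W \vec{w}$, we can rewrite the discriminant function as
$$J(\vec{w}) = \frac{\vec{w}' S_B \vec{w}}{\vec{w}' S_W \vec{w}},$$
a ratio of two quadratic forms (a generalized Rayleigh quotient). Assuming $S_W$ is invertible, its maximum is attained at the eigenvector of $S_W^{-1} S_B$ with the largest eigenvalue, i.e., the solution of $S_B \vec{w} = \lambda S_W \vec{w}$ with the largest $\lambda$. Because $S_B$ has rank one, $S_B \vec{w} = (\vec{\mu}_A - \vec{\mu}_B)\,\big[(\vec{\mu}_A - \vec{\mu}_B)'\vec{w}\big]$ always points along $\vec{\mu}_A - \vec{\mu}_B$, so the best linear discriminant is, up to scale,
$$\vec{w} = S_W^{-1}(\vec{\mu}_A - \vec{\mu}_B).$$
This is intuitive: when $S_W$ is a multiple of the identity, $\vec{w}$ is simply the direction joining the class means, and in general $S_W^{-1}$ down-weights directions in which the classes have a large spread of their own. Note that $J$ is unchanged by rescaling $\vec{w}$ or shifting the projections, so it fixes only the direction of $\vec{w}$; to classify by sign, a threshold must still be subtracted from $\vec{w}'\vec{X}$.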
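The construction above can be checked numerically. The following is a minimal NumPy sketch, assuming datapoints are stored one per row and that $S_W$ is invertible. The function names (\texttt{scatter\_matrix}, \texttt{fisher\_direction}, \texttt{fisher\_criterion}, \texttt{classify}), the synthetic Gaussian data, and the midpoint threshold used to turn $\vec{w}'\vec{X}$ into a sign rule are all illustrative assumptions, not part of the derivation. It verifies that $S_{\text{Class A}} = n_A \Sigma_A$ and that $S_W^{-1}(\vec{\mu}_A - \vec{\mu}_B)$ is parallel to the top eigenvector of $S_W^{-1} S_B$.

```python
import numpy as np


def scatter_matrix(X):
    """Scatter matrix of one class; X has one datapoint per row (n_c x n)."""
    Xc = X - X.mean(axis=0)  # center the rows on the class mean
    return Xc.T @ Xc         # sum of outer products (x_i - mu)(x_i - mu)'


def fisher_direction(XA, XB):
    """w = S_W^{-1} (mu_A - mu_B); assumes S_W is invertible."""
    S_W = scatter_matrix(XA) + scatter_matrix(XB)
    diff = XA.mean(axis=0) - XB.mean(axis=0)
    return np.linalg.solve(S_W, diff)  # solve S_W w = diff instead of inverting S_W


def fisher_criterion(w, XA, XB):
    """J(w) = (w' S_B w) / (w' S_W w)."""
    diff = XA.mean(axis=0) - XB.mean(axis=0)
    S_B = np.outer(diff, diff)
    S_W = scatter_matrix(XA) + scatter_matrix(XB)
    return (w @ S_B @ w) / (w @ S_W @ w)


def classify(X, w, XA, XB):
    """Label rows of X as 'A' or 'B' by the sign of w'x - c.

    The threshold c (midpoint of the projected class means) is an
    assumed, commonly used choice; J(w) itself does not determine it.
    """
    c = w @ (XA.mean(axis=0) + XB.mean(axis=0)) / 2
    return np.where(X @ w - c > 0, "A", "B")


# Synthetic, correlated two-class data (illustrative only).
rng = np.random.default_rng(0)
cov = np.array([[1.0, 0.8], [0.8, 1.0]])
XA = rng.multivariate_normal([0.0, 0.0], cov, size=200)
XB = rng.multivariate_normal([1.0, -1.0], cov, size=200)

w = fisher_direction(XA, XB)

# S_A equals n_A times the 1/n_A-normalized covariance matrix.
print(np.allclose(scatter_matrix(XA), len(XA) * np.cov(XA, rowvar=False, bias=True)))  # True

# w is parallel to the top eigenvector of S_W^{-1} S_B.
S_W = scatter_matrix(XA) + scatter_matrix(XB)
diff = XA.mean(axis=0) - XB.mean(axis=0)
vals, vecs = np.linalg.eig(np.linalg.solve(S_W, np.outer(diff, diff)))
top = np.real(vecs[:, np.argmax(np.real(vals))])  # eig returns unit-norm columns
print(abs(top @ w) / np.linalg.norm(w))  # |cos(angle)|, close to 1

# Training accuracy of the thresholded sign rule on both classes.
acc = np.mean(np.concatenate([classify(XA, w, XA, XB) == "A",
                              classify(XB, w, XA, XB) == "B"]))
print(acc)
```

Solving $S_W\vec{w} = \vec{\mu}_A - \vec{\mu}_B$ with \texttt{np.linalg.solve} avoids forming $S_W^{-1}$ explicitly, which is generally more accurate. Since $\vec{w}'(\vec{\mu}_A - \vec{\mu}_B) = (\vec{\mu}_A - \vec{\mu}_B)' S_W^{-1} (\vec{\mu}_A - \vec{\mu}_B) > 0$, class A projects above class B, matching the sign convention above. If $S_W$ is singular (for example, with more features than datapoints, as is common for gene-expression data), a pseudo-inverse or a regularized $S_W + \epsilon I$ would be needed instead.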