<?xml version="1.0" encoding="UTF-8" standalone="no"?>
	<?xml-stylesheet type="text/xsl" href="mathml.xsl"?>
	<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1 plus MathML 2.0//EN" "http://www.w3.org/Math/DTD/mathml2/xhtml-math11-f.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en">
<head>
<meta http-equiv="pragma" content="no-cache" />
<meta http-equiv="Expires" content="-1" />
<title>svdfinal</title>
<style type="text/css">body {
	font-size : 12;
	font-family : serif;
	color : #000000
}
math {
color : #000000;
font-family : Mathematica1, Mathematica2, Mathematica3, Mathematica4, Mathematica5, serif, CMSY10, Symbol, Times, Lucida Sans Unicode, MT Extra
}
</style>
</head>
<body>
<h2>Matrices as Arrays</h2><br />
The simplest way we think of matrices is as arrays of numbers, with an associated way to 'multiply' them, yielding new such arrays:<br />
<div align="center"><math xmlns="http://www.w3.org/1998/Math/MathML"><mstyle displaystyle="true"><mo>[</mo><mtable><mtr columnalign="right"><mtd><mo>&Square;</mo></mtd><mtd><mo>&Square;</mo></mtd><mtd><mo>&Square;</mo></mtd></mtr>
<mtr columnalign="right"><mtd><mo>&Square;</mo></mtd><mtd><mo>&Square;</mo></mtd><mtd><mo>&Square;</mo></mtd></mtr>
<mtr columnalign="right"><mtd><mo>&Square;</mo></mtd><mtd><mo>&Square;</mo></mtd><mtd><mo>&Square;</mo></mtd></mtr>
</mtable><mo>]</mo><mo>&CircleTimes;</mo><mo>[</mo><mtable><mtr columnalign="right"><mtd><mo>&Square;</mo></mtd><mtd><mo>&Square;</mo></mtd><mtd><mo>&Square;</mo></mtd></mtr>
<mtr columnalign="right"><mtd><mo>&Square;</mo></mtd><mtd><mo>&Square;</mo></mtd><mtd><mo>&Square;</mo></mtd></mtr>
<mtr columnalign="right"><mtd><mo>&Square;</mo></mtd><mtd><mo>&Square;</mo></mtd><mtd><mo>&Square;</mo></mtd></mtr>
</mtable><mo>]</mo><mo>=</mo><mo>[</mo><mtable><mtr columnalign="right"><mtd><mo>&Square;</mo></mtd><mtd><mo>&Square;</mo></mtd><mtd><mo>&Square;</mo></mtd></mtr>
<mtr columnalign="right"><mtd><mo>&Square;</mo></mtd><mtd><mo>&Square;</mo></mtd><mtd><mo>&Square;</mo></mtd></mtr>
<mtr columnalign="right"><mtd><mo>&Square;</mo></mtd><mtd><mo>&Square;</mo></mtd><mtd><mo>&Square;</mo></mtd></mtr>
</mtable><mo>]</mo></mstyle></math></div>This 'multiplication' can also 'apply' a matrix <math xmlns="http://www.w3.org/1998/Math/MathML"><mi>M</mi></math> to a vector <math xmlns="http://www.w3.org/1998/Math/MathML"><mover accent="true"><mrow><mi>V</mi></mrow><mo stretchy="true">&RightArrow;</mo></mover></math>:<div align="center"><math xmlns="http://www.w3.org/1998/Math/MathML"><mstyle displaystyle="true"><mi>M</mi><mo>&CircleTimes;</mo><mover accent="true"><mrow><mi>V</mi></mrow><mo stretchy="true">&RightArrow;</mo></mover><mo>=</mo>
<mo>[</mo><mtable><mtr columnalign="right"><mtd><mo>&Square;</mo></mtd><mtd><mo>&Square;</mo></mtd><mtd><mo>&Square;</mo></mtd></mtr>
<mtr columnalign="right"><mtd><mo>&Square;</mo></mtd><mtd><mo>&Square;</mo></mtd><mtd><mo>&Square;</mo></mtd></mtr>
<mtr columnalign="right"><mtd><mo>&Square;</mo></mtd><mtd><mo>&Square;</mo></mtd><mtd><mo>&Square;</mo></mtd></mtr>
</mtable><mo>]</mo><mo>&CircleTimes;</mo><mrow><mo>(</mo><mtable><mtr columnalign="right"><mtd><mo>&Square;</mo></mtd></mtr>
<mtr columnalign="right"><mtd><mo>&Square;</mo></mtd></mtr>
<mtr columnalign="right"><mtd><mo>&Square;</mo></mtd></mtr>
</mtable><mo>)</mo></mrow><mo>=</mo><mrow><mo>(</mo><mtable><mtr columnalign="right"><mtd><mo>&Square;</mo></mtd></mtr>
<mtr columnalign="right"><mtd><mo>&Square;</mo></mtd></mtr>
<mtr columnalign="right"><mtd><mo>&Square;</mo></mtd></mtr>
</mtable><mo>)</mo></mrow></mstyle></math></div>It also provides an 'inner product' mapping two vectors to a scalar:<div align="center"><math xmlns="http://www.w3.org/1998/Math/MathML"><mstyle displaystyle="true"><mrow><mo>(</mo><mtable><mtr columnalign="right"><mtd><mo>&Square;</mo></mtd><mtd><mo>&Square;</mo></mtd><mtd><mo>&Square;</mo></mtd></mtr>
</mtable><mo>)</mo></mrow><mo>&CircleTimes;</mo><mo>[</mo><mtable><mtr columnalign="right"><mtd><mo>&Square;</mo></mtd></mtr>
<mtr columnalign="right"><mtd><mo>&Square;</mo></mtd></mtr>
<mtr columnalign="right"><mtd><mo>&Square;</mo></mtd></mtr>
</mtable><mo>]</mo><mo>=</mo><mo>[</mo><mo>&Square;</mo><mo>]</mo></mstyle></math></div>Or even an <i>Outer Product</i>, seldom discussed but very useful and interesting, which takes two vectors and returns a matrix:<div align="center"><math xmlns="http://www.w3.org/1998/Math/MathML"><mstyle displaystyle="true">
<mo>[</mo><mtable><mtr columnalign="right"><mtd><mo>&Square;</mo></mtd></mtr>
<mtr columnalign="right"><mtd><mo>&Square;</mo></mtd></mtr>
<mtr columnalign="right"><mtd><mo>&Square;</mo></mtd></mtr>
</mtable><mo>]</mo><mo>&CircleTimes;</mo><mrow><mo>(</mo><mtable><mtr columnalign="right"><mtd><mo>&Square;</mo></mtd><mtd><mo>&Square;</mo></mtd><mtd><mo>&Square;</mo></mtd></mtr>
</mtable><mo>)</mo></mrow><mo>=</mo><mo>[</mo><mtable><mtr columnalign="right"><mtd><mo>&Square;</mo></mtd><mtd><mo>&Square;</mo></mtd><mtd><mo>&Square;</mo></mtd></mtr>
<mtr columnalign="right"><mtd><mo>&Square;</mo></mtd><mtd><mo>&Square;</mo></mtd><mtd><mo>&Square;</mo></mtd></mtr>
<mtr columnalign="right"><mtd><mo>&Square;</mo></mtd><mtd><mo>&Square;</mo></mtd><mtd><mo>&Square;</mo></mtd></mtr>
</mtable><mo>]</mo></mstyle></math></div><br />
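The four products pictured above can be tried out in a few lines of Python. This is only a minimal sketch: the helper names (matmul, matvec, inner, outer) and the small sample arrays are illustrative assumptions, not part of the original notes.<br />

```python
# Pure-Python sketches of the four products pictured above.
# Matrices are lists of rows; vectors are flat lists.
# All names and sample values here are illustrative assumptions.

def matmul(a, b):
    """Matrix times matrix: entry (i, j) is row i of a dotted with column j of b."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def matvec(m, v):
    """Matrix applied to a vector: one dot product per row of m."""
    return [sum(x * y for x, y in zip(row, v)) for row in m]

def inner(u, v):
    """Inner product: two vectors in, one scalar out."""
    return sum(x * y for x, y in zip(u, v))

def outer(u, v):
    """Outer product: two vectors in, one matrix out."""
    return [[x * y for y in v] for x in u]

A = [[1, 2], [3, 4]]
B = [[0, 1], [1, 0]]
u = [1, 2]
v = [3, 4]

print(matmul(A, B))  # multiplying by B on the right swaps the columns of A
print(matvec(A, u))
print(inner(u, v))
print(outer(u, v))
```

Note that the inner product of two length-n vectors is a single number, while their outer product is a full n-by-n array.<br />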
<br />
<br />
<h2>Matrices as Functions</h2><br />
 Another way to think of a matrix is as a 'mapping' from vectors to vectors. This can be thought of as an actual transform (generally, a scaling or rotation) or as a change in coordinate system.    <br />
   For example, the following matrix 'projects' a two-dimensional plane onto the line <math xmlns="http://www.w3.org/1998/Math/MathML"><mi>y</mi><mo>=</mo><mo>-</mo><mi>x</mi></math>: <br />
<br />
<div align="center"><math xmlns="http://www.w3.org/1998/Math/MathML"><mstyle displaystyle="true"><mo>[</mo><mtable><mtr columnalign="right"><mtd><mn>1</mn></mtd><mtd><mo>-</mo><mn>1</mn></mtd></mtr>
<mtr columnalign="right"><mtd><mo>-</mo><mn>1</mn></mtd><mtd><mn>1</mn></mtd></mtr>
</mtable><mo>]</mo><mo>&CircleTimes;</mo><mo>[</mo><mtable><mtr columnalign="right"><mtd><mi>x</mi></mtd></mtr>
<mtr columnalign="right"><mtd><mi>y</mi></mtd></mtr>
</mtable><mo>]</mo><mo>=</mo>
<mo>[</mo><mtable><mtr columnalign="right"><mtd><mi>x</mi><mo>-</mo><mi>y</mi></mtd></mtr>
<mtr columnalign="right"><mtd><mi>y</mi><mo>-</mo><mi>x</mi></mtd></mtr>
</mtable><mo>]</mo></mstyle></math></div>  <br />
   A very important point is that information is being lost in this transform. For example, the point <math xmlns="http://www.w3.org/1998/Math/MathML"><mrow><mo>(</mo><mn>2</mn><mo>,</mo><mo>-</mo><mn>2</mn><mo>)</mo></mrow></math> is the 'image' of both <math xmlns="http://www.w3.org/1998/Math/MathML"><mrow><mo>(</mo><mn>5,3</mn><mo>)</mo></mrow></math> and <math xmlns="http://www.w3.org/1998/Math/MathML"><mrow><mo>(</mo><mn>12,10</mn><mo>)</mo></mrow></math> in the original space. The fact that this mapping 'loses' a dimension is reflected in the fact that the two rows of the matrix are (trivially) linearly dependent. This matrix has 'rank' ONE, since it can only transform ONE dimension without losing information.<br />
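As a quick check of the loss of information, the following sketch applies the matrix above to the two sample points and computes its determinant. The helper name apply is an assumption made for illustration.<br />

```python
# The 2x2 matrix from the example above, stored as a list of rows.
M = [[1, -1], [-1, 1]]

def apply(m, v):
    """Apply matrix m to vector v (one dot product per row)."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]

p = apply(M, [5, 3])
q = apply(M, [12, 10])
print(p, q)  # two different inputs, one shared image on the line y = -x

# For a 2x2 matrix, a zero determinant means the rows are linearly dependent,
# so the matrix cannot have full rank.
det = M[0][0] * M[1][1] - M[0][1] * M[1][0]
print(det)
```

Because two distinct inputs land on the same output, no inverse mapping can recover the original point.<br />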
<br />
Note that a <i>rectangular</i> matrix of size <math xmlns="http://www.w3.org/1998/Math/MathML"><mi>m</mi><mo>&times;</mo><mi>n</mi><mo>,</mo><mi>m</mi><mo>&gt;</mo><mi>n</mi></math> can have at most rank n.<h2>Matrices as Datasets</h2><br />
 Another useful way to think of matrices is as observations of Random Variables. For example, if we have two random variables <math xmlns="http://www.w3.org/1998/Math/MathML"><mi>x</mi></math> and <math xmlns="http://www.w3.org/1998/Math/MathML"><mi>y</mi></math>, with three observations each, we can tabulate our data as <math xmlns="http://www.w3.org/1998/Math/MathML"><mover accent="true"><mrow><mi>X</mi></mrow><mo stretchy="true">&RightArrow;</mo></mover><mo>=</mo>
<mrow><mo>(</mo><mtable><mtr columnalign="right"><mtd><msub><mi>x</mi><mn>1</mn></msub></mtd></mtr>
<mtr columnalign="right"><mtd><msub><mi>x</mi><mn>2</mn></msub></mtd></mtr>
<mtr columnalign="right"><mtd><msub><mi>x</mi><mn>3</mn></msub></mtd></mtr>
</mtable><mo>)</mo></mrow></math> or even put both variables together as <div align="center"><math xmlns="http://www.w3.org/1998/Math/MathML"><mstyle displaystyle="true"><mi>D</mi><mo>=</mo>
<mo>[</mo><mtable><mtr columnalign="right"><mtd><msub><mi>x</mi><mn>1</mn></msub></mtd><mtd><msub><mi>y</mi><mn>1</mn></msub></mtd></mtr>
<mtr columnalign="right"><mtd><msub><mi>x</mi><mn>2</mn></msub></mtd><mtd><msub><mi>y</mi><mn>2</mn></msub></mtd></mtr>
<mtr columnalign="right"><mtd><msub><mi>x</mi><mn>3</mn></msub></mtd><mtd><msub><mi>y</mi><mn>3</mn></msub></mtd></mtr>
</mtable><mo>]</mo></mstyle></math></div> It is common and convenient to 'center' these data matrices by subtracting off the <b>mean</b> for each variable (the vector of means is easily added back in when appropriate). With data arranged this way, our familiar matrix operations become very useful. For a centered vector of <math xmlns="http://www.w3.org/1998/Math/MathML"><mi>n</mi></math> observations, the inner product yields <math xmlns="http://www.w3.org/1998/Math/MathML"><mi>n</mi></math> times the variance (a fact used to great effect in the geometry of statistics):<br />
<div align="center"><math xmlns="http://www.w3.org/1998/Math/MathML"><mstyle displaystyle="true"><mi>X</mi>'<mo>&CircleTimes;</mo><mi>X</mi><mo>=</mo><mrow><mo>(</mo><mtable><mtr columnalign="right"><mtd><mrow><mo>(</mo><msub><mi>x</mi><mn>1</mn></msub><mo>-</mo><msub><mo>&mu;</mo><mi>x</mi></msub><mo>)</mo></mrow></mtd><mtd><mrow><mo>(</mo><msub><mi>x</mi><mn>2</mn></msub><mo>-</mo><msub><mo>&mu;</mo><mi>x</mi></msub><mo>)</mo></mrow></mtd><mtd><mrow><mo>(</mo><msub><mi>x</mi><mn>3</mn></msub><mo>-</mo><msub><mo>&mu;</mo><mi>x</mi></msub><mo>)</mo></mrow>
</mtd></mtr>
</mtable><mo>)</mo></mrow><mo>&CircleTimes;</mo>
<mrow><mo>(</mo><mtable><mtr columnalign="right"><mtd><mrow><mo>(</mo><msub><mi>x</mi><mn>1</mn></msub><mo>-</mo><msub><mo>&mu;</mo><mi>x</mi></msub><mo>)</mo></mrow></mtd></mtr>
<mtr columnalign="right"><mtd><mrow><mo>(</mo><msub><mi>x</mi><mn>2</mn></msub><mo>-</mo><msub><mo>&mu;</mo><mi>x</mi></msub><mo>)</mo></mrow></mtd></mtr>
<mtr columnalign="right"><mtd><mrow><mo>(</mo><msub><mi>x</mi><mn>3</mn></msub><mo>-</mo><msub><mo>&mu;</mo><mi>x</mi></msub><mo>)</mo></mrow></mtd></mtr>
</mtable><mo>)</mo></mrow><mo>=</mo><msub><mo>&sum;</mo><mi>i</mi></msub><msup><mrow><mo>(</mo><msub><mi>x</mi><mi>i</mi></msub><mo>-</mo><msub><mo>&mu;</mo><mi>x</mi></msub><mo>)</mo></mrow><mn>2</mn></msup><mo>=</mo><mi>n</mi><mo>&times;</mo><mtext>var</mtext><mrow><mo>(</mo><mi>x</mi><mo>)</mo></mrow></mstyle></math></div><br />
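 Extending this to multiple centered variables brings us into the land of covariance matrices, the fundamental object in applied statistics (the product below equals the covariance matrix up to a factor of the number of observations):<br />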
<div align="center"><math xmlns="http://www.w3.org/1998/Math/MathML"><mstyle displaystyle="true"><mi>D</mi>'<mo>&CircleTimes;</mo><mi>D</mi><mo>=</mo>
<mrow><mo>(</mo><mtable><mtr columnalign="right"><mtd><msub><mi>x</mi><mn>1</mn></msub></mtd><mtd><msub><mi>x</mi><mn>2</mn></msub></mtd><mtd><msub><mi>x</mi><mn>3</mn></msub></mtd></mtr>
<mtr columnalign="right"><mtd><msub><mi>y</mi><mn>1</mn></msub></mtd><mtd><msub><mi>y</mi><mn>2</mn></msub></mtd><mtd><msub><mi>y</mi><mn>3</mn></msub></mtd></mtr>
</mtable><mo>)</mo></mrow><mo>&CircleTimes;</mo>
<mo>[</mo><mtable><mtr columnalign="right"><mtd><msub><mi>x</mi><mn>1</mn></msub></mtd><mtd><msub><mi>y</mi><mn>1</mn></msub></mtd></mtr>
<mtr columnalign="right"><mtd><msub><mi>x</mi><mn>2</mn></msub></mtd><mtd><msub><mi>y</mi><mn>2</mn></msub></mtd></mtr>
<mtr columnalign="right"><mtd><msub><mi>x</mi><mn>3</mn></msub></mtd><mtd><msub><mi>y</mi><mn>3</mn></msub></mtd></mtr>
</mtable><mo>]</mo>
<mo>=</mo>
<mo>[</mo><mtable><mtr columnalign="right"><mtd><mtext>  var</mtext><mrow><mo>(</mo><mi>x</mi><mo>)</mo></mrow></mtd><mtd><mtext>cov</mtext><mrow><mo>(</mo><mi>x</mi><mo>,</mo><mi>y</mi><mo>)</mo></mrow></mtd></mtr>
<mtr columnalign="right"><mtd><mtext>cov</mtext><mrow><mo>(</mo><mi>y</mi><mo>,</mo><mi>x</mi><mo>)</mo></mrow></mtd><mtd><mtext>  var</mtext><mrow><mo>(</mo><mi>y</mi><mo>)</mo></mrow></mtd></mtr>
</mtable><mo>]</mo></mstyle></math></div><br />
Note that these covariance matrices are always real, symmetric, and <i>positive semi-definite</i>.<br />
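To make the construction concrete, here is a small sketch that centers two made-up variables, forms the product of the centered data matrix with its transpose, and divides by the number of observations. The data values and helper names are assumptions chosen for illustration, and the variance convention used divides by n rather than n - 1.<br />

```python
# Three made-up observations of two variables (illustrative values only).
x = [1.0, 2.0, 6.0]
y = [2.0, 4.0, 3.0]

def center(values):
    """Subtract the mean from every observation."""
    mean = sum(values) / len(values)
    return [v - mean for v in values]

def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))

xc, yc = center(x), center(y)
n = len(x)

# D'D for the centered two-column data matrix D = [xc | yc].
scatter = [[dot(xc, xc), dot(xc, yc)],
           [dot(yc, xc), dot(yc, yc)]]

# Dividing by the number of observations gives the covariance matrix
# (population convention, dividing by n).
cov = [[s / n for s in row] for row in scatter]

print(scatter)
print(cov)
```

The result is symmetric by construction, since the (x, y) and (y, x) entries are the same inner product.<br />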
<h2>Eigenvalues and Eigenvectors</h2><br />
An <i>eigenvector</i> <math xmlns="http://www.w3.org/1998/Math/MathML"><mi>V</mi></math> associated with a matrix <math xmlns="http://www.w3.org/1998/Math/MathML"><mi>M</mi></math> is a nonzero vector whose image under <math xmlns="http://www.w3.org/1998/Math/MathML"><mi>M</mi></math> is a simple scaling by a factor <math xmlns="http://www.w3.org/1998/Math/MathML"><mo>&lambda;</mo></math> (the corresponding <i>eigenvalue</i>), thus:<div align="center"><math xmlns="http://www.w3.org/1998/Math/MathML"><mstyle displaystyle="true"><mi>M</mi><mo>&CircleTimes;</mo><mi>V</mi><mo>=</mo><mo>&lambda;</mo><mi>V</mi></mstyle></math></div>Note that every eigenvalue is thus associated with a 1-dimensional linear subspace defined by its eigenvector, since <math xmlns="http://www.w3.org/1998/Math/MathML"><mo>&forall;</mo><mi>k</mi><mo>&in;</mo><mo>&Ropf;</mo></math><div align="center"><math xmlns="http://www.w3.org/1998/Math/MathML"><mstyle displaystyle="true"><mi>M</mi><mo>&CircleTimes;</mo><mover accent="true"><mrow><mi>V</mi></mrow><mo stretchy="true">&RightArrow;</mo></mover><mo>=</mo><mo>&lambda;</mo><mover accent="true"><mrow><mi>V</mi></mrow><mo stretchy="true">&RightArrow;</mo></mover><mo>&Rightarrow;</mo><mi>M</mi><mo>&CircleTimes;</mo><mrow><mo>(</mo><mi>k</mi><mi>V</mi><mo>)</mo></mrow><mo>=</mo><mi>k</mi><mo>&lambda;</mo><mover accent="true"><mrow><mi>V</mi></mrow><mo stretchy="true">&RightArrow;</mo></mover></mstyle></math></div><br />
<br />
<h3>Examples</h3>Another way to think of the line defined by the eigenvector is as an <b>invariant subspace</b> - every 'point' along that line is mapped to that line by the action of <math xmlns="http://www.w3.org/1998/Math/MathML"><mi>M</mi></math>.<br />
<br />
In dynamical systems, the 'acceleration' or 'flow' is often represented as a field of vectors, each assigned to a position <math xmlns="http://www.w3.org/1998/Math/MathML"><mover accent="true"><mrow><mi>X</mi></mrow><mo stretchy="true">&RightArrow;</mo></mover><mo>=</mo><mrow><mo>(</mo><msub><mi>x</mi><mn>1</mn></msub><mo>,</mo><msub><mi>y</mi><mn>1</mn></msub><mo>)</mo></mrow></math> by the equation <math xmlns="http://www.w3.org/1998/Math/MathML"><mfrac><mrow><mi>d</mi><mover accent="true"><mrow><mi>X</mi></mrow><mo stretchy="true">&RightArrow;</mo></mover></mrow><mrow><mi>d</mi><mi>t</mi></mrow></mfrac><mo>=</mo><mi>M</mi><mo>&CircleTimes;</mo><mover accent="true"><mrow><mi>X</mi></mrow><mo stretchy="true">&RightArrow;</mo></mover></math>, usually approximated (for a unit time step) as <div align="center"><math xmlns="http://www.w3.org/1998/Math/MathML"><mstyle displaystyle="true"><mover accent="true"><mrow><msub><mi>X</mi><mrow><mi>i</mi><mo>+</mo><mn>1</mn></mrow></msub></mrow><mo stretchy="true">&RightArrow;</mo></mover><mo>=</mo><mover accent="true"><mrow><msub><mi>X</mi><mi>i</mi></msub></mrow><mo stretchy="true">&RightArrow;</mo></mover><mo>+</mo><mi>M</mi><mo>&CircleTimes;</mo><mover accent="true"><mrow><msub><mi>X</mi><mi>i</mi></msub></mrow><mo stretchy="true">&RightArrow;</mo></mover></mstyle></math></div><br />
<br />
 This means the dynamics along these lines is known - and by 'continuity', they must be 'similar nearby'. Thus, understanding the eigenstuff yields understanding about the overall behavior of the system:<br />
<br />
<math xmlns="http://www.w3.org/1998/Math/MathML"><mi>M</mi><mo>=</mo><mo>[</mo><mtable><mtr columnalign="right"><mtd><mo>-</mo><mn>2</mn></mtd><mtd><mn>1</mn></mtd></mtr>
<mtr columnalign="right"><mtd><mn>1</mn></mtd><mtd><mo>-</mo><mn>2</mn></mtd></mtr>
</mtable><mo>]</mo></math> has eigenvectors <math xmlns="http://www.w3.org/1998/Math/MathML"><mover accent="true"><mrow><msub><mi>V</mi><mn>1</mn></msub></mrow><mo stretchy="true">&RightArrow;</mo></mover><mo>=</mo><mrow><mo>(</mo><mn>1,1</mn><mo>)</mo></mrow></math> and <math xmlns="http://www.w3.org/1998/Math/MathML"><mover accent="true"><mrow><msub><mi>V</mi><mn>2</mn></msub></mrow><mo stretchy="true">&RightArrow;</mo></mover><mo>=</mo><mrow><mo>(</mo><mn>1</mn><mo>,</mo><mo>-</mo><mn>1</mn><mo>)</mo></mrow></math>, with eigenvalues <math xmlns="http://www.w3.org/1998/Math/MathML"><msub><mo>&lambda;</mo><mn>1</mn></msub><mo>=</mo><mo>-</mo><mn>1</mn></math> and <math xmlns="http://www.w3.org/1998/Math/MathML"><msub><mo>&lambda;</mo><mn>2</mn></msub><mo>=</mo><mo>-</mo><mn>3</mn></math> (why?). The phase portrait is 'fixed' by the dynamics of these subspaces:<br />
<img src="svdfinal_im1.gif"/><br />
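The example can be checked numerically. The sketch below applies the matrix to each eigenvector, then takes a few stepwise updates starting on the line y = x. The step size h and the starting point are assumptions added for illustration (the notes do not specify them).<br />

```python
# The example matrix from the text, as a list of rows.
M = [[-2.0, 1.0], [1.0, -2.0]]

def apply(m, v):
    """Apply matrix m to vector v."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]

v1 = [1.0, 1.0]
v2 = [1.0, -1.0]
print(apply(M, v1))  # a multiple of v1: scaling by -1
print(apply(M, v2))  # a multiple of v2: scaling by -3

def euler_step(x, h=0.1):
    """One stepwise update x + h * M x; the step size h is an assumed value."""
    dx = apply(M, x)
    return [xi + h * di for xi, di in zip(x, dx)]

# Start on the invariant line y = x (assumed starting point) and take a few steps.
x = [2.0, 2.0]
for _ in range(3):
    x = euler_step(x)
print(x)  # still on the line y = x, but closer to the origin
```

Both eigenvalues are negative, so in this sketch points on either invariant line move toward the origin.<br />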
<br />
<h2>Diagonalization</h2><br />
A fundamental result is that, for every n-dimensional <i>real</i>, <i>symmetric</i> matrix <math xmlns="http://www.w3.org/1998/Math/MathML"><mi>M</mi></math>, there exists an <i>orthogonal</i> matrix <math xmlns="http://www.w3.org/1998/Math/MathML"><mi>U</mi></math> and a <i>diagonal</i> matrix <math xmlns="http://www.w3.org/1998/Math/MathML"><mi>D</mi></math> such that<br />
 <div align="center"><math xmlns="http://www.w3.org/1998/Math/MathML"><mstyle displaystyle="true"><mi>M</mi><mo>=</mo><mi>U</mi>'<mo>&CircleTimes;</mo><mi>D</mi><mo>&CircleTimes;</mo><mi>U</mi></mstyle></math></div><br />
 and the diagonal elements of <math xmlns="http://www.w3.org/1998/Math/MathML"><mi>D</mi></math> are <math xmlns="http://www.w3.org/1998/Math/MathML"><msub><mo>&lambda;</mo><mn>1</mn></msub><mo>&geq;</mo><mo>.</mo><mo>.</mo><mo>.</mo><mo>&geq;</mo><msub><mo>&lambda;</mo><mi>n</mi></msub></math> (the ordered eigenvalues of <math xmlns="http://www.w3.org/1998/Math/MathML"><mi>M</mi></math>) corresponding to <math xmlns="http://www.w3.org/1998/Math/MathML"><mrow><mo>(</mo><mover accent="true"><mrow><msub><mi>U</mi><mn>1</mn></msub></mrow><mo stretchy="true">&RightArrow;</mo></mover><mo>,</mo><mo>.</mo><mo>.</mo><mo>.</mo><mo>,</mo><mover accent="true"><mrow><msub><mi>U</mi><mi>n</mi></msub></mrow><mo stretchy="true">&RightArrow;</mo></mover><mo>)</mo></mrow></math>, the rows of <math xmlns="http://www.w3.org/1998/Math/MathML"><mi>U</mi></math> (the eigenvectors of <math xmlns="http://www.w3.org/1998/Math/MathML"><mi>M</mi></math>, written as row vectors).<br />
<br />
This is huge.<br />
<br />
One way to think of this is that every real symmetric matrix has a coordinate system (<math xmlns="http://www.w3.org/1998/Math/MathML"><mi>U</mi></math> is essentially a change of coordinates) in which the action of the matrix is a simple scaling along the coordinate axes.<br />
<br />
Another is as a way of 'decomposing' a matrix. Reflection and hand-waving will show that the above 'factoring' of <math xmlns="http://www.w3.org/1998/Math/MathML"><mi>M</mi></math> allows us to write M as the sum of simple (<math xmlns="http://www.w3.org/1998/Math/MathML"><mi>n</mi><mo>&times;</mo><mi>n</mi></math>)-matrices determined by the eigensystem:<br />
<div align="center"><math xmlns="http://www.w3.org/1998/Math/MathML"><mstyle displaystyle="true"><mi>M</mi><mo>=</mo><msub><mo>&lambda;</mo><mn>1</mn></msub><mover accent="true"><mrow><msub><mi>U</mi><mn>1</mn></msub></mrow><mo stretchy="true">&RightArrow;</mo></mover>'<mo>&CircleTimes;</mo><mover accent="true"><mrow><msub><mi>U</mi><mn>1</mn></msub></mrow><mo stretchy="true">&RightArrow;</mo></mover><mo>+</mo><mo>.</mo><mo>.</mo><mo>.</mo><mo>+</mo>
<msub><mo>&lambda;</mo><mi>n</mi></msub><mover accent="true"><mrow><msub><mi>U</mi><mi>n</mi></msub></mrow><mo stretchy="true">&RightArrow;</mo></mover>'<mo>&CircleTimes;</mo><mover accent="true"><mrow><msub><mi>U</mi><mi>n</mi></msub></mrow><mo stretchy="true">&RightArrow;</mo></mover></mstyle></math></div><br />
Note further that since the <math xmlns="http://www.w3.org/1998/Math/MathML"><mo>{</mo><msub><mi>U</mi><mi>i</mi></msub><msub><mo>}</mo><mrow><mi>i</mi><mo>&in;</mo><mn>1</mn><mo>.</mo><mo>.</mo><mi>n</mi></mrow></msub></math> are all the same 'length' (part of the orthonormal condition on <math xmlns="http://www.w3.org/1998/Math/MathML"><mi>U</mi></math>), the relative contributions of the matrices are determined by the size of the eigenvalues <math xmlns="http://www.w3.org/1998/Math/MathML"><mo>{</mo><msub><mo>&lambda;</mo><mi>i</mi></msub><msub><mo>}</mo><mrow><mi>i</mi><mo>&in;</mo><mn>1</mn><mo>.</mo><mo>.</mo><mi>n</mi></mrow></msub></math>.<br />
Thus, an excellent approximation to <math xmlns="http://www.w3.org/1998/Math/MathML"><mi>M</mi></math> may be achievable using only the first few vectors of the eigensystem. This is especially useful if the eigenvectors describe important behaviors of the system, as they always seem to.<br />
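The decomposition into a sum of simple matrices can be sketched for a small case. The matrix below and its eigenpairs (eigenvalues 3 and 1, with unit eigenvectors along (1,1) and (1,-1)) are an assumed example worked out by hand, not taken from the notes; the helper names are likewise illustrative.<br />

```python
import math

def outer(u, v):
    """Outer product of two vectors, giving a matrix."""
    return [[a * b for b in v] for a in u]

def scaled(c, m):
    """Multiply every entry of matrix m by the scalar c."""
    return [[c * a for a in row] for row in m]

def added(p, q):
    """Entrywise sum of two matrices of the same shape."""
    return [[a + b for a, b in zip(rp, rq)] for rp, rq in zip(p, q)]

# Assumed example: a real symmetric matrix with hand-computed eigenpairs.
M = [[2.0, 1.0], [1.0, 2.0]]
s = 1.0 / math.sqrt(2.0)
eigenpairs = [(3.0, [s, s]), (1.0, [s, -s])]  # ordered, largest eigenvalue first

# One simple (rank-one) matrix per eigenpair: lambda_i times the outer
# product of the unit eigenvector with itself.
terms = [scaled(lam, outer(u, u)) for lam, u in eigenpairs]

rebuilt = added(terms[0], terms[1])  # the sum of all terms recovers M
leading = terms[0]                   # keeping only the largest eigenvalue's term

print(rebuilt)
print(leading)
```

Keeping only the leading term already reproduces most of each entry of M here, because the first eigenvalue is three times the second.<br />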
<br />
<h2>Quadratic Forms</h2><br />
<br />
A common construction with a matrix <math xmlns="http://www.w3.org/1998/Math/MathML"><mi>M</mi></math> is in a <i>Quadratic Form</i>, thusly:<div align="center"><math xmlns="http://www.w3.org/1998/Math/MathML"><mstyle displaystyle="true"><mi>y</mi><mo>=</mo><mover accent="true"><mrow><mi>X</mi></mrow><mo stretchy="true">&RightArrow;</mo></mover>'<mo>&CircleTimes;</mo><mi>M</mi><mo>&CircleTimes;</mo><mover accent="true"><mrow><mi>X</mi></mrow><mo stretchy="true">&RightArrow;</mo></mover></mstyle></math></div> Note that <math xmlns="http://www.w3.org/1998/Math/MathML"><mi>y</mi></math> is a scalar. Thus we are generating a real-valued function of the vector <math xmlns="http://www.w3.org/1998/Math/MathML"><mover accent="true"><mrow><mi>X</mi></mrow><mo stretchy="true">&RightArrow;</mo></mover></math>. This can be a weighted norm or average, or arise as a 'squared-distance' function.<br />
<br />
An important and useful result is that the <i>maximum</i> and <i>minimum</i> of this function over unit-length <math xmlns="http://www.w3.org/1998/Math/MathML"><mover accent="true"><mrow><mi>X</mi></mrow><mo stretchy="true">&RightArrow;</mo></mover></math> are <math xmlns="http://www.w3.org/1998/Math/MathML"><msub><mo>&lambda;</mo><mn>1</mn></msub></math> and <math xmlns="http://www.w3.org/1998/Math/MathML"><msub><mo>&lambda;</mo><mi>n</mi></msub></math>, achieved at their corresponding eigenvectors <math xmlns="http://www.w3.org/1998/Math/MathML"><msub><mi>U</mi><mn>1</mn></msub></math> and <math xmlns="http://www.w3.org/1998/Math/MathML"><msub><mi>U</mi><mi>n</mi></msub></math>, respectively.<br />
More specifically, <br />
<div align="center"><math xmlns="http://www.w3.org/1998/Math/MathML"><mstyle displaystyle="true"><msub><mtext>sup</mtext><mrow><mover accent="true"><mrow><mi>X</mi></mrow><mo stretchy="true">&RightArrow;</mo></mover></mrow></msub><mfrac><mrow><mover accent="true"><mrow><mi>X</mi>'</mrow><mo stretchy="true">&RightArrow;</mo></mover><mo>&CircleTimes;</mo><mi>M</mi><mo>&CircleTimes;</mo><mover accent="true"><mrow><mi>X</mi></mrow><mo stretchy="true">&RightArrow;</mo></mover></mrow><mrow><mover accent="true"><mrow><mi>X</mi>'</mrow><mo stretchy="true">&RightArrow;</mo></mover><mo>&CircleTimes;</mo><mover accent="true"><mrow><mi>X</mi></mrow><mo stretchy="true">&RightArrow;</mo></mover></mrow></mfrac><mo>=</mo><msub><mo>&lambda;</mo><mn>1</mn></msub></mstyle></math></div>attained at <math xmlns="http://www.w3.org/1998/Math/MathML"><mover accent="true"><mrow><mi>X</mi></mrow><mo stretchy="true">&RightArrow;</mo></mover><mo>=</mo><mi>k</mi><mover accent="true"><mrow><msub><mi>U</mi><mn>1</mn></msub></mrow><mo stretchy="true">&RightArrow;</mo></mover></math> for any nonzero <math xmlns="http://www.w3.org/1998/Math/MathML"><mi>k</mi><mo>&in;</mo><mo>&Ropf;</mo></math>.<br />
Since the <math xmlns="http://www.w3.org/1998/Math/MathML"><mo>{</mo><msub><mi>U</mi><mi>i</mi></msub><mo>}</mo></math> form a <i>basis</i>, we can write any <math xmlns="http://www.w3.org/1998/Math/MathML"><mi>X</mi></math> in terms of them:<br />
<div align="center"><math xmlns="http://www.w3.org/1998/Math/MathML"><mstyle displaystyle="true"><mover accent="true"><mrow><mi>X</mi></mrow><mo stretchy="true">&RightArrow;</mo></mover><mo>=</mo><msub><mi>C</mi><mn>1</mn></msub><mover accent="true"><mrow><msub><mi>U</mi><mn>1</mn></msub></mrow><mo stretchy="true">&RightArrow;</mo></mover><mo>+</mo><mo>.</mo><mo>.</mo><mo>.</mo><mo>+</mo><msub><mi>C</mi><mi>n</mi></msub><mover accent="true"><mrow><msub><mi>U</mi><mi>n</mi></msub></mrow><mo stretchy="true">&RightArrow;</mo></mover></mstyle></math></div> which allows us to write the value of the (normalized) Quadratic Form as:<br />
<div align="center"><math xmlns="http://www.w3.org/1998/Math/MathML"><mstyle displaystyle="true"><mfrac><mrow><mover accent="true"><mrow><mi>X</mi>'</mrow><mo stretchy="true">&RightArrow;</mo></mover><mo>&CircleTimes;</mo><mi>M</mi><mo>&CircleTimes;</mo><mover accent="true"><mrow><mi>X</mi></mrow><mo stretchy="true">&RightArrow;</mo></mover></mrow><mrow><mover accent="true"><mrow><mi>X</mi>'</mrow><mo stretchy="true">&RightArrow;</mo></mover><mo>&CircleTimes;</mo><mover accent="true"><mrow><mi>X</mi></mrow><mo stretchy="true">&RightArrow;</mo></mover></mrow></mfrac><mo>=</mo><mfrac><mrow><msubsup><mi>C</mi><mn>1</mn><mn>2</mn></msubsup><msub><mo>&lambda;</mo><mn>1</mn></msub><mo>+</mo><mo>.</mo><mo>.</mo><mo>.</mo><mo>+</mo><msubsup><mi>C</mi><mi>n</mi><mn>2</mn></msubsup><msub><mo>&lambda;</mo><mi>n</mi></msub></mrow><mrow><msubsup><mi>C</mi><mn>1</mn><mn>2</mn></msubsup><mo>+</mo><mo>.</mo><mo>.</mo><mo>.</mo><mo>+</mo><msubsup><mi>C</mi><mi>n</mi><mn>2</mn></msubsup></mrow></mfrac></mstyle></math></div><br />
This is a weighted average of the eigenvalues with non-negative weights, so it can never exceed the largest eigenvalue. It is therefore maximized as <math xmlns="http://www.w3.org/1998/Math/MathML"><msub><mo>&lambda;</mo><mn>1</mn></msub></math> at <math xmlns="http://www.w3.org/1998/Math/MathML"><mi>C</mi><mo>=</mo><mrow><mo>(</mo><mn>1,0,0</mn><mo>,</mo><mo>.</mo><mo>.</mo><mo>.</mo><mo>)</mo></mrow></math>; that is, when <math xmlns="http://www.w3.org/1998/Math/MathML"><mover accent="true"><mrow><mi>X</mi></mrow><mo stretchy="true">&RightArrow;</mo></mover><mo>=</mo><mover accent="true"><mrow><msub><mi>U</mi><mn>1</mn></msub></mrow><mo stretchy="true">&RightArrow;</mo></mover></math>. A similar treatment shows the minimum is achieved at <math xmlns="http://www.w3.org/1998/Math/MathML"><mover accent="true"><mrow><msub><mi>U</mi><mi>n</mi></msub></mrow><mo stretchy="true">&RightArrow;</mo></mover></math>.<br />
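The extremal property can be explored numerically by evaluating the normalized quadratic form on many directions. The sketch below uses an assumed 2x2 symmetric example with eigenvalues 3 and 1 (eigenvectors along (1,1) and (1,-1)); the function name rayleigh and the sampling of one direction per degree are illustrative choices.<br />

```python
import math

# Assumed example: symmetric matrix with eigenvalues 3 and 1.
M = [[2.0, 1.0], [1.0, 2.0]]

def rayleigh(m, x):
    """Normalized quadratic form: (x' M x) divided by (x' x)."""
    mx = [sum(a * b for a, b in zip(row, x)) for row in m]
    return sum(a * b for a, b in zip(x, mx)) / sum(a * a for a in x)

# Sample unit vectors around the circle, one per degree.
angles = [2.0 * math.pi * k / 360 for k in range(360)]
values = [rayleigh(M, [math.cos(t), math.sin(t)]) for t in angles]

print(max(values), min(values))  # close to the largest and smallest eigenvalue
print(rayleigh(M, [1.0, 1.0]))   # evaluated at the first eigenvector
print(rayleigh(M, [1.0, -1.0]))  # evaluated at the last eigenvector
```

Because the form is normalized by the squared length of the vector, rescaling an eigenvector does not change the value obtained.<br />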
<br />
<h2>Least Squares Theory</h2><br />
Many classical and useful problems in statistics reduce to the system<br />
<div align="center"><math xmlns="http://www.w3.org/1998/Math/MathML"><mstyle displaystyle="true"><mover accent="true"><mrow><mi>Y</mi></mrow><mo stretchy="true">&RightArrow;</mo></mover><mo>=</mo><mi>M</mi><mo>&CircleTimes;</mo><mover accent="true"><mrow><mi>X</mi></mrow><mo stretchy="true">&RightArrow;</mo></mover></mstyle></math></div><br />
<math xmlns="http://www.w3.org/1998/Math/MathML"><mover accent="true"><mrow><mi>Y</mi></mrow><mo stretchy="true">&RightArrow;</mo></mover></math> may be a vector of observations, and we seek to estimate the value of <math xmlns="http://www.w3.org/1998/Math/MathML"><mover accent="true"><mrow><mi>X</mi></mrow><mo stretchy="true">&RightArrow;</mo></mover></math> for known parameters <math xmlns="http://www.w3.org/1998/Math/MathML"><mi>M</mi></math>. Note that we may well have many more observations than Random Variables, so this system may be overdetermined; that is, <math xmlns="http://www.w3.org/1998/Math/MathML"><mi>M</mi></math> is generally rectangular rather than a real symmetric matrix. The <i>Least Squares Estimate</i> of <math xmlns="http://www.w3.org/1998/Math/MathML"><mover accent="true"><mrow><mi>X</mi></mrow><mo stretchy="true">&RightArrow;</mo></mover></math> will minimize <br />
<br />
<div align="center"><math xmlns="http://www.w3.org/1998/Math/MathML"><mstyle displaystyle="true"><mrow><mo>(</mo><mi>Y</mi><mo>-</mo><mi>M</mi><mo>&CircleTimes;</mo><mover accent="true"><mrow><mi>X</mi></mrow><mo stretchy="true">&RightArrow;</mo></mover><mo>)</mo></mrow>'<mo>&CircleTimes;</mo><mrow><mo>(</mo><mi>Y</mi><mo>-</mo><mi>M</mi><mo>&CircleTimes;</mo><mover accent="true"><mrow><mi>X</mi></mrow><mo stretchy="true">&RightArrow;</mo></mover><mo>)</mo></mrow></mstyle></math></div><br />
Taking derivatives w.r.t. <math xmlns="http://www.w3.org/1998/Math/MathML"><mover accent="true"><mrow><mi>X</mi></mrow><mo stretchy="true">&RightArrow;</mo></mover></math> yields the <i>Normal Equation</i>:<br />
<div align="center"><math xmlns="http://www.w3.org/1998/Math/MathML"><mstyle displaystyle="true"><mrow><mo>(</mo><mi>M</mi>'<mo>&CircleTimes;</mo><mi>M</mi><mo>)</mo></mrow><mo>&CircleTimes;</mo><mover accent="true"><mrow><mi>X</mi></mrow><mo stretchy="true">&RightArrow;</mo></mover><mo>=</mo><mi>M</mi>'<mo>&CircleTimes;</mo><mover accent="true"><mrow><mi>Y</mi></mrow><mo stretchy="true">&RightArrow;</mo></mover></mstyle></math></div><br />
Since <math xmlns="http://www.w3.org/1998/Math/MathML"><mrow><mo>(</mo><mi>M</mi>'<mo>&CircleTimes;</mo><mi>M</mi><mo>)</mo></mrow></math> is a real, symmetric matrix, we can write its spectral decomposition <br />
<div align="center"><math xmlns="http://www.w3.org/1998/Math/MathML"><mstyle displaystyle="true"><mrow><mo>(</mo><mi>M</mi>'<mo>&CircleTimes;</mo><mi>M</mi><mo>)</mo></mrow><mo>=</mo><msub><mo>&lambda;</mo><mn>1</mn></msub><msub><mi>U</mi><mn>1</mn></msub>'<msub><mi>U</mi><mn>1</mn></msub><mo>+</mo><mo>.</mo><mo>.</mo><mo>.</mo><mo>+</mo><msub><mo>&lambda;</mo><mi>n</mi></msub><msub><mi>U</mi><mi>n</mi></msub>'<msub><mi>U</mi><mi>n</mi></msub></mstyle></math></div> which in turn provides us with a (pseudo) inverse, dropping any terms whose eigenvalue is zero:<div align="center"><math xmlns="http://www.w3.org/1998/Math/MathML"><mstyle displaystyle="true"><msup><mrow><mo>(</mo><mi>M</mi>'<mo>&CircleTimes;</mo><mi>M</mi><mo>)</mo></mrow><mrow><mo>-</mo></mrow></msup><mo>=</mo><mfrac><mrow><mn>1</mn></mrow><mrow><msub><mo>&lambda;</mo><mn>1</mn></msub></mrow></mfrac><msub><mi>U</mi><mn>1</mn></msub>'<msub><mi>U</mi><mn>1</mn></msub><mo>+</mo><mo>.</mo><mo>.</mo><mo>.</mo><mo>+</mo><mfrac><mrow><mn>1</mn></mrow><mrow><msub><mo>&lambda;</mo><mi>n</mi></msub></mrow></mfrac><msub><mi>U</mi><mi>n</mi></msub>'<msub><mi>U</mi><mi>n</mi></msub></mstyle></math></div> so we can solve the system:<div align="center"><math xmlns="http://www.w3.org/1998/Math/MathML"><mstyle displaystyle="true"><mover accent="true"><mrow><mi>X</mi></mrow><mo stretchy="true">&RightArrow;</mo></mover><mo>=</mo><msup><mrow><mo>(</mo><mi>M</mi>'<mo>&CircleTimes;</mo><mi>M</mi><mo>)</mo></mrow><mo>-</mo></msup><mi>M</mi>'<mo>&CircleTimes;</mo><mover accent="true"><mrow><mi>Y</mi></mrow><mo stretchy="true">&RightArrow;</mo></mover></mstyle></math></div><br />
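A small worked case may help here. The sketch below fits a straight line to four made-up observations by forming and solving the Normal Equation. Note the swap: for this two-unknown case it inverts the 2x2 matrix with the closed-form formula rather than the spectral sum above (the two agree whenever the matrix is invertible). The data values and helper names are assumptions for illustration.<br />

```python
# Four made-up observations Y at times 0..3; the columns of M are a
# constant term and the time, so X holds an intercept and a slope.
M = [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]]
Y = [1.1, 2.9, 5.2, 6.8]

def transpose(m):
    """Swap rows and columns."""
    return [list(col) for col in zip(*m)]

def matmul(a, b):
    """Matrix times matrix."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def matvec(m, v):
    """Matrix applied to a vector."""
    return [sum(x * y for x, y in zip(row, v)) for row in m]

Mt = transpose(M)
A = matmul(Mt, M)   # the real symmetric matrix M'M
b = matvec(Mt, Y)   # the right-hand side M'Y

# Closed-form inverse of a 2x2 matrix (valid only when det is nonzero).
det = A[0][0] * A[1][1] - A[0][1] * A[1][0]
A_inv = [[A[1][1] / det, -A[0][1] / det],
         [-A[1][0] / det, A[0][0] / det]]

X = matvec(A_inv, b)  # least squares estimate: intercept and slope
print(X)

# The residual Y - M X should be orthogonal to the columns of M.
residual = [y - fit for y, fit in zip(Y, matvec(M, X))]
print(matvec(Mt, residual))  # both entries should be near zero
```

The final check restates the Normal Equation: at the least squares estimate, the residual has no component along any column of M.<br />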
<br />
<h2>Fisher Linear Discriminant</h2><br />
A classic problem in statistics involves finding a <i>linear discriminant</i>, that is, a linear combination of the features of a datapoint whose <i>sign</i> classifies it into one of two (or more) categories. For example, a datapoint <math xmlns="http://www.w3.org/1998/Math/MathML"><mover accent="true"><mrow><mi>X</mi></mrow><mo stretchy="true">&RightArrow;</mo></mover></math> might represent a particular tissue sample, the features <math xmlns="http://www.w3.org/1998/Math/MathML"><mrow><mo>(</mo><msub><mi>x</mi><mn>1</mn></msub><mo>,</mo><mo>.</mo><mo>.</mo><mo>.</mo><mo>,</mo><msub><mi>x</mi><mi>n</mi></msub><mo>)</mo></mrow></math> the expression levels of various genes, and the classes a clinical outcome. We seek a <math xmlns="http://www.w3.org/1998/Math/MathML"><mover accent="true"><mrow><mi>w</mi></mrow><mo stretchy="true">&RightArrow;</mo></mover></math> such that <br />
<div align="center"><math xmlns="http://www.w3.org/1998/Math/MathML"><mstyle displaystyle="true"><mover accent="true"><mrow><mi>w</mi></mrow><mo stretchy="true">&RightArrow;</mo></mover>'<mo>&CircleTimes;</mo><mover accent="true"><mrow><mi>X</mi></mrow><mo stretchy="true">&RightArrow;</mo></mover><mtext>is</mtext><mo>{</mo><mtable><mtr columnalign="right"><mtd><mo>&gt;</mo><mn>0</mn><mo>&forall;</mo><mover accent="true"><mrow><mi>X</mi></mrow><mo stretchy="true">&RightArrow;</mo></mover><mo>&in;</mo><mtext>Class A</mtext></mtd></mtr>
<mtr columnalign="right"><mtd><mo>&lt;</mo><mn>0</mn><mo>&forall;</mo><mover accent="true"><mrow><mi>X</mi></mrow><mo stretchy="true">&RightArrow;</mo></mover><mo>&in;</mo><mtext>Class B</mtext></mtd></mtr>
</mtable></mstyle></math></div>or as close as can be found.<br />
<br />
Fisher's measure of the discriminating ability of such a <math xmlns="http://www.w3.org/1998/Math/MathML"><mover accent="true"><mrow><mi>w</mi></mrow><mo stretchy="true">&RightArrow;</mo></mover></math> is given by<div align="center"><math xmlns="http://www.w3.org/1998/Math/MathML"><mstyle displaystyle="true"><mi>J</mi><mrow><mo>(</mo><mover accent="true"><mrow><mi>w</mi></mrow><mo stretchy="true">&RightArrow;</mo></mover><mo>)</mo></mrow><mo>=</mo><mfrac><mrow><msup><mrow><mo>(</mo><msub><mo>&mu;</mo><mtext>Class A</mtext></msub><mo>-</mo><msub><mo>&mu;</mo><mtext>Class B</mtext></msub><mo>)</mo></mrow><mn>2</mn></msup></mrow><mrow><msubsup><mo>&sigma;</mo><mi>A</mi><mn>2</mn></msubsup><mo>+</mo><msubsup><mo>&sigma;</mo><mi>B</mi><mn>2</mn></msubsup></mrow></mfrac></mstyle></math></div><br />
<br />
that is, the ratio of the (squared) distance between class means to the projected variance for the classes determined by choice of <math xmlns="http://www.w3.org/1998/Math/MathML"><mover accent="true"><mrow><mi>w</mi></mrow><mo stretchy="true">&RightArrow;</mo></mover></math>. If <math xmlns="http://www.w3.org/1998/Math/MathML"><mover accent="true"><mrow><msub><mi>X</mi><mi>A</mi></msub></mrow><mo stretchy="true">&RightArrow;</mo></mover></math> is the matrix of <i>centered</i> datapoints in class A, define the <i>Scatter Matrix</i> of A as<div align="center"><math xmlns="http://www.w3.org/1998/Math/MathML"><mstyle displaystyle="true"><msub><mi>S</mi><mi>A</mi></msub><mo>:</mo><mo>=</mo><mover accent="true"><mrow><msub><mi>X</mi><mi>A</mi></msub></mrow><mo stretchy="true">&RightArrow;</mo></mover>'<mo>&CircleTimes;</mo><mover accent="true"><mrow><msub><mi>X</mi><mi>A</mi></msub></mrow><mo stretchy="true">&RightArrow;</mo></mover><mo>=</mo><munder><mrow><mo>&sum;</mo></mrow><mrow><mover accent="true"><mrow><msub><mi>x</mi><mi>i</mi></msub></mrow><mo stretchy="true">&RightArrow;</mo></mover><mo>&in;</mo><mi>A</mi></mrow></munder><mrow><mo>(</mo><mover accent="true"><mrow><msub><mi>x</mi><mi>i</mi></msub></mrow><mo stretchy="true">&RightArrow;</mo></mover><mo>-</mo><msub><mo>&mu;</mo><mi>A</mi></msub><mo>)</mo></mrow>'<mo>&CircleTimes;</mo><mrow><mo>(</mo><mover accent="true"><mrow><msub><mi>x</mi><mi>i</mi></msub></mrow><mo stretchy="true">&RightArrow;</mo></mover><mo>-</mo><msub><mo>&mu;</mo><mi>A</mi></msub><mo>)</mo></mrow><mo>=</mo><msub><mi>n</mi><mi>A</mi></msub><msub><mo>&Sigma;</mo><mi>A</mi></msub></mstyle></math></div>i.e., simply the covariance matrix for the class, weighted by the number of elements in the class, <math xmlns="http://www.w3.org/1998/Math/MathML"><msub><mi>n</mi><mi>A</mi></msub></math>. 
If we then define the <i>Scatter-Within</i> matrix as <math xmlns="http://www.w3.org/1998/Math/MathML"><msub><mi>S</mi><mi>W</mi></msub><mo>=</mo><msub><mi>S</mi><mi>A</mi></msub><mo>+</mo><msub><mi>S</mi><mtext>Class B</mtext></msub></math> to represent the scatter within the classes (the second term being the scatter matrix of class B, defined in the same way as that of A), and the <i>Scatter-Between</i> matrix relating the two classes as <math xmlns="http://www.w3.org/1998/Math/MathML"><msub><mi>S</mi><mi>B</mi></msub><mo>=</mo>
<mrow><mo>(</mo><mover accent="true"><mrow><msub><mo>&mu;</mo><mi>A</mi></msub></mrow><mo stretchy="true">&RightArrow;</mo></mover><mo>-</mo><mover accent="true"><mrow><msub><mo>&mu;</mo><mi>B</mi></msub></mrow><mo stretchy="true">&RightArrow;</mo></mover><mo>)</mo></mrow>'<mo>&CircleTimes;</mo><mrow><mo>(</mo><mover accent="true"><mrow><msub><mo>&mu;</mo><mi>A</mi></msub></mrow><mo stretchy="true">&RightArrow;</mo></mover><mo>-</mo><mover accent="true"><mrow><msub><mo>&mu;</mo><mi>B</mi></msub></mrow><mo stretchy="true">&RightArrow;</mo></mover><mo>)</mo></mrow></math>, then we can rewrite the discriminant function as<br />
<div align="center"><math xmlns="http://www.w3.org/1998/Math/MathML"><mstyle displaystyle="true"><mi>J</mi><mrow><mo>(</mo><mover accent="true"><mrow><mi>w</mi></mrow><mo stretchy="true">&RightArrow;</mo></mover><mo>)</mo></mrow><mo>=</mo><mfrac><mrow><mover accent="true"><mrow><mi>w</mi></mrow><mo stretchy="true">&RightArrow;</mo></mover>'<msub><mi>S</mi><mi>B</mi></msub><mover accent="true"><mrow><mi>w</mi></mrow><mo stretchy="true">&RightArrow;</mo></mover></mrow><mrow><mover accent="true"><mrow><mi>w</mi></mrow><mo stretchy="true">&RightArrow;</mo></mover>'<msub><mi>S</mi><mi>W</mi></msub><mover accent="true"><mrow><mi>w</mi></mrow><mo stretchy="true">&RightArrow;</mo></mover></mrow></mfrac></mstyle></math></div><br />
This is a ratio of two quadratic forms (a generalized Rayleigh quotient), and it matches the expression above up to the weighting of each class by its size. So we know the best linear discriminant <math xmlns="http://www.w3.org/1998/Math/MathML"><mover accent="true"><mrow><mi>w</mi></mrow><mo stretchy="true">&RightArrow;</mo></mover></math> is given by the first eigenvector of <math xmlns="http://www.w3.org/1998/Math/MathML"><msup><mrow><msub><mi>S</mi><mi>W</mi></msub></mrow><mo>-</mo></msup><mo>&CircleTimes;</mo><msub><mi>S</mi><mi>B</mi></msub></math>. Because <math xmlns="http://www.w3.org/1998/Math/MathML"><msub><mi>S</mi><mi>B</mi></msub></math> has rank one, this eigenvector is given by <br />
<div align="center"><math xmlns="http://www.w3.org/1998/Math/MathML"><mstyle displaystyle="true"><mover accent="true"><mrow><mi>w</mi></mrow><mo stretchy="true">&RightArrow;</mo></mover><mo>=</mo><msup><mrow><msub><mi>S</mi><mi>W</mi></msub></mrow><mrow><mo>-</mo></mrow></msup><mrow><mo>(</mo><msub><mo>&mu;</mo><mi>A</mi></msub><mo>-</mo><msub><mo>&mu;</mo><mi>B</mi></msub><mo>)</mo></mrow></mstyle></math></div>which is intuitive: it is the direction joining the two class means, rescaled to down-weight directions in which the within-class scatter is large.<br />
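As a rough numerical check of this closed form, the following Python/NumPy sketch builds two hypothetical Gaussian classes, computes the discriminant by solving a linear system with the Scatter-Within matrix, and evaluates Fisher's J directly from the projected class means and variances. The data, class sizes, and all variable names are assumptions made for illustration. The classes are given equal sizes so that weighting by class size does not change the optimal direction.<br />

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy data: rows are datapoints, columns are features.
# Equal class sizes are an assumption that keeps the check exact.
n_per_class, n_features = 200, 5
mean_shift = np.array([1.0, 0.5, 0.0, 0.0, -0.5])
class_a = rng.normal(size=(n_per_class, n_features)) + mean_shift
class_b = rng.normal(size=(n_per_class, n_features))

mu_a = class_a.mean(axis=0)
mu_b = class_b.mean(axis=0)

# Scatter matrix of a class: centered data, transposed, times itself.
centered_a = class_a - mu_a
centered_b = class_b - mu_b
scatter_within = centered_a.T @ centered_a + centered_b.T @ centered_b

# Closed-form discriminant: solve S_W w = mu_A - mu_B instead of
# forming the inverse of S_W explicitly.
w = np.linalg.solve(scatter_within, mu_a - mu_b)


def fisher_j(direction):
    """Squared distance between projected class means divided by
    the sum of the projected class variances."""
    proj_a = class_a @ direction
    proj_b = class_b @ direction
    return (proj_a.mean() - proj_b.mean()) ** 2 / (proj_a.var() + proj_b.var())


print("J at the closed-form w:", fisher_j(w))
print("J at a random direction:", fisher_j(rng.normal(size=n_features)))
```

Evaluating fisher_j on any other direction should give a value no larger than fisher_j(w). Only the direction of w matters, not its length.<br />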
<br />
<h2>Singular Value Decomposition</h2><br />
So, eigensystems tell you everything you need to know. What else could be needed? Well, nothing. Unfortunately, most matrices encountered in the real world are not real and symmetric. For example, in statistical applications, we often have many more observations than variables, so the data matrices are not even square.<br />
So, we cannot generally diagonalize our matrix and work with the Spectral Decomposition. The <i>Singular Value Decomposition</i> should be seen as a way to come as close as possible to this enviable state of things. <br />
<br />
In short, for <b>any</b> matrix <math xmlns="http://www.w3.org/1998/Math/MathML"><mi>M</mi></math>, we can find <i>orthogonal</i> matrices <math xmlns="http://www.w3.org/1998/Math/MathML"><mi>U</mi></math> and <math xmlns="http://www.w3.org/1998/Math/MathML"><mi>V</mi></math>, and diagonal matrix <math xmlns="http://www.w3.org/1998/Math/MathML"><mi>D</mi></math>, such that:<br />
<div align="center"><math xmlns="http://www.w3.org/1998/Math/MathML"><mstyle displaystyle="true"><mi>M</mi><mo>=</mo><mi>U</mi><mo>&CircleTimes;</mo><mi>D</mi><mo>&CircleTimes;</mo><mi>V</mi></mstyle></math></div><br />
where the rows of <math xmlns="http://www.w3.org/1998/Math/MathML"><mi>V</mi></math> are an orthonormal basis for the row space of <math xmlns="http://www.w3.org/1998/Math/MathML"><mi>M</mi></math>, the columns of <math xmlns="http://www.w3.org/1998/Math/MathML"><mi>U</mi></math> are an orthonormal basis for the column space of <math xmlns="http://www.w3.org/1998/Math/MathML"><mi>M</mi></math>, and the diagonal elements of <math xmlns="http://www.w3.org/1998/Math/MathML"><mi>D</mi></math> are the <i>Singular Values</i> of <math xmlns="http://www.w3.org/1998/Math/MathML"><mi>M</mi></math>. The singular values are the closest we can get to eigenvalues for a general (possibly non-square) matrix <math xmlns="http://www.w3.org/1998/Math/MathML"><mi>M</mi></math>; they are the eigenvalues of <math xmlns="http://www.w3.org/1998/Math/MathML"><msqrt><mi>M</mi>'<mo>&CircleTimes;</mo><mi>M</mi></msqrt></math>, and correspond to the <i>Left and Right Singular Vectors</i> making up <math xmlns="http://www.w3.org/1998/Math/MathML"><mi>U</mi></math> and <math xmlns="http://www.w3.org/1998/Math/MathML"><mi>V</mi></math> respectively, just as eigenvalues correspond to eigenvectors.<br />
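These properties can be checked numerically. The sketch below uses NumPy's <i>numpy.linalg.svd</i> on a small, hypothetical random matrix (the matrix and the variable names are assumptions). With full_matrices=False NumPy returns the 'economy' form, in which U has orthonormal columns but is not square, and the third return value already holds the right singular vectors as rows, matching the V used here.<br />

```python
import numpy as np

rng = np.random.default_rng(1)
M = rng.normal(size=(6, 4))  # hypothetical non-square example matrix

# d holds the singular values in decreasing order; the rows of V are
# the right singular vectors, so M = U @ diag(d) @ V.
U, d, V = np.linalg.svd(M, full_matrices=False)

reconstructed = U @ np.diag(d) @ V
print("M recovered from U, D, V:", np.allclose(M, reconstructed))

# Columns of U and rows of V are orthonormal.
print("U'U is the identity:", np.allclose(U.T @ U, np.eye(4)))
print("VV' is the identity:", np.allclose(V @ V.T, np.eye(4)))

# Singular values are the square roots of the eigenvalues of M'M.
# eigvalsh returns eigenvalues in increasing order, so reverse them.
eigvals = np.linalg.eigvalsh(M.T @ M)[::-1]
print("d equals sqrt(eig(M'M)):", np.allclose(d, np.sqrt(eigvals)))
```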
<br />
So the situation is akin to a diagonalized matrix, but we are unable to use a single coordinate transform to get into the 'optimized' space where the action of the matrix is easy to identify. Instead, we have two coordinate transforms - one for the rows, and one for the columns. It's an elegant solution to the Curse of Dimensionality, and a powerful one.<br />
<br />
There is an interesting and useful duality between <math xmlns="http://www.w3.org/1998/Math/MathML"><mi>U</mi></math> and <math xmlns="http://www.w3.org/1998/Math/MathML"><mi>V</mi></math>. Suppose we wish to find out what linear combination of columns from <math xmlns="http://www.w3.org/1998/Math/MathML"><mi>M</mi></math> gave rise to one of the left singular vectors (the columns of <math xmlns="http://www.w3.org/1998/Math/MathML"><mi>U</mi></math>). We'd like to get <math xmlns="http://www.w3.org/1998/Math/MathML"><mi>U</mi></math> in terms of <math xmlns="http://www.w3.org/1998/Math/MathML"><mi>M</mi></math>:<br />
<div align="center"><math xmlns="http://www.w3.org/1998/Math/MathML"><mstyle displaystyle="true"><mi>M</mi><mo>&CircleTimes;</mo><msup><mrow><mo>(</mo><mi>D</mi><mi>V</mi><mo>)</mo></mrow><mrow><mo>-</mo></mrow></msup><mo>=</mo><mi>U</mi></mstyle></math></div><br />
In other words, <math xmlns="http://www.w3.org/1998/Math/MathML"><msup><mrow><mo>(</mo><mi>D</mi><mi>V</mi><mo>)</mo></mrow><mrow><mo>-</mo></mrow></msup></math> will tell us how to build the columns of <math xmlns="http://www.w3.org/1998/Math/MathML"><mi>U</mi></math> from those of <math xmlns="http://www.w3.org/1998/Math/MathML"><mi>M</mi></math>. This is useful when we find something interesting and want to relate it back to the raw data. Note that <math xmlns="http://www.w3.org/1998/Math/MathML"><mrow><mo>(</mo><mi>D</mi><mi>V</mi><mo>)</mo></mrow></math> is very easy to invert. <br />
<br />
For any invertible matrices <math xmlns="http://www.w3.org/1998/Math/MathML"><mi>A</mi></math> and <math xmlns="http://www.w3.org/1998/Math/MathML"><mi>B</mi></math> of the same size, <math xmlns="http://www.w3.org/1998/Math/MathML"><msup><mrow><mo>(</mo><mi>A</mi><mi>B</mi><mo>)</mo></mrow><mo>-</mo></msup><mo>=</mo><msup><mi>B</mi><mo>-</mo></msup><msup><mi>A</mi><mo>-</mo></msup></math> - multiply them out and see. Now, <math xmlns="http://www.w3.org/1998/Math/MathML"><msup><mi>D</mi><mo>-</mo></msup></math> is easy: for any diagonal matrix with nonzero diagonal entries, <math xmlns="http://www.w3.org/1998/Math/MathML"><mi>D</mi><mo>=</mo><mtext>diag</mtext><mrow><mo>(</mo><msub><mi>d</mi><mn>1</mn></msub><mo>,</mo><mo>.</mo><mo>.</mo><mo>.</mo><mo>,</mo><msub><mi>d</mi><mi>n</mi></msub><mo>)</mo></mrow><mo>&Rightarrow;</mo><msup><mi>D</mi><mo>-</mo></msup><mo>=</mo><mtext>diag</mtext><mrow><mo>(</mo><mn>1</mn><mo>/</mo><msub><mi>d</mi><mn>1</mn></msub><mo>,</mo><mo>.</mo><mo>.</mo><mo>.</mo><mo>,</mo><mn>1</mn><mo>/</mo><msub><mi>d</mi><mi>n</mi></msub><mo>)</mo></mrow></math>.<br />
The inverse of <math xmlns="http://www.w3.org/1998/Math/MathML"><mi>V</mi></math> brings out the beauty of the SVD. We know that <math xmlns="http://www.w3.org/1998/Math/MathML"><mi>V</mi></math> is orthogonal. This <i>implies</i> lots of nice things, but the <i>definition</i> is that <math xmlns="http://www.w3.org/1998/Math/MathML"><msup><mi>V</mi><mo>-</mo></msup><mo>=</mo><mi>V</mi>'</math>. In other words, the linear combination of columns from <math xmlns="http://www.w3.org/1998/Math/MathML"><mi>M</mi></math> used to make the <math xmlns="http://www.w3.org/1998/Math/MathML"><msup><mi>i</mi><mrow><mi>t</mi><mi>h</mi></mrow></msup></math> left singular vector (column of <math xmlns="http://www.w3.org/1998/Math/MathML"><mi>U</mi></math>) is given by the <math xmlns="http://www.w3.org/1998/Math/MathML"><msup><mi>i</mi><mrow><mi>t</mi><mi>h</mi></mrow></msup></math> row of <math xmlns="http://www.w3.org/1998/Math/MathML"><mi>V</mi></math>, scaled by the reciprocal of the <math xmlns="http://www.w3.org/1998/Math/MathML"><msup><mi>i</mi><mrow><mi>t</mi><mi>h</mi></mrow></msup></math> singular value.<br />
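A minimal sketch of this duality, assuming a small random square matrix so that every singular value is nonzero and the inverse of DV exists (the matrix and the variable names are hypothetical). Multiplying M by the transpose of V and dividing by the singular values recovers U, so each column of U is M applied to the matching row of V, divided by the matching singular value.<br />

```python
import numpy as np

rng = np.random.default_rng(2)
M = rng.normal(size=(5, 5))  # hypothetical square, nonsingular example

# Rows of V are the right singular vectors, so M = U @ diag(d) @ V.
U, d, V = np.linalg.svd(M)

# (DV)^- = V^- D^- = V' D^-, using V^- = V' for orthogonal V.
inverse_dv = V.T @ np.diag(1.0 / d)
recovered_u = M @ inverse_dv
print("U recovered from M:", np.allclose(recovered_u, U))

# The same statement for a single left singular vector: the weights on
# the columns of M are the entries of row i of V, divided by d[i].
i = 2
column_i = M @ V[i] / d[i]
print("column i of U recovered:", np.allclose(column_i, U[:, i]))
```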
<br />
Finally, it is worth noting that if the original matrix <math xmlns="http://www.w3.org/1998/Math/MathML"><mi>M</mi></math> is centered along the rows, then the left singular vectors (LSVs) are the <i>Principal Components</i> of the rows of <math xmlns="http://www.w3.org/1998/Math/MathML"><mi>M</mi></math>, with <math xmlns="http://www.w3.org/1998/Math/MathML"><mi>D</mi><mi>V</mi></math> giving the PCA 'coordinates'. Similarly, centering the columns yields the principal components in <math xmlns="http://www.w3.org/1998/Math/MathML"><mi>V</mi></math>.<br />
<br />
<br />
<h2>Image Processing Example</h2><br />
A color-indexed or RGB digital image is essentially a big matrix (or three), and as such provides an excellent example of some of the properties of the SVD. We can make use of the ordering of the singular values to reduce the data needed to describe the most 'important' aspects of the system. Specifically, we can write the following SVD version of the Spectral Decomposition:<br />
<div align="center"><math xmlns="http://www.w3.org/1998/Math/MathML"><mstyle displaystyle="true"><mi>M</mi><mo>=</mo><msub><mi>d</mi><mn>1</mn></msub><mover accent="true"><mrow><msub><mi>U</mi><mn>1</mn></msub></mrow><mo stretchy="true">&RightArrow;</mo></mover><mo>&CircleTimes;</mo><mover accent="true"><mrow><msub><mi>V</mi><mn>1</mn></msub></mrow><mo stretchy="true">&RightArrow;</mo></mover><mo>+</mo><msub><mi>d</mi><mn>2</mn></msub><mover accent="true"><mrow><msub><mi>U</mi><mn>2</mn></msub></mrow><mo stretchy="true">&RightArrow;</mo></mover><mo>&CircleTimes;</mo><mover accent="true"><mrow><msub><mi>V</mi><mn>2</mn></msub></mrow><mo stretchy="true">&RightArrow;</mo></mover><mo>+</mo><mo>.</mo><mo>.</mo><mo>.</mo><mo>+</mo><msub><mi>d</mi><mi>n</mi></msub><mover accent="true"><mrow><msub><mi>U</mi><mi>n</mi></msub></mrow><mo stretchy="true">&RightArrow;</mo></mover><mo>&CircleTimes;</mo><mover accent="true"><mrow><msub><mi>V</mi><mi>n</mi></msub></mrow><mo stretchy="true">&RightArrow;</mo></mover></mstyle></math></div><br />
where <math xmlns="http://www.w3.org/1998/Math/MathML"><msub><mi>d</mi><mi>i</mi></msub></math>, <math xmlns="http://www.w3.org/1998/Math/MathML"><mover accent="true"><mrow><msub><mi>U</mi><mi>i</mi></msub></mrow><mo stretchy="true">&RightArrow;</mo></mover></math>, and <math xmlns="http://www.w3.org/1998/Math/MathML"><mover accent="true"><mrow><msub><mi>V</mi><mi>i</mi></msub></mrow><mo stretchy="true">&RightArrow;</mo></mover></math> are the <math xmlns="http://www.w3.org/1998/Math/MathML"><msup><mi>i</mi><mrow><mi>t</mi><mi>h</mi></mrow></msup></math> Singular Value, Left Singular Vector (column of <math xmlns="http://www.w3.org/1998/Math/MathML"><mi>U</mi></math>), and Right Singular Vector (row of <math xmlns="http://www.w3.org/1998/Math/MathML"><mi>V</mi></math>), respectively.<br />
Just as with the eigenvalues in a Spectral Decomposition, the first few Singular Values will be the largest, and thus these terms will make the largest 'contributions' to <math xmlns="http://www.w3.org/1998/Math/MathML"><mi>M</mi></math>. In fact, the matrix constructed by adding the first <math xmlns="http://www.w3.org/1998/Math/MathML"><mi>k</mi></math> such terms is the rank-k matrix closest to <math xmlns="http://www.w3.org/1998/Math/MathML"><mi>M</mi></math>, in terms of least-squares distance between elements. This provides a simple compression or feature extraction technique.<br />
The following image is the blue channel from a 650 &#215; 600 RGB image. We convert the elements to floats, construct the SVD, and then recreate the image using the approximation above for various values of <math xmlns="http://www.w3.org/1998/Math/MathML"><mi>k</mi></math>. Each term needs one singular value, one vector of length 650, and one of length 600. So for, say, <math xmlns="http://www.w3.org/1998/Math/MathML"><mi>k</mi><mo>=</mo><mn>10</mn></math>, we are using (at most) <math xmlns="http://www.w3.org/1998/Math/MathML"><mi>k</mi><mo>*</mo><mrow><mo>(</mo><mn>650</mn><mo>+</mo><mn>600</mn><mo>+</mo><mn>1</mn><mo>)</mo></mrow><mo>=</mo><mn>12,510</mn></math> numbers, rather than the original <math xmlns="http://www.w3.org/1998/Math/MathML"><mn>600</mn><mo>*</mo><mn>650</mn><mo>=</mo><mn>390,000</mn></math>, roughly a 31-fold compression.<br />
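The truncated sum can be sketched in a few lines of NumPy. Since the original image is not available here, the example below uses a small synthetic array as a stand-in; the array, its size, and the function name rank_k_approximation are all assumptions. It reports, for several values of k, how many numbers must be stored and the least-squares (Frobenius) distance to the original.<br />

```python
import numpy as np

rng = np.random.default_rng(3)

# Hypothetical stand-in for one image channel: a smooth pattern plus noise.
rows, cols = 130, 120
yy, xx = np.mgrid[0:rows, 0:cols]
image = np.sin(xx / 15.0) * np.cos(yy / 20.0) + 0.05 * rng.normal(size=(rows, cols))

# Rows of V are the right singular vectors; d is sorted in decreasing order.
U, d, V = np.linalg.svd(image, full_matrices=False)


def rank_k_approximation(k):
    """Sum of the first k terms d_i * (column i of U) times (row i of V)."""
    return (U[:, :k] * d[:k]) @ V[:k, :]


for k in (1, 5, 20):
    approx = rank_k_approximation(k)
    # Default matrix norm in NumPy is the Frobenius (least-squares) norm.
    error = np.linalg.norm(image - approx)
    # Each term stores one singular value and two singular vectors.
    stored = k * (rows + cols + 1)
    print("k =", k, "stored numbers:", stored, "of", image.size, "error:", error)
```

The error for a given k equals the square root of the sum of the squared singular values that were left out, which is one way to choose k for a target accuracy.<br />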
<br />
<h2>SVD for Microarray Data</h2><br />
Microarray data, with its massive datasets and high dimensionality, can be a fruitful domain for the application of SVD techniques. We demonstrate here one such approach.<br />
This example uses a synthetic dataset for clarity of exposition. We simulate 100 hybridizations, the last 10 of which are associated with some phenotype of interest. We have 1000 genes, the last 900 of which are slightly upregulated in the above 10 samples. Note that the increase in mean expression in this set is less than the general variances of the datapoints, so the upregulation is very difficult to detect naively.<br />
We then calculate the SVD. Note that the columns of <math xmlns="http://www.w3.org/1998/Math/MathML"><mi>M</mi></math> are arrays (chips, hybridizations).<br />
We would like to find an eigen-array (Left Singular Vector) that focuses on the behavior we are interested in. Recall that the 'recipe' for each LSV is given by the corresponding row of <math xmlns="http://www.w3.org/1998/Math/MathML"><mi>V</mi></math>. So we calculate the correlation of each of these rows with our 'phenotype vector'. This correlation is maximal for the second eigen-array, and examination of its 'recipe' shows that it is indeed capturing the overall differences between the two phenotype classes.<br />
So we know that the second eigenarray is a coordinate axis for the information we're looking for. We now need only find genes whose 'component' along this axis is large, indicating that they may be differentially expressed between the classes. <br />
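The whole procedure can be sketched as follows. This is not the original simulation: the noise model (independent, zero-mean, unit variance), the effect size of 0.5, the random seed, and all variable names are assumptions. Which eigen-array correlates best with the phenotype depends on the data, and in this zero-mean sketch it need not be the second one.<br />

```python
import numpy as np

rng = np.random.default_rng(4)

n_genes, n_arrays = 1000, 100
n_flat_genes = 100        # the remaining 900 genes are upregulated
n_control_arrays = 90     # the remaining 10 arrays carry the phenotype
shift = 0.5               # assumed effect size, below the noise s.d. of 1

# Rows are genes, columns are arrays (chips, hybridizations).
M = rng.normal(size=(n_genes, n_arrays))
M[n_flat_genes:, n_control_arrays:] += shift

phenotype = np.zeros(n_arrays)
phenotype[n_control_arrays:] = 1.0

# Columns of U are the eigen-arrays; row i of V is the recipe for eigen-array i.
U, d, V = np.linalg.svd(M, full_matrices=False)

# Correlate every recipe with the phenotype vector and keep the best match.
correlations = np.array([np.corrcoef(recipe, phenotype)[0, 1] for recipe in V])
best = int(np.argmax(np.abs(correlations)))
print("best eigen-array index (0-based):", best)
print("its correlation with the phenotype:", correlations[best])

# Singular vectors are defined only up to sign; orient the chosen one so
# that a large positive component means higher expression in the phenotype.
gene_loadings = np.sign(correlations[best]) * U[:, best]

# Candidate differentially expressed genes: largest components on this axis.
top_genes = np.argsort(gene_loadings)[::-1][:50]
print("fraction of top genes that were truly upregulated:",
      np.mean(top_genes >= n_flat_genes))
```

With real data the candidate list would still need a proper significance assessment; the sketch only shows how the component along the chosen eigen-array ranks the genes.<br />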
<br />

</body>
</html>
