March | 2009 | Thesis Blog

In artificial intelligence we often deal with large data sets that have many features or variables, and there is no clear way to visualize or interpret the relationships between these variables. Principal components analysis (PCA) is a technique for finding a basis set that may represent the data well under some circumstances. PCA finds a set of orthogonal basis vectors aligned with the directions of maximum variance, the so-called principal components. PCA also ranks the components by importance (i.e. variance).

Principal components analysis assumes that the distributions of the random variables are characterized by their mean and variance, as is the case for the Gaussian distribution, for example. PCA also assumes that the direction of maximum variance is indeed the most “important”. This assumption implies that the data has a high signal-to-noise ratio: the variance added by any noise is small in comparison to the variance from the important dynamics of the system.

The principal components are calculated by finding the eigenvectors and eigenvalues of the covariance matrix. The covariance matrix $C$ is a square matrix which describes the covariance between random variables, where $C_{ij}$ is the covariance between the $i$th and $j$th random variables. The random variables must be in mean deviation form, meaning that the mean has been subtracted so the distribution of each random variable has $\mu = 0$. The eigenvalues and eigenvectors are usually arranged such that $\Lambda$ is a diagonal matrix where the value of $\Lambda_{ii}$ is the $i$th eigenvalue $\lambda_i$, and $V$ is a matrix where the $i$th column of $V$ is the $i$th eigenvector. Note that the $n \times n$ covariance matrix may have $n$ eigenvalues/eigenvectors, so $\Lambda$ and $V$ are $n \times n$ matrices as well. At this point the eigenvectors can be ranked by importance by sorting the eigenvalues in $\Lambda$ from largest to smallest. Since each eigenvector is associated with a particular eigenvalue, the columns of $V$ must be rearranged so the $i$th column corresponds to the $i$th eigenvalue.
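The construction above can be summarized in symbols. This is a sketch under assumed notation, not necessarily the notation of the original post: $X$ is the $m \times n$ data matrix in mean deviation form with one observation per row, $C$ is the covariance matrix, $V$ holds the eigenvectors as columns, and $\Lambda$ is the diagonal matrix of eigenvalues. The $\frac{1}{m-1}$ factor is the usual sample-covariance convention and is also an assumption here.

```latex
C = \frac{1}{m-1} X^{\top} X, \qquad
C_{ij} = \operatorname{cov}(x_i, x_j), \qquad
C V = V \Lambda, \qquad
\Lambda = \operatorname{diag}(\lambda_1, \dots, \lambda_n), \quad
\lambda_1 \ge \lambda_2 \ge \dots \ge \lambda_n
```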

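If the goal is to reduce the dimensionality of the data set (the number of random variables), the principal components which are least important may be discarded. This leaves us with two matrices, a $k \times k$ diagonal matrix of eigenvalues $\Lambda_k$ and an $n \times k$ matrix of eigenvectors $V_k$, where $n$ is the number of random variables and $k$ is the number of principal components to keep.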

This matrix of eigenvectors represents a linear transform which orients the basis vectors in the directions of maximal variance, while maintaining orthogonality.
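The procedure described above can be sketched in a few lines of NumPy. This is a minimal illustration rather than a reference implementation: the function name `pca`, the choice of one observation per row, and the synthetic data are all assumptions made for the example.

```python
import numpy as np

def pca(X, k):
    """Project the rows of X onto the top-k principal components.

    X is an (m, n) array: m observations of n random variables.
    Returns (top-k eigenvalues, n x k eigenvector matrix, projected data).
    """
    # Put the data in mean deviation form: every column gets zero mean.
    X_centered = X - X.mean(axis=0)

    # n x n covariance matrix of the n random variables.
    C = np.cov(X_centered, rowvar=False)

    # eigh is intended for symmetric matrices; it returns eigenvalues in
    # ascending order with the eigenvectors as the columns of V.
    eigvals, V = np.linalg.eigh(C)

    # Rank the components by variance (largest eigenvalue first) and
    # reorder the eigenvector columns to match.
    order = np.argsort(eigvals)[::-1]
    eigvals, V = eigvals[order], V[:, order]

    # Keep the k most important components and project the data onto them.
    V_k = V[:, :k]
    return eigvals[:k], V_k, X_centered @ V_k

# Synthetic data (an assumption for this example): almost all of the
# variance lies along the direction (1, 2).
rng = np.random.default_rng(0)
t = rng.normal(size=500)
X = np.column_stack([t, 2.0 * t + 0.1 * rng.normal(size=500)])

eigvals, V_k, Z = pca(X, k=1)
```

With this data the single retained eigenvector should point (up to sign) roughly along (1, 2), and the variance of the projected data `Z` should match the retained eigenvalue.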

A mathematical structure is an association of one set of mathematical objects with another in order to give those objects additional meaning or power. Common mathematical objects include numbers, sets, relations, and functions.

An algebraic structure is defined by a collection of objects and the operations which are allowed to be applied to those objects. A structure may consist of a single set or multiple sets of mathematical objects (e.g. numbers). These sets are closed under a particular operation or operations, meaning that the result of the operation applied to any elements of the set is also in the set. Axioms are conditions which the sets and operations must satisfy.

A simple but ubiquitous algebraic structure is the group. A group is a set $G$ and a single binary operation, usually denoted $\cdot$, which satisfy the axiom of associativity and contain an identity element and inverse elements. Associativity specifies that the order in which the operations are performed does not matter; that is, $(a \cdot b) \cdot c = a \cdot (b \cdot c)$. The identity element $e$ is a special element such that the operation applied to it and another element results in the other element; formally: $e \cdot a = a \cdot e = a$. Each element of the set must have an inverse element that yields the identity element when the two are combined: $a \cdot a^{-1} = a^{-1} \cdot a = e$. If the axiom of commutativity is added, the group is referred to as an Abelian group. Commutativity allows the operands to be reordered: $a \cdot b = b \cdot a$. If the requirement of inverse elements is removed from the group structure, the structure is called a monoid.
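For a small finite set the group axioms can be checked by brute force. The sketch below is only an illustration; the helper names `is_group` and `is_abelian` and the choice of the integers modulo 5 are assumptions made for the example.

```python
from itertools import product

def is_group(elements, op):
    """Brute-force check of the group axioms on a finite set."""
    elements = list(elements)
    # Closure: the operation never leaves the set.
    if any(op(a, b) not in elements for a, b in product(elements, repeat=2)):
        return False
    # Associativity: (a . b) . c == a . (b . c)
    if any(op(op(a, b), c) != op(a, op(b, c))
           for a, b, c in product(elements, repeat=3)):
        return False
    # Identity: some e with e . a == a . e == a for every a.
    identities = [e for e in elements
                  if all(op(e, a) == a and op(a, e) == a for a in elements)]
    if not identities:
        return False
    e = identities[0]
    # Inverses: every a has some b with a . b == b . a == e.
    return all(any(op(a, b) == e and op(b, a) == e for b in elements)
               for a in elements)

def is_abelian(elements, op):
    """Check commutativity: a . b == b . a for every pair."""
    elements = list(elements)
    return all(op(a, b) == op(b, a) for a, b in product(elements, repeat=2))

def add_mod5(a, b):
    return (a + b) % 5

def mul_mod5(a, b):
    return (a * b) % 5

Z5 = range(5)
print(is_group(Z5, add_mod5))            # addition mod 5
print(is_group(Z5, mul_mod5))            # multiplication mod 5, 0 included
print(is_group(range(1, 5), mul_mod5))   # multiplication mod 5, 0 removed
```

Multiplication modulo 5 on the full set fails only the inverse requirement (0 has no inverse), so it is an example of a monoid that is not a group.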

A group homomorphism is a function which preserves the relationships between the elements of the set, that is, the group structure. For groups $(G, \cdot)$ and $(H, *)$, $f: G \to H$ is a homomorphism iff $f(a \cdot b) = f(a) * f(b)$. If the map is invertible, i.e. it has an inverse $f^{-1}$ such that $f^{-1}(f(a)) = a$, then $f$ is said to be an isomorphism. A group endomorphism is a homomorphism from a group to itself, $f: G \to G$, while an invertible endomorphism is called an automorphism. A subgroup is a group within a larger group. For a subgroup $H$ of a group $G$, $H \subseteq G$ and the identity element of $G$ is the identity element of $H$.
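The homomorphism condition can also be tested exhaustively on small finite groups. The following sketch uses two example maps chosen for illustration (they are assumptions, not taken from the text): `f(a) = 2^a mod 5` from the integers mod 4 under addition to the nonzero integers mod 5 under multiplication, and `g(a) = a mod 3` from the integers mod 6 to the integers mod 3, both under addition.

```python
from itertools import product

def is_homomorphism(f, G, op_G, op_H):
    """Check f(a . b) == f(a) * f(b) for every pair in the finite set G."""
    return all(f(op_G(a, b)) == op_H(f(a), f(b))
               for a, b in product(G, repeat=2))

def is_bijection(f, G, H):
    """A map between finite sets is invertible iff it is a bijection."""
    images = [f(a) for a in G]
    return len(set(images)) == len(images) and set(images) == set(H)

def add_mod(n):
    return lambda a, b: (a + b) % n

def mul_mod5(a, b):
    return (a * b) % 5

Z4, Z6, Z3 = range(4), range(6), range(3)
units_mod5 = [1, 2, 3, 4]

def f(a):
    # (Z4, +) -> (nonzero elements of Z5, *)
    return pow(2, a, 5)

def g(a):
    # (Z6, +) -> (Z3, +)
    return a % 3

print(is_homomorphism(f, Z4, add_mod(4), mul_mod5), is_bijection(f, Z4, units_mod5))
print(is_homomorphism(g, Z6, add_mod(6), add_mod(3)), is_bijection(g, Z6, Z3))
```

Both maps preserve the group operation, but only `f` is invertible, so `f` is an isomorphism while `g` is a homomorphism that is not.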

A ring is an algebraic structure which adds another operation and a few axioms to the group structure. The operations $+$ and $\cdot$ of a ring $R$ satisfy different axioms. The set and the addition operator must form an Abelian group as described above, while the set and the multiplication operator must form a monoid. In addition, the operators must satisfy the axiom of distribution; specifically, the $\cdot$ operator must be able to distribute over the $+$ operator: $a \cdot (b + c) = a \cdot b + a \cdot c$ and $(a + b) \cdot c = a \cdot c + b \cdot c$. If $(R, \cdot)$ forms a commutative monoid, that is, a monoid with commutativity, then the ring is said to be a commutative ring.
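As with groups, the ring axioms can be verified exhaustively for a small finite example. The sketch below checks the integers modulo 6; the function name `is_ring` and the choice of example are assumptions made for illustration.

```python
from itertools import product

def is_ring(R, add, mul):
    """Brute-force check of the ring axioms on a finite set R."""
    R = list(R)
    pairs = list(product(R, repeat=2))
    triples = list(product(R, repeat=3))

    closed = all(add(a, b) in R and mul(a, b) in R for a, b in pairs)

    # (R, +) must be an Abelian group.
    add_assoc = all(add(add(a, b), c) == add(a, add(b, c)) for a, b, c in triples)
    add_comm = all(add(a, b) == add(b, a) for a, b in pairs)
    zeros = [z for z in R if all(add(z, a) == a for a in R)]
    add_inv = bool(zeros) and all(any(add(a, b) == zeros[0] for b in R) for a in R)

    # (R, *) must be a monoid: associative with an identity element.
    mul_assoc = all(mul(mul(a, b), c) == mul(a, mul(b, c)) for a, b, c in triples)
    ones = [e for e in R if all(mul(e, a) == a and mul(a, e) == a for a in R)]

    # Multiplication must distribute over addition from both sides.
    distrib = all(mul(a, add(b, c)) == add(mul(a, b), mul(a, c)) and
                  mul(add(a, b), c) == add(mul(a, c), mul(b, c))
                  for a, b, c in triples)

    return all([closed, add_assoc, add_comm, add_inv, mul_assoc, bool(ones), distrib])

def add_mod6(a, b):
    return (a + b) % 6

def mul_mod6(a, b):
    return (a * b) % 6

Z6 = range(6)
print(is_ring(Z6, add_mod6, mul_mod6))   # operators in their usual roles
print(is_ring(Z6, mul_mod6, add_mod6))   # roles swapped
```

Swapping the roles of the two operators breaks the axioms, since multiplication modulo 6 does not form a group on the full set.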

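Similarly to a group, a ring may also have a ring homomorphism: a map $f: R \to S$ between rings $R$ and $S$ is a ring homomorphism if $f$ satisfies $f(a + b) = f(a) + f(b)$ and $f(a \cdot b) = f(a) \cdot f(b)$. Likewise, $f$ is an isomorphism if it has an inverse that satisfies the identity relation described for group isomorphism above. A ring $S$ is a subring of a ring $R$ if $S \subseteq R$ and $S$ contains the multiplicative identity from $R$.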
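The two ring homomorphism conditions can be spot-checked numerically. In the sketch below, reduction modulo 5 (integers to integers mod 5) and doubling (integers to integers) are example maps chosen for illustration. Because the integers are infinite, the check runs over a finite sample and is evidence rather than proof.

```python
from itertools import product

def preserves_ring_ops(f, samples, add_R, mul_R, add_S, mul_S):
    """Check f(a + b) == f(a) + f(b) and f(a * b) == f(a) * f(b) on samples."""
    return all(f(add_R(a, b)) == add_S(f(a), f(b)) and
               f(mul_R(a, b)) == mul_S(f(a), f(b))
               for a, b in product(samples, repeat=2))

N = 5
samples = range(-10, 11)

def int_add(a, b):
    return a + b

def int_mul(a, b):
    return a * b

def add_mod(a, b):
    return (a + b) % N

def mul_mod(a, b):
    return (a * b) % N

def reduce_mod(a):
    # Integers -> integers mod 5.
    return a % N

def double(a):
    # Integers -> integers; preserves addition but not multiplication.
    return 2 * a

print(preserves_ring_ops(reduce_mod, samples, int_add, int_mul, add_mod, mul_mod))
print(preserves_ring_ops(double, samples, int_add, int_mul, int_add, int_mul))
```

Doubling preserves addition, so it is a homomorphism of the additive group, but it fails the multiplicative condition and is therefore not a ring homomorphism.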

An algebraic structure is a field if it satisfies the axioms of a ring along with a few additional axioms. Both operators of a field must satisfy commutativity, and the set must contain inverses under both operators, except that the additive identity element has no multiplicative inverse. Another way to describe a field is to say that the set under addition is an Abelian group, and the set without the additive identity is also an Abelian group under multiplication. The inclusion of inverses for both operators leads to the intuitive notions of subtraction and division (except division by 0). A well-known example is the field of real numbers, $\mathbb{R}$.
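A finite example may make the role of multiplicative inverses more concrete. The sketch below compares the integers modulo 7 with the integers modulo 6; these examples and the helper names are assumptions made for illustration. Division is defined as multiplication by the inverse, using the modular-inverse form of Python's built-in `pow` (Python 3.8 or later).

```python
def has_multiplicative_inverses(n):
    """True if every nonzero element of the integers mod n has an inverse."""
    return all(any((a * b) % n == 1 for b in range(1, n))
               for a in range(1, n))

def divide_mod(a, b, p):
    """Compute a / b in the integers mod p as a times the inverse of b."""
    if b % p == 0:
        # The additive identity has no multiplicative inverse.
        raise ZeroDivisionError("division by the additive identity")
    return (a * pow(b, -1, p)) % p

print(has_multiplicative_inverses(7))   # 7 is prime
print(has_multiplicative_inverses(6))   # 2, 3 and 4 have no inverse mod 6

q = divide_mod(3, 4, 7)
print(q, (q * 4) % 7)                   # multiplying back by 4 recovers 3
```

The integers modulo 6 still form a commutative ring, but the missing inverses mean division is not always possible, so they do not form a field.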

A metric space is a mathematical structure $(M, d)$, where $d: M \times M \to \mathbb{R}$ is a binary function which defines the real-valued distance between two elements of the set $M$. A distance is a non-negative quantity, and is equal to zero only when the two elements are equal. A distance should also satisfy the axiom of symmetry, $d(x, y) = d(y, x)$, and the triangle inequality, $d(x, z) \le d(x, y) + d(y, z)$. For example, the real-valued vector space $\mathbb{R}^n$ equipped with the Euclidean distance metric yields the Euclidean metric space.
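The metric axioms can be checked numerically on a handful of points. This is a small sketch: the helper `satisfies_metric_axioms`, the sample points, and the tolerance are assumptions made for the example, and a check on a finite sample can only falsify the axioms, not prove them.

```python
import math
from itertools import product

def euclidean(x, y):
    """Euclidean distance between two points given as equal-length tuples."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def satisfies_metric_axioms(d, points, tol=1e-12):
    """Check the metric axioms for d on a finite set of points."""
    for x, y in product(points, repeat=2):
        if d(x, y) < 0:                      # non-negativity
            return False
        if (d(x, y) == 0) != (x == y):       # zero only for equal points
            return False
        if abs(d(x, y) - d(y, x)) > tol:     # symmetry
            return False
    # Triangle inequality: d(x, z) <= d(x, y) + d(y, z)
    return all(d(x, z) <= d(x, y) + d(y, z) + tol
               for x, y, z in product(points, repeat=3))

def squared_euclidean(x, y):
    """Squared distance: looks like a distance but is not a metric."""
    return euclidean(x, y) ** 2

points = [(0.0, 0.0), (3.0, 4.0), (1.0, -2.0), (-5.0, 0.5)]
line = [(0.0,), (1.0,), (2.0,)]

print(euclidean((0.0, 0.0), (3.0, 4.0)))
print(satisfies_metric_axioms(euclidean, points))
print(satisfies_metric_axioms(squared_euclidean, line))
```

The squared Euclidean distance fails the triangle inequality on three collinear points (going from 0 to 2 directly costs 4, while going through 1 costs 2), which is why it does not define a metric space.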


Principal Components Analysis

Basic Mathematical Structures and Operations