Fitness levels of real LSRDRs of the octonion-like algebra during training

Показать описание

The octonions are up-to-isomorphism, a unique 8 dimensional real inner product space O together with a bilinear operation * and a unit 1 such that ||x*y||=||x||*||y|| and 1*x=x*1=x for all x,y in O. The bilinear operation * is non-associative. While linear algebra extends to quaternionic matrices quite well (except for some quantum information theory), due to the non-associativity of the octonions, there is no reasonable theory of octonionic matrices. The octonions are therefore not as popular as the quaternions for this and other reasons.

We define an octonion-like algebra to be an 8 dimensional real inner product space A together with a bilinear operation * such that ||x*y||=||x||*||y|| for all x,y in A. Let A be an octonion-like algebra.

Let e_1,...,e_8 be an orthonormal basis of A. Let A_1,...,A_8 be the linear operators from A to A defined by setting A_j x=e_j *x. Then the operators A_1,...,A_8 are orthogonal.

Let d be a natural number at most 8. Let B_1,...,B_8 be d by d real matrices. Then we say that the fitness level of B_1,...,B_8 is
rho(kron(A_1,B_1)+...+kron(A_8,B_8))^2
rho(kron(B_1,B_1)+...+kron(B_8,B_8)).

'rho' stands for the spectral radius. 'kron' stands for the Kronecker product. While I have stated the fitness function in terms of bases and coordinates, we can reformulate everything in this animation without referring to any specific bases.

For each d from 1 to 8, we maximize the fitness level of the d by d real matrices B_1,...,B_8 eight different times using gradient ascent, and we show all 64 fitness levels of the local maximization during the course of the gradient ascent. Each value d in {1,...,8} is assigned a different color in the visualization, and lower values of d are put to the left of the higher values of d.

Since we are using power iteration to calculate the spectral radius and gradient, near the beginning of the visualization, the calculated fitness levels are unstable. We observe that for each particular value d, and each instance of the gradient ascent, we attain a fitness level of precisely d. We also have a (not yet proven) description of all global maxima for this fitness function.

Suppose that d is in {1,...,8}. Suppose that R is a d by 8 real matrix and S is an 8 by d real matrix and SR is an orthogonal projection operator. Then if we set B_j=RA_j S for all j, then (B_1,...,B_8) is presumably a local maximum for our fitness function and all global maxima for our fitness function are of this form. The dominant eigenvalue of kron(A_1,B_1)+...+kron(A_8,B_8) also has a simple description. Let G be the superoperator mapping 8 by 8 matrices to the 8 by 8 matrices defined by setting G(X)=A_1 X transpose(A_1)+...+PA_8P X transpose(PA_8P). Then G is similar to a direct sum of kron(A_1,B_1)+...+kron(A_8,B_8) and a zero matrix, but the dominant eigenvector of G is the matrix P.

The notion of an LSRDR is my own. I have made this visualization in order to demonstrate some of the interesting properties of both the octonion-like algebras and LSRDRs. I cannot see any way that the fitness levels of these LSRDRs could have a simpler interpretation (but producing a mathematical proof that 1,2,3,4,5,6,7,8 are the global maximum values may take some work). This simplicity of the fitness levels demonstrates that the octonion-like algebras are elegant mathematical structures and that LSRDRs behave mathematically in a way that is compatible with the octonion-like algebras. We should strive for this level of interpretability in machine learning, and we should develop machine learning algorithms that behave as mathematically as LSRDRs.

We observe that the fitness levels of the real LSRDRs are strictly lower than the fitness levels of the corresponding complex LSRDRs of A_1,...,A_8 when the dimension is kept the same. This demonstrates that while the real and complex LSRDRs of matrices often coincide, there are instances where the real LSRDR has a lower fitness than the corresponding complex LSRDR.

Correction: While the visualization title says ". . . LSRDRs of octonions", it is more accurate to say ". . . LSRDRs of octonion-like algebras"

Рекомендации по теме

Fitness levels of real LSRDRs of the octonion-like algebra during training

Fitness levels of real LSRDRs of the octonion-like algebra during training

Fitness levels of complex LSRDRs of the octonion-like algebra during training

Spectra of real and complex LSRDRs of octonion-like algebra during training

Spectra of 2 dimensional LSRDRs of 6 qubit noise channel during training

Spectra of 2 dimensional LSRDRs of 7 qubit noise channel during training

Spectra of 1 dimensional LSRDRs of 6 qubit noise channel during training

Spectra of 1 dimensional LSRDRs of 7 qubit noise channel during training

Spectra of 1 dimensional LSRDRs of 8 qubit noise channel during training

AI finds states that retain purity after going through a quantum channel obtained from octonions.

The spectrum of an LSRDR of Hermitian matrices during training

Catastrophic and complete forgetting in machine learning visualized

AI finds a pure quantum state that retains the most purity after going through a quantum channel.

The evolutionary algorithm maximizes the sum of a 32 by 32 submatrix of a 64 by 64 matrix 10 times.

Training a real completely positive operator to approximate a randomly generated function 10 times.

A fitness function for reducing tuples of symmetric matrices to vectors that has few local maxima

Handwritten digits generated by an AI that maximizes fitness

These simple AI models perfectly memorize data and become interpretable after training.

Hessian of a graph embedding during gradient ascent training

We find a zero of a neural network vector field on a sphere

This quantum channel became the identity mapping after training.

AI finds states that retain purity after going through a quantum channel obtained from sedenions.

Estimating the operator norm of a bilinear operator using gradient ascent many times

A randomly initialized neural network apparently has just one local maximum.

An evolutionary algorithm maximizes the sum of a 32 by 32 submatrix of a 64 by 64 matrix 10 times.