- Ben Sprott

# Theory-Based Feature Selection

Updated: Aug 28, 2021

Feature selection, or dimensionality reduction, is a critical part of the machine learning pipeline. It is used to reduce the effort spent on regressing high-dimensional functions. Here at Cavenwell Industrial AI, our software is designed to learn the theory of whatever your data concerns. In general, we believe your data contains a theory about your business, including fine details of your products, processes, operations, and management. When you want to optimize any given aspect of your business, you are in the realm of regressing a high-dimensional function. Cavenwell is designed to get you to those optima in an explainable manner.

Since we require extra data to learn a theory from a data set, we are currently testing our AI on data generated by the Chrono physics simulator. This way, we can always generate more data to learn a rich theory. To simplify and clarify our testing, we have chosen a simple damped harmonic oscillator as the system to learn about. Our assumption is that business systems behave like mechanical systems, so if we can prove out our AI on these mechanical systems, it will work on real business data.

A damped harmonic oscillator consists of a mass, a spring and a damping mechanism.

The columns you would expect from this system include the following:

- mass position, *x*
- mass velocity, *x_dot*
- mass acceleration, *x_ddot*
- spring length, *L*
- spring velocity (speed of length change), *L_dot*
- spring force, *F*
- damping coefficient, *c*
- settling time, *T*
- mass, *m*
- spring constant, *k*

The spring length is equal to the mass position as the mass is a point sitting at the end of the spring (so *x=L*).

In a data-centric world, each of these is a column, and a snapshot of the system is a row made of all these columns. We hold the mass and the damping coefficient fixed, so we don't have those as columns.
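As a rough illustration of what one run of this data set looks like, here is a minimal stand-in for the Chrono-generated data: a semi-implicit Euler simulation of the oscillator that emits one snapshot per row. The integrator, step size, parameter values, and the choice of which parameters vary are all illustrative assumptions, and the settling-time column *T* is omitted for brevity.

```python
# Minimal stand-in for the Chrono-generated data set (illustrative only):
# simulate one run of a damped harmonic oscillator and emit one row per
# snapshot, matching the column names listed above.
import csv, io

m, c = 1.0, 0.5          # mass and damping coefficient, held fixed
k = 4.0                  # spring constant (assumed to vary across runs)
dt, steps = 0.01, 5      # step size and run length, illustrative
x, x_dot = 1.0, 0.0      # initial position and velocity

rows = []
for _ in range(steps):
    F = -k * x                           # spring force
    x_ddot = (F - c * x_dot) / m         # Newton's second law with damping
    rows.append({"x": x, "x_dot": x_dot, "x_ddot": x_ddot,
                 "L": x, "L_dot": x_dot,  # spring length L = x
                 "F": F, "k": k})
    x_dot += dt * x_ddot                 # semi-implicit Euler step
    x += dt * x_dot

buf = io.StringIO()
writer = csv.DictWriter(buf, fieldnames=rows[0].keys())
writer.writeheader()
writer.writerows(rows)
print(buf.getvalue().splitlines()[0])  # the CSV header row
```

Each row is self-consistent: within a snapshot, *F = -k x* and *x_ddot = (F - c x_dot) / m* hold exactly.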

The familiar equation of motion, rearranged to give the velocity of the damped harmonic system, is as follows:

*x_dot = -(m/c) x_ddot - (k/c) x*
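As a quick numeric sanity check, solving the standard form *m x_ddot + c x_dot + k x = 0* for *x_dot* and substituting the result back in should leave a zero residual. The constants and state values below are illustrative:

```python
# Solve m*x_ddot + c*x_dot + k*x = 0 for x_dot and check the result by
# substituting it back into the standard form. Constants are illustrative.
m, c, k = 2.0, 0.5, 3.0
x, x_ddot = 0.4, -1.1                      # an arbitrary state of the oscillator

x_dot = -(m / c) * x_ddot - (k / c) * x    # rearranged velocity form
residual = m * x_ddot + c * x_dot + k * x  # zero up to floating-point rounding
print(residual)
```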

Both *m* and *c* are constant, so the only real predictors in the data are:

*x, k, x_ddot*

The Cavenwell AI learns a theory of this system just by reading a CSV file with all of these columns. It uses this theory to pick the best features for predicting the mass velocity, *x_dot*. Select-k-best is a standard method for selecting columns, but it can miss critical features. For instance, select-k-best will miss a weakly-scoring but genuinely relevant column, such as the damping coefficient, if the number of features to keep is set too low; this is a common problem in feature selection. If you knew the precise theory of the mechanical system, i.e., the equation of motion, you could pick out exactly what you need from the equation. Since the Cavenwell AI has a theory of the system, it can similarly pick out the exact columns that compute a given feature.
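To illustrate that failure mode (this is a sketch of plain select-k-best, not of Cavenwell's method), here is a minimal select-k-best in the spirit of scikit-learn's `SelectKBest`, scoring each column by absolute correlation with the target. The tiny data set is synthetic: one strong predictor, one weak-but-real predictor standing in for the damping coefficient, and one irrelevant column.

```python
# Minimal select-k-best sketch: score columns by |Pearson correlation| with
# the target, keep the top k. Synthetic data chosen so that "c" genuinely
# matters but scores weakly, while "noise" is exactly uncorrelated with y.
x     = [1, 2, 3, 4, 5, 6, 7, 8]
c     = [1, -1, 1, -1, 1, -1, 1, -1]
noise = [1, -1, -1, 1, 1, -1, -1, 1]
y     = [5 * xi + ci for xi, ci in zip(x, c)]   # c matters, but only a little

def abs_corr(a, b):
    """Absolute Pearson correlation of two equal-length lists."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((ai - ma) * (bi - mb) for ai, bi in zip(a, b))
    na = sum((ai - ma) ** 2 for ai in a) ** 0.5
    nb = sum((bi - mb) ** 2 for bi in b) ** 0.5
    return abs(cov / (na * nb))

scores = {name: abs_corr(col, y)
          for name, col in {"x": x, "c": c, "noise": noise}.items()}

def select_k_best(k):
    """Return the names of the k highest-scoring columns."""
    return sorted(scores, key=scores.get, reverse=True)[:k]

print(select_k_best(1))  # ['x']       -- k too low: the weak column is missed
print(select_k_best(2))  # ['x', 'c']  -- a larger k recovers it
```

With `k=1` the weak-but-real column is dropped even though the target cannot be predicted exactly without it, which is the pitfall described above.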

After running, the AI selects the following columns as the features for mass velocity or *x_dot*:

*k, F, L*

It is interesting that it chose the spring force and spring length, as opposed to the mass acceleration and mass position. We know that the mass position and spring length are equal, so *L = x*. Also, the mass acceleration is a function of the spring force. More interestingly, we know that the causal origin of the mass acceleration is the spring force. The AI has chosen the root-cause features.