Symbolic regression: When regression took it seriously


In the great mission to unravel the mysteries of the universe, humans have always sought patterns in the vast abundance of data. Going back four centuries, Johannes Kepler, the brilliant mind behind Kepler’s laws of planetary motion, was not armed with a computer that could run a genetic algorithm to find the patterns in his data. Kepler relied on his sheer mathematical brilliance, fixed his eyes and mind on a huge dataset, and unveiled the relationship between a planet’s orbital period and its distance from the Sun. His method was simple: immerse himself in the data, trust his intuition, and let the patterns emerge. This method has been used by scientists throughout the history of science. But do not mistake simplicity for ease.


Scientists who used this method in the past had to put enormous effort and dedication into their discoveries. In the modern world, where data grows in both size and dimensionality, the task of uncovering complex patterns becomes ever more daunting. Out of this struggle emerges symbolic regression, a modern computational marvel that eases the pain in the pursuit of patterns.

Symbolic regression stands apart from classic regression by offering far greater flexibility in identifying patterns within data. A traditional regression model looks for one specific pattern: a model designed to find linear relationships cannot work accurately on data where the variables are related exponentially. Symbolic regression thrives on this diversity, searching over many functional forms at once. It is closely related to genetic algorithms; in fact, it is built on a modified evolutionary algorithm.
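To make the contrast concrete, here is a tiny sketch (using NumPy, with made-up data) of how a fixed-form model commits to one pattern up front, while symbolic regression is free to search over whole expressions:

```python
import numpy as np

# A fixed-form model commits to one pattern up front:
# polyfit with degree 1 can only ever fit y ≈ a*x + b.
x = np.linspace(1, 5, 20)
y = np.exp(x)                  # but the true relationship is exponential

a, b = np.polyfit(x, y, 1)     # a linear fit will be systematically wrong
print(a, b)

# Symbolic regression instead searches over whole expressions
# (exp(x), x*y - x/y, ...) and could recover exp(x) itself.
```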


A normal evolutionary algorithm goes as follows:

  • We have a population of individuals (mathematical functions), and a fitness function (accuracy of the function on our data)
  • We randomly select an n-sized subset of the population
  • We calculate the fitness function of all the individuals in the subset
  • We select one individual from the subset. Selection is made in such a way that the fittest individual is the most likely to be selected, but there is still a chance that another individual is chosen (the higher the fitness, the higher the probability of being selected)
  • We create a copy of this selected individual and do a random mutation on it (randomly change an operation in the mathematical function)
  • We replace the weakest member of the subset with this mutated copy.

These steps are repeated many times, gradually raising the average fitness of the population. A minimal sketch of this loop is given below.
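The following toy implementation follows the steps above on a small hand-rolled expression representation. All names (`evaluate`, `mutate`, `fitness`, and so on) are illustrative, not taken from any particular library; the target relationship is the (x*y) - (x/y) example used later in this article.

```python
import math
import random

# Expressions are nested tuples ("op", left, right), a variable name, or a constant.
OPS = {"add": lambda a, b: a + b,
       "sub": lambda a, b: a - b,
       "mul": lambda a, b: a * b,
       "div": lambda a, b: a / b if abs(b) > 1e-9 else 1.0}

def evaluate(expr, env):
    if isinstance(expr, tuple):
        op, left, right = expr
        return OPS[op](evaluate(left, env), evaluate(right, env))
    return env[expr] if isinstance(expr, str) else expr

def random_expr(depth=2):
    if depth == 0 or random.random() < 0.3:
        return random.choice(["x", "y", random.uniform(-2, 2)])
    return (random.choice(list(OPS)), random_expr(depth - 1), random_expr(depth - 1))

def fitness(expr, data):
    # Negative mean squared error, so that higher means fitter.
    return -sum((evaluate(expr, env) - target) ** 2 for env, target in data) / len(data)

def mutate(expr):
    # Randomly change one operation, or regrow the node entirely.
    if isinstance(expr, tuple) and random.random() < 0.5:
        return (random.choice(list(OPS)), expr[1], expr[2])
    return random_expr()

# Data generated from the relationship we pretend not to know: x*y - x/y.
data = [({"x": x, "y": y}, x * y - x / y)
        for x in (1.0, 2.0, 3.0) for y in (1.0, 2.0, 4.0)]

population = [random_expr() for _ in range(50)]
for step in range(2000):
    subset = random.sample(range(len(population)), 5)              # random n-sized subset
    scored = [(fitness(population[i], data), i) for i in subset]   # fitness of each member
    best_score = max(s for s, _ in scored)
    weights = [math.exp(s - best_score) for s, _ in scored]        # fitter -> more likely
    _, chosen = random.choices(scored, weights=weights, k=1)[0]    # probabilistic selection
    child = mutate(population[chosen])                             # mutated copy
    _, weakest = min(scored)                                       # weakest member of the subset
    population[weakest] = child                                    # replace it

best = max(population, key=lambda e: fitness(e, data))
print(best, fitness(best, data))
```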

Crossover between individuals is not done in the above algorithm. You can introduce crossover by selecting more than one individual from the subset and doing a crossover among them before mutation.
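As an illustration, one simple (and purely hypothetical) way to cross two of the tuple expressions from the sketch above is to graft a random subtree of one parent into the other:

```python
import random

def random_subtree(expr):
    # Walk down a random path of expr and return the node we stop at.
    while isinstance(expr, tuple) and random.random() < 0.5:
        expr = random.choice(expr[1:])
    return expr

def crossover(parent_a, parent_b):
    # Recurse into parent_a, then replace one of its nodes with a piece of parent_b.
    if isinstance(parent_a, tuple) and random.random() < 0.5:
        op, left, right = parent_a
        if random.random() < 0.5:
            return (op, crossover(left, parent_b), right)
        return (op, left, crossover(right, parent_b))
    return random_subtree(parent_b)

# Example: crossing x*y - x/y with x + y might yield (x + y) - x/y.
a = ("sub", ("mul", "x", "y"), ("div", "x", "y"))
b = ("add", "x", "y")
print(crossover(a, b))
```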


Symbolic regression makes modifications to this evolutionary algorithm to improve its output. One important modification is age-regularization, which brings a chronological method of replacement: instead of replacing the weakest member of the subset with the newly formed individual, we replace the oldest member, making way for a new generation. This change can have a profound effect on the evolutionary journey of the population. Age-regularization helps maintain genetic diversity, prevents the population from converging too early and stagnating at a local maximum of the fitness function, and thus encourages a more robust search of the solution landscape. The sketch below shows the change to the replacement step.
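A minimal sketch of the change, again with illustrative names: each individual now carries the step at which it was born, and the oldest member of the sampled subset is the one that gets replaced, regardless of its fitness.

```python
import random

# Age-regularized replacement: track when each individual was created and
# replace the *oldest* member of the subset, not the weakest.
population = [{"expr": f"expr_{i}", "born": 0} for i in range(50)]

def evolve_step(population, step, subset_size=5):
    subset = random.sample(range(len(population)), subset_size)
    # ... select a parent from `subset` by fitness and mutate it, as before ...
    child = {"expr": "mutated_expr", "born": step}                # placeholder child
    oldest = min(subset, key=lambda i: population[i]["born"])     # age, not fitness
    population[oldest] = child

for step in range(1, 1000):
    evolve_step(population, step)
```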

Symbolic regression also uses a variable called temperature, which controls how mutations are accepted. Based on the value of the temperature, the algorithm can reject a mutation when the fitness of the mutated individual is lower than that of the originally chosen individual. This process is called simulated annealing. The temperature can also be coupled with the probability of choosing an individual from the population. Together, this gives good control over how strongly the search converges on an expression or keeps exploring. Simulated annealing has been shown experimentally to speed up the search process.
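A common form of the acceptance rule looks like the following sketch (the standard Metropolis-style criterion, not code from any specific symbolic-regression package): improvements are always kept, while worse mutations survive with a probability that shrinks as the temperature drops.

```python
import math
import random

def accept_mutation(old_fitness, new_fitness, temperature):
    # Always accept improvements.
    if new_fitness >= old_fitness:
        return True
    # At zero temperature, never accept a worse mutation.
    if temperature <= 0:
        return False
    # Otherwise accept with probability exp((new - old) / T).
    return random.random() < math.exp((new_fitness - old_fitness) / temperature)

# High temperature: almost anything is accepted; low temperature: only improvements.
print(accept_mutation(-1.0, -1.5, temperature=10.0))   # often True
print(accept_mutation(-1.0, -1.5, temperature=0.01))   # almost always False
```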


In symbolic regression, the genetic algorithm goes through an evolve-simplify-optimize loop: after a number of evolution steps, the mathematical expressions are simplified and optimized (for example, their numeric constants are re-fitted). Simplification and optimization reduce the complexity of the candidate expressions, but they are applied only after several iterations of evolution on their own. This is done to avoid losing some important individuals.

Suppose the equation we are searching for is (x*y) - (x/y). At some step, we reach (x*y) - (x*y). We are now one correct mutation away from the solution: changing the second * to /. But if we simplify the expression at this point, it collapses to 0, and that stepping stone is lost. Simplifying only occasionally keeps such redundant but useful expressions around while still reducing complexity.
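SymPy can be used to see the collapse directly (a small illustration, not part of any symbolic-regression implementation):

```python
import sympy as sp

x, y = sp.symbols("x y")

candidate = (x * y) - (x * y)      # one mutation away from (x*y) - (x/y)
print(sp.simplify(candidate))      # 0 -- the useful structure is gone

target = (x * y) - (x / y)
print(sp.simplify(target))         # x*y - x/y
```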


Symbolic regression is like the maverick scientist of the regression world, shaking things up with its cool genetic-algorithm vibes. It’s the rebel that doesn’t settle for a “close enough” mathematical model, but instead crafts an explicit, interpretable expression that scientific researchers can actually hold onto.

While traditional regression models might come close to fitting the data, they can sometimes play havoc with related theories and derivations. Symbolic regression, on the other hand, is the renegade that refuses to compromise, delivering precise and impactful results for the scientific community.
