Genetic Algorithms and Explicit Search Statistics

Part of Advances in Neural Information Processing Systems 9 (NIPS 1996)

Authors

Shumeet Baluja

Abstract

The genetic algorithm (GA) is a heuristic search procedure based on mechanisms abstracted from population genetics. In a previous paper [Baluja & Caruana, 1995], we showed that much simpler algorithms, such as hillclimbing and Population-Based Incremental Learning (PBIL), perform comparably to GAs on an optimization problem custom designed to benefit from the GA's operators. This paper extends these results in two directions. First, in a large-scale empirical comparison of problems that have been reported in GA literature, we show that on many problems, simpler algorithms can perform significantly better than GAs. Second, we describe when crossover is useful, and show how it can be incorporated into PBIL.

1 IMPLICIT VS. EXPLICIT SEARCH STATISTICS

Although there has recently been controversy in the genetic algorithm (GA) community as to whether GAs should be used for static function optimization, a large amount of research has been, and continues to be, conducted in this direction [De Jong, 1992]. Since much of GA research focuses on optimization (most often in static environments), this study examines the performance of GAs in these domains.

In the standard GA, candidate solutions are encoded as fixed-length binary vectors. The initial group of potential solutions is chosen randomly. At each generation, the fitness of each solution is calculated; this is a measure of how well the solution optimizes the objective function. The subsequent generation is created through a process of selection, recombination, and mutation. Recombination operators merge the information contained within pairs of selected "parents" by placing random subsets of the information from both parents into their respective positions in a member of the subsequent generation. Fitness-proportional selection provides selective pressure: higher-fitness solution strings have a higher probability of being selected for recombination. Mutations are used to help preserve diversity in the population by introducing random changes into the solution strings. The GA uses the population to implicitly maintain statistics about the search space. The selection, crossover, and mutation operators can be viewed as mechanisms for extracting these implicit statistics from the population to choose the next set of points to sample. Details of GAs can be found in [Goldberg, 1989; Holland, 1975].
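As a concrete illustration of the generational cycle described above, the sketch below implements one generation of a simple GA over fixed-length bit strings with fitness-proportional selection, single-point crossover, and bit-flip mutation. The function names, population representation, and parameter values are illustrative assumptions, not the specific settings used in the experiments reported here.

```python
import random

def ga_generation(population, fitness_fn, mutation_rate=0.01):
    """One generation of a simple GA over fixed-length bit strings:
    fitness-proportional selection, single-point crossover, bit-flip mutation.
    Parameter values are illustrative, not those used in this paper."""
    fitnesses = [fitness_fn(ind) for ind in population]
    next_gen = []
    while len(next_gen) < len(population):
        # Fitness-proportional ("roulette wheel") selection of two parents.
        p1, p2 = random.choices(population, weights=fitnesses, k=2)
        # Single-point crossover: a random prefix from one parent,
        # the remaining suffix from the other.
        point = random.randrange(1, len(p1))
        child = p1[:point] + p2[point:]
        # Bit-flip mutation introduces random changes to preserve diversity.
        child = [1 - bit if random.random() < mutation_rate else bit
                 for bit in child]
        next_gen.append(child)
    return next_gen

# Example usage: maximize the number of 1s ("onemax") over 20-bit strings.
pop = [[random.randint(0, 1) for _ in range(20)] for _ in range(50)]
for _ in range(100):
    pop = ga_generation(pop, fitness_fn=sum)
```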

Population-based incremental learning (PBIL) is a combination of genetic algorithms and competitive learning [Baluja, 1994]. The PBIL algorithm attempts to explicitly maintain statistics about the search space to decide where to sample next. The object of the algorithm is to create a real-valued probability vector which, when sampled, reveals high-quality solution vectors with high probability. For example, if a good solution can be encoded as a string of alternating 0's and 1's, a suitable final probability vector would be 0.01, 0.99, 0.01, 0.99, etc. The PBIL algorithm and parameters are shown in Figure 1.

The values of the probability vector are initialized to 0.5. Sampling from this vector yields random solution vectors because the probability of generating a 1 or 0 is equal. As search progresses, the values in the probability vector gradually shift to represent high-quality solution vectors.
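Since Figure 1 is not reproduced here, the following sketch gives only an approximate rendering of the PBIL loop as described above: a probability vector is initialized to 0.5, candidate bit strings are sampled from it, and the vector is shifted toward the best sample of each generation. The learning rate, mutation settings, and sample size are illustrative assumptions, not the parameters from Figure 1.

```python
import random

def pbil(fitness_fn, length, samples=100, learning_rate=0.1,
         mut_prob=0.02, mut_shift=0.05, generations=1000):
    """Minimal PBIL sketch: maintain a real-valued probability vector,
    sample candidate bit strings from it, and shift it toward the best
    sample each generation.  Parameter values are illustrative, not the
    paper's Figure 1 settings."""
    prob = [0.5] * length  # unbiased start: 1s and 0s equally likely
    for _ in range(generations):
        # Sample a set of solution vectors from the probability vector.
        pop = [[1 if random.random() < p else 0 for p in prob]
               for _ in range(samples)]
        best = max(pop, key=fitness_fn)
        # Shift each probability toward the corresponding bit of the best sample.
        prob = [p * (1.0 - learning_rate) + b * learning_rate
                for p, b in zip(prob, best)]
        # Occasionally perturb the probability vector (mutation).
        prob = [p * (1.0 - mut_shift) + random.randint(0, 1) * mut_shift
                if random.random() < mut_prob else p
                for p in prob]
    # Sampling the returned vector yields high-quality solutions with
    # high probability once the search has converged.
    return prob
```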