# Structure of fitness landscapes in the NK model

The NK model of rugged fitness landscapes consists of $$N$$ sites, where the fitness contribution of each site depends on its own state in $$\{0,1\}$$ and is epistatically affected by $$K$$ other sites. When defining the model, Kauffman & Weinberger (1989) stressed that:

> The actual structure of [fitness] landscapes, although knowable is currently unknown

They then go on to assume that epistatic interactions are either with adjacent sites in a linear ordering or with sites chosen at random, and that fitnesses are assigned to each of the $$2^{K + 1}$$ epistatic combinations at each site uniformly at random in $$[0,1]$$. Although it is definitely possible to justify this random generation, it tends to produce very structured fitness landscapes that, although rugged (for high $$K$$), have a very regular distribution of local fitness optima. As a computer scientist, I find this assumption of an arbitrary random distribution over possible NK landscapes very misguided. I would be more comfortable with a worst-case analysis on a "reasonable" (hopefully as determined by experiment) subset of possible NK landscapes.
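To make the random construction concrete, here is a minimal Python sketch of it. This is an illustration under stated assumptions, not the code of Kauffman & Weinberger: the function names `make_nk_landscape` and `local_optima` are hypothetical, the "adjacent" neighbourhood is wrapped around into a ring to avoid boundary cases, and total fitness is taken as the mean of the per-site contributions.

```python
import itertools
import random


def make_nk_landscape(n, k, adjacent=True, seed=None):
    """Return a fitness function for one randomly generated NK landscape.

    Assumptions of this sketch: adjacent neighbours wrap around (ring),
    and total fitness is the mean of the N site contributions.
    Requires 0 <= k <= n - 1.
    """
    rng = random.Random(seed)

    # Choose the K sites that epistatically affect each site i.
    neighbours = []
    for i in range(n):
        if adjacent:
            nbrs = [(i + j) % n for j in range(1, k + 1)]
        else:
            nbrs = rng.sample([j for j in range(n) if j != i], k)
        neighbours.append(nbrs)

    # One table per site: 2**(k+1) contributions drawn uniformly at random.
    tables = [
        {bits: rng.random() for bits in itertools.product((0, 1), repeat=k + 1)}
        for _ in range(n)
    ]

    def fitness(genotype):
        total = 0.0
        for i in range(n):
            key = (genotype[i],) + tuple(genotype[j] for j in neighbours[i])
            total += tables[i][key]
        return total / n

    return fitness


def local_optima(n, fitness):
    """List genotypes with no fitter one-mutant neighbour (exhaustive, small n only)."""
    optima = []
    for g in itertools.product((0, 1), repeat=n):
        f = fitness(g)
        if all(fitness(g[:i] + (1 - g[i],) + g[i + 1:]) <= f for i in range(n)):
            optima.append(g)
    return optima
```

For example, `local_optima(10, make_nk_landscape(10, 0, seed=1))` and `local_optima(10, make_nk_landscape(10, 5, seed=1))` can be compared to see how the number of local optima changes with $$K$$. The exhaustive search visits all $$2^N$$ genotypes, so it is only practical for small $$N$$.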

What is known about the typical structure of NK landscapes in biological domains? Also, are there examples of biological papers that do a worst-case analysis, instead of an arbitrary random one, on some non-trivial subset of landscapes? The only example I am familiar with is the gadgets used by Weinberger (1996) to show that the unconstrained model is NP-hard, but that is a trivial subset (all possible landscapes) and not of biological interest.

## References

Kauffman, S. and Weinberger, E. (1989). The NK Model of rugged fitness landscapes and its application to the maturation of the immune response. Journal of Theoretical Biology, 141(2): 211-245.

Weinberger, E. (1996). NP-completeness of Kauffman's N-k model, a Tuneable Rugged Fitness Landscape. Santa Fe Institute Working Paper, 96-02-003.

Very little is known about the structure of fitness landscapes.

H.A. Orr (2005; also Whitlock et al., 1995; Kryazhimskiy et al., 2009) explains that most experimental results do not actually attempt to measure the fitness landscape, but instead report just the average fitness versus time and the average number of acquired adaptations versus time. This can't be used to discern epistatic interactions or any combinatorial structure of the landscape. In general, the theory of adaptive walks has developed without reference to real data, and even the most refined theories correspond to data only in vague, qualitative ways (see Kryazhimskiy et al., 2009 for the best example I know).

Szendro et al. (2013) surveyed the few recent experiments that conducted a methodical examination of all mutations in a subset of loci of model organisms. Half of the studies (6 out of 12) were able to empirically realize only small fitness landscapes of just 4 to 5 loci (so 16 to 32 vertices). The largest full fitness landscape had length 6 with all 64 vertices examined (Hall et al., 2010), and the largest number of vertices in a single study was 418 out of the possible 512 in a length 9 landscape (O'Maille et al., 2008). These are extremely small landscapes, and thus don't have much power in distinguishing proposed theoretical models.
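The sizes quoted above follow from a biallelic landscape on $$n$$ loci having $$2^n$$ genotypes. A small sketch of that arithmetic (the helper name `landscape_size` is hypothetical, and two alleles per locus is assumed, consistent with the 16, 32, 64 and 512 figures):

```python
def landscape_size(num_loci, alleles_per_locus=2):
    """Number of genotypes (vertices) in a complete landscape."""
    return alleles_per_locus ** num_loci


# Sizes of the empirical landscapes mentioned in the survey.
for loci in (4, 5, 6, 9):
    print(f"{loci} loci: {landscape_size(loci)} vertices")

# Fraction of the length 9 landscape that was actually measured.
coverage = 418 / landscape_size(9)
print(f"coverage of the length 9 landscape: {coverage:.1%}")

# For contrast, a complete landscape on only 20 loci.
print(f"20 loci: {landscape_size(20)} vertices")
```

Because the count doubles with every added locus, a complete landscape on even 20 loci would already have over a million genotypes to measure.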

After extensive searching, I have not been able to find examples of theorists doing worst-case analysis of fitness landscapes. The closest to worst-case analysis is Orr-Gillespie theory (Gillespie, 1991; Orr, 2002; 2006; Kryazhimskiy et al., 2009), which assumes that the wild-type is still very fit after an environmental change, and so beneficial mutations are extremely rare. This allows the authors to introduce extreme value theory and consider just the asymptotic properties of distribution tails, giving a more general treatment with fewer parameters and, in some cases, a full classification of adaptive trajectories (Kryazhimskiy et al., 2009). This approach is easier to relate to experiment, but unfortunately it completely ignores the combinatorial structure (beyond first-order) of the fitness landscape: it always just samples the next potential mutant from the tail of a distribution, without taking into account how the walk is progressing in the landscape. Although Orr (2006) introduced the basics of fitness landscape structure, the structure was much simpler than the NK model; he used Perelson & Macken's (1995) block model, with each block completely unstructured beyond first-order.

### References

Gillespie, J.H. (1991). The causes of molecular evolution. Oxford University Press.

Hall, D. W., Agan, M., & Pope, S. C. (2010). Fitness epistasis among 6 biosynthetic loci in the budding yeast Saccharomyces cerevisiae. Journal of Heredity, 101: S75-S84.

Kryazhimskiy, S., Tkacik, G., & Plotkin, J.B. (2009). The dynamics of adaptation on correlated fitness landscapes. Proc. Natl. Acad. Sci. USA 106(44): 18638-18643.

O'Maille, P. E., Malone, A., Dellas, N., Hess, B. A., Smentek, L., Sheehan, I., Greenhagen, B.T., Chappell, J., Manning, G., & Noel, J.P. (2008). Quantitative exploration of the catalytic landscape separating divergent plant sesquiterpene synthases. Nature Chemical Biology, 4(10): 617-623.

Orr, H.A. (2002). The population genetics of adaptation: the adaptation of DNA sequences. Evolution 56: 1317-1330.

Orr, H.A. (2005). The genetic theory of adaptation: a brief history. Nature Reviews Genetics 6(2): 119-127.

Orr, H.A. (2006). The population genetics of adaptation on correlated fitness landscapes: the block model. Evolution 60(6): 1113-1124.

Perelson, A.S., & Macken, C.A. (1995). Protein evolution on partially correlated landscapes. Proc. Natl. Acad. Sci. USA 92: 9657-9661.

Szendro, I. G., Schenk, M. F., Franke, J., Krug, J., & de Visser, J. A. G. (2013). Quantitative analyses of empirical fitness landscapes. Journal of Statistical Mechanics: Theory and Experiment 2013(01): P01005.

Whitlock, M.C., Phillips, P.C., Moore, F.B.-G., & Tonsor, S.J. (1995). Multiple fitness peaks and epistasis. Annu. Rev. Ecol. Syst. 26: 601-629.