Finding limitations with common analysis methods: my new paper

A common goal in evolutionary biology is to understand how selection acts on traits and how genetic variants associated with those traits are affected by selection. The effect of selection on the genome is particularly interesting because there are situations where we know that populations are likely under different selection pressures (for example, one population of fish lives in freshwater and the other lives in saltwater), but the exact traits that selection is acting on may not be known or measurable. In the freshwater-saltwater fish example, the relevant trait experiencing selection pressure may be related to the ability for the gills to extract oxygen from the water – but measuring that might be tricky. So, researchers turn to the genome to attempt to understand how selection is acting on populations.

A basic distinction can be made between directional and balancing selection: is selection favoring one particular trait within the population (directional selection), or is it favoring a mix of traits (balancing selection)? To return to the freshwater-saltwater fish example, you might think that directional selection is most likely to be involved, because the freshwater and saltwater environments are incredibly different. But what if the freshwater environment is really a brackish environment that experiences fluctuations in salinity? Then perhaps the population will maintain variation among individuals in their ability to extract oxygen from the water because of variation in the micro-climate or temporally fluctuating conditions.

At the genetic level, the difference between directional and balancing selection can be thought of in this way: under directional selection, the populations will likely diverge, so the loci experiencing the effects of selection will have different allele frequencies (high FST between populations). However, directional selection will also erode genetic diversity (each population will tend towards only having one allele). With balancing selection, genetic diversity will be maintained (there will be many alleles in the populations) so the populations won’t diverge very much (low FST between populations).
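To make the contrast concrete, here is a minimal sketch (not taken from the paper) that computes FST for a single two-allele locus as (HT - HS) / HT, where HS is the mean expected heterozygosity within populations and HT is the expected heterozygosity of the pooled population. The function names and the allele frequencies are made up for illustration, and the populations are assumed to be equally weighted.

```python
def expected_heterozygosity(p):
    """Expected heterozygosity 2p(1 - p) at a biallelic locus with allele frequency p."""
    return 2.0 * p * (1.0 - p)


def fst(freqs):
    """FST = (HT - HS) / HT for equally weighted populations (illustrative only)."""
    mean_p = sum(freqs) / len(freqs)
    h_t = expected_heterozygosity(mean_p)  # diversity of the pooled population
    h_s = sum(expected_heterozygosity(p) for p in freqs) / len(freqs)  # mean within-population diversity
    if h_t == 0:
        return 0.0  # locus is fixed everywhere, so there is nothing to partition
    return (h_t - h_s) / h_t


# Hypothetical allele frequencies in two populations.
diverged = fst([0.95, 0.05])   # each population nearly fixed for a different allele
balanced = fst([0.55, 0.45])   # both populations keep both alleles
print(round(diverged, 3), round(balanced, 3))
```

In the first case each population has lost most of its diversity and FST is high; in the second, diversity is retained within both populations and FST stays close to zero.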

A common approach to detecting these differences was proposed by Beaumont and Nichols in 1996. It essentially identifies loci that have extreme FST values relative to their expected heterozygosity (which is a way of measuring their genetic diversity) by comparing the actual data to a simulated dataset with similar sampling parameters. This method then identifies loci that are under directional vs balancing selection by comparing FST values based on how much genetic diversity is expected for each locus. The simulations that are used to identify which loci are more extreme than expected (and therefore likely to be experiencing selection) are based on the infinite island model, which is a model of migration that assumes that there are infinite islands from which migrants arise. Although this is an abstraction from reality, Beaumont and Nichols showed that as long as a large number of independent populations are sampled (>10), the abstraction doesn’t skew the results very much. The Beaumont and Nichols (1996) approach has been widely used, especially since it has been developed into a user-friendly program called LOSITAN (Antao et al. 2008).

However, when I was conducting my population genomics study, I ran my data in LOSITAN and found some surprising results. I had sampled 12 populations, so I thought I should have enough samples, but I ended up with this graph:


My pipefish genomic data analyzed by LOSITAN. The light grey area in the middle background is the region that is supposedly full of neutral loci, and the darker grey areas represent areas under balancing selection (bottom – darkest grey) and under directional selection (top – medium grey).

This graph was surprising because it identified hundreds of loci as being under selection, and it looked disturbingly skewed. For comparison, the figure below is from a study of lamprey populations by Hess and colleagues (2013), and shows what an expected distribution should look like:


Genetic data from lamprey. Figure from Hess et al. (2013), published in Molecular Ecology – not my own! Copyright held by Hess et al. (2013).

My PhD advisor (Adam Jones) and I decided to investigate whether this skewed pattern was a symptom of a larger problem in our dataset or whether it was a common pattern in the literature. We found that the majority of studies reporting figures from LOSITAN analyses have unexpected patterns. Using simulations, we found that these patterns are caused by the relationship between FST and expected heterozygosity (FST is calculated using the expected heterozygosity), and that skewed patterns like the one I found occur primarily when few independent populations are sampled, especially when migration rates are low between them. The skewed patterns are not a problem, per se, as they do result from a mathematical constraint between FST and heterozygosity. However, the confidence intervals used to identify putatively selected loci do not align with the actual patterns, leading to an excess of outlier loci – and therefore those outliers are not as reliable as candidate genes of interest. The results of these analyses have just been published in the Journal of Heredity (Flanagan and Jones 2017).

But wait, you might be thinking, didn’t you sample 12 populations? Good memory! Yes, I did. However, those populations clustered into larger clusters, due to isolation by distance, suggesting that they may not be truly independent. Therefore, the FST-heterozygosity distribution of my data reflects more closely the distribution of a sample from only 3 or 4 populations.


Genetic groupings: the populations sort into 3-4 groups (Flanagan et al. 2016)

So what do my recent results mean for researchers? First, be aware of the assumptions underlying the analysis methods you’re using! I was incredibly surprised by the number of studies that found an odd or skewed pattern and also didn’t meet the specified requirements (>10 populations). Second, if your study doesn’t fit the assumptions of the models you’re using, it may be best not to use that model! I was also amazed that no other researchers had mentioned the skewed FST-heterozygosity relationship in their papers! Of the 112 papers presenting LOSITAN figures, 87 likely have an excess of outlier loci. This will affect inferences regarding the signature of selection as well as the future use of those loci as potential candidate regions for targeted studies. If you really want to use the FST-heterozygosity comparison, especially if your dataset is only a little skewed, I have developed an R package called fsthet that will allow you to identify loci using quantiles drawn from the distribution of your data (rather than from simulations with model assumptions). This has its own drawbacks but might be useful for some people. Finally, using multiple approaches may help identify when an analysis isn’t right for your dataset: one of the reasons the LOSITAN results stood out to me was that they identified so many more ‘significant’ loci than the other analyses I did. To summarize: think critically about your data, your analyses, and your results.
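The idea of drawing cutoffs from the data itself can be sketched roughly as follows. This is an illustration in Python only, not the fsthet package or its interface: the function names, the equal-width heterozygosity bins, the 2.5%/97.5% cutoffs, and the randomly generated "loci" are all assumptions made for the example.

```python
import random


def quantile(sorted_vals, q):
    """Linear-interpolation quantile of a pre-sorted list, with 0 <= q <= 1."""
    pos = q * (len(sorted_vals) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(sorted_vals) - 1)
    frac = pos - lo
    return sorted_vals[lo] * (1 - frac) + sorted_vals[hi] * frac


def flag_outliers(loci, n_bins=10, lower=0.025, upper=0.975):
    """Return indices of loci whose FST lies outside the empirical quantiles
    of the loci sharing their heterozygosity bin.

    loci is a list of (heterozygosity, fst) tuples.
    """
    h_min = min(h for h, _ in loci)
    h_max = max(h for h, _ in loci)
    width = (h_max - h_min) / n_bins or 1.0  # guard against identical heterozygosities
    bins = {}
    for i, (h, f) in enumerate(loci):
        b = min(int((h - h_min) / width), n_bins - 1)
        bins.setdefault(b, []).append((f, i))
    outliers = set()
    for members in bins.values():
        values = sorted(f for f, _ in members)
        lo_cut = quantile(values, lower)
        hi_cut = quantile(values, upper)
        outliers.update(i for f, i in members if f < lo_cut or f > hi_cut)
    return outliers


# Synthetic, made-up loci purely to exercise the function.
random.seed(1)
loci = [(random.uniform(0.05, 0.5), random.betavariate(1, 20)) for _ in range(2000)]
flagged = flag_outliers(loci)
print(len(flagged), "of", len(loci), "loci flagged")
```

Note that in this sketch the cutoffs always flag roughly the same fraction of loci in each bin, whether or not any of them are under selection, so flagged loci are best read as candidates to follow up with other methods.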

References

Antao T, Lopes A, Lopes RJ, Beja-Pereira A, and Luikart G. 2008. LOSITAN: a workbench to detect molecular adaptation based on a FST-outlier method. BMC Bioinformatics. 9:323.

Beaumont MA and Nichols RA. 1996. Evaluating loci for use in the genetic analysis of population structure. Proceedings of the Royal Society of London B. 263:1619–1626.

Flanagan SP, Rose E, and Jones AG. 2016. Population genomics reveals multiple drivers of population differentiation in a sex-role-reversed pipefish. Molecular Ecology. 25(20): 5043-5072. doi: 10.1111/mec.13794

Flanagan SP, and Jones AG. 2017. Constraints on the FST-heterozygosity outlier approach. Journal of Heredity. esx048. doi: 10.1093/jhered/esx048

Hess JE, Campbell NR, Close DA, Docker MF, and Narum SR. 2013. Population genomics of Pacific lamprey: adaptive variation in a highly dispersive species. Molecular Ecology. 22:2898-2916.


Why I Marched

On Saturday, April 22, 2017, an unprecedented number of scientists and science enthusiasts turned out around the country to rally and march for science.

I showed up to march (and to help administer a social/political science survey–I helped do science at the science march!) for many reasons. Most importantly, the current political climate has demonstrated how the country has, in many ways, devalued science. This devaluation is reflected in the proposed budget cuts, but has been evident for many years in the numerous ways in which scientific consensus has been met with unwarranted skepticism.

This current anti-science (“post-truth”) social climate is not separate from the world scientists live in; we all live on the same planet. Society has gotten to where it is because scientists haven’t been vocal, have (generally) avoided politics, and have not taken responsibility for communicating our findings to the general public in a way they can understand. We scientists are in part to blame for the current political climate, and I believe that we need to make up for lost time and start defending what it is we do!

Another important message I hope the March for Science sent is the value of science to society. The programming at the March for Science in Washington, DC did a good job of highlighting the importance of basic science: it has led to many discoveries of economic and public good, all of which would have been impossible to predict. Supporting these basic science research programs is an important part of what has made the US a leader in science. Even though supporting basic research may seem in some ways like a waste of money (because it has no obvious direct benefits), the real benefit of basic research is that it can yield unforeseen and inconceivably transformative results. SCIENCE MATTERS!


A snapshot of the diversity of signs at the march

The march was inspiring because so many people turned up to show their support for science and science-based policy. Despite the rain, despite concerns about potential backlash for becoming politically engaged, people showed up! And everyone was optimistic and hopeful and excited to be there. I know the job isn’t done, and there is still much to be done to promote science in our society. But the March for Science was an excellent start.


Before the march people completely covered the National Mall near the Washington Monument

Pipefish pairing

In my recent paper published in Behavioral Ecology and Sociobiology, I described the results of some of the work I did while in Sweden (which I’ve written about previously 1,2,3). I discovered that individual quality (both male quality and female quality) and timing of reproduction impact reproductive success in the broad-nosed pipefish, Syngnathus typhle. This is an important finding because it highlights the complex dynamics of mating systems. The results are covered in a press release, and I wrote about my experiences for Biosphere Magazine, an online nature magazine. My story in Biosphere just came out (Issue 23) and you can read it here.

Understanding the different components of selection

Selection is a process that acts on variation in traits to determine the fitness (i.e., evolutionary success) of individuals, and is a key mechanism of evolution as long as the selected traits have a heritable basis. Selection is often split into sexual selection, which arises due to variance in mating/reproductive success, and natural selection, which is due to variance in all other aspects of fitness. One reason that we often distinguish between these two types of selection is because they can often oppose each other – so an estimate of total selection over an individual’s life might come out looking really small if sexual selection and natural selection act equally strongly but in opposite directions. It would be like one person walking up 50 stairs (50) and another walking down 50 stairs (-50) and saying that on average they climbed 0 stairs.

But selection can have trade-offs at many different points during an individual’s lifetime, not just between natural and sexual selection. Males and females are often under different selection pressures, and natural selection can also be broken down into different episodes or components. When it comes to measuring selective pressure at different episodes, Arnold & Wade (1984a,b) developed a systematic approach to comparing phenotypes of individuals to their fitnesses at a given episode of selection to estimate selection strength. This has been a very popular approach to understanding how selection works in any given system (and I used it to quantify sexual selection strength in pipefish), but it doesn’t get at the heritability part of the story. To do that, we need genetics.


A generic life cycle of an animal with some important components of selection highlighted.

I’ve written about the idea of selection components analysis before, and it is basically the genetic equivalent of comparing phenotypes and fitnesses. Instead of phenotypes, the frequencies of different gene variants (alleles) are compared between individuals at different stages in the life cycle. This method allows us to isolate the effects of different types of selection (like sexual selection vs natural selection).

In my most recent paper, Genome-wide selection components analysis in a fish with male pregnancy, which is published in the journal Evolution, I used the selection components analysis approach in a population of pipefish to identify SNPs that have different allele frequencies in adult males and adult females (to find SNPs associated with differential viability in the sexes) and between successfully-mated females and the females in the population (to find SNPs associated with sexual selection).

To compare successfully-mated females and the total population of females, I used one of the cool features of pipefish as a model system: male pregnancy. The males who have mated are collected with their offspring in their brood pouch, so at each gene we can work out which of the alleles in the offspring was contributed by the father and therefore deduce which allele was contributed by the mother. For example, if the father has a genotype C/C and the offspring has a genotype C/T, then we know that the mother had at least one copy of the T allele. Doing this, I was able to estimate allele frequencies in the females that had mated and compare those frequencies to those in the population.
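The deduction in the C/C and C/T example can be written out as a small sketch. This is a simplified illustration rather than the method used in the paper: the function names are hypothetical, it assumes one offspring per father at a single locus, and it simply skips cases where the maternal allele can't be determined.

```python
from collections import Counter


def maternal_allele(father, offspring):
    """Return the allele the mother must have contributed, or None if ambiguous.

    Genotypes are 2-tuples of alleles, e.g. ("C", "T").
    """
    a, b = offspring
    father_alleles = set(father)
    if a == b:
        # Homozygous offspring: both parents gave the same allele,
        # provided the father actually carries it.
        return a if a in father_alleles else None
    a_from_father = a in father_alleles
    b_from_father = b in father_alleles
    if a_from_father and not b_from_father:
        return b
    if b_from_father and not a_from_father:
        return a
    # Either allele could be paternal, or neither matches the father.
    return None


def maternal_allele_frequencies(pairs):
    """Allele frequencies among the inferred maternal alleles for (father, offspring) pairs."""
    inferred = (maternal_allele(f, o) for f, o in pairs)
    counts = Counter(m for m in inferred if m is not None)
    total = sum(counts.values())
    return {allele: n / total for allele, n in counts.items()} if total else {}


# Made-up genotypes for four father-offspring pairs at one locus.
pairs = [
    (("C", "C"), ("C", "T")),  # mother gave T
    (("C", "C"), ("C", "C")),  # mother gave C
    (("C", "T"), ("T", "T")),  # mother gave T
    (("C", "T"), ("C", "T")),  # ambiguous, skipped
]
freqs = maternal_allele_frequencies(pairs)
print(freqs)
```

The frequencies estimated this way for mated females could then be compared with the frequencies in all sampled females at the same locus.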

In the population of pipefish that I studied, I found that sexual selection and differential viability selection on males and females (in other words, selection that puts different pressures on males than females or vice versa) both affect regions throughout the genome. Interestingly, some of the genetic regions under selection were significant in both the sexual selection and the males-females comparisons. These regions may be experiencing the type of tradeoffs between episodes of selection I discussed above. It’s also possible that those regions are involved in traits that are under selection acting in the same direction in both episodes. One limitation of selection components analysis is that we can’t say which traits are under selection without doing more experiments. But it is a useful tool for picking apart the types of selection affecting the genome, and could have widespread uses across biological disciplines.

Note: If you would like a copy of my paper and don’t have access to it through a university library, please email me! Due to copyright restrictions I can’t post the PDF but I’d be happy to send it to you.

Population genomics: what is it and why should you care?

Recently one of my dissertation chapters was published in the journal Molecular Ecology. It’s titled “Population genomics reveals multiple drivers of differentiation in a sex-role-reversed pipefish, Syngnathus scovelli”.


In the study, my labmate/coauthor Emily Rose and I collected pipefish from 12 populations in the Gulf of Mexico (I wrote about the collecting trip in a series of blog posts 1,2,3,4,5,6,7,8). I took the DNA and cut it up into a bunch of little pieces using special proteins and sequenced those little pieces using ‘high-throughput sequencing’–basically, using the latest sequencing technology to get millions of short sequence reads. I then used the sequencing information to discover how similar the different populations were using a variety of statistical techniques. When we collected the fish, Emily and I had also photographed them, and from the photographs I was able to measure the size of the fish and to quantify the female bands (those silvery stripes on their bodies in the images above)–so I was able to compare traits in addition to genetics among the populations of pipefish.


Basically, I found that most of the genetic differences between the populations are due to so-called ‘neutral’ evolutionary processes such as migration and random genetic drift (i.e., not selection). On the other hand, the trait values were not correlated with geographic distance, suggesting that something else (possibly selection) might explain variation in the traits. We did find some genetic regions correlated with the trait values, and some that were correlated with environmental variables like temperature. But overall, we found that the traits and genotypes followed different patterns. Because the gene regions I studied were distributed throughout the genome, these findings suggest that selection acting on the traits we measured does not have genome-wide effects but may have effects concentrated in certain genomic regions.

My paper describes a population genomics study. Population genomics is the genome-wide extension of population genetics–both aim to understand microevolutionary processes (i.e., shifts in frequencies of different forms of genes), but population genomics does so on a genome-wide scale (Luikart et al. 2003). Population genomics studies are important because they help researchers understand how populations are related to each other, how populations differ, how species adapt to new environments and evolve into new species, and which genetic regions are associated with traits (including disease traits). Population genomics is in part what allows companies like 23andMe to tell you what proportion of your genome comes from your Neandertal ancestors, and population genomics has helped identify genes associated with diseases (e.g., BRCA1 and breast cancer). Population genomics has also started to become a common method within the fields of evolutionary genetics, molecular ecology, and conservation genetics.

So why should you care about my population genomics study? First, it shows us that multiple evolutionary processes (migration, genetic drift, and selection) are prominent in shaping the genome and traits of pipefish. Evolutionary biologists want to know the relative importance of these forces because we want to know whether evolution is adaptive (driven by selection to help the species better fit the environment) or whether it is stochastic (driven by changes in population demographics like being cut off from other populations). This helps us predict how species might react to various threats like climate change and fragmenting populations. As more population genomics studies of wild populations accumulate, we can start to compare between species and look for broad patterns that might provide insight into common patterns of evolution. Additionally, genomic studies such as this one can be used to identify possible genetic regions that are associated with environmental variables like temperature that could be useful for monitoring populations in the face of a changing climate.

Note: If you would like a copy of my paper and don’t have access to it through a university library, please email me! Due to copyright restrictions I can’t post the PDF but I’d be happy to send it to you.


Parallels between yoga and science

Several months ago I completed a 35-hour yoga teacher training class. In the course of that training, and in the months following as I tried to maintain my yoga practice while finishing my doctoral dissertation, I found that several of the core yoga principles were translatable to my scientific process. By applying those translatable principles to my daily scientific life, I felt more productive and focused. So I want to outline the principles and my application of them to my life as a grad student here.

Follow your instincts. This is the number one lesson from my yoga teacher: a guided-from-within life. In a yoga class, this means that you should listen to your body and do what feels natural rather than force yourself into an uncomfortable or painful pose. If it’s a free-form class, being guided-from-within means that you flow through the poses in a natural way without thinking too hard about the sequence. In my life as a graduate student, I applied the guided-from-within approach to which project I wanted to work on in each moment. I always have a long list of things to do, so to apply this principle I choose to work on the one that I feel the most into at the moment. Sometimes I don’t feel like reading a paper but would rather work on an analysis and create a figure. When that no longer feels right, I’ll switch to reading the paper, or maybe working on writing up a manuscript. By following my instincts, I increased my productivity by not feeling as forced to do my work. And as long as I get started on a project well ahead of the deadline, this approach doesn’t compromise my ability to turn things in on time. Of course, there will always be some tasks that are always unappealing (I’m looking at you, animal care and use protocols) and some deadlines that must be rushed towards. But on the whole, the guided-from-within yoga mentality can be really useful for improving productivity as a scientist/grad student.

Be in the moment. In yoga class, we’re encouraged to set aside our to-do lists and focus on feeling the movements. This principle holds up for researchers as well. Once I’ve chosen a task to work on, I apply this yoga principle by setting aside Facebook, emails, and my to-do list to simply focus on the one task in front of me. It sounds easy, but it can be quite challenging, especially if the task is reading a rather dry paper. But focusing on a single task really improves my productivity.

Meditation. Yoga is really all about meditation. Meditation involves clearing the mind of all the mundane, day-to-day things and refocusing the brain. I like to think about meditation as listening to my subconscious, although each person has a different way to describe it. Meditation can be an incredibly powerful tool for a busy scientist, because we’ve always got so many projects at different stages (project ideas, experiments currently running, analyses we’re working on, and papers we’re writing) that it is easy to lose track of yourself in all the madness. Sitting and meditating for a few minutes reconnects me with why I got into all the crazy projects in the first place and leaves me feeling more centered, more grounded, and less likely to become a mad scientist.

Practice the mindset at a small scale. My yoga instructor talks about the actual yoga practice (all the poses we do on the mat) as a way to practice the yoga mindset (being in the meditative, guided-from-within, in-the-moment, connected-to-the-subconscious mindset) in an easy place. Does it feel good to have my arm that way? No? Then I’ll move it. It’s an easy way of listening to yourself and a safe and relatively easy place to practice that yoga mindset. I like to think about science in a similar way. For me, the best example is when I’m reading a scientific paper. I can just read the paper and take in all the words. That would be the equivalent of just going through the motions in yoga class. Alternatively, I can read the paper in a scientific way, and practice thinking critically, asking questions, and really evaluating the paper. Reading a paper is an easy place to practice the scientific approach, but it’s the approach I want to have for all of my scientific endeavors. I should approach my own experiments and writing with the same critical thinking and questioning approach as reading a paper. So just like yoga, there are ways to practice the appropriate mindset so that it becomes easier to slip into it and utilize the skills built in that easy space.

Sometimes we need the positive energy of a safe space. You can practice yoga alone or with other people. The two settings, alone or surrounded by people, result in very different energies. Sometimes what you really want and need from a yoga practice is the energy of other people doing yoga alongside you, even though it’s an independent practice and you don’t really interact much. The energy of having other people doing a similar thing in a safe space is incredibly therapeutic. Similarly, you can work alone at home/office or you can work in a crowded coffee shop/lab. Sometimes I want to work alone, holed up without any company to really focus on my work. Other times, I crave the company of others similarly slaving away on a grant proposal or paper—and that’s when it’s best to find a safe space like a coffee shop or a writing group or even just my shared lab/office space to work in.

Everything has multiple dimensions. Yoga comes in five types: Hatha yoga (movement of the body), Jnana yoga (knowledge and study), Bhakti yoga (love and creativity), Karma yoga (charity), and Raja yoga (meditation). A true yogi will be balanced in the practice of all five of these yogic schools. I believe that it’s important to balance these five aspects as a person, but especially as a scientist who is expected to be incredibly dedicated to the job. Being a scientist means that our lives are mostly guided by the Jnana yoga school of thought. However, I think it’s important that we don’t forget to be active (because activity leads to longer and healthier lives), to reconnect with ourselves (meditation), or to give back to the community (charity). Above all, I think it’s important to remain connected to the creative and artistic aspects of ourselves. Coming up with new ideas requires a lot of creativity, and I personally think it’s important to cultivate my creativity in my hobbies, which primarily involve reading fantasy novels and crafting. Finding a balance between these five aspects helps enrich my life and also helps enhance my scientific practice.

I try to apply these lessons from my yoga training to my life as a graduate student (soon-to-be-postdoc!). They help me stay grounded and happy while improving my productivity. What do you think? Do you have other ways of staying productive?

Women in Science: challenges and solutions

Women are consistently under-represented at the upper levels of the scientific enterprise, such as at the level of professors1, administrators1, or as members of scientific academies2 (Fig. 1). The lack of diversity in science is something that most people in the scientific community wish to address, but there seems to be a lack of consensus about the best way to do so.


Figure 1. Borrowed from Urry1

Many people point to motherhood as being one of the major leaks in the so-called ‘leaky pipeline’ of science, but the repeated sexist ‘twitterstorms’3 and cases of sexual harassment in academia4 point to a hostile or inherently sexist work environment as the real culprit. However, there have been some major strides recently, including the election of the first female president of the Howard Hughes Medical Institute (one of the richest biomedical research institutions in the world)5.

One of the problems with the discussion of the inequality of women in science is that it often conflates many different issues: (1) the often grueling hours and/or expectations of an academic career6,7; (2) finding a work-life balance; (3) a lack of sufficient parental leave programs/family support programs7; (4) a long history of casual sexism in science and society7; (5) a lack of diverse role models in science7; and (6) women being unprepared for things like negotiating.

One solution is to create support networks and host conferences aimed at educating and connecting women in science. About a week ago my university’s Women In Science and Engineering (WISE) group had a day-long conference, in which six distinguished women in science were invited to come and give a talk. The theme of this year’s conference was finding a work-life balance, which most agreed is better described as an equilibrium (Fig. 2). It’s an equilibrium rather than a balance because an equilibrium maintains the same overall energy, while the proportion of work and life can shift based on the various things we need to do. Each person’s equilibrium or balance will be different. Tied into this idea of equilibrium was using mindfulness and positive thinking to help make your work, which takes up a huge amount of your time, feel more like an enjoyable, or at least rewarding, part of your life.


Figure 2. A schematic of the work-life equilibrium. The line represents the same energy output, so an individual can shift along the line while maintaining energy output as required.


One of the most useful parts of the WISE conference was the workshop on negotiating. A major reason that women have lower pay than men is because they fail to negotiate. This workshop walked us through how to determine a reasonable negotiating range and helped us prepare for the negotiation conversation itself. It was an incredibly informative and useful workshop and I highly recommend everyone look into attending a similar workshop if possible.

An interesting theme I noticed throughout the workshop was that some of the advice was to take on characteristically ‘male’ traits to be successful in the workplace, especially if dealing with the more ‘traditional’ (aka sexist) men. One of the panelists even suggested that women start manspreading and taking up physical space to exert dominance and maintain power in a conversation or relationship. Several of the women recommended setting strong boundaries with male colleagues, such as not going out for drinks after work, or not staying past 9pm. Discussions of appropriate behaviors in the workplace (e.g. not getting too ‘flirty’) also included the ‘being one of the guys’ approach.

These suggestions are all good approaches to dealing with sexism and gender bias in the workplace, but they also made me wonder whether they will just perpetuate the problems. A component of modern feminism includes embracing feminine traits and not just having women break the glass ceiling by adopting masculine traits and behaviors. So just like the problem of helping keep women in science, the solutions are nuanced and layered and not at all straightforward. How can we change the culture in science while simultaneously succeeding?

What are your thoughts? Let me know what you think in the comments!



1Urry, 2015. Science and gender: Scientists must work harder on equality. Nature 528, 471–473.

2Gibney, 2016. Women under-represented in world’s science academies. Nature.

3Morello, 2015. Science and sexism: In the eye of the Twitterstorm. Nature 527, 148–151.

4Harmon, 2016. Chicago Professor Resigns Amid Sexual Misconduct Investigation. New York Times.

5Willyard, 2016. Howard Hughes’s next president: ‘Promote under-represented groups in science’. Nature.

6Duffy, 2015. You do not Need to Work 80 Hours a Week to Succeed in Academia.

7Shen, 2013. Inequality quantified: Mind the gender gap. Nature 495, 22–24.