Hardy Weinberg Analysis


(adapted from Jon Baker's curriculum unit)

 
Summary
The first question that population geneticists ask after they collect genotype data for a population is whether or not the population is in Hardy Weinberg equilibrium for that locus. This test involves comparing the observed and expected genotype frequencies that you calculated for each of the populations above. When a population is in Hardy-Weinberg equilibrium for a given genetic locus, it means that there is random mating (with respect to that locus), no selection, no mutation, no gene flow, and a population large enough to avoid the random effects of genetic drift. Population geneticists generally use population genetic profiles to determine how reproductively isolated populations are from one another. However, if selection is operating on a locus and one finds differences between populations at that locus, the difference may be due to selection acting differently on the two populations, rather than the result of reproductive isolation between them. Thus, if one is comparing the allele frequencies of two populations, one first needs to determine whether each population is in Hardy Weinberg equilibrium. If they are not, one may well be comparing apples and oranges, since one population may have selection causing it to be out of equilibrium, and the other might have nonrandom mating or some other process causing the deviation from Hardy-Weinberg equilibrium. If both populations are in Hardy Weinberg equilibrium, then it is more likely that a difference in allele frequencies between the two populations is due to reproductive isolation.

 

Calculating Genotype and Allele Frequencies

From your HinfI restriction digest gel you will (hopefully!) be able to distinguish a genotype for each fish. (See gel above, and schematic drawing below.)

Use the data sheet to score the number of fish that have each genotype: AA, AB, or BB. (Use a separate data sheet for each population.) Disregard samples where a genotype cannot clearly be established. If desired, you could try repeating the PCR on that sample, digesting more of the PCR product, or running more of the digest on the gel, depending on what the problem seems to be.

After recording the genotypes on the data sheet under the "Observed" heading, calculate the frequency of each genotype in that population by dividing each genotype's total by the total number of genotypes obtained.

Now it's time to calculate the allele frequencies, i.e. the proportion of A alleles and B alleles in your population (how many A's and B's you had, divided by the total number of alleles; use the bottom part of the data sheet). Remembering that each fish is diploid, how many total alleles were in your population? That's right, double the total number of fish scored. To calculate the frequency of each allele, start by counting how many A's and B's you had. For example, if you had 6 AA, 4 AB, and 2 BB genotypes (total = 12 fish), then there are 16 A's and 8 B's (total = 24 alleles: remember, each AA genotype contributes two A's and each AB genotype contributes one A to the total; likewise with B's). Now divide the number of A's by the total number of alleles to get the frequency of A (which we will now call "p"; in this example 16 ÷ 24, or 0.67), and likewise for the frequency of B (which we will now call "q"; it would have to be 0.33, since p and q have to add up to one).
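The allele count described above can be sketched in a few lines of Python. This is only an illustration of the arithmetic on the data sheet; the function name `allele_frequencies` and its argument names are made up for this example.

```python
def allele_frequencies(n_aa, n_ab, n_bb):
    """Return (p, q): the frequencies of alleles A and B from genotype counts."""
    n_fish = n_aa + n_ab + n_bb
    if n_fish == 0:
        raise ValueError("no genotypes were scored")
    total_alleles = 2 * n_fish      # each diploid fish carries two alleles
    count_a = 2 * n_aa + n_ab       # each AA gives two A's, each AB gives one
    count_b = 2 * n_bb + n_ab       # each BB gives two B's, each AB gives one
    return count_a / total_alleles, count_b / total_alleles


# The example from the text: 6 AA, 4 AB, 2 BB (12 fish, 24 alleles)
p, q = allele_frequencies(6, 4, 2)
print(round(p, 2), round(q, 2))  # 0.67 0.33
```

Counting alleles directly, rather than working from the genotype frequencies, mirrors the bottom part of the data sheet.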

Now you're ready to do the Hardy Weinberg analysis. Essentially this analysis involves figuring out how many fish of each genotype you would have expected given your allele frequencies, IF there were random mating, no selection, no mutation, no migration, and a large population size with respect to the prolactin-2 gene. The Hardy Weinberg theorem says that if the five basic conditions are met, allele frequencies for a given gene will not change from one generation to the next. This is the first thing you want to establish before you can compare one population to another.

The Hardy-Weinberg Theorem really is a model that states that for any set of allele frequencies one can figure out the expected genotype frequencies if the population is not undergoing any evolution. If a population is in Hardy Weinberg equilibrium (HWE), the observed genotype frequencies will conform to p² + 2pq + q² = 1, where p² = freq (AA), 2pq = freq (AB), and q² = freq (BB). To determine whether your population is in HWE at your locus, take the p and q values you calculated above for your population (observed data), and calculate the genotype frequencies you would EXPECT if the population were in HWE. To do this calculation, use the value for p that you calculated and square it to get the expected frequency of AA genotypes. Continue for the other genotypes using 2pq and q². Write these numbers on the data table line for expected genotype frequencies. To convert these expected frequencies into numbers of fish that you can compare to your actual population, multiply each expected frequency by the total number of fish in the actual population. This gives the number of fish you would have expected to be of each genotype if the population were in HWE. For the example given here, expected AA frequency = p² = (0.67)² = 0.45. Multiplying 0.45 x 12 total fish = 5.4 expected AA fish. Do the same set of calculations to get your expected number of fish for the other two genotypes, and write the numbers on the data sheet under "expected fish."
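As a rough sketch of the calculation just described, the expected numbers of fish can be computed as follows. The function name `expected_genotype_counts` is hypothetical, and p is entered already rounded to 0.67, as in the worked example, so that the output matches the numbers on the data sheet (the unrounded p = 16/24 would give about 5.3 expected AA fish instead of 5.4).

```python
def expected_genotype_counts(p, n_fish):
    """Expected numbers of AA, AB and BB fish under Hardy Weinberg equilibrium."""
    q = 1.0 - p                      # two alleles, so p + q = 1
    return {
        "AA": p * p * n_fish,        # p squared times the number of fish
        "AB": 2 * p * q * n_fish,    # 2pq times the number of fish
        "BB": q * q * n_fish,        # q squared times the number of fish
    }


# Example from the text: p rounded to 0.67, 12 fish scored
expected = expected_genotype_counts(0.67, 12)
for genotype, count in expected.items():
    print(genotype, round(count, 1))
# AA 5.4
# AB 5.3
# BB 1.3
```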

O.K., so you expected 5.4 AA fish, and you actually counted 6 AA fish. That doesn't seem very different, but how do you know whether your expected numbers of genotypes are significantly different from your observed numbers? To answer this question we need a statistical tool to help us.

 

Chi Square Goodness-of-Fit Test

We will use the statistical test known as Chi Square Goodness of Fit to determine whether there is a significant difference between the numbers of actual and expected genotypes. We first formulate a conservative hypothesis, called the null hypothesis (H0), which states that there is no difference between the observed and expected values; therefore the population is in Hardy Weinberg equilibrium. We also state the alternate hypothesis (HA) that there is a significant difference, and therefore the population is not in Hardy Weinberg equilibrium. This test tells you the probability of arriving at differences as large as the ones you find by chance alone. The lower the probability generated by the test, the greater the likelihood that the difference you see between the observed and expected results is actually the result of a violation of one of the five conditions listed above.

The test determines the significance of the difference between the two sets of numbers by first plugging them into the equation:

 

$$\chi^2 = \sum \frac{(\text{observed} - \text{expected})^2}{\text{expected}}$$

 

This equation gives us the Chi Square (χ²) value for our data. Comparing that value against a table of Chi Square critical values tells us how probable it is that a difference at least this large would arise by chance alone. For the example given above, the χ² calculation would be:

 

$$\chi^2 = \frac{(6 - 5.4)^2}{5.4} + \frac{(4 - 5.3)^2}{5.3} + \frac{(2 - 1.3)^2}{1.3} = 0.07 + 0.32 + 0.38 = 0.77$$
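The same sum can be checked with a short Python sketch. The function name `chi_square` is made up for illustration, and the inputs are the observed and expected numbers of fish from the example.

```python
def chi_square(observed, expected):
    """Sum of (observed - expected)^2 / expected over all genotype classes."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Observed and expected numbers of AA, AB and BB fish from the example
stat = chi_square([6, 4, 2], [5.4, 5.3, 1.3])
print(round(stat, 2))  # 0.76
```

The script prints 0.76 rather than 0.77 because the hand calculation rounds each of the three terms before adding them. The difference does not affect the conclusion.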

 

Before we look up this χ² value in the Chi Square table, we have to choose a significance level with which we are comfortable (also called the alpha value; it is the threshold against which the P value is judged). For instance, a significance level of 0.05 means that 5% of the time you might say that there is a difference when there really is no difference. For most scientific purposes, the significance level is set by convention at 0.05, meaning that we call a difference significant only if a difference that large would arise by chance alone less than 5% of the time.

We also need to calculate the degrees of freedom (ν, abbreviated d.f.). The number of degrees of freedom is equal to the number of classes (in our case, the three genotypes) minus one (because if we know two of the expected genotype frequencies we automatically know the third) minus the number of independent values we calculated from our observed data to determine our expected values. These independent values are the allele frequencies, only one of which is independent, because if we know p then we automatically know q. The equation for calculating degrees of freedom is:

d.f. = k - 1 - m

 

where k is the number of classes (genotypes) and m is the number of independent values we calculate from the data (allele frequencies). For a two-allele system there are three classes and one independent value, thus there is 1 degree of freedom.

d.f. = 3 - 1 - 1 = 1
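The degrees-of-freedom formula is simple enough to do by hand, but a small helper makes the bookkeeping explicit. This is a minimal sketch; the function and argument names are illustrative only.

```python
def degrees_of_freedom(n_classes, n_estimated):
    """d.f. = k - 1 - m, with k classes and m values estimated from the data."""
    df = n_classes - 1 - n_estimated
    if df < 1:
        raise ValueError("too few classes for a goodness-of-fit test")
    return df


# Two-allele system: three genotype classes, one independent allele frequency
print(degrees_of_freedom(3, 1))  # 1
```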

 

Now you are ready to go to the Chi Square table of critical values (note: this is a partial table).

 

         Probability of exceeding the critical value

d.f.     0.10     0.05    0.025     0.01    0.001
-------------------------------------------------
  1     2.706    3.841    5.024    6.635   10.828
  2     4.605    5.991    7.378    9.210   13.816
  3     6.251    7.815    9.348   11.345   16.266
  4     7.779    9.488   11.143   13.277   18.467
  5     9.236   11.070   12.833   15.086   20.515

 

The rows are arranged by increasing degrees of freedom, so you will be using only the top row (for one degree of freedom) for the goodness of fit test. The columns are arranged by probability, with 0.10 (or 10%) on the left and 0.001 (or 0.1% chance) on the far right. For the 5% level, go to the 0.05 column, where the number in the top row is 3.84. If your χ² value is less than 3.84, then a difference this large between observed and expected would arise by chance alone more than 5% of the time. You would therefore not reject the null hypothesis: the observed and expected values are not significantly different, and your data are consistent with a population in Hardy Weinberg equilibrium. In the example above, χ² = 0.77 is well below 3.84, so the null hypothesis stands.

If your χ² value is greater than 3.84, then a difference this large between your observed and expected values would arise by chance alone less than 5% of the time. Therefore, the difference would be considered significant, and you would reject the null hypothesis that the difference was due to chance. You would then accept the alternative hypothesis that there was indeed a significant difference due to something besides chance.

In this case, you would have cause to believe that one of the five conditions for H-W equilibrium had been violated. You would probably want to confirm your results by increasing your sample size before trying to determine which condition may have been violated.
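The decision rule described above can be sketched in Python as follows. The critical values are copied from the top row (1 d.f.) of the table; the names `CRITICAL_1DF`, `reject_hwe`, and `p_value_1df` are made up for this example. The P-value helper relies on the fact that, for one degree of freedom only, the probability of exceeding a given χ² value equals erfc(sqrt(χ²/2)); it is not part of the original handout and is included as an optional cross-check on the table.

```python
import math

# Critical values for 1 degree of freedom, copied from the table above
CRITICAL_1DF = {0.10: 2.706, 0.05: 3.841, 0.025: 5.024, 0.01: 6.635, 0.001: 10.828}


def reject_hwe(chi2, alpha=0.05):
    """True if the chi-square value exceeds the 1-d.f. critical value for alpha."""
    return chi2 > CRITICAL_1DF[alpha]


def p_value_1df(chi2):
    """Probability of a chi-square value at least this large (1 d.f. only)."""
    return math.erfc(math.sqrt(chi2 / 2.0))


# Example from the text: chi-square = 0.77
print(reject_hwe(0.77))             # False
print(round(p_value_1df(0.77), 2))  # 0.38
```

With χ² = 0.77, the null hypothesis is not rejected at the 5% level. A deviation at least this large would be expected by chance in roughly 38% of samples drawn from a population in equilibrium.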

If your population is in Hardy Weinberg equilibrium for a given locus, it suggests that there is no evolution going on in terms of that particular gene. These fish mate randomly with each other, and AA fish don't prefer other AA fish; they'd be just as happy to mate with AB or BB fish. BB fish are not surviving to parenthood any more than the AAs or ABs. The A and B alleles are not mutating at a significant rate. And the population is behaving as though there is a large pool of fish with these alleles.

Once we know our population is in Hardy Weinberg equilibrium, and therefore that it is nonevolving in terms of the gene in question, we are ready to compare it to another population for that locus. If it is not in Hardy Weinberg equilibrium, there may be natural selection or nonrandom mating or something else going on, so we wouldn't be able to do a fair comparison to another population. In that case, we would probably want to look at another locus, hopefully one that is in Hardy Weinberg equilibrium.