- The genetic content of individuals from the same family is not independent of each other.
- When we want to calculate the likelihood of genetic or phenotype-genotype data collected in a family, we need to consider family structure.
- The family structure is called “pedigree”.
- One popular algorithm to estimate the likelihood of observed data on a pedigree is the Elston-Stewart algorithm
- There are several R-Packages that implement a general purpose Elston-Stewart algorithm.
- Today we will be using a package called “ElstonStewart” to implement a few different genetic models and calculate the likelihood of a pedigree.
Load Environment
# clean
rm(list = ls(all=TRUE))
# load
library(ElstonStewart)
Example using the ElstonStewart R-Packages and “conrad2” data set
- This family has many generations, inbreeding and loops
- The Elston-Stewart algorithm does not work well in pedigrees with loops/inbreeding
- Packages like the ElstonStewart or pedprobr/pedtools R-Package can deal with such complex pedigrees
Load the conrad2 data
## cf Elston-Stewart vignette for more comments on this example
data(conrad2)
We will make a phenotype list
- In our first model, the phenotype probability for each family member will be 1
- We will change this when we use our custom model
- We will give a dummy phenotype to every family member
# get phenotypes
conrad.phenotypes=rep(
0,
22
)
Make the pedigree object
- The ElstonStewart R-Package uses the “es.pedigree” object
- I believe that’s French for “it is a pedigree”
# make es.pedigree object for Elston-Stewart
X <- es.pedigree(
id = conrad2$id,
father = conrad2$father,
mother = conrad2$mother,
sex = conrad2$sex,
pheno = conrad.phenotypes,
geno = conrad.genotypes
)
plot.es.pedigree(X)

Run the Elston-Stewart algorithm using the basic di allelic model provided in the ElstonStewart R-Package
- modele.di = Models inheritance based on two alleles (as is the case for humans in somatic chromosomes)
- theta = contains the allele frequency of the “A” allele
# input theta
theta=list(p=0.9)
# running Elston-Stewart
r <- Elston(
ped = X,
modele = modele.di,
theta = theta
)
r$result
## [1] 0.0002160156
- This result shows the observed data is very unlikely, as our proband (22) has a homozygous variant phenotype, which should be very unlikely given the allele frequency we input (theta)
Summary of what we learned
- The Elston-Stewart algorithm can calculate the likelihood of family data even in complex pedigrees.
- It is ideal on pedigrees that do not have loops/inbreeding, but packages like ElstonStewart or pedprobr/pedtools can handle inbred pedigrees.
- The algorithm needs as input:
- allele frequency (to calculate founder genotype probabilities)
- Mendelian rules of transmission (to get transmission probabilities)
- Penetrance estimate/ penetrance function (to calculate phenotype probabilities give genotypes)
- On our examples, we implemented an instance of the Elston-Stewart Algorithm for the di-allelic case where phenotype probabilities are always 1.
- The phenotype probabilities need not be 1 in every case. This can be easily adjusted by changing the phenotype probability function.
- For example, a phenotype based on observing a particular event by age t, that can be modeled using a parametric survival analysis (for example, an exponential function), non-parametric (Kaplan-Meier estimator) or semi-parametric (Cox Proportional Hazards framework)
- We will now do a simple example of this kind of penetrance (phenotype probabilities)
- We will write our own custom model from scratch
First, we’ll write a function to get founder probabilities
- These are the probabilities of observing a given genotype in founders
- Founders are individuals without parents
- ‘theta$p’ is the frequency of the major allele (A allele)
- ‘g’ is the genotype of this individual
- this function is exactly the same as the one in “modele.di”, since we are still working on human data and somatic chromosomes.
get.founder.probabilities.from.afreq=function(
g,
theta
)
{
if(g == 0)
{
return(theta$p^2)
}
if(g == 1)
{
return(2 * theta$p * (1 - theta$p))
}
if(g == 2)
{
return((1 - theta$p)^2)
}
stop("Unknown Genotype")
}
Make fuction to get the probability of the phenotypes given a genotype
- We call these penetrance
- In this example, the phenotype probability is not always equal to 1 (different from modele.di)
- The probability of observing a phenotype event by age t is exponentially distributed, with a higher rate parameter for people without variant genotype.
## Make function to get penetrance probabilities based on an exponential function (will use in the future)
get.phenotype.genotype.probabilities=function(
x,
g,
theta
)
{
# get phenotype information
phenotype.event=x[1]
phenotype.time=x[2]
# get probability of phenotype
if(g==0)
{
if(phenotype.event==1)
{
p=dexp(
x=phenotype.time,
rate = 0.1
)
} else if(phenotype.event==0)
{
p=pexp(
q=phenotype.time,
rate = 0.1,
lower.tail = FALSE
)
} else
{
warning('invalid phenotype/genotype')
}
} else if(g==1 | g==2)
{
# get probability
if(phenotype.event==1)
{
p=dexp(
x=phenotype.time,
rate = 0.025
)
} else if(phenotype.event==0)
{
p=pexp(
q=phenotype.time,
rate = 0.025,
lower.tail = FALSE
)
} else
{
warning('invalid phenotype/genotype')
}
}
return(p)
}
Make function to get transmission probabilities
- These are the probabilities of observing a genotype in the offspring given the genotypes of both parents
- They are derived from Mendel’s rules of inheritance
- This function works the same as the transmission probabilities in modele.di, as we are still in the di-allelic setting with somatic chromosomes.
get.transmission.probabilities=function(
of,
fa,
mo
)
{
if (fa == 0)
{
if (mo == 0 & of == 0)
return(1)
if (mo == 1 & (of == 0 | of == 1))
return(0.5)
if (mo == 2 & of == 1)
return(1)
}
if (fa == 1)
{
if (mo == 0 & (of == 0 | of == 1))
return(0.5)
if (mo == 1 & (of == 0 | of == 2))
return(0.25)
if (mo == 1 & of == 1)
return(0.5)
if (mo == 2 & (of == 1 | of == 2))
return(0.5)
}
if (fa == 2)
{
if (mo == 0 & of == 1)
return(1)
if (mo == 1 & (of == 1 | of == 2))
return(0.5)
if (mo == 2 & of == 2)
return(1)
}
return(0)
}
Make a new modele object for ElstonStewart R-Package
new.modele = list(
'name' = 'CCVR.Modele',
'proba.g' = get.founder.probabilities.from.afreq,
'trans' = get.transmission.probabilities,
'p.pheno' = get.phenotype.genotype.probabilities
)
Run the Elston-Stewart algorithm using the ElstonStewart R-Package with our custom model
# make es.pedigree object for Elston-Stewart
Our.Pedigree <- es.pedigree(
id = conrad2$id,
father = conrad2$father,
mother = conrad2$mother,
sex = conrad2$sex,
pheno = conrad.phenotypes,
geno = conrad.genotypes
)
# running Elston-Stewart
r <- Elston(
ped = Our.Pedigree,
modele = new.modele,
theta = theta
)
r$result
## [1] 7.523996e-25
- As you can see, our probabilities changed, which makes sense since our model changed.
- Now, lets see how the probabilities change if we change the genotype and phenotype information.
Make example where many family members have early events
# make most famly members have genotype 1 or 2
conrad.genotypes=rep(
list(1:2),
22
)
conrad.genotypes[[22]]=2
# give phenotype with events to many family members
conrad.phenotypes = rep(
list(c(1,40)),
22
)
conrad.phenotypes[[22]]=c(0, 25)
Run our modele again
# make es.pedigree object for Elston-Stewart
Our.Pedigree <- es.pedigree(
id = conrad2$id,
father = conrad2$father,
mother = conrad2$mother,
sex = conrad2$sex,
pheno = conrad.phenotypes,
geno = conrad.genotypes
)
# running Elston-Stewart
r <- Elston(
ped = Our.Pedigree,
modele = new.modele,
theta = theta
)
r$result
## [1] 3.422755e-51
Final summary/thoughts
- Elston-Stewart is a really powerful algorithm to calculate the likelihood of data observed from a pedigree.
- Several implementations exist in R.
- You can customize the functions for founder probabilities, transmission probabilities, and phenotype probabilities to suit your needs.