Pedigree Likelihood in R

The genetic content of individuals from the same family is not independent of each other.
When we want to calculate the likelihood of genetic or phenotype-genotype data collected in a family, we need to consider family structure.
- The family structure is called “pedigree”.
One popular algorithm to estimate the likelihood of observed data on a pedigree is the Elston-Stewart algorithm
There are several R-Packages that implement a general purpose Elston-Stewart algorithm.
- Today we will be using a package called “ElstonStewart” to implement a few different genetic models and calculate the likelihood of a pedigree.

Load Environment

# clean
rm(list = ls(all=TRUE))

# load
library(ElstonStewart)

Example using the ElstonStewart R-Packages and “conrad2” data set

This family has many generations, inbreeding and loops
- The Elston-Stewart algorithm does not work well in pedigrees with loops/inbreeding
- Packages like the ElstonStewart or pedprobr/pedtools R-Package can deal with such complex pedigrees

Load the conrad2 data

## cf Elston-Stewart vignette for more comments on this example
data(conrad2)

Add the genotype information for each family member

In our example, we do not know the genotypes of any family member, except 1 and 22
- This is coded by giving each family members all possible genotypes (except 1, 22)
- 0 = homozygous wild type
- 1 = heterozygous
- 2 = homozygous variant

# creating an es.pedigree object
conrad.genotypes <-rep(
  list(
    0:2
  ), 22
)
conrad.genotypes[[1]]=2
conrad.genotypes[[22]]=2

We will make a phenotype list

In our first model, the phenotype probability for each family member will be 1
- We will change this when we use our custom model
We will give a dummy phenotype to every family member

# get phenotypes
conrad.phenotypes=rep(
  0,
  22
)

Make the pedigree object

The ElstonStewart R-Package uses the “es.pedigree” object
- I believe that’s French for “it is a pedigree”

# make es.pedigree object for Elston-Stewart
X <- es.pedigree( 
  id = conrad2$id,
  father = conrad2$father,
  mother = conrad2$mother, 
  sex = conrad2$sex,
  pheno = conrad.phenotypes,
  geno = conrad.genotypes
)
plot.es.pedigree(X)

Run the Elston-Stewart algorithm using the basic di allelic model provided in the ElstonStewart R-Package

modele.di = Models inheritance based on two alleles (as is the case for humans in somatic chromosomes)
theta = contains the allele frequency of the “A” allele

# input theta
theta=list(p=0.9)

# running Elston-Stewart
r <- Elston(
  ped = X,
  modele = modele.di,
  theta = theta
)
r$result

## [1] 0.0002160156

This result shows the observed data is very unlikely, as our proband (22) has a homozygous variant phenotype, which should be very unlikely given the allele frequency we input (theta)

Summary of what we learned

The Elston-Stewart algorithm can calculate the likelihood of family data even in complex pedigrees.
It is ideal on pedigrees that do not have loops/inbreeding, but packages like ElstonStewart or pedprobr/pedtools can handle inbred pedigrees.
The algorithm needs as input:
- allele frequency (to calculate founder genotype probabilities)
- Mendelian rules of transmission (to get transmission probabilities)
- Penetrance estimate/ penetrance function (to calculate phenotype probabilities give genotypes)
On our examples, we implemented an instance of the Elston-Stewart Algorithm for the di-allelic case where phenotype probabilities are always 1.
- The phenotype probabilities need not be 1 in every case. This can be easily adjusted by changing the phenotype probability function.
- For example, a phenotype based on observing a particular event by age t, that can be modeled using a parametric survival analysis (for example, an exponential function), non-parametric (Kaplan-Meier estimator) or semi-parametric (Cox Proportional Hazards framework)
We will now do a simple example of this kind of penetrance (phenotype probabilities)
We will write our own custom model from scratch

First, we’ll write a function to get founder probabilities

These are the probabilities of observing a given genotype in founders
- Founders are individuals without parents
‘theta$p’ is the frequency of the major allele (A allele)
‘g’ is the genotype of this individual
this function is exactly the same as the one in “modele.di”, since we are still working on human data and somatic chromosomes.

get.founder.probabilities.from.afreq=function(
  g,
  theta
) 
{
  if(g == 0)
    {
        return(theta$p^2)
    } 
  if(g == 1)
    {
      return(2 * theta$p * (1 - theta$p))
    } 
  if(g == 2)
    {
      return((1 - theta$p)^2)
    }
    stop("Unknown Genotype")
}

Make fuction to get the probability of the phenotypes given a genotype

We call these penetrance
In this example, the phenotype probability is not always equal to 1 (different from modele.di)
The probability of observing a phenotype event by age t is exponentially distributed, with a higher rate parameter for people without variant genotype.

## Make function to get penetrance probabilities based on an exponential function (will use in the future)
get.phenotype.genotype.probabilities=function(
  x,
  g,
  theta
)
{
  # get phenotype information
  phenotype.event=x[1]
  phenotype.time=x[2]
  
  # get probability of phenotype
  if(g==0)
  {
      if(phenotype.event==1)
    {
      p=dexp(
        x=phenotype.time,
        rate = 0.1
      )
    } else if(phenotype.event==0)
    {
      p=pexp(
        q=phenotype.time,
        rate = 0.1,
        lower.tail = FALSE
      )
    } else
      {
        warning('invalid phenotype/genotype')
      }
  } else if(g==1 | g==2)
  {
    # get probability
    if(phenotype.event==1)
    {
      p=dexp(
        x=phenotype.time,
        rate = 0.025
      )
    } else if(phenotype.event==0)
    {
      p=pexp(
        q=phenotype.time,
        rate = 0.025,
        lower.tail = FALSE
      )
    } else
      {
        warning('invalid phenotype/genotype')
      }
  }
  return(p)
}

Make function to get transmission probabilities

These are the probabilities of observing a genotype in the offspring given the genotypes of both parents
They are derived from Mendel’s rules of inheritance
This function works the same as the transmission probabilities in modele.di, as we are still in the di-allelic setting with somatic chromosomes.

get.transmission.probabilities=function(
  of,
  fa,
  mo
) 
{
    if (fa == 0) 
    {
        if (mo == 0 & of == 0) 
            return(1)
        if (mo == 1 & (of == 0 | of == 1)) 
            return(0.5)
        if (mo == 2 & of == 1) 
            return(1)
    }
    if (fa == 1) 
    {
        if (mo == 0 & (of == 0 | of == 1)) 
            return(0.5)
        if (mo == 1 & (of == 0 | of == 2)) 
            return(0.25)
        if (mo == 1 & of == 1) 
            return(0.5)
        if (mo == 2 & (of == 1 | of == 2)) 
            return(0.5)
    }
    if (fa == 2) 
    {
        if (mo == 0 & of == 1) 
            return(1)
        if (mo == 1 & (of == 1 | of == 2)) 
            return(0.5)
        if (mo == 2 & of == 2) 
            return(1)
    }
    return(0)
}

Make a new modele object for ElstonStewart R-Package

new.modele = list(
  'name' = 'CCVR.Modele',
  'proba.g' = get.founder.probabilities.from.afreq,
  'trans' = get.transmission.probabilities,
  'p.pheno' = get.phenotype.genotype.probabilities
)

Make the phenotype information appropriate for our model

No family member has event by age 60, except individual 22, who had event at 35

# give phenotype with events to many family members
conrad.phenotypes = rep(
  list(c(0,60)), 
  22
)
conrad.phenotypes[[22]]=c(1,35)

Run the Elston-Stewart algorithm using the ElstonStewart R-Package with our custom model

# make es.pedigree object for Elston-Stewart
Our.Pedigree <- es.pedigree( 
  id = conrad2$id,
  father = conrad2$father,
  mother = conrad2$mother, 
  sex = conrad2$sex, 
  pheno = conrad.phenotypes, 
  geno = conrad.genotypes 
)

# running Elston-Stewart
r <- Elston(
  ped = Our.Pedigree,
  modele = new.modele,
  theta = theta
)
r$result

## [1] 7.523996e-25

As you can see, our probabilities changed, which makes sense since our model changed.
Now, lets see how the probabilities change if we change the genotype and phenotype information.

Make example where many family members have early events

# make most famly members have genotype 1 or 2
conrad.genotypes=rep(
  list(1:2),
  22
)
conrad.genotypes[[22]]=2

# give phenotype with events to many family members
conrad.phenotypes = rep(
  list(c(1,40)), 
  22
)
conrad.phenotypes[[22]]=c(0, 25)

Run our modele again

# make es.pedigree object for Elston-Stewart
Our.Pedigree <- es.pedigree( 
  id = conrad2$id,
  father = conrad2$father,
  mother = conrad2$mother, 
  sex = conrad2$sex, 
  pheno = conrad.phenotypes, 
  geno = conrad.genotypes 
)

# running Elston-Stewart
r <- Elston(
  ped = Our.Pedigree,
  modele = new.modele,
  theta = theta
)
r$result

## [1] 3.422755e-51

Final summary/thoughts

Elston-Stewart is a really powerful algorithm to calculate the likelihood of data observed from a pedigree.
Several implementations exist in R.
You can customize the functions for founder probabilities, transmission probabilities, and phenotype probabilities to suit your needs.