MB620 Bioinformatics
University of New Haven
Instructor: Joel S. Bader
Class 7: Genetics. Mapping. Genomic DNA Analysis.


Agenda


The Big Picture

Genetics
  Traits/Genes to Location: Genetic and physical maps; Research Genetics mapping panel; Stanford mapping server
  Traits/Genes to Experimental Organisms: Jackson Laboratories
  Trait/Gene Location Database: OMIM, On-line Mendelian Inheritance in Man
Genomic DNA Analysis
  Sequences to Contigs: CuraTools; CAP, PHRAP
  Contigs to mRNA: Genscan; Grail
mRNA Analysis
Protein Analysis

Genes to Physical Locations

How do we know where genes are on chromosomes?
Genes were originally defined as units of inheritance, before any connection to DNA was known
Traits are linked if inheriting one makes it more likely to inherit the second
Early days: genetic maps, crosses, Morgans, centimorgans (cM)
Linkage example: 2 simple traits
  Parents: (AB)(AB) x (ab)(ab)   homozygous at both loci
  F1:      (AB)(ab)              heterozygous at both loci
  F2:      (AB)(AB) + (ab)(ab) + (AB)(ab)   if strongly linked
           also (Ab)(..) + (aB)(..)         if weakly linked
Genetic maps: how often do traits recombine?
1% recombination = 1 centimorgan (cM)
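
The definition above can be turned into a small calculation. The following is a minimal sketch, assuming the gametes passed on by the doubly heterozygous parent can be classified directly (for example in a test cross). The counts are hypothetical and chosen only for illustration.

```python
def recombination_fraction(parental: int, recombinant: int) -> float:
    """Fraction of scored gametes that carry a recombinant combination."""
    total = parental + recombinant
    if total == 0:
        raise ValueError("no offspring counted")
    return recombinant / total


def map_distance_cm(parental: int, recombinant: int) -> float:
    """Map distance in cM, using 1% recombination = 1 cM."""
    return 100.0 * recombination_fraction(parental, recombinant)


# Hypothetical gamete counts (not real data).
counts = {"AB": 430, "ab": 450, "Ab": 58, "aB": 62}
parental = counts["AB"] + counts["ab"]      # same combinations as the parents
recombinant = counts["Ab"] + counts["aB"]   # new combinations

print(f"recombination fraction: {recombination_fraction(parental, recombinant):.2f}")
print(f"map distance: {map_distance_cm(parental, recombinant):.1f} cM")
```

The direct conversion from percent recombination to cM is only a good approximation for closely linked traits, because double crossovers go uncounted at larger distances.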

Physical maps:
Traits are genes with a physical location on a chromosome
Recombination depends on physical distance between genes
Large distance (or different chromosomes): high recombination (what is the maximum?)
Small distance: low recombination
1% recombination (1 cM) = 1 Mb (approximately, averaged over the human genome)
How many markers?
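
The relation between distance and recombination can be explored numerically. The sketch below uses the Haldane map function, a standard model that assumes crossovers occur independently; it is not part of these notes and is brought in only for illustration. The genome size of 3000 Mb is a rough assumption, and `markers_needed` is a hypothetical helper that simply divides genome length by the desired marker spacing.

```python
import math


def haldane_recombination(distance_cm: float) -> float:
    """Recombination fraction predicted by the Haldane map function."""
    d_morgans = distance_cm / 100.0
    return 0.5 * (1.0 - math.exp(-2.0 * d_morgans))


def markers_needed(genome_mb: float, spacing_mb: float) -> int:
    """Evenly spaced markers needed to cover a genome at a given spacing."""
    return math.ceil(genome_mb / spacing_mb)


# Recombination grows with distance but levels off for distant loci.
for d in (1, 10, 50, 200):
    print(f"{d:>4} cM -> recombination fraction {haldane_recombination(d):.3f}")

GENOME_MB = 3000  # assumption: rough size of the human genome in Mb
for spacing in (10.0, 1.0):
    print(f"{spacing:>4} Mb spacing -> {markers_needed(GENOME_MB, spacing)} markers")
```

Printing the table for increasing distances is a quick way to check an answer to the question about the maximum recombination.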

Chromosome nomenclature: (chrom #)[p|q](region)(band)
p = short arm, q = long arm
regions and bands are numbered from centromere to telomere
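
The nomenclature can be read mechanically. Here is a minimal sketch of a parser for labels of the form (chrom #)[p|q](region)(band). The optional sub-band after a decimal point, and the names `Band` and `parse_band`, are assumptions added for illustration.

```python
import re
from typing import NamedTuple, Optional


class Band(NamedTuple):
    chromosome: str
    arm: str                 # "p" = short arm, "q" = long arm
    region: int
    band: int
    sub_band: Optional[str]  # digits after the decimal point, if any


# chromosome, arm, one-digit region, one-digit band, optional ".sub-band"
_BAND_RE = re.compile(r"^(\d{1,2}|X|Y)([pq])(\d)(\d)(?:\.(\d+))?$")


def parse_band(label: str) -> Band:
    """Split a cytogenetic label such as '7q31.3' into its parts."""
    match = _BAND_RE.match(label.strip())
    if match is None:
        raise ValueError(f"not a recognized band label: {label!r}")
    chrom, arm, region, band, sub = match.groups()
    return Band(chrom, arm, int(region), int(band), sub)


print(parse_band("7q31.3"))
print(parse_band("Xp21"))
```

For "7q31.3" this reads as chromosome 7, long arm, region 3, band 1, sub-band 3.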

Making physical maps

Looking at maps: GenBank, Entrez

Using physical maps

Model Organisms

Study human disease in an experimental organism
Mouse genetics: Jackson Labs
Go to Jackson Labs, search for leptin, find mouse genetics information
See how to order mice
Set up crosses, find physical location of gene

Database of Mapped Genes: OMIM

OMIM = On-Line Mendelian Inheritance in Man
Primarily disease genes.

How to use OMIM

Example: obesity
Gene map: find the obesity gene, then the neighboring genes on the chromosome
Morbidity map: diseases in alphabetical order
Search for obesity, find leptin and leptin receptor

Does mapping a gene mean that you know the sequence?
Positional cloning: find a linked marker, sequence the DNA nearby, look for a gene.
What is the typical location resolution?
How many candidate genes could lie in a region of that size?


Sequences to Contigs

We covered this last week.

Contigs to mRNA (Gene Prediction)

Two widely used programs:
Genscan
Grail

Example: Take this sequence into Genscan, predict exons and protein sequence.
How well does it do? Use blastn to find the matching GenBank record, then compare its annotated features with the prediction.
What is the difference between intron/exon and CDS?

        1 tttttttttt gagctgggac cgaacccagg gccttcggct tcctaggtaa gcgctctacc
       61 actgagctaa atccccagcc cctctgccct tcctctttga ggcctctttg tctctatgag
      121 accccagcag ctctggggta gaggccaggc atagtgtgtg aacgagttcc tggcttctgt
      181 attctccata cccagcccac tagtaatctg ggacggtttg atgcaaatca ggagagccat
      241 agctacttgg tgagggaaga cgtagggaca cttgaactgc cttcacctct tcaatctgtc
      301 tgctgctttg tcccctgtag gaaccctcta ctctgagggt attgtgtcct gaggttcagg
      361 tatcagagac ctgaaaccct gtgtcttgac tctggctaga ccctgttgac caagggctag
      421 gctcaggagc tccgtagtag gtcatgagcc agatgtgtca gaagaggaca taagtacagg
      481 agggtagaag tgaagggggc ctctcaagct gctgtttcta gaacttgact gggctcatca
      541 cagcaggctg aaatccaggg gtccgctctt gtgacagttg agttcttctc tcacatcccc
      601 accctctctc cctccacccc gggggtcatt gtatgtagtc atggctggta ctgaactttg
      661 gatagtactc cacctcctga atgctaggac tacaggtatg tgccaccata cctggctcta
      721 tcttgtatat tttgtttgtg ggcacacagg acagaggccc atctcagtcc taggagctca
      781 tagtttgact tcttacaggc cccaagaagc tcttcgcaat tccgccatga gaatccacca
      841 gagctgtctt agttagggtt ttactgctgt gaacaaacac catgaccaag gcaactctta
      901 tatggatgac atttcactgg ggctggctta caggttcagt ccattatcat caaggtgggg
      961 gagtggcagt gtccaggcag atgtgggggc tggaggaact gagagttcca cctcttgttc
     1021 caaaggtagc taggagaagg tagctgatgt tcatggtttc tgctgcccag gatttgtcac
     1081 tggcgatccc acatcttaca ggaaacatca ggcagaggtg tcccttcatc tgggccctgg
     1141 ctagaaactg ccctggaatg agacgagtgg ccatggacct gggcactgga ccctcgctct
     1201 ctttatctga ggtcagattg cctggccagt cagctttccc aggctaaaaa taggcggagg
     1261 tgtccaggac aacataactg aggagagagg tagtggctga gttttcggtg ccactctgag
     1321 agatttgggt gaccagataa ggaggtgatt ctttcagcca gccaacccca tgtagcagga
     1381 aaacagttgc ccatactcac ctgtctctga attggcattg caagctactc atgtctcctc
     1441 cctaaacctc agctataggg aggccttggg ctcagaggct tggttctggg gagcaggatg
     1501 ctctgtagat cttctccaga ctccactatt ctggttctcc gcagcctgag gagaggtgca
     1561 cacaccccca aggacccagg cacccaacct ctgccagatg tgggggggtg gctacccaga
     1621 ggcatgctcc tcacccagct ccactgtccc tacctgctgc tgctgctggt ggtgctgtca
     1681 tgtctggtga gtgccgtgca ccccacagca cctgcatgga ggagggttgg ctgctctgta
     1741 cacaagtgct gagagctctc tggttgcttg cctacctgtt tcccagccaa aggcaccctc
     1801 tgcccaggta atggactttt tgtttgagaa gtggaagctc tatagtgacc agtgccacca
     1861 caacctaagc ctgctgcccc cacctactgg tgagtcccac caaagactcc tgtgtcctga
     1921 caccccgcct ggaggtacac tcagagacct tatggggatg taatagtaat ggctgcttta
     1981 taatgcccag ccacttgccc ccagttacag actgacctcc agaggcagtg gcttccctaa
     2041 ggctgtatgg tcaggaaaca gtagaaatgc agaactgcct cagggctgcc ctcatcccca
     2101 gccagctgat gtctgctgtc accgctcaca ctgggcagac agtgaatagg gacagggcag
     2161 ggcagagaga ctgggtcttc ccagtctcag ttgaggggga tgagtgcctg ggagggaggg
     2221 agaaggatga gggagctatg ctacggctgg gcctggaaaa ggtgccagcc aaactggagt
     2281 ctgacatctg acaaggaatg tattaccagg caggaggggc cagtgagatg gctcagcaga
     2341 taatggccct ttcctctggg actcagtgaa aatggataat gatccttgca agttgtccta
     2401 tgatcctgta tgctggtata cacacttgta tgcttctgag tgcatgcaca tgtgcgcgcg
     2461 cgcgcacgcg cgcacacaca cacacacaca cactaaataa atgaacagat aaatgtaaaa
     2521 agctttttac aaatttttat aaaagataca tagaggaaag acacaaaatg gggtctgtgc
     2581 acatttggga tgggatatct tgggcaaaca ccaatgttgc tgggctggag agggtagtca
     2641 tggagtctag aaggcaagat aaccagccgg ggatgcccct tgtggcccac atcacggagg
     2701 cagcccctat agatgcaggt tcagagaagt ggtgttgata ctcagggggc ttctggtccc
     2761 acccccatcc tccttcacct ttacagagct ggtctgcaac agaactttcg acaagtactc
     2821 ctgctggcct gacacccctc ccaacaccac tgccaacatt tcctgcccct ggtacctacc
     2881 ttggtaccac aaaggtaacg gggtagtgag tgcctgggag gctgagggtg caaagtctgg
     2941 gagtgggctg accagagctt acacccatgt cccagtgcag caccgcctag tgttcaagag
     3001 gtgtgggcct gatgggcagt gggttcgagg gccacggggg cagtcatggc gcgacgcctc
     3061 ccaatgtcag atggatgatg acgagatcga ggtccaggtc agctctgaag ggtgtggggt
     3121 ggtgttgcca tggggttgcg tggggccagg ggatatggta ctgcccagcc ccactccacc
     3181 tctggtttgc agaagggggt agccaagatg tatagcagct accaggtgat gtacactgtg
     3241 ggctacagtc tgtccctggg ggccttgctc ctggcgctgg tcatcctgct gggcctcagg
     3301 tacactgctg ttgctcctag ctaataccca gtgtggtgag gggggcagag ggacggggca
     3361 ggagtggtgc tgagacatcg tcatatagga agctgcactg cacccggaac tacatccacg
     3421 ggaacctgtt cgcgtccttc gtgctcaagg ctggctctgt gctggtcatt gattggctgc
     3481 tcaagacacg ctatagccag aagattggag atgacctcag tgtgagcgtc tggctcagtg
     3541 atggggtgag cccagatcta actgctccct agcctgtgta gggcggcggg cagggtgtcg
     3601 tgggctccac tcatgcctca ccttgctcag gcggtggctg gctgcagagt ggccacagtg
     3661 atcatgcagt acggcatcat agccaactac tgctggttgc tggtggaggg tgtgtacctg
     3721 tacagcctgc tgagcatcac caccttctcg gagaagagct tcttctccct ctatctgtgc
     3781 atcggctggg gtgagtaggc ttgtggggga tagggaagga agctaacagg gccgtgggat
     3841 aacaactgct gcttcccaca ggatctcccc tgctgtttgt catcccctgg gtggtggtca
     3901 agtgtctgtt tgagaatgtc cagtgagtat gagctaatag ggtgggctgg ttgatgctgg
     3961 tctttgtaaa gtgaccctgg ggacaggggc aggaggtgag gctagaagtg tgaatccagt
     4021 gtctggacct ggggttcagt gaagactctg tcctttcctt tccagaggca ctcagtaaat
     4081 cccagcagat agggaggggc aaggaacaga gagctgcatc cccactaagt gagcaactgg
     4141 tccctgcagg tgctggacca gcaatgacaa tatgggattc tggtggatcc tgcgtatccc
     4201 tgtactcctg gccatactgg tgaggaaaca aagcccctgc tgccatcagc aaggaaaggg
     4261 tcactggtct ggctccccca ggtccttcct tcatcagcct tctgaggcta agggaagaat
     4321 acattctacc caacggatgg agggtgggca attccagcac tcactcaaga agctgtaata
     4381 cgctgccagc ccagccgtgt gcccagggct cacctggcag cagccttgct tgtagcaggg
     4441 acatctgagg gccatgtagg acacaggaag ttctcgggct gcccttgatg tttgtcttcc
     4501 tcacacagat caattttttc atctttgtcc gcatcattca tcttcttgtg gccaagctgc
     4561 gtgcccatca gatgcactat gctgattaca agttccggtg ggcaggggcc agggccaggg
     4621 tccagactgg agagggatga ggttgggggt caatgtcttt gtgggggaaa gcccagaaga
     4681 ttttgcagag ttctcaagct ctgcccctgc aggctagcca ggtccacgct gaccctcatt
     4741 cctctgctgg gagtccacga agtggtcttt gcctttgtga ctgatgagca tgcccagggc
     4801 accctgcgct ccaccaagct cttttttgac ctgttcttca gctccttcca ggtgagtctt
     4861 catcatagcc catccctggg acacaagagt gctgtccctg accactctct ttctccaggg
     4921 tctgctggtg gctgttctct actgtttcct caacaaggag gtaggtgaag ctgggaacac
     4981 aatcagagct ggccatgagg atggcttgcc ctggtctgac cataccttca tccccattat
     5041 tacagttgca tgcgtcggac gtcaggtccc ctaggtcctc agggagcagg gttatgagat
     5101 ggtccccgcc tttcctggtg acaagtgggc cctgctgaac ccaatgtaaa acttgccttg
     5161 ctttgggtct gcttagattg aggctaggca cctcagagtg gcccagtcag gatcctaacg
     5221 aggtgggttg gtcagagagg ccctacccct ggctcctcta ggtgcaggca gagctactgc
     5281 ggcgttggag gcgatggcaa gaaggcaaag ctcttcagga ggaaaggatg gccagcagcc
     5341 atggcagcca catggcccca gcagggactt gtcatggtga tccctgtgag aaacttcagc
     5401 ttatgagtgc aggcagcagc agtgggactg gctgtgagcc ctctgcgaag acctcattgg
     5461 ccagtagtct cccaaggctg gctgacagcc ccacctgaat ctccactgga ctccagccaa
     5521 gttggattca gaagggcctc acaagacaac ccagaaaca
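
The sequence above is in GenBank layout, with a position number at the start of each line and bases in blocks of ten. If a tool expects plain sequence, the numbering and spaces can be stripped first. The following is a minimal sketch; `parse_numbered_sequence` and `gc_fraction` are hypothetical helper names, and only the first two lines of the sequence are used here to keep the example short.

```python
def parse_numbered_sequence(text: str) -> str:
    """Return the bare sequence from GenBank-style numbered lines."""
    blocks = []
    for line in text.splitlines():
        tokens = line.split()
        if tokens and tokens[0].isdigit():
            tokens = tokens[1:]  # drop the leading position number
        blocks.extend(tokens)
    seq = "".join(blocks).lower()
    unexpected = set(seq) - set("acgtn")
    if unexpected:
        raise ValueError(f"unexpected characters: {sorted(unexpected)}")
    return seq


def gc_fraction(seq: str) -> float:
    """Fraction of bases that are G or C."""
    return (seq.count("g") + seq.count("c")) / len(seq) if seq else 0.0


example = """\
        1 tttttttttt gagctgggac cgaacccagg gccttcggct tcctaggtaa gcgctctacc
       61 actgagctaa atccccagcc cctctgccct tcctctttga ggcctctttg tctctatgag
"""

seq = parse_numbered_sequence(example)
print(len(seq), "bases; GC fraction", round(gc_fraction(seq), 2))
```

Checking the recovered length against the last position number is a simple guard against lines lost in copy and paste.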

Outstanding Problems

Localization of complex genetic disorders: multigenic, incomplete penetrance, predisposition.

Gene prediction for genomic DNA sequence, especially when one sequence contains several genes.


Homework for Week 8

  1. Find chromosome locations for 5 human genetic disorders.
  2. For each of the disorders you chose in question 1, find out whether you can buy a mouse model of the disease.
  3. For each of the following GenBank accession numbers, run Genscan on the nucleotide sequence and see how many of the exons were correctly predicted:
    D00251
    M24359 (This one has 2 splice variants. Which is predicted?)
    D00476
    L47480
  4. Extra credit: How does Grail do on the same set of accession numbers?
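
One way to score question 3 is to count a predicted exon as correct only when both of its boundaries match an annotated exon. The sketch below takes that approach, which is an assumption about how to grade and not a stated rule of the assignment. The coordinates are hypothetical, and `exon_accuracy` is an illustrative name.

```python
from typing import Iterable, Tuple

Exon = Tuple[int, int]  # (start, end) positions on the genomic sequence


def exon_accuracy(annotated: Iterable[Exon], predicted: Iterable[Exon]):
    """Count exact-match exons and report two simple accuracy fractions."""
    annotated_set = set(annotated)
    predicted_set = set(predicted)
    correct = annotated_set & predicted_set
    # Fraction of annotated exons that were found.
    sensitivity = len(correct) / len(annotated_set) if annotated_set else 0.0
    # Fraction of predicted exons that are right.
    specificity = len(correct) / len(predicted_set) if predicted_set else 0.0
    return len(correct), sensitivity, specificity


# Hypothetical coordinates, for illustration only.
annotated = [(100, 250), (400, 520), (800, 950), (1200, 1300)]
predicted = [(100, 250), (410, 520), (800, 950)]

n_correct, sn, sp = exon_accuracy(annotated, predicted)
print(f"{n_correct} exons exactly right; "
      f"sensitivity {sn:.2f}, specificity {sp:.2f}")
```

Exact matching is strict: the second predicted exon here overlaps an annotated exon but has the wrong start, so it counts as a miss. A looser score could credit overlapping exons separately.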

Copyright 1999 Joel S. Bader jsbader@curagen.com