MB620 Bioinformatics
University of New Haven
Instructor: Joel S. Bader
Class 6: Structure Databases. Sequence Assembly.


Agenda


Structure Databases

Last week: DNA and protein databases.
Review of protein structure levels and the experimental evidence for each:
Primary - sequencing
Secondary - NMR
Tertiary - NMR, X-ray crystallography (Xtal)
Quaternary - NMR, X-ray crystallography (Xtal)
Later in the semester: predicting structure from sequence

NMR vs. Xtal issues: resolution, biological conformation, sample preparation

How much detail do we retain?
residues
C-alpha trace
all atoms (most typical for structural databases)

How do we represent structural information?
labeled graph: atom types + positions + connections
minimal information: atom types + positions + rules for making connections
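
As a rough illustration of the "minimal information" representation, the sketch below stores only atom types and positions and recovers the connections with a distance rule: two atoms are treated as bonded when they are closer than the sum of their covalent radii plus a tolerance. The radii table, the tolerance, the function name infer_bonds, and the three-atom toy fragment are all assumptions made for this example and are not taken from any particular database or viewer.

```python
import math
from itertools import combinations

# Approximate single-bond covalent radii in angstroms (illustrative values).
COVALENT_RADIUS = {"H": 0.31, "C": 0.76, "N": 0.71, "O": 0.66, "S": 1.05}


def infer_bonds(atoms, tolerance=0.4):
    """Return index pairs (i, j) of atoms judged to be bonded.

    atoms is a list of (element, (x, y, z)) tuples. The rule used here
    (distance <= sum of covalent radii + tolerance) is an assumption of
    this sketch, not a standard taken from the course material.
    """
    bonds = []
    for (i, (el_i, pos_i)), (j, (el_j, pos_j)) in combinations(enumerate(atoms), 2):
        cutoff = COVALENT_RADIUS[el_i] + COVALENT_RADIUS[el_j] + tolerance
        if math.dist(pos_i, pos_j) <= cutoff:
            bonds.append((i, j))
    return bonds


# Toy fragment with made-up coordinates: a carbon bonded to an oxygen and a nitrogen.
atoms = [
    ("C", (0.00, 0.00, 0.00)),
    ("O", (1.23, 0.00, 0.00)),
    ("N", (-0.60, 1.15, 0.00)),
]
bonds = infer_bonds(atoms)
print(bonds)  # prints [(0, 1), (0, 2)]
```

The labeled-graph representation would store the bonds list explicitly; the minimal representation stores only atoms and recomputes the bonds when needed.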

What does experimental data provide?
What about fluctuating structures?

PDB: the Protein Data Bank, www.rcsb.org, now at Rutgers

PDB file information
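
PDB coordinate files are fixed-column text, so the atom records can be read by slicing each line. The sketch below pulls the atom name, residue, chain, and coordinates out of ATOM records and then keeps only the C-alpha atoms, which gives the C-alpha trace mentioned earlier. The sample lines are toy records with made-up coordinates, and the names Atom and parse_atom_records are hypothetical helpers for this example only.

```python
from typing import NamedTuple


class Atom(NamedTuple):
    name: str
    res_name: str
    chain: str
    res_seq: int
    x: float
    y: float
    z: float


def parse_atom_records(lines):
    """Parse ATOM records using the fixed PDB column layout.

    HETATM records (waters, ligands, ions) are skipped, so a calcium ion
    named CA cannot be confused with a C-alpha atom here.
    """
    atoms = []
    for line in lines:
        if not line.startswith("ATOM"):
            continue
        atoms.append(Atom(
            name=line[12:16].strip(),      # columns 13-16: atom name
            res_name=line[17:20].strip(),  # columns 18-20: residue name
            chain=line[21],                # column 22: chain identifier
            res_seq=int(line[22:26]),      # columns 23-26: residue number
            x=float(line[30:38]),          # columns 31-38: x in angstroms
            y=float(line[38:46]),          # columns 39-46: y
            z=float(line[46:54]),          # columns 47-54: z
        ))
    return atoms


# Toy records (coordinates are invented for illustration).
sample = [
    "ATOM      1  N   MET A   1       0.000   0.000   0.000  1.00 20.00           N",
    "ATOM      2  CA  MET A   1       1.458   0.000   0.000  1.00 20.00           C",
    "ATOM      3  N   GLY A   2       3.900   0.500   0.000  1.00 20.00           N",
    "ATOM      4  CA  GLY A   2       5.258   0.000   0.000  1.00 20.00           C",
    "HETATM    5  O   HOH A 101       9.000   9.000   9.000  1.00 30.00           O",
]

all_atoms = parse_atom_records(sample)
ca_trace = [a for a in all_atoms if a.name == "CA"]
for a in ca_trace:
    print(a.res_name, a.chain, a.res_seq, a.x, a.y, a.z)
```

Keeping every parsed atom corresponds to the all-atom level of detail; filtering on the atom name reduces it to one point per residue.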

MMDB: the Molecular Modeling Database at NCBI, http://www.ncbi.nlm.nih.gov/Structure/

Sequence neighbors vs. structure neighbors. Divergent vs. convergent evolution.

Viewing structures and structure neighbors


Sequence Assembly

Assembly algorithm

Example: building an assembly from ESTs

Take this sequence into CuraTools:

       1  cgagttcgtc aacgccgctt tcaacgtgac tgtggtggcc acaacacgtg tgggactccg
      61  cccgaggaat actgtgtgca gaccggggtg accgggtcac aagtcctgtc acctgtgcga
     121  cgccgggcag ccccacctgc agcacagggc agccttcctg accgactaca acaaccaggc
     181  cgacaccacc tggtggcaaa gcagagccat gct
BLAST against dbEST to build a cluster
Import new sequences
Assemble with CAP2. Does the strand orientation matter?
Get the new contig and repeat
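
To see why strand orientation matters in the assembly step, the sketch below extends a contig with a read by looking for an exact suffix/prefix overlap, trying the read both as given and as its reverse complement. This is a deliberately simplified stand-in and is not the algorithm CAP2 uses: real assemblers score overlaps by alignment and tolerate mismatches and gaps. The function names and the 20-base minimum overlap are assumptions for this example; the test data are the first 60 bases of the seed sequence above, split into two overlapping pieces.

```python
COMPLEMENT = str.maketrans("acgtn", "tgcan")


def reverse_complement(seq):
    """Return the reverse complement of a lowercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def suffix_prefix_overlap(left, right, min_len=20):
    """Length of the longest exact match between the end of left and the
    start of right, or 0 if it is shorter than min_len."""
    for length in range(min(len(left), len(right)), min_len - 1, -1):
        if left.endswith(right[:length]):
            return length
    return 0


def extend_contig(contig, read, min_len=20):
    """Merge read onto either end of contig, in either orientation.

    Returns the contig unchanged if no overlap of at least min_len is found.
    """
    for candidate in (read, reverse_complement(read)):
        n = suffix_prefix_overlap(contig, candidate, min_len)
        if n:
            return contig + candidate[n:]
        n = suffix_prefix_overlap(candidate, contig, min_len)
        if n:
            return candidate + contig[n:]
    return contig


# First 60 bases of the seed sequence from the example above.
seed = "cgagttcgtcaacgccgctttcaacgtgactgtggtggccacaacacgtgtgggactccg"
contig = seed[:40]
# A read covering bases 21-60, but sequenced from the opposite strand.
read = reverse_complement(seed[20:60])

merged = extend_contig(contig, read)
print(merged == seed)  # prints True
```

Without the reverse-complement attempt the read would share no overlap with the contig, which is why an assembler has to consider both orientations of every EST.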

Automated iteration: the EST Extractor at http://hercules.tigem.it/BLASTEXTRACT/estextract.html


Homework for Week 7

  1. Hemoglobin and myoglobin are thought to have evolved from a common ancestor. Show an alignment of hemoglobin and myoglobin structures from the same species.
  2. X-ray crystal structures have been used to help develop better HIV protease inhibitors. Show an alignment of HIV protease with a peptide inhibitor and a non-peptide inhibitor.
  3. Here is a fragment of DNA sequence:
            1 tcaggagcca gccccaccct tagaaaagat gttttccatg aggatcgtct gcctggtcct
           61 aagtgtggtg ggcacagcat ggactgcaga tagtggtgaa ggtgactttc tagctgangg
          121 aggaggcgtg cgtggcccaa gggttgtgga aagacatcaa tctgcctgca aagattcaga
          181 ctggcccttc tgctctgatg aagactggaa ctacaaatgc ccttctggct gcaggatgaa
          241 aagggttgat tgatgaagtc aatcaagatt ttacaaacag aataaataag ctcaaaaatt
          301 cactatttga atatcagaag ancaataagg attctcattc gttgaccact aatataatgg
          361 gaaattttga gaggcgattt ttcctcagcc aattaaccgt ggataatacc tacaaccgag
          421 tgtccagagg atctgaggan gcaggaattt gaagtcctga agcgcaaagt cataggaaaa
          481 gtncagcata tccagcttct ncagaaantg ttaggagctc ngtttggt
    
    See if you can assemble it with other ESTs to build a contig. How long is the contig?
  4. Here is another DNA fragment:
            1 aaatttgtca tggatggagg gtatctggat caacccgaca actgtccaga gagagtcact
           61 gacctcatgc gcatgtgctg gcaattcaac cccaagatga ggccaacctt cctggagatt
          121 gtcaacctgc tcaaggacga cctgcacccc agctttccag aggtgtcgtt cttccacagc
          181 gaggagaaca aggctcccga gagtgaggag ctggagatgg agtttgagga catggagaat
          241 gtgcccctgg accgttcctc gcactgtcag agggaggagg cggggggccg ggatggaggg
          301 tcctcgctgg gtttcaagcg gctacgagga acacatccct tacacacaca tgaacggagg
          361 caagaaaaac gggcggattc tgaccttgcc tcggtccaat ccttcctaac agtgcctacc
          421 gtggcggggg cgggcagggg ttccattttc gctttcctct ggtttgaaag cctctggaaa
          481 actcaggatt ctcacgactc taccatgtcc aatggagttc agagatcgtt cctatacatt
          541 tctgttcatc ttaaggtgga ctcgtttggt taccaattta a
    
    See if you can build an EST assembly.
  5. Take the contigs you built and now search against GenBank to find out what gene the ESTs are from. Which contig has more component sequences? Why do you think that is?
  6. BLAST the human genes against mouse ESTs. How does coverage of the two genes compare?
  7. Bonus question.

Copyright 1999 Joel S. Bader jsbader@curagen.com