Genetics | |
---|---|
Traits/Genes to Location | Genetic and physical maps; Research Genetics mapping panel; Stanford mapping server |
Traits/Genes to Experimental Organisms | Jackson Laboratories |
Trait/Gene Location Database | OMIM, On-line Mendelian Inheritance in Man |
Genomic DNA Analysis | |
Sequences to Contigs | CuraTools CAP, PHRAP |
Contigs to mRNA | Genscan, Grail, tblastn (protein query, genomic database) |
mRNA Analysis | |
DNA to Homolog | blastn, blastx, FASTA |
DNA to Protein | ORF finders, e.g. NCBI ORF Finder |
Protein Analysis | |
protein homologs | blastp |
conserved residues | multiple sequence alignment, ClustalW |
evolutionary history | PHYLIP, PAUP |
blastp for linguistics | AltaVista Babelfish |
Prokaryotic homologs | Clusters of Orthologous Groups (COGs) |
Domains | Pfam, Prosite, ProDom |
Cellular localization | Psort |
Secondary structure prediction | Consensus prediction |
Tertiary structure prediction | Swiss-Model |
Known folds | SCOP, CATH, DALI |
Structure similarity search | VAST |
Bioinformatics Resources | |
GenBank, OMIM, Blast, Entrez | NCBI |
Swiss-Prot, TrEMBL, Prosite | ExPASy |
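The "DNA to Protein" row relies on ORF finders. The core idea is simple enough to sketch: in each of the six reading frames (three per strand), look for an ATG start codon followed in frame by a stop codon, and keep stretches longer than some minimum. This is a minimal illustration, not NCBI ORF Finder's actual implementation; the function names and the default minimum length are assumptions chosen for the example.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}

def reverse_complement(dna):
    """Reverse complement of an uppercase A/C/G/T string."""
    return dna.translate(str.maketrans("ACGT", "TGCA"))[::-1]

def find_orfs(dna, min_codons=100):
    """Return (strand, frame, start, end) for each ATG...stop ORF.

    Coordinates are 0-based and end-exclusive, on the strand that was scanned
    ('-' means positions on the reverse complement). min_codons counts coding
    codons from the ATG up to, but not including, the stop codon.
    """
    dna = dna.upper()
    orfs = []
    for strand, seq in (("+", dna), ("-", reverse_complement(dna))):
        for frame in range(3):
            start = None
            for i in range(frame, len(seq) - 2, 3):
                codon = seq[i:i + 3]
                if start is None:
                    if codon == "ATG":          # open a candidate ORF
                        start = i
                elif codon in STOP_CODONS:      # close it at the first in-frame stop
                    if (i - start) // 3 >= min_codons:
                        orfs.append((strand, frame, start, i + 3))
                    start = None
    return orfs
```

For a toy sequence such as `"CC" + "ATG" + "GCT" * 10 + "TAA" + "GG"`, calling `find_orfs(dna, min_codons=5)` reports a single forward-strand ORF in frame 2. Real tools also handle alternative start codons and ORFs that run off the end of the sequence, which this sketch ignores.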
Where were we?
Evolution: conserved domains
Evidence: sequence similarity in multiple alignments
Use of profiles:
Example of a profile: zinc finger protein from Pfam

ADR1_YEAST/104-126  FVCE...VCT...RAFARQEHLKRHYRS...H
ADR1_YEAST/132-155  YPCG...LCN...RCFTRRDLLIRHAQK..IH
AGIE_RAT/269-291    YICE...ECG...IRCKKPSMLKKHIRT...H
AGIE_RAT/297-321    YVCK...LCN...FAFKTKGNLTKHMKSK.AH
consensus           YVCE...LCN...RAFKRK..LKKH.RS...H
second choice       .................KR....R..K.....

Profile, pattern, motif = sequence with substitution frequencies
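To make "substitution frequencies" concrete, the sketch below counts how often each symbol appears in each column of the four aligned zinc-finger segments and reads off a consensus, calling a residue only when at least half of the sequences share it. The 50% threshold and the function names are assumptions for illustration. With only four sequences the result will not exactly match the Pfam consensus line, which was presumably derived from a larger alignment.

```python
from collections import Counter

# The four aligned Pfam zinc-finger segments from the example; '.' is a gap.
SEGMENTS = [
    "FVCE...VCT...RAFARQEHLKRHYRS...H",  # ADR1_YEAST/104-126
    "YPCG...LCN...RCFTRRDLLIRHAQK..IH",  # ADR1_YEAST/132-155
    "YICE...ECG...IRCKKPSMLKKHIRT...H",  # AGIE_RAT/269-291
    "YVCK...LCN...FAFKTKGNLTKHMKSK.AH",  # AGIE_RAT/297-321
]

def column_frequencies(seqs):
    """For each alignment column, map each symbol (gaps included) to its frequency."""
    n = len(seqs)
    return [{sym: c / n for sym, c in Counter(col).items()} for col in zip(*seqs)]

def consensus(freqs, threshold=0.5):
    """Most frequent residue per column, or '.' if it is a gap or below threshold."""
    out = []
    for col in freqs:
        sym, f = max(col.items(), key=lambda item: item[1])
        out.append(sym if sym != "." and f >= threshold else ".")
    return "".join(out)
```

`consensus(column_frequencies(SEGMENTS))` recovers the strongly conserved anchors (the paired cysteines near the start and the histidines near the end); the per-column frequency dictionaries are the "profile" itself.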
Algorithms for using profiles to search
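One common way to search with a profile is to turn column frequencies into a position-specific scoring matrix (PSSM) of log-odds scores and slide it along a sequence, adding one score per position. The sketch below assumes a uniform background of 1/20 per amino acid, a pseudocount of 1, and ungapped motif sites; the function names and the three toy sites are hypothetical. PSI-BLAST and the HMM-based searches behind Pfam use more careful sequence weighting, background frequencies, and gap handling.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
BACKGROUND = 1 / len(AMINO_ACIDS)  # assumed uniform background frequency

def build_pssm(sites, pseudocount=1.0):
    """Log2-odds scores, one dict per column, from equal-length ungapped motif sites."""
    pssm = []
    for column in zip(*sites):
        counts = Counter(aa for aa in column if aa in AMINO_ACIDS)
        total = sum(counts.values()) + pseudocount * len(AMINO_ACIDS)
        pssm.append({
            aa: math.log2(((counts[aa] + pseudocount) / total) / BACKGROUND)
            for aa in AMINO_ACIDS
        })
    return pssm

def best_hit(sequence, pssm):
    """Slide the PSSM along the sequence; return (best score, 1-based start)."""
    sequence = sequence.upper()
    width = len(pssm)
    best_score, best_start = float("-inf"), 0
    for i in range(len(sequence) - width + 1):
        window = sequence[i:i + width]
        # Letters outside the 20 standard residues contribute 0.
        score = sum(col.get(aa, 0.0) for col, aa in zip(pssm, window))
        if score > best_score:
            best_score, best_start = score, i + 1
    return best_score, best_start
```

For example, `best_hit("GGGCAFCGGG", build_pssm(["CAFC", "CSFC", "CAWC"]))` places the best window at position 4, where the sequence matches the toy motif. Positive scores mean "more like the motif than like background".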
How did we identify conserved regions?
Position-Specific Iterated Blast
MKVDLHVHSIVSKCSLNPKGLLEKFCIKKNIVPAICDHNKLTKL NFAIPGEEIATNSGEFIGLFLTEEIPANLDLYEALDRVREQGALIYLPHPFDLNRRRS LAKFNVLEEREFLKYVHVVEVFNSRCRSIEPNLKALEYAEKYDFAMAFGSDAHFIWEV GNAYIKFSELNIEKPDDLSPKEFLNLLKIKTDELLKAKSNLLKNPWKTRWHYGKLGSK YNIALYSKVVKNVRRKLNI
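The query above can be pasted into the PSI-BLAST web form, or run locally with the NCBI BLAST+ program `psiblast`, which repeats the search several times and rebuilds its position-specific matrix from the hits of each round. A minimal sketch, assuming BLAST+ is installed with a protein database named `swissprot` (an assumption; substitute whatever database you have): it writes the query as FASTA, dropping the stray spaces from line wrapping, and builds the command line. The helper names, file names, and the E-value cutoff are hypothetical choices.

```python
def to_fasta(name, sequence, width=60):
    """Format a sequence as FASTA, dropping the spaces left over from line wrapping."""
    seq = "".join(sequence.split()).upper()
    lines = [seq[i:i + width] for i in range(0, len(seq), width)]
    return ">" + name + "\n" + "\n".join(lines) + "\n"

def psiblast_command(query_path, db="swissprot", iterations=3,
                     inclusion_evalue=0.005, pssm_path="query.pssm"):
    """Argument list for a BLAST+ psiblast run that also saves the final profile."""
    return [
        "psiblast",
        "-query", query_path,
        "-db", db,
        "-num_iterations", str(iterations),           # rounds of search + profile rebuild
        "-inclusion_ethresh", str(inclusion_evalue),  # E-value cutoff for hits added to the profile
        "-out_ascii_pssm", pssm_path,                 # human-readable position-specific matrix
    ]
```

After writing `to_fasta("query", seq)` to a file such as `query.fa`, the list can be passed to `subprocess.run(cmd, check=True)`. Watching which new hits appear in later iterations shows how the profile picks up more distant relatives.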
Prosite = motif database hosted at ExPASy and closely linked to Swiss-Prot; it has pre-defined motifs.
Two main functions:
Scan for pre-defined consensus patterns. Try the following:
MGKISSLPTQLFKCCFCDFLKVKMHTMSSSHLFYLALCLLTFTSSATAGP
ETLCGAELVDALQFVCGDRGFYFNKPTGYGSSSRRAPQTGIVDECCFRSC
DLRRLEMYCAPLKPAKSARSVRAQRHTDMPKTQKEVHLKNASRGSAGNKN
YRM
Among other features, find the following:
[5] PDOC00235 PS00262 INSULIN Insulin family signature 95-109 CCFRSCDLRRLEMYC
More accession numbers!
Consensus pattern:
C-C-{P}-x(2)-C-[STDNEKPI]-x(3)-[LIVMFS]-x(3)-C
Second function: using the pattern, scan Swiss-Prot to find homologs.
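The consensus pattern uses PROSITE syntax: `x` is any residue, `[...]` lists allowed residues, `{...}` lists forbidden ones, a count such as `x(3)` repeats an element, and `-` separates positions. A minimal sketch that translates this syntax into a Python regular expression and scans a sequence with it; the function names are hypothetical, and only single letters, `[...]`, `{...}`, repeat counts like `x(2)` or `x(2,4)`, and the `<`/`>` terminal anchors are handled.

```python
import re

_ELEMENT = re.compile(r"([A-Zx]|\[[A-Z]+\]|\{[A-Z]+\})(?:\((\d+)(?:,(\d+))?\))?")

def prosite_to_regex(pattern):
    """Convert a simple PROSITE pattern into an equivalent regular expression."""
    pattern = pattern.strip().rstrip(".")
    prefix = suffix = ""
    if pattern.startswith("<"):      # anchored at the N-terminus
        prefix, pattern = "^", pattern[1:]
    if pattern.endswith(">"):        # anchored at the C-terminus
        suffix, pattern = "$", pattern[:-1]
    parts = []
    for element in pattern.split("-"):
        m = _ELEMENT.fullmatch(element)
        if m is None:
            raise ValueError(f"unsupported PROSITE element: {element!r}")
        residue, lo, hi = m.groups()
        if residue == "x":
            core = "."
        elif residue.startswith("{"):
            core = "[^" + residue[1:-1] + "]"   # forbidden residues
        else:
            core = residue                      # single letter or [...] class
        if lo is not None:
            core += "{" + lo + ("," + hi if hi else "") + "}"
        parts.append(core)
    return prefix + "".join(parts) + suffix

def scan(sequence, pattern):
    """Return (start, end, match) for each non-overlapping hit, 1-based and inclusive."""
    regex = re.compile(prosite_to_regex(pattern))
    seq = "".join(sequence.split()).upper()
    return [(m.start() + 1, m.end(), m.group()) for m in regex.finditer(seq)]
```

Scanning the sequence above with the insulin pattern reports one hit, residues 95-109 (`CCFRSCDLRRLEMYC`), the same region the Prosite scan found. Note that `finditer` returns non-overlapping matches only.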
They used Psi-Blast to build a motif database.
General idea: protein sequence is conserved by evolution.
COGs = pre-analyzed ortholog families from completed genomes
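COG construction rests on best hits between the proteins of different complete genomes, grouping proteins that are consistently each other's best matches across several genomes. The sketch below shows only the simplest ingredient, reciprocal best hits between two genomes, on made-up similarity scores; the gene names, scores, and function names are hypothetical. The real COG procedure works across many genomes and allows several paralogs from one genome in the same group, which a strict one-to-one pairing cannot capture.

```python
def best_hits(scores):
    """scores[query][subject] = similarity (higher is better) -> best subject per query."""
    return {q: max(hits, key=hits.get) for q, hits in scores.items() if hits}

def reciprocal_best_hits(a_vs_b, b_vs_a):
    """Pairs (a, b) where b is a's best hit in genome B and a is b's best hit in genome A."""
    best_ab = best_hits(a_vs_b)
    best_ba = best_hits(b_vs_a)
    return sorted((a, b) for a, b in best_ab.items() if best_ba.get(b) == a)

# Hypothetical all-against-all scores between two small genomes.
A_VS_B = {"a1": {"b1": 250, "b2": 40}, "a2": {"b1": 90, "b3": 60}}
B_VS_A = {"b1": {"a1": 245, "a2": 88}, "b3": {"a2": 58}}
```

Here only `("a1", "b1")` is reciprocal: `a2`'s best hit is also `b1`, but `b1` prefers `a1`, so `a2` is left unpaired even though it resembles `b1`.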
Tour
Notice that yeast often has pairs of homologs. Why might this be?
Here's a human sequence. Where does it fit in?
mtevgllsin lsinsthaal lpirydnrcr nmsqeqvaqk lakdpkpair frleqvvpaf
qdlvygwnrh evasvegdpv imksdgfpty hlacvvddhh mgishvlrgs ewlvstakhl
llyqalgwqp phfahlplll nrdgsklskr qgdvflehfa adgflpdsll diitncgsgf
aenqmgrtlp elitqfnltq vtchsalldl eklpefnrlh lqrlvsnesq rrqlvgklqv
lveeafgcql qnrdvlnpvy verilllrqg hicrlqdlvs pvysylwtrp avgraqldai
sekvdviakr vlg
Now try our old friend insulin. What happens and why?
malwmrllpl lallalwgpd paaafvnqhl cgshlvealy lvcgergffy tpktrreaed
lqvgqvelgg gpgagslqpl alegslqkrg iveqcctsic slyqlenycn
Predicting the function of a new protein
Where do proteins end up? What are the shipping instructions?
Location | Evidence |
---|---|
nucleus | nuclear localization signal |
cytoplasm | none (default when no targeting signal is found) |
plasma membrane | hydrophobic, membrane-spanning regions, glycosylation sites |
secreted | signal peptide |
Expert system
Psort: expert system for guessing cellular localization
Examples:
A secreted protein:
1 malspflaav iplvlllsra ppsadtrttg hlcgkdlvna lyiacgvrgf fydptkmkrd
61 tgalaaflpl ayaednesqd desiginevl kskrgiveqc chkrcsiydl enycn
A transmembrane receptor:
1 mavaplrgal llwqllaagg aaleigrfdp ergrgaapcq aveipmcrgi gynltrmpnl
61 lghtsqgeaa aelaefaplv qygchshlrf flcslyapmc tdqvstpipa crpmceqarl
121 rcapimeqfn fgwpdsldca rlptrndpha lcmeapenat agpaephkgl gmlpvaprpa
181 rppgdlgpga ggsgtcenpe kfqyveksrs caprcgpgve vfwsrrdkdf alvwmavwsa
241 lcffstaftv ltfllephrf qyperpiifl smcynvysla fliravagaq svacdqeaga
301 lyviqeglen tgctlvflll yyfgmasslw wvvltltwfl aagkkwghea ieahgsyfhm
361 aawglpalkt iviltlrkva gdeltglcyv astdaaaltg fvlvplsgyl vlgssflltg
421 fvalfhirki mktggtntek leklmvkigv fsilytvpat cvivcyvyer lnmdfwrlra
481 teqpcaaaag pggrrdcslp ggsvptvavf mlkifmslvv gitsgvwvws sktfqtwqsl
541 cyrkiaagra rakacrapgs ygrgthchyk aptvvlhmtk tdpslenpth l
A nuclear receptor:
1 masredelrn cvvcgdqatg yhfnaltceg ckgffrrtvs ksigptcpfa gscevsktqr
61 rhcpacrlqk cldagmrkdm ilsaealalr rakqaqrraq qtpvqlskeq eelirtllga
121 htrhmgtmfe qfvqfrppah lfihhqplpt lapvlplvth fadintfmvl qvikftkdlp
181 vfrslpiedq isllkgaave ichivlnttf clqtqnflcg plrytiedga rvgfqvefle
241 llfhfhgtlr klqlqepeyv llaamalfsp drpgvtqrde idqlqeemal tlqsyikgqq
301 rrprdrflya kllgllaelr sineaygyqi qhiqglsamm pllqeics
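Psort combines many weighted rules; the sketch below is a deliberately tiny rule set in the same expert-system spirit, turning the evidence column of the localization table into three ordered checks. Everything here is an assumption for illustration, not Psort's rules: the Kyte-Doolittle hydropathy scale, a 19-residue window with mean hydropathy of at least 1.6 for a membrane span (a commonly quoted rule of thumb), a 10-residue window averaging at least 2.0 within the first 30 residues for a signal peptide, and the simplified nuclear localization motif K-(K/R)-x-(K/R). Glycosylation sites are not modeled.

```python
import re

# Kyte-Doolittle hydropathy values; positive means hydrophobic.
KD = {"A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5,
      "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9,
      "M": 1.9, "F": 2.8, "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9,
      "Y": -1.3, "V": 4.2}

# Simplified monopartite nuclear localization motif K-(K/R)-x-(K/R).
NLS_MOTIF = re.compile(r"K[KR].[KR]")

def clean(seq):
    """Keep letters only, dropping position numbers and spaces of GenBank-style listings."""
    return "".join(ch for ch in seq.upper() if ch.isalpha())

def max_window_hydropathy(seq, window):
    """Highest mean hydropathy over all windows of `window` residues (-inf if too short)."""
    if len(seq) < window:
        return float("-inf")
    values = [KD.get(aa, 0.0) for aa in seq]
    return max(sum(values[i:i + window]) / window
               for i in range(len(values) - window + 1))

def guess_location(raw_seq):
    """Toy expert system: the first rule that fires decides the answer."""
    seq = clean(raw_seq)
    # Rule 1: a long hydrophobic stretch past the N-terminal region -> membrane span.
    if max_window_hydropathy(seq[30:], 19) >= 1.6:
        return "plasma membrane"
    # Rule 2: a hydrophobic core within the first 30 residues -> signal peptide.
    if max_window_hydropathy(seq[:30], 10) >= 2.0:
        return "secreted"
    # Rule 3: a basic NLS-like motif -> nuclear import.
    if NLS_MOTIF.search(seq):
        return "nucleus"
    # Default: no targeting evidence found.
    return "cytoplasm"
```

`guess_location` accepts the example sequences exactly as pasted, numbers and all. Rule order matters: a membrane protein usually also carries a signal peptide, so the membrane check runs first. Treat any answer as a toy guess to compare against Psort's output.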
Start with secondary structure prediction: alpha helix, beta sheet, loop
Accuracy: 70-80%
Algorithm
Accuracy depends on sequence similarity
Purposes
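Two of the ideas above are easy to make precise. A consensus prediction takes a per-residue majority vote over the helix (H), strand (E), and coil (C) strings produced by several methods, and the accuracy figure is usually the Q3 score: the fraction of residues assigned the correct one of the three states. The sketch below assumes all predictions are already aligned to the same residues; the function names and the toy strings in the usage note are hypothetical.

```python
from collections import Counter

def consensus_prediction(predictions):
    """Per-residue majority vote over H/E/C strings from several methods.

    Ties go to the state seen first, i.e. to the method listed earliest.
    """
    if len({len(p) for p in predictions}) != 1:
        raise ValueError("all predictions must cover the same residues")
    return "".join(Counter(states).most_common(1)[0][0]
                   for states in zip(*predictions))

def q3(predicted, observed):
    """Fraction of residues whose H/E/C state is predicted correctly."""
    if len(predicted) != len(observed) or not observed:
        raise ValueError("need two non-empty strings of equal length")
    return sum(p == o for p, o in zip(predicted, observed)) / len(observed)
```

With three toy predictions `"HHHHCCEEEC"`, `"HHHCCCEEEE"`, and `"HHHHCCEECC"`, the vote gives `"HHHHCCEEEC"`; compared against an observed `"HHHCCCEEEE"` it gets 8 of 10 residues right, a Q3 of 0.8.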
The one-letter codes for the amino acids do have rhyme and reason!
From Koni Stone; please send comments to koni@chem.csustan.edu
First, some reasons:
The following amino acids all use their first initial as their code.
These amino acids are either very common or have a unique first letter. To remember the list, use the mnemonic:
Give A Violin Lesson to Isabelle So That Cindy can Matriculate and wear a Professional Hat.
Code | AA | Residue | Special properties |
---|---|---|---|
A | Ala | alanine | methyl functional group |
C | Cys | cyst(e)ine | disulfide bonds for protein stability |
D | Asp | aspartate | hydrophilic, COO(-) |
E | Glu | glutamate | hydrophilic, COO(-) |
F | Phe | phenylalanine | hydrophobic |
G | Gly | glycine | no functional group |
H | His | histidine | pKa of 6, used for acid-base chemistry |
I | Ile | isoleucine | hydrophobic |
K | Lys | lysine | hydrophilic, NH3(+) |
L | Leu | leucine | hydrophobic |
M | Met | methionine | start codon |
N | Asn | asparagine | amide form of aspartate; found in asparagus |
P | Pro | proline | causes turns in protein structure |
Q | Gln | glutamine | amide form of glutamate |
R | Arg | arginine | hydrophilic, like NH3(+) |
S | Ser | serine | -OH group used in serine proteases |
T | Thr | threonine | -OH group |
V | Val | valine | hydrophobic |
W | Trp | tryptophan | strong UV absorbance, used to probe protein structure |
Y | Tyr | tyrosine | strong UV absorbance; phosphorylated by tyrosine kinases |
X | | any | |
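The table translates directly into a lookup dictionary. The sketch below, with hypothetical function names, converts a one-letter sequence into three-letter codes (using `Xaa` for any letter not in the table) and checks the mnemonic: the residues whose code is simply the first letter of their name are exactly G, A, V, L, I, S, T, C, M, P, H.

```python
# One-letter code -> (three-letter code, name), transcribed from the table above.
AA_CODES = {
    "A": ("Ala", "alanine"), "C": ("Cys", "cysteine"), "D": ("Asp", "aspartate"),
    "E": ("Glu", "glutamate"), "F": ("Phe", "phenylalanine"), "G": ("Gly", "glycine"),
    "H": ("His", "histidine"), "I": ("Ile", "isoleucine"), "K": ("Lys", "lysine"),
    "L": ("Leu", "leucine"), "M": ("Met", "methionine"), "N": ("Asn", "asparagine"),
    "P": ("Pro", "proline"), "Q": ("Gln", "glutamine"), "R": ("Arg", "arginine"),
    "S": ("Ser", "serine"), "T": ("Thr", "threonine"), "V": ("Val", "valine"),
    "W": ("Trp", "tryptophan"), "Y": ("Tyr", "tyrosine"),
}

def to_three_letter(seq, sep="-"):
    """Translate one-letter codes to three-letter codes; unknown letters become 'Xaa'."""
    return sep.join(AA_CODES.get(ch, ("Xaa", "any"))[0]
                    for ch in seq.upper() if ch.isalpha())

def first_initial_codes():
    """Residues whose one-letter code is just the first letter of their name."""
    return sorted(code for code, (_, name) in AA_CODES.items()
                  if name[0].upper() == code)
```

For example, `to_three_letter("MAL")` gives `"Met-Ala-Leu"`. The remaining nine codes (D, E, F, K, N, Q, R, W, Y) are the ones that need the other explanations.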
MEPETLEARINRATNPLNKELDWASINGFCEQLNEDFEGPPLAT RLLAHKIQSPQEWEAIQALTVLETCMKSCGKRFHDEVGKFRFLNELIKVVSPKYLGSR TSEKVKNKILELLYSWTVGLPEEVKIAEAYQMLKKQGIVKSDPKLPDDTTFPLPPPRP KNVIFEDEEKSKMLARLLKSSHPEDLRAANKLIKEMVQEDQKRMEKISKRVNAIEEVN NNVKLLTEMVMSHSQGGAAAGSSEDLMKELYQRCERMRPTLFRLASDTEDNDEALAEI LQANDNLTQVINLYKQLVRGEEVNGDATAGSIP

AFVTGLEEQLGKFGDKCIARGWDHQGDPLHKIQQDVAEHHKQIG NVLQIVESCSQLQGFQSEEVSPAEPASPGTPQQVKDKTLQESSFEDIMATRSSDWLRR PLGEDNQPETQLFWDKEPWFWHDTLTEQLWRIFAGMRILAHGELVLATAISSFTRHVF TCGRRGIKVWSLTGQVAEDRFPESHLPIQTPGAFLRTCLLSSNSRSLLTGGYNLASVS VWDLAAPSLHVKEQLPCAGLNCQALDANLDANLAFASFTSGVVRIWDLRDQSVVRDLK GYPDGVKSIVVKGYNIWTGGPDACLRCWDQRTIMKPLEYQFKSQIMSLSHSPQEDWVL LGMANGQQWLQSTSGSQRHMVGQKDSVILSVKFSPFGQWWASVGMDDFLGVYSMPAGT KVFEVPEMSPVTCCDVSSNNRLVVTGSGEHASVYQITY

MDPNSILLSPQPQICSHLAEACTEGERSSSPPELDRDSPFPWSQ VPSSSPTDPEWFGDEHIQAKRARVETIVRGMCLSPNPLVPGNAQAGVSPRCPKKARER KRKQNLPTPQGLLMPAPAWDQGNRKGGPRVREQLHLLKQQLRHLQEHILQAAKPRDTA QGPGGCGTGKGPLSAKQGNGCGPRPWVVDGDHQQGTSKDLSGAEKHQESEKPSFLPSG APASLEILRKELTRAVSQAVDSVLQKFNRCITSQMIKWFSNFREFYYIQMEKSARQAI SDGVTNPKMLVVLRNSELFQALNMHYNKGNDFEISADHFSKLKTNFMQDLSVAVYSHL SAGRVYQLDNANSLQLTIAIGAPTSYMQKKLNFRL