Genetics | |
---|---|
Traits/Genes to Location | Genetic and physical maps, Research Genetics mapping panel, Stanford mapping server |
Traits/Genes to Experimental Organisms | Jackson Laboratories |
Trait/Gene Location Database | OMIM, On-line Mendelian Inheritance in Man |
Genomic DNA Analysis | |
Sequences to Contigs | CuraTools CAP, PHRAP |
Contigs to mRNA | Genscan, Grail |
mRNA Analysis | |
DNA to Homolog | blastn, blastx, fasta |
DNA to Protein | ORF finders NCBI ORF Finder |
Protein Analysis | |
protein homologs | blastp |
conserved residues | multiple sequence alignment, clustal-w |
evolutionary history | Phylip, Paup |
blastp for linguistics | AltaVista Babelfish |
Where were we?
Last week: DNA sequence analysis
This week: Protein sequence analysis
Why do this:
We're working on the human genome project. We find a gene and translate its ORF to get a protein sequence. What does the protein do?
First step: has anyone seen a protein sequence like this before?
Here is the protein sequence:
mkgsiftlfl fsvlfaisev rskesvrlcg leyirtviyi cassrwrrhl egipqaqqae
tgnsfqlphk refseenpaq nlpkvdasge drlwggqmpt eelwkskkhs vmsrqdlqtl
cctdgcsmtd lsalc
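The sequence above is laid out in lowercase blocks of ten residues. A minimal Python sketch of collapsing it into a single FASTA record, which is convenient for pasting into a search form, might look like this. The helper name `to_fasta` and the record name `human-unknown` are illustrative choices, not part of any tool.

```python
import textwrap

# The protein sequence exactly as shown above (blocks of ten, lowercase).
RAW = """
mkgsiftlfl fsvlfaisev rskesvrlcg leyirtviyi cassrwrrhl egipqaqqae
tgnsfqlphk refseenpaq nlpkvdasge drlwggqmpt eelwkskkhs vmsrqdlqtl
cctdgcsmtd lsalc
"""

def to_fasta(name, raw, width=60):
    """Strip whitespace, uppercase, and wrap into one FASTA record (hypothetical helper)."""
    seq = "".join(raw.split()).upper()
    body = "\n".join(textwrap.wrap(seq, width))
    return f">{name}\n{body}", len(seq)

record, length = to_fasta("human-unknown", RAW)
print(record)
print("length:", length)
```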
First, find homologs by sequence similarity searching (blastp at NCBI).
Sequences producing significant alignments: Score (bits) E Value
ref|NP_005469.1|PINSL5| insulin-like 5 >gi|4768935|gb|AAD29686.... 282 7e-76
gb|AAD29687.1|AF133817_1 (AF133817) insulin-like peptide INSL5 ... 163 4e-40
gi|3851207 (AC005952) INL3_HUMAN; LEY-I-L; RELAXIN-LIKE FACTOR;... 39 0.014
gi|3719459 (AF094580) relaxin-like factor [Bos taurus] 38 0.024
sp|P51461|INL3_PIG LEYDIG INSULIN-LIKE PEPTIDE PRECURSOR (LEY-I... 38 0.032
sp|P51460|INL3_HUMAN LEYDIG INSULIN-LIKE PEPTIDE PRECURSOR (LEY... 37 0.055
pir||A26463 relaxin - spiny dogfish (fragments) 37 0.055
sp||RELX_SQUAC_1 [Segment 1 of 2] RELAXIN 37 0.072
bbs|179129 (S82815) RLF=relaxin-like factor/insulin homolog {co... 36 0.12
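In each summary line the last two fields are the bit score and the E-value (the number of hits this good expected by chance, so smaller is better). A minimal Python sketch of pulling those two numbers out and keeping only the strong hits is shown here. The function names and the 0.01 cutoff are assumptions made for illustration, not a BLAST default, and the input is a subset of the lines above.

```python
# A subset of the hit lines above, copied as shown.
HITS = """\
ref|NP_005469.1|PINSL5| insulin-like 5 >gi|4768935|gb|AAD29686.... 282 7e-76
gb|AAD29687.1|AF133817_1 (AF133817) insulin-like peptide INSL5 ... 163 4e-40
gi|3851207 (AC005952) INL3_HUMAN; LEY-I-L; RELAXIN-LIKE FACTOR;... 39 0.014
gi|3719459 (AF094580) relaxin-like factor [Bos taurus] 38 0.024
sp|P51460|INL3_HUMAN LEYDIG INSULIN-LIKE PEPTIDE PRECURSOR (LEY... 37 0.055
bbs|179129 (S82815) RLF=relaxin-like factor/insulin homolog {co... 36 0.12
"""

def parse_hits(text):
    """Split each summary line into (description, bit score, E-value)."""
    hits = []
    for line in text.splitlines():
        if not line.strip():
            continue
        # The score and E-value are the last two whitespace-separated fields.
        description, bits, evalue = line.rsplit(None, 2)
        hits.append((description, float(bits), float(evalue)))
    return hits

def significant(hits, max_evalue=0.01):
    """Keep hits at or below an assumed E-value cutoff."""
    return [hit for hit in hits if hit[2] <= max_evalue]

for description, bits, evalue in significant(parse_hits(HITS)):
    print(bits, evalue, description)
```

With this assumed cutoff only the two INSL5 entries pass. The relaxin-like hits are much weaker, which is why the next step compares the sequences directly.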
Is this relaxin or another insulin-like protein?
    >human-unknown
    mkgsiftlfl fsvlfaisev rskesvrlcg leyirtviyi cassrwrrhl egipqaqqae
    tgnsfqlphk refseenpaq nlpkvdasge drlwggqmpt eelwkskkhs vmsrqdlqtl
    cctdgcsmtd lsalc
    >mouse-insl-5
    mkgptlalfl llvllavvev rsrqtvklcg ldyvrtviyi cassrwrrhl eghfhsqqae
    trnylqlldr hepskktleh slpktdlsgq elvrdpqapk eglwelkkhs vvsrrdlqal
    ccregcsmke lstlc
    >human-insl-3a
    mdprlpawal vllgpalvfa lgpaptpemr eklcghhfvr alvrvcggpr wstearrpat
    ggdrellqwl errhllhglv adsnltlgpg lqplpqtshh hrhhraaatn parycclsgc
    tqqdlltlcp y
    >bull-insl-3
    mdrrpltwal vllgpalaia lgpaaaqeap eklcghhfvr alvrlcggpr wsseedgrpv
    aggdrellrw legqhllhgl masgdpvlvl apqplpqasr hhhhrratai nparhcclsg
    ctrqdlltlc ph
    >pig-insl-3
    mdphpltwal vllgpalals rapapaqeap eklcghhfvr alvrlcggpr wspedgrava
    ggdrellqwl egqhlfhglm asgdpmlvla pqpppqasgh hhhrraaatn parhcclsgc
    trqdlltlcp h
    >human-insl-3b
    mdprlpawal vllgpalvfa lgpaptpemr eklcghhfvr alvrvcggpr wstearrpaa
    ggdrellqwl errhllhglv adsnltlgpg lqplpqtshh hrhhraaatn parycclsgc
    tqqdlltlcp y
    >mouse-relaxin-like
    mraplllmll algsalrspq ppearaklcg hhlvrtlvrv cggprwspea tqpvetrdre
    llqwleqrhl lhalvadvdp aldpqlprqa sqrqrrsaat navhrccltg ctqqdllglc
    ph
    >marmoset-relaxin-like
    mdprlpawal vllgpalvfa lgpaptpemr eklcghhfvr alvrvcggpl wstearrpva
    agdgellqwl errhllyglv ansepapggp glqpmpqtsh hhrhrraaas nparycclsg
    csqqdlltlc p
    >human-relaxin
    mprlflfhll efclllnqfs ravaakwkdd viklcgrelv raqiaicgms twskrslsqe
    dapqtprpva eivpsfinkd tetiiimlef ianlppelka alserqpslp elqqyvpalk
    dsnlsfeefk klirnrqsea adsnpselky lgldthsqkk rrpyvalfek ccligctkrs
    lakyc
    >salmonella-atp-binding
    msqpllavng lmmrfgglla vnnvslelre reivsligpn gagkttvfnc ltgfykptgg
    titlrerhle glpgqqiarm gvvrtfqhvr lfremtvien llvaqhqqlk tglfsgllkt
    pafrraqsea ldraatwler igllehanrq asnlaygdqr rleiarcmvt qpeilmldep
    aaglnpketk eldeliaelr nhhnttilli ehdmklvmgi sdriyvvnqg tplangtpee
    irnnpdvira ylgea
    >pan-relaxin-2
    sravadswmd eviklcgrel vraqiaicgk stwskrslsq edapqtprpv aeivpsfink
    dtetinmmse fvanlpqelk ltlsemqpal pqlqqyvpvl kdssllfeef kklirnrqse
    aadsspselk ylgldthsrk krqlysalan kcchvgctkr slarfc
Go to CuraTools, paste in sequences, run Clustal-W
Multiple sequence alignment algorithm
    CLUSTAL W (1.7) multiple sequence alignment

    pan-relaxin-2           -------------------SRAVA----DSWMDEV-IKLCGRELV-RAQI
    human-relaxin           MPRLFLFHLLEFCLLLNQFSRAVA----AKWKDDV-IKLCGRELV-RAQI
    human-insl-3b           --MDPRLPAWALVLLGPALVFALG----PAPTPEMREKLCGHHFV-RALV
    human-insl-3a           --MDPRLPAWALVLLGPALVFALG----PAPTPEMREKLCGHHFV-RALV
    marmoset-relaxin-like   --MDPRLPAWALVLLGPALVFALG----PAPTPEMREKLCGHHFV-RALV
    pig-insl-3              --MDPHPLTWALVLLGPALALSRA----PAPAQEAPEKLCGHHFV-RALV
    bull-insl-3             --MDRRPLTWALVLLGPALAIALG----PAAAQEAPEKLCGHHFV-RALV
    mouse-relaxin-like      -------MRAPLLLMLLALGSALR----SPQPPEARAKLCGHHLV-RTLV
    mouse                   ------MKGPTLALFLLLVLLAVV----EVRSRQT-VKLCGLDYV-RTVI
    human-unknown           ------MKGSIFTLFLFSVLFAIS----EVRSKES-VRLCGLEYI-RTVI
    salmonella-atp-binding  --MSQPLLAVNGLMMRFGGLLAVNNVSLELREREI-VSLIGPNGAGKTTV
                            : : * * . :: :

    pan-relaxin-2           AICGKSTWSKRS-LSQEDAPQ-TPRPVAEIVPSFIN------KDTETINM
    human-relaxin           AICGMSTWSKRS-LSQEDAPQ-TPRPVAEIVPSFIN------KDTETIII
    human-insl-3b           RVCGGPRWSTE-----------ARRPAAGGDRELL-------QWLERRHL
    human-insl-3a           RVCGGPRWSTE-----------ARRPATGGDRELL-------QWLERRHL
    marmoset-relaxin-like   RVCGGPLWSTE-----------ARRPVAAGDGELL-------QWLERRHL
    pig-insl-3              RLCGGPRWSPE-----------DGRAVAGGDRELL-------QWLEGQHL
    bull-insl-3             RLCGGPRWSSE----------EDGRPVAGGDRELL-------RWLEGQHL
    mouse-relaxin-like      RVCGGPRWSPE-----------ATQPVETRDRELL-------QWLEQRHL
    mouse                   YICASSRWRRHL-EG-------HFHSQQAETRNYL-------QLLDRHEP
    human-unknown           YICASSRWRRHL-EG-------IPQAQQAETGNSF-------QLPHKREF
    salmonella-atp-binding  FNCLTGFYKPTGGTITLRERHLEGLPGQQIARMGVVRTFQHVRLFREMTV
                            * : . . :

    pan-relaxin-2           MSEFVANLPQELKLTLSEMQPALPQLQQYVPVLKDSSLLFEEFKKLIRNR
    human-relaxin           MLEFIANLPPELKAALSERQPSLPELQQYVPALKDSNLSFEEFKKLIRNR
    human-insl-3b           LHGLVADSNLTLG-PG--LQP-LPQTSH-------------------HHR
    human-insl-3a           LHGLVADSNLTLG-PG--LQP-LPQTSH-------------------HHR
    marmoset-relaxin-like   LYGLVANSEPAPGGPG--LQP-MPQTSH-------------------HHR
    pig-insl-3              FHGLMASGDPMLV-LA--PQP-PPQASG-------------------HHH
    bull-insl-3             LHGLMASGDPVLV-LA--PQP-LPQASR-------------------HHH
    mouse-relaxin-like      LHALVADVDPALD-----PQL-PRQAS---------------------QR
    mouse                   SKKTLEHSLPKTDLSG-QELVRDPQAPK----------------EGLWEL
    human-unknown           SEENPAQNLPKVDASG-EDRLWGGQMPT----------------EELWKS
    salmonella-atp-binding  IENLLVAQHQQLKTGLFSGLLKTPAFRRAQSEALDRAATWLERIGLLEHA
                            .

    pan-relaxin-2           QSEAADSSPSELKYLGLDTHSRKKRQLYSALANKCCHVGC--TKRSLARF
    human-relaxin           QSEAADSNPSELKYLGLDTHSQKKRRPYVALFEKCCLIGC--TKRSLAKY
    human-insl-3b           HHRAAATNP----------------------ARYCCLSGC--TQQDLLTL
    human-insl-3a           HHRAAATNP----------------------ARYCCLSGC--TQQDLLTL
    marmoset-relaxin-like   HRRAAASNP----------------------ARYCCLSGC--SQQDLLTL
    pig-insl-3              HRRAAATNP----------------------ARHCCLSGC--TRQDLLTL
    bull-insl-3             HRRATAINP----------------------ARHCCLSGC--TRQDLLTL
    mouse-relaxin-like      QRRSAATNA----------------------VHRCCLTGC--TQQDLLGL
    mouse                   KKHSVVSRR----------------D----LQALCCREGC--SMKELSTL
    human-unknown           KKHSVMSRQ----------------D----LQTLCCTDGC--SMTDLSAL
    salmonella-atp-binding  NRQASNLAYG--------------------DQRRLEIARCMVTQPEILML
                            : .: * : .:

    pan-relaxin-2           C-------------------------------------------------
    human-relaxin           C-------------------------------------------------
    human-insl-3b           CPY-----------------------------------------------
    human-insl-3a           CPY-----------------------------------------------
    marmoset-relaxin-like   CP------------------------------------------------
    pig-insl-3              CPH-----------------------------------------------
    bull-insl-3             CPH-----------------------------------------------
    mouse-relaxin-like      CPH-----------------------------------------------
    mouse                   C-------------------------------------------------
    human-unknown           C-------------------------------------------------
    salmonella-atp-binding  DEPAAGLNPKETKELDELIAELRNHHNTTILLIEHDMKLVMGISDRIYVV

    pan-relaxin-2           ----------------------------
    human-relaxin           ----------------------------
    human-insl-3b           ----------------------------
    human-insl-3a           ----------------------------
    marmoset-relaxin-like   ----------------------------
    pig-insl-3              ----------------------------
    bull-insl-3             ----------------------------
    mouse-relaxin-like      ----------------------------
    mouse                   ----------------------------
    human-unknown           ----------------------------
    salmonella-atp-binding  NQGTPLANGTPEEIRNNPDVIRAYLGEA
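One thing to read off the alignment is which columns are identical in every sequence. A minimal Python sketch of that check is shown here, using the last 16 columns of the fourth alignment block for five of the insulin-family sequences plus the Salmonella sequence as an unrelated control. The function name `conserved_columns` and the choice of rows and columns are illustrative assumptions.

```python
# Last 16 columns of the fourth alignment block, copied from the output above.
FAMILY = {
    "pan-relaxin-2": "CCHVGC--TKRSLARF",
    "human-relaxin": "CCLIGC--TKRSLAKY",
    "human-insl-3b": "CCLSGC--TQQDLLTL",
    "mouse":         "CCREGC--SMKELSTL",
    "human-unknown": "CCTDGC--SMTDLSAL",
}
OUTGROUP = {"salmonella-atp-binding": "LEIARCMVTQPEILML"}

def conserved_columns(rows):
    """Return {column index: residue} for columns identical in every row.

    Columns that consist only of gap characters are not reported.
    """
    conserved = {}
    for index, column in enumerate(zip(*rows.values())):
        if len(set(column)) == 1 and column[0] != "-":
            conserved[index] = column[0]
    return conserved

print(conserved_columns(FAMILY))
print(conserved_columns({**FAMILY, **OUTGROUP}))
```

Within the family the paired cysteines, the Gly-Cys, and one leucine are identical across all five rows. Adding the unrelated Salmonella sequence leaves a single conserved cysteine in this window.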
    11 Populations
    Neighbor-Joining/UPGMA method version 3.572c
    Neighbor-joining method
    Negative branch lengths allowed

         +human-insl
      +--8
      !  +human-insl
      !
    --9marmoset-r
      !
      !                            +pan-relaxi
      !            +---------------1
      !            !               +---human-rela
      !     +------4
      !     !      !                 +--mouse
      !     !      !       +---------2
      !  +--6      +-------3         +-----human-unkn
      !  !  !              !
      !  !  !              +-----------------------------------salmonella
      +--7  !
         !  +---mouse-rela
         !
         !  +pig-insl-3
         +--5
            +bull-insl-

    remember: this is an unrooted tree!

    Between   And           Length
    -------   ---           ------
    9         8             0.10701
    8         human-insl   -0.00199
    8         human-insl    0.02003
    9         marmoset-r    0.18388
    9         7             0.21611
    7         6             0.41042
    6         4             1.13516
    4         1             2.69704
    1         pan-relaxi   -0.11219
    1         human-rela    0.62211
    4         3             1.28907
    3         2             1.71628
    2         mouse         0.46292
    2         human-unkn    0.91501
    3         salmonella    6.00235
    6         mouse-rela    0.74337
    7         5             0.46742
    5         pig-insl-3    0.15327
    5         bull-insl-    0.04251

Interpreting the cladogram/dendrogram/phylogeny:
    >hum-myo
    mglsdgewql vlnvwgkvea dipghgqevl irlfkghpet lekfdkfkhl ksedemkase
    dlkkhgatvl talggilkkk ghheaeikpl aqshatkhki pvkylefise ciiqvlqskh
    pgdfgadaqg amnkalelfr kdmasnykel gfqg
    >pig-myo
    mglsdgewql vlnvwgkvea dvaghgqevl irlfkghpet lekfdkfkhl ksedemkase
    dlkkhgntvl talggilkkk ghheaeltpl aqshatkhki pvkylefise aiiqvlqskh
    pgdfgadaqg amskalelfr ndmaakykel gfqg
    >whale-myo
    vlsegewqlv lhvwakvead vaghgqdili rlfkshpetl ekfdrfkhlk teaemkased
    lkkhgvtvlt algailkkkg hheaelkpla qshatkhkip ikylefisea iihvlhsrhp
    gdfgadaqga mnkalelfrk diaakykelg yqg
    >dog-myo
    glsdgewqiv lniwgkvetd laghgqevli rlfknhpetl dkfdkfkhlk tedemkgsed
    lkkhgntvlt alggilkkkg hheaelkpla qshatkhkip vkylefisda iiqvlqskhs
    gdfhadteaa mkkalelfrn diaakykelg fqg
    >penguin-myo
    glndqewqqv ltmwgkvesd laghghavlm rlfkshpetm drfdkfrglk tpdemrgsed
    mkkhgvtvlt lgqilkkkgh heaelkplsq thatkhkvpv kylefiseai mkviaqkhas
    nfgadaqeam kkalelfrnd maskykefgf qg
    >horse-myo
    glsdgewqqv lnvwgkvead iaghgqevli rlftghpetl ekfdkfkhlk teaemkased
    lkkhgtvvlt alggilkkkg hheaelkpla qshatkhkip ikylefisda iihvlhskhp
    gdfgadaqga mtkalelfrn diaakykelg fqg

    6 Populations
    Neighbor-Joining/UPGMA method version 3.572c
    Neighbor-joining method
    Negative branch lengths allowed

         +--pig-myo
      +--3
      !  +------hum-myo
      !
      !    +--------dog-myo
    --4----1
      !    +-------------------------------------penguin-my
      !
      !  +------horse-myo
      +--2
         +------------whale-myo

    remember: this is an unrooted tree!

    Between   And           Length
    -------   ---           ------
    4         3             0.02899
    3         pig-myo       0.05372
    3         hum-myo       0.12577
    4         1             0.07569
    1         dog-myo       0.16442
    1         penguin-my    0.63682
    4         2             0.03773
    2         horse-myo     0.10914
    2         whale-myo     0.20575
The long-branch problem: why do dog and penguin end up together?
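Both trees above come from the neighbor-joining method, which sees only a table of pairwise distances and never the sequences themselves. The following is a minimal Python sketch of the textbook neighbor-joining steps, not the Phylip implementation. The function name, the `nodeN` labels, and the five-taxon distance matrix are made up for illustration and are not computed from the sequences above.

```python
from itertools import combinations

def neighbor_joining(names, dist):
    """Return (joins, last_edge) for a symmetric distance table.

    dist maps frozenset({a, b}) to the distance between taxa a and b.
    Each join is (left, left_length, right, right_length, new_node).
    """
    d = dict(dist)
    nodes = list(names)
    joins = []

    def get(a, b):
        return d[frozenset((a, b))]

    while len(nodes) > 2:
        n = len(nodes)
        # Net divergence: summed distance from each node to all others.
        r = {a: sum(get(a, b) for b in nodes if b != a) for a in nodes}
        # Join the pair with the smallest Q value (first pair wins ties).
        a, b = min(combinations(nodes, 2),
                   key=lambda p: (n - 2) * get(*p) - r[p[0]] - r[p[1]])
        d_ab = get(a, b)
        len_a = d_ab / 2 + (r[a] - r[b]) / (2 * (n - 2))
        len_b = d_ab - len_a  # either length can come out negative
        new = f"node{len(joins) + 1}"
        # Distance from the new internal node to every remaining node.
        for c in nodes:
            if c not in (a, b):
                d[frozenset((new, c))] = (get(a, c) + get(b, c) - d_ab) / 2
        nodes = [c for c in nodes if c not in (a, b)] + [new]
        joins.append((a, len_a, b, len_b, new))

    last_edge = (nodes[0], nodes[1], get(nodes[0], nodes[1]))
    return joins, last_edge

# Made-up example distances, not computed from the sequences above.
NAMES = ["a", "b", "c", "d", "e"]
ROWS = [[0, 5, 9, 9, 8],
        [5, 0, 10, 10, 9],
        [9, 10, 0, 8, 7],
        [9, 10, 8, 0, 3],
        [8, 9, 7, 3, 0]]
DIST = {frozenset((NAMES[i], NAMES[j])): ROWS[i][j]
        for i in range(5) for j in range(i + 1, 5)}

joins, last_edge = neighbor_joining(NAMES, DIST)
for join in joins:
    print(join)
print(last_edge)
```

On this matrix the first join pairs a and b with branch lengths 2 and 3. Because each branch length is computed by subtraction, it can come out negative, which is what "Negative branch lengths allowed" in the program output refers to.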
Distance-based algorithms vs. Maximum parsimony algorithms
    English   one   two    three   four      five
    German    ein   zwei   drei    vier      funf
    French    un    deux   trois   quatre    cinq
    Italian   un    due    tre     quattro   cinque
    Spanish   un    dos    tres    cuatro    cinco

    Similarities (# of common letters)

         E    G    F    I    S
    E
    G    6
    F    4    5
    I    5    6   15
    S    5    6   13   14

    Branching

    (I,F)
    S,(I,F)
    (G,E)
    (G,E),(S,(I,F))

                  |------ English
       |----------|
       |          |------ German
    ---|
       |
       |          |-- Spanish
       |----------|
                  |   |- French
                  |---|
                      |- Italian

Fun links: Tree of Life, http://phylogeny.arizona.edu/tree/phylogeny.html
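The branching order in the language example can be reproduced from the similarity table alone by repeatedly merging the two most similar groups. A minimal Python sketch is shown here. It assumes average linkage (the similarity between two groups is the mean over all their language pairs), which the notes do not state, and the function names are illustrative.

```python
from itertools import combinations

# Similarity table from the language example above.
SIM = {
    ("G", "E"): 6,
    ("F", "E"): 4, ("F", "G"): 5,
    ("I", "E"): 5, ("I", "G"): 6, ("I", "F"): 15,
    ("S", "E"): 5, ("S", "G"): 6, ("S", "F"): 13, ("S", "I"): 14,
}

def leaf_sim(a, b):
    """Look up the similarity of two languages in either key order."""
    return SIM.get((a, b), SIM.get((b, a)))

def leaves(group):
    """Flatten a nested tuple of groups into its language labels."""
    if isinstance(group, str):
        return [group]
    return [leaf for part in group for leaf in leaves(part)]

def group_sim(x, y):
    """Average similarity over all language pairs (assumed linkage rule)."""
    pairs = [(a, b) for a in leaves(x) for b in leaves(y)]
    return sum(leaf_sim(a, b) for a, b in pairs) / len(pairs)

def cluster(labels):
    """Merge the most similar pair of groups until one tree remains."""
    groups = list(labels)
    merges = []
    while len(groups) > 1:
        x, y = max(combinations(groups, 2), key=lambda p: group_sim(*p))
        groups = [g for g in groups if g not in (x, y)] + [(x, y)]
        merges.append((x, y))
    return merges, groups[0]

merges, tree = cluster(["E", "G", "F", "I", "S"])
for step in merges:
    print(step)
```

Under this assumption the merges come out in the same order as the branching list: French with Italian, then Spanish, then German with English, then the two groups together.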