An introduction to protein sequence databases and UniProt - Practicals
Answers question 1
- NCBI NR -> CAA26095
- RefSeq -> NP_000790
- UniProtKB -> P01588
- PIR-PSD -> ZUHU
- UniParc -> UPI0000033477
- UniRef50 -> UniRef50_P01588
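The identifiers above differ visibly in format, which is often enough to guess the database an accession belongs to. The following is a minimal sketch of that idea. The function name `guess_database` and the pattern table are hypothetical, and the patterns are deliberately simplified assumptions that do not cover every variant each database uses. Short PIR-PSD codes such as ZUHU have no pattern here and fall through to "unknown".

```python
import re

# Simplified, illustrative patterns (assumptions, not complete format definitions).
# Order matters: the more specific prefixes are tested first.
PATTERNS = [
    ("UniRef", re.compile(r"^UniRef(50|90|100)_\w+$")),
    ("UniParc", re.compile(r"^UPI[0-9A-F]{10}$")),
    ("RefSeq protein", re.compile(r"^[ANXYW]P_\d+$")),
    ("UniProtKB", re.compile(
        r"^([OPQ][0-9][A-Z0-9]{3}[0-9]"
        r"|[A-NR-Z][0-9]([A-Z][A-Z0-9]{2}[0-9]){1,2})$")),
    ("GenBank/EMBL protein ID", re.compile(r"^[A-Z]{3}\d{5}$")),
]


def guess_database(identifier):
    """Return the name of the first pattern that matches, else 'unknown'."""
    for name, pattern in PATTERNS:
        if pattern.match(identifier):
            return name
    return "unknown"


# The identifiers from the answers to question 1.
for acc in ["CAA26095", "NP_000790", "P01588", "ZUHU",
            "UPI0000033477", "UniRef50_P01588"]:
    print(acc, "->", guess_database(acc))
```

A format match only suggests where an identifier comes from. The entry still has to be looked up in the database itself to confirm that it exists.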
Answers question 2
- How many protein sequences called Ken and Barbie are found in UniProtKB? 2
- Where do the entries come from (which database)? UniProtKB/Swiss-Prot
- Look at entry O77459 in UniProt
Look at the history of the entry: on what date was the sequence integrated into UniProtKB/Swiss-Prot? 2004/02/02
- How many 'protein sequences' called Ken and Barbie are found in NCBInr? 6
- Which database do the entries NOT come from? UniProtKB/TrEMBL
- Look at entry O77459 in NCBI and compare it with UniProtKB.
-
Answers question 3
- Starting with the UniProt server (http://beta.uniprot.org/):
Look for the protein sequence of human carbonic anhydrase 2.
- Give its ID or AC: P00918 or CAH2_HUMAN
- Get the corresponding nucleic acid entries in EMBL and GenBank: try to find a nucleic acid sequence derived from genomic DNA sequencing and another one derived from cDNA sequencing.
- Give genomic DNA AC: M77181 or X03251
- Give cDNA AC: Y00339 or J03037
- From the UniProtKB/Swiss-Prot entry, look at the data available for the variant Pro-92, in particular its position in the 3D structure (use the “Astex viewer”).
- Give the variant number VAR_001381
- For fun, try to do the same with NCBI.
Answers question 4
- - Look for the human Glucocorticoid receptor entry in UniProtKB/Swiss-Prot.
Give its ID or AC: P04150 or GCR_HUMAN
- - Look at the annotation of alternative protein products.
- Which biological events generate protein diversity for this gene? Alternative splicing and alternative initiation
- - How many different protein isoforms are produced? 9
- - Make a multiple alignment with all the protein sequences produced by the gene.
- Visualize the DNA-binding region in the alignment: is it affected by an alternative splicing event? No
- - Look at the PTM annotation at the sequence level (Sequence annotation (feature table)).
- How many distinct PTMs are described for this protein? 3
- - How many PTMs have been experimentally proven for this protein? 4
- - Have a look at the list of post-translational modifications provided by ExPASy (http://www.expasy.org/cgi-bin/ptmlist.pl) or on the beta UniProt site (http://beta.uniprot.org/docs/).
Answers question 5
Use the http://beta.uniprot.org/ site to answer:
- Find all nuclear proteins in UniProtKB/Swiss-Prot. How many do you find? about 23000 (23128 Oct07)
annotation:(type:"subcellular location" nucleus) AND reviewed:yes
- What is the proportion of UniProtKB/Swiss-Prot nuclear proteins for which the nuclear localization has been experimentally proven ? 35% (8217/23128 Oct07)
annotation:(type:"subcellular location" nucleus confidence:proven) AND reviewed:yes
- Find all nuclear (proven) proteins for which at least one phosphorylation site has been experimentally proven. 1669 (Oct07)
annotation:(type:"subcellular location" nucleus confidence:proven) AND reviewed:yes AND (annotation:(type:mod_res Phosphoserine confidence:proven) OR annotation:(type:mod_res Phosphothreonine confidence:proven) OR annotation:(type:mod_res Phosphotyrosine confidence:proven))
- Find all proteins in UniProtKB involved in diabetes and whose 3D structure has been solved. 14 (Oct07)
keyword:diabetes AND database:pdb
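
The first three queries above are built from the same clauses, each one narrowing the previous one. The sketch below shows one way to assemble them from reusable parts and to check the proportion quoted for the proven nuclear proteins. The helper names (`subcellular`, `mod_res`, `any_of`, `all_of`) are hypothetical. The code only builds query strings in the syntax used in this practical and does not contact any server. Whether the current UniProt site accepts this syntax is not checked here.

```python
def subcellular(location, proven=False):
    """Build a subcellular-location clause in the syntax used above."""
    confidence = " confidence:proven" if proven else ""
    return f'annotation:(type:"subcellular location" {location}{confidence})'


def mod_res(residue, proven=False):
    """Build a modified-residue clause in the syntax used above."""
    confidence = " confidence:proven" if proven else ""
    return f"annotation:(type:mod_res {residue}{confidence})"


def any_of(clauses):
    """Join clauses with OR and wrap them in parentheses."""
    return "(" + " OR ".join(clauses) + ")"


def all_of(clauses):
    """Join clauses with AND."""
    return " AND ".join(clauses)


REVIEWED = "reviewed:yes"  # restricts the search to UniProtKB/Swiss-Prot

# Query 1: all nuclear proteins in UniProtKB/Swiss-Prot.
nuclear = all_of([subcellular("nucleus"), REVIEWED])

# Query 2: same, restricted to experimentally proven localization.
nuclear_proven = all_of([subcellular("nucleus", proven=True), REVIEWED])

# Query 3: proven nuclear proteins with at least one proven phosphorylation site.
phospho = any_of([
    mod_res(residue, proven=True)
    for residue in ("Phosphoserine", "Phosphothreonine", "Phosphotyrosine")
])
nuclear_phospho = all_of([subcellular("nucleus", proven=True), REVIEWED, phospho])

# Proportion of proven nuclear proteins, using the Oct07 counts given above.
proportion = 8217 / 23128

print(nuclear)
print(nuclear_proven)
print(nuclear_phospho)
print(f"{proportion:.1%}")
```

Building the queries from parts makes it clear that each question only adds one constraint to the previous one, and that the phosphorylation condition is a single OR group over the three residue types.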