Databases available for BLAST searches
Protein databases
-
non redundant
-
The protein database is constructed from SwissProt, SwissProt updates, SwissProt splice variants, TrEMBL, TrEMBL updates, TrEMBL splice variants, GenPept, GenPept updates, and a small subsection of PIR containing sequences not in the other databases (usually for good reasons). Updated weekly, usually on weekends.
-
SwissProt
-
The most recent full release of SwissProt, including weekly updates and splice variants.
-
UniProt (SwissProt/TrEMBL/TrEMBL_NEW)
-
A combination of SwissProt and TrEMBL (including updates and splice variants but not REM-TrEMBL). This database contains all consolidated proteins and ORFs (updated weekly).
-
TrEMBL/TrEMBL_NEW
-
Same as above, but without SwissProt.
-
GenPept
-
The most recent release of the GenPept file. Does not include updates (updated weekly).
-
GenPept updates
-
The most recent GenPept update file. Useful for limiting the search to new sequences (updated weekly).
-
Worm (C. elegans)
-
The protein database contains all annotated C. elegans ORFs (updated occasionally).
-
Yeast (S. cerevisiae)
-
The protein database contains all annotated yeast ORFs (updated occasionally).
-
Non-redundant 3D structure
-
A non-redundant database of all the sequences found in PDB (updated occasionally).
-
All microbial genomes
-
A collection of all available microbial (archaea and eubacteria) genomes from the HAMAP project (see list) (updated weekly).
-
trest **** WARNING!
-
This is a database of protein sequences derived from EST sequencing data (human, mouse, rat, zebra fish, drosophila, cow, arabidopsis). The ESTs are clustered, contigs are generated from the clusters by an unpublished procedure, and the contigs are translated.
-
It might contain useful sequences, but the user should be aware of the possible errors that are likely to be found in this database.
-
trgen **** WARNING!
-
This is a database of protein sequences derived from genomic sequencing data (human, mouse, rat, drosophila, arabidopsis, rice). The genomic sequences are analysed by the program GenScan and the putative coding sequences in exons are translated.
-
It might contain useful sequences, but the user should be aware of the possible errors that are likely to be found in this database.
DNA databases
EMBL is now provided in the following subdivisions (updated weekly, usually on weekends):
-
Bacteriophage (phg)
-
Fungi (fun)
-
GSS (gss)
-
STS (sts)
-
HTG (htg)
-
Human (hum)
-
Invertebrate (inv)
-
Organelles (org)
-
Other Mammals (mam)
-
Other Vertebrates (vrt)
-
Patents (patent)
-
Plants (pln)
-
Prokaryotes (pro)
-
Rodents (rod)
-
Synthetic (syn)
-
Other (unclassified) (unc)
-
Viruses (vrl)
You can select the most recent full release of the EMBL databases or the cumulative weekly updates (updated weekly).
The EMBL databases do not contain the EST sections (see below).
dbEST is now provided in subdivisions (updated weekly):
-
Human (est_hum)
-
Mouse (est_mus)
-
Rat (est_rat)
-
Rodent (est_rod)
-
Cow (est_cow)
-
Plants (est_pln)
-
Other Mammals (est_mam)
-
Zebra fish (est_dan)
-
Other Vertebrates (est_vrt)
-
Arabidopsis (est_ara)
-
Drosophila (est_dro)
-
Invertebrates (est_inv)
-
Fungi (est_fun)
-
Prokaryotes (est_pro)
Unigene
Database of EST clusters (list of ESTs known to match the same cDNA) from the NCBI (updated occasionally).
This database also contains useful information such as STS matches, tissue distribution, and transcript maps.
-
Human
-
Mouse
-
Rat
-
Zebra fish
-
EST Contigs
-
Database of contigs based on EST clusters from Unigene (human, mouse, rat, cow, zebra fish) and SwissClusters (Drosophila melanogaster, Arabidopsis thaliana).
-
Radiation hybrid
-
Database of radiation hybrid clones mapped on Stanford panels using STS primer pairs (updated occasionally).
-
yeast (S.cerevisiae)
-
The DNA version of the yeast genome database contains all yeast chromosomes. For easier handling, the long sequences have been split into overlapping fragments of 5000 bases each, with an overlap of 1000 bases (updated occasionally).
-
yeast (S.pombe)
-
The DNA version of the yeast genome database contains partial S. pombe chromosomes. For easier handling, the long sequences have been split into overlapping fragments of 5000 bases each, with an overlap of 1000 bases (updated occasionally).
-
C.elegans
-
This database contains all raw C. elegans sequence data from the Sanger Centre and the WU Genome Sequencing Center in St. Louis. Some of these data are also present in annotated form in EMBL, but some are available exclusively here (updated occasionally).
-
ResGen verified
-
Sequence-verified clones from a collection available at Research Genetics (updated occasionally).
-
UNIL At clones
-
Arabidopsis thaliana clones from the University of Lausanne (updated occasionally).
-
RefSeq
-
The NCBI Reference Sequence (RefSeq) project provides reference sequence standards for the naturally occurring molecules of the central dogma, from chromosomes to mRNAs to proteins. They provide a stable reference point for mutation analysis, gene expression studies, and polymorphism discovery (see RefSeq) (updated occasionally).
-
EPD
-
Eukaryotic Promoter Database (see EPD) (updated occasionally)
-
16S ribosomal RNA
-
This database contains 16S rRNA sequence data collected from public DNA databases (updated occasionally).
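The yeast genome databases listed above are stored as overlapping fragments of 5000 bases with a 1000-base overlap. The following is a minimal Python sketch of that kind of splitting, given only to illustrate how fragment coordinates relate to the original chromosome. The function name `split_into_fragments`, its parameters, and the choice to keep a shorter final fragment are assumptions made for illustration; they do not describe the procedure actually used to build these databases.

```python
def split_into_fragments(sequence, fragment_length=5000, overlap=1000):
    """Yield (start, fragment) pairs that cover `sequence` with overlapping windows.

    `start` is the 0-based offset of the fragment in the original sequence.
    Hypothetical helper: the real database build procedure is not documented here.
    """
    if fragment_length <= 0 or not 0 <= overlap < fragment_length:
        raise ValueError("need fragment_length > 0 and 0 <= overlap < fragment_length")
    if not sequence:
        return
    step = fragment_length - overlap  # distance between consecutive fragment starts
    start = 0
    while True:
        yield start, sequence[start:start + fragment_length]
        if start + fragment_length >= len(sequence):
            break  # this fragment reached the end of the sequence
        start += step


if __name__ == "__main__":
    chromosome = "ACGT" * 3000  # toy sequence of 12,000 bases
    for start, fragment in split_into_fragments(chromosome):
        print(start, len(fragment))
```

With the toy 12,000-base sequence, this sketch produces fragments starting at offsets 0, 4000 and 8000, of lengths 5000, 5000 and 4000. A hit at position p in a fragment that starts at offset s therefore corresponds to position s + p in the full sequence, and a hit that falls inside an overlap region can be reported in two adjacent fragments.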
-
Masking databases
-
repetitive elements (DNA)
-
This database contains consensus sequences for common repetitive elements (such as Alu repeats). It contains representative sequences from REPBASE (part of humrep.ref, invrep.ref, mamrep.ref, plnrep.ref, rodrep.ref, and vertrep.ref). The new version of REPBASE is Copyright (C) 1997 of the Genetic Information Research Institute. This database is useful for masking repetitive regions in the query sequence (updated occasionally).
-
simple repeats (DNA)
-
This database contains simple DNA repeat sequences (microsatellites etc.). See J. Mol. Evol. 40:120, 1995. This database is useful for masking microsatellite repeat regions in the query sequence (updated occasionally).
-
randomized SwissProt (PROT)
-
A window-shuffled version of SwissProt (Window=20). Useful only for special applications, e.g. to check the significance of matches reported in a BLAST search. (not updated)
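To make the idea of a window-shuffled sequence concrete, here is a minimal Python sketch. It assumes that "window-shuffled" means the sequence is cut into consecutive, non-overlapping windows (20 residues here) and the residues are permuted independently within each window, so that local composition is preserved while the actual order is destroyed. This interpretation, and the function name `window_shuffle`, are assumptions; the exact procedure used to build the randomized SwissProt database is not described on this page.

```python
import random


def window_shuffle(sequence, window=20, rng=None):
    """Return `sequence` with residues permuted inside consecutive windows.

    Assumed interpretation of "window-shuffled": non-overlapping windows,
    each shuffled independently. `rng` is an optional random.Random
    instance, useful for reproducible results.
    """
    if window < 1:
        raise ValueError("window must be at least 1")
    if rng is None:
        rng = random.Random()
    pieces = []
    for start in range(0, len(sequence), window):
        block = list(sequence[start:start + window])
        rng.shuffle(block)  # permute residues within this window only
        pieces.append("".join(block))
    return "".join(pieces)


if __name__ == "__main__":
    protein = "MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQAPILSRVGDGTQDNLSGAEKAVQ"
    print(window_shuffle(protein, window=20, rng=random.Random(0)))
```

Because each window keeps its own residues, a shuffled sequence has the same length and the same local composition as the original. Matches found against such a database cannot reflect true homology, which is why they give a rough idea of the scores that arise by chance.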