BLAST is a method for searching either protein or DNA databases with either a protein or a DNA query sequence. Each positive match ("hit") in the database is shown aligned to the query, together with a "P value": the probability, given the numerical score of the alignment, that a match this good would occur purely by chance. In other words, small P values are good: the match is very unlikely to be random and therefore very likely to be meaningful. P values become unreliable above about 0.01, but a hit that is statistically insignificant on its own can still be a real hint of a subtle relationship between two highly diverged, yet evolutionarily related, sequences.
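As a rough illustration of how a score relates to chance, the sketch below uses the standard Karlin-Altschul formulas: E = K·m·n·exp(-λS) is the expected number of chance hits scoring at least S for a query of length m searched against a database of total length n, and P = 1 - exp(-E) is the probability of seeing at least one such hit. This is a minimal sketch, not the WormBase or BLAST source; K and λ depend on the scoring system, and the values in the usage line are placeholders, not real parameters.

```python
import math


def expected_hits(score: float, m: int, n: int, K: float, lam: float) -> float:
    """Karlin-Altschul expectation: E = K * m * n * exp(-lam * score)."""
    return K * m * n * math.exp(-lam * score)


def p_value(e_value: float) -> float:
    """Probability of at least one chance hit: P = 1 - exp(-E).

    math.expm1 keeps precision when E is tiny, where P is very close to E.
    """
    return -math.expm1(-e_value)


# Placeholder K and lambda: real values depend on the scoring matrix and gap costs.
e = expected_hits(score=50, m=300, n=1_000_000, K=0.1, lam=0.3)
print(f"E = {e:.3g}, P = {p_value(e):.3g}")
```

For small E the two numbers are nearly identical, so a P value of 0.001 and an E value of 0.001 mean practically the same thing.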
Full details of how to search databases effectively with BLAST cannot be given here. A short, by no means complete, list of references:
Jambeck, P. and Gibas, C. (2001). Developing Bioinformatics Computer Skills. O'Reilly, Sebastopol, CA. ISBN: 1-56592-664-1.
http://www.oreilly.com/catalog/bioskills
Mount, D.W. (2001). Bioinformatics: Sequence and Genome Analysis. Cold Spring Harbor Laboratory Press, Cold Spring Harbor, NY. ISBN: 0-87969-608-7.
http://www.bioinformaticsonline.org/
Baxevanis, A.D. and Ouellette, B.F.F., eds. Bioinformatics: A Practical Guide to the Analysis of Genes and Proteins, 2nd ed. Wiley-Interscience, New York. ISBN: 0-471-38391-0.
http://www.wiley.com/Corporate/Website/Objects/Products/0,9049,39021,00.html
Information about BLAST is also available from the NCBI:
an overview, http://www.ncbi.nlm.nih.gov/BLAST/blast_overview.html
frequently asked questions, http://www.ncbi.nlm.nih.gov/BLAST/blast_FAQs.html
a tutorial, http://www.ncbi.nlm.nih.gov/Education/BLASTinfo/information3.html
a formal course, http://www.ncbi.nlm.nih.gov/BLAST/tutorial/Altschul-1.html
BLAT, short for "BLAST-like alignment tool", is designed to quickly find sequences of 95% or greater similarity that are 40 bases or more in length. It may miss more divergent or shorter alignments. It was written by James Kent (e-mail: kent@biology.ucsc.edu). BLAT is similar to BLAST in many ways: the program rapidly scans for relatively short matches (hits) and extends these into high-scoring pairs (HSPs). However, BLAT differs from BLAST in some significant ways. For instance, where BLAST returns each area of homology between two sequences as a separate alignment, BLAT stitches them together into a larger alignment. BLAT also has special code to handle introns in RNA/DNA alignments. Therefore, whereas BLAST delivers a list of exons sorted by exon size, with alignments extending slightly beyond the edge of each exon, BLAT effectively "unsplices" the mRNA onto the genome, giving a single alignment that uses each base of the mRNA only once and positions the splice sites correctly.
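The seed-and-extend idea, and the stitching of HSPs across an intron, can be illustrated with a toy sketch. It is a deliberate simplification, not BLAT's actual algorithm: seeds are exact k-mer matches, extension stops at the first mismatch (real tools score mismatches and gaps), and HSPs are chained greedily by query position. All function names, the default k, and the example sequences are hypothetical.

```python
from collections import defaultdict


def build_index(target, k):
    """Map every k-mer in the target (e.g. genomic DNA) to the positions where it occurs."""
    index = defaultdict(list)
    for i in range(len(target) - k + 1):
        index[target[i:i + k]].append(i)
    return index


def extend(query, target, q, t, k):
    """Extend an exact k-mer seed left and right while bases keep matching.

    Returns an HSP as (query_start, target_start, length).
    """
    left = 0
    while (q - left - 1 >= 0 and t - left - 1 >= 0
           and query[q - left - 1] == target[t - left - 1]):
        left += 1
    right = 0
    while (q + k + right < len(query) and t + k + right < len(target)
           and query[q + k + right] == target[t + k + right]):
        right += 1
    return (q - left, t - left, left + k + right)


def find_hsps(query, target, k=11):
    """Seed with shared k-mers, then extend each seed into an HSP."""
    index = build_index(target, k)
    hsps = set()  # seeds inside the same matching run extend to the same HSP
    for q in range(len(query) - k + 1):
        for t in index.get(query[q:q + k], []):
            hsps.add(extend(query, target, q, t, k))
    return sorted(hsps)


def stitch(hsps):
    """Greedily chain HSPs that start after the previous one ends in both sequences.

    A large jump in target position between chained HSPs corresponds to an intron,
    and no query base is used twice.
    """
    chain = []
    for q, t, length in sorted(hsps):
        if not chain:
            chain.append((q, t, length))
            continue
        pq, pt, plen = chain[-1]
        if q >= pq + plen and t >= pt + plen:
            chain.append((q, t, length))
    return chain


# Toy example: an "mRNA" made of two exons, aligned to a "genome" containing an intron.
exon1, exon2 = "ATGGCGTACCTAGGAC", "ATCCGTTGACAGTTCA"
genome = "C" * 8 + exon1 + "GT" + "T" * 20 + "AG" + exon2 + "C" * 8
mrna = exon1 + exon2
print(stitch(find_hsps(mrna, genome, k=8)))
# -> [(0, 8, 16), (16, 48, 16)]
```

The two 16-base HSPs cover the whole mRNA once each, and the 24-base jump in genome coordinates between them is the intron.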
More information about BLAT can be found at:
Kent, W.J. (2002). BLAT-The BLAST-Like Alignment Tool. Genome Research 12(4):656-664.
http://www.genome.org/cgi/content/full/12/4/656
The WormBase site provides BLAST/BLAT searches directed specifically at C. elegans DNA or protein sequences; positive hits are hyperlinked to their information in WormBase. The query (search) sequence can be either DNA or protein, and can either come from WormBase itself or be supplied by the user. WormBase supplies the sequence automatically when a BLAST search is requested for a given sequence by clicking "BLAST against WormPep/Elegans genome".
The following is an example of the "Blast" button on Sequence Report pages:
Returning to the Blast Search page: query sequences must be either completely plain (nothing but one-letter residue codes) or in FASTA format. A FASTA-format sequence starts with a single header line of the form:
>Sequence_name
[plain residues from here on]
[end of text]
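A minimal sketch of how a query could be checked against these two accepted formats before submission. The function name `parse_query` is hypothetical, and accepting only ASCII letters is a simplification of "one-letter residues" (it rejects, for example, `*` and `-`).

```python
def parse_query(text):
    """Return (name, sequence) for a plain or single-record FASTA query.

    name is None for plain input; whitespace inside the sequence is ignored.
    """
    lines = [line.strip() for line in text.strip().splitlines() if line.strip()]
    if not lines:
        raise ValueError("empty query")
    name = None
    if lines[0].startswith(">"):
        name = lines[0][1:].strip() or None
        lines = lines[1:]
    seq = "".join("".join(line.split()) for line in lines).upper()
    # Only one-letter residue codes are allowed; a second '>' header also fails here.
    if not seq or not (seq.isascii() and seq.isalpha()):
        raise ValueError("sequence must contain only one-letter residue codes")
    return name, seq


print(parse_query(">my_gene\nATGGCG\nTACCTA\n"))  # -> ('my_gene', 'ATGGCGTACCTA')
print(parse_query("mktay iakqr"))                 # -> (None, 'MKTAYIAKQR')
```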
Keep in mind that only nucleotide sequences can be used as queries for BLAT searches.
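Because BLAT accepts only nucleotide queries, a client-side check could flag protein input early. The sketch below is a heuristic, not WormBase's actual check: it counts the fraction of A, C, G, T, U and N characters, and the 90% cutoff (`min_fraction`) is an arbitrary assumption. Short protein sequences rich in alanine, cysteine, glycine, threonine or asparagine could be misclassified.

```python
NUCLEOTIDE_CODES = frozenset("ACGTUN")


def looks_like_nucleotide(seq, min_fraction=0.9):
    """Heuristic: True if at least min_fraction of residues are A, C, G, T, U or N."""
    seq = seq.upper()
    if not seq:
        return False
    n_nuc = sum(1 for residue in seq if residue in NUCLEOTIDE_CODES)
    return n_nuc / len(seq) >= min_fraction


print(looks_like_nucleotide("ATGGCGTACCTA"))  # -> True
print(looks_like_nucleotide("MKTAYIAKQR"))    # -> False
```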
The following is an example of a BLAST input.
This is very important but easily forgotten: the number entered for Expected Threshold controls which hits are displayed in the result. As noted above, BLAST provides quantitative estimates of how likely a given hit is to have arisen purely by chance, and these E-value scores are better (from the standpoint of someone looking for a new, nontrivial sequence similarity) the smaller they are. The WormBase BLAST interface shows only the subset of BLAST hits whose E values are less than the given threshold. Note that, because of the default threshold, many hits may *not* be shown even though they have significant E values (e.g., E=0.002) or weak but possibly interesting ones (e.g., E=0.1). When a BLAST search seems to 'fail' to find any hits, it is therefore worth checking whether there are in fact hits with E values greater than 0.001.
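The effect of the Expected Threshold can be sketched as a simple filter. Hits are modelled here as hypothetical (name, E value) pairs with made-up names, and the 0.001 default is inferred from the text rather than taken from the WormBase code.

```python
def split_by_threshold(hits, threshold=0.001):
    """Split (name, e_value) hits into displayed (E < threshold) and hidden ones."""
    shown = [hit for hit in hits if hit[1] < threshold]
    hidden = [hit for hit in hits if hit[1] >= threshold]
    return shown, hidden


hits = [("hit_A", 1e-30), ("hit_B", 0.002), ("hit_C", 0.1)]  # made-up example hits
shown, hidden = split_by_threshold(hits)
print(shown)   # -> [('hit_A', 1e-30)]
print(hidden)  # -> [('hit_B', 0.002), ('hit_C', 0.1)]
```

If `shown` came back empty while `hidden` did not, that would be the apparent "failure" described above; rerunning with a larger threshold, such as 1, would bring hit_B and hit_C into view.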
The name "BLAST" is generically used to describe several different varieties of search programs that vary depending on the type of query (search) sequence and the type of database being searched:
Where protein and DNA are compared, the DNA sequence is translated in all six reading frames (three on each strand) and the translations are compared with the protein.
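Six-frame translation can be sketched in a few lines using the standard genetic code (NCBI translation table 1). This is an illustration, not the code BLAST uses: codons containing characters other than A, C, G or T become `X`, incomplete trailing codons are dropped, and the frame labels (`+1` to `-3`) are a naming convention chosen for this sketch.

```python
from itertools import product

# Standard genetic code (NCBI table 1), codons enumerated in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa
               for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna):
    return dna.translate(COMPLEMENT)[::-1]


def translate(dna):
    """Translate codon by codon; '*' marks a stop, 'X' an unrecognised codon."""
    return "".join(CODON_TABLE.get(dna[i:i + 3], "X")
                   for i in range(0, len(dna) - 2, 3))


def six_frame(dna):
    """Translate frames +1..+3 (forward strand) and -1..-3 (reverse complement)."""
    dna = dna.upper()
    frames = {}
    for strand, seq in (("+", dna), ("-", reverse_complement(dna))):
        for offset in range(3):
            frames[f"{strand}{offset + 1}"] = translate(seq[offset:])
    return frames


print(six_frame("ATGGCC"))
# -> {'+1': 'MA', '+2': 'W', '+3': 'G', '-1': 'GH', '-2': 'A', '-3': 'P'}
```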
There are three built-in databases that BLAST on WormBase can search. These complement the various versions of BLAST/BLAT search that WormBase offers:
In a typical BLAST/BLAT search, only a small number of hits will be of interest, for example when one is searching for the single C. elegans ortholog of a given gene. On the other hand, many hits can be both significant and important (e.g., when examining a diverse protein family). On the WormBase BLAST/BLAT server, the default maximum number of hits shown is 20, but this can be set to a smaller or larger value, or to show all hits.
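Limiting the display to the best hits can be sketched with the standard library's `heapq.nsmallest`. Hits are again hypothetical (name, E value) pairs, and using `None` to mean "show all hits" is a convention of this sketch, not of the WormBase form.

```python
import heapq


def top_hits(hits, max_hits=20):
    """Return the max_hits hits with the smallest E values, best first.

    max_hits=None returns every hit, sorted by E value.
    """
    if max_hits is None:
        return sorted(hits, key=lambda hit: hit[1])
    return heapq.nsmallest(max_hits, hits, key=lambda hit: hit[1])


hits = [("hit_A", 0.5), ("hit_B", 1e-10), ("hit_C", 0.01)]  # made-up example hits
print(top_hits(hits, max_hits=2))  # -> [('hit_B', 1e-10), ('hit_C', 0.01)]
```

`nsmallest` avoids sorting the full list when only a few of many hits are wanted.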
There are two somewhat different versions of the BLAST search software. A publicly available open-source version, reasonably easy to compile, is available from the NCBI:
ftp://ftp.ncbi.nlm.nih.gov/blast
The version used in WormBase, however, is a closed-source version (WU-BLAST) provided by Washington University under a free academic license. It was chosen for WormBase because it is thought to be superior to NCBI BLAST.
Page maintained by Wen J. Chen. Documentation by Igor Antoshechkin. Graphics by Wen J. Chen.
Send comments or questions to WormBase.