An Introduction to Bioinformatics
Bioinformatics is covered on the Developmental Biology website because it is not something you can easily learn from print. Fortunately, numerous online tutorials will help you become familiar with the tools now publicly available for accessing and comparing nucleic acid and protein sequences. I hope you have a few hours and a fast modem, because you'll be visiting several sites. In many cases, I've taken material from these sites and summarized it here.
What is bioinformatics?
(after http://www.bioplanet.com/whatis.html)
Bioinformatics is the application of computer technology to the management of biological information. Computers are used to gather, store, analyze, and integrate biological and genetic information that can then be applied to developmental biology, evolutionary biology, or gene-based drug discovery. The need for bioinformatics has been driven by the explosion of publicly available genomic information resulting from the Human Genome Project and from the accumulated data of thousands of individual researchers, who have isolated and published sequences from numerous organisms. Just as every developmental genetics laboratory is now expected to know how to clone DNA, so bioinformatic knowledge has become essential for any laboratory in developmental genetics.
Bioinformatics has become an outstanding tool in developmental biology. If science is based on knowledge becoming part of a public resource that all investigators can share, then bioinformatics has to be credited with dissolving some of the barriers that separate our laboratories. Through the World Wide Web, developmental biologists not only have access to gene sequences and other information critical to their research; they also have an incredible teaching tool through which movies can be downloaded to one's own computer and library searches can be done even if one's own library does not carry the journal. Knowing the Net's resources and using them creatively is becoming an important skill for all developmental biologists.
This site will introduce three areas of bioinformatics:
I. Sequence Homology Searches
II. Microarray Technology
III. Specific Databases
Today, researchers in molecular biology often analyze genetic data resulting from sequencing projects. Many of these tasks can easily be solved using Internet-based tools on the World Wide Web, without having to worry about local installation, maintenance, or cost. These tools, offered by different institutions around the world, are becoming indispensable for daily sequence analysis work. Thus, questions often arise:
How do I detect sequence homologies?
How do I derive functions from possible homologies?
How do I find literature references to homologous sequences?
What is the best tool for reconstructing phylogenies?
How do I visualize secondary structure of RNA molecules?
Which PCR primer should I use?
What is the most adequate tool for designing primers?
How do I operate WWW forms in general?
This site will explain three basic areas of bioinformatics (sequence homology search engines, microarrays, and specific developmental databases) and will link you to online tutorials designed to introduce students to these technologies.
I. SEQUENCE HOMOLOGY SEARCH ENGINES
How to analyze sequences: BLAST searches
Say you have just cloned a DNA sequence or have just isolated a particular peptide and determined a portion of its amino acid sequence. You would next want to know the identity of the sequence. One of the most important ways to find the identity of your sequence (or of anything closely related to it) is to do a BLAST (Basic Local Alignment Search Tool) search. As the name indicates, BLAST programs take pieces of your sequence and try to find other sequences in the database that are identical or similar to them. They then give you a list of all the known sequences that share sequence homologies with portions of your sequence and show you where the sequences differ. BLAST is a set of similarity search programs designed to explore all of the available sequence databases, regardless of whether the query is protein or DNA. The BLAST programs have been designed for speed, with a minimal sacrifice of sensitivity to distant sequence relationships. The scores assigned in a BLAST search have a well-defined statistical interpretation, making real matches easier to distinguish from random background hits. BLAST uses a heuristic algorithm that seeks local rather than global alignments and is therefore able to detect relationships among sequences that share only isolated regions of similarity.
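To make the idea of a local (as opposed to global) alignment concrete, here is a minimal sketch in Python. It is not BLAST itself: BLAST uses a fast heuristic, whereas this sketch uses the slower, exact Smith-Waterman dynamic programming method that BLAST approximates. The function name and the scoring values (match, mismatch, gap) are assumptions chosen for illustration, not BLAST's defaults.

```python
def smith_waterman(a, b, match=2, mismatch=-1, gap=-2):
    """Return (best_score, aligned_a, aligned_b) for the best local alignment.

    Illustrative scoring only: +2 for a match, -1 for a mismatch,
    -2 for each gapped position (assumed values, not BLAST defaults).
    """
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]
    best, best_pos = 0, (0, 0)

    # Fill the score matrix. Clamping at 0 is what makes the alignment
    # local: a poorly matching region resets the score instead of
    # dragging down a good region elsewhere.
    for i in range(1, rows):
        for j in range(1, cols):
            pair = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(
                0,
                score[i - 1][j - 1] + pair,  # align a[i-1] with b[j-1]
                score[i - 1][j] + gap,       # gap in b
                score[i][j - 1] + gap,       # gap in a
            )
            if score[i][j] > best:
                best, best_pos = score[i][j], (i, j)

    # Trace back from the highest-scoring cell until the score drops to 0.
    i, j = best_pos
    out_a, out_b = [], []
    while i > 0 and j > 0 and score[i][j] > 0:
        pair = match if a[i - 1] == b[j - 1] else mismatch
        if score[i][j] == score[i - 1][j - 1] + pair:
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return best, "".join(reversed(out_a)), "".join(reversed(out_b))


# Two made-up sequences that share only an isolated region (GATTACA).
result = smith_waterman("CCCCGATTACACCCC", "TTTGATTACATTT")
print(result)  # (14, 'GATTACA', 'GATTACA')
```

The two sequences are unrelated outside the shared stretch, so a global alignment would score poorly, while the local alignment recovers the common region cleanly. This is why a local method can detect sequences that share only isolated regions of similarity.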
There are two tutorials in bioinformatics that you should now use. The first is the simple BLAST tutorial located under the Education link in the sidebar of the NCBI home page:
If this tutorial is offputting, check out:
Anyone planning to keep up with bioinformatics should bookmark the National Center for Biotechnology Information website at http://www.ncbi.nlm.nih.gov. News of bioinformatic relevance can be accessed at http://genomics.phrma.org/.
Also, the first issue of Nucleic Acids Research each January contains a yearly update on public domain bioinformatics sites.
II. MICROARRAY TECHNOLOGY
Often one wishes to compare gene expression between two or more cells or stages of embryonic development. Indeed, the basic notion of differential gene expression suggests that one of the most important ways to understand development is to analyze changes in gene expression from one time to another and from one tissue to another. However, traditional methods in molecular biology generally work on a "one gene in one experiment" basis, which means that the throughput is very limited and the "whole picture" of gene function is hard to obtain. In the past several years, a new technology, the DNA microarray, has attracted tremendous interest among biologists. This technology promises to monitor the whole genome on a single chip so that researchers can get a better picture of the interactions among thousands of genes simultaneously. Microarray technology provides a tool that can potentially identify and quantify levels of expression for all the genes in an organism.
Microarrays are ordered sets of small DNA spots fixed to glass slides or nylon membranes. They are constructed using cDNAs (cDNA arrays), genomic sequences, or synthetic oligonucleotides. The DNA spots are 200 microns or less in size. A slide typically contains thousands of genes, represented by specific PCR-amplified DNA sequences ("immobile phase DNA") or thousands of oligonucleotide sequences. In the case of cDNA arrays, levels of gene expression are measured using a preparation of fluorescently labeled nucleic acid ("mobile phase") from a specific cell type or tissue, where the sequences in the preparation reflect the specificity and relative abundance of the expressed genes. The labeled mobile phase cDNAs are incubated with (hybridized to) the immobile phase cDNAs in the microarray to allow specific interaction of the labeled sequences with their homologous target DNAs.
Terms that have been used in the literature to describe this technology include, but are not limited to: biochip, DNA chip, DNA microarray, and gene array. Affymetrix, Inc. owns a registered trademark, GeneChip®, which refers to its high-density, oligonucleotide-based DNA arrays. However, in some articles appearing in professional journals, popular magazines, and the WWW, the term "gene chip(s)" has been used as a general term for microarray technology. Affymetrix strongly opposes such usage of the term "gene chip(s)". More recently, Leming Shi has proposed the term "genome chip", indicating that this technology is meant to monitor the whole genome on a single chip. The term "genome chip" would also include the increasingly important and feasible protein chip technology.
The basic principle behind nucleic acid-based genome chips is nucleic acid hybridization. In one sense, genome chips are extensions of Northern blots or dot blots. An array is an orderly arrangement of samples. It provides a medium for matching known and unknown DNA samples based on base-pairing rules and for automating the process of identifying the unknowns. An array experiment can make use of common assay systems, such as microplates or standard blotting membranes, and the samples can be deposited by hand or by robotics. In general, arrays are described as macroarrays or microarrays, the difference being the size of the sample spots. Macroarrays contain sample spots of about 300 microns or larger and can be easily imaged by existing gel and blot scanners. The sample spots in microarrays are typically less than 200 microns in diameter, and these arrays usually contain thousands of spots. Microarrays require specialized robotics and imaging equipment that generally are not commercially available as a complete system.
DNA microarrays, or DNA chips, are fabricated by high-speed robotics, generally on glass but sometimes on nylon substrates. Probes of known identity are fixed to the substrate and used to detect complementary binding, thus allowing massively parallel gene expression and gene discovery studies. An experiment with a single DNA chip can provide researchers with information on thousands of genes simultaneously, a dramatic increase in throughput. (Note: In the literature there are at least two confusing nomenclature systems for referring to hybridization partners. Both use the same terms: "probes" and "targets". According to the nomenclature recommended by B. Phimister (Nature Genetics 21: 1-60), a "probe" is the tethered nucleic acid with known sequence, whereas a "target" is the free nucleic acid sample whose identity/abundance is being detected.)
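The base-pairing rule behind hybridization can be sketched in a few lines of Python. This is a deliberately simplified illustration using the Phimister nomenclature (tethered "probe", free "target"): it only tests for an exact reverse-complement match, whereas real hybridization tolerates some mismatches and depends on temperature and salt conditions. The function names and the sequences are hypothetical.

```python
# Watson-Crick base-pairing rules for DNA.
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(seq):
    """Return the strand that would base-pair with seq, read 5' to 3'."""
    return "".join(COMPLEMENT[base] for base in reversed(seq.upper()))


def hybridizes(probe, target):
    """True if the free target contains a perfect complement of the probe.

    Simplifying assumption: only exact matches count as binding.
    """
    return reverse_complement(probe) in target.upper()


# Hypothetical sequences, for illustration only.
probe = "ATGGCC"                          # tethered to the chip
print(reverse_complement(probe))          # GGCCAT
print(hybridizes(probe, "TTGGCCATAA"))    # True: target contains GGCCAT
print(hybridizes(probe, "TTTTTTTTTT"))    # False
```

On a real chip, thousands of such probes sit at known positions, so the position of a bound, labeled target identifies which sequence was present in the sample.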
cDNA-based microarray technology relies on robotically printing cDNA clone inserts onto a glass slide and then hybridizing the slide to two probes labeled with different fluorescent dyes. (Here "probes" means the labeled cDNAs in solution, which the Phimister nomenclature calls "targets".) The probes are pools of cDNAs generated from mRNA isolated from cells or tissues in the two states that one wishes to compare. The resulting fluorescence intensities are measured using a laser confocal fluorescence microscope, and ratio information is obtained following image processing. For more information and protocols, see:
In developmental biology, DNA microarrays have already been used to:
1. compare gene expression for the entire Drosophila larva during metamorphosis: White KP, Rifkin SA, Hurban P, Hogness DS. 1999. Microarray analysis of Drosophila development during metamorphosis. Science 286: 2179-84.
2. look at which genes are upregulated or downregulated when embryonic stem cells are caused to differentiate in culture: Kelly DL, and Rizzino A. 2000. DNA microarray analyses of genes regulated during the differentiation of embryonic stem cells. Mol Reprod Dev. 56: 113-123.
3. compare the gene expression in activated vs. unactivated B and T cells: Ollila J, and Vihinen M. 1998. Stimulation of B and T cells activates expression of transcription and differentiation factors. Biochem Biophys Res Commun. 249: 475-480.
For two very well-written introductions to the steps involved in a microarray experiment, visit Jeremy Buhler's Anatomy of a Comparative Gene Expression Study at http://www.cs.washington.edu/homes/jbuhler/research/array/ and the presentation at http://industry.ebi.ac.uk/~brazma/Data-mining/Biovis/biovis-pres-yeast/sld001.htm
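The "ratio information" obtained from a two-color cDNA array, as described earlier in this section, can be sketched as follows. This is a minimal illustration, assuming that image processing has already produced one intensity per spot for each of the two labeled samples. The function name, the gene names, the intensity values, and the background and floor parameters are all hypothetical, and steps such as normalization between the two dyes are omitted.

```python
import math


def log2_ratios(spots, background=0.0, floor=1.0):
    """Convert per-spot intensity pairs into log2 expression ratios.

    spots: dict mapping gene name -> (intensity_a, intensity_b), one
           value per fluorescent label (hypothetical input format).
    background: constant subtracted from every intensity (assumption;
                real pipelines usually estimate it per spot).
    floor: minimum allowed intensity, to avoid dividing by zero or
           taking the log of a non-positive number.
    """
    ratios = {}
    for gene, (a, b) in spots.items():
        a = max(a - background, floor)
        b = max(b - background, floor)
        # Positive: higher in sample A. Negative: higher in sample B.
        ratios[gene] = math.log2(a / b)
    return ratios


# Made-up intensities for three spots.
spots = {"geneA": (800, 200), "geneB": (150, 150), "geneC": (100, 400)}
print(log2_ratios(spots))
# {'geneA': 2.0, 'geneB': 0.0, 'geneC': -2.0}
```

Logarithms are used so that a fourfold increase (+2) and a fourfold decrease (-2) are symmetric around zero, which makes upregulated and downregulated genes equally easy to see.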
III. SPECIFIC DATABASES
There are numerous ways of organizing biological data. While most databases organize their data by gene sequence, sometimes one wants to look at the development of a specific organ or organism. To facilitate this, many researchers have created databases based on their favorite research organisms, cells, molecules, or organs. New ones are being created each week; some particularly well-designed ones are listed here:
For the development of particular organs, take a look at
For the development of entire organisms and their genes, there are numerous sites. The C. elegans community has pioneered the global laboratory, and their main homepage can be reached at http://www.wormbase.org. Drosophila development has several outstanding sites, including FlyBase (http://flybase.bio.indiana.edu) and the Interactive Fly (http://www.sdbonline.org/fly/aimain/1aahome.htm). Mouse gene expression patterns can be found at http://teaninich.hgu.mrc.ac.uk/
Even molecules get their own sites. For integrins, check out the integrin family website at http://www.life.uiuc.edu.csb.integrins/index1.html. The Wnt family of paracrine factors and their receptors has a beautiful family album at http://www.stanford.edu/~rnusse/wntwindow.html.
In addition, there are websites that maintain updated lists of developmental biology links. Here are the servers that will connect you to websites on developing organisms, the laboratories that study them, the journals that publish about them, and even some laboratory exercises you can do online.