A tutorial on the ZeneMark database
This tutorial guides you through using the BLAST server at this website
(http://zenemark.znomics.com/) to locate the position of your gene in
the zebrafish genome and then to find out whether any mutations in your
gene are available.
Suppose that you are interested in mutations in tumor protein p53
(tumor suppressor p53), whose gene is tp53. Several steps are needed to
locate the position of tp53 in the zebrafish genome in the browser
window of the ZeneMark database:
Step 1. Obtain the mRNA sequence of tp53 for zebrafish. To do this,
point your web browser to the NCBI home page at
"http://www.ncbi.nlm.nih.gov/" and enter the search term "tp53 AND
Danio". Set the database to search to "Nucleotide" and then
click the "Go" button as shown below:
This window will show up after the search is done:
Click on the first link (NM_131327). You will be directed to the page
shown below:
Note that "Display" is set to "GenBank" by default. Using the pull-down
menu, set it to "FASTA". The page will refresh automatically and the
mRNA sequence for tp53 will be displayed in this window:
Copy the mRNA sequence to the clipboard. We now have the mRNA sequence
for tp53, which will be used as the query sequence in the next step.
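
As an alternative to copying the sequence by hand, the same FASTA record can in principle be requested through NCBI's E-utilities "efetch" service. The sketch below is not part of the ZeneMark workflow; it only illustrates the idea. The function names build_efetch_url and fetch_fasta are made up for this example, and the download function needs network access, so it is defined but not called here.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

# Base address of the NCBI E-utilities efetch service.
EFETCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def build_efetch_url(accession: str) -> str:
    """Build an efetch URL that returns a nucleotide record as FASTA text."""
    params = {
        "db": "nuccore",     # the "Nucleotide" database
        "id": accession,     # e.g. NM_131327, the accession found in Step 1
        "rettype": "fasta",  # same effect as choosing "FASTA" in the Display menu
        "retmode": "text",   # plain text rather than XML
    }
    return EFETCH_URL + "?" + urlencode(params)


def fetch_fasta(accession: str) -> str:
    """Download the FASTA record (illustrative helper; requires network access)."""
    with urlopen(build_efetch_url(accession), timeout=30) as response:
        return response.read().decode("utf-8")


# Print the request URL for the tp53 mRNA used in this tutorial.
print(build_efetch_url("NM_131327"))
```

Calling fetch_fasta("NM_131327") should return the text that you would otherwise copy from the NCBI page, ready to paste into the BLAST form in Step 2.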
Step 2. Run a BLAST search on the ZeneMark database website. Point your
web browser to "http://zenemark.znomics.com/" if you have not opened
this page yet. In the left sidebar, find the link that says "Run a
BLAST search" and click on it.
A BLAST search page will open as shown below:
Now paste the mRNA sequence from the clipboard into the text input
window as shown below and then click the red "Run" button at the very
bottom of this page:
The BLAST program now starts running. The search takes about 10
seconds, depending on the server load. When it finishes, you should
see the page below:
The upper portion of this page shows that the best hit is found on
chromosome 5, as indicated by the boxed triangle. Now scroll down to see
the bottom portion of this page:
The bottom panel of this page shows the details of the BLAST results.
On this page the results are ordered by alignment score; your display
may vary. You can customize the display by selecting or deselecting
options in the nine display-option pull-down menus. The first column,
[A][S][G][C], contains hyperlinks for viewing the BLAST alignment [A],
the query sequence [S], the query sequence in its genomic context [G],
or the BLAST hits in the "ContigView" window [C]. Clicking on any of
these pops up a new window showing the corresponding content. Now click
on the [C] link in the first row. You should see the page below:
This page shows only the first two exons of tp53. To view the entire
gene, click the zoom button next to the blue zoom button (the fourth
one from the plus sign). Now you should see this page:
This page shows the entire tp53 gene in the right half of the
"Detailed view" panel. Now center the gene by clicking on a blank area
in the middle of the gene, as shown below. A pop-up menu will appear.
Select the "Centre" option. The page will refresh and display the tp53
gene at the center of the window:
Now look at the purple triangles above the thick blue bar (DNA contigs).
These mark the locations of retroviral insertions in the zebrafish
genome. This track is labeled "ZeneMarker" on the left side of the
"Detailed view" panel. We can see that there are 8 insertions in the
entire tp53 gene.
Now mouse over the first purple triangle (the leftmost one). You should
see that this zenemarker has an ID of "ZM_00057254". This is the ID you
should use to refer to this insertion. We can tell that the first
insertion is in the seventh intron of tp53. (The tp53 gene is mapped to
the bottom strand of chromosome 5. By convention, genes mapped to the
top strand are displayed above the thick blue contig bar and genes
mapped to the bottom strand are displayed below it.) To see the
locations of the insertions better, you can zoom in on the display
region and recenter the region of interest using the technique
described above. Let us enlarge the area that the second and third
triangles point to. First, put the area of interest at the center of
the "Detailed view" panel. Then click the second zoom button (counting
from the plus sign on the left) to enlarge the area. You should see a
display window like the one below:
This picture clearly indicates that
the second insertion is in an exon (the fifth exon) of tp53 and the
third insertion is in the fourth intron.
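
The reading of the picture above (which exon or intron an insertion falls in, with numbering that follows the gene's strand) can also be sketched in code. This is only an illustration: the exon coordinates below are invented, not the real tp53 exon boundaries, and classify_position is a hypothetical helper rather than anything provided by the ZeneMark database.

```python
from typing import List, Tuple


def classify_position(pos: int, exons: List[Tuple[int, int]], strand: str) -> str:
    """Return 'exon N', 'intron N', or 'outside gene' for a genomic position.

    exons  : (start, end) pairs in genomic coordinates, both ends inclusive.
    strand : '+' for the top strand, '-' for the bottom strand.
    """
    # Number exons in the direction of transcription: ascending coordinates
    # on the top strand, descending coordinates on the bottom strand.
    ordered = sorted(exons, reverse=(strand == "-"))

    for number, (start, end) in enumerate(ordered, start=1):
        if start <= pos <= end:
            return f"exon {number}"

    # Intron N is the gap between exon N and exon N + 1.
    for number in range(1, len(ordered)):
        first, second = ordered[number - 1], ordered[number]
        gap_start = min(first[1], second[1]) + 1
        gap_end = max(first[0], second[0]) - 1
        if gap_start <= pos <= gap_end:
            return f"intron {number}"

    return "outside gene"


# Invented coordinates for a three-exon gene on the bottom strand.
EXAMPLE_EXONS = [(100, 200), (300, 400), (500, 600)]

print(classify_position(350, EXAMPLE_EXONS, "-"))  # inside the middle exon
print(classify_position(450, EXAMPLE_EXONS, "-"))  # between the two rightmost exons
```

Because the example gene is on the bottom strand, the exon with the highest coordinates is exon 1, which mirrors how tp53 is numbered from right to left in the "Detailed view" panel.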
Step 3. We have identified an insertion in an exon of the tp53 gene.
You may be interested in the exact base position in the zebrafish
genome where this virus landed. To find out, expand the "Basepair view"
panel right below the "Detailed view" panel if there is a plus sign
right before the "Basepair view" label. You should now see a triangle
in the "ZeneMarker" track and the six-frame protein translations. The
tracks right above and right below the blue bar show the nucleotide
sequences. To view the sequences better, use the zoom-in button until
the letters for the bases are clearly visible. Below is a snapshot of
the window after zooming the small window in to display only 50 bp of
sequence:
Now you can clearly see that the
virus is inserted after the "GAACGGG" and before the "GCAAAGT" in the
nucleotide sequence.
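
If you have the surrounding genomic sequence as text, the same junction can be located programmatically by searching for the two flanks side by side. The sketch below assumes a short window of sequence; only "GAACGGG" and "GCAAAGT" come from the tutorial, while the padding bases around them and the helper name find_insertion_offset are invented for illustration.

```python
def find_insertion_offset(sequence: str, left_flank: str, right_flank: str) -> int:
    """Return the 0-based offset at which the insertion splits the sequence.

    The offset is the index of the first base of right_flank, so the
    insertion lies between sequence[offset - 1] and sequence[offset].
    Returns -1 if the two flanks do not occur next to each other.
    """
    junction = (left_flank + right_flank).upper()
    start = sequence.upper().find(junction)
    if start == -1:
        return -1
    return start + len(left_flank)


# Invented padding ("ACGT" and "TTCA") around the two flanks from the tutorial.
window = "ACGT" + "GAACGGG" + "GCAAAGT" + "TTCA"

offset = find_insertion_offset(window, "GAACGGG", "GCAAAGT")
print(window[:offset] + " | " + window[offset:])  # "|" marks the insertion site
```

Adding the offset to the genomic start coordinate of the window would give the base position of the insertion on the chromosome.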
Step 4. Having seen the locations of the insertions, you might be
interested in the actual sequences used to map the insertions. Perhaps
you also want to check for yourself how well a given zenemarker
sequence maps to a given location, or whether a zenemarker sequence can
be mapped to several locations. You can follow these steps to do so:
First, mouse over an insertion mark (a purple triangle) of interest
and then click on it. For example, click on the second triangle and
you will see a small pop-up menu as shown below:
This pop-up menu shows the ID of the zenemarker you just clicked,
followed by the "Details" and "Sequence" options. Now select the
"Details" option. You should see a page like the one below (since this
page does not fit into a single window, it is split into two):
This page shows that ZM_00101395 is mapped to chromosome 5, as
indicated by the red triangle. If you scroll down this page, you can
see a "Feature Information" table. In this case, the table has a single
row. This means ZM_00101395 is unambiguously mapped to chromosome 5 at
base position 16162422. Occasionally, you may see more than one row in
the "Feature Information" table. This means the zenemarker was mapped
to multiple locations with equally good matches. This can happen
because the zebrafish genome contains tandem repeats. Alternatively, it
can be caused by genome sequence assembly errors. Next, click on the
same insertion mark again and select "Sequence". You should see a page
like the one below:
This page shows the actual viral flanking sequence used for the
mapping. By convention, we put the host-virus boundary at the 3' end of
the sequence. This sequence flanks the 3' end of the virus, as
indicated in the FASTA defline. We tried to sequence both the 5' and
the 3' sequences flanking the virus insertion sites, but the sequence
on the other side may not always be available.
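
The flanking sequence is shown as a FASTA record: a defline starting with ">" followed by the sequence itself. The sketch below shows one way to split such a record and pick out the base next to the host-virus boundary, relying on the convention above that the boundary is at the 3' end. The record used here is a made-up placeholder, not the real ZM_00101395 sequence, and parse_fasta_record is a hypothetical helper.

```python
from typing import Tuple


def parse_fasta_record(text: str) -> Tuple[str, str]:
    """Split a single FASTA record into (defline without '>', sequence)."""
    lines = [line.strip() for line in text.strip().splitlines() if line.strip()]
    if not lines or not lines[0].startswith(">"):
        raise ValueError("a FASTA record must start with a '>' defline")
    defline = lines[0][1:].strip()
    # The sequence may be wrapped over several lines; join them into one string.
    sequence = "".join(lines[1:]).upper()
    return defline, sequence


# Placeholder record for illustration only (not a real zenemarker sequence).
example_record = """>ZM_EXAMPLE hypothetical 3' flanking sequence
acgtacgtac
gtacgttgca
"""

defline, sequence = parse_fasta_record(example_record)
print(defline)
print(len(sequence), "bp; base next to the host-virus boundary:", sequence[-1])
```

Keeping the defline together with the sequence matters when you paste the record into the BLAST form, because the defline carries the zenemarker ID and the side of the virus that the sequence flanks.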
Now, to confirm this insertion or map it yourself, copy this sequence
(both the first line and the second line in this small window). Go back
to the ZeneMark database home page and click the "Run a BLAST search"
link (you have done this at the beginning of this tutorial). Paste the
sequence into the query sequence input window, then click the "Run"
button. You should see the page below after the BLAST search is done
(again, this page is split into two):
The top window on the page shows that the insertion is mapped to
chromosome 5. The bottom window shows the alignment summary. Note that
there is just one row in the alignment summary table, indicating that
this zenemarker is uniquely mapped to the zebrafish genome. To inspect
the alignment, first click the [S] link to view the query sequence. You
should see a page like this:
Note that all the bases in the query sequence are aligned: all of them
are highlighted in red. Also note that the query sequence is 51 bp in
length. Next, click the [A] link to view the actual alignment:
This "BlastView" page shows that the query sequence matches the target
(the zebrafish genomic sequence) perfectly. Next, click on the [C] link
to go to the "ContigView" window to view the location of the insertion
on the fish genome:
In this window, you can see that the BLAST hits are displayed as a new
track right below the "ZeneMarker" track. The red rectangle in the
"BLAST hits" track indicates the location of the query sequence on the
fish chromosome. Clearly, the 3' end of the query sequence aligns with
the tip of the first purple triangle (ZM_00101395). (Try centering this
region and using the zoom tool to get a better view.) This is where the
virus is inserted. We have now confirmed that there is an insertion
(ZM_00101395) in an exon of the tp53 gene, based on the alignment
quality of the flanking sequence on the fish genome and its unique
chromosomal location. At this point, users may contact us about this
insertion and we will further validate it experimentally.
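
The two criteria used in this step, a perfect alignment over the whole query and a single best location, can be written down as a small check. This is a sketch under assumptions: the Hit fields and the score values are invented for illustration and do not reflect the actual output format of the ZeneMark BLAST server. Only the query length of 51 bp and chromosome 5 come from the tutorial.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Hit:
    chromosome: str
    aligned_length: int  # number of query bases covered by the alignment
    mismatches: int      # mismatched bases within the alignment
    score: float         # alignment score (illustrative values below)


def best_hits(hits: List[Hit]) -> List[Hit]:
    """Return every hit that shares the highest score."""
    if not hits:
        return []
    top = max(hit.score for hit in hits)
    return [hit for hit in hits if hit.score == top]


def is_confirmed(hits: List[Hit], query_length: int) -> bool:
    """True when exactly one best hit exists and it matches the whole query."""
    top = best_hits(hits)
    if len(top) != 1:
        return False  # no hit at all, or several equally good locations
    hit = top[0]
    return hit.aligned_length == query_length and hit.mismatches == 0


# One perfect 51 bp hit on chromosome 5, as in the example above.
unique = [Hit("5", 51, 0, 100.0)]
# An invented ambiguous case: two equally good locations.
ambiguous = [Hit("5", 51, 0, 100.0), Hit("12", 51, 0, 100.0)]

print(is_confirmed(unique, 51))
print(is_confirmed(ambiguous, 51))
```

The ambiguous case corresponds to seeing more than one row with the same match in the "Feature Information" table, which can result from tandem repeats or assembly errors.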
Congratulations! You have just finished this tutorial and should now
feel comfortable using our ZeneMark database.