Cybertory™ Shotgun Sequencing Exercise


Part 1: Initial Fragment Assembly

The first part of the shotgun sequencing experiment has been done for you. Twenty-five clones were selected from a random-insert library and sequenced from both ends with universal primers. The resulting fifty sequence traces are in a zip file called “shotgun.zip”, together with a configuration file for running the Staden fragment assembly package.

  1. Unzip the shotgun.zip file.

  2. In the new "shotgun" directory, double click the file named "config.pg4". This will launch the Pregap4 program.

  3. Click the "Files to Process" tab.

  4. Click the "Add files" button.

  5. Find the shotgun directory. Change "Files of type..." to "SCF". Select all 50 SCF files (Control-A), and click "Open".

  6. Click the "Configure Modules" tab.

  7. Check that the following modules are selected (you can read the help documentation to see what these modules do):

    1. Estimate Base Accuracy

    2. Trace Format Conversion

    3. Initialize Experiment Files

    4. Augment Experiment Files

    5. Quality Clip

    6. Sequencing Vector Clip

    7. Screen For Unclipped Vector

    8. Gap4 shotgun assembly

  8. Check that the following values are set:

    1. Quality Clip : Clip mode: "by confidence", Window length 50, average confidence 25.

    2. Gap4 database name "HIV", database version "0", "Create new database".

  9. Click "Run". When it finishes, close Pregap4.

  10. Look in the "shotgun" directory again. You should find a new file named "HIV.0.aux": double click this file to launch Gap4.

  11. Two windows will open, the main Gap4 window and the "Contig Selector".

  12. In the main Gap4 window, select "View: Template Display". This opens a "Show templates" dialog box. Be sure the "all contigs" radio button is selected and the "Templates" and "Readings" checkboxes are checked. Click "OK" (see Figure 1: "original assembly").
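For readers who want to see what the "by confidence" quality clip set in step 8 does, here is a minimal Python sketch. It assumes a read is a list of per-base confidence values and keeps the stretch from the start of the first 50-base window whose average confidence reaches 25 to the end of the last such window. The function name quality_clip is hypothetical, and Pregap4's own clipping rule may differ in detail.

```python
def quality_clip(conf, window=50, min_avg=25):
    """Return the (start, end) range to keep, half-open, or None if no window passes.

    conf: per-base confidence values for one read (a list of ints).
    A window "passes" when its mean confidence is at least min_avg. This is a
    simplified stand-in for Pregap4's clipping, not its actual algorithm.
    """
    n = len(conf)
    if n < window:
        return None
    total = sum(conf[:window])  # running sum of the current window
    passing = []
    for i in range(n - window + 1):
        if i > 0:
            # slide the window one base right: add the new base, drop the old one
            total += conf[i + window - 1] - conf[i - 1]
        passing.append(total >= min_avg * window)  # compare sums, avoiding float division
    if True not in passing:
        return None
    first = passing.index(True)
    last = len(passing) - 1 - passing[::-1].index(True)
    return first, last + window


# Example: low-quality ends (confidence 10) around a good core (confidence 40).
read_conf = [10] * 30 + [40] * 100 + [10] * 30
print(quality_clip(read_conf))  # (5, 155)
```

The kept range reaches a few bases into the poor ends, because a window only needs a passing average; raising the threshold or shrinking the window makes the clip stricter.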


The graphical template display shows how the 50 sequencing "reads" from the 25 clones have been assembled into 7 contiguous blocks ("contigs"). The reads are drawn as arrows, and the lines between them are templates. Note that there are two reads per template: each template was sequenced from both ends using universal primers that read into the insert from the plasmid cloning vector. The two reads from opposite ends of the same template are called a "read pair".


All of our templates are about 1200 bases long, plus or minus about 200 bases, and the read pairs both read in from the ends of the template. We can rearrange the contigs so that the display of read pairs is consistent with these facts.
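These two facts (reads point inward, templates are about 1200 ± 200 bases) can be turned into a simple test of whether a read pair is drawn consistently. The sketch below uses a hypothetical model, not Gap4's internal data: each read has a start coordinate, a length, and a direction, where forward=True means the arrow points right on the display (not which primer was used). The names and coordinates are made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class Read:
    """One sequencing read placed on a contig (a simplified, hypothetical model)."""
    name: str
    start: int     # leftmost contig coordinate covered by the read
    length: int
    forward: bool  # True if the read's arrow points right, toward higher coordinates

    @property
    def end(self) -> int:
        return self.start + self.length  # one past the rightmost covered base


def pair_is_consistent(a: Read, b: Read, insert: int = 1200, tol: int = 200) -> bool:
    """Check that a read pair points inward and spans roughly one insert length."""
    left, right = sorted((a, b), key=lambda r: r.start)
    # Both reads must read in from the template ends, toward the middle.
    if not (left.forward and not right.forward):
        return False
    # Each read starts at a template end, so the pair spans the whole template.
    span = right.end - left.start
    return insert - tol <= span <= insert + tol


left_read = Read("example-p1t", start=0, length=450, forward=True)
right_read = Read("example-q1t", start=750, length=450, forward=False)
print(pair_is_consistent(left_read, right_read))  # True: arrows point inward, span is 1200
```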


Note that the two contigs at the right end of Figure 1 are pointing out from their templates, rather than in. RIGHT-CLICK on the contig lines at the bottom of the Templates Display window to bring up a context menu that lets you "Complement" the contig. Figure 2 ("complemented contigs") shows what they should look like when you're done.
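Complementing a contig reverse-complements its consensus and flips every read end for end, so each arrow changes direction and its position is mirrored. A minimal sketch, assuming reads are stored as hypothetical (name, start, length, forward) tuples in contig coordinates:

```python
# Complement table for DNA; gap/pad symbols ("-" and "*") are left unchanged.
# IUPAC ambiguity codes other than N are not handled in this sketch.
COMPLEMENT = str.maketrans("ACGTNacgtn-*", "TGCANtgcan-*")


def reverse_complement(seq: str) -> str:
    """Reverse-complement a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def complement_contig(consensus, reads):
    """Flip a contig end for end.

    consensus: the contig sequence.
    reads: list of (name, start, length, forward) tuples in contig coordinates,
           where forward=True means the read's arrow points right.
    Returns the reverse-complemented consensus and the re-placed reads.
    """
    n = len(consensus)
    flipped = [
        # a read covering [start, start+length) now covers [n-start-length, n-start)
        (name, n - (start + length), length, not forward)
        for name, start, length, forward in reads
    ]
    flipped.sort(key=lambda r: r[1])  # keep reads ordered left to right
    return reverse_complement(consensus), flipped


print(reverse_complement("ACGTTG"))  # CAACGT
```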


Notice the templates drawn in yellow. Each has one read in one contig, and the other read in a different contig. Later in the exercise we will sequence the middle parts of these templates, which will let us join the contigs that these templates span. But first, note that some of the contigs do not have templates that would let us connect them. We must go back to the clone library and sequence more clones.


Save your Gap4 database with a new version number (version 1) by choosing “File: Copy database” from the main window's top menu and entering “1” in the box marked “New version character”. Exit Gap4.


Part 2: Sequencing Additional Clones


Open the web page for the Cybertory™ sequence trace generator.

http://www.cybertory.org/cgi-bin/seq2/getTrace.cgi?cloneSet=HIV_shotgun


The sequence fragments you have already assembled are from clones 1 through 25. We must have sequence from at least five more clones, so teams will be assigned clones to sequence, starting with “HIVsubclone026”. Each clone should be sequenced using both “forward” and “reverse” primers. Copy and paste the appropriate primer sequence into the “Primer sequence” box. Be sure to appropriately select “forward” or “reverse” from the “primer strand” pull down list. Since these are universal primers, select “vector (step 1)” from the “priming site” pull-down list.


The default reaction conditions should work for the universal primers. Be sure to enter the names of everyone on your team in the “user name” box; this will help us troubleshoot your reactions if they do not work as expected. After you click “run sequencing reaction”, it will take six or seven seconds for your virtual sequence trace to be created. Note the name of the result file: it will be something like “HIVsubclone026-p1t.scf”. Check that it names the correct subclone. The letter following the dash will be “p” if you told the program the primer was on the forward strand, and “q” if it was on the reverse strand. The number “1” indicates that you said you were using a universal primer. Confirm that these items are correct, then save the trace file.
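Because a mislabeled trace is easy to miss, a small script can check file names before you save them. This sketch assumes every universal-primer trace follows the pattern of the example “HIVsubclone026-p1t.scf”; the regular expression and the check_trace_name function are hypothetical, and the final letter before “.scf” is accepted without interpretation because the exercise does not explain it.

```python
import re

# Pattern inferred from the example name "HIVsubclone026-p1t.scf":
# three-digit clone number, "p" (forward) or "q" (reverse), primer number
# ("1" = universal primer), then one more letter this exercise does not explain.
TRACE_NAME = re.compile(r"^HIVsubclone(\d{3})-([pq])(\d+)([a-z])\.scf$")


def check_trace_name(filename: str, clone: int, strand: str) -> list:
    """Return a list of problems with a universal-primer trace name (empty if OK).

    strand is "forward" or "reverse", as chosen in the "primer strand" list.
    """
    m = TRACE_NAME.match(filename)
    if m is None:
        return ["name does not match the expected pattern"]
    problems = []
    if int(m.group(1)) != clone:
        problems.append(f"expected clone {clone:03d}, found {m.group(1)}")
    expected = "p" if strand == "forward" else "q"
    if m.group(2) != expected:
        problems.append(f"expected '{expected}' for the {strand} strand, found '{m.group(2)}'")
    if m.group(3) != "1":
        problems.append("primer number is not 1 (universal primer)")
    return problems


print(check_trace_name("HIVsubclone026-p1t.scf", 26, "forward"))  # []
```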


Once all groups have sequenced their assigned clones, we will collect them and give everyone a copy of all the traces to use in the next round of assembly.


Part 3: Adding New Traces to the Assembly


Copy the new sequence traces into your project folder. Open Pregap4 again by double clicking the “config.pg4” file. Select the “Files to process” tab, and click the “Add files” button. Select the new SCF files, starting with “HIVsubclone026-p1t.scf”. Be sure to set “Files of type” to “SCF”!


On the “Configure modules” tab, click on the “Gap4 shotgun assembly” module. Enter “1” in the “Gap4 database version” field, and select the “Append to existing database” radio button. Click the “Run” button in the lower left corner of the window. When the program reports “processing finished”, close Pregap4.


Now open Gap4 again, but this time by double-clicking the file “HIV.1.aux”. This is the new version of the database where Pregap4 put the latest trace data. Open the Templates Display window (“View: Template Display”, “OK”). It should resemble Figure 3.


Note that several of the templates do not seem to be displayed correctly. All the templates in our subclone library have inserts of roughly the same size (1200 +/- ~200 bp). Since each should have been sequenced from both ends using the forward and reverse universal primers, there should be an arrow representing the sequencing read from each end pointing in toward the middle of the insert. Because some of our templates are drawn much too long, and not all of the arrows point in from the ends, we need to rearrange the contigs so they are consistent with what we know about our templates and sequence reads.


As we saw earlier, clicking on a contig with the right mouse button brings up a context menu that lets you complement the contig, which reverses the direction of all the read arrows in that contig. To move a contig, click on it with the MIDDLE MOUSE BUTTON and drag it left or right to a new position (if your mouse has a wheel, it will probably work as a middle mouse button, too). You may have to click a few times in slightly different spots to grab the contig line.


Use these operations to rearrange the contigs until all templates are drawn about the right length, with one read coming in from each end, as in Figure 4.


Note that the templates drawn in yellow or dark yellow all cross boundaries between contigs. The next part of the exercise will be to use custom primers to sequence the middle parts of some of these templates, to see if we can obtain enough sequence to join some of our contigs together. For example, the contigs named "HIVsubclone010-q1t" and "HIVsubclone021-p1t" would presumably be joined if we had better sequence from the middle part of clone 9, 19, or 21. (Point the mouse at a contig, template, or read to see its name. Each contig is named after the leftmost read that it contains.)


Part 4: Sequencing Clones That Connect Contigs


At this point, the sequence is assembled into 7 contigs, which means that there are 6 boundaries between contigs. Each group of students will be assigned one of these boundaries, and will do additional sequencing reactions to try to get enough information to join the contigs.

Student group    Clones
1                9, 19 or 21
2                29
3                30 or 29
4                18, 24 or 30
5                26 or 28
6                15, 16 or 27


  1. Design custom sequencing primers to sequence the regions of these templates that span across contigs.

  2. Use them in simulated sequencing reactions. Check your traces in the trace viewer (Trev) to be sure they worked (if not, you may need to check your primer design or adjust your reaction conditions).

  3. Submit your reads to the web site (the instructor will demonstrate).

  4. Once all teams have submitted their results, each group should download the trace files and add them to their own assembly.
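Before running a reaction, it can help to screen a candidate primer against the contig consensus. The sketch below applies common rules of thumb (18 to 25 bases, 40 to 60% GC, exactly one binding site) and reports a rough melting temperature from the Wallace rule (2 C per A or T plus 4 C per G or C). These limits are assumptions rather than requirements of this exercise, the function names are hypothetical, and the contig is assumed to contain only A, C, G and T (no pad characters). The primer must also point into the gap: near the right end of a contig it should match the consensus as written, and near the left end it should match the reverse complement.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def find_all(text: str, pattern: str) -> list:
    """All start positions of pattern in text, including overlapping matches."""
    hits, i = [], text.find(pattern)
    while i != -1:
        hits.append(i)
        i = text.find(pattern, i + 1)
    return hits


def check_primer(primer, contig, min_len=18, max_len=25, gc_low=0.40, gc_high=0.60):
    """Screen a candidate custom primer; returns (problems, wallace_tm)."""
    primer, contig = primer.upper(), contig.upper()
    problems = []
    if not min_len <= len(primer) <= max_len:
        problems.append(f"length {len(primer)} outside {min_len}-{max_len}")
    gc = sum(primer.count(b) for b in "GC")
    if not gc_low <= gc / len(primer) <= gc_high:
        problems.append(f"GC fraction {gc / len(primer):.2f} outside {gc_low}-{gc_high}")
    # Count binding sites on the written strand and on the opposite strand
    # (where the primer's reverse complement would appear in the consensus).
    sites = len(find_all(contig, primer)) + len(find_all(contig, reverse_complement(primer)))
    if sites != 1:
        problems.append(f"primer matches {sites} sites in the contig (want exactly 1)")
    # Wallace rule: a rough Tm estimate, most reliable for short oligos.
    at = sum(primer.count(b) for b in "AT")
    return problems, 2 * at + 4 * gc


contig = "TTTTT" + "ACGTTGCAAGGTCCTAGCAT" + "GGGGG"
print(check_primer("ACGTTGCAAGGTCCTAGCAT", contig))  # ([], 60)
```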


Part 5: “Finishing” and Editing the Sequence


At this point, all the templates should be joined into a single contig. On the Template Display window, choose “View: Quality Plot”, then select the contig. A color-coded display will be drawn below the contig line; see the online help for an explanation of the colors. Blocks of green and blue represent areas that have been sequenced on only one strand. If time permits, we may run additional sequencing reactions to sequence the other strand in these regions.
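The idea behind "sequenced on only one strand" can be sketched in a few lines: mark which positions are covered by right-pointing reads and which by left-pointing reads, then report the runs covered in only one direction. The (start, end, forward) read tuples are a hypothetical simplification, and the sketch does not reproduce Gap4's color scheme.

```python
def single_strand_regions(contig_length, reads):
    """Return (start, end, label) runs covered by reads of only one direction.

    reads: list of (start, end, forward) tuples, half-open contig coordinates;
           forward=True means the read points right.
    Positions with no coverage at all are not reported.
    """
    fwd = [0] * contig_length
    rev = [0] * contig_length
    for start, end, forward in reads:
        cov = fwd if forward else rev
        for i in range(max(0, start), min(end, contig_length)):
            cov[i] += 1
    regions = []
    run_start, run_state = 0, None
    for i in range(contig_length + 1):
        if i < contig_length and fwd[i] and not rev[i]:
            state = "forward only"
        elif i < contig_length and rev[i] and not fwd[i]:
            state = "reverse only"
        else:
            state = None  # both strands, no coverage, or the end-of-contig sentinel
        if state != run_state:
            if run_state is not None:
                regions.append((run_start, i, run_state))
            run_start, run_state = i, state
    return regions


print(single_strand_regions(10, [(0, 6, True), (4, 10, False)]))
# [(0, 4, 'forward only'), (6, 10, 'reverse only')]
```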


Editing is the process of resolving inconsistencies among different reads, usually by going back to the traces and deciding what sequence is most consistent with the experimental results. Gap4 is extremely helpful in this process. Click on a problem area in the Quality Plot, and the corresponding sequences will be opened in the Contig Editor. Click on the consensus sequence in the contig editor and the original traces will be displayed together so you can compare them and decide which sequence to believe. Edit the sequences in the Contig Editor to record your choices. Given time, you should be able to resolve all of the ambiguities to determine a high-quality sequence for the entire target gene.
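Gap4 computes its consensus with its own algorithms, which can take base confidence values into account. As a much simpler illustration of how disagreements among reads show up, the sketch below takes a plain majority vote at each position and writes "N" where fewer than 75% of the covering reads agree, listing those positions for review. The input format (reads already aligned, given as hypothetical (offset, sequence) pairs) and the 75% threshold are assumptions.

```python
from collections import Counter


def majority_consensus(aligned_reads, length, min_fraction=0.75):
    """Majority-vote consensus over reads already aligned to contig coordinates.

    aligned_reads: list of (offset, sequence); sequence[0] sits at contig position offset.
    Returns (consensus, disputed) where disputed lists positions marked "N".
    Any symbol, including a pad character, counts as one vote.
    """
    consensus, disputed = [], []
    for pos in range(length):
        votes = Counter()
        for offset, seq in aligned_reads:
            i = pos - offset
            if 0 <= i < len(seq):
                votes[seq[i].upper()] += 1
        if not votes:
            consensus.append("-")  # no read covers this position
            continue
        base, count = votes.most_common(1)[0]
        if count / sum(votes.values()) >= min_fraction:
            consensus.append(base)
        else:
            consensus.append("N")  # reads disagree: check the traces here
            disputed.append(pos)
    return "".join(consensus), disputed


reads = [(0, "ACGTA"), (0, "ACGTA"), (2, "GAAC"), (1, "CGCA")]
print(majority_consensus(reads, 6))  # ('ACGNAC', [3])
```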


Check the quality of the sequence data, and perform additional reactions to "finish" the sequence.

Figure 1: Original assembly of sequencing reads into contigs by Gap4.



Figure 2: The two rightmost contigs have been complemented so that the read pairs point toward each other (into the template).



Figure 3: New traces added to the assembly.



Figure 4: Contigs have been rearranged so templates are within the expected size, and have reads coming in from both ends.