A web-based bioinformatics solution integrated with next-generation sequencing to identify molecular lesions of a C. elegans mutant

The ability of next-generation sequencing (NGS) technologies to generate sequence in a massively parallel fashion has revolutionized C. elegans genetics, greatly accelerating the path from mutant to molecular lesion (Hobert, 2010; Sarin et al., 2008). A major hurdle for many C. elegans researchers is that they lack the bioinformatics skills and computing infrastructure to analyze terabytes of sequence data. Tools such as MAQGene (Bigelow et al., 2009) are available, but they must be installed and maintained on Linux systems, which is beyond the reach of most genetics labs. In addition, the sheer size of the data makes transferring it from sequencing facilities to local computers an unnecessary burden. We have established a one-stop pipeline that lets researchers send in raw DNA samples and retrieve analyzed results remotely after sequencing on an Illumina GAII. If necessary, researchers can reanalyze their data with different parameters through a web-based interface.

We used the pipeline in an attempt to identify the molecular lesion of an enhancer of the nuclear migration defect of unc-84 (emu) allele that had been mapped by traditional methods to a ~150 kb interval on the X chromosome. (1) A DNA sample from the Starr lab was sent to the DNA Technology Core at the UC Davis Genome Center. (2) A library was made from mutant genomic DNA, and the Core sequenced 85 bp single-end reads; one lane of sequencing generated ~1.3 Gb of raw sequence. (3) MAQGene aligned the reads to the N2 reference genome and characterized the differences; the data and results were made available through SLIMS, a web-based Laboratory Information Management System developed by the Bioinformatics Core at the UC Davis Genome Center. (4) MAQGene output a list of candidate mutations and associated annotations in Excel format.

Our sequence data covered nearly 99% of the genome at 1x or greater coverage and 97% at 2x or greater coverage. As internal positive controls, the software correctly identified the original unc-84(n369) lesion, 25 SNPs that we had previously confirmed by traditional methods, and a SNP known to be in the starting strain. Within the mapped region, we identified a single uncovered region of about 50 bp. We are now using traditional methods to test candidate mutations and small deletions in the yc21 mapped region and determine whether any of them causes the emu phenotype. The pipeline has shown that integrating DNA sequencing directly with downstream bioinformatics analysis is an efficient way to make new technologies more accessible to the average C. elegans geneticist. Contact information and pricing are available at http://bit.ly/9NgksT.
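The coverage figures above are breadth-of-coverage statistics: the fraction of reference positions covered by at least k aligned reads. For scale, ~1.3 Gb of raw sequence over the roughly 100 Mb C. elegans genome corresponds to about 13x mean raw depth before any alignment filtering. The minimal Python sketch below, assuming a per-base depth array is already available, shows how such breadth figures, zero-coverage gaps like the ~50 bp hole in the mapped region, a candidate list restricted to a mapped interval, and a check of known positive controls could be computed. It is not MAQGene's implementation; the function names, the (chrom, pos, ref, alt) tuple layout, and the toy coordinates are all hypothetical.

```python
from typing import Dict, List, Optional, Sequence, Tuple

# Hypothetical candidate layout: (chromosome, 0-based position, ref base, alt base).
Candidate = Tuple[str, int, str, str]


def breadth_of_coverage(depths: Sequence[int], thresholds: Sequence[int]) -> Dict[int, float]:
    """Return, for each threshold k, the fraction of positions with depth >= k."""
    if not depths:
        raise ValueError("depths must be non-empty")
    n = len(depths)
    return {k: sum(1 for d in depths if d >= k) / n for k in thresholds}


def zero_coverage_runs(
    depths: Sequence[int], start: int, end: int, min_len: int = 1
) -> List[Tuple[int, int]]:
    """Return half-open [run_start, run_end) runs of zero depth inside [start, end).

    Runs shorter than min_len are ignored.
    """
    runs: List[Tuple[int, int]] = []
    run_start: Optional[int] = None
    for pos in range(start, end):
        if depths[pos] == 0:
            if run_start is None:
                run_start = pos  # a new uncovered run begins here
        elif run_start is not None:
            if pos - run_start >= min_len:
                runs.append((run_start, pos))
            run_start = None
    # Close a run that extends to the end of the interval.
    if run_start is not None and end - run_start >= min_len:
        runs.append((run_start, end))
    return runs


def candidates_in_interval(
    candidates: Sequence[Candidate], chrom: str, start: int, end: int
) -> List[Candidate]:
    """Keep candidates on chrom whose position lies in the half-open interval [start, end)."""
    return [c for c in candidates if c[0] == chrom and start <= c[1] < end]


def missed_controls(known: Sequence[Candidate], called: Sequence[Candidate]) -> List[Candidate]:
    """Return known control variants (e.g. previously confirmed SNPs) absent from the calls."""
    called_set = set(called)
    return sorted(v for v in known if v not in called_set)


if __name__ == "__main__":
    # Toy per-base depths for a 10 bp stretch; real data would span the whole genome.
    depths = [0, 0, 3, 5, 1, 0, 0, 0, 2, 4]
    print(breadth_of_coverage(depths, [1, 2]))  # {1: 0.5, 2: 0.4}
    print(zero_coverage_runs(depths, 0, 10, min_len=3))  # [(5, 8)]

    calls = [("X", 5, "C", "T"), ("X", 20, "G", "A"), ("IV", 5, "C", "T")]
    # Hypothetical mapped interval on X, standing in for the ~150 kb emu interval.
    print(candidates_in_interval(calls, "X", 0, 10))  # [('X', 5, 'C', 'T')]
    print(missed_controls([("X", 5, "C", "T"), ("I", 7, "A", "G")], calls))  # [('I', 7, 'A', 'G')]
```

The 0-based, half-open coordinates are a choice made for this sketch; spreadsheet output and genome browsers may report 1-based positions, so coordinates would need converting before filtering. Holding genome-wide depth in a Python list is also only practical for illustration; a production pipeline would more likely stream depth values per chromosome.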

References

Bigelow H, Doitsidou M, Sarin S, Hobert O. (2009). MAQGene: software to facilitate C. elegans mutant genome sequence analysis. Nat. Methods 6, 549.

Hobert O. (2010). The impact of whole genome sequencing on model system genetics: get ready for the ride. Genetics 184, 317-319.

Sarin S, Prabhu S, O'Meara MM, Pe'er I, Hobert O. (2008). Caenorhabditis elegans mutant allele identification by whole-genome sequencing. Nat. Methods 5, 865-867.

Published: June 10, 2010 in The Worm Breeder's Gazette