We have created a library of 2,000 mutagenized C. elegans strains, each sequenced to an average depth of 15X to reveal most mutations. The library contains over 700,000 single nucleotide variants (SNVs) with, on average, eight non-synonymous changes per gene. We generated the library using the mutagens EMS, ENU, or a cocktail of EMS plus ENU. F1 populations were screened in nicotine for animals heterozygous for unc-22 mutations to confirm that the mutagen was effective. F2 populations were screened again to select non-unc-22 animals, and the resulting lines were selfed for a further eight generations to drive all genomic regions toward homozygosity. Whole-genome sequencing was done with paired-end reads on Illumina GAII or HiSeq machines using size-selected and molecularly bar-coded DNAs. Reads were aligned using Phaster (P. Green, unpublished), and SNVs were called using SAMtools and custom filters. Indels and rearrangements were identified with custom tools.
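The effect of repeated selfing on heterozygosity can be illustrated with a short calculation. This is a minimal sketch, assuming that each line is propagated from a single self-fertilizing hermaphrodite per generation and that selection is absent, so the chance that a given induced mutation is still heterozygous halves with every generation of selfing. The function name and the generation count used in the example (one generation from F1 to F2, plus eight further generations) are our reading of the protocol, not values taken from the authors' pipeline.

```python
def expected_heterozygosity(selfing_generations, initial=1.0):
    """Expected fraction of initially heterozygous sites that remain
    heterozygous after a number of generations of selfing.

    Assumes single-animal propagation and no selection (hypothetical
    simplification): each selfing generation halves heterozygosity.
    """
    if selfing_generations < 0:
        raise ValueError("selfing_generations must be non-negative")
    return initial * 0.5 ** selfing_generations


# A mutation heterozygous in the F1: one selfing step gives the F2,
# then eight further generations of selfing follow.
remaining = expected_heterozygosity(1 + 8)
print(f"Expected residual heterozygosity: {remaining:.4%}")
```

Under these assumptions, fewer than 1 in 500 of the originally heterozygous sites would be expected to remain heterozygous in the final line.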
Analysis of the data from the first 1,794 strains has yielded 705,748 SNVs in 20,066 genes (averaging about 390 per strain). These include 159,338 non-synonymous changes in 19,449 genes (about eight new alleles per gene). Of these mutations, 9,829 are knockouts (nonsense or splicing defects) in 6,774 genes, for an average of more than four per strain. Based on read numbers, the rDNA repeat copy number is surprisingly variable, with some strains having fewer than 60 copies and a few having more than 150. We have supplemented these mutagenized strains with 40 natural isolates to recover an additional 500,000 mutations. The mutation data for the first 600 mutated strains have been deposited in WormBase, and the rest of the data are being processed. A separate website allows direct queries of the data (http://genome.sfu.ca/mmp/). Nearly all of the 2,000 individual strains are available from the Caenorhabditis Genetics Center. We are currently building frozen kits containing all the strains in 96-well arrays, allowing parallel experimentation on a wide spectrum of mutant genes. The resource should provide rapid access to multiple mutations in any gene of interest and allow investigation of gene-gene interactions.
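The per-strain and per-gene averages quoted above follow directly from the reported totals. The sketch below simply redoes that arithmetic; the variable names are ours, and the counts are copied from the text rather than from the underlying dataset.

```python
# Totals reported for the first 1,794 strains (copied from the text).
strains = 1_794
total_snvs = 705_748
nonsynonymous = 159_338
genes_with_nonsynonymous = 19_449
knockouts = 9_829
genes_with_knockouts = 6_774

# Simple ratios; these are averages, not distributions.
snvs_per_strain = total_snvs / strains
nonsyn_per_gene = nonsynonymous / genes_with_nonsynonymous
knockouts_per_strain = knockouts / strains
knockouts_per_hit_gene = knockouts / genes_with_knockouts

print(f"SNVs per strain:               {snvs_per_strain:.0f}")
print(f"Non-synonymous per gene:       {nonsyn_per_gene:.1f}")
print(f"Knockouts per strain:          {knockouts_per_strain:.1f}")
print(f"Knockouts per knocked-out gene: {knockouts_per_hit_gene:.2f}")
```

These ratios come to roughly 393 SNVs per strain, 8.2 non-synonymous alleles per affected gene, and 5.5 knockout alleles per strain, consistent with the rounded figures in the text.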