AceView gene models now integrate high throughput cDNA sequences

We are pleased to announce that we have integrated 755 million high throughput transcriptome sequences into the NCBI AceView worm gene models. Half the genes have changed!

1) You can explore the new features of your gene on our website www.aceview.org (click on "worm"; type gene names, accessions, or meaningful words; our worm site is updated monthly).

2) Please consider sharing your transcriptome data, so that they can be integrated into better gene models for the benefit of all. Submit to the public databases NCBI, DDBJ or EBI, or write to us.

AceView gene models are a comprehensive representation of curated experimental cDNA sequences. For the worm, we gathered and hand edited most of the 325,419 sequencing traces from the Kohara, Vidal, Exelixis and NCBI Trace cDNA projects, and added community RNA sequences from the GenBank and dbEST public repositories. Today’s high throughput sequences multiply the cDNA coverage by 50! They include data generously provided for integration by the Baillie, John Kim, Kris Gunsalus and Fabio Piano labs, data published by Fraser, Mello and Waterston, and data in NCBI GEO/SRA (excluding 715 million under ModEncode embargo). The new data confirm almost all previous AceView annotations but also greatly expand the transcriptome. For example, AceView had 89,514 cDNA supported exon junctions (82,192 of which were among the 106,315 predicted in WormBase WS190) and now has 105,512, of which 23,337 are not annotated in WormBase WS190. Similarly, the number of trans-spliced leader addition sites almost doubled, from 13,330 to 24,737, with 2.2 million supporting sequences. In the richest library, L1 larvae from Chua/Shin/Baillie, 0.85% of the tags unambiguously contain a trans-spliced leader.
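To put the junction and leader-site counts above in proportion, here is a minimal sketch that derives a few ratios from them. Only the input numbers come from the text; the variable names and the choice of ratios are ours, for illustration.

```python
# Counts quoted in the paragraph above.
JUNCTIONS_BEFORE = 89_514            # cDNA supported exon junctions, previous AceView
JUNCTIONS_BEFORE_IN_WS190 = 82_192   # of those, also predicted in WormBase WS190
WS190_PREDICTED = 106_315            # exon junctions predicted in WS190
JUNCTIONS_NOW = 105_512              # cDNA supported exon junctions, new AceView
JUNCTIONS_NOW_NOT_IN_WS190 = 23_337  # of those, not annotated in WS190
SL_SITES_BEFORE = 13_330             # trans-spliced leader addition sites, before
SL_SITES_NOW = 24_737                # trans-spliced leader addition sites, now


def summarize():
    """Return a few ratios derived from the counts above (illustrative only)."""
    return {
        # Net gain in supported junctions.
        "added_junctions": JUNCTIONS_NOW - JUNCTIONS_BEFORE,
        # Share of current junctions that WS190 does not annotate.
        "fraction_not_in_ws190": JUNCTIONS_NOW_NOT_IN_WS190 / JUNCTIONS_NOW,
        # Share of WS190 predictions that already had cDNA support before.
        "ws190_supported_before": JUNCTIONS_BEFORE_IN_WS190 / WS190_PREDICTED,
        # Fold increase in leader addition sites ("almost doubled").
        "sl_site_fold_increase": SL_SITES_NOW / SL_SITES_BEFORE,
    }


if __name__ == "__main__":
    for key, value in summarize().items():
        print(f"{key}: {value:.3f}" if isinstance(value, float) else f"{key}: {value}")
```

Roughly one in five of the currently supported junctions is absent from WS190, and the leader-site count grew a little under twofold.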

With these new transcript data, close to half of the WS190 coding gene models are fully confirmed, but approximately 10% have no cDNA support yet (mainly on chromosomes V, IIL, IR). There are more than 24,000 un-annotated or modified exons, notably 5’ exons. Most of these changes affect the coding potential. Close to half the genes now show alternative splicing. There are around 400 examples of genes that should be merged or split, and over 500 completely new spliced gene models.

AceView is a community web resource which primarily provides information on transcription, including hand edited cDNA sequences, 5’ cap, spliced leader and polyA addition sites and signals, regulation by siRNA or RNA editing, antisense and operons. Protein coding potential (motifs, domains, homologies, uORFs, candidate NMD and non canonical Met) is also annotated. Direct pointers to Kohara in situ hybridizations to developmental stages are provided. In a systems biology spirit, lists of genes most related by phenotype, pathways, function, localization or interactions are also proposed in the “Function” page. Half the genes have official names; the other half have AceView names that reflect position and strand only (chromosome, megabase letter, kilobase, odd: + strand, even: - strand). WormBase ‘cosmid.number’ names are also recognized, and we acknowledge their contribution.
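The positional naming rule (chromosome, megabase letter, kilobase, with parity giving the strand) can be sketched as a small parser. This is an illustration under assumptions, not the AceView implementation: we assume the chromosome is written as a single character (1 to 5, or X), followed by one capital letter and then the kilobase number, and that the parity of that number encodes the strand. The example names below are hypothetical, and the mapping from letter to megabase is deliberately left uninterpreted.

```python
import re
from typing import NamedTuple, Optional


class PositionalName(NamedTuple):
    chromosome: str       # assumed encoding: "1".."5" or "X"
    megabase_letter: str  # letter denoting the megabase; mapping not interpreted here
    kilobase: int         # kilobase number; its parity gives the strand
    strand: str           # "+" if kilobase is odd, "-" if even


# Assumed layout of a positional name: chromosome, letter, digits.
_POSITIONAL = re.compile(r"^([1-5X])([A-Z])(\d+)$")


def parse_positional_name(name: str) -> Optional[PositionalName]:
    """Split a positional name into its parts, or return None if it does not fit."""
    match = _POSITIONAL.match(name)
    if match is None:
        return None  # e.g. an official gene name or a cosmid.number name
    chromosome, letter, digits = match.groups()
    kilobase = int(digits)
    strand = "+" if kilobase % 2 == 1 else "-"
    return PositionalName(chromosome, letter, kilobase, strand)


if __name__ == "__main__":
    # Hypothetical names, used only to exercise the parser.
    for example in ("3F975", "1K742", "unc-22"):
        print(example, parse_positional_name(example))
```

Names that do not fit the assumed pattern, such as official gene names, simply return None so that a caller can fall back to another lookup.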

Thanks to all the present and future data contributors. Enjoy!

Published: December 1, 2009
