Worm Breeder's Gazette 10(1): 24

These abstracts should not be cited in bibliographies. Material contained herein should be treated as personal communication and should be cited as such only with the consent of the author.

Collagen Gene Families

J. Cox, D. Hirsh, B. Rosenzweig, C. Fields, J. Kramer

Figure 1

We determined the DNA sequences of several collagen genes with 
different expression patterns during development and compared them to 
the previously sequenced genes col-1 and col-2.  The genes chosen for 
study were col-1 and col-14, which are expressed at varying levels 
throughout development; col-2 and col-6, which are dauer-specific; and 
col-7, col-8 and col-19, which are expressed primarily in 
animals molting into adults.  Each gene is 1.0 to 1.2 kb in length and 
includes one or two short introns at variable positions.  The 
presumptive promoter regions contain the expected eukaryotic TATA and 
CAAT sequences.  The sequence TAT CTTTCTCTY TTCTTYCT (Y=C or T) is 
present 30 bp and 74 bp upstream of the CAAT box in col-2 and col-6, 
respectively.  The sequence AAATTT YAYCAATRT TTATT AATT is present 203 
and 183 bp upstream of the presumptive CAAT boxes in col-7 and col-19 
(R=A or G; the relevant region of col-8 was not sequenced).  The 
correlation between the presence of these sequences and the similar 
expression profiles of the relevant genes suggests that these 
sequences may be involved in the developmental regulation of the genes.
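
As a rough illustration of how degenerate motifs like the two above could be searched for in an upstream region, here is a minimal Python sketch. It is not the method used by the authors. It assumes that the spaces in the printed motifs are layout only, and the test sequence `toy` is hypothetical, not real col-2 data.

```python
import re

# IUPAC degenerate symbols used in the abstract: Y = C or T, R = A or G.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "Y": "[CT]", "R": "[AG]"}

def motif_to_regex(motif):
    """Translate a degenerate motif into a compiled regular expression."""
    # Assumption: spaces in the printed motif carry no meaning.
    return re.compile("".join(IUPAC[base] for base in motif.replace(" ", "")))

def find_motif(motif, sequence):
    """Return 0-based start positions of every match, overlapping ones included."""
    core = motif_to_regex(motif).pattern
    # A zero-width lookahead reports overlapping occurrences as well.
    lookahead = re.compile("(?=" + core + ")")
    return [m.start() for m in lookahead.finditer(sequence.upper())]

DAUER_MOTIF = "TAT CTTTCTCTY TTCTTYCT"        # reported for col-2 and col-6
ADULT_MOTIF = "AAATTT YAYCAATRT TTATT AATT"   # reported for col-7 and col-19

# Hypothetical sequence: one instance of the first motif with flanking bases.
toy = "GGGG" + "TATCTTTCTCTCTTCTTTCT" + "ACGT"
print(find_motif(DAUER_MOTIF, toy))   # [4]
print(find_motif(ADULT_MOTIF, toy))   # []
```

Distances to a CAAT box, such as the 30 bp and 74 bp quoted above, would then follow from subtracting the reported start positions from the position of the CAAT match in the same sequence.
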
The domain structure of the collagen polypeptides is similar to that 
determined for col-1 and col-2: each polypeptide contains two main 
triple-helix forming (Gly-X-Y)n domains, one of 30-33 amino acids, and 
the other of 127-132 amino acids.  The latter domain is interrupted by 
short (2-8 amino acids) non-Gly-X-Y segments in each polypeptide.  
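
The domain layout just described can be pictured with a small pattern scan. The following Python sketch is a simple heuristic, not the analysis performed by the authors: it reports runs of consecutive Gly-X-Y triplets in a one-letter protein sequence. The function name, the `min_repeats` threshold and the sequence `toy_protein` are all hypothetical.

```python
import re

def find_gly_x_y_runs(protein, min_repeats=3):
    """Return (start, length) pairs, 0-based, for runs of Gly-X-Y triplets."""
    # G followed by any two residues, repeated at least min_repeats times.
    pattern = re.compile(r"(?:G[A-Z]{2}){%d,}" % min_repeats)
    return [(m.start(), m.end() - m.start())
            for m in pattern.finditer(protein.upper())]

# Hypothetical polypeptide: cysteines flanking two Gly-X-Y runs.
toy_protein = "MCC" + "GPP" * 10 + "AC" + "GPA" * 5 + "CC"
print(find_gly_x_y_runs(toy_protein))   # [(3, 30), (35, 15)]
```

A short non-Gly-X-Y interruption, like the 2-8 residue segments mentioned above, splits a domain into separate runs in this output, so adjacent runs separated by only a few residues would need to be merged to recover the full domain.
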
Sets of cysteine residues flank the (Gly-X-Y)n domains in all of the 
polypeptides.  The genes can be placed into three families based upon 
structural features (overall protein length and organization of 
(Gly-X-Y)n domains), positions of cysteine residues and amino acid sequence 
homologies.  The amino acid sequence homologies are most evident in 
the non-Gly-X-Y domains.  As an example, the C-terminal tail sequences 
are shown below.  Col-1 and col-2 comprise one family, col-6 and 
col-14 comprise a second family, and col-8 and col-19, with the less 
homologous col-7, comprise the third family.  Members of a family can 
be coordinately regulated, as in the case of col-8, [...] or 
have different expression patterns, 
as in the cases of col-1 and col-2 or col-6 and col-14.  The codon usage 
in all of the genes is highly asymmetrical, with adenine appearing in 
the third position of 85% of the Gly codons and 93% of the Pro codons.
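
Third-position bias of this kind can be tallied directly from a coding sequence. The Python sketch below is only an illustration under stated assumptions: the input is an in-frame coding sequence with introns removed, Gly codons are GGN and Pro codons are CCN (standard genetic code), and `toy_cds` is a made-up sequence, not data from these genes.

```python
from collections import Counter

def third_base_fractions(cds, prefix):
    """Fraction of each third base among in-frame codons starting with prefix."""
    cds = cds.upper()
    # Split into complete codons, ignoring any trailing partial codon.
    codons = [cds[i:i + 3] for i in range(0, len(cds) - len(cds) % 3, 3)]
    thirds = Counter(codon[2] for codon in codons if codon.startswith(prefix))
    total = sum(thirds.values())
    if total == 0:
        return {}
    return {base: count / total for base, count in thirds.items()}

# Hypothetical coding sequence: four Gly (GGN) and four Pro (CCN) codons.
toy_cds = "GGACCAGGACCAGGTCCAGGACCC"
print(third_base_fractions(toy_cds, "GG"))   # {'A': 0.75, 'T': 0.25}
print(third_base_fractions(toy_cds, "CC"))   # {'A': 0.75, 'C': 0.25}
```
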

Figure 1