*** Raymond Lee SOPs: ***
Updated: August 1, 2002
General description:
There are three major parts in an RNAi data object: Sequence, Experiment and Results.
Sequence: an RNAi experiment is almost always based on a specific piece of sequence, hence the Sequence part. The sequence information allows a stable mapping of the RNAi object to the genome.
Experiment: other experimental information is stored in Experiment, which covers the details of the actual experiment.
Results: this section contains observations resulting from the experiment, such as the gene inhibited and the phenotype. In addition, there are Reference and Remark fields.
In its simplest form, RNAi curation consists of a curator filling out each of these sections in ace file format and submitting the file to a local database (citace in my case). Almost always, however, the situation is more complex: the curator must also define new Sequence objects, and sometimes new Phenotype objects, at the same time.
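As a rough illustration of the simple case, the sketch below assembles the ace text for one object from a list of tag/value pairs. The helper name format_ace_object is hypothetical and not part of ACeDB; the tags and values are borrowed from the first example later in this document, and the quoting of the object name is an assumption about what the local database accepts.

```python
def format_ace_object(class_name, name, rows):
    """Render one object as .ace text: a header line followed by tag lines."""
    lines = [f'{class_name} : "{name}"']
    for tag, value in rows:
        # A tag with no value (a bare flag) is written on its own.
        lines.append(tag if value is None else f"{tag} {value}")
    return "\n".join(lines)


record = format_ace_object(
    "RNAi",
    "[cgc4562]:F55A8.2",
    [
        ("Delivered_by", "Injection"),
        ("Predicted_gene", "F55A8.2a"),
        ("Reference", "[cgc4562]"),
        ("Phenotype", "Unc"),
    ],
)
print(record)
```

When several objects go into one file, they would be joined with a blank line between them, for example "\n\n".join(records).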
Schemas relevant to RNAi data:
The exact format of an ace file depends on the ACeDB data models. (In this document, the model for each data class is flanked by a pair of *** lines.) For the purpose of this document, I assume that the reader has a general understanding of the syntax of ACeDB models. If this assumption is not warranted, I encourage the reader to consult the ACeDB documentation, in particular the "Models" section of .
***
?RNAi SMap S_Parent Sequence UNIQUE ?Sequence XREF RNAi // canonical parent (reliable)
                    PCR_product UNIQUE ?PCR_product XREF RNAi // primer pair (reliable)
      Method ?Method
      Experiment Laboratory ?Laboratory
                 Author ?Author
                 Date UNIQUE DateType
                 Strain UNIQUE ?Strain
                 Delivered_by UNIQUE Bacterial_feeding
                                     Injection
                                     Soaking
                                     Transgene_expression
      Inhibits Predicted_gene ?Sequence XREF RNAi_result
               Locus ?Locus XREF RNAi_result
      Supporting_data Movie ?Movie XREF RNAi
                      Picture ?Picture XREF RNAi
      Reference ?Paper XREF RNAi
      Phenotype ?Phenotype XREF RNAi #Phenotype_info
      Remark ?Text
***
In addition, to fully curate a set of RNAi data, several other ACeDB datatypes need to be included. Listed here are their current (WS82) schemas.
(Only the relevant parts of ?Sequence, ?PCR_product, ?Oligo, #SMap_info are included here)
***
?Sequence SMap S_Parent Canonical_parent UNIQUE ?Sequence XREF Genomic_non_canonical
PCR_product ?PCR_product XREF Canonical_parent UNIQUE Int UNIQUE Int #SMap_info
Nongenomic ?Sequence XREF Genomic_parent UNIQUE Int UNIQUE Int #SMap_info
RNAi ?RNAi XREF Sequence UNIQUE Int UNIQUE Int #SMap_info
***
***
?PCR_product SMap S_Parent Canonical_parent UNIQUE ?Sequence XREF PCR_product
S_Child RNAi ?RNAi XREF PCR_product UNIQUE Int UNIQUE Int #SMap_info
Oligo ?Oligo XREF PCR_product UNIQUE Int UNIQUE Int
***
***
#SMap_info Method ?Method // methods used in child (optional)
***
***
?Oligo Sequence UNIQUE Text // verbatim sequence
       Length UNIQUE Int
       In_sequence ?Sequence XREF Oligo
       PCR_product ?PCR_product XREF Oligo
***
***
#Phenotype_info Remark ?Text                         // specific remarks about this instance
                Penetrance UNIQUE Int                // estimated percent penetrance (0-100)
                Temperature_sensitive Int UNIQUE Int // temperature (degrees C), penetrance
                Quantity UNIQUE Int UNIQUE Int       // for quantitative phenotypes low/high
***
Examples:
As far as I can tell, there is no single straightforward, universal procedure for curating RNAi data. I'll try to demonstrate how I curate by example. In each case, I first include the resulting ace file and then explain how I arrived at it.
[ace file]
RNAi [cgc4562]:F55A8.2
Author "Stansberry J"
Author "Baude EJ"
Author "Taylor MK"
Author "Chen PJ"
Author "Jin SW"
Author "Ellis RE"
Author "Uhler MD"
Date 2001-02-01
Delivered_by Injection
Predicted_gene F55A8.2a
Predicted_gene F55A8.2b
Reference [cgc4562]
Phenotype Unc Remark "animals move slowly and irregularly"
Remark "template cDNA (cloned)"
[Explanation]
"[cgc4562]:F55A8.2" is the name of this RNAi object. This name must be unique within the class and I routinely use [PAPER]:GENE as the name because I think this format is both informative and likely unique. If a paper describes more than one experiment involving the same gene, I then append :N to the base name.
In this case, the authors did not provide enough information for me to determine exactly which sequence the experiment was based on, so there is no Sequence tag. In the Remark, I noted that the template was a cloned cDNA. This is somewhat important, since a Sequence in an RNAi object can only be defined as one contiguous piece.
If Sequence information is provided, the "Predicted_gene" field should not be entered by the curator but rather calculated from the sequence information during each database build. In this case, I filled in Predicted_gene manually based on the authors' statements.
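The [PAPER]:GENE naming convention described above can be sketched as a small helper. The function name rnai_name is hypothetical, and the numbering rule is an assumption: the text only says that :N is appended when a paper has more than one experiment on the same gene, so this sketch leaves the first object with the bare base name and starts counting at 2.

```python
def rnai_name(paper, gene, existing):
    """Return [PAPER]:GENE, or [PAPER]:GENE:N if the base name is already used."""
    base = f"[{paper}]:{gene}"
    if base not in existing:
        return base
    n = 2  # assumption: the first object keeps the bare base name
    while f"{base}:{n}" in existing:
        n += 1
    return f"{base}:{n}"


existing = {"[cgc4562]:F55A8.2"}
second = rnai_name("cgc4562", "F55A8.2", existing)  # same paper, same gene
other = rnai_name("cgc4737", "him-10", existing)    # not yet used
print(second)
print(other)
```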
[ace file]
Sequence : R12B2
Nongenomic yk428e9 1075 2865

Sequence : yk428e9
RNAi "[cgc4737]:him-10" 1 1791
Method cDNA_for_RNAi
From_Laboratory YK
Remark "EST clone used in RNAi assay"

RNAi "[cgc4737]:him-10"
Method RNAi
Laboratory "TY"
Author "Howe M"
Author "McDonald KL"
Author "Albertson DG"
Author "Meyer BJ"
Date "2001-06-11"
Reference [cgc4737]
Phenotype "Emb" Remark "97% Emb+Lva or less"
Phenotype "Lva"
Phenotype "Unc"
Phenotype "Stp" Remark "likely due to somatic mitotic defects"
Remark "aberrant mitotic chromosome segregation in somatic cells"
Remark "abnormal mitotic structures: aneuploidy and micronuclei"
Remark "abnormal mitotic kinetochore morphology, visualized by electron microscopy"
Remark "normal HCP-3 localization to centrosomes"
[Explanation]
The sequence object in this case is an EST clone whose sequence already exists in ACeDB. I used that information to attach the template object ("Nongenomic yk428e9") to the parent genomic canonical object. This is generally referred to as 'Smapping'. The pair of integers "1075 2865" denotes the extent of yk428e9 on the parent. I determined these coordinates manually. In theory, however, this should be done automatically, since the underlying genomic canonical sequence can drift.
The RNAi object "[cgc4737]:him-10" is then Smapped onto the Nongenomic Sequence object (from beginning: 1 to end: 1791). Note here again that even though the RNAi template object is a cDNA clone, and therefore may align to the genomic sequence with gaps, the RNAi data model is oblivious to this fact; it treats all sequences as if they were contiguously mapped onto the genomic sequence.
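Because the coordinates are entered by hand and the mapping is treated as contiguous, one simple sanity check is that the span on the parent and the span on the child cover the same number of bases. The sketch below is only an illustration using the numbers from this example; span_length is a hypothetical helper, not an ACeDB function.

```python
def span_length(start, end):
    """Number of bases covered by an inclusive coordinate pair, in either orientation."""
    return abs(end - start) + 1


parent_extent = (1075, 2865)  # yk428e9 placed on R12B2
child_extent = (1, 1791)      # RNAi placed on yk428e9

parent_len = span_length(*parent_extent)
child_len = span_length(*child_extent)
print(parent_len, child_len)
if parent_len != child_len:
    raise ValueError("parent and child extents cover different lengths")
```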
"Method" specifies how each sequence object will be marked for display and perhaps other purposes.
[ace file]
Oligo : [cgc4736]:hcp-4:1
Sequence ggaaatgtacggagcgaaaa
Length 20
In_sequence T03F1

Oligo : [cgc4736]:hcp-4:2
Sequence acattgttggtgggtccaat
Length 20
In_sequence T03F1

PCR_product : [cgc4736]:hcp-4
RNAi [cgc4736]:hcp-4 1 1503
Method GenePairs
Oligo [cgc4736]:hcp-4:1
Oligo [cgc4736]:hcp-4:2
Remark "PCR fragment for hcp-4 RNAi"

Sequence : T03F1
PCR_product [cgc4736]:hcp-4 26157 27659
Oligo [cgc4736]:hcp-4:1 27659 27640
Oligo [cgc4736]:hcp-4:2 26157 26176

RNAi : [cgc4736]:hcp-4
Method RNAi
Locus hcp-4
Laboratory "TH"
Author "Oegema K"
Author "Desai A"
Author "Rybina S"
Author "Kirkham M"
Author "Hyman AA"
Date "2001-06-11"
Delivered_by Injection
Reference [cgc4736]
Phenotype "Emb" Remark "mitotic chromosome segregation defective"
Phenotype "Emb" Remark "blocks CeBub1 (R06C7.8) localization onto chromosomes"
Phenotype "Emb" Remark "blocks CeMCAK (K11D9.1) localization onto chromosomes"
Remark "hcp-4 aka CeCENP-C"
Remark "hcp-4 and hcp-3 RNAi phenotypes are essentially identical"
[Explanation]
In this case I defined two Oligo objects and a PCR_product so that the RNAi can be attached to the PCR_product directly and to T03F1 (genomic canonical) indirectly. Doing it this way allows the RNAi to be dynamically remapped onto the genome via the sequence, which is important given that the genome sequence is still fluid. Note that in case II (the EST example above), the RNAi can likewise be remapped via its association with an EST whose sequence (at least the ends) is known.
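The relationship between the two Oligo placements and the PCR_product extent on T03F1 can be checked with a few lines of arithmetic. This is only a sketch using the coordinates from the ace file above; product_extent is a hypothetical helper, and reading a start greater than the end as reverse orientation is my understanding of the coordinate convention rather than something stated in this document.

```python
def product_extent(oligo_a, oligo_b):
    """Return the (start, end) span on the parent covered by two primer placements."""
    coords = [*oligo_a, *oligo_b]
    return min(coords), max(coords)


oligo_1 = (27659, 27640)  # start > end: assumed to mean reverse orientation
oligo_2 = (26157, 26176)

start, end = product_extent(oligo_1, oligo_2)
length = end - start + 1  # inclusive coordinates
print(start, end, length)
```

The result agrees with the "PCR_product [cgc4736]:hcp-4 26157 27659" line on T03F1 and with the "RNAi [cgc4736]:hcp-4 1 1503" line on the PCR_product.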
[ace file]
Phenotype : Unclassified
Description "yet to be classified"

Oligo : [cgc4761]:A1-sense
Sequence ttcaaacggatacttctctcagtgag

Oligo : [cgc4761]:A1-asense
Sequence ATCACTTCGTAATTGACAGTTCGTGTG

Sequence : C03A3
PCR_product [cgc4761]:A1 37234 38940
Oligo [cgc4761]:A1-sense
Oligo [cgc4761]:A1-asense

PCR_product : [cgc4761]:A1
RNAi [cgc4761]:A1 1 1707
Method GenePairs
Oligo [cgc4761]:A1-sense
Oligo [cgc4761]:A1-asense
Remark "cDNA PCR fragment used in RNAi assay"

RNAi : [cgc4761]:A1
Method RNAi
Author "Karabinos A"
Author "Schmidt H"
Author "Harborth J"
Author "Schnabel R"
Author "Weber K"
Date 2001-07-03
Delivered_by Injection
Reference [cgc4761]
Phenotype Lvl Remark "L1 arrest"
Phenotype Unclassified Remark "wavy and swollen intestine"
[Explanation]
In this case I defined a new phenotype, "Unclassified", and used it to describe phenotypes that are not yet defined in ACeDB. The intention is to allow a curator, at a later time, to pull out all the Unclassified phenotypes and classify them properly into a DAG (directed acyclic graph) structure.
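One way a curator might later pull out the Unclassified phenotypes is to scan ace text directly. The sketch below is a hypothetical helper, not an ACeDB tool; it assumes object headers are written in the "Class : name" form and that each Unclassified phenotype sits on a single line with an optional quoted Remark.

```python
import re

HEADER = re.compile(r'^(\w+)\s*:\s*"?(.+?)"?$')
UNCLASSIFIED = re.compile(r'^Phenotype\s+"?Unclassified"?(?:\s+Remark\s+"(.*)")?$')


def unclassified_remarks(ace_text):
    """Map each RNAi object name to the remarks on its Unclassified phenotypes."""
    found = {}
    current = None  # name of the RNAi object being read, if any
    for raw in ace_text.splitlines():
        line = raw.strip()
        if not line:
            current = None  # a blank line ends the current object
            continue
        header = HEADER.match(line)
        if header:
            current = header.group(2) if header.group(1) == "RNAi" else None
            continue
        hit = UNCLASSIFIED.match(line)
        if hit and current:
            found.setdefault(current, []).append(hit.group(1) or "")
    return found


sample = '''Phenotype : Unclassified
Description "yet to be classified"

RNAi : [cgc4761]:A1
Method RNAi
Phenotype Lvl Remark "L1 arrest"
Phenotype Unclassified Remark "wavy and swollen intestine"
'''
result = unclassified_remarks(sample)
print(result)
```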