*** Raymond Lee SOPs: ***
Updated: August 1, 2002


General description:

There are three major parts in an RNAi data object: Sequence, Experiment and Results.

Sequence: an RNAi experiment is almost always based on a specific piece of sequence, which is recorded in the Sequence part. The sequence information allows a stable mapping of the RNAi object to the genome.

Experiment: other experimental information is stored in Experiment, which covers the details of the actual experiment.

Results: this section contains observations resulting from the experiment, such as the gene inhibited and the phenotype. In addition, there are Reference and Remark fields.

In its simplest form, RNAi curation consists of a curator filling out each of these sections in ace file format and submitting the file to a local database (citace in my case). Almost always, however, the situation is more complex: the curator must also define new Sequence objects, and sometimes new Phenotype objects, at the same time.
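
The simple case above can be sketched in Python. This is only an illustration: the RNAiRecord class and the to_ace function are hypothetical names, not part of ACeDB, and the output layout simply imitates the ace files shown in the Examples section.

```python
from dataclasses import dataclass, field


@dataclass
class RNAiRecord:
    """Hypothetical in-memory form of one RNAi object (not an ACeDB API)."""
    name: str
    authors: list = field(default_factory=list)
    date: str = ""
    delivered_by: str = ""
    reference: str = ""
    phenotypes: list = field(default_factory=list)  # (phenotype, remark) pairs
    remarks: list = field(default_factory=list)


def to_ace(rec):
    """Render the record as one ace paragraph: a header line plus tag lines."""
    lines = [f'RNAi : "{rec.name}"']
    lines += [f'Author\t"{a}"' for a in rec.authors]
    if rec.date:
        lines.append(f"Date\t{rec.date}")
    if rec.delivered_by:
        lines.append(f"Delivered_by\t{rec.delivered_by}")
    if rec.reference:
        lines.append(f"Reference\t{rec.reference}")
    for pheno, remark in rec.phenotypes:
        line = f"Phenotype\t{pheno}"
        if remark:
            line += f'\tRemark\t"{remark}"'
        lines.append(line)
    lines += [f'Remark\t"{r}"' for r in rec.remarks]
    # A trailing newline keeps paragraphs separable when several are concatenated.
    return "\n".join(lines) + "\n"
```

The resulting text would still need to be read into the local database by whatever loading procedure is in use; that step is outside this sketch.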


Schemas relevant to RNAi data:

The exact format of an ace file depends on the ACeDB data models. (In this document, the model for each data class is flanked by a pair of *** lines.) For the purpose of this document, I assume that the reader has a general understanding of the syntax of ACeDB models. If this assumption is not warranted, I encourage the reader to consult the ACeDB documentation, in particular the "Models" section of <http://www.acedb.org/Admin_curation/>.


***
?RNAi   SMap S_Parent Sequence UNIQUE ?Sequence XREF RNAi       // canonical parent (reliable)
                      PCR_product UNIQUE ?PCR_product XREF RNAi // primer pair (reliable)
        Method ?Method
        Experiment      Laboratory ?Laboratory
                        Author ?Author
                        Date UNIQUE DateType
                        Strain UNIQUE ?Strain
                        Delivered_by UNIQUE Bacterial_feeding
                                            Injection
                                            Soaking
                                            Transgene_expression
        Inhibits        Predicted_gene ?Sequence XREF RNAi_result
                        Locus ?Locus   XREF RNAi_result
        Supporting_data Movie ?Movie XREF RNAi
                        Picture ?Picture XREF RNAi
        Reference ?Paper XREF RNAi
        Phenotype ?Phenotype XREF RNAi #Phenotype_info
        Remark ?Text
***
In addition, to fully curate a set of RNAi data, several other ACeDB datatypes need to be included. Listed here are their current (WS82) schemas.

(Only the relevant parts of ?Sequence, ?PCR_product, ?Oligo, #SMap_info are included here)

***
?Sequence SMap S_Parent Canonical_parent UNIQUE ?Sequence XREF Genomic_non_canonical
                        PCR_product ?PCR_product XREF Canonical_parent UNIQUE Int UNIQUE Int #SMap_info
                        Nongenomic ?Sequence XREF Genomic_parent UNIQUE Int UNIQUE Int #SMap_info
                        RNAi ?RNAi XREF Sequence UNIQUE Int UNIQUE Int #SMap_info
***

***
?PCR_product SMap S_Parent Canonical_parent UNIQUE ?Sequence XREF PCR_product
                  S_Child RNAi ?RNAi XREF PCR_product UNIQUE Int UNIQUE Int #SMap_info
                  Oligo ?Oligo XREF PCR_product UNIQUE Int UNIQUE Int
***

***
#SMap_info Method ?Method    // methods used in child (optional)
***

***
?Oligo  Sequence UNIQUE Text  // verbatim sequence 
        Length UNIQUE Int
        In_sequence ?Sequence XREF Oligo
        PCR_product ?PCR_product XREF Oligo
***

***
#Phenotype_info Remark ?Text                            // specific remarks about this instance
                Penetrance UNIQUE Int                   // estimated percent penetrance (0-100)
                Temperature_sensitive Int UNIQUE Int    // temperature (degrees C), penetrance
                Quantity UNIQUE Int UNIQUE Int          // for quantitative phenotypes low/high
***
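
Two constraints in the models above lend themselves to a quick check before an ace file is submitted: Delivered_by accepts only the four listed values, and Penetrance is described as a percentage (0-100). The following is a minimal sketch; check_rnai_fields is a hypothetical helper, not an ACeDB facility.

```python
# Allowed values copied from the Delivered_by branch of the ?RNAi model.
DELIVERY_METHODS = {"Bacterial_feeding", "Injection", "Soaking", "Transgene_expression"}


def check_rnai_fields(delivered_by=None, penetrance=None):
    """Return a list of problems found; an empty list means the values look valid."""
    problems = []
    if delivered_by is not None and delivered_by not in DELIVERY_METHODS:
        problems.append(f"unknown Delivered_by value: {delivered_by!r}")
    if penetrance is not None and not 0 <= penetrance <= 100:
        problems.append(f"Penetrance out of range 0-100: {penetrance}")
    return problems
```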


Examples:

As far as I can tell, there is no single straightforward, universal procedure for curating RNAi data. I will try to demonstrate how I curate by example. In each case, I first include the resulting ace file and then explain how I arrived at it.

<Case I - Unspecified sequence>

[ace file]
RNAi	[cgc4562]:F55A8.2
Author	"Stansberry J"
Author	"Baude EJ"
Author	"Taylor MK"
Author	"Chen PJ"
Author	"Jin SW"
Author	"Ellis RE"
Author	"Uhler MD"
Date	2001-02-01
Delivered_by	Injection
Predicted_gene	F55A8.2a
Predicted_gene	F55A8.2b
Reference	[cgc4562]
Phenotype	Unc	Remark	"animals move slowly and irregularly"
Remark	"template cDNA (cloned)"

[Explanation]
"[cgc4562]:F55A8.2" is the name of this RNAi object. This name must be unique within the class and I routinely use [PAPER]:GENE as the name because I think this format is both informative and likely unique. If a paper describes more than one experiment involving the same gene, I then append :N to the base name.

In this case, the authors failed to provide sufficient information for me to determine exactly which sequence the experiment was based on. Thus, there is no Sequence tag. In the Remark, I noted that the template was a cloned cDNA. This matters because the RNAi model can only define Sequence as a single contiguous piece, whereas a cDNA may align to the genomic sequence with gaps.

If Sequence information is provided, the curator should not enter the "Predicted_gene" field; it should instead be calculated from the sequence information during each database build. In this case, I filled in Predicted_gene manually, based on the authors' statements.
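
The [PAPER]:GENE naming convention used for this object can be sketched as a small helper. The function rnai_name is hypothetical, and the numbering rule is an assumption: the text says only that :N is appended when a paper describes more than one experiment on the same gene, so here the first experiment keeps the bare name and later ones receive :2, :3 and so on.

```python
def rnai_name(paper, gene, existing=()):
    """Build a [PAPER]:GENE name that does not collide with names in `existing`.

    Assumption: the first experiment keeps the bare name; later ones get :2, :3, ...
    """
    base = f"[{paper}]:{gene}"
    if base not in existing:
        return base
    n = 2
    while f"{base}:{n}" in existing:
        n += 1
    return f"{base}:{n}"
```

In practice the set of existing names would come from the RNAi class in the local database.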


<Case II - Existing ACeDB sequence>

[ace file]
Sequence : R12B2
Nongenomic yk428e9 1075	2865

Sequence : yk428e9
RNAi "[cgc4737]:him-10" 1 1791
Method cDNA_for_RNAi
From_Laboratory YK
Remark "EST clone used in RNAi assay"

RNAi     "[cgc4737]:him-10"
Method RNAi
Laboratory       "TY"
Author   "Howe M"
Author   "McDonald KL"
Author   "Albertson DG"
Author   "Meyer BJ"
Date     "2001-06-11"
Reference	[cgc4737]
Phenotype	"Emb"	Remark "97% Emb+Lva or less"
Phenotype       "Lva"
Phenotype       "Unc"
Phenotype       "Stp"	Remark "likely due to somatic mitotic defects"
Remark	"aberrant mitotic chromosome segregation in somatic cells"
Remark	"abnormal mitotic structures: aneuploidy and micronuclei"
Remark	"abnormal mitotic kinetochore morphology, visualized by electron microscopy"
Remark	"normal HCP-3 localization to centrosomes"

[Explanation]
The sequence object in this case is an EST clone whose sequence information already exists in ACeDB. I used that information to attach the template object ("Nongenomic yk428e9") to its parent genomic canonical object (R12B2). This is generally referred to as 'Smapping'. The pair of integers "1075 2865" denotes the extent of yk428e9 on the parent. I determined these coordinates manually. In theory, however, this should be done automatically, since the underlying genomic canonical sequence can drift.

The RNAi object "[cgc4737]:him-10" is then Smapped onto the Nongenomic Sequence object (from beginning:1 to end:1791). Note again that even though the RNAi template is a cDNA clone, and may therefore align to the genomic sequence with gaps, the RNAi data model is oblivious to this fact; it treats all sequences as if they were contiguously mapped onto the genomic sequence.
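
Because the model treats every mapping as contiguous, the span on the parent and the span on the child should cover the same number of bases. In this case, 1075 to 2865 on R12B2 covers 1791 bases, matching 1 to 1791 on yk428e9. A minimal sketch of that consistency check follows; span_length and spans_agree are hypothetical helper names.

```python
def span_length(start, end):
    """Number of bases covered by an inclusive 1-based span, in either orientation."""
    return abs(end - start) + 1


def spans_agree(parent_start, parent_end, child_start, child_end):
    """True if the parent span and the child span cover the same number of bases."""
    return span_length(parent_start, parent_end) == span_length(child_start, child_end)
```

A mismatch would suggest either a coordinate typo or a template that does not map contiguously.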

"Method" specifies how each sequence object will be marked for display and perhaps other purposes.


<Case III - Oligo-defined sequence>

[ace file]
Oligo	: [cgc4736]:hcp-4:1
Sequence        ggaaatgtacggagcgaaaa
Length  20
In_sequence     T03F1

Oligo	: [cgc4736]:hcp-4:2
Sequence        acattgttggtgggtccaat
Length  20
In_sequence     T03F1

PCR_product :   [cgc4736]:hcp-4
RNAi    [cgc4736]:hcp-4 1 1503
Method  GenePairs
Oligo   [cgc4736]:hcp-4:1
Oligo   [cgc4736]:hcp-4:2
Remark  "PCR fragment for hcp-4 RNAi"

Sequence :      T03F1
PCR_product	[cgc4736]:hcp-4 26157 27659
Oligo	[cgc4736]:hcp-4:1	27659 27640
Oligo	[cgc4736]:hcp-4:2	26157 26176

RNAi	: [cgc4736]:hcp-4
Method RNAi
Locus	hcp-4
Laboratory       "TH"
Author   "Oegema K"
Author   "Desai A"
Author   "Rybina S"
Author   "Kirkham M"
Author   "Hyman AA"
Date     "2001-06-11"
Delivered_by	Injection
Reference	[cgc4736]
Phenotype       "Emb"   Remark  "mitotic chromosome segregation defective"
Phenotype	"Emb"	Remark	"blocks CeBub1 (R06C7.8) localization onto chromosomes"
Phenotype	"Emb"	Remark	"blocks CeMCAK (K11D9.1) localization onto chromosomes"
Remark	"hcp-4 aka CeCENP-C"
Remark	"hcp-4 and hcp-3 RNAi phenotypes are essentially identical"

[Explanation]
In this case I defined two Oligo objects and a PCR_product so that the RNAi can be attached to the PCR_product directly and to T03F1 (genomic canonical) indirectly. This allows the RNAi to be remapped dynamically in the genome via the sequence, which is important given that the genome sequence is still fluid. Note that in Case II, the RNAi can also be remapped via its association with an EST whose sequence (at least at the ends) is known.
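
Remapping via the primers amounts to finding each oligo on the parent sequence, on either strand, and taking the outermost coordinates as the PCR_product span. The sketch below uses plain exact string matching on a sequence held in memory; it is an illustration only, the function names are hypothetical, and a real build would presumably use a proper alignment tool. A start greater than the end denotes the reverse strand, as in the "27659 27640" oligo line of the ace file.

```python
_COMPLEMENT = str.maketrans("acgtACGT", "tgcaTGCA")


def reverse_complement(seq):
    """Reverse complement of a DNA string (a, c, g, t only)."""
    return seq.translate(_COMPLEMENT)[::-1]


def locate_oligo(parent, oligo):
    """Return 1-based (start, end) of the first exact match of oligo in parent.

    start > end means the oligo matches the reverse strand.
    Returns None if there is no exact match on either strand.
    """
    parent, oligo = parent.lower(), oligo.lower()
    i = parent.find(oligo)
    if i != -1:
        return (i + 1, i + len(oligo))
    i = parent.find(reverse_complement(oligo))
    if i != -1:
        return (i + len(oligo), i + 1)
    return None


def pcr_product_span(parent, oligo_a, oligo_b):
    """Return (low, high) coordinates on parent spanned by the two primers, or None."""
    a = locate_oligo(parent, oligo_a)
    b = locate_oligo(parent, oligo_b)
    if a is None or b is None:
        return None
    coords = a + b
    return (min(coords), max(coords))
```

Only the first match on each strand is considered, so a primer that occurs more than once in the parent would need manual inspection.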


<Case IV - Unclassified phenotype>

[ace file]
Phenotype : Unclassified
Description "yet to be classified"

Oligo :	[cgc4761]:A1-sense
Sequence	ttcaaacggatacttctctcagtgag

Oligo :	[cgc4761]:A1-asense
Sequence	ATCACTTCGTAATTGACAGTTCGTGTG

Sequence : C03A3
PCR_product [cgc4761]:A1 37234 38940
Oligo	[cgc4761]:A1-sense
Oligo	[cgc4761]:A1-asense

PCR_product : [cgc4761]:A1
RNAi [cgc4761]:A1 1 1707
Method GenePairs
Oligo	[cgc4761]:A1-sense
Oligo	[cgc4761]:A1-asense
Remark "cDNA PCR fragment used in RNAi assay"

RNAi : 	[cgc4761]:A1
Method	RNAi
Author	"Karabinos A"
Author	"Schmidt H"
Author	"Harborth J"
Author	"Schnabel R"
Author	"Weber K"
Date	2001-07-03
Delivered_by	Injection
Reference	[cgc4761]
Phenotype	Lvl	Remark "L1 arrest"
Phenotype	Unclassified	Remark "wavy and swollen intestine"

[Explanation]
In this case I defined a new phenotype, "Unclassified", and used it to describe phenotypes that are not yet defined in ACeDB. The intention is to allow a curator, at a later time, to pull out all the Unclassified phenotypes and properly classify them into a DAG (directed acyclic graph) structure.
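
Pulling out the Unclassified entries later could be done with a database query, or, as sketched here, by scanning ace text directly. The function rnai_with_phenotype is hypothetical. It assumes that objects are separated by blank lines, that the header line is the class name followed by an optional colon and the object name, and that phenotype names are single words, as in the examples above.

```python
import re

# Header line: class name, optional colon, optionally quoted object name.
_HEADER = re.compile(r'^(\w+)\s*:?\s*"?(.*?)"?\s*$')


def rnai_with_phenotype(ace_text, phenotype="Unclassified"):
    """Return names of RNAi objects that carry the given Phenotype tag value."""
    hits = []
    for paragraph in re.split(r"\n\s*\n", ace_text.strip()):
        lines = [ln for ln in paragraph.splitlines() if ln.strip()]
        if not lines:
            continue
        header = _HEADER.match(lines[0].strip())
        if header is None or header.group(1) != "RNAi":
            continue
        for line in lines[1:]:
            parts = line.split()
            if len(parts) >= 2 and parts[0] == "Phenotype" and parts[1].strip('"') == phenotype:
                hits.append(header.group(2))
                break  # report each object once, even if the tag repeats
    return hits
```

The Remark attached to each hit (for example "wavy and swollen intestine") is what a curator would then use to choose a proper phenotype term.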