Carole SOP for Phenotype curation and Phenotype Ontology -- 20070430 (html assembled by Raymond)

4 sections divided by **********

********** 
Ontology Management:
In general, use the OBO-Edit help to learn how to do the standard tasks below:
1- Add terms (when you add a term, add yourself as a reference, add a definition, and add a reference (or references) for the definition)
2- Delete terms
3- Merge terms.  If you merge two terms, then one term will be rendered obsolete in the .ace output.  The obsolete term will have a "Dead" tag and an "Alternate_phenotype" tag that shows the term it has been merged with.  It is good to check periodically that this doesn't break.
4- Do not destroy a term if it has already been published!
5- Designate which terms are in the phenotype slim file -- these terms will immediately show up on Juancarlos' form as phenotype terms that can be searched
	- You can render all phenotype terms that are part of the phenotype slim as highlighted in the color of your choice.  To do this, in the "Advanced Options" tab select "Term filtering and Rendering controls".  In the "Term filter" tab select "self", "category", "equals" from the three pull-down menus and type phenotype_slim_wb
	- edit by selecting the Category tab at the bottom, and checking the phenotype_slim_wb category
	- (Link to Juancarlos' phenotype search tool: http://tazendra.caltech.edu/~azurebrd/cgi-bin/forms/gene_rtv.cgi)
6- Use the reasoner before each build, and periodically in between, to make sure that you do not have any redundant links.  To use the reasoner, select it from the Plug-in menu.  Click the reasoner box.  Redundant links are shown with a red arrow.  If you don't understand why a link is redundant, open the explanation plug-in from the Plug-ins menu and select the link you want to have explained.  An explanation will appear at the top of the panel.
7- Committing changes to the ontology via CVS:

In terminal window:
$ CVS_RSH="/usr/bin/ssh" ; export CVS_RSH ; echo $CVS_RSH

To commit:
$ cvs -d :ext:acedb@tazendra.caltech.edu:/var/lib/cvsroot commit PhenOnt

To check out, just substitute checkout for commit.  You can also use merge.  To tag a file, substitute "tag WSxxx".

8- You can also use the same commands on a Mac in the terminal window once you download software called "cvsnt":
http://www.march-hare.com/cvspro/prods.asp? 
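
What the reasoner in step 6 flags as a redundant link can be illustrated with a minimal sketch.  It assumes a simplified ontology containing only (child, parent) links with no cycles, and the term names are hypothetical; the OBO-Edit reasoner itself handles more relationship types than this.

```python
def find_redundant_links(links):
    """Return the direct links that are also implied by a longer path.

    `links` is an iterable of (child, parent) pairs; the graph is assumed
    to be acyclic, as an ontology should be.
    """
    parents = {}
    for child, parent in links:
        parents.setdefault(child, set()).add(parent)

    def ancestors(term):
        # All terms reachable from `term` by following one or more links.
        seen = set()
        stack = list(parents.get(term, ()))
        while stack:
            current = stack.pop()
            if current not in seen:
                seen.add(current)
                stack.extend(parents.get(current, ()))
        return seen

    redundant = set()
    for child, direct in parents.items():
        for parent in direct:
            # child -> parent is redundant if another direct parent of
            # child already has `parent` among its ancestors.
            if any(parent in ancestors(other) for other in direct if other != parent):
                redundant.add((child, parent))
    return sorted(redundant)


# Hypothetical terms, for illustration only.
example_links = [
    ("dumpy", "body_shape_variant"),
    ("body_shape_variant", "morphology_variant"),
    ("dumpy", "morphology_variant"),  # implied by the two links above
]
print(find_redundant_links(example_links))
```

Here the direct link from "dumpy" to "morphology_variant" is reported, because the same relationship already follows from the path through "body_shape_variant".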


**********
Allele-transgene curation form 2

------------------------------------------------
Form Explanation:
------------------------------------------------
Front page, two options:  
1) Query by paper (cgc, pmid, WBPaper)-- grabs all objects (transgene, RNAi, and allele) currently associated with that paper from WormBase dev.  This form will also query Postgres for objects in Postgres but not in WormBase.   Additionally, this form will query the RNAi checkout form, and indicate whether this paper has already been checked out for RNAi curation.  If it has, please do not curate this data type from this paper.  If a paper has already been completely curated, this will also be indicated in the paper query output.

2) Query by object -- 
	a) for allele and transgene, select the appropriate object type from the drop-down menu, and query by entering the name as it appears in the paper (e.g., sy1 or syIsxxx).  The form queries WormBase and Postgres.  If an object is in WormBase, the object is assigned a final name, and at the top of the curation page it will indicate that the object was found in WB.  If an object is already in Postgres, then you are presented with data already entered.  If an object doesn’t have a final name in WormBase, but Mary Ann has already assigned one in Postgres, then a Postgres final name is indicated, and data will be dumped, but a final name will not appear in the box above the form.
	b) if you are planning to curate a brand-new RNAi object (i.e., if it was not found using a paper search), then select RNAi for object type.  Enter any character in the tempname box.  Tempname will be automatically assigned.  Curate the object as you would the other objects, but provide a descriptor in the "RNAi brief description" text box just below the object name.  This will be used by Igor to identify to which RNAi experiment in the paper the phenotype data should be assigned.  Final name designations for RNAi will be in the format WBRNAixxxxxxxx.
	c) query for a multi-allele object same way as for allele and transgene.  Enter multi-allele information in the format md186;sy1 (in standard order as designated by their chromosomal locations).  Do not use gene names, as these may change.  Nothing will ever be found on WormBase Dev (since we are not dumping this data). Data already entered in Postgres will be presented, if it exists.  These will only have a temp name designation.
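
The multi-allele name format in 2c can be sketched as below.  This is only an illustration: it assumes that "standard order" means sorting by chromosome (I, II, III, IV, V, X) and then by map position, and that the curator supplies the positions.  The allele names and positions are placeholders, not real map data.

```python
CHROMOSOME_ORDER = ["I", "II", "III", "IV", "V", "X"]


def multi_allele_name(alleles):
    """Join allele names with ';' after sorting by chromosome and position.

    `alleles` is a list of (name, chromosome, position) tuples.  Gene names
    are deliberately not part of the result, since they may change.
    """
    ordered = sorted(
        alleles,
        key=lambda allele: (CHROMOSOME_ORDER.index(allele[1]), allele[2]),
    )
    return ";".join(name for name, _, _ in ordered)


# Placeholder alleles and positions, for illustration only.
print(multi_allele_name([("zz3", "X", 1.5), ("xx1", "I", 4.0), ("yy2", "I", -2.0)]))
```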

***Update RNAi, Update Transgene, Update Variation buttons (at the bottom of the query page) are for Igor's, Wen's, and Mary Ann's use.  If you are not one of them, then you do not need to use these functions.
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Curation Page General:

1) Each "big box" represents data for one paper.  If you perform an object query, and find that there is data already entered, but it is for a different paper, then click "Add Big Boxes" and you will get another box to enter data for a different paper.

2) Within the big boxes, you can:
	a) click "More Boxes!" to add more Phenotype Ontology terms.
	b) You can also click "Toggle! Hide!" to hide all of the condition, allele information if you want a cleaner window.  This is useful when there is data present from multiple papers for a given object type -- if you want to make it easier for yourself to view annotations from all papers simultaneously.  Helps for consistency.
------------------------------
Curation Page Differences:

Differences between curation pages:
1) RNAi - as indicated above, curators will see an "RNAi brief description" text box.  Enter text here to distinguish one experiment from others contained in the paper that you are curating.  With time, perhaps Igor will let everyone know if he finds he needs a standard format for this text.  WBGene connections are not grabbed for RNAi objects, as these mappings are particularly prone to change (described below for alleles and transgenes).  
2) Allele - sometimes this data may have a Phenotype_remark at the top of the form (at least this is the way we are envisioning it at this point).  This text will come from data imported from Mary Ann.  It will be a one-time import.  This text is static, and we will not add to it.  However, we can use it to translate that text into Phenotype objects, if there is enough information.
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Filling in the form:

1) WBGene connections are automatically grabbed from the dev site when you query an object that exists there.  This connection will not be dumped in an ace file.  It provides a double-check that the curator entered the allele or transgene name correctly so that we don't generate new objects accidentally, or end up confusing Mary Ann or Wen when they are reviewing these object names later.  This connection is updated every time one queries and enters the curation page for that object.  These connections can also be updated en masse.  Having this feature will facilitate keeping track of our curation progress with respect to both gene coverage and allele coverage per gene, in the event that we want to institute some kind of checkout form later on.

2) Paper Reference - input paper name (in WBPaper, cgc, or pmid format), click if you feel that this paper is completely curated for this data type.  This is instituted in the event that people prefer to curate using a gene by gene approach, rather than a paper-by-paper approach for this data type. 

3) Phenotype text Data from geneace only- remarks in this box come from geneace, and are now associated with data from the National Bioresource Project of Japan (NBP alleles).  Some of these remarks are associated with data that was entered prior to Caltech taking over the phenotype data.  If the remarks are useful, then they should be repeated in another Remark box, because data from this box should not be dumped once the new phenotype models are in place (currently we are using transitional models).

4) Remark - This is instituted for flexibility.  Used for internal remarks only.  This data is not being dumped.

5) Genetic Interaction Description - input brief text here relating to RNAi objects that correspond to multiply silenced genes, genetic interactions between alleles, or transgene interactions with the strain background.

6) The "NOT" box is to be checked when a particular phenotype is specifically mentioned in a paper as not being associated with that allele, transgene, or RNAi.  It is important to check this box only when this "NOT" information is specifically stated by the authors.

7) Phenotype ontology term

8) Remark (Phenotype) - Input phenotype details here, especially if you feel that it will be useful for later ontology development.  As we find that there are more than a specified number (that we can decide upon) of genes (not alleles, since many alleles for the same gene may be annotated similarly) annotated to a particular term, we may prioritize expansion of that particular branch of the ontology.  We would then pull out all phenotype text that exists in conjunction with those terms, and a curator could use that text and associated papers to develop the ontology.  

9) QUANTITY REMARK - For RNAi, this remark has been used to describe the quantity value that you can enter below (e.g. "brood size" or "N=")

10) QUANTITY: ENTER AS AN INTEGER OR AS A RANGE (see Penetrance Range, below)

11) GO term suggestion:  indicate here a GO term that you think would be appropriate for phenotype2GO mappings.  Erich can pull these suggestions out, review them, and make the final decision on the mappings.  Eventually, you will be able to link to an updated list of phenotype to GO mappings.

12) Suggested term: If you don't feel like editing this branch of the ontology right away, input here a suggestion for a more granular child term to be associated with the existing phenotype ontology term that you have input on the form.  If we decide to make the development of this branch a priority, then we can pull out all suggested terms associated with that particular existing term, and use those suggestions as a guide (along with the phenotype text and suggested term's reference and definition (below)).  

13) Suggested term's reference: of course, the paper you are currently curating can be used as a reference, but there may be a citation in that paper that serves as a better reference for the suggested child term.  Or if you just know of a more original paper reference, put it here.

14) Suggested term's definition: input here, while you have the paper in front of you.  Then we won't always have to play catch-up with definitions.

15) Genotype: the format follows Condition objects that already have this information.  Standard format is used (e.g., fem-3(q23gf)).  If there is a strain name, that has also been included in the genotype text box (e.g. JJ518 [mex-3(zu155) dpy-5(e61)/hT1; I]).

16) Life Stage:  click on the Life Stage hyperlink to view Life Stage objects currently in WB.  Copy-paste one of these terms into the text box.

17) Temperature: just enter an integer here; represents temperature used for experiment.  No degree symbol or anything else.

18) Strain: e.g., N2, PSxxxx, CBxxxx, make sure that you are not entering a bogus strain name that is not in WormBase.

19) Preparation:  how worms  were cultured/ isolated before analysis.   Random examples shown below were extracted from this tag in class Condition:
"Worms were grown in liquid culture carefully without formation of dauer. Gravid adults were collected. To the worm pellet, 2.5 times the volume of hypochlorite solution were added. Eggs were collected from the lysate. The eggs were pated to large plates for time course. Plates were put at 20 centigrades. The growth of worms were checked under Nomarski scope. L2 larva were collected for mRNA isolation."

Total RNA was prepared from an individual nematode grown for 9 days using the Absolutely RNA Nanoprep kit (Stratagene). It was then subjected to 2 rounds of linear amplification using the MessageAmp kit (Ambion).

JJ518 strain was grown on HT115 bacteria expressing double-stranded skn-1 RNA and Dpy adults were picked and cut for embryo collection. Mutant embryos were staged in small cohorts by morphology at the four-cell stage and collected in replicate at the ten time points. RNA was extracted, biotinylated, amplified and hybridized to the Affymetrix C. elegans microarray. Array data were quantile normalized and reduced by the robust multi-chip average algorithm (RMA). Expression levels reported here were back-transformed to the linear scale, i.e. reported values are 2^(RMA).

20) Treatment: e.g. if worms were exposed to drugs, combined with preparation info. (above).  Preparation and Treatment fields have now been merged, to make curation more similar to RNAi.

21) Delivered by: only for RNAi experiments, select an option

22) Nature of allele - select one

23) Penetrance/ text – select from drop-down menu, and input text if appropriate (e.g., “penetrance increased when worms are starved”)

24) Penetrance Range - enter (without %) as an integer, or as a range (two numbers separated by a comma)

25) Mat effect: pick from the drop-down menu

26) Paternal effect - checkbox

27) Heat sensitive – check box, can enter text (as old data has this), better if just enter Int value, if available

28) Cold sensitive – check box, can enter text (as old data has this), better if enter Int value, if available

29) Func. Change: Pick from the drop-down menu

30) Haploinsufficient: check box
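
The entry rules for Temperature (item 17) and Penetrance Range (item 24) can be checked with a small sketch like the one below.  It is not part of the curation form; the function names are hypothetical, and the 0 to 100 bound on penetrance is an assumption based on the value being a percentage.

```python
import re


def parse_temperature(text):
    """Temperature must be a bare integer: no degree symbol or units."""
    cleaned = text.strip()
    if not re.fullmatch(r"\d+", cleaned):
        raise ValueError(f"temperature must be an integer, got {text!r}")
    return int(cleaned)


def parse_penetrance_range(text):
    """Accept '30' or '20,40' (no % sign) and return a (low, high) tuple."""
    match = re.fullmatch(r"\s*(\d+)\s*(?:,\s*(\d+)\s*)?", text)
    if not match:
        raise ValueError(f"expected an integer or 'low,high', got {text!r}")
    low = int(match.group(1))
    high = int(match.group(2)) if match.group(2) is not None else low
    # Assumed bound: penetrance is a percentage, so 0 <= low <= high <= 100.
    if not 0 <= low <= high <= 100:
        raise ValueError(f"penetrance out of range: {text!r}")
    return low, high


print(parse_temperature("20"))
print(parse_penetrance_range("30"))
print(parse_penetrance_range("20,40"))
```

Entries such as "20C" or "30%" raise a ValueError, which matches the instruction to leave out the degree symbol and the percent sign.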


**********
Dump phenotype data procedures

--- Dump .ace from .obo file:
1) Test .ace file generated from .obo file:
 - in the terminal window, ssh to acedb@tazendra.caltech.edu
 - cd to /home/acedb/carol/dump_phenotype_ace
 - run script "dump_phenotype_ace2.pl" by typing "./dump_phenotype_ace2.pl"
-->generates new file called "phenotype_from_obo.ace" (if WBPapers or WBPersons are not real objects in WormBase, then this script should report the errors, and the file should not read in properly.  It is good to check periodically that it is actually doing this.)
scp this file to your machine
Read this file into CitaceMinus for errors.

2) Tag file in CVS repository (once you are sure it's free of errors):
e.g., cvs -d :ext:acedb@tazendra.caltech.edu:/var/lib/cvsroot tag WS800 PhenOnt
(to retrieve this version, use checkout -r WS800 PhenOnt)

3) This file gets uploaded to Citace (path at end of this document)
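
The tagging command in step 2 changes only in the release name from build to build, so it can be assembled as in the sketch below.  The helper names are hypothetical; the repository path and module name are the ones given in this document, and -r is the standard CVS option for selecting a tag at checkout.  The sketch only builds the command lines and does not run them.

```python
import shlex

CVSROOT = ":ext:acedb@tazendra.caltech.edu:/var/lib/cvsroot"
MODULE = "PhenOnt"


def tag_command(release):
    """Command line that tags the ontology module with a release name."""
    return ["cvs", "-d", CVSROOT, "tag", release, MODULE]


def checkout_command(release=None):
    """Command line that checks out the module, optionally at a tag."""
    args = ["cvs", "-d", CVSROOT, "checkout"]
    if release is not None:
        args += ["-r", release]
    return args + [MODULE]


print(shlex.join(tag_command("WS800")))
print(shlex.join(checkout_command("WS800")))
```

Remember that CVS_RSH must still be set to /usr/bin/ssh in the shell where the commands are run.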

--- Dump .ace from allele-phenotype form
1) Make sure that there are no obsolete phenotype annotations:
- Script for finding obsolete terms is find_obsolete.pl at path listed below on tazendra:
/home/postgres/work/citace_upload/allele_phenotype/

2A) click on the "Dump .ace" link from the front page of the allele-phenotype curation form
- Errors in .ace file generated from form indicated by "ERROR" in the .ace output, or as comments at the top of the form.

2B) If link dump from curation form does not work:
1) ssh to acedb on tazendra (i.e., "ssh acedb@tazendra.caltech.edu")
2) Paste the below:
/home/postgres/work/citace_upload/allele_phenotype/wrapper.pl

It will create the file and relink it so that you can see the latest
dump on the link on the front page of the form.

*Errors in the .ace file generated from the form are indicated by "ERROR" in the .ace output.

3) Save output file as a .ace file for upload to Citace (usually save as "allele_phenotype_dump_WSxxx.ace")
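
Since errors are marked by the word "ERROR" in the .ace output, a saved dump can be scanned before upload.  The sketch below is one way to do that; the sample text, including the wording of the ERROR line, is made up for illustration and is not real dump output.

```python
def find_error_lines(ace_text):
    """Return (line_number, line) for every line that contains 'ERROR'."""
    return [
        (number, line)
        for number, line in enumerate(ace_text.splitlines(), start=1)
        if "ERROR" in line
    ]


# Made-up sample, not real output from the curation form.
sample = (
    'Variation : "xx1"\n'
    "ERROR : placeholder message about xx1\n"
    "\n"
    'Variation : "yy2"\n'
)
for number, line in find_error_lines(sample):
    print(f"line {number}: {line}")
```

To check a real dump, read the saved file (for example allele_phenotype_dump_WSxxx.ace) into a string and pass it to find_error_lines.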


-------Checking .ace files:
Checking:
1) Update citaceminus on your machine (first delete old database and wspec files)
	a) cd to CitaceMinus on your machine, and then:
		"scp -r citace@altair:/home/citace/CitaceMinus/database ."
2) Make note of # of strain objects and # of life-stage objects
3) Check that no new Strain objects and no new Life_stage objects are created after reading in the .ace files
4) After reading in the files, use the following AQL query to make sure that there are no RNAi objects that use obsolete phenotype terms:
select all class phenotype where not exists_tag ->Primary_name and exists_tag ->RNAi
	4A) If there are obsolete terms linked to RNAi objects, then prepare a -D file updating the RNAi object annotations with proper terms.  This file needs to be uploaded to CitaceMinus, not to Citace.
5) compare # of alleles linked to phenotype objects from previous upload and current upload using the below query (to make sure that data in the dump looks reasonable, i.e., no lost data):
"select all class variation where exists_tag ->phenotype"

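The count comparisons in steps 2, 3, and 5 can be written down as a small sketch.  The function names are hypothetical and the counts are invented; the real numbers come from CitaceMinus before and after reading in the .ace files, and from the variation query for the previous and current uploads.

```python
def new_object_problems(before, after):
    """Flag classes whose object count grew after reading in the .ace files."""
    problems = []
    for cls in ("Strain", "Life_stage"):
        added = after[cls] - before[cls]
        if added > 0:
            problems.append(f"{added} new {cls} object(s) created")
    return problems


def lost_variations(previous_upload, current_upload):
    """Number of phenotype-linked variations lost since the previous upload."""
    return max(0, previous_upload - current_upload)


# Invented counts, for illustration only.
before = {"Strain": 12000, "Life_stage": 700}
after = {"Strain": 12002, "Life_stage": 700}
print(new_object_problems(before, after))
print(lost_variations(5400, 5390))
```

A non-empty problem list usually points to a mistyped strain or life-stage name in the curation form, and a non-zero loss means that the dump should be inspected before upload.
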

--- Deposit for Wen at:
/home/citace/Data_for_citace/Data_from_XXX/ (phenotype_from_obo_WSxxx.ace and allele_phenotype_dump_WSxxx.ace files)
/home/citace/Data_for_CitaceMinus/Data_from_XXX/ (file that updates obsolete RNAi annotations)

AND deposit .obo file:
1) First rename .obo file to "phenotype_ontology.WSXXX.obo"
2) Deposit to:
/home/citace/Data_for_Ontology
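
The three deposited files follow fixed naming patterns that differ only in the release number.  A minimal sketch for generating the names is shown below; the function name and dictionary keys are hypothetical, and the check that a release looks like "WS" followed by digits is an assumption.

```python
import re


def release_filenames(release):
    """Return the deposit file names for a release such as 'WS800'."""
    # Assumed release format: 'WS' followed by digits.
    if not re.fullmatch(r"WS\d+", release):
        raise ValueError(f"unexpected release name: {release!r}")
    return {
        "obo_ace": f"phenotype_from_obo_{release}.ace",
        "allele_ace": f"allele_phenotype_dump_{release}.ace",
        "obo": f"phenotype_ontology.{release}.obo",
    }


for name in release_filenames("WS800").values():
    print(name)
```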


-------------

Queries to Postgres:

1) Find number of distinct genes associated with a phenotype via allele (I think it's just allele):
paste :
SELECT COUNT (DISTINCT alp_wbgene) FROM alp_wbgene;
into :
http://tazendra.caltech.edu/~postgres/cgi-bin/referenceform.cgi/home/
and click "Pg !"

If you want to see them, paste :
SELECT DISTINCT alp_wbgene FROM alp_wbgene;
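
To see what the two queries return without touching the live database, they can be tried on a throwaway copy.  The sketch below uses an in-memory SQLite database as a stand-in for Postgres; the joinkey column and all rows are assumptions made up for illustration, and only the table and column name alp_wbgene come from the queries above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Stand-in table; the real Postgres table may have other columns.
conn.execute("CREATE TABLE alp_wbgene (joinkey TEXT, alp_wbgene TEXT)")
conn.executemany(
    "INSERT INTO alp_wbgene VALUES (?, ?)",
    [
        ("xx1", "WBGene00000001"),  # two alleles of the same gene
        ("xx2", "WBGene00000001"),
        ("yy3", "WBGene00000002"),
    ],
)

count = conn.execute(
    "SELECT COUNT(DISTINCT alp_wbgene) FROM alp_wbgene"
).fetchone()[0]
genes = [
    row[0]
    for row in conn.execute(
        "SELECT DISTINCT alp_wbgene FROM alp_wbgene ORDER BY alp_wbgene"
    )
]
print(count)
print(genes)
```

The count is per gene, not per allele, so several alleles of one gene contribute a single entry.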


**********

Processing data sent from the National Bioresource Project of Japan (NBP alleles):

--- Parse new data received into Postgres (data is .ace file received by email, usually from Mary Ann):

1) Move .ace file sent by Sanger into this folder: /home/acedb/carol/read_mary_ann_data/

2) Run script "parse_maryann.pl", located on tazendra @ /home/acedb/carol/read_mary_ann_data/parse_maryann.pl:

	2A) You must specify an inputfile.
  	You may also enter testing or real (to read into postgres), defaults to testing
  	Usage : ./parse_maryann.pl input_file [testing|real]

Script overwrites old NBP data and enters new data.

3) Store a copy of the output file on your machine.

4) use list of new and changed allele info. to query allele names in Postgres to access new data.

5) associate appropriate phenotype terms, and any other relevant curation details, using form

One standard curation remark is associated with data that is "lethal or sterile"
Classified as lethal OR sterile by the National Bioresource Project of Japan.
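
The usage line for parse_maryann.pl in step 2A (a required input file, and an optional mode that defaults to testing) can be mirrored as below.  This is only a Python sketch of the command-line contract, not the Perl script itself, and the file name nbp.ace is a placeholder.

```python
import argparse


def parse_args(argv):
    """Mirror of: ./parse_maryann.pl input_file [testing|real]"""
    parser = argparse.ArgumentParser(prog="parse_maryann.pl")
    parser.add_argument("input_file", help=".ace file received by email")
    parser.add_argument(
        "mode",
        nargs="?",
        choices=["testing", "real"],
        default="testing",  # nothing is read into Postgres unless 'real' is given
    )
    return parser.parse_args(argv)


print(parse_args(["nbp.ace"]).mode)
print(parse_args(["nbp.ace", "real"]).mode)
```

Running in testing mode first is the safer order, since the real mode overwrites the old NBP data.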