*** Wen Chen SOPs ***
Updated: July 25, 2003

I. User's Guide.

1. Design and task distribution.
The User's Guide is compiled based on the structure of WormBase and on the questions we receive through the wormbase-help email list. I decide which topics to include in the User's Guide and distribute the writing tasks to other curators at Caltech. I also decide how the User's Guide should be structured and what its interface should look like. I wrote a template for all of the HTML pages.

2. Making the plain version of the User's Guide.
2.1. Each curator assigned a topic writes the text for it and emails it to me. I then build the HTML pages from this text. I take screenshots for each topic, using the Linux program xv to grab them from Mozilla or Netscape, insert the images into the pages, and format the display so that all pages look good.
2.2. I link the pages together by adding "next", "prev", etc. links to each page. I also record the level of each page within the User's Guide in a buried comment field, so that headers and footers can be inserted later by a script (UserGuideBuilder.pl, see 3.2).
2.3. I test every page to make sure that the links and the display are OK.
2.4. After all this, the plain version of the User's Guide (PlainUG) is ready. It has no headers or footers, but all of the content is complete.

3. Making the final version of the User's Guide.
3.1. Because the headers and footers change often, and many pages of the User's Guide share similar headers or footers, I maintain two versions of the User's Guide. PlainUG is the version without headers or footers; however, the HTML code of each page contains controlled vocabulary buried in comments, so that a Perl script can insert the matching header and footer.
3.2. All headers and footers are kept in a file called ug_style.html, and the list of all HTML pages of the User's Guide is kept in a file called UserGuideList.txt. I wrote a script called UserGuideBuilder.pl, saved under /home/wen/bin.
3.3. After PlainUG is ready, make a duplicate of PlainUG and call it NewUG.
Then run UserGuideBuilder.pl, which reads the headers and footers from ug_style.html and inserts the appropriate ones at the buried comments in each page. The script processes every page listed in UserGuideList.txt.
3.4. Check NewUG to make sure everything is OK, then rename "NewUG" to "userguide", which is the final version linked from the WormBase site. The User's Guide is very portable: copying the whole directory to another computer or another account is enough to set up a mirror of the User's Guide.
3.5. Test the pages in Mozilla, Netscape and Internet Explorer, and fix any display problems.

4. Update.
4.1. I edit individual pages in PlainUG to update the information. Then I update ug_style.html (if the headers or footers need to change) and UserGuideList.txt (if the page structure has changed). Finally, make a new NewUG duplicate and run UserGuideBuilder.pl, as in 3.3.

---------------------------------------------------------------------------

II. Expr_pattern curation procedure.

1. Screen for papers containing Expr_pattern data.
There are two approaches. One is to go through all the hard copies (we did this for publications before 2001). The other is first-pass curation, in which each paper is flagged by its content; if a paper contains Expr_pattern data, an email notification is sent to me.

2. Data extraction.
2.1. I read the candidate papers one by one to extract the data. Most of the time the results are in a particular section or paragraph(s), while the methods can be found in "Materials and Methods". I have a template .ace file for Expr_pattern, so all I need to do is fill in the data fields.
2.2. For papers available as text PDFs, I copy and paste the whole relevant paragraphs into a temp.txt file and edit them to remove anything other than the data itself. Then I copy the corresponding paragraphs into the "Pattern", "Subcellular_localization", "Remark", "Reporter_gene" or "Antibody" fields. I use controlled vocabulary for Remark, Reporter_gene and Antibody; the vocabulary is kept in a file called template.txt under /home/wen/ace_files.
2.3. From the "Pattern" entry I extract the Cell, Cell_group and Life_stage information. I also try my best to fill in the Reference, Locus, Sequence, etc. fields. When the information provided by the paper is ambiguous, I search ACeDB, WormBase or other references.
2.4. Each experiment becomes a separate entry (unless the results cannot be separated). I keep reading papers one by one and save all the extracted data in the same file.
2.5. If there is no text version to copy from, I type the information into the .ace file by hand.

3. Check the .ace file in an empty database for correct syntax.
3.1. After I have finished all the papers I intend to read, I clean up the .ace file, removing excess blank lines and empty fields.
3.2. Read the .ace file into the empty database (called 'ts' on athena) and fix all syntax errors.

4. Check the .ace file for correct XREFs.
4.1. Prepare the dictionaries for aceChecker: open the most current WS release (called 'ws' on athena) and go class by class through Locus, Sequence, Cell, Cell_group, Life_stage, Paper and Clone. For each class, query for the valid objects, dump the list of names and save it under /home/wen/dict/.
4.2. aceChecker is a script I wrote that reads /home/wen/dict and builds dictionaries for Locus, Sequence, Cell, Cell_group, Life_stage, Paper and Clone. Whenever it finds a line in the test .ace file whose object name is not in the corresponding dictionary, it prints a warning line to the output file. aceChecker is saved under /home/wen/bin.
4.3. Once all dictionaries have been dumped from WS and the .ace file reads cleanly into ts, run aceChecker. Go through the warnings in the output file and fix them one by one.

5. Check the .ace file in the most recent WS release for correct XREFs.
5.1. Open the most recent WS release, count the objects in Locus, Sequence, Cell, Cell_group, Life_stage, Paper, Clone, etc., and record the counts as "before".
5.2. Read in the test .ace file, count the objects in the same classes again, and record the counts as "after".
5.3. Compare "before" and "after" and make sure the changes are correct, i.e. that exactly the expected numbers of objects were added or deleted. An unexpected increase usually means the .ace file refers to objects that do not exist in WS (for example, misspelled names). The .ace file should now be error-free and ready to go into citace.

------------------------------------------------------------------------------

III. Transgene curation.

1. Paper source.
Papers come either from the Janus screen or from first-pass curation.

2. Data extraction.
Transgene information should always be looked for in "Materials and Methods". I have a Transgene .ace template and type or copy-paste the corresponding information into its data fields. When the information provided by the paper is ambiguous, I search ACeDB, WormBase or other references.

3. Check the .ace file in an empty database for correct syntax.

4. Check the .ace file in the most recent WS release for correct XREFs.
Since Transgene objects do not have many XREFs and there are usually only a few entries at a time, this is enough to fix all problematic XREFs. When there are many Transgenes, I also use a Perl script (very similar to aceChecker) to check the XREFs.

-------------------------------------------------------------------------------

IV. citace maintenance.

1. Set up an empty database.
1.1. Create a directory called citace and enter it.
1.2. Create an empty directory called database:
        mkdir database
1.3. Copy the wspec folder from the most recent WS release.
1.4. Enter wspec and edit layout.wrm for the new layout, database.wrm for the new database name, and passwd.wrm for write access (add wen at the very bottom).
1.5. Create a shell script to run this new database (like the acedb script that comes with every WS release; just change the path).
Now we have an empty database with a brand-new look.

2. Read .ace files.
2.1. Read the .ace files into citace, following the standard xace procedure. An .ace file can be either dumped from a WS release or written by a curator.
2.2. Save and exit.

3. Model update.
Edit models.wrm in wspec and update the models following the standard xace procedure.

4. Backup.
Every time we update citace, we back up the database/ and wspec/ directories. We also keep a record of each update in README.txt.

5. Upload.
5.1. Run citace.
5.2. Run Admin-Dump All, with a timestamp and comments.
5.3. tar and gzip the dump into citace_dump_date.tar.gz.
5.4. Upload with ncftp; see the file /home/citace/Data_from_wen/upload_to_sanger for details.

6. Routines before each upload.
6.1. Sequence update.
Open WS_current, query for all Sequence objects with the From_laboratory tag, dump the list and save it as l_sequence.ace under /home/wen/dict/.
Open citace, query for all Sequence objects, dump them as an .ace file and save it as citseq.ace under /home/wen.
Run seqUpdater.pl and name the result file 'out'. The script generates three files:
    'out' lists all invalid sequences, whether or not they exist in the WP record, and whether they were split.
    'out.ace' is an .ace file that renames the sequences and fixes all the cross-references.
    'out.NotFound' lists the invalid Sequence objects for which the script could not find new names. What happened to these has to be figured out manually; sometimes the answer can be found in the WP record.
Rename out.ace to auto.ace and save it under /home/wen/ace_files/XXXX_XX_XX, or copy the content of out.ace into the XXXX_XXXX_mod.ace file. Save the .ace file for the manually resolved Sequence objects as mannual.ace under /home/wen/ace_files/XXXX_XX_XX. These two .ace files should go into every citace upload.
6.2. Locus update.
Open WS_current:
    1. Query for all Locus objects with an Other_name or Old_name tag, dump them as an .ace file and save it as RenamedLocus.ace under /home/wen/dict/.
    2. Query for all Locus objects that contain the CGC_approved tag, dump the list and save it as l_cgc_locus.ace under /home/wen/dict/.
Open citace:
    1. Dump all Locus names and save the list as l_cit_locus.ace under /home/wen.
Run locUpdater.pl and name the result file 'out'.
    1. The resulting out.ace file contains the Locus -R (rename) information. Copy it into the XXXX_XXXX_mod.ace file saved under /home/wen/ace_files/YEAR_Month_Date/.
    2. The out.multi file contains ambiguous cases that curators need to resolve, such as a CGC_approved locus that is also the Other_name or Old_name of another locus, or a locus name that is the Old_name or Other_name of more than one locus (so it is unclear how to rename it).

-----------------------------------------------------------------------------

V. Download the most recent WormBase release.

I usually start from the local directory where I install WS (/usr/local/WS/).
1. Log in anonymously:
        ncftp ftp.sanger.ac.uk
2. Go to the directory containing the WS releases:
        cd pub/wormbase
3. Download the whole release (this takes about 1 hour). For example, if the current release is WS86:
        get -R WS86
4. Quit ncftp:
        bye
5. You should now see a WS86 directory. Enter it:
        cd WS86
6. Expand the database (this takes about 15 minutes):
        ./INSTALL
7. The database is now expanded. I call the most current release WS_current and have a script that runs WS_current, so all I need to do is rename the old WS_current back to its release name (WS85 in this example) and rename WS86 to WS_current. My script will then open WS86 automatically:
        mv /usr/local/WS/WS_current /usr/local/WS/WS85
        mv /usr/local/WS/WS86 /usr/local/WS/WS_current

----------------------------------------------------------------------------------

VI. How to set up a citace mirror.

1. First, you need to have xace in your path. If you do not, download the most recent xace from:
        http://www.acedb.org/Software/Downloads/
I usually keep my xace under /home/wen/bin.
2. Create a directory on your Linux box for the citace mirror, for example "CitaceMirror", and enter it.
3. Securely copy the citace database/ and wspec/ directories from athena into your CitaceMirror directory:
        ...
        $ cd CitaceMirror
        $ scp -r xxx@athena.caltech.edu:/home/citace/database/ .
        Passwd: ...
        (This step takes a few minutes)
        $ scp -r xxx@athena.caltech.edu:/home/citace/wspec/ .
        Passwd: ...
You also need to make a small change in wspec/passwd.wrm: add your login name at the bottom of the file so that you have write access to your mirror.
4. Now make a script to run your CitaceMirror. Keep this script in your path (I usually keep mine in /home/wen/bin). The script should look something like this:
        #!/bin/csh
        # Point ACEDB at the mirror directory (the one containing wspec/)
        setenv ACEDB /home/xxx/CitaceMirror
        setenv DBDIR /home/xxx/CitaceMirror/database/
        # Put the directory holding xace at the front of the search path
        set path = (/home/xxx/bin $path)
        # Start xace on the database that $ACEDB points to
        xace
5. To update CitaceMirror, remove the whole CitaceMirror directory and re-copy database/ and wspec/ as shown in step 3.
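The refresh in step 5, together with the passwd.wrm edit from step 3, can be collected into one script. The sketch below is a hypothetical helper, not one of the existing scripts: the file name refresh_mirror.sh and the functions grant_write_access and refresh_mirror are made up for illustration. It is written in plain sh rather than csh so that it can define functions, and it assumes that passwd.wrm is simply a list of login names, one per line (as the "add your login name at the bottom" instruction suggests), and that scp can reach athena as in step 3.

```sh
#!/bin/sh
# refresh_mirror.sh: hypothetical helper, not one of the existing scripts.

# Append a login name to passwd.wrm unless it is already listed on its own line.
grant_write_access() {
    passwd_file=$1
    login=$2
    if grep -qxF "$login" "$passwd_file" 2>/dev/null; then
        return 0
    fi
    # If the file does not end with a newline, add one so the name gets its own line
    if [ -s "$passwd_file" ] && [ -n "$(tail -c 1 "$passwd_file")" ]; then
        printf '\n' >> "$passwd_file"
    fi
    printf '%s\n' "$login" >> "$passwd_file"
}

# Step 5: remove the old mirror, re-copy database/ and wspec/ (as in step 3),
# then give the login write access.
refresh_mirror() {
    mirror=$1   # e.g. /home/xxx/CitaceMirror
    remote=$2   # e.g. xxx@athena.caltech.edu
    login=$3    # login name to add to wspec/passwd.wrm
    [ -n "$mirror" ] || return 1
    rm -rf "$mirror"
    mkdir -p "$mirror" || return 1
    scp -r "$remote:/home/citace/database/" "$mirror/" || return 1
    scp -r "$remote:/home/citace/wspec/" "$mirror/" || return 1
    grant_write_access "$mirror/wspec/passwd.wrm" "$login"
}

# Only refresh when all three arguments are given.
if [ $# -eq 3 ]; then
    refresh_mirror "$1" "$2" "$3"
fi
```

For example: sh refresh_mirror.sh /home/xxx/CitaceMirror xxx@athena.caltech.edu xxx. Like step 5, the sketch deletes the old mirror before copying, so a failed scp leaves no mirror at all; copying into a temporary directory first and renaming it afterwards would be a safer variant.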