*** Eimear Kenny SOPs: ***
Updated: July 15, 2002

##################################################################
STANDARD OPERATING PROCEDURE FOR PREPARING ABSTRACTS FOR WORMBASE
##################################################################

Written by Eimear Kenny @ WormBase, 07-15-2002

(The following is carried out on vermicelli.caltech.edu in the
/home/abstracts directory whenever new abstracts become available.)

**** Abstract source:

TO DOWNLOAD NEW CGC ABSTRACTS:
1. Go to: http://elegans.swmed.edu/ (Leon Avery's C. elegans server)
2. Click the "CGC" link
3. Click the "C. elegans Bibliography" link
4. Right-click the group of papers that you want and select "Save link as..."
   -> download to a directory of your choice
   (/home/abstracts/PAPERS/CGC/ for example)

TO DOWNLOAD WORM BREEDER'S GAZETTE ABSTRACTS:
1. Go to: http://elegans.swmed.edu/WBG/tars/ (Leon Avery's C. elegans server)
2. Right-click the tarballs of the WBGs that you want, e.g. "wbg17.2.tar.gz"
   -> download to a directory of your choice
   (/home/abstracts/PAPERS/WBG/ for example)
3. Type the following commands (this is necessary for running the scripts later):

   tar zxvf     // unpacks the tarball
   mkdir        // makes a directory named after the volume number of the
                // gazette, e.g. for "wbg17.2.tar.gz" the directory is "17.2"
   mv p         // moves the p directory and the tarball into the
                // directory you just made

TO DOWNLOAD WORM MEETING ABSTRACTS:
1. Go to the worm meeting home page and try to download a plain-text version
   of the worm meeting abstracts to a directory of your choice
   (/home/abstracts/PAPERS/WM/ for example)
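
The Worm Breeder's Gazette unpacking commands (tar, mkdir, mv) can be sketched as a small shell function, shown here for the example tarball "wbg17.2.tar.gz". This is only an illustration: the function name unpack_wbg is hypothetical, and it assumes the tarball is named wbg<volume>.tar.gz, sits in the current directory, and unpacks into a directory called p.

```shell
#!/bin/sh
# Sketch: unpack one Worm Breeder's Gazette tarball into a volume directory.
# Assumptions (not confirmed by the SOP): the tarball is in the current
# directory, is named wbg<volume>.tar.gz, and unpacks into ./p
unpack_wbg() {
    tarball=$1                      # e.g. wbg17.2.tar.gz
    volume=${tarball#wbg}           # strip the "wbg" prefix  -> 17.2.tar.gz
    volume=${volume%.tar.gz}        # strip the file suffix   -> 17.2
    tar zxvf "$tarball" || return 1 # unpack the tarball (creates ./p)
    mkdir "$volume" || return 1     # directory named after the volume number
    mv p "$tarball" "$volume"/      # move p and the tarball into it
}
```

Possible usage, after downloading the tarball:

   cd /home/abstracts/PAPERS/WBG
   unpack_wbg wbg17.2.tar.gz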

***** Scripts:

/home/abstracts/DumpFromACeDB.pl     // script that dumps .ace files from ACeDB
/home/abstracts/CrossReferencer.pl   // script that generates a list of words
                                     // (/home/abstracts/Exclusion) from the .ace
                                     // files to exclude
/home/abstracts/Remover.pl           // script that removes the excluded words from
                                     // the .ace files
/home/abstracts/DotAceTester.pl      // script that tests the validity of the
                                     // generated .ace files
/home/abstracts/abstract2aceCGC.pl   // script that converts CGC abstracts to .ace files
/home/abstracts/abstract2aceWBG.pl   // script that converts WBG abstracts to .ace files
/home/abstracts/abstract2aceWM.pl    // script that converts WM abstracts to .ace files

PROCEDURE:

A. Download Abstracts
   1. Daniel sends notification that a new batch of CGC, Worm Breeder's
      Gazette or Worm Meeting abstracts has been released.
   2. Download the batch of abstracts using the appropriate method from above.

B. Update Object Class Dumps from ACeDB
   1. Run the DumpFromACeDB.pl program:

      ./DumpFromACeDB.pl

      This program automatically queries ACeDB using aceperl to batch
      download object data classes. There must be a local version of ACeDB
      on the machine, and the program must be pointed at "~/WSXX/acedb" and
      "~/WSXX/bin/tace". The files are written to /home/abstracts/ACEDB.

C. Convert ACeDB Dumps to Markup Lists
   1. Run the CrossReferencer.pl program:

      ./CrossReferencer.pl

      This program reads in the object classes dumped from ACeDB and,
      according to a set of rules, writes the words to be excluded from
      these classes into a file called Exclusion.
   2. Run the Remover.pl program:

      ./Remover.pl

      This program reads in the Exclusion file and each of the object class
      files. For every object class file it prints a .list file containing
      all of the terms in that object class EXCEPT those that appear in the
      Exclusion file.

D. Convert the Abstracts from .txt files to .ace files
   1. Run the script appropriate to the type of abstracts you are
      converting, e.g. the abstract2aceCGC.pl program to convert CGC
      abstracts:

      ./abstract2aceCGC.pl

      This program reads in each of the markup list files and the abstract
      file. It outputs an "aceified" version of the abstract file that
      complies with the current ?Paper model. The template for each of the
      abstracts is in the comments of the program file. Any word or phrase
      in the abstracts that matches a markup list is output as part of the
      .ace file, with the accompanying object data class as its tag.

E. Test Validity of the Generated .ace Files
   1. Run the DotAceTester.pl program:

      ./DotAceTester.pl

      This program reads in the aceified abstract file and tests its format
      and structure to ensure it is a valid .ace file.

F. Test Whether File Will Load Correctly in ACeDB
   1. Load into an empty ACeDB database:
      Set up an empty version of acedb in order to test whether the newly
      generated .ace file will load correctly and generate the correct
      number of objects.

      [ How to set up an empty version of acedb:

        a. Write the following script and place it in your /usr/bin or
           /usr/local/bin directory (modify the paths to point to where
           your version of acedb is kept):

           ts
           --------------------------------------------------
           #!/bin/csh
           setenv ACEDB /home/acedb/WS_current
           setenv DBDIR /home/acedb/WS_current/database/
           set path = (/home/acedb/bin $path)
           xace
           ---------------------------------------------------

        b. Type the following commands to generate a blank test database:

           mkdir ts                               // make a directory called ts
           cd ts                                  // move into the ts directory
           cp -r /home/acedb/WS_current/wspec .   // recursively copy the wspec directory
                                                  // from acedb to the ts directory
           mkdir database                         // make an empty directory called database ]

      Now type "ts" to launch the blank version of acedb. Read in the
      abstract .ace file (edit --> read in file --> (choose abstract.ace
      file) --> ok). Ensure:
        a. the file is read in correctly
        b. the correct number of objects is read in.

G. Upload File to Citace
   1. If the file is valid and loads correctly, send it as an email
      attachment to Wen (wchen@caltech.edu) for uploading into Citace
      before the Friday of the next WormBase build.

NOTE: MORE DETAILS ON THE PROCEDURES OUTLINED ABOVE CAN BE FOUND IN THE
COMMENTS AND README FILES ACCOMPANYING THE ABOVE-MENTIONED SCRIPTS IN THE
vermicelli.caltech.edu:/home/abstracts/ DIRECTORY
- ek
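
Steps B through E of the procedure run the scripts in a fixed order, so they could be chained with a small wrapper such as the sketch below. This is an illustration rather than part of the existing tool set: the function name run_abstract_pipeline is hypothetical, and it assumes every script takes no arguments and exits with a non-zero status on failure, which should be checked against the script comments and README files.

```shell
#!/bin/sh
# Sketch: run steps B-E in order and stop at the first script that fails.
# Assumptions (not confirmed by the SOP): the scripts need no arguments
# and signal failure with a non-zero exit status.
run_abstract_pipeline() (
    dir=$1      # directory holding the scripts, e.g. /home/abstracts
    kind=$2     # abstract type: CGC, WBG or WM
    cd "$dir" || exit 1             # subshell body: caller's directory is unchanged
    for script in DumpFromACeDB.pl CrossReferencer.pl Remover.pl \
                  "abstract2ace${kind}.pl" DotAceTester.pl; do
        echo "Running $script"
        "./$script" || { echo "$script failed, stopping" >&2; exit 1; }
    done
)
```

Possible usage for a new batch of CGC abstracts:

   run_abstract_pipeline /home/abstracts CGC

Steps F and G (loading into the blank test database and sending the file to Wen) remain manual.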