In ACeDB, it is very easy to make an error and have it go unnoticed for a long time before someone finally finds it. For example, if a curator enters a cell called "AB,p" instead of "AB.p", a new Cell object called "AB,p" is created automatically. A simple typo in a cross-reference is enough to do that. Remember that curators write data every day. If we don't check our files carefully, WormBase will be badly corrupted within a few weeks.
However, it is very time-consuming to check every cross-reference one by one by hand. This script was created to help curators do it semi-automatically. Depending on how you customize the script, it checks your .ace file and tells you which cross-references will produce a new object, so that you can confirm the new objects you intend to create and fix the ones created by typos.
First, you need a dictionary, which consists of multiple files. The section on dumping keysets below explains how to make one. Each dictionary file is a list of names of a certain class, dumped from an ACeDB keyset. For example, the dictionary file for "Life_stage" looks like this:
KeySet : "l_Life_stage.ace"
Life_stage : "1-cell embryo"
Life_stage : "1.5-fold embryo"
Life_stage : "2-cell embryo"
...
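To make the dictionary format concrete, here is a minimal sketch of how such a file could be read into a set of names. It is written in Python for illustration only and is not the Perl script described here; the function name parse_dictionary and the regular expression are assumptions based solely on the sample lines above.

```python
import re

# Matches lines such as: Life_stage : "1-cell embryo"
# (pattern inferred from the sample above, not from the real script)
ENTRY = re.compile(r'^(\w+)\s*:\s*"(.*)"\s*$')

def parse_dictionary(text):
    """Return {class_name: set_of_names} from the text of a keyset dump."""
    names = {}
    for line in text.splitlines():
        match = ENTRY.match(line.strip())
        if not match:
            continue  # blank line or anything that is not a name entry
        cls, name = match.groups()
        if cls == "KeySet":
            continue  # header line naming the keyset, not an object
        names.setdefault(cls, set()).add(name)
    return names

sample = '''KeySet : "l_Life_stage.ace"
Life_stage : "1-cell embryo"
Life_stage : "1.5-fold embryo"
Life_stage : "2-cell embryo"
'''
life_stages = parse_dictionary(sample)["Life_stage"]
print(sorted(life_stages))
```

Holding the names in a set keeps each lookup cheap, which matters when a class such as Sequence has many thousands of objects.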
Second, you need to customize the script for the class of data you are going to work on. There is a sample script for you to study. It is a Perl script, so the "#" symbol marks comments. I hope the documentation I wrote in the script is sufficient. If you have more questions, please email me at wchen@its.caltech.edu. Be sure to test-run your customized script.
Before running the script, make sure that the .ace file you are going to check is syntactically correct. The script assumes the input file is in valid .ace format; otherwise, it will not recognize the tags and data fields.
The result of the script is an output file (you specify the name). Look through the output file, then go back to your .ace file to confirm the new objects you intended and correct the typos.
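The core of the check can be sketched as follows. This is an illustrative Python sketch, not the Perl script itself. The tag-to-class mapping, the function name find_new_objects, the object name "Expr1", and the assumption that each cross-reference sits on its own line as a tag followed by a quoted name are all hypothetical simplifications.

```python
import re

# Hypothetical mapping from a tag name to the class it cross-references.
# In practice this is the part you would customize per data class.
TAG_TO_CLASS = {"Cell": "Cell", "Life_stage": "Life_stage"}

# Matches a tag followed by a quoted value, e.g.: Cell "AB.p"
TAG_LINE = re.compile(r'^(\w+)\s+"([^"]*)"')

def find_new_objects(ace_text, dictionary):
    """Return (line_number, class, name) for each reference not in the dictionary."""
    report = []
    for number, line in enumerate(ace_text.splitlines(), start=1):
        match = TAG_LINE.match(line.strip())
        if not match:
            continue  # object header, blank line, or unquoted data
        tag, name = match.groups()
        cls = TAG_TO_CLASS.get(tag)
        if cls is not None and name not in dictionary.get(cls, set()):
            report.append((number, cls, name))
    return report

# Tiny made-up dictionary and .ace fragment for demonstration.
dictionary = {"Cell": {"AB.p", "AB.a"}, "Life_stage": {"2-cell embryo"}}
ace_text = '''Expr_pattern : "Expr1"
Cell "AB,p"
Cell "AB.a"
Life_stage "2-cell embryo"
'''
for number, cls, name in find_new_objects(ace_text, dictionary):
    print(f'line {number}: new {cls} object "{name}"')
```

On this fragment the only name reported is "AB,p" on line 2, which is exactly the kind of typo the check is meant to surface. A reported name is not necessarily an error: it may be a genuinely new object, which is why the curator still has to confirm each one.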
How do you dump a keyset? The following example shows how to make the Life_stage dictionary:
Launch the most current WS release of ACeDB and query for all Life_stage objects. Since this class is not displayed on the index page, you have to select it from the "Other Class" menu.
A keyset containing the Life_stage objects will appear. On the menu there is an "Export" button. Right-click it and a short menu will appear; choose to export "list of names". ACeDB will ask where to save the file and what to call it. I saved it as l_Life_stage.ace under /home/wen/dict/.
You can then query for other classes, dump keysets of their names, and save them under your "dict" directory. Sometimes you can run a more complicated query to select a limited keyset. For example, in the sample script I made a keyset called "StrangeSequence", which contains a small subset of Sequence objects that have no "Method" information.
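Once several keysets have been dumped into one directory, they can be loaded together. The following Python sketch assumes every dictionary file ends in .ace and uses the same line format as the Life_stage example. The function name load_dictionaries and the file name l_Cell.ace are hypothetical, and a temporary directory stands in for your "dict" directory.

```python
import re
import tempfile
from pathlib import Path

# Same entry pattern as in the Life_stage example: Class : "name"
ENTRY = re.compile(r'^(\w+)\s*:\s*"(.*)"\s*$')

def load_dictionaries(dict_dir):
    """Merge every *.ace keyset dump in dict_dir into {class: set_of_names}."""
    merged = {}
    for path in sorted(Path(dict_dir).glob("*.ace")):
        for line in path.read_text().splitlines():
            match = ENTRY.match(line.strip())
            if match and match.group(1) != "KeySet":
                merged.setdefault(match.group(1), set()).add(match.group(2))
    return merged

# Demonstration with a throwaway directory and two small made-up dumps.
with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "l_Life_stage.ace").write_text(
        'KeySet : "l_Life_stage.ace"\nLife_stage : "1-cell embryo"\n')
    Path(tmp, "l_Cell.ace").write_text(
        'KeySet : "l_Cell.ace"\nCell : "AB.p"\n')
    dictionaries = load_dictionaries(tmp)

print({cls: sorted(names) for cls, names in sorted(dictionaries.items())})
```

Note that this sketch groups names by class, so a restricted keyset such as "StrangeSequence" would be merged with any other Sequence names in the directory. If you need to keep such subsets apart, key the result by file name instead.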
Written by Wen J. Chen 2002/03/27