*** Daniel Wang SOPs: ***
Updated: August 1, 2002

Reference Collection & Maintenance

Reference collection is the first step of data curation in WormBase. It means collecting hardcopy and softcopy CGC and PubMed papers, promptly and completely, according to the lists supplied by CGC headquarters (Theresa) and Caltech WormBase (Andrei), respectively.

1. Paper collection:

1.1. Writing to PIs by email, fax, or letter to request reprints they own.

1.2. Searching Paul's personal collection and pulling out the papers needed.

1.3. Downloading pdf and fulltext files from the journal websites. The main sources are:
     http://www.caltech.edu/subpages/refdesk.html
     http://athena.caltech.edu/~wen/userguide/OtherResource/Journals.html
     After the pdf files are downloaded, they need to be renamed in the format:
     (CGC#)_(Last Name of First Author)(Last 2 digits of year published).pdf
     or
     (pubmed#)_(Last Name of First Author)(Last 2 digits of year published).pdf

1.4. Ordering new files from Millikan library at: http://ibid.caltech.edu/ (user name: WormBase, password: worm2000base). pdf files delivered to this site should be downloaded and renamed in the format:
     (CGC#)_(Last Name of First Author)(Last 2 digits of year published)_lib.pdf
     or
     (pubmed#)_(Last Name of First Author)(Last 2 digits of year published)_lib.pdf

1.5. Ordering softcopy backup files from Millikan library: Since hardcopy papers are easily lost and hard to recover once lost, we are making an effort to back up all papers in softcopy format. We are asking the library to scan all reprints that have no softcopy into tif files. tif files received from the library need to be renamed in the format:
     (CGC or Pubmed#)_(Last name of First author)(Last 2 digits of year published).tif
     The tif files are then converted into pdf files for ease of access over the internet, named in the format:
     (CGC or Pubmed#)_(Last name of First author)(Last 2 digits of year published)_tif.pdf

1.6. OCR all papers in the future.

2. Collection maintenance:

2.1. Hardcopy papers are organized numerically in the cabinets below the counter. Each drawer contains 300 papers in 30 folders of 10 papers each. All folders are clearly labelled so the papers can be easily located. To prevent loss of papers, a checkout/checkin log is maintained.

2.2. Softcopy papers are categorized as pdf, _lib.pdf, _tif.pdf, html and tif files according to their sources and formats.
     pdf files are downloaded from the journal websites and are usually text convertible.
     _lib.pdf files are pdf files ordered from the library and are not text convertible.
     _tif.pdf files are pdf files converted from tif files and are not text convertible.
     tif files are the scanned tif files received from the library.
     html files are fulltext files downloaded from the journal websites in html format.
     All pdf, _lib.pdf and html files are organized and stored at minerva.caltech.edu under /home/wen/Reference. All tif files are organized and stored at athena.caltech.edu under /archive/part3 and /archive/part4. All _tif.pdf files are at /archive/part5.

2.3. Softcopy files backup. All pdf, _lib.pdf and html files on minerva.caltech.edu have a backup at athena.caltech.edu under /archive/part1/daniel/minerva_backup. Every time a new file comes in, a copy of it should be sent to minerva and athena separately. All tif files at athena.caltech.edu are tar zipped and burned onto numbered CDs for backup. New tif files are temporarily stored on athena under /archive/part6/ and then tar zipped and burned onto a CD once enough of them have accumulated. The CDs are kept in Daniel's personal drawer, and the file listing which CD holds which papers can be found at athena.caltech.edu under /home/daniel/CDbackup. _tif.pdf files have no backup, because they can be easily regenerated from the tif files.

2.4. All paper information (hardcopy, pdf, _lib.pdf, _tif.pdf, html, tif, etc.) should be kept up to date in the citation PostgreSQL database on minerva. This information can be updated and searched at:
     http://minerva.caltech.edu/~postgres/cgi-bin/endnoter.cgi

2.5. pdf, _lib.pdf, and _tif.pdf files should be linked by running linker.pl on minerva under /home/wen/ and on athena under /home/daniel/, and are accessed by curators at:
     http://minerva.caltech.edu/~postgres/cgi-bin/checkout.cgi
     Both copies of linker.pl should be run whenever pdf, _lib.pdf, or _tif.pdf files are changed or reorganized.

2.6. The reference collection manager should cooperate closely with:
     curators, to understand their needs and preferences;
     the programmer, to develop necessary scripts or restructure the PostgreSQL database to ensure data integrity and operational efficiency.
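
The file naming rules in 1.3 to 1.5 and the softcopy categories in 2.2 can be sketched in a few lines of Python. This is only an illustration, not a script used in the collection: the function names, the category labels ("journal", "library", "tif", "tif_pdf") and the example identifier "cgc1234" are assumptions made here, since the exact written form of the CGC or Pubmed number is not specified above.

```python
# Hypothetical helper for the naming rules in 1.3-1.5; not an existing script.
# Suffix for each softcopy source, following the categories in 2.2.
SUFFIXES = {
    "journal": ".pdf",       # downloaded from the journal website
    "library": "_lib.pdf",   # ordered from the library
    "tif": ".tif",           # scanned by the library
    "tif_pdf": "_tif.pdf",   # pdf converted from a tif scan
}


def reference_filename(paper_id, first_author_last_name, year, source="journal"):
    """Build (ID)_(Last name of first author)(last 2 digits of year)(suffix).

    paper_id is passed through unchanged; how the CGC or Pubmed number
    is written (e.g. "cgc1234") is an assumption of this sketch.
    """
    if source not in SUFFIXES:
        raise ValueError("unknown source: %r" % (source,))
    two_digit_year = "%02d" % (int(year) % 100)
    return "%s_%s%s%s" % (paper_id, first_author_last_name,
                          two_digit_year, SUFFIXES[source])


def classify_softcopy(filename):
    """Return the 2.2 category of a file name, or None if it matches none."""
    # The longer suffixes must be tested before the plain ".pdf" suffix.
    ordered = (
        ("library", "_lib.pdf"),
        ("tif_pdf", "_tif.pdf"),
        ("journal", ".pdf"),
        ("tif", ".tif"),
        ("html", ".html"),
    )
    for category, suffix in ordered:
        if filename.endswith(suffix):
            return category
    return None


if __name__ == "__main__":
    print(reference_filename("cgc1234", "Brenner", 1974))
    print(reference_filename("cgc1234", "Brenner", 1974, source="library"))
    print(classify_softcopy("cgc1234_Brenner74_tif.pdf"))
```

Checking _lib.pdf and _tif.pdf before .pdf matters because all three end in ".pdf"; testing the plain suffix first would file every library or converted copy under the journal category.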