Date: 2010-07-17 00:05
Priority: 1
State: Closed
Submitted by: Mike Wong (mikewong899)
Assigned to: Teague Sterling (teague)
Summary: Can't find PDB files when using RCSB divided data store on case-sensitive OSes

Detailed description
Revision: 699

Description:

featurize warns that it can't find PDB files when using an RCSB "divided" data store. An RCSB "divided" data store uses a directory called "divided" and, beneath it, a subdirectory named for the middle two characters of the PDB ID. For example:

Given PDB_DIR=/usr/local/feature/data/pdb

1A2L would be found as

/usr/local/feature/data/pdb/divided/A2/1A2L.pdb.gz

Somehow featurize calculates the subdirectory to be "A2a2" and fails to find the file.
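
For reference, the expected path can be computed as in the following minimal sketch. It is not featurize's code; the function name divided_path and the default extension are assumptions made for illustration, and the ID's case is left exactly as given.

```python
import posixpath


def divided_path(pdb_dir: str, pdb_id: str, ext: str = ".pdb.gz") -> str:
    """Build the path of a PDB file inside a "divided" data store.

    The subdirectory is the middle two characters of the four-character
    PDB ID. Case is preserved exactly as the caller supplies it.
    """
    if len(pdb_id) != 4:
        raise ValueError(f"expected a 4-character PDB ID, got {pdb_id!r}")
    subdir = pdb_id[1:3]  # characters 2 and 3 of the ID
    return posixpath.join(pdb_dir, "divided", subdir, pdb_id + ext)


# Prints /usr/local/feature/data/pdb/divided/A2/1A2L.pdb.gz
print(divided_path("/usr/local/feature/data/pdb", "1A2L"))
```

The subdirectory here is two characters long ("A2"), whereas the value featurize reports ("A2a2") is four.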

How to Repeat:

1. Download PDBs using the RCSB "divided" data store structure.
2. Check out and build r699 on Linux.
3. Run featurize on a pointfile.

Repeatability: 100%

Workaround:

Don't use the RCSB "divided" data store; use a flat directory instead. Current SeqFEATURE models and Thioredoxin use about 5,000 PDB files. All targeted systems (Mac OS X, Desktop Linux, and Cluster Linux) can handle this many files in a single directory.
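
An existing "divided" store can be flattened with a short script. The sketch below is one possible approach, not part of FEATURE; the function name flatten_divided_store and its arguments are hypothetical. Files are copied unchanged, and a later file silently overwrites an earlier one with the same name.

```python
import shutil
from pathlib import Path


def flatten_divided_store(divided_root, flat_dir):
    """Copy every file in divided_root/<subdir>/ into flat_dir.

    Returns the list of file names copied, in sorted order.
    """
    flat = Path(flat_dir)
    flat.mkdir(parents=True, exist_ok=True)
    copied = []
    # Only look one level down: divided/<subdir>/<file>
    for src in sorted(Path(divided_root).glob("*/*")):
        if src.is_file():
            shutil.copy2(src, flat / src.name)
            copied.append(src.name)
    return copied
```

For example, flatten_divided_store("/usr/local/feature/data/pdb/divided", "/usr/local/feature/data/pdb_flat") would place 1A2L.pdb.gz directly in pdb_flat; the target directory name is only an example.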

Date: 2010-09-29 00:09
Sender: Teague Sterling

The function responsible for generating all possible names
has been rewritten to be more stable. Closing bug.

Date: 2010-07-29 18:36
Sender: Teague Sterling

This should accommodate uppercase names only if a PFT
file specifies them that way. We could fix this by explicitly
capitalizing them first to ensure we look at both options. I
will add this functionality soon and then close this bug.

As an aside, the filename resolution is becoming quite
clunky and brittle. It could be a good, simple, low-risk
candidate for refactoring at some point down the road.

Date: 2010-07-28 23:23
Sender: Mike Wong

Looks like you have a patch submitted, great!

Make sure that it works for upper and lower cases. The directory structure from RCSB is:

./divided/<subdir>/<pdb_id>.pdb.gz

Both the subdir and pdb_id (and even the file extension ".pdb.gz") may be in mixed case, depending on the OS. Mac OS X has a case-insensitive mode as well as a case-sensitive mode. The problem is that Mac users will share data with Linux users, so a name that was upper case on a Mac (and caused no problem there) will remain upper case on Linux, where it will be a big problem.
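
One way to cover both cases is to enumerate every upper/lower combination of the three parts and take the first path that exists. This is only a sketch of the idea, not the patch that was committed; candidate_paths and find_pdb are hypothetical names, and names with mixed case inside a single part (for example "1a2L") are not covered.

```python
import itertools
import os
import posixpath


def candidate_paths(pdb_dir, pdb_id, ext=".pdb.gz"):
    """List divided-store paths for each upper/lower combination of
    subdirectory, PDB ID, and extension, with duplicates removed."""
    subdir = pdb_id[1:3]
    variants = itertools.product(
        (subdir.lower(), subdir.upper()),
        (pdb_id.lower(), pdb_id.upper()),
        (ext.lower(), ext.upper()),
    )
    paths = []
    for sub, name, suffix in variants:
        path = posixpath.join(pdb_dir, "divided", sub, name + suffix)
        if path not in paths:  # IDs without letters yield duplicates
            paths.append(path)
    return paths


def find_pdb(pdb_dir, pdb_id):
    """Return the first candidate path that exists on disk, or None."""
    for path in candidate_paths(pdb_dir, pdb_id):
        if os.path.isfile(path):
            return path
    return None
```

For a PDB ID such as 1A2L this yields eight candidates, including both divided/A2/1A2L.pdb.gz and divided/a2/1a2l.pdb.gz.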

Date: 2010-07-28 06:39
Sender: Teague Sterling

Fix committed (Revision 703)

I don't currently have a sufficiently complete copy of the
"divided" data store to fully test this nor a fast enough
connection to download but debug reports show featurize now
looks in the correct locations.

I will test this against the divided data store soon and
then close the bug.

Date: 2010-07-27 19:40
Sender: Mike Wong

No changes, simply edited the title of the bug since the bug tracker incorrectly handles HTML escaping for bug titles.

Date: 2010-07-18 03:34
Sender: Teague Sterling

This is probably related to a bug which I had to triage in
revision 697 in order to run the new tests. I will look
into it and apply any needed patches to have featurize look
in the correct locations.

Utilities.cc (in which pdb/dssp file resolution takes place)
could be a good target for refactoring. In enumerating
possible locations of these files, it leaves out a lot of
possibilities, leading to strange file resolution errors
such as this. Additionally, it produces a huge amount of
output which is not particularly useful without knowing what
it is actually doing. It has little to no interaction with
the rest of the system outside of providing the correct
paths to files and would therefore be easy to regression
test.
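
A regression test for this kind of resolver can run entirely against a temporary directory. The sketch below only illustrates the shape of such a test; resolve_pdb is a hypothetical stand-in written for this example, not the function in Utilities.cc.

```python
import os
import tempfile
import unittest


def resolve_pdb(pdb_dir, pdb_id, ext=".pdb.gz"):
    """Hypothetical stand-in for the resolver under test: tries the
    all-lowercase and all-uppercase forms of the PDB ID."""
    for name in (pdb_id.lower(), pdb_id.upper()):
        path = os.path.join(pdb_dir, "divided", name[1:3], name + ext)
        if os.path.isfile(path):
            return path
    return None


class DividedStoreTest(unittest.TestCase):
    def setUp(self):
        # Build a tiny divided store containing one uppercase entry.
        self.tmp = tempfile.TemporaryDirectory()
        self.addCleanup(self.tmp.cleanup)
        subdir = os.path.join(self.tmp.name, "divided", "A2")
        os.makedirs(subdir)
        with open(os.path.join(subdir, "1A2L.pdb.gz"), "wb"):
            pass

    def test_uppercase_file_found_from_lowercase_id(self):
        self.assertIsNotNone(resolve_pdb(self.tmp.name, "1a2l"))

    def test_missing_id_returns_none(self):
        self.assertIsNone(resolve_pdb(self.tmp.name, "9ZZZ"))
```

Saved as a module, the tests can be run with python -m unittest. The assertions check only whether a path is found, so they hold on both case-sensitive and case-insensitive file systems.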

Field | Old Value | Date | By
status_id | Open | 2010-09-29 00:09 | teague
resolution_id | 100 | 2010-09-29 00:09 | teague
summary | Can't find PDB files when using RCSB divided data store on case-sensitive OSes | 2010-09-29 00:09 | teague
close_date | 2010-09-29 00:09 | 2010-09-29 00:09 | teague
summary | Can't find PDB files when using RCSB divided data store on case-sensitive OSes | 2010-07-29 18:36 | teague
summary | Can't find PDB files when using RCSB divided data store on case-sensitive OSes | 2010-07-28 06:39 | teague
summary | Can't find PDB files when using RCSB dvided data store on case-sensitive OSes | 2010-07-27 19:41 | mikewong899
summary | Can't find PDB files when using RCSB &amp;quot;divided&amp;quot; data store on case-sensitive OSes | 2010-07-27 19:40 | mikewong899
assigned_to | mikewong899 | 2010-07-20 23:38 | teague
summary | Can't find PDB files when using RCSB &quot;divided&quot; data store on case-sensitive OSes | 2010-07-20 23:38 | teague
summary | Can't find PDB files when using RCSB "divided" data store on case-sensitive OSes | 2010-07-18 03:34 | teague