Feature and ViewFeature - Read Me File
October 9, 2001
Documentation by D. R. Banatao, Mike Liang, and Prasanth Pulavarthi
Helix Group (Altman Lab) - Stanford University
- About Feature, FeatureScripts, and ViewFeature
- Installing Feature and FeatureScripts
- Installing Chimera Visualization Package and ViewFeature Module
- Running Feature and ViewFeature
- Known Issues
Section I: Background
Feature
- Developed by the Helix group at Stanford University.
- Uses a supervised learning algorithm to build statistical models
of sites from a training set, and then predict similar sites in query structures.
- Requires user knowledge of biology in order to define a "site" and
manually create a training set.
- Feature currently does not automatically select sites and
build training sets.
- Results can be displayed in the ViewFeature extension to Chimera.
- Sites are regions within a protein defined by a central location
and a surrounding neighborhood. More specifically, an atom within the region
of inquiry can be chosen as the center of a neighborhood with a user-specified
radius. Sites are usually picked because of their structural or functional
role, such as enzymatic active sites or Ca++ binding sites.
- A nonsite may be described as any other location, such as a randomly chosen
one, where a different function, or no function, takes place.
- The Feature program requires as input:
  - PDB files (with .FULL as the extension)
  - DSSP files (with .DSSP as the extension), which you can generate using the
    DSSP program
  - its own internal parameter files
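The definition of a site as a center plus a surrounding neighborhood can be sketched in a few lines of Python. This is only an illustration and not Feature's own code: the function name atoms_in_site, the dictionary layout, and the toy coordinates are assumptions made for the example.

```python
import math

def atoms_in_site(center, atoms, radius):
    """Return the atoms whose coordinates lie within `radius` of `center`."""
    return [atom for atom in atoms if math.dist(center, atom["xyz"]) <= radius]

# Hypothetical toy atoms, not taken from a real PDB entry.
atoms = [
    {"name": "CA", "xyz": (0.0, 0.0, 0.0)},
    {"name": "N", "xyz": (3.0, 4.0, 0.0)},    # 5.0 angstroms from the center
    {"name": "O", "xyz": (10.0, 0.0, 0.0)},   # 10.0 angstroms from the center
]

# A site centered on the first atom with a user-specified radius of 7.5.
neighborhood = atoms_in_site((0.0, 0.0, 0.0), atoms, radius=7.5)
names = [atom["name"] for atom in neighborhood]
print(names)
```

Only the first two toy atoms fall inside the neighborhood; the third lies beyond the radius.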
Training
- Feature takes the defined sites, computes the spatial distributions of
defined biophysical and biochemical properties, and then reports those
regions within the sites where these properties significantly vary from
those of the nonsites.
- Uses a non-parametric test (the Mann-Whitney rank-sum test) to identify
the properties and volumes at which the known positive sites differ
significantly from the negative control sites.
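To make the rank-sum step concrete, here is a minimal pure-Python sketch of a Mann-Whitney test applied to one property in one volume. It is not Feature's implementation: the function name, the toy counts, and the use of a normal approximation without tie correction are all assumptions made for illustration, and the approximation is crude for samples this small.

```python
import math

def mann_whitney_u(sample_a, sample_b):
    """Return (U for sample_a, two-sided p-value from a normal approximation)."""
    pooled = sorted([(v, 0) for v in sample_a] + [(v, 1) for v in sample_b])
    ranks = [0.0] * len(pooled)
    i = 0
    while i < len(pooled):
        # Find the run of tied values and give each the average rank.
        j = i
        while j + 1 < len(pooled) and pooled[j + 1][0] == pooled[i][0]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[k] = average_rank
        i = j + 1
    n_a, n_b = len(sample_a), len(sample_b)
    rank_sum_a = sum(r for r, (_, group) in zip(ranks, pooled) if group == 0)
    u_a = rank_sum_a - n_a * (n_a + 1) / 2
    mean_u = n_a * n_b / 2
    sd_u = math.sqrt(n_a * n_b * (n_a + n_b + 1) / 12)  # no tie correction
    z = (u_a - mean_u) / sd_u
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return u_a, p_value

# Hypothetical counts of one property in one shell, per training example.
site_counts = [5, 6, 7, 8]
nonsite_counts = [1, 2, 3, 4]
u, p = mann_whitney_u(site_counts, nonsite_counts)
print(u, p)
```

A small p-value for a property/volume pair suggests that the sites and nonsites differ there, which is the kind of pair the training step reports.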
Scanning
- Feature then uses a log-odds scoring function based on Bayes' Rule to
obtain the distribution of distinguishing properties in a query structure.
Feature gives a score that indicates how likely a query region is to be a
site of interest.
Figure 1. A Diagram of the Feature Algorithm
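A log-odds score of this general kind can be sketched as follows. The README only states that the score is a log-odds function based on Bayes' Rule, so everything else here is an assumption: the property/volume pairs are treated as independent, values are binned, the prior term is omitted, and all names and probabilities are hypothetical.

```python
import math

def log_odds_score(observed_bins, p_given_site, p_given_nonsite):
    """Sum log(P(bin | site) / P(bin | nonsite)) over property/volume pairs."""
    score = 0.0
    for pair, bin_index in observed_bins.items():
        p_site = p_given_site[pair][bin_index]
        p_nonsite = p_given_nonsite[pair][bin_index]
        score += math.log(p_site / p_nonsite)
    return score

# Hypothetical binned probabilities for two property/volume pairs.
p_given_site = {
    ("ATOM-NAME-IS-N", 2): [0.2, 0.8],
    ("ATOM-NAME-IS-N", 3): [0.5, 0.5],
}
p_given_nonsite = {
    ("ATOM-NAME-IS-N", 2): [0.7, 0.3],
    ("ATOM-NAME-IS-N", 3): [0.5, 0.5],
}

# Bins observed at one query location.
observed = {("ATOM-NAME-IS-N", 2): 1, ("ATOM-NAME-IS-N", 3): 0}
score = log_odds_score(observed, p_given_site, p_given_nonsite)
print(score)
```

A positive score means the observed values are more probable under the site model than under the nonsite model; a pair with identical probabilities in both models contributes nothing.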
FeatureScripts
- Written by Mike Liang from the Helix Group
- FeatureScripts are a set of python scripts written to simplify the
usage of Feature.
- Given the minimum set of user parameters, FeatureScripts will prepare
the necessary directories and files to execute Feature, and then execute
the training or scan process.
ViewFeature
- ViewFeature is an extension to the molecular visualization program, Chimera,
that integrates Feature's statistical models and site predictions with
3-dimensional structures viewed in Chimera.
- Chimera developed by the Computer Graphics Laboratory (CGL), UCSF
- ViewFeature developed by D.R. Banatao (Helix Group) and C.C. Huang
(CGL)
- Enables visualization of distinguishing properties in a 3-D structure
within specified volumes, thus giving users insight into the structural
motifs that define a given site
- ViewFeature's GUI allows:
  - interactive display of the statistical model generated by Feature's
    Training algorithm
  - easy highlighting of significant properties
  - display and manipulation of concentric shells to spatially orient the user
    in a 3-D site
  - simultaneous display of training sites and "hits" generated from Feature's
    Scanning algorithm in multiple structures
Figure 2. A diagram of the integration of Feature and visualization in
ViewFeature
Papers on Feature and ViewFeature:
- Characterizing the microenvironment surrounding protein sites.
  S.C. Bagley and R.B. Altman; Protein Sci 1995 Apr;4(4):622-35.
- Recognizing Protein Binding Sites Using Statistical Descriptions of their
  3D Environments.
  L. Wei and R.B. Altman; Pacific Symposium on Biocomputing 3:495-506 (1998).
- ViewFeature: Integrated Feature Analysis and Visualization.
  D.R. Banatao, C.C. Huang, P.C. Babbitt, R.B. Altman, and T.E. Klein; Pacific
  Symposium on Biocomputing 6:240-250 (2001).
Section II: Installation
Installing Feature
- Download the latest Feature tarball from http://feature.stanford.edu/.
- Extract the tarball:
  % tar xvzf feature-1.4.tar.gz
  This will create a new directory under the current directory containing
  the source code for the distribution. You should cd into that
  directory before proceeding with compiling.
- Configure the source tree:
  % ./configure --prefix=/home/username
  This will configure Feature to be installed into /home/username/feature.
  If a prefix is not specified, then Feature will be configured for installation
  into /usr/local/feature.
- Build Feature:
  % make
- Install Feature:
  % make install
- Configure your environment (csh syntax shown; adjust the paths to match
  your installation prefix and database locations):
  % setenv PATH ${PATH}:/home/username/feature
  % setenv FEATURE_DIR /home/username/feature
  % setenv PDB_DIR /home/username/databases/pdb
  % setenv DSSP_DIR /home/username/databases/dssp
  For bash users, use export VARNAME=VALUE. For sh users, use VARNAME=VALUE; export VARNAME.
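A quick way to confirm the environment before running Feature is to check that each variable points at an existing directory. The sketch below is not part of the distribution; the helper name missing_settings is hypothetical, and it checks only the three variables named in the step above.

```python
import os

# Variables set in the "Configure your environment" step above.
REQUIRED = ("FEATURE_DIR", "PDB_DIR", "DSSP_DIR")

def missing_settings(environ):
    """Return the names in REQUIRED that are unset or are not directories."""
    return [name for name in REQUIRED
            if not os.path.isdir(environ.get(name, ""))]

if __name__ == "__main__":
    problems = missing_settings(os.environ)
    for name in problems:
        print(f"{name} is unset or does not point to a directory")
    if not problems:
        print("Feature environment looks complete")
```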
Installing FeatureScripts
- Download the latest FeatureScripts tarball from http://feature.stanford.edu/.
- Extract the tarball:
  % tar xvzf feature-scripts-1.0.tar
  This will create a new directory under the current directory containing
  the source code for the distribution. You should cd into that
  directory before proceeding.
- Configure the source tree:
  % ./configure --prefix=/home/username
  This will configure FeatureScripts to be installed into /home/username/feature.
  If a prefix is not specified, then FeatureScripts will be configured for
  installation into /usr/local/feature. This directory should be the same as
  the one specified for the Feature package.
- Install FeatureScripts:
  % make install
Uninstalling Feature or FeatureScripts
If you ever need to uninstall Feature or FeatureScripts, you can run make
uninstall from the source directory of the package whose installed
files you want to remove.
Section III: Installation (continued)
Installing Chimera
- Chimera is free for download from the CGL at UCSF for academic use.
- For instructions on downloading and installing Chimera, go to
http://www.cgl.ucsf.edu/chimera/ and follow the proper links.
Installing ViewFeature
- Download ViewFeature.zip.
- Move ViewFeature.zip to /usr/local/chimera/share (on Unix) or
  C:\Program Files\Chimera\share (on Windows).
- Unzip ViewFeature.zip.
Section IV: Running Feature and ViewFeature
Running Feature
Training
- Training requires as input:
  - a .site file (containing the training sites and nonsites, and the
    analysis name)
  - a train parameter file (a .txt file) containing:
    - directory path to the .FULL & .DSSP files
    - number of shells
    - shell thickness
    - analysis name
    - directory path to the .site file
    - output directory
- Alternatively, run train_model.py in the directory where Model/ will be
created.
Scanning
- Scanning requires as input:
  - a scan parameter file (a .txt file) containing:
    - p-level cutoff
    - delta value (grid size)
    - number of shells
    - shell thickness
    - directory path to the .score file
    - output directory
    - directory path to the query structures for scanning (.FULL & .DSSP files)
    - residues to exclude
- Alternatively, run scan_model.py in the directory where Scan-scanname/ will
be created.
Example .site file
T signifies a "site"
NIL signifies a "nonsite"
-------------------
((:SITE-NAME "na-test") (:SITE-RADIUS 7.5) (:SITES (
("1dnx" X 15.190 Y 13.919 Z 14.557 T)
("1dnx" X 14.562 Y 15.239 Z 14.284 T)
("1dnx" X 21.624 Y 16.034 Z 4.461 T)
("1evp" X -6.199 Y 9.149 Z 11.709 NIL)
("1evp" X 3.753 Y -3.868 Z 11.87 NIL)
("1evp" X -3.333 Y 10.567 Z 16.935 NIL)
("1evp" X -4.596 Y 9.035 Z 18.047 NIL)
("315d" X 34.298 Y 21.006 Z -5.441 NIL)
("315d" X 28.753 Y 8.320 Z -7.166 NIL)
("315d" X 26.565 Y 12.412 Z -13.403 NIL)
)))
-------------------------
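For readers who want to inspect a .site file outside of Feature, the example above can be read with a short Python sketch. This parser is not part of Feature or FeatureScripts; it assumes the layout shown in the example, and the name parse_site_file and the returned tuple shape are hypothetical.

```python
import re

# One entry looks like: ("1dnx" X 15.190 Y 13.919 Z 14.557 T)
SITE_ENTRY = re.compile(
    r'\("(?P<pdb>[^"]+)"\s+X\s+(?P<x>\S+)\s+Y\s+(?P<y>\S+)'
    r'\s+Z\s+(?P<z>\S+)\s+(?P<label>T|NIL)\)'
)

def parse_site_file(text):
    """Return (site name, site radius, list of (pdb, x, y, z, is_site))."""
    name = re.search(r':SITE-NAME\s+"([^"]+)"', text).group(1)
    radius = float(re.search(r':SITE-RADIUS\s+([\d.]+)', text).group(1))
    points = [
        (m["pdb"], float(m["x"]), float(m["y"]), float(m["z"]), m["label"] == "T")
        for m in SITE_ENTRY.finditer(text)
    ]
    return name, radius, points

# Two entries copied from the example .site file above.
sample = '''((:SITE-NAME "na-test") (:SITE-RADIUS 7.5) (:SITES (
("1dnx" X 15.190 Y 13.919 Z 14.557 T)
("1evp" X -6.199 Y 9.149 Z 11.709 NIL)
)))'''

name, radius, points = parse_site_file(sample)
sites = [p for p in points if p[4]]
nonsites = [p for p in points if not p[4]]
print(name, radius, len(sites), len(nonsites))
```

The last element of each tuple is True for entries marked T (sites) and False for entries marked NIL (nonsites).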
Example train_parameter file
-----------------------------------------
NUM-OF-SHELLS: 6
SHELL-THICKNESS: 1.25
ANALYSIS-NAME: na-test
ALL-SITES-PATH: /home/banatao/TestRun
PROTEINS-PATH: /home/banatao/NAstructs
STAT-FILES-PATH: /home/banatao/TestRun
-----------------------------------------
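The shell settings in this file can be related to the site radius with a little arithmetic: 6 shells of thickness 1.25 reach out to 7.5, which equals the SITE-RADIUS in the example .site file. The sketch below reads the KEY: value lines and lists the shell boundaries. It is an illustration only: the function names are hypothetical, and the assumption that shell i spans from i times the thickness to (i + 1) times the thickness is not stated in this README.

```python
def parse_parameters(text):
    """Read 'KEY: value' lines into a dict, skipping separator lines."""
    params = {}
    for line in text.splitlines():
        key, separator, value = line.partition(":")
        if separator:
            params[key.strip()] = value.strip()
    return params

def shell_edges(num_shells, thickness):
    """Return (inner, outer) radii for each concentric shell (assumed layout)."""
    return [(i * thickness, (i + 1) * thickness) for i in range(num_shells)]

# Lines copied from the example train_parameter file above.
sample = """-----------------------------------------
NUM-OF-SHELLS: 6
SHELL-THICKNESS: 1.25
ANALYSIS-NAME: na-test
-----------------------------------------"""

params = parse_parameters(sample)
edges = shell_edges(int(params["NUM-OF-SHELLS"]), float(params["SHELL-THICKNESS"]))
outer_radius = edges[-1][1]
print(edges)
```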
Example scan_parameter file
-----------------------------------------
P-LEVEL: 0.01
DELTA-VALUE: 1.652
NUM-OF-SHELLS: 6
SHELL-THICKNESS: 1.25
SCORE-FILE: TestRun/ca-tr-te/ca-tr-te
OUTPUT-PATH: TestRun/ca-tr-te/Scan_efhand000train/Hits
PROTEINS-PATH: TestRun/ca-tr-te/Scan_efhand000train/Proteins
EXCLUDED-RESIDUES: CA
-------------------------
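The DELTA-VALUE is described above as the grid size used for scanning. One way to picture this is as the spacing of a regular 3-D grid of candidate locations laid over the query structure. The sketch below is only that picture in code: how Feature actually places its grid is not documented here, and the function name and bounding-box arguments are hypothetical.

```python
import itertools

def grid_points(lower, upper, delta):
    """Return (x, y, z) points spaced `delta` apart inside the box [lower, upper]."""
    axes = []
    for low, high in zip(lower, upper):
        count = int((high - low) // delta) + 1
        axes.append([low + i * delta for i in range(count)])
    return list(itertools.product(*axes))

# A hypothetical 2 x 2 x 2 angstrom box sampled every 1 angstrom.
points = grid_points((0.0, 0.0, 0.0), (2.0, 2.0, 2.0), delta=1.0)
print(len(points))
```

Halving the spacing multiplies the number of candidate locations by roughly eight, so smaller delta values mean finer but slower scans under this picture.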
Running Feature using FeatureScripts:
Usage
-----
- Create a .site file for FEATURE
- Run train_model.py in the directory where Model/ will be created
  Minimum Parameters:
    -s <sitefile>
    -n <numshells>
    -w <shellthickness>
- Run scan_model.py in the directory where Scan-scanname/ will be created
  Minimum Parameters:
    -f <proteinlistfile>
    -g <gridsize>
    -x <excludedresidues>
    trainingmodel
Example
-------
# If working directory is FeatureRuns/EFHand/
# If efhand.site is stored in that directory
# Go to working directory
% cd FeatureRuns/EFHand
# Create Model using the training set 'efhand.site'
# with 6 shells of thickness 1 angstrom
% train_model.py --sitefile efhand.site --numshells 6 --shellthickness 1
# Scan the proteins in efhand.site using a gridsize of 1 angstrom
# and excluding the residues calcium, zinc, and magnesium.
# Use the model specified in Model/training.ini to perform the scan.
% scan_model.py --proteinfile efhand.site --gridsize 1 --excludedresidues "CA ZN MG" Model/training.ini
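When the two scripts are driven from another Python program, it can help to build the argument lists explicitly so that the excluded residues stay together as one argument. The sketch below only assembles and prints the command lines from the example above; it does not run anything. The helper names are hypothetical, and only the long options that appear in the example are used.

```python
import shlex

def train_command(sitefile, numshells, shellthickness):
    """Build the argument list for the train_model.py call shown above."""
    return ["train_model.py", "--sitefile", sitefile,
            "--numshells", str(numshells),
            "--shellthickness", str(shellthickness)]

def scan_command(proteinfile, gridsize, excluded_residues, model):
    """Build the argument list for the scan_model.py call shown above."""
    return ["scan_model.py", "--proteinfile", proteinfile,
            "--gridsize", str(gridsize),
            "--excludedresidues", " ".join(excluded_residues),
            model]

train = train_command("efhand.site", 6, 1)
scan = scan_command("efhand.site", 1, ["CA", "ZN", "MG"], "Model/training.ini")

# shlex.join quotes arguments that contain spaces for display in a shell.
print(shlex.join(train))
print(shlex.join(scan))
```

Either list can be passed to subprocess.run(..., check=True) from the working directory once the scripts are on PATH.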
Optional Environment Variable Setup
-----------------------------------
- PATH (Recommended)
  Add the directory where the FEATURE scripts are installed.
  Add the directory where the FEATURE binaries are located.
- FEATURE_PATH (overrides PATH)
  Set to the directory where the FEATURE binaries are located.
- FEATURE_PARAMETER_FILE
  Set to the default parameter file to always include
  when running the FEATURE scripts.
- LOCAL_PDB_DIR (Recommended)
  Set to the directory where the local cache of PDB files
  is stored.
- LOCAL_DSSP_DIR (Recommended)
  Set to the directory where the local cache of DSSP files
  is stored.
- PROTEIN_PDB_DIR
  Set to the directory where the global cache of PDB files
  is stored.
- PROTEIN_DSSP_DIR
  Set to the directory where the global cache of DSSP files
  is stored.
Example Environment Setup
-------------------------
setenv PATH $PATH:/home/username/projects/FeatureScripts:/home/username/projects/Feature
setenv LOCAL_PDB_DIR /home/username/databases/pdb
setenv LOCAL_DSSP_DIR /home/username/databases/dssp
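The split between a local and a global cache suggests a lookup that tries one directory and then the other. The sketch below illustrates that idea only. The search order (local first), the file naming (PDB identifier plus the .FULL extension), and the helper name find_cached_file are assumptions; this README does not say how FeatureScripts actually resolves files.

```python
import os
import tempfile

def find_cached_file(pdb_id, environ,
                     variables=("LOCAL_PDB_DIR", "PROTEIN_PDB_DIR"),
                     extension=".FULL"):
    """Return the first existing <dir>/<pdb_id><extension>, or None."""
    for variable in variables:
        directory = environ.get(variable)
        if not directory:
            continue  # variable not set, try the next cache
        candidate = os.path.join(directory, pdb_id + extension)
        if os.path.isfile(candidate):
            return candidate
    return None

# Demonstration with two temporary directories standing in for the caches.
with tempfile.TemporaryDirectory() as local_dir, \
        tempfile.TemporaryDirectory() as global_dir:
    open(os.path.join(global_dir, "1dnx.FULL"), "w").close()
    env = {"LOCAL_PDB_DIR": local_dir, "PROTEIN_PDB_DIR": global_dir}
    found = find_cached_file("1dnx", env)
    missing = find_cached_file("1evp", env)
    found_in_global = found is not None and os.path.dirname(found) == global_dir

print(found_in_global, missing)
```

The same function could be pointed at the DSSP caches by passing variables=("LOCAL_DSSP_DIR", "PROTEIN_DSSP_DIR") and extension=".DSSP".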
Running ViewFeature:
1. Launch Chimera and import a PDB file.
2. Go to Extensions -> Utilities -> Feature.
3. Select the parent directory of the .site file for the respective
analysis.
4. As long as the directory structure created by FeatureScripts
is not changed, ViewFeature will automatically find the .site file (training
sites), the .sitedataaf file (statistical model of the site), the correct
.DSSP file, and the .hits files (scan results).
5. ViewFeature's interactive GUI will open.
6. Highlight different parts of the open structure(s) by clicking the
respective buttons in the ViewFeature GUI (see below).
Feature Statistics Panel (top panel)
- The P-value cutoff for statistical significance can be changed.
- The scroll bar moves the 2-D plot up and down.
- Red boxes are statistically significant property/volume pairs for
the training set.
- Cyan boxes are statistically deficient property/volume pairs (significant
in the nonsites training set).
- Statistically insignificant property/volume pairs are left empty.
- By clicking on a red or cyan box, one can highlight that property/volume
pair in the open structure in Chimera.
- Alternatively, by clicking on the actual property's box, one can
highlight all those property/volume pairs that exist in the open structure.
For example, clicking on ATOM-NAME-IS-N will highlight all backbone nitrogen
atoms.
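The coloring rule of the 2-D plot can be summarized in a few lines. This is a reading of the description above and not ViewFeature's code: the function name, the default cutoff of 0.01, and the choice to treat a p-value equal to the cutoff as significant are assumptions.

```python
def box_color(p_value, enriched_in_sites, cutoff=0.01):
    """Return 'red', 'cyan', or None (empty box) for one property/volume pair."""
    if p_value > cutoff:
        return None  # statistically insignificant: box is left empty
    return "red" if enriched_in_sites else "cyan"

# Hypothetical p-values for three property/volume pairs.
colors = [
    box_color(0.001, enriched_in_sites=True),
    box_color(0.004, enriched_in_sites=False),
    box_color(0.2, enriched_in_sites=True),
]
print(colors)
```

Raising the cutoff in the panel corresponds to a larger cutoff argument here, which turns more empty boxes into red or cyan ones.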
Inspector Panel (middle panel)
- Allows visualization of concentric shells (left side of panel):
  1. Select a site point (see below) around which to center a shell.
  2. Click on the desired shell number (top row of the 2-D plot in the
     Feature Statistics Panel).
  3. Change the "Displayed" button to TRUE.
  4. Click on the "color button" and then choose a color and opacity level.
- The representation of highlighted properties can be changed (right side
of panel):
  1. Change the "atom" and "bond" representation as well as the "color" by
     clicking on the respective buttons.
Sites Panel (lower panel)
- Allows display of training sites and hits (statistically significant sites):
  1. From the pull-down menu in the lower panel, choose the respective
     protein name for the open model (the default should be for the open
     structure).
  2. Selected site points are displayed as small red spheres.
  3. Multiple site points can be selected and displayed by pressing the
     <Shift> or <Ctrl> key while selecting with the mouse.
Section V: Known Issues
1. A window generated by tk (like IDLE) cannot be open while running
ViewFeature; if one is, the ViewFeature GUI will fail to open properly. (This
bug is being researched; it could be a conflict when calling a function in
CGLtk.)