Feature and ViewFeature - Read Me File

October 9, 2001
Documentation by D. R. Banatao, Mike Liang, and Prasanth Pulavarthi
Helix Group (Altman Lab) - Stanford University

Section I: Background

Feature

- Developed by the Helix group at Stanford University.
- Uses a supervised learning algorithm to build statistical models of sites from a training set, and then predict similar sites in query structures.
- Requires user knowledge of biology in order to define a "site" and manually create a training set.
- Feature currently does not automatically select sites and build training sets.
- Results can be displayed in the ViewFeature extension to Chimera.
- Sites are regions within a protein defined by a central location and a surrounding neighborhood. More specifically, an atom within the region of inquiry can be chosen as the center of a neighborhood with a user-specified radius. Sites are usually picked because of their structural or functional role, such as enzymatic active sites or Ca++ binding.
- A nonsite is any other location, typically chosen at random, where a different function or no function takes place.
- The Feature Program will require as input:

PDB files (.FULL as the extension)
DSSP files (.DSSP as the extension); you can generate these using the DSSP program
its own internal parameter files
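
To make the site definition above concrete, here is a minimal Python sketch of picking out the atoms that fall inside a site's neighborhood. It is only an illustration, not Feature's own code: the function name and the atom coordinates are hypothetical, while the center and the 7.5 radius are taken from the example .site file later in this document.

```python
import math

def atoms_in_neighborhood(atoms, center, radius):
    """Return the atoms whose coordinates lie within `radius` of `center`."""
    return [atom for atom in atoms if math.dist(atom[1], center) <= radius]

# Hypothetical atoms as (name, (x, y, z)); the coordinates are made up.
atoms = [
    ("N", (15.0, 14.0, 14.5)),
    ("CA", (16.2, 14.8, 15.1)),
    ("O", (30.0, 2.0, -4.0)),
]
center = (15.190, 13.919, 14.557)  # first site point in the example .site file
neighborhood = atoms_in_neighborhood(atoms, center, radius=7.5)
print([name for name, _ in neighborhood])
```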

Training

- Feature takes the defined sites, computes the spatial distributions of defined biophysical and biochemical properties, and then reports those regions within the sites where these properties significantly vary from those of the nonsites.
- Uses a non-parametric test (the Mann-Whitney rank-sum test) to identify the properties and volumes at which the known positive sites differ significantly from the negative control sites.
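
The rank-sum comparison can be sketched in plain Python as follows. This is a textbook Mann-Whitney U test with a normal approximation and no tie correction, written for illustration; it is an assumption that it resembles what Feature computes internally, and the sample values are hypothetical counts of one property in one volume.

```python
import math

def rank_sum_test(site_values, nonsite_values):
    """Return (U for the site sample, two-sided p-value, normal approximation)."""
    n1, n2 = len(site_values), len(nonsite_values)
    pooled = sorted(
        [(v, 0) for v in site_values] + [(v, 1) for v in nonsite_values]
    )
    # Assign 1-based ranks, giving tied values their average rank.
    ranks = [0.0] * len(pooled)
    i = 0
    while i < len(pooled):
        j = i
        while j + 1 < len(pooled) and pooled[j + 1][0] == pooled[i][0]:
            j += 1
        average = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[k] = average
        i = j + 1
    rank_sum = sum(r for r, (_, group) in zip(ranks, pooled) if group == 0)
    u1 = rank_sum - n1 * (n1 + 1) / 2
    mean = n1 * n2 / 2
    sd = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u1 - mean) / sd
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided tail of the normal
    return u1, p

# Hypothetical property counts in one shell, for sites and nonsites.
u, p = rank_sum_test([5, 6, 7, 8, 9, 7, 8], [1, 2, 3, 2, 1, 3, 2])
print(u, p)
```

For small training sets an exact test is usually preferable to the normal approximation used here.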

Scanning

- Feature then uses a log-odds scoring function based on Bayes' Rule to obtain the distribution of distinguishing properties in a query structure. Feature gives a score that indicates how likely a query region is to be a site of interest.
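
As a rough illustration of a log-odds score derived from Bayes' Rule, the sketch below sums log-likelihood ratios over observed property/volume pairs, assuming the pairs are independent. The exact scoring function used by Feature is not given in this document, so the function, the prior, the second property name, and all probabilities here are hypothetical.

```python
import math

def log_odds_score(observed, p_given_site, p_given_nonsite, prior_site=0.5):
    """Log prior odds plus the sum of per-observation log-likelihood ratios."""
    score = math.log(prior_site / (1.0 - prior_site))
    for key in observed:
        score += math.log(p_given_site[key] / p_given_nonsite[key])
    return score

# Hypothetical probabilities keyed by (property, shell number).
p_site = {("ATOM-NAME-IS-N", 1): 0.8, ("HYPOTHETICAL-PROPERTY", 2): 0.6}
p_nonsite = {("ATOM-NAME-IS-N", 1): 0.2, ("HYPOTHETICAL-PROPERTY", 2): 0.3}
score = log_odds_score(list(p_site), p_site, p_nonsite)
print(score)  # positive scores favor "site", negative scores favor "nonsite"
```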

Figure 1. A Diagram of the Feature Algorithm


FeatureScripts

- Written by Mike Liang from the Helix Group
- FeatureScripts are a set of python scripts written to simplify the usage of Feature.
- Given the minimum set of user parameters, FeatureScripts will prepare the necessary directories and files to execute Feature, and then execute the training or scan process.

ViewFeature

- ViewFeature is an extension to the molecular visualization program, Chimera, that integrates Feature's statistical models and site predictions with 3-dimensional structures viewed in Chimera.
- Chimera developed by the Computer Graphics Laboratory (CGL), UCSF
- ViewFeature developed by D.R. Banatao (Helix Group) and C.C. Huang (CGL)
- Enables visualization of distinguishing properties in a 3-D structure within specified volumes, thus giving users insight into the structural motifs that define a given site
- ViewFeature's GUI allows:

interactive display of statistical model generated by Feature's Training algorithm
easy highlighting of significant properties
display and manipulation of concentric shells to spatially orient the user in a 3-D site
simultaneous display of training sites and "hits" generated from Feature's Scanning algorithm in multiple structures

Figure 2. A diagram of the integration of Feature and visualization in ViewFeature


Papers on Feature and ViewFeature:

Characterizing the microenvironment surrounding protein sites.
Bagley SC, Altman RB. Protein Sci 1995 Apr;4(4):622-35

Recognizing Protein Binding Sites Using Statistical Descriptions of their 3D Environments
L. Wei and R.B. Altman; Pacific Symposium on Biocomputing 3:495-506 (1998).

ViewFeature: Integrated Feature Analysis and Visualization
D.R. Banatao, C.C. Huang, P.C. Babbitt, R.B. Altman, and T.E. Klein; Pacific Symposium on Biocomputing 6:240-250 (2001).


Section II: Installation

Installing Feature

Download the latest Feature tarball from http://feature.stanford.edu/.
Extract the tarball:

% tar xvzf feature-1.4.tar.gz

Then cd into the extracted source directory.

Configure the source tree:

% ./configure --prefix=/home/username

With this prefix, Feature is installed in /home/username/feature. If no prefix is given, the default location is /usr/local/feature.

Build Feature:

% make

Install Feature:

% make install

Configure your environment

% setenv PATH ${PATH}:/home/username/feature
% setenv FEATURE_DIR /home/username/feature
% setenv PDB_DIR /home/username/databases/pdb
% setenv DSSP_DIR /home/username/databases/dssp
For bash users, use export VARNAME=VALUE. For sh users, use VARNAME=VALUE; export VARNAME.

Installing FeatureScripts

Download the latest FeatureScripts tarball from http://feature.stanford.edu/.
Extract the tarball:

% tar xvzf feature-scripts-1.0.tar

Then cd into the extracted source directory.

Configure the source tree:

% ./configure --prefix=/home/username

With this prefix, FeatureScripts is installed in /home/username/feature. If no prefix is given, the default location is /usr/local/feature.

Install FeatureScripts:

% make install

Uninstalling Feature or FeatureScripts

If you ever need to uninstall Feature or FeatureScripts, you can run make uninstall from the source directory of the package whose installed files you want to remove.


Section III: Installation (continued)

Installing Chimera

- Chimera is free for download from the CGL at UCSF for academic use.
- For instructions on downloading and installing Chimera go to http://www.cgl.ucsf.edu/chimera/ and follow the proper links.

Installing ViewFeature

Download ViewFeature.zip
Move ViewFeature.zip to /usr/local/chimera/share (on Unix) or C:\Program Files\Chimera\share (on Windows)
Unzip ViewFeature.zip


Section IV: Running Feature and ViewFeature

Running Feature

Training

- Training will require as input:

a .site file (containing training sites and nonsites, and analysis name)
a train parameter file (a .txt file) containing:

directory path to the .FULL & .DSSP files
number of shells
shell thickness
analysis name
directory path to the .site file
output directory

- Alternatively, run train_model.py in the directory where Model/ will be created.

Scanning

- Scanning will require as input:

a scan parameter file (a .txt file) containing:

p-level cutoff
delta value (grid size)
number of shells
shell thickness
directory path to the .score file
output directory
directory path to the query structures for scanning (.FULL & .DSSP files)
residues to exclude

- Alternatively, run scan_model.py in the directory where Scan-scanname/ will be created.

Example .site file
T signifies a "site"
NIL signifies a "nonsite"

-------------------

((:SITE-NAME "na-test") (:SITE-RADIUS 7.5) (:SITES (
("1dnx" X 15.190 Y 13.919  Z 14.557  T)
("1dnx" X 14.562 Y 15.239  Z 14.284  T)
("1dnx" X 21.624 Y 16.034  Z 4.461   T)
("1evp" X -6.199 Y  9.149  Z 11.709  NIL)
("1evp" X  3.753 Y -3.868  Z 11.87   NIL)
("1evp" X -3.333 Y 10.567  Z 16.935  NIL)
("1evp" X -4.596 Y  9.035  Z 18.047  NIL)
("315d" X 34.298 Y  21.006 Z -5.441  NIL)
("315d" X 28.753 Y  8.320  Z -7.166  NIL)
("315d" X 26.565 Y  12.412 Z -13.403 NIL)
)))

-------------------------
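
For readers who want to inspect a .site file programmatically, the following Python sketch reads the layout shown in the example above. It is based only on that one example, so the regular expressions and the function name are assumptions rather than a specification of the format.

```python
import re

# One site entry, e.g. ("1dnx" X 15.190 Y 13.919  Z 14.557  T)
SITE_LINE = re.compile(
    r'\("(?P<pdb>[^"]+)"\s+X\s+(?P<x>\S+)\s+Y\s+(?P<y>\S+)'
    r'\s+Z\s+(?P<z>\S+)\s+(?P<label>T|NIL)\)'
)

def parse_site_file(text):
    """Return (site name, site radius, [(pdb id, x, y, z, is_site), ...])."""
    name = re.search(r':SITE-NAME\s+"([^"]+)"', text).group(1)
    radius = float(re.search(r':SITE-RADIUS\s+([\d.]+)', text).group(1))
    points = [
        (m["pdb"], float(m["x"]), float(m["y"]), float(m["z"]), m["label"] == "T")
        for m in SITE_LINE.finditer(text)
    ]
    return name, radius, points

example = '''((:SITE-NAME "na-test") (:SITE-RADIUS 7.5) (:SITES (
("1dnx" X 15.190 Y 13.919  Z 14.557  T)
("1evp" X -6.199 Y  9.149  Z 11.709  NIL)
)))'''
name, radius, points = parse_site_file(example)
print(name, radius, points)
```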

Example train_parameter file
-----------------------------------------

NUM-OF-SHELLS:   6
SHELL-THICKNESS: 1.25
ANALYSIS-NAME:   na-test
ALL-SITES-PATH:  /home/banatao/TestRun
PROTEINS-PATH:   /home/banatao/NAstructs
STAT-FILES-PATH: /home/banatao/TestRun

-----------------------------------------
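
The parameter files use a simple "KEY: value" layout, which can be read with a few lines of Python. The sketch below is an illustration based on the example above; the function name is hypothetical. It also multiplies the number of shells by the shell thickness, which for these example values gives 7.5, the same as the :SITE-RADIUS in the example .site file. Whether Feature requires the two to match is not stated here.

```python
def parse_parameter_file(text):
    """Parse 'KEY: value' lines into a dict, skipping blanks and dashed rules."""
    params = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or set(line) == {"-"}:
            continue
        key, _, value = line.partition(":")
        params[key.strip()] = value.strip()
    return params

example = """
NUM-OF-SHELLS:   6
SHELL-THICKNESS: 1.25
ANALYSIS-NAME:   na-test
"""
params = parse_parameter_file(example)
outer_radius = int(params["NUM-OF-SHELLS"]) * float(params["SHELL-THICKNESS"])
print(params["ANALYSIS-NAME"], outer_radius)
```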

Example scan_parameter file
-----------------------------------------

P-LEVEL:           0.01
DELTA-VALUE:       1.652
NUM-OF-SHELLS:     6
SHELL-THICKNESS:   1.25
SCORE-FILE:        TestRun/ca-tr-te/ca-tr-te
OUTPUT-PATH:       TestRun/ca-tr-te/Scan_efhand000train/Hits
PROTEINS-PATH:     TestRun/ca-tr-te/Scan_efhand000train/Proteins
EXCLUDED-RESIDUES: CA

-------------------------
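
The delta value is described above as the grid size. One plausible reading, sketched below in Python, is that candidate points are laid out on a regular grid with that spacing across the query structure. This is an assumption about how scanning works, and the bounding box and function name are hypothetical.

```python
import itertools

def grid_points(minimum, maximum, delta):
    """Return (x, y, z) points spaced `delta` apart inside an axis-aligned box."""
    axes = []
    for low, high in zip(minimum, maximum):
        count = int((high - low) / delta + 1e-9) + 1
        axes.append([low + i * delta for i in range(count)])
    return list(itertools.product(*axes))

# Hypothetical bounding box; 1.652 is the DELTA-VALUE from the example file.
points = grid_points((0.0, 0.0, 0.0), (3.5, 3.5, 1.7), delta=1.652)
print(len(points))
```

A smaller delta gives a finer grid and therefore more points to score.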

-Running Feature using FeatureScripts:

Usage
-----

Create .site file for FEATURE
Run train_model.py in the directory where Model/ will be created

Minimum Parameters: -s <sitefile>; -n <numshells>; -w <shellthickness>

Run scan_model.py in the directory where Scan-scanname/ will be created

Minimum Parameters: -f <proteinlistfile>; -g <gridsize>; -x <excludedresidues>; trainingmodel

Example
-------

# If working directory is FeatureRuns/EFHand/
# If efhand.site is stored in that directory
# Go to working directory
% cd FeatureRuns/EFHand

# Create Model using the training set 'efhand.site'
#   with 6 shells of thickness 1 angstrom
% train_model.py --sitefile efhand.site --numshells 6 --shellthickness 1

# Scan the proteins in efhand.site using a gridsize of 1 angstrom
#   and excluding the residues calcium, zinc, and magnesium.
#   Use the model specified in Model/training.ini to perform the scan.
% scan_model.py --proteinfile efhand.site --gridsize 1 --excludedresidues "CA ZN MG" Model/training.ini
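
If the two commands above need to be driven from another Python script, the argument lists can be assembled as in this sketch. Only the script names and long options shown in the example are used; the helper functions themselves are hypothetical, and nothing is executed here.

```python
import shlex

def train_command(sitefile, numshells, shellthickness):
    """Build the argument list for train_model.py."""
    return ["train_model.py", "--sitefile", sitefile,
            "--numshells", str(numshells),
            "--shellthickness", str(shellthickness)]

def scan_command(proteinfile, gridsize, excluded, model):
    """Build the argument list for scan_model.py."""
    return ["scan_model.py", "--proteinfile", proteinfile,
            "--gridsize", str(gridsize),
            "--excludedresidues", " ".join(excluded), model]

train = train_command("efhand.site", 6, 1)
scan = scan_command("efhand.site", 1, ["CA", "ZN", "MG"], "Model/training.ini")
print(shlex.join(train))
print(shlex.join(scan))
# To run a command: subprocess.run(train, check=True), after "import subprocess".
```

Passing the arguments as a list keeps the excluded residues together as one argument without manual quoting.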

Optional Environment Variable Setup
-----------------------------------

PATH (Recommended)

FEATURE_PATH (overrides PATH)

FEATURE_PARAMETER_FILE

LOCAL_PDB_DIR (Recommended)

LOCAL_DSSP_DIR (Recommended)

PROTEIN_PDB_DIR

PROTEIN_DSSP_DIR

Example Environment Setup
-------------------------

setenv PATH $PATH:/home/username/projects/FeatureScripts:/home/username/projects/Feature
setenv LOCAL_PDB_DIR /home/username/databases/pdb
setenv LOCAL_DSSP_DIR /home/username/databases/dssp
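
Before a run, it can be useful to check that the recommended variables are set. The Python sketch below uses the variable names marked "Recommended" above; the helper function is hypothetical and is not part of FeatureScripts.

```python
import os

RECOMMENDED = ("LOCAL_PDB_DIR", "LOCAL_DSSP_DIR")

def missing_variables(environ, names=RECOMMENDED):
    """Return the names that are unset or empty in the given environment."""
    return [name for name in names if not environ.get(name)]

print(missing_variables(os.environ))
```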


Running ViewFeature:

1. Launch Chimera and import a PDB file
2. Go to Extensions -> Utilities -> Feature
3. Select the parent directory of the .site file for the respective analysis
4. As long as the directory structure created by FeatureScripts is not changed, ViewFeature will automatically find the .site file (training sites), the .sitedataaf file (statistical model of the site), the correct .DSSP file, and the .hits files (scan results).
5. ViewFeature's interactive GUI will open.
6. Highlight different parts of the open structure(s) by clicking the respective buttons in the ViewFeature GUI (see below)

Feature Statistics Panel (top panel)

- P-value cutoff can be changed for statistical significance.
- The scroll bar moves the 2-D plot up and down.
- Red boxes are statistically significant property/volume pairs for the training set.
- Cyan boxes are statistically deficient (significant in the nonsites training set).
- Statistically insignificant property/volume pairs are left empty.
- By clicking on a red or cyan box, one can highlight that property/volume pair in the open structure in Chimera.
- Alternatively, by clicking on the box for the property itself, one can highlight all of that property's property/volume pairs that exist in the open structure. For example, clicking on ATOM-NAME-IS-N will highlight all backbone nitrogen atoms.

Inspector Panel (middle panel)

- allows visualization of concentric shells (left side of panel)
        1. select a site point (see below) around which to center a shell
        2. click on desired shell number (top row of 2-D plot in Feature Statistics Panel)
        3. change "Displayed" button to TRUE
        4. click on "color button" and then choose a color and opacity level
- representation of highlighted properties can be changed (right side of panel)
        1. change "atom" and "bond" representation as well as "color" by clicking on the respective buttons

Sites Panel (lower panel)

- allows display of training sites and hits (statistically significant sites)
        1. From the pull down menu in the lower panel, choose the respective protein name for the open model (default should be for the open structure)
        2. Selected site points are displayed as small red spheres
        3. Multiple site points can be selected and displayed by pressing the <Shift> or <Ctrl> key while selecting with the mouse


Section V: Known Issues

1. A window generated by Tk (such as IDLE) cannot be open while running ViewFeature; if one is, the ViewFeature GUI will fail to open properly. (This bug is under investigation; it could be a conflict when calling a function in CGLtk.)