Feature and ViewFeature - Read Me File

October 9, 2001
Documentation by D. R. Banatao, Mike Liang, and Prasanth Pulavarthi
Helix Group (Altman Lab) - Stanford University
 
  1. About Feature, FeatureScripts, and ViewFeature
  2. Installing Feature and FeatureScripts
  3. Installing Chimera Visualization Package and ViewFeature Module
  4. Running Feature and ViewFeature
  5. Known Issues

Section I: Background

Feature

- Developed by the Helix group at Stanford University.
- Uses a supervised learning algorithm to build statistical models of sites from a training set, and then predict similar sites in query structures.
- Requires user knowledge of biology in order to define a "site" and manually create a training set.
- Feature currently does not automatically select sites and build training sets.
- Results can be displayed in the ViewFeature extension to Chimera.
- Sites are regions within a protein defined by a central location and a surrounding neighborhood. More specifically, an atom within the region of inquiry can be chosen as the center of a neighborhood with a user-specified radius. Sites are usually picked because of their structural or functional role, such as enzymatic active sites or Ca++ binding.
- A nonsite may be described as any other random site where a different function or lack of function may take place.
- The Feature Program will require as input:
  1. PDB files (.FULL as the extension)
  2. DSSP files (.DSSP as the extension), which can be generated using the DSSP program
  3. its own internal parameter files
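
The definition of a site as a central location plus a user-specified radius can be illustrated with a minimal Python sketch. The function name atoms_in_site and the atom coordinates are hypothetical and chosen only for illustration; this is not Feature's own code.

```python
import math

def atoms_in_site(center, radius, atoms):
    """Return the names of atoms lying within `radius` of `center`."""
    return [name for name, xyz in atoms if math.dist(center, xyz) <= radius]

# Hypothetical site center and atoms (coordinates in angstroms).
center = (15.190, 13.919, 14.557)
atoms = [
    ("N", (15.190, 13.919, 15.057)),   # about 0.5 A from the center
    ("CA", (15.190, 16.919, 14.557)),  # about 3.0 A from the center
    ("O", (25.190, 13.919, 14.557)),   # about 10.0 A from the center
]

# With a 7.5 A radius, only the first two atoms fall inside the site.
print(atoms_in_site(center, 7.5, atoms))
```
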
Training
- Feature takes the defined sites, computes the spatial distributions of defined biophysical and biochemical properties, and then reports those regions within the sites where these properties significantly vary from those of the nonsites.
- Uses a non-parametric test (the Mann-Whitney rank-sum test) to identify the property/volume pairs in which the known positive sites differ significantly from the negative control sites.
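
As a rough illustration of the rank-sum test named above, the following Python sketch compares one property's values in sites against nonsites. It uses the normal approximation without a tie correction, which is an assumption made for brevity; the function names and the sample counts are hypothetical, and Feature's own implementation may differ.

```python
import math

def average_ranks(values):
    """Rank values from 1 to n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions i..j are 0-based, ranks 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def rank_sum_test(sites, nonsites):
    """Return (U, z, two-sided p) for the sites sample.

    Uses the normal approximation with no tie correction.
    """
    n1, n2 = len(sites), len(nonsites)
    ranks = average_ranks(list(sites) + list(nonsites))
    rank_sum = sum(ranks[:n1])
    u = rank_sum - n1 * (n1 + 1) / 2
    mean_u = n1 * n2 / 2
    sd_u = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u - mean_u) / sd_u
    p = math.erfc(abs(z) / math.sqrt(2))
    return u, z, p

# Hypothetical counts of one property in one volume, per training example.
site_counts = [5, 6, 7, 8, 6]
nonsite_counts = [1, 2, 0, 3, 2, 1, 2]
u, z, p = rank_sum_test(site_counts, nonsite_counts)
print(u, z, p)
```

Here every site value exceeds every nonsite value, so U takes its maximum of 5 x 7 = 35 and the pair would be flagged as significant at a cutoff of 0.01.
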
Scanning
- Feature then uses a log-odds scoring function based on Bayes' Rule to evaluate the distribution of distinguishing properties in a query structure. Feature gives a score that indicates how likely a query region is to be a site of interest.
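
This document does not give the exact scoring formula, so the following Python sketch shows only the general shape of a log-odds score derived from Bayes' Rule. It assumes that property/volume pairs are treated as independent and that their values are binned as "low" or "high"; the probabilities, the binning, and the function name are all hypothetical.

```python
import math

def log_odds_score(observed, site_probs, nonsite_probs, prior_site=0.5):
    """Log posterior odds of 'site' versus 'nonsite' for one query location.

    By Bayes' Rule the posterior odds are the prior odds times the
    likelihood ratio. Assuming independent features, the ratio is a
    product, and taking logs turns it into a sum.
    """
    score = math.log(prior_site / (1.0 - prior_site))
    for feature, value in observed.items():
        score += math.log(site_probs[feature][value] / nonsite_probs[feature][value])
    return score

# Hypothetical probabilities for one property in two shells (property, shell).
site_probs = {
    ("ATOM-NAME-IS-N", 1): {"low": 0.3, "high": 0.7},
    ("ATOM-NAME-IS-N", 2): {"low": 0.2, "high": 0.8},
}
nonsite_probs = {
    ("ATOM-NAME-IS-N", 1): {"low": 0.7, "high": 0.3},
    ("ATOM-NAME-IS-N", 2): {"low": 0.6, "high": 0.4},
}

site_like = {("ATOM-NAME-IS-N", 1): "high", ("ATOM-NAME-IS-N", 2): "high"}
nonsite_like = {("ATOM-NAME-IS-N", 1): "low", ("ATOM-NAME-IS-N", 2): "low"}

print(log_odds_score(site_like, site_probs, nonsite_probs))     # positive score
print(log_odds_score(nonsite_like, site_probs, nonsite_probs))  # negative score
```
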

Figure 1. A Diagram of the Feature Algorithm


FeatureScripts

- Written by Mike Liang from the Helix Group
- FeatureScripts are a set of python scripts written to simplify the usage of Feature.
- Given the minimum set of user parameters, FeatureScripts will prepare the necessary directories and files to execute Feature, and then execute the training or scan process.

ViewFeature

- ViewFeature is an extension to the molecular visualization program Chimera that integrates Feature's statistical models and site predictions with 3-dimensional structures viewed in Chimera.
- Chimera was developed by the Computer Graphics Laboratory (CGL), UCSF.
- ViewFeature was developed by D.R. Banatao (Helix Group) and C.C. Huang (CGL).
- Enables visualization of distinguishing properties in a 3-D structure within specified volumes, thus giving users insight into the structural motifs that define a given site.
- ViewFeature's GUI allows:

Figure 2. A diagram of the integration of Feature and visualization in ViewFeature

Papers on Feature and ViewFeature:

Characterizing the microenvironment surrounding protein sites.
Bagley SC, Altman RB. Protein Sci 1995 Apr;4(4):622-35

Recognizing Protein Binding Sites Using Statistical Descriptions of their 3D Environments
L. Wei and R.B. Altman; Pacific Symposium on Biocomputing 3:495-506 (1998).

ViewFeature: Integrated Feature Analysis and Visualization
D.R. Banatao, C.C. Huang, P.C. Babbitt, R.B. Altman, and T.E. Klein; Pacific Symposium on Biocomputing 6:240-250 (2001).



Section II: Installation

Installing Feature

  1. Download the latest Feature tarball from http://feature.stanford.edu/.
  2. Extract the tarball:
    % tar xvzf feature-1.4.tar.gz
    This will create a new directory under the current directory containing the source code for the distribution. You should cd into that directory before proceeding with compiling.
  3. Configure the source tree:
    % ./configure --prefix=/home/username
    This will configure Feature to be installed into /home/username/feature. If a prefix is not specified, then Feature will be configured for installation into /usr/local/feature.
  4. Build Feature:
    % make
  5. Install Feature:
    % make install
  6. Configure your environment:
    % setenv PATH ${PATH}:/home/username/feature
    % setenv FEATURE_DIR /home/username/feature
    % setenv PDB_DIR /home/username/databases/pdb
    % setenv DSSP_DIR /home/username/databases/dssp

    The commands above use csh/tcsh syntax. For bash users, use export VARNAME=VALUE. For sh users, use VARNAME=VALUE; export VARNAME.
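
Before running Feature, it can help to confirm that the environment variables from the last step are actually set. The following Python sketch is one possible check; the helper name missing_variables is hypothetical and is not part of Feature or FeatureScripts.

```python
import os

# Variable names taken from the environment setup step above.
REQUIRED = ("FEATURE_DIR", "PDB_DIR", "DSSP_DIR")

def missing_variables(environ=os.environ):
    """Return the required variable names that are unset or empty."""
    return [name for name in REQUIRED if not environ.get(name)]

if __name__ == "__main__":
    missing = missing_variables()
    if missing:
        print("Not set:", ", ".join(missing))
    else:
        print("All Feature environment variables are set.")
```
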

Installing FeatureScripts

  1. Download the latest FeatureScripts tarball from http://feature.stanford.edu/.
  2. Extract the tarball:
    % tar xvzf feature-scripts-1.0.tar
    This will create a new directory under the current directory containing the source code for the distribution. You should cd into that directory before proceeding with installation.
  3. Configure the source tree:
    % ./configure --prefix=/home/username
    This will configure FeatureScripts to be installed into /home/username/feature. If a prefix is not specified, then FeatureScripts will be configured for installation into /usr/local/feature. This directory should be the same as the one specified for the Feature package.
  4. Install FeatureScripts:
    % make install

Uninstalling Feature or FeatureScripts

If you ever need to uninstall Feature or FeatureScripts, you can run make uninstall from the source directory of the package whose installed files you want to remove.



Section III: Installation Cont'd

Installing Chimera

- Chimera is free for download from the CGL at UCSF for academic use.
- For instructions on downloading and installing Chimera go to http://www.cgl.ucsf.edu/chimera/ and follow the proper links.

Installing ViewFeature

  1. Download ViewFeature.zip.
  2. Move ViewFeature.zip to /usr/local/chimera/share (on Unix) or C:\Program Files\Chimera\share (on Windows).
  3. Unzip ViewFeature.zip.

Section IV: Running Feature and ViewFeature


Running Feature

Training
- Training will require as input:
  1. a .site file (containing the training sites and nonsites, and the analysis name)
  2. a train parameter file (a .txt file; see the example train_parameter file below)

- Alternatively, run train_model.py in the directory where Model/ will be created.

Scanning
- Scanning will require as input:
  1. a scan parameter file (a .txt file; see the example scan_parameter file below)

- Alternatively, run scan_model.py in the directory where Scan-scanname/ will be created.

 

Example .site file
T signifies a "site"
NIL signifies a "nonsite"

-------------------
((:SITE-NAME "na-test") (:SITE-RADIUS 7.5) (:SITES (
("1dnx" X 15.190 Y 13.919  Z 14.557  T)
("1dnx" X 14.562 Y 15.239  Z 14.284  T)
("1dnx" X 21.624 Y 16.034  Z 4.461   T)
("1evp" X -6.199 Y  9.149  Z 11.709  NIL)
("1evp" X  3.753 Y -3.868  Z 11.87   NIL)
("1evp" X -3.333 Y 10.567  Z 16.935  NIL)
("1evp" X -4.596 Y  9.035  Z 18.047  NIL)
("315d" X 34.298 Y  21.006 Z -5.441  NIL)
("315d" X 28.753 Y  8.320  Z -7.166  NIL)
("315d" X 26.565 Y  12.412 Z -13.403 NIL)
)))
-------------------------
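
To make the layout of the .site file above concrete, here is a minimal Python sketch that reads the site name, the radius, and the labeled points. It is based only on the example shown here; the regular expressions and the function name parse_site_file are assumptions, and real .site files may contain variations this sketch does not handle.

```python
import re

SITE_NAME = re.compile(r'\(:SITE-NAME\s+"([^"]+)"\)')
SITE_RADIUS = re.compile(r'\(:SITE-RADIUS\s+([-\d.]+)\)')
POINT = re.compile(r'\("([^"]+)"\s+X\s+(\S+)\s+Y\s+(\S+)\s+Z\s+(\S+)\s+(T|NIL)\)')

def parse_site_file(text):
    """Return (name, radius, points); each point is (pdb_id, x, y, z, is_site)."""
    name = SITE_NAME.search(text).group(1)
    radius = float(SITE_RADIUS.search(text).group(1))
    points = [
        (pdb_id, float(x), float(y), float(z), label == "T")
        for pdb_id, x, y, z, label in POINT.findall(text)
    ]
    return name, radius, points

# Two entries copied from the example .site file above.
EXAMPLE = '''((:SITE-NAME "na-test") (:SITE-RADIUS 7.5) (:SITES (
("1dnx" X 15.190 Y 13.919  Z 14.557  T)
("1evp" X -6.199 Y  9.149  Z 11.709  NIL)
)))'''

name, radius, points = parse_site_file(EXAMPLE)
sites = [p for p in points if p[4]]       # entries labeled T
nonsites = [p for p in points if not p[4]]  # entries labeled NIL
print(name, radius, len(sites), len(nonsites))
```
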
 


 

Example train_parameter file
-----------------------------------------

NUM-OF-SHELLS:   6
SHELL-THICKNESS: 1.25
ANALYSIS-NAME:   na-test
ALL-SITES-PATH:  /home/banatao/TestRun
PROTEINS-PATH:   /home/banatao/NAstructs
STAT-FILES-PATH: /home/banatao/TestRun
-----------------------------------------
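
The parameter file above is a list of KEY: value lines, which the following Python sketch reads into a dictionary. It also works out the shell boundaries implied by NUM-OF-SHELLS and SHELL-THICKNESS. Note that 6 shells of 1.25 angstroms give 7.5 angstroms, matching the SITE-RADIUS in the example .site file; treating the shells as spanning the site radius is an inference from these example numbers, not something this document states. The function name parse_parameters is hypothetical.

```python
def parse_parameters(text):
    """Parse 'KEY: value' lines into a dict, skipping blank and dashed lines."""
    params = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip():
            params[key.strip()] = value.strip()
    return params

# Contents copied from the example train_parameter file above.
EXAMPLE = """
NUM-OF-SHELLS:   6
SHELL-THICKNESS: 1.25
ANALYSIS-NAME:   na-test
ALL-SITES-PATH:  /home/banatao/TestRun
PROTEINS-PATH:   /home/banatao/NAstructs
STAT-FILES-PATH: /home/banatao/TestRun
"""

params = parse_parameters(EXAMPLE)
num_shells = int(params["NUM-OF-SHELLS"])
thickness = float(params["SHELL-THICKNESS"])

# Inner and outer radius of each concentric shell, in angstroms.
shells = [(i * thickness, (i + 1) * thickness) for i in range(num_shells)]
outer_radius = num_shells * thickness
print(shells)
print(outer_radius)
```
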


 

Example scan_parameter file
-----------------------------------------

P-LEVEL:           0.01
DELTA-VALUE:       1.652
NUM-OF-SHELLS:     6
SHELL-THICKNESS:   1.25
SCORE-FILE:        TestRun/ca-tr-te/ca-tr-te
OUTPUT-PATH:       TestRun/ca-tr-te/Scan_efhand000train/Hits
PROTEINS-PATH:     TestRun/ca-tr-te/Scan_efhand000train/Proteins
EXCLUDED-RESIDUES: CA
-------------------------


 

-Running Feature using FeatureScripts:

Usage
-----

  1. Create a .site file for Feature.
  2. Run train_model.py in the directory where Model/ will be created.
    Minimum parameters:
    -s <sitefile>
    -n <numshells>
    -w <shellthickness>
  3. Run scan_model.py in the directory where Scan-scanname/ will be created.
    Minimum parameters:
    -f <proteinlistfile>
    -g <gridsize>
    -x <excludedresidues>
    trainingmodel

Example
-------
# If working directory is FeatureRuns/EFHand/
# If efhand.site is stored in that directory
# Go to working directory
% cd FeatureRuns/EFHand

# Create Model using the training set 'efhand.site'
#   with 6 shells of thickness 1 angstrom
% train_model.py --sitefile efhand.site --numshells 6 --shellthickness 1

# Scan the proteins in efhand.site using a gridsize of 1 angstrom
#   and excluding the residues calcium, zinc, and magnesium.
#   Use the model specified in Model/training.ini to perform the scan.
% scan_model.py --proteinfile efhand.site --gridsize 1 --excludedresidues "CA ZN MG" Model/training.ini
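
When the two commands above need to be run for several analyses, the command lines can be assembled programmatically. The Python sketch below only builds the argument lists, using the long option names from the example; the helper names train_command and scan_command are hypothetical and are not part of FeatureScripts.

```python
import shlex

def train_command(sitefile, numshells, shellthickness):
    """Build the argument list for train_model.py."""
    return [
        "train_model.py",
        "--sitefile", sitefile,
        "--numshells", str(numshells),
        "--shellthickness", str(shellthickness),
    ]

def scan_command(proteinfile, gridsize, excluded_residues, model):
    """Build the argument list for scan_model.py; residues are joined by spaces."""
    return [
        "scan_model.py",
        "--proteinfile", proteinfile,
        "--gridsize", str(gridsize),
        "--excludedresidues", " ".join(excluded_residues),
        model,
    ]

train = train_command("efhand.site", 6, 1)
scan = scan_command("efhand.site", 1, ["CA", "ZN", "MG"], "Model/training.ini")

# shlex.join quotes arguments so the lines can be pasted into a shell.
print(shlex.join(train))
print(shlex.join(scan))
```

Each list can be passed to subprocess.run(..., check=True) from the working directory, provided the scripts are on the PATH.
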


 

Optional Environment Variable Setup
-----------------------------------


 

Example Environment Setup
-------------------------

setenv PATH $PATH:/home/username/projects/FeatureScripts:/home/username/projects/Feature
setenv LOCAL_PDB_DIR /home/username/databases/pdb
setenv LOCAL_DSSP_DIR /home/username/databases/dssp

 

Running ViewFeature:

1. Launch Chimera and import a PDB file.
2. Go to Extensions -> Utilities -> Feature.
3. Select the parent directory of the .site file for the respective analysis.
4. As long as the directory structure created by FeatureScripts is not changed, ViewFeature will automatically find the .site file (training sites), the .sitedataaf file (statistical model of the site), the correct .DSSP file, and the .hits files (scan results).
5. ViewFeature's interactive GUI will open.
6. Highlight different parts of the open structure(s) by clicking the respective buttons in the ViewFeature GUI (see below).

Feature Statistics Panel (top panel)
- The P-value cutoff for statistical significance can be changed.
- The scroll bar moves the 2-D plot up and down.
- Red boxes are statistically significant property/volume pairs for the training set.
- Cyan boxes are statistically deficient (significant in the nonsites training set).
- Statistically insignificant property/volume pairs are left empty.
- By clicking on a red or cyan box, one can highlight that property/volume pair in the open structure in Chimera.
- Alternatively, by clicking on the box of the property itself, one can highlight all occurrences of that property in the open structure. For example, clicking on ATOM-NAME-IS-N will highlight all backbone nitrogen atoms.
 
Inspector Panel (middle panel)
- Allows visualization of concentric shells (left side of panel)
        1. Select a site point (see below) around which to center a shell.
        2. Click on the desired shell number (top row of the 2-D plot in the Feature Statistics Panel).
        3. Change the "Displayed" button to TRUE.
        4. Click on the "color button" and then choose a color and opacity level.
- The representation of highlighted properties can be changed (right side of panel)
        1. Change the "atom" and "bond" representation as well as the "color" by clicking on the respective buttons.
 
Sites Panel (lower panel)
- Allows display of training sites and hits (statistically significant sites)
        1. From the pull-down menu in the lower panel, choose the respective protein name for the open model (the default should be the open structure).
        2. Selected site points are displayed as small red spheres.
        3. Multiple site points can be selected and displayed by pressing the <Shift> or <Ctrl> key while selecting with the mouse.

 

Section V: Known Issues

1. A window generated by Tk (such as IDLE) cannot be open while running ViewFeature; if one is, the ViewFeature GUI will fail to open properly. (The bug is being researched; it could be a conflict when calling a function in CGLtk.)