INTRODUCTION

This is NAPSS version 1.0. The computer code contained herein is designed to enhance the accuracy of RNA secondary structure prediction by incorporating experimental NMR data obtained for a particular RNA structure. More specifically, these data describe helical regions of an RNA without knowledge of the exact nucleotides that comprise the basepairs within these helices.

Further information regarding this method can be found in the article "NMR-Assisted Prediction of RNA Secondary Structure: Identification of a Probable Pseudoknot in the Coding Region of an R2 Retrotransposon" by Hart JM, Kennedy SD, Mathews DH, and Turner DH in the Journal of the American Chemical Society (2008) - in press at the time of this release.

LICENSING INFORMATION

This program is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 3 of the License, or (at your option) any later version.

This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.

You should have received a copy of the GNU General Public License along with this program (gpl.txt); if not, write to the Free Software Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA.

INSTALLATION

Extract this archive (keeping directory structure intact) to the target location of your choice. The main directory will contain the files necessary for executing the main NAPSS algorithm, two subdirectories, and some example data files. The "data" subdirectory contains the thermodynamic parameters for the dynamic programming algorithm and needs no further modification. Execute a "make" command in the main directory to compile the NAPSS algorithm. Navigate into the "dotplot" subdirectory and execute a "make" command again.
This subdirectory contains the code to produce enhanced dotplots from an RNA sequence file, as well as some sample data files.

INPUT FILES

The NAPSS algorithm requires three text files as input: a sequence file from the program RNAstructure (also available from the Mathews Lab at http://rna.urmc.rochester.edu/software.html), a dotplot file created by the program in the "dotplot" subdirectory of this release, and a user-created file containing the helical walk constraints. Additionally, it may be beneficial to create a fourth text file to specify the run-time configuration options for NAPSS. Each of these files is described in more detail below. For simplicity, it is recommended that users place each input file in the main NAPSS directory.

The sequence file is a text file that can be created in RNAstructure or a text editor. It begins with up to three comment lines, each starting with a semicolon. These are followed by one line that contains the name of the sequence, and then as many lines as necessary to list the nucleotides (A/C/G/U) in the sequence, starting from the 5' end. Immediately after the 3'-most nucleotide, the number "1" is appended to signal the end of the file. Please see the file "bm.seq" in the main directory or the "dotplot" subdirectory for an example of this type of file.

The dotplot file is a tab-delimited text file created by the program in the "dotplot" subdirectory of this release. To create this file, place a copy of your sequence file in the "dotplot" subdirectory. Navigate to this subdirectory and execute this program with the command "./fold" followed by a space, then the name of your sequence file, another space, and finally the desired name for the dotplot output. For example,

    "./fold bm.seq bm.dp"

This procedure only needs to be performed once per unique RNA sequence; multiple executions of the NAPSS algorithm on the same primary sequence can reuse the same dotplot file.
Please see the file "bm.dp" in the main directory or the "dotplot" subdirectory for an example of this type of file. Once the dotplot file has been created, place a copy of it in the main NAPSS directory and return to that location.

The constraint file should be generated with a plain text editor. It describes the types of basepairs in each helical walk, with individual walks separated by line breaks (one walk per line). The current convention for numerically describing basepair types is as follows: "5" = AU, "6" = GC, and "7" = GU. For example, the line "65666" would indicate a helical walk that consists of one GC pair followed by one AU pair followed by three more GC pairs. Please see the file "bm.con" in the main directory for an example of this type of file.

The optional configuration file can also be generated with a plain text editor. If this file is not specified at run-time, NAPSS will automatically enter an interactive mode to prompt the user for the information that this file would otherwise contain. The configuration file consists of several lines of text, each specifying one parameter in the format "PARAMETER=VALUE". Please see the file "config.txt" in the main directory for an example of this type of file.

There are four mandatory parameters that a valid configuration file must contain:

    "inseq", the name of the sequence file
    "indotplot", the name of the dotplot file
    "inconstraints", the name of the constraints file
    "outct", the desired name for the connection-table file that describes the basepairs and calculated free energies of all the output secondary structures

Additionally, any combination of five optional parameters may also be specified:

    "maxtracebacks", the maximum number of refolded structures per constraint match combination [default is 100]
    "percent", the maximum allowed percent difference from the lowest free energy structure in the dotplot [default is 25]
    "windowsize", a parameter describing how different suboptimal refoldings must be from each other [default is 0] (a small window size allows very similar structures to be generated while a larger window size requires them to be more different)
    "cutoff", the maximum allowed percent difference from the lowest free energy structure in the final output [default is 0, which directs NAPSS to output all structures (note that 0 is a flag that indicates "no cutoff," rather than a cutoff of 0)]
    "outpairs", the desired name for the optional positions-paired output file for secondary structure visualization with PseudoViewer [default is no output] (please see below for recommendations on using this feature).

RUNNING NAPSS

Once all the input text files have been created and assembled in the main NAPSS directory, execution is accomplished by the command "./NAPSS", optionally followed by a space and then by the name of the configuration text file. For example,

    "./NAPSS config.txt"

If this file is not specified, NAPSS will automatically enter an interactive mode to prompt the user for the necessary configuration parameters. Upon successful or unsuccessful completion, NAPSS will pause to allow Windows-based users to see the output before terminating - press any character followed by Enter to return to the command prompt.
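
As an illustration of the constraint and configuration formats described above, the following Python sketch builds a helical walk line from basepair types and assembles the text of a configuration file. It is not part of NAPSS: the function names encode_walk and build_config and the output name "bm.ct" are hypothetical, and it assumes that NAPSS accepts the parameters in any order with no spaces around the "=" sign.

```python
# Basepair codes as given in this README: "5" = AU, "6" = GC, "7" = GU.
PAIR_CODES = {"AU": "5", "GC": "6", "GU": "7"}

MANDATORY = ("inseq", "indotplot", "inconstraints", "outct")
OPTIONAL = ("maxtracebacks", "percent", "windowsize", "cutoff", "outpairs")


def encode_walk(pairs):
    """Convert basepair types, e.g. ["GC", "AU"], into one constraint line."""
    return "".join(PAIR_CODES[pair] for pair in pairs)


def build_config(**params):
    """Return configuration file text with one PARAMETER=VALUE per line."""
    missing = [name for name in MANDATORY if name not in params]
    if missing:
        raise ValueError("missing mandatory parameters: " + ", ".join(missing))
    unknown = [name for name in params if name not in MANDATORY + OPTIONAL]
    if unknown:
        raise ValueError("unknown parameters: " + ", ".join(unknown))
    # Emit mandatory parameters first, then any optional ones that were given.
    # (The ordering is a choice made here, not a documented NAPSS requirement.)
    ordered = [name for name in MANDATORY + OPTIONAL if name in params]
    return "\n".join(f"{name}={params[name]}" for name in ordered) + "\n"


if __name__ == "__main__":
    # One GC pair, one AU pair, then three more GC pairs.
    print(encode_walk(["GC", "AU", "GC", "GC", "GC"]))
    # "bm.ct" is a hypothetical output name; the other files ship with NAPSS.
    print(build_config(inseq="bm.seq", indotplot="bm.dp",
                       inconstraints="bm.con", outct="bm.ct", percent=5))
```

The text returned by build_config can be saved under any file name and passed to "./NAPSS" as described above.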

GENERAL RECOMMENDATIONS

The optional parameters for the configuration file have been assigned default values that correspond to the results reported in Hart JM, et al. JACS (2008), with the exception of "cutoff", which was assigned a value of 25. In general, the optimal values for each depend on the size of the RNA being studied as well as the length and uniqueness of the NMR-derived helical constraints. Please refer to the specific discussion of these settings below.

The most critical parameter setting is "percent" - this, along with the size of the RNA, dictates how many potential basepairs from the dotplot are considered in the downstream calculations. It is recommended that users start with a rather small value for this (5 or so) and slowly increase the value in subsequent calculations, particularly for structures that are larger than 100 nucleotides in length. Larger "percent" values may be necessary to include poorly-predicted helices, especially in the case of certain pseudoknotted structures, but this can also exponentially increase the number of match combinations that NAPSS will have to refold. Although this cannot at present be reduced to a mathematical relationship, a good rule of thumb is that a 100-nt RNA will take approximately one second per refolding on an average modern CPU. NAPSS will update refolding and energy calculation progress every 100 structures so that users can arrive at a better performance estimate for their particular system.

"windowsize" should not be changed from the default value of zero without a compelling reason. This is because it is related only to the output of the refolding prediction, which tends to involve rather small substructures from unconstrained regions. One potential exception to this would be if the overall structure is rather large and the number of constrained base pairs is comparatively small.
"maxtracebacks" governs the maximum number of structures that will be output from each refolding and can often be kept at a rather large number without any major detriment to running time. It may be beneficial to decrease this setting if large regions of the secondary structure are unconstrained by experimental results, as this may decrease the number of relatively unstable structures while still outputting the most favorable candidates. "cutoff" is an option for truncating the number of structures that will be output once the NAPSS algorithm has completed. As such it does not have any effect on the running time of the algorithm but can potentially bring the number of results down to a more manageable size. One important caveat to note is that the free energy calculation that is currently incorporated in NAPSS cannot yield an accurate value for extremely complicated pseudoknotted structures (it is actually a variant of the Pseudoknot Energy Model in the NUPACK algorithm - for more information, please refer to Dirks RM, Pierce NA. J. Comput. Chem. 2003, 24, 1664-1677). If such a structure is encountered by the energy-calculating subroutine, it breaks apart the offending substructure and adds a large penalty to that structure's predicted free energy. If the consideration of such structures is desirable, "cutoff" should be set to zero so that all structures are output to the ct file. The final optional parameter, "outpairs", can be used to specify a filename to which NAPSS will output all predicted secondary structures in a Positions-Paired format that can be read by the program PseudoViewer. This program can be helpful for visualizng many pseudoknotted structures and is available for download from http://wilab.inha.ac.kr/pseudoviewer2/ The output file generated by this subroutine of NAPSS has many structures appended into one file with each structure separated from the next by a line of hyphens. 
PseudoViewer can only read one structure at a time, so the desired structure should be copied and pasted into a new text file before attempting to visualize it in PseudoViewer.

BUG REPORTS

Please send a detailed email to James_Hart@urmc.rochester.edu with the subject line "NAPSS" to report any bugs that you find in this software.