GTfold
Scalable Multicore Code for RNA Secondary Structure Prediction
Please see Overview for the most recent binaries and see Develop for details on downloading and installing the source code.
Usage
After unpacking the binaries or installing the source code, users should navigate to the directory containing the gtmfe, gtboltzmann, and gtsubopt executables. These executables are called from the command line.
These programs accept input sequence files containing a single sequence, either alone or in FASTA format. When using ./gtmfe, the minimum free energy structure will be written to a .ct file. (See the description of options for further details on output file names.)
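As a rough illustration of a gtmfe call, the sketch below assembles an argument list in Python without executing anything. The `--output` and `--threads` flags come from the options table in this document. The helper name `build_gtmfe_command`, the file name `example.fasta`, and the placement of the sequence file as the final positional argument are assumptions made for illustration only.

```python
import shlex


def build_gtmfe_command(sequence_file, output_prefix=None, threads=None,
                        executable="./gtmfe"):
    """Assemble an argument list for gtmfe (hypothetical helper; nothing is run)."""
    cmd = [executable]
    if output_prefix is not None:
        cmd += ["--output", output_prefix]  # -o, --output NAME
    if threads is not None:
        cmd += ["--threads", str(threads)]  # -t, --threads INT
    # Assumption: the sequence file is given as the last positional argument.
    cmd.append(sequence_file)
    return cmd


command = build_gtmfe_command("example.fasta", output_prefix="example", threads=2)
print(shlex.join(command))
```

The resulting list could be handed to `subprocess.run` once the exact invocation has been confirmed with `--help`.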
Options
option | description | gtmfe | gtboltzmann | gtsubopt |
-c, --constraints FILE | Load constraints from FILE. See constraint syntax below. | |||
-d, --dangle INT | Restricts treatment of dangling energies, see below for details. | |||
--delta DOUBLE | Compute suboptimal structures within DOUBLE kcal/mole of MFE. Writes structures to file output-prefix.ss. | |||
--detailedhelp | Display detailed help message. Includes examples and additional options useful to developers. | |||
-e, --energydetail | Write loop-by-loop energy decomposition of structures to output-prefix.energy. When using this function in combination with --sample, number of threads must be limited to one (-t 1). | |||
--estimatebpp | Write a csv file (output-prefix.sbpp) containing, for each sampled base pair, that base pair and its frequency. Only valid in combination with --sample. | |||
--groupbyfreq | Write a csv file (output-prefix.frequency) containing, for each sampled structure, a line with the structure's probability under the Boltzmann Distribution followed by the normalized frequency of that structure, where (normalized frequency) = (structure frequency)/(number of structures sampled). Only valid in combination with --sample. | |||
-h, --help | Display help message and exit. | |||
-l, --limitcd INT | Set a maximum base pair contact distance to INT. If no limit is given, base pairs may span any distance. | |||
-m, --mismatch | Enable terminal mismatch calculations. | |||
-o, --output NAME | Write output files with prefix given in NAME. | |||
-p, --paramdir DIR | Path to directory from which parameters are to be read. | |||
--pfcount | Output the number of possible structures (using the partition function). | |||
--prefilter INT | Prohibits any base pair that does not have appropriate neighboring nucleotides such that it could be part of a helix of length INT. | |||
-s, --sample INT | Sample INT structures from Boltzmann distribution. Write structures to file output-prefix.samples. | |||
-t, --threads INT | Limit number of threads used to INT. (Default is max threads available.) | |||
-v, --verbose | Run in verbose mode. (Includes confirmation of constraints satisfied.) | |||
-w, --workdir DIR | Path of directory where output files are to be written. | |||
--rnafold | Run as RNAfold default mode (Vienna RNA Package version 1.8.5). (In this mode calls to -d, -p, -m and --prefilter will be ignored.) | |||
--unafold | Run as UNAfold default mode (version 3.8), subject to traceback implementation. (In this mode calls to -d, -p, -m and --prefilter will be ignored.) | |||
--useSHAPE FILE | Use SHAPE values from FILE (see below for syntax). | |||
--bpp | Calculate base pair probabilities and unpaired probabilities and write to output-prefix.bpp. (Beta option) |
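The normalized frequency described for --groupbyfreq is (structure frequency)/(number of structures sampled). A minimal sketch of that arithmetic follows. The function name and the dot-bracket strings used as sample structures are illustrative assumptions; this is not GTfold's own code and does not reproduce its output format.

```python
from collections import Counter


def normalized_frequencies(sampled_structures):
    """Map each distinct structure to (its count) / (number of samples)."""
    total = len(sampled_structures)
    if total == 0:
        return {}
    counts = Counter(sampled_structures)
    return {structure: count / total for structure, count in counts.items()}


# Hypothetical sample of four structures, written in dot-bracket notation.
samples = ["((..))", "((..))", "......", "((..))"]
freqs = normalized_frequencies(samples)
for structure, freq in freqs.items():
    print(structure, freq)
```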
Constraint syntax
The constraints given in the constraint file should be formatted as follows:
- P i j k Prohibits the formation of base pairs (i,j) (i+1,j-1) ... (i+k-1, j-k+1).
- F i j k Forces the formation of base pairs (i,j) (i+1,j-1) ... (i+k-1, j-k+1).
- P i 0 k Forces the bases from i to i+k-1 to be single-stranded.
- F i 0 k Forces the bases from i to i+k-1 to be paired (without specifying their pairing partners). (Beta option)
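To make the index arithmetic in the constraint syntax concrete, the following sketch expands one constraint line into the base pairs or bases it covers. The function name `expand_constraint` is hypothetical, and the parsing assumes whitespace-separated fields exactly as listed above; GTfold's own parser may behave differently.

```python
def expand_constraint(line):
    """Expand 'P i j k' or 'F i j k' into the pairs or bases it refers to."""
    kind, i, j, k = line.split()
    if kind not in ("P", "F"):
        raise ValueError(f"unknown constraint type: {kind!r}")
    i, j, k = int(i), int(j), int(k)
    if j == 0:
        # 'P i 0 k' / 'F i 0 k': the bases i, i+1, ..., i+k-1.
        return kind, list(range(i, i + k))
    # 'P i j k' / 'F i j k': the pairs (i,j), (i+1,j-1), ..., (i+k-1,j-k+1).
    return kind, [(i + n, j - n) for n in range(k)]


print(expand_constraint("F 3 20 4"))
print(expand_constraint("P 8 0 3"))
```

Here `F 3 20 4` forces a four-pair helix from (3,20) through (6,17), while `P 8 0 3` keeps bases 8 through 10 unpaired.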
Dangle
INT=0 | Ignores dangling energies (mostly for debugging). |
INT=1 | Unpaired nucleotides adjacent to a branch in a multi-loop or external loop are allowed to dangle on at most one branch in that loop. This is the default setting for gtmfe. |
INT=2 | Dangling energies are added for nucleotides on either side of each branch in multi-loops and external loops. This is the default setting for gtboltzmann and gtsubopt. (This is the same as the -d2 setting in the RNAfold from the Vienna RNA Package.) |
otherwise | INT is ignored and the default setting is used. |
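The fallback rule in the table above can be summarized in a few lines. This is only a sketch of the documented behavior: the dictionary and the function name `effective_dangle` are illustrative and are not part of GTfold.

```python
# Default dangle settings as stated in the table above.
DEFAULT_DANGLE = {"gtmfe": 1, "gtboltzmann": 2, "gtsubopt": 2}


def effective_dangle(program, requested=None):
    """Return the dangle mode that applies for a program and a -d value."""
    if requested in (0, 1, 2):
        return requested
    # Any other value (or no value) falls back to the program's default.
    return DEFAULT_DANGLE[program]


print(effective_dangle("gtmfe"))
print(effective_dangle("gtsubopt", 0))
print(effective_dangle("gtboltzmann", 5))
```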
SHAPE syntax
SHAPE values should be given in a file with two single-space-delimited columns, for example:
- 1 0.1
- 2 0.001
- 3 1.67
- etc.
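A small sketch of producing and reading lines in this two-column layout is given below. It assumes that the first column is a nucleotide position and the second is the SHAPE value for that position, which is how the example above reads but is not stated explicitly. Both function names are hypothetical.

```python
def format_shape_lines(reactivities):
    """Turn {position: value} into 'position value' lines, ordered by position."""
    return [f"{pos} {value}" for pos, value in sorted(reactivities.items())]


def parse_shape_lines(lines):
    """Read 'position value' lines back into a {position: value} dict."""
    values = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue
        pos, value = line.split(" ")  # exactly one space between the columns
        values[int(pos)] = float(value)
    return values


shape_lines = format_shape_lines({1: 0.1, 2: 0.001, 3: 1.67})
print("\n".join(shape_lines))
```

Writing `"\n".join(shape_lines)` to a file gives something that could be passed to --useSHAPE, subject to the column assumption above.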
Setting default parameter directory
To run properly, GTfold requires access to a set of parameter files. If you are using one of the prepackaged binaries, you may need (or choose) to set the GTFOLDDATADIR environment variable to specify the directory in which GTfold should look for default parameter files. The variable can be set from a terminal window.
GTfold will by default look for parameter files in the following directories:
- (1) The directory pointed to by environment variable GTFOLDDATADIR
- (2) The install directory (e.g., /usr/local/share/gtfold), if (1) fails.
- (3) The subdirectory 'data' of the current directory, if (1) and (2) fail.
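The search order above can be mimicked in a few lines of Python. This is a sketch, not GTfold's actual lookup code: it assumes that a candidate "fails" when the directory does not exist, and the function names and the default install path argument are illustrative.

```python
import os


def candidate_parameter_dirs(install_dir="/usr/local/share/gtfold",
                             environ=None, cwd=None):
    """List the directories in the documented search order."""
    environ = os.environ if environ is None else environ
    cwd = os.getcwd() if cwd is None else cwd
    candidates = []
    env_dir = environ.get("GTFOLDDATADIR")
    if env_dir:
        candidates.append(env_dir)               # (1) environment variable
    candidates.append(install_dir)               # (2) install directory
    candidates.append(os.path.join(cwd, "data")) # (3) ./data
    return candidates


def resolve_parameter_dir(candidates, exists=os.path.isdir):
    """Return the first candidate that exists, or None if all fail."""
    for path in candidates:
        if exists(path):
            return path
    return None


print(resolve_parameter_dir(candidate_parameter_dirs()))
```

Printing the candidate list is a quick way to check which directory a given shell session would point GTfold at before running a long job.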
References
Deigan KE, Li TW, Mathews DH, Weeks KM. 2009. Accurate SHAPE-directed RNA structure determination. Proc Natl Acad Sci USA. 106:97–102.
Swenson MS, Anderson J, Ash A, Gaurav P, Sukos Z, Bader DA, Harvey SC, and Heitsch CE. 2012. GTfold: Enabling parallel RNA secondary structure prediction on multi-core desktops. BMC Research Notes. 5(1):341.
Updating GTfold
To download and install the latest source code for GTfold, please visit the Develop page.
Support
- We have several trackers enabled for our project. Users can report bugs, request new features, place support requests, etc. at the following tracker page.
- You can email the GTfold administrators about any unresolved issues with GTfold.