|
|
X! Tandem will be released periodically,
with the version numbering system formulated from the date of release. The
changes made to the system on each release are detailed in the list below.
Releases are listed with the most recent release on top.
|
This release has numerous small changes to reduce the amount of memory
used by the application when utilizing its expert systems methods for finding PTMs and SAVs. It also adds
several mechanisms to reduce the number of false positive assignments, particularly when testing for SAVs.
|
-
The handling of expert systems information that is loaded from files but is not altered during a search has been
changed so that a single, global data structure is used, rather than individual data structures in each thread. This
feature required significant alteration to the code, so any projects that are "forks" of the main
X! Tandem project should take care to ensure that their changes are not affected.
-
A parent ion mass peak tolerance detection system has been added that can detect the minimum necessary
tolerance as part of the report generation phase of the search.
-
Protein SAVs that can be assigned to non-variant peptides are now excluded from the output.
|
This release improves the precision of handling variable modifications.
Several new commands and notational add-ons make specifying how to test modifications significantly more nuanced.
The methods for handling variable modifications have been extensively re-written.
|
-
The value of the command "protein, ptm complexity" (C, a floating point number 0.0–12.0)
sets the maximum number of variable modification alternatives that will be tested for a particular
peptide. The number of alternatives is 2.0^C. If this number is not specified, the default value C = 6.0 will be used.
-
The specification of a variable modification can include a value for the maximum number of
modification sites to be considered in a single peptide. For example, the modification specification
15.994915@M would normally be used to test for M oxidation. If you wish only to consider one such modification
per peptide, you can now write "15.994915@1M". Any number from 1–10 can be used in this notation. If not
specified, a default value of 10 is used.
- It is possible to specify that a variable modification NOT occur at the C-terminus of a peptide. For
example, previously "42.010565@K" would have been used to test for K acetylation. Using the new notation,
"42.010565@]K" can be used, which will not test C-terminal lysines for acetylation (which are chemically
impossible for tryptic peptides). This notation is useful for most lysine post-translational modifications, as well as dimethyl-arginine.
Note: monomethyl-arginine and -lysine are both susceptible to trypsin cleavage, so this notation is not
recommended for monomethyl variable modifications. It is also not recommended for use with carbamylation
— a urea artifact that can occur during tryptic digestion —
although reducing the number of carbamylations allowed per peptide, e.g., "43.005814@1K", can be quite useful.
-
The legacy command "spectrum, use noise suppression" has been removed from the project: the original
method was created for LCQ spectra and it no longer had any practical utility.
-
Limits have been introduced to the length of peptide that will be considered to be a solution to a mass spectrum.
Previous limits had only been based on the parent ion mass of a fragment ion spectrum. The new limits require a
peptide to be 6–50 residues in length, regardless of the parent ion mass.
-
The Windows version of the code has been updated and adapted for use with Microsoft Visual Studio Community 2015.
It has been fully tested for Windows 8, 8.1 & 10.
-
The Linux version of the code has been updated and adapted for use with Red Hat Enterprise Linux Workstation v.6.7,
using gcc v. 4.4.7.
-
This version was designed and tested to work with the BI GPM Fury version of the generic GPM interface.
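The variable modification notation and the "protein, ptm complexity" limit described in the list above can be illustrated with a minimal Python sketch. It is not X! Tandem's own parser: the names `VariableMod`, `parse_variable_mod` and `max_alternatives` are hypothetical, the sketch assumes that a site count and the "]" marker may be combined in that order, and it reads the number of alternatives as 2.0 raised to the power C.

```python
import re
from typing import NamedTuple


class VariableMod(NamedTuple):
    mass: float        # mass shift in Da
    max_sites: int     # maximum modified sites per peptide
    skip_c_term: bool  # True when "]" excludes the peptide C-terminal residue
    residue: str       # single-letter residue code


# Hypothetical pattern: mass@[count][']']residue (an illustration only).
_SPEC = re.compile(r"^([+-]?\d+(?:\.\d+)?)@(\d{1,2})?(\])?([A-Z])$")


def parse_variable_mod(spec: str, default_sites: int = 10) -> VariableMod:
    """Split a specification such as "15.994915@1M" into its parts."""
    match = _SPEC.match(spec.strip())
    if match is None:
        raise ValueError(f"unrecognized modification spec: {spec!r}")
    mass, count, bracket, residue = match.groups()
    sites = int(count) if count else default_sites
    if not 1 <= sites <= 10:
        raise ValueError("site count must be between 1 and 10")
    return VariableMod(float(mass), sites, bracket is not None, residue)


def max_alternatives(complexity: float = 6.0) -> float:
    """Upper limit on tested alternatives, assuming the limit is 2.0 ** C."""
    if not 0.0 <= complexity <= 12.0:
        raise ValueError("complexity must be between 0.0 and 12.0")
    return 2.0 ** complexity
```

Under these assumptions, `parse_variable_mod("42.010565@]K")` reports that C-terminal lysines are skipped, and the default complexity of 6.0 corresponds to 64 alternatives per peptide.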
|
This release adds a new output format (mzIdentML) and several variants
of the mzML input format (MSNumPress compression). It also corrects an undesired behavior when searching
for protein N-terminal and C-terminal modifications when using a protein modification specification XML file.
|
- New files (MSNumPress.cpp, MSNumPress.hpp) were added to the project (Johan Teleman) to implement
the compression modes that have been added to the mzML specification.
- New files (mzid_report.cpp, mzid_report.hpp) were added to the project to implement
the output of an mzIdentML file, in addition to the existing BIOML output. To generate
an mzIdentML output, set the new parameter:
- "output, mzid": if "yes" the file will be created with the extension .mzid.
- The "score_terminus_single" method has been removed from mprocess and replaced by an altered version
of "score_terminus", which corrects the bad behavior associated with searching for protein N- and C-terminal
modifications when using a protein modification specification file. It also improves the
progress reporting displayed for this type of search.
|
This release updates the E-value estimation algorithm and corrects several
issues associated with using very high accuracy fragment ion mass tolerances.
|
- The E-value estimation algorithm has been totally rewritten to simplify the code.
- The new E-value algorithm deals more effectively with malformed protein sequence lists, particularly
lists that deliberately contain very large numbers of highly similar protein sequences.
- The method for determining high accuracy fragment ion tolerances has been corrected.
|
This release contains a new method for specifying protein post-translational
modifications specifically by protein coordinate and modification type.
|
- A file format similar to the amino acid polymorphism specification was developed and a reader implemented.
- This first version of coordinate-based PTM specification allows only one specified PTM per
tryptic peptide at a time. For example, if two PTMs are specified for the same
peptide, each will be tested separately, but not the two together. More than
one PTM may be specified on a particular residue: each will be tested sequentially.
- A significant number of internal changes have been made to eliminate any variability in
output caused by the use of multi-threading. The output files should now
be line-by-line identical, independent of the number of threads used.
|
This release contains a new method of dealing with redundant
protein sequences.
|
- A stacking system is used to track redundant protein sequences to eliminate
multiple processing of identical sequences. The redundant information
is re-inserted into the results following processing, so that the
resulting output is the same as would have been generated by older versions.
- The letter "X" in protein sequences is now interpreted the same as
an asterisk "*", i.e., it is processed as a stop in translation.
- A new input parameter, "spectrum, skyline path", was introduced to make the output
easier to parse for the Skyline MRM utility suite.
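The handling of redundant sequences and of "X" as a stop, described in the list above, can be sketched in Python. This is only an illustration of the idea (process each distinct sequence once, then copy the result back to every redundant entry); the function names and the `score` callback are hypothetical and do not correspond to X! Tandem's internal stacking system.

```python
import re
from collections import OrderedDict


def split_at_stops(sequence):
    """Treat both "*" and "X" as translation stops and drop empty pieces."""
    return [piece for piece in re.split(r"[X*]", sequence) if piece]


def search_unique(proteins, score):
    """Score each distinct sequence once and re-insert results for duplicates.

    proteins: list of (label, sequence) pairs
    score:    hypothetical, expensive function applied to a sequence
    """
    groups = OrderedDict()
    for label, sequence in proteins:
        groups.setdefault(sequence, []).append(label)

    results = {}
    for sequence, labels in groups.items():
        value = score(sequence)   # runs once per distinct sequence
        for label in labels:      # restore an entry for every redundant label
            results[label] = value
    return results
```

Because the results are copied back to every label, the output has the same entries as a run that scored each redundant sequence separately.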
|
This release contains several bug fixes as well
as some new features associated with peptide cleavage patterns.
|
- The parameters "protein, cleavage semi" and "refine, cleavage semi" now have four possible values:
- "amino" - simulates cleavage by an aminopeptidase
- "carboxy" - simulates cleavage by a carboxypeptidase
- "yes" - simulates cleavage by both amino- and carboxy-peptidase
- "no" - semi-type cleavage not used (default)
- A bug that could produce negative values for "missed_cleavage" has been
corrected.
- The RTINSECONDS parameter in MGF files is now handled correctly.
- A mechanism for specifying protein-specific modifications that are to be
applied in all rounds of analysis has been added.
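The four "cleavage semi" values listed above can be pictured with a short Python sketch. It assumes that "amino" means residues are trimmed from the peptide N-terminus and "carboxy" means residues are trimmed from the C-terminus; the function name `semi_peptides` and the `min_len` cutoff are hypothetical and are not taken from the X! Tandem source.

```python
def semi_peptides(peptide, mode="no", min_len=6):
    """List the candidate peptides implied by a semi-cleavage mode.

    The full-length peptide always comes first; min_len is an assumed cutoff.
    """
    if mode not in {"amino", "carboxy", "yes", "no"}:
        raise ValueError(f"unknown mode: {mode!r}")
    candidates = [peptide]
    if mode in ("amino", "yes"):
        # aminopeptidase-like: remove residues one at a time from the N-terminus
        candidates += [peptide[i:] for i in range(1, len(peptide) - min_len + 1)]
    if mode in ("carboxy", "yes"):
        # carboxypeptidase-like: remove residues one at a time from the C-terminus
        candidates += [peptide[:j] for j in range(len(peptide) - 1, min_len - 1, -1)]
    return candidates
```

For an eight-residue peptide and the assumed cutoff of six residues, "amino" and "carboxy" each add two truncated candidates, and "yes" adds all four.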
|
This is the second release in the CYCLONE project. There are
numerous small fixes and changes from the first CYCLONE release, associated with reducing the memory requirements for
large data sets. Specific new features are listed below.
|
- Additional testing for adventitious cleavage at Asp-Pro residues. These tests are made
for all enzyme cleavage types, except [X]|[X] (cleavage at all residues). Testing Asp-Pro
cleavage does not affect the "missed cleavage" count in an analysis.
- The load balancing method for starting multiple threads has been improved to
take account of data sets that have been re-arranged from their original order in an LC/MS/MS
file.
- Several changes have been made to keep up with changes to "standard" data file formats.
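The Asp-Pro test mentioned in the list above amounts to looking for a D residue followed directly by P and considering the two products of breaking that bond. The Python sketch below shows only that idea; the function name is hypothetical and the way X! Tandem actually generates and scores these products is not described in these notes.

```python
def asp_pro_products(peptide):
    """Return the (N-terminal, C-terminal) products of cleavage at each D-P bond."""
    products = []
    for i in range(len(peptide) - 1):
        if peptide[i] == "D" and peptide[i + 1] == "P":
            # the bond between D (position i) and P (position i + 1) is broken
            products.append((peptide[: i + 1], peptide[i + 1:]))
    return products
```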
|
This is the first release in the CYCLONE project. There are
numerous small fixes and changes from the last TORNADO release, mainly aimed at improving the
speed of the application. Some of the new features are listed below.
|
- An improved scoring function for ETD data, incorporating the ideas described in Sun, R-X, et al.
J. Proteome Res. 2010 (DOI: 10.1021/pr100648r).
- A more complete implementation of the mzML v. 1.1.0 file format (in collaboration with Fredrik Levander).
- A mechanism for reading the fragmentation type from mzXML files, when available. This mechanism
allows X! Tandem to read mzXML files that contain mixtures of CID/HCD and ETD generated spectra
and correctly apply the appropriate set of fragment ions to the individual spectra for interpretation (in collaboration with Peter Lobel).
- A change to the interpretation of the "refine, unanticipated cleavage" directive to
being a "semi"-type cleavage rather than a full non-specific cleavage. The previous behavior can
be obtained using the new "refine, full unanticipated cleavage" directive.
- An improved implementation of the "quick acetyl" checking mechanism brought
out in the last TORNADO release.
- Explicit use of SIMD pragmas in the Windows version to speed up the native X! Tandem scoring function.
|
This release adds several new features to X! Tandem, as well as
compatibility with changes to some of the existing standard file formats. The new features are listed
below.
|
- New parameter "quick acetyl"
added to control a simplified check for acetylated protein N-termini.
- New parameter "quick pyrolidone"
added to control the previously existing peptide N-terminal cyclization check.
- New parameter "stP bias"
added to control new behavior regarding the detection and assignment of phosphorylation sites.
- Compatibility with the current mzData format used by PRIDE.
|
This release is a maintenance release that adds one new feature
that is designed to be used in analyzing SILAC experiments. It also has minor changes to improve the
detection of new input data file types.
|
- It is now possible to specify multiple sets of "complete" modifications to be applied
sequentially. This was achieved by adding new commands to the input X! file format that
look like the following:
<note type="input" label="residue, modification mass">57@C</note>
<note type="input" label="residue, modification mass 1">57@C,8@K,10@R</note>
In this case, the data would be checked both for peptides with only cysteine modified by carboxyamidomethyl and for
peptides with carboxyamidomethyl and SILAC labeled lysine and arginine residues. This applies to both the initial round of analysis
as well as all refinement rounds. Any number of sets of complete modifications
can be added by incrementing the count in the label ("residue, modification mass 2", "residue, modification mass 3", etc.). Processing
stops when a count increment label is missing (e.g., there is a "residue, modification mass 2" label but no
"residue, modification mass 3" label). Processing also stops when a zero-length string is passed; for example, the following would stop processing at count = 1:
<note type="input" label="residue, modification mass 1"></note>
A non-zero length string that cannot be interpreted as a residue modification is interpreted as meaning that the data should be
analyzed with no residue modifications, for example:
<note type="input" label="residue, modification mass 1">none</note>, or
<note type="input" label="residue, modification mass 1"> </note>.
- Compatibility for version 1.1 of the CMN format has been added, allowing long description strings (> 255 characters).
- Detection of new, non-standard variants of mzXML files has been added.
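The rules given above for sequential sets of "complete" modifications (stop at a missing label, stop at a zero-length string, treat an uninterpretable string as "no modifications") can be sketched in Python. The sketch assumes the input parameters have already been read into a dictionary keyed by label; `complete_mod_sets` and the simple `mass@residue` pattern are hypothetical and cover only the plain form shown in the examples.

```python
import re

# Assumed simple form of one residue modification, e.g. "57@C".
_MOD = re.compile(r"^[+-]?\d+(?:\.\d+)?@[A-Z]$")


def complete_mod_sets(params):
    """Collect the sets of complete modifications, in the order they are applied."""
    base = "residue, modification mass"
    sets = []
    count = 0
    while True:
        label = base if count == 0 else f"{base} {count}"
        value = params.get(label)
        if value is None:
            break                     # a missing label ends processing
        if count > 0 and value == "":
            break                     # a zero-length string ends processing
        mods = [item.strip() for item in value.split(",")]
        if all(_MOD.match(item) for item in mods):
            sets.append(mods)
        else:
            sets.append([])           # e.g. "none" or " ": search with no modifications
        count += 1
    return sets
```

With the two labels from the example above, the function returns `[["57@C"], ["57@C", "8@K", "10@R"]]`, one entry per search pass.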
|
This release is the first of the new TORNADO versions of X! Tandem,
which have the goal of utilizing available external annotation information to improve the performance of
sequence identifications. The 2007.04.01 release started this project, by adding single nucleotide
induced amino acid polymorphism annotation to searches. TORNADO introduces the capability of setting
the potential modifications tested on a sequence by sequence basis, controlled by a BIOML annotation
file.
|
- A fix for the method to force the use of specific file formats (made by Patrick Lacasse)
- Addition of a class to handle sequence annotation files in BIOML format (saxmodhandler).
- Addition of a method to load the annotation file information into an STL map, in the class
mprocess.
- When compiling on Linux platforms, several possible makefiles are provided. The default
makefile will work for GCC version 4, with the expat libraries dynamically linked. The other
makefiles are all in the src directory, with names like "Makefile_XXX" where XXX
is a descriptive name indicating in which situations this file is appropriate. To use these files,
use a command line like this:
>make -f Makefile_GCCv3
|
This release adds compatibility with 64 bit floating point
data in mzXML or mzData formats.
|
- A new override of the dtohl method in the saxhandler.cpp file
was added to deal with 64 bit floats.
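As an illustration of what decoding 64 bit peak data involves, here is a minimal Python sketch. It is not the dtohl code: it assumes the peak list is base64 text holding alternating m/z and intensity values in network (big-endian) byte order, as mzXML declares for its peak lists, and the function name `decode_peaks` is hypothetical. mzData files can declare their own byte order, which this sketch does not handle.

```python
import base64
import struct


def decode_peaks(encoded, precision=32):
    """Decode base64 peak data into (m/z, intensity) pairs.

    precision: 32 or 64, the declared width of each floating point value.
    """
    raw = base64.b64decode(encoded)
    if precision == 64:
        code, width = "d", 8
    elif precision == 32:
        code, width = "f", 4
    else:
        raise ValueError("precision must be 32 or 64")
    count = len(raw) // width
    # ">" selects big-endian (network) byte order
    values = struct.unpack(f">{count}{code}", raw[: count * width])
    return list(zip(values[0::2], values[1::2]))
```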
|
This release is the first version to support amino acid residue
polymorphism annotation. A file containing known coding mutations can be specified and the
search engine will check each specified version of those modified residues.
|
- Including SAPS required modification of the classes mprocess and
mscore as well as the addition of a new class mscoresap, which
is specified in the mscorepam.h file. The new class follows the
same pattern as the other state machines for tracking sequence modifications. A class
that reads the XML-formatted SAPS annotation information has also
been added, saxsaphandler, which follows the same pattern as
the other BIOML processing classes.
- This version (and all subsequent ones) will use the preprocessor commands associated with
the compiler make processor to specify the platform being compiled. Previous
versions required commenting out undesired options in the stdafx.h file. This
change includes the PLUGGABLE_SCORING preprocessor definition in mscore.h that
is necessary to alter the peptide scoring portion of the code.
- The Mac OSX version of the executable binaries is statically linked to
the most recent version of the XML parser expat. Unlike previous versions
it will not be necessary to have expat installed on the computer used
to run the search engine. The Linux/Unix builds are now the only
platforms on which dynamic linking is necessary.
|
This release is the first release to support rho-diagrams
for the determination of expectation value thresholds. It also has a minor, but important
change to the interpretation of results to make the results of refinement rounds more
consistent. This release is also the first to support the Mac OS X 10.4 version for Intel
processors. Support for non-Intel processors will be discontinued as of the next release.
|
- The output file has a new output value in the "performance parameters" group,
for example:
<group label="performance parameters" type="parameters">
<note label="quality values">117 34 22 10 5 2 1 0 0 1 0 1 0 0 0 0 0 0 0 0</note>
</group>
This change was necessary to support the use of rho-diagrams in the GPM display software.
- The proteins reported as possible correct identifications have been changed somewhat.
In previous versions, it was possible for a protein to be reported as identified even if
it did not have any qualifying peptides that were found to have the specified enzymatic
cleavage: a protein could have peptides found only during the cleavage-at-every site round
or the point-mutation round. This behavior has been changed so that a protein must have
at least one significant peptide found to have the specified enzymatic cleavage.
- The implementation of point mutation detection has been altered so that if a particular
possible point mutation has been explained by any set of potential modifications, it will not
be included as a possible solution in the output.
- The stdafx.h file has been altered to add in new preprocessor statements that
deal with the different versions of Mac OS X. These new statements are #OSX_TIGER
and #OSX_INTEL. To compile with OS X 10.4 on a PPC computer, uncomment both #OSX
and #OSX_TIGER. To compile with OS X 10.4 on an Intel computer, uncomment only
#OSX_INTEL.
|
This release of X! Tandem/P3 contains several
minor changes to maintain cross-platform compatibility.
|
-
Code that uses iterator math to determine the limits of a calculation
has been altered so that the iterators are not incremented past the
end of an STL container. Incrementing iterators past the end of a
container generates a run-time error when compiled with Microsoft Visual
C++ 2005. The precompiler variable _CRT_SECURE_NO_DEPRECATE has been
defined for the Visual C++ compiler, to prevent the generation of
unnecessary compiler warnings for the use of C string functions, such as "strcpy".
-
The SAXTandemInputHandler::characters method has been updated to
improve its performance and to handle escape characters correctly (suggested by Brendan Maclean).
-
The documentation and precompiler defines in stdafx.h that provide cross-platform compatibility
for 64-bit integer types have been updated.
|
This release of X! Tandem contains two
fixes to improve compatibility with Linux and one adjustment to be compatible
with X! Hunter.
|
-
On at least some Linux platforms, asterisks (*) in FASTA files were not being
processed properly. This has been corrected by Brendan Maclean.
-
Some mzXML files could produce memory problems, when combined with some
spectrum processing parameters because a vector was not being cleared between
processing individual spectra. This problem did not affect the results of the
search, but it could cause memory paging when using a large file.
-
The taxonomy xml file has always contained a type specification, which was
not used by X! Tandem. Now X! Tandem enforces that FASTA or FASTA.PRO files
must be specified with type="peptide".
|
This release of X! Tandem includes a
number of additions to the system API. These changes are mainly for
programmers, allowing for greater customization of searches. Some of these
features have been present in previous versions of X! Tandem, but have been
either undocumented or insufficiently well tested.
This version of the X! Tandem code also merges the code for X! P3. The P3
executable can be compiled by uncommenting the preprocessor variable X_P3 in
"stdafx.h". Adding in the P3 code was done by creating several
classes that are extensions of the normal Tandem classes and using a small
number of preprocessor directives. These new classes have the prefix p3.
|
Added parameters:
-
output, log path
-
output, message
-
output, one sequence copy
-
output, sequence path
-
refine, modification mass
-
refine, sequence path
-
refine, tic percent
-
scoring, cyclic permutation
-
scoring, include reverse
-
spectrum, sequence batch size
|
This release of X! Tandem contains
several small fixes in response to error reports.
|
-
The C-ion mass calculation has been improved for electron-capture ion source
identifications, suggested by David Fenyo.
-
A problem relating to an include file that caused compilation difficulties for
some versions of GCC on some versions of Linux has been fixed.
-
The calculation of parent ion mass difference has been improved, to provide
better consistency for very accurate (< 1 ppm) parent ion mass
determinations.
-
The "semi" cleavage state machine has been adjusted for better
performance.
|
This release of X! Tandem adds several
new features, as well as improving the XML standards compatibility of the
system.
|
-
An improved handling of hex encoded binary information in mzXML and mzData
files, for 64-bit processors, added by Steven Wiley.
-
Addition of testing for N-terminal glutamic acid cyclization, suggested by Oleg
Krokhin.
-
Addition of "semi" enzymatic cleavage (specific enzyme cleavage at
one end of a peptide and non-specific cleavage at the other), suggested by Matt
Monroe.
-
An improved system for detecting XML file types, suggested by Steven Wiley.
-
Support for variant methods of expressing parent ion charge in mzData v. 1.05,
added by Fredrik Levander.
|
This release of X! Tandem adds several
new features, as well as improving on some of the existing features. It
contains a number of engineering architectural changes meant to allow simpler
access to some of the key algorithms in the system.
|
-
An improved version of the state machine that lists all of the possible
potential modification states of a peptide sequence was written by Brendan
Maclean. This version is both more thorough and faster than the previous code.
-
The capability of using chemical average masses for fragment ion mass
calculations was added by Brendan Maclean.
-
A simplification of the mprocess class that allows for a "plug-in"
approach to adding new refinement modules was designed and implemented by Rob
Craig.
-
An improved routine for correcting for isotope peaks and multiple observations
of similar masses was made by Patrick Lacasse.
-
An additional state machine using cyclic peptide sequence permutation to
compensate for small sequence collections and for large mass peptides was
added. This feature is based on suggestions made separately by Tom Blackwell
& David States and Patrick Lacasse.
-
An improved sorting method to improve the consistency of homologous sequence
assignments was added by Rob Craig.
|
The changes in this release are aimed at
increasing XML compliance and high accuracy mass calculation consistency.
|
-
The handlers for GAML spectra, taxonomy files and input parameter files have been
changed to use expat, rather than custom routines.
-
A more flexible mass calculation class has been added to improve molecular mass
consistency for high accuracy calculations.
-
The input spectrum file type detection method has been improved by adding the
possibility of forcing it to select one file type. This forcing is done using
the input parameter "spectrum, path type", which can have
the values: dta, pkl, mgf, gaml, mzxml or mzdata. If this parameter is missing
or of zero length, the normal file type detection scheme is used.
|
This version corrects an issue that could
arise in large MudPIT data sets with large numbers of redundant
identifications. In previous versions, the calculation of the protein expectation
value was susceptible to floating point overflows, resulting in unpredictable
values.
|
This release adds the ability to process
mzxml and mzdata
file formats using the expat library of
functions. Most of the changes in this release were initially made by Patrick
Lacasse (Université Laval, Dept. of Medicine, supported by Genome Québec) with
the final version and optimizations made by Brendan MacLean, from the Fred
Hutchinson Cancer Research Center. Also, the ability to define the amino acid
residue masses has been added allowing users to change the default masses when
doing N15 experiments for example.
|
-
New classes have been added to allow the processing of mzxml and mzdata file
formats.
Two of the new classes are publicly derived from loadspectrum, a custom class
specific to Tandem. Two others are publicly derived from the xml parser class
SAXSpectraHandler which is imported from the expat library of functions. The
xml parser classes use the expat functions exclusively to parse the input in
order to load it into the traditional Tandem spectra data members.
-
base64.cpp and base64.h have been added to allow b64_decode_mio() function
calls, which are needed to decode the spectra in mzxml and mzdata spectra
files.
-
Included in the src folder is the libexpat.lib which is required to compile new
versions of the executable on Windows. Linux and OSX machines should have the
required libraries as part of the core operating system.
-
A new function has been added to msequtilities that allows amino acid residue
masses to be defined by an xml input file. If the parameter 'protein, modified
residue mass file' is defined in the input.xml, the masses are taken from the
file defined by that parameter. An example of the format can be viewed
here.
|
This release contains modifications
necessary to insert new types of peptide scoring systems as well as to deal
effectively with high accuracy parent ion measurements, which are now available
in some types of mass spectrometers. Most of the changes in this release were
made by Brendan MacLean, from the Fred Hutchinson Cancer Research Center.
|
-
Several new classes have been added, to make the scoring system
"pluggable", i.e., it is now much easier to alter the scoring system
used, for the purposes of bioinformatics investigations. These changes are
mainly of interest to informatics professionals and they should not affect the
normal operation of the software for users.
-
The calculation of parent ion mass has been changed, taking more care as to the
mass of added groups and correctly accounting for electron masses.
-
Better statistical methods have been added to deal with the small number of
possible peptides generated from a list of protein sequences when the parent
ion mass has been determined with very high accuracy.
|
This release adds in several features
that were originally scheduled to appear in the 2004.11.15 release, but which
were pushed back from the initial release. The 2004.11.15.2 version was not
generally released.
|
-
Spectra that are interpreted as being caused by a prompt neutral loss now have
the prompt loss specified in the appropriate <aa> node in the output.
-
Correction of an issue with the OS X version that resulted in improper reading
of ".pro" sequence files. Initially, the ".pro" format was
to have both little endian and big endian versions, however this became too
confusing to maintain. The current plan is to only use the little endian format
and to compensate for this on-the-fly in the OS X version.
-
The maximum parent ion charge to be used can now be specified using the
"spectrum, maximum parent charge" parameter. This parameter has a
default value of 4. This change was made necessary because of high charge
states being called by some MS peak assignment software, which caused spurious
assignments.
-
The first round of refinement (finding partially cleaved peptides) has been
extended, so that it is possible to repeat it with different sets of modifications
and motifs. These additional refinement rounds are specified by adding
parameters using the following format:
-
Round 1: "refine, potential modification mass"
"refine, potential modification motif"
-
Round 2: "refine, potential modification mass 1"
"refine, potential modification motif 1"
-
Round 3: "refine, potential modification mass 2"
"refine, potential modification motif 2"
This will continue until both of the next pair of parameters are either missing
or contain no at sign (@).
|
This is a maintenance release, to correct
one issue identified in the 2004.09.01 release.
|
-
An error that resulted in the incorrect interpretation of some motifs was
corrected.
-
Addition of spectrum prefiltering to remove repeated spectra from the initial
set of mass spectra. This feature compares spectra using a dot product
calculation and removes spectra that have vector representations that point in
the same direction. The most intense spectrum out of a set of repeated spectra
is kept and used for analysis. This type of filtering can remove up to 90% of
spectra from a MudPIT-style run, making data analysis and interpretation
easier.
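The dot product prefiltering described in the list above can be sketched in Python. This is a simplified illustration, not the X! Tandem implementation: spectra are binned into sparse vectors, vectors are compared by their normalized dot product (cosine), and the bin width, the 0.9 similarity threshold and all function names are assumptions. A practical filter would presumably also compare parent ion masses, which is omitted here.

```python
import math


def _vector(peaks, bin_width=1.0):
    """Bin (m/z, intensity) pairs into a sparse vector keyed by bin index."""
    vec = {}
    for mz, intensity in peaks:
        key = int(mz / bin_width)
        vec[key] = vec.get(key, 0.0) + intensity
    return vec


def _cosine(a, b):
    """Normalized dot product: 1.0 means the vectors point the same way."""
    dot = sum(value * b.get(key, 0.0) for key, value in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def remove_repeats(spectra, threshold=0.9):
    """Return the indices of the spectra kept after removing repeats.

    Spectra are visited from most to least intense, so the most intense
    member of each set of repeated spectra is the one retained.
    """
    order = sorted(range(len(spectra)),
                   key=lambda i: -sum(intensity for _, intensity in spectra[i]))
    kept = []
    for i in order:
        vec = _vector(spectra[i])
        if all(_cosine(vec, other) < threshold for _, other in kept):
            kept.append((i, vec))
    return sorted(i for i, _ in kept)
```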
|
This is a maintenance release, to correct
one issue identified in the 2004.08.01 release.
|
-
A possible floating-point overrun error, detected from GPMDB submissions, that
could lead to 0.0 expectation values for high scoring peptides has been
corrected.
|
This is a maintenance release, to correct
several issues identified in the 2004.07.15 release.
|
-
An error that reduced the score for triply-charged ions was corrected.
-
Quantitation information was added to the output XML file.
|
This is a major release of TANDEM,
sufficiently different from previous releases to merit a major revision number:
this release will be referred to as TANDEM 2.
|
-
The memory management throughout the program has been analyzed and altered to
minimize the amount of memory used per spectrum. This effort has reduced the
amount of memory used in single threaded operation by as much as 60%: the
improvement for double threaded operation may be as much as 80%.
-
The threading model has been changed to allow for the use of multiple
processors in the refinement process. TANDEM 1 separated work between the
threads by dividing up the sequences to search, so that each thread would only
search a subset of the sequences in a FASTA file. TANDEM 2 divides up the mass
spectra between threads, so that each thread searches a subset of the mass
spectra. This change makes it easier to divide up the refinement job, but means
that running more than one thread on a single processor will degrade the
performance of the software. For best performance, it is now important to keep
the number of threads and the number of processors the same.
-
The refinement process has been improved in accuracy by applying a logical
filter after each step of refinement. This means that once a refinement step is
completed, the new results obtained from the refinement are examined and if the
new results are not significantly better than those obtained from a simpler
search, they are discarded and the simpler results retained. This filtering
significantly reduces the complexity of analyzing results when there may be a
variety of similar modification patterns or point mutations that explain a
particular spectrum.
-
Validation of results using reversed sequence databases has been built into
the search process. This validation may be turned on or off, using the new
input parameter "scoring, include reverse" (values = yes|no).
This validation process tabulates the number of unique high probability hits
from the reversed sequence search and places them in the output file, along
with estimates of the false positive rate based on TANDEM's stochastic
histogramming technique and the estimate derived from the reversed sequence
process. NOTE: When this validation method is used, twice as many sequences
must be processed (both forward and reversed), which may require significantly
more processing time.
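A minimal sketch of the reversed sequence idea is shown below. The notes do not give the formula used for the estimate, so the simple ratio of reversed hits to forward hits is an assumption (a common decoy-style estimate), and both function names are hypothetical.

```python
def reverse_sequence(sequence):
    """Return the reversed form of a protein sequence."""
    return sequence[::-1]

def reversed_false_positive_rate(forward_hits, reversed_hits):
    """Estimate the false positive rate from hit counts.

    Assumption: the rate is approximated as the number of unique high
    probability reversed hits divided by the number of forward hits.
    """
    if forward_hits == 0:
        return 0.0
    return reversed_hits / forward_hits
```

Because every sequence is processed in both its forward and reversed form, the doubling of the workload mentioned in the note follows directly.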
-
Numerous small optimizations have been made, particularly for loading and
reporting the results for very large collections of mass spectra.
|
This release adds three new
functionalities to X! TANDEM. These new functions make it possible to modify
protein sequences in new ways. |
-
The ability to specify modifications based on sequence motifs was added to both
the normal search and refinement steps. A comma separated list of motifs in
slightly modified PROSITE format can be used to only modify specific residues.
An example of this format is:
204@[N!]{P}[ST]{P}, which specifies a motif consisting of an N, followed by any
residue except P, followed by an S or a T, followed by any residue except P.
The residue in the group containing the exclamation point (in this case the N)
is modified by adding 204 Da.
The peptides containing this motif are checked both with and without this
modification, so it is interpreted the same way that a "potential" modification
is interpreted. The rules for creating these motifs are:
-
Square brackets "[]" indicate any of the residues contained is
possible;
-
French brackets "{}" indicate that any of the residues contained is
forbidden;
-
A bare letter is interpreted as if it was in square brackets and can be
modified with an exclamation point, e.g. 16@[M!] is the same as 16@M!;
-
An exclamation point indicates the position of modification;
-
The letter "X" indicates any residue;
-
Round brackets "()" indicate a count, e.g. "X(10)" means
ten X's in a row;
and
-
All other characters are ignored, e.g. 80@[ST!]PX[KR] is the same as
80@[ST!]-P-X-[KR].
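To make the motif rules concrete, the following sketch translates a motif into a Python regular expression and reports the zero-based positions of the residue marked for modification. It is an illustration of the rules listed above, not the parser used by X! TANDEM: the function names are hypothetical, and it assumes upper-case residues, a well-formed motif with a single "!", and no "X" inside brackets.

```python
import re

def motif_to_regex(motif):
    """Translate a motif into a regex; the modified residue is group 'site'."""
    parts = []  # one regex fragment per motif position
    i = 0
    while i < len(motif):
        ch = motif[i]
        if ch in "[{":
            # "[...]" lists allowed residues, "{...}" lists forbidden ones.
            end = motif.index("]" if ch == "[" else "}", i)
            body = motif[i + 1:end]
            residues = "".join(c for c in body if c.isalpha())
            frag = "[" + ("^" if ch == "{" else "") + residues + "]"
            if "!" in body:
                frag = "(?P<site>" + frag + ")"
            parts.append(frag)
            i = end + 1
        elif ch.isalpha():
            # A bare letter acts like a one-residue group; "X" is any residue.
            frag = "[A-Z]" if ch == "X" else ch
            if motif[i + 1:i + 2] == "!":
                frag = "(?P<site>" + frag + ")"
                i += 1
            parts.append(frag)
            i += 1
        elif ch == "(" and parts:
            # "(n)" repeats the previous position n times.
            end = motif.index(")", i)
            parts[-1] += "{%d}" % int(motif[i + 1:end])
            i = end + 1
        else:
            i += 1  # all other characters are ignored
    return "".join(parts)

def find_sites(spec, sequence):
    """Return (mass, zero-based positions of the modified residue)."""
    mass_text, motif = spec.split("@", 1)
    pattern = motif_to_regex(motif)
    if "(?P<site>" not in pattern:
        raise ValueError("motif has no '!' marking the modified residue")
    # A lookahead lets overlapping occurrences of the motif all be reported.
    regex = re.compile("(?=" + pattern + ")")
    return float(mass_text), [m.start("site") for m in regex.finditer(sequence)]
```

With this sketch, `find_sites("204@[N!]{P}[ST]{P}", "AANGSAANPTA")` reports only the first N, because the second N is followed by a P.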
-
The ability to specify prompt neutral losses for potential modifications
(including motifs) has been added. This neutral loss is specified by adding a
colon followed by the mass corresponding to the loss. For example:
80@S specifies phosphorylation without loss, while 80:-98@S specifies the
neutral loss of the phosphate group.
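The notation can be split into its parts with a few lines of code. This is a sketch of the syntax described above only; the function name and the dictionary keys are hypothetical and do not reflect how TANDEM stores the values internally.

```python
def parse_modification(spec):
    """Split 'mass@target' or 'mass:loss@target' into its components."""
    masses, target = spec.split("@", 1)
    if ":" in masses:
        mass_text, loss_text = masses.split(":", 1)
        loss = float(loss_text)
    else:
        mass_text, loss = masses, None  # no neutral loss specified
    return {"mass": float(mass_text), "neutral_loss": loss, "target": target}
```

Here `parse_modification("80:-98@S")` yields a mass of 80.0, a neutral loss of -98.0 and the target "S", while `parse_modification("80@S")` yields no neutral loss.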
-
The ability to specify fixed modifications of specific residues, identified by
residue number, on a sequence-by-sequence basis has been added. This capability
cannot be exploited currently because of a lack of sequence lists that contain
this type of information. However, an appropriately translated version of a
database such as SWISSPROT could be used to provide this information.
|
This release is a maintenance release. It
should improve memory usage for very long sequence lists, but otherwise should
be neutral. |
-
The mechanism for storing sequences in the mprocess class has been changed.
Previously, a copy of each protein sequence was stored with each peptide model
associated with a spectrum. Now, a master list of protein sequences is kept and
only a lookup number is stored with each peptide model. This change improves
memory management for very large pools of redundant proteins or very long lists
of spectra.
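The storage change can be pictured with the sketch below. The class and field names are hypothetical and are not those of the mprocess class; the point is only that each peptide model holds a small lookup number rather than its own copy of the protein sequence.

```python
from dataclasses import dataclass

@dataclass
class PeptideModel:
    protein_id: int  # lookup number into the master list
    start: int       # zero-based start of the peptide in the protein
    end: int         # end position (exclusive)

class SequenceStore:
    """Master list of protein sequences, each stored exactly once."""

    def __init__(self):
        self._sequences = []

    def add(self, sequence):
        """Store a sequence and return its lookup number."""
        self._sequences.append(sequence)
        return len(self._sequences) - 1

    def peptide(self, model):
        """Recover the peptide text for a model from the master list."""
        return self._sequences[model.protein_id][model.start:model.end]

store = SequenceStore()
pid = store.add("MKTAYIAKQR")  # hypothetical protein sequence
models = [PeptideModel(pid, 0, 8), PeptideModel(pid, 3, 10)]
```

Both models refer to the same stored sequence, so adding more peptide models for a protein costs a few integers each rather than another copy of the sequence.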
|
This release corrects an interpretation
problem introduced in the 2004.04.01 release. This problem resulted in an
overemphasis on peptides found in the refinement steps. |
-
The refinement processing was returned to its previous state, so that only the
best scoring peptides from the refinement process are reported.
|
This release is the result of an effort
to reduce memory usage by TANDEM. This effort has resulted in a 70% reduction
in memory usage, when using large data files. |
-
GAML spectra, such as those in output xml files, can now be used as input data.
-
The length of the scoring histogram arrays has been altered to improve memory
usage.
-
Several instances of temporary copies of data have been removed and other data
structures cleared as soon as possible after use.
|
-
A behavior that resulted in the loss of the last character in sequences in some
FASTA files has been corrected.
-
A compatibility issue resulting from various choices for the size of the size_t
STL variable on unix platforms has been corrected, so that most unix platforms
should compile without modifying the linux version of the code.
|
-
No problems known at time of release
|
This release fixes a number of
compatibility issues and unexpected behaviors in Tandem and associated
formatting files. |
-
A new state machine was added to perform N- and C- terminal partial
modifications. Previous versions used these modifications as complete
modifications only.
-
The optimization for the minimum number of residues considered was removed and
replaced with the constant value of 4. The prior optimization did not produce a
significant improvement in speed, but it did cause occasional problems with
large neutral losses.
-
The xslt and css files have been updated to conform more closely to
specification, making them compatible with the FireFox browser.
|
-
A behavior that allowed the occasional consideration of peptides with too many
missed cleavage sites was fixed.
-
A compatibility issue for starting threads on some unix platforms has been
corrected, so that most unix platforms should compile without modifying the
linux version of the code.
|
-
No problems known at time of release
|
This release introduces the capability of
detecting point mutations in protein sequences. |
-
A new state machine was added to the mscore object to track point mutations.
-
A new report value was added so that a protein sequence would only be recorded
once in the XML output, if desired.
|
-
Several problems associated with detecting unsupported spectrum file types were
corrected. Previous versions could hang indefinitely if binary files were used.
-
A compilation problem on some flavours of LINUX that caused a failure
to create new threads was corrected.
|
-
No problems known at time of release
|
This release introduces a new statistical
model for multiple model correlations. |
-
A new statistical interpretation was added, to combine expectation values when
multiple models from the same sequence are found to be the best model in
different spectra. Using this model, expectation values for the collections of
models are now listed as the base-10 log of the expectation value, beside the
FASTA description line.
-
The way FASTA description lines are listed has been changed. Rather than
listing the descriptions in the same order they were encountered in the search,
they are now listed by length: the longest entry first. The logic to this
choice is that for the NCBI database nr, the oldest entry for a similar
sequence tends to be the longest and the first line of that entry tends to have
the best description of the protein's common name. Unfortunately, this is not
always true.
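The ordering rule amounts to a sort by length, longest first, as in the sketch below. The function name is hypothetical, and keeping equally long descriptions in the order they were encountered is an assumption, since the notes do not say how ties are handled.

```python
def order_descriptions(descriptions):
    """Order FASTA description lines by length, longest entry first."""
    # sorted() is stable, so equally long entries keep their original order.
    return sorted(descriptions, key=len, reverse=True)
```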
-
A new way of organizing the output was added. It can be accessed by setting the
"output, sort results by" parameter to "protein". Models corresponding
to a given sequence are grouped together, with the best set of models at the
top of the page.
|
-
FASTA file name problem fixed.
-
Multiple modification reporting problem fixed.
|
-
No problems known at time of release
|
This was the first release of a
multithreading version of tandem. |
-
A threading model was introduced that allows up to 16 threads. Each thread is
given a unique set of sequences to model and the results of all of the models
are summed at the end. The threads are started in a simple manner and then the
program waits for all threads to return before summing and correlating the
data.
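The start, wait and sum pattern described here might be sketched as follows. This is not the TANDEM implementation: the function names are hypothetical, counting residues stands in for the real modelling work, and the interleaved split of sequences is an assumption. Only the limit of 16 threads comes from the notes.

```python
from concurrent.futures import ThreadPoolExecutor

MAX_THREADS = 16  # limit stated in the release notes

def model_sequences(sequences):
    """Placeholder for the real work: here it only counts residues."""
    return sum(len(s) for s in sequences)

def run_search(sequences, n_threads):
    """Give each thread its own set of sequences, then sum the results."""
    n_threads = max(1, min(n_threads, MAX_THREADS))
    subsets = [sequences[i::n_threads] for i in range(n_threads)]
    # Leaving the "with" block waits for all threads to return.
    with ThreadPoolExecutor(max_workers=n_threads) as pool:
        partial_results = list(pool.map(model_sequences, subsets))
    return sum(partial_results)
```

The result is the same for any thread count, since each sequence is handled by exactly one thread before the partial results are combined.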
-
A new class called "mspectrumcondition" was added to perform
any spectrum filtering necessary. The initial release had this functionality in
the "mscore" class.
-
An example XSLT and CSS stylesheet pair was added that makes viewing the
output XML in a browser somewhat easier. The pair is also an example of how such
files can be constructed to create a GUI from the output XML.
|
-
The memory leak reported in the 2003.05.01 release was located and fixed.
|
-
Some FASTA sequence list file names are not being recorded in the output XML.
-
Occasionally, multiple potential modifications are noted for the same residue.
-
The XSLT is not compatible with Internet Explorer 5.5. This is by design and it
will not change in later releases.
|
This was the first release of tandem. |
-
Soon after the release, a serious memory leak was reported, which became
evident when searching large data files. This leak reduced system performance
dramatically.
|
Copyright © 2004-2013, The Global Proteome Machine Organization
|
|