The goal of the COMMON file format project is to determine the true information content of a set
of tandem mass spectrum data files and to define a simple compression scheme that takes advantage of that
knowledge. The software (named COMMON) is available through both the GPM SVN repository and the GPM
ftp site.
The current version of this compression scheme (CMN 1.0) uses the X! series peak processing system and a simple
differential compression scheme. The most recent versions of the X! series search engines (2007.07.07.2) have been updated
to read CMN 1.0 files directly. File compression ratios vary depending on the input file type; a simple
example is as follows:
               | Orbitrap RAW file | mzXML file   | CMN 1.0 file |
File size      | 222.5 MBytes      | 530.1 MBytes | 1.8 MBytes   |
bytes/spectrum | 19,189            | 45,727       | 155          |
Note: the mzXML file was generated from the RAW file using the reAdw software made available
by the Sashimi project.
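The CMN 1.0 byte layout is not described here, so the following is only a minimal sketch of the general idea behind a differential (delta) compression scheme, not the actual COMMON implementation. It assumes peak m/z values have already been scaled to integers and sorted; the function names and the sample values are hypothetical.

```python
from typing import List


def delta_encode(values: List[int]) -> List[int]:
    """Keep the first value, then store each value's difference from its predecessor."""
    if not values:
        return []
    deltas = [values[0]]
    for prev, cur in zip(values, values[1:]):
        deltas.append(cur - prev)
    return deltas


def delta_decode(deltas: List[int]) -> List[int]:
    """Rebuild the original values by accumulating the stored differences."""
    values = []
    total = 0
    for d in deltas:
        total += d
        values.append(total)
    return values


# Hypothetical fragment m/z values, scaled to integers (m/z x 100) and sorted.
# This scaling is an assumption made for illustration, not the CMN 1.0 format.
mz_scaled = [17510, 17612, 20411, 20513, 30118]
encoded = delta_encode(mz_scaled)

# Decoding must restore the input exactly.
assert delta_decode(encoded) == mz_scaled
```

Because sorted peak lists tend to produce differences that are much smaller than the values themselves, the encoded list can usually be stored in fewer bytes than the original.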
Binary executable versions of the compression software are available for Windows, Linux and OS X. The
compilable source code for the three platforms is also available. To perform a compression on a Windows platform, the simplest
method is to place the binary executable in a suitable place and type the following at the console:
>common FILENAME
where FILENAME is the name of the file you wish to compress (note: for Linux or OS X use "./common FILENAME"). The file containing the MS/MS spectrum
information can be in any one of the following formats:
- Mascot Generic Format;
- mzXML;
- mzData;
- DTA (single file or concatenated);
- PKL; or
- CMN 1.0.
The result will be a compressed file named "FILENAME.cmn". The utility has several input flags
that can be used to control the output. A list of these flags and a brief description of their use can
be obtained by running the program with no command line parameters (or with the flag -h). To extract
the information from a CMN file back into a simple ASCII format, such as Mascot Generic Format, type:
>common FILENAME.cmn -dmgf -oFILENAME.mgf
This will generate a file named "FILENAME.mgf" that can be analyzed by other search engines.
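When many files need to be processed, the two command lines shown above can be assembled from a script. The following is a minimal sketch in Python that uses only the invocations given in this document ("common FILENAME" and "common FILENAME.cmn -dmgf -oFILENAME.mgf"). The helper names are hypothetical, and it is an assumption that the executable reports failure through a non-zero exit status.

```python
import subprocess
from pathlib import Path
from typing import List


def compress_command(input_file: str, executable: str = "common") -> List[str]:
    """Build the argument list for `common FILENAME`."""
    return [executable, input_file]


def extract_mgf_command(cmn_file: str, executable: str = "common") -> List[str]:
    """Build the argument list for `common FILENAME.cmn -dmgf -oFILENAME.mgf`."""
    # Swap the .cmn extension for .mgf to name the output file.
    mgf_file = str(Path(cmn_file).with_suffix(".mgf"))
    return [executable, cmn_file, "-dmgf", f"-o{mgf_file}"]


def run(command: List[str]) -> None:
    """Run the command; raises CalledProcessError on a non-zero exit status.

    Assumption: the COMMON executable signals errors via its exit status.
    """
    subprocess.run(command, check=True)
```

For example, run(extract_mgf_command("spectra.cmn")) would execute "common spectra.cmn -dmgf -ospectra.mgf". On Linux or OS X, pass executable="./common" to match the note above.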