Mass spectrometry data format

Mass spectrometry is a scientific technique for measuring the mass-to-charge ratio of ions. It is often coupled to chromatographic techniques such as gas or liquid chromatography and is widely used in analytical chemistry and biochemistry, where it can identify and characterize small molecules and proteins (proteomics). The large volume of data produced in a typical mass spectrometry experiment requires computers for data storage and processing. Over the years, manufacturers of mass spectrometers have developed various proprietary data formats for handling such data, which makes it difficult for academic scientists to manipulate their data directly. To address this limitation, several open, XML-based data formats have been developed by the Trans-Proteomic Pipeline at the Institute for Systems Biology to facilitate data manipulation and innovation in the public sector. These data formats are described here.

Open formats

JCAMP-DX

This format was one of the earliest attempts to provide a standardized file format for data exchange in mass spectrometry. JCAMP-DX was initially developed for infrared spectrometry. It is an ASCII-based format and therefore not very compact, even though it includes standards for file compression. JCAMP-DX was officially released in 1988.[2] Together with the American Society for Mass Spectrometry, a JCAMP-DX format for mass spectrometry was developed with the aim of preserving legacy data.[3]
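Because JCAMP-DX is plain ASCII, its header records can be read with ordinary string handling. The following is a minimal sketch, assuming the usual `##LABEL=value` record layout with `$$` marking inline comments; the function name `parse_jcamp_headers` and the sample text are made up for illustration, and data rows and compressed tables are deliberately ignored.

```python
def parse_jcamp_headers(text):
    """Collect ##LABEL=value records from JCAMP-DX-style text into a dict."""
    records = {}
    for line in text.splitlines():
        line = line.strip()
        if not line.startswith("##") or "=" not in line:
            continue  # data rows are skipped in this sketch
        label, value = line[2:].split("=", 1)
        # Drop an inline comment, which is introduced by "$$"
        value = value.split("$$", 1)[0].strip()
        records[label.strip().upper()] = value
    return records


# Made-up sample; not taken from a real instrument
SAMPLE = """##TITLE=Example spectrum (made up)
##JCAMP-DX=5.00 $$ illustrative version label
##DATA TYPE=MASS SPECTRUM
##NPOINTS=3
##PEAK TABLE=(XY..XY)
41,20 43,100 58,35
##END=
"""

headers = parse_jcamp_headers(SAMPLE)
print(headers["TITLE"], headers["NPOINTS"])
```

Every value is stored as text, which is part of why the format is less compact than a binary one.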

ANDI-MS or netCDF

The Analytical Data Interchange Format for Mass Spectrometry (ANDI-MS) is a format for exchanging data. Many mass spectrometry software packages can read or write ANDI files. ANDI is specified in the ASTM E1947 Standard.[4] ANDI is based on netCDF, which is a software library for writing and reading data files. ANDI was initially developed for chromatography-MS data and therefore was not used in the proteomics gold rush, where new formats based on XML were developed.[5]

AnIML

AnIML is a joint effort of IUPAC and ASTM International to create an XML-based standard that covers a wide variety of analytical techniques, including mass spectrometry.[6]

mzData

mzData was the first attempt by the Proteomics Standards Initiative (PSI) of the Human Proteome Organization (HUPO) to create a standardized format for mass spectrometry data.[7] This format is now deprecated and has been replaced by mzML.[8]

mzXML

mzXML is an XML (eXtensible Markup Language) based common file format for proteomics mass spectrometric data.[9][10] This format was developed at the Seattle Proteome Center/Institute for Systems Biology while the HUPO-PSI was trying to specify the standardized mzData format, and it is still in use in the proteomics community.
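XML is a text format, so numeric peak lists in mzXML are embedded as base64 text. The sketch below shows how such a block might be decoded, assuming the commonly described layout of big-endian ("network" byte order) floats stored as interleaved m/z and intensity pairs. The helper name `decode_peaks` and the values are illustrative only, and optional compression is not handled.

```python
import base64
import struct


def decode_peaks(b64_text, precision=32):
    """Decode base64 text into a list of (m/z, intensity) pairs.

    Assumes big-endian floats stored as interleaved pairs.
    """
    raw = base64.b64decode(b64_text)
    code = "f" if precision == 32 else "d"
    count = len(raw) // struct.calcsize(">" + code)
    values = struct.unpack(f">{count}{code}", raw)
    return list(zip(values[0::2], values[1::2]))


# Round trip with made-up values that are exactly representable as 32-bit floats
pairs = [(100.5, 2000.0), (250.25, 150.0)]
flat = [value for pair in pairs for value in pair]
encoded = base64.b64encode(struct.pack(">4f", *flat)).decode("ascii")

print(decode_peaks(encoded))
```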

YAFMS

Yet Another Format for Mass Spectrometry (YAFMS) proposes saving data in a four-table, serverless relational database schema, with data extraction and appending performed through SQL queries.[11]
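The idea of a serverless relational store queried through SQL can be sketched with SQLite from the Python standard library. The two tables and their column names below are hypothetical and much simpler than the four-table schema the YAFMS authors describe; the point is only that appending and extracting data become plain SQL statements.

```python
import sqlite3

# In-memory database for the sketch; a file path would give one portable file
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE spectrum (id INTEGER PRIMARY KEY, ms_level INTEGER, retention_time REAL);
    CREATE TABLE peak (spectrum_id INTEGER REFERENCES spectrum(id), mz REAL, intensity REAL);
""")

# Appending data is an ordinary INSERT
conn.execute("INSERT INTO spectrum VALUES (1, 1, 12.5)")
conn.executemany(
    "INSERT INTO peak VALUES (?, ?, ?)",
    [(1, 100.5, 2000.0), (1, 250.25, 150.0), (1, 400.0, 900.0)],
)

# Extraction is an ordinary SELECT, here the peaks inside an m/z window
rows = conn.execute(
    "SELECT mz, intensity FROM peak "
    "WHERE spectrum_id = ? AND mz BETWEEN ? AND ? ORDER BY mz",
    (1, 200.0, 500.0),
).fetchall()

print(rows)
```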

mzML

Because having two formats (mzData and mzXML) for representing the same information is undesirable, HUPO-PSI, the SPC/ISB and instrument vendors made a joint effort to create a unified standard that borrows the best aspects of both mzData and mzXML and is intended to replace them. Originally called dataXML, it was officially announced as mzML.[12] The first specification was published in June 2008.[13] The format was officially released at the 2008 American Society for Mass Spectrometry meeting and has since been relatively stable, with very few updates. On 1 June 2009, mzML 1.1.0 was released. As of 2013, no further changes were planned.

mzAPI

Instead of defining new file formats and writing converters for proprietary vendor formats, a group of scientists proposed defining a common application programming interface, shifting the burden of standards compliance to the instrument manufacturers' existing data access libraries.[14]
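The approach can be illustrated with a small abstract interface. This is a sketch only: the class and method names (`SpectrumSource`, `scan_count`, `get_scan`) are hypothetical and are not the published mzAPI signatures, and `InMemorySource` merely stands in for a wrapper around a vendor's data access library.

```python
from abc import ABC, abstractmethod


class SpectrumSource(ABC):
    """Hypothetical common interface; not the published mzAPI signatures."""

    @abstractmethod
    def scan_count(self):
        """Return the number of scans available."""

    @abstractmethod
    def get_scan(self, index):
        """Return a list of (m/z, intensity) pairs for one scan."""


class InMemorySource(SpectrumSource):
    """Stand-in for a wrapper around a vendor's data access library."""

    def __init__(self, scans):
        self._scans = scans

    def scan_count(self):
        return len(self._scans)

    def get_scan(self, index):
        return self._scans[index]


def base_peak(source, index):
    """Analysis code depends only on the interface, not on a file format."""
    return max(source.get_scan(index), key=lambda peak: peak[1])


source = InMemorySource([[(100.5, 2000.0), (250.25, 150.0)]])
print(base_peak(source, 0))
```

Under this design, supporting another instrument means writing another `SpectrumSource` implementation rather than another file converter.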

mz5

The mz5 format addresses the performance problems of the previous XML-based formats. It uses the mzML ontology, but saves the data using the HDF5 backend for reduced storage space requirements and improved read/write speed.[15]
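One source of overhead in XML-based formats is that binary arrays must be written as text. The sketch below compares the size of a raw array of 64-bit floats with its base64 text form; the values are made up, and it does not model compression or how mz5 actually lays out data in HDF5.

```python
import base64
import struct

values = [100.0 + 0.01 * i for i in range(1000)]  # made-up m/z values
raw = struct.pack("<1000d", *values)              # 64-bit little-endian floats
as_text = base64.b64encode(raw)                   # form suitable for embedding in XML

print(len(raw), len(as_text))
```

Base64 emits four characters for every three bytes, so the text form is roughly a third larger before any XML markup is added.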

imzML

The imzML standard was proposed to exchange data from mass spectrometry imaging in a standardized XML file based on the mzML ontology. It stores the experimental metadata in an XML file and the spectral data in a separate binary file. Both files are linked by a universally unique identifier.[16]
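The linkage between the two files can be sketched as follows. The XML element and attribute names here (`imaging_run`, `spectrum`, `offset`, `count`) are hypothetical and far simpler than the real imzML schema, and the sketch assumes the binary part begins with the 16 UUID bytes. It shows only the principle: the reader refuses to combine an XML part and a binary part whose identifiers differ.

```python
import struct
import uuid
import xml.etree.ElementTree as ET

file_id = uuid.uuid4()
intensities = [10.0, 250.0, 35.5]  # made-up values, exact as 32-bit floats

# Binary part: 16 UUID bytes, then the spectrum as 32-bit little-endian floats
header = file_id.bytes
binary_blob = header + struct.pack("<3f", *intensities)

# XML part: metadata plus the location of the spectrum in the binary part
root = ET.Element("imaging_run", {"uuid": file_id.hex})
ET.SubElement(
    root, "spectrum",
    {"x": "1", "y": "1", "offset": str(len(header)), "count": "3"},
)
xml_text = ET.tostring(root, encoding="unicode")


def load_spectrum(xml_text, blob):
    """Check that both parts carry the same UUID, then read the spectrum."""
    run = ET.fromstring(xml_text)
    if uuid.UUID(run.get("uuid")).bytes != blob[:16]:
        raise ValueError("XML and binary parts do not belong together")
    spec = run.find("spectrum")
    offset, count = int(spec.get("offset")), int(spec.get("count"))
    return list(struct.unpack_from(f"<{count}f", blob, offset))


print(load_spectrum(xml_text, binary_blob))
```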

mzDB

mzDB saves data in an SQLite database to reduce storage space and improve access times, as the data points can be queried from a relational database.[17]

Toffee

Toffee is an open, lossless file format for data-independent acquisition mass spectrometry. It leverages HDF5 and aims to achieve file sizes similar to those of the proprietary and closed vendor formats.[18]

mzMLb

mzMLb is another take on using an HDF5 backend for efficient raw data storage. Unlike mz5, however, it preserves the mzML XML data structure and remains compliant with the existing standard.[19]

Allotrope

The Allotrope Foundation curates an HDF5- and triplestore-based file format named Allotrope Data Format (ADF) and a flat JSON representation, the Allotrope Simple Model (ASM). Both are based on the Allotrope Foundation Ontologies (AFO) and contain schemas for mass spectrometry and chromatography coupled with MS detectors.[20]

Proprietary formats

Below is a table of different file format extensions.

Company | Extension | File type
ACD/Labs | *.spectrus | HD5/JSON based file formats for the Spectrus platform[21]
Agilent; Bruker | .D (folder) | Agilent MassHunter, Agilent ChemStation, or Bruker BAF/YEP/TDF data format
Agilent/Bruker | .YEP | instrument data format
Agilent | .AEV, .ASR | ASCII Report format (for Analytical Studio Reviewer)
Bruker | .BAF | instrument data format
Bruker | .FID | instrument data format
Bruker | .TDF | timsTOF instrument data format
ABI/Sciex | .WIFF, .WIFF2 | instrument data format
ABI/Sciex | .t2d | 4700 and 4800 file format
ABI/Sciex | .dat | Voyager-DE series file format
Waters | .PKL | MassLynx peak list format
Thermo; PerkinElmer | .RAW* | Thermo Xcalibur; PerkinElmer TurboMass
Micromass**/Waters | .RAW* (folder) | Waters MassLynx
Chromtech; Finnigan***; VG | .DAT | Finnigan ITDS file format; MAT95 instrument data format; MassLab data format
Finnigan*** | .MS | ITS40 instrument data format
Shimadzu | .QGD | GCMSSolution format
Shimadzu | .qgd | instrument data format
Shimadzu | .lcd | QQQ/QTOF instrument data format
Shimadzu | .spc | library data format
Bruker/Varian | .SMS | instrument data format
Bruker/Varian | .XMS | instrument data format
ION-TOF | .itm | raw measurement data
ION-TOF | .ita | analysis data
Physical Electronics/ULVAC-PHI | .raw* | raw measurement data
Physical Electronics/ULVAC-PHI | .tdc | spectrum data

(*) Note that the RAW formats of each vendor are not interchangeable; software from one cannot handle the RAW files from another.
(**) Micromass was acquired by Waters in 1997.
(***) Finnigan is a division of Thermo.
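The practical consequence of the first note is that a file extension alone does not identify a format. The sketch below builds a lookup from a few rows of the table above; the helper name `candidate_vendors` and the file names are made up for illustration.

```python
from collections import defaultdict

# A few (vendor, extension) rows taken from the table above
ROWS = [
    ("Thermo", ".raw"),
    ("PerkinElmer", ".raw"),
    ("Micromass/Waters", ".raw"),
    ("Physical Electronics/ULVAC-PHI", ".raw"),
    ("Bruker", ".baf"),
    ("ABI/Sciex", ".wiff"),
]

vendors_by_extension = defaultdict(list)
for vendor, ext in ROWS:
    vendors_by_extension[ext.lower()].append(vendor)


def candidate_vendors(filename):
    """Return every vendor whose format uses this file's extension."""
    dot = filename.rfind(".")
    ext = filename[dot:].lower() if dot != -1 else ""
    return vendors_by_extension.get(ext, [])


print(candidate_vendors("sample01.RAW"))  # several candidates: ambiguous
print(candidate_vendors("sample02.baf"))  # a single candidate
```

A converter therefore needs to inspect the file contents, or be told the vendor, before it can choose a reader for a RAW file.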

Software

Viewers

There are several viewers for mzXML, mzML and mzData. These viewers are of two types: free and open-source software (FOSS) or proprietary.

In the FOSS viewer category, one can find MZmine,[22] mineXpert2 (mzXML, mzML, native timsTOF, xy, MGF, BafAscii),[23] MS-Spectre,[24] TOPPView (mzXML, mzML and mzData),[25] Spectra Viewer,[26] SeeMS,[27] msInspect[28] and jmzML.[29]

In the proprietary category, one can find PEAKS,[30] Insilicos,[31] Mascot Distiller[32] and Elsci Peaksel.[33]

There is a viewer for ITA images.[34] ITA and ITM images can be parsed with the pySPM Python library.[35]

Converters

Known converters for mzData to mzXML:

Hermes: a Java converter between mzData, mzXML and mzML in all directions; publicly available, runs with a graphical user interface, by the Institute of Molecular Systems Biology, ETH Zurich.[36][37]
FileConverter: a command-line tool that converts to and from various mass spectrometry formats, part of TOPP.[38]

Known converters for mzXML:

The Institute for Systems Biology maintains a list of converters.[39]

Known converters for mzML:

msConvert: a command-line tool converting to and from various mass spectrometry formats. A GUI is also available for Windows users.[40]
ReAdW:[41] the Institute for Systems Biology command-line converter for Thermo RAW files, part of the TransProteomicPipeline.[42] The latest update of this tool was made in September 2009. The TPP development team now redirects users to the msConvert software (see above).
FileConverter: a command-line tool that converts to and from various mass spectrometry formats, part of TOPP.[43]

Converters for proprietary formats:

msConvert: a command-line tool converting to and from various mass spectrometry formats, including multiple proprietary formats. A GUI is also available for Windows users.[44]
CompassXport: Bruker's free tool generating mzXML (and now mzData) files for many of their native file formats (.baf).[45]
MASSTransit: software for converting data between proprietary formats, by Palisade Corporation, distributed by Scientific Instrument Services, Inc.[46] and PerkinElmer.[47] It was purchased from Palisade by John Wiley and Sons in 2020 and incorporated into the KnowItAll Spectroscopy software.
Aston:[48] native support for several Agilent ChemStation, Agilent MassHunter and Thermo Isodat file formats.
unfinnigan:[49] native support for Finnigan (*.RAW) file formats.
OpenChrom: open-source software that can convert various native file formats, including its own open .ocb format for storing chromatograms, peaks and identification results.[50]

Currently available converters are:

MassWolf: for the Micromass MassLynx .Raw format.
mzStar: for the SCIEX/ABI Analyst format.
wiff2dta:[51] for converting the SCIEX/ABI Analyst format to mzXML, DTA, MGF and PMF.

References

  1. Deutsch EW (December 2012). "File formats commonly used in mass spectrometry proteomics". Molecular & Cellular Proteomics. 11 (12): 1612–21. doi:10.1074/mcp.R112.019695. PMC 3518119. PMID 22956731. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3518119

  2. McDonald, Robert S.; Wilks, Paul A. (1988). "JCAMP-DX: A Standard Form for Exchange of Infrared Spectra in Computer Readable Form" (PDF). Applied Spectroscopy. 42 (1): 151–162. Bibcode:1988ApSpe..42..151M. doi:10.1366/0003702884428734. http://old.iupac.org/jcamp/protocols/dxir01.pdf

  3. Lampen P, Hillig H, Davies AN, Linscheid M (December 1994). "JCAMP-DX for mass spectrometry". Applied Spectroscopy. 48 (12): 1545–52. Bibcode:1994ApSpe..48.1545L. doi:10.1366/0003702944027840. S2CID 96773027. https://www.osapublishing.org/as/abstract.cfm?uri=as-48-12-1545

  4. ASTM E1947 – 98(2009) Standard Specification for Analytical Data Interchange Protocol for Chromatographic Data http://www.astm.org/Standards/E1947.htm

  5. Mayer G, Jones AR, Binz PA, Deutsch EW, Orchard S, Montecchi-Palazzi L, et al. (January 2014). "Controlled vocabularies and ontologies in proteomics: overview, principles and practice". Biochimica et Biophysica Acta (BBA) - Proteins and Proteomics. 1844 (1 Pt A): 98–107. doi:10.1016/j.bbapap.2013.02.017. PMC 3898906. PMID 23429179. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3898906

  6. Davies, Tony (2007). "Herding AnIMLs (no, it's not a spelling mistake): Update on the IUPAC and ASTM Collaboration on Analytical Data Standards". Chemistry International. 29 (6). https://old.iupac.org/publications/ci/2007/2906/pp1_animls.html

  7. Orchard S, Montechi-Palazzi L, Deutsch EW, Binz PA, Jones AR, Paton N, et al. (October 2007). "Five years of progress in the Standardization of Proteomics Data 4th Annual Spring Workshop of the HUPO-Proteomics Standards Initiative April 23-25, 2007 Ecole Nationale Supérieure (ENS), Lyon, France". Proteomics. 7 (19): 3436–40. doi:10.1002/pmic.200700658. PMID 17907277. S2CID 22837325. https://doi.org/10.1002/pmic.200700658

  8. "mzData". HUPO-PSI. Archived from the original on 7 July 2018. Retrieved 26 April 2021. https://web.archive.org/web/20180707193136/https://www.psidev.info/mzdata

  9. Pedrioli PG, Eng JK, Hubley R, Vogelzang M, Deutsch EW, Raught B, et al. (November 2004). "A common open representation of mass spectrometry data and its application to proteomics research". Nature Biotechnology. 22 (11): 1459–66. doi:10.1038/nbt1031. PMID 15529173. S2CID 25734712. https://doi.org/10.1038/nbt1031

  10. Lin SM, Zhu L, Winter AQ, Sasinowski M, Kibbe WA (December 2005). "What is mzXML good for?". Expert Review of Proteomics. 2 (6): 839–45. doi:10.1586/14789450.2.6.839. PMID 16307524. S2CID 24914725. https://doi.org/10.1586/14789450.2.6.839

  11. Shah AR, Davidson J, Monroe ME, Mayampurath AM, Danielson WF, Shi Y, et al. (October 2010). "An efficient data format for mass spectrometry-based proteomics". Journal of the American Society for Mass Spectrometry. 21 (10): 1784–8. Bibcode:2010JASMS..21.1784S. doi:10.1016/j.jasms.2010.06.014. PMID 20674389. https://doi.org/10.1016%2Fj.jasms.2010.06.014

  12. "mzML". HUPO-Proteomics Standards Initiative. Retrieved 19 April 2013. http://www.psidev.info/mzml

  13. Deutsch E (July 2008). "mzML: a single, unifying data format for mass spectrometer output". Proteomics. 8 (14): 2776–7. doi:10.1002/pmic.200890049. PMID 18655045. S2CID 28297899. https://doi.org/10.1002%2Fpmic.200890049

  14. Askenazi M, Parikh JR, Marto JA (April 2009). "mzAPI: a new strategy for efficiently sharing mass spectrometry data". Nature Methods. 6 (4): 240–1. doi:10.1038/nmeth0409-240. PMC 2691659. PMID 19333238. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2691659

  15. Wilhelm M, Kirchner M, Steen JA, Steen H (January 2012). "mz5: space- and time-efficient storage of mass spectrometry data sets". Molecular & Cellular Proteomics. 11 (1): O111.011379. doi:10.1074/mcp.O111.011379. PMC 3270111. PMID 21960719. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3270111

  16. Schramm T, Hester Z, Klinkert I, Both JP, Heeren RM, Brunelle A, et al. (August 2012). "imzML--a common data format for the flexible exchange and processing of mass spectrometry imaging data" (PDF). Journal of Proteomics. 75 (16): 5106–5110. doi:10.1016/j.jprot.2012.07.026. PMID 22842151. S2CID 25970597. https://hal.archives-ouvertes.fr/hal-00741330/file/imzML%20-%20JoP-%20submitted%20revision%20JPROT-D-12-00290R11%20%282%29.pdf

  17. Bouyssié D, Dubois M, Nasso S, Gonzalez de Peredo A, Burlet-Schiltz O, Aebersold R, Monsarrat B (March 2015). "mzDB: a file format using multiple indexing strategies for the efficient analysis of large LC-MS/MS and SWATH-MS data sets". Molecular & Cellular Proteomics. 14 (3): 771–81. doi:10.1074/mcp.O114.039115. PMC 4349994. PMID 25505153. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4349994

  18. Tully B (June 2020). "Toffee – a highly efficient, lossless file format for DIA-MS". Scientific Reports. 10 (1): 8939. Bibcode:2020NatSR..10.8939T. doi:10.1038/s41598-020-65015-y. PMC 7265431. PMID 32488104. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7265431

  19. Bhamber RS, Jankevics A, Deutsch EW, Jones AR, Dowsey AW (January 2021). "mzMLb: A Future-Proof Raw Mass Spectrometry Data Format Based on Standards-Compliant mzML and Optimized for Speed and Storage Requirements". Journal of Proteome Research. 20 (1): 172–183. doi:10.1021/acs.jproteome.0c00192. PMC 7871438. PMID 32864978. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7871438

  20. Rauh, David; Blankenburg, Claudia; Fischer, Tillmann G.; Jung, Nicole; Kuhn, Stefan; Schatzschneider, Ulrich; Schulze, Tobias; Neumann, Steffen (27 June 2022). "Data format standards in analytical chemistry". Pure and Applied Chemistry. 94 (6): 725–736. doi:10.1515/pac-2021-3101. hdl:2086/22122. https://doi.org/10.1515/pac-2021-3101

  21. "Standardization of Analytical Data: Best Practices". ACD/Labs. Retrieved 27 April 2025. https://www.acdlabs.com/resource/standardization-of-analytical-data-best-practices/

  22. Katajamaa, Mikko; Miettinen, Jarkko; Oresic, Matej (1 March 2006). "MZmine: toolbox for processing and visualization of mass spectrometry based molecular profile data". Bioinformatics. 22 (5): 634–636. doi:10.1093/bioinformatics/btk039. ISSN 1367-4803. PMID 16403790. https://pubmed.ncbi.nlm.nih.gov/16403790

  23. Langella, Olivier; Rusconi, Filippo (7 April 2021). "mineXpert2: Full-Depth Visualization and Exploration of MSn Mass Spectrometry Data". Journal of the American Society for Mass Spectrometry. 32 (4): 1138–1141. doi:10.1021/jasms.0c00402. ISSN 1879-1123. PMID 33683899. https://pubmed.ncbi.nlm.nih.gov/33683899

  24. "MS-Spectre website". Ms-spectre.sourceforge.net. Retrieved 29 November 2011. http://ms-spectre.sourceforge.net

  25. Sturm, Marc; Kohlbacher, Oliver (6 July 2009). "TOPPView: An Open-Source Viewer for Mass Spectrometry Data". Journal of Proteome Research. 8 (7): 3760–3763. doi:10.1021/pr900171m. ISSN 1535-3893. PMID 19425593. https://doi.org/10.1021/pr900171m

  26. "An open source viewer developed under academic projects". Staff.icar.cnr.it. Retrieved 29 November 2011. http://staff.icar.cnr.it/cannataro/projects/SpectraViewer/

  27. "An open source viewer developed by Matt Chambers at Vanderbilt". Proteowizard.sourceforge.net. Retrieved 29 November 2011. http://proteowizard.sourceforge.net

  28. "An open source viewer developed at the Fred Hutchinson Cancer Center". Proteomics.fhcrc.org. Retrieved 29 November 2011. http://proteomics.fhcrc.org/CPL/msinspect.html

  29. Côté, Richard G.; Reisinger, Florian; Martens, Lennart (2010). "jmzML, an open-source Java API for mzML, the PSI standard for MS data". Proteomics. 10 (7): 1332–1335. doi:10.1002/pmic.200900719. ISSN 1615-9861. PMID 20127693. https://doi.org/10.1002/pmic.200900719

  30. "BSI: PEAKS website". Bioinfor.com. Retrieved 29 November 2011. http://www.bioinfor.com/peaks

  31. "Insilicos website". Archived from the original on 20 December 2014. Retrieved 28 March 2020. https://web.archive.org/web/20141220185618/http://insilicos.com/

  32. Matrix Science Limited. "Commercial software with free viewer mode for mzXML and many proprietary formats". Matrixscience.com. Retrieved 29 November 2011. http://www.matrixscience.com/distiller.html

  33. "Peaksel - software to read and process proprietary and open HPLC formats". https://elsci.io/peaksel/

  34. "ITAviewer online". "ITAviewer source". GitHub. 9 November 2017. https://scholi.github.io/ITAviewer

  35. "pySPM website". GitHub. 17 June 2022. https://github.com/scholi/pySPM

  36. Hermes Archived 3 March 2016 at the Wayback Machine http://blogs.ethz.ch/andreas/2008/05/09/mzdata-to-mzxml-converter/

  37. "Hermes website". Icecoffee.ch. Retrieved 29 November 2011. http://www.icecoffee.ch/hermes/converter.html

  38. Kohlbacher, Oliver; Reinert, Knut; Gröpl, Clemens; Lange, Eva; Pfeifer, Nico; Schulz-Trieglaff, Ole; Sturm, Marc (15 January 2007). "TOPP—the OpenMS proteomics pipeline". Bioinformatics. 23 (2): e191–e197. doi:10.1093/bioinformatics/btl299. ISSN 1367-4803. PMID 17237091. https://doi.org/10.1093/bioinformatics/btl299

  39. "mzXML". Retrieved 30 June 2008. http://tools.proteomecenter.org/wiki/index.php?title=Formats:mzXML

  40. Kessner, Darren; Chambers, Matt; Burke, Robert; Agus, David; Mallick, Parag (1 November 2008). "ProteoWizard: open source software for rapid proteomics tools development". Bioinformatics. 24 (21): 2534–2536. doi:10.1093/bioinformatics/btn323. ISSN 1367-4803. PMC 2732273. PMID 18606607. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2732273

  41. "ReAdW". Tools.proteomecenter.org. Retrieved 29 November 2011. http://tools.proteomecenter.org/wiki/index.php?title=Software:ReAdW

  42. "TransProteomicPipeline". Tools.proteomecenter.org. 25 May 2011. Retrieved 29 November 2011. http://tools.proteomecenter.org/wiki/index.php?title=Software:TPP

  43. Kohlbacher, Oliver; Reinert, Knut; Gröpl, Clemens; Lange, Eva; Pfeifer, Nico; Schulz-Trieglaff, Ole; Sturm, Marc (15 January 2007). "TOPP—the OpenMS proteomics pipeline". Bioinformatics. 23 (2): e191–e197. doi:10.1093/bioinformatics/btl299. ISSN 1367-4803. PMID 17237091. https://doi.org/10.1093/bioinformatics/btl299

  44. Kohlbacher, Oliver; Reinert, Knut; Gröpl, Clemens; Lange, Eva; Pfeifer, Nico; Schulz-Trieglaff, Ole; Sturm, Marc (15 January 2007). "TOPP—the OpenMS proteomics pipeline". Bioinformatics. 23 (2): e191–e197. doi:10.1093/bioinformatics/btl299. ISSN 1367-4803. PMID 17237091. https://doi.org/10.1093/bioinformatics/btl299

  45. Guillaume, Erny (29 February 2016). "Converting Bruker files to the mzML format using CompassXport". finnee blog. Retrieved 11 April 2025. https://finneeblog.wordpress.com/2016/02/29/converting-bruker-files-to-the-mzml-format-using-compassxport/

  46. MASSTransit by Palisade Archived 9 May 2008 at the Wayback Machine http://www.sisweb.com/software/masstransit.htm

  47. "Gas Chromatography (GC)". PerkinElmer. Retrieved 29 November 2011. http://www.perkinelmer.com/gc

  48. aston – Open source chromatography and mass spectrometry software – Google Project Hosting https://code.google.com/p/aston

  49. unfinnigan – Painless extraction of mass spectra from Thermo "raw" files – Google Project Hosting https://code.google.com/p/unfinnigan

  50. Dąbrowski Ł (7 August 2015). "Review of free data processing software for chromatography". Mediterranean Journal of Chemistry. 4 (4): 193–200. doi:10.13171/mjc.4.4.2015.15.09.16.35/dabrowski. https://doi.org/10.13171%2Fmjc.4.4.2015.15.09.16.35%2Fdabrowski

  51. wiff2dta at sourceforge https://sourceforge.net/projects/protms/files/wiff2dta/