To avoid generating different InChIs for tautomeric structures, an input chemical structure is first normalized to reduce it to its so-called core parent structure before the InChI is generated. This may involve changing bond orders, rearranging formal charges and possibly adding and removing protons. Different input structures may give the same result; for example, acetic acid and acetate would both give the same core parent structure, that of acetic acid. A core parent structure may be disconnected, consisting of more than one component, in which case the sublayers in the InChI usually consist of sublayers for each component, separated by semicolons (periods for the chemical formula sublayer). One way this can happen is that all metal atoms are disconnected during normalization; for example, the InChI for tetraethyllead will have five components, one for lead and four for the ethyl groups.9
The first, main, layer of the InChI refers to this core parent structure, giving its chemical formula, non-hydrogen connectivity without bond order (/c sublayer) and hydrogen connectivity (/h sublayer). The /q portion of the charge layer gives its charge, and the /p portion of the charge layer tells how many protons (hydrogen ions) must be added to or removed from it to regenerate the original structure. If present, the stereochemical layer, with sublayers /b, /t, /m and /s, gives stereochemical information, and the isotopic layer /i (which may contain sublayers /h, /b, /t, /m and /s) gives isotopic information. These are the only layers which can occur in a standard InChI.10
If the user wants to specify an exact tautomer, a fixed hydrogen layer /f can be appended, which may contain various additional sublayers. This cannot be done in standard InChI, however, so different tautomers will have the same standard InChI (for example, alanine will give the same standard InChI whether input in a neutral or a zwitterionic form). Finally, a nonstandard reconnected /r layer can be added, which effectively gives a new InChI generated without breaking bonds to metal atoms. This may contain various sublayers, including /f.11
Every InChI starts with the string "InChI=" followed by the version number, currently 1. If the InChI is standard, the version number is followed by the letter S; standard InChI is a fully standardized InChI flavor that maintains the same level of attention to structure details and the same conventions for drawing perception. The remaining information is structured as a sequence of layers and sub-layers, with each layer providing one specific type of information. The layers and sub-layers are separated by the delimiter "/" and start with a characteristic prefix letter (except for the chemical formula sub-layer of the main layer). The six layers, with their important sublayers, are: the main layer (chemical formula, /c, /h); the charge layer (/q, /p); the stereochemical layer (/b, /t, /m, /s); the isotopic layer (/i); the fixed-hydrogen layer (/f); and the reconnected layer (/r).
The delimiter-prefix format has the advantage that a user can easily use a wildcard search to find identifiers that match only in certain layers.
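The delimiter-prefix format lends itself to simple string handling. The following is a minimal sketch in Python, not part of the official InChI software: the names split_inchi and first_layer are hypothetical, and the ethanol InChI is used only as a well-known short example. It assumes that a segment starting with a lowercase letter is a prefixed layer and that anything else in the first position is the chemical formula.

```python
from typing import List, NamedTuple, Optional, Tuple


class ParsedInChI(NamedTuple):
    version: str                   # e.g. "1"
    is_standard: bool              # True when the version is followed by "S"
    formula: str                   # chemical formula sublayer ("" if absent)
    layers: List[Tuple[str, str]]  # (prefix letter, content), in original order


def split_inchi(inchi: str) -> ParsedInChI:
    """Split an InChI string on "/" into its prefixed layers (illustrative only)."""
    head = "InChI="
    if not inchi.startswith(head):
        raise ValueError("not an InChI string")
    segments = inchi[len(head):].split("/")
    version_field = segments[0]
    is_standard = version_field.endswith("S")
    version = version_field[:-1] if is_standard else version_field
    rest = segments[1:]
    formula = ""
    # Assumption: the formula is the only segment without a lowercase prefix letter.
    if rest and rest[0] and not rest[0][0].islower():
        formula = rest[0]
        rest = rest[1:]
    # Prefixes can repeat (e.g. /h inside the isotopic layer), so keep a list.
    layers = [(seg[0], seg[1:]) for seg in rest if seg]
    return ParsedInChI(version, is_standard, formula, layers)


def first_layer(parsed: ParsedInChI, prefix: str) -> Optional[str]:
    """Return the content of the first layer with the given prefix, if any."""
    for letter, content in parsed.layers:
        if letter == prefix:
            return content
    return None


ethanol = split_inchi("InChI=1S/C2H6O/c1-2-3/h3H,2H2,1H3")
print(ethanol.formula, first_layer(ethanol, "c"), first_layer(ethanol, "h"))
```

Comparing only the formula and the /c content of two parsed identifiers is one way to match structures on connectivity while ignoring the other layers.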
The condensed, 27-character InChIKey is a hashed version of the full InChI (using the SHA-256 algorithm), designed to allow for easy web searches of chemical compounds.15 The standard InChIKey is the hashed counterpart of standard InChI. Most chemical structures on the Web up to 2007 were represented as GIF files, which are not searchable for chemical content. The full InChI turned out to be too lengthy for easy searching, and therefore the InChIKey was developed. There is a very small but nonzero chance of two different molecules having the same InChIKey; the probability of a duplication in the first 14 characters alone has been estimated as one duplication in 75 databases, each containing one billion unique structures. With all databases currently having below 50 million structures, such duplication appears unlikely at present. A more extensive study of the collision rate found that the experimental collision rate agrees with the theoretical expectations.16
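The "one in 75 databases" estimate can be roughly reproduced with a standard birthday-problem calculation. The sketch below assumes that the first 14-character block carries 65 bits of the hash; that figure is not stated in the text above and is used here only as a working assumption. The function name collision_probability is likewise hypothetical.

```python
import math


def collision_probability(n_structures: int, hash_bits: int) -> float:
    """Birthday-problem approximation: chance that at least two of
    n_structures uniformly hashed items share the same hash value."""
    pairs = n_structures * (n_structures - 1) / 2
    # 1 - exp(-pairs / hash_space), computed stably for small probabilities.
    return -math.expm1(-pairs / 2 ** hash_bits)


# Assumption: 65 bits in the first block; one billion structures per database.
p = collision_probability(10 ** 9, 65)
print(f"collision probability per database: {p:.4f} (about 1 in {1 / p:.0f})")
```

Under these assumptions the result is of the same order as the quoted estimate. Rerunning the calculation with 50 million structures gives a far smaller probability, consistent with the remark that duplication is unlikely at that database size.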
The InChIKey currently consists of three parts separated by hyphens, of 14, 10 and 1 characters respectively, like XXXXXXXXXXXXXX-YYYYYYYYFV-P. The first 14 characters result from a SHA-256 hash of the connectivity information (the main layer and /q sublayer of the charge layer) of the InChI. The second part consists of 8 characters resulting from a hash of the remaining layers of the InChI, a single character indicating the kind of InChIKey (S for standard and N for nonstandard), and a character indicating the version of InChI used (currently A for version 1). Finally, the single character at the end indicates the protonation of the core parent structure, corresponding to the /p sublayer of the charge layer (N for no protonation; O, P, ... if protons should be added; and M, L, ... if they should be removed).1718
Morphine has the structure shown on the right. The standard InChI for morphine is InChI=1S/C17H19NO3/c1-18-7-6-17-10-3-5-13(20)16(17)21-15-12(19)4-2-9(14(15)17)8-11(10)18/h2-5,10-11,13,16,19-20H,6-8H2,1H3/t10-,11+,13-,16-,17-/m0/s1 and the standard InChIKey for morphine is BQJCRHHNABKAKU-KBQPJGBKSA-N.19
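The three-part layout of the InChIKey can be unpacked mechanically. The following Python sketch is illustrative only: parse_inchikey and InChIKeyParts are hypothetical names, and reading the final letter as a signed offset from N is an interpretation of the description above (real software may limit the range of letters it uses).

```python
import re
from typing import NamedTuple

# 14 letters, hyphen, 8 letters + kind flag (S or N) + version letter, hyphen, 1 letter.
KEY_PATTERN = re.compile(r"([A-Z]{14})-([A-Z]{8})([SN])([A-Z])-([A-Z])")


class InChIKeyParts(NamedTuple):
    connectivity_hash: str  # hash of the main layer and /q sublayer
    other_layers_hash: str  # hash of the remaining layers
    is_standard: bool       # True for "S", False for "N"
    version_char: str       # "A" for InChI version 1
    proton_offset: int      # 0 for "N", +1 for "O", -1 for "M", ...


def parse_inchikey(key: str) -> InChIKeyParts:
    """Split an InChIKey into the fields described in the text (illustrative only)."""
    match = KEY_PATTERN.fullmatch(key)
    if match is None:
        raise ValueError(f"not a well-formed InChIKey: {key!r}")
    first, second, flag, version, proton = match.groups()
    # Assumption: letters after N mean protons added, letters before N removed.
    return InChIKeyParts(first, second, flag == "S", version, ord(proton) - ord("N"))


morphine = parse_inchikey("BQJCRHHNABKAKU-KBQPJGBKSA-N")
print(morphine)
```

For the morphine key this yields a standard, version-A key with no protonation change. Because the key is a hash, parsing it recovers only these flags, not the structure itself.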
As the InChI cannot be reconstructed from the InChIKey, an InChIKey always needs to be linked to the original InChI to get back to the original structure. InChI Resolvers act as a lookup service to make these links, and prototype services are available from the National Cancer Institute, the UniChem service at the European Bioinformatics Institute, and PubChem. ChemSpider had a resolver until July 2015, when it was decommissioned.20
The format was originally called IChI (IUPAC Chemical Identifier), then renamed in July 2004 to INChI (IUPAC-NIST Chemical Identifier), and renamed again in November 2004 to InChI (IUPAC International Chemical Identifier), a trademark of IUPAC.
Scientific direction of the InChI standard is carried out by the IUPAC Division VIII Subcommittee, and funding of subgroups investigating and defining the expansion of the standard is carried out by both IUPAC and the InChI Trust. The InChI Trust funds the development, testing and documentation of the InChI. Current extensions are being defined to handle polymers and mixtures, Markush structures, reactions21 and organometallics, and once accepted by the Division VIII Subcommittee will be added to the algorithm.
The InChI Trust has developed software to generate the InChI, InChIKey and other identifiers. The release history of this software follows.22
The InChI has been adopted by many databases, large and small, including ChemSpider, ChEMBL, Golm Metabolome Database, and PubChem.25 However, adoption is not straightforward, and many databases show discrepancies between the chemical structures and the InChIs they contain, which is a problem for linking databases.26
"What on Earth is InChI?". IUPAC 100. Retrieved 10 May 2024. https://iupac.org/100/stories/what-on-earth-is-inchi/ ↩
"The InChI Trust and IUPAC". InChI Trust. Retrieved August 22, 2022. https://www.inchi-trust.org/iupac/ ↩
Heller, S.R.; McNaught, A.; Pletnev, I.; Stein, S.; Tchekhovskoi, D. (2015). "InChI, the IUPAC International Chemical Identifier". Journal of Cheminformatics. 7: 23. doi:10.1186/s13321-015-0068-4. PMC 4486400. PMID 26136848. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4486400 ↩
"The IUPAC International Chemical Identifier (InChI)". IUPAC. 5 September 2007. Archived from the original on October 30, 2007. Retrieved 2007-09-18. https://web.archive.org/web/20071030202540/http://www.iupac.org/inchi/release102.html ↩
E.L. Willighagen (17 September 2011). "InChIKey collision: the DIY copy/pastables". Retrieved 2012-11-06. http://chem-bla-ics.blogspot.nl/2011/09/inchikey-collision-diy-copypastables.html ↩
Goodman, Jonathan M.; Pletnev, Igor; Thiessen, Paul; Bolton, Evan; Heller, Stephen R. (December 2021). "InChI version 1.06: now more than 99.99% reliable". Journal of Cheminformatics. 13 (1): 40. doi:10.1186/s13321-021-00517-z. PMC 8147039. PMID 34030732. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8147039 ↩
McNaught, Alan (2006). "The IUPAC International Chemical Identifier:InChl". Chemistry International. Vol. 28, no. 6. IUPAC. Retrieved 2007-09-18. http://www.iupac.org/publications/ci/2006/2806/4_tools.html ↩
"IUPAC/InChI-Trust Licence for the International Chemical Identifier (InChI) Software" (PDF). IUPAC/InChI-Trust. 2020. Retrieved 2022-08-09. https://www.inchi-trust.org/wp/download/106/LICENCE.pdf ↩
Heller, Stephen R.; McNaught, Alan; Pletnev, Igor; Stein, Stephen; Tchekhovskoi, Dmitrii (2015). "InChI, the IUPAC International Chemical Identifier". Journal of Cheminformatics. 7: 23. doi:10.1186/s13321-015-0068-4. PMC 4486400. PMID 26136848. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4486400 ↩
Pletnev, I.; Erin, A.; McNaught, A.; Blinov, K.; Tchekhovskoi, D.; Heller, S. (2012). "InChIKey collision resistance: An experimental testing". Journal of Cheminformatics. 4 (1): 39. doi:10.1186/1758-2946-4-39. PMC 3558395. PMID 23256896. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3558395 ↩
"Technical FAQ - InChI Trust". inchi-trust.org. Retrieved 2021-01-08. http://www.inchi-trust.org/technical-faq/#13.1 ↩
"InChI=1/C17H19NO3/c1-18..." Chemspider. Retrieved 2007-09-18. http://www.chemspider.com/RecordView.aspx?id=5760 ↩
InChI Resolver, 27 July 2015 http://www.chemspider.com/InChiResolverDecommissioned.aspx ↩
Grethe, Guenter; Blanke, Gerd; Kraut, Hans; Goodman, Jonathan M. (9 May 2018). "International chemical identifier for reactions (RInChI)". Journal of Cheminformatics. 10 (1): 45. doi:10.1186/s13321-018-0277-8. PMC 4015173. PMID 24152584. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4015173 ↩
Downloads of InChI Software, accessed Jan. 8, 2021. https://www.inchi-trust.org/downloads/ ↩
Warr, W.A. (2015). "Many InChIs and quite some feat". Journal of Computer-Aided Molecular Design. 29 (8): 681–694. Bibcode:2015JCAMD..29..681W. doi:10.1007/s10822-015-9854-3. PMID 26081259. S2CID 31786997. ↩
Akhondi, S. A.; Kors, J. A.; Muresan, S. (2012). "Consistency of systematic chemical identifiers within and between small-molecule databases". Journal of Cheminformatics. 4 (1): 35. doi:10.1186/1758-2946-4-35. PMC 3539895. PMID 23237381. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3539895 ↩