JSML is primarily an XML text format used by Java applications to annotate text input to speech synthesizers. JSML elements give a speech synthesizer detailed information on how to speak the text in a natural-sounding way.
JSML provides elements that describe a document's structure, the pronunciation of certain words and phrases, and features of speech such as emphasis and intonation. It is designed, in the Java fashion, to be simple to learn and use and to be portable across different synthesizers and computing platforms; although designed for use within Java applications, it is also applicable to a wide range of languages.
An example of JSML markup is set out below:
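As a hedged illustration, the Java sketch below embeds a small JSML fragment and parses it with the JDK's standard XML parser to confirm that it is well-formed and to list the elements it uses. The tag and attribute names (`jsml`, `div`, `emphasis`, `break`, `sayas` with `sub`) are assumptions based on the JSML 0.6 vocabulary and should be checked against the specification; the class name `JsmlSketch` and the sample sentence are hypothetical. No speech synthesizer is involved, since the Java Speech API is not part of the standard JDK.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;

final class JsmlSketch {
    // Illustrative JSML fragment. Element and attribute names are assumed
    // from JSML 0.6; verify them against the specification before use.
    static final String JSML =
        "<?xml version=\"1.0\"?>\n"
      + "<jsml>\n"
      + "  <div type=\"paragraph\">\n"
      + "    Your flight leaves at <emphasis>six</emphasis> tomorrow.\n"
      + "    <break/>\n"
      + "    Contact the <sayas sub=\"I triple E\">IEEE</sayas> desk for details.\n"
      + "  </div>\n"
      + "</jsml>\n";

    // Parses the markup as XML and returns the element names in document order.
    // Throws IllegalStateException if the text is not well-formed XML.
    static List<String> elementNames(String xml) {
        try {
            DocumentBuilder builder =
                DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document doc = builder.parse(
                new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            NodeList nodes = doc.getElementsByTagName("*"); // "*" matches every element
            List<String> names = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                names.add(nodes.item(i).getNodeName());
            }
            return names;
        } catch (Exception e) {
            throw new IllegalStateException("JSML text is not well-formed XML", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(elementNames(JSML));
    }
}
```

In this sketch the `div` element carries document structure, `emphasis` and `break` carry features of speech, and `sayas` carries a pronunciation hint, mirroring the three kinds of information described above. A real application would pass the same text to a synthesizer rather than to an XML parser.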
JSML built on the original proposal for a speech synthesis markup language (SSML), a set of general markup tags that could be used across different text-to-speech (TTS) systems.[2]
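The W3C later developed its own standard markup language for speech synthesis, the Speech Synthesis Markup Language (also abbreviated SSML), as part of the same voice browser work that produced VoiceXML; it is based on JSML but is not identical to it.[3] SSML became a formal W3C Recommendation in 2004.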
Taylor, Paul (2009). Text-to-Speech Synthesis. Cambridge University Press. pp. 68–69. ISBN 9780521899277.