UBJSON is a proposed successor to BSON, BJSON and others. UBJSON has the following goals:
UBJSON data can be either a value or a container.
UBJSON uses a single binary tuple to represent all JSON value types:1
Each element in the tuple is defined as:
The type is a 1-byte ASCII character used to indicate the type of the data following it. The ASCII characters were chosen to make manually walking and debugging data stored in the UBJSON format as easy as possible (e.g. making the data relatively readable in a hex editor). Types are available for the five JSON value types. There is also a no-op type used for stream keep-alive.
High-precision numbers are represented as an arbitrarily long, UTF-8 string-encoded numeric value.
The length is an integer number (e.g. uint8, or int64) encoding the size of the data payload in bytes. It is used for strings, high-precision numbers and optionally containers. They are omitted for other types.
Length is encoded following the same convention as integers, thus including its own type. For example, the string hello is encoded as S,U,0x05,h,e,l,l,o.
A sequence of bytes representing the actual binary data for this type of value. All numbers are in big-endian order.
Similarly to JSON, UBJSON defines two container types: array and object.2
Arrays are ordered sequences of elements, represented as a [ followed by zero or more elements of value and container type and a trailing ].
Objects are labeled sets of elements, represented as a { followed by zero or more key-value pairs and a trailing }. Each key is a string with the S character omitted, and each "value" can be any element of value or container type.
Alternatively, arrays and objects may indicate the number of elements they contain as # followed by an integer number before their first element, in which case the trailing ] or } is omitted. Additionally, if all elements have the same type, the types can be omitted and replaced by a single $ followed by the type, in which case the element count must follow immediately. For example, the array ["a","b","c"] may be represented as [,$,C,#,U,0x03,a,b,c.
While there is no explicit binary type, binary data is stored in a strongly typed array of uint8 values. This ensures binary efficiency while maintaining compatibility with JSON, even though JSON has no direct support for binary data.34
The MIME type 'application/ubjson' is recommended, as is the file extension '.ubj' when stored in a file-system.5
"Value Types | Universal Binary JSON Specification". Retrieved 20 July 2019. http://ubjson.org/type-reference/value-types/ ↩
"Container Types | Universal Binary JSON Specification". Retrieved 20 July 2019. http://ubjson.org/type-reference/container-types/ ↩
"Binary Data | Universal Binary JSON Specification". Retrieved 20 July 2019. http://ubjson.org/type-reference/binary-data/ ↩
"UBJSON (.ubj)—Wolfram Language Documentation". Retrieved 20 July 2019. https://reference.wolfram.com/language/ref/format/UBJSON.html ↩
"UBJSON Storage Format". Retrieved 20 July 2019. https://docs.teradata.com/reader/C8cVEJ54PO4~YXWXeXGvsA/b9kd0QOTMB3uZp9z5QT2aw ↩