The original IBM OS/360 Object File Format was developed in 1964 for the new IBM System/360 mainframe computer. The format was also used by makers of plug-compatible and workalike mainframes, including the Univac 90/60, 90/70 and 90/80 and Fujitsu B2800. The format was later expanded to add symbolic records and more information about modules, plus support for procedures and functions with names longer than 8 characters. While this helped, it did not provide the enhanced information necessary for today's more complicated programming languages and more advanced features such as objects, properties and methods, Unicode support, and virtual methods.
The GOFF object file format was developed by IBM in approximately 1995 as a means to overcome these problems.[2] The earliest mention of this format was in the introductory information about the new High Level Assembler.[3] GOFF supports embedded debugging information in the Associated Data (ADATA) format, but it does not support the older SYM records generated by the TEST option. Note that the OS/360 Object File Format was simply superseded by the GOFF format; it was not deprecated, and it is still in use by assemblers and language compilers where the language can withstand the limitations of the older format.
This article will use the term "module" to refer to any name or equivalent symbol, which is used to provide an identifier for a piece of code or data external to the scope to which it is referenced. A module may refer to a subroutine, a function, Fortran Common or Block Data, an object or class, a method or property of an object or class, or any other named routine or identifier external to that particular scope referencing the external name.
The terms "assembler" for a program that converts assembly language to machine code, as well as to "assemble" as the process of using one, and to "compile," as the process of using a "compiler," which does the same thing for high-level languages, should, for the purposes of this article, be considered interchangeable; thus where "compile" and "compiler" are used, substitute "assemble" and "assembler" as needed.
Numbers used in this article are expressed as follows: unless specified as hexadecimal (base 16), all numbers used are in decimal (base 10). When it is necessary to express a number in hexadecimal, the standard mainframe assembler format is used: the capital letter X precedes the number, any hexadecimal letters in the number are written in upper case, and the number is enclosed in single quotes, e.g. the number 15deadbeef₁₆ would be expressed as X'15DEADBEEF'.
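The notation can be illustrated with a short Python sketch. The function names and the optional `digits` padding parameter are hypothetical conveniences introduced here for illustration; they are not part of any IBM tool.

```python
def to_mainframe_hex(value: int, digits: int = 0) -> str:
    """Format a non-negative integer as X'...' with upper-case hex digits.

    `digits` is a hypothetical convenience for zero-padding (2 gives X'03').
    """
    if value < 0:
        raise ValueError("value must be non-negative")
    return "X'" + format(value, "X").zfill(digits) + "'"


def from_mainframe_hex(text: str) -> int:
    """Parse X'...' notation back into an integer."""
    if not (text.startswith("X'") and text.endswith("'")):
        raise ValueError(f"not in X'...' form: {text!r}")
    return int(text[2:-1], 16)


print(to_mainframe_hex(0x15DEADBEEF))   # X'15DEADBEEF'
print(to_mainframe_hex(3, digits=2))    # X'03'
print(from_mainframe_hex("X'40'"))      # 64
```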
A "byte", as used in this article, is 8 bits, and unless otherwise specified, a "byte" and a "character" are the same thing; characters in EBCDIC are also 8 bits. When multi-byte character sets (such as Unicode) are used in user programs, they will use two (or more) bytes per character.
The format is similar to the OS/360 Object File Format but adds additional information for use in building applications.[4]
According to the z/OS XL C/C++ User's Guide, "The maximum size of a GOFF object is 1 gigabyte."[6]
Similarly to the older OS/360 format, object file records are divided into 6 different record types, some added, some deleted, some altered:
GOFF records may be fixed or variable length; the minimum length when using variable-length records is 56 characters, although most records will be longer than this. Except for module and class names, all characters are in the EBCDIC character set. Unix-based systems must use fixed-length (80-byte) records. Records in fixed-length files that are shorter than the fixed length should be zero-filled. To distinguish GOFF records from the older OS/360 object format (where the first byte of a record is X'02') or from commands that may be present in the file, the first byte of each GOFF record is always the binary value X'03', while commands must start with a character value of at least space (X'40'). The next 2 bytes of a GOFF record indicate the record type, continuation and version of the file format. These first 3 bytes are known as the PTV field.
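The first-byte rule above lends itself to a simple classifier. The following Python sketch is illustrative only; the function names and the returned labels are hypothetical, and the splitter assumes a file of fixed-length 80-byte records as described for Unix-based systems.

```python
GOFF_PREFIX = 0x03    # first byte of every GOFF record
OBJ_PREFIX = 0x02     # first byte of an older OS/360 object record
EBCDIC_SPACE = 0x40   # commands start with a character at or above this value


def classify_record(record: bytes) -> str:
    """Label a record by its first byte (labels are illustrative)."""
    if not record:
        raise ValueError("empty record")
    first = record[0]
    if first == GOFF_PREFIX:
        return "goff"
    if first == OBJ_PREFIX:
        return "os360"
    if first >= EBCDIC_SPACE:
        return "command"
    return "unknown"


def split_fixed_records(data: bytes, reclen: int = 80):
    """Split a fixed-length file image into records of `reclen` bytes."""
    if len(data) % reclen != 0:
        raise ValueError("file length is not a multiple of the record length")
    return [data[i:i + reclen] for i in range(0, len(data), reclen)]


image = bytes([0x03]) + bytes(79) + bytes([0x02]) + bytes(79)
print([classify_record(r) for r in split_fixed_records(image)])
```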
The PTV field represents the first 3 bytes of every GOFF record.
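A decoder for the PTV field might look like the sketch below. The bit layout is an assumption that is not stated in this article's text: it follows the record-type codes in the LLVM GOFF definitions and the IBM appendix cited in the references (record type in the high four bits of the second byte, continuation flags in its two low bits, version in the third byte). It should be checked against those sources before being relied on.

```python
from typing import NamedTuple

# Assumed record-type codes (per the cited LLVM GOFF.h definitions).
RECORD_TYPES = {0: "ESD", 1: "TXT", 2: "RLD", 3: "LEN", 4: "END", 15: "HDR"}


class PTV(NamedTuple):
    record_type: str
    is_continuation: bool   # this record continues the previous one
    is_continued: bool      # this record is continued on the next one
    version: int


def parse_ptv(record: bytes) -> PTV:
    """Decode the first 3 bytes of a GOFF record (layout is an assumption)."""
    if len(record) < 3:
        raise ValueError("record shorter than the PTV field")
    if record[0] != 0x03:
        raise ValueError("not a GOFF record: first byte is not X'03'")
    type_code = record[1] >> 4
    return PTV(
        record_type=RECORD_TYPES.get(type_code, f"unknown({type_code})"),
        is_continuation=bool(record[1] & 0x02),
        is_continued=bool(record[1] & 0x01),
        version=record[2],
    )


print(parse_ptv(bytes.fromhex("03F000")))
```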
The HDR record is required, and must be the first record.
An ESD record gives the public name for a module, a main program, a subroutine, procedure, function, property or method in an object, Fortran Common or alternate entry point. An ESD record for a public name must be present in the file before any reference to that name is made by any other record.
In the case of fixed-length records where the name requires continuation records, the following is used:
ADATA ("associated data") records are used to provide additional symbol information about a module. They replaced the older SYM records in the 360 object file format. To create an ADATA record
ADATA records will be appended to the end of the class in the order they are declared.
Class names assigned to ADATA records are translated by IBM programs by converting the binary value to text and appending it to the name C_ADATA, so an item numbered X'0033' would become the text string C_ADATA0033.
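The naming rule can be sketched in Python as follows. The function name is hypothetical, and the four-digit, upper-case hexadecimal rendering is inferred from the single example in the text rather than from a stated rule.

```python
def adata_class_name(record_type: int) -> str:
    """Build the class name for an ADATA record type (16-bit value assumed)."""
    if not 0 <= record_type <= 0xFFFF:
        raise ValueError("record type must fit in two bytes")
    # Assumption: four upper-case hex digits, as in the X'0033' example.
    return f"C_ADATA{record_type:04X}"


print(adata_class_name(0x0033))   # C_ADATA0033
```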
TXT records specify the machine code instructions and data to be placed at a specific address location in the module. Note that wherever a "length" must be specified for this record, the length value must include any continuations to this record.
The data length in bytes 22-23 being an unsigned value may be incorrect. According to comments in the GOFF generator part of the LLVM Compiler suite,
A compression table is used if bytes 20-21 of the TXT record are nonzero. The R value gives the number of times to repeat the string; the L value gives the length of the text to be repeated "R" times. This could be used for pre-initializing tables or arrays to blanks or zero, or for any other purpose where it is useful to express repeated data as a repeat count and a value.
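The repeat-count idea can be sketched as below. This is a minimal illustration that takes R, L and the string as already-extracted values; it does not parse the compression table from a TXT record, since the byte layout of that table is not given in the text. The function name is hypothetical.

```python
def expand_repeated(repeat_count: int, length: int, text: bytes) -> bytes:
    """Expand a string of `length` bytes repeated `repeat_count` times."""
    if repeat_count < 0:
        raise ValueError("repeat count must be non-negative")
    if len(text) != length:
        raise ValueError("L does not match the length of the supplied text")
    return text * repeat_count


# Pre-initialize an 80-byte area to EBCDIC blanks (X'40').
blanks = expand_repeated(80, 1, b"\x40")
print(len(blanks))   # 80
```

A repeated multi-byte pattern works the same way: `expand_repeated(3, 2, b"\x00\xFF")` yields six bytes.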
The IDR Table, which is located starting at byte 24 of the TXT record, identifies the compiler or assembler (and its version number) that created this object file.
Note that unlike most number values stored in a GOFF file, the "version", "release" and "trans_date" values are stored as numbers in text characters instead of in binary.
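The difference between the two encodings can be shown with a short Python sketch. It assumes the text digits are EBCDIC and uses Python's built-in `cp037` codec to decode them; the field widths in the example are hypothetical, since the table giving the actual widths is not reproduced here.

```python
def ebcdic_digits_to_int(field: bytes) -> int:
    """Decode a number stored as EBCDIC digit characters (X'F0'..X'F9')."""
    text = field.decode("cp037")
    if not (text.isascii() and text.isdigit()):
        raise ValueError(f"field is not all digits: {text!r}")
    return int(text)


# A number stored as text: the two characters '0' and '1'.
print(ebcdic_digits_to_int(bytes.fromhex("F0F1")))    # 1

# The same kind of value stored in binary (big-endian), for contrast.
print(int.from_bytes(bytes.fromhex("0001"), "big"))   # 1
```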
Normally compilers and assemblers do not generate this format record; it is typically created by the binder.
All text in this item is character data; no binary information is used.
RLD records allow a module to show where it references an address that must be relocated, such as references to specific locations in itself, or to external modules.
[A] If R_Pointer is omitted (bit 0 of byte 0 of the Flags field is 1), this field starts 4 bytes lower, in bytes 8-11.
[B] If R_Pointer or P_Pointer is omitted (for P_Pointer, bit 1 of byte 0 of the Flags field is 1), this field starts 4 bytes lower, in bytes 12-15. If both fields are omitted, this field starts 8 bytes lower, in bytes 8-11.
[C] If any one of R_Pointer, P_Pointer, or Offset is omitted (for Offset, bit 2 of byte 0 of the Flags field is 1), this field starts 4 bytes lower. If any two of them are omitted, this field starts 8 bytes lower. If all of them are omitted, this field starts 12 bytes lower.
To clarify, if a module in a C program named "Basura" were to issue a call to the "exit" function to terminate itself, the R_Pointer address would be the ESDID of the routine "exit" while the P_Pointer would be the ESDID of "Basura". If the address were in the same module (as with internal subroutines, or a reference to data within the same module), R_Pointer and P_Pointer would be the same.
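The offset notes for the RLD fields amount to packing the fields that are present one after another. The Python sketch below illustrates this under stated assumptions: each of the three fields is taken to be 4 bytes long, the first field is taken to start at byte 8 (implied by the note that a field moves down to bytes 8-11 when R_Pointer is omitted), and bit 0 is the most significant bit of the byte, following IBM bit numbering. The function and constant names are hypothetical.

```python
# Flag masks for byte 0 of the Flags field (IBM numbering: bit 0 is the MSB).
R_OMITTED = 0x80        # bit 0: R_Pointer omitted
P_OMITTED = 0x40        # bit 1: P_Pointer omitted
OFFSET_OMITTED = 0x20   # bit 2: Offset omitted

FIRST_FIELD_START = 8   # assumption: R_Pointer normally occupies bytes 8-11
FIELD_SIZE = 4          # assumption: every field here is 4 bytes long


def rld_field_offsets(flags0: int) -> dict:
    """Return the start byte of each field, or None where it is omitted."""
    position = FIRST_FIELD_START
    offsets = {}
    fields = (("R_Pointer", R_OMITTED),
              ("P_Pointer", P_OMITTED),
              ("Offset", OFFSET_OMITTED))
    for name, mask in fields:
        if flags0 & mask:
            offsets[name] = None          # omitted: later fields move down
        else:
            offsets[name] = position
            position += FIELD_SIZE
    offsets["next"] = position            # start of whatever follows Offset
    return offsets


print(rld_field_offsets(0x00))
print(rld_field_offsets(R_OMITTED))
```

With no fields omitted the sketch gives 8, 12, 16 and 20; each omitted field moves every later field 4 bytes lower, matching notes A to C.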
LEN records are used to declare the length of a module where it was not known at the time the ESD record was created, e.g. for one-pass compilers.
A deferred-length element entry cannot be continued or split.
END must be the last record for a module. An 'Entry Point' is used when an address other than the beginning of the module is to be used as the start point for its execution. This is used either because the program has non-executable data appearing before the start of the module (very common for older assembly programmers, as older versions of the assembler were much slower to assemble data stored in programs once instructions were specified), or because the module calls an external module first, such as a run-time library to initialize itself.
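The ordering rules stated in this article (HDR first, END last) can be checked with a small sketch. It assumes a single module per file and takes a list of record type names that has already been extracted; the function name and the wording of the messages are hypothetical.

```python
def check_record_order(record_types):
    """Return a list of ordering problems for one module's records."""
    if not record_types:
        return ["no records"]
    problems = []
    if record_types[0] != "HDR":
        problems.append("first record is not HDR")
    if record_types[-1] != "END":
        problems.append("last record is not END")
    return problems


print(check_record_order(["HDR", "ESD", "TXT", "RLD", "END"]))   # []
print(check_record_order(["ESD", "HDR", "TXT"]))
```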
If an entry-point name specified on a fixed-length END record is longer than 54 bytes, or (if this record itself is also continued) is longer than an additional 77 bytes, the following continuation record is used.
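The 54-byte and 77-byte limits determine how a long entry-point name is spread across records. The following sketch takes those two figures at face value from the text and splits a name accordingly; it does not build actual record images, and the function name is hypothetical.

```python
from typing import List

FIRST_RECORD_CAPACITY = 54    # name bytes that fit on the END record itself
CONTINUATION_CAPACITY = 77    # name bytes carried by each continuation record


def split_entry_name(name: bytes) -> List[bytes]:
    """Split an entry-point name into per-record pieces."""
    pieces = [name[:FIRST_RECORD_CAPACITY]]
    rest = name[FIRST_RECORD_CAPACITY:]
    while rest:
        pieces.append(rest[:CONTINUATION_CAPACITY])
        rest = rest[CONTINUATION_CAPACITY:]
    return pieces


# A 140-byte name needs the END record plus two continuation records.
print([len(p) for p in split_entry_name(b"A" * 140)])   # [54, 77, 9]
```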
1. John R. Ehrman (March 1, 2001). "How the Linkage Editor Works: A Tutorial on Object/Load Modules, Link Editors, Loaders, and What They Do for (and to) You" (PDF). IBM Silicon Valley (Santa Teresa) Laboratory, San Jose. Retrieved September 8, 2019. [permanent dead link] ftp://ftp.boulder.ibm.com/software/websphere/awdtools/hlasm/s8169a.pdf
2. "Appendix C. Generalized object file format (GOFF)". MVS Program Management: Advanced Facilities (PDF). z/OS (eighth ed.). Poughkeepsie, NY: IBM. September 2007. pp. 205–240. SA22-7644-07. Archived from the original (PDF) on October 19, 2021. Retrieved August 9, 2013. https://web.archive.org/web/20211019012559/http://publibz.boulder.ibm.com/epubs/pdf/iea2b270.pdf
3. IBM High Level Assembler for MVS & VM & VSE Release 2 Presentation Guide (PDF). December 1995. SG24-3910-01. Archived from the original (PDF) on January 23, 2016. Retrieved November 13, 2015. https://web.archive.org/web/20160123064644/http://www.redbooks.ibm.com/redbooks/pdfs/sg243910.pdf
4. High Level Assembler for z/OS & z/VM & z/VSE Programmer's Guide (PDF) (sixth ed.). San Jose, CA: IBM. July 2008. Appendix C. SC26-4941-05. Retrieved September 8, 2019. [permanent dead link] http://publibfp.dhe.ibm.com/epubs/pdf/asmp1020.pdf
5. "RLD". www.ibm.com. IBM. August 16, 2013. Retrieved July 10, 2020. https://www.ibm.com/support/knowledgecenter/en/SSLTBW_2.1.0/com.ibm.zos.v2r1.asma100/asmp102141.htm
6. "Lp64 | Ilp32". IBM. https://www.ibm.com/docs/en/zos/2.4.0?topic=options-lp64-ilp32
7. "llvm/BinaryFormat/GOFF.h - GOFF definitions". LLVM.ORG. Retrieved August 16, 2024. https://llvm.org/docs/doxygen/BinaryFormat_2GOFF_8h_source.html