.HTML "Adding Application Support for a New Architecture in Plan 9
.TL
Adding Application Support for a New Architecture in Plan 9
.AU
Bob Flandrena
bobf@plan9.bell-labs.com
.SH
Introduction
.LP
Plan 9 has five classes of architecture-dependent software:
headers, kernels, compilers and loaders, the
.CW libc
system library, and a few application programs.  In general,
architecture-dependent programs
consist of a portable part shared by all architectures and a
processor-specific portion for each supported architecture.
The portable code is often compiled and stored in a library
associated with
each architecture.  A program is built by
compiling the architecture-specific code and loading it with the
library.  Support for a new architecture is provided
by building a compiler for the architecture, using it to
compile the portable code into libraries,
writing the architecture-specific code, and
then loading that code with
the libraries.
.LP
This document describes the organization of the architecture-dependent
code and headers on Plan 9.
The first section briefly discusses the layout of
the headers and the source code for the kernels, compilers, loaders, and the
system library, 
.CW libc .
The second section provides a detailed
discussion of the structure of
.CW libmach ,
a library containing almost
all architecture-dependent code
used by application programs.
The final section describes the steps required to add
application program support for a new architecture.
.SH
Directory Structure
.PP
Architecture-dependent information for the new processor
is stored in the directory tree rooted at \f(CW/\fP\fIm\fP
where
.I m
is the name of the new architecture (e.g.,
.CW mips ).
The new directory should be initialized with several important
subdirectories, notably
.CW bin ,
.CW include ,
and
.CW lib .
The directory tree of an existing architecture
serves as a good model for the new tree.
The architecture-dependent
.CW mkfile
must be stored in the newly created root directory
for the architecture.  It is easiest to copy the
mkfile for an existing architecture and modify
it for the new architecture.  When the mkfile
is correct, change the
.CW OS
and
.CW CPUS
variables in
.CW /sys/src/mkfile.proto
to reflect the addition of the new architecture.
.SH
Headers
.LP
Architecture-dependent headers are stored in directory
.CW /\fIm\fP/include
where
.I m
is the name of the architecture (e.g.,
.CW mips ).
Two header files are required:
.CW u.h 
and
.CW ureg.h .
The first defines fundamental data types,
bit settings for the floating point
status and control registers, and
.CW va_list
processing which depends on the stack
model for the architecture.  This file
is best built by copying and modifying the
.CW u.h
file from an architecture
with a similar stack model.
The
.CW ureg.h
file
contains a structure describing the layout
of the saved register set for
the architecture; it is defined by the kernel.
.LP
Header file
.CW /sys/include/a.out.h
contains the definitions of the magic
numbers used to identify executables for
each architecture.  When support for a new
architecture is added, the magic number
for the architecture must be added to this file.
.LP
The header format of a bootable executable is defined by
each manufacturer.  Header file
.CW /sys/include/bootexec.h
contains structures describing the headers currently
supported.  If the new architecture uses a common header
such as COFF,
the header format is probably already defined,
but if the bootable header format is non-standard,
a structure defining the format must be added to this file.
.LP
.SH
Kernel
.LP
Although the kernel depends critically on the properties of the underlying
hardware, most of the
higher-level kernel functions, including process
management, paging, pseudo-devices, and some
networking code, are independent of processor
architecture.  The portable kernel code
is divided into two parts: that implementing kernel
functions and that devoted to the boot process.
Code in the first class is stored in directory
.CW /sys/src/9/port
and the portable boot code is stored in
.CW /sys/src/9/boot .
Architecture-dependent kernel code is stored in the
subdirectories of
.CW /sys/src/9
named for each architecture.
.LP
The relationship between the kernel code and the boot code
is convoluted and subtle.  The portable boot code
is compiled into a library for each architecture.  An architecture-specific
main program is loaded with the appropriate library and the resulting
executable is compiled into the kernel where it is executed as
a user process during the final stages of kernel initialization.  The boot process
performs authentication, attaches the name space root to the appropriate
file system and starts the
.CW init
process.
.LP
The organization of the portable kernel source code differs from that
of most other architecture-specific code.
Instead of storing the portable code in a library
and loading it with the architecture-specific
code, the portable code is compiled directly into
the directory containing the architecture-specific code
and linked with the object files built from the source in that directory.
.LP
.SH
Compilers and Loaders
.LP
The compiler source code conforms to the usual
organization: portable code is compiled into a library
for each architecture
and the architecture-dependent code is loaded with
that library.
The common compiler code is stored in
.CW /sys/src/cmd/cc .
The
.CW mkfile
in this directory compiles the portable source and
archives the objects in a library for each architecture.
The architecture-specific compiler source
is stored in a subdirectory of
.CW /sys/src/cmd
with the same name as the compiler (e.g.,
.CW /sys/src/cmd/vc ).
.LP
There is no portable code shared by the loaders.
Each directory of loader source
code is self-contained, except for
a header file and an instruction name table
included from the
directory of the associated
compiler.
.LP
.SH
Libraries
.LP
Most C library modules are
portable; the source code is stored in
directories
.CW /sys/src/libc/port
and
.CW /sys/src/libc/9sys .
Architecture-dependent library code
is stored in the subdirectory of
.CW /sys/src/libc
named for the target processor.
Non-portable functions not only
implement architecture-dependent operations
but also supply assembly language implementations
of functions where speed is critical.
Directory
.CW /sys/src/libc/9syscall
is unusual because it
contains architecture-dependent information
for all architectures.
It holds only a header file defining
the names and numbers of system calls
and a
.CW mkfile .
The
.CW mkfile
executes an
.CW rc
script that parses the header file, constructs
assembler language functions implementing the system
call for each architecture, assembles the code,
and archives the object files in
.CW libc .
The assembler language syntax and the system interface
differ for each architecture.
The
.CW rc
script in this
.CW mkfile
must be modified to support a new architecture.
.LP
.SH
Applications
.LP
Application programs process two forms of architecture-dependent
information: executable images and intermediate object files.
Almost all processing is on executable files.
System library
.CW libmach
provides functions that convert
architecture-specific data
to a portable format so application programs
can process this data independent of its
underlying representation.
Further, when a new architecture is implemented
almost all code changes
are confined to the library;
most affected application programs need only be reloaded.
The source code for the library is stored in
.CW /sys/src/libmach .
.LP
An application program running on one type of
processor must be able to interpret
architecture-dependent information for all
supported processors.
For example, a debugger must be able to debug
the executables of
all architectures, not just the
architecture on which it is executing, since
.CW /proc
may be imported from a different machine.
.LP
A small part of the application library
provides functions to
extract symbol references from object files.
The remainder provides the following processing
of executable files or memory images:
.IP \(bu
Header interpretation.
.IP \(bu
Symbol table interpretation.
.IP \(bu
Execution context interpretation, such as stack traces
and stack frame location.
.IP \(bu
Instruction interpretation including disassembly and
instruction size and follow-set calculations.
.IP \(bu
Exception and floating point number interpretation.
.IP \(bu
Architecture-independent read and write access through a
relocation map.
.LP
Header file
.CW /sys/include/mach.h
defines the interfaces to the
application library.  Manual pages
.I mach (2),
.I symbol (2),
and
.I object (2)
describe the details of the
library functions.
.LP
Two data structures, called
.CW Mach
and
.CW Machdata ,
contain architecture-dependent parameters and
a jump table of functions.
Global variables
.CW mach
and
.CW machdata
point to the
.CW Mach
and
.CW Machdata
data structures associated with the target architecture.
An application determines the target architecture of
a file or executable image, sets the global pointers
to the data structures associated with that architecture,
and subsequently performs all references indirectly through the
pointers.
As a result, direct references to the tables for each
architecture are avoided and the application code intrinsically
supports all architectures (though only one at a time).
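.LP
The indirection through the global pointers can be sketched in a few lines of C.
This is a minimal illustration under stated assumptions, not the
.CW libmach
source: the trimmed-down structures, the two toy architectures named
little and big, and the helper functions
.CW selectarch
and
.CW fetch2
are hypothetical, and the code is written in standard C
rather than the Plan 9 dialect.

```c
#include <string.h>

/* Hypothetical, trimmed-down stand-ins for the structures in mach.h. */
typedef struct Mach Mach;
typedef struct Machdata Machdata;

struct Mach {
	const char *name;	/* architecture name */
	int pgsize;		/* page size in bytes (illustrative value) */
};

struct Machdata {
	unsigned short (*swab)(unsigned short);	/* 16-bit target value to host value */
};

/* Interpret the two bytes held in s as a little-endian quantity. */
static unsigned short
leswab(unsigned short s)
{
	unsigned char *p = (unsigned char*)&s;

	return (unsigned short)((p[1]<<8) | p[0]);
}

/* Interpret the two bytes held in s as a big-endian quantity. */
static unsigned short
beswab(unsigned short s)
{
	unsigned char *p = (unsigned char*)&s;

	return (unsigned short)((p[0]<<8) | p[1]);
}

/* One parameter block and one jump table per toy architecture. */
static Mach mlittle = { "little", 4096 };
static Mach mbig = { "big", 4096 };
static Machdata littledata = { leswab };
static Machdata bigdata = { beswab };

/* The global pointers through which all later references are made. */
Mach *mach = &mlittle;
Machdata *machdata = &littledata;

/* Point the globals at the named architecture; return -1 if it is unknown. */
int
selectarch(const char *name)
{
	if(strcmp(name, mlittle.name) == 0){
		mach = &mlittle;
		machdata = &littledata;
		return 0;
	}
	if(strcmp(name, mbig.name) == 0){
		mach = &mbig;
		machdata = &bigdata;
		return 0;
	}
	return -1;
}

/* Portable code: it never names an architecture, only the global pointer. */
unsigned short
fetch2(const unsigned char *buf)
{
	unsigned short s;

	memcpy(&s, buf, 2);
	return machdata->swab(s);
}
```

.LP
Only the selection function names a specific architecture;
the fetch function works unchanged for whichever architecture was selected last,
which is also why only one target can be active at a time.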
.LP
Object file processing is handled similarly: architecture-dependent
functions identify and
decode the intermediate files for the processor.
The application indirectly
invokes a classification function to identify
the architecture of the object code and to select the
appropriate decoding function.  Subsequent calls
then use that function to decode each record.  Again,
the layer of indirection allows the application code
to support all architectures without modification.
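.LP
The classify-then-decode pattern for object files might look roughly like the
following sketch.
Everything in it is made up for illustration: the two toy object formats,
their one-byte signatures, the record layout, and the names
.CW objclassify
and
.CW objdecode
are assumptions, not the interfaces of
.CW libmach .

```c
#include <stddef.h>

/* Hypothetical table entry: one predicate and one decoder per architecture. */
typedef struct Objtab Objtab;
struct Objtab {
	const char *name;
	int (*is)(const unsigned char *buf, size_t n);		/* classification predicate */
	int (*read)(const unsigned char *buf, size_t n);	/* decode one record */
};

/* Toy format A: signature byte 'A', symbol id in byte 1. */
static int isa(const unsigned char *b, size_t n) { return n >= 2 && b[0] == 'A'; }
static int reada(const unsigned char *b, size_t n) { (void)n; return b[1]; }

/* Toy format B: signature byte 'B', symbol id in byte 2. */
static int isb(const unsigned char *b, size_t n) { return n >= 3 && b[0] == 'B'; }
static int readb(const unsigned char *b, size_t n) { (void)n; return b[2]; }

static Objtab objtab[] = {
	{ "toy-a", isa, reada },
	{ "toy-b", isb, readb },
};

/* Decoder chosen by the most recent classification, or NULL if none matched. */
static Objtab *cur;

/* Try each predicate in turn; remember the matching entry and return its index. */
int
objclassify(const unsigned char *buf, size_t n)
{
	size_t i;

	for(i = 0; i < sizeof objtab / sizeof objtab[0]; i++)
		if(objtab[i].is(buf, n)){
			cur = &objtab[i];
			return (int)i;
		}
	cur = NULL;
	return -1;
}

/* Decode a record with whatever decoder classification selected. */
int
objdecode(const unsigned char *buf, size_t n)
{
	if(cur == NULL)
		return -1;
	return cur->read(buf, n);
}
```

.LP
In this sketch, supporting another format means adding one table entry;
the callers of the two public functions do not change.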
.LP
Splitting the architecture-dependent information
between the
.CW Mach
and
.CW Machdata
data structures
allows applications to choose
an appropriate level of service.  Even though an application
does not directly reference the architecture-specific data structures,
it must load the
architecture-dependent tables and code 
for all architectures it supports.  The size of this data
can be substantial and many applications do not require
the full range of architecture-dependent functionality.
For example, the
.CW size
command does not require the disassemblers for every architecture;
it only needs to decode the header.
The
.CW Mach
data structure contains a few architecture-specific parameters
and a description of the processor register set.
The size of the structure
varies with the size of the register
set but is generally small.
The
.CW Machdata
data structure contains
a jump table of architecture-dependent functions;
the amount of code and data referenced by this table
is usually large.
.SH
Libmach Source Code Organization
.LP
The
.CW libmach
library provides four classes of functionality:
.LP
.IP "Header and Symbol Table Decoding\ -\ "
Files
.CW executable.c
and
.CW sym.c
contain code to interpret the header and
symbol tables of
an executable file or executing image.
Function
.CW crackhdr
decodes the header,
reformats the
information into an
.CW Fhdr
data structure, and points
global variable
.CW mach
to the
.CW Mach
data structure of the target architecture.
The symbol table processing
uses the data in the
.CW Fhdr
structure to decode the symbol table.
A variety of symbol table access functions then support
queries on the reformatted table.
.IP "Debugger Support\ -\ "
Files named
.CW \fIm\fP.c ,
where
.I m
is the code letter assigned to the architecture,
contain the initialized
.CW Mach
data structure and the definition of the register
set for each architecture.
Architecture-specific debugger support functions and
an initialized
.CW Machdata
structure are stored in
files named
.CW \fIm\fPdb.c .
Files
.CW machdata.c 
and
.CW setmach.c
contain debugger support functions shared
by multiple architectures.
.IP "Architecture-Independent Access\ -\ "
Files
.CW map.c ,
.CW access.c ,
and
.CW swap.c
provide accesses through a relocation map
to data in an executable file or executing image.
Byte-swapping is performed as needed.  Global variables
.CW mach
and
.CW machdata
must point to the
.CW Mach
and
.CW Machdata
data structures of the target architecture.
.IP "Object File Interpretation\ -\ "
These files contain functions to identify the
target architecture of an
intermediate object file
and extract references to symbols.  File
.CW obj.c
contains code common to all architectures;
file
.CW \fIm\fPobj.c
contains the architecture-specific source code
for the machine with code character
.I m .
.LP
The
.CW Machdata
data structure is primarily a jump
table of architecture-dependent debugger support
functions. Functions select the
.CW Machdata
structure for a target architecture based
on the value of the
.CW type
code in the
.CW Fhdr
structure or the name of the architecture.
The jump table provides functions to swap bytes, interpret
machine instructions,
perform stack
traces, find stack frames, format floating point
numbers, and decode machine exceptions.  Some functions, such as
machine exception decoding, are idiosyncratic and must be
supplied for each architecture.  Others depend
on the compiler run-time model and several
architectures may share code common to a model.  For
example, many architectures share the code to
process the fixed-frame stack model implemented by
several of the compilers.
Finally, some
functions, such as byte-swapping, provide a general capability and
the jump table need only select an implementation appropriate
to the architecture.
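.LP
A rough sketch of how such a jump table can be selected by type code or by
name, and how several architectures can share one generic routine, follows.
The type codes, the three toy architectures, the two-slot table, and the lookup
functions
.CW bytype
and
.CW byname
are hypothetical; the real structure has many more slots and the routines here
only return descriptive strings.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical executable type codes; the real ones are defined in mach.h. */
enum { FTOYA = 1, FTOYB, FTOYC };

typedef struct Machdata Machdata;
struct Machdata {
	const char *(*ctrace)(void);	/* stack trace for the run-time model */
	const char *(*excep)(void);	/* exception decoder */
};

/* Generic routines, shared by every architecture with that stack model. */
static const char *fixedtrace(void) { return "fixed-frame trace"; }
static const char *reltrace(void) { return "relative-frame trace"; }

/* Idiosyncratic routines, supplied once per architecture. */
static const char *aexcep(void) { return "toy-a exception"; }
static const char *bexcep(void) { return "toy-b exception"; }
static const char *cexcep(void) { return "toy-c exception"; }

static Machdata adata = { fixedtrace, aexcep };
static Machdata bdata = { fixedtrace, bexcep };	/* shares fixedtrace with toy-a */
static Machdata cdata = { reltrace, cexcep };

/* Binds a name and a type code to the jump table of an architecture. */
typedef struct Machtab Machtab;
struct Machtab {
	const char *name;
	int type;
	Machdata *md;
};

static Machtab machtab[] = {
	{ "toy-a", FTOYA, &adata },
	{ "toy-b", FTOYB, &bdata },
	{ "toy-c", FTOYC, &cdata },
};

/* Select a jump table from the type code found in a decoded header. */
Machdata*
bytype(int type)
{
	size_t i;

	for(i = 0; i < sizeof machtab / sizeof machtab[0]; i++)
		if(machtab[i].type == type)
			return machtab[i].md;
	return NULL;
}

/* Select a jump table from the name of the architecture. */
Machdata*
byname(const char *name)
{
	size_t i;

	for(i = 0; i < sizeof machtab / sizeof machtab[0]; i++)
		if(strcmp(machtab[i].name, name) == 0)
			return machtab[i].md;
	return NULL;
}
```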
.LP
.SH
Adding Application Support for a New Architecture
.LP
This section describes the
steps required to add application-level
support for a new architecture.
We assume
the kernel, compilers, loaders and system libraries
for the new architecture are already in place.  This
implies that a code-character has been assigned and
that the architecture-specific headers have been
updated.
With the exception of two programs,
application-level changes are confined to header
files and the source code in
.CW /sys/src/libmach .
.LP
.IP 1.
Begin by updating the application library
header file in
.CW /sys/include/mach.h .
Add the following symbolic codes to the
.CW enum
statement near the beginning of the file:
.RS
.IP \(bu
The processor type code, e.g., 
.CW MSPARC .
.IP \(bu
The type of the executable.  There are usually
two codes needed: one for a bootable
executable (i.e., a kernel) and one for an
application executable.
.IP \(bu
The disassembler type code.  Add one entry for
each supported disassembler for the architecture.
.IP \(bu
A symbolic code for the object file.
.RE
.LP
.IP 2.
In a file named
.CW /sys/src/libmach/\fIm\fP.c
(where
.I m
is the identifier character assigned to the architecture),
initialize
.CW Reglist
and
.CW Mach
data structures with values defining
the register set and various system parameters.
The source file for a similar architecture
can serve as a template.
Most of the fields of the
.CW Mach
data structure are obvious
but a few require further explanation.
.RS
.IP "\f(CWkbase\fP\ -\ "
This field
contains the address of the kernel 
.CW ublock .
The debuggers
assume the first entry of the kernel
.CW ublock
points to the
.CW Proc
structure for a kernel thread.
.IP "\f(CWktmask\fP\ -\ "
This field
is a bit mask used to calculate the kernel text address from
the kernel 
.CW ublock
address.
The first page of the
kernel text segment is calculated by
ANDing
the negation of this mask with
.CW kbase .
.IP "\f(CWkspoff\fP\ -\ "
This field
contains the byte offset in the
.CW Proc
data structure to the saved kernel
stack pointer for a suspended kernel thread.  This
is the offset to the 
.CW sched.sp
field of a
.CW Proc
table entry.
.IP "\f(CWkpcoff\fP\ -\ "
This field contains the byte offset into the
.CW Proc
data structure
of
the program counter of a suspended kernel thread.
This is the offset to
field
.CW sched.pc
in that structure.
.IP "\f(CWkspdelta\fP and \f(CWkpcdelta\fP\ -\ "
These fields
contain corrections to be added to
the stack pointer and program counter, respectively,
to properly locate the stack and next
instruction of a kernel thread.  These
values bias the saved registers retrieved
from the
.CW Label
structure named
.CW sched
in the
.CW Proc
data structure.
Most architectures require no bias
and these fields contain zeros.
.IP "\f(CWscalloff\fP\ -\ "
This field
contains the byte offset of the
.CW scallnr
field in the
.CW ublock
data structure associated with a process.
The
.CW scallnr
field contains the number of the
last system call executed by the process.
The location of the field varies depending on
the size of the floating point register set
which precedes it in the
.CW ublock .
.RE
.LP
.IP 3.
Add an entry to the initialization of the
.CW ExecTable
data structure at the beginning of file
.CW /sys/src/libmach/executable.c .
Most architectures
require two entries: one for
a normal executable and
one for a bootable
image.  Each table entry contains:
.RS
.IP \(bu
Magic Number\ \-\ 
The big-endian magic number assigned to the architecture in
.CW /sys/include/a.out.h .
.IP \(bu
Name\ \-\ 
A string describing the executable.
.IP \(bu
Executable type code\ \-\ 
The executable code assigned in
.CW /sys/include/mach.h .
.IP \(bu
\f(CWMach\fP pointer\ \-\ 
The address of the initialized
.CW Mach
data structure constructed in step 2.
You must also add the name of this structure to the
list of
.CW Mach
table definitions immediately preceding the
.CW ExecTable
initialization.
.IP \(bu
Header size\ \-\ 
The number of bytes in the executable file header.
The size of a normal executable header is always
.CW sizeof(Exec) .
The size of a bootable header is
determined by the size of the structure
for the architecture defined in
.CW /sys/include/bootexec.h .
.IP \(bu
Byte-swapping function\ \-\ 
The address of
.CW beswal
or
.CW leswal
for big-endian and little-endian
architectures, respectively.
.IP \(bu
Decoder function\ -\ 
The address of a function to decode the header.
Function
.CW adotout
decodes the common header shared by all normal
(i.e., non-bootable) executable files.
The header format of bootable
executable files is defined by the manufacturer and
a custom function is almost always
required to decode it.
Header file
.CW /sys/include/bootexec.h
contains data structures defining the bootable
headers for all architectures.  If the new architecture
uses an existing format, the appropriate
decoding function should already be in
.CW executable.c .
If the header format is unique, then
a new function must be added to this file.
Usually the decoding function for an existing
architecture can be adopted with minor modifications.
.RE
.LP
.IP 4.
Write an object file parser and
store it in file
.CW /sys/src/libmach/\fIm\fPobj.c
where
.I m
is the identifier character assigned to the architecture.
Two functions are required: a predicate to identify an
object file for the architecture and a function to extract
symbol references from the object code.
The object code format is obscure but
it is often possible to adopt the
code of an existing architecture
with minor modifications.
When these
functions are in hand, insert their addresses
in the jump table at the beginning of file
.CW /sys/src/libmach/obj.c .
.LP
.IP 5.
Implement the required debugger support functions and
initialize the parameters and jump table of the
.CW Machdata
data structure for the architecture.
This code is conventionally stored in
a file named
.CW /sys/src/libmach/\fIm\fPdb.c
where
.I m
is the identifier character assigned to the architecture.
The fields of the
.CW Machdata
structure are:
.RS
.IP "\f(CWbpinst\fP and \f(CWbpsize\fP\ -\ "
These fields
contain the breakpoint instruction and the size
of the instruction, respectively.
.IP "\f(CWswab\fP\ -\ "
This field
contains the address of a function to
byte-swap a 16-bit value.  Choose
.CW leswab
or
.CW beswab
for little-endian or big-endian architectures, respectively.
.IP "\f(CWswal\fP\ -\ "
This field
contains the address of a function to
byte-swap a 32-bit value.  Choose
.CW leswal
or
.CW beswal
for little-endian or big-endian architectures, respectively.
.IP "\f(CWctrace\fP\ -\ "
This field
contains the address of a function to perform a
C-language stack trace.  Two general trace functions,
.CW risctrace
and
.CW cisctrace ,
traverse fixed-frame and relative-frame stacks,
respectively.  If the compiler for the
new architecture conforms to one of
these models, select the appropriate function.  If the
stack model is unique,
supply a custom stack trace function.
.IP "\f(CWfindframe\fP\ -\ "
This field
contains the address of a function to locate the stack
frame associated with a text address.
Generic functions
.CW riscframe
and
.CW ciscframe
process fixed-frame and relative-frame stack
models.
.IP "\f(CWufixup\fP\ -\ "
This field
contains the address of a function to adjust
the base address of the register save area.
Currently, only the
68020 requires this bias
to offset over the active
exception frame.
.IP "\f(CWexcep\fP\ -\ "
This field
contains the address of a function to produce a
text
string describing the
current exception.
Each architecture stores exception
information uniquely, so this code must always be supplied.
.IP "\f(CWbpfix\fP\ -\ "
This field
contains the address of a function to adjust an
address prior to laying down a breakpoint.
.IP "\f(CWsftos\fP\ -\ "
This field
contains the address of a function to convert a single
precision floating point value
to a string.  Choose
.CW leieeesftos
for little-endian
or
.CW beieeesftos
for big-endian architectures.
.IP "\f(CWdftos\fP\ -\ "
This field
contains the address of a function to convert a double
precision floating point value
to a string.  Choose
.CW leieeedftos
for little-endian
or
.CW beieeedftos
for big-endian architectures.
.IP "\f(CWfoll\fP, \f(CWdas\fP, \f(CWhexinst\fP, and \f(CWinstsize\fP\ -\ "
These fields point to functions that interpret machine
instructions.
They rely on disassembly of the instruction
and are unique to each architecture.
.CW Foll
calculates the follow set of an instruction.
.CW Das
disassembles a machine instruction to assembly language.
.CW Hexinst
formats a machine instruction as a text
string of
hexadecimal digits.
.CW Instsize
calculates the size, in bytes, of an instruction.
Once the disassembler is written, the other functions
can usually be implemented as trivial extensions of it.
.LP
It is possible to provide support for a new architecture
incrementally by filling the jump table entries
of the
.CW Machdata
structure as code is written.  In general, if
a jump table entry contains a zero, application
programs requiring that function will issue an
error message instead of attempting to
call the function.  For example,
the
.CW foll ,
.CW das ,
.CW hexinst ,
and
.CW instsize
jump table slots can be zeroed until a
disassembler is written.
Other capabilities, such as
stack trace or variable inspection,
can be supplied and will be available to
the debuggers but attempts to use the
disassembler will result in an error message.
.RE
.IP 6.
Update the table named
.CW machines
near the beginning of
.CW /sys/src/libmach/setmach.c .
This table binds the
file type code and machine name to the
.CW Mach
and
.CW Machdata
structures of an architecture.
The names of the initialized
.CW Mach
and
.CW Machdata
structures built in steps 2 and 5
must be added to the list of
structure definitions immediately
preceding the table initialization.
If both Plan 9 and
native disassembly are supported, add
an entry for each disassembler to the table.  The
entry for the default disassembler (usually
Plan 9) must be first.
.IP 7.
Add an entry describing the architecture to
the table named
.CW trans
near the end of
.CW /sys/src/cmd/prof.c .
.IP 8.
Add an entry describing the architecture to
the table named
.CW objtype
near the start of
.CW /sys/src/cmd/pcc.c .
.IP 9.
Recompile and install
all application programs that include header file
.CW mach.h
and load with
.CW libmach.a .
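.LP
The table lookup that step 3 extends can be sketched as follows.
This is an illustration under stated assumptions rather than the code in
.CW executable.c :
the magic numbers, type codes, and header sizes are invented, the entry omits
the pointer, byte-swapping, and decoder fields, and the function
.CW toycrack
is a hypothetical stand-in for the real header decoder.

```c
#include <stddef.h>

/* Hypothetical type codes; real ones are added to mach.h in step 1. */
enum { FTOYA = 1, FTOYBOOT };

/* Invented magic numbers, the ASCII bytes TOYA and TOYB read big-endian. */
#define TOYA_MAGIC	0x544f5941UL
#define TOYB_MAGIC	0x544f5942UL

/* A reduced table entry: magic number, name, type code, header size. */
typedef struct ExecTable ExecTable;
struct ExecTable {
	unsigned long magic;	/* big-endian magic number */
	const char *name;	/* description of the executable */
	int type;		/* executable type code */
	size_t hdrsz;		/* bytes in the file header */
};

static ExecTable exectab[] = {
	{ TOYA_MAGIC, "toy executable", FTOYA, 32 },
	{ TOYB_MAGIC, "toy boot image", FTOYBOOT, 16 },
};

/* Assemble a 32-bit big-endian value from four bytes, whatever the host order. */
static unsigned long
be4(const unsigned char *p)
{
	return ((unsigned long)p[0]<<24) | ((unsigned long)p[1]<<16)
		| ((unsigned long)p[2]<<8) | (unsigned long)p[3];
}

/*
 * Find the entry whose magic number matches the first four bytes of hdr.
 * Return NULL if nothing matches or if fewer than hdrsz bytes are available.
 */
const ExecTable*
toycrack(const unsigned char *hdr, size_t n)
{
	unsigned long magic;
	size_t i;

	if(n < 4)
		return NULL;
	magic = be4(hdr);
	for(i = 0; i < sizeof exectab / sizeof exectab[0]; i++)
		if(exectab[i].magic == magic && n >= exectab[i].hdrsz)
			return &exectab[i];
	return NULL;
}
```

.LP
In this sketch a new architecture costs one or two table entries,
one for the normal executable and one for the bootable image,
and the lookup itself is untouched.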