ARM64 Procedure Call Standard (ARM64 程序調用標準)
- 1 Machine Registers
- 1.1 General-purpose Registers
- 1.2 SIMD and Floating-Point Registers
- 2 Processes, Memory and the Stack
- 2.1 Memory Addresses
- 2.2 The Stack
- 2.2.1 Universal stack constraints
- 2.2.2 Stack constraints at a public interface
- 2.3 The Frame Pointer
- 3 Subroutine Calls
- 3.1 Use of IP0 and IP1 by the linker
- 4 Parameter Passing
- 4.1 Variadic Subroutines
- 4.2 Parameter Passing Rules
- 5 Result Return
- 6 Interworking
The base standard defines a machine-level calling standard for the A64 instruction set. It assumes the availability of the vector registers for passing floating-point and SIMD arguments. Application code is expected to conform to one of the two defined major variants of it (SVR4-like or Windows-like).
基礎標準定義了 A64 指令集的machine-level 調用標準。它假定了用于傳遞浮點和 SIMD 參數的矢量寄存器的可用性。應用代碼應符合其兩個主要變體(類 SVR4 或類 Windows)中的一個。
note:
SVR4: System V Release 4. A variant of the Unix Operating System. Although this specification refers to SVR4, many other operating systems, such as Linux or BSD, use similar rules.
SVR4:System V Release 4。Unix 操作系統的一種變體。雖然本規范指的是 SVR4,但許多其他操作系統(如 Linux 或 BSD)也使用類似的規則。
1 Machine Registers
The ARM 64-bit architecture defines two mandatory register banks: a general-purpose register bank which can be used for scalar integer processing and pointer arithmetic; and a SIMD and Floating-Point register bank.
ARM 64 位體系結構定義了兩個強制性寄存器組:一個通用寄存器組,可用于標量整數處理和指針運算;另一個 SIMD 和浮點寄存器組。
1.1 General-purpose Registers
There are thirty-one, 64-bit, general-purpose (integer) registers visible to the A64 instruction set; these are labeled r0-r30. In a 64-bit context these registers are normally referred to using the names x0-x30; in a 32-bit context the registers are specified by using w0-w30. Additionally, a stack-pointer register, SP, can be used with a restricted number of instructions. Register names may appear in assembly language in either upper case or lower case. In this specification upper case is used when the register has a fixed role in this procedure call standard. Table 2, General purpose registers and AAPCS64 usage summarizes the uses of the general-purpose registers in this standard. In addition to the general-purpose registers there is one status register (NZCV) that may be set and read by conforming code.
A64 指令集有 31 個 64 位通用(整數)寄存器,分別標為 r0-r30。在 64 位環境下,這些寄存器通常使用 x0-x30 的名稱;在 32 位環境下,這些寄存器使用 w0-w30 的名稱。此外,堆棧指針寄存器 SP 可用于數量有限的指令。寄存器名稱在匯編語言中可以大寫或小寫出現。在本規范中,當寄存器在程序調用標準中具有固定作用時,則使用大寫。表 2(通用寄存器和 AAPCS64 的使用)概述了本標準中通用寄存器的用途。除通用寄存器外,還有一個狀態寄存器(NZCV)可由符合標準的代碼設置和讀取。
The first eight registers, r0-r7, are used to pass argument values into a subroutine and to return result values from a function. They may also be used to hold intermediate values within a routine (but, in general, only between subroutine calls).
前 8 個寄存器(r0-r7)用于向子程序傳遞參數值和從函數返回結果值。它們也可用于保存例程中的中間值(但一般只在子例程調用之間使用)。
Registers r16 (IP0) and r17 (IP1) may be used by a linker as scratch registers between a routine and any subroutine it calls (for details, see §3.1, Use of IP0 and IP1 by the linker). They can also be used within a routine to hold intermediate values between subroutine calls.
寄存器 r16(IP0)和 r17(IP1)可以被鏈接器用作例程和它調用的任何子例程之間的 scratch 寄存器(詳見章節 3.1,鏈接器對 IP0 和 IP1 的使用)。它們也可以在例程中使用,在子程序調用之間保存中間值。
The role of register r18 is platform specific. If a platform ABI has need of a dedicated general purpose register to carry inter-procedural state (for example, the thread context) then it should use this register for that purpose. If the platform ABI has no such requirements, then it should use r18 as an additional temporary register. The platform ABI specification must document the usage for this register.
寄存器 r18 的作用與平臺有關。如果平臺 ABI 需要一個專用的通用寄存器來承載程序間狀態(例如線程上下文),那么就應該使用該寄存器。如果平臺 ABI 沒有此類要求,則應使用 r18 作為額外的臨時寄存器。平臺 ABI 規范必須記錄該寄存器的用途。
Note: Software developers creating platform-independent code are advised to avoid using r18 if at all possible. Most compilers provide a mechanism to prevent specific registers from being used for general allocation; portable hand-coded assembler should avoid it entirely. It should not be assumed that treating the register as callee-saved will be sufficient to satisfy the requirements of the platform. Virtualization code must, of course, treat the register as it would any other resource provided to the virtual machine.
注: 建議創建平臺獨立代碼的軟件開發人員盡可能避免使用 r18。大多數編譯器都提供了防止將特定寄存器用于通用分配的機制;可移植的手工編碼匯編程序應完全避免使用 r18。不要以為將該寄存器視為“被調用者保存的”(callee-saved)寄存器就能滿足平臺的要求。當然,虛擬化代碼必須像對待提供給虛擬機的其他資源一樣對待該寄存器。
A subroutine invocation must preserve the contents of the registers r19-r29 and SP.
子程序調用必須保留寄存器 r19-r29 和 SP 的內容。
In all variants of the procedure call standard, registers r16, r17, r29 and r30 have special roles. In these roles they are labeled IP0, IP1, FP and LR when being used for holding addresses (that is, the special name implies accessing the register as a 64-bit entity).
在程序調用標準的所有變體中,寄存器 r16、r17、r29 和 r30 具有特殊作用。當寄存器用于保存地址時,它們被標記為 IP0、IP1、FP 和 LR(也就是說,特殊名稱意味著將寄存器作為 64 位實體訪問)。
Note: The special register names (IP0, IP1, FP and LR) should be used only in the context in which they are special. It is recommended that disassemblers always use the architectural names for the registers.
注意: 特殊寄存器名稱(IP0、IP1、FP 和 LR)只能在其具有特殊作用的上下文中使用。建議反匯編程序始終使用寄存器的體系結構名稱。
The NZCV register is a global condition flag register with the following properties:
NZCV 寄存器是全局條件標志寄存器,具有以下特性:
- The N, Z, C and V flags are undefined on entry to and return from a public interface.
- 在進入公共接口和從公共接口返回時,N、Z、C 和 V 標志都是未定義的。
1.2 SIMD and Floating-Point Registers
The ARM 64-bit architecture also has a further thirty-two registers, v0-v31, which can be used by SIMD and Floating-Point operations. The precise name of the register will change indicating the size of the access.
ARM 64 位架構還有另外 32 個寄存器 v0-v31,可供 SIMD 和浮點運算使用。寄存器的精確名稱將發生變化,指示訪問的大小。
Note: Unlike in AArch32, in AArch64 the 128-bit and 64-bit views of a SIMD and Floating-Point register do not overlap multiple registers in a narrower view, so q1, d1 and s1 all refer to the same entry in the register bank.
注意: 與 AArch32 不同,在 AArch64 中,SIMD 和浮點寄存器的 128 位和 64 位視圖不會與較窄視圖中的多個寄存器重疊,因此 q1、d1 和 s1 都指向寄存器庫中的同一個條目。
The first eight registers, v0-v7, are used to pass argument values into a subroutine and to return result values from a function. They may also be used to hold intermediate values within a routine (but, in general, only between subroutine calls).
前 8 個寄存器(v0-v7)用于向子程序傳遞參數值和從函數返回結果值。這些寄存器還可用于保存例程中的中間值(但一般只在子例程調用之間使用)。
Registers v8-v15 must be preserved by a callee across subroutine calls; the remaining registers (v0-v7, v16-v31) do not need to be preserved (or should be preserved by the caller). Additionally, only the bottom 64-bits of each value stored in v8-v15 need to be preserved; it is the responsibility of the caller to preserve larger values.
子程序調用時,被調用者必須保留寄存器 v8-v15;其余寄存器(v0-v7、v16-v31)無需保留(或應由調用者保留)。此外,只有 v8-v15 中存儲的每個值的底部 64 位才需要保留;調用者有責任保留更大的值。
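The preservation rules stated so far can be condensed into a few predicates. The following is a minimal sketch in C; the function names are hypothetical and exist only for illustration. It encodes only what the text above states: r19-r29 are preserved across a call, and v8-v15 are preserved by the callee only in their bottom 64 bits.

```c
#include <stdbool.h>

/* Hypothetical helper: true if general-purpose register r<n> must be
 * preserved across a subroutine invocation (r19-r29). SP is also
 * preserved but is not part of the r0-r30 numbering. */
bool gpr_is_callee_saved(unsigned n)
{
    return n >= 19 && n <= 29;
}

/* Hypothetical helper: true if SIMD/FP register v<n> must be preserved
 * by the callee (v8-v15). */
bool vreg_is_callee_saved(unsigned n)
{
    return n >= 8 && n <= 15;
}

/* Number of low-order bits of v<n> the callee must preserve: 64 for
 * v8-v15, 0 otherwise. A caller that keeps a full 128-bit value in
 * v8-v15 across a call has to save the upper half itself. */
unsigned vreg_preserved_bits(unsigned n)
{
    return vreg_is_callee_saved(n) ? 64u : 0u;
}
```

For example, `vreg_preserved_bits(8)` yields 64, which is why a 128-bit vector kept live in v8 across a call still needs caller-side spilling.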
The FPSR is a status register that holds the cumulative exception bits of the floating-point unit. It contains the fields IDC, IXC, UFC, OFC, DZC, IOC and QC. These fields are not preserved across a public interface and may have any value on entry to a subroutine.
FPSR 是一個狀態寄存器,保存浮點單元的累積異常位。它包含字段IDC、IXC、UFC、OFC、DZC、IOC和QC。這些字段不會在公共接口中保留,并且在進入子例程時可能具有任何值。
The FPCR is used to control the behavior of the floating-point unit. It is a global register with the following properties.
FPCR 用于控制浮點運算單元的行為。它是一個全局寄存器,具有以下特性。
- The exception-control bits (8-12), rounding mode bits (22-23) and flush-to-zero bit (24) may be modified by calls to specific support functions that affect the global state of the application.
- 異常控制位(8-12)、舍入模式位(22-23)和清零(flush-to-zero)位(24)可通過調用特定的支持函數進行修改,從而影響應用程序的全局狀態。
- All other bits are reserved and must not be modified. It is not defined whether the bits read as zero or one, or whether they are preserved across a public interface.
- 所有其他位均為保留位,不得修改。至于這些位讀作 0 還是 1,或者是否在公共接口上保留,均未作規定。
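The FPCR bit ranges listed above translate directly into masks. The sketch below assumes the FPCR value has already been read into a 64-bit integer by some platform-specific means; the macro and function names are hypothetical. It shows how an update can be restricted to the modifiable fields so that the reserved bits are left untouched.

```c
#include <stdint.h>

/* Masks for the FPCR fields named in the text (names are illustrative). */
#define FPCR_EXC_CTRL_MASK   (UINT64_C(0x1F) << 8)   /* bits 8-12  */
#define FPCR_RMODE_MASK      (UINT64_C(0x3)  << 22)  /* bits 22-23 */
#define FPCR_FZ_MASK         (UINT64_C(0x1)  << 24)  /* bit 24     */
#define FPCR_MODIFIABLE_MASK \
    (FPCR_EXC_CTRL_MASK | FPCR_RMODE_MASK | FPCR_FZ_MASK)

/* Extract the raw two-bit rounding mode field. */
unsigned fpcr_rounding_mode(uint64_t fpcr)
{
    return (unsigned)((fpcr & FPCR_RMODE_MASK) >> 22);
}

/* Combine an old FPCR value with a requested one, taking only the
 * modifiable fields from the request and keeping every reserved bit
 * exactly as it was read. */
uint64_t fpcr_update(uint64_t old_value, uint64_t requested)
{
    return (old_value & ~FPCR_MODIFIABLE_MASK) |
           (requested & FPCR_MODIFIABLE_MASK);
}
```

Masking in `fpcr_update` is one simple way to honor the "must not be modified" rule for reserved bits when building a support function.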
2 Processes, Memory and the Stack
The AAPCS64 applies to a single thread of execution or process (hereafter referred to as a process). A process has a program state defined by the underlying machine registers and the contents of the memory it can access. The memory a process can access, without causing a run-time fault, may vary during the execution of the process.
AAPCS64 適用于單線程執行或進程(以下簡稱進程)。進程的程序狀態由底層機器寄存器及其可訪問的內存內容定義。在進程執行過程中,在不導致運行時故障的情況下,進程可訪問的內存可能會發生變化。
The memory of a process can normally be classified into five categories:
進程的內存通常可以分為五類:
- code (the program being executed), which must be readable, but need not be writable, by the process.
- 代碼(正在執行的程序),進程必須可以讀取,但不一定可以寫入。
- read-only static data.
- 只讀靜態數據。
- writable static data.
- 可寫靜態數據。
- the heap.
- 堆。
- the stack.
- 棧。
Writable static data may be further sub-divided into initialized, zero-initialized and uninitialized data. Except for the stack there is no requirement for each class of memory to occupy a single contiguous region of memory. A process must always have some code and a stack, but need not have any of the other categories of memory.
可寫靜態數據可以進一步細分為已初始化數據、零初始化數據和未初始化數據。除了棧之外,不需要每一類內存都占用單個連續的內存區域。進程必須始終具有一些代碼和棧,但不需要任何其他類別的內存。
The heap is an area (or areas) of memory that are managed by the process itself (for example, with the C malloc function). It is typically used for the creation of dynamic data objects.
堆是由進程自身管理的內存區域(如使用 C 語言的 malloc 函數)。它通常用于創建動態數據對象。
A conforming program must only execute instructions that are in areas of memory designated to contain code.
符合要求的程序必須只執行指定包含代碼的內存區域內的指令。
2.1 Memory Addresses
The address space may consist of one or more disjoint regions. No region may span address zero (although one region may start at zero).
地址空間可由一個或多個不相連的區域組成。任何區域都不得跨越零地址(盡管一個區域可以從零開始)。
The use of tagged addressing is platform specific. When tagged addressing is disabled all 64 bits of a pointer are passed to the address translation system. When tagged addressing is enabled, the top eight bits of a pointer are ignored for the purposes of address translation.
標記尋址的使用與平臺有關。禁用標記尋址時,指針的所有 64 位都將傳遞給地址轉換系統。啟用標記尋址后,指針的前八位在地址轉換時將被忽略。
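When tagged addressing is enabled, the top eight bits of a pointer can carry a software-defined tag. The following sketch models that split on plain 64-bit integers; the helper names are hypothetical. As a simplification it clears the tag byte to zero, which matches low (user-space style) addresses only and is an assumption of this example rather than something the text specifies.

```c
#include <stdint.h>

/* Low 56 bits: the part of the pointer used for address translation
 * when tagged addressing is enabled. */
#define PTR_ADDR_MASK UINT64_C(0x00FFFFFFFFFFFFFF)

/* Return the tag held in bits 63:56. */
uint8_t ptr_tag(uint64_t p)
{
    return (uint8_t)(p >> 56);
}

/* Return the pointer value with the tag byte cleared. */
uint64_t ptr_strip_tag(uint64_t p)
{
    return p & PTR_ADDR_MASK;
}

/* Return the pointer value with its tag byte replaced by `tag`. */
uint64_t ptr_with_tag(uint64_t p, uint8_t tag)
{
    return (p & PTR_ADDR_MASK) | ((uint64_t)tag << 56);
}
```

With tagging disabled, all 64 bits reach the translation system, so a value produced by `ptr_with_tag` with a non-zero tag would not name the same location as the untagged pointer.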
2.2 The Stack
The stack is a contiguous area of memory that may be used for storage of local variables and for passing additional arguments to subroutines when there are insufficient argument registers available.
堆棧是一個連續的內存區域,可用于存儲局部變量以及在沒有足夠的參數寄存器可用時將附加參數傳遞給子例程。
The stack implementation is full-descending, with the current extent of the stack held in the special-purpose register SP. The stack will, in general, have both a base and a limit though in practice an application may not be able to determine the value of either.
堆棧實現是全遞減的,堆棧的當前范圍保存在專用寄存器 SP 中。一般來說,堆棧將具有基數和限制,但實際上應用程序可能無法確定其中任何一個的值。
The stack may have a fixed size or be dynamically extendable (by adjusting the stack-limit downwards).
堆棧可以具有固定大小,也可以動態擴展(通過向下調整堆棧限制)。
The rules for maintenance of the stack are divided into two parts: a set of constraints that must be observed at all times, and an additional constraint that must be observed at a public interface.
堆棧的維護規則分為兩部分:一組必須始終遵守的約束,以及必須在公共接口處遵守的附加約束。
2.2.1 Universal stack constraints
At all times the following basic constraints must hold:
任何時候都必須遵守以下基本約束:
- Stack-limit < SP <= stack-base. The stack pointer must lie within the extent of the stack.
- Stack-limit < SP <= stack-base。堆棧指針必須位于堆棧范圍內。
- A process may only access (for reading or writing) the closed interval of the entire stack delimited by [SP, stack-base – 1].
- 進程只能訪問(用于讀取或寫入)由 [SP, stack-base – 1] 界定的整個堆棧的閉區間。
Additionally, at any point at which memory is accessed via SP, the hardware requires that
此外,在通過 SP 訪問內存的任何時刻,硬件都要求
- SP mod 16 = 0. The stack must be quad-word aligned.
- SP mod 16 = 0。堆棧必須四字對齊。
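The universal constraints can be written as small checks. This is a minimal sketch in C that treats the stack limit and base as known numbers, which, as noted earlier, a real application may not be able to determine; the `stack_extent` type and all function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical description of a stack: limit is the lowest bound,
 * base is the highest (the stack grows downwards from base). */
typedef struct {
    uint64_t limit;
    uint64_t base;
} stack_extent;

/* Stack-limit < SP <= stack-base */
bool sp_in_extent(const stack_extent *s, uint64_t sp)
{
    return s->limit < sp && sp <= s->base;
}

/* SP mod 16 = 0 */
bool sp_is_aligned(uint64_t sp)
{
    return (sp % 16) == 0;
}

/* An access of `size` bytes at `addr` must fall entirely inside the
 * closed interval [SP, stack-base - 1]. */
bool stack_access_allowed(const stack_extent *s, uint64_t sp,
                          uint64_t addr, uint64_t size)
{
    if (size == 0 || !sp_in_extent(s, sp))
        return false;
    return addr >= sp && addr < s->base && size <= s->base - addr;
}

/* Reserve at least `bytes` below `sp`, rounding down so the new SP
 * stays 16-byte aligned. */
uint64_t sp_reserve(uint64_t sp, uint64_t bytes)
{
    return (sp - bytes) & ~UINT64_C(15);
}
```

Rounding down in `sp_reserve` is why a function that needs 24 bytes of locals typically moves SP by 32.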
2.2.2 Stack constraints at a public interface
The stack must also conform to the following constraint at a public interface:
堆棧還必須在公共接口處符合以下約束:
- SP mod 16 = 0. The stack must be quad-word aligned.
- SP mod 16 = 0。堆棧必須四字對齊。
2.3 The Frame Pointer
Conforming code shall construct a linked list of stack-frames. Each frame shall link to the frame of its caller by means of a frame record of two 64-bit values on the stack. The frame record for the innermost frame (belonging to the most recent routine invocation) shall be pointed to by the Frame Pointer register (FP). The lowest addressed double-word shall point to the previous frame record and the highest addressed double-word shall contain the value passed in LR on entry to the current function. The end of the frame record chain is indicated by the address zero in the address for the previous frame. The location of the frame record within a stack frame is not specified. Note: There will always be a short period during construction or destruction of each frame record during which the frame pointer will point to the caller’s record.
合格的代碼應構造堆棧幀的鏈接列表。每個幀應通過堆棧上兩個 64 位值的幀記錄鏈接到其調用者的幀。最里面的幀(屬于最近的例程調用)的幀記錄應由幀指針寄存器(FP)指向。最低尋址雙字應指向前一幀記錄,最高尋址雙字應包含進入當前函數時在 LR 中傳遞的值。幀記錄鏈的末尾由前一幀地址中的地址零指示。堆棧幀內幀記錄的位置未指定。注意:在每個幀記錄的構造或銷毀過程中總會有一個短暫的時間段,在此期間幀指針將指向調用者的記錄。
A platform shall mandate the minimum level of conformance with respect to the maintenance of frame records.
平臺應強制規定幀記錄維護的最低一致性水平。
The options are, in decreasing level of functionality:
這些選項按功能級別遞減:
- It may require the frame pointer to address a valid frame record at all times, except that small subroutines which do not modify the link register may elect not to create a frame record.
- 它可以要求幀指針始終尋址有效的幀記錄,但不修改鏈接寄存器的小型子例程可以選擇不創建幀記錄。
- It may require the frame pointer to address a valid frame record at all times, except that any subroutine may elect not to create a frame record.
- 它可以要求幀指針始終尋址有效的幀記錄,但任何子例程都可以選擇不創建幀記錄。
- It may permit the frame pointer register to be used as a general-purpose callee-saved register, but provide a platform-specific mechanism for external agents to reliably detect this condition.
- 它可以允許幀指針寄存器用作通用的被調用者保存寄存器,但要為外部代理提供特定于平臺的機制來可靠地檢測這種情況。
- It may elect not to maintain a frame chain and to use the frame pointer register as a general-purpose callee-saved register.
- 它可以選擇不維護幀鏈,并將幀指針寄存器用作通用的被調用者保存寄存器。
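The frame record layout described in this section maps onto a two-field structure, and the chain can be followed like any linked list. The sketch below builds records by hand instead of reading the real FP register, so it runs on any host; the type and function names are hypothetical, and the struct matches the 16-byte record layout only where pointers are 64 bits wide.

```c
#include <stddef.h>
#include <stdint.h>

/* One frame record: the lowest addressed double-word points to the
 * previous record, the highest holds the LR value on entry. */
typedef struct frame_record {
    const struct frame_record *prev; /* address zero ends the chain */
    uint64_t saved_lr;               /* return address of this frame */
} frame_record;

/* Walk the chain starting at `fp`, storing up to `max` return
 * addresses in `out` (innermost first). Returns the number stored. */
size_t walk_frames(const frame_record *fp, uint64_t *out, size_t max)
{
    size_t n = 0;
    while (fp != NULL && n < max) {
        out[n++] = fp->saved_lr;
        fp = fp->prev;
    }
    return n;
}
```

A debugger or profiler following the real chain would start from the value in FP and would also need to validate each pointer, since the weaker conformance options above allow FP to hold something other than a frame record.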
3 Subroutine Calls
The A64 instruction set contains primitive subroutine call instructions, BL and BLR, which perform a branch-with-link operation. The effect of executing BL is to transfer the sequentially next value of the program counter (the return address) into the link register (LR) and the destination address into the program counter. The effect of executing BLR is similar except that the new PC value is read from the specified register.
A64指令集包含原始子程序調用指令BL和BLR,它們執行帶鏈接的分支操作。執行BL的效果是將程序計數器的下一個值(返回地址)傳送到鏈接寄存器(LR),并將目標地址傳送到程序計數器。執行BLR 的效果類似,只是從指定寄存器中讀取新的PC 值。
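The effect of BL and BLR on PC and LR can be modeled with a tiny register file. This is an illustrative sketch, not an emulator: `cpu_state` and the `exec_*` functions are hypothetical names, the return address is taken as PC + 4 because A64 instructions are four bytes long, and `exec_ret` (a return through LR) is added only to show the round trip.

```c
#include <stdint.h>

/* Minimal model: program counter plus r0-r30; x[30] is LR. */
typedef struct {
    uint64_t pc;
    uint64_t x[31];
} cpu_state;

/* BL target: LR = address of the next instruction, PC = target. */
void exec_bl(cpu_state *c, uint64_t target)
{
    c->x[30] = c->pc + 4;
    c->pc = target;
}

/* BLR Xn: as BL, but the target is read from register n. The target is
 * read before LR is written, so BLR x30 still works. */
void exec_blr(cpu_state *c, unsigned n)
{
    uint64_t target = c->x[n];
    c->x[30] = c->pc + 4;
    c->pc = target;
}

/* Return to the address held in LR. */
void exec_ret(cpu_state *c)
{
    c->pc = c->x[30];
}
```

Because a call overwrites LR, a routine that itself makes calls has to save the incoming LR first, which is what the frame record's second double-word is for.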
3.1 Use of IP0 and IP1 by the linker
The A64 branch instructions are unable to reach every destination in the address space, so it may be necessary for the linker to insert a veneer between a calling routine and a called subroutine. Veneers may also be needed to support dynamic linking. Any veneer inserted must preserve the contents of all registers except IP0, IP1 (r16, r17) and the condition code flags; a conforming program must assume that a veneer that alters IP0 and/or IP1 may be inserted at any branch instruction that is exposed to a relocation that supports long branches.
A64 分支指令無法到達地址空間中的每個目的地,因此鏈接器可能需要在調用例程和被調用子例程之間插入veneer代碼。還可能需要veneer來支持動態鏈接。插入的任何veneer必須保留除 IP0、IP1(r16、r17)和條件代碼標志之外的所有寄存器的內容;符合要求的程序必須假設可以將更改 IP0 和/或 IP1 的veneer代碼插入到支持長分支的重定位的任何分支指令處。
Note: R_AARCH64_CALL26 and R_AARCH64_JUMP26 are the ELF relocation types with this property.
注意: R_AARCH64_CALL26 和 R_AARCH64_JUMP26 是具有此屬性的 ELF 重定位類型。
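To make the reach limit concrete: B and BL encode a signed 26-bit offset counted in 4-byte words, so a direct branch covers roughly ±128 MiB around the branch instruction. The sketch below shows the kind of range test a linker might apply before deciding to insert a veneer; the function names are hypothetical, and real linkers apply further policy (for example for dynamic linking) that is not modeled here.

```c
#include <stdbool.h>
#include <stdint.h>

/* Byte range reachable by a 26-bit word offset: [-2^27, 2^27 - 4]. */
#define BRANCH26_MIN (-(INT64_C(1) << 27))
#define BRANCH26_MAX ((INT64_C(1) << 27) - 4)

/* True if a direct B/BL at address `from` can reach address `to`.
 * The unsigned difference is reinterpreted as signed, which assumes
 * the usual two's-complement conversion. */
bool branch26_in_range(uint64_t from, uint64_t to)
{
    int64_t delta = (int64_t)(to - from);
    return delta >= BRANCH26_MIN && delta <= BRANCH26_MAX;
}

/* True if the branch would need a veneer purely because of distance. */
bool needs_veneer(uint64_t from, uint64_t to)
{
    return !branch26_in_range(from, to);
}
```

A veneer for an out-of-range target would typically load the full address into IP0 or IP1 and branch through it, which is why conforming code cannot keep live values in those registers across such a branch.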
4 Parameter Passing
The base standard provides for passing arguments in general-purpose registers (r0-r7), SIMD/floating-point registers (v0-v7) and on the stack. For subroutines that take a small number of small parameters, only registers are used.
基本標準規定在通用寄存器 (r0-r7)、SIMD/浮點寄存器 (v0-v7) 和堆棧中傳遞參數。對于采用少量小參數的子程序,僅使用寄存器。
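As a rough illustration of how arguments spread over the two register banks and the stack, the sketch below assigns locations to a list of scalar arguments. It is a deliberate simplification: it handles only integer/pointer and floating-point scalars, ignores composites, alignment and padding, and counts stack slots ordinally. The full marshaling rules are not reproduced in this document, and every type and function name here is hypothetical.

```c
#include <stddef.h>

typedef enum { ARG_INT, ARG_FP } arg_class;           /* scalar kinds only */
typedef enum { LOC_GPR, LOC_VREG, LOC_STACK } loc_kind;

typedef struct {
    loc_kind kind;
    unsigned index;   /* r<index>, v<index>, or ordinal stack slot */
} arg_loc;

/* Assign each argument in order: integers to r0-r7, floating-point
 * values to v0-v7, and anything that no longer fits to the stack. */
void assign_args(const arg_class *args, size_t n, arg_loc *out)
{
    unsigned next_gpr = 0, next_vreg = 0, next_stack = 0;

    for (size_t i = 0; i < n; i++) {
        if (args[i] == ARG_INT && next_gpr < 8) {
            out[i].kind = LOC_GPR;
            out[i].index = next_gpr++;
        } else if (args[i] == ARG_FP && next_vreg < 8) {
            out[i].kind = LOC_VREG;
            out[i].index = next_vreg++;
        } else {
            out[i].kind = LOC_STACK;
            out[i].index = next_stack++;
        }
    }
}
```

Under this model a call with ten integer arguments followed by one double puts the ninth and tenth integers on the stack while the double still goes in v0, because the two register banks are counted independently.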
4.1 Variadic Subroutines
A variadic subroutine is a routine that takes a variable number of parameters. The full parameter list is known by the caller, but the callee only knows a minimum number of arguments will be passed and will determine the additional arguments based on the values passed in other arguments. The two classes of arguments are known as Named arguments (these form the minimum set) and Anonymous arguments (these are the optional additional arguments).
可變參數子例程是一種接受可變數量參數的例程。調用者知道完整的參數列表,但被調用者只知道將傳遞的最少參數數量,并將根據其他參數中傳遞的值確定附加參數。這兩類參數被稱為命名參數(構成最小參數集)和匿名參數(可選的附加參數)。
In this standard a non-variadic subroutine can be considered to be identical to a variadic subroutine that takes no optional arguments.
在本標準中,非可變參數子程序可視為與不帶可選參數的可變參數子程序相同。
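In C, the named/anonymous split corresponds to the fixed parameters and the `...` part of a prototype. The following minimal sketch uses the standard `<stdarg.h>` interface; `sum_ints` is a hypothetical function whose single named argument tells the callee how many anonymous arguments to read.

```c
#include <stdarg.h>

/* `count` is the named argument; the values after it are anonymous.
 * The callee learns how many anonymous arguments exist only from the
 * value of `count`. */
long sum_ints(int count, ...)
{
    va_list ap;
    long total = 0;

    va_start(ap, count);
    for (int i = 0; i < count; i++)
        total += va_arg(ap, int);   /* each anonymous argument is an int */
    va_end(ap);

    return total;
}
```

A call such as `sum_ints(3, 1, 2, 3)` passes one named and three anonymous arguments; passing a `count` larger than the number of values actually supplied is undefined behavior, because the callee has no other way to find the end of the list.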
4.2 Parameter Passing Rules
Parameter passing is defined as a two-level conceptual model:
參數傳遞被定義為兩個層次的概念模型:
- A mapping from the type of a source language argument onto a machine type.
- 從源語言參數類型到機器類型的映射。
- The marshaling of machine types to produce the final parameter list.
- 對機器類型進行編組(marshaling),以生成最終參數列表。
The mapping from a source language type onto a machine type is specific for each language and is described separately. The result is an ordered list of arguments that are to be passed to the subroutine.
從源語言類型到機器類型的映射是每種語言所特有的,將分別進行描述。結果是要傳遞給子程序的參數的有序列表。
For a caller, sufficient stack space to hold stacked argument values is assumed to have been allocated prior to marshaling: in practice the amount of stack space required cannot be known until after the argument marshaling has been completed. A callee is permitted to modify any stack space used for receiving parameter values from the caller.
對于調用方而言,假定在參數編組(marshaling)之前已經分配了足夠的堆棧空間來存放入棧的參數值:實際上,只有在參數編組完成后才能知道所需的堆棧空間。允許被調用者修改用于接收調用者參數值的棧空間。
5 Result Return
The manner in which a result is returned from a function is determined by the type of that result:
函數返回結果的方式由結果的類型決定:
- If the type, T, of the result of a function is such that
- 如果函數結果的類型 T 使得
    void func(T arg)
  would require that arg be passed as a value in a register (or set of registers) according to the rules in §4 Parameter Passing, then the result is returned in the same registers as would be used for such an argument.
  將要求 arg 按照第 4 節參數傳遞中的規則,以寄存器(或寄存器組)中的值的形式傳遞,那么結果將通過此類參數所使用的相同寄存器返回。
- Otherwise, the caller shall reserve a block of memory of sufficient size and alignment to hold the result. The address of the memory block shall be passed as an additional argument to the function in x8. The callee may modify the result memory block at any point during the execution of the subroutine (there is no requirement for the callee to preserve the value stored in x8).
- 否則,調用者應預留足夠大小和對齊方式的內存塊來保存結果。內存塊的地址應作為附加參數通過 x8 傳遞給函數。被調用者可在執行子程序的任何時候修改結果內存塊(不要求被調用者保留 x8 中存儲的值)。
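The two cases can be seen from the C side with a small and a large structure. The sketch below is portable C and makes no claim about what a particular compiler emits; the type and function names are hypothetical. The assumption, taken from the full AAPCS64 rules that this document does not reproduce, is that a 16-byte structure of integers is passed in registers while a 32-byte one is not, so the second function's result travels through caller-reserved memory. `make_block_explicit` spells out that second case by hand.

```c
#include <stdint.h>

typedef struct { int64_t lo, hi; } pair16;    /* 16 bytes */
typedef struct { int64_t v[4]; } block32;     /* 32 bytes */

/* Small result: assumed to come back in registers. */
pair16 make_pair(int64_t a, int64_t b)
{
    pair16 p = { a, b };
    return p;
}

/* Large result: assumed to be written to memory reserved by the
 * caller, whose address is passed in x8. */
block32 make_block(int64_t seed)
{
    block32 b;
    for (int i = 0; i < 4; i++)
        b.v[i] = seed + i;
    return b;
}

/* Source-level equivalent of the second case: the caller supplies the
 * result buffer and the callee fills it in. */
void make_block_explicit(block32 *result, int64_t seed)
{
    for (int i = 0; i < 4; i++)
        result->v[i] = seed + i;
}
```

Since the callee need not preserve x8, a caller that wants the result address after the call must keep its own copy.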
6 Interworking
Interworking between the 32-bit AAPCS and the AAPCS64 is not supported within a single process. (In AArch64, all inter-operation between 32-bit and 64-bit machine states takes place across a change of exception level.)
單個進程內不支持 32 位 AAPCS 和 AAPCS64 之間的互操作。(在 AArch64 中,32 位和 64 位機器狀態之間的所有互操作都是在異常級別發生變化時進行的)。