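Signal Handling Flow for Safepoint Polling in the OpenJDK 17 Source

This note walks through how the OpenJDK 17 source handles the signal raised by a safepoint poll, focusing on the parts that deal with safepoint polling.

Core signal handling flow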

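Signal trigger:
- When a thread reads the safepoint polling page while the page is armed (access-protected), the access raises SIGSEGV. The handler later recognizes the faulting address with SafepointMechanism::is_poll_address.
- The trigger point is in the assembly generated by MacroAssembler::safepoint_poll.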
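The trigger can be reproduced outside the JVM. The following is a minimal sketch, assuming Linux and POSIX signals; every name in it (`g_poll_page`, `on_segv`, `run_poll_demo`) is hypothetical and none of it is HotSpot code. Unlike HotSpot, which redirects the PC to a stub, this handler simply disarms the page so that the faulting load can be retried.

```cpp
#include <cassert>
#include <cstddef>
#include <signal.h>
#include <sys/mman.h>
#include <unistd.h>

// Hypothetical globals standing in for the VM's polling page state.
static char* g_poll_page = nullptr;
static std::size_t g_page_size = 0;
static volatile sig_atomic_t g_poll_hits = 0;  // number of polls that trapped

// SIGSEGV handler: if the fault address lies inside the polling page,
// count the hit and disarm the page so the faulting load is retried.
static void on_segv(int, siginfo_t* info, void*) {
  char* addr = static_cast<char*>(info->si_addr);
  if (addr >= g_poll_page && addr < g_poll_page + g_page_size) {
    g_poll_hits = g_poll_hits + 1;
    mprotect(g_poll_page, g_page_size, PROT_READ);  // disarm
    return;                                         // the load runs again
  }
  _exit(1);  // a fault anywhere else is a real crash
}

// Arms a polling page, polls it twice, and returns how many polls trapped.
int run_poll_demo() {
  g_page_size = static_cast<std::size_t>(sysconf(_SC_PAGESIZE));
  void* p = mmap(nullptr, g_page_size, PROT_READ,
                 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
  assert(p != MAP_FAILED && "mmap failed");
  g_poll_page = static_cast<char*>(p);

  struct sigaction sa = {};
  struct sigaction old_sa = {};
  sa.sa_sigaction = on_segv;
  sa.sa_flags = SA_SIGINFO;
  sigemptyset(&sa.sa_mask);
  sigaction(SIGSEGV, &sa, &old_sa);

  g_poll_hits = 0;
  mprotect(g_poll_page, g_page_size, PROT_NONE);  // arm the page
  volatile char* poll = g_poll_page;
  char first = *poll;   // traps once; the handler disarms the page
  char second = *poll;  // page is readable again: no trap
  (void)first;
  (void)second;

  int hits = g_poll_hits;
  sigaction(SIGSEGV, &old_sa, nullptr);  // restore the previous handler
  munmap(g_poll_page, g_page_size);
  g_poll_page = nullptr;
  return hits;
}
```

Calling `run_poll_demo()` arms the page, takes exactly one fault on the first load, and reads the page normally on the second.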
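Signal entry point:
```cpp
extern "C" JNIEXPORT int JVM_HANDLE_XXX_SIGNAL(int sig, siginfo_t* info, void* ucVoid, int abort_if_unrecognized)
```
- This is the JVM's global signal handling entry point (on Linux the macro expands to JVM_handle_linux_signal).
- It calls PosixSignals::pd_hotspot_signal_handler for the platform-dependent handling.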

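Safepoint poll recognition:

```cpp
if (sig == SIGSEGV && SafepointMechanism::is_poll_address((address)info->si_addr)) {
  stub = SharedRuntime::get_poll_stub(pc);
}
```

Key check: the signal must be SIGSEGV and the faulting address must lie in the safepoint polling page. In the full handler this check runs only when the thread is in the _thread_in_Java state.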
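The recognition step comes down to a signal-number test plus an address range test. The sketch below illustrates that shape only; `PollPage` and `is_safepoint_poll_fault` are hypothetical names, and the half-open range `[base, base + size)` is an assumption about how such a check is typically written, not a copy of SafepointMechanism::is_poll_address.

```cpp
#include <cassert>
#include <csignal>
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for the VM's record of where the polling page lives.
struct PollPage {
  std::uintptr_t base;  // start address of the polling page
  std::size_t size;     // page size in bytes

  // True if addr falls inside [base, base + size).
  bool contains(std::uintptr_t addr) const {
    return addr >= base && addr < base + size;
  }
};

// Only a SIGSEGV whose fault address lies in the polling page is treated
// as a safepoint poll; everything else is left to the other handlers.
bool is_safepoint_poll_fault(int sig, std::uintptr_t fault_addr,
                             const PollPage& page) {
  return sig == SIGSEGV && page.contains(fault_addr);
}
```

A SIGBUS at the same address, or a SIGSEGV one byte past the page, is not classified as a poll.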

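Obtaining the handler stub:

```cpp
address SharedRuntime::get_poll_stub(address pc) {
  address stub;
  CodeBlob* cb = CodeCache::find_blob(pc);  // compiled method that contains pc
  bool at_poll_return = ((CompiledMethod*)cb)->is_at_poll_return(pc);
  if (at_poll_return) {
    stub = SharedRuntime::polling_page_return_handler_blob()->entry_point();
  } else {
    // the full source also picks a wide-vector variant of this stub
    stub = SharedRuntime::polling_page_safepoint_handler_blob()->entry_point();
  }
  return stub;
}
```

The function distinguishes two poll types:
- POLL_AT_RETURN: the poll before a method returns
- POLL_AT_LOOP: the ordinary poll inside a loop
It returns the entry address of the matching handler stub.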
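The decision order in get_poll_stub can be modelled as a small pure function. This is a sketch with hypothetical names (`PollStub`, `select_poll_stub`); it follows the order in the full source listing at the end of this note, which also has a branch for methods that use wide vectors.

```cpp
#include <cassert>

// Hypothetical labels for the three handler blobs chosen by get_poll_stub.
enum class PollStub { Return, VectorsSafepoint, Safepoint };

// Same order as the source: return polls first, then loop polls in methods
// that use wide vectors, then ordinary loop polls.
PollStub select_poll_stub(bool at_poll_return, bool has_wide_vectors) {
  if (at_poll_return) return PollStub::Return;
  if (has_wide_vectors) return PollStub::VectorsSafepoint;
  return PollStub::Safepoint;
}
```

Because the return check comes first, a return poll in a method with wide vectors still gets the return stub.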

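Jumping to the stub:
```cpp
if (stub != NULL) {
  // save the faulting pc so it can be recovered later
  if (thread != NULL) thread->set_saved_exception_pc(pc);
  os::Posix::ucontext_set_pc(uc, stub);
  return true;
}
```
- The original PC is saved in the thread state (for later resumption).
- The PC register in the signal context is changed to point at the stub code.
- When the signal handler returns, the CPU continues execution in the stub.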
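The redirect itself is just two stores. The following sketch models it with plain structs instead of a real `ucontext_t`; `FakeContext`, `FakeThread`, and `redirect_to_stub` are hypothetical names chosen for illustration, and the real code goes through os::Posix::ucontext_set_pc.

```cpp
#include <cassert>
#include <cstdint>

using address = std::uintptr_t;  // stand-in for HotSpot's address type

struct FakeContext { address pc; };                      // models the saved PC in the signal context
struct FakeThread { address saved_exception_pc = 0; };   // models the per-thread slot

// Models the tail of the handler: remember where the fault happened, then
// point the context at the stub so execution resumes there on return.
bool redirect_to_stub(FakeContext& uc, FakeThread* thread, address stub) {
  if (stub == 0) return false;  // no stub found: signal not handled here
  if (thread != nullptr) thread->saved_exception_pc = uc.pc;
  uc.pc = stub;
  return true;
}
```

When no stub was found the context is left untouched, and the caller falls through to the remaining handlers.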

Handler stub execution flow

Stub preparation:
```cpp
_polling_page_return_handler_blob = generate_handler_blob(
    CAST_FROM_FN_PTR(address, SafepointSynchronize::handle_polling_page_exception),
    POLL_AT_RETURN);
```
- The JVM generates the handler stubs at startup, including:
  - Return poll stub: polling_page_return_handler_blob
  - Ordinary poll stub: polling_page_safepoint_handler_blob

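Stub code operation (return poll as the example):

```cpp
__ bind(entry->_stub_label);
__ lea(rscratch1, safepoint_pc); // compute the safepoint return address
__ movptr(Address(r15_thread, JavaThread::saved_exception_pc_offset()), rscratch1);
__ jump(callback_addr); // jump to the handler that ends up in handle_polling_page_exception
```

- Saves the original return address in the thread state
- Jumps to the shared handler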

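Safepoint handling core:

```cpp
void SafepointSynchronize::handle_polling_page_exception(JavaThread *thread) {
  ThreadSafepointState* state = thread->safepoint_state();
  state->handle_polling_page_exception();
}
```

- Calls the handling method on the thread's safepoint state
- Eventually blocks the thread until the safepoint operation completes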
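The blocking behaviour can be pictured with a mutex and a condition variable. This is only a sketch of the idea "park until the safepoint operation is done"; `SafepointGate` and `run_gate_demo` are hypothetical names, and HotSpot's real synchronization is considerably more involved.

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical miniature of a safepoint: threads that poll while it is
// active wait until it ends.
class SafepointGate {
 public:
  void begin() {
    std::lock_guard<std::mutex> g(m_);
    active_ = true;
  }
  void end() {
    {
      std::lock_guard<std::mutex> g(m_);
      active_ = false;
    }
    cv_.notify_all();
  }
  // Called by a thread that hit a poll: waits while a safepoint is active.
  void block_if_active() {
    std::unique_lock<std::mutex> lk(m_);
    cv_.wait(lk, [this] { return !active_; });
  }

 private:
  std::mutex m_;
  std::condition_variable cv_;
  bool active_ = false;
};

// Starts a safepoint, lets one worker hit the poll, runs the "VM operation",
// then ends the safepoint. Returns the order in which the two events ran.
std::vector<int> run_gate_demo() {
  SafepointGate gate;
  std::vector<int> events;
  std::mutex events_mutex;

  gate.begin();
  std::thread worker([&] {
    gate.block_if_active();  // parks here until end() is called
    std::lock_guard<std::mutex> g(events_mutex);
    events.push_back(2);     // runs only after the safepoint has ended
  });
  {
    std::lock_guard<std::mutex> g(events_mutex);
    events.push_back(1);     // the "VM operation", done while the worker waits
  }
  gate.end();
  worker.join();
  return events;
}
```

The worker can only record its event after `end()` has cleared the flag, so the VM operation is always recorded first.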

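Key design points

Two-level jump:
```text
signal handler → stub code → handle_polling_page_exception
```
- The signal handler only redirects execution to the stub code
- The stub code saves the precise context
- The C++ function implements the blocking logic
Context saving:
- The signal handler saves the faulting PC in saved_exception_pc
- The stub code stores safepoint_pc into the same saved_exception_pc slot
- Together they let execution resume at exactly the right place after the safepoint operation
Poll types:
- Ordinary poll: POLL_AT_LOOP
- Return poll: POLL_AT_RETURN
- Each poll type uses its own handler stub
Async-signal safety:
- Uses pre-generated assembly stub code (blobs)
- Avoids calling complex C++ functions inside the signal handler
- Transfers control only by modifying the PC register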
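The same constraint applies to any debug output added inside the handler: printf-family calls are not async-signal-safe, whereas write(2) is on the POSIX list of async-signal-safe functions. The sketch below shows one way to log a PC value using only write; `format_hex` and `signal_safe_log_pc` are hypothetical helpers, not part of HotSpot.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <unistd.h>

// Formats value as "0x..." hex into buf without any printf-family call.
// Returns the number of characters written (no terminating NUL).
// buf must have room for at least 18 bytes.
std::size_t format_hex(std::uint64_t value, char* buf) {
  static const char digits[] = "0123456789abcdef";
  char tmp[16];
  std::size_t n = 0;
  do {  // collect digits, least significant first
    tmp[n++] = digits[value & 0xf];
    value >>= 4;
  } while (value != 0);
  std::size_t out = 0;
  buf[out++] = '0';
  buf[out++] = 'x';
  while (n > 0) buf[out++] = tmp[--n];  // reverse into the output
  return out;
}

// Writes "<prefix>0x<pc>\n" to stderr using only write(2).
void signal_safe_log_pc(const char* prefix, std::uint64_t pc) {
  char buf[19];  // 2 for "0x" + 16 hex digits + newline
  std::size_t len = format_hex(pc, buf);
  buf[len++] = '\n';
  ssize_t ignored = write(STDERR_FILENO, prefix, std::strlen(prefix));
  ignored = write(STDERR_FILENO, buf, len);
  (void)ignored;
}
```

A call such as `signal_safe_log_pc("poll at pc ", pc)` could replace an fprintf in a handler without allocating or taking stdio locks.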

Execution flow diagram

[Figure: execution flow diagram; the image is not preserved in this copy.]

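Special notes

Stack watermark check:
- The return poll (POLL_AT_RETURN) includes an additional stack watermark check
- It uses a cmpptr(rsp/rbp, polling_word) instruction
- This guards the point where the stack shrinks on return, so that a method does not return into a frame that is not yet safe to use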
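The compare in the stack watermark check can be read as a single unsigned comparison. The sketch below assumes that the slow path is taken when the stack pointer is above the polling word; that branch condition is an assumption of this example (the text above only names the cmpptr), and `return_poll_takes_slow_path` is a hypothetical name.

```cpp
#include <cassert>
#include <cstdint>

// Models `cmpptr(sp, polling_word)` followed by a branch on "above":
// the return poll leaves the fast path when sp is (unsigned) greater
// than the thread's polling word. The branch direction is an assumption.
bool return_poll_takes_slow_path(std::uintptr_t sp,
                                 std::uintptr_t polling_word) {
  return sp > polling_word;
}
```

Under this model one word serves two purposes: a very small value forces every return onto the slow path, a very large value lets every return through, and a value in between acts as a watermark that only frames above it trip.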
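Log tracing:
- The log points added to the code (such as tty->print_cr) help with debugging
- They make it possible to trace the full path a thread takes into the safepoint
Platform adaptation:
- Conditional compilation is implemented with template metaprogramming (select_emit_stub)
- The related logic is enabled only on platforms that support the watermark barrier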

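This design yields an efficient cooperative safepoint mechanism that keeps the performance overhead low without giving up correctness. The combination of signal handler, stub code, and C++ function connects a hardware exception to the JVM's safepoint management.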

## Source code

```cpp
void SafepointSynchronize::handle_polling_page_exception(JavaThread *thread) {
  // yym-gaizao: added logging on entry to handle_polling_page_exception
  // Fetch the thread name and print it
  oop thread_obj = thread->threadObj();
  const char* thread_name = "UNKNOWN";
  if (thread_obj != nullptr) {
    oop name_oop = java_lang_Thread::name(thread_obj);
    if (name_oop != nullptr) {
      thread_name = java_lang_String::as_utf8_string(name_oop);
    }
  }
  tty->print_cr("Entering handle_polling_page_exception for thread: %s", thread_name);

  assert(thread->thread_state() == _thread_in_Java, "should come from Java code");

  // Enable WXWrite: the function is called implicitly from java code.
  MACOS_AARCH64_ONLY(ThreadWXEnable wx(WXWrite, thread));

  if (log_is_enabled(Info, safepoint, stats)) {
    Atomic::inc(&_nof_threads_hit_polling_page);
  }

  ThreadSafepointState* state = thread->safepoint_state();

  state->handle_polling_page_exception();
}
```

```cpp
static SafepointBlob* polling_page_safepoint_handler_blob()  { return _polling_page_safepoint_handler_blob; }

_polling_page_safepoint_handler_blob =
    generate_handler_blob(CAST_FROM_FN_PTR(address, SafepointSynchronize::handle_polling_page_exception),
                          POLL_AT_LOOP);
```

```cpp
address SharedRuntime::get_poll_stub(address pc) {
  // yym-gaizao: added logging while fetching the poll stub
  tty->print_cr("Fetching poll stub for pc " INTPTR_FORMAT, p2i(pc));
  address stub;
  // Look up the code blob
  CodeBlob *cb = CodeCache::find_blob(pc);

  // Should be an nmethod
  guarantee(cb != NULL && cb->is_compiled(), "safepoint polling: pc must refer to an nmethod");

  // Look up the relocation information
  assert(((CompiledMethod*)cb)->is_at_poll_or_poll_return(pc),
         "safepoint polling: type must be poll");

#ifdef ASSERT
  if (!((NativeInstruction*)pc)->is_safepoint_poll()) {
    tty->print_cr("bad pc: " PTR_FORMAT, p2i(pc));
    Disassembler::decode(cb);
    fatal("Only polling locations are used for safepoint");
  }
#endif

  bool at_poll_return = ((CompiledMethod*)cb)->is_at_poll_return(pc);
  bool has_wide_vectors = ((CompiledMethod*)cb)->has_wide_vectors();
  if (at_poll_return) {
    assert(SharedRuntime::polling_page_return_handler_blob() != NULL,
           "polling page return stub not created yet");
    stub = SharedRuntime::polling_page_return_handler_blob()->entry_point();
  } else if (has_wide_vectors) {
    assert(SharedRuntime::polling_page_vectors_safepoint_handler_blob() != NULL,
           "polling page vectors safepoint stub not created yet");
    stub = SharedRuntime::polling_page_vectors_safepoint_handler_blob()->entry_point();
  } else {
    assert(SharedRuntime::polling_page_safepoint_handler_blob() != NULL,
           "polling page safepoint stub not created yet");
    stub = SharedRuntime::polling_page_safepoint_handler_blob()->entry_point();
  }
  log_debug(safepoint)("... found polling page %s exception at pc = "
                       INTPTR_FORMAT ", stub =" INTPTR_FORMAT,
                       at_poll_return ? "return" : "loop",
                       (intptr_t)pc, (intptr_t)stub);
  return stub;
}
```

```cpp
bool PosixSignals::pd_hotspot_signal_handler(int sig, siginfo_t* info,
                                             ucontext_t* uc, JavaThread* thread) {
  // gaizao
  fprintf(stderr, "[SIGNAL] ThreadID=%d accessing poll page\n", (int)syscall(SYS_gettid));
  const char* msg = "@@@@yym---pd_hotspot_signal_handler----\n";
  write(STDERR_FILENO, msg, strlen(msg));  // write directly to the file descriptor

/*
  NOTE: does not seem to work on linux.
  if (info == NULL || info->si_code <= 0 || info->si_code == SI_NOINFO) {
    // can't decode this kind of signal
    info = NULL;
  } else {
    assert(sig == info->si_signo, "bad siginfo");
  }
*/
  // decide if this trap can be handled by a stub
  address stub = NULL;

  address pc          = NULL;

  //%note os_trap_1
  if (info != NULL && uc != NULL && thread != NULL) {
    pc = (address) os::Posix::ucontext_get_pc(uc);

#ifndef AMD64
    // Halt if SI_KERNEL before more crashes get misdiagnosed as Java bugs
    // This can happen in any running code (currently more frequently in
    // interpreter code but has been seen in compiled code)
    if (sig == SIGSEGV && info->si_addr == 0 && info->si_code == SI_KERNEL) {
      fatal("An irrecoverable SI_KERNEL SIGSEGV has occurred due "
            "to unstable signal handling in this distribution.");
    }
#endif // AMD64

    // Handle ALL stack overflow variations here
    if (sig == SIGSEGV) {
      address addr = (address) info->si_addr;

      // check if fault address is within thread stack
      if (thread->is_in_full_stack(addr)) {
        // stack overflow
        if (os::Posix::handle_stack_overflow(thread, addr, pc, uc, &stub)) {
          return true; // continue
        }
      }
    }

    if ((sig == SIGSEGV) && VM_Version::is_cpuinfo_segv_addr(pc)) {
      // Verify that OS save/restore AVX registers.
      stub = VM_Version::cpuinfo_cont_addr();
    }

    if (thread->thread_state() == _thread_in_Java) {
      // Java thread running in Java code => find exception handler if any
      // a fault inside compiled code, the interpreter, or a stub

      if (sig == SIGSEGV && SafepointMechanism::is_poll_address((address)info->si_addr)) {
        stub = SharedRuntime::get_poll_stub(pc);
      } else if (sig == SIGBUS /* && info->si_code == BUS_OBJERR */) {
        // BugId 4454115: A read from a MappedByteBuffer can fault
        // here if the underlying file has been truncated.
        // Do not crash the VM in such a case.
        CodeBlob* cb = CodeCache::find_blob_unsafe(pc);
        CompiledMethod* nm = (cb != NULL) ? cb->as_compiled_method_or_null() : NULL;
        bool is_unsafe_arraycopy = thread->doing_unsafe_access() && UnsafeCopyMemory::contains_pc(pc);
        if ((nm != NULL && nm->has_unsafe_access()) || is_unsafe_arraycopy) {
          address next_pc = Assembler::locate_next_instruction(pc);
          if (is_unsafe_arraycopy) {
            next_pc = UnsafeCopyMemory::page_error_continue_pc(pc);
          }
          stub = SharedRuntime::handle_unsafe_access(thread, next_pc);
        }
      }
      else

#ifdef AMD64
      if (sig == SIGFPE  &&
          (info->si_code == FPE_INTDIV || info->si_code == FPE_FLTDIV)) {
        stub =
          SharedRuntime::continuation_for_implicit_exception(thread,
                                                             pc,
                                                             SharedRuntime::IMPLICIT_DIVIDE_BY_ZERO);
#else
      if (sig == SIGFPE /* && info->si_code == FPE_INTDIV */) {
        // HACK: si_code does not work on linux 2.2.12-20!!!
        int op = pc[0];
        if (op == 0xDB) {
          // FIST
          // TODO: The encoding of D2I in x86_32.ad can cause an exception
          // prior to the fist instruction if there was an invalid operation
          // pending. We want to dismiss that exception. From the win_32
          // side it also seems that if it really was the fist causing
          // the exception that we do the d2i by hand with different
          // rounding. Seems kind of weird.
          // NOTE: that we take the exception at the NEXT floating point instruction.
          assert(pc[0] == 0xDB, "not a FIST opcode");
          assert(pc[1] == 0x14, "not a FIST opcode");
          assert(pc[2] == 0x24, "not a FIST opcode");
          return true;
        } else if (op == 0xF7) {
          // IDIV
          stub = SharedRuntime::continuation_for_implicit_exception(thread, pc, SharedRuntime::IMPLICIT_DIVIDE_BY_ZERO);
        } else {
          // TODO: handle more cases if we are using other x86 instructions
          //   that can generate SIGFPE signal on linux.
          tty->print_cr("unknown opcode 0x%X with SIGFPE.", op);
          fatal("please update this code.");
        }
#endif // AMD64
      } else if (sig == SIGSEGV &&
                 MacroAssembler::uses_implicit_null_check(info->si_addr)) {
        // Determination of interpreter/vtable stub/compiled code null exception
        stub = SharedRuntime::continuation_for_implicit_exception(thread, pc, SharedRuntime::IMPLICIT_NULL);
      }
    } else if ((thread->thread_state() == _thread_in_vm ||
                thread->thread_state() == _thread_in_native) &&
               (sig == SIGBUS && /* info->si_code == BUS_OBJERR && */
                thread->doing_unsafe_access())) {
      address next_pc = Assembler::locate_next_instruction(pc);
      if (UnsafeCopyMemory::contains_pc(pc)) {
        next_pc = UnsafeCopyMemory::page_error_continue_pc(pc);
      }
      stub = SharedRuntime::handle_unsafe_access(thread, next_pc);
    }

    // jni_fast_Get<Primitive>Field can trap at certain pc's if a GC kicks in
    // and the heap gets shrunk before the field access.
    if ((sig == SIGSEGV) || (sig == SIGBUS)) {
      address addr = JNI_FastGetField::find_slowcase_pc(pc);
      if (addr != (address)-1) {
        stub = addr;
      }
    }
  }

#ifndef AMD64
  // Execution protection violation
  //
  // This should be kept as the last step in the triage.  We don't
  // have a dedicated trap number for a no-execute fault, so be
  // conservative and allow other handlers the first shot.
  //
  // Note: We don't test that info->si_code == SEGV_ACCERR here.
  // this si_code is so generic that it is almost meaningless; and
  // the si_code for this condition may change in the future.
  // Furthermore, a false-positive should be harmless.
  if (UnguardOnExecutionViolation > 0 &&
      stub == NULL &&
      (sig == SIGSEGV || sig == SIGBUS) &&
      uc->uc_mcontext.gregs[REG_TRAPNO] == trap_page_fault) {
    int page_size = os::vm_page_size();
    address addr = (address) info->si_addr;
    address pc = os::Posix::ucontext_get_pc(uc);
    // Make sure the pc and the faulting address are sane.
    //
    // If an instruction spans a page boundary, and the page containing
    // the beginning of the instruction is executable but the following
    // page is not, the pc and the faulting address might be slightly
    // different - we still want to unguard the 2nd page in this case.
    //
    // 15 bytes seems to be a (very) safe value for max instruction size.
    bool pc_is_near_addr =
      (pointer_delta((void*) addr, (void*) pc, sizeof(char)) < 15);
    bool instr_spans_page_boundary =
      (align_down((intptr_t) pc ^ (intptr_t) addr,
                  (intptr_t) page_size) > 0);

    if (pc == addr || (pc_is_near_addr && instr_spans_page_boundary)) {
      static volatile address last_addr =
        (address) os::non_memory_address_word();

      // In conservative mode, don't unguard unless the address is in the VM
      if (addr != last_addr &&
          (UnguardOnExecutionViolation > 1 || os::address_is_in_vm(addr))) {

        // Set memory to RWX and retry
        address page_start = align_down(addr, page_size);
        bool res = os::protect_memory((char*) page_start, page_size,
                                      os::MEM_PROT_RWX);

        log_debug(os)("Execution protection violation "
                      "at " INTPTR_FORMAT
                      ", unguarding " INTPTR_FORMAT ": %s, errno=%d", p2i(addr),
                      p2i(page_start), (res ? "success" : "failed"), errno);
        stub = pc;

        // Set last_addr so if we fault again at the same address, we don't end
        // up in an endless loop.
        //
        // There are two potential complications here.  Two threads trapping at
        // the same address at the same time could cause one of the threads to
        // think it already unguarded, and abort the VM.  Likely very rare.
        //
        // The other race involves two threads alternately trapping at
        // different addresses and failing to unguard the page, resulting in
        // an endless loop.  This condition is probably even more unlikely than
        // the first.
        //
        // Although both cases could be avoided by using locks or thread local
        // last_addr, these solutions are unnecessary complication: this
        // handler is a best-effort safety net, not a complete solution.  It is
        // disabled by default and should only be used as a workaround in case
        // we missed any no-execute-unsafe VM code.
        last_addr = addr;
      }
    }
  }
#endif // !AMD64

  if (stub != NULL) {
    // save all thread context in case we need to restore it
    if (thread != NULL) thread->set_saved_exception_pc(pc);

    os::Posix::ucontext_set_pc(uc, stub);
    return true;
  }

  return false;
}
```

```cpp
// (the preceding #if/#elif branches for other platforms are not part of this excerpt)
#define JVM_HANDLE_XXX_SIGNAL JVM_handle_linux_signal
#else
#error who are you?
#endif

extern "C" JNIEXPORT
int JVM_HANDLE_XXX_SIGNAL(int sig, siginfo_t* info,
                          void* ucVoid, int abort_if_unrecognized)
{
  // yym-gaizao
  const char* msg = "@@@@yym---JVM_HANDLE_XXX_SIGNAL----\n";
  write(STDERR_FILENO, msg, strlen(msg));  // write directly to the file descriptor
  assert(info != NULL && ucVoid != NULL, "sanity");

  // Note: it's not uncommon that JNI code uses signal/sigset to install,
  // then restore certain signal handler (e.g. to temporarily block SIGPIPE,
  // or have a SIGILL handler when detecting CPU type). When that happens,
  // this handler might be invoked with junk info/ucVoid. To avoid unnecessary
  // crash when libjsig is not preloaded, try handle signals that do not require
  // siginfo/ucontext first.

  // Preserve errno value over signal handler.
  //  (note: RAII ok here, even with JFR thread crash protection, see below).
  ErrnoPreserver ep;

  // Unblock all synchronous error signals (see JDK-8252533)
  PosixSignals::unblock_error_signals();

  ucontext_t* const uc = (ucontext_t*) ucVoid;
  Thread* const t = Thread::current_or_null_safe();

  // Handle JFR thread crash protection.
  //  Note: this may cause us to longjmp away. Do not use any code before this
  //  point which really needs any form of epilogue code running, eg RAII objects.
  os::ThreadCrashProtection::check_crash_protection(sig, t);

  bool signal_was_handled = false;

  // Handle assertion poison page accesses.
#ifdef CAN_SHOW_REGISTERS_ON_ASSERT
  if (!signal_was_handled &&
      ((sig == SIGSEGV || sig == SIGBUS) && info != NULL && info->si_addr == g_assert_poison)) {
    signal_was_handled = handle_assert_poison_fault(ucVoid, info->si_addr);
  }
#endif

  if (!signal_was_handled) {
    // Handle SafeFetch access.
#ifndef ZERO
    if (uc != NULL) {
      address pc = os::Posix::ucontext_get_pc(uc);
      if (StubRoutines::is_safefetch_fault(pc)) {
        os::Posix::ucontext_set_pc(uc, StubRoutines::continuation_for_safefetch_fault(pc));
        signal_was_handled = true;
      }
    }
#else
    // See JDK-8076185
    if (sig == SIGSEGV || sig == SIGBUS) {
      sigjmp_buf* const pjb = get_jmp_buf_for_continuation();
      if (pjb) {
        siglongjmp(*pjb, 1);
      }
    }
#endif // ZERO
  }

  // Ignore SIGPIPE and SIGXFSZ (4229104, 6499219).
  if (!signal_was_handled &&
      (sig == SIGPIPE || sig == SIGXFSZ)) {
    PosixSignals::chained_handler(sig, info, ucVoid);
    signal_was_handled = true; // unconditionally.
  }

  // Call platform dependent signal handler.
  if (!signal_was_handled) {
    JavaThread* const jt = (t != NULL && t->is_Java_thread()) ? (JavaThread*) t : NULL;
    signal_was_handled = PosixSignals::pd_hotspot_signal_handler(sig, info, uc, jt);
  }

  // From here on, if the signal had not been handled, it is a fatal error.

  // Give the chained signal handler - should it exist - a shot.
  if (!signal_was_handled) {
    signal_was_handled = PosixSignals::chained_handler(sig, info, ucVoid);
  }

  // Invoke fatal error handling.
  if (!signal_was_handled && abort_if_unrecognized) {
    // Extract pc from context for the error handler to display.
    address pc = NULL;
    if (uc != NULL) {
      // prepare fault pc address for error reporting.
      if (S390_ONLY(sig == SIGILL || sig == SIGFPE) NOT_S390(false)) {
        pc = (address)info->si_addr;
      } else if (ZERO_ONLY(true) NOT_ZERO(false)) {
        // Non-arch-specific Zero code does not really know the pc.
        // This can be alleviated by making arch-specific os::Posix::ucontext_get_pc
        // available for Zero for known architectures. But for generic Zero
        // code, it would still remain unknown.
        pc = NULL;
      } else {
        pc = os::Posix::ucontext_get_pc(uc);
      }
    }
    // For Zero, we ignore the crash context, because:
    //  a) The crash would be in C++ interpreter code, so context is not really relevant;
    //  b) Generic Zero code would not be able to parse it, so when generic error
    //     reporting code asks e.g. about frames on stack, Zero would experience
    //     a secondary ShouldNotCallThis() crash.
    VMError::report_and_die(t, sig, pc, info, NOT_ZERO(ucVoid) ZERO_ONLY(NULL));
    // VMError should not return.
    ShouldNotReachHere();
  }
  return signal_was_handled;
}
```