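Python provides its own threading layer, built on native OS threads as shown later. To keep the interpreter thread-safe, it introduces the global interpreter lock (GIL): only the thread that holds the GIL may execute, so at any moment only one thread is running Python code, and multithreaded Python cannot exploit multiple cores. The book 《python源碼剖析》 describes the historical reasons for the GIL in detail. In short, for now the GIL is arguably the best available approach to implementing multithreading in Python.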
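Python 3.13 makes an experimental attempt at removing the GIL. Running without the GIL requires a specially built (free-threaded) interpreter. The GIL is effectively a mutex over all interpreter resources, so once it is removed, mutual exclusion has to be applied at a finer granularity. This can make the GIL-free interpreter slower than the GIL build in some cases, although Python continues to optimize this.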
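To see which kind of build is running, something like the following sketch can help. It assumes CPython; `sys._is_gil_enabled()` only exists on 3.13 and later, so the helper falls back to `None` on older versions. The helper name `gil_status` is made up for this example.

```python
import sys
import sysconfig


def gil_status():
    """Report whether this is a free-threaded build and whether the GIL is active."""
    # Py_GIL_DISABLED is 1 on free-threaded builds; it is 0 or missing otherwise.
    free_threaded_build = bool(sysconfig.get_config_var("Py_GIL_DISABLED"))
    # sys._is_gil_enabled() was added in CPython 3.13; older versions lack it.
    checker = getattr(sys, "_is_gil_enabled", None)
    gil_enabled = checker() if checker is not None else None
    return {"free_threaded_build": free_threaded_build, "gil_enabled": gil_enabled}


if __name__ == "__main__":
    print(gil_status())
```

On a regular build the first field is expected to be `False`; the second field is `None` whenever the interpreter is too old to answer.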
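During multithreaded execution, the thread holding the GIL has to release it after running for a while so that other threads get a chance to run. How long should each thread "enjoy" the GIL? Python 2 defined the slice in instructions: a thread released the GIL after executing 100 bytecode instructions. In Python 3 the slice is no longer a fixed instruction count. Instead there is a time-based switch interval (5 ms by default) that can be set with sys.setswitchinterval. If a thread is currently running with the GIL, a thread waiting for the GIL waits for that interval. Once the wait times out, the waiting thread "nudges" the running thread by setting a bit in its eval_breaker flags, which signals a request to release the GIL. The bytecode instructions are designed with many points at which eval_breaker is checked. When the running thread finds the drop-request bit in eval_breaker, it releases the GIL and goes back into the queue, giving other threads a chance to acquire it. (The "queue" is only a metaphor: the threads waiting for the GIL compete for it on equal terms.)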
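As a small illustration of the switch interval, the sketch below reads the current value, changes it temporarily, and restores it. `sys.getswitchinterval` and `sys.setswitchinterval` are the standard-library calls; the helper `with_switch_interval` is a hypothetical name used only here.

```python
import sys


def with_switch_interval(seconds, func):
    """Run func() with a temporary switch interval, then restore the old value."""
    old = sys.getswitchinterval()
    sys.setswitchinterval(seconds)
    try:
        return func()
    finally:
        sys.setswitchinterval(old)


if __name__ == "__main__":
    print("current interval:", sys.getswitchinterval())
    print("inside helper:", with_switch_interval(0.001, sys.getswitchinterval))
```

A smaller interval makes waiting threads request the GIL sooner, at the cost of more frequent switching.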
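Python's threading machinery involves system calls and compatibility across platforms, so it is fairly complex. This article focuses on thread switching and tries to walk through how it works in Python 3.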
Thread execution
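The low-level threading module in Python 3 is _thread, implemented in Modules/_threadmodule.c. The entry point for creating a thread is thread_PyThread_start_new_thread. It first creates the thread handle object ThreadHandle and the thread state object PyThreadState, then enters the platform-specific function PyThread_start_joinable_thread, which builds the arguments for the system call and invokes the system function that creates the thread. What is passed to the system call is a single common entry function, thread_run, which runs once the native thread has been created. thread_run first obtains its own thread id and binds the thread state object to the global runtime object _PyRuntimeState. It then calls PyEval_AcquireThread to try to acquire the GIL and runs the thread's code. The core of how a new thread acquires the GIL lies in PyEval_AcquireThread.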
Acquiring the GIL
The new thread enters PyEval_AcquireThread to try to acquire the GIL. The call chain is PyEval_AcquireThread->_PyThreadState_Attach->_PyEval_AcquireLock->take_gil. The actual acquisition happens in take_gil, located in Python/ceval_gil.c. Its source is as follows:
/* Take the GIL.

   The function saves errno at entry and restores its value at exit.

   tstate must be non-NULL.

   Returns 1 if the GIL was acquired, or 0 if not. */
static void
take_gil(PyThreadState *tstate)
{
    int err = errno;

    assert(tstate != NULL);
    /* We shouldn't be using a thread state that isn't viable any more. */
    // XXX It may be more correct to check tstate->_status.finalizing.
    // XXX assert(!tstate->_status.cleared);

    if (_PyThreadState_MustExit(tstate)) {
        /* bpo-39877: If Py_Finalize() has been called and tstate is not the
           thread which called Py_Finalize(), exit immediately the thread.

           This code path can be reached by a daemon thread after Py_Finalize()
           completes. In this case, tstate is a dangling pointer: points to
           PyThreadState freed memory. */
        PyThread_exit_thread();
    }

    assert(_PyThreadState_CheckConsistency(tstate));
    PyInterpreterState *interp = tstate->interp;
    struct _gil_runtime_state *gil = interp->ceval.gil;
#ifdef Py_GIL_DISABLED
    if (!_Py_atomic_load_int_relaxed(&gil->enabled)) {
        return;
    }
#endif

    /* Check that _PyEval_InitThreads() was called to create the lock */
    assert(gil_created(gil));

    MUTEX_LOCK(gil->mutex);

    int drop_requested = 0;
    while (_Py_atomic_load_int_relaxed(&gil->locked)) {
        unsigned long saved_switchnum = gil->switch_number;

        unsigned long interval = (gil->interval >= 1 ? gil->interval : 1);
        int timed_out = 0;
        COND_TIMED_WAIT(gil->cond, gil->mutex, interval, timed_out);

        /* If we timed out and no switch occurred in the meantime, it is time
           to ask the GIL-holding thread to drop it. */
        if (timed_out &&
            _Py_atomic_load_int_relaxed(&gil->locked) &&
            gil->switch_number == saved_switchnum)
        {
            PyThreadState *holder_tstate =
                (PyThreadState*)_Py_atomic_load_ptr_relaxed(&gil->last_holder);
            if (_PyThreadState_MustExit(tstate)) {
                MUTEX_UNLOCK(gil->mutex);
                // gh-96387: If the loop requested a drop request in a previous
                // iteration, reset the request. Otherwise, drop_gil() can
                // block forever waiting for the thread which exited. Drop
                // requests made by other threads are also reset: these threads
                // may have to request again a drop request (iterate one more
                // time).
                if (drop_requested) {
                    _Py_unset_eval_breaker_bit(holder_tstate, _PY_GIL_DROP_REQUEST_BIT);
                }
                PyThread_exit_thread();
            }
            assert(_PyThreadState_CheckConsistency(tstate));

            _Py_set_eval_breaker_bit(holder_tstate, _PY_GIL_DROP_REQUEST_BIT);
            drop_requested = 1;
        }
    }

#ifdef Py_GIL_DISABLED
    if (!_Py_atomic_load_int_relaxed(&gil->enabled)) {
        // Another thread disabled the GIL between our check above and
        // now. Don't take the GIL, signal any other waiting threads, and
        // return.
        COND_SIGNAL(gil->cond);
        MUTEX_UNLOCK(gil->mutex);
        return;
    }
#endif

#ifdef FORCE_SWITCHING
    /* This mutex must be taken before modifying gil->last_holder:
       see drop_gil(). */
    MUTEX_LOCK(gil->switch_mutex);
#endif
    /* We now hold the GIL */
    _Py_atomic_store_int_relaxed(&gil->locked, 1);
    _Py_ANNOTATE_RWLOCK_ACQUIRED(&gil->locked, /*is_write=*/1);

    if (tstate != (PyThreadState*)_Py_atomic_load_ptr_relaxed(&gil->last_holder)) {
        _Py_atomic_store_ptr_relaxed(&gil->last_holder, tstate);
        ++gil->switch_number;
    }

#ifdef FORCE_SWITCHING
    COND_SIGNAL(gil->switch_cond);
    MUTEX_UNLOCK(gil->switch_mutex);
#endif

    if (_PyThreadState_MustExit(tstate)) {
        /* bpo-36475: If Py_Finalize() has been called and tstate is not
           the thread which called Py_Finalize(), exit immediately the
           thread.

           This code path can be reached by a daemon thread which was waiting
           in take_gil() while the main thread called
           wait_for_thread_shutdown() from Py_Finalize(). */
        MUTEX_UNLOCK(gil->mutex);
        /* tstate could be a dangling pointer, so don't pass it to
           drop_gil(). */
        drop_gil(interp, NULL, 1);
        PyThread_exit_thread();
    }
    assert(_PyThreadState_CheckConsistency(tstate));

    tstate->_status.holds_gil = 1;
    _Py_unset_eval_breaker_bit(tstate, _PY_GIL_DROP_REQUEST_BIT);
    update_eval_breaker_for_thread(interp, tstate);

    MUTEX_UNLOCK(gil->mutex);

    errno = err;
    return;
}
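_PyThreadState_MustExit checks whether the current thread has to exit (the interpreter is finalizing); if so, the thread exits instead of competing for the GIL. The code then enters the while loop and reads gil->locked. If locked is 1, the GIL is held by another thread. Inside the loop it first saves the current switch_number, then calls COND_TIMED_WAIT to wait for up to interval. When the wait ends it checks three conditions: timed_out is 1, gil->locked is 1, and gil->switch_number == saved_switchnum. If all three hold, the thread that held the GIL at the start of the wait is still running after a full interval and has not released it, so the code enters the if block and signals the running thread to release the GIL, indicating that a thread is waiting. If gil->switch_number != saved_switchnum, another thread grabbed the GIL during the wait, so the wait was in vain: the while loop starts over, saves saved_switchnum again, and waits for the GIL to be released once more.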
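Inside the if block, the code fetches the running thread's state object holder_tstate and calls _Py_set_eval_breaker_bit to set the _PY_GIL_DROP_REQUEST_BIT flag in its eval_breaker, asking the running thread to release the GIL at its next check point. Once the running thread receives the signal and releases the GIL, the waiting threads can compete for it.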
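The wait-then-nudge loop described above can be modelled in a few lines of Python. This is a toy sketch, not CPython's implementation: `ToyGIL`, `worker`, and `run_demo` are hypothetical names, a `threading.Condition` stands in for gil->mutex plus gil->cond, a set stands in for the per-thread eval_breaker bit, and the FORCE_SWITCHING handshake is left out.

```python
import threading
import time


class ToyGIL:
    """Toy model of the wait loop in take_gil(); not CPython's real code."""

    def __init__(self, interval=0.001):
        self.cond = threading.Condition()  # stands in for gil->mutex + gil->cond
        self.locked = False
        self.switch_number = 0
        self.last_holder = None
        self.interval = interval
        self.drop_requests = set()         # stands in for the eval_breaker bit

    def take(self, me):
        with self.cond:
            while self.locked:
                saved_switchnum = self.switch_number
                timed_out = not self.cond.wait(timeout=self.interval)
                # Timed out and no switch happened meanwhile: nudge the holder.
                if timed_out and self.locked and self.switch_number == saved_switchnum:
                    self.drop_requests.add(self.last_holder)
            self.locked = True
            if self.last_holder != me:
                self.last_holder = me
                self.switch_number += 1
            self.drop_requests.discard(me)

    def drop(self):
        with self.cond:
            self.locked = False
            self.cond.notify()

    def drop_requested(self, me):
        with self.cond:
            return me in self.drop_requests


def worker(gil, me, steps, log):
    gil.take(me)
    for _ in range(steps):
        log.append(me)               # one unit of work while holding the toy lock
        time.sleep(0.0002)
        if gil.drop_requested(me):   # check point, in the spirit of CHECK_EVAL_BREAKER
            gil.drop()
            gil.take(me)
    gil.drop()


def run_demo(steps=50):
    gil, log = ToyGIL(), []
    threads = [threading.Thread(target=worker, args=(gil, name, steps, log))
               for name in ("A", "B")]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return gil, log


if __name__ == "__main__":
    gil, log = run_demo()
    print("switches:", gil.switch_number, "steps:", len(log))
```

Because the handshake is omitted, a thread that drops the toy lock may win it back immediately; the waiter then simply times out and asks again.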
Releasing the GIL
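How does the running thread learn that it has been asked to release the GIL? Python 3 places many checks of the current thread's eval_breaker in the bytecode instructions, implemented by the CHECK_EVAL_BREAKER macro. For example, the RESUME instruction at the start of a code object's bytecode contains CHECK_EVAL_BREAKER, and so do jump instructions. These check points ensure that the running thread will see the release request, so threads waiting for the GIL are not blocked indefinitely.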
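The instructions mentioned above can be spotted with the standard `dis` module. The sketch below is version dependent: it assumes CPython 3.11 or later, where RESUME and JUMP_BACKWARD exist as opcode names. `count_up` and `opnames` are hypothetical names for this example.

```python
import dis
import sys


def count_up(n):
    total = 0
    for i in range(n):
        total += i
    return total


def opnames(func):
    """Return the opcode names that make up func's bytecode."""
    return [ins.opname for ins in dis.get_instructions(func)]


if __name__ == "__main__":
    names = opnames(count_up)
    print("Python", sys.version_info[:2])
    # On CPython 3.11+ the loop is expected to show both of these opcodes.
    print([n for n in names if n in ("RESUME", "JUMP_BACKWARD")])
```

The listing only shows where the instructions sit; the eval_breaker check itself lives in the interpreter's C code and is not visible from Python.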
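The CHECK_EVAL_BREAKER macro checks whether any bit is set in the current thread state's eval_breaker and, if so, calls _Py_HandlePending to process the flags. _Py_HandlePending is also in Python/ceval_gil.c, and its source is as follows: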
int
_Py_HandlePending(PyThreadState *tstate)
{
    uintptr_t breaker = _Py_atomic_load_uintptr_relaxed(&tstate->eval_breaker);

    /* Stop-the-world */
    if ((breaker & _PY_EVAL_PLEASE_STOP_BIT) != 0) {
        _Py_unset_eval_breaker_bit(tstate, _PY_EVAL_PLEASE_STOP_BIT);
        _PyThreadState_Suspend(tstate);

        /* The attach blocks until the stop-the-world event is complete. */
        _PyThreadState_Attach(tstate);
    }

    /* Pending signals */
    if ((breaker & _PY_SIGNALS_PENDING_BIT) != 0) {
        if (handle_signals(tstate) != 0) {
            return -1;
        }
    }

    /* Pending calls */
    if ((breaker & _PY_CALLS_TO_DO_BIT) != 0) {
        if (make_pending_calls(tstate) != 0) {
            return -1;
        }
    }

#ifdef Py_GIL_DISABLED
    /* Objects with refcounts to merge */
    if ((breaker & _PY_EVAL_EXPLICIT_MERGE_BIT) != 0) {
        _Py_unset_eval_breaker_bit(tstate, _PY_EVAL_EXPLICIT_MERGE_BIT);
        _Py_brc_merge_refcounts(tstate);
    }
#endif

    /* GC scheduled to run */
    if ((breaker & _PY_GC_SCHEDULED_BIT) != 0) {
        _Py_unset_eval_breaker_bit(tstate, _PY_GC_SCHEDULED_BIT);
        _Py_RunGC(tstate);
    }

    /* GIL drop request */
    if ((breaker & _PY_GIL_DROP_REQUEST_BIT) != 0) {
        /* Give another thread a chance */
        _PyThreadState_Detach(tstate);

        /* Other threads may run now */

        _PyThreadState_Attach(tstate);
    }

    /* Check for asynchronous exception. */
    if ((breaker & _PY_ASYNC_EXCEPTION_BIT) != 0) {
        _Py_unset_eval_breaker_bit(tstate, _PY_ASYNC_EXCEPTION_BIT);
        PyObject *exc = _Py_atomic_exchange_ptr(&tstate->async_exc, NULL);
        if (exc != NULL) {
            _PyErr_SetNone(tstate, exc);
            Py_DECREF(exc);
            return -1;
        }
    }
    return 0;
}
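_Py_HandlePending takes a different branch for each flag set in eval_breaker. If _PY_GIL_DROP_REQUEST_BIT is set, it calls _PyThreadState_Detach to release the GIL. The release goes through the call chain _PyThreadState_Detach->detach_thread->_PyEval_ReleaseLock->drop_gil->drop_gil_impl, which in essence sets gil->locked to 0 and signals gil->cond to wake a waiting thread. At its core the GIL is just a boolean-like flag, guarded by the mutex and condition variable seen in take_gil.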
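The flag-by-flag dispatch can be illustrated with a small Python model. It is a sketch only: the `Breaker` names echo the C macros, but the bit values come from `enum.auto()` and are not CPython's, only four of the flags are modelled, and `handle_pending` is a hypothetical function.

```python
import enum


class Breaker(enum.IntFlag):
    """Illustrative flag bits; the values are NOT the ones CPython uses."""
    GIL_DROP_REQUEST = enum.auto()
    SIGNALS_PENDING = enum.auto()
    CALLS_TO_DO = enum.auto()
    GC_SCHEDULED = enum.auto()


def handle_pending(breaker, actions):
    """Run the action for each set bit in a fixed order; return the names that ran."""
    ran = []
    # Same relative order as the C function: signals, calls, GC, then GIL drop.
    for bit in (Breaker.SIGNALS_PENDING, Breaker.CALLS_TO_DO,
                Breaker.GC_SCHEDULED, Breaker.GIL_DROP_REQUEST):
        if breaker & bit:
            actions[bit]()
            ran.append(bit.name)
    return ran


if __name__ == "__main__":
    log = []
    actions = {bit: (lambda b=bit: log.append(b.name)) for bit in Breaker}
    print(handle_pending(Breaker.GIL_DROP_REQUEST | Breaker.GC_SCHEDULED, actions))
```

Packing all pending work into one word is what lets the interpreter's fast path get away with a single check per check point.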
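Right after releasing the GIL, the thread calls _PyThreadState_Attach and rejoins the competition for it. In the gap between release and re-acquisition, another thread may already have taken the GIL and started running, so the current thread competes for the GIL again alongside the other waiting threads. This notify-and-release mechanism is how Python's threads take turns executing.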
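The turn-taking can be observed from Python with a rough experiment like the one below. It is a sketch under assumptions: `count_switches` is a hypothetical helper, the number of switches depends on the machine and interpreter version, and on a free-threaded build the interleaving reflects real parallelism instead of GIL handoffs.

```python
import sys
import threading


def count_switches(iterations=200_000, interval=1e-4):
    """Run two busy threads and count how often the appending thread changed."""
    trace = []

    def spin(tag):
        for _ in range(iterations):
            trace.append(tag)

    old = sys.getswitchinterval()
    sys.setswitchinterval(interval)    # shorter interval, more frequent nudges
    try:
        threads = [threading.Thread(target=spin, args=(tag,)) for tag in "AB"]
        for t in threads:
            t.start()
        for t in threads:
            t.join()
    finally:
        sys.setswitchinterval(old)
    switches = sum(1 for a, b in zip(trace, trace[1:]) if a != b)
    return len(trace), switches


if __name__ == "__main__":
    total, switches = count_switches()
    print("appends:", total, "switches:", switches)
```

Raising `interval` is expected to lower the switch count, since each thread then keeps the GIL longer before it is asked to drop it.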