(done) MIT6.S081 2023 學習筆記 (Day7: LAB6 Multithreading)

網頁：https://pdos.csail.mit.edu/6.S081/2023/labs/thread.html

(任務1教會了你如何用 C 語言調用匯編，編譯后鏈接即可)
任務1：Uthread: switching between threads (完成)

在這個練習中，你將設計一個用戶級線程系統中的上下文切換機制，并實現它。為了幫助你開始，你的xv6系統中有兩個文件 user/uthread.c 和 user/uthread_switch.S，以及Makefile中的一個規則來構建uthread程序。uthread.c 包含了大部分用戶級線程包的代碼和三個簡單測試線程的代碼。線程包缺少一些創建線程和在線程之間切換的代碼。

根據講義要求，當執行 uthread 程序時，應該出現如下輸出：
在這里插入圖片描述

我們先來運行試試，如下：
在這里插入圖片描述

可以看到沒有任何輸出。先來看看 uthread 的源碼：

int 
main(int argc, char *argv[]) 
{a_started = b_started = c_started = 0;a_n = b_n = c_n = 0;thread_init();thread_create(thread_a);thread_create(thread_b);thread_create(thread_c);current_thread->state = FREE;thread_schedule();exit(0);
}

從 main 函數來看，可以看出大致邏輯和框架：初始化thread，創建三個線程，設置 main thread 狀態為 FREE，讓出執行流(類似于 yield 函數)。

先來看 thread_schedule() 源碼：

void 
thread_schedule(void)
{struct thread *t, *next_thread;/* Find another runnable thread. */next_thread = 0;t = current_thread + 1;for(int i = 0; i < MAX_THREAD; i++){if(t >= all_thread + MAX_THREAD)t = all_thread;if(t->state == RUNNABLE) {next_thread = t;break;}t = t + 1;}if (next_thread == 0) {printf("thread_schedule: no runnable threads\n");exit(-1);}if (current_thread != next_thread) {         /* switch threads?  */next_thread->state = RUNNING;t = current_thread;current_thread = next_thread;/* YOUR CODE HERE* Invoke thread_switch to switch from t to next_thread:* thread_switch(??, ??);*/} elsenext_thread = 0;
}

從函數源碼來理解，是先從 thread 數組獲取一個元素，隨后調用 thread_switch 函數把執行流切換過去。這個 thread_switch 是需要我們自己實現的。

再看看 thread a b c 的源碼：

void 
thread_a(void)
{int i;printf("thread_a started\n");a_started = 1;while(b_started == 0 || c_started == 0)thread_yield();for (i = 0; i < 100; i++) {printf("thread_a %d\n", i);a_n += 1;thread_yield();}printf("thread_a: exit after %d\n", a_n);current_thread->state = FREE;thread_schedule();
}

其實就是不斷打印東西，再切換到別的線程上。

思路其實很簡單，跟著 xv6 的講義做吧：
1.切換線程的函數 thread_switch 在 user/uthread_switch.S 中實現 (doing)
2.你的任務是制定一個計劃來創建線程以及保存/恢復寄存器以在線程之間進行切換，并實現該計劃。完成之后，運行 make grade 應該顯示你的解決方案通過了uthread測試。
3.你需要在 user/uthread.c 中的 thread_create() 和 thread_schedule() 函數以及 user/uthread_switch.S 中的 thread_switch 添加代碼。一個目標是確保當 thread_schedule() 第一次運行給定的線程時，線程能夠在其自己的棧上執行傳遞給 thread_create() 的函數。另一個目標是確保 thread_switch 保存被切換離開的線程的寄存器，恢復被切換到的線程的寄存器，并返回到后者的線程指令中上次離開的位置。你需要決定在哪里保存/恢復寄存器；修改 struct thread 以保存寄存器是一個不錯的計劃。你需要在 thread_schedule 中添加對 thread_switch 的調用；你可以傳遞任何你需要給 thread_switch 的參數，但目的是從線程 t 切換到 next_thread。
4.thread_switch 只需要保存/恢復 callee-saved 的寄存器。為什么？ (回答：編譯器在編譯 C 語言的時候，在編譯對 thread_switch() 的函數調用時會自動保存 caller-saved 寄存器)
5.你可以在 user/uthread.asm 中查看 uthread 的匯編代碼，這對于調試可能很有幫助。
6.為了測試你的代碼，使用 riscv64-linux-gnu-gdb 單步執行 thread_switch 可能會有所幫助。你可以按照以下方式開始：
在這里插入圖片描述

這塊地方其實跟內核線程中的上下文切換很相似，我們定義一個如下的 context 結構體就可以把事情變得簡單化：

struct context {uint64 ra;uint64 sp;// callee-saveduint64 s0;uint64 s1;uint64 s2;uint64 s3;uint64 s4;uint64 s5;uint64 s6;uint64 s7;uint64 s8;uint64 s9;uint64 s10;uint64 s11;
};struct thread {char       stack[STACK_SIZE]; /* the thread's stack */int        state;             /* FREE, RUNNING, RUNNABLE */struct context context;
};

后續很簡單，自己悟吧。

運行 make grade，可以看到任務1-uthread 已通過
在這里插入圖片描述

任務2：Using threads (完成)

在這個作業中，你將使用哈希表探索基于線程和鎖的并行編程。你應該在一個擁有多個核心的真實 Linux 或 MacOS 計算機上完成這個作業（不是 xv6，也不是 qemu）。大多數最新的筆記本電腦都配備了多核處理器。

這個作業使用了 UNIX pthread 線程庫。你可以在手冊頁中找到關于它的信息，使用 man pthreads 命令，你也可以在網上查找，例如這里、這里和這里。

文件 notxv6/ph.c 包含一個簡單的哈希表，如果從單個線程中使用它是正確的，但當從多個線程中使用時它是錯誤的。若運行：

make ph
./ph 1

會得到類似以下的輸出：

100000 puts, 3.991 seconds, 25056 puts/second
0: 0 keys missing
100000 gets, 3.981 seconds, 25118 gets/second

你看到的數字可能與這個示例輸出相差兩倍或更多，這取決于你的計算機速度、是否有多個核心，以及它是否忙于執行其他任務。

ph 運行兩個基準測試。首先，它通過調用 put() 向哈希表中添加大量鍵，并打印每秒 put 操作的速率。然后，它通過 get() 從哈希表中獲取鍵。它打印出由于 put 操作應該存在于哈希表中的鍵的數量，但在這種情況下缺失的鍵數量（這里是零），以及它實現的每秒 get 操作的數量。

未經修改時運行 ph 2，會得到如下輸出
在這里插入圖片描述

這段 ph 2 輸出的第一行表明，當兩個線程同時向哈希表添加條目時，它們實現了每秒 53,044 次插入的總速率。這大約是運行 ph 1 單個線程速率的兩倍。這是一個非常好的“并行加速”，大約是 2 倍，這是人們可能期望的最高加速（即核心數量加倍，每單位時間的工作量也加倍）。

然而，說有 16579 個鍵缺失的兩行表明，大量應該存在于哈希表中的鍵卻不在那里。也就是說，put 操作本應將這些鍵添加到哈希表中，但出了問題。請查看 notxv6/ph.c 文件，特別是 put() 和 insert() 函數。

問題：為什么在兩個線程的情況下會有缺失的鍵，而在一個線程的情況下卻沒有？請識別出兩個線程中的一系列事件，這些事件可能導致一個鍵的缺失。將你的事件序列和簡短的解釋提交到 answers-thread.txt 文件中。（回答看下面）

為了避免這一系列事件，請在 notxv6/ph.c 中的 put 和 get 函數中插入鎖的獲取和釋放語句，以便在兩個線程的情況下缺失的鍵數量始終為 0。相關的 pthread 調用如下：

pthread_mutex_t lock; // 聲明一個鎖 
pthread_mutex_init(&lock, NULL); // 初始化鎖 
pthread_mutex_lock(&lock); // 獲取鎖 
pthread_mutex_unlock(&lock); // 釋放鎖

當你完成這些修改，并且 make grade 顯示你的代碼通過了 ph_safe 測試（該測試要求兩個線程下缺失的鍵數量為零）時，你就完成了這個任務。在這個階段，ph_fast 測試失敗是可以接受的。

不要忘記調用 pthread_mutex_init()。首先使用 1 個線程測試你的代碼，然后用 2 個線程測試它。代碼是否正確（即你是否消除了缺失的鍵？）兩個線程的版本相對于單線程版本是否實現了并行加速（即每單位時間完成的總工作是否更多？）

存在這樣的情況，即并發的 put() 操作在哈希表中的讀寫內存沒有重疊，因此它們不需要鎖來相互保護。你能修改 ph.c 來利用這種情況，以獲得某些 put() 操作的并行加速嗎？提示：每個哈希桶一個鎖怎么樣？

修改你的代碼，以便一些 put 操作可以并行運行，同時保持正確性。當你完成修改并且 make grade 顯示你的代碼同時通過了 ph_safe 和 ph_fast 測試時，你就完成了任務。ph_fast 測試要求兩個線程的 put 操作每秒至少是單個線程的 1.25 倍。

先來稍微讀讀 ph.c 源碼，看為什么會 miss keys，仔細看對哈希表進行插入的代碼:

static void 
insert(int key, int value, struct entry **p, struct entry *n)
{struct entry *e = malloc(sizeof(struct entry));e->key = key;e->value = value;e->next = n; // 注意看這里 <--- 當兩個線程同時執行到這里時，兩個 key 的 next 都等于鏈表頭*p = e; // 隨后讓鏈表頭等于這兩個 key，此時就會有一個 key 被遺漏掉
}

上面代碼的注釋就是對之前問題的回答

為了驗證猜想，對 insert 函數調用加鎖，如下：

    // the new is new.pthread_mutex_lock(&lock); // 獲取鎖 insert(key, value, &table[i], table[i]);pthread_mutex_unlock(&lock); // 釋放鎖

編譯運行，發現已經符合 ph_safe 了，如下

100000 puts, 2.288 seconds, 43702 puts/second
0: 0 keys missing
1: 0 keys missing
200000 gets, 4.546 seconds, 43995 gets/second

運行 make grade，可以同時通過 ph_safe 和 ph_fast，我們就不管了
在這里插入圖片描述

任務3：Barrier (完成)

在這個作業中，你將實現一個 barrier：在應用程序中的一個點，所有參與的線程必須等待，直到所有其他參與的線程也到達那個點。你將使用 pthread 條件變量，這是一種類似于 xv6 的睡眠和喚醒的序列協調技術。

You should do this assignment on a real computer (not xv6, not qemu).

The file notxv6/barrier.c contains a broken barrier.

$ make barrier
$ ./barrier 2
barrier: notxv6/barrier.c:42: thread: Assertion `i == t' failed.

數字 2 指定了在屏障上同步的線程數量（在 barrier.c 中的 nthread）。每個線程執行一個循環。在每次循環迭代中，一個線程調用 barrier()，然后隨機休眠一定數量的微秒。斷言觸發的原因是一個線程在另一個線程到達屏障之前離開了屏障。期望的行為是每個線程在 barrier() 中阻塞，直到所有 nthreads 個線程都調用了 barrier()。

你的目標是實現期望的屏障行為。除了你在 ph 作業中看到的鎖原語之外，你還需要以下新的 pthread 原語；詳細信息請查看這里和這里。

pthread_cond_wait(&cond, &mutex);  // go to sleep on cond, releasing lock mutex, acquiring upon wake up
pthread_cond_broadcast(&cond);     // wake up every thread sleeping on cond

Make sure your solution passes make grade’s barrier test.

pthread_cond_wait releases the mutex when called, and re-acquires the mutex before returning.

We have given you barrier_init(). Your job is to implement barrier() so that the panic doesn’t occur.

We’ve defined struct barrier for you; its fields are for your use.

There are two issues that complicate your task:
1.You have to deal with a succession of barrier calls, each of which we’ll call a round. bstate.round records the current round. You should increment bstate.round each time all threads have reached the barrier.
2.You have to handle the case in which one thread races around the loop before the others have exited the barrier. In particular, you are re-using the bstate.nthread variable from one round to the next. Make sure that a thread that leaves the barrier and races around the loop doesn’t increase bstate.nthread while a previous round is still using it.

Test your code with one, two, and more than two threads.

我們先來看 barrier 的 main 源碼：

int
main(int argc, char *argv[])
{pthread_t *tha;void *value;long i;double t1, t0;if (argc < 2) {fprintf(stderr, "%s: %s nthread\n", argv[0], argv[0]);exit(-1);}nthread = atoi(argv[1]);tha = malloc(sizeof(pthread_t) * nthread);srandom(0);barrier_init();for(i = 0; i < nthread; i++) {assert(pthread_create(&tha[i], NULL, thread, (void *) i) == 0);}for(i = 0; i < nthread; i++) {assert(pthread_join(tha[i], &value) == 0);}printf("OK; passed\n");
}

可以看到程序先根據命令行參數創建幾個線程，隨后讓這些線程執行 thread() 命令

看 thread() 函數實現：

static void *
thread(void *xa)
{long n = (long) xa;long delay;int i;for (i = 0; i < 20000; i++) {int t = bstate.round;assert (i == t);barrier();usleep(random() % 100);}return 0;
}