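The sendfile System Call, with Examples

This installment in our tour of Linux system programming covers sendfile, an efficient system call that transfers data directly between two file descriptors. It is most often used to send a file's contents to a network socket without copying the data from kernel space to user space and back again.

1. Introduction

sendfile is a Linux system call designed to optimize data transfer, specifically the case of reading from one file descriptor and writing to another. Its most typical use is a web server sending static files (HTML, CSS, JS, images) to a client.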

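Traditionally, a program doing this has to:

Call read() to read data from the file (for example, on disk) into a user-space buffer.
Call write() to write the data from the user-space buffer to the (network) socket.

This approach copies the data several times: disk -> kernel buffer (page cache) -> user buffer -> kernel socket buffer -> network.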
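For comparison, the traditional path can be sketched as follows. This is a minimal illustration, not code from the original article: the helper name copy_read_write and the 4096-byte buffer size are assumptions chosen for the example.

```c
#include <errno.h>
#include <unistd.h>

/* Copy everything from in_fd to out_fd through a user-space buffer.
 * Returns the number of bytes copied, or -1 on error (errno is set). */
ssize_t copy_read_write(int in_fd, int out_fd)
{
    char buf[4096];                 /* user-space staging buffer */
    ssize_t total = 0;

    for (;;) {
        /* Copy 1: kernel buffer -> user buffer. */
        ssize_t n = read(in_fd, buf, sizeof(buf));
        if (n == 0)
            return total;           /* end of input */
        if (n < 0) {
            if (errno == EINTR)
                continue;           /* interrupted by a signal, retry */
            return -1;
        }

        /* Copy 2: user buffer -> kernel buffer. write() may be partial. */
        ssize_t off = 0;
        while (off < n) {
            ssize_t w = write(out_fd, buf + off, (size_t)(n - off));
            if (w < 0) {
                if (errno == EINTR)
                    continue;
                return -1;
            }
            off += w;
        }
        total += n;
    }
}
```

Every chunk crosses the user/kernel boundary twice and costs two system calls, which is the overhead sendfile is meant to remove.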

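sendfile has the kernel move the data from the source file descriptor to the destination file descriptor entirely inside kernel space. This avoids the copies between user space and kernel space, which improves throughput and reduces CPU usage. The technique is known as **zero-copy**.

You can picture it as a conveyor belt:

Traditional approach: items are taken off conveyor A -> loaded onto a truck -> put onto conveyor B.
sendfile: items move straight from conveyor A to conveyor B, with no truck (user space) in between.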

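2. Function Prototype

    #include <sys/sendfile.h> // required

    ssize_t sendfile(int out_fd, int in_fd, off_t *offset, size_t count);

Note: sendfile is not specified by POSIX. Other Unix systems (such as Solaris, FreeBSD, and macOS) provide their own sendfile, and prototypes and semantics differ between systems, so code written against the Linux version is not portable as is. On Linux with glibc, including <sys/sendfile.h> is sufficient; no feature-test macro such as _GNU_SOURCE is required.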

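3. Features

Efficient transfer: data moves from in_fd (the input file descriptor) to out_fd (the output file descriptor) directly inside the kernel.
Fewer copies: the step of copying data into a user-space buffer is avoided.
Fewer system calls: a single sendfile call can do the work that would otherwise take many read/write calls.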

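4. Parameters

int out_fd: the output file descriptor, that is, the destination the data is written to.
- Usually a **socket** file descriptor, for example one obtained from socket() and accept().
- Before Linux 2.6.33, out_fd had to be a socket. Since 2.6.33 it can be any file; if it is a regular file, sendfile advances its file offset accordingly.
int in_fd: the input file descriptor, that is, the source the data is read from.
- Usually a **regular file** descriptor, for example one obtained from open().
- It must support mmap-like operations, so it cannot be a socket, a pipe, and so on.
off_t *offset: a pointer to an off_t variable that specifies where in in_fd reading starts.
- If offset is NULL: data is read starting at in_fd's current file offset, and that file offset is updated by the call.
- If offset is not NULL: data is read starting at the byte position *offset. Important: in this case in_fd's file offset is not modified; instead, *offset is updated when sendfile returns to the offset of the byte following the last byte read (that is, *offset = *offset + number of bytes sent).
size_t count: the maximum number of bytes to transfer.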
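The two offset modes can be sketched with a pair of thin wrappers. The names send_range and send_from_current are hypothetical helpers introduced only for this illustration; the sendfile calls themselves follow the prototype above.

```c
#include <sys/sendfile.h>
#include <sys/types.h>
#include <unistd.h>

/* Transfer up to `count` bytes starting at byte `start` of in_fd without
 * touching in_fd's own file offset. On success, *next receives the offset
 * just past the last byte transferred. */
ssize_t send_range(int out_fd, int in_fd, off_t start, size_t count, off_t *next)
{
    off_t offset = start;
    ssize_t n = sendfile(out_fd, in_fd, &offset, count);
    if (n >= 0 && next != NULL)
        *next = offset;             /* the kernel advanced it to start + n */
    return n;
}

/* Transfer up to `count` bytes from in_fd's current file offset.
 * Passing NULL makes the kernel advance in_fd's file offset itself. */
ssize_t send_from_current(int out_fd, int in_fd, size_t count)
{
    return sendfile(out_fd, in_fd, NULL, count);
}
```

For a 10-byte input file positioned at 0, send_range(out_fd, in_fd, 2, 5, &next) should transfer bytes 2 through 6, set next to 7, and leave lseek(in_fd, 0, SEEK_CUR) at 0. A following send_from_current(out_fd, in_fd, 3) then reads bytes 0 through 2 and moves the file offset to 3.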

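5. Return Value

On success: returns the number of bytes actually transferred (0 <= return value <= count). A successful call may transfer fewer bytes than requested, so the caller should be prepared to call again for the remainder.
- A return value of 0 usually means that the end of the input file had already been reached at offset.
On failure: returns -1 and sets the global variable errno to indicate the cause (for example EBADF for an invalid file descriptor, EINVAL for an invalid argument, ENOMEM for insufficient memory, EIO for an I/O error).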
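One way to handle these return values is a small retry loop. The sketch below is an illustration under assumptions: the helper name sendfile_all is hypothetical, and waiting with poll() on EAGAIN is one possible policy for a non-blocking out_fd, not the only one.

```c
#include <errno.h>
#include <poll.h>
#include <sys/sendfile.h>
#include <sys/types.h>

/* Send `count` bytes of in_fd, starting at *offset, to out_fd.
 * Returns the number of bytes transferred, which is less than `count`
 * only if the end of the input file was reached, or -1 on error.
 * On error, *offset still tells how far the transfer got. */
ssize_t sendfile_all(int out_fd, int in_fd, off_t *offset, size_t count)
{
    size_t done = 0;

    while (done < count) {
        ssize_t n = sendfile(out_fd, in_fd, offset, count - done);
        if (n > 0) {
            done += (size_t)n;      /* short transfer: go around again */
        } else if (n == 0) {
            break;                  /* end of the input file */
        } else if (errno == EINTR) {
            continue;               /* interrupted by a signal, retry */
        } else if (errno == EAGAIN) {
            /* Non-blocking out_fd is full: wait until it is writable. */
            struct pollfd pfd = { .fd = out_fd, .events = POLLOUT };
            if (poll(&pfd, 1, -1) < 0 && errno != EINTR)
                return -1;
        } else {
            return -1;              /* real error; errno says which */
        }
    }
    return (ssize_t)done;
}
```

Stopping on a return value of 0 matters: if the file is shorter than expected (for example, it was truncated after fstat), a loop that only checks for -1 would spin forever.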

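6. Similar and Related Functions

splice: another zero-copy transfer function. It is more general than sendfile and moves data between two file descriptors, at least one of which must be a pipe.
tee: duplicates data from one pipe to another without consuming it.
read / write: the traditional way to transfer data, which involves a copy through user space.
mmap / write: an alternative that maps the file into memory and then writes the mapping to the socket. It avoids the explicit read() into a user buffer, although write() still copies the data into the socket buffer. sendfile is usually simpler and more efficient.
copy_file_range: (Linux 4.5+) copies data between two file descriptors that refer to files. It is similar to sendfile, but its capabilities differ somewhat.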
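As a point of comparison, the mmap / write alternative might look like the sketch below. The helper name copy_mmap_write is an assumption for illustration, and the function maps the whole file at once, which is only reasonable for files that fit comfortably in the address space.

```c
#include <errno.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Write the whole of in_fd to out_fd by mapping the file and calling
 * write() on the mapping. Returns bytes written, or -1 on error. */
ssize_t copy_mmap_write(int in_fd, int out_fd)
{
    struct stat st;
    if (fstat(in_fd, &st) < 0)
        return -1;
    if (st.st_size == 0)
        return 0;                   /* mmap() rejects zero-length mappings */

    size_t len = (size_t)st.st_size;
    char *map = mmap(NULL, len, PROT_READ, MAP_PRIVATE, in_fd, 0);
    if (map == MAP_FAILED)
        return -1;

    size_t done = 0;
    while (done < len) {
        ssize_t w = write(out_fd, map + done, len - done);
        if (w < 0) {
            if (errno == EINTR)
                continue;           /* interrupted by a signal, retry */
            int saved = errno;      /* keep write()'s errno across munmap() */
            munmap(map, len);
            errno = saved;
            return -1;
        }
        done += (size_t)w;
    }
    munmap(map, len);
    return (ssize_t)done;
}
```

Compared with sendfile, this needs more bookkeeping (fstat, mmap, munmap) and has its own hazard: if another process truncates the file while it is mapped, touching the missing pages raises SIGBUS.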

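7. Example Code

Example 1: Using `sendfile` to send a file to a socket (simplified HTTP server fragment)

This example shows how a web server can use sendfile to send a file's contents to a client efficiently.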

    // Note: this is a simplified example. It lacks full HTTP parsing and thorough error handling.
    // Build: gcc -o sendfile_server sendfile_server.c  (no extra libraries are needed on Linux)
    #include <sys/sendfile.h>  // sendfile
    #include <sys/socket.h>    // socket, bind, listen, accept, send
    #include <sys/stat.h>      // fstat
    #include <fcntl.h>         // open
    #include <netinet/in.h>    // sockaddr_in, INADDR_ANY
    #include <arpa/inet.h>     // htons
    #include <unistd.h>        // close, read
    #include <stdio.h>         // perror, printf, snprintf
    #include <stdlib.h>        // exit
    #include <string.h>        // strstr, strlen

    #define PORT 8080
    #define BUFFER_SIZE 1024

    // Helper that sends a status line plus headers (not called by main below).
    void send_http_response(int client_sock, const char *status_line, const char *headers) {
        char response[BUFFER_SIZE];
        int len = snprintf(response, sizeof(response), "%s\r\n%s\r\n", status_line, headers);
        if (len > 0 && (size_t)len < sizeof(response)) {
            send(client_sock, response, len, 0);
        }
    }

    int main() {
        int server_fd, new_socket;
        struct sockaddr_in address;
        socklen_t addrlen = sizeof(address);
        int file_fd;
        struct stat file_stat;
        off_t offset;
        ssize_t bytes_sent;

        // 1. Create the socket file descriptor (socket() returns -1 on failure)
        if ((server_fd = socket(AF_INET, SOCK_STREAM, 0)) < 0) {
            perror("socket failed");
            exit(EXIT_FAILURE);
        }

        // 2. Configure the server address
        memset(&address, 0, sizeof(address));
        address.sin_family = AF_INET;
        address.sin_addr.s_addr = INADDR_ANY;
        address.sin_port = htons(PORT);

        // 3. Bind the socket
        if (bind(server_fd, (struct sockaddr *)&address, sizeof(address)) < 0) {
            perror("bind failed");
            close(server_fd);
            exit(EXIT_FAILURE);
        }

        // 4. Listen
        if (listen(server_fd, 3) < 0) {
            perror("listen");
            close(server_fd);
            exit(EXIT_FAILURE);
        }
        printf("Server listening on port %d\n", PORT);

        // 5. Accept a client connection (simplified: handle a single connection)
        if ((new_socket = accept(server_fd, (struct sockaddr *)&address, &addrlen)) < 0) {
            perror("accept");
            close(server_fd);
            exit(EXIT_FAILURE);
        }
        printf("Client connected.\n");

        // 6. Read the client request (assumed to be GET / HTTP/1.1)
        char buffer[BUFFER_SIZE] = {0};
        if (read(new_socket, buffer, BUFFER_SIZE - 1) < 0) {
            perror("read");
            close(new_socket);
            close(server_fd);
            exit(EXIT_FAILURE);
        }
        printf("Received request:\n%s\n", buffer);

        // 7. Minimal parsing: check whether the root path "/" was requested
        if (strstr(buffer, "GET / ") != NULL) {
            const char *filename = "index.html"; // assumes index.html exists in the server's working directory

            // 8. Open the file to send
            file_fd = open(filename, O_RDONLY);
            if (file_fd == -1) {
                perror("open file");
                const char *not_found = "HTTP/1.1 404 Not Found\r\nContent-Length: 0\r\n\r\n";
                send(new_socket, not_found, strlen(not_found), 0);
                close(new_socket);
                close(server_fd);
                exit(EXIT_FAILURE);
            }

            // 9. Get the file status (mainly its size)
            if (fstat(file_fd, &file_stat) == -1) {
                perror("fstat");
                close(file_fd);
                close(new_socket);
                close(server_fd);
                exit(EXIT_FAILURE);
            }

            // 10. Send the HTTP response headers
            char headers[BUFFER_SIZE];
            snprintf(headers, sizeof(headers),
                     "HTTP/1.1 200 OK\r\n"
                     "Content-Type: text/html\r\n"
                     "Content-Length: %ld\r\n"
                     "\r\n",
                     (long)file_stat.st_size);
            send(new_socket, headers, strlen(headers), 0);
            printf("Sent HTTP headers.\n");

            // 11. Send the file contents with sendfile
            offset = 0;
            ssize_t remaining = file_stat.st_size;
            while (remaining > 0) {
                // sendfile may not send everything in one call
                bytes_sent = sendfile(new_socket, file_fd, &offset, remaining);
                if (bytes_sent == -1) {
                    perror("sendfile");
                    break;
                }
                if (bytes_sent == 0) {
                    // Unexpected end of file (e.g. the file was truncated): avoid looping forever
                    break;
                }
                remaining -= bytes_sent;
                printf("Sent %zd bytes, %zd bytes remaining.\n", bytes_sent, remaining);
            }
            if (remaining == 0) {
                printf("File sent successfully using sendfile.\n");
            } else {
                printf("Error or incomplete transfer.\n");
            }
            close(file_fd);
        } else {
            // Handle any other request by sending 404
            const char *not_found = "HTTP/1.1 404 Not Found\r\nContent-Length: 0\r\n\r\n";
            send(new_socket, not_found, strlen(not_found), 0);
        }

        // 12. Close the connection and the server socket
        close(new_socket);
        close(server_fd);
        printf("Connection closed.\n");
        return 0;
    }

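Code walkthrough:

Create, bind, and listen on a TCP socket to set up a simple server.
Accept one client connection.
Read the client's HTTP request (simplified handling).
Check whether the request is GET /.
If so, open the index.html file on the server.
Use fstat to get the file size.
Build and send the HTTP response headers (including Content-Length).
Key step: use sendfile to send the file contents to the client socket.
- new_socket: the output file descriptor (the socket).
- file_fd: the input file descriptor (the file).
- &offset: pointer to an off_t variable that tracks the read position in the file. It starts at 0.
- remaining: the number of bytes still to transfer, initially file_stat.st_size.
sendfile may not transfer all the requested bytes in one call, so a while loop is used to make sure the whole file is sent.
The loop updates the remaining byte count after each call.
Finally, close the file and the sockets.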

Example 2: Using `sendfile` to copy a file (out_fd is a regular file)

This example shows how, on newer kernels (Linux 2.6.33+), sendfile can copy data between two regular files.

    #include <sys/sendfile.h>
    #include <sys/stat.h>
    #include <fcntl.h>
    #include <unistd.h>
    #include <errno.h>   // errno, EINVAL
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>  // strlen

    int main() {
        const char *source_filename = "source_file.txt";
        const char *dest_filename = "dest_file_copy.txt";
        int src_fd, dest_fd;
        struct stat stat_buf;
        ssize_t total_bytes = 0;
        ssize_t bytes_sent;
        off_t offset = 0;

        // 1. Create the source file and write to it
        src_fd = open(source_filename, O_WRONLY | O_CREAT | O_TRUNC, 0644);
        if (src_fd == -1) {
            perror("open source file for writing");
            exit(EXIT_FAILURE);
        }
        const char *data = "This is the content of the source file.\nIt has multiple lines.\n";
        if (write(src_fd, data, strlen(data)) == -1) {
            perror("write to source file");
            close(src_fd);
            exit(EXIT_FAILURE);
        }
        close(src_fd);
        printf("Created source file '%s'.\n", source_filename);

        // 2. Open the source file (read-only)
        src_fd = open(source_filename, O_RDONLY);
        if (src_fd == -1) {
            perror("open source file for reading");
            exit(EXIT_FAILURE);
        }

        // 3. Get the size of the source file
        if (fstat(src_fd, &stat_buf) == -1) {
            perror("fstat source file");
            close(src_fd);
            exit(EXIT_FAILURE);
        }

        // 4. Create/open the destination file (write, create, truncate)
        dest_fd = open(dest_filename, O_WRONLY | O_CREAT | O_TRUNC, 0644);
        if (dest_fd == -1) {
            perror("open destination file");
            close(src_fd);
            exit(EXIT_FAILURE);
        }
        printf("Copying '%s' to '%s' using sendfile...\n", source_filename, dest_filename);

        // 5. Copy the data with sendfile
        // Note: on old kernels this may fail because out_fd is not a socket
        while (total_bytes < stat_buf.st_size) {
            // Work out how many bytes to request in this call
            size_t count = stat_buf.st_size - total_bytes;
            if (count > 0x7ffff000) { // sendfile transfers at most 0x7ffff000 bytes (about 2 GB) per call
                count = 0x7ffff000;
            }
            bytes_sent = sendfile(dest_fd, src_fd, &offset, count);
            if (bytes_sent == -1) {
                int saved_errno = errno; // perror/printf may overwrite errno
                perror("sendfile");
                // The kernel may not support a regular file as out_fd
                if (saved_errno == EINVAL) {
                    printf("Error: sendfile likely doesn't support copying between regular files on this system/kernel.\n");
                    printf("Consider using splice or standard read/write loop instead.\n");
                }
                close(src_fd);
                close(dest_fd);
                exit(EXIT_FAILURE);
            }
            if (bytes_sent == 0) {
                // Unexpected end of file (e.g. the source shrank): avoid looping forever
                fprintf(stderr, "sendfile: unexpected end of source file\n");
                close(src_fd);
                close(dest_fd);
                exit(EXIT_FAILURE);
            }
            total_bytes += bytes_sent;
            printf("Copied %zd bytes in this call, total: %zd/%ld\n",
                   bytes_sent, total_bytes, (long)stat_buf.st_size);
        }
        printf("File copy completed successfully. %zd bytes copied.\n", total_bytes);

        // 6. Clean up
        close(src_fd);
        close(dest_fd);
        return 0;
    }

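Code walkthrough:

First create a source file named source_file.txt and write some content to it.
Open the source file read-only and use fstat to get its size.
Open (or create) the destination file dest_file_copy.txt in write, create, truncate mode.
Enter a while loop that calls sendfile(dest_fd, src_fd, &offset, count) to transfer data from the source file to the destination file.
Key point: out_fd is the descriptor of the destination file, which requires Linux kernel 2.6.33 or later. On older kernels that do not support this, sendfile returns -1 and sets errno to EINVAL.
Loop until the whole file has been copied.
Finally, close both file descriptors.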

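Important notes and caveats:

Zero-copy advantage: the main benefit of sendfile is that it reduces the number of copies between kernel space and user space, which lowers CPU overhead and raises throughput.
Scope:
- Classic use: in_fd is a file and out_fd is a socket. This works on every Linux version that supports sendfile.
- Extended use: in_fd and out_fd can both be regular files (Linux 2.6.33+), or one can be a file and the other a socket.
Non-blocking I/O: when out_fd is a non-blocking socket and its send buffer is full, sendfile may transfer only part of the data and return a short count, or return -1 with errno set to EAGAIN if nothing could be sent. The return value has to be handled accordingly.
Transfer limit: a single sendfile call transfers at most 0x7ffff000 (2,147,479,552) bytes. Large files therefore require calling it in a loop.
offset parameter: understanding how the behavior differs between a NULL and a non-NULL offset is essential. A non-NULL offset allows thread-safe transfers from a shared descriptor, because the file's own offset is not modified.
Error handling: always check the return value of sendfile. Besides the common error codes, pay particular attention to EINVAL, which can indicate an unsupported operation (such as file-to-file copying on an old kernel).
Modern alternatives: splice and copy_file_range are more recent alternatives or complements to sendfile and offer more flexible data transfer.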