Linux File System and Basic I/O

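Table of Contents

1 C File Interfaces
- 1.1 `fopen`
- 1.2 `fwrite`, `fread`, `rewind`, `fclose`
2 File System Calls
- 2.1 `open`
  - 2.1.1 Parameter 2: `flags`
  - 2.1.2 Parameter 3: `mode`
  - 2.1.3 Return value: `file descriptor`
- 2.2 write
- 2.3 read
- 2.4 close
3 The Nature of a File
- 3.1 `struct file`
- 3.2 How is one process associated with multiple files?
4 Redirection
- 4.1 How are file descriptors allocated?
- 4.2 `dup2`
- 4.3 Redirecting `stdout` and `stderr`
5 Buffers
6 Hard Drives (Solid-State Drives (SSD) / Mechanical Hard Disks)
- 6.1 Disks
- 6.2 Abstracting the Disk
7 Understanding Directories
8 Soft and Hard Links
- 8.1 Creating a soft link
- 8.2 Creating a hard link
9 Dynamic and Static Libraries
- 9.1 Static libraries
- 9.2 Dynamic libraries
  - 9.2.1 How an executable finds a dynamic library
  - 9.2.2 How dynamic libraries are loaded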

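1 C File Interfaces

1.1 `fopen`

When `fopen` creates a new file from a relative path, the file is created in the process's current working directory.
w: truncate the file, then write
a: append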

1.2 `fwrite`, `fread`, `rewind`, `fclose`

    #include <stdio.h>
    #include <string.h>

    int main(void) {
        FILE *f = fopen("bite.txt", "w+");
        if (f == NULL) {
            perror("fopen");
            return 1;
        }
        const char *msg = "linux so easy\n";
        fwrite(msg, strlen(msg), 1, f);

        rewind(f);  // reset the file offset to the start before reading back
        char buffer[64];
        size_t n = fread(buffer, 1, sizeof(buffer) - 1, f);
        buffer[n] = '\0';  // fread does not NUL-terminate what it reads
        printf("%s\n", buffer);
        fclose(f);
        return 0;
    }

2 File System Calls

2.1 `open`

    // Header files:
    #include <sys/types.h>
    #include <sys/stat.h>
    #include <fcntl.h>

    int open(const char* pathname, int flags);
    int open(const char* pathname, int flags, mode_t mode);

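2.1.1 Parameter 2: `flags`

O_RDONLY: open read-only
O_WRONLY: open write-only
O_CREAT: create the file at `pathname` if it does not exist
O_TRUNC: truncate the file to length 0 when opening it
O_APPEND: open in append mode; every write goes after the existing content

Flags are combined with bitwise OR, e.g. `O_WRONLY | O_CREAT | O_TRUNC`.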

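2.1.2 Parameter 3: `mode`

`mode` gives the permission bits for a newly created file, so it only matters together with `O_CREAT`. The process's umask (permission mask) is applied to it: final permissions = mode & ~umask (in octal, e.g. 0xxx).

The `umask` system call changes the umask:

Header files: <sys/types.h> <sys/stat.h>
`mode_t umask(mode_t mask);`
It changes only the calling process's umask, not the system-wide setting; a process always uses its own umask.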

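2.1.3 Return value: `file descriptor`

A file descriptor is essentially an array index (see Section 3.2 for details).

When `write` is called, the fd is passed to the kernel, which follows the calling process's `files` pointer to the file descriptor table and uses the index (fd) to find the open file's `struct file`.

The `FILE` returned when the C library opens a file is a structure the C library itself wraps around the descriptor, so it must contain a file descriptor. With glibc, the field is `_fileno`: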

cout << stdin->_fileno << endl;  //0
cout << stdout->_fileno << endl;  //1
cout << stderr->_fileno << endl;  //2

2.2 write

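    #include <unistd.h>

    ssize_t write(int fd, const void* buf, size_t count);

Parameter 1: the file descriptor
Parameter 2: the buffer holding the data to write
Parameter 3: the number of bytes to write, e.g. `strlen(message)`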

2.3 read

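    #include <unistd.h>

    ssize_t read(int fd, void *buf, size_t count);

Return value
- Greater than 0: the number of bytes read
- 0: end of file (for a pipe, this means the write end has been closed)
- -1: read error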

2.4 close

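3 The Nature of a File

3.1 `struct file`

For every open file, the operating system maintains a `struct file`, which records:

- Where the file is located on disk
- Basic attributes: permissions, size, read/write position, who opened it
- Information about the file's kernel buffer
- A `struct file* next` pointer that links the open files together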

3.2 How is one process associated with multiple files?

`task_struct` contains a `struct files_struct *files` pointer that records the files the process has open. `struct files_struct` contains an array of pointers, `struct file *fd_array[]`, that holds the file pointers (so `open` picks an empty slot in `fd_array` and returns its index).
`struct file *fd_array[]` is the file descriptor table. Indexes 0, 1, and 2 point to the three files opened by default: stdin (the keyboard), stdout (the display), and stderr (the display).

    #include <stdio.h>
    #include <string.h>
    #include <unistd.h>

    int main(void) {
        const char *msg = "hello\n";
        write(1, msg, strlen(msg));  // fd 1: write to the display

        char buffer[1024];
        ssize_t s = read(0, buffer, sizeof(buffer) - 1);  // fd 0: read from the keyboard
        if (s < 0) {
            perror("read");
            return 1;
        }
        buffer[s] = '\0';
        printf("echo : %s\n", buffer);
        return 0;
    }

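4 Redirection

4.1 How are file descriptors allocated?

Starting from index 0, the kernel picks the smallest array slot that is not in use.

    #include <fcntl.h>
    #include <stdio.h>
    #include <string.h>
    #include <unistd.h>

    int main(void) {
        const char *filename = "log.txt";  // placeholder name; any writable path works
        close(1);
        int fd = open(filename, O_CREAT | O_WRONLY | O_TRUNC, 0666);
        if (fd < 0) {
            perror("open");
            return 1;
        }
        const char *msg = "hello\n";
        for (int i = 0; i < 5; i++) {
            // fd 1 was closed, so open() took slot 1: writes meant for
            // the display are redirected into the file
            write(1, msg, strlen(msg));
        }
        close(fd);
        return 0;
    }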

4.2 `dup2`

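    #include <unistd.h>

    int dup2(int oldfd, int newfd);   // makes newfd be the copy of oldfd

Redirection copies the pointer stored in one file descriptor's slot into the slot of the descriptor being redirected.

`fd_array[oldfd]` is copied into `fd_array[newfd]`. After the copy, `oldfd` is usually no longer needed and should be closed with `close(oldfd)`.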

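4.3 Redirecting `stdout` and `stderr`

    #include <stdio.h>

    int main(void) {
        fprintf(stdout, "normal msg");
        fprintf(stdout, "normal msg");
        fprintf(stdout, "normal msg");
        fprintf(stderr, "error msg");
        fprintf(stderr, "error msg");
        fprintf(stderr, "error msg");
        return 0;
    }
    // gcc test.c -o test

`./test > normal.log`: "normal msg" is redirected to normal.log, while "error msg" is still printed to the screen.

To redirect both stdout and stderr to a single file, all.log:

./test &>all.log
./test >&all.log
./test >all.log 2>&1
./test 2>all.log 1>all.log

The first two forms are bash shorthand for the third. The last form opens all.log twice, so the two streams keep separate file offsets and can overwrite each other's output; prefer one of the first three.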

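5 Buffers

C output functions write into a user-level buffer (this buffer does not live in the kernel).
The display is flushed line by line, so output is flushed as soon as `printf` has produced a `\n`.
When a buffer is flushed:
- Unbuffered: flushed immediately
- Line buffered: flushed on `\n` (the display)
- Fully buffered: flushed only when the buffer is full (writes to regular files)
- On process exit: whatever is still buffered is flushed
`fprintf`, `fwrite`, and similar functions write into the user-level buffer. When that buffer is flushed, the `write` system call is invoked (so C's `fflush` must wrap `write`), and `write` in turn writes into the kernel-level buffer.
Why have a user-level buffer?
- It improves efficiency
- It works together with formatting
Where is the user-level buffer? It lives inside the `FILE` structure.

    #include <stdio.h>
    #include <string.h>
    #include <unistd.h>

    int main(void) {
        const char *fstr = "hello fwrite\n";
        const char *str = "hello write\n";

        // C library calls: these go through the user-level buffer
        printf("hello printf, pid:%d, ppid:%d\n", getpid(), getppid());
        fprintf(stdout, "hello fprintf, pid:%d, ppid:%d\n", getpid(), getppid());
        fwrite(fstr, strlen(fstr), 1, stdout);

        // System call: bypasses the user-level buffer
        write(1, str, strlen(str));

        fork();
        return 0;
    }

`write` is a system call, so its output bypasses the user-level buffer and reaches the kernel right away. When the program's output is redirected to a file, the user-level buffer switches to full buffering. After `fork`, the child shares the parent's memory copy-on-write, so the buffer is copied along with the `FILE` structure. Each process flushes its own copy when it exits, which is why the three C-library lines show up twice in the file while the `write` line shows up once.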

The FILE structure:

    Defined in /usr/include/libio.h
    struct _IO_FILE {
      int _flags; /* High-order word is _IO_MAGIC; rest is flags. */
    #define _IO_file_flags _flags

      // Buffer-related fields
      /* The following pointers correspond to the C++ streambuf protocol. */
      /* Note: Tk uses the _IO_read_ptr and _IO_read_end fields directly. */
      char* _IO_read_ptr; /* Current read pointer */
      char* _IO_read_end; /* End of get area. */
      char* _IO_read_base; /* Start of putback+get area. */
      char* _IO_write_base; /* Start of put area. */
      char* _IO_write_ptr; /* Current put pointer. */
      char* _IO_write_end; /* End of put area. */
      char* _IO_buf_base; /* Start of reserve area. */
      char* _IO_buf_end; /* End of reserve area. */
      /* The following fields are used to support backing up and undo. */
      char *_IO_save_base; /* Pointer to start of non-current get area. */
      char *_IO_backup_base; /* Pointer to first valid character of backup area */
      char *_IO_save_end; /* Pointer to end of non-current get area. */

      struct _IO_marker *_markers;

      struct _IO_FILE *_chain;

      int _fileno; // the wrapped file descriptor
    #if 0
      int _blksize;
    #else
      int _flags2;
    #endif
      _IO_off_t _old_offset; /* This used to be _offset but it's too small. */

    #define __HAVE_COLUMN /* temporary */
      /* 1+column number of pbase(); 0 is unknown. */
      unsigned short _cur_column;
      signed char _vtable_offset;
      char _shortbuf[1];

      /* char* _save_gptr; char* _save_egptr; */

      _IO_lock_t *_lock;
    #ifdef _IO_USE_OLD_IO_FILE
    };

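6 Hard Drives (Solid-State Drives (SSD) / Mechanical Hard Disks)

A file stored on disk = the file's content + the file's attributes
File content is kept in data blocks; file attributes are kept in the inode
On disk, a file's attributes and its content are stored separately

6.1 Disks

Locating a sector: surface (which selects the head to use) -> track (cylinder) -> sector. This is CHS addressing.

Most of the access time is spent seeking.

6.2 Abstracting the Disk

LBA address: the disk's heads, tracks, and sectors are abstracted into a logical one-dimensional array of sectors; division and modulo operations convert an index back to CHS.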

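In Linux, a file is identified by its inode number. Reading a file follows the chain: inode number -> inode table -> struct inode -> blocks[] -> file content

    // Conceptual sketch, not the kernel's real definition
    struct inode {
        inode number
        // file type
        // permissions: w/r/x
        // reference count
        // owner
        // group
        // ACM times (access, change, modify)
        int blocks[N]    // numbers of the data blocks that hold the content
    }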

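inode table: stores the inodes; every inode has a unique number (one inode per file; one inode may map to several blocks)
- `ls -li`: show inode numbers
Block Bitmap: a bitmap that marks whether each block is in use
inode Bitmap: a bitmap that marks whether each inode number is in use
Group Descriptor Table: records, for each block group, where its bitmaps and inode table are located
Super Block: basic information about the file system, covering the overall usage of the whole partition
- The total number of groups, the size of each group, the number of inodes per group, the number of blocks per group, the starting inode of each group, and the file system's type and name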

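7 Understanding Directories

A directory is also a file: content + attributes, and it has an inode too
A directory also has data blocks, which store the mapping from each file name in the directory to that file's inode
- That is why two files in the same directory cannot share a name
- Without w permission on the directory, files cannot be created in it, because the name-to-inode entry cannot be written into the directory's data blocks
- Without r permission on the directory, its contents cannot be listed
- Without x permission on the directory, the directory cannot be entered
dentry cache
- How is a file's inode found? The current directory's data blocks hold the name-to-inode mapping for the files in it, and the current directory is in turn a file in its parent directory, which stores the current directory's name and inode. Looking up a file's inode therefore means resolving the path from the root directory down, one component at a time. The kernel caches resolved components as dentries so that this walk does not have to go to disk every time.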

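8 Soft and Hard Links

8.1 Creating a soft link

ln -s file.txt soft-link

A soft link has its own inode and its own data block; the data block stores the path of the target file (similar to a shortcut).

8.2 Creating a hard link

ln test.txt hard-link

A hard link shares the target's inode. It is simply a new file-name-to-inode entry in the current directory (an alias or reference).
Hard links to directories are not allowed (`.` and `..` are the exceptions), because they would create loops during path lookup.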

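9 Dynamic and Static Libraries

Static library: libXXX.a
Dynamic library: libXXX.so

9.1 Static libraries

A static library is essentially a collection of .o files.
`ar` is the GNU archiver, used to package static libraries; `rc` means replace and create.

    # Recipe lines must start with a tab.
    lib=libmymath.a

    $(lib):mymath.o     # there may be several .o files
    	ar -rc $@ $^
    mymath.o:mymath.c
    	gcc -c $^

    .PHONY:clean
    clean:
    	rm -f *.a *.o

    .PHONY:output
    output:
    	mkdir -p lib/include
    	mkdir -p lib/mymathlib
    	cp *.h lib/include
    	cp *.a lib/mymathlib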

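Using the library:

Locate the header files: -I
Locate the library directory (otherwise linking fails): -L
Say which library in that directory to link: -l (the file name without the lib prefix and the .a suffix); for third-party libraries the library name must be given explicitly

gcc main.c -I ./lib/include/ -L ./lib/mymathlib/ -lmymath

List the dynamic libraries an executable depends on (including the standard library):

ldd a.out

Installing the library

sudo cp lib/include/mymath.h /usr/include/
sudo cp lib/mymathlib/libmymath.a /lib64/

Alternatively, create soft links in those system directories (not recommended)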

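9.2 Dynamic libraries

Generate the .o files

gcc -fPIC -c mylog.c

(when -c is used without naming an output file, a .o file with the same base name is produced)

Generate the .so file

gcc -shared -o libmymethod.so *.o

(without -shared, gcc would try to produce an executable)

When a program calls a function from a dynamic library, the system loads the library into memory and executes it there, which is why .so files carry the x (execute) permission by default.

    # Recipe lines must start with a tab.
    dy-lib=libmymethod.so
    static-lib=libmymath.a

    .PHONY:all
    all: $(dy-lib) $(static-lib)

    $(static-lib):mymath.o
    	ar -rc $@ $^
    mymath.o:mymath.c
    	gcc -c $^

    $(dy-lib):mylog.o myprint.o
    	gcc -shared -o $@ $^
    mylog.o:mylog.c
    	gcc -fPIC -c $^
    myprint.o:myprint.c
    	gcc -fPIC -c $^

    .PHONY:clean
    clean:
    	rm -rf *.o *.a *.so mylib

    .PHONY:output
    output:
    	mkdir -p mylib/include
    	mkdir -p mylib/lib
    	cp *.h mylib/include
    	cp *.a mylib/lib
    	cp *.so mylib/lib

Compiling a program against the dynamic library works the same way as with a static library
-fPIC: generate position-independent code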

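9.2.1 How an executable finds a dynamic library

There are four ways:

1. Copy the dynamic library into /lib64
2. Create a soft link to it in /lib64

ln -s xxx(absolute path) /lib64/xxx

3. Add its directory to an environment variable (this lasts only for the current shell session)

export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/xxx/xxxx/xx(absolute path)

4. Add a configuration file for the dynamic linker
- cd /etc/ld.so.conf.d
- Create a .conf file
- Add the directory that contains the dynamic library to that file
- Run ldconfig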

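9.2.2 How dynamic libraries are loaded

Once a dynamic library has been loaded, it is shared by all processes that use it
Since the library is shared, will its global variables conflict between processes? No, because a write triggers copy-on-write
A compiled program already contains addresses, namely virtual addresses; the compiler also has to take into account how the program will be laid out in memory when it is loaded
A shared library can be very large, so giving it a fixed load address is impractical. A library can be loaded anywhere in the shared region of the virtual address space, and the functions inside it are not addressed absolutely: each function only needs its offset within the library, and it is found via the library's start address + offset
- That is why the object files (.o) that go into a dynamic library must be compiled with -fPIC (position independent code)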