Why deleting a huge number of files with rsync is faster than rm
Today I was looking into how to delete a large number of files quickly on Linux.
Many search results suggest using rsync for the deletion, saying it is much
faster than rm, but nobody explains why. After digging into it, the reasons
boil down to two things: first, listing the files gets slow when there are
very many of them; second, each deletion forces the filesystem's B-tree to
rebalance, which adds overhead. rsync reduces this overhead, which is why it
is faster than rm.
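For reference, the trick people cite is usually some variant of syncing an
empty directory over the full one, e.g. "mkdir empty; rsync -a --delete
empty/ huge_dir/" (the directory names here are placeholders). The quoted
analysis below explains where the time actually goes.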
== Quoted ===========
rm on a directory with millions of files
When a directory needs to be listed, readdir() is called on it, which
yields a list of files. readdir is a POSIX call, but the real Linux
system call being used here is getdents. getdents lists directory
entries by filling a buffer with entries.
The problem mainly comes down to the fact that readdir() uses a fixed
32 KB buffer to fetch files. As a directory gets larger and larger (its
size increases as files are added), ext3 gets slower and slower at
fetching entries, and readdir's 32 KB buffer is only sufficient to hold
a fraction of the entries in the directory. This causes readdir to loop
over and over, invoking the expensive system call again and again.
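The fix described in that thread is to call getdents directly with a much
larger buffer. Here is a minimal sketch of that idea, adapted from the
example in the getdents(2) man page; the 5 MB buffer size and the
command-line handling are assumptions, not details from the quoted post:

    /* list_big_dir.c: list a huge directory via getdents64 with a large buffer.
       Sketch only; the buffer size is an assumption. Build: cc -O2 list_big_dir.c */
    #define _GNU_SOURCE
    #include <fcntl.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    #include <sys/syscall.h>
    #include <unistd.h>

    #define BUF_SIZE (5 * 1024 * 1024)       /* 5 MB, vs readdir's fixed 32 KB */

    struct linux_dirent64 {                  /* layout used by the getdents64 syscall */
        unsigned long long d_ino;
        long long          d_off;
        unsigned short     d_reclen;
        unsigned char      d_type;
        char               d_name[];
    };

    int main(int argc, char *argv[])
    {
        int fd = open(argc > 1 ? argv[1] : ".", O_RDONLY | O_DIRECTORY);
        if (fd == -1) { perror("open"); exit(EXIT_FAILURE); }

        char *buf = malloc(BUF_SIZE);
        if (buf == NULL) { perror("malloc"); exit(EXIT_FAILURE); }

        for (;;) {
            long nread = syscall(SYS_getdents64, fd, buf, BUF_SIZE);
            if (nread == -1) { perror("getdents64"); exit(EXIT_FAILURE); }
            if (nread == 0)                  /* end of directory */
                break;
            for (long bpos = 0; bpos < nread; ) {
                struct linux_dirent64 *d = (struct linux_dirent64 *)(buf + bpos);
                if (strcmp(d->d_name, ".") != 0 && strcmp(d->d_name, "..") != 0)
                    puts(d->d_name);         /* one syscall now covers many entries */
                bpos += d->d_reclen;
            }
        }
        free(buf);
        close(fd);
        return 0;
    }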
[...]
I revisited this today: because most filesystems store their directory
structures in a B-tree format, the order in which you delete files is
also important. One needs to avoid rebalancing the B-tree when
performing the unlink.
As such I added a sort before the deletes occur. The program will now
(on my system) delete 1,000,000 files in 43 seconds. The closest
program to this was rsync -a --delete, which took 60 seconds (it also
performs deletions in order, but does not do an efficient directory
lookup).
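The post doesn't include its code, but a minimal sketch of the "sort, then
unlink" idea might look like the following. Sorting by inode number is an
assumption (the post doesn't say what key it sorts on), and scandir() is
used for brevity even though it reads the directory through readdir
internally, so this sketch only demonstrates the ordering half of the
optimization:

    /* sorted_unlink.c: delete a directory's files in sorted (inode) order.
       Sketch under stated assumptions; not the original poster's program. */
    #include <dirent.h>
    #include <fcntl.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    #include <unistd.h>

    static int skip_dots(const struct dirent *e)
    {
        return strcmp(e->d_name, ".") != 0 && strcmp(e->d_name, "..") != 0;
    }

    static int by_inode(const struct dirent **a, const struct dirent **b)
    {
        if ((*a)->d_ino < (*b)->d_ino) return -1;
        if ((*a)->d_ino > (*b)->d_ino) return 1;
        return 0;
    }

    int main(int argc, char *argv[])
    {
        const char *dir = argc > 1 ? argv[1] : ".";

        struct dirent **list;
        int n = scandir(dir, &list, skip_dots, by_inode);  /* read and sort all entries */
        if (n == -1) { perror("scandir"); exit(EXIT_FAILURE); }

        int dfd = open(dir, O_RDONLY | O_DIRECTORY);
        if (dfd == -1) { perror("open"); exit(EXIT_FAILURE); }

        for (int i = 0; i < n; i++) {
            if (unlinkat(dfd, list[i]->d_name, 0) == -1)   /* unlink in sorted order */
                perror(list[i]->d_name);
            free(list[i]);
        }
        free(list);
        close(dfd);
        return 0;
    }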
Related links:
- Efficiently delete large directory containing thousands of files
- linux - a quick way to delete 500,000+ files
- Quickly deleting large numbers of files and quickly copying many small files on Linux