命令小结

Linux中,有时需要将文件分割成更小的片段,比如为提高可读性,生成日志。使用 split命令,则可以将一个大文件分割成指定大小的很多个小文件,如果是文本文件,也可以按照行数进行拆分,默认是1000行作为一个拆分单位。

实战经验

命令格式与常用选项

split [ -b ] [ -C ] [ - ] [ -l ] [ 要切割的文件 ] [ 输出文件名前缀 ] [ -a ]

  • -b <字节> :指定按照多少字节进行拆分,也可指定 K,M,G,T单位

    -b, –bytes=SIZE

    put SIZE bytes per output file
  • -<行数> 或 -l <行数> :指定每多少行要拆分成一个文件

    -l, –lines=NUMBER

    put NUMBER lines per output file
  • 输出文件名前缀:设置拆分后的文件的名称前缀,split会自动在前缀后加上编号,默认从aa开始

  • -a<后缀长度>:默认的后缀长度是2,也就是按照aa、ab、ac这样的格式依此编号

    -a, –suffix-length=N

    generate suffixes of length N (default 2)

拆分文件与合并

如下,dd命令生成一个700M的文件,用400M单位拆分成两个文件,然后再合并为新文件,比较。

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
[baiyongan@bya split_test]$ dd if=/dev/zero bs=1024 count=700000 of=test_large_file
700000+0 records in
700000+0 records out
716800000 bytes (717 MB) copied, 2.02677 s, 354 MB/s
[baiyongan@bya split_test]$ ll
total 700000
-rw-rw-r-- 1 baiyongan baiyongan 716800000 May 24 11:39 test_large_file
[baiyongan@bya split_test]$ split -b 400M test_large_file
[baiyongan@bya split_test]$ ll
total 1400000
-rw-rw-r-- 1 baiyongan baiyongan 716800000 May 24 11:39 test_large_file
-rw-rw-r-- 1 baiyongan baiyongan 419430400 May 24 11:40 xaa
-rw-rw-r-- 1 baiyongan baiyongan 297369600 May 24 11:40 xab
[baiyongan@bya split_test]$ cat xaa xab > test_large_file_merged
[baiyongan@bya split_test]$ ll
total 2100000
-rw-rw-r-- 1 baiyongan baiyongan 716800000 May 24 11:39 test_large_file
-rw-rw-r-- 1 baiyongan baiyongan 716800000 May 24 11:40 test_large_file_merged
-rw-rw-r-- 1 baiyongan baiyongan 419430400 May 24 11:40 xaa
-rw-rw-r-- 1 baiyongan baiyongan 297369600 May 24 11:40 xab
[baiyongan@bya split_test]$

设置拆分文件的名称前、后缀

如下,以文件名 test_large_file_part_作为前缀, 以及设置数字后缀

-d, –numeric-suffixes[=FROM]
use numeric suffixes instead of alphabetic; FROM changes the start value (default 0)

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
[baiyongan@bya split_test]$ rm -rf xa*
[baiyongan@bya split_test]$ ll
total 1400000
-rw-rw-r-- 1 baiyongan baiyongan 716800000 May 24 11:39 test_large_file
-rw-rw-r-- 1 baiyongan baiyongan 716800000 May 24 11:40 test_large_file_merged
[baiyongan@bya split_test]$ split -b 400M test_large_file test_large_file_part_
[baiyongan@bya split_test]$ ll
total 2100000
-rw-rw-r-- 1 baiyongan baiyongan 716800000 May 24 11:39 test_large_file
-rw-rw-r-- 1 baiyongan baiyongan 716800000 May 24 11:40 test_large_file_merged
-rw-rw-r-- 1 baiyongan baiyongan 419430400 May 24 11:42 test_large_file_part_aa
-rw-rw-r-- 1 baiyongan baiyongan 297369600 May 24 11:42 test_large_file_part_ab
[baiyongan@bya split_test]$
[baiyongan@bya split_test]$ rm -rf *part*
[baiyongan@bya split_test]$ ll
total 1400000
-rw-rw-r-- 1 baiyongan baiyongan 716800000 May 24 11:39 test_large_file
-rw-rw-r-- 1 baiyongan baiyongan 716800000 May 24 11:40 test_large_file_merged
[baiyongan@bya split_test]$ split -b 400M -d test_large_file test_large_file_part_
[baiyongan@bya split_test]$ ll
total 2100000
-rw-rw-r-- 1 baiyongan baiyongan 716800000 May 24 11:39 test_large_file
-rw-rw-r-- 1 baiyongan baiyongan 716800000 May 24 11:40 test_large_file_merged
-rw-rw-r-- 1 baiyongan baiyongan 419430400 May 24 11:45 test_large_file_part_00
-rw-rw-r-- 1 baiyongan baiyongan 297369600 May 24 11:45 test_large_file_part_01
[baiyongan@bya split_test]$

按照行数进行拆分

例如将/etc/passwd 文件每十行进行拆分

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
[baiyongan@bya split_test]$ split -d -10 /etc/passwd my_passwd_
[baiyongan@bya split_test]$ ll
total 2100024
-rw-rw-r-- 1 baiyongan baiyongan 385 May 24 11:47 my_passwd_00
-rw-rw-r-- 1 baiyongan baiyongan 543 May 24 11:47 my_passwd_01
-rw-rw-r-- 1 baiyongan baiyongan 607 May 24 11:47 my_passwd_02
-rw-rw-r-- 1 baiyongan baiyongan 526 May 24 11:47 my_passwd_03
-rw-rw-r-- 1 baiyongan baiyongan 589 May 24 11:47 my_passwd_04
-rw-rw-r-- 1 baiyongan baiyongan 56 May 24 11:47 my_passwd_05
[baiyongan@bya split_test]$
[baiyongan@bya split_test]$ wc -l my_passwd_*
10 my_passwd_00
10 my_passwd_01
10 my_passwd_02
10 my_passwd_03
10 my_passwd_04
1 my_passwd_05
51 total
[baiyongan@bya split_test]$

合并后的校验

网络传输大文件,或者再设备之间复制大文件的时候,可能会出现传输前后数据不一致的情况。

推荐使用md5sum进行计算,比对前后两个大文件的md5 值。

1
2
3
4
5
6
7
8
9
10
11
12
[baiyongan@bya split_test]$ ll
total 2100000
-rw-rw-r-- 1 baiyongan baiyongan 716800000 May 24 11:49 test_large_file
-rw-rw-r-- 1 baiyongan baiyongan 716800000 May 24 11:50 test_large_file_merged
-rw-rw-r-- 1 baiyongan baiyongan 419430400 May 24 11:50 test_large_file_part_00
-rw-rw-r-- 1 baiyongan baiyongan 297369600 May 24 11:50 test_large_file_part_01
[baiyongan@bya split_test]$ md5sum test_large_file
eacff27bf2db99c7301383b7d8c1c07c test_large_file
[baiyongan@bya split_test]$ md5sum test_large_file_merged
eacff27bf2db99c7301383b7d8c1c07c test_large_file_merged
[baiyongan@bya split_test]$

拓展阅读