Deduplicating words across multiple files with a shell script

Suppose the requirements are as follows:

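There are several txt files, each containing multiple lines of words.
Within each line, either an ASCII comma "," or a full-width Chinese comma "，" serves as the separator.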

world,世界
set，設置
good,好，商品
....
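
As a minimal sketch of how lines like these can be split, the snippet below prints only the headword of each line. The function name first_field is hypothetical, and the separator is written as the regex alternation ',|，' so that either comma ends the first field; it assumes an awk that accepts a regular expression for -F, as POSIX awk does.

```shell
# Hypothetical helper: print the headword (first field) of each line on stdin.
# -F',|，' is a regex alternation: the field ends at "," or at "，".
first_field() {
    awk -F',|，' '{ print $1 }'
}

# Feed the three sample lines through the helper.
printf '%s\n' 'world,世界' 'set，設置' 'good,好，商品' | first_field
```

For the three sample lines this prints world, set, and good, one per line. Only the first separator matters for the headword, so the extra "，" in the third line does no harm.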

Merge these files, remove the duplicate words, and write the result to a new file.
Matching must be able to ignore case.

Implementation

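#!/bin/bash
#------------------------------------------------------------------------------
# Filename:    filterWords.sh
# Usage:       ./filterWords.sh ~/test/
# Version:     1.0
# Date:        2018-04-04
# Author:      vincent
# Email:       N/A
# Description: Filters duplicate words out of multiple files, keeps one entry
#              per word, and writes the result to a new file.
#              Case is ignored; if a word is repeated, an arbitrary one of its
#              definitions is kept.
#              Supported formats:
#                  set,設置        # ASCII comma
#                  set，設置       # full-width (Chinese) comma
#                  set,設置，集合   # several separators are allowed; the text before the first one is the word
#                  SeT,設置        # case-insensitive
# Notes:       N/A
#-------------------------------------------------------------------------------

declare folderPath=$1
declare currentTime=$(date +%F-%H-%M-%S)
declare outputPath="${currentTime}_words.txt"
declare wordsCounts=0

# Print the message in $2 and exit if the status code in $1 is non-zero.
outputMsg()
{
    if [ "$1" -ne 0 ]
    then
        echo "$2"
        exit 1
    fi
}

# If no path was given, default to the current directory
if [ -z "$folderPath" ]
then
    folderPath="."
else
    # Check that the path exists
    if [ ! -d "$folderPath" ]
    then
        echo "${folderPath} does not exist!"
        exit 1
    fi
fi

fileList=$(find "$folderPath" -type f -name "*.txt")
outputMsg $? "Find txt file failed!"
if [[ -z $fileList ]]
then
    echo "No txt files are found."
    exit 1
fi

# Accept either the ASCII "," or the full-width "，" as the separator; ignore case.
# When copying this code, watch the formatting: keep the awk command on a single line, otherwise errors are likely.
# The two separators are given as a regex alternation ("[,|，]" would also split on a literal "|").
# $fileList is deliberately unquoted so that it splits into one argument per file;
# paths containing whitespace are therefore not supported.
awk -F',|，' '{key=tolower($1); words[key]=$0} END{for(i in words) print words[i]}' $fileList > "$outputPath"
outputMsg $? "Filter words failed!"

# Read the file via redirection so that wc prints only the count, not the file name
wordsCounts=$(wc -l < "$outputPath")
echo "Words counts: "
echo $wordsCounts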
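
The script stores each line under its lowercased headword and prints the array with for(i in words), so the last definition read for a word wins and the output order is unspecified. If a predictable result is preferred, one possible variant (a different technique from the script, shown only as a sketch) keeps the first line seen for each word and preserves input order. The function name dedup_first is hypothetical.

```shell
# Hypothetical helper: keep the first line for each headword, ignoring case.
# seen[...]++ is 0 (false) the first time a key appears, so !seen[...]++ is
# true exactly once per key and awk prints that line by default.
dedup_first() {
    awk -F',|，' '!seen[tolower($1)]++' "$@"
}

# "SeT" collides with "set" after lowercasing, so the second line is dropped.
printf '%s\n' 'set,設置' 'SeT，集合' 'good,好，商品' | dedup_first
```

With this input the output is the two lines "set,設置" and "good,好，商品". File names can also be passed as arguments, for example dedup_first a.txt b.txt, where a.txt and b.txt stand for any word lists.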
