Removing U+FEFF from the Start of a File
Today, we encountered an error while trying to create some database seeds from a CSV. I had originally generated this CSV with a Ruby script, piping its output to a file and saving it as a CSV.
The CSV was checked in to Git and had been used for a while, until we had to update parts of it by adding a new column and fixing some values.
While we don’t know the exact reason yet, my theory is that Excel for Mac (we are all using Macs) somehow added extra metadata to the file, even though we saved it as a CSV.
As a result, anyone running the seed got the following error:
CSV::MalformedCSVError: Illegal quoting in line 1.
I opened the CSV file and nothing looked suspicious. My first thought was that some left/right (curly) quotation marks had somehow been mixed into the file in place of the ‘normal’ double quote ("). But upon further investigation, there was nothing out of the ordinary. This led me to wipe out the whole file and actually type out the first row again.
I saved that file again and ran the migration:
CSV::MalformedCSVError: Illegal quoting in line 1.
What?!
Okay, this was driving me nuts. I opened up a new file, typed the exact single line again, and ran the migration. It worked. So what was in that file?!
Only one way to find out:
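# Copy the file to the clipboard, then paste it back out into a new file
pbcopy < companies.csv
pbpaste > temp.csv
# Replace the original so Git can show what changed
mv temp.csv companies.csv
git diff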
macOS has two very useful commands: pbcopy and pbpaste. Anything piped to pbcopy goes onto your clipboard, and pbpaste writes whatever is on your clipboard to standard output (stdout), with all formatting removed.
This is very useful when you want to copy some text from somewhere and paste it into a WYSIWYG editor without the formatting, for example when writing an email in Gmail.
I then removed the original file and saved the new ‘unformatted’ file with the same file name so I could see the difference.
And in the diff we finally saw the invisible man: a U+FEFF character sitting at the very start of the file.
A quick Google search told us that our friend U+FEFF is called ZERO WIDTH NO-BREAK SPACE. A quick trip to Wikipedia also told us about its actual use: it is more commonly known as the byte order mark, or BOM.
Our friend FEFF can mean different things, but at the start of a file it is basically a signal to a program about how to read the text: which Unicode encoding it uses and, for the multi-byte encodings, in which byte order. That encoding can be UTF-8 (more common), UTF-16, or even UTF-32.
The byte pair FE FF itself is the UTF-16 (big-endian) form. In UTF-8 the same character is encoded as the three bytes 0xEF, 0xBB, 0xBF.
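To make the byte-level picture concrete, here is a minimal Ruby sketch that encodes the single code point U+FEFF in three encodings and prints the resulting bytes. The variable names and the hex helper are just for illustration.

```ruby
# U+FEFF is one code point; the bytes on disk depend on the encoding.
bom = "\uFEFF"

utf8_bytes    = bom.encode("UTF-8").bytes
utf16be_bytes = bom.encode("UTF-16BE").bytes
utf16le_bytes = bom.encode("UTF-16LE").bytes

# Illustrative helper: format a byte array as 0x.. pairs.
def hex(bytes)
  bytes.map { |b| format("0x%02X", b) }.join(" ")
end

puts "UTF-8:    #{hex(utf8_bytes)}"    # 0xEF 0xBB 0xBF
puts "UTF-16BE: #{hex(utf16be_bytes)}" # 0xFE 0xFF
puts "UTF-16LE: #{hex(utf16le_bytes)}" # 0xFF 0xFE
```

The little-endian form is simply the two UTF-16 bytes swapped, which is how a reader can tell the byte order apart.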
From my understanding, when the CSV file was opened in Excel and saved, Excel made room for our invisible stowaway, U+FEFF. And at the very front of the file, to boot!
Excel did some magic when it saved the file. My guess at the time was that it had been saved as UTF-16 instead of UTF-8, though a file that still reads as ordinary text apart from one leading character is more consistent with UTF-8 with a BOM in front. Most editors treat a leading BOM as an invisible marker rather than as text, so visually the file was okay. But Ruby’s CSV assumed the file was plain UTF-8 and did not skip Mr. U+FEFF, so the BOM became part of the first field. If that field starts with a double quote, the quote is no longer the first character of the field, which would explain the complaint about illegal quoting in line 1.
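Here is a small sketch of that failure and one way around it, assuming the file is UTF-8 with a BOM. The two-row CSV content is made up for illustration and stands in for companies.csv. The "bom|utf-8" open mode is part of Ruby's standard file API and skips a leading BOM if one is present. The plain read is wrapped in a rescue because the exact behaviour may differ between versions of the csv library.

```ruby
require "csv"
require "tempfile"

# Hypothetical CSV content with a UTF-8 BOM in front of the first quote.
csv_text = "\uFEFF\"name\",\"city\"\n\"Acme\",\"Manila\"\n"

file = Tempfile.new(["companies", ".csv"])
file.write(csv_text)
file.close

# Plain UTF-8 read: the BOM stays as the first character of the text.
raw = File.read(file.path, encoding: "UTF-8")
begin
  CSV.parse(raw)
  puts "plain read parsed without complaint"
rescue CSV::MalformedCSVError => e
  puts "plain read failed: #{e.message}"
end

# "r:bom|utf-8" tells Ruby to skip a leading BOM, if there is one.
clean = File.open(file.path, "r:bom|utf-8", &:read)
rows = CSV.parse(clean)
puts rows.inspect # [["name", "city"], ["Acme", "Manila"]]
```

Reading with the BOM-aware mode leaves the file on disk untouched, so it also works for CSVs that teammates keep re-saving from Excel.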
So, lesson learned: don’t open (and save!) a CSV file in Excel if you want to feed it to Ruby’s CSV parser.
If you ever do encounter an error like that, be sure to look for hidden characters that your editor does not show. If you still can’t see anything and are on macOS, then pbcopy and pbpaste will help you out: in addition to copying and pasting, they strip formatting and hidden characters from the text.
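If you would rather check for a BOM directly than go through the clipboard, a minimal Ruby sketch is to look at the first few bytes of the file. The detect_bom helper below is hypothetical, not part of any library, and it only recognises the standard Unicode BOM byte sequences.

```ruby
# Hypothetical helper: returns the encoding name suggested by a leading
# BOM in the given bytes, or nil if there is none.
def detect_bom(head)
  # Longer sequences come first so UTF-32LE is not mistaken for UTF-16LE.
  boms = [
    ["UTF-32BE", "\x00\x00\xFE\xFF".b],
    ["UTF-32LE", "\xFF\xFE\x00\x00".b],
    ["UTF-8",    "\xEF\xBB\xBF".b],
    ["UTF-16BE", "\xFE\xFF".b],
    ["UTF-16LE", "\xFF\xFE".b]
  ]
  bytes = head.to_s.b # compare as raw bytes; nil becomes an empty string
  boms.each { |name, bom| return name if bytes.start_with?(bom) }
  nil
end

# For a real file, pass in its first four bytes:
#   detect_bom(File.binread("companies.csv", 4))
puts detect_bom("\xEF\xBB\xBF\"name\"".b).inspect # "UTF-8"
puts detect_bom("\"name\"").inspect               # nil
```

A check like this could run before seeding, so a stray BOM is reported by name instead of surfacing as a confusing quoting error.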
Translated from: https://www.freecodecamp.org/news/a-quick-tale-about-feff-the-invisible-character-cd25cd4630e7/