python – CSV reader picks up garbage in the first few characters

November 28, 2023

I am trying to read the first line of a CSV file and assign it to header. The CSV file looks like this:

TIME,DAY,MONTH,YEAR
"3:21","23","FEB","2018"
"3:23","23","FEB","2018"
...

Here is the code:

import csv

with open("20180223.csv") as csvfile:
    rdr = csv.reader(csvfile)
    header = next(rdr)
    print(header)

I expected the output to look like this:

['TIME', 'DAY', 'MONTH', 'YEAR']

But the output looks like this:

['ï»¿TIME', 'DAY', 'MONTH', 'YEAR']

What am I missing?

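Best answer: The first character is a byte order mark (BOM). In UTF-8 the BOM is stored as the three bytes EF BB BF; when the file is opened with a single-byte default encoding such as cp1252, each of those bytes is decoded as a separate character, which gives ï»¿.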
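The following sketch reproduces the garbage characters from the BOM bytes alone. It assumes the asker's platform default encoding was a single-byte codec such as cp1252 (common on Windows); the exact default is not stated in the question.

```python
import codecs

# The BOM is code point U+FEFF; UTF-8 encodes it as three bytes.
bom_bytes = "\ufeff".encode("utf-8")
print(bom_bytes)                     # b'\xef\xbb\xbf'
print(bom_bytes == codecs.BOM_UTF8)  # True

# Decoding those bytes with a single-byte codec (assumed here to be the
# platform default) turns each byte into its own character.
cp1252_text = bom_bytes.decode("cp1252")
print(cp1252_text)                   # ï»¿

# Decoding with plain "utf-8" keeps the BOM as one invisible character.
utf8_text = (bom_bytes + b"TIME").decode("utf-8")
print(repr(utf8_text))               # '\ufeffTIME'
```

So plain "utf-8" alone does not fix the problem: the header would then start with an invisible '\ufeff' instead of ï»¿.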

Try this:

with open("20180223.csv", encoding="utf-8-sig") as csvfile:
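A complete, runnable version of this fix, as a sketch: it writes a small BOM-prefixed sample file (the name `sample_with_bom.csv` and the helper `read_header` are hypothetical, standing in for `20180223.csv` and the asker's code), then reads the header back with both "utf-8" and "utf-8-sig" for comparison. It also passes `newline=""`, which the `csv` module documentation recommends for files handed to `csv.reader`.

```python
import csv
import os
import tempfile


def read_header(path, encoding):
    """Return the first row of the CSV file at `path`, decoded with `encoding`."""
    # The csv module docs recommend opening files with newline="".
    with open(path, encoding=encoding, newline="") as csvfile:
        rdr = csv.reader(csvfile)
        return next(rdr)


with tempfile.TemporaryDirectory() as tmpdir:
    # Hypothetical sample file standing in for 20180223.csv; it starts
    # with the UTF-8 BOM bytes EF BB BF.
    path = os.path.join(tmpdir, "sample_with_bom.csv")
    with open(path, "wb") as f:
        f.write(b'\xef\xbb\xbfTIME,DAY,MONTH,YEAR\r\n"3:21","23","FEB","2018"\r\n')

    plain = read_header(path, "utf-8")       # BOM survives as '\ufeff'
    header = read_header(path, "utf-8-sig")  # BOM is stripped

print(plain)   # ['\ufeffTIME', 'DAY', 'MONTH', 'YEAR']
print(header)  # ['TIME', 'DAY', 'MONTH', 'YEAR']
```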

This tip is somewhat buried in the documentation, but it is there:

In some areas, it is also convention to use a “BOM” at the start of
UTF-8 encoded files; the name is misleading since UTF-8 is not
byte-order dependent. The mark simply announces that the file is
encoded in UTF-8. Use the ‘utf-8-sig’ codec to automatically skip the
mark if present for reading such files.
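The "if present" part of that quote means "utf-8-sig" is safe to use even on files that have no BOM. A small sketch showing both cases, plus the encoding direction (where this codec does add a BOM):

```python
import codecs

with_bom = codecs.BOM_UTF8 + b"TIME,DAY,MONTH,YEAR"
without_bom = b"TIME,DAY,MONTH,YEAR"

# utf-8-sig strips a leading BOM when one is there...
from_bom = with_bom.decode("utf-8-sig")
# ...and decodes BOM-less input exactly like plain utf-8.
from_plain = without_bom.decode("utf-8-sig")
print(from_bom == from_plain == "TIME,DAY,MONTH,YEAR")  # True

# When encoding, utf-8-sig writes a BOM first; plain "utf-8" does not.
encoded = "TIME".encode("utf-8-sig")
print(encoded)  # b'\xef\xbb\xbfTIME'
```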