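I want to split each row of a CSV file into multiple text blocks and save them as separate text files (the file has only 1 column, and each row contains one text block). My items_split function works perfectly on a text block defined directly in the code, but when I apply it to the CSV file I get this error:

File "untitled.py", line 25, in items_split
    idx = text_lines.index("ABC") + 1
ValueError: 'ABC' is not in list

The code I am using is as follows: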
import csv
import re
import uuid

def items_split(data):
    ## First, remove all empty lines in the text. One call is enough:
    ## re.sub's 4th positional argument is count, not flags, so passing
    ## re.MULTILINE there limited each call to 8 replacements.
    data = re.sub(r'\n\s*\n', '\n', data)
    ## Then, remove all lines up to and including ABC
    text_lines = data.split("\n")
    idx = text_lines.index("ABC") + 1
    data = "\n".join(text_lines[idx:])
    ## Last, split the text into multiple files, each with one news item
    current_file = None
    for line in data.split('\n'):
        # Set initial filename
        if current_file is None and line != '':
            current_file = str(uuid.uuid4()) + '.txt'  # assigns a random file name
            # current_file = line + '.txt'
        # This is to handle the blank line after Brief
        if current_file is None:
            continue
        with open(current_file, "a") as text_file:
            text_file.write(line + "\n")
        # Reset filename if we have finished this section,
        # which is identified by a line that:
        #   starts with Demographics - ^Demographics
        #   contains some random amount of text - .*
        #   ends with ) - \)$
        if re.match(r'^Demographics:.*\)$', line) is not None:
            current_file = None

# Python 3: open the CSV in text mode with newline='' rather than 'rb'
with open('Book1.csv', newline='') as csvfile:
    spamreader = csv.reader(csvfile, delimiter=',')
    for row in spamreader:
        # row is a list of fields; the text block is the only column
        items_split(row[0])
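A detail worth checking in the original version of this code: the fourth positional parameter of re.sub is count, not flags, and re.MULTILINE has the integer value 8. So re.sub(pattern, repl, data, re.MULTILINE) means "replace at most 8 times", which is probably why the call seemed to need repeating. A minimal sketch comparing the two forms on made-up text (Python 3.13 and later also emit a DeprecationWarning for passing count positionally):

```python
import re

# Eleven lines separated by blank lines: ten blank-line gaps in total.
text = "\n\n".join(f"line {i}" for i in range(11))

# re.MULTILINE passed positionally lands in count (== 8): only 8 gaps removed.
wrong = re.sub(r"\n\s*\n", "\n", text, re.MULTILINE)

# Passed by keyword, it is a flag and count stays 0 (replace all).
right = re.sub(r"\n\s*\n", "\n", text, flags=re.MULTILINE)

print(wrong.count("\n\n"))  # gaps still left: 2
print(right.count("\n\n"))  # gaps still left: 0
```

The pattern uses no ^ or $, so the flag could also simply be omitted.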
For example, each row in the CSV file looks like this:
"MEDIA News report
ABC
Topic 1 dzfffa a agasgeaherhryyeshdh
Demographics: 12,000 (male 16+) • 7,000 (female 16+)
Topic 2
fszg seez trbwtewtmytmutryrmujfcj
Demographics: 10,000 (male 16+) • 5,000 (female 16+)
Are you happy with this content?"
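I want to split it into:
ABC
Topic 1 dzfffa a agasgeaherhryyeshdh
Demographics: 12,000 (male 16+) • 7,000 (female 16+)
and
Topic 2
fszg seez trbwtewtmytmutryrmujfcj
Demographics: 10,000 (male 16+) • 5,000 (female 16+)
Are you happy with this content?
and save each one as a separate text file. I have already run this function on the text itself and it works perfectly. The problem is that when I run it on the CSV file, it doesn't treat each row as one text block, and my attempts to convert the rows to strings and so on have been in vain.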
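The desired split above can be sketched as a pure function that returns the chunks instead of writing files, which makes it easy to check on a single text block before involving the CSV at all. This is a minimal sketch under stated assumptions: split_blocks is a hypothetical name, the ABC line is kept in the first chunk (as in the desired output), lines after the last Demographics line are appended to the last chunk, and each line is stripped so that stray spaces or \r characters cannot make the ABC lookup fail.

```python
import re

# A line that closes a chunk, e.g.
# "Demographics: 12,000 (male 16+) • 7,000 (female 16+)"
END_OF_CHUNK = re.compile(r"^Demographics:.*\)$")


def split_blocks(text, marker="ABC"):
    """Split one text block into chunks (hypothetical helper, assumed rules)."""
    # splitlines() handles \n, \r\n and \r; strip() removes stray whitespace;
    # filtering out empty strings drops blank lines.
    lines = [line.strip() for line in text.splitlines()]
    lines = [line for line in lines if line]

    if marker not in lines:
        raise ValueError(f"{marker!r} not found in text block")
    lines = lines[lines.index(marker):]  # keep the marker line itself

    chunks, current = [], []
    for line in lines:
        current.append(line)
        if END_OF_CHUNK.match(line):
            chunks.append(current)
            current = []
    if current:  # trailing lines after the last Demographics line
        if chunks:
            chunks[-1].extend(current)
        else:
            chunks.append(current)
    return ["\n".join(chunk) for chunk in chunks]


sample = """MEDIA News report
ABC
Topic 1 dzfffa a agasgeaherhryyeshdh
Demographics: 12,000 (male 16+) • 7,000 (female 16+)
Topic 2
fszg seez trbwtewtmytmutryrmujfcj
Demographics: 10,000 (male 16+) • 5,000 (female 16+)
Are you happy with this content?"""

chunks = split_blocks(sample)
for chunk in chunks:
    print(chunk)
    print("---")
```

Raising a ValueError with the marker's name keeps the failure explicit when a row does not contain the expected marker.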
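Best answer: Python has a great library for importing and reading CSV files. Never reinvent the wheel.
Here is a short example from the documentation showing how to read from a CSV file: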
import csv
with open('eggs.csv', newline='') as csvfile:
    spamreader = csv.reader(csvfile, delimiter=' ', quotechar='|')
    for row in spamreader:
        print(', '.join(row))
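For the question's file, the key point is that csv.reader yields each row as a list of strings, and a quoted field can span several physical lines. With a one-column file the whole text block is therefore row[0], not row. A small sketch using io.StringIO in place of a real file; the two sample cells are assumptions loosely modeled on the row shown in the question:

```python
import csv
import io

# Stand-in for a one-column CSV: each cell is a quoted multi-line text block.
data = io.StringIO(
    '"MEDIA News report\nABC\nTopic 1\nDemographics: 1 (x)"\n'
    '"MEDIA News report\nABC\nTopic 2\nDemographics: 2 (y)"\n'
)

rows = list(csv.reader(data))
print(len(rows))      # 2 rows, although the data spans 8 physical lines
print(type(rows[0]))  # each row is a list of fields
block = rows[0][0]    # the single column holds the whole text block
print(block.splitlines())
```

In the question's loop that means calling items_split(row[0]) rather than items_split(row).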
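csv.DictReader works similarly, but it returns each row as a dictionary keyed by the header row (an OrderedDict in Python 3.6 and 3.7, a plain dict from Python 3.8 on), which makes navigating the file a bit easier.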
import csv
with open('names.csv', newline='') as csvfile:
    reader = csv.DictReader(csvfile)
    for row in reader:
        print(row['first_name'], row['last_name'])
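Applied to the question's goal, a sketch might read a one-column CSV that has no header row by passing an explicit fieldnames list to csv.DictReader, split each block, and write every chunk to its own UUID-named .txt file, as items_split does. Everything here beyond the csv, re, uuid and pathlib calls is an assumption for illustration: the column name "text", the helpers chunks_of and write_chunks, the output directory argument, UTF-8 input, and the splitting rule (keep the ABC line, close a chunk after each Demographics line, and put any trailing lines in their own chunk, as the question's code does).

```python
import csv
import re
import uuid
from pathlib import Path

# A line that closes a news item, as in the question's items_split.
END_OF_CHUNK = re.compile(r"^Demographics:.*\)$")


def chunks_of(block, marker="ABC"):
    """Yield the news items in one text block (hypothetical helper)."""
    lines = [line.strip() for line in block.splitlines() if line.strip()]
    if marker not in lines:
        return  # assumption: skip blocks without the marker instead of failing
    current = []
    for line in lines[lines.index(marker):]:
        current.append(line)
        if END_OF_CHUNK.match(line):
            yield "\n".join(current)
            current = []
    if current:  # trailing lines after the last Demographics line
        yield "\n".join(current)


def write_chunks(csv_path, out_dir):
    """Write each chunk of each row to a random-named .txt file (hypothetical)."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    # newline="" lets the csv module handle line breaks inside quoted cells;
    # utf-8-sig also accepts UTF-8 files that start with a byte-order mark.
    with open(csv_path, newline="", encoding="utf-8-sig") as csvfile:
        # The file has no header row, so name its single column explicitly.
        for row in csv.DictReader(csvfile, fieldnames=["text"]):
            for chunk in chunks_of(row["text"]):
                path = out_dir / f"{uuid.uuid4()}.txt"
                path.write_text(chunk + "\n", encoding="utf-8")
                written.append(path)
    return written
```

Usage would be along the lines of write_chunks("Book1.csv", "chunks"), which returns the paths it created. Random file names avoid collisions between rows; naming files after the topic line instead would need a check for duplicate titles.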