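I am trying to read a large JSON file (data.json) in Python. Because the file contains multiple JSON objects, which will become multiple dictionaries in Python (the number of dictionaries is unknown), I used decoder.raw_decode() and a generator.
Here is the code: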
import json
def parse():
    with open('data.json',encoding='utf-8') as jfile:
        try:
            while True:
                decoder = json.JSONDecoder()
                obj, idx = decoder.raw_decode(jfile)
                yield obj
        except ValueError as e:
            print(e)
            pass
        else:
            print("aha")
def main():
    imputd=parse()
    if imputd:
        while True:
            try:
                print(str(next(imputd)).readlines())
            except StopIteration as e:
                print(e)
                break
main()
I get this error:
Traceback (most recent call last):
  File "H:\Document\Python\j10.py", line 57, in <module>
    main()
  File "H:\Document\Python\j10.py", line 36, in main
    print(str(next(imputd)).readlines())
  File "H:\Document\Python\j10.py", line 21, in parse
    obj, idx = decoder.raw_decode(jfile)
  File "C:\Python34\lib\json\decoder.py", line 360, in raw_decode
    obj, end = self.scan_once(s, idx)
TypeError: first argument must be a string, not _io.TextIOWrapper
Update: solved!
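import json

buffersize = 2048  # assumed chunk size; the original post never defines it
file = open('data.json', encoding='utf-8')

def readin():
    # Read the next chunk; returns '' at end of file
    return file.read(buffersize)

def parse():
    decoder = json.JSONDecoder(strict=False)
    buffer = ''
    for chunk in iter(readin, ''):
        # raw_decode() does not skip leading whitespace, so strip it here
        buffer = (buffer + chunk).lstrip()
        while buffer:
            try:
                result, index = decoder.raw_decode(buffer)
            except ValueError as e:
                print("1", e)
                # Not enough data to decode, read more
                break
            yield result
            buffer = buffer[index:].lstrip()

def main():
    imputd = parse()
    # Write only the first decoded object, as in the original post
    with open('output.txt', 'w') as output:
        output.write(json.dumps(next(imputd)))

main()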
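Thank you very much for your help!
Best answer: You are passing in the file object, but decoder.raw_decode() only accepts text data. You need to do the reading yourself: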
obj, idx = decoder.raw_decode(jfile.read())
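Your generator then yields the Python objects created from the JSON data, not file objects, so the .readlines() call in the loop of your main() function will fail as well.
However, you are not using raw_decode() correctly. You are responsible for feeding it chunks of text yourself; it will not read that text from the file for you. If you want to process the file in chunks, and there are no clear delimiters between the JSON entries, you are forced to read the file in blocks and buffer them: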
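To make the return value concrete, here is a minimal sketch of raw_decode() applied to a string that is already in memory. The sample text and the helper name iter_objects are made up for illustration and do not come from the question. It assumes the whole text fits in memory, which is exactly what jfile.read() requires.

```python
import json

def iter_objects(text):
    """Yield each top-level JSON value found in an in-memory string."""
    decoder = json.JSONDecoder()
    idx = 0
    while idx < len(text):
        # raw_decode() does not skip leading whitespace, so do it by hand.
        if text[idx].isspace():
            idx += 1
            continue
        # Returns the decoded value and the absolute index where it ended.
        obj, idx = decoder.raw_decode(text, idx)
        yield obj

sample = '{"a": 1} {"b": [2, 3]}'  # hypothetical data: two concatenated objects
for obj in iter_objects(sample):
    print(type(obj).__name__, obj)
```

Each yielded value is an ordinary dict (or list, str, number), so it has no .readlines() method; use json.dumps() or print() on it instead.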
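import json
from functools import partial

def parse(path='data.json', buffersize=2048):
    # path and buffersize defaults are assumptions; adjust them to your data
    decoder = json.JSONDecoder()
    buffer = ''
    with open(path, encoding='utf-8') as jfile:
        for chunk in iter(partial(jfile.read, buffersize), ''):
            # raw_decode() does not skip leading whitespace, so strip it
            buffer = (buffer + chunk).lstrip()
            while buffer:
                try:
                    result, index = decoder.raw_decode(buffer)
                except ValueError:
                    # Not enough data to decode, read more
                    break
                yield result
                buffer = buffer[index:].lstrip()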
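This still yields only completely decoded objects. If your file is one long JSON object (such as a single top-level list or dictionary), this will not yield the contents of that object one by one; it still reads the entire object before yielding it.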
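The difference can be seen in a small sketch that runs the same buffering loop over in-memory streams. The name iter_json_chunks, the sample data, and the tiny buffersize of 4 are all assumptions chosen for illustration; io.StringIO stands in for the real file.

```python
import io
import json
from functools import partial

def iter_json_chunks(fileobj, buffersize=2048):
    """Yield top-level JSON values from a text stream, read in chunks."""
    decoder = json.JSONDecoder()
    buffer = ''
    for chunk in iter(partial(fileobj.read, buffersize), ''):
        # raw_decode() does not skip leading whitespace, so strip it.
        buffer = (buffer + chunk).lstrip()
        while buffer:
            try:
                result, index = decoder.raw_decode(buffer)
            except ValueError:
                # Incomplete value in the buffer: read another chunk.
                break
            yield result
            buffer = buffer[index:].lstrip()

# Several concatenated objects: each one is yielded as soon as it is complete.
many = io.StringIO('{"id": 1}\n{"id": 2}\n')
print(list(iter_json_chunks(many, buffersize=4)))

# One top-level list: nothing is yielded until the whole list has been read.
single = io.StringIO('[{"id": 1}, {"id": 2}]')
print(list(iter_json_chunks(single, buffersize=4)))
```

The first stream produces two separate dictionaries, while the second produces a single list containing both. If the file is one huge top-level list, this approach saves no memory; an incremental parser would be needed for that case.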