How do I convert a multi-page PDF to a list of image objects in Python?

July 7, 2023

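I want to convert a multi-page PDF document into a series of image objects held in a list, rather than saving the images to disk from Python (I want to process them with PIL Image). So far, the only thing I have managed is writing the images to files: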

from wand.image import Image

with Image(filename='source.pdf') as img:
    with img.convert('png') as converted:
        converted.save(filename='pyout/page.png')

But how can I convert the img object above directly into a list of PIL.Image objects?

Best answer

New answer:

pip install pdf2image

from pdf2image import convert_from_path, convert_from_bytes
images = convert_from_path('/path/to/my.pdf')

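You may also need to install Pillow. pdf2image is a wrapper around poppler's command-line tools, so poppler has to be installed as well; this may only work out of the box on Linux.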

https://github.com/Belval/pdf2image
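Since the import above also pulls in convert_from_bytes, here is a minimal sketch of how both entry points might be wrapped so that the caller always gets a list of PIL images back, whether the PDF is a file on disk or bytes already in memory. The helper name pdf_to_images and the default of 200 dpi are assumptions for illustration; only convert_from_path, convert_from_bytes and their dpi parameter come from pdf2image.

```python
from pathlib import Path


def pdf_to_images(source, dpi=200):
    """Return a list of PIL.Image objects, one per page of the PDF.

    `source` may be raw PDF bytes or a filesystem path. This helper is
    hypothetical; it only dispatches to pdf2image's two converters.
    """
    # Imported lazily so this module still loads where pdf2image is absent.
    from pdf2image import convert_from_bytes, convert_from_path

    if isinstance(source, (bytes, bytearray)):
        # PDF content already in memory, e.g. from an upload or a download.
        return convert_from_bytes(bytes(source), dpi=dpi)
    # Anything else is treated as a path to a PDF file on disk.
    return convert_from_path(str(Path(source)), dpi=dpi)


# Example use (requires pdf2image, Pillow and poppler to be installed):
#     pages = pdf_to_images("/path/to/my.pdf")
#     for number, page in enumerate(pages, start=1):
#         print(number, page.size, page.mode)
```

Nothing is written to disk by the caller here; each list element is an ordinary PIL image that can be cropped, resized or converted in memory.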

The results of this approach and the older one below may differ.

Old answer:

Python 3.4:

from PIL import Image
from wand.image import Image as wimage
import os
import io

if __name__ == "__main__":
    filepath = "fill this in"
    assert os.path.exists(filepath)
    page_images = []
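    with wimage(filename=filepath, resolution=200) as img:
        # img.sequence yields one frame per PDF page.
        for page_wand_image_seq in img.sequence:
            # Wrap the frame in its own wand image and release it once encoded.
            with wimage(page_wand_image_seq) as page_wand_image:
                # Encode the page as JPEG in memory instead of writing a file.
                page_jpeg_bytes = page_wand_image.make_blob(format="jpeg")
            page_jpeg_data = io.BytesIO(page_jpeg_bytes)
            page_image = Image.open(page_jpeg_data)
            page_images.append(page_image)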

Finally, you could make a system call to mogrify, but that may be more complicated because you then have to manage temporary files.
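To give an idea of the temporary-file bookkeeping involved, here is a rough sketch of the mogrify route. It assumes ImageMagick's mogrify is on the PATH (on ImageMagick 7 the command may be "magick mogrify" instead) and that it is allowed to read PDFs. The function names, the PNG format and the density of 200 are choices made for this illustration, not part of the original answer.

```python
import re
import subprocess
import tempfile
from pathlib import Path


def page_index(path):
    """Extract the trailing page number from names like 'source-12.png'."""
    match = re.search(r"-(\d+)$", Path(path).stem)
    # A single-page PDF yields 'source.png' with no suffix; treat it as page 0.
    return int(match.group(1)) if match else 0


def pdf_to_images_via_mogrify(pdf_path, density=200):
    """Render a PDF with mogrify into a temp dir and load the pages with PIL."""
    # Imported lazily so page_index stays usable without Pillow installed.
    from PIL import Image

    with tempfile.TemporaryDirectory() as tmp:
        # -density must precede the input file so it applies to rasterization;
        # -path sends the generated PNGs to the temporary directory.
        subprocess.run(
            ["mogrify", "-density", str(density), "-format", "png",
             "-path", tmp, str(pdf_path)],
            check=True,
        )
        pages = []
        # Sort numerically so that page 10 does not land before page 2.
        for png in sorted(Path(tmp).glob("*.png"), key=page_index):
            with Image.open(png) as im:
                # copy() detaches the pixels from the file, which is about
                # to be deleted together with the temporary directory.
                pages.append(im.copy())
        return pages
```

The temporary directory is removed as soon as the with block exits, which is why each page is copied into memory before the function returns.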