Reading Chinese Text from PDF Files (Part 2)

March 10, 2024. 143 views. Source: xuexiaodong2009

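I previously used iTextSharp to read PDF files, but in practice some PDF files could not be read with it. I went looking for another way to read them and found that Spire.Pdf can also extract Chinese text from PDF files.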

Install the Spire.PDF NuGet package.

The core code is as follows:

    // Requires: using System; using System.Text;
    // Returns the text of every page in the PDF at `path`, or null on failure.
    public static string ReadPFD2(string path)
    {
        try
        {
            StringBuilder buffer = new StringBuilder();
            // Create the PDF document; the using block disposes it when done.
            using (Spire.Pdf.PdfDocument doc = new Spire.Pdf.PdfDocument())
            {
                // Load the PDF document from disk.
                doc.LoadFromFile(path);
                // Append the extracted text of each page in order.
                foreach (Spire.Pdf.PdfPageBase page in doc.Pages)
                {
                    buffer.Append(page.ExtractText());
                }
            }
            return buffer.ToString();
        }
        catch (Exception ex)
        {
            // Log the failing path and the exception; callers must check for null.
            DHC.EAS.Common.LogInfo.Debug("Failed to read PDF file: " + path);
            DHC.EAS.Common.LogInfo.Debug("Error reading PDF file", ex);
            return null;
        }
    }
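
Because the function above joins all pages into a single string, page boundaries are lost. If they matter, a per-page variant can be sketched as follows. This is only a minimal sketch: the class name `PdfTextDemo`, the helper `ReadPages`, and the command-line handling are hypothetical names introduced for illustration, and it relies on the same Spire.Pdf calls as the code above (`LoadFromFile`, `Pages`, `ExtractText`). Newer versions of the package may offer a different extraction API, so check the version you have installed.

```csharp
using System;
using System.Collections.Generic;
using System.Text;
using Spire.Pdf;

public static class PdfTextDemo
{
    // Hypothetical helper: returns the text of each page as a separate entry.
    // Exceptions are left to the caller instead of being swallowed.
    public static List<string> ReadPages(string path)
    {
        var pages = new List<string>();
        using (PdfDocument doc = new PdfDocument())
        {
            doc.LoadFromFile(path);
            foreach (PdfPageBase page in doc.Pages)
            {
                // Guard against a null result so callers can rely on non-null entries.
                pages.Add(page.ExtractText() ?? string.Empty);
            }
        }
        return pages;
    }

    public static void Main(string[] args)
    {
        if (args.Length != 1)
        {
            Console.Error.WriteLine("usage: PdfTextDemo <file.pdf>");
            return;
        }

        // Emit UTF-8 so Chinese characters reach the console intact.
        Console.OutputEncoding = Encoding.UTF8;

        try
        {
            List<string> pages = ReadPages(args[0]);
            for (int i = 0; i < pages.Count; i++)
            {
                Console.WriteLine("--- page {0} ({1} chars) ---", i + 1, pages[i].Length);
                Console.WriteLine(pages[i]);
            }
        }
        catch (Exception ex)
        {
            Console.Error.WriteLine("Failed to read PDF file: " + ex.Message);
        }
    }
}
```

Letting the exception propagate out of `ReadPages` is a design choice for this sketch: the caller can then tell an unreadable file apart from a PDF that simply contains no extractable text, which a `null` return value does not make obvious.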

Related: a summary of various Spire.Pdf operations

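    Original author: xuexiaodong2009
    Original URL: https://blog.csdn.net/xuexiaodong2009/article/details/82995535
    This article is reposted from the web purely to share knowledge. If it infringes any rights, please contact the blogger to have it removed.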