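Three ways to merge Word documents (doc, docx)

Ways to merge Word documents

  • Using the com.spire.doc package

Reference: https://www.e-iceblue.cn/spiredocforjavaoperating/merge-word-documents-in-java.html. Spire.Doc is e-iceblue's own product and is not published to the central Maven repository, so searching for it there turns up nothing. You have to declare both the dependency and e-iceblue's repository; otherwise compilation fails with an error saying the com.spire.doc package cannot be found. Their repository is http://repo.e-iceblue.cn/repository/maven-public/, and you can open it to see which other packages they offer. For details, see their official documentation, which I found quite good.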

<dependency>
    <groupId>e-iceblue</groupId>
    <artifactId>spire.doc</artifactId>
    <version>3.4.10</version>
</dependency>

<repositories>
    <repository>
        <id>com.e-iceblue</id>
        <url>http://repo.e-iceblue.cn/repository/maven-public/</url>
    </repository>
</repositories>

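  1. Merging documents where the second document does not continue directly at the end of the first

// The library's source is protected against decompilation, so I will not dig into its internals; it simply exposes a few handy methods.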

import com.spire.doc.Document;
import com.spire.doc.FileFormat;

public class MergeWordDocument {
    public static void main(String[] args) {
        // Path of the first document
        String filePath1 = "merge1.docx";
        // Path of the second document
        String filePath2 = "merge2.docx";
        // Load the first document
        Document document = new Document(filePath1);
        // Insert the content of the second document into the first with insertTextFromFile
        document.insertTextFromFile(filePath2, FileFormat.Docx_2013);
        // Save the merged document
        document.saveToFile("Output.docx", FileFormat.Docx_2013);
    }
}

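The insertTextFromFile method of the Document class merges different documents into one. Note that with this method the content of the merged-in document starts on a new page by default.

Drawback: if you are not using the free edition, a line of text appears in the merged document: Evaluation Warning: The document was created with Spire.Doc for JAVA.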

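  2. Merging documents so that the second document continues directly at the end of the first, using Section. Several files can be merged this way; the parameter is a list of file paths.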

import com.spire.doc.Document;
import com.spire.doc.DocumentObject;
import com.spire.doc.FileFormat;
import com.spire.doc.Section;

import java.util.List;

public static void merge1(List<String> fileList) {
    if (fileList == null || fileList.isEmpty()) {
        return;
    }
    // The first file is the target; every other file is appended to its last section
    Document target = new Document(fileList.get(0));
    Section lastSection = target.getLastSection();
    for (int i = 1; i < fileList.size(); i++) {
        Document source = new Document(fileList.get(i));
        for (Section section : (Iterable<Section>) source.getSections()) {
            for (DocumentObject obj : (Iterable<DocumentObject>) section.getBody().getChildObjects()) {
                // Deep-clone each body object so it can be attached to the target document
                lastSection.getBody().getChildObjects().add(obj.deepClone());
            }
        }
    }
    // Save once, after all files have been appended
    target.saveToFile("Doc1.docx", FileFormat.Docx_2013);
}

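  • Merging documents with POI

With POI, documents with the .docx extension are handled through XWPFDocument and documents with the .doc extension through HWPFDocument. The two classes differ considerably.

Only XWPFDocument is used here.

The jars listed below are required. My demo was a Spring Boot project, so I simply added the dependencies to the pom file. If you do not manage dependencies through pom.xml, download the jars from the central Maven repository.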

  1. Add the jar dependencies

<dependency>
    <groupId>org.apache.poi</groupId>
    <artifactId>poi</artifactId>
    <version>3.8</version>
</dependency>

<dependency>
    <groupId>org.apache.poi</groupId>
    <artifactId>poi-scratchpad</artifactId>
    <version>3.8</version>
</dependency>

<dependency>
    <groupId>org.apache.poi</groupId>
    <artifactId>poi-ooxml</artifactId>
    <version>3.8</version>
</dependency>

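  2. The code

// Imports needed by the methods in this section.
// CollectionUtils is assumed to be Spring's org.springframework.util.CollectionUtils (the demo is a Spring Boot project).
import org.apache.poi.openxml4j.exceptions.InvalidFormatException;
import org.apache.poi.xwpf.usermodel.BreakType;
import org.apache.poi.xwpf.usermodel.XWPFDocument;
import org.apache.poi.xwpf.usermodel.XWPFPictureData;
import org.apache.poi.xwpf.usermodel.XWPFRun;
import org.apache.xmlbeans.XmlException;
import org.apache.xmlbeans.XmlOptions;
import org.openxmlformats.schemas.wordprocessingml.x2006.main.CTBody;
import org.springframework.util.CollectionUtils;

import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public static void mergeWord(List<InputStream> wordList, OutputStream outputStream) throws IOException, XmlException {
    if (CollectionUtils.isEmpty(wordList)) {
        return;
    }
    // Only docx input is supported
    XWPFDocument newDocument = null;
    CTBody newCtBody = null;
    Map<String, Object> headMap = new HashMap<>(30);
    String newString = null;
    String prefix = null;
    StringBuilder mainPart = new StringBuilder();
    for (int i = 0; i < wordList.size(); ++i) {
        try (InputStream word = wordList.get(i)) {
            XWPFDocument xwpfDocument = new XWPFDocument(word);
            // Every document except the last one ends with a page break
            if (i != wordList.size() - 1) {
                XWPFRun run = xwpfDocument.getLastParagraph().createRun();
                run.addBreak(BreakType.PAGE);
            }
            CTBody ctBody = xwpfDocument.getDocument().getBody();
            if (i == 0) {
                // The first document becomes the target; split its body XML into
                // opening tag (prefix), inner content (mainPart) and closing tag (suffix)
                newDocument = xwpfDocument;
                newCtBody = ctBody;
                newString = newCtBody.xmlText();
                prefix = newString.substring(0, newString.indexOf('>') + 1);
                mainPart.append(newString, newString.indexOf('>') + 1, newString.lastIndexOf('<'));
            } else {
                mergeOther2First(newDocument, mainPart, xwpfDocument, ctBody, headMap);
            }
        } catch (IOException | InvalidFormatException e) {
            e.printStackTrace();
        }
    }
    String suffix = null;
    if (newString != null) {
        suffix = newString.substring(newString.lastIndexOf('<'));
    }
    if (newCtBody != null) {
        // Reassemble the body from the first document's wrapper and all collected content
        newCtBody.set(CTBody.Factory.parse(prefix + mainPart.toString() + suffix));
    }
    if (newDocument != null) {
        newDocument.write(outputStream);
    }
}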

private static void mergeOther2First(XWPFDocument newDocument,
                             StringBuilder mainPart,
                             XWPFDocument xwpfDocument,
                             CTBody ctBody,
                             Map<String, Object> headMap) throws InvalidFormatException {
    XmlOptions xmlOptions = new XmlOptions();
    xmlOptions.setSaveOuter();
    String appendString = ctBody.xmlText(xmlOptions);
    // Record the namespace declarations found in the opening tag of this body
    getXmlns(appendString.substring(1, appendString.indexOf('>')) + " ", headMap);
    // Keep only the content between the opening and closing body tags
    String addPart = appendString.substring(appendString.indexOf('>') + 1, appendString.lastIndexOf('<'));
    List<XWPFPictureData> allPictures = xwpfDocument.getAllPictures();
    if (allPictures != null) {
        // Maps each picture's relationship ID before the merge to its ID after the merge
        Map<String, String> map = new HashMap<>();
        for (XWPFPictureData picture : allPictures) {
            String before = xwpfDocument.getRelationId(picture);
            // Add the picture from the source document to the target document
            String after = newDocument.addPictureData(picture.getData(), picture.getPictureType());
            map.put(before, after);
        }
        if (!map.isEmpty()) {
            // Replace the picture IDs in the XML string
            for (Map.Entry<String, String> set : map.entrySet()) {
                addPart = addPart.replace(set.getKey(), set.getValue());
            }
        }
    }
    mainPart.append(addPart);
}

private static final Pattern PATTERN = Pattern.compile("(xmlns(:[\\s\\S]+?)?)=[\\s\\S]+?\\s");

private static void getXmlns(String head, Map<String, Object> headMap) {
    Matcher matcher = PATTERN.matcher(head);
    while (matcher.find()) {
        headMap.put(matcher.group(1), matcher.group());
    }
}
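
The string handling in mergeWord and getXmlns can be tried out without POI. The following is only a sketch on hand-written XML strings, not part of the original code: the class name BodySpliceDemo and its helper methods are made up for illustration, and the sample XML is far simpler than a real document body. It shows how the body is split into opening tag, inner content and closing tag, and what the xmlns pattern captures.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; it only mirrors the string operations of the listing above.
final class BodySpliceDemo {
    // Same pattern as in the listing: matches "xmlns" or "xmlns:prefix", '=', and the value up to the next whitespace
    private static final Pattern XMLNS = Pattern.compile("(xmlns(:[\\s\\S]+?)?)=[\\s\\S]+?\\s");

    /** Opening tag of the root element, including the closing '>'. */
    static String prefix(String xml) {
        return xml.substring(0, xml.indexOf('>') + 1);
    }

    /** Everything between the opening tag and the final closing tag. */
    static String inner(String xml) {
        return xml.substring(xml.indexOf('>') + 1, xml.lastIndexOf('<'));
    }

    /** The final closing tag. */
    static String suffix(String xml) {
        return xml.substring(xml.lastIndexOf('<'));
    }

    /** Namespace declarations of the root element, keyed by attribute name. */
    static Map<String, String> xmlns(String xml) {
        // Drop '<' and '>' and append a space so the last declaration is also followed by whitespace
        String head = xml.substring(1, xml.indexOf('>')) + " ";
        Map<String, String> result = new LinkedHashMap<>();
        Matcher matcher = XMLNS.matcher(head);
        while (matcher.find()) {
            result.put(matcher.group(1), matcher.group().trim());
        }
        return result;
    }

    /** Keeps the wrapper of the first body and concatenates the inner content of both bodies. */
    static String merge(String first, String second) {
        return prefix(first) + inner(first) + inner(second) + suffix(first);
    }

    public static void main(String[] args) {
        String first = "<w:body xmlns:w=\"urn:w\"><w:p>A</w:p></w:body>";
        String second = "<w:body xmlns:w=\"urn:w\" xmlns:r=\"urn:r\"><w:p>B</w:p></w:body>";
        System.out.println(merge(first, second));
        System.out.println(xmlns(second));
    }
}
```

One thing the sketch makes visible: the merged string keeps only the first document's opening tag, so a namespace declared only in a later document (xmlns:r here) is collected but never written back. In the listing above headMap is filled in the same way and not used afterwards.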

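  3. Notes

This is essentially the easyword source code that can be found online. The write-ups about it mention this merge feature, so I borrowed it, expecting it to cover my earlier requirement, but it failed when I tried it. It can merge ordinary documents created by hand (for example on the desktop), but it cannot merge documents generated from a Word-format XML template filled in with FreeMarker tags. For those it throws org.apache.poi.openxml4j.exceptions.InvalidFormatException: Package should contain a content type part [M1.13], or java.lang.IllegalArgumentException: The document is really a XML file. Both messages suggest the same cause: the generated file is plain XML text with a Word extension rather than a real Word container, so POI cannot open it. Explanations of both errors are easy to find online. If you have no such special requirement, easyword should work quite well. The one catch is that the merged content does not continue directly at the end of the previous document (the code inserts a page break between documents); that still needs looking into.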
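
Before handing a file to POI, it can help to check what kind of file it really is. The sketch below is not from the original article; the class name WordFileSniffer and its Kind enum are made up for illustration. It relies only on well-known file signatures: a zip archive (which a .docx is) starts with the bytes PK 03 04, an OLE2 file (which a binary .doc is) starts with D0 CF 11 E0 A1 B1 1A E1, and an XML text file starts with '<', possibly after a UTF-8 byte order mark.

```java
import java.io.IOException;
import java.io.InputStream;

// Hypothetical helper; classifies a file by its first bytes only.
final class WordFileSniffer {
    enum Kind { ZIP_PACKAGE, OLE2, XML_TEXT, UNKNOWN }

    private static final int[] OLE2_MAGIC = {0xD0, 0xCF, 0x11, 0xE0, 0xA1, 0xB1, 0x1A, 0xE1};

    static Kind sniff(InputStream in) throws IOException {
        byte[] head = new byte[8];
        int n = readFully(in, head);
        // Zip local file header: 'P' 'K' 0x03 0x04 (a .docx is a zip, but so is any other zip file)
        if (n >= 4 && head[0] == 'P' && head[1] == 'K' && head[2] == 3 && head[3] == 4) {
            return Kind.ZIP_PACKAGE;
        }
        if (n == 8 && startsWithOle2Magic(head)) {
            return Kind.OLE2;
        }
        // Skip a UTF-8 byte order mark if present, then look for the start of an XML document
        boolean bom = n >= 3 && (head[0] & 0xFF) == 0xEF && (head[1] & 0xFF) == 0xBB && (head[2] & 0xFF) == 0xBF;
        int start = bom ? 3 : 0;
        if (n > start && head[start] == '<') {
            return Kind.XML_TEXT;
        }
        return Kind.UNKNOWN;
    }

    private static boolean startsWithOle2Magic(byte[] head) {
        for (int i = 0; i < OLE2_MAGIC.length; i++) {
            if ((head[i] & 0xFF) != OLE2_MAGIC[i]) {
                return false;
            }
        }
        return true;
    }

    // Reads until the buffer is full or the stream ends; returns the number of bytes read
    private static int readFully(InputStream in, byte[] buf) throws IOException {
        int total = 0;
        while (total < buf.length) {
            int n = in.read(buf, total, buf.length - total);
            if (n < 0) {
                break;
            }
            total += n;
        }
        return total;
    }
}
```

A file produced from a Word XML template would presumably come back as XML_TEXT here, which matches the two error messages above. The check consumes the first bytes of the stream, so the file has to be reopened (or the stream buffered and reset) before it is passed on.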

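  • Merging documents with the docx4j package

Without further ado: this is the approach that finally solved my problem. Remarkably, it does not matter whether the documents are .doc or .docx. Both can be merged, even mixed together, and the content is appended directly after the previous document. The dependency is available in the Maven repository.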

  1. Add the dependency

<dependency>
    <groupId>org.docx4j</groupId>
    <artifactId>docx4j</artifactId>
    <version>6.1.0</version>
</dependency>

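  2. The code

// Imports needed by the methods in this section.
// CollectionUtils is assumed to be Spring's org.springframework.util.CollectionUtils,
// and IOUtils to be Apache Commons IO's org.apache.commons.io.IOUtils.
import org.apache.commons.io.IOUtils;
import org.docx4j.jaxb.Context;
import org.docx4j.openpackaging.exceptions.Docx4JException;
import org.docx4j.openpackaging.packages.WordprocessingMLPackage;
import org.docx4j.openpackaging.parts.PartName;
import org.docx4j.openpackaging.parts.WordprocessingML.AlternativeFormatInputPart;
import org.docx4j.openpackaging.parts.WordprocessingML.MainDocumentPart;
import org.docx4j.relationships.Relationship;
import org.docx4j.wml.CTAltChunk;
import org.springframework.util.CollectionUtils;

import java.io.*;
import java.util.ArrayList;
import java.util.List;

public void mergeDoc(List<String> wordList, OutputStream out) {
    if (CollectionUtils.isEmpty(wordList)) {
        // Nothing to merge: report it to the caller instead of silently continuing
        throw new IllegalArgumentException("wordList must not be empty");
    }
    // Open every file path as a stream
    List<InputStream> streamList = new ArrayList<>();
    for (String wordPath : wordList) {
        try {
            streamList.add(new FileInputStream(wordPath));
        } catch (FileNotFoundException e) {
            e.printStackTrace();
        }
    }
    try {
        mergeDocStream(streamList, out);
    } catch (IOException | Docx4JException e) {
        e.printStackTrace();
    }
}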

private void mergeDocStream(List<InputStream> streamList, OutputStream out) throws Docx4JException, IOException {
    WordprocessingMLPackage target = null;
    // Temporary file that first holds the first document and later the merged result
    final File generated = File.createTempFile("generated", ".docx");
    int chunkId = 0;
    try {
        for (InputStream is : streamList) {
            if (is == null) {
                continue;
            }
            // try-with-resources closes each input stream once it has been read
            try (InputStream in = is) {
                byte[] bytes = IOUtils.toByteArray(in);
                if (target == null) {
                    // The first document becomes the target package
                    try (OutputStream os = new FileOutputStream(generated)) {
                        os.write(bytes);
                    }
                    target = WordprocessingMLPackage.load(generated);
                } else {
                    // Every further document is embedded into the target as an altChunk
                    insertDoc(target.getMainDocumentPart(), bytes, chunkId++);
                }
            }
        }
        if (target != null) {
            target.save(generated);
            saveTemplate(new FileInputStream(generated), out);
        }
    } finally {
        // Remove the temporary file whether or not the merge succeeded
        generated.delete();
    }
}

private void insertDoc(MainDocumentPart mainDocumentPart, byte[] bytes, int chunkId) {
    try {
        // Store the document bytes as a separate part inside the target package
        PartName partName = new PartName("/part" + chunkId + ".docx");
        AlternativeFormatInputPart afiPart = new AlternativeFormatInputPart(partName);
        afiPart.setBinaryData(bytes);
        Relationship relationship = mainDocumentPart.addTargetPart(afiPart);
        // Reference that part from the end of the main document through an altChunk element
        CTAltChunk chunk = Context.getWmlObjectFactory().createCTAltChunk();
        chunk.setId(relationship.getId());
        mainDocumentPart.addObject(chunk);
    } catch (Exception e) {
        e.printStackTrace();
    }
}

private void saveTemplate(InputStream targetStream, OutputStream out) {
    // Copy the merged file to the caller's stream; try-with-resources closes both streams.
    // Writing to OutputStream directly avoids a cast that fails for anything but a FileOutputStream.
    try (InputStream in = targetStream; OutputStream os = out) {
        byte[] buffer = new byte[1024];
        int bytesRead;
        while ((bytesRead = in.read(buffer)) != -1) {
            os.write(buffer, 0, bytesRead);
        }
    } catch (IOException e) {
        e.printStackTrace();
    }
}
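
To see what insertDoc actually did to the result, the merged file can be inspected as a zip archive. This is only a sketch and not part of the original code: the class name DocxPackageLister is made up, and it is an assumption that the embedded documents show up as entries named part0.docx, part1.docx and so on (derived from the PartName used above).

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

// Hypothetical helper; lists the entry names of a .docx package in archive order.
final class DocxPackageLister {
    static List<String> listParts(InputStream docx) throws IOException {
        List<String> names = new ArrayList<>();
        // A .docx file is a zip archive, so the standard zip reader is enough
        try (ZipInputStream zip = new ZipInputStream(docx)) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                names.add(entry.getName());
            }
        }
        return names;
    }
}
```

If the extra parts are present, the merged content is stored as embedded documents that Word pulls in when it opens the file, rather than as paragraphs copied into the main document.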

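  3. Notes

Once you have the code, a few things still need correcting, above all the exception handling. Utility classes should generally let exceptions propagate; catch them at the place where the utility is called and log them in the catch block, so that you know where things went wrong and can react.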
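
A minimal sketch of that split, using only the JDK: the class names StreamCopyUtil and MergeCaller are made up for illustration, and java.util.logging stands in for whatever logging framework the project really uses.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.util.logging.Level;
import java.util.logging.Logger;

// Hypothetical utility class: does the work and lets IOException propagate.
final class StreamCopyUtil {
    private StreamCopyUtil() {
    }

    /** Copies all bytes from in to out and returns the number of bytes copied. */
    static long copy(InputStream in, OutputStream out) throws IOException {
        byte[] buffer = new byte[1024];
        long total = 0;
        int read;
        while ((read = in.read(buffer)) != -1) {
            out.write(buffer, 0, read);
            total += read;
        }
        return total;
    }
}

// Hypothetical call site: knows the context, so it catches and logs here.
final class MergeCaller {
    private static final Logger LOG = Logger.getLogger(MergeCaller.class.getName());

    /** Returns true if the merged document was written, false if copying failed. */
    static boolean export(InputStream merged, OutputStream target) {
        try {
            long bytes = StreamCopyUtil.copy(merged, target);
            LOG.info("Copied " + bytes + " bytes");
            return true;
        } catch (IOException e) {
            // The log entry records where the failure happened, together with the stack trace
            LOG.log(Level.SEVERE, "Copying the merged document failed", e);
            return false;
        }
    }
}
```

Applied to the methods above, this would mean letting insertDoc and saveTemplate declare their exceptions instead of calling printStackTrace, and handling them where mergeDoc is invoked.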

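If you have any questions, feel free to leave a comment.

    Original author: 走出舒适圈丶
    Original article: https://blog.csdn.net/weixin_42232639/article/details/107232323
    This article is reposted from the web purely to share knowledge. If it infringes on your rights, please contact the blogger to have it removed.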