Three Ways to Merge Word Documents (doc, docx)

January 2, 2023. Source: 走出舒适圈丶

Several ways to merge Word documents

Using the com.spire.doc package

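Reference: https://www.e-iceblue.cn/spiredocforjavaoperating/merge-word-documents-in-java.html. Spire.Doc is the vendor's own product and is not published to the central Maven repository, so searching for it there finds nothing. You have to declare both the dependency and the vendor's repository; otherwise the build fails because the com.spire.doc package cannot be found. The repository is at http://repo.e-iceblue.cn/repository/maven-public/, and you can browse it to see which other packages the vendor offers. For details, see the official documentation, which I found quite good.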

<dependency>
    <groupId>e-iceblue</groupId>
    <artifactId>spire.doc</artifactId>
    <version>3.4.10</version>
</dependency>
<repositories>
    <repository>
        <id>com.e-iceblue</id>
        <url>http://repo.e-iceblue.cn/repository/maven-public/</url>
    </repository>
</repositories>

Merging where the second document does not continue from the end of the first

// The library's source is protected against decompilation, so I will not dig into its internals; it simply exposes a few convenient public methods.

import com.spire.doc.Document;
import com.spire.doc.FileFormat;

public class MergeWordDocument {
    public static void main(String[] args) {
        // Path of the first document
        String filePath1 = "merge1.docx";
        // Path of the second document
        String filePath2 = "merge2.docx";
        // Load the first document
        Document document = new Document(filePath1);
        // Insert the content of the second document into the first with insertTextFromFile
        document.insertTextFromFile(filePath2, FileFormat.Docx_2013);
        // Save the merged document
        document.saveToFile("Output.docx", FileFormat.Docx_2013);
    }
}

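The insertTextFromFile method of the Document class merges different documents into a single one. Note that with this method the content of the inserted document starts on a new page by default.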

Drawback: if you are not using the free edition, the merged document contains an extra line of text: Evaluation Warning: The document was created with Spire.Doc for JAVA.

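Merging where the second document continues at the end of the first, using sections. This version can merge any number of files; the parameter is a list of file paths.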
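public static void merge1(List<String> fileList) {
    if (fileList == null || fileList.isEmpty()) {
        return;
    }
    // The first document is the merge target; every later document is appended to it.
    // (The original loop merged each document only into its predecessor and saved
    // inside the loop, so with three or more files the output lost the first ones.)
    Document target = new Document(fileList.get(0));
    Section lastSection = target.getLastSection();
    for (int i = 1; i < fileList.size(); i++) {
        Document source = new Document(fileList.get(i));
        for (Section section : (Iterable<Section>) source.getSections()) {
            for (DocumentObject obj : (Iterable<DocumentObject>) section.getBody().getChildObjects()) {
                // Clone each body object so it can be attached to the target document
                lastSection.getBody().getChildObjects().add(obj.deepClone());
            }
        }
    }
    // Save once, with an extension that matches the Docx_2013 format
    target.saveToFile("Doc1.docx", FileFormat.Docx_2013);
}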

Merging documents with POI

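With POI, .docx files are handled through XWPFDocument and .doc files through HWPFDocument, and the two APIs differ considerably.
Only XWPFDocument is used here.
The required JARs are listed below. My demo is a Spring Boot project, so I simply added the dependencies to the pom file; if you do not manage dependencies through pom.xml, download the JARs from the Maven Central repository.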

Dependencies

<dependency>
   <groupId>org.apache.poi</groupId>
   <artifactId>poi</artifactId>
   <version>3.8</version>
</dependency>
<dependency>
   <groupId>org.apache.poi</groupId>
   <artifactId>poi-scratchpad</artifactId>
   <version>3.8</version>
</dependency>
<dependency>
   <groupId>org.apache.poi</groupId>
   <artifactId>poi-ooxml</artifactId>
   <version>3.8</version>
</dependency>

Code

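// Imports needed by the methods in this listing. CollectionUtils is assumed to be
// Spring's org.springframework.util.CollectionUtils, since the demo is a Spring Boot project.
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import org.apache.poi.openxml4j.exceptions.InvalidFormatException;
import org.apache.poi.xwpf.usermodel.BreakType;
import org.apache.poi.xwpf.usermodel.XWPFDocument;
import org.apache.poi.xwpf.usermodel.XWPFPictureData;
import org.apache.poi.xwpf.usermodel.XWPFRun;
import org.apache.xmlbeans.XmlException;
import org.apache.xmlbeans.XmlOptions;
import org.openxmlformats.schemas.wordprocessingml.x2006.main.CTBody;
import org.springframework.util.CollectionUtils;

public static void mergeWord(List<InputStream> wordList, OutputStream outputStream)
        throws IOException, XmlException, InvalidFormatException {
    if (CollectionUtils.isEmpty(wordList)) {
        return;
    }
    // docx only
    XWPFDocument newDocument = null;
    CTBody newCtBody = null;
    Map<String, Object> headMap = new HashMap<>(30);
    String newString = null;
    String prefix = null;
    StringBuilder mainPart = new StringBuilder();
    for (int i = 0; i < wordList.size(); ++i) {
        // try-with-resources closes each input stream; failures propagate to the caller
        try (InputStream word = wordList.get(i)) {
            XWPFDocument xwpfDocument = new XWPFDocument(word);
            if (i != wordList.size() - 1) {
                // End every document except the last with a page break
                XWPFRun run = xwpfDocument.getLastParagraph().createRun();
                run.addBreak(BreakType.PAGE);
            }
            CTBody ctBody = xwpfDocument.getDocument().getBody();
            if (i == 0) {
                // The first document is the merge target: split its body XML into
                // opening tag (prefix), content (mainPart) and closing tag (suffix)
                newDocument = xwpfDocument;
                newCtBody = ctBody;
                newString = newCtBody.xmlText();
                prefix = newString.substring(0, newString.indexOf('>') + 1);
                mainPart.append(newString, newString.indexOf('>') + 1, newString.lastIndexOf('<'));
            } else {
                mergeOther2First(newDocument, mainPart, xwpfDocument, ctBody, headMap);
            }
        }
    }
    String suffix = null;
    if (newString != null) {
        suffix = newString.substring(newString.lastIndexOf('<'));
    }
    if (newCtBody != null) {
        // Reassemble the body XML and write it back into the first document
        newCtBody.set(CTBody.Factory.parse(prefix + mainPart.toString() + suffix));
    }
    if (newDocument != null) {
        newDocument.write(outputStream);
    }
}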
private static void mergeOther2First(XWPFDocument newDocument,
                             StringBuilder mainPart,
                             XWPFDocument xwpfDocument,
                             CTBody ctBody,
                             Map<String, Object> headMap) throws InvalidFormatException {
    // Serialize the body together with its outer element
    XmlOptions xmlOptions = new XmlOptions();
    xmlOptions.setSaveOuter();
    String appendString = ctBody.xmlText(xmlOptions);
    // Collect the namespace declarations found in the opening tag
    getXmlns(appendString.substring(1, appendString.indexOf('>')) + " ", headMap);
    // Keep only the content between the opening and closing body tags
    String addPart = appendString.substring(appendString.indexOf('>') + 1, appendString.lastIndexOf('<'));
    List<XWPFPictureData> allPictures = xwpfDocument.getAllPictures();
    if (allPictures != null) {
        // Record each picture's relationship ID before and after the merge
        Map<String, String> map = new HashMap<>();
        for (XWPFPictureData picture : allPictures) {
            String before = xwpfDocument.getRelationId(picture);
            // Add the picture from the source document to the target document
            String after = newDocument.addPictureData(picture.getData(), picture.getPictureType());
            map.put(before, after);
        }
        if (!map.isEmpty()) {
            // Replace the picture IDs in the XML string.
            // Caveat: plain string replacement also hits longer IDs that share a
            // prefix (for example rId1 inside rId10) and IDs produced by an earlier replacement.
            for (Map.Entry<String, String> set : map.entrySet()) {
                addPart = addPart.replace(set.getKey(), set.getValue());
            }
        }
    }
    mainPart.append(addPart);
}
// Matches one namespace declaration (xmlns or xmlns:prefix) up to the next whitespace
private static final Pattern PATTERN = Pattern.compile("(xmlns(:[\\s\\S]+?)?)=[\\s\\S]+?\\s");
private static void getXmlns(String head, Map<String, Object> headMap) {
    Matcher matcher = PATTERN.matcher(head);
    while (matcher.find()) {
        headMap.put(matcher.group(1), matcher.group());
    }
}
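
The picture-ID step in mergeOther2First replaces IDs one after another with String.replace, which can go wrong when one ID is a prefix of another (rId1 and rId10) or when a new ID equals an old one that has not been processed yet. A minimal sketch of a single-pass alternative, using only the JDK, is shown here. The class name RelationIdRemapper and the method remapIds are hypothetical and not part of POI or easyword, and the sketch assumes the IDs appear in the XML as complete double-quoted attribute values.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper: rewrites relationship IDs in an XML string in one pass. */
final class RelationIdRemapper {

    /**
     * Replaces every double-quoted occurrence of an old ID with its new ID.
     * Each position in the input is visited once, so a replacement is never replaced again.
     */
    static String remapIds(String xml, Map<String, String> idMap) {
        if (idMap.isEmpty()) {
            return xml;
        }
        // Build "(id1|id2|...)" from the old IDs, quoting each one literally
        StringJoiner alternatives = new StringJoiner("|");
        for (String oldId : idMap.keySet()) {
            alternatives.add(Pattern.quote(oldId));
        }
        // The surrounding quotes force a whole-value match, so rId1 cannot match inside rId10
        Pattern pattern = Pattern.compile("\"(" + alternatives + ")\"");
        Matcher matcher = pattern.matcher(xml);
        StringBuffer result = new StringBuffer();
        while (matcher.find()) {
            String replacement = "\"" + idMap.get(matcher.group(1)) + "\"";
            matcher.appendReplacement(result, Matcher.quoteReplacement(replacement));
        }
        matcher.appendTail(result);
        return result.toString();
    }

    public static void main(String[] args) {
        Map<String, String> idMap = new LinkedHashMap<>();
        idMap.put("rId1", "rId5");
        idMap.put("rId5", "rId7");
        idMap.put("rId10", "rId11");
        String xml = "<a:blip r:embed=\"rId1\"/><a:blip r:embed=\"rId10\"/><a:blip r:embed=\"rId5\"/>";
        System.out.println(remapIds(xml, idMap));
    }
}
```

In mergeOther2First, the replacement loop could then become a single call such as addPart = RelationIdRemapper.remapIds(addPart, map), provided the assumption about quoted attribute values holds for your documents.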

Notes

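This is actually the source code of easyword, which can be found online. It advertises this merge feature, so I borrowed it, expecting it to meet my earlier requirement, but it failed when I tried it. It can merge ordinary documents, such as ones created on the desktop, but it cannot merge documents generated from a Word-format XML template with FreeMarker tags. For those it throws either org.apache.poi.openxml4j.exceptions.InvalidFormatException: Package should contain a content type part [M1.13] or java.lang.IllegalArgumentException: The document is really a XML file. Both messages are well explained online if you search for them. If you have no such special requirement, easyword should work quite well. The only catch is that each merged document does not continue from the end of the previous one (the code above adds a page break after every document except the last, which is the likely cause); that still needs some investigation.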
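
A likely explanation for both errors is that a file produced from a Word XML template is still a single flat XML file, whatever its extension, while XWPFDocument expects a ZIP package and HWPFDocument expects an OLE2 binary file. A minimal sketch for checking this before merging, using only the JDK (Java 11 or later for readNBytes), is shown here. The class WordFormatSniffer and its Kind enum are hypothetical names, and the check looks only at the first bytes of the file, so it is a heuristic rather than real validation.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

/** Hypothetical helper: guesses the container format of a "Word" file from its first bytes. */
final class WordFormatSniffer {

    enum Kind { OOXML_ZIP, OLE2_BINARY, PLAIN_XML, UNKNOWN }

    // ZIP local file header signature "PK\3\4" (a real .docx is a ZIP package)
    private static final byte[] ZIP_MAGIC = {0x50, 0x4B, 0x03, 0x04};
    // First four bytes of the OLE2 compound file signature (binary .doc)
    private static final byte[] OLE2_MAGIC = {(byte) 0xD0, (byte) 0xCF, 0x11, (byte) 0xE0};

    static Kind sniff(byte[] head) {
        if (startsWith(head, ZIP_MAGIC)) {
            return Kind.OOXML_ZIP;
        }
        if (startsWith(head, OLE2_MAGIC)) {
            return Kind.OLE2_BINARY;
        }
        // Otherwise treat the bytes as text: skip a UTF-8 byte order mark and leading whitespace
        String text = new String(head, StandardCharsets.UTF_8);
        if (text.startsWith("\uFEFF")) {
            text = text.substring(1);
        }
        if (text.trim().startsWith("<")) {
            return Kind.PLAIN_XML;
        }
        return Kind.UNKNOWN;
    }

    static Kind sniff(Path file) throws IOException {
        try (InputStream in = Files.newInputStream(file)) {
            return sniff(in.readNBytes(64));
        }
    }

    private static boolean startsWith(byte[] data, byte[] magic) {
        if (data.length < magic.length) {
            return false;
        }
        for (int i = 0; i < magic.length; i++) {
            if (data[i] != magic[i]) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) throws IOException {
        for (String arg : args) {
            System.out.println(arg + " -> " + sniff(Paths.get(arg)));
        }
    }
}
```

Running it against the generated files shows which kind each one really is; a PLAIN_XML result for a file named .docx would explain why the POI-based merge rejects it.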

Merging documents with the docx4j package

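This is the approach that finally solved my problem. Remarkably, it merges documents whether they are .doc or .docx, the two types can even be mixed in one merge, and each document is appended directly to the end of the previous one. The dependency is available in the Maven repository.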

Dependency

<dependency>
   <groupId>org.docx4j</groupId>
   <artifactId>docx4j</artifactId>
   <version>6.1.0</version>
</dependency>

Code

// IOUtils is assumed to be org.apache.commons.io.IOUtils and CollectionUtils
// to be Spring's org.springframework.util.CollectionUtils.
import java.io.File;
import java.io.FileInputStream;
import java.io.FileNotFoundException;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.util.ArrayList;
import java.util.List;
import org.apache.commons.io.IOUtils;
import org.docx4j.jaxb.Context;
import org.docx4j.openpackaging.exceptions.Docx4JException;
import org.docx4j.openpackaging.packages.WordprocessingMLPackage;
import org.docx4j.openpackaging.parts.PartName;
import org.docx4j.openpackaging.parts.WordprocessingML.AlternativeFormatInputPart;
import org.docx4j.openpackaging.parts.WordprocessingML.MainDocumentPart;
import org.docx4j.relationships.Relationship;
import org.docx4j.wml.CTAltChunk;
import org.springframework.util.CollectionUtils;

public void mergeDoc(List<String> wordList, OutputStream out) {
    if (CollectionUtils.isEmpty(wordList)) {
        throw new IllegalArgumentException("wordList must not be empty");
    }
    List<InputStream> streamList = new ArrayList<>();
    for (String wordPath : wordList) {
        try {
            streamList.add(new FileInputStream(wordPath));
        } catch (FileNotFoundException e) {
            e.printStackTrace();
        }
    }
    try {
        mergeDocStream(streamList, out);
    } catch (IOException | Docx4JException e) {
        e.printStackTrace();
    }
}

private void mergeDocStream(List<InputStream> streamList, OutputStream out) throws Docx4JException, IOException {
    WordprocessingMLPackage target = null;
    // The first document is copied to a temp file and becomes the merge target
    final File generated = File.createTempFile("generated", ".docx");
    try {
        int chunkId = 0;
        for (InputStream is : streamList) {
            if (is == null) {
                continue;
            }
            // Close each input stream as soon as its bytes have been read
            try (InputStream in = is) {
                byte[] bytes = IOUtils.toByteArray(in);
                if (target == null) {
                    try (OutputStream os = new FileOutputStream(generated)) {
                        os.write(bytes);
                    }
                    target = WordprocessingMLPackage.load(generated);
                } else {
                    insertDoc(target.getMainDocumentPart(), bytes, chunkId++);
                }
            }
        }
        if (target != null) {
            target.save(generated);
            saveTemplate(new FileInputStream(generated), out);
        }
    } finally {
        // Remove the temp file whether or not the merge succeeded
        generated.delete();
    }
}

private void insertDoc(MainDocumentPart mainDocumentPart, byte[] bytes, int chunkId) throws Docx4JException {
    // Embed the whole source document as an altChunk part and reference it from the body
    PartName partName = new PartName("/part" + chunkId + ".docx");
    AlternativeFormatInputPart afiPart = new AlternativeFormatInputPart(partName);
    afiPart.setBinaryData(bytes);
    Relationship relationship = mainDocumentPart.addTargetPart(afiPart);
    CTAltChunk chunk = Context.getWmlObjectFactory().createCTAltChunk();
    chunk.setId(relationship.getId());
    mainDocumentPart.addObject(chunk);
}

private void saveTemplate(InputStream targetStream, OutputStream out) throws IOException {
    // Write to the OutputStream as given; the original cast to FileOutputStream
    // fails with ClassCastException for any other stream type.
    try (InputStream in = targetStream; OutputStream os = out) {
        byte[] buffer = new byte[1024];
        int bytesRead;
        while ((bytesRead = in.read(buffer)) != -1) {
            os.write(buffer, 0, bytesRead);
        }
    }
}

Notes

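One thing worth correcting in code like this is the exception handling. A utility class should, as far as possible, throw its exceptions and let the caller catch them. The caller can then log inside the catch block, so you know exactly where things went wrong and can be prepared for it.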
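
As an illustration of that advice, here is a minimal sketch using only the JDK. The class StreamCopyDemo and its methods copy and writeMerged are hypothetical names, not taken from the code in this post: the utility method declares IOException and lets it propagate, while the caller catches it and logs it with java.util.logging.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.util.logging.Level;
import java.util.logging.Logger;

/** Hypothetical demo: the utility throws, the caller catches and logs. */
final class StreamCopyDemo {

    private static final Logger LOG = Logger.getLogger(StreamCopyDemo.class.getName());

    /** Utility method: copies all bytes and lets any IOException propagate. */
    static long copy(InputStream in, OutputStream out) throws IOException {
        byte[] buffer = new byte[8192];
        long total = 0;
        int bytesRead;
        while ((bytesRead = in.read(buffer)) != -1) {
            out.write(buffer, 0, bytesRead);
            total += bytesRead;
        }
        out.flush();
        return total;
    }

    /** Caller: decides what a failure means and logs it with context. */
    static boolean writeMerged(InputStream merged, OutputStream out) {
        try {
            long bytes = copy(merged, out);
            LOG.info("Wrote merged document, " + bytes + " bytes");
            return true;
        } catch (IOException e) {
            LOG.log(Level.SEVERE, "Failed to write merged document", e);
            return false;
        }
    }

    public static void main(String[] args) {
        InputStream in = new ByteArrayInputStream("demo".getBytes(StandardCharsets.UTF_8));
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        System.out.println(writeMerged(in, out));
    }
}
```

The copy method also works with any OutputStream, so the same utility can write to a file, a byte array, or an HTTP response without a cast.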

If you have any questions, feel free to leave them in the comments.

    Original author: 走出舒适圈丶
    Original URL: https://blog.csdn.net/weixin_42232639/article/details/107232323