Inserting a DOCTYPE into an XML document (Java / SAX)

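Suppose you have an XML document and you have the DTD for it, but the document itself does not actually declare a DOCTYPE. How do you insert the DOCTYPE declaration, ideally by specifying it on the parser (much as you can set a schema for the document to be parsed), or by injecting the necessary SAX events through an XMLFilter or something similar?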

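I've found plenty of references to EntityResolver, but it is only called once a DOCTYPE has been found during parsing, and it is used to point to a local DTD file. EntityResolver2 looks like it has what I'm after, but I haven't found any usage examples.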

This is the closest I've got so far (the code is Groovy, but it's close enough to Java that you should be able to follow it):

import org.xml.sax.*
import org.xml.sax.ext.*
import org.xml.sax.helpers.*

class XmlFilter extends XMLFilterImpl {
    public XmlFilter( XMLReader reader ) { super(reader) }

    @Override public void startDocument() {
        super.startDocument()        
        super.resolveEntity( null, 
            'file:///./entity.dtd')
        println "filter startDocument"
    }
}

class MyHandler extends DefaultHandler2 { 
    public InputSource resolveEntity(String name, String publicId, String baseURI, String systemId) {
        println "entity: $name, $publicId, $baseURI, $systemId"
        return new InputSource(new StringReader('<!ENTITY asdf "&#161;">'))
    }
}

def handler = new MyHandler()

def parser = XMLReaderFactory.createXMLReader()
parser.setFeature 'http://xml.org/sax/features/use-entity-resolver2', true
def filter = new XmlFilter( parser )
filter.setContentHandler( handler )
filter.setEntityResolver( handler )

filter.parse( new InputSource(new StringReader('''<?xml version="1.0" ?>
    <test>one &asdf; two! &nbsp; &iexcl;&pound;&cent;</test>''')) );

I can see resolveEntity being called, but I still get hit with:

org.xml.sax.SAXParseException: The entity "asdf" was referenced, but not declared.
at com.sun.org.apache.xerces.internal.parsers.AbstractSAXParser.parse(AbstractSAXParser.java:1231)
at org.xml.sax.helpers.XMLFilterImpl.parse(XMLFilterImpl.java:333)

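I assume this is because there is no way to add SAX events that the parser itself knows about: I can only add events via a filter upstream of the parser, and those are passed on to the ContentHandler. So the document has to be valid by the time it enters the XMLReader. Is there any way around this? I know I could modify the original stream to add a DOCTYPE, or perhaps run a transformation that sets the DTD. Are there any other options?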
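For concreteness, the "modify the original stream" fallback could look roughly like the sketch below. It is only an illustration: the class and method names (DoctypePrepender, withDoctype, textOf) are made up here, it works on a String rather than a byte stream, and it assumes that any XML declaration sits at the very start of the text.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class DoctypePrepender {

    /**
     * Returns the XML text with the given DOCTYPE declaration inserted
     * after the XML declaration, or at the very start if there is none.
     * Simplification: anything starting with "<?xml" is treated as the
     * XML declaration and is assumed to be terminated by "?>".
     */
    public static String withDoctype(String xml, String doctype) {
        int insertAt = 0;
        if (xml.startsWith("<?xml")) {
            insertAt = xml.indexOf("?>") + 2;
        }
        return xml.substring(0, insertAt) + doctype + xml.substring(insertAt);
    }

    /** Parses the XML with the default SAX parser and returns its character data. */
    public static String textOf(String xml) throws Exception {
        StringBuilder text = new StringBuilder();
        SAXParserFactory.newInstance().newSAXParser().parse(
            new InputSource(new StringReader(xml)),
            new DefaultHandler() {
                @Override
                public void characters(char[] ch, int start, int length) {
                    text.append(ch, start, length);
                }
            });
        return text.toString();
    }

    public static void main(String[] args) throws Exception {
        // The entity is declared in an internal subset, so no external DTD is needed.
        String doctype = "<!DOCTYPE test [<!ENTITY asdf \"x\">]>";
        String patched = withDoctype(
            "<?xml version=\"1.0\"?><test>one &asdf; two!</test>", doctype);
        System.out.println(textOf(patched));
    }
}
```

Because the parser itself sees the DOCTYPE, the entity reference is resolved during parsing instead of failing before any filter gets involved.",
    "<!DOCTYPE test [<!ENTITY asdf \"x\">]>");
if (!patched.startsWith("<?xml version=\"1.0\"?><!DOCTYPE test")) throw new AssertionError(patched);
if (!DoctypePrepender.textOf(patched).equals("a x b")) throw new AssertionError(DoctypePrepender.textOf(patched));
String noDecl = DoctypePrepender.withDoctype("<test/>", "<!DOCTYPE test>");
if (!noDecl.equals("<!DOCTYPE test><test/>")) throw new AssertionError(noDecl);
</test>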

Best answer: You could try DoctypeChanger, which modifies the raw stream as you suggested:

DoctypeChanger is a Java class that lets you add, modify or remove a DOCTYPE declaration from a byte stream as it is fed into an XML parser.

InputStream in = ...   // get your XML InputStream
DOCTYPEChangerStream changer = new DOCTYPEChangerStream(in);
changer.setGenerator( 
    new DoctypeGenerator() {
        public Doctype generate(Doctype old) {
            return new DoctypeImpl("rootElement", "pubId", "sysId", "internalSubset");
        }
    } 
);
// .. and pass it on to the parser.
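
A different route, not part of DoctypeChanger, is the EntityResolver2 hook the question mentions. Its getExternalSubset(name, baseURI) method is documented as letting an application supply an external subset for a document that does not declare one. The sketch below is a minimal illustration rather than a tested recipe for every parser: it assumes the XMLReader supports EntityResolver2 (the JDK's bundled parser is assumed here), and the names ExternalSubsetDemo, SubsetHandler and parse are made up for the example.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.ext.DefaultHandler2;

public class ExternalSubsetDemo {

    /** Collects character data and supplies a DTD when the document has no DOCTYPE. */
    static class SubsetHandler extends DefaultHandler2 {
        private final String dtd;
        final StringBuilder text = new StringBuilder();

        SubsetHandler(String dtd) {
            this.dtd = dtd;
        }

        // Asked for by the parser when the document declares no DOCTYPE;
        // the returned source is read as the external DTD subset.
        @Override
        public InputSource getExternalSubset(String name, String baseURI) {
            return new InputSource(new StringReader(dtd));
        }

        @Override
        public void characters(char[] ch, int start, int length) {
            text.append(ch, start, length);
        }
    }

    /** Parses xml, using dtd as the external subset, and returns the character data. */
    public static String parse(String xml, String dtd) throws Exception {
        XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
        // Standard SAX2 feature; throws if the parser does not recognise it.
        reader.setFeature("http://xml.org/sax/features/use-entity-resolver2", true);
        SubsetHandler handler = new SubsetHandler(dtd);
        reader.setContentHandler(handler);
        reader.setEntityResolver(handler);
        reader.parse(new InputSource(new StringReader(xml)));
        return handler.text.toString();
    }

    public static void main(String[] args) throws Exception {
        String dtd = "<!ENTITY asdf \"&#161;\">";
        System.out.println(parse("<?xml version=\"1.0\"?><test>one &asdf; two!</test>", dtd));
    }
}
```

Note that the handler is registered directly on the XMLReader. When parsing goes through an XMLFilterImpl, the filter registers itself with the parent reader as the entity resolver, and XMLFilterImpl only implements the plain EntityResolver interface, so the parser would not see the EntityResolver2 methods.",
    "<!ENTITY asdf \"&#161;\">");
if (!out.equals("one \u00A1 two!")) throw new AssertionError(out);
</test>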