python – lxml does not add newlines or indentation to the tree after adding new elements

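The title is self-explanatory. Before marking this as a duplicate, please note that I have already checked this answer and it does not work for me: I don't get correct formatting even on sys.stdout, not just when writing to a file. I have the following XML (test.xml):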

<?xml version="1.0" encoding="UTF-8"?>
<soap:Envelope xmlns:soap="http://www...">
  <soap:Body>
    <SubmitTransaction xmlns="http://www.">
      <Authentication>
      </Authentication>
      <Transaction>
        <DataFields>
        </DataFields>
      </Transaction>
    </SubmitTransaction>
  </soap:Body>
</soap:Envelope>

And the following code:

from lxml import etree

parser = etree.XMLParser(remove_blank_text=True)
tree = etree.parse("test.xml", parser)

def get_data_fields():
    for node in tree.iter():
        if 'DataFields' in node.tag:
            return node
a = get_data_fields()
field = etree.Element('Field_1')
child_1 = etree.Element('FieldName')
child_2 = etree.Element('FieldValue')
child_3 = etree.Element('FieldIndex')
child_1.text = 'dateTime'
child_2.text = '2016-07-29T12:00:00'
child_3.text = '1'

for i in [child_1, child_2, child_3]:
    field.append(i)
a.append(field)

s = etree.tostring(tree, pretty_print=True)
print(s.decode('utf-8'))

OUTPUT

<soap:Envelope xmlns:soap="http://www...">
  <soap:Body>
    <SubmitTransaction xmlns="http://www.">
      <Authentication>
      </Authentication>
      <Transaction>
        <DataFields>
        <Field_1><FieldName>dateTime</FieldName><FieldValue>2016-07-29T12:00:00</FieldValue><FieldIndex>1</FieldIndex></Field_1></DataFields>
      </Transaction>
    </SubmitTransaction>
  </soap:Body>
</soap:Envelope>

EXPECTED

<soap:Envelope xmlns:soap="http://www...">
  <soap:Body>
    <SubmitTransaction xmlns="http://www.">
      <Authentication>
      </Authentication>
      <Transaction>
        <DataFields>
          <Field_1>
            <FieldName>dateTime</FieldName>
            <FieldValue>2016-07-29T12:00:00</FieldValue>
            <FieldIndex>1</FieldIndex>
          </Field_1>
        </DataFields>
      </Transaction>
    </SubmitTransaction>
  </soap:Body>
</soap:Envelope>

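I really don't understand why the new field I'm adding isn't formatted the way it should be, because if I print just the field, everything looks fine: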

s = etree.tostring(field, pretty_print=True)
print(s.decode('utf-8'))

#<Field_1 xmlns="http://www." xmlns:soap="http://www...">
#  <FieldName>dateTime</FieldName>
#  <FieldValue>2016-07-29T12:00:00</FieldValue>
#  <FieldIndex>1</FieldIndex>
#</Field_1>

Note: I'm using Python 3.4 (that's why I have to call .decode('utf-8'); otherwise I just get a bytes literal).
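As a side note on the .decode('utf-8') step: lxml's etree.tostring() returns bytes by default, but passing encoding='unicode' makes it return a str directly. A minimal sketch, with a throwaway element used purely for illustration:

```python
from lxml import etree

# Throwaway element, only to compare the two serialization modes.
root = etree.Element('Field_1')
etree.SubElement(root, 'FieldName').text = 'dateTime'

# Default: bytes, which is why the question needs .decode('utf-8').
as_bytes = etree.tostring(root, pretty_print=True)

# encoding='unicode' returns a Python str, so no decode step is needed.
as_text = etree.tostring(root, pretty_print=True, encoding='unicode')

print(type(as_bytes).__name__)
print(type(as_text).__name__)
print(as_text)
```

This does not change the indentation problem itself; it only removes the manual decode.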

Best answer: It works if you add this line after a = get_data_fields():

a.text = None
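Put together with the question's setup, the fix looks roughly like the sketch below. Assumptions: test.xml is inlined as a bytes literal, the elided namespace URIs are replaced by placeholder http://example.com/... URIs, and the DataFields element (called data_fields here instead of a) is located with find() on its namespaced tag rather than with the question's get_data_fields() loop.

```python
from lxml import etree

# test.xml from the question, inlined. The namespace URIs are placeholders,
# since the real ones are elided in the question.
XML = b"""<?xml version="1.0" encoding="UTF-8"?>
<soap:Envelope xmlns:soap="http://example.com/soap">
  <soap:Body>
    <SubmitTransaction xmlns="http://example.com/tx">
      <Authentication>
      </Authentication>
      <Transaction>
        <DataFields>
        </DataFields>
      </Transaction>
    </SubmitTransaction>
  </soap:Body>
</soap:Envelope>"""

parser = etree.XMLParser(remove_blank_text=True)
root = etree.fromstring(XML, parser)

# DataFields lives in the default namespace of SubmitTransaction.
data_fields = root.find('.//{http://example.com/tx}DataFields')

# The parser kept the whitespace inside the empty <DataFields> as its text.
# Clearing it means the element holds only child elements, so pretty_print
# is free to indent inside it.
data_fields.text = None

# Build Field_1 and its children, as in the question.
field = etree.SubElement(data_fields, 'Field_1')
for tag, value in [('FieldName', 'dateTime'),
                   ('FieldValue', '2016-07-29T12:00:00'),
                   ('FieldIndex', '1')]:
    etree.SubElement(field, tag).text = value

out = etree.tostring(root, pretty_print=True, encoding='unicode')
print(out)
```

<Authentication> still keeps its original blank interior, which matches the expected output in the question.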

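lxml cannot always tell which whitespace is ignorable, so in some cases you have to remove it manually. Here, the parser keeps the whitespace between <DataFields> and </DataFields> as that element's text (the output above shows the same for <Authentication>). Once Field_1 is appended, <DataFields> therefore contains mixed content, and pretty_print leaves that whole subtree unindented.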

http://lxml.de/FAQ.html#why-doesn-t-the-pretty-print-option-reformat-my-xml-output

If you want to be sure all blank text is removed from an XML document (or just more blank text than the parser does by itself), you have to use either a DTD to tell the parser which whitespace it can safely ignore, or remove the ignorable whitespace manually after parsing, e.g. by setting all tail text to None:
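Along the lines of the FAQ's advice, a minimal sketch of a manual cleanup pass is shown below. It is not the FAQ's own code: the helper name strip_blank_text is hypothetical, and besides tails it also clears whitespace-only .text, which is what matters in this question. It assumes no whitespace in the document is significant (for example, nothing relies on xml:space="preserve").

```python
from lxml import etree

def strip_blank_text(root):
    """Hypothetical helper: drop whitespace-only .text and .tail everywhere.

    Only safe if no whitespace in the document is meaningful.
    """
    for el in root.iter():
        if el.text is not None and not el.text.strip():
            el.text = None
        if el.tail is not None and not el.tail.strip():
            el.tail = None
    return root

# Parsed with the default parser, so all whitespace is initially kept.
doc = etree.fromstring('<a>\n  <b>\n  </b>\n  <c>x</c>\n</a>')
strip_blank_text(doc)

# Appending into the formerly "blank" <b> now pretty-prints correctly.
etree.SubElement(doc.find('b'), 'd').text = 'y'
out = etree.tostring(doc, pretty_print=True, encoding='unicode')
print(out)
```

Running the cleanup right after parsing, before adding new elements, gives pretty_print a tree with no stray text nodes to preserve.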
