unicode - Hebrew characters turn into gibberish when processed by HTML Tidy

August 11, 2023, 239 views

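I am using HTML Tidy Online (http://infohound.net/tidy/) to clean up some very quirky, messy HTML files that contain Hebrew characters. Whenever Tidy processes a page, the Hebrew characters in the output turn into gibberish, even after I change the encoding method in the settings. With different settings, I managed to get the same output but with the Hebrew characters written as Unicode entities.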

I Googled for possible solutions but did not find any.

I have a couple of ideas, but I am not sure how to go about them, if at all (maybe someone has a better solution).

I thought maybe I could (after processing the page) scan the page for unicode entities and replace them with the corresponding Hebrew characters (in a systematic way, of course).
Maybe I could take the HTML Tidy source code and modify it to output Hebrew characters appropriately. The problem with this is that I doubt I am knowledgeable enough to even get started on something like this.
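
The first idea can be sketched in a few lines of Python. This is only a minimal sketch, assuming the Tidy output uses numeric character references (such as `&#1513;` or `&#x5E9;`) and that only code points in the Unicode Hebrew block (U+0590 to U+05FF) should be converted. The function name `decode_hebrew_entities` is hypothetical and is not part of HTML Tidy.

```python
import re

# Matches decimal (&#1513;) and hexadecimal (&#x5E9;) numeric character references.
NUMERIC_ENTITY = re.compile(r"&#(?:[xX]([0-9A-Fa-f]+)|([0-9]+));")

# Bounds of the Unicode Hebrew block; assumed to be the only range worth decoding.
HEBREW_FIRST, HEBREW_LAST = 0x0590, 0x05FF


def decode_hebrew_entities(html_text: str) -> str:
    """Replace numeric entities for Hebrew code points with the characters themselves.

    Hypothetical helper: every other entity (for example &amp; or &#60;) is left
    untouched so the markup stays valid.
    """
    def replace(match: re.Match) -> str:
        hex_digits, dec_digits = match.groups()
        code_point = int(hex_digits, 16) if hex_digits else int(dec_digits)
        if HEBREW_FIRST <= code_point <= HEBREW_LAST:
            return chr(code_point)
        return match.group(0)  # not Hebrew: keep the entity as written

    return NUMERIC_ENTITY.sub(replace, html_text)


sample = "<p>&#1513;&#1500;&#1493;&#1501; &amp; &#60;b&#62;</p>"
print(decode_hebrew_entities(sample))
```

A blanket call to `html.unescape` would also turn `&lt;` and `&amp;` into literal characters and could break the markup, which is why this sketch filters by code point. The result has to be saved as UTF-8 (or another encoding that can represent Hebrew), with a matching charset declaration in the page.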

Best answer: I ran into a similar problem with a UTF-8 document containing Unicode characters: HTML Tidy converted them to HTML entities. Putting this in HTMLTIDY.CFG fixed it:

char-encoding: utf8
input-encoding: utf8
output-encoding: utf8
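
To see why declaring the encoding matters, here is a small Python sketch of the gibberish effect. It assumes the garbling came from UTF-8 bytes being read as a single-byte encoding such as Latin-1. That is an assumption for illustration only, since the question does not say which encoding the tool actually assumed.

```python
# A short Hebrew word, stored as UTF-8 (two bytes per letter).
original = "\u05e9\u05dc\u05d5\u05dd"
utf8_bytes = original.encode("utf-8")

# Reading those bytes with the wrong (single-byte) encoding yields one
# unrelated character per byte: this is the "gibberish".
misread = utf8_bytes.decode("latin-1")

# Reading them with the correct encoding restores the text.
roundtrip = utf8_bytes.decode("utf-8")

print(len(original), len(misread))  # 4 letters become 8 wrong characters
print(roundtrip == original)        # True
```

Setting the input and output encodings to the same value that the file actually uses avoids this kind of mismatch.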

Hope that helps.