Encoding issues in Python

February 27, 2024. Source: 不卷毛

1. Definitions

First, two terms that are used throughout:
1. The file's own encoding: you can check it with the file command. For example:

file test.py
test.py: UTF-8 Unicode Java program text

Here the file's own encoding is UTF-8.

2. The source-code encoding: the encoding declared in a comment at the top of the file. For example:

# -*- encoding: utf-8 -*-
import sys
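
Both encodings can also be inspected from Python itself. The following is a minimal sketch, assuming Python 3 (tokenize.detect_encoding is not available in Python 2). The helper name describe_source and the sample bytes are made up for illustration.

```python
import io
import tokenize


def describe_source(data):
    """Return (is_valid_utf8, declared_encoding) for raw source bytes."""
    # "File encoding": check whether the bytes decode cleanly as UTF-8.
    try:
        data.decode("utf-8")
        is_valid_utf8 = True
    except UnicodeDecodeError:
        is_valid_utf8 = False
    # "Code encoding": read the declaration from the first lines of the source.
    declared, _ = tokenize.detect_encoding(io.BytesIO(data).readline)
    return is_valid_utf8, declared


utf8_sample = b"# -*- encoding: utf-8 -*-\nimport sys\n"
gb_sample = u'# -*- encoding: gb18030 -*-\ns = "可口可乐"\n'.encode("gb18030")

print(describe_source(utf8_sample))
print(describe_source(gb_sample))
```

The first element only says whether the bytes happen to be valid UTF-8; like the file command, it is a guess about the file, not a guarantee.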

2. How to set the encoding

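Since there are two encodings, they can either match or differ. When they match, for example both gb18030 or both utf-8, there is no problem. What happens when they differ? You get an encoding error, as in the following example.
File encoding utf-8, code encoding gb18030: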

# -*- encoding: gb18030 -*-

str_alphabeta = "ABCDEFG"
print type(str_alphabeta)
print str_alphabeta

str_kanji = "可口可乐"
print type(str_kanji)
print str_kanji

Output:

File "test.py", line 1
SyntaxError: encoding problem: with BOM

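This introduces a new term, BOM (byte order mark), which is worth looking up. If you see <feff> in vim, that is the BOM as well. For UTF-8 files I personally find the BOM-less form easier to deal with.
So for the script to run normally, the file's encoding and the declared code encoding need to agree.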
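
The conflict can be reproduced without writing a file to disk. This is a sketch, again assuming Python 3: tokenize.detect_encoding raises SyntaxError when a UTF-8 BOM is followed by a non-UTF-8 declaration, although the message wording differs from the Python 2 output shown above. The function names has_utf8_bom and strip_utf8_bom are hypothetical helpers, not part of any library.

```python
import codecs
import io
import tokenize


def has_utf8_bom(data):
    """True if the bytes start with the UTF-8 byte order mark (EF BB BF)."""
    return data.startswith(codecs.BOM_UTF8)


def strip_utf8_bom(data):
    """Return the bytes without a leading UTF-8 BOM, if there is one."""
    return data[len(codecs.BOM_UTF8):] if has_utf8_bom(data) else data


# A UTF-8 BOM followed by a gb18030 declaration, as in the example above.
source = codecs.BOM_UTF8 + b"# -*- encoding: gb18030 -*-\nx = 1\n"

try:
    tokenize.detect_encoding(io.BytesIO(source).readline)
    conflict = False
except SyntaxError:
    conflict = True

print(has_utf8_bom(source), conflict)

# Without the BOM, the same declaration is accepted.
cleaned = strip_utf8_bom(source)
encoding, _ = tokenize.detect_encoding(io.BytesIO(cleaned).readline)
print(encoding)
```

Stripping the BOM only removes the conflict with the declaration; the rest of the file still has to be saved in the encoding that the declaration names.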

3. unicode_literals

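Anything imported from __future__ is, as I understand it, still at a "trial" stage: use it if you want the newest behaviour, avoid it if you want stability (although I use division all the time myself).
The help text for unicode_literals reads: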

>>> help(unicode_literals)
Help on instance of _Feature in module __future__:

class _Feature
 |  Methods defined here:
 |  
 |  __init__(self, optionalRelease, mandatoryRelease, compiler_flag)
 |  
 |  __repr__(self)
 |  
 |  getMandatoryRelease(self)
 |      Return release in which this feature will become mandatory.
 |      
 |      This is a 5-tuple, of the same form as sys.version_info, or, if
 |      the feature was dropped, is None.
 |  
 |  getOptionalRelease(self)
 |      Return first release in which this feature was recognized.
 |      
 |      This is a 5-tuple, of the same form as sys.version_info.
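
The two methods listed in that help text can be called directly, which gives a more concrete picture than "trial". A small sketch that should run on both Python 2.6+ and Python 3; the variable names are illustrative.

```python
import __future__

feature = __future__.unicode_literals

# First release that accepted "from __future__ import unicode_literals".
optional = feature.getOptionalRelease()
# Release in which the behaviour became the default.
mandatory = feature.getMandatoryRelease()

print("optional since %d.%d" % optional[:2])
print("mandatory since %d.%d" % mandatory[:2])
```

In other words, the feature is opt-in on Python 2.6 and 2.7, and it is simply how string literals behave from Python 3.0 on.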

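Put simply: when the source encoding is something other than unicode(32), such as utf-8, a plain string literal is stored in the source encoding and the object's type is str. Prefixing the literal with u makes it a unicode(32) string, of type unicode. For example: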

# -*- encoding: utf-8 -*-

str_kanji = "可口可乐"
print type(str_kanji)
print str_kanji

str_kanji_unicode = u"可口可乐"
print type(str_kanji_unicode)
print str_kanji_unicode

Output:

<type 'str'>
可口可乐
<type 'unicode'>
可口可乐

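The first 可口可乐 is utf-8 encoded (you can check this against LC_CTYPE in your locale settings); the second is unicode(32).
With from __future__ import unicode_literals at the top of the file, the output becomes (code omitted):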

<type 'unicode'>
可口可乐
<type 'unicode'>
可口可乐
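
The printed text looks identical in both cases, so the difference between the byte string and the unicode string is easier to see in their lengths. A sketch that avoids Python 2-only syntax, so it should run on Python 2.7 and on Python 3 (where the two types are called bytes and str); the variable names are illustrative.

```python
# -*- coding: utf-8 -*-
text = u"可口可乐"                      # unicode string: 4 code points

utf8_bytes = text.encode("utf-8")      # 3 bytes per character here
gb_bytes = text.encode("gb18030")      # 2 bytes per character here

print("%d %d %d" % (len(text), len(utf8_bytes), len(gb_bytes)))

# Decoding with the matching codec restores the original text.
assert utf8_bytes.decode("utf-8") == text
assert gb_bytes.decode("gb18030") == text
```

This prints 4 12 8: the same four characters take a different number of bytes depending on the encoding, while the unicode string counts characters.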

    Original author: 不卷毛
    Original URL: https://segmentfault.com/a/1190000000346236