Encoding problems with Python and SQLite

May 7, 2024 · Source: sqlite

I downloaded the Python bindings for SQLite from http://initd.org/tracker/pysqlite/wiki/pysqlite and used sqlite3.exe on Windows to create a database with a single table:

+++++++++++++++++++
-database: wanna
-table name: hello
-id  name
---  ---------
0    帅哥
1    wannachan
2    dick.chan
3    雯雯
+++++++++++++++++++

Good, the table is created! Now to work with this database through pysqlite. I was pretty excited.

First, open a connection:

>>> from pysqlite2 import dbapi2 as sqlite
>>> con=sqlite.connect(r"g:\sqlite\wanna")

Then get a cursor:

>>> cur=con.cursor()

The exciting moment has arrived! Run the SQL:

----------------------------------------------
>>> cur.execute('select * from hello')
Traceback (most recent call last):
  File "<pyshell#24>", line 1, in <module>
    cur.execute('select * from hello')
OperationalError: Could not decode to UTF-8 column 'name' with text '帅哥'
----------------------------------------------
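To reproduce this failure without sqlite3.exe or a Windows console, here is a minimal sketch. It assumes Python 3's standard-library sqlite3 module (the descendant of pysqlite) rather than the pysqlite2 package used in this post, and an in-memory database instead of g:\sqlite\wanna. The helper names make_db and try_default_read are made up for illustration. The four bytes are the ones that appear later in this session, and CAST(? AS TEXT) is my way of storing them as TEXT without re-encoding, which is presumably close to what happened when the row was typed into a cp936 console.

```python
import sqlite3

# cp936 (GBK) bytes of the name in row 0, copied from the session in this post.
GBK_NAME = b"\xb3\xc2\xd6\xf9"


def make_db():
    """Build an in-memory stand-in for the `wanna` database (hypothetical helper)."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE hello (id INTEGER, name TEXT)")
    # CAST(? AS TEXT) keeps the raw bytes but labels them as TEXT,
    # so the column now holds text that is not valid UTF-8.
    con.execute("INSERT INTO hello VALUES (0, CAST(? AS TEXT))", (GBK_NAME,))
    con.execute("INSERT INTO hello VALUES (1, 'wannachan')")
    return con


def try_default_read(con):
    """Read with the default text_factory; return the error raised, or None."""
    try:
        # Depending on the Python version the error surfaces in execute()
        # or in fetchall(), so both calls sit inside the try block.
        con.execute("SELECT * FROM hello").fetchall()
    except sqlite3.OperationalError as exc:
        return exc
    return None


error = try_default_read(make_db())
print(type(error).__name__, error)
```

The point of the sketch is only that the default text handling insists on UTF-8; the table itself is fine.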

Oh no! It says my '帅哥' cannot be decoded as UTF-8! What now?!

I searched online and saw someone using con=sqlite.connect("database",encoding='cp936')

So I tried that too. The result:

----------------------------------------------------------------
>>> con=sqlite.connect(r"g:\sqlite\wanna",encoding='cp936')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: 'encoding' is an invalid keyword argument for this function
----------------------------------------------------------------

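It seems only older versions had the encoding parameter, and my version 2.5 does not! Looks like there is nothing for it but to read the manual. So, very reluctantly, I opened the manual and started reading. It turns out you have to set text_factory!

I copied one of its examples to try:

>>> con.text_factory=lambda x: unicode(x, "utf-8", "ignore")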

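This decodes each record as UTF-8 and ignores any bytes that are not valid UTF-8. I knew perfectly well that my data was encoded as GBK, but I wanted to see what would go wrong, so I carried on: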

----------------------------------------------
>>> cur=con.cursor()
>>> cur.execute('select * from hello')
<pysqlite2.dbapi2.Cursor object at 0x012984A0>
>>> rs=cur.fetchall()
>>> rs[0]
(0, u'')
>>> rs[1]
(1, u'wannachan')
>>> rs[2]
(2, u'dick.chan')
>>> rs[3]
(3, u'')
----------------------------------------------
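The same experiment can be sketched in Python 3, assuming the standard-library sqlite3 module and an in-memory table in place of the real file. There text_factory receives a bytes object, so the Python 2 call unicode(x, "utf-8", "ignore") becomes raw.decode("utf-8", "ignore"). The CAST(? AS TEXT) insert is an assumption of mine for getting GBK bytes into a TEXT column.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE hello (id INTEGER, name TEXT)")
# Row 0 holds the cp936 bytes from this post, stored as TEXT without re-encoding.
con.execute("INSERT INTO hello VALUES (0, CAST(? AS TEXT))", (b"\xb3\xc2\xd6\xf9",))
con.execute("INSERT INTO hello VALUES (1, 'wannachan')")

# Decode as UTF-8 and silently drop every byte that does not fit.
con.text_factory = lambda raw: raw.decode("utf-8", "ignore")

rows_ignored = con.execute("SELECT * FROM hello ORDER BY id").fetchall()
print(rows_ignored)
```

Note that "ignore" hides the problem rather than solving it: the query succeeds, but the Chinese name is gone.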

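As you can see, both records that contain Chinese came back as u'', because their bytes did not match the encoding and were dropped. At this point I saw the light! Now let's try decoding with GBK (cp936):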

----------------------------------------------
>>> con.text_factory=lambda x: unicode(x, "cp936", "ignore")
>>> cur=con.cursor()
>>> cur.execute('select * from hello')
<pysqlite2.dbapi2.Cursor object at 0x012984D0>
>>> rs=cur.fetchall()
>>> rs[0]
(0, u'\u9648\u67f1')
>>> u'\u9648\u67f1'
u'\u9648\u67f1'
>>> print rs[0][1]
帅哥
----------------------------------------------
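A Python 3 counterpart of this step might look like the following. It is a sketch under the same assumptions as before (standard-library sqlite3, in-memory table, bytes inserted through CAST(? AS TEXT)), not the code the original session ran.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE hello (id INTEGER, name TEXT)")
con.execute("INSERT INTO hello VALUES (0, CAST(? AS TEXT))", (b"\xb3\xc2\xd6\xf9",))

# Tell sqlite3 that TEXT columns in this database are cp936 (GBK), not UTF-8.
con.text_factory = lambda raw: raw.decode("cp936")

name = con.execute("SELECT name FROM hello WHERE id = 0").fetchone()[0]
print(ascii(name))  # shows the code points, like the u'...' repr in Python 2
```

Leaving out "ignore" here is deliberate: if a row is not really cp936, a loud error is usually more useful than a quietly truncated string.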

Ho ho! A big step forward! Then I wondered: what happens if I specify UTF-8 without ignore?

No sooner said than done:

----------------------------------------------
>>> con.text_factory=lambda x: unicode(x, "utf-8")
>>> cur=con.cursor()
>>> cur.execute('select * from hello')
Traceback (most recent call last):
  File "<pyshell#21>", line 1, in <module>
    cur.execute('select * from hello')
  File "<pyshell#19>", line 1, in <lambda>
    con.text_factory=lambda x: unicode(x, "utf-8")
UnicodeDecodeError: 'utf8' codec can't decode byte 0xb3 in position 0: unexpected code byte
----------------------------------------------
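For comparison, a strict UTF-8 text_factory can be sketched in Python 3 like this (standard-library sqlite3 and an in-memory table are assumptions; the bytes come from this post). The exception raised inside the factory is expected to propagate out of the query unchanged, as it does in the session above.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE hello (id INTEGER, name TEXT)")
con.execute("INSERT INTO hello VALUES (0, CAST(? AS TEXT))", (b"\xb3\xc2\xd6\xf9",))

# bytes.decode() is strict by default: any invalid byte raises an error.
con.text_factory = lambda raw: raw.decode("utf-8")

caught = None
try:
    con.execute("SELECT name FROM hello").fetchall()
except UnicodeDecodeError as exc:
    caught = exc

print(repr(caught))
```

The exception object carries the offending position in caught.start, which helps when tracking down which record is in the wrong encoding.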

An error! So a mismatched encoding really does not work. But what if my records are in several different encodings? I tried the Unicode option that appears in the manual:

----------------------------------------------
>>> con.text_factory = sqlite.OptimizedUnicode
>>> cur=con.cursor()
>>> cur.execute('select * from hello')
Traceback (most recent call last):
  File "<pyshell#24>", line 1, in <module>
    cur.execute('select * from hello')
OperationalError: Could not decode to UTF-8 column 'name' with text '帅哥'
>>>
----------------------------------------------

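Damn, OptimizedUnicode decodes as UTF-8 too. Isn't that just bullying? Everyone knows UTF-8 is a space hog for Chinese text (three bytes for a common character where GBK needs two)!

So I read the manual more carefully and finally found an option that returns text as a byte string.

I figured that since this does not involve any particular encoding, it should not raise an error even if the records use different charsets.

Let's try: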

----------------------------------------------
>>> from pysqlite2 import dbapi2 as sqlite
>>> con=sqlite.connect(r"g:\sqlite\wanna")
>>> con.text_factory=str  # str means: return text as a byte string
>>> cur=con.cursor()
>>> cur.execute('select * from hello')
<pysqlite2.dbapi2.Cursor object at 0x00BDBB30>
>>> rs=cur.fetchall()
>>> rs[0][1]
'\xb3\xc2\xd6\xf9'
>>> print rs[0][1]
帅哥
----------------------------------------------

Ha ha! Success! Once I have the byte string, I can decode it with whatever encoding I specify and get the correct output.

Take the byte string '\xb3\xc2\xd6\xf9' from above and try decoding it:

>>> '\xb3\xc2\xd6\xf9'.decode('cp936')
u'\u9648\u67f1'

Sure enough, I got a unicode string. Now let me check whether it equals the string we expect:

>>> _ == u'帅哥'
True
>>> u'帅哥'
u'\u9648\u67f1'

Clearly the same string! OK, problem solved!
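The byte-string approach also suggests one way to handle the "several encodings in one table" question raised earlier. The sketch below assumes Python 3's standard-library sqlite3, where the byte-string factory is bytes (str there means decoded text). The helper decode_with_fallback and its list of candidate encodings are hypothetical, and trying encodings in order is only a heuristic: UTF-8 goes first because it rejects most non-UTF-8 input, whereas cp936 will happily decode many byte sequences that were never GBK.

```python
import sqlite3


def decode_with_fallback(raw, encodings=("utf-8", "cp936")):
    """Try each candidate encoding in order (hypothetical helper, not part of sqlite3)."""
    for encoding in encodings:
        try:
            return raw.decode(encoding)
        except UnicodeDecodeError:
            continue
    # Nothing matched: return something printable and make the damage visible.
    return raw.decode("utf-8", "replace")


con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE hello (id INTEGER, name TEXT)")
# Row 0: cp936 bytes from this post, stored as TEXT without re-encoding.
con.execute("INSERT INTO hello VALUES (0, CAST(? AS TEXT))", (b"\xb3\xc2\xd6\xf9",))
# Row 1: an ordinary Python string, which sqlite3 stores as UTF-8.
con.execute("INSERT INTO hello VALUES (1, ?)", ("雯雯",))

con.text_factory = bytes  # hand back raw bytes, no decoding at all
raw_rows = con.execute("SELECT * FROM hello ORDER BY id").fetchall()

decoded = [(row_id, decode_with_fallback(name)) for row_id, name in raw_rows]
print(decoded)
```

Decoding after the fetch, instead of inside text_factory, keeps the raw bytes available in raw_rows in case the guess turns out to be wrong.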

Final thoughts:

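1. If, before setting text_factory=str, I had defined str as a variable of my own and so overridden its identity as <type 'str'>, would that cause an error?

2. I like Python precisely because its Chinese-text problems are always this easy to solve. You can convert between encodings however you like, with no ambiguity at all!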
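On the first question, a small experiment is easy to set up. The sketch below assumes Python 3's standard-library sqlite3 rather than pysqlite2, and shadows str only inside a function (read_with_shadowed_str is a made-up name) so nothing else is affected. As far as I can tell, text_factory just stores whatever object the name refers to and calls it when a TEXT value is read, so a non-callable value should fail at fetch time rather than at assignment.

```python
import sqlite3


def read_with_shadowed_str():
    """Shadow the name `str` locally, then hand it to text_factory."""
    str = "no longer the built-in type"  # deliberate shadowing, illustration only
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE hello (name TEXT)")
    con.execute("INSERT INTO hello VALUES ('wannachan')")
    con.text_factory = str  # this is now a plain string object, not a type
    try:
        return con.execute("SELECT name FROM hello").fetchall()
    except TypeError as exc:
        # Calling the string object as a factory is what fails.
        return exc


outcome = read_with_shadowed_str()
print(type(outcome).__name__, outcome)
```

So the answer appears to be yes: the name is looked up when the assignment runs, and a shadowed str breaks the read.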

from:http://www.cnblogs.com/changyou/archive/2010/01/09/1642980.html

    Original author: sqlite
    Original URL: https://www.cnblogs.com/dkblog/archive/2011/03/03/1980639.html