PostgreSQL DBA(115) - pgAdmin(Don't do this:Don't use varchar(n) by default)

no zuo no die系列,来自于pg的wiki。
这一节的内容是:不要使用Don’t use varchar(n) by default。
理由是:

varchar(n) is a variable width text field that will throw an error if you try and insert a string longer than n characters (not bytes) into it.
varchar (without the (n)) or text are similar, but without the length limit. If you insert the same string into the three field types they will take up exactly the same amount of space, and you won’t be able to measure any difference in performance.
If what you really need is a text field with an length limit then varchar(n) is great, but if you pick an arbitrary length and choose varchar(20) for a surname field you’re risking production errors in the future when Hubert Blaine Wolfe­schlegel­stein­hausen­berger­dorff signs up for your service.
Some databases don’t have a type that can hold arbitrary long text, or if they do it’s not as convenient or efficient or well-supported as varchar(n). Users from those databases will often use something like varchar(255) when what they really want is text.
If you need to constrain the value in a field you probably need something more specific than a maximum length – maybe a minimum length too, or a limited set of characters – and a check constraint can do all of those things as well as a maximum string length.

主要原因是缺乏灵活性,设置了长度限制,如超长长度则需要修改表结构,而不限定长度的varchar类型与text类似,不限制长度。如需要长度限制,则建议使用check等约束。

[local]:5432 pg12@testdb=# drop table if exists t_varchar;
DROP TABLE
Time: 3.763 ms
[local]:5432 pg12@testdb=# create table t_varchar(id int,c1 varchar(100),c2 varchar,c3 text);
CREATE TABLE
Time: 15.398 ms
[local]:5432 pg12@testdb=# 
[local]:5432 pg12@testdb=# insert into t_varchar values(1,'test','test','test');
INSERT 0 1
Time: 3.644 ms
[local]:5432 pg12@testdb=# insert into t_varchar values(2,'test',lpad('test',1024,'t'),lpad('test',1024,'t'));
INSERT 0 1
Time: 5.491 ms
[local]:5432 pg12@testdb=# 
[local]:5432 pg12@testdb=# select id,octet_length(c2),octet_length(c3) from t_varchar;
 id | octet_length | octet_length 
----+--------------+--------------
  1 |            4 |            4
  2 |         1024 |         1024
(2 rows)
Time: 5.754 ms
[local]:5432 pg12@testdb=#

要注意的是char类型不指定长度则默认为1

[local]:5432 pg12@testdb=# drop table if exists t_varcharandchar;
NOTICE:  table "t_varcharandchar" does not exist, skipping
DROP TABLE
Time: 0.754 ms
[local]:5432 pg12@testdb=# create table t_varcharandchar(id int,c1 varchar,c2 char);
CREATE TABLE
Time: 5.911 ms
[local]:5432 pg12@testdb=# insert into t_varcharandchar(id,c1) values(1,lpad('test',1024,'t'));
INSERT 0 1
Time: 2.469 ms
[local]:5432 pg12@testdb=# insert into t_varcharandchar(id,c2) values(2,lpad('test',1024,'t'));
ERROR:  value too long for type character(1)
Time: 0.579 ms
[local]:5432 pg12@testdb=#

三种类型的性能比较:varchar(n),varchar,text

[local]:5432 pg12@testdb=# drop table if exists t_varchar1;
NOTICE:  table "t_varchar1" does not exist, skipping
DROP TABLE
Time: 0.626 ms
[local]:5432 pg12@testdb=# create table t_varchar1(id int,c1 varchar(100));
CREATE TABLE
Time: 2.513 ms
[local]:5432 pg12@testdb=# insert into t_varchar1 select x,x||'c1' from generate_series(1,10000000) as x;
INSERT 0 10000000
Time: 48618.215 ms (00:48.618)
[local]:5432 pg12@testdb=# 
[local]:5432 pg12@testdb=# drop table if exists t_varchar2;
NOTICE:  table "t_varchar2" does not exist, skipping
DROP TABLE
Time: 1.570 ms
[local]:5432 pg12@testdb=# create table t_varchar2(id int,c1 varchar);
CREATE TABLE
Time: 12.443 ms
[local]:5432 pg12@testdb=# insert into t_varchar2 select x,x||'c1' from generate_series(1,10000000) as x;
INSERT 0 10000000
Time: 43483.235 ms (00:43.483)
[local]:5432 pg12@testdb=# 
[local]:5432 pg12@testdb=# drop table if exists t_varchar3;
NOTICE:  table "t_varchar3" does not exist, skipping
DROP TABLE
Time: 0.469 ms
[local]:5432 pg12@testdb=# create table t_varchar3(id int,c1 text);
CREATE TABLE
Time: 3.315 ms
[local]:5432 pg12@testdb=# insert into t_varchar3 select x,x||'c1' from generate_series(1,10000000) as x;
INSERT 0 10000000
Time: 55569.841 ms (00:55.570)
[local]:5432 pg12@testdb=#

虽然看起来有些性能差异,但实际上在PG内部,这三种类型都使用相同的C数据结构:varlena。


/* ----------------
 *        Variable-length datatypes all share the 'struct varlena' header.
 *
 * NOTE: for TOASTable types, this is an oversimplification, since the value
 * may be compressed or moved out-of-line.  However datatype-specific routines
 * are mostly content to deal with de-TOASTed values only, and of course
 * client-side routines should never see a TOASTed value.  But even in a
 * de-TOASTed value, beware of touching vl_len_ directly, as its
 * representation is no longer convenient.  It's recommended that code always
 * use macros VARDATA_ANY, VARSIZE_ANY, VARSIZE_ANY_EXHDR, VARDATA, VARSIZE,
 * and SET_VARSIZE instead of relying on direct mentions of the struct fields.
 * See postgres.h for details of the TOASTed form.
 * ----------------
 */
struct varlena
{
    char        vl_len_[4];        /* Do not touch this field directly! */
    char        vl_dat[FLEXIBLE_ARRAY_MEMBER];    /* Data content is here */
};

参考资料
Don’t Do This
CHAR(x) vs. VARCHAR(x) vs. VARCHAR vs. TEXT

    原文作者:husthxd
    原文地址: http://blog.itpub.net/6906/viewspace-2661480/
    本文转自网络文章,转载此文章仅为分享知识,如有侵权,请联系博主进行删除。
点赞