PG字符:使用collation设置排序规则

原创
2014/03/28 14:37
阅读数 1.5W
AI总结

  相关文档:

  PG9.1 
  Collation支持 http://www.postgresql.org/docs/9.1/static/collation.html
  字符集的设置 http://www.postgresql.org/docs/9.1/static/multibyte.html
  PG9.2 
  Collation支持 http://www.postgresql.org/docs/current/static/collation.html
  字符集的设置 http://www.postgresql.org/docs/current/static/multibyte.html
  
  简介:PG9.1开始支持collationcollation允许指定每列数据的排序顺序和字符分类,甚至每个操作的。
  它缓解了数据库的LC_COLLATELC_CTYPE设置无法在数据库创建后更改的限制。
  
  使用示例:
  
  1.排序顺序的设置:
  a.初始化数据库是可以设置locale,如果不设置默认参照操作系统的locale
  [postgres@localhost  bin]$   ./initdb -D ../data5 --encoding=UTF8 --locale=zh_CN.UTF8
  postgres=# \l
  List of databases
  Name | Owner | Encoding | Collate | Ctype | Access privileges 
  -----------+----------+----------+-------------+-------------+-----------------------
  postgres | postgres | UTF8 | zh_CN.utf8 | zh_CN.utf8 | 
  template0| postgres | UTF8 | zh_CN.utf8 | zh_CN.utf8 | =c/postgres 
  +
  | | | | | postgres=CTc/postgres
  template1| postgres | UTF8 | zh_CN.utf8 | zh_CN.utf8 | =c/postgres 
  +
  | | | | | postgres=CTc/postgres
  (5 rows)
  
  b.创建数据库时,locale参数的设置必须是数据库集群初始化时设置的locale(也就是templatelocale)兼容的:
  postgres=# create database testdb with ENCODING='utf8'   LC_COLLATE='zh_CN.utf8' LC_CTYPE='zh_CN.utf8'
  template=template0; 
  CREATE DATABASE
  c.创建表时,引用的collation必须是所在数据库的编码encoding兼容的。
  CREATE TABLE test1 (
  a text COLLATE "de_DE",
  b text COLLATE "es_ES"
  );
  
  2.定义新的collation
  语法:
  CREATE COLLATION name (
  [ LOCALE = locale, ]
  [ LC_COLLATE = lc_collate, ]
  [ LC_CTYPE = lc_ctype ]
  )
  CREATE COLLATION name FROM existing_collation
  注意:创建collation时:若设置参数LOCALE的值,则表示LC_COLLATELC_CTYPE都设置为LOCALE所设置的值,
  此时不能再设置LC_COLLATELC_CTYPE;不设置LOCALE值的时候才能设置LC_COLLATELC_CTYPE的值,
  一般二者设置为相同的值,也可以设置为不同,但一般在实践中是无意义的。
  例子:
  postgres=# create collation ddd (locale='de_DE.utf8');
  CREATE COLLATION
  postgres=# select a<b collate "ddd" from tt;
  ?column? 
  ----------
  f
  t
  f
  f
  f
  t
  f
  (7 rows)
  postgres=# create collation ddd2 from ddd;
  CREATE COLLATION
  
  3.排序顺序的使用:
  一个表达式的collation derivation可以是隐式的或者显式的。这个区别在多个不同collation出现在一个表达式中
  时影响collation的组合。显式的collation   derivation出现在使用COLLATE语句的时候;所有其他的collation   derivations
  都是隐式的。当需要组合多个collation时,例如在一个函数调用中,通常有以下规则:
  aIf any input expression has an explicit collation derivation, then   all explicitly derived collations among the 
  input expressions must be the same, otherwise an error is raised. If any   explicitly derived collation is present, 
  that is the result of the collation combination.
  b. Otherwise, all input expressions must have the same implicit collation   derivation or the default collation. 
  If any non-default collation is present, that is the result of the collation   combination. Otherwise, the result is
  the default collation.
  c. If there are conflicting non-default implicit collations among the input   expressions, then the combination is
  deemed to have indeterminate collation. This is not an error condition unless   the particular function being 
  invoked requires knowledge of the collation it should apply. If it does, an   error will be raised at run-time.
  
  以表test1的操作为例:
  postgres=# \d test1
  Table "public.test1"
  Column | Type | Modifiers 
  --------+------+---------------
  a | text | collate de_DE
  b | text | collate es_ES
  ab列有冲突的隐式collation。必须为任何一个输入表达式附加一个显式的collation说明。如下:
  postgres=# select a<b from test1;
  
  ERROR: could not determine which collation to use for string comparison
  HINT: Use the COLLATE clause to set the collation explicitly.
  STATEMENT: select a<b from test1;
  ERROR: could not determine which collation to use for string comparison
  HINT: Use the COLLATE clause to set the collation explicitly.
  
  解决:SELECT a < b COLLATE "de_DE" FROM test1; 
  或等效的 SELECT a COLLATE "de_DE" < b FROM test1;
  
  ab列组合时,隐式的collation不一致,则需要显式地为组合指定一个collation。如下:
  
  postgres=# select * from test1 order by a||b;
  
  ERROR: collation mismatch between implicit collations "de_DE" and   "es_ES" at character 30
  HINT: You can choose the collation by applying the COLLATE clause to one or   both expressions.
  STATEMENT: select * from test1 order by a||b;
  ERROR: collation mismatch between implicit collations "de_DE" and   "es_ES"
  LINE 1: select * from tt order by a||b;
  ^
  HINT: You can choose the collation by applying the COLLATE clause to one or   both expressions.
  
  解决:SELECT * FROM test1 ORDER BY a || b COLLATE "fr_FR";
  
  PostgreSQL认为不同的collation对象是不兼容的,即使他们有相同的属性。所以,比如:
  postgres=# select a collate "C"<b COLLATE "POSIX" from   test1;
  
  ERROR: collation mismatch between explicit collations "C" and   "POSIX" at character 24
  STATEMENT: select a collate "C"<b COLLATE "POSIX" from   test1;
  ERROR: collation mismatch between explicit collations "C" and   "POSIX"
  LINE 1: select a collate "C"<b COLLATE "POSIX" from   test1;
  tip:默认情况下,不管数据库编码是什么都可以使用CPOSIXcollation
  ④当排序顺序不符合自身语言环境的习惯时,可以对排序字段排序顺序的转化来调整。以下是一个
  中文排序习惯的转化:
  以下排序不符合中文发音排序的习惯:
  postgres=# select * from t order by cont;
  id | name | cont 
  ----+------+--------------
  10 | kkk | 从此从此从此
  10 | kkk | 可可己
  10 | kkk | 可可己
  10 | kkk | 可可己
  10 | kkk | 并不并不
  10 | kkk | 阿阿阿阿阿阿
  (6 rows)
  
  解决:
  postgres=# select * from t order by convert_to(cont,'GBK');
  id | name | cont 
  ----+------+--------------
  10 | kkk | 阿阿阿阿阿阿
  10 | kkk | 并不并不
  10 | kkk | 从此从此从此
  10 | kkk | 可可己
  10 | kkk | 可可己
  10 | kkk | 可可己
  (6 rows)
  
  4.创建索引时也可以由COLLATE语句显式指定collation
  postgres=# create index tiindex on t1(b collate "de_DE" DESC); 
  CREATE INDEX
  
  更多详细信息,请查阅PG社区相关文档专区http://www.postgresql.org/docs/

 初次发表网址:   http://bbs.pgsqldb.com:8079/client/post_show.php?zt_auto_bh=56943 


展开阅读全文
加载中
点击引领话题📣 发布并加入讨论🔥
0 评论
3 收藏
0
分享
AI总结
返回顶部
顶部