Channel: PostgreSQL research

pgpool-II supports the streaming replication protocol

Verify whether pgpool-II supports the streaming replication protocol, and record the result.
If it does, cross-datacenter disaster recovery can run streaming replication directly through the pgpool port.

 51356 digoal  20   0  115m 1412 1084 R 99.3  0.0   2:16.12 pg_basebackup -h 127.0.0.1 -p 9999 -U postgres -D /disk1/digoal/test                                                                                                         
 33031 digoal  20   0 34196 1804 1260 R 55.6  0.0   1:17.12 pgpool: postgres replication 127.0.0.1(54446) BASE_BACKUP                                                                                                                      
 51357 digoal  20   0 32.9g 8444 7656 S 53.9  0.0   1:10.94 postgres: wal sender process postgres 127.0.0.1(60636) sending backup "pg_basebackup base backup" 

Throughput can reach 500 MB/s.

Some hard limits in Postgres-XL

(Still collecting; to be updated.)
Number of nodes:
src/include/pgxc/nodemgr.h

/* Compile time max limits on number of coordinators and datanodes */
#define MAX_COORDINATORS                64
#define MAX_DATANODES                   256

How PostgreSQL improves the availability of LDAP or AD domain authentication


For how to configure AD domain or LDAP authentication in PostgreSQL, see:
http://blog.163.com/digoal@126/blog/static/16387704020145914717111/
http://blog.163.com/digoal@126/blog/static/1638770402014563264469/

Introducing LDAP or AD authentication can add a point of failure: when the authentication server goes down, authentication fails.
This article describes how PostgreSQL addresses this problem, along with an analysis of some of the code.

When AD domain or LDAP authentication is chosen, it can be configured with a single server or with multiple servers; the multi-server setup exists mainly to avoid a single point of failure in the LDAP/AD authentication servers.
Single-server mode:

# simple bind :   
host all new 0.0.0.0/0 ldap ldapserver=172.16.3.150 ldapport=389 ldapprefix="uid=" ldapsuffix=",ou=People,dc=my-domain,dc=com"  
# search bind :   
host all new 0.0.0.0/0 ldap ldapserver=172.16.3.150 ldapport=389 ldapsearchattribute="uid" ldapbasedn="ou=People,dc=my-domain,dc=com"  

Multi-server mode: separate the servers with spaces. A port can be set inside ldapserver, overriding the ldapport setting.

# simple bind :   
host all new 0.0.0.0/0 ldap ldapserver="172.16.3.150 172.16.3.151:388 10.1.1.1" ldapport=389 ldapprefix="uid=" ldapsuffix=",ou=People,dc=my-domain,dc=com"  
# search bind :   
host all new 0.0.0.0/0 ldap ldapserver="172.16.3.150 172.16.3.151:388 10.1.1.1" ldapport=389 ldapsearchattribute="uid" ldapbasedn="ou=People,dc=my-domain,dc=com"  

Another way to avoid a single point of failure in the LDAP/AD authentication servers is to use a DNS name, but it comes with caveats:
Configuring multiple host addresses for one name on the DNS server is a very common technique, but it has problems of its own.
For example, suppose an enterprise has AD servers in IDC machine rooms across the country. With the DNS approach, if all those AD server IPs are bound to a single name, the DNS server generally returns the address list in round-robin order when answering gethostbyname requests.
For example:
One request returns

IP_A, IP_B, IP_C  

After the local DNS cache TTL expires, the next request may return

IP_B, IP_C, IP_A  

After receiving these addresses, a client usually takes the first one, hostent->h_addr_list[0], as the resolved IP.
So there is a problem: during AD domain authentication, sometimes you get the AD server in the local IDC and sometimes the one in another IDC.
How do you make DNS return the AD server of the local IDC?
The common technique is smart DNS, which returns addresses based on the source IP.

The gethostbyname man page:

NAME  
       gethostbyname, gethostbyaddr, sethostent, gethostent, endhostent, h_errno, herror, hstrerror, gethostbyaddr_r, gethostbyname2, gethostbyname2_r, gethostbyname_r, gethostent_r - get network host entry  

SYNOPSIS  
       #include <netdb.h>  
       extern int h_errno;  

       struct hostent *gethostbyname(const char *name);  
......  

       The hostent structure is defined in <netdb.h> as follows:  

           struct hostent {  
               char  *h_name;            /* official name of host */  
               char **h_aliases;         /* alias list */  
               int    h_addrtype;        /* host address type */  
               int    h_length;          /* length of address */  
               char **h_addr_list;       /* list of addresses */  
           }  
           #define h_addr h_addr_list[0] /* for backward compatibility */  

       The members of the hostent structure are:  

       h_name The official name of the host.  

       h_aliases  
              An array of alternative names for the host, terminated by a NULL pointer.  

       h_addrtype  
              The type of address; always AF_INET or AF_INET6 at present.  

       h_length  
              The length of the address in bytes.  

       h_addr_list  
              An array of pointers to network addresses for the host (in network byte order), terminated by a NULL pointer.  

       h_addr The first address in h_addr_list for backward compatibility.  

src/backend/libpq/auth.c

/*  
 * Initialize a connection to the LDAP server, including setting up  
 * TLS if requested.  
 */  
static int  
InitializeLDAPConnection(Port *port, LDAP **ldap)  
{  
        int                     ldapversion = LDAP_VERSION3;  
        int                     r;  

        *ldap = ldap_init(port->hba->ldapserver, port->hba->ldapport);  
        if (!*ldap)  
        {  
#ifndef WIN32  
                ereport(LOG,  
                                (errmsg("could not initialize LDAP: %m")));  
#else  
                ereport(LOG,  
                                (errmsg("could not initialize LDAP: error code %d",  
                                                (int) LdapGetLastError())));  

man ldap_init

NAME  
       ldap_init, ldap_initialize, ldap_open - Initialize the LDAP library and open a connection to an LDAP server  

SYNOPSIS  
       #include <ldap.h>  

       LDAP *ldap_open(host, port)  
       char *host;  
       int port;  

       LDAP *ldap_init(host, port)  
       char *host;  
       int port;  

DESCRIPTION  
       ldap_open() opens a connection to an LDAP server and allocates an LDAP structure which is used to identify the connection and to maintain per-connection information.    
       ldap_init() allocates an LDAP structure but does not open an initial connection.    
       ldap_initialize() allocates an LDAP structure but does not open an initial connection.    
       ldap_init_fd() allocates an LDAP structure using an existing  connection on the provided socket.    
       One of these routines must be called before any operations are attempted.  

       ldap_open()  takes  host, the hostname on which the LDAP server is running, and port, the port number to which to connect.    
       If the default IANA-assigned port of 389 is desired, LDAP_PORT should be specified for port.    

       The host parameter may contain a blank-separated list of hosts to try to connect to, and each host may optionally by of the form host:port.    
       If present, the :port overrides the port parameter  to ldap_open().     

       Upon  successfully  making a connection to an LDAP server, ldap_open() returns a pointer to an opaque LDAP structure, which should be passed to subsequent calls to ldap_bind(), ldap_search(),  
       etc.   

       Certain fields in the LDAP structure can be set to indicate size limit, time limit, and how aliases are handled during operations;    
       read  and  write  access  to  those  fields  must  occur  by  calling ldap_get_option(3) and ldap_set_option(3) respectively, whenever possible.  

       ldap_init() acts just like ldap_open(), but does not open a connection to the LDAP server.  The actual connection open will occur when the first operation is attempted.  

Interested readers can download the OpenLDAP source code and take a look:
yum install -y openldap-debuginfo

PostgreSQL's documentation of the ldap server options states that multiple hosts are separated by spaces, consistent with the ldap_init description above:
http://www.postgresql.org/docs/9.5/static/auth-methods.html#AUTH-LDAP

ldapserver  
    Names or IP addresses of LDAP servers to connect to. Multiple servers may be specified, separated by spaces.  

ldapport  
    Port number on LDAP server to connect to. If no port is specified, the LDAP library's default port setting will be used.  

PostgreSQL stores parsed pg_hba.conf entries in the HbaLine structure; the LDAP-related fields ldapserver and ldapport are among its members.
src/include/libpq/hba.h

typedef struct HbaLine  
{  
    int         linenumber;  
    char       *rawline;  
    ConnType    conntype;  
    List       *databases;  
    List       *roles;  
    struct sockaddr_storage addr;  
    struct sockaddr_storage mask;  
    IPCompareMethod ip_cmp_method;  
    char       *hostname;  
    UserAuth    auth_method;  

    char       *usermap;  
    char       *pamservice;  
    bool        ldaptls;  
    char       *ldapserver;  
    int         ldapport;  
    char       *ldapbinddn;  
    char       *ldapbindpasswd;  
    char       *ldapsearchattribute;  
    char       *ldapbasedn;  
    int         ldapscope;  
    char       *ldapprefix;  
    char       *ldapsuffix;  
    bool        clientcert;  
    char       *krb_realm;  
    bool        include_realm;  
    char       *radiusserver;  
    char       *radiussecret;  
    char       *radiusidentifier;  
    int         radiusport;  
}  

The semantic-parsing code that determines whether LDAP authentication is used:

/*  
 * Parse one tokenised line from the hba config file and store the result in a  
 * HbaLine structure, or NULL if parsing fails.  
 *  
 * The tokenised line is a List of fields, each field being a List of  
 * HbaTokens.  
 *  
 * Note: this function leaks memory when an error occurs.  Caller is expected  
 * to have set a memory context that will be reset if this function returns  
 * NULL.  
 */  
static HbaLine *  
parse_hba_line(List *line, int line_num, char *raw_line)  
{  
......  
#endif  
        else if (strcmp(token->string, "ldap") == 0)  
#ifdef USE_LDAP  
                parsedline->auth_method = uaLDAP;  
#else  
                unsupauth = "ldap";  
#endif  
......  
        /*  
         * Check if the selected authentication method has any mandatory arguments  
         * that are not set.  
         */  
        if (parsedline->auth_method == uaLDAP)  
        {  
                MANDATORY_AUTH_ARG(parsedline->ldapserver, "ldapserver", "ldap");  

                /*  
                 * LDAP can operate in two modes: either with a direct bind, using  
                 * ldapprefix and ldapsuffix, or using a search+bind, using  
                 * ldapbasedn, ldapbinddn, ldapbindpasswd and ldapsearchattribute.  
                 * Disallow mixing these parameters.  
                 */  
                if (parsedline->ldapprefix || parsedline->ldapsuffix)  
                {  
                        if (parsedline->ldapbasedn ||  
                                parsedline->ldapbinddn ||  
                                parsedline->ldapbindpasswd ||  
                                parsedline->ldapsearchattribute)  
                        {  
                                ereport(LOG,  
                                                (errcode(ERRCODE_CONFIG_FILE_ERROR),  
                                                 errmsg("cannot use ldapbasedn, ldapbinddn, ldapbindpasswd, ldapsearchattribute, or ldapurl together with ldapprefix"),  
                                                 errcontext("line %d of configuration file \"%s\"",  
                                                                        line_num, HbaFileName)));  
                                return NULL;  
                        }  
                }  
                else if (!parsedline->ldapbasedn)  
                {  
                        ereport(LOG,  
                                        (errcode(ERRCODE_CONFIG_FILE_ERROR),  
                                         errmsg("authentication method \"ldap\" requires argument \"ldapbasedn\", \"ldapprefix\", or \"ldapsuffix\" to be set"),  
                                         errcontext("line %d of configuration file \"%s\"",  
                                                                line_num, HbaFileName)));  
                        return NULL;  
                }  
        }  

......  

The semantic parsing of the options configured for the LDAP authentication method:

/*  
 * Parse one name-value pair as an authentication option into the given  
 * HbaLine.  Return true if we successfully parse the option, false if we  
 * encounter an error.  
 */  
static bool  
parse_hba_auth_opt(char *name, char *val, HbaLine *hbaline, int line_num)  
{  
......  
        else if (strcmp(name, "ldapurl") == 0)  
        {  
#ifdef LDAP_API_FEATURE_X_OPENLDAP  
                LDAPURLDesc *urldata;  
                int                     rc;  
#endif  

                REQUIRE_AUTH_OPTION(uaLDAP, "ldapurl", "ldap");  
#ifdef LDAP_API_FEATURE_X_OPENLDAP  
                rc = ldap_url_parse(val, &urldata);  
                if (rc != LDAP_SUCCESS)  
                {  
                        ereport(LOG,  
                                        (errcode(ERRCODE_CONFIG_FILE_ERROR),  
                                         errmsg("could not parse LDAP URL \"%s\": %s", val, ldap_err2string(rc))));  
                        return false;  
                }  

                if (strcmp(urldata->lud_scheme, "ldap") != 0)  
                {  
                        ereport(LOG,  
                                        (errcode(ERRCODE_CONFIG_FILE_ERROR),  
                        errmsg("unsupported LDAP URL scheme: %s", urldata->lud_scheme)));  
                        ldap_free_urldesc(urldata);  
                        return false;  
                }  

                hbaline->ldapserver = pstrdup(urldata->lud_host);  
                hbaline->ldapport = urldata->lud_port;  
                hbaline->ldapbasedn = pstrdup(urldata->lud_dn);  

                if (urldata->lud_attrs)  
                        hbaline->ldapsearchattribute = pstrdup(urldata->lud_attrs[0]);          /* only use first one */  
                hbaline->ldapscope = urldata->lud_scope;  
                if (urldata->lud_filter)  
                {  
                        ereport(LOG,  
                                        (errcode(ERRCODE_CONFIG_FILE_ERROR),  
                                         errmsg("filters not supported in LDAP URLs")));  
                        ldap_free_urldesc(urldata);  
                        return false;  
                }  
                ldap_free_urldesc(urldata);  
#else                                                   /* not OpenLDAP */  
                ereport(LOG,  
                                (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),  
                                 errmsg("LDAP URLs not supported on this platform")));  
#endif   /* not OpenLDAP */  
        }  
        else if (strcmp(name, "ldaptls") == 0)  
        {  
                REQUIRE_AUTH_OPTION(uaLDAP, "ldaptls", "ldap");  
                if (strcmp(val, "1") == 0)  
                        hbaline->ldaptls = true;  
                else  
...... 

Adding reverse-string index support to Greenplum


A reverse index in Greenplum can be implemented with the reverse() function, but this function is missing from this version of Greenplum, so it has to be ported over.
It can be found in the PostgreSQL 9.5 source:
src/backend/utils/adt/varlena.c

vi reverse.c

#include <string.h>
#include "postgres.h"
#include "fmgr.h"

PG_MODULE_MAGIC;

PG_FUNCTION_INFO_V1(text_reverse);


/*
 * Return reversed string
 */
Datum
text_reverse(PG_FUNCTION_ARGS)
{
        text       *str = PG_GETARG_TEXT_PP(0);
        const char *p = VARDATA_ANY(str);
        int                     len = VARSIZE_ANY_EXHDR(str);
        const char *endp = p + len;
        text       *result;
        char       *dst;

        result = palloc(len + VARHDRSZ);
        dst = (char *) VARDATA(result) + len;
        SET_VARSIZE(result, len + VARHDRSZ);

        if (pg_database_encoding_max_length() > 1)
        {
                /* multibyte version */
                while (p < endp)
                {
                        int                     sz;

                        sz = pg_mblen(p);
                        dst -= sz;
                        memcpy(dst, p, sz);
                        p += sz;
                }
        }
        else
        {
                /* single byte version */
                while (p < endp)
                        *(--dst) = *p++;
        }

        PG_RETURN_TEXT_P(result);
}

Compile:

gcc -O3 -Wall -Wextra -Werror -I /home/digoal/gpsrc/src/include -g -fPIC -c ./reverse.c -o reverse.o
gcc -O3 -Wall -Wextra -Werror -I /home/digoal/gpsrc/src/include -g -shared reverse.o -o libreverse.so

cp libreverse.so /home/digoal/gphome/lib

Copy it to all nodes:

gpscp -f ./host /home/digoal/gphome/lib/libreverse.so =:/home/digoal/gphome/lib/

Create the function and verify it works:

postgres=# create or replace function reverse(text) returns text as '/home/digoal/gphome/lib/libreverse.so', 'text_reverse' language C STRICT immutable;
CREATE FUNCTION
postgres=# select reverse('abc');
 reverse 
---------
 cba
(1 row)

postgres=# select reverse('a f d12');
 reverse 
---------
 21d f a
(1 row)

postgres=# select reverse(null);
 reverse 
---------

(1 row)

Create a reverse index and test it:

postgres=# create table t(id int, info text);
NOTICE:  Table doesn't have 'DISTRIBUTED BY' clause -- Using column named 'id' as the Greenplum Database data distribution key for this table.
HINT:  The 'DISTRIBUTED BY' clause determines the distribution of data. Make sure column(s) chosen are the optimal data distribution key to minimize skew.
CREATE TABLE
postgres=# create index idx on t(reverse(info));
CREATE INDEX
postgres=# insert into t select id,md5(random()::text) from generate_series(1,1000000) t(id);
INSERT 0 1000000

postgres=# select id,info,reverse(info) from t limit 10;
  id  |               info               |             reverse              
------+----------------------------------+----------------------------------
  197 | fefb23cb1705e6faa74601e6cd0dc8d7 | 7d8cd0dc6e10647aaf6e5071bc32bfef
  314 | 64c3d79458fc0ba2413e5493582830dd | dd0382853945e3142ab0cf85497d3c46
  426 | e0486de86c2c6bd72912fbc33d059de8 | 8ed950d33cbf21927db6c2c68ed6840e
  715 | c5087148a8086e2201a63b158268adbb | bbda862851b36a1022e6808a8417805c
  768 | 50ebdba7ff260d5adc11495313817221 | 12271831359411cda5d062ff7abdbe05
  944 | 8da13db138858ec1f78193f5c16cc310 | 013cc61c5f39187f1ce858831bd31ad8
 1057 | cf029096c2c66714861d0adba9ab49d8 | 8d94ba9abda0d16841766c2c690920fc
 1233 | ae6eb1bdf32b15c73c1a04adf54b9881 | 1889b45fda40a1c37c51b23fdb1be6ea
 1286 | 9943f89159055e0450765ab0968a1ad8 | 8da1a8690ba5670540e55095198f3499
 1575 | b8f0337315d238070984b9883a965c57 | 75c569a3889b489070832d5137330f8b
(10 rows)

postgres=# select * from t where reverse(info)>='7d8cd0dc6e10647aaf6e507' and reverse(info)<'7d8cd0dc6e10647aaf6e508';
 id  |               info               
-----+----------------------------------
 197 | fefb23cb1705e6faa74601e6cd0dc8d7
(1 row)

postgres=# explain analyze select * from t where reverse(info)>='7d8cd0dc6e10647aaf6e507' and reverse(info)<'7d8cd0dc6e10647aaf6e508';
                                                           QUERY PLAN                                                           
--------------------------------------------------------------------------------------------------------------------------------
 Gather Motion 240:1  (slice1; segments: 240)  (cost=11012.90..13972.90 rows=40001 width=37)
   Rows out:  1 rows at destination with 9.113 ms to first row, 43 ms to end, start offset by 1.317 ms.
   ->  Bitmap Heap Scan on t  (cost=11012.90..13972.90 rows=167 width=37)
         Recheck Cond: reverse(info) >= '7d8cd0dc6e10647aaf6e507'::text AND reverse(info) < '7d8cd0dc6e10647aaf6e508'::text
         Rows out:  1 rows (seg46) with 0.156 ms to first row, 0.178 ms to end, start offset by 9.850 ms.
         ->  Bitmap Index Scan on idx  (cost=0.00..11002.90 rows=167 width=0)
               Index Cond: reverse(info) >= '7d8cd0dc6e10647aaf6e507'::text AND reverse(info) < '7d8cd0dc6e10647aaf6e508'::text
               Bitmaps out:  Avg 1.0 x 240 workers.  Max 1 (seg0) with 0.021 ms to end, start offset by 8.845 ms.
               Work_mem used:  9K bytes.
 Slice statistics:
   (slice0)    Executor memory: 475K bytes.
   (slice1)    Executor memory: 321K bytes avg x 240 workers, 329K bytes max (seg46).  Work_mem: 9K bytes max.
 Statement statistics:
   Memory used: 128000K bytes
 Total runtime: 71.958 ms
(15 rows)

Applicable scenarios:
1. Suffix-based searches.

Greenplum: invoking a function on all nodes via gp_dist_random('gp_id')


With Greenplum, when you call a function it will most likely run on the master, never reaching the segments.
Take the random() function, for example.
Invoked as select random(), the SQL does not need to be sent to the segment nodes, so the plan below has no gather motion step.

postgres=# explain analyze select random();  
                                       QUERY PLAN                                         
----------------------------------------------------------------------------------------  
 Result  (cost=0.01..0.02 rows=1 width=0)  
   Rows out:  1 rows with 0.017 ms to end, start offset by 0.056 ms.  
   InitPlan  
     ->  Result  (cost=0.00..0.01 rows=1 width=0)  
           Rows out:  1 rows with 0.004 ms to end of 2 scans, start offset by 0.059 ms.  
 Slice statistics:  
   (slice0)    Executor memory: 29K bytes.  
   (slice1)    Executor memory: 29K bytes.  
 Statement statistics:  
   Memory used: 128000K bytes  
 Total runtime: 0.074 ms  
(11 rows)  

What if you want the function to run on the segments?
Invoke it through gp_dist_random('gp_id'); the argument of gp_dist_random is any queryable view or table.

postgres=# explain analyze select random() from gp_dist_random('gp_id');  
                                                               QUERY PLAN                                                                  
-----------------------------------------------------------------------------------------------------------------------------------------  
 Gather Motion 240:1  (slice1; segments: 240)  (cost=0.00..4.00 rows=240 width=0)  
   Rows out:  240 rows at destination with 6.336 ms to first row, 59 ms to end, start offset by 4195 ms.  
   ->  Seq Scan on gp_id  (cost=0.00..4.00 rows=1 width=0)  
         Rows out:  Avg 1.0 rows x 240 workers.  Max 1 rows (seg0) with 0.073 ms to first row, 0.075 ms to end, start offset by 4207 ms.  
 Slice statistics:  
   (slice0)    Executor memory: 471K bytes.  
   (slice1)    Executor memory: 163K bytes avg x 240 workers, 163K bytes max (seg0).  
 Statement statistics:  
   Memory used: 128000K bytes  
 Total runtime: 4279.445 ms  
(10 rows)  

gp_id has exactly one row in every segment, so the SQL above invokes random() once per segment and returns all the results. My test environment has 240 segments, so it returns 240 rows.

The header comments for gp_id explain that gp_dist_random can be used for administrative work:
for example, querying the size of a database or a table is computed exactly this way.
src/backend/catalog/postgres_bki_srcs

/*-------------------------------------------------------------------------  
 *  
 * gp_id.h  
 *        definition of the system "database identifier" relation (gp_dbid)  
 *        along with the relation's initial contents.  
 *  
 * Copyright (c) 2009-2010, Greenplum inc  
 *  
 * NOTES  
 *    Historically this table was used to supply every segment with its  
 * identification information.  However in the 4.0 release when the file  
 * replication feature was added it could no longer serve this purpose  
 * because it became a requirement for all tables to have the same physical  
 * contents on both the primary and mirror segments.  To resolve this the  
 * information is now passed to each segment on startup based on the  
 * gp_segment_configuration (stored on the master only), and each segment  
 * has a file in its datadirectory (gp_dbid) that uniquely identifies the  
 * segment.  
 *  
 *   The contents of the table are now irrelevant, with the exception that  
 * several tools began relying on this table for use as a method of remote  
 * function invocation via gp_dist_random('gp_id') due to the fact that this  
 * table was guaranteed of having exactly one row on every segment.  The  
 * contents of the row have no defined meaning, but this property is still  
 * relied upon.  
 */  
#ifndef _GP_ID_H_  
#define _GP_ID_H_  


#include "catalog/genbki.h"  
/*  
 * Defines for gp_id table  
 */  
#define GpIdRelationName                        "gp_id"  

/* TIDYCAT_BEGINFAKEDEF  

   CREATE TABLE gp_id  
   with (shared=true, oid=false, relid=5001, content=SEGMENT_LOCAL)  
   (  
   gpname       name     ,  
   numsegments  smallint ,  
   dbid         smallint ,  
   content      smallint   
   );  

   TIDYCAT_ENDFAKEDEF  
*/  

The Greenplum function for querying database size:

postgres=# \df+ pg_database_size  
                                                                                                     List of functions  
   Schema   |       Name       | Result data type | Argument data types |  Type  |  Data access   | Volatility |  Owner   | Language |      Source code      |                         Description                           
------------+------------------+------------------+---------------------+--------+----------------+------------+----------+----------+-----------------------+-------------------------------------------------------------  
 pg_catalog | pg_database_size | bigint           | name                | normal | reads sql data | volatile   | dege.zzz | internal | pg_database_size_name | Calculate total disk space usage for the specified database  
 pg_catalog | pg_database_size | bigint           | oid                 | normal | reads sql data | volatile   | dege.zzz | internal | pg_database_size_oid  | Calculate total disk space usage for the specified database  
(2 rows)  

The source of pg_database_size_name is shown below.
Clearly, computing the database size also relies on select sum(pg_database_size('%s'))::int8 from gp_dist_random('gp_id');

Datum  
pg_database_size_name(PG_FUNCTION_ARGS)  
{  
        int64           size = 0;  
        Name            dbName = PG_GETARG_NAME(0);  
        Oid                     dbOid = get_database_oid(NameStr(*dbName));  

        if (!OidIsValid(dbOid))  
                ereport(ERROR,  
                                (errcode(ERRCODE_UNDEFINED_DATABASE),  
                                 errmsg("database \"%s\" does not exist",  
                                                NameStr(*dbName))));  

        size = calculate_database_size(dbOid);  

        if (Gp_role == GP_ROLE_DISPATCH)  
        {  
                StringInfoData buffer;  

                initStringInfo(&buffer);  

                appendStringInfo(&buffer, "select sum(pg_database_size('%s'))::int8 from gp_dist_random('gp_id');", NameStr(*dbName));  

                size += get_size_from_segDBs(buffer.data);  
        }  

        PG_RETURN_INT64(size);  
}  

To verify, we can run that SQL directly: the result is almost identical to what the pg_database_size function returns, differing only by the master-side calculate_database_size portion.

postgres=# select sum(pg_database_size('postgres'))::int8 from gp_dist_random('gp_id');  
      sum         
----------------  
 16006753522624  
(1 row)  

postgres=# select pg_database_size('postgres');  
 pg_database_size   
------------------  
   16006763924106  
(1 row)  

gp_dist_random('gp_id') essentially queries gp_id on every node;
likewise, gp_dist_random('pg_authid') queries pg_authid on every node.
For example:

postgres=# select * from gp_dist_random('gp_id');  
  gpname   | numsegments | dbid | content   
-----------+-------------+------+---------  
 Greenplum |          -1 |   -1 |      -1  
 Greenplum |          -1 |   -1 |      -1  
 Greenplum |          -1 |   -1 |      -1  
 Greenplum |          -1 |   -1 |      -1  
 Greenplum |          -1 |   -1 |      -1  
 Greenplum |          -1 |   -1 |      -1  
 Greenplum |          -1 |   -1 |      -1  
 Greenplum |          -1 |   -1 |      -1  
 Greenplum |          -1 |   -1 |      -1  
 Greenplum |          -1 |   -1 |      -1  
......  

If you do not want so many rows back, filter with limit; the query nevertheless still executes on every segment, as follows:

postgres=# explain analyze select random() from gp_dist_random('gp_id') limit 1;  
                                                                  QUERY PLAN                                                                     
-----------------------------------------------------------------------------------------------------------------------------------------------  
 Limit  (cost=0.00..0.04 rows=1 width=0)  
   Rows out:  1 rows with 5.865 ms to first row, 5.884 ms to end, start offset by 4212 ms.  
   ->  Gather Motion 240:1  (slice1; segments: 240)  (cost=0.00..0.04 rows=1 width=0)  
         Rows out:  1 rows at destination with 5.857 ms to end, start offset by 4212 ms.  
         ->  Limit  (cost=0.00..0.02 rows=1 width=0)  
               Rows out:  Avg 1.0 rows x 240 workers.  Max 1 rows (seg0) with 0.062 ms to first row, 0.063 ms to end, start offset by 4228 ms.  
               ->  Seq Scan on gp_id  (cost=0.00..4.00 rows=1 width=0)  
                     Rows out:  Avg 1.0 rows x 240 workers.  Max 1 rows (seg0) with 0.060 ms to end, start offset by 4228 ms.  
 Slice statistics:  
   (slice0)    Executor memory: 463K bytes.  
   (slice1)    Executor memory: 163K bytes avg x 240 workers, 163K bytes max (seg0).  
 Statement statistics:  
   Memory used: 128000K bytes  
 Total runtime: 4288.007 ms  
(14 rows)  

How zhparser's Chinese word-segmentation options affect segmentation results

The following options apply to PG 9.2 and above. They control dictionary loading and segmentation behavior. None of them is mandatory, and all default to false (i.e. if an option is not set in the configuration file, zhparser behaves as if it were set to false).

zhparser.punctuation_ignore = f

zhparser.seg_with_duality = f

zhparser.dict_in_memory = f

zhparser.multi_short = f

zhparser.multi_duality = f

zhparser.multi_zmain = f

zhparser.multi_zall = f

http://www.xunsearch.com/scws/docs.php#libscws

8. `void scws_set_ignore(scws_t s, int yes)` Sets whether segmentation results ignore punctuation and other special symbols (\r and \n are never ignored).

   > **Parameter yes**: 1 means ignore, 0 means do not ignore; the default is do not ignore.

9. `void scws_set_multi(scws_t s, int mode)` Sets whether compound splitting is applied to long words (e.g. "中国人" is split into "中国", "人", "中国人").

   > **Parameter mode**: the level of compound segmentation; by default no compound segmentation is performed. The value is a bitwise OR combination of the constants below:
   >
   >   - SCWS_MULTI_SHORT   short words
   >   - SCWS_MULTI_DUALITY duality (combine 2 adjacent single characters into a word)
   >   - SCWS_MULTI_ZMAIN   important single characters
   >   - SCWS_MULTI_ZALL    all single characters

10. `void scws_set_duality(scws_t s, int yes)` Sets whether scattered single characters are automatically aggregated using two-character segmentation.

   > **Parameter yes**: 1 means perform duality aggregation, 0 means do not; the default is 0.  


digoal=> select to_tsvector('zhcfg','云安全部');
 to_tsvector 
-------------
 '云安':1
(1 row)

digoal=> select to_tsvector('zhcfg','云 安全部');
        to_tsvector         
----------------------------
 '云':1 '安全':3 '安全部':2
(1 row)


digoal=> set zhparser.multi_short=off;
SET
digoal=> select to_tsvector('zhcfg','网络安全部');
     to_tsvector     
---------------------
 '安全部':2 '网络':1
(1 row)

digoal=> set zhparser.multi_short=on;
SET
digoal=> select to_tsvector('zhcfg','网络安全部');
         to_tsvector          
------------------------------
 '安全':3 '安全部':2 '网络':1
(1 row)

Note that zhparser settings affect to_tsvector, and therefore any functional index built on to_tsvector.
It is recommended to set zhparser.multi_short=on from the start,
or to set the parameter at the user or database level.

A Greenplum plpgsql bug: EXIT cannot break out of a loop from inside a sub-block

In Greenplum, if a loop contains a nested sub-block, an EXIT inside the sub-block only exits the sub-block; it does not break out of the enclosing loop.

CREATE OR REPLACE FUNCTION test1(i integer) RETURNS 
integer AS 
$$
DECLARE count int;
BEGIN
    count := 1;
    LOOP
        count := count + 1;

        begin
            raise notice 'sub xact: %', count;
            EXECUTE 'select 1';
            IF count > 10 THEN
                EXIT;  -- the BUG is here: this only exits the begin sub-block, it does not exit the LOOP
                raise notice 'sub xact if: %', count;
            END IF;
            raise notice 'sub xact end if: %', count;
        exception when others then
        end;

        raise notice 'parent xact: %', count;
    END LOOP;
    return 1;
END
$$ LANGUAGE plpgsql;

postgres=# select test1(1);
NOTICE:  sub xact: 2
NOTICE:  sub xact end if: 2
NOTICE:  parent xact: 2
NOTICE:  sub xact: 3
NOTICE:  sub xact end if: 3
NOTICE:  parent xact: 3
NOTICE:  sub xact: 4
NOTICE:  sub xact end if: 4
NOTICE:  parent xact: 4
NOTICE:  sub xact: 5
NOTICE:  sub xact end if: 5
NOTICE:  parent xact: 5
NOTICE:  sub xact: 6
NOTICE:  sub xact end if: 6
NOTICE:  parent xact: 6
NOTICE:  sub xact: 7
NOTICE:  sub xact end if: 7
NOTICE:  parent xact: 7
NOTICE:  sub xact: 8
NOTICE:  sub xact end if: 8
NOTICE:  parent xact: 8
NOTICE:  sub xact: 9
NOTICE:  sub xact end if: 9
NOTICE:  parent xact: 9
NOTICE:  sub xact: 10
NOTICE:  sub xact end if: 10
NOTICE:  parent xact: 10
NOTICE:  sub xact: 11
NOTICE:  parent xact: 11
NOTICE:  sub xact: 12
NOTICE:  parent xact: 12
NOTICE:  sub xact: 13
NOTICE:  parent xact: 13
NOTICE:  sub xact: 14
NOTICE:  parent xact: 14

CREATE OR REPLACE FUNCTION test1(i integer) RETURNS 
integer AS 
$$
DECLARE count int;
BEGIN
    count := 1;
    LOOP
        count := count + 1;

        begin
            raise notice 'sub xact: %', count;
            EXECUTE 'select 1';
            IF count > 10 THEN
                return 0;  -- changed to return, which exits the whole function; to break out of the loop, control it inside the loop itself, not from a sub-block within the LOOP
            END IF;
        exception when others then
        end;

        raise notice 'parent xact: %', count;
    END LOOP;
    return 1;
END
$$ LANGUAGE plpgsql;

postgres=# select test1(1);
NOTICE:  sub xact: 2
NOTICE:  parent xact: 2
NOTICE:  sub xact: 3
NOTICE:  parent xact: 3
NOTICE:  sub xact: 4
NOTICE:  parent xact: 4
NOTICE:  sub xact: 5
NOTICE:  parent xact: 5
NOTICE:  sub xact: 6
NOTICE:  parent xact: 6
NOTICE:  sub xact: 7
NOTICE:  parent xact: 7
NOTICE:  sub xact: 8
NOTICE:  parent xact: 8
NOTICE:  sub xact: 9
NOTICE:  parent xact: 9
NOTICE:  sub xact: 10
NOTICE:  parent xact: 10
NOTICE:  sub xact: 11
 test1 
-------
     0
(1 row)

This problem does not exist in PostgreSQL.
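To leave only the LOOP (rather than returning from the whole function) from inside the sub-block, PL/pgSQL also supports a labeled EXIT. A sketch of that variant (test2 is a name chosen here, not from the original):

```sql
CREATE OR REPLACE FUNCTION test2(i integer) RETURNS integer AS
$$
DECLARE count int := 1;
BEGIN
    <<myloop>>                       -- give the loop a label
    LOOP
        count := count + 1;
        begin
            EXECUTE 'select 1';
            IF count > 10 THEN
                EXIT myloop;         -- labeled EXIT leaves the loop itself, not just the begin block
            END IF;
        exception when others then
            null;                    -- swallow any error
        end;
    END LOOP myloop;
    return count;                    -- reached after the labeled EXIT
END
$$ LANGUAGE plpgsql;
```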

How the optimizer handles PostgreSQL's IN construct, and how to optimize it

When working with a database, you frequently need to match a column against multiple values.
The usual ways of writing this include:

-- select * from table where id = any(array);  
-- select * from table where id in (values);  
-- select * from table where id=x or id=x or ....;  
-- select * from table where id in (query);  
-- select * from table where id in ( values query );  
-- select * from table t1 join (query or values query) t2 on t1.id=t2.id;  

Each form allows several possible execution plans, as follows:

-- select * from table where id = any(array);  
  The optimizer can use an index scan, bitmap scan, or seq scan.  

-- select * from table where id in (values);  
  The optimizer can use an index scan, bitmap scan, or seq scan.  

-- select * from table where id=x or id=x or ....;  
  The optimizer can use bitmap scan + BitmapOr, or seq scan.  

-- select * from table where id in (query);  
  The optimizer can use a join (merge, hash, nest).  

-- select * from table where id in ( values query );  
  The optimizer can use a join (merge, hash, nest).  

-- select * from table t1 join (query or values query) t2 on t1.id=t2.id;  
  The optimizer can use a join (merge, hash, nest).  

The general SQL optimization strategy is to minimize CPU work and the number of pages scanned.

Below, for each SQL form, we look at how the possible execution plans differ (plan selection is steered with planner switches, e.g. set enable_indexscan=off).
The available switches are:

enable_bitmapscan     enable_hashjoin       enable_indexscan      enable_mergejoin      enable_seqscan        enable_tidscan          
enable_hashagg        enable_indexonlyscan  enable_material       enable_nestloop       enable_sort   

Start the tests, using auto_explain to log the execution plans:

load 'auto_explain';    
set auto_explain.log_analyze =true;    
set auto_explain.log_buffers =true;    
set auto_explain.log_nested_statements=true;    
set auto_explain.log_timing=true;    
set auto_explain.log_triggers=true;    
set auto_explain.log_verbose=true;    
set auto_explain.log_min_duration=0;    
set client_min_messages ='log';    
set work_mem='8GB';    
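The DDL for t_in_test is not shown in this excerpt; judging from the plans below (about 10 million rows, primary key id, a short text column info), it was presumably created roughly like this (a sketch, not the author's original script):

```sql
create table t_in_test (id int primary key, info text);
insert into t_in_test
  select id, md5(random()::text) from generate_series(1, 10000000) t(id);
```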

Test SQL form 1:

-- select * from table where id = any(array);  

do language plpgsql $$  
declare  
  v_id int[];  
begin  
  select array_agg(trunc(random()*100000)) into v_id from generate_series(1,200) t(id);  
  perform * from t_in_test where id = any (v_id);  
end;  
$$;  

Optimizer choice 1 (index scan):
Discrete (random) page access; suitable for small scan sets.

LOG:  duration: 2.312 ms  plan:  
Query Text: SELECT * from t_in_test where id = any (v_id)  
Index Scan using t_in_test_pkey on public.t_in_test  (cost=0.43..895.50 rows=200 width=37) (actual time=0.039..2.266 rows=200 loops=1)  
  Output: id, info  
  Index Cond: (t_in_test.id = ANY ('{50836,73414,41071,45604,...省略部分...,76236}'::integer[]))  
  Buffers: shared hit=776  
CONTEXT:  SQL statement "SELECT * from t_in_test where id = any (v_id)"  
PL/pgSQL function inline_code_block line 6 at PERFORM  

Optimizer choice 2 (bitmap scan):
Compared with an index scan, it adds the cost of the Recheck and of sorting by ctid.
Suitable for larger scan sets; the sort reduces random access and also lets the block device's prefetch kick in.

LOG:  duration: 1.602 ms  plan:  
Query Text: SELECT * from t_in_test where id = any (v_id)  
Bitmap Heap Scan on public.t_in_test  (cost=888.55..1711.16 rows=200 width=37) (actual time=0.880..1.563 rows=200 loops=1)  
  Output: id, info  
  Recheck Cond: (t_in_test.id = ANY ('{32635,31123,6282,59640,...省略部分...,87705}'::integer[]))  
  Heap Blocks: exact=184  
  Buffers: shared hit=784  
  ->  Bitmap Index Scan on t_in_test_pkey  (cost=0.00..888.50 rows=200 width=0) (actual time=0.846..0.846 rows=200 loops=1)  
        Index Cond: (t_in_test.id = ANY ('{32635,31123,6282,59640,...省略部分...,87705}'::integer[]))  
        Buffers: shared hit=600  
CONTEXT:  SQL statement "SELECT * from t_in_test where id = any (v_id)"  
PL/pgSQL function inline_code_block line 6 at PERFORM  
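The block-device prefetch that bitmap heap scans exploit is controlled in PostgreSQL by the effective_io_concurrency parameter (this tuning note is an addition to the original test):

```sql
-- allow bitmap heap scans to keep up to 32 prefetch requests in flight;
-- mainly benefits storage that handles concurrent requests well (SSD, striped arrays)
set effective_io_concurrency = 32;
```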

Optimizer choice 3 (seq scan):
Suitable for very large scan sets.

LOG:  duration: 19940.394 ms  plan:  
Query Text: SELECT * from t_in_test where id = any (v_id)  
Seq Scan on public.t_in_test  (cost=0.00..2683354.80 rows=200 width=37) (actual time=4.237..19940.330 rows=199 loops=1)  
  Output: id, info  
  Filter: (t_in_test.id = ANY ('{45867,72450,95153,86233,63073,11016,56010,47158,...省略部分...,90444}'::integer[]))  
  Rows Removed by Filter: 9999801  
  Buffers: shared hit=83334  
CONTEXT:  SQL statement "SELECT * from t_in_test where id = any (v_id)"  
PL/pgSQL function inline_code_block line 6 at PERFORM  

Test SQL form 2:

-- select * from table where id in (values);  

do language plpgsql $$  
declare  
  v_where text;  
begin  
  select string_agg(id::text,',') into v_where from (select trunc(random()*100000)::int as id from generate_series(1,200) t(id)) t;  
  execute 'select * from t_in_test where id in ('||v_where||')';  
end;  
$$;  

Optimizer choice 1 (index scan):

LOG:  duration: 0.919 ms  plan:  
Query Text: select * from t_in_test where id in (8826,2038,72163,29843,76886,37893,5279,64308,...省略部分...,48126,44868)  
Index Scan using t_in_test_pkey on public.t_in_test  (cost=0.43..895.50 rows=200 width=37) (actual time=0.017..0.894 rows=200 loops=1)  
  Output: id, info  
  Index Cond: (t_in_test.id = ANY ('{8826,2038,72163,29843,76886,37893,5279,64308,7370,80216,...省略部分...,48126,44868}'::integer[]))  
  Buffers: shared hit=779  
CONTEXT:  SQL statement "select * from t_in_test where id in (8826,2038,72163,29843,76886,37893,5279,64308,7370,80216,...省略部分...,73366,48126,44868)"  
PL/pgSQL function inline_code_block line 6 at EXECUTE  

Optimizer choice 2 (bitmap scan):

LOG:  duration: 1.012 ms  plan:  
Query Text: select * from t_in_test where id in (17424,80517,35148,38245,93037,...省略部分...,14997,34639,10646)  
Bitmap Heap Scan on public.t_in_test  (cost=888.55..1711.16 rows=200 width=37) (actual time=0.657..0.978 rows=200 loops=1)  
  Output: id, info  
  Recheck Cond: (t_in_test.id = ANY ('{17424,80517,35148,38245,93037,4516,...省略部分...,14997,34639,10646}'::integer[]))  
  Heap Blocks: exact=177  
  Buffers: shared hit=779  
  ->  Bitmap Index Scan on t_in_test_pkey  (cost=0.00..888.50 rows=200 width=0) (actual time=0.629..0.629 rows=200 loops=1)  
        Index Cond: (t_in_test.id = ANY ('{17424,80517,35148,38245,93037,4516,27690,...省略部分...,34639,10646}'::integer[]))  
        Buffers: shared hit=602  
CONTEXT:  SQL statement "select * from t_in_test where id in (17424,80517,35148,38245,93037,4516,27690,48978,11902,...省略部分...,34639,10646)"  
PL/pgSQL function inline_code_block line 6 at EXECUTE  

Optimizer choice 3 (seq scan):

LOG:  duration: 19678.014 ms  plan:  
Query Text: select * from t_in_test where id in (77056,1340,73056,42536,6862,44702,64810,42774,...省略部分...,24083,11322)  
Seq Scan on public.t_in_test  (cost=0.00..2683354.80 rows=200 width=37) (actual time=2.045..19677.975 rows=200 loops=1)  
  Output: id, info  
  Filter: (t_in_test.id = ANY ('{77056,1340,73056,42536,6862,...省略部分...,24083,11322}'::integer[]))  
  Rows Removed by Filter: 9999800  
  Buffers: shared hit=83334  
CONTEXT:  SQL statement "select * from t_in_test where id in (77056,1340,73056,42536,6862,44702,...省略部分...,24083,11322)"  
PL/pgSQL function inline_code_block line 6 at EXECUTE  

Test SQL form 3:

-- select * from table where id=x or id=x or ....;  

do language plpgsql $$  
declare  
  v_where text := 'id=';  
  v int;  
begin  
  for v in select trunc(random()*100000)::int from generate_series(1,200) t(id)  
  loop  
    v_where := ' '|| v_where ||v||' or id=';  
  end loop;  
  v_where := rtrim(v_where,'or id=');  
  execute 'select * from t_in_test where '||v_where;  
end;  
$$;  

Optimizer choice 1 (bitmap index scan + BitmapOr):
With the OR form, the only indexed plan the planner can choose is a bitmap index scan, so it is inferior to the IN form.

LOG:  duration: 1.085 ms  plan:  
Query Text: select * from t_in_test where                                                                                                                                                                                                         id=29207 or id=69918 or id=4044 or ...省略部分... or id=53009 or id=28015 or id=11763  
Bitmap Heap Scan on public.t_in_test  (cost=898.50..1771.11 rows=200 width=37) (actual time=0.754..1.043 rows=200 loops=1)  
  Output: id, info  
  Recheck Cond: ((t_in_test.id = 29207) OR (t_in_test.id = 69918) OR (t_in_test.id = 4044) OR (t_in_test.id = 65838) OR ...省略部分... OR (t_in_test.id = 28015) OR (t_in_test.id = 11763))  
  Heap Blocks: exact=180  
  Buffers: shared hit=781  
  ->  BitmapOr  (cost=898.50..898.50 rows=200 width=0) (actual time=0.725..0.725 rows=0 loops=1)  
        Buffers: shared hit=601  
        ->  Bitmap Index Scan on t_in_test_pkey  (cost=0.00..4.44 rows=1 width=0) (actual time=0.020..0.020 rows=1 loops=1)  
              Index Cond: (t_in_test.id = 29207)  
              Buffers: shared hit=3  
        ->  Bitmap Index Scan on t_in_test_pkey  (cost=0.00..4.44 rows=1 width=0) (actual time=0.011..0.011 rows=1 loops=1)  
              Index Cond: (t_in_test.id = 69918)  
              Buffers: shared hit=3  
        ..... (part omitted)  
        ->  Bitmap Index Scan on t_in_test_pkey  (cost=0.00..4.44 rows=1 width=0) (actual time=0.004..0.004 rows=1 loops=1)  
              Index Cond: (t_in_test.id = 11763)  
              Buffers: shared hit=3  
CONTEXT:  SQL statement "select * from t_in_test where                                                                                                                                                                                                         id=29207 or id=69918 or id=4044 or ...省略部分... or id=28015 or id=11763"  
PL/pgSQL function inline_code_block line 11 at EXECUTE  

Optimizer choice 2 (seq scan):

LOG:  duration: 107484.074 ms  plan:  
Query Text: select * from t_in_test where                                                                                                                                                                                                         id=51946 or id=17129 or id=90027 or ...省略部分... or id=22127 or id=62334 or id=11722  
Seq Scan on public.t_in_test  (cost=0.00..5183374.80 rows=200 width=37) (actual time=17.394..107483.942 rows=199 loops=1)  
  Output: id, info  
  Filter: ((t_in_test.id = 51946) OR (t_in_test.id = 17129) OR (t_in_test.id = 90027) OR ...省略部分... OR (t_in_test.id = 62334) OR (t_in_test.id = 11722))  
  Rows Removed by Filter: 9999801  
  Buffers: shared hit=83334  
CONTEXT:  SQL statement "select * from t_in_test where                                                                                                                                                                                                         id=51946 or id=17129 or id=90027 or ...省略部分... or id=62334 or id=11722"  
PL/pgSQL function inline_code_block line 11 at EXECUTE  

Test SQL form 4:

-- select * from table where id in (query);  

do language plpgsql $$  
declare  
begin  
  perform * from t_in_test where id in (select trunc(random()*100000)::int as id from generate_series(1,200) t(id));  
end;  
$$;  

Optimizer choice 1 (nestloop join):
Suitable for small scan sets where one side's join column carries a primary key or unique constraint.

LOG:  duration: 1.314 ms  plan:  
Query Text: SELECT * from t_in_test where id in (select trunc(random()*100000)::int as id from generate_series(1,200) t(id))  
Nested Loop  (cost=32.94..1727.00 rows=5000040 width=37) (actual time=0.166..1.226 rows=200 loops=1)  
  Output: t_in_test.id, t_in_test.info  
  Buffers: shared hit=800  
  ->  HashAggregate  (cost=32.50..34.50 rows=200 width=4) (actual time=0.149..0.189 rows=200 loops=1)  
        Output: ((trunc((random() * '100000'::double precision)))::integer)  
        Group Key: (trunc((random() * '100000'::double precision)))::integer  
        ->  Function Scan on pg_catalog.generate_series t  (cost=0.00..20.00 rows=1000 width=0) (actual time=0.033..0.089 rows=200 loops=1)  
              Output: (trunc((random() * '100000'::double precision)))::integer  
              Function Call: generate_series(1, 200)  
  ->  Index Scan using t_in_test_pkey on public.t_in_test  (cost=0.43..8.45 rows=1 width=37) (actual time=0.004..0.005 rows=1 loops=200)  
        Output: t_in_test.id, t_in_test.info  
        Index Cond: (t_in_test.id = ((trunc((random() * '100000'::double precision)))::integer))  
        Buffers: shared hit=800  
CONTEXT:  SQL statement "SELECT * from t_in_test where id in (select trunc(random()*100000)::int as id from generate_series(1,200) t(id))"  
PL/pgSQL function inline_code_block line 4 at PERFORM  

Optimizer choice 2 (hash join):
Suitable for large scan sets when neither table has an index on its join column.

LOG:  duration: 2454.400 ms  plan:  
Query Text: SELECT * from t_in_test where id in (select trunc(random()*100000)::int as id from generate_series(1,200) t(id))  
Hash Join  (cost=37.00..220874.10 rows=5000040 width=37) (actual time=0.413..2454.343 rows=200 loops=1)  
  Output: t_in_test.id, t_in_test.info  
  Hash Cond: (t_in_test.id = ((trunc((random() * '100000'::double precision)))::integer))  
  Buffers: shared hit=83334  
  ->  Seq Scan on public.t_in_test  (cost=0.00..183334.80 rows=10000080 width=37) (actual time=0.010..1182.626 rows=10000000 loops=1)  
        Output: t_in_test.id, t_in_test.info  
        Buffers: shared hit=83334  
  ->  Hash  (cost=34.50..34.50 rows=200 width=4) (actual time=0.221..0.221 rows=200 loops=1)  
        Output: ((trunc((random() * '100000'::double precision)))::integer)  
        Buckets: 1024  Batches: 1  Memory Usage: 16kB  
        ->  HashAggregate  (cost=32.50..34.50 rows=200 width=4) (actual time=0.149..0.177 rows=200 loops=1)  
              Output: ((trunc((random() * '100000'::double precision)))::integer)  
              Group Key: (trunc((random() * '100000'::double precision)))::integer  
              ->  Function Scan on pg_catalog.generate_series t  (cost=0.00..20.00 rows=1000 width=0) (actual time=0.033..0.088 rows=200 loops=1)  
                    Output: (trunc((random() * '100000'::double precision)))::integer  
                    Function Call: generate_series(1, 200)  
CONTEXT:  SQL statement "SELECT * from t_in_test where id in (select trunc(random()*100000)::int as id from generate_series(1,200) t(id))"  
PL/pgSQL function inline_code_block line 4 at PERFORM  

Optimizer choice 3 (merge join):
Suitable for large scan sets where both tables' join columns are indexed.

LOG:  duration: 32.551 ms  plan:  
Query Text: SELECT * from t_in_test where id in (select trunc(random()*100000)::int as id from generate_series(1,200) t(id))  
Merge Join  (cost=42.58..368067.98 rows=5000040 width=37) (actual time=0.561..32.497 rows=200 loops=1)  
  Output: t_in_test.id, t_in_test.info  
  Merge Cond: (t_in_test.id = ((trunc((random() * '100000'::double precision)))::integer))  
  Buffers: shared hit=1112  
  ->  Index Scan using t_in_test_pkey on public.t_in_test  (cost=0.43..343022.64 rows=10000080 width=37) (actual time=0.016..20.499 rows=99905 loops=1)  
        Output: t_in_test.id, t_in_test.info  
        Buffers: shared hit=1108  
  ->  Sort  (cost=42.15..42.65 rows=200 width=4) (actual time=0.268..0.296 rows=200 loops=1)  
        Output: ((trunc((random() * '100000'::double precision)))::integer)  
        Sort Key: ((trunc((random() * '100000'::double precision)))::integer)  
        Sort Method: quicksort  Memory: 34kB  
        Buffers: shared hit=4  
        ->  HashAggregate  (cost=32.50..34.50 rows=200 width=4) (actual time=0.148..0.181 rows=200 loops=1)  
              Output: ((trunc((random() * '100000'::double precision)))::integer)  
              Group Key: (trunc((random() * '100000'::double precision)))::integer  
              ->  Function Scan on pg_catalog.generate_series t  (cost=0.00..20.00 rows=1000 width=0) (actual time=0.035..0.078 rows=200 loops=1)  
                    Output: (trunc((random() * '100000'::double precision)))::integer  
                    Function Call: generate_series(1, 200)  
CONTEXT:  SQL statement "SELECT * from t_in_test where id in (select trunc(random()*100000)::int as id from generate_series(1,200) t(id))"  
PL/pgSQL function inline_code_block line 4 at PERFORM  

Test SQL form 5:

-- select * from table where id in ( values query );  

do language plpgsql $$  
declare  
  v_values text := 'values ( ';  
  v int;  
begin  
  for v in select trunc(random()*100000)::int from generate_series(1,200) t(id)  
  loop  
    v_values := v_values ||v||'),(';  
  end loop;  
  v_values := rtrim( v_values,',(' );  
  execute 'select * from t_in_test where id in ( select * from ('||v_values||') as t(id))';  
end;  
$$;  

Optimizer choice 1 (nestloop join):

LOG:  duration: 1.272 ms  plan:  
Query Text: select * from t_in_test where id in ( select * from (values ( 96474),(39030),(12481),(60519),...省略部分...,(23783),(9253)) as t(id))  
Nested Loop  (cost=3.44..1697.50 rows=5000040 width=37) (actual time=0.130..1.195 rows=200 loops=1)  
  Output: t_in_test.id, t_in_test.info  
  Buffers: shared hit=802  
  ->  HashAggregate  (cost=3.00..5.00 rows=200 width=4) (actual time=0.105..0.143 rows=200 loops=1)  
        Output: "*VALUES*".column1  
        Group Key: "*VALUES*".column1  
        ->  Values Scan on "*VALUES*"  (cost=0.00..2.50 rows=200 width=4) (actual time=0.001..0.040 rows=200 loops=1)  
              Output: "*VALUES*".column1  
  ->  Index Scan using t_in_test_pkey on public.t_in_test  (cost=0.43..8.45 rows=1 width=37) (actual time=0.004..0.005 rows=1 loops=200)  
        Output: t_in_test.id, t_in_test.info  
        Index Cond: (t_in_test.id = "*VALUES*".column1)  
        Buffers: shared hit=802  
CONTEXT:  SQL statement "select * from t_in_test where id in ( select * from (values ( 96474),(39030),(12481),(60519),(70354),(33117),...省略部分...,(15818),(23783),(9253)) as t(id))"  
PL/pgSQL function inline_code_block line 11 at EXECUTE  

Optimizer choice 2 (hash join):

LOG:  duration: 2444.648 ms  plan:  
Query Text: select * from t_in_test where id in ( select * from (values ( 95286),(76612),(56400),(99838),(2155),...省略部分...,(29527),(99252)) as t(id))  
Hash Join  (cost=7.50..220844.60 rows=5000040 width=37) (actual time=0.222..2444.573 rows=200 loops=1)  
  Output: t_in_test.id, t_in_test.info  
  Hash Cond: (t_in_test.id = "*VALUES*".column1)  
  Buffers: shared hit=83334  
  ->  Seq Scan on public.t_in_test  (cost=0.00..183334.80 rows=10000080 width=37) (actual time=0.009..1174.724 rows=10000000 loops=1)  
        Output: t_in_test.id, t_in_test.info  
        Buffers: shared hit=83334  
  ->  Hash  (cost=5.00..5.00 rows=200 width=4) (actual time=0.173..0.173 rows=200 loops=1)  
        Output: "*VALUES*".column1  
        Buckets: 1024  Batches: 1  Memory Usage: 16kB  
        ->  HashAggregate  (cost=3.00..5.00 rows=200 width=4) (actual time=0.101..0.135 rows=200 loops=1)  
              Output: "*VALUES*".column1  
              Group Key: "*VALUES*".column1  
              ->  Values Scan on "*VALUES*"  (cost=0.00..2.50 rows=200 width=4) (actual time=0.001..0.042 rows=200 loops=1)  
                    Output: "*VALUES*".column1  
CONTEXT:  SQL statement "select * from t_in_test where id in ( select * from (values ( 95286),(76612),(56400),...省略部分...,(29527),(99252)) as t(id))"  
PL/pgSQL function inline_code_block line 11 at EXECUTE  

Optimizer choice 3 (merge join):

LOG:  duration: 32.296 ms  plan:  
Query Text: select * from t_in_test where id in ( select * from (values ( 18704),(70725),(55056),...省略部分...,(80068),(28737)) as t(id))  
Merge Semi Join  (cost=10.58..368035.98 rows=5000040 width=37) (actual time=0.560..32.212 rows=200 loops=1)  
  Output: t_in_test.id, t_in_test.info  
  Merge Cond: (t_in_test.id = "*VALUES*".column1)  
  Buffers: shared hit=1110  
  ->  Index Scan using t_in_test_pkey on public.t_in_test  (cost=0.43..343022.64 rows=10000080 width=37) (actual time=0.023..20.733 rows=99962 loops=1)  
        Output: t_in_test.id, t_in_test.info  
        Buffers: shared hit=1110  
  ->  Sort  (cost=10.14..10.64 rows=200 width=4) (actual time=0.105..0.134 rows=200 loops=1)  
        Output: "*VALUES*".column1  
        Sort Key: "*VALUES*".column1  
        Sort Method: quicksort  Memory: 34kB  
        ->  Values Scan on "*VALUES*"  (cost=0.00..2.50 rows=200 width=4) (actual time=0.002..0.035 rows=200 loops=1)  
              Output: "*VALUES*".column1  
CONTEXT:  SQL statement "select * from t_in_test where id in ( select * from (values ( 18704),(70725),(55056),...省略部分...,(28737)) as t(id))"  
PL/pgSQL function inline_code_block line 11 at EXECUTE  

Test SQL form 6:

-- select * from table t1 join (query or values query) t2 on t1.id=t2.id;  

do language plpgsql $$  
declare  
begin  
  perform * from t_in_test t1 join (select trunc(random()*100000)::int as id from generate_series(1,200) t(id)) t2 on (t1.id=t2.id);  
end;  
$$;  

Optimizer choice 1 (nestloop join):

LOG:  duration: 1.327 ms  plan:  
Query Text: SELECT * from t_in_test t1 join (select trunc(random()*100000)::int as id from generate_series(1,200) t(id)) t2 on (t1.id=t2.id)  
Nested Loop  (cost=0.44..8404.50 rows=1000 width=41) (actual time=0.062..1.241 rows=200 loops=1)  
  Output: t1.id, t1.info, ((trunc((random() * '100000'::double precision)))::integer)  
  Buffers: shared hit=802  
  ->  Function Scan on pg_catalog.generate_series t  (cost=0.00..20.00 rows=1000 width=0) (actual time=0.034..0.131 rows=200 loops=1)  
        Output: (trunc((random() * '100000'::double precision)))::integer  
        Function Call: generate_series(1, 200)  
  ->  Index Scan using t_in_test_pkey on public.t_in_test t1  (cost=0.43..8.36 rows=1 width=37) (actual time=0.005..0.005 rows=1 loops=200)  
        Output: t1.id, t1.info  
        Index Cond: (t1.id = ((trunc((random() * '100000'::double precision)))::integer))  
        Buffers: shared hit=802  
CONTEXT:  SQL statement "SELECT * from t_in_test t1 join (select trunc(random()*100000)::int as id from generate_series(1,200) t(id)) t2 on (t1.id=t2.id)"  
PL/pgSQL function inline_code_block line 4 at PERFORM  

Optimizer choice 2 (hash join):

LOG:  duration: 4883.088 ms  plan:  
Query Text: SELECT * from t_in_test t1 join (select trunc(random()*100000)::int as id from generate_series(1,200) t(id)) t2 on (t1.id=t2.id)  
Hash Join  (cost=308335.80..308390.80 rows=1000 width=41) (actual time=4882.749..4883.023 rows=200 loops=1)  
  Output: t1.id, t1.info, ((trunc((random() * '100000'::double precision)))::integer)  
  Hash Cond: (((trunc((random() * '100000'::double precision)))::integer) = t1.id)  
  Buffers: shared hit=83334  
  ->  Function Scan on pg_catalog.generate_series t  (cost=0.00..20.00 rows=1000 width=0) (actual time=0.033..0.125 rows=200 loops=1)  
        Output: (trunc((random() * '100000'::double precision)))::integer  
        Function Call: generate_series(1, 200)  
  ->  Hash  (cost=183334.80..183334.80 rows=10000080 width=37) (actual time=4767.895..4767.895 rows=10000000 loops=1)  
        Output: t1.id, t1.info  
        Buckets: 16777216  Batches: 1  Memory Usage: 804901kB  
        Buffers: shared hit=83334  
        ->  Seq Scan on public.t_in_test t1  (cost=0.00..183334.80 rows=10000080 width=37) (actual time=0.014..1325.338 rows=10000000 loops=1)  
              Output: t1.id, t1.info  
              Buffers: shared hit=83334  
CONTEXT:  SQL statement "SELECT * from t_in_test t1 join (select trunc(random()*100000)::int as id from generate_series(1,200) t(id)) t2 on (t1.id=t2.id)"  
PL/pgSQL function inline_code_block line 4 at PERFORM  

Optimizer choice 3 (merge join):

LOG:  duration: 32.505 ms  plan:  
Query Text: SELECT * from t_in_test t1 join (select trunc(random()*100000)::int as id from generate_series(1,200) t(id)) t2 on (t1.id=t2.id)  
Merge Join  (cost=80.27..368117.67 rows=1000 width=41) (actual time=0.182..32.429 rows=200 loops=1)  
  Output: t1.id, t1.info, ((trunc((random() * '100000'::double precision)))::integer)  
  Merge Cond: (t1.id = ((trunc((random() * '100000'::double precision)))::integer))  
  Buffers: shared hit=1102  
  ->  Index Scan using t_in_test_pkey on public.t_in_test t1  (cost=0.43..343022.64 rows=10000080 width=37) (actual time=0.022..20.782 rows=99360 loops=1)  
        Output: t1.id, t1.info  
        Buffers: shared hit=1102  
  ->  Sort  (cost=79.83..82.33 rows=1000 width=4) (actual time=0.154..0.180 rows=200 loops=1)  
        Output: ((trunc((random() * '100000'::double precision)))::integer)  
        Sort Key: ((trunc((random() * '100000'::double precision)))::integer)  
        Sort Method: quicksort  Memory: 34kB  
        ->  Function Scan on pg_catalog.generate_series t  (cost=0.00..20.00 rows=1000 width=0) (actual time=0.036..0.099 rows=200 loops=1)  
              Output: (trunc((random() * '100000'::double precision)))::integer  
              Function Call: generate_series(1, 200)  
CONTEXT:  SQL statement "SELECT * from t_in_test t1 join (select trunc(random()*100000)::int as id from generate_series(1,200) t(id)) t2 on (t1.id=t2.id)"  
PL/pgSQL function inline_code_block line 4 at PERFORM  

Plan selection guidelines are summarized below; write the SQL to match your workload. The OR form is the least recommended:
index scan:
Discrete (random) page access; suitable for small scan sets.
bitmap scan:
Adds the Recheck cost and a sort by ctid on top of an index scan.
Suitable for larger scan sets; the sort reduces random access and lets the block device prefetch.
seq scan:
Suitable for very large scan sets.
bitmap index scan + BitmapOr/BitmapAnd:
The only indexed option for the OR form, so the OR form is inferior to IN.
nestloop join:
Suitable for small scan sets where one side's join column carries a primary key or unique constraint.
hash join:
Suitable for large scan sets when neither join column is indexed.
merge join:
Suitable for large scan sets where both join columns are indexed.

If you find that a query does not get the optimal execution plan, you can steer the planner by toggling the enable_* switches, or use the pg_hint_plan extension to force a specific scan or join method.

http://pghintplan.osdn.jp/pg_hint_plan.html

git clone git://git.osdn.jp/gitroot/pghintplan/pg_hint_plan.git  
mv pg_hint_plan postgresql-9.5.0/contrib/  
cd postgresql-9.5.0/contrib/pg_hint_plan  
export PATH=/home/digoal/pgsql9.5.0/bin:$PATH  
make  
make install  

psql  
postgres=# create extension pg_hint_plan;  
CREATE EXTENSION  

postgres=# LOAD 'pg_hint_plan';  

postgres=# /*+ NestLoop(t1 t2) */ explain select * from t_in_test t1 join (select trunc(random()*100000)::int as id from generate_series(1,200) t(id)) t2 on (t1.id=t2.id);  
LOG:  duration: 0.000 ms  plan:  
Query Text: /*+ NestLoop(t1 t2) */ explain select * from t_in_test t1 join (select trunc(random()*100000)::int as id from generate_series(1,200) t(id)) t2 on (t1.id=t2.id);  
Nested Loop  (cost=0.44..8404.50 rows=1000 width=41)  
  Output: t1.id, t1.info, ((trunc((random() * '100000'::double precision)))::integer)  
  ->  Function Scan on pg_catalog.generate_series t  (cost=0.00..20.00 rows=1000 width=0)  
        Output: (trunc((random() * '100000'::double precision)))::integer  
        Function Call: generate_series(1, 200)  
  ->  Index Scan using t_in_test_pkey on public.t_in_test t1  (cost=0.43..8.36 rows=1 width=37)  
        Output: t1.id, t1.info  
        Index Cond: (t1.id = ((trunc((random() * '100000'::double precision)))::integer))  
                                        QUERY PLAN                                          
------------------------------------------------------------------------------------------  
 Nested Loop  (cost=0.44..8404.50 rows=1000 width=41)  
   ->  Function Scan on generate_series t  (cost=0.00..20.00 rows=1000 width=0)  
   ->  Index Scan using t_in_test_pkey on t_in_test t1  (cost=0.43..8.36 rows=1 width=37)  
         Index Cond: (id = ((trunc((random() * '100000'::double precision)))::integer))  
(4 rows)  

The PostgreSQL promote process, and timeline reconnection in a one-primary, multi-standby setup

When a physical standby is promoted, PostgreSQL produces three files in the pg_xlog directory.
For example, promoting standby 1 generates the following files in its pg_xlog:

A.partial     (xlog) 
NEWTL_A  (xlog)
NEWTL.history  (history file)

For example, suppose standby 1 has received WAL up to position 0/2D15D7D0 inside segment 00000001000000000000002D, and it is then promoted.
Three files appear in pg_xlog:

00000001000000000000002D.partial
00000002000000000000002D  
        (the contents of 00000001000000000000002D.partial are copied into 00000002000000000000002D)
00000002.history
         1       0/2D15D7D0      no recovery target specified

Now suppose there is another standby, standby 2. How can standby 2 smoothly follow the newly promoted standby 1?
There is one precondition:
1. On timeline 1, standby 2 must not yet have received segment 00000001000000000000002D.
Copy 00000002.history into standby 2's pg_xlog.
After replaying 00000001000000000000002C, standby 2 will request 00000002000000000000002D on the next timeline.
The timelines then connect seamlessly.
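In practice the timeline switch can be followed automatically: point standby 2 at the promoted standby 1 and request the latest timeline in recovery.conf. A sketch (the host name and user below are assumptions, not from the original):

```
# recovery.conf on standby 2 (connection settings are hypothetical)
standby_mode = 'on'
primary_conninfo = 'host=standby1 port=5432 user=replicator'
recovery_target_timeline = 'latest'   # follow timeline switches automatically
```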

Kipmi0 consuming 100% of one CPU core
kipmi0 pegging a single core at 100% runs at nice 19, so it is usually harmless.
Still, its CPU usage can be lowered temporarily:
echo 100 > /sys/module/ipmi_si/parameters/kipmid_max_busy_us
This drops it to about 10%.

To make the setting persist across reboots, set the module parameter:
Create a file in /etc/modprobe.d/, e.g. /etc/modprobe.d/ipmi.conf, and add the following content:
# Prevent kipmi0 from consuming 100% CPU
options ipmi_si kipmid_max_busy_us=100

modinfo ipmi_si 
parm:           bt_debug:debug bitmask, 1=enable, 2=messages, 4=states (int)
parm:           smic_debug:debug bitmask, 1=enable, 2=messages, 4=states (int)
parm:           kcs_debug:debug bitmask, 1=enable, 2=messages, 4=states (int)
parm:           hotmod:Add and remove interfaces.  See Documentation/IPMI.txt in the kernel sources for the gory details.
parm:           trydefaults:Setting this to 'false' will disable the default scan of the KCS and SMIC interface at the standard address (bool)
parm:           type:Defines the type of each interface, each interface separated by commas.  The types are 'kcs', 'smic', and 'bt'.  For example si_type=kcs,bt will set the first interface to kcs and the second to bt (string)
parm:           addrs:Sets the memory address of each interface, the addresses separated by commas.  Only use if an interface is in memory.  Otherwise, set it to zero or leave it blank. (array of ulong)
parm:           ports:Sets the port address of each interface, the addresses separated by commas.  Only use if an interface is a port.  Otherwise, set it to zero or leave it blank. (array of uint)
parm:           irqs:Sets the interrupt of each interface, the addresses separated by commas.  Only use if an interface has an interrupt.  Otherwise, set it to zero or leave it blank. (array of int)
parm:           regspacings:The number of bytes between the start address and each successive register used by the interface.  For instance, if the start address is 0xca2 and the spacing is 2, then the second address is at 0xca4.  Defaults to 1. (array of int)
parm:           regsizes:The size of the specific IPMI register in bytes. This should generally be 1, 2, 4, or 8 for an 8-bit, 16-bit, 32-bit, or 64-bit register.  Use this if you the 8-bit IPMI register has to be read from a larger register. (array of int)
parm:           regshifts:The amount to shift the data read from the. IPMI register, in bits.  For instance, if the data is read from a 32-bit word and the IPMI data is in bit 8-15, then the shift would be 8 (array of int)
parm:           slave_addrs:Set the default IPMB slave address for the controller.  Normally this is 0x20, but can be overridden by this parm.  This is an array indexed by interface number. (array of int)
parm:           force_kipmid:Force the kipmi daemon to be enabled (1) or disabled(0).  Normally the IPMI driver auto-detects this, but the value may be overridden by this parm. (array of int)
parm:           unload_when_empty:Unload the module if no interfaces are specified or found, default is 1.  Setting to 0 is useful for hot add of devices using hotmod. (int)
parm:           kipmid_max_busy_us:Max time (in microseconds) to busy-wait for IPMI data before sleeping. 0 (default) means to wait forever. Set to 100-500 if kipmid is using up a lot of CPU time. (array of uint)



PostgreSQL vs. MySQL compatibility: numeric types

TINYINT

MySQL

  TINYINT[(M)] [UNSIGNED] [ZEROFILL]
  A very small integer. The signed range is -128 to 127. The unsigned range is 0 to 255.

PostgreSQL

TINYINT corresponds to PostgreSQL
  postgres=# create domain tinyint as smallint constraint ck check (value between -128 and 127);
  CREATE DOMAIN
TINYINT [UNSIGNED] corresponds to PostgreSQL
  postgres=# create domain utinyint as smallint constraint ck check (value between 0 and 255);
  CREATE DOMAIN
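A quick sanity check of the domain's behavior (a sketch; the exact error text may vary by version):

```sql
postgres=# select 127::tinyint;   -- within range, succeeds
postgres=# select 200::tinyint;   -- out of range, rejected by the domain's check constraint
ERROR:  value for domain tinyint violates check constraint "ck"
```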

boolean

MySQL

  boolean

PostgreSQL

  boolean

SMALLINT

MySQL

  SMALLINT[(M)] [UNSIGNED] [ZEROFILL]
  A small integer. The signed range is -32768 to 32767. The unsigned range is 0 to 65535.

PostgreSQL

SMALLINT[(M)] corresponds to PostgreSQL
  smallint
SMALLINT[(M)] [UNSIGNED] corresponds to PostgreSQL
  postgres=# create domain usmallint as int constraint ck check (value between 0 and 65535);
  CREATE DOMAIN

MEDIUMINT

MySQL

  MEDIUMINT[(M)] [UNSIGNED] [ZEROFILL]
  The signed range is -8388608 to 8388607. The unsigned range is 0 to 16777215.

PostgreSQL

MEDIUMINT[(M)] corresponds to PostgreSQL
  postgres=# create domain MEDIUMINT as int constraint ck check (value between -8388608 and 8388607);
  CREATE DOMAIN
MEDIUMINT[(M)] [UNSIGNED] corresponds to PostgreSQL
  postgres=# create domain UMEDIUMINT as int constraint ck check (value between 0 and 16777215);
  CREATE DOMAIN

INT

MySQL

  INT[(M)] [UNSIGNED]
  INTEGER[(M)] [UNSIGNED] [ZEROFILL]
  When marked UNSIGNED, it ranges from 0 to 4294967295, otherwise its range is -2147483648 to 2147483647 (SIGNED is the default).

PostgreSQL

INT[(M)], INTEGER[(M)] correspond to PostgreSQL:
  INT
INT[(M)] [UNSIGNED], INTEGER[(M)] [UNSIGNED] correspond to PostgreSQL:
  postgres=# create domain UINT as int8 constraint ck check (value between 0 and 4294967295);
  CREATE DOMAIN

BIGINT

MySQL

  BIGINT[(M)] [UNSIGNED] [ZEROFILL]
  The signed range is -9223372036854775808 to 9223372036854775807. The unsigned range is 0 to 18446744073709551615.

PostgreSQL

BIGINT[(M)] corresponds to PostgreSQL:
  BIGINT
BIGINT[(M)] [UNSIGNED] corresponds to PostgreSQL:
  postgres=# create domain UBIGINT as numeric(20,0) constraint ck check (value between 0 and 18446744073709551615);
  CREATE DOMAIN
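The choice of numeric(20,0) for the unsigned domain can be sanity-checked: 2^64 - 1 exceeds what a signed int8 can hold and needs exactly 20 decimal digits. A quick Python check:

```python
# Unsigned BIGINT max does not fit in a signed 8-byte integer (int8),
# but its 20 decimal digits fit in numeric(20,0).
UNSIGNED_BIGINT_MAX = 2 ** 64 - 1
INT8_MAX = 2 ** 63 - 1

print(UNSIGNED_BIGINT_MAX)             # 18446744073709551615
print(UNSIGNED_BIGINT_MAX > INT8_MAX)  # True: int8 cannot hold it
print(len(str(UNSIGNED_BIGINT_MAX)))   # 20 digits, hence numeric(20,0)
```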

decimal, dec, numeric, fixed

MySQL

  DECIMAL[(M[,D])] [UNSIGNED] [ZEROFILL]
  DEC[(M[,D])] [UNSIGNED] [ZEROFILL]
  NUMERIC[(M[,D])] [UNSIGNED] [ZEROFILL]
  FIXED[(M[,D])] [UNSIGNED] [ZEROFILL]

PostgreSQL

DECIMAL[(M[,D])] corresponds to PostgreSQL:
  decimal[(M[,D])]
DECIMAL[(M[,D])] [UNSIGNED] corresponds to PostgreSQL:
  postgres=# create domain udecimal as numeric constraint ck check (value >=0);
  CREATE DOMAIN
  # A domain's scale and precision cannot be altered.

FLOAT

MySQL

  FLOAT[(M,D)] [UNSIGNED] [ZEROFILL]

PostgreSQL

FLOAT[(M,D)] corresponds to PostgreSQL:
  float4
FLOAT[(M,D)] [UNSIGNED] corresponds to PostgreSQL:
  postgres=# create domain ufloat4 as float4 constraint ck check (value >=0);
  CREATE DOMAIN
  # A domain's scale and precision cannot be altered.

DOUBLE

MySQL

  DOUBLE[(M,D)] [UNSIGNED] [ZEROFILL]
  DOUBLE PRECISION[(M,D)] [UNSIGNED] [ZEROFILL]
  REAL[(M,D)] [UNSIGNED] [ZEROFILL]

PostgreSQL

DOUBLE[(M,D)]
DOUBLE PRECISION[(M,D)]
REAL[(M,D)]   correspond to PostgreSQL:
  float8
DOUBLE[(M,D)] [UNSIGNED]
DOUBLE PRECISION[(M,D)] [UNSIGNED]
REAL[(M,D)] [UNSIGNED]     correspond to PostgreSQL:
  postgres=# create domain ufloat8 as float8 constraint ck check (value >=0);
  CREATE DOMAIN
  # A domain's scale and precision cannot be altered.

bit

MySQL

  BIT[(M)]
  A bit-field type. M indicates the number of bits per value, from 1 to 64. The default is 1 if M is omitted.

PostgreSQL

  BIT[(M)]

PostgreSQL MySQL Compatibility - String Types


char

MySQL

  [NATIONAL] CHAR[(M)] [CHARACTER SET charset_name] [COLLATE collation_name]
  A fixed-length string that is always right-padded with spaces to the specified length when stored. M represents the column length in characters. The range of M is 0 to 255. If M is omitted, the length is 1.

PostgreSQL

[NATIONAL] CHAR[(M)] [COLLATE collation_name] corresponds to PostgreSQL:
  char[(M)] [COLLATE collation_name]

varchar

MySQL

  [NATIONAL] VARCHAR(M) [CHARACTER SET charset_name] [COLLATE collation_name]
  A variable-length string. M represents the maximum column length in characters. The range of M is 0 to 65,535.

PostgreSQL

[NATIONAL] VARCHAR(M) [COLLATE collation_name] corresponds to PostgreSQL:
  VARCHAR(M) [COLLATE collation_name]

BINARY CHAR BYTE

MySQL

  BINARY(M) , CHAR BYTE
  The BINARY type is similar to the CHAR type, but stores binary byte strings rather than non-binary character strings. M represents the column length in bytes.

PostgreSQL

  char([M])

VARBINARY(M)

MySQL

  VARBINARY(M)
  The VARBINARY type is similar to the VARCHAR type, but stores binary byte strings rather than non-binary character strings. M represents the maximum column length in bytes.
  It contains no character set, and comparison and sorting are based on the numeric value of the bytes.

PostgreSQL

  varchar[(M)]

BLOB

MySQL

  TINYBLOB  A BLOB column with a maximum length of 255 bytes. Each TINYBLOB value is stored using a one-byte length prefix that indicates the number of bytes in the value.
  MEDIUMBLOB  A BLOB column with a maximum length of 16,777,215 bytes. Each MEDIUMBLOB value is stored using a three-byte length prefix that indicates the number of bytes in the value.
  LONGBLOB A BLOB column with a maximum length of 4,294,967,295 bytes or 4GB. The effective maximum length of LONGBLOB columns depends on the configured maximum packet size in the client/server protocol and available memory. Each LONGBLOB value is stored using a four-byte length prefix that indicates the number of bytes in the value.
  BLOB[(M)]  A BLOB column with a maximum length of 65,535 bytes. Each BLOB value is stored using a two-byte length prefix that indicates the number of bytes in the value.

PostgreSQL

bytea, up to 1 GB (supports compression, pglz)
large object, up to 4 TB (supports compression)

TEXT

MySQL

  TINYTEXT [CHARACTER SET charset_name] [COLLATE collation_name]
    A TEXT column with a maximum length of 255 characters. The effective maximum length is less if the value contains multi-byte characters. Each TINYTEXT value is stored using a one-byte length prefix that indicates the number of bytes in the value.
  TEXT[(M)] [CHARACTER SET charset_name] [COLLATE collation_name]
    A TEXT column with a maximum length of 65,535 characters. The effective maximum length is less if the value contains multi-byte characters.
  MEDIUMTEXT [CHARACTER SET charset_name] [COLLATE collation_name]
    A TEXT column with a maximum length of 16,777,215 characters. The effective maximum length is less if the value contains multi-byte characters.
  LONGTEXT [CHARACTER SET charset_name] [COLLATE collation_name]
    A TEXT column with a maximum length of 4,294,967,295 or 4GB characters. The effective maximum length is less if the value contains multi-byte characters.

PostgreSQL

  Per-column CHARACTER SET is not supported; CHARACTER SET is a database-level property.
  text
    up to 1 GB
  OR
  varchar[(M)]
    up to 1 GB

enum

MySQL

  ENUM('value1','value2',...) [CHARACTER SET charset_name] [COLLATE collation_name]
    An enumeration. A string object that can have only one value, chosen from the list of values 'value1', 'value2', ..., NULL or the special '' error value. 
    In theory, an ENUM column can have a maximum of 65,535 distinct values;
    in practice, the real maximum depends on many factors. ENUM values are represented internally as integers.

PostgreSQL

  Per-column CHARACTER SET is not supported; CHARACTER SET is a database-level property.
  enum

SET

MySQL

  SET('value1','value2',...) [CHARACTER SET charset_name] [COLLATE collation_name]
    A set. A string object that can have zero or more values, each of which must be chosen from the list of values 'value1', 'value2', ... 
    A SET column can have a maximum of 64 members. 
    SET values are represented internally as integers.

PostgreSQL

  enum

PostgreSQL MySQL Compatibility - Date and Time Types


DATE

MySQL

  DATE
  A date. The supported range is '1000-01-01' to '9999-12-31'.
  '0000-00-00' is a permitted special value (zero-date), unless the NO_ZERO_DATE SQL_MODE is used.
  Also, individual components of a date can be set to 0 (for example: '2015-00-12'), unless the NO_ZERO_IN_DATE SQL_MODE is used.

PostgreSQL

  DATE
  However, PostgreSQL does not support '0000-00-00'. By patching ValidateDate in the source code, '0000-00-00' can be converted and stored as '0001-01-01 BC' automatically.

TIME

MySQL

  TIME [(<microsecond precision>)]
  A time. The range is '-838:59:59.999999' to '838:59:59.999999'. Microsecond precision can be from 0-6; if not specified 0 is used. 

PostgreSQL

  TIME [(<microsecond precision>)]

DATETIME

MySQL

  DATETIME [(microsecond precision)]
  A date and time combination. The supported range is '1000-01-01 00:00:00.000000' to '9999-12-31 23:59:59.999999'. 
  MariaDB displays DATETIME values in 'YYYY-MM-DD HH:MM:SS' format, but allows assignment of values to DATETIME columns using either strings or numbers.
  '0000-00-00' is a permitted special value (zero-date), unless the NO_ZERO_DATE SQL_MODE is used. 
  Also, individual components of a date can be set to 0 (for example: '2015-00-12'), unless the NO_ZERO_IN_DATE SQL_MODE is used. 
  In many cases, the result of an expression involving a zero-date, or a date with zero-parts, is NULL. 
  If the ALLOW_INVALID_DATES SQL_MODE is enabled, if the day part is in the range between 1 and 31, the date does not produce any error, even for months that have less than 31 days.

PostgreSQL

  timestamp [(microsecond precision)]
  timestamptz [(microsecond precision)]

TIMESTAMP

MySQL

  TIMESTAMP [(<microsecond precision>)]
  A timestamp in the format YYYY-MM-DD HH:MM:SS.
  The timestamp field is generally used to define at which moment in time a row was added or updated and by default will automatically be assigned the current datetime when a record is inserted or updated.

PostgreSQL

  timestamp [(microsecond precision)]
  timestamptz [(microsecond precision)]

YEAR

MySQL

  YEAR[(4)]
  A year in two-digit or four-digit format. The default is four-digit format. Note that the two-digit format has been deprecated since 5.5.27.
  In four-digit format, the allowable values are 1901 to 2155, and 0000.

PostgreSQL

  postgres=# create domain year as int2 constraint ck check (value between 1901 and 2155);
  CREATE DOMAIN

PostgreSQL Constructor Usage Examples


Constructing line segments, points, lines, and ranges

postgres=# select proname,pg_get_function_arguments(oid) parameter,pg_get_function_result(oid) result,prosrc from pg_proc where prosrc ~ 'construct';
  proname  |                           parameter                            |  result   |       prosrc       
-----------+----------------------------------------------------------------+-----------+--------------------
 lseg      | point, point                                                   | lseg      | lseg_construct
 point     | double precision, double precision                             | point     | construct_point
 line      | point, point                                                   | line      | line_construct_pp
 int4range | integer, integer                                               | int4range | range_constructor2
 int4range | integer, integer, text                                         | int4range | range_constructor3
 numrange  | numeric, numeric                                               | numrange  | range_constructor2
 numrange  | numeric, numeric, text                                         | numrange  | range_constructor3
 tsrange   | timestamp without time zone, timestamp without time zone       | tsrange   | range_constructor2
 tsrange   | timestamp without time zone, timestamp without time zone, text | tsrange   | range_constructor3
 tstzrange | timestamp with time zone, timestamp with time zone             | tstzrange | range_constructor2
 tstzrange | timestamp with time zone, timestamp with time zone, text       | tstzrange | range_constructor3
 daterange | date, date                                                     | daterange | range_constructor2
 daterange | date, date, text                                               | daterange | range_constructor3
 int8range | bigint, bigint                                                 | int8range | range_constructor2
 int8range | bigint, bigint, text                                           | int8range | range_constructor3
(15 rows)

Constructing arrays

postgres=# select array[1,2,3];
  array  
---------
 {1,2,3}
(1 row)

Constructing records

postgres=# select row(1,2,'ab');
   row    
----------
 (1,2,ab)
(1 row)

Constructing tables (VALUES lists)

postgres=# select * from ( values (1,2,'2014-01-01'),(1,2,'2014-01-01'),(1,2,'2014-01-01') ) as t(c1,c2,c3);
 c1 | c2 |     c3     
----+----+------------
  1 |  2 | 2014-01-01
  1 |  2 | 2014-01-01
  1 |  2 | 2014-01-01
(3 rows)

PostgreSQL MySQL Compatibility - GIS Types


PostGIS's GIS functionality is far more powerful than MySQL's; this article lists only the parts that MySQL supports.
For PostGIS details, see:
http://postgis.net/docs/manual-2.2/reference.html
PostGIS has hundreds of functions and offers very strong GIS support.

POINT

MySQL

  POINT
    PointFromText('POINT(10 10)')
    PointFromWKB(AsWKB(PointFromText('POINT(10 20)')))

PostgreSQL

  # PostgreSQL
  point
    point( x , y )
  # PostGIS
  point

LINESTRING

MySQL

  LINESTRING

  CREATE TABLE gis_line  (g LINESTRING);
  SHOW FIELDS FROM gis_line;
  INSERT INTO gis_line VALUES
    (LineFromText('LINESTRING(0 0,0 10,10 0)')),
    (LineStringFromText('LINESTRING(10 10,20 10,20 20,10 20,10 10)')),
    (LineStringFromWKB(AsWKB(LineString(Point(10, 10), Point(40, 10)))));

  GLENGTH
    Length of a LineString value
  ST_ENDPOINT
    Returns the endpoint of a LineString
  ST_NUMPOINTS
    Returns the number of Point objects in a LineString
  ST_POINTN
    Returns the N-th Point in the LineString
  ST_STARTPOINT
    Returns the start point of a LineString

PostgreSQL

  # PostGIS
  LINESTRING

  ST_Length — Returns the 2D length of the geometry if it is a LineString or MultiLineString. geometry are in units of spatial reference and geography are in meters (default spheroid)
  ST_Length2D — Returns the 2-dimensional length of the geometry if it is a linestring or multi-linestring. This is an alias for ST_Length
  ST_3DLength — Returns the 3-dimensional or 2-dimensional length of the geometry if it is a linestring or multi-linestring.
  ST_LengthSpheroid — Calculates the 2D or 3D length of a linestring/multilinestring on an ellipsoid. This is useful if the coordinates of the geometry are in longitude/latitude and a length is desired without reprojection.
  ST_Length2D_Spheroid — Calculates the 2D length of a linestring/multilinestring on an ellipsoid. This is useful if the coordinates of the geometry are in longitude/latitude and a length is desired without reprojection.

  ST_EndPoint — Returns the last point of a LINESTRING or CIRCULARLINESTRING geometry as a POINT.

  ST_NumPoints — Return the number of points in an ST_LineString or ST_CircularString value.

  ST_PointN — Return the Nth point in the first linestring or circular linestring in the geometry. Return NULL if there is no linestring in the geometry.

  ST_StartPoint — Returns the first point of a LINESTRING geometry as a POINT.


POLYGON

MySQL

  Polygon properties

  ST_AREA
    Area of a Polygon
  ST_ExteriorRing
    Returns the exterior ring of a Polygon as a LineString
  ST_InteriorRingN
    Returns the N-th interior ring for a Polygon
  ST_NUMINTERIORRINGS
    Number of interior rings in a Polygon

PostgreSQL

  # PostGIS

  ST_Area — Returns the area of the surface if it is a Polygon or MultiPolygon. For geometry, a 2D Cartesian area is determined with units specified by the SRID. For geography, area is determined on a curved surface with units in square meters.
  ST_ExteriorRing — Returns a line string representing the exterior ring of the POLYGON geometry. Return NULL if the geometry is not a polygon. Will not work with MULTIPOLYGON
  ST_InteriorRingN — Return the Nth interior linestring ring of the polygon geometry. Return NULL if the geometry is not a polygon or the given N is out of range.
  ST_NumInteriorRings — Return the number of interior rings of the a polygon in the geometry. This will work with POLYGON and return NULL for a MULTIPOLYGON type or any other type

  ST_GeometryN — Return the 1-based Nth geometry if the geometry is a GEOMETRYCOLLECTION, (MULTI)POINT, (MULTI)LINESTRING, MULTICURVE or (MULTI)POLYGON, POLYHEDRALSURFACE Otherwise, return NULL.
  ST_IsEmpty — Returns true if this Geometry is an empty geometrycollection, polygon, point etc.
  ST_NRings — If the geometry is a polygon or multi-polygon returns the number of rings.
  ST_ForceRHR — Forces the orientation of the vertices in a polygon to follow the Right-Hand-Rule.
  ST_3DIntersects — Returns TRUE if the Geometries "spatially intersect" in 3d - only for points, linestrings, polygons, polyhedral surface (area). With SFCGAL backend enabled also supports TINS
  ST_Perimeter — Return the length measurement of the boundary of an ST_Surface or ST_MultiSurface geometry or geography. (Polygon, MultiPolygon). geometry measurement is in units of spatial reference and geography is in meters.
  ST_Perimeter2D — Returns the 2-dimensional perimeter of the geometry, if it is a polygon or multi-polygon. This is currently an alias for ST_Perimeter.
  ST_3DPerimeter — Returns the 3-dimensional perimeter of the geometry, if it is a polygon or multi-polygon.
  ST_CurveToLine — Converts a CIRCULARSTRING/CURVEPOLYGON to a LINESTRING/POLYGON
  ST_DumpRings — Returns a set of geometry_dump rows, representing the exterior and interior rings of a polygon.
  ST_LineToCurve — Converts a LINESTRING/POLYGON to a CIRCULARSTRING, CURVEPOLYGON
  ST_MinimumBoundingCircle — Returns the smallest circle polygon that can fully contain a geometry. Default uses 48 segments per quarter circle.
  ST_Polygonize — Aggregate. Creates a GeometryCollection containing possible polygons formed from the constituent linework of a set of geometries.
  ST_SimplifyPreserveTopology — Returns a "simplified" version of the given geometry using the Douglas-Peucker algorithm. Will avoid creating derived geometries (polygons in particular) that are invalid.
  ST_LocateAlong — Return a derived geometry collection value with elements that match the specified measure. Polygonal elements are not supported.
  ST_LocateBetween — Return a derived geometry collection value with elements that match the specified range of measures inclusively. Polygonal elements are not supported.


MultiPoint

MySQL

  MultiPoint(pt1,pt2,...)

PostgreSQL

  # PostGIS
  MultiPoint

MultiPolygon

MySQL

  MultiPolygon(poly1,poly2,...)

PostgreSQL

  # PostGIS
  MultiPolygon

ST_BUFFER

MySQL

  ST_BUFFER(g1,r),  BUFFER(g1,r)
  Returns a geometry that represents all points whose distance from geometry g1 is less than or equal to distance, or radius

PostgreSQL

  # PostGIS
  ST_Buffer — (T) Returns a geometry covering all points within a given distance from the input geometry.

ST_ConvexHull

MySQL

  ST_ConvexHull (g), ConvexHull(g)
  Given a geometry, returns a geometry that is the minimum convex geometry enclosing all geometries within the set. Returns NULL if the geometry value is NULL or an empty value.

PostgreSQL

  # PostGIS
  ST_ConvexHull — The convex hull of a geometry represents the minimum convex geometry that encloses all geometries within the set.

ST_INTERSECTION

MySQL

  ST_INTERSECTION(g1,g2)
  Returns a geometry that is the intersection, or shared portion, of geometry g1 and geometry g2.

PostgreSQL

  # PostGIS
  ST_Intersection — (T) Returns a geometry that represents the shared portion of geomA and geomB.
  ST_Difference — Returns a geometry that represents that part of geometry A that does not intersect with geometry B.

ST_PointOnSurface

MySQL

  ST_PointOnSurface (g), PointOnSurface(g)
  Given a geometry, returns a POINT guaranteed to intersect a surface.

PostgreSQL

  # PostGIS
  ST_PointOnSurface — Returns a POINT guaranteed to lie on the surface.

ST_SYMDIFFERENCE

MySQL

  ST_SYMDIFFERENCE(g1,g2)
  Returns a geometry that represents the portions of geometry g1 and geometry g2 that don't intersect.

PostgreSQL

  # PostGIS
  ST_Difference — Returns a geometry that represents that part of geometry A that does not intersect with geometry B
  ST_SymDifference — Returns a geometry that represents the portions of A and B that do not intersect. It is called a symmetric difference because ST_SymDifference(A,B) = ST_SymDifference(B,A).

ST_UNION

MySQL

  ST_UNION(g1,g2)
  Returns a geometry that is the union of the geometry g1 and geometry g2.

PostgreSQL

  # PostGIS
  ST_Union — Returns a geometry that represents the point set union of the Geometries.
  ST_UnaryUnion — Like ST_Union, but working at the geometry component level.

PostGIS reference

http://postgis.net/docs/manual-2.2/reference.html
PostGIS has hundreds of functions and offers very strong GIS support.

PostgreSQL MySQL Compatibility - Auto-increment


AUTO_INCREMENT , sequence

MySQL

  AUTO_INCREMENT 
    The AUTO_INCREMENT attribute can be used to generate a unique identity for new rows. 
    When you insert a new record to the table, and the auto_increment field is NULL or DEFAULT, the value will automatically be incremented. 
    This also applies to 0, unless the NO_AUTO_VALUE_ON_ZERO SQL_MODE is enabled.

PostgreSQL

  serial4
  create table test(id serial4);
    or
  serial8
  create table test(id serial8);
    or
  create sequence seq;
  create table test(id int default nextval('seq'::regclass));

PostgreSQL MySQL Compatibility - NULL


NULL compare operator <=>

MySQL

SELECT 99 <=> NULL, NULL <=> NULL;
+-------------+---------------+
| 99 <=> NULL | NULL <=> NULL |
+-------------+---------------+
|           0 |             1 |
+-------------+---------------+

IS NULL
IS NOT NULL

PostgreSQL

A separate function and <=> operator must be created for each data type; for int:
create or replace function nulleq(int,int) returns int as $$
begin
  if $1 is null and $2 is null then
    return 1;
  elsif $1 is null or $2 is null then
    return 0;
  else
    return ($1 = $2)::int;
  end if;
end;
$$ language plpgsql;

postgres=# create operator <=> (procedure=nulleq,leftarg=int,rightarg=int);
CREATE OPERATOR
postgres=# select 1 <=> null;
 ?column? 
----------
        0
(1 row)

postgres=# select null <=> null;
 ?column? 
----------
        1
(1 row)

IS NULL
IS NOT NULL
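The MySQL <=> truth table can be modeled outside SQL as well; a small Python sketch with None standing in for NULL (the function name is illustrative):

```python
# Null-safe equality: NULL <=> NULL is 1, NULL <=> x is 0,
# and two non-NULLs compare by ordinary equality.
def nulleq(a, b):
    if a is None and b is None:
        return 1
    if a is None or b is None:
        return 0
    return int(a == b)

print(nulleq(99, None), nulleq(None, None))  # 0 1, as in the MySQL example
```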

coalesce

MySQL

  coalesce

PostgreSQL

postgres=# select coalesce(null,1,2);
 coalesce 
----------
        1
(1 row)

postgres=# select coalesce(null,null,2);
 coalesce 
----------
        2
(1 row)

postgres=# select coalesce('a',null,'b');
 coalesce 
----------
 a
(1 row)

order

MySQL

  SELECT col1 FROM tab ORDER BY ISNULL(col1), col1;  -- non-NULLs first, NULLs last
  SELECT col1 FROM tab ORDER BY IF(col1 IS NULL, 0, 1), col1 DESC;  -- Descending order, with NULLs first

  All NULL values are also regarded as equivalent for the purposes of the DISTINCT and GROUP BY clauses.

PostgreSQL

By default, NULLs sort as greater than all other values.
postgres=# create table test(id int);
CREATE TABLE
postgres=# insert into test values (1),(2),(3),(null),(null);
INSERT 0 5
postgres=# select * from test order by id;
 id 
----
  1
  2
  3


(5 rows)
postgres=# select * from test order by id nulls first;
 id 
----


  1
  2
  3
(5 rows)
postgres=# select * from test order by id nulls last;
 id 
----
  1
  2
  3


(5 rows)
postgres=# select * from test order by id desc;
 id 
----


  3
  2
  1
(5 rows)

postgres=# select * from test order by id desc nulls first;
 id 
----


  3
  2
  1
(5 rows)

postgres=# select * from test order by id desc nulls last;
 id 
----
  3
  2
  1


(5 rows)
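The four orderings above can be emulated outside SQL too; a Python sketch with None as NULL, whose defaults follow PostgreSQL (NULLS LAST when ascending, NULLS FIRST when descending; the function name is illustrative):

```python
def order_by(rows, desc=False, nulls_first=None):
    # PostgreSQL default: NULLs are treated as larger than any value,
    # so they come last ascending and first descending.
    if nulls_first is None:
        nulls_first = desc
    nulls = [r for r in rows if r is None]
    vals = sorted((r for r in rows if r is not None), reverse=desc)
    return nulls + vals if nulls_first else vals + nulls

rows = [1, 2, 3, None, None]
print(order_by(rows))             # ascending, NULLs last
print(order_by(rows, desc=True))  # descending, NULLs first
```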

ISNULL, NULLIF, IFNULL

MySQL

  IFNULL(expr1,expr2)
    If expr1 is not NULL, IFNULL() returns expr1; otherwise it returns expr2. IFNULL() returns a numeric or string value, depending on the context in which it is used.

SELECT IFNULL(1,0); 
+-------------+
| IFNULL(1,0) |
+-------------+
|           1 |
+-------------+

SELECT IFNULL(NULL,10);
+-----------------+
| IFNULL(NULL,10) |
+-----------------+
|              10 |
+-----------------+

  NULLIF(expr1,expr2)
    Returns NULL if expr1 = expr2 is true, otherwise returns expr1. This is the same as CASE WHEN expr1 = expr2 THEN NULL ELSE expr1 END.

SELECT NULLIF(1,1);
+-------------+
| NULLIF(1,1) |
+-------------+
|        NULL |
+-------------+

SELECT NULLIF(1,2);
+-------------+
| NULLIF(1,2) |
+-------------+
|           1 |
+-------------+

  ISNULL(expr)
    If expr is NULL, ISNULL() returns 1, otherwise it returns 0.

SELECT ISNULL(1+1);
+-------------+
| ISNULL(1+1) |
+-------------+
|           0 |
+-------------+

SELECT ISNULL(1/0);
+-------------+
| ISNULL(1/0) |
+-------------+
|           1 |
+-------------+


PostgreSQL

postgres=# create or replace function ifnull(int,int) returns int as $$
  select case when $1 is not null then $1 else $2 end;
$$ language sql;
CREATE FUNCTION
postgres=# select ifnull(null,2);
 ifnull 
--------
      2
(1 row)

postgres=# select ifnull(1,3);
 ifnull 
--------
      1
(1 row)

nullif
postgres=# select nullif(1,1);
 nullif 
--------

(1 row)

postgres=# select nullif(1,2);
 nullif 
--------
      1
(1 row)

isnull
postgres=# create or replace function isnull(anyelement) returns int as $$
select case when $1 is null then 1 else 0 end;              
$$ language sql;
CREATE FUNCTION

postgres=# create table ttt(id int);
CREATE TABLE
postgres=# insert into ttt values (null);
INSERT 0 1
postgres=# insert into ttt values (1);
INSERT 0 1
postgres=# select isnull(id),id from ttt;
 isnull | id 
--------+----
      1 |   
      0 |  1
(2 rows)

PostgreSQL automatically cleans the temporary-files directory at startup


When running certain SQL, if work_mem is insufficient, temporary disk space is used, for example for sorts, aggregates, and GROUP BY.
If the database crashes suddenly, or temp files are left behind for some other reason,
they are cleaned up automatically when the database restarts.

PostmasterMain(int argc, char *argv[])
  call
RemovePgTempFiles(void)
  call
RemovePgTempFilesInDir(const char *tmpdirname)

The code is as follows:
src/backend/storage/file/fd.c

/* Process one pgsql_tmp directory for RemovePgTempFiles */
static void
RemovePgTempFilesInDir(const char *tmpdirname)
{
    DIR        *temp_dir;
    struct dirent *temp_de;
    char        rm_path[MAXPGPATH];

    temp_dir = AllocateDir(tmpdirname);
    if (temp_dir == NULL)
    {
        /* anything except ENOENT is fishy */
        if (errno != ENOENT)
            elog(LOG,
                 "could not open temporary-files directory \"%s\": %m",
                 tmpdirname);
        return;
    }

    while ((temp_de = ReadDir(temp_dir, tmpdirname)) != NULL)
    {
        if (strcmp(temp_de->d_name, ".") == 0 ||
            strcmp(temp_de->d_name, "..") == 0)
            continue;

        snprintf(rm_path, sizeof(rm_path), "%s/%s",
                 tmpdirname, temp_de->d_name);

        if (strncmp(temp_de->d_name,
                    PG_TEMP_FILE_PREFIX,
                    strlen(PG_TEMP_FILE_PREFIX)) == 0)
            unlink(rm_path);    /* note we ignore any error */
        else
            elog(LOG,
                 "unexpected file found in temporary-files directory: \"%s\"",
                 rm_path);
    }

    FreeDir(temp_dir);
}
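For illustration, a rough Python analog of the C function above (the prefix constant mirrors the server's PG_TEMP_FILE_PREFIX, i.e. "pgsql_tmp"; this is a sketch of the cleanup logic, not the server code):

```python
import os

# Mirrors the server's PG_TEMP_FILE_PREFIX ("pgsql_tmp").
PG_TEMP_FILE_PREFIX = "pgsql_tmp"

def remove_pg_temp_files_in_dir(tmpdirname):
    """Unlink temp files in one directory; warn about unexpected entries."""
    removed = []
    try:
        entries = os.listdir(tmpdirname)  # unlike the C loop, no "." / ".." here
    except FileNotFoundError:
        return removed  # ENOENT is not an error, as in the C code
    for name in entries:
        rm_path = os.path.join(tmpdirname, name)
        if name.startswith(PG_TEMP_FILE_PREFIX):
            try:
                os.unlink(rm_path)  # unlink errors are ignored, as in the C code
                removed.append(name)
            except OSError:
                pass
        else:
            print('unexpected file found in temporary-files directory: "%s"' % rm_path)
    return removed
```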

PostgreSQL Lock Wait Tracing


When PostgreSQL logs a long-running SQL statement, lock wait time is included in the duration, and currently the log does not report the lock wait time separately.

shared_preload_libraries='auto_explain'
auto_explain.log_min_duration='1s'
auto_explain.log_analyze=true
auto_explain.log_buffers=true
auto_explain.log_timing=true
auto_explain.log_triggers=true
auto_explain.log_verbose=true
auto_explain.log_nested_statements=true

pg_ctl restart -m fast

Example:

session A:
postgres=# create table test2(id int, info text);
CREATE TABLE
postgres=# insert into test2 values (1,'test');
INSERT 0 1
postgres=# begin;
BEGIN
postgres=# update test2 set info='a' where id=1;
UPDATE 1

session B:
postgres=# update test2 set info='b' ;
wait

session A:
postgres=# end;
COMMIT

session B:
UPDATE 1

The log shows the following:

2016-03-15 15:44:23.618 CST,"postgres","postgres",106815,"[local]",56e7bc6c.1a13f,3,"UPDATE",2016-03-15 15:40:28 CST,3/12,574614687,LOG,00000,"duration: 32038.420 ms  plan:
Query Text: update test2 set info='b' ;
Update on test2  (cost=0.00..22.70 rows=1270 width=10) (actual time=32038.418..32038.418 rows=0 loops=1)
  Buffers: shared hit=5
  ->  Seq Scan on test2  (cost=0.00..22.70 rows=1270 width=10) (actual time=0.014..0.015 rows=1 loops=1)
        Buffers: shared hit=1",,,,,,,,"explain_ExecutorEnd, auto_explain.c:333","psql"
2016-03-15 15:44:23.618 CST,"postgres","postgres",106815,"[local]",56e7bc6c.1a13f,4,"UPDATE",2016-03-15 15:40:28 CST,3/0,0,LOG,00000,"duration: 32039.289 ms  statement: update test2 set info='b' ;",,,,,,,,"exec_simple_query, postgres.c:1181","psql"

The lock wait time is included in the logged duration.

To analyze lock waits, it is best to also set the following parameters:

log_lock_waits = on
deadlock_timeout = 1s

Then, when a session has waited on a lock longer than deadlock_timeout, a log entry is written telling you which PID holds the lock and which lock is being waited on:

2016-03-15 16:30:57.129 CST,"postgres","postgres",10220,"[local]",56e7c3df.27ec,32,"UPDATE waiting",2016-03-15 16:12:15 CST,3/17,574614691,LOG,00000,"process 10220 still waiting for ShareLock on transaction 574614690 after 1000.036 ms","Process holding the lock: 9725. Wait queue: 10220.",,,,"while updating tuple (0,5) in relation ""test2""","update test2 set info='b' ;",,"ProcSleep, proc.c:1323","psql"

After the lock is acquired, another log entry is written:

2016-03-15 16:32:36.323 CST,"postgres","postgres",10220,"[local]",56e7c3df.27ec,33,"UPDATE waiting",2016-03-15 16:12:15 CST,3/17,574614691,LOG,00000,"process 10220 acquired ShareLock on transaction 574614690 after 100194.020 ms",,,,,"while updating tuple (0,5) in relation ""test2""","update test2 set info='b' ;",,"ProcSleep, proc.c:1327","psql"

By correlating these two log entries with the long-SQL log entry, you can determine how much of the long SQL's time was spent waiting on locks.
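Extracting the wait time from such entries can be scripted; a minimal Python sketch whose regex assumes the "acquired ... after N ms" wording shown above (function name illustrative):

```python
import re

def lock_wait_ms(log_line):
    """Extract the lock wait (ms) from an 'acquired ... after N ms' log entry."""
    m = re.search(r"acquired \w+ on \w+ \d+ after ([\d.]+) ms", log_line)
    return float(m.group(1)) if m else None

entry = ('process 10220 acquired ShareLock on transaction 574614690 '
         'after 100194.020 ms')
print(lock_wait_ms(entry))  # 100194.02
```

Subtracting this value from the statement's logged duration gives the time the SQL spent actually executing.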

To trace more detailed lock information, modify a header file and recompile:

vi src/include/pg_config_manual.h
#define LOCK_DEBUG

make clean
make distclean
configure again
make -j 32
make install -j 32

vi $PGDATA/postgresql.conf
trace_locks = true

pg_ctl restart -m fast

For the case above, the following lock information can be traced:

2016-03-15 16:12:08.389 CST,,,9725,"",56e7c3d8.25fd,1,"",2016-03-15 16:12:08 CST,,0,LOG,00000,"connection received: host=[local]",,,,,,,,"BackendInitialize, postmaster.c:4081",""
2016-03-15 16:12:08.390 CST,"postgres","postgres",9725,"[local]",56e7c3d8.25fd,2,"authentication",2016-03-15 16:12:08 CST,2/11,0,LOG,00000,"connection authorized: user=postgres database=postgres",,,,,,,,"PerformAuthentication, postinit.c:259",""
2016-03-15 16:12:08.391 CST,"postgres","postgres",9725,"[local]",56e7c3d8.25fd,3,"startup",2016-03-15 16:12:08 CST,2/0,0,LOG,00000,"LockReleaseAll: lockmethod=1",,,,,,,,"LockReleaseAll, lock.c:1951","psql"
2016-03-15 16:12:08.391 CST,"postgres","postgres",9725,"[local]",56e7c3d8.25fd,4,"startup",2016-03-15 16:12:08 CST,2/0,0,LOG,00000,"LockReleaseAll done",,,,,,,,"LockReleaseAll, lock.c:2198","psql"
2016-03-15 16:12:13.968 CST,"postgres","postgres",9725,"[local]",56e7c3d8.25fd,5,"UPDATE",2016-03-15 16:12:08 CST,2/12,0,LOG,00000,"LockAcquire: lock [13241,53029] RowExclusiveLock",,,,,,"update test2 set info='a' where id=1;",8,"LockAcquireExtended, lock.c:724","psql"
2016-03-15 16:12:13.969 CST,"postgres","postgres",9725,"[local]",56e7c3d8.25fd,6,"UPDATE",2016-03-15 16:12:08 CST,2/12,0,LOG,00000,"LockAcquire: lock [13241,53029] RowExclusiveLock",,,,,,"update test2 set info='a' where id=1;",,"LockAcquireExtended, lock.c:724","psql"
2016-03-15 16:12:15.815 CST,,,10220,"",56e7c3df.27ec,1,"",2016-03-15 16:12:15 CST,,0,LOG,00000,"connection received: host=[local]",,,,,,,,"BackendInitialize, postmaster.c:4081",""
2016-03-15 16:12:15.816 CST,"postgres","postgres",10220,"[local]",56e7c3df.27ec,2,"authentication",2016-03-15 16:12:15 CST,3/15,0,LOG,00000,"connection authorized: user=postgres database=postgres",,,,,,,,"PerformAuthentication, postinit.c:259",""
2016-03-15 16:12:15.817 CST,"postgres","postgres",10220,"[local]",56e7c3df.27ec,3,"startup",2016-03-15 16:12:15 CST,3/0,0,LOG,00000,"LockReleaseAll: lockmethod=1",,,,,,,,"LockReleaseAll, lock.c:1951","psql"
2016-03-15 16:12:15.817 CST,"postgres","postgres",10220,"[local]",56e7c3df.27ec,4,"startup",2016-03-15 16:12:15 CST,3/0,0,LOG,00000,"LockReleaseAll done",,,,,,,,"LockReleaseAll, lock.c:2198","psql"
2016-03-15 16:12:16.777 CST,"postgres","postgres",10220,"[local]",56e7c3df.27ec,5,"UPDATE",2016-03-15 16:12:15 CST,3/16,0,LOG,00000,"LockAcquire: lock [13241,53029] RowExclusiveLock",,,,,,"update test2 set info='b' ;",8,"LockAcquireExtended, lock.c:724","psql"
2016-03-15 16:12:16.778 CST,"postgres","postgres",10220,"[local]",56e7c3df.27ec,6,"UPDATE",2016-03-15 16:12:15 CST,3/16,0,LOG,00000,"LockAcquire: lock [13241,53029] RowExclusiveLock",,,,,,"update test2 set info='b' ;",,"LockAcquireExtended, lock.c:724","psql"
2016-03-15 16:12:16.778 CST,"postgres","postgres",10220,"[local]",56e7c3df.27ec,7,"UPDATE",2016-03-15 16:12:15 CST,3/16,574614689,LOG,00000,"LockAcquire: lock [13241,53029] ExclusiveLock",,,,,,"update test2 set info='b' ;",,"LockAcquireExtended, lock.c:724","psql"
2016-03-15 16:12:16.778 CST,"postgres","postgres",10220,"[local]",56e7c3df.27ec,8,"UPDATE",2016-03-15 16:12:15 CST,3/16,574614689,LOG,00000,"LockAcquire: new: lock(0x7f88ead65ed8) id(13241,53029,0,3,3,1) grantMask(0) req(0,0,0,0,0,0,0)=0 grant(0,0,0,0,0,0,0)=0 wait(0) type(ExclusiveLock)",,,,,,"update test2 set info='b' ;",,"LOCK_PRINT, lock.c:319","psql"
2016-03-15 16:12:16.778 CST,"postgres","postgres",10220,"[local]",56e7c3df.27ec,9,"UPDATE",2016-03-15 16:12:15 CST,3/16,574614689,LOG,00000,"LockAcquire: new: proclock(0x7f88eadf4878) lock(0x7f88ead65ed8) method(1) proc(0x7f88eb0211f0) hold(0)",,,,,,"update test2 set info='b' ;",,"PROCLOCK_PRINT, lock.c:331","psql"
2016-03-15 16:12:16.778 CST,"postgres","postgres",10220,"[local]",56e7c3df.27ec,10,"UPDATE",2016-03-15 16:12:15 CST,3/16,574614689,LOG,00000,"LockCheckConflicts: no conflict: proclock(0x7f88eadf4878) lock(0x7f88ead65ed8) method(1) proc(0x7f88eb0211f0) hold(0)",,,,,,"update test2 set info='b' ;",,"PROCLOCK_PRINT, lock.c:331","psql"
2016-03-15 16:12:16.778 CST,"postgres","postgres",10220,"[local]",56e7c3df.27ec,11,"UPDATE",2016-03-15 16:12:15 CST,3/16,574614689,LOG,00000,"GrantLock: lock(0x7f88ead65ed8) id(13241,53029,0,3,3,1) grantMask(80) req(0,0,0,0,0,0,1)=1 grant(0,0,0,0,0,0,1)=1 wait(0) type(ExclusiveLock)",,,,,,"update test2 set info='b' ;",,"LOCK_PRINT, lock.c:319","psql"
2016-03-15 16:17:05.528 CST,"postgres","postgres",9725,"[local]",56e7c3d8.25fd,7,"COMMIT",2016-03-15 16:12:08 CST,2/0,574614688,LOG,00000,"LockReleaseAll: lockmethod=1",,,,,,"end;",,"LockReleaseAll, lock.c:1951","psql"
2016-03-15 16:17:05.528 CST,"postgres","postgres",9725,"[local]",56e7c3d8.25fd,8,"COMMIT",2016-03-15 16:12:08 CST,2/0,574614688,LOG,00000,"LockReleaseAll done",,,,,,"end;",,"LockReleaseAll, lock.c:2198","psql"
2016-03-15 16:17:05.528 CST,"postgres","postgres",10220,"[local]",56e7c3df.27ec,12,"UPDATE",2016-03-15 16:12:15 CST,3/16,574614689,LOG,00000,"LockRelease: lock [13241,53029] ExclusiveLock",,,,,,"update test2 set info='b' ;",,"LockRelease, lock.c:1758","psql"
2016-03-15 16:17:05.528 CST,"postgres","postgres",10220,"[local]",56e7c3df.27ec,13,"UPDATE",2016-03-15 16:12:15 CST,3/16,574614689,LOG,00000,"LockRelease: found: lock(0x7f88ead65ed8) id(13241,53029,0,3,3,1) grantMask(80) req(0,0,0,0,0,0,1)=1 grant(0,0,0,0,0,0,1)=1 wait(0) type(ExclusiveLock)",,,,,,"update test2 set info='b' ;",,"LOCK_PRINT, lock.c:319","psql"
2016-03-15 16:17:05.528 CST,"postgres","postgres",10220,"[local]",56e7c3df.27ec,14,"UPDATE",2016-03-15 16:12:15 CST,3/16,574614689,LOG,00000,"LockRelease: found: proclock(0x7f88eadf4878) lock(0x7f88ead65ed8) method(1) proc(0x7f88eb0211f0) hold(80)",,,,,,"update test2 set info='b' ;",,"PROCLOCK_PRINT, lock.c:331","psql"
2016-03-15 16:17:05.528 CST,"postgres","postgres",10220,"[local]",56e7c3df.27ec,15,"UPDATE",2016-03-15 16:12:15 CST,3/16,574614689,LOG,00000,"UnGrantLock: updated: lock(0x7f88ead65ed8) id(13241,53029,0,3,3,1) grantMask(0) req(0,0,0,0,0,0,0)=0 grant(0,0,0,0,0,0,0)=0 wait(0) type(ExclusiveLock)",,,,,,"update test2 set info='b' ;",,"LOCK_PRINT, lock.c:319","psql"
2016-03-15 16:17:05.528 CST,"postgres","postgres",10220,"[local]",56e7c3df.27ec,16,"UPDATE",2016-03-15 16:12:15 CST,3/16,574614689,LOG,00000,"UnGrantLock: updated: proclock(0x7f88eadf4878) lock(0x7f88ead65ed8) method(1) proc(0x7f88eb0211f0) hold(0)",,,,,,"update test2 set info='b' ;",,"PROCLOCK_PRINT, lock.c:331","psql"
2016-03-15 16:17:05.528 CST,"postgres","postgres",10220,"[local]",56e7c3df.27ec,17,"UPDATE",2016-03-15 16:12:15 CST,3/16,574614689,LOG,00000,"CleanUpLock: deleting: proclock(0x7f88eadf4878) lock(0x7f88ead65ed8) method(1) proc(0x7f88eb0211f0) hold(0)",,,,,,"update test2 set info='b' ;",,"PROCLOCK_PRINT, lock.c:331","psql"
2016-03-15 16:17:05.528 CST,"postgres","postgres",10220,"[local]",56e7c3df.27ec,18,"UPDATE",2016-03-15 16:12:15 CST,3/16,574614689,LOG,00000,"CleanUpLock: deleting: lock(0x7f88ead65ed8) id(13241,53029,0,3,3,1) grantMask(0) req(0,0,0,0,0,0,0)=0 grant(0,0,0,0,0,0,0)=0 wait(0) type(INVALID)",,,,,,"update test2 set info='b' ;",,"LOCK_PRINT, lock.c:319","psql"
2016-03-15 16:17:05.528 CST,"postgres","postgres",10220,"[local]",56e7c3df.27ec,19,"UPDATE",2016-03-15 16:12:15 CST,3/16,574614689,LOG,00000,"LockAcquire: lock [13241,53029] AccessShareLock",,,,,,"update test2 set info='b' ;",,"LockAcquireExtended, lock.c:724","psql"
2016-03-15 16:17:05.528 CST,"postgres","postgres",10220,"[local]",56e7c3df.27ec,20,"UPDATE",2016-03-15 16:12:15 CST,3/16,574614689,LOG,00000,"LockRelease: lock [13241,53029] AccessShareLock",,,,,,"update test2 set info='b' ;",,"LockRelease, lock.c:1758","psql"
2016-03-15 16:17:05.528 CST,"postgres","postgres",10220,"[local]",56e7c3df.27ec,21,"UPDATE",2016-03-15 16:12:15 CST,3/16,574614689,LOG,00000,"duration: 288750.628 ms  plan:
Query Text: update test2 set info='b' ;
Update on test2  (cost=0.00..22.70 rows=1270 width=10) (actual time=288750.624..288750.624 rows=0 loops=1)
  Buffers: shared hit=5
  ->  Seq Scan on test2  (cost=0.00..22.70 rows=1270 width=10) (actual time=0.013..0.015 rows=1 loops=1)
        Buffers: shared hit=1",,,,,,,,"explain_ExecutorEnd, auto_explain.c:333","psql"
2016-03-15 16:17:05.529 CST,"postgres","postgres",10220,"[local]",56e7c3df.27ec,22,"UPDATE",2016-03-15 16:12:15 CST,3/0,574614689,LOG,00000,"LockReleaseAll: lockmethod=1",,,,,,"update test2 set info='b' ;",,"LockReleaseAll, lock.c:1951","psql"
2016-03-15 16:17:05.529 CST,"postgres","postgres",10220,"[local]",56e7c3df.27ec,23,"UPDATE",2016-03-15 16:12:15 CST,3/0,574614689,LOG,00000,"LockReleaseAll done",,,,,,"update test2 set info='b' ;",,"LockReleaseAll, lock.c:2198","psql"
2016-03-15 16:17:05.529 CST,"postgres","postgres",10220,"[local]",56e7c3df.27ec,24,"UPDATE",2016-03-15 16:12:15 CST,3/0,0,LOG,00000,"duration: 288751.635 ms  statement: update test2 set info='b' ;",,,,,,,,"exec_simple_query, postgres.c:1181","psql"

Interpreting these lock entries requires cross-referencing src/include/storage/lock.h.

PostgreSQL Oracle Compatibility - sys_guid()


Oracle provides sys_guid() to generate UUID values.
PostgreSQL has an equivalent capability, but it requires installing the uuid-ossp extension.
If you do not want to modify application code and still need a sys_guid() function, you can write a wrapper yourself.
For example:

postgres=# create extension "uuid-ossp";
CREATE EXTENSION
postgres=# create or replace function sys_guid() returns uuid as $$
select uuid_generate_v4();
$$ language sql strict;
CREATE FUNCTION
postgres=# select sys_guid();
               sys_guid               
--------------------------------------
 92bbbf05-a23c-41b3-95d4-8732c93d95dd
(1 row)

postgres=# select sys_guid();
               sys_guid               
--------------------------------------
 37e34cfb-46aa-44ed-9403-9e23b6c2bfc0
(1 row)
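With the wrapper in place, a typical Oracle-style usage is as a column default, so every inserted row gets a freshly generated UUID. The table and column names below are hypothetical:

```sql
create table t_guid (
    id   uuid default sys_guid() primary key,
    info text
);

insert into t_guid (info) values ('hello');

-- each row's id is populated by sys_guid() automatically:
select id, info from t_guid;
```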