A PostgreSQL recursive query example


A question from a user on the Yunqi community Q&A:
The one column should be computed row by row from the previous row's one and the current row's money; how do I write that statement?

In PostgreSQL, a recursive query (WITH RECURSIVE) satisfies this requirement.

postgres=# create table m(id serial primary key,money int, one int);
CREATE TABLE
postgres=# insert into m(money,one) values (0,2000),(85,0),(100,0),(19,0),(21,0);
INSERT 0 5
postgres=# select * from m;
 id | money | one  
----+-------+------
  1 |     0 | 2000
  2 |    85 |    0
  3 |   100 |    0
  4 |    19 |    0
  5 |    21 |    0
(5 rows)

postgres=# with recursive t(id,money,one) as (select 1 id,0 money,2000 one union all select t1.id,t1.money,t.one-t1.money one from t,m t1 where t.id=t1.id-1) select * from t;
 id | money | one  
----+-------+------
  1 |     0 | 2000
  2 |    85 | 1915
  3 |   100 | 1815
  4 |    19 | 1796
  5 |    21 | 1775
(5 rows)
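
For readability, here is the same recursive query laid out with comments (a restatement of the query above; nothing new is added):

with recursive t(id, money, one) as (
  -- anchor member: the first row, whose running value (2000) is already known
  select 1 as id, 0 as money, 2000 as one
  union all
  -- recursive member: join the previous output row t to the next row of m
  -- (t.id = t1.id - 1) and subtract that row's money from the running value
  select t1.id, t1.money, t.one - t1.money as one
  from t, m t1
  where t.id = t1.id - 1
)
select * from t;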

How to estimate space on the Greenplum master and the metadata space used on segments


The Greenplum master node stores the cluster metadata, including:
sequences, tables, temporary tables, partitions, functions, views, types, operators, rules, triggers, and so on.

The segments also store part of the metadata:
sequences, tables, temporary tables, functions, views, types, operators, rules, triggers, and so on.

The extra information kept on the master, compared with a segment, includes:
distribution policies, partition definitions, and some special configuration metadata.

gp_distribution_policy 
pg_partition 
pg_partition_encoding 
pg_partition_rule 
pg_statistic  

Looking at metadata alone, the master stores slightly more than a segment, mainly the tables' distribution policies and partition definitions.

How to estimate the master's space?
The main factors to consider:
.1. How many objects are defined
Sequences touch these catalogs: pg_class, pg_statistic, pg_attribute, roughly one row each per sequence.
100,000 sequences take roughly 300,000 metadata rows.

Tables touch these catalogs: pg_class (2), pg_statistic (64, only on the master), pg_attribute (64), gp_distribution_policy (1). (Variable-length columns also add TOAST metadata.)
10 million tables (including partitions) take roughly 1.4 billion metadata rows.

Temporary tables touch these catalogs: pg_class (2), pg_statistic (64, only on the master), pg_attribute (64). (Variable-length columns also add TOAST metadata.)
10,000 temporary tables take roughly 1.3 million metadata rows.

Partitioning: pg_partition (1 row per parent table), pg_partition_encoding (usually 0), pg_partition_rule (1 row per partition).
20,000 parent tables with 9 million partitions take roughly 9.02 million metadata rows.

Functions: pg_proc (1 row per function).
100,000 functions take roughly 100,000 metadata rows.

Views: pg_class.
100,000 views take roughly 100,000 metadata rows.

Types: pg_type.
10,000 types take roughly 10,000 metadata rows.

Operators: pg_operator, pg_op...
10,000 operators take roughly 50,000 metadata rows.

Rules: pg_rewrite.
10,000 rules take roughly 10,000 metadata rows.

Triggers: pg_trigger.
10,000 triggers take roughly 10,000 metadata rows.

.2. Whether temporary objects are used
Temporary tables generate metadata, which is released automatically when the session closes; the dead rows this leaves behind can bloat the catalogs.

.3. Bloat rate
Constantly creating and dropping tables, or altering column definitions, changes the metadata and can bloat the catalogs.
This is especially true when long transactions exist: VACUUM can only reclaim garbage left by transactions that completed before the long transaction started, so dead rows accumulate easily.
Assume a bloat rate of 30%; under normal conditions it is somewhat lower.
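
A simple way to keep an eye on catalog bloat is to track the physical size of the busiest catalogs over time; a minimal sketch (the list of catalogs is only an example):

select relname,
       pg_size_pretty(pg_total_relation_size(oid)) as total_size,
       reltuples::bigint as approx_rows
from pg_class
where relname in ('pg_class','pg_attribute','pg_statistic','pg_attrdef','pg_type')
order by pg_total_relation_size(oid) desc;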

How much space does the master node need, then?
First, estimate the average row size of each catalog, in bytes:

postgres=# select relname,relkind,round((relpages::numeric*8*1024)/reltuples::numeric,2) from pg_class where relpages<>0 and reltuples<>0 and relkind='r' and reltuples>100 order by 1;
           relname           | relkind |  round  
-----------------------------+---------+---------
 gp_distribution_policy      | r       |   40.96
 gp_fastsequence             | r       |   47.63
 gp_persistent_relation_node | r       |   33.57
 gp_relation_node            | r       |   39.77
 pg_aggregate                | r       |   60.68
 pg_amop                     | r       |   29.20
 pg_amproc                   | r       |   31.51
 pg_appendonly               | r       |  163.84
 pg_attrdef                  | r       |  160.63
 pg_attribute                | r       |   93.85
 pg_attribute_encoding       | r       |   83.22
 pg_cast                     | r       |   30.57
 pg_class                    | r       |  137.23
 pg_constraint               | r       |  548.95
 pg_conversion               | r       |   62.06
 pg_depend                   | r       |   21.42
 pg_description              | r       |   17.75
 pg_index                    | r       |   77.14
 pg_inherits                 | r       |   42.67
 pg_opclass                  | r       |   58.10
 pg_operator                 | r       |   48.19
 pg_partition_rule           | r       |  341.33
 pg_proc                     | r       |   50.83
 pg_rewrite                  | r       | 1079.57
 pg_stat_last_operation      | r       |  138.51
 pg_statistic                | r       |   78.21
 pg_type                     | r       |   93.19
 pg_window                   | r       |   28.44
 sql_features                | r       |   25.24
 supplier                    | r       |   38.89

Next, you need to know how much metadata the cluster will hold.
Suppose a user needs to create, in the Greenplum cluster:
100,000 sequences, 10 million tables (including partitions), 10,000 concurrent temporary tables, 100,000 functions, 100,000 views, 10,000 custom types, 10,000 custom operators, 10,000 rules, and 10,000 triggers.
That comes to roughly 1.411 billion metadata rows. Assuming an average of 200 bytes per row (in practice probably smaller; see the per-catalog relpages*8*1024/reltuples reference values above), that is about 260 GB.
Adding the bloat rate, the master needs roughly 338 GB of space.
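
A quick back-of-the-envelope check of those figures (assumed inputs: roughly 1.411 billion catalog rows at 200 bytes each, plus 30% bloat; the text above rounds the results to about 260 GB and 338 GB):

select pg_size_pretty((1411000000::numeric * 200)::bigint);        -- roughly 263 GB of raw metadata
select pg_size_pretty((1411000000::numeric * 200 * 1.3)::bigint);  -- roughly 342 GB including bloat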

Estimating the segment metadata size:
Subtract the catalogs that exist only on the master:

gp_distribution_policy 
pg_partition 
pg_partition_encoding 
pg_partition_rule 
pg_statistic   

In the example above, a segment holds roughly 700 million fewer metadata rows than the master, about 170 GB of metadata space.

Using PostgreSQL to solve a small AI semantic-deduplication problem


In the Yunqi community Q&A, a user raised the following problem:

My table contains too many similar rows and I would like to delete the highly similar ones. Is there a way to do this?
For example:
银屑病怎么治?
银屑病怎么治疗?
银屑病怎么治疗好?
银屑病怎么能治疗好?
and so on.

The approach to this problem:
.1. First, how to measure content similarity. PostgreSQL offers Chinese word segmentation, and also pg_trgm (which splits a string into distinct tokens and computes the similarity of two strings; a pg_trgm sketch follows the tokenizer examples below).
For this problem I suggest the word-segmentation approach: first split the content into word groups.
.2. After splitting into word groups, group and aggregate first to remove the fully identical rows.
.3. Then self-join to build a Cartesian product (a matrix) and compute each record's similarity with every other record. The similarity formula is simple: the number of shared tokens divided by the number of distinct tokens in the union of the two sets.
.4. Delete the unwanted rows based on the similarity.
If the data volume is very large, a dedicated analytical language such as PL/R is a better fit here.

A hands-on example.
First install a PostgreSQL Chinese word-segmentation extension
(AliCloudDB for PostgreSQL on Alibaba Cloud already includes these extensions; see the official manual for usage).

git clone https://github.com/jaiminpan/pg_jieba.git
mv pg_jieba $PGSRC/contrib/
export PATH=/home/digoal/pgsql9.5/bin:$PATH
cd $PGSRC/contrib/pg_jieba
make clean;make;make install

git clone https://github.com/jaiminpan/pg_scws.git
mv pg_scws $PGSRC/contrib/
export PATH=/home/digoal/pgsql9.5/bin:$PATH
cd $PGSRC/contrib/pg_scws
make clean;make;make install

Create the extensions

psql
# create extension pg_jieba;
# create extension pg_scws;

Create a test case

create table tdup1 (id int primary key, info text);
create extension pg_trgm;
insert into tdup1 values (1, '银屑病怎么治?');
insert into tdup1 values (2, '银屑病怎么治疗?');
insert into tdup1 values (3, '银屑病怎么治疗好?');
insert into tdup1 values (4, '银屑病怎么能治疗好?');

Either of the two segmentation extensions can be used.

postgres=# select to_tsvector('jiebacfg', info),* from tdup1 ;
     to_tsvector     | id |         info         
---------------------+----+----------------------
 '治':3 '银屑病':1   |  1 | 银屑病怎么治?
 '治疗':3 '银屑病':1 |  2 | 银屑病怎么治疗?
 '治疗':3 '银屑病':1 |  3 | 银屑病怎么治疗好?
 '治疗':4 '银屑病':1 |  4 | 银屑病怎么能治疗好?
(4 rows)

postgres=# select to_tsvector('scwscfg', info),* from tdup1 ;
            to_tsvector            | id |         info         
-----------------------------------+----+----------------------
 '治':2 '银屑病':1                 |  1 | 银屑病怎么治?
 '治疗':2 '银屑病':1               |  2 | 银屑病怎么治疗?
 '好':3 '治疗':2 '银屑病':1        |  3 | 银屑病怎么治疗好?
 '好':4 '治疗':3 '能':2 '银屑病':1 |  4 | 银屑病怎么能治疗好?
(4 rows)
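
Step .1 above also mentions pg_trgm as an alternative to word segmentation. A minimal sketch on the same test data (pg_trgm was already created above; note that trigram behaviour on Chinese text depends on the database encoding and locale):

select t1.id, t2.id, similarity(t1.info, t2.info) as sim
from tdup1 t1, tdup1 t2
where t1.id < t2.id
order by sim desc;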

Create three functions.
Union of two arrays (deduplicated):

postgres=# create or replace function array_union(text[], text[]) returns text[] as $$
  select array_agg(c1) from (select c1 from unnest($1||$2) t(c1) group by c1) t;
$$ language sql strict;
CREATE FUNCTION

Deduplicate an array:

postgres=# create or replace function array_dist(text[]) returns text[] as $$         
  select array_agg(c1) from (select c1 from unnest($1) t(c1) group by c1) t;    
$$ language sql strict;
CREATE FUNCTION

Overlap of two arrays (deduplicated intersection):

postgres=# create or replace function array_share(text[], text[]) returns text[] as $$
  select array_agg(unnest) from (select unnest($1) intersect select unnest($2) group by 1) t;
$$ language sql strict;
CREATE FUNCTION

The Cartesian product looks like this.
regexp_split_to_array((regexp_replace(to_tsvector('jiebacfg',info)::text,'(:\d+)', '', 'g')),' ') converts info into an array.

postgres=# with t(c1,c2,c3) as 
(select id,info,array_dist(regexp_split_to_array((regexp_replace(to_tsvector('jiebacfg',info)::text,'(:\d+)', '', 'g')),' ')) from tdup1) 
select * from (select t1.c1 t1c1,t2.c1 t2c1,t1.c2 t1c2,t2.c2 t2c2,t1.c3 t1c3,t2.c3 t2c3,round(array_length(array_share(t1.c3,t2.c3),1)::numeric/array_length(array_union(t1.c3,t2.c3),1),2) 
simulate from t t1,t t2) t;
 t1c1 | t2c1 |         t1c2         |         t2c2         |       t1c3        |       t2c3        | simulate 
------+------+----------------------+----------------------+-------------------+-------------------+----------
    1 |    1 | 银屑病怎么治?       | 银屑病怎么治?       | {'银屑病','治'}   | {'银屑病','治'}   |     1.00
    1 |    2 | 银屑病怎么治?       | 银屑病怎么治疗?     | {'银屑病','治'}   | {'银屑病','治疗'} |     0.33
    1 |    3 | 银屑病怎么治?       | 银屑病怎么治疗好?   | {'银屑病','治'}   | {'银屑病','治疗'} |     0.33
    1 |    4 | 银屑病怎么治?       | 银屑病怎么能治疗好? | {'银屑病','治'}   | {'银屑病','治疗'} |     0.33
    2 |    1 | 银屑病怎么治疗?     | 银屑病怎么治?       | {'银屑病','治疗'} | {'银屑病','治'}   |     0.33
    2 |    2 | 银屑病怎么治疗?     | 银屑病怎么治疗?     | {'银屑病','治疗'} | {'银屑病','治疗'} |     1.00
    2 |    3 | 银屑病怎么治疗?     | 银屑病怎么治疗好?   | {'银屑病','治疗'} | {'银屑病','治疗'} |     1.00
    2 |    4 | 银屑病怎么治疗?     | 银屑病怎么能治疗好? | {'银屑病','治疗'} | {'银屑病','治疗'} |     1.00
    3 |    1 | 银屑病怎么治疗好?   | 银屑病怎么治?       | {'银屑病','治疗'} | {'银屑病','治'}   |     0.33
    3 |    2 | 银屑病怎么治疗好?   | 银屑病怎么治疗?     | {'银屑病','治疗'} | {'银屑病','治疗'} |     1.00
    3 |    3 | 银屑病怎么治疗好?   | 银屑病怎么治疗好?   | {'银屑病','治疗'} | {'银屑病','治疗'} |     1.00
    3 |    4 | 银屑病怎么治疗好?   | 银屑病怎么能治疗好? | {'银屑病','治疗'} | {'银屑病','治疗'} |     1.00
    4 |    1 | 银屑病怎么能治疗好? | 银屑病怎么治?       | {'银屑病','治疗'} | {'银屑病','治'}   |     0.33
    4 |    2 | 银屑病怎么能治疗好? | 银屑病怎么治疗?     | {'银屑病','治疗'} | {'银屑病','治疗'} |     1.00
    4 |    3 | 银屑病怎么能治疗好? | 银屑病怎么治疗好?   | {'银屑病','治疗'} | {'银屑病','治疗'} |     1.00
    4 |    4 | 银屑病怎么能治疗好? | 银屑病怎么能治疗好? | {'银屑病','治疗'} | {'银屑病','治疗'} |     1.00
(16 rows)

What the query above produces is really a matrix, and simulate is the similarity we need from that matrix.
For deduplication we do not need the full Cartesian product, only the part of the matrix above (or below) the diagonal.
Adding one condition to the join is enough.

postgres=# with t(c1,c2,c3) as 
(select id,info,array_dist(regexp_split_to_array((regexp_replace(to_tsvector('jiebacfg',info)::text,'(:\d+)', '', 'g')),' ')) from tdup1) 
select * from (select t1.c1 t1c1,t2.c1 t2c1,t1.c2 t1c2,t2.c2 t2c2,t1.c3 t1c3,t2.c3 t2c3,round(array_length(array_share(t1.c3,t2.c3),1)::numeric/array_length(array_union(t1.c3,t2.c3),1),2) 
simulate from t t1,t t2 where t1.c1<>t2.c1 and t1.c1<t2.c1) t;
 t1c1 | t2c1 |        t1c2        |         t2c2         |       t1c3        |       t2c3        | simulate 
------+------+--------------------+----------------------+-------------------+-------------------+----------
    1 |    2 | 银屑病怎么治?     | 银屑病怎么治疗?     | {'银屑病','治'}   | {'银屑病','治疗'} |     0.33
    1 |    3 | 银屑病怎么治?     | 银屑病怎么治疗好?   | {'银屑病','治'}   | {'银屑病','治疗'} |     0.33
    1 |    4 | 银屑病怎么治?     | 银屑病怎么能治疗好? | {'银屑病','治'}   | {'银屑病','治疗'} |     0.33
    2 |    3 | 银屑病怎么治疗?   | 银屑病怎么治疗好?   | {'银屑病','治疗'} | {'银屑病','治疗'} |     1.00
    2 |    4 | 银屑病怎么治疗?   | 银屑病怎么能治疗好? | {'银屑病','治疗'} | {'银屑病','治疗'} |     1.00
    3 |    4 | 银屑病怎么治疗好? | 银屑病怎么能治疗好? | {'银屑病','治疗'} | {'银屑病','治疗'} |     1.00
(6 rows)

Now deduplicate. Step one: choose the simulate threshold; for example, rows with similarity greater than 0.5 need deduplication.

postgres=# with t(c1,c2,c3) as 
(select id,info,array_dist(regexp_split_to_array((regexp_replace(to_tsvector('jiebacfg',info)::text,'(:\d+)', '', 'g')),' ')) from tdup1) 
select * from (select t1.c1 t1c1,t2.c1 t2c1,t1.c2 t1c2,t2.c2 t2c2,t1.c3 t1c3,t2.c3 t2c3,round(array_length(array_share(t1.c3,t2.c3),1)::numeric/array_length(array_union(t1.c3,t2.c3),1),2) 
simulate from t t1,t t2 where t1.c1<>t2.c1 and t1.c1<t2.c1) t where simulate>0.5;
 t1c1 | t2c1 |        t1c2        |         t2c2         |       t1c3        |       t2c3        | simulate 
------+------+--------------------+----------------------+-------------------+-------------------+----------
    2 |    3 | 银屑病怎么治疗?   | 银屑病怎么治疗好?   | {'银屑病','治疗'} | {'银屑病','治疗'} |     1.00
    2 |    4 | 银屑病怎么治疗?   | 银屑病怎么能治疗好? | {'银屑病','治疗'} | {'银屑病','治疗'} |     1.00
    3 |    4 | 银屑病怎么治疗好? | 银屑病怎么能治疗好? | {'银屑病','治疗'} | {'银屑病','治疗'} |     1.00
(3 rows)

Step two: delete the records whose IDs appear in the t2c1 column.

delete from tdup1 where id in (with t(c1,c2,c3) as 
(select id,info,array_dist(regexp_split_to_array((regexp_replace(to_tsvector('jiebacfg',info)::text,'(:\d+)', '', 'g')),' ')) from tdup1) 
select t2c1 from (select t1.c1 t1c1,t2.c1 t2c1,t1.c2 t1c2,t2.c2 t2c2,t1.c3 t1c3,t2.c3 t2c3,round(array_length(array_share(t1.c3,t2.c3),1)::numeric/array_length(array_union(t1.c3,t2.c3),1),2) 
simulate from t t1,t t2 where t1.c1<>t2.c1 and t1.c1<t2.c1) t where simulate>0.5);
For example:
postgres=# insert into tdup1 values (11, '白血病怎么治?');
INSERT 0 1
postgres=# insert into tdup1 values (22, '白血病怎么治疗?');
INSERT 0 1
postgres=# insert into tdup1 values (13, '白血病怎么治疗好?');
INSERT 0 1
postgres=# insert into tdup1 values (24, '白血病怎么能治疗好?');
INSERT 0 1
postgres=# 
postgres=# with t(c1,c2,c3) as                             
(select id,info,array_dist(regexp_split_to_array((regexp_replace(to_tsvector('jiebacfg',info)::text,'(:\d+)', '', 'g')),' ')) from tdup1) 
select * from (select t1.c1 t1c1,t2.c1 t2c1,t1.c2 t1c2,t2.c2 t2c2,t1.c3 t1c3,t2.c3 t2c3,round(array_length(array_share(t1.c3,t2.c3),1)::numeric/array_length(array_union(t1.c3,t2.c3),1),2) 
simulate from t t1,t t2 where t1.c1<>t2.c1 and t1.c1<t2.c1) t where simulate>0.5;
 t1c1 | t2c1 |        t1c2        |         t2c2         |       t1c3        |       t2c3        | simulate 
------+------+--------------------+----------------------+-------------------+-------------------+----------
    2 |    3 | 银屑病怎么治疗?   | 银屑病怎么治疗好?   | {'银屑病','治疗'} | {'银屑病','治疗'} |     1.00
    2 |    4 | 银屑病怎么治疗?   | 银屑病怎么能治疗好? | {'银屑病','治疗'} | {'银屑病','治疗'} |     1.00
    3 |    4 | 银屑病怎么治疗好? | 银屑病怎么能治疗好? | {'银屑病','治疗'} | {'银屑病','治疗'} |     1.00
   22 |   24 | 白血病怎么治疗?   | 白血病怎么能治疗好? | {'治疗','白血病'} | {'治疗','白血病'} |     1.00
   13 |   22 | 白血病怎么治疗好? | 白血病怎么治疗?     | {'治疗','白血病'} | {'治疗','白血病'} |     1.00
   13 |   24 | 白血病怎么治疗好? | 白血病怎么能治疗好? | {'治疗','白血病'} | {'治疗','白血病'} |     1.00
(6 rows)

postgres=# begin;
BEGIN
postgres=# delete from tdup1 where id in (with t(c1,c2,c3) as 
postgres(# (select id,info,array_dist(regexp_split_to_array((regexp_replace(to_tsvector('jiebacfg',info)::text,'(:\d+)', '', 'g')),' ')) from tdup1) 
postgres(# select t2c1 from (select t1.c1 t1c1,t2.c1 t2c1,t1.c2 t1c2,t2.c2 t2c2,t1.c3 t1c3,t2.c3 t2c3,round(array_length(array_share(t1.c3,t2.c3),1)::numeric/array_length(array_union(t1.c3,t2.c3),1),2) 
postgres(# simulate from t t1,t t2 where t1.c1<>t2.c1 and t1.c1<t2.c1) t where simulate>0.5);
DELETE 4
postgres=# select * from tdup1 ;
 id |        info        
----+--------------------
  1 | 银屑病怎么治?
  2 | 银屑病怎么治疗?
 11 | 白血病怎么治?
 13 | 白血病怎么治疗好?
(4 rows)

The problem with solving this purely inside the database: because the join filters are <> and <, a hash join cannot be used.
With a large data volume this takes a very long time.

postgres=# explain delete from tdup1 where id in (with t(c1,c2,c3) as 
(select id,info,array_dist(regexp_split_to_array((regexp_replace(to_tsvector('jiebacfg',info)::text,'(:\d+)', '', 'g')),' ')) from tdup1) 
select t2c1 from (select t1.c1 t1c1,t2.c1 t2c1,t1.c2 t1c2,t2.c2 t2c2,t1.c3 t1c3,t2.c3 t2c3,round(array_length(array_share(t1.c3,t2.c3),1)::numeric/array_length(array_union(t1.c3,t2.c3),1),2) 
simulate from t t1,t t2 where t1.c1<>t2.c1 and t1.c1<t2.c1) t where simulate>0.5);
                                                      QUERY PLAN                                                      
----------------------------------------------------------------------------------------------------------------------
 Delete on tdup1  (cost=10005260133.58..10005260215.84 rows=2555 width=34)
   ->  Hash Join  (cost=10005260133.58..10005260215.84 rows=2555 width=34)
         Hash Cond: (tdup1.id = "ANY_subquery".t2c1)
         ->  Seq Scan on tdup1  (cost=0.00..61.10 rows=5110 width=10)
         ->  Hash  (cost=10005260131.08..10005260131.08 rows=200 width=32)
               ->  HashAggregate  (cost=10005260129.08..10005260131.08 rows=200 width=32)
                     Group Key: "ANY_subquery".t2c1
                     ->  Subquery Scan on "ANY_subquery"  (cost=10000002667.20..10005252911.99 rows=2886838 width=32)
                           ->  Subquery Scan on t  (cost=10000002667.20..10005224043.61 rows=2886838 width=4)
                                 Filter: (t.simulate > 0.5)
                                 CTE t
                                   ->  Seq Scan on tdup1 tdup1_1  (cost=0.00..2667.20 rows=5110 width=36)
                                 ->  Nested Loop  (cost=10000000000.00..10005113119.99 rows=8660513 width=68)
                                       Join Filter: ((t1.c1 <> t2.c1) AND (t1.c1 < t2.c1))
                                       ->  CTE Scan on t t1  (cost=0.00..102.20 rows=5110 width=36)
                                       ->  CTE Scan on t t2  (cost=0.00..102.20 rows=5110 width=36)
(16 rows)

A more elegant approach is to use PL/R or R to do the matrix computation and filter the result afterwards.
Alternatively, an MPP database such as Greenplum, together with R and MADlib, can handle very large data sets.

Summary
Which PostgreSQL features were used here?
.1. Chinese word segmentation
.2. Window functions
(not used in this example, but if your data has no primary key you need ctid and row_number() to pin down a unique row; see the sketch below)
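
A minimal sketch of that ctid/row_number() idea, removing exact duplicates from a hypothetical table t(info text) that has no primary key:

delete from t
where ctid in (
  select ctid
  from (select ctid,
               row_number() over (partition by info order by ctid) as rn
        from t) x
  where rn > 1
);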

PostgreSQL: how to calculate the normal working-day time between two points in time

create or replace function minus_weekend(timestamp, timestamp) returns interval as $$
declare
  s timestamp := $1;
  e timestamp := $2;
  sd date;
  ed date;
  i interval := interval '0';
  x int;
  x1 interval;
  x2 interval;
begin
  -- normalize the arguments so that s <= e
  if e < s then
    s := $2;
    e := $1;
  end if;

  -- x1: weekday time left on the start day; x2: weekday time elapsed on the end day
  select case when extract(isodow from s) not in (6,7) then date(s+interval '1 day')-s else interval '0' end, 
         case when extract(isodow from e) not in (6,7) then e-date(e) else interval '0' end
  into x1, x2;

  -- start and end fall on the same calendar day
  if date(e)-date(s) = 0 then
    if extract(isodow from s) not in (6,7) then
      return e-s;
    else 
      return interval '0';
    end if;
  -- adjacent calendar days: only the two partial days count
  elsif date(e)-date(s) = 1 then
    return x1 + x2;
  end if;

  -- otherwise add one full day for every weekday strictly between the two dates
  sd := date(s)+1;
  ed := date(e);

  for x in 0..(ed-sd-1) loop
    if extract(isodow from sd+x) not in (6,7) then 
      i := i + interval '1 day';
    end if;
  end loop;

  return i+x1+x2;
end;
$$ language plpgsql strict;

Example:

postgres=> create table tbl(username name, begin_time timestamp, end_time timestamp);
CREATE TABLE
postgres=> insert into tbl values ('a','2012-10-28 08:30','2012-11-05 17:30');
INSERT 0 1
postgres=> insert into tbl values ('b','2012-11-02 08:30', '2012-11-07 13:30');
INSERT 0 1
postgres=> insert into tbl values ('a','2012-11-08 13:30', '2012-11-09 17:30');
INSERT 0 1

Calculate user a's actual working time.

postgres=> select minus_weekend(begin_time,end_time),username from tbl where username='a';
  minus_weekend  | username 
-----------------+----------
 5 days 17:30:00 | a
 28:00:00        | a
(2 rows)

postgres=> select sum(minus_weekend(begin_time,end_time)) from tbl where username='a' ;
       sum       
-----------------
 5 days 45:30:00
(1 row)
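
A small usage note (not part of the original answer): interval addition does not carry hours over into days, which is why the sum shows as 5 days 45:30:00. If a normalized display is preferred, justify_hours() can be applied:

select justify_hours(sum(minus_weekend(begin_time,end_time))) from tbl where username='a';
-- 5 days 45:30:00 is then displayed as 6 days 21:30:00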

How to query data sampled at a fixed time interval (skipping rows)?


This question comes from the Yunqi Q&A. I thought it was a good one, so I am recording it here:

My project uses MySQL, but I would also like to know how to handle the following problem in Oracle and SQL Server:

There is a data table as shown below.
I want to extract rows from it such that any two extracted records are at least 2 minutes apart.
For the data in the picture below, if my query range starts at 2014-08-10 23:20:00,
I would like to get the following result set:
'83', '57', '10041', '74.27', '0', '2014-08-10 23:20:04'
'113', '57', '10041', '59.25', '0', '2014-08-10 23:22:06'
'145', '57', '10041', '96.21', '0', '2014-08-10 23:24:07'
'177', '57', '10041', '34.16', '0', '2014-08-10 23:26:08'
'209', '57', '10041', '39.11', '0', '2014-08-10 23:28:09'
The real scenario: a sensor writes one record roughly every 30 seconds. I need N days of data to draw a chart. If I fetch all N days of records at once and thin them out in the application, the result set is too large and the looping takes far too long. I would like the SQL itself to thin the data at the database level, so the program has much less data to handle.

So the question is: how should the SQL be written?

For PostgreSQL this requirement is simple; a small function takes care of it.
Example:

digoal=# create table test(id serial, crt_time timestamp);
CREATE TABLE
digoal=# insert into test (crt_time) select generate_series(now(),now()+interval '10 min', interval '30 sec');
INSERT 0 21
digoal=# select * from test;
 id |          crt_time          
----+----------------------------
  1 | 2016-04-12 10:25:08.696388
  2 | 2016-04-12 10:25:38.696388
  3 | 2016-04-12 10:26:08.696388
  4 | 2016-04-12 10:26:38.696388
  5 | 2016-04-12 10:27:08.696388
  6 | 2016-04-12 10:27:38.696388
  7 | 2016-04-12 10:28:08.696388
  8 | 2016-04-12 10:28:38.696388
  9 | 2016-04-12 10:29:08.696388
 10 | 2016-04-12 10:29:38.696388
 11 | 2016-04-12 10:30:08.696388
 12 | 2016-04-12 10:30:38.696388
 13 | 2016-04-12 10:31:08.696388
 14 | 2016-04-12 10:31:38.696388
 15 | 2016-04-12 10:32:08.696388
 16 | 2016-04-12 10:32:38.696388
 17 | 2016-04-12 10:33:08.696388
 18 | 2016-04-12 10:33:38.696388
 19 | 2016-04-12 10:34:08.696388
 20 | 2016-04-12 10:34:38.696388
 21 | 2016-04-12 10:35:08.696388
(21 rows)

create or replace function get_sparse_data(b timestamp, e timestamp, sparse interval, lmt int) returns setof test as $$
declare
  res test;      -- last row that was returned
  rec test;      -- current row from the scan
  cn int := 0;   -- number of rows returned so far
begin
  for rec in select * from test where crt_time between b and e order by crt_time loop
    -- keep a row only if it is the first one, or at least `sparse` after the last kept row
    if res is null or rec.crt_time - res.crt_time >= sparse then
      res := rec;
      cn := cn+1;
      return next res;
    end if;

    if cn >= lmt then
      return;
    end if;
  end loop;
end;
$$ language plpgsql;

digoal=# select get_sparse_data('2016-04-12 10:26:38.696388', '2016-04-12 10:34:08.696388', '1 min', 5);
          get_sparse_data          
-----------------------------------
 (4,"2016-04-12 10:26:38.696388")
 (6,"2016-04-12 10:27:38.696388")
 (8,"2016-04-12 10:28:38.696388")
 (10,"2016-04-12 10:29:38.696388")
 (12,"2016-04-12 10:30:38.696388")
(5 rows)

digoal=# select get_sparse_data('2016-04-12 10:26:38.696388', '2016-04-12 10:34:08.696388', '2 min', 5);
          get_sparse_data          
-----------------------------------
 (4,"2016-04-12 10:26:38.696388")
 (8,"2016-04-12 10:28:38.696388")
 (12,"2016-04-12 10:30:38.696388")
 (16,"2016-04-12 10:32:38.696388")
(4 rows)
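
Since the function returns setof test, it can also be called in the FROM clause, which expands the composite result into ordinary columns (a small usage note):

select id, crt_time
from get_sparse_data('2016-04-12 10:26:38.696388', '2016-04-12 10:34:08.696388', '2 min', 5);
-- same four rows as above, but as separate id and crt_time columns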

Does EDB xDB need pg_authid.rolcatupdate?


Before PostgreSQL 9.5, pg_authid had a column rolcatupdate that marked whether a role was allowed to update the system catalogs. With rolcatupdate = false, not even a superuser could update the catalogs.
In 9.5 this column was removed, see the following commit:
http://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=bb8582abf3c4db18b508627a52effd43672f9410

Remove rolcatupdate

This role attribute is an ancient PostgreSQL feature, but could only be
set by directly updating the system catalogs, and it doesn't have any
clearly defined use.

Author: Adam Brightwell <adam.brightwell@crunchydatasolutions.com>

So in 9.5 you no longer see the rolcatupdate column in pg_authid.
However, EDB's xDB, a tool for incrementally synchronizing data between Oracle and PostgreSQL, relies on this capability: it updates pg_class.relhastriggers to disable triggers.
Example:

digoal=# create table tab(id int);
CREATE TABLE
digoal=# create or replace function tg() returns trigger as $$
digoal$# declare
digoal$# begin
digoal$#   raise notice 'trigged';
digoal$#   return null;
digoal$# end;
digoal$# $$ language plpgsql strict;
CREATE FUNCTION
digoal=# create trigger tg after insert on tab for each row execute procedure tg();
CREATE TRIGGER
digoal=# insert into tab values (1);
NOTICE:  trigged
INSERT 0 1

After updating pg_class.relhastriggers = false, the trigger is no longer visible (and no longer fires).

digoal=# update pg_class set relhastriggers =false where relname='tab';
UPDATE 1
digoal=# insert into tab values (1);
INSERT 0 1
digoal=# insert into tab values (2);
INSERT 0 1
digoal=# insert into tab values (3);
INSERT 0 1
digoal=# \d+ tab
                         Table "public.tab"
 Column |  Type   | Modifiers | Storage | Stats target | Description 
--------+---------+-----------+---------+--------------+-------------
 id     | integer |           | plain   |              | 

digoal=# update pg_class set relhastriggers =true where relname='tab';
UPDATE 1
digoal=# \d+ tab
                         Table "public.tab"
 Column |  Type   | Modifiers | Storage | Stats target | Description 
--------+---------+-----------+---------+--------------+-------------
 id     | integer |           | plain   |              | 
Triggers:
    tg AFTER INSERT ON tab FOR EACH ROW EXECUTE PROCEDURE tg()

Triggers can also be disabled with this syntax:

digoal=# alter table tab disable trigger tg;
ALTER TABLE
digoal=# \d+ tab
                         Table "public.tab"
 Column |  Type   | Modifiers | Storage | Stats target | Description 
--------+---------+-----------+---------+--------------+-------------
 id     | integer |           | plain   |              | 
Disabled user triggers:
    tg AFTER INSERT ON tab FOR EACH ROW EXECUTE PROCEDURE tg()

Disabling a trigger this way actually changes pg_trigger.tgenabled.

digoal=# \d pg_trigger
       Table "pg_catalog.pg_trigger"
     Column     |     Type     | Modifiers 
----------------+--------------+-----------
 tgrelid        | oid          | not null
 tgname         | name         | not null
 tgfoid         | oid          | not null
 tgtype         | smallint     | not null
 tgenabled      | "char"       | not null
 tgisinternal   | boolean      | not null
 tgconstrrelid  | oid          | not null
 tgconstrindid  | oid          | not null
 tgconstraint   | oid          | not null
 tgdeferrable   | boolean      | not null
 tginitdeferred | boolean      | not null
 tgnargs        | smallint     | not null
 tgattr         | int2vector   | not null
 tgargs         | bytea        | not null
 tgqual         | pg_node_tree | 
Indexes:
    "pg_trigger_oid_index" UNIQUE, btree (oid)
    "pg_trigger_tgrelid_tgname_index" UNIQUE, btree (tgrelid, tgname)
    "pg_trigger_tgconstraint_index" btree (tgconstraint)

digoal=# insert into tab values (2);
INSERT 0 1
digoal=# alter table tab enable trigger tg;
ALTER TABLE
digoal=# insert into tab values (2);
NOTICE:  trigged
INSERT 0 1
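
To see that flag flip directly, query pg_trigger (a small sketch; 'O' means enabled in the default origin mode, 'D' means disabled):

select tgname, tgenabled from pg_trigger where tgrelid = 'tab'::regclass;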

Granting an ordinary user the privilege to run ALTER TABLE ... ENABLE|DISABLE TRIGGER:

digoal=# grant trigger on table tab to digoal;
GRANT
digoal=# \c digoal digoal
digoal=# alter table tab disable trigger tg;
ALTER TABLE

PostgreSQL IoT black tech - burn after reading


In IoT scenarios there are huge numbers of sensors producing a massive volume of messages that enter the database at extremely high concurrency.
If this data goes straight into a data warehouse designed for OLAP, real-time ingestion becomes the bottleneck, and an OLAP system also copes poorly with very high request concurrency.
How can all of the following be satisfied at once?
.1. Real-time ingestion,
.2. real-time analysis,
.3. and historical archiving, to cope with analysis requirements that may change at any time.

Real-time ingestion is fairly easy; a few days ago I wrote "How PostgreSQL gracefully handles hundreds of TB of daily data growth"
https://yq.aliyun.com/articles/8528

Real-time analysis is also quite manageable; see "PostgreSQL 'IoT' applications - 1: a real-time streaming data processing case (trillions per day)"
https://yq.aliyun.com/articles/166

Historical archiving, to cope with ever-changing analysis needs, is conceptually simple too: once the first two points are covered, just load the data into the OLAP system.
But do not underestimate this seemingly simple step; doing it in real time while staying consistent is the hard part.
The usual approaches suffer from the GAP problem (a consistency problem).
The GAP problem can be solved, for example with snapshots or a single writer thread, but those solutions are crude.
I have written a series of articles on solving the GAP problem:
.1. http://blog.163.com/digoal@126/blog/static/163877040201331252945440/
.2. http://blog.163.com/digoal@126/blog/static/16387704020133151402415/
.3. http://blog.163.com/digoal@126/blog/static/16387704020133155179877/
.4. http://blog.163.com/digoal@126/blog/static/16387704020133156636579/
.5. http://blog.163.com/digoal@126/blog/static/16387704020133218305242/
.6. http://blog.163.com/digoal@126/blog/static/16387704020133224161563/
.7. http://blog.163.com/digoal@126/blog/static/16387704020133271134563/
.8. http://blog.163.com/digoal@126/blog/static/16387704020134311144755/ 
The cause of the GAP problem, in one picture: (figure omitted)
In short, the snapshot of the reading transaction hides records that are not yet committed but whose sequence numbers or timestamps are earlier. On the next read those rows fall into a gap; the more real-time the extraction, the higher the chance of gaps. Once there are gaps, the OLTP and OLAP systems are inconsistent.
Traditional ways to solve this:
.1. Delay the sync, e.g. only sync data older than one hour, to reduce gaps.
.2. Serialize the inserts; with strictly serial inserts there are no gaps.
.3. Add an XID column to each record holding the inserting transaction ID; when reading, record the uncommitted XIDs from the transaction snapshot; on the next read, use those previously uncommitted XIDs together with the XID on each row to find the gap records.
Needless to say, each of these approaches has drawbacks.
We want real-time behaviour and an elegant solution.
PostgreSQL's burn-after-reading pattern solves all of the above, achieving concurrency, consistency and real-time behaviour at the same time:
concurrency means concurrent inserts and concurrent reads;
consistency means that if N rows go in, exactly N rows come out;
real-time means the data can be taken away continuously, as a stream, with no polling interval required.

The burn-after-reading syntax is simple. Example:

postgres=# create table tbl(id serial, crt_time timestamp, info jsonb default '
{
  "k1": "v1", 
  "k2": "v2", 
  "k3": "v3", 
  "k4": {
         "subk1": "subv1", 
         "subk2": "subv2", 
         "subk3": {
                   "ssubk1": "ssubv1"
                }
      }
}
');
postgres=# insert into tbl (crt_time) select clock_timestamp() from generate_series(1,1000);
INSERT 0 1000
postgres=# select * from tbl limit 1;
 id |          crt_time          |                                                      info                                                       
----+----------------------------+-----------------------------------------------------------------------------------------------------------------
  1 | 2016-04-13 15:02:06.603235 | {"k1": "v1", "k2": "v2", "k3": "v3", "k4": {"subk1": "subv1", "subk2": "subv2", "subk3": {"ssubk1": "ssubv1"}}}
(1 row)

postgres=# select * from tbl limit 5;
 id |          crt_time          |                                                      info                                                       
----+----------------------------+-----------------------------------------------------------------------------------------------------------------
  1 | 2016-04-13 15:02:06.603235 | {"k1": "v1", "k2": "v2", "k3": "v3", "k4": {"subk1": "subv1", "subk2": "subv2", "subk3": {"ssubk1": "ssubv1"}}}
  2 | 2016-04-13 15:02:06.60337  | {"k1": "v1", "k2": "v2", "k3": "v3", "k4": {"subk1": "subv1", "subk2": "subv2", "subk3": {"ssubk1": "ssubv1"}}}
  3 | 2016-04-13 15:02:06.603375 | {"k1": "v1", "k2": "v2", "k3": "v3", "k4": {"subk1": "subv1", "subk2": "subv2", "subk3": {"ssubk1": "ssubv1"}}}
  4 | 2016-04-13 15:02:06.603378 | {"k1": "v1", "k2": "v2", "k3": "v3", "k4": {"subk1": "subv1", "subk2": "subv2", "subk3": {"ssubk1": "ssubv1"}}}
  5 | 2016-04-13 15:02:06.603379 | {"k1": "v1", "k2": "v2", "k3": "v3", "k4": {"subk1": "subv1", "subk2": "subv2", "subk3": {"ssubk1": "ssubv1"}}}
(5 rows)

Burn after reading:

postgres=# delete from tbl where id<=5 returning *;
 id |          crt_time          |                                                      info                                                       
----+----------------------------+-----------------------------------------------------------------------------------------------------------------
  1 | 2016-04-13 15:02:06.603235 | {"k1": "v1", "k2": "v2", "k3": "v3", "k4": {"subk1": "subv1", "subk2": "subv2", "subk3": {"ssubk1": "ssubv1"}}}
  2 | 2016-04-13 15:02:06.60337  | {"k1": "v1", "k2": "v2", "k3": "v3", "k4": {"subk1": "subv1", "subk2": "subv2", "subk3": {"ssubk1": "ssubv1"}}}
  3 | 2016-04-13 15:02:06.603375 | {"k1": "v1", "k2": "v2", "k3": "v3", "k4": {"subk1": "subv1", "subk2": "subv2", "subk3": {"ssubk1": "ssubv1"}}}
  4 | 2016-04-13 15:02:06.603378 | {"k1": "v1", "k2": "v2", "k3": "v3", "k4": {"subk1": "subv1", "subk2": "subv2", "subk3": {"ssubk1": "ssubv1"}}}
  5 | 2016-04-13 15:02:06.603379 | {"k1": "v1", "k2": "v2", "k3": "v3", "k4": {"subk1": "subv1", "subk2": "subv2", "subk3": {"ssubk1": "ssubv1"}}}
(5 rows)
DELETE 5

postgres=# select count(*) from tbl where id<=5;
 count 
-------
     0
(1 row)

Next, a concurrency test to verify consistency, real-time behaviour and concurrency:

postgres=# create table tbl(id serial, crt_time timestamp, info jsonb default '
{
  "k1": "v1", 
  "k2": "v2", 
  "k3": "v3", 
  "k4": {
         "subk1": "subv1", 
         "subk2": "subv2", 
         "subk3": {
                   "ssubk1": "ssubv1"
                }
      }
}
');

create index idx_tbl_1 on tbl(crt_time);

create table tbl1(like tbl including all);

create or replace function r_d(lmt int) returns setof tbl as $$
declare
  -- lock up to lmt of the oldest rows, skipping rows already locked by other sessions
  curs1 cursor for select * from tbl order by crt_time limit lmt for update SKIP LOCKED;
begin
  for res in curs1 loop
      -- delete each row as it is read, and hand it back to the caller
      delete from tbl where current of curs1;
      return next res;
  end loop;
  return;
end;
$$ language plpgsql;

Concurrent inserts for 2 hours:

vi ins.sql
insert into tbl (crt_time) select clock_timestamp() from generate_series(1,5000);

pgbench -M prepared -n -r -P 5 -f ./ins.sql -c 64 -j 64 -T 7200 &

Concurrent burn-after-reading for 2 hours:

vi r_d.sql
insert into tbl1 select * from r_d(100000);

pgbench -M prepared -n -r -P 5 -f ./r_d.sql -c 64 -j 64 -T 7200 &

Verify that the number of rows inserted equals the number of rows consumed by burn-after-reading.
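
A simple consistency check once both pgbench runs have finished (a sketch): the rows still waiting in tbl plus the rows moved into tbl1 must equal the total number of rows inserted by ins.sql.

select (select count(*) from tbl)  as still_pending,
       (select count(*) from tbl1) as consumed,
       (select count(*) from tbl) + (select count(*) from tbl1) as total_seen;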

Performance figures (from a concurrency test on 64 tables, covering both ingestion and burn-after-reading):
Insert: 2.3 million rows/s
Burn after reading: 3.84 million rows/s

Where else this technique is useful:
.1. Delayed confirmation, very common in SMS confirmation flows: when you subscribe to a carrier service, you usually receive a second confirmation SMS.
The server inserts a record into the database, waits for the user's reply, and then updates the status of the previously inserted record.

insert into tbl values () returning id;
commit;

     then wait for the user's response

update tbl set ... where id=xxx;
commit;

.2. Related usage (Oracle also supports this):

insert into tbl values () returning *;
delete from tbl where ... returning *;
update tbl set xxx=xxx where xxx returning *;

skip locked;  -- Oracle 11g and later also support this

Further reading:
"How PostgreSQL gracefully handles hundreds of TB of daily data growth"
https://yq.aliyun.com/articles/8528
"PostgreSQL 'IoT' applications - 1: a real-time streaming data processing case (trillions per day)"
https://yq.aliyun.com/articles/166

Other PostgreSQL features are also an excellent fit for IoT:
JSON support, GIS support, window queries, hierarchical queries, lightweight analytics, range types, block range indexes, and more.

PostgreSQL IoT black tech - an index 500x slimmer (the block range index)


The most heavily used index in databases is the btree; besides btree, a typical database may also offer hash and bitmap indexes.
In IoT workloads, however, these indexes are too heavy and cost too much performance.
Why?
IoT produces and ingests data in huge volumes, essentially as a stream, and the data is then consumed mostly FIFO or in bulk via range queries.
A btree index is heavy because it stores the indexed value and a pointer for every single row, which makes the index very large.
On the other hand, IoT's range-query and batch-processing style means it does not need such a heavy index.
Example:
As shown below, the btree index accounts for a very large share of the space.

postgres=# \dt+ tab
                    List of relations
 Schema | Name | Type  |  Owner   |  Size   | Description 
--------+------+-------+----------+---------+-------------
 public | tab  | table | postgres | 3438 MB | 
(1 row)

postgres=# \di+ idx_tab_id
                           List of relations
 Schema |    Name    | Type  |  Owner   | Table |  Size   | Description 
--------+------------+-------+----------+-------+---------+-------------
 public | idx_tab_id | index | postgres | tab   | 2125 MB | 
(1 row)

Besides being large, a btree index also hurts the performance of updates, deletes and inserts.
Example:
With a btree index: about 284,500 rows ingested per second

postgres=# create unlogged table tab(id serial8, info text, crt_time timestamp);
CREATE TABLE
postgres=# create index idx_tab_id on tab(id);
CREATE INDEX
vi test.sql
insert into tab (info) select '' from generate_series(1,10000);

pgbench -M prepared -n -r -P 1 -f ./test.sql -c 48 -j 48 -T 100
tps = 28.453983 (excluding connections establishing)

With no index: about 668,800 rows ingested per second

postgres=# drop index idx_tab_id ;
DROP INDEX

pgbench -M prepared -n -r -P 1 -f ./test.sql -c 48 -j 48 -T 100
tps = 66.880260 (excluding connections establishing)

From the description and the test data above, the btree index's problems are clear:
it is big, and it hurts performance.

Time for the PostgreSQL black tech:
the range index, formally BRIN, the block range index.
A BRIN index stores summary information for consecutive ranges of blocks (min(val), max(val), has nulls? all nulls? the block range boundaries).
For example, if a table occupies 10,000 blocks and the BRIN index is built with one summary per 128 blocks, the index only needs to store 79 summaries.
The space usage is therefore tiny.

Space is solved, but what about performance? Let us test how fast inserts are once a BRIN index is in place.
With a BRIN index: about 628,400 rows ingested per second

postgres=# drop index idx_tab_id ;
DROP INDEX
postgres=# create index idx_tab_id on tab using brin (id) with (pages_per_range=1);
CREATE INDEX

pgbench -M prepared -n -r -P 1 -f ./test.sql -c 48 -j 48 -T 100
tps = 62.838701 (excluding connections establishing)

Finally, compare the sizes of the btree and BRIN indexes, and their query performance.
Index size comparison:
table: 4163 MB
btree index: 2491 MB
BRIN index: 4608 kB

postgres=# \di+ idx_tab_btree_id 
                              List of relations
 Schema |       Name       | Type  |  Owner   | Table |  Size   | Description 
--------+------------------+-------+----------+-------+---------+-------------
 public | idx_tab_btree_id | index | postgres | tab   | 2491 MB | 
(1 row)

postgres=# \di+ idx_tab_id
                           List of relations
 Schema |    Name    | Type  |  Owner   | Table |  Size   | Description 
--------+------------+-------+----------+-------+---------+-------------
 public | idx_tab_id | index | postgres | tab   | 4608 kB | 
(1 row)

postgres=# \dt+ tab
                    List of relations
 Schema | Name | Type  |  Owner   |  Size   | Description 
--------+------+-------+----------+---------+-------------
 public | tab  | table | postgres | 4163 MB | 
(1 row)
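
For completeness, the btree index used in this comparison was presumably created as follows (an assumption; its creation is not shown in the original):

create index idx_tab_btree_id on tab using btree (id);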

Query performance comparison (the /*+ ... */ comments in the EXPLAIN statements below are optimizer hints from the pg_hint_plan extension, used to force each access method):
Range query
full table scan: 11 seconds
BRIN index: 64 milliseconds
btree index: 24 milliseconds

postgres=# /*+ seqscan(tab) */ explain (analyze,buffers,timing,costs,verbose) select count(*) from tab where id between 1 and 100000;
                                                           QUERY PLAN                                                           
--------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=1891578.12..1891578.13 rows=1 width=0) (actual time=11353.057..11353.058 rows=1 loops=1)
   Output: count(*)
   Buffers: shared hit=133202
   ->  Seq Scan on public.tab  (cost=0.00..1891352.00 rows=90447 width=0) (actual time=1660.445..11345.123 rows=100000 loops=1)
         Output: id, info, crt_time
         Filter: ((tab.id >= 1) AND (tab.id <= 100000))
         Rows Removed by Filter: 117110000
         Buffers: shared hit=133202
 Planning time: 0.048 ms
 Execution time: 11353.080 ms
(10 rows)

postgres=# /*+ bitmapscan(tab idx_tab_id) */ explain (analyze,buffers,timing,costs,verbose) select count(*) from tab where id between 1 and 100000;
                                                             QUERY PLAN                                                              
-------------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=70172.91..70172.92 rows=1 width=0) (actual time=63.735..63.735 rows=1 loops=1)
   Output: count(*)
   Buffers: shared hit=298
   ->  Bitmap Heap Scan on public.tab  (cost=1067.08..69946.79 rows=90447 width=0) (actual time=40.700..55.868 rows=100000 loops=1)
         Output: id, info, crt_time
         Recheck Cond: ((tab.id >= 1) AND (tab.id <= 100000))
         Rows Removed by Index Recheck: 893
         Heap Blocks: lossy=111
         Buffers: shared hit=298
         ->  Bitmap Index Scan on idx_tab_id  (cost=0.00..1044.47 rows=90447 width=0) (actual time=40.675..40.675 rows=1110 loops=1)
               Index Cond: ((tab.id >= 1) AND (tab.id <= 100000))
               Buffers: shared hit=187
 Planning time: 0.049 ms
 Execution time: 63.755 ms
(14 rows)

postgres=# /*+ bitmapscan(tab idx_tab_btree_id) */ explain (analyze,buffers,timing,costs,verbose) select count(*) from tab where id between 1 and 100000;
                                                                 QUERY PLAN                                                                 
--------------------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=76817.88..76817.89 rows=1 width=0) (actual time=23.780..23.780 rows=1 loops=1)
   Output: count(*)
   Buffers: shared hit=181
   ->  Bitmap Heap Scan on public.tab  (cost=1118.87..76562.16 rows=102286 width=0) (actual time=6.569..15.950 rows=100000 loops=1)
         Output: id, info, crt_time
         Recheck Cond: ((tab.id >= 1) AND (tab.id <= 100000))
         Heap Blocks: exact=111
         Buffers: shared hit=181
         ->  Bitmap Index Scan on idx_tab_btree_id  (cost=0.00..1093.30 rows=102286 width=0) (actual time=6.530..6.530 rows=100000 loops=1)
               Index Cond: ((tab.id >= 1) AND (tab.id <= 100000))
               Buffers: shared hit=70
 Planning time: 0.099 ms
 Execution time: 23.798 ms
(13 rows)

Exact-match query
full table scan: 8 seconds
BRIN index: 39 milliseconds
btree index: 0.03 milliseconds

postgres=# /*+ seqscan(tab) */ explain (analyze,buffers,timing,costs,verbose) select count(*) from tab where id=100000;
                                                      QUERY PLAN                                                      
----------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=1598327.00..1598327.01 rows=1 width=0) (actual time=8297.589..8297.589 rows=1 loops=1)
   Output: count(*)
   Buffers: shared hit=133202
   ->  Seq Scan on public.tab  (cost=0.00..1598327.00 rows=2 width=0) (actual time=1221.359..8297.582 rows=1 loops=1)
         Output: id, info, crt_time
         Filter: (tab.id = 100000)
         Rows Removed by Filter: 117209999
         Buffers: shared hit=133202
 Planning time: 0.113 ms
 Execution time: 8297.619 ms
(10 rows)

postgres=# /*+ bitmapscan(tab idx_tab_id) */ explain (analyze,buffers,timing,costs,verbose) select count(*) from tab where id=100000;
                                                          QUERY PLAN                                                          
------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=142.04..142.05 rows=1 width=0) (actual time=38.498..38.498 rows=1 loops=1)
   Output: count(*)
   Buffers: shared hit=189
   ->  Bitmap Heap Scan on public.tab  (cost=140.01..142.04 rows=2 width=0) (actual time=38.432..38.495 rows=1 loops=1)
         Output: id, info, crt_time
         Recheck Cond: (tab.id = 100000)
         Rows Removed by Index Recheck: 1811
         Heap Blocks: lossy=2
         Buffers: shared hit=189
         ->  Bitmap Index Scan on idx_tab_id  (cost=0.00..140.01 rows=2 width=0) (actual time=38.321..38.321 rows=20 loops=1)
               Index Cond: (tab.id = 100000)
               Buffers: shared hit=187
 Planning time: 0.102 ms
 Execution time: 38.531 ms
(14 rows)

postgres=# /*+ indexscan(tab idx_tab_btree_id) */ explain (analyze,buffers,timing,costs,verbose) select count(*) from tab where id=100000;
                                                            QUERY PLAN                                                             
-----------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=2.76..2.77 rows=1 width=0) (actual time=0.018..0.018 rows=1 loops=1)
   Output: count(*)
   Buffers: shared hit=4
   ->  Index Scan using idx_tab_btree_id on public.tab  (cost=0.44..2.76 rows=2 width=0) (actual time=0.015..0.016 rows=1 loops=1)
         Output: id, info, crt_time
         Index Cond: (tab.id = 100000)
         Buffers: shared hit=4
 Planning time: 0.049 ms
 Execution time: 0.036 ms
(9 rows)

Comparison charts: (figures omitted)
Summary:
.1. The BRIN index's sweet spot is IoT-style workloads: streaming ingestion plus range queries. Its impact on inserts is negligible, the index itself is tiny, and its range-query performance is very close to a btree's.
.2. Combined with the JSON and GIS capabilities, PostgreSQL should shine in IoT.
PS: Oracle has a similar feature, the storage index, but only in the Exadata product line, which is outrageously expensive, so most of us will just walk on by. Haha.
https://docs.oracle.com/cd/E50790_01/doc/doc.121/e50471/concepts.htm#SAGUG20984

A DBA should be able to recognize the distinctive features of each database and apply them to the scenarios they fit; comparing databases and DBAs to the legendary horse and its judge Bole is hardly an exaggeration.
Come and play with PostgreSQL; the community is preparing a 7-day crash course to get Oracle DBAs up to speed on PG. Stay tuned.


MySQL 5.7 new feature: generated columns


A new feature in MySQL 5.7: generated columns
http://dev.mysql.com/doc/refman/5.7/en/create-table.html#create-table-generated-columns
A generated column's value is derived from ordinary columns. It is a bit like a view, yet different from one, because you can choose whether or not to store the generated value.

CREATE TABLE triangle (
  sidea DOUBLE,
  sideb DOUBLE,
  sidec DOUBLE AS (SQRT(sidea * sidea + sideb * sideb))
);
INSERT INTO triangle (sidea, sideb) VALUES(1,1),(3,4),(6,8);
mysql> SELECT * FROM triangle;
+-------+-------+--------------------+
| sidea | sideb | sidec              |
+-------+-------+--------------------+
|     1 |     1 | 1.4142135623730951 |
|     3 |     4 |                  5 |
|     6 |     8 |                 10 |
+-------+-------+--------------------+
col_name data_type [GENERATED ALWAYS] AS (expression)
  [VIRTUAL | STORED] [UNIQUE [KEY]] [COMMENT comment]
  [[NOT] NULL] [[PRIMARY] KEY]

VIRTUAL does not store the value; STORED stores it (and supports indexes).
This MySQL feature seems of limited use, though, because only the current row can take part in the computation.
IoT has similar needs, but usually the computation involves the N adjacent rows, or N rows selected by some rule, e.g. the average, maximum, minimum and variance over 5 adjacent rows.
MySQL cannot satisfy that.

In PostgreSQL none of this is new, and the support goes further.
Examples:
The equivalent of MySQL's VIRTUAL generated column:

postgres=# create table test(c1 int, c2 int);
CREATE TABLE
postgres=# create view v_test as select c1,c2,sqrt(c1*c2+c1*c2) from test;
CREATE VIEW
postgres=# insert into test values (1,2),(10,20);
INSERT 0 2
postgres=# select * from v_test;
 c1 | c2 | sqrt 
----+----+------
  1 |  2 |    2
 10 | 20 |   20
(2 rows)

The equivalent of MySQL's STORED generated column:

postgres=# create materialized view v_test1 as select c1,c2,sqrt(c1*c2+c1*c2) from test;
SELECT 2
postgres=# select * from v_test1;
 c1 | c2 | sqrt 
----+----+------
  1 |  2 |    2
 10 | 20 |   20
(2 rows)
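
One caveat (not mentioned in the original text): unlike MySQL's STORED generated column, a materialized view is not maintained automatically; it has to be refreshed to reflect later changes to test:

refresh materialized view v_test1;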

And something even better suited to IoT: stream processing (PipelineDB):

pipeline=# create stream s1(c1 int, c2 int);
CREATE STREAM
pipeline=# create continuous view test as select c1,c2,sqrt(c1*c1+c2*c2) from s1;
CREATE CONTINUOUS VIEW
pipeline=# activate;
ACTIVATE
pipeline=# insert into s1 values (1,2),(10,20);
INSERT 0 2
pipeline=# select * from test;
 c1 | c2 |       sqrt       
----+----+------------------
  1 |  2 | 2.23606797749979
 10 | 20 | 22.3606797749979
(2 rows)

Stream processing with windows and real-time aggregation:

pipeline=# create continuous view test1 as select c1,count(*) over(partition by c1) from s1 ;
CREATE CONTINUOUS VIEW
pipeline=# create continuous view test2 as select c2,count(*) over w from s1 window w as(partition by c2);
CREATE CONTINUOUS VIEW
pipeline=# insert into s1 values (1,2);
INSERT 0 1
pipeline=# select * from test1;
 c1 | count 
----+-------
  1 |     1
(1 row)

pipeline=# select * from test2;
 c2 | count 
----+-------
  2 |     1
(1 row)

Real-time analysis per URL: visit count, user count, and the latency below which 99% of requests fall.

/*   
 * This function will strip away any query parameters from each url,  
 * as we're not interested in them.  
 */  
CREATE FUNCTION url(raw text, regex text DEFAULT '\?.*', replace text DEFAULT '')  
    RETURNS text  
AS 'textregexreplace_noopt'    -- textregexreplace_noopt@src/backend/utils/adt/regexp.c  
LANGUAGE internal;  

CREATE CONTINUOUS VIEW url_stats AS  
    SELECT  
        url, -- the url  
    percentile_cont(0.99) WITHIN GROUP (ORDER BY latency_ms) AS p99,  -- latency below which 99% of requests fall  
        count(DISTINCT user) AS uniques,  -- unique users  
    count(*) total_visits  -- total visits  
  FROM  
    (SELECT   
        url(payload->>'url'),  -- the url  
        payload->>'user' AS user,  -- user id  
        (payload->>'latency')::float * 1000 AS latency_ms,  -- request latency  
        arrival_timestamp  
    FROM logs_stream) AS unpacked  
WHERE arrival_timestamp > clock_timestamp() - interval '1 day'  
 GROUP BY url;  

CREATE CONTINUOUS VIEW user_stats AS  
    SELECT  
        day(arrival_timestamp),  
        payload->>'user' AS user,  
        sum(CASE WHEN payload->>'url' LIKE '%landing_page%' THEN 1 ELSE 0 END) AS landings,  
        sum(CASE WHEN payload->>'url' LIKE '%conversion%' THEN 1 ELSE 0 END) AS conversions,  
        count(DISTINCT url(payload->>'url')) AS unique_urls,  
        count(*) AS total_visits  
    FROM logs_stream GROUP BY payload->>'user', day;  

-- What are the top-10 most visited urls?  
SELECT url, total_visits FROM url_stats ORDER BY total_visits DESC limit 10;  
      url      | total_visits   
---------------+--------------  
 /page62/path4 |        10182  
 /page51/path4 |        10181  
 /page24/path5 |        10180  
 /page93/path3 |        10180  
 /page81/path0 |        10180  
 /page2/path5  |        10180  
 /page75/path2 |        10179  
 /page28/path3 |        10179  
 /page40/path2 |        10178  
 /page74/path0 |        10176  
(10 rows)  


-- What is the 99th percentile latency across all urls?  
SELECT combine(p99) FROM url_stats;  
     combine        
------------------  
 6.95410494731137  
(1 row)  

-- What is the average conversion rate each day for the last month?  
SELECT day, avg(conversions / landings) FROM user_stats GROUP BY day;  
          day           |            avg               
------------------------+----------------------------  
 2015-09-15 00:00:00-07 | 1.7455000000000000000000000  
(1 row)  

-- How many unique urls were visited each day for the last week?  
SELECT day, combine(unique_urls) FROM user_stats WHERE day > now() - interval '1 week' GROUP BY day;  
          day           | combine   
------------------------+---------  
 2015-09-15 00:00:00-07 |  100000  
(1 row)  

-- Is there a relationship between the number of unique urls visited and the highest conversion rates?  
SELECT unique_urls, sum(conversions) / sum(landings) AS conversion_rate FROM user_stats  
    GROUP BY unique_urls ORDER BY conversion_rate DESC LIMIT 10;  
 unique_urls |  conversion_rate    
-------------+-------------------  
          41 |  2.67121005785842  
          36 |  2.02713894173361  
          34 |  2.02034637010851  
          31 |  2.01958418072859  
          27 |  2.00045348712296  
          24 |  1.99714899522942  
          19 |  1.99438839453606  
          16 |  1.98083502184886  
          15 |  1.87983011139079  
          14 |  1.84906254929873  
(10 rows)  

An introduction to Greenplum full backups: gpcrondump


This section introduces Greenplum full backups.
The full backup script:

#!/bin/bash
GPHOME=/home/digoal/gphome

# Replace with symlink path if it is present and correct
if [ -h ${GPHOME}/../greenplum-db ]; then
    GPHOME_BY_SYMLINK=`(cd ${GPHOME}/../greenplum-db/ && pwd -P)`
    if [ x"${GPHOME_BY_SYMLINK}" = x"${GPHOME}" ]; then
        GPHOME=`(cd ${GPHOME}/../greenplum-db/ && pwd -L)`/.
    fi
    unset GPHOME_BY_SYMLINK
fi
#setup PYTHONHOME
if [ -x $GPHOME/ext/python/bin/python ]; then
    PYTHONHOME="$GPHOME/ext/python"
fi
PYTHONPATH=$GPHOME/lib/python
PATH=$GPHOME/bin:$PYTHONHOME/bin:$PATH
LD_LIBRARY_PATH=$GPHOME/lib:$PYTHONHOME/lib:$LD_LIBRARY_PATH
OPENSSL_CONF=$GPHOME/etc/openssl.cnf

export GPHOME
export PATH
export LD_LIBRARY_PATH
export PYTHONPATH
export PYTHONHOME
export OPENSSL_CONF

export MASTER_DATA_DIRECTORY=/data01/digoal/gpdatalocal/gpseg-1
export PGHOST=127.0.0.1
export PGPORT=1922
export PGUSER=digoal
export PGDATABASE=postgres
export PGPASSWORD=digoal


backupdir="/data01/digoal/gpbackup"
logdir=$backupdir
masterdir="/data01/digoal/gpdatalocal/gpseg-1"
dbid="digoal"

dat=`psql -A -q -t -h $PGHOST -p $PGPORT -U $PGUSER -c "select ' -x '||string_agg(datname, ' -x ') from pg_database where datname <>'template0'"`
gpcrondump -a -C --dump-stats -g -G -h -r --use-set-session-authorization $dat -u $backupdir --prefix $dbid -l $logdir -d $masterdir

Or:

backupdir="/data01/digoal/gpbackup"
logdir=$backupdir
masterdir="/data01/digoal/gpdatalocal/gpseg-1"
dbid="digoal"

for dbname in `psql -A -q -t -h $PGHOST -p $PGPORT -U $PGUSER -c "select datname from pg_database where datname <>'template0'"`
do
now=`date +%Y%m%d%H%M%S`
gpcrondump -a -C --dump-stats -g -G -h -r --use-set-session-authorization -x $dbname -u $backupdir --prefix $dbid -l $logdir -d $masterdir -K $now
done

gpcrondump checks the timestamp supplied with -K; if the corresponding YYYYMMDD directory already contains a backup dated later than that timestamp, it raises an error. Therefore different databases must not be backed up with the same timestamp.

$for dbname in `psql -A -q -t -h $PGHOST -p $PGPORT -U $PGUSER -c "select datname from pg_database where datname <>'template0'"`
> do
> gpcrondump -a -C --dump-stats -g -G -h -r --use-set-session-authorization -x $dbname -u $backupdir --prefix $dbid -l $logdir -d $masterdir -K $now
> done
20160416:17:25:55:016061 gpcrondump:db153175032:digoal-[INFO]:-Starting gpcrondump with args: -a -C --dump-stats -g -G -h -r --use-set-session-authorization -x digoal -u /data01/digoal/gpbackup --prefix digoal -l /data01/digoal/gpbackup -d /data01/digoal/gpdatalocal/gpseg-1 -K 20160416171907
20160416:17:25:55:016061 gpcrondump:db153175032:digoal-[CRITICAL]:-gpcrondump failed. (Reason='There is a future dated backup on the system preventing new backups') exiting...
20160416:17:25:55:016151 gpcrondump:db153175032:digoal-[INFO]:-Starting gpcrondump with args: -a -C --dump-stats -g -G -h -r --use-set-session-authorization -x template1 -u /data01/digoal/gpbackup --prefix digoal -l /data01/digoal/gpbackup -d /data01/digoal/gpdatalocal/gpseg-1 -K 20160416171907
20160416:17:25:55:016151 gpcrondump:db153175032:digoal-[CRITICAL]:-gpcrondump failed. (Reason='There is a future dated backup on the system preventing new backups') exiting...
20160416:17:25:55:016241 gpcrondump:db153175032:digoal-[INFO]:-Starting gpcrondump with args: -a -C --dump-stats -g -G -h -r --use-set-session-authorization -x postgres -u /data01/digoal/gpbackup --prefix digoal -l /data01/digoal/gpbackup -d /data01/digoal/gpdatalocal/gpseg-1 -K 20160416171907
20160416:17:25:55:016241 gpcrondump:db153175032:digoal-[CRITICAL]:-gpcrondump failed. (Reason='There is a future dated backup on the system preventing new backups') exiting...
20160416:17:25:55:016331 gpcrondump:db153175032:digoal-[INFO]:-Starting gpcrondump with args: -a -C --dump-stats -g -G -h -r --use-set-session-authorization -x db2 -u /data01/digoal/gpbackup --prefix digoal -l /data01/digoal/gpbackup -d /data01/digoal/gpdatalocal/gpseg-1 -K 20160416171907
20160416:17:25:55:016331 gpcrondump:db153175032:digoal-[CRITICAL]:-gpcrondump failed. (Reason='There is a future dated backup on the system preventing new backups') exiting...
20160416:17:25:55:016421 gpcrondump:db153175032:digoal-[INFO]:-Starting gpcrondump with args: -a -C --dump-stats -g -G -h -r --use-set-session-authorization -x db3 -u /data01/digoal/gpbackup --prefix digoal -l /data01/digoal/gpbackup -d /data01/digoal/gpdatalocal/gpseg-1 -K 20160416171907
20160416:17:25:55:016421 gpcrondump:db153175032:digoal-[CRITICAL]:-gpcrondump failed. (Reason='There is a future dated backup on the system preventing new backups') exiting...
20160416:17:25:56:016511 gpcrondump:db153175032:digoal-[INFO]:-Starting gpcrondump with args: -a -C --dump-stats -g -G -h -r --use-set-session-authorization -x db1 -u /data01/digoal/gpbackup --prefix digoal -l /data01/digoal/gpbackup -d /data01/digoal/gpdatalocal/gpseg-1 -K 20160416171907
20160416:17:25:56:016511 gpcrondump:db153175032:digoal-[CRITICAL]:-gpcrondump failed. (Reason='There is a future dated backup on the system preventing new backups') exiting...

The backup logs are written to

/data01/digoal/gpbackup

The backup data goes to an automatically created subdirectory:

/data01/digoal/gpbackup/db_dumps/$YYYYMMDD

Each database also records its own backup history.

postgres=# select * from gpcrondump_history ;
-[ RECORD 14 ]-----+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
rec_date           | 2016-04-16 15:37:02.364166
start_time         | 15:33:19
end_time           | 15:37:00
options            | -a --dump-stats -g -G -h -r -x digoal -x template1 -x postgres -x db2 -x db3 -x db1 -u /data01/digoal/gpbackup --prefix digoal -l /data01/digoal/gpbackup -d /data01/digoal/gpdatalocal/gpseg-1
dump_key           | 20160416153319    -- the dump start timestamp; this key is needed when restoring with gpdbrestore
dump_exit_status   | 0
script_exit_status | 0
exit_text          | COMPLETED

The gpcrondump standard output also includes the dump key:

20160416:16:37:00:020201 gpcrondump:db153175032:digoal-[INFO]:-Target database                          = digoal
20160416:16:37:00:020201 gpcrondump:db153175032:digoal-[INFO]:-Dump subdirectory                        = 20160416
20160416:16:37:00:020201 gpcrondump:db153175032:digoal-[INFO]:-Dump type                                = Full database
20160416:16:37:00:020201 gpcrondump:db153175032:digoal-[INFO]:-Clear old dump directories               = Off
20160416:16:37:00:020201 gpcrondump:db153175032:digoal-[INFO]:-Dump start time                          = 16:36:55
20160416:16:37:00:020201 gpcrondump:db153175032:digoal-[INFO]:-Dump end time                            = 16:36:59
20160416:16:37:00:020201 gpcrondump:db153175032:digoal-[INFO]:-Status                                   = COMPLETED
20160416:16:37:00:020201 gpcrondump:db153175032:digoal-[INFO]:-Dump key                                 = 20160416163655
20160416:16:37:00:020201 gpcrondump:db153175032:digoal-[INFO]:-Dump file compression                    = On
20160416:16:37:00:020201 gpcrondump:db153175032:digoal-[INFO]:-Vacuum mode type                         = Off
20160416:16:37:00:020201 gpcrondump:db153175032:digoal-[INFO]:-Exit code zero, no warnings generated

The backup files corresponding to the digoal database above:

$cd /data01/digoal/gpbackup/db_dumps/20160416

$ll *20160416163655*
-rw------- 1 digoal users  113 Apr 16 16:36 digoal_gp_cdatabase_1_1_20160416163655
-rw------- 1 digoal users 3.2K Apr 16 16:36 digoal_gp_dump_0_2_20160416163655.gz
-rw------- 1 digoal users 3.3K Apr 16 16:36 digoal_gp_dump_0_3_20160416163655.gz
-rw------- 1 digoal users 3.4K Apr 16 16:36 digoal_gp_dump_0_4_20160416163655.gz
-rw------- 1 digoal users 3.3K Apr 16 16:36 digoal_gp_dump_0_5_20160416163655.gz
-rw------- 1 digoal users 3.4K Apr 16 16:36 digoal_gp_dump_0_6_20160416163655.gz
-rw------- 1 digoal users 3.3K Apr 16 16:36 digoal_gp_dump_0_7_20160416163655.gz
-rw------- 1 digoal users 3.4K Apr 16 16:36 digoal_gp_dump_0_8_20160416163655.gz
-rw------- 1 digoal users 3.3K Apr 16 16:36 digoal_gp_dump_0_9_20160416163655.gz
-rw------- 1 digoal users  889 Apr 16 16:36 digoal_gp_dump_1_1_20160416163655.gz
-rw------- 1 digoal users  196 Apr 16 16:36 digoal_gp_dump_1_1_20160416163655_post_data.gz
-rw-r--r-- 1 digoal users    0 Apr 16 16:36 digoal_gp_dump_20160416163655_ao_state_file
-rw-r--r-- 1 digoal users    0 Apr 16 16:36 digoal_gp_dump_20160416163655_co_state_file
-rw-r--r-- 1 digoal users    0 Apr 16 16:36 digoal_gp_dump_20160416163655_last_operation
-rw-r--r-- 1 digoal users 2.3K Apr 16 16:36 digoal_gp_dump_20160416163655.rpt
-rw------- 1 digoal users 1.3K Apr 16 16:36 digoal_gp_dump_status_0_2_20160416163655
-rw------- 1 digoal users 1.3K Apr 16 16:36 digoal_gp_dump_status_0_3_20160416163655
-rw------- 1 digoal users 1.3K Apr 16 16:36 digoal_gp_dump_status_0_4_20160416163655
-rw------- 1 digoal users 1.3K Apr 16 16:36 digoal_gp_dump_status_0_5_20160416163655
-rw------- 1 digoal users 1.3K Apr 16 16:36 digoal_gp_dump_status_0_6_20160416163655
-rw------- 1 digoal users 1.3K Apr 16 16:36 digoal_gp_dump_status_0_7_20160416163655
-rw------- 1 digoal users 1.3K Apr 16 16:36 digoal_gp_dump_status_0_8_20160416163655
-rw------- 1 digoal users 1.3K Apr 16 16:36 digoal_gp_dump_status_0_9_20160416163655
-rw------- 1 digoal users 2.3K Apr 16 16:36 digoal_gp_dump_status_1_1_20160416163655
-rw-r--r-- 1 digoal users 1.0K Apr 16 16:37 digoal_gp_global_1_1_20160416163655
-rw-r--r-- 1 digoal users 4.8K Apr 16 16:36 digoal_gp_statistics_1_1_20160416163655

The next section will cover incremental backups.

Known bugs in Greenplum's gpcrondump:
.1. You cannot specify the superuser name used for the backup; by default gpcrondump has pg_dump use the corresponding OS user name.
.2. When dumping the DDL for language handlers and CREATE DATABASE, identifiers are not double-quoted; if the user name contains characters other than lowercase letters and underscores, the restore fails.
There may be similar bugs elsewhere.
.3. Template databases cannot be dropped, so a gpdbrestore run with -e (wipe the database first) fails for them.
.4. COPY needs a large amount of memory and can trigger the OOM killer.
.5. gpcrondump calls getcwd to obtain the current directory, so it cannot be run from a directory that no longer exists.

Explanation of the gpcrondump options used above

**********************
Return Codes
**********************

The following is a list of the codes that gpcrondump returns.
   0 - Dump completed with no problems
   1 - Dump completed, but one or more warnings were generated
   2 - Dump failed with a fatal error

-a (do not prompt) 

 Do not prompt the user for confirmation. 

-d <master_data_directory> 

 The master host data directory. If not specified, the value set for 
 $MASTER_DATA_DIRECTORY will be used. 

--dump-stats

 Dump optimizer statistics from pg_statistic. Statistics are dumped in the
 master data directory to db_dumps/YYYYMMDD/gp_statistics_1_1_<timestamp>.

-g (copy config files) 

 Secure a copy of the master and segment configuration files 
 postgresql.conf, pg_ident.conf, and pg_hba.conf. These configuration 
 files are dumped in the master or segment data directory to 
 db_dumps/YYYYMMDD/config_files_<timestamp>.tar. 

 If --ddboost is specified, the backup is located on the default storage 
 unit in the directory specified by --ddboost-backupdir when the Data 
 Domain Boost credentials were set.

-G (dump global objects) 

 Use pg_dumpall to dump global objects such as roles and tablespaces. 
 Global objects are dumped in the master data directory to 
 db_dumps/YYYYMMDD/gp_global_1_1_<timestamp>. 

-h (record dump details) 

 Record details of database dump in database table 
 public.gpcrondump_history in database supplied via -x option. Utility 
 will create table if it does not currently exist. 

--incremental (backup changes to append-optimized tables)

 Adds an incremental backup to a backup set. When performing an 
 incremental backup, the complete backup set created prior to the 
 incremental backup must be available. The complete backup set includes 
 the following backup files: 

 * The last full backup before the current incremental backup 

 * All incremental backups created between the time of the full backup 
   and the current incremental backup 

 An incremental backup is similar to a full backup except for 
 append-optimized tables, including column-oriented tables. An 
 append-optimized table is backed up only if at least one of the 
 following operations was performed on the table after the last backup. 
   ALTER TABLE 
   INSERT 
   UPDATE
   DELETE
   TRUNCATE 
   DROP and then re-create the table

 For partitioned append-optimized tables, only the changed table 
 partitions are backed up. 

 The -u option must be used consistently within a backup set that 
 includes a full and incremental backups. If you use the -u option with a 
 full backup, you must use the -u option when you create incremental 
 backups that are part of the backup set that includes the full backup. 

 You can create an incremental backup for a full backup of a set of 
 database tables. When you create the full backup, specify the --prefix 
 option to identify the backup. To include a set of tables in the full 
 backup, use either the -t option or --table-file option. To exclude a 
 set of tables, use either the -T option or the --exclude-table-file 
 option. See the description of the option for more information on its 
 use. 

 To create an incremental backup based on the full backup of the set of 
 tables, specify the option --incremental and the --prefix option with 
 the string specified when creating the full backup. The incremental 
 backup is limited to only the tables in the full backup. 

 WARNING: gpcrondump does not check for available disk space prior to 
 performing an incremental backup.

 IMPORTANT: An incremental backup set, a full backup and associated 
 incremental backups, must be on a single device. For example, the 
 backups in a backup set must all be on a file system or must all be on a 
 Data Domain system. 

--prefix <prefix_string> [--list-filter-tables ]

 Prepends <prefix_string> followed by an underscore character (_) to the 
 names of all the backup files created during a backup. 

-r (rollback on failure) 

 Rollback the dump files (delete a partial dump) if a failure is 
 detected. The default is to not rollback. 

-u <backup_directory> 

 Specifies the absolute path where the backup files will be placed on 
 each host. If the path does not exist, it will be created, if possible. 
 If not specified, defaults to the data directory of each instance to be 
 backed up. Using this option may be desirable if each segment host has 
 multiple segment instances as it will create the dump files in a 
 centralized location rather than the segment data directories. 

 Note: This option is not supported if --ddboost is specified. 

--use-set-session-authorization 

 Use SET SESSION AUTHORIZATION commands instead of ALTER OWNER commands 
 to set object ownership. 

-x <database_name> 

 Required. The name of the Greenplum database to dump. Specify multiple times for 
 multiple databases. 

Greenplum的全量恢复介绍, gpdbrestore

本节介绍一下Greenplum的全量恢复
恢复时需要指定dump key ( 即gpcrondump时,每个数据库备份都带的时间戳)
全量恢复需要考虑几个因素, DROP DATABASE, TRUNCATE TABLE, DROP TABLE.
这些和gpcrondump或者gpdbrestore的参数有关。
同时也关系到数据是否需要先被清除掉,然后从备份恢复。

#!/bin/bash
GPHOME=/home/digoal/gphome

# Replace with symlink path if it is present and correct
if [ -h ${GPHOME}/../greenplum-db ]; then
    GPHOME_BY_SYMLINK=`(cd ${GPHOME}/../greenplum-db/ && pwd -P)`
    if [ x"${GPHOME_BY_SYMLINK}" = x"${GPHOME}" ]; then
        GPHOME=`(cd ${GPHOME}/../greenplum-db/ && pwd -L)`/.
    fi
    unset GPHOME_BY_SYMLINK
fi
#setup PYTHONHOME
if [ -x $GPHOME/ext/python/bin/python ]; then
    PYTHONHOME="$GPHOME/ext/python"
fi
PYTHONPATH=$GPHOME/lib/python
PATH=$GPHOME/bin:$PYTHONHOME/bin:$PATH
LD_LIBRARY_PATH=$GPHOME/lib:$PYTHONHOME/lib:$LD_LIBRARY_PATH
OPENSSL_CONF=$GPHOME/etc/openssl.cnf

export GPHOME
export PATH
export LD_LIBRARY_PATH
export PYTHONPATH
export PYTHONHOME
export OPENSSL_CONF

export MASTER_DATA_DIRECTORY=/data01/digoal/gpdatalocal/gpseg-1
export PGHOST=127.0.0.1
export PGPORT=1922
export PGUSER=digoal
export PGDATABASE=postgres
export PGPASSWORD=digoal

backupdir="/data01/digoal/gpbackup"
logdir=$backupdir
masterdir="/data01/digoal/gpdatalocal/gpseg-1"
dat=`psql -A -q -t -h $PGHOST -p $PGPORT -U $PGUSER -c "select ' -x '||string_agg(datname, ' -x ') from pg_database where datname <>'template0'"`
dbid="digoal"

for dumpid in 20160416172728 20160416172733 20160416172738 20160416172743 20160416172748 20160416172753
do
  gpdbrestore -a -e -d $masterdir --prefix $dbid -u $backupdir --restore-stats include --report-status-dir $logdir -t $dumpid
done

特别注意
.1. gpdbrestore -e 参数表示恢复前是否执行 drop database, 然后执行 create database。
所以如果目标环境没有对应的数据库的话,不需要加-e参数,否则会报错。
表级恢复也不要使用-e。
.2. 如果 gpcrondump 时使用了-C 参数, 则恢复时会先执行DROP TABLE再执行建表的动作。
.3. 如果 gpcrondump 时没有使用 -C 参数,恢复时想先清理数据的话,可以使用gpdbrestore的--truncate参数
(--truncate只能在表级恢复模式下使用, 即与 -T 或 --table-file 一同使用)
.4. Greenplum不允许删除模板库, 所以如果使用-e恢复模板库,会报错。 解决方法是改gpcrondump代码,对于模板库特殊处理,例如drop schema的方式清理模板库, 跳过模板库的DROP database报错以及create database 报错。
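针对第 .4 点,如果不想修改 gpcrondump 代码,也可以在恢复前手工用 drop schema 的方式清空模板库,再做表级恢复。下面是一个简单的示意(假设模板库中的业务对象都建在 public schema 下,实际环境请先确认对象归属,勿直接照搬):

-- 仅为示意: 用 drop schema 清空 template1 中的业务对象, 代替无法执行的 DROP DATABASE
\c template1
drop schema if exists public cascade;
create schema public;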

本节用到的 gpdbrestore 参数介绍

-a (do not prompt) 

 Do not prompt the user for confirmation.

-b <YYYYMMDD> 

 Looks for dump files in the segment data directories on the Greenplum 
 Database array of hosts in db_dumps/<YYYYMMDD>.

-d <master_data_directory>

 Optional. The master host data directory. If not specified, the value 
 set for $MASTER_DATA_DIRECTORY will be used. 

-e (drop target database before restore) 

 Drops the target database before doing the restore and then recreates 
 it. 

-G [include|only]

 Restores global objects such as roles and tablespaces if the global 
 object dump file db_dumps/<date>/gp_global_1_1_<timestamp> is found in 
 the master data directory.

 Specify either "-G only" to only restore the global objects dump file
 or "-G include" to restore global objects along with a normal restore.
 Defaults to "include" if neither argument is provided.

-l <logfile_directory>

 The directory to write the log file. Defaults to ~/gpAdminLogs. 

-m (restore metadata only)

 Performs a restore of database metadata (schema and table definitions, SET
 statements, and so forth) without restoring data.  If the --restore-stats or
 -G options are provided as well, statistics or globals will also be restored.

 The --noplan and --noanalyze options are not supported in conjunction with
 this option, as they affect the restoration of data and no data is restored.

--prefix <prefix_string> 

 If you specified the gpcrondump option --prefix <prefix_string> to create 
 the backup, you must specify this option with the <prefix_string> when 
 restoring the backup. 

 If you created a full backup of a set of tables with gpcrondump and 
 specified a prefix, you can use gpcrondump with the options 
 --list-filter-tables and --prefix <prefix_string> to list the tables
 that were included or excluded for the backup. 

--restore-stats [include|only]

 Restores optimizer statistics if the statistics dump file
 db_dumps/<date>/gp_statistics_1_1_<timestamp> is found in the master data
 directory. Setting this option automatically skips the final analyze step,
 so it is not necessary to also set the --noanalyze flag in conjunction with
 this one.

-t <timestamp_key>

 The 14 digit timestamp key that uniquely identifies a backup set of data 
 to restore. It is of the form YYYYMMDDHHMMSS. Looks for dump files 
 matching this timestamp key in the segment data directories db_dumps 
 directory on the Greenplum Database array of hosts. 

-T <schema>.<table_name>

 Table names to restore, specify multiple times for multiple tables. The 
 named table(s) must exist in the backup set of the database being restored. 
 Existing tables are not automatically truncated before data is restored 
 from backup. If your intention is to replace existing data in the table 
 from backup, truncate the table prior to running gpdbrestore -T. 

-S <schema>

 Schema names to restore, specify multiple times for multiple schemas. 
 Existing tables are not automatically truncated before data is restored 
 from backup. If your intention is to replace existing data in the table 
 from backup, truncate the table prior to running gpdbrestore -S. 

--truncate

 Truncate table data before restoring data to the table from the backup.
 This option is supported only when restoring a set of tables with the 
 option -T or --table-file. 
 This option is not supported with the -e option.

-u <backup_directory> 

 Specifies the absolute path to the directory containing the db_dumps 
 directory on each host. If not specified, defaults to the data directory 
 of each instance to be backed up. Specify this option if you specified a 
 backup directory with the gpcrondump option -u when creating a backup 
 set. 

 If <backup_directory> is not writable, backup operation report status 
 files are written to segment data directories. You can specify a 
 different location where report status files are written with the 
 --report-status-dir option. 

PostgreSQL alter column type 1 to type 2 using express or auto cast

在使用数据库时,有些应用开发人员可能喜欢用数值来表示布尔逻辑值;也有可能最初定义字段状态时选用的类型,将来不能表达所有的取值。
未来则可能需要对字段进行转换,例如数值转换为布尔,或者布尔转换为数值。
还有的时候,一开始可能使用了大量的重复文本,在进行统计时,文本比整型的效率低,在进入仓库后可能需要字典化这些文本(例如APPNAME) , 也会涉及字段类型的转换。
例子:

postgres=# create table tbl(id int, stat numeric(1));
CREATE TABLE

postgres=# insert into tbl select id,0 from generate_series(1,1000) t(id);
INSERT 0 1000
postgres=# insert into tbl select id,1 from generate_series(1001,2000) t(id);
INSERT 0 1000

postgres=# create or replace function n_to_b(numeric) returns boolean as $$
  select $1::int::boolean;
$$ language sql;
CREATE FUNCTION
postgres=# select n_to_b(1);
 n_to_b 
--------
 t
(1 row)

postgres=# select n_to_b(10);
 n_to_b 
--------
 t
(1 row)

postgres=# select n_to_b(0);
 n_to_b 
--------
 f
(1 row)

postgres=# select n_to_b(-1);
 n_to_b 
--------
 t
(1 row)

postgres=# alter table tbl alter column stat type boolean using stat::int::boolean;
ALTER TABLE

postgres=# select * from tbl limit 10;
 id | stat 
----+------
  1 | f
  2 | f
  3 | f
  4 | f
...
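上面的 USING 子句直接写了转换表达式;也可以改用前面定义的 n_to_b 函数,效果相同(示意,与上面的 ALTER TABLE 二选一执行即可):

alter table tbl alter column stat type boolean using n_to_b(stat);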

字典化

postgres=# create table test(id int, info text);
CREATE TABLE
postgres=# insert into test select id,'string a' from generate_series(1,100000) t(id);
INSERT 0 100000
postgres=# insert into test select id,'string b' from generate_series(1,100000) t(id);
INSERT 0 100000
postgres=# insert into test select id,'string c' from generate_series(1,100000) t(id);
INSERT 0 100000

postgres=# create or replace function fun(text) returns int as $$
declare
begin  
case $1 
  when 'string a' then return 0;
  when 'string b' then return 1;
  when 'string c' then return 2; 
  else return 9999;
  end case;
end;
$$ language plpgsql strict;
CREATE FUNCTION
postgres=# select fun('a');
 fun  
------
 9999
(1 row)

postgres=# select fun('string a');
 fun 
-----
   0
(1 row)

postgres=# alter table test alter column info type int using fun(info);
ALTER TABLE
postgres=# select * from test where id=1 limit 5;
 id | info 
----+------
  1 |    0
  1 |    1
  1 |    2
(3 rows)

还有时,会涉及文本转数值,也可以使用类似的方法:
你可能需要用到to_number或者自定义函数(例如对于带有非数值的字符串,返回一个固定值)

postgres=# select to_number('123ab2','999');
 to_number 
-----------
       123
(1 row)

postgres=# select to_number('1a123ab2','999');
 to_number 
-----------
        11
(1 row)

postgres=# select to_number('1a123ab2','999999999999');
 to_number 
-----------
     11232
(1 row)

PostgreSQL 行级全文检索

在一些应用程序中,可能需要对表的所有字段进行检索,有些字段可能需要精准查询,有些字段可能需要模糊查询或全文检索。
这种需求对于应用开发人员来说,会很蛋疼,因为写SQL很麻烦,例子:

postgres=# create table t(phonenum text, info text, c1 int, c2 text, c3 text, c4 timestamp);
CREATE TABLE
postgres=# insert into t values ('13888888888','i am digoal, a postgresqler',123,'china','中华人民共和国,阿里巴巴,阿',now());
INSERT 0 1
postgres=# select * from t;
  phonenum   |            info             | c1  |  c2   |              c3              |             c4             
-------------+-----------------------------+-----+-------+------------------------------+----------------------------
 13888888888 | i am digoal, a postgresqler | 123 | china | 中华人民共和国,阿里巴巴,阿 | 2016-04-19 11:15:55.208658
(1 row)

例如查询t表,条件是,任意字段匹配digoal就返回该记录。

select * from t where phonenum='digoal' or info ~ 'digoal' or c1='digoal' or ......;

每个字段都要写一个条件,有精准匹配,有全文检索。

使用行级全文检索,可以大大简化这个查询。
以结巴分词为例:
源码如下,
https://github.com/jaiminpan/pg_jieba
还有一个基于scws的pg_scws,
https://github.com/jaiminpan/pg_scws
以上都支持自定义词典。
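两者的安装过程参考各自的 README,编译安装后,启用扩展的方式大致如下(以 pg_jieba 为例,jiebacfg 是它自带的分词配置名;新版本可能还需要配置 shared_preload_libraries,以 README 为准):

create extension pg_jieba;
-- 扩展会创建名为 jiebacfg 的 text search configuration, 可以这样确认:
select cfgname from pg_ts_config where cfgname like 'jieba%';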
安装略,下面看看用法:

postgres=# select t::text from t;
                                                        t                                                        
-----------------------------------------------------------------------------------------------------------------
 (13888888888,"i am digoal, a postgresqler",123,china,中华人民共和国,阿里巴巴,阿,"2016-04-19 11:15:55.208658")
(1 row)

postgres=# select to_tsvector('jiebacfg',t::text) from t;
                                                                                 to_tsvector                                                                                  
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 ' ':6,8,11,13,33 '04':30 '11':34 '123':17 '13888888888':2 '15':36 '19':32 '2016':28 '55.208658':38 'china':19 'digoal':9 'postgresqler':14 '中华人民共和国':21 '阿里巴巴':23
(1 row)

使用t::text可以将行转成一个大文本。

postgres=# select to_tsvector('jiebacfg',t::text) @@ to_tsquery('digoal & china') from t;
 ?column? 
----------
 t
(1 row)

postgres=# select to_tsvector('jiebacfg',t::text) @@ to_tsquery('digoal & post') from t;
 ?column? 
----------
 f
(1 row)

创建行级文本索引,需要用到immutable函数索引

postgres=# create or replace function f1(regconfig,text) returns tsvector as $$
 select to_tsvector($1,$2);
 $$ language sql immutable strict;
CREATE FUNCTION

postgres=# create or replace function f1(text) returns tsvector as $$          
select to_tsvector($1);   
$$ language sql immutable strict;
CREATE FUNCTION

postgres=# alter function record_out(record) immutable;
ALTER FUNCTION
postgres=# alter function textin(cstring) immutable;
ALTER FUNCTION
postgres=# create index idx_t_1 on t using gin (f1('jiebacfg'::regconfig,t::text)) ;
CREATE INDEX

验证:

postgres=# select * from t where f1('jiebacfg'::regconfig,t::text) @@ to_tsquery('digoal & post') ;
 phonenum | info | c1 | c2 | c3 | c4 
----------+------+----+----+----+----
(0 rows)
postgres=# select * from t where f1('jiebacfg'::regconfig,t::text) @@ to_tsquery('digoal & china') ;
  phonenum   |            info             | c1  |  c2   |              c3              |             c4             
-------------+-----------------------------+-----+-------+------------------------------+----------------------------
 13888888888 | i am digoal, a postgresqler | 123 | china | 中华人民共和国,阿里巴巴,阿 | 2016-04-19 11:15:55.208658
(1 row)

postgres=# select * from t where f1('jiebacfg'::regconfig,t::text) @@ to_tsquery('digoal & 阿里巴巴') ;
  phonenum   |            info             | c1  |  c2   |              c3              |             c4             
-------------+-----------------------------+-----+-------+------------------------------+----------------------------
 13888888888 | i am digoal, a postgresqler | 123 | china | 中华人民共和国,阿里巴巴,阿 | 2016-04-19 11:15:55.208658
(1 row)

postgres=# explain select * from t where f1('jiebacfg'::regconfig,t::text) @@ to_tsquery('digoal & 阿里巴巴') ;
                                              QUERY PLAN                                              
------------------------------------------------------------------------------------------------------
 Seq Scan on t  (cost=0.00..1.52 rows=1 width=140)
   Filter: (to_tsvector('jiebacfg'::regconfig, (t.*)::text) @@ to_tsquery('digoal & 阿里巴巴'::text))
(2 rows)

如果记录数很多,就会用到索引,记录数很少的时候,我们可以用hint或者开关来强制索引:

postgres=# set enable_seqscan=off;
SET
postgres=# explain select * from t where f1('jiebacfg'::regconfig,t::text) @@ to_tsquery('digoal & 阿里巴巴') ;
                                                   QUERY PLAN                                                   
----------------------------------------------------------------------------------------------------------------
 Bitmap Heap Scan on t  (cost=12.25..16.77 rows=1 width=140)
   Recheck Cond: (to_tsvector('jiebacfg'::regconfig, (t.*)::text) @@ to_tsquery('digoal & 阿里巴巴'::text))
   ->  Bitmap Index Scan on idx_t_1  (cost=0.00..12.25 rows=1 width=0)
         Index Cond: (to_tsvector('jiebacfg'::regconfig, (t.*)::text) @@ to_tsquery('digoal & 阿里巴巴'::text))
(4 rows)

happy it.

用PostgreSQL支持含有更新,删除,插入的实时流式计算

大多数的流式计算产品只支持APPEND ONLY的应用场景,也就是只有插入,没有更新和删除操作。
如果要实现更新和删除的实时流式计算,在PostgreSQL中可以这样来实现。
在此前你可以阅读我以前写的文章来了解PG是如何处理一天一万亿的实时流式计算的:
https://yq.aliyun.com/articles/166

要支持更新和删除,思路是这样的:加一张前置表,用前置表的某个字段记录该行是否到达最终状态,到达最终状态后,记录不会再被更新或删除。
通过触发器来控制哪些记录插入到流中并同时从前置表删除,哪些记录暂存在前置表。
下面是例子
本文假设flag=2是最终状态,应用层自己来定义这个FLAG。

pipeline=# create table pret1(id serial primary key, info text, flag smallint);
CREATE TABLE

pipeline=# create stream s0 (like pret1);
CREATE STREAM

pipeline=# create continuous view v0 as select count(*) from s0;
CREATE CONTINUOUS VIEW

flag=2的记录旁路到流,其他记录放到前置表。

pipeline=# create or replace function tg0() returns trigger as $$
 declare
 begin
   if new.flag=2 then
     insert into s0 values (new.*);
     return null;
   end if;
     return new;
 end;
 $$ language plpgsql strict;
CREATE FUNCTION

pipeline=# create trigger tg0 before insert on pret1 for each row execute procedure tg0();
CREATE TRIGGER

更新后flag=2的记录旁路到流,并删除前置表的对应记录。

pipeline=# create or replace function tg1() returns trigger as $$
 declare
 begin
   if new.flag=2 then
     insert into s0 values (new.*); 
     delete from pret1 where id=new.id; 
     return null;
   end if;
     return new;
 end;
 $$ language plpgsql strict;
CREATE FUNCTION

pipeline=# create trigger tg1 before update on pret1 for each row execute procedure tg1();
CREATE TRIGGER

测试

pipeline=# insert into pret1(info,flag) values ('test',0);
INSERT 0 1
pipeline=# select * from v0;
 count 
-------
(0 rows)

pipeline=# insert into pret1(info,flag) values ('test',1);
INSERT 0 1
pipeline=# select * from v0;
 count 
-------
(0 rows)

pipeline=# select * from pret1;
 id | info | flag 
----+------+------
  1 | test |    0
  2 | test |    1
(2 rows)

pipeline=# update pret1 set flag=2;
UPDATE 0
pipeline=# select * from pret1;
 id | info | flag 
----+------+------
(0 rows)

pipeline=# select * from v0;
 count 
-------
     2
(1 row)

pipeline=# insert into pret1(info,flag) values ('test',1);
INSERT 0 1
pipeline=# delete from pret1 ;
DELETE 1
pipeline=# select * from v0;
 count 
-------
     2
(1 row)

pipeline=# insert into pret1(info,flag) values ('test',1);
INSERT 0 1
pipeline=# select * from v0;
 count 
-------
     2
(1 row)

pipeline=# update pret1 set flag =10;
UPDATE 1
pipeline=# select * from v0;
 count 
-------
     2
(1 row)

pipeline=# select * from pret1;
 id | info | flag 
----+------+------
  4 | test |   10
(1 row)

pipeline=# update pret1 set flag =2;
UPDATE 0
pipeline=# select * from pret1;
 id | info | flag 
----+------+------
(0 rows)

pipeline=# select * from v0;
 count 
-------
     3
(1 row)

详情请参考
http://docs.pipelinedb.com/introduction.html

如何追溯 PostgreSQL 慢查询当时的状态

数据库出现慢查询的原因很多,例如IO等待,CPU繁忙,执行计划异常,锁等待,等等。
那么在发生慢查询后,如何能追溯慢查询当时的状态呢?
下面给大家提供一种思路,
.1. 首先,我们是如何监测慢查询的
.2. 监测到慢查询后,需要采集哪些信息
.3. 数据库内核层面能做什么
.4. 如何分析
如何实现?
.1. 如何监测慢查询

select datname, pid, usename, application_name, client_addr, client_port, 
       xact_start, query_start, state_change, waiting, state, 
       backend_xid, backend_xmin, query, 
       now()-xact_start as xact_duration, 
       now()-query_start as query_duration 
from pg_stat_activity 
where state<>'idle' 
and (backend_xid is not null or backend_xmin is not null) 
order by now()-xact_start;  

其中 now()-xact_start (xact_duration) 是指事务开始至当前已运行的时间。
now()-query_start (query_duration) 是指当前query开始至当前已运行的时间。
pid 指服务端进程ID。
.2. 采集哪些信息
如果发现运行时间超过设定阈值,记录该进程的以下信息:
.2.1.
针对pid查看它的pstack, 采集间隔自己定,比如1秒,直到对应的PID运行结束。

.2.2.
锁等待记录, 采集间隔自己定,比如1秒,直到对应的PID运行结束。

with t_wait as                     
(select a.mode,a.locktype,a.database,a.relation,a.page,a.tuple,a.classid,
a.objid,a.objsubid,a.pid,a.virtualtransaction,a.virtualxid,a.transactionid,
b.query,b.xact_start,b.query_start,b.usename,b.datname 
  from pg_locks a,pg_stat_activity b where a.pid=b.pid and not a.granted),
t_run as 
(select a.mode,a.locktype,a.database,a.relation,a.page,a.tuple,
a.classid,a.objid,a.objsubid,a.pid,a.virtualtransaction,a.virtualxid,
a.transactionid,b.query,b.xact_start,b.query_start,
b.usename,b.datname from pg_locks a,pg_stat_activity b where 
a.pid=b.pid and a.granted) 
select r.locktype,r.mode r_mode,r.usename r_user,r.datname r_db,
r.relation::regclass,r.pid r_pid,
r.page r_page,r.tuple r_tuple,r.xact_start r_xact_start,
r.query_start r_query_start,
now()-r.query_start r_locktime,r.query r_query,w.mode w_mode,
w.pid w_pid,w.page w_page,
w.tuple w_tuple,w.xact_start w_xact_start,w.query_start w_query_start,
now()-w.query_start w_locktime,w.query w_query  
from t_wait w,t_run r where
  r.locktype is not distinct from w.locktype and
  r.database is not distinct from w.database and
  r.relation is not distinct from w.relation and
  r.page is not distinct from w.page and
  r.tuple is not distinct from w.tuple and
  r.classid is not distinct from w.classid and
  r.objid is not distinct from w.objid and
  r.objsubid is not distinct from w.objsubid and
  r.transactionid is not distinct from w.transactionid and
  r.pid <> w.pid
  order by 
  ((  case w.mode
    when 'INVALID' then 0
    when 'AccessShareLock' then 1
    when 'RowShareLock' then 2
    when 'RowExclusiveLock' then 3
    when 'ShareUpdateExclusiveLock' then 4
    when 'ShareLock' then 5
    when 'ShareRowExclusiveLock' then 6
    when 'ExclusiveLock' then 7
    when 'AccessExclusiveLock' then 8
    else 0
  end  ) + 
  (  case r.mode
    when 'INVALID' then 0
    when 'AccessShareLock' then 1
    when 'RowShareLock' then 2
    when 'RowExclusiveLock' then 3
    when 'ShareUpdateExclusiveLock' then 4
    when 'ShareLock' then 5
    when 'ShareRowExclusiveLock' then 6
    when 'ExclusiveLock' then 7
    when 'AccessExclusiveLock' then 8
    else 0
  end  )) desc,r.xact_start;

.2.3.
整机 io 情况, 例如 iostat -x 1 ,采集间隔自己定,比如1秒,直到对应的PID运行结束。
进程IO情况, iotop -p $PID ,采集间隔自己定,比如1秒,直到对应的PID运行结束。

.2.4.
网络情况,例如sar -n DEV 1 1 , 采集间隔自己定,比如1秒,直到对应的PID运行结束。
进程网络情况,例如 iptraf, 根据客户端IP和端口号, 采集间隔自己定,比如1秒,直到对应的PID运行结束。

.2.5.
CPU 使用情况
top -p $PID , 采集间隔自己定,比如1秒,直到对应的PID运行结束。

.3. 数据库内核层面能做什么
.3.1. 对执行时间超过阈值的SQL,自动记录SQL的explain 输出,以及每个NODE的耗时。
配置auto_explain来实现以上目的,配置例子:
http://blog.163.com/digoal@126/blog/static/16387704020115825612145/
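链接中有详细说明,核心配置大致如下(示意,阈值按需调整;auto_explain.log_analyze 会带来额外开销,生产环境谨慎开启):

# postgresql.conf (示意)
shared_preload_libraries = 'auto_explain'
auto_explain.log_min_duration = '1s'      # 超过该阈值的SQL自动记录执行计划
auto_explain.log_analyze = on             # 记录每个节点的实际耗时与行数
auto_explain.log_buffers = on
auto_explain.log_nested_statements = on   # 函数内部执行的SQL也记录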

.3.2. 自动记录SQL的锁等待耗时。
配置例子:

log_lock_waits=on
deadlock_timeout = 1s

.3.3. 内核还可以记录SQL的IO耗时,需要开启IO计时(例如 track_io_timing 参数),注意会带来一定的计时开销。

.3.4. PG内核目前输出的SQL时间包含了数据传输到客户端的时间,但是网络传输的时间没有单独统计,所以这个可以通过HACK内核来实现。

有了以上信息,就可以追溯慢查询到底慢在什么地方了。


PostgreSQL IDE pgadmin , edb postgres enterprise manager 查询慢的问题分析

PostgreSQL 的GUI客户端比较多,有开源的,也有商业的。
用得比较多的可能是PgAdmin了,有些人可能会用EDB的PEM。
但实际上这两个GUI都有一个小问题,在返回较大的结果集时,会非常的慢。
例如 : 
数据库端创建一个表,插入约30MB数据。

postgres=> create table test (like pg_class);
CREATE TABLE
postgres=> insert into test select * from pg_class;
INSERT 0 301
postgres=> insert into test select * from test;
INSERT 0 301
postgres=> insert into test select * from test;
INSERT 0 602
...
postgres=> insert into test select * from test;
INSERT 0 77056
postgres=> \dt+
                          List of relations
 Schema |      Name       | Type  | Owner  |    Size    | Description 
--------+-----------------+-------+--------+------------+-------------
 public | test            | table | digoal | 29 MB      | 
(3 rows)

使用EDB的PEM或者pgadmin连接到数据库 : 
在GUI中执行 :

select * from test;

耗时20秒。

换个执行语句:

copy (select * from test) to stdout;
或者
copy test to stdout;

3秒返回。
copy与select * 查询的数据量一样多,而且都是全表扫描,但是时间却相差10几秒。

原因排查
在pgadmin客户端的机器上,观察到一个现象 : 
执行select * from test;时,网络使用率不高,持续时间长。
网络传输结束后,CPU马上飙高,估计pgadmin在处理数据,很长一段时间后,才开始展示结果。
而更换为

copy (select * from test) to stdout;
或者
copy test to stdout;

后,执行非常迅速,而且展示也非常快,可以看到网络使用率很高,出现了一个尖峰。

将GUI客户端更换为heidisql后,执行 select * from test; 执行速度很快,与COPY相当。
从网络使用率来看,也出现了一个尖峰,数据很快就传完了。
使用 PostgreSQL 客户端 psql 命令执行select * from test,速度也和heidisql一样,很快。

对比以上几种情况,说明pgadmin和pem在处理 select 时,效果并不理想,如果要返回大量的结果集,请慎用。
如果使用PEM或者pgadmin要返回大量结果集,建议使用游标来返回:
例子:

begin;
declare c1 cursor for select * from test;  
fetch 100 from c1;  -- 循环执行 fetch, 直到返回 0 行
close c1;
end;

网络流量对比图(截图略),从左往右数:
第1个尖峰,heidisql中执行select * from test;
第2个尖峰,pgadmin中执行copy (select * from test) to stdout;
第3个尖峰,pgadmin中执行copy test to stdout;
第4个尖峰,psql中执行select * from test;
说明 select * from test 的网络传输流量确实比copy的更大一些。
heidisql不支持 copy命令.
如果你用的是windows平台,并且遇到了与之类似的问题,建议排查一下客户端程序的代码,从程序层面来解决这个问题。
这个问题我也会反馈给pgadmin和EDB,看看他们怎么解决。

最后要给应用开发人员的一个小建议 :
查询大结果集,给用户展示数据的SQL,建议修改为用游标打开,一次FETCH少量数据, 拿到数据马上就可以向用户展示,后台可以根据策略选择是否再继续fetch剩余的数据。
这样做的好处是用户体验更好,同时有可能大大减少数据库的网络开销和CPU开销(因为用户并不一定要查看所有数据);如果用户关闭窗口,可以不再fetch剩余的数据。
大多数类似的应用场景,都是这样来设计的。

PostgreSQL SQL log duration time 源码分析

PostgreSQL 可以通过参数设置是否要记录SQL的执行时间,以及执行时间超过多少的SQL。
注意这里的执行时间实际上包含了网络的传输时间。
所以在遇到慢查询时,除了要排查数据库的问题,实际上还需要排查网络以及客户端的问题,因为客户端接收数据堵塞也会造成慢查询,就像我前天写的文章。 
PostgreSQL IDE pgadmin , edb postgres enterprise manager 查询慢的问题分析
https://yq.aliyun.com/articles/32438

另外需要指出的是,PostgreSQL的内核在这方面有改进的空间,最好是把网络传输的时间另外计算。
这样更容易排查问题。
如果要将网络时间另外计算,需要hack一下内核的postgres.c中的几个函数,文章后面会分析。

测试
在数据库中创建表和测试数据

postgres=> create table tbl(id int);
CREATE TABLE
postgres=> insert into tbl select generate_series(1,200000);
INSERT 0 200000
postgres=> \dt+ tbl
                   List of relations
 Schema | Name | Type  | Owner |  Size   | Description 
--------+------+-------+-------+---------+-------------
 public | tbl  | table | test  | 7104 kB | 
(1 row)

确保打开了慢查询的审计日志

postgres=> show log_min_duration_statement ;
 log_min_duration_statement 
----------------------------
 1s
(1 row)
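如果尚未开启,可以这样设置并生效(以 1 秒为例,需要超级用户权限;更老的版本直接修改 postgresql.conf 后 reload 即可):

postgres=# alter system set log_min_duration_statement = '1s';
ALTER SYSTEM
postgres=# select pg_reload_conf();
 pg_reload_conf 
----------------
 t
(1 row)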

在数据库所在服务器的本地执行如下查询,很快就返回

digoal@localhost-> date; psql -c "select * from tbl" >/dev/null ; date;
Fri Apr 22 11:59:52 CST 2016
Fri Apr 22 11:59:53 CST 2016

开启一个比较慢的网络,例如手机的2G网络,然后通过手机上网连接到数据库,执行同样的SQL,耗时变长了,因为网络不稳定,时快时慢。

digoal@digoal01-> date ; psql -h remote_host -p 1921 -U test postgres -c "select * from tbl" >/dev/null ; date;
Fri Apr 22 12:31:08 CST 2016
Fri Apr 22 12:31:18 CST 2016


digoal@digoal01-> date ; psql -h remote_host -p 1921 -U test postgres -c "select * from tbl limit 20000" >/dev/null ; date;
Fri Apr 22 12:34:30 CST 2016
Fri Apr 22 12:34:47 CST 2016

在数据库的日志中,可以看到慢查询的审计日志,耗时包含了数据库将数据传输到客户端的时间。

2016-04-22 12:33:13.112 CST,"test","postgres",2680,"36.16.129.195:11812",5719a97f.a78,3,"SELECT",2016-04-22 12:33:03 CST,7/0,0,LOG,00000,"duration: 8300.129 ms  statement: select * from tbl limit 20000",,,,,,,,"exec_simple_query, postgres.c:1149","psql"

对应的代码
src/backend/tcop/postgres.c

check_log_duration
检查是否需要输出duration。
同时计算从语句开始时间到当前时间的时间差,也就是SQL的duration。
有4个接口会记录时间
exec_execute_message 使用绑定变量时, execute sql的时间
exec_bind_message 使用绑定变量时, bind的时间
exec_parse_message 使用绑定变量时, parse的时间
exec_simple_query 未使用绑定变量时,执行SQL的时间

这几个函数的代码如下

/*
 * check_log_duration
 *      Determine whether current command's duration should be logged
 *
 * Returns:
 *      0 if no logging is needed, 不需要记录SQL时间
 *      1 if just the duration should be logged, 需要记录SQL时间, 但是不需要记录SQL详情
 *      2 if duration and query details should be logged, 需要记录SQL时间,同时需要记录SQL 详情
 *
 * If logging is needed, the duration in msec is formatted into msec_str[],
 * which must be a 32-byte buffer.
 *
 * was_logged should be TRUE if caller already logged query details (this
 * essentially prevents 2 from being returned).
 */
int
check_log_duration(char *msec_str, bool was_logged)
{
    if (log_duration || log_min_duration_statement >= 0)
    {
        long        secs;
        int         usecs;
        int         msecs;
        bool        exceeded;

        TimestampDifference(GetCurrentStatementStartTimestamp(),
                            GetCurrentTimestamp(),
                            &secs, &usecs);  // 语句开始到当前的时间
        msecs = usecs / 1000;

        /*
         * This odd-looking test for log_min_duration_statement being exceeded
         * is designed to avoid integer overflow with very long durations:
         * don't compute secs * 1000 until we've verified it will fit in int.
         */
        exceeded = (log_min_duration_statement == 0 ||
                    (log_min_duration_statement > 0 &&
                     (secs > log_min_duration_statement / 1000 ||
                      secs * 1000 + msecs >= log_min_duration_statement)));

        if (exceeded || log_duration)
        {
            snprintf(msec_str, 32, "%ld.%03d",
                     secs * 1000 + msecs, usecs % 1000);
            if (exceeded && !was_logged)
                return 2;
            else
                return 1;
        }
    }

    return 0;
}


simple exec
/*
 * exec_simple_query
 *
 * Execute a "simple Query" protocol message.
 */
static void
exec_simple_query(const char *query_string)
{
    CommandDest dest = whereToSendOutput;
    MemoryContext oldcontext;
    List       *parsetree_list;
    ListCell   *parsetree_item;
    bool        save_log_statement_stats = log_statement_stats;
    bool        was_logged = false;
    bool        isTopLevel;
    char        msec_str[32];
...
        /*
         * Create unnamed portal to run the query or queries in. If there
         * already is one, silently drop it.
         */
        portal = CreatePortal("", true, true);
        /* Don't display the portal in pg_cursors */
        portal->visible = false;

        /*
         * We don't have to copy anything into the portal, because everything
         * we are passing here is in MessageContext, which will outlive the
         * portal anyway.
         */
        PortalDefineQuery(portal,
                          NULL,
                          query_string,
                          commandTag,
                          plantree_list,
                          NULL);

        /*
         * Start the portal.  No parameters here.
         */
        PortalStart(portal, NULL, 0, InvalidSnapshot);

        /*
         * Select the appropriate output format: text unless we are doing a
         * FETCH from a binary cursor.  (Pretty grotty to have to do this here
         * --- but it avoids grottiness in other places.  Ah, the joys of
         * backward compatibility...)
         */
        format = 0;             /* TEXT is default */
        if (IsA(parsetree, FetchStmt))
        {
            FetchStmt  *stmt = (FetchStmt *) parsetree;

            if (!stmt->ismove)
            {
                Portal      fportal = GetPortalByName(stmt->portalname);

                if (PortalIsValid(fportal) &&
                    (fportal->cursorOptions & CURSOR_OPT_BINARY))
                    format = 1; /* BINARY */
            }
        }
        PortalSetResultFormat(portal, 1, &format);

        /*
         * Now we can create the destination receiver object.
         */
        receiver = CreateDestReceiver(dest);
        if (dest == DestRemote)
            SetRemoteDestReceiverParams(receiver, portal);

        /*
         * Switch back to transaction context for execution.
         */
        MemoryContextSwitchTo(oldcontext);

        /*
         * Run the portal to completion, and then drop it (and the receiver).
         */
        (void) PortalRun(portal,
                         FETCH_ALL,
                         isTopLevel,
                         receiver,
                         receiver,
                         completionTag);

        (*receiver->rDestroy) (receiver);

        PortalDrop(portal, false);

        if (IsA(parsetree, TransactionStmt))
        {
            /*
             * If this was a transaction control statement, commit it. We will
             * start a new xact command for the next command (if any).
             */
            finish_xact_command();
        }
...
    /*
     * Close down transaction statement, if one is open.
     */
    finish_xact_command();

    /*
     * If there were no parsetrees, return EmptyQueryResponse message.
     */
    if (!parsetree_list)
        NullCommand(dest);

    /*
     * Emit duration logging if appropriate.
     */
    switch (check_log_duration(msec_str, was_logged))
    {
        case 1:
            ereport(LOG,
                    (errmsg("duration: %s ms", msec_str),
                     errhidestmt(true)));
            break;
        case 2:
            ereport(LOG,
                    (errmsg("duration: %s ms  statement: %s",
                            msec_str, query_string),
                     errhidestmt(true),
                     errdetail_execute(parsetree_list)));
            break;
    }

    if (save_log_statement_stats)
        ShowUsage("QUERY STATISTICS");

    TRACE_POSTGRESQL_QUERY_DONE(query_string);

    debug_query_string = NULL;
}




parse
/*
 * exec_parse_message
 *
 * Execute a "Parse" protocol message.
 */
static void
exec_parse_message(const char *query_string,    /* string to execute */
                   const char *stmt_name,       /* name for prepared stmt */
                   Oid *paramTypes,     /* parameter types */
                   int numParams)       /* number of parameters */
{
    MemoryContext unnamed_stmt_context = NULL;
    MemoryContext oldcontext;
    List       *parsetree_list;
    Node       *raw_parse_tree;
    const char *commandTag;
    List       *querytree_list;
    CachedPlanSource *psrc;
    bool        is_named;
    bool        save_log_statement_stats = log_statement_stats;
    char        msec_str[32];
...
    /*
     * Send ParseComplete.
     */
    if (whereToSendOutput == DestRemote)
        pq_putemptymessage('1');

    /*
     * Emit duration logging if appropriate.
     */
    switch (check_log_duration(msec_str, false))
    {
        case 1:
            ereport(LOG,
                    (errmsg("duration: %s ms", msec_str),
                     errhidestmt(true)));
            break;
        case 2:
            ereport(LOG,
                    (errmsg("duration: %s ms  parse %s: %s",
                            msec_str,
                            *stmt_name ? stmt_name : "<unnamed>",
                            query_string),
                     errhidestmt(true)));
            break;
    }

    if (save_log_statement_stats)
        ShowUsage("PARSE MESSAGE STATISTICS");

    debug_query_string = NULL;
}




bind
/*
 * exec_bind_message
 *
 * Process a "Bind" message to create a portal from a prepared statement
 */
static void
exec_bind_message(StringInfo input_message)
{
    const char *portal_name;
    const char *stmt_name;
    int         numPFormats;
    int16      *pformats = NULL;
    int         numParams;
    int         numRFormats;
    int16      *rformats = NULL;
    CachedPlanSource *psrc;
    CachedPlan *cplan;
    Portal      portal;
    char       *query_string;
    char       *saved_stmt_name;
    ParamListInfo params;
    MemoryContext oldContext;
    bool        save_log_statement_stats = log_statement_stats;
    bool        snapshot_set = false;
    char        msec_str[32];
...
    /*
     * Now we can define the portal.
     *
     * DO NOT put any code that could possibly throw an error between the
     * above GetCachedPlan call and here.
     */
    PortalDefineQuery(portal,
                      saved_stmt_name,
                      query_string,
                      psrc->commandTag,
                      cplan->stmt_list,
                      cplan);

    /* Done with the snapshot used for parameter I/O and parsing/planning */
    if (snapshot_set)
        PopActiveSnapshot();

    /*
     * And we're ready to start portal execution.
     */
    PortalStart(portal, params, 0, InvalidSnapshot);

    /*
     * Apply the result format requests to the portal.
     */
    PortalSetResultFormat(portal, numRFormats, rformats);

    /*
     * Send BindComplete.
     */
    if (whereToSendOutput == DestRemote)
        pq_putemptymessage('2');

    /*
     * Emit duration logging if appropriate.
     */
    switch (check_log_duration(msec_str, false))
    {
        case 1:
            ereport(LOG,
                    (errmsg("duration: %s ms", msec_str),
                     errhidestmt(true)));
            break;
        case 2:
            ereport(LOG,
                    (errmsg("duration: %s ms  bind %s%s%s: %s",
                            msec_str,
                            *stmt_name ? stmt_name : "<unnamed>",
                            *portal_name ? "/" : "",
                            *portal_name ? portal_name : "",
                            psrc->query_string),
                     errhidestmt(true),
                     errdetail_params(params)));
            break;
    }

    if (save_log_statement_stats)
        ShowUsage("BIND MESSAGE STATISTICS");

    debug_query_string = NULL;
}



execute



/*
 * exec_execute_message
 *
 * Process an "Execute" message for a portal
 */
static void
exec_execute_message(const char *portal_name, long max_rows)
{
    CommandDest dest;
    DestReceiver *receiver;
    Portal      portal;
    bool        completed;
    char        completionTag[COMPLETION_TAG_BUFSIZE];
    const char *sourceText;
    const char *prepStmtName;
    ParamListInfo portalParams;
    bool        save_log_statement_stats = log_statement_stats;
    bool        is_xact_command;
    bool        execute_is_fetch;
    bool        was_logged = false;
    char        msec_str[32];

...
    /*
     * Okay to run the portal.
     */
    if (max_rows <= 0)
        max_rows = FETCH_ALL;

    completed = PortalRun(portal,
                          max_rows,
                          true, /* always top level */
                          receiver,
                          receiver,
                          completionTag);

    (*receiver->rDestroy) (receiver);

    if (completed)
    {
        if (is_xact_command)
        {
            /*
             * If this was a transaction control statement, commit it.  We
             * will start a new xact command for the next command (if any).
             */
            finish_xact_command();
        }
        else
        {
            /*
             * We need a CommandCounterIncrement after every query, except
             * those that start or end a transaction block.
             */
            CommandCounterIncrement();
        }

        /* Send appropriate CommandComplete to client */
        EndCommand(completionTag, dest);
    }
    else
    {
        /* Portal run not complete, so send PortalSuspended */
        if (whereToSendOutput == DestRemote)
            pq_putemptymessage('s');
    }

    /*
     * Emit duration logging if appropriate.
     */
    switch (check_log_duration(msec_str, was_logged))
    {
        case 1:
            ereport(LOG,
                    (errmsg("duration: %s ms", msec_str),
                     errhidestmt(true)));
            break;
        case 2:
            ereport(LOG,
                    (errmsg("duration: %s ms  %s %s%s%s: %s",
                            msec_str,
                            execute_is_fetch ?
                            _("execute fetch from") :
                            _("execute"),
                            prepStmtName,
                            *portal_name ? "/" : "",
                            *portal_name ? portal_name : "",
                            sourceText),
                     errhidestmt(true),
                     errdetail_params(portalParams)));
            break;
    }

    if (save_log_statement_stats)
        ShowUsage("EXECUTE MESSAGE STATISTICS");

    debug_query_string = NULL;
}

iperf 测试网络性能指标

Iperf是一个网络性能测试工具,主要应用于Linux服务器。它可以测量最大TCP和UDP带宽,具有多种参数和特性,可以报告带宽、延迟抖动、数据包丢失、MSS/MTU等统计信息,通过这些信息可以发现网络问题、检查网络质量、定位网络瓶颈。Iperf在Linux和Windows平台均有二进制版本供自由使用。

对于需要大量网络交互的产品,例如Greenplum,网络性能是一个很重要的指标。

安装在需要测试网络的两台主机上。

git clone https://github.com/esnet/iperf.git
cd iperf

切换到最新的稳定分支后安装

git checkout 3.1-STABLE
./configure --prefix=/home/digoal/iperfhome
make
make install

将so所在目录加入动态链接库配置并刷新缓存

$ sudo vi /etc/ld.so.conf
/home/digoal/iperfhome/lib
# ldconfig
# ldconfig -p |grep iperf

详细的命令说明

./iperfhome/bin/iperf3 --help
Usage: iperf [-s|-c host] [options]
       iperf [-h|--help] [-v|--version]

Server or Client:
  -p, --port      #         server port to listen on/connect to
  -f, --format    [kmgKMG]  format to report: Kbits, Mbits, KBytes, MBytes
  -i, --interval  #         seconds between periodic bandwidth reports
  -F, --file name           xmit/recv the specified file
  -A, --affinity n/n,m      set CPU affinity
  -B, --bind      <host>    bind to a specific interface
  -V, --verbose             more detailed output
  -J, --json                output in JSON format
  --logfile f               send output to a log file
  -d, --debug               emit debugging output
  -v, --version             show version information and quit
  -h, --help                show this message and quit
Server specific:
  -s, --server              run in server mode
  -D, --daemon              run the server as a daemon
  -I, --pidfile file        write PID file
  -1, --one-off             handle one client connection then exit
Client specific:
  -c, --client    <host>    run in client mode, connecting to <host>
  -u, --udp                 use UDP rather than TCP
  -b, --bandwidth #[KMG][/#] target bandwidth in bits/sec (0 for unlimited)
                            (default 1 Mbit/sec for UDP, unlimited for TCP)
                            (optional slash and packet count for burst mode)
  -t, --time      #         time in seconds to transmit for (default 10 secs)
  -n, --bytes     #[KMG]    number of bytes to transmit (instead of -t)
  -k, --blockcount #[KMG]   number of blocks (packets) to transmit (instead of -t or -n)
  -l, --len       #[KMG]    length of buffer to read or write
                            (default 128 KB for TCP, 8 KB for UDP)
  --cport         <port>    bind to a specific client port (TCP and UDP, default: ephemeral port)
  -P, --parallel  #         number of parallel client streams to run
  -R, --reverse             run in reverse mode (server sends, client receives)
  -w, --window    #[KMG]    set window size / socket buffer size
  -C, --congestion <algo>   set TCP congestion control algorithm (Linux and FreeBSD only)
  -M, --set-mss   #         set TCP/SCTP maximum segment size (MTU - 40 bytes)
  -N, --no-delay            set TCP/SCTP no delay, disabling Nagle's Algorithm
  -4, --version4            only use IPv4
  -6, --version6            only use IPv6
  -S, --tos N               set the IP 'type of service'
  -L, --flowlabel N         set the IPv6 flow label (only supported on Linux)
  -Z, --zerocopy            use a 'zero copy' method of sending data
  -O, --omit N              omit the first n seconds
  -T, --title str           prefix every output line with this string
  --get-server-output       get results from server
  --udp-counters-64bit      use 64-bit counters in UDP test packets

[KMG] indicates options that support a K/M/G suffix for kilo-, mega-, or giga-

iperf3 homepage at: http://software.es.net/iperf/
Report bugs to:     https://github.com/esnet/iperf

帮助文档

man man1/iperf3.1 
IPERF(1)                         User Manuals                         IPERF(1)

NAME
       iperf3 - perform network throughput tests

SYNOPSIS
       iperf3 -s [ options ]
       iperf3 -c server [ options ]

DESCRIPTION
       iperf3 is a tool for performing network throughput measurements.  It can test either TCP or UDP throughput.  To perform an iperf3 test the user must establish both a server and a client.

GENERAL OPTIONS
       -p, --port n
              set server port to listen on/connect to to n (default 5201)

       -f, --format
              [kmKM]   format to report: Kbits, Mbits, KBytes, MBytes

       -i, --interval n
              pause n seconds between periodic bandwidth reports; default is 1, use 0 to disable

       -F, --file name
              client-side: read from the file and write to the network, instead of using random data; server-side: read from the network and write to the file, instead of throwing the data away

       -A, --affinity n/n,m
              Set the CPU affinity, if possible (Linux and FreeBSD only).  On both the client and server you can set the local affinity by using the n form of this argument (where n is a CPU number).  In addition,
              on the client side you can override the server’s affinity for just that one test, using the n,m form of argument.  Note that when using this feature, a process will only be bound to a single CPU  (as
              opposed to a set containing potentialy multiple CPUs).

       -B, --bind host
              bind to a specific interface

       -V, --verbose
              give more detailed output

       -J, --json
              output in JSON format

       --logfile file
              send output to a log file.

       -d, --debug
              emit debugging output.  Primarily (perhaps exclusively) of use to developers.

       -v, --version
              show version information and quit

       -h, --help
              show a help synopsis

SERVER SPECIFIC OPTIONS
       -s, --server
              run in server mode

       -D, --daemon
              run the server in background as a daemon

       -I, --pidfile file
              write a file with the process ID, most useful when running as a daemon.

       -1, --one-off
              handle one client connection, then exit.

CLIENT SPECIFIC OPTIONS
       -c, --client host
              run in client mode, connecting to the specified server

       --sctp use SCTP rather than TCP (FreeBSD and Linux)

       -u, --udp
              use UDP rather than TCP

       -b, --bandwidth n[KM]
              set  target  bandwidth to n bits/sec (default 1 Mbit/sec for UDP, unlimited for TCP).  If there are multiple streams (-P flag), the bandwidth limit is applied separately to each stream.  You can also
              add a ’/’ and a number to the bandwidth specifier.  This is called "burst mode".  It will send the given number of packets without pausing, even if that temporarily exceeds  the  specified  bandwidth
              limit.  Setting the target bandwidth to 0 will disable bandwidth limits (particularly useful for UDP tests).

       -t, --time n
              time in seconds to transmit for (default 10 secs)

       -n, --bytes n[KM]
              number of bytes to transmit (instead of -t)

       -k, --blockcount n[KM]
              number of blocks (packets) to transmit (instead of -t or -n)

       -l, --length n[KM]
              length of buffer to read or write (default 128 KB for TCP, 8KB for UDP)

       --cport port
              bind data streams to a specific client port (for TCP and UDP only, default is to use an ephemeral port)

       -P, --parallel n
              number of parallel client streams to run

       -R, --reverse
              run in reverse mode (server sends, client receives)

       -w, --window n[KM]
              window size / socket buffer size (this gets sent to the server and used on that side too)

       -M, --set-mss n
              set TCP/SCTP maximum segment size (MTU - 40 bytes)

       -N, --no-delay
              set TCP/SCTP no delay, disabling Nagle’s Algorithm

       -4, --version4
              only use IPv4

       -6, --version6
              only use IPv6

       -S, --tos n
              set the IP ’type of service’

       -L, --flowlabel n
              set the IPv6 flow label (currently only supported on Linux)

       -X, --xbind name
              Bind  SCTP  associations  to  a specific subset of links using sctp_bindx(3).  The --B flag will be ignored if this flag is specified.  Normally SCTP will include the protocol addresses of all active
              links on the local host when setting up an association. Specifying at least one --X name will disable this behaviour.  This flag must be specified for each link to be included in the association, and
              is  supported  for  both iperf servers and clients (the latter are supported by passing the first --X argument to bind(2)).  Hostnames are accepted as arguments and are resolved using getaddrinfo(3).
              If the --4 or --6 flags are specified, names which do not resolve to addresses within the specified protocol family will be ignored.

       --nstreams n
              Set number of SCTP streams.

       -Z, --zerocopy
              Use a "zero copy" method of sending data, such as sendfile(2), instead of the usual write(2).

       -O, --omit n
              Omit the first n seconds of the test, to skip past the TCP slow-start period.

       -T, --title str
              Prefix every output line with this string.

       -C, --congestion algo
              Set the congestion control algorithm (Linux and FreeBSD only).  An older --linux-congestion synonym for this flag is accepted but is deprecated.

       --get-server-output
              Get the output from the server.  The output format is determined by the server (in particular, if the server was invoked with the --json flag, the output will be in JSON format, otherwise it will  be
              in human-readable format).  If the client is run with --json, the server output is included in a JSON object; otherwise it is appended at the bottom of the human-readable output.

AUTHORS
       A list of the contributors to iperf3 can be found within the documentation located at http://software.es.net/iperf/dev.html#authors.

SEE ALSO
       libiperf(3), http://software.es.net/iperf

ESnet                            October 2015                         IPERF(1)

简单的测试
服务端

./iperfhome/bin/iperf3 -p 8181 -f M -i 3 -B 0.0.0.0 -V --logfile /tmp/iperf.log -s -D

tail -f -n 1 /tmp/iperf.log 

客户端, 通过 -M 指定测试用的TCP最大报文段大小(MSS)。

./iperfhome/bin/iperf3 -c xxx.xxx.xxx.xxx -p 8181 -b 0 -t 100 -P 64 -i 3 -M 90

有必要的话,可以指定并行度,测试tcp或udp,缓冲区的大小,TCP窗口的大小,MTU大小,拥塞控制算法等等。

通过sar -n DEV 1 10000可以观察服务端以及客户端的接收和发送pps, 带宽等信息。

PostgreSQL prepared statement和simple query的profile及性能差异

prepared statement是非常重要的高并发SQL优化手段之一,效果也显而易见。
下面是测试,同时观察绑定和不绑定的情况下的profile。
在未使用绑定变量的时候,新增或上升了一些硬解析相关的CODE。

测试数据

postgres=# create table test(id int primary key, info text);

postgres=# insert into test select generate_series(1,1000000),'test';

postgres=# create or replace function f1(int) returns setof text as $$
  select info from test where id=$1;
$$ language sql;
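在进入 pgbench 压测之前,也可以先在 psql 里用 SQL 层面的 PREPARE/EXECUTE 感受一下绑定变量的效果(仅为示意;pgbench 的 -M prepared 走的是协议层的 parse/bind/execute,并不是这条 SQL):

postgres=# prepare p1 (int) as select f1($1);
PREPARE
postgres=# execute p1 (100);
  f1  
------
 test
(1 row)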

测试用例

vi test.sql
\setrandom id 1 1000000
select f1(:id);

使用绑定变量

pgbench -M prepared -n -r -P 5 -f ./test.sql -c 64 -j 64 -T 100

progress: 10.0 s, 526016.9 tps, lat 0.120 ms stddev 0.033
progress: 15.0 s, 523072.8 tps, lat 0.121 ms stddev 0.027
progress: 20.0 s, 523305.2 tps, lat 0.121 ms stddev 0.017
progress: 25.0 s, 523320.9 tps, lat 0.121 ms stddev 0.015
progress: 30.0 s, 523290.4 tps, lat 0.121 ms stddev 0.016
progress: 35.0 s, 523216.3 tps, lat 0.121 ms stddev 0.015
progress: 40.0 s, 523046.3 tps, lat 0.121 ms stddev 0.022
progress: 45.0 s, 523200.9 tps, lat 0.121 ms stddev 0.015
progress: 50.0 s, 523853.5 tps, lat 0.121 ms stddev 0.016
progress: 55.0 s, 526587.1 tps, lat 0.120 ms stddev 0.005
progress: 60.0 s, 526710.0 tps, lat 0.120 ms stddev 0.008

TOP 调用

perf top

   PerfTop:   62851 irqs/sec  kernel:12.9%  exact:  0.0% [1000Hz cycles],  (all, 64 CPUs)
---------------------------------------------------------------------------------------------
  samples  pcnt function                      DSO
  _______ _____ _____________________________ ____________________________________
 39828.00  4.3% AllocSetAlloc                 /home/digoal/pgsql9.5.2/bin/postgres    
 33282.00  3.6% SearchCatCache                /home/digoal/pgsql9.5.2/bin/postgres    
 23098.00  2.5% base_yyparse                  /home/digoal/pgsql9.5.2/bin/postgres    
 21318.00  2.3% GetSnapshotData               /home/digoal/pgsql9.5.2/bin/postgres    
 13218.00  1.4% hash_search_with_hash_value   /home/digoal/pgsql9.5.2/bin/postgres    
 11399.00  1.2% _int_malloc                   /lib64/libc-2.12.so                       
 11362.00  1.2% LWLockAcquire                 /home/digoal/pgsql9.5.2/bin/postgres    
 11151.00  1.2% palloc                        /home/digoal/pgsql9.5.2/bin/postgres    
  9536.00  1.0% __GI_vfprintf                 /lib64/libc-2.12.so                       
  9160.00  1.0% __strcmp_sse42                /lib64/libc-2.12.so                       
  8997.00  1.0% schedule                      [kernel.kallsyms]                         
  8921.00  1.0% __strlen_sse42                /lib64/libc-2.12.so                       
  8799.00  0.9% nocachegetattr                /home/digoal/pgsql9.5.2/bin/postgres    
  8431.00  0.9% MemoryContextAllocZeroAligned /home/digoal/pgsql9.5.2/bin/postgres    
  8314.00  0.9% expression_tree_walker        /home/digoal/pgsql9.5.2/bin/postgres    
  7968.00  0.9% core_yylex                    /home/digoal/pgsql9.5.2/bin/postgres    
  7193.00  0.8% _bt_compare                   /home/digoal/pgsql9.5.2/bin/postgres    
  6402.00  0.7% _int_free                     /lib64/libc-2.12.so                       
  6185.00  0.7% memcpy                        /lib64/libc-2.12.so                       
  5988.00  0.6% fmgr_info_cxt_security        /home/digoal/pgsql9.5.2/bin/postgres    
  5749.00  0.6% __GI___libc_malloc            /lib64/libc-2.12.so                       
  5697.00  0.6% PostgresMain                  /home/digoal/pgsql9.5.2/bin/postgres    
  5444.00  0.6% fmgr_sql                      /home/digoal/pgsql9.5.2/bin/postgres    
  5372.00  0.6% LWLockRelease                 /home/digoal/pgsql9.5.2/bin/postgres    
  4917.00  0.5% grouping_planner              /home/digoal/pgsql9.5.2/bin/postgres    
  4902.00  0.5% ExecInitExpr                  /home/digoal/pgsql9.5.2/bin/postgres    
  4626.00  0.5% pfree                         /home/digoal/pgsql9.5.2/bin/postgres    
  4607.00  0.5% doCustom                      /home/digoal/pgsql9.5.2/bin/pgbench     
  4537.00  0.5% DirectFunctionCall1Coll       /home/digoal/pgsql9.5.2/bin/postgres    
  4521.00  0.5% fget_light                    [kernel.kallsyms]                         
  4329.00  0.5% pqParseInput3                 /home/digoal/pgsql9.5.2/lib/libpq.so.5.8
  4164.00  0.4% AllocSetFree                  /home/digoal/pgsql9.5.2/bin/postgres    
  4013.00  0.4% hash_any                      /home/digoal/pgsql9.5.2/bin/postgres    
  3998.00  0.4% new_list                      /home/digoal/pgsql9.5.2/bin/postgres    
  3994.00  0.4% do_select                     [kernel.kallsyms]                         
  3653.00  0.4% LockReleaseAll                /home/digoal/pgsql9.5.2/bin/postgres    
  3618.00  0.4% hash_search                   /home/digoal/pgsql9.5.2/bin/postgres    
  3505.00  0.4% palloc0                       /home/digoal/pgsql9.5.2/bin/postgres    
  3457.00  0.4% ScanKeywordLookup             /home/digoal/pgsql9.5.2/bin/postgres    
  3390.00  0.4% FunctionCall2Coll             /home/digoal/pgsql9.5.2/bin/postgres    
  3296.00  0.4% LockAcquireExtended           /home/digoal/pgsql9.5.2/bin/postgres    
  3275.00  0.4% __memset_sse2                 /lib64/libc-2.12.so                       
  3201.00  0.3% __cfree                       /lib64/libc-2.12.so                       
  3125.00  0.3% lappend                       /home/digoal/pgsql9.5.2/bin/postgres    
  3004.00  0.3% exec_bind_message             /home/digoal/pgsql9.5.2/bin/postgres    
  2995.00  0.3% __strcpy_ssse3                /lib64/libc-2.12.so                       
  2992.00  0.3% device_not_available          [kernel.kallsyms]                   

Without bind variables (pgbench simple protocol), performance drops noticeably.

pgbench -M simple -n -r -P 5 -f ./test.sql -c 64 -j 64 -T 100

progress: 10.0 s, 480056.6 tps, lat 0.132 ms stddev 0.028
progress: 15.0 s, 480355.0 tps, lat 0.132 ms stddev 0.019
progress: 20.0 s, 480321.8 tps, lat 0.132 ms stddev 0.020
progress: 25.0 s, 480246.2 tps, lat 0.132 ms stddev 0.019
progress: 30.0 s, 480274.6 tps, lat 0.132 ms stddev 0.020
progress: 35.0 s, 480286.1 tps, lat 0.132 ms stddev 0.018
progress: 40.0 s, 480229.3 tps, lat 0.132 ms stddev 0.020
progress: 45.0 s, 480095.6 tps, lat 0.132 ms stddev 0.021
progress: 50.0 s, 480098.9 tps, lat 0.132 ms stddev 0.020
progress: 55.0 s, 480066.5 tps, lat 0.132 ms stddev 0.025
progress: 60.0 s, 480148.3 tps, lat 0.132 ms stddev 0.021

Top calls

perf top
   PerfTop:   65503 irqs/sec  kernel:12.3%  exact:  0.0% [1000Hz cycles],  (all, 64 CPUs)
----------------------------------------------------------------------------------------------
  samples  pcnt function                       DSO
  _______ _____ ______________________________ ____________________________________
 45824.00  4.6% AllocSetAlloc                  /home/digoal/pgsql9.5.2/bin/postgres    
 38982.00  3.9% base_yyparse                   /home/digoal/pgsql9.5.2/bin/postgres    
 35333.00  3.6% SearchCatCache                 /home/digoal/pgsql9.5.2/bin/postgres    
 23770.00  2.4% GetSnapshotData                /home/digoal/pgsql9.5.2/bin/postgres    
 12440.00  1.3% palloc                         /home/digoal/pgsql9.5.2/bin/postgres    
 12092.00  1.2% hash_search_with_hash_value    /home/digoal/pgsql9.5.2/bin/postgres    
 12092.00  1.2% _int_malloc                    /lib64/libc-2.12.so                       
 11911.00  1.2% core_yylex                     /home/digoal/pgsql9.5.2/bin/postgres    (up)
 11286.00  1.1% LWLockAcquire                  /home/digoal/pgsql9.5.2/bin/postgres    
 10893.00  1.1% __strcmp_sse42                 /lib64/libc-2.12.so                       
 10759.00  1.1% MemoryContextAllocZeroAligned  /home/digoal/pgsql9.5.2/bin/postgres    (up)
  9946.00  1.0% expression_tree_walker         /home/digoal/pgsql9.5.2/bin/postgres    (up)
  9175.00  0.9% schedule                       [kernel.kallsyms]                         
  9049.00  0.9% nocachegetattr                 /home/digoal/pgsql9.5.2/bin/postgres    
  8859.00  0.9% __strlen_sse42                 /lib64/libc-2.12.so                       
  8020.00  0.8% __GI_vfprintf                  /lib64/libc-2.12.so                       
  7396.00  0.7% _int_free                      /lib64/libc-2.12.so                       
  6847.00  0.7% __GI___libc_malloc             /lib64/libc-2.12.so                       
  6842.00  0.7% _bt_compare                    /home/digoal/pgsql9.5.2/bin/postgres    
  6468.00  0.7% grouping_planner               /home/digoal/pgsql9.5.2/bin/postgres    
  5468.00  0.6% fmgr_sql                       /home/digoal/pgsql9.5.2/bin/postgres    
  5403.00  0.5% memcpy                         /lib64/libc-2.12.so                       
  5328.00  0.5% LWLockRelease                  /home/digoal/pgsql9.5.2/bin/postgres    
  5277.00  0.5% fmgr_info_cxt_security         /home/digoal/pgsql9.5.2/bin/postgres    
  5024.00  0.5% ExecInitExpr                   /home/digoal/pgsql9.5.2/bin/postgres    
  4819.00  0.5% DirectFunctionCall1Coll        /home/digoal/pgsql9.5.2/bin/postgres    
  4620.00  0.5% new_list                       /home/digoal/pgsql9.5.2/bin/postgres    
  4582.00  0.5% fget_light                     [kernel.kallsyms]                         
  4563.00  0.5% ScanKeywordLookup              /home/digoal/pgsql9.5.2/bin/postgres    
  4501.00  0.5% doCustom                       /home/digoal/pgsql9.5.2/bin/pgbench     
  4453.00  0.4% AllocSetFree                   /home/digoal/pgsql9.5.2/bin/postgres    
  4354.00  0.4% pfree                          /home/digoal/pgsql9.5.2/bin/postgres    
  4096.00  0.4% pqParseInput3                  /home/digoal/pgsql9.5.2/lib/libpq.so.5.8
  4050.00  0.4% do_select                      [kernel.kallsyms]                         
  4000.00  0.4% lappend                        /home/digoal/pgsql9.5.2/bin/postgres    
  3892.00  0.4% hash_any                       /home/digoal/pgsql9.5.2/bin/postgres    
  3863.00  0.4% __memset_sse2                  /lib64/libc-2.12.so                       
  3798.00  0.4% expression_tree_mutator        /home/digoal/pgsql9.5.2/bin/postgres    (down)
  3777.00  0.4% palloc0                        /home/digoal/pgsql9.5.2/bin/postgres    
  3773.00  0.4% check_stack_depth              /home/digoal/pgsql9.5.2/bin/postgres    (new)
  3643.00  0.4% heap_getsysattr                /home/digoal/pgsql9.5.2/bin/postgres    (new)
  3487.00  0.4% SearchSysCache                 /home/digoal/pgsql9.5.2/bin/postgres    (new)
  3485.00  0.4% LockReleaseAll                 /home/digoal/pgsql9.5.2/bin/postgres    
  3460.00  0.3% eval_const_expressions_mutator /home/digoal/pgsql9.5.2/bin/postgres    (new)
  3444.00  0.3% FunctionCall2Coll              /home/digoal/pgsql9.5.2/bin/postgres    
  3419.00  0.3% __strcpy_ssse3                 /lib64/libc-2.12.so                       
  3201.00  0.3% LockAcquireExtended            /home/digoal/pgsql9.5.2/bin/postgres

How to load PostgreSQL test data in parallel with sysbench


This post is adapted from Lao Tang's article on loading Oracle test data in parallel with sysbench and sqlldr:
http://blog.osdba.net/538.html

The Lua loading script that ships with sysbench populates the tables serially with multi-row INSERTs like the ones below, which is relatively slow (faster than single-row INSERTs, but slower than COPY).

insert into table1 values (),(),()....    
insert into table2 values (),(),()....    
...
insert into tablen values (),(),()....    

An example of loading data with the built-in prepare command:

./sysbench_pg --test=lua/oltp.lua --db-driver=pgsql --pgsql-host=127.0.0.1 --pgsql-port=1921 --pgsql-user=postgres --pgsql-password=postgres --pgsql-db=postgres --oltp-tables-count=64 --oltp-table-size=1000000 --num-threads=64 prepare    

prepare loads the test data, but it does so serially.
In sysbench 0.5 the number of parallel threads is specified on the command line and applies to the run command, which executes with that many concurrent threads. So instead of the provided prepare command, we can use the run command to generate the data: run starts as many concurrent threads as --num-threads specifies.
A custom sysbench Lua script is expected to implement the following functions:

function thread_init(thread_id): executed exactly once, right after the thread is created
function event(thread_id): called once for every event executed during the run

From this it follows that our data-loading script only needs to implement the thread_init() function; a minimal skeleton follows.
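
A minimal custom-script skeleton looks like the sketch below; it only assumes the thread_init/event interface described above and the set_vars() helper from common.lua. The copy.lua script further down fills in the thread_init body with the actual COPY-based loading.

-- minimal sysbench 0.5 custom-script skeleton (sketch)
pathtest = string.match(test, "(.*/)") or ""
dofile(pathtest .. "common.lua")       -- provides set_vars() and the db_* helpers

function thread_init(thread_id)
   set_vars()                          -- pick up the pgsql_* / oltp_* globals
   -- one-time, per-thread work (here: the data loading) goes here
end

function event(thread_id)
   -- called repeatedly during "run"; unused when the script is only a loader
end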

The test-data generator reuses the C code provided by Lao Tang:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>
#include <stdint.h>
#include <sys/time.h>

/* combine two independent random_r() streams into one value in [0, 10^11) */
uint64_t my_rand(struct random_data * r1, struct random_data * r2)
{
    uint64_t rand_max = 100000000000LL;
    uint64_t result;
    int32_t u1, u2;            /* random_r() yields a non-negative int32_t */
    random_r(r1, &u1);
    random_r(r2, &u2);
    result = (int64_t)u1 * (int64_t)u2;
    result = result % rand_max;
    return result;
}
int main(int argc, char *argv[])
{
    struct timeval tpstart;
    struct random_data r1, r2;
    int i;
    int r;
    int max_value;
    char rand_state1[128];
    char rand_state2[128];
    if (argc != 2)
    {
        printf("Usage: %s <rownums>\n", argv[0]);
        return 1;
    }
    max_value = atoi(argv[1]);
    /* the random_data structs must be zeroed before initstate_r() */
    memset(&r1, 0, sizeof(r1));
    memset(&r2, 0, sizeof(r2));
    gettimeofday(&tpstart,NULL);
    initstate_r(tpstart.tv_usec,rand_state1,sizeof(rand_state1),&r1);
    srandom_r(tpstart.tv_usec, &r1);
    gettimeofday(&tpstart,NULL);
    initstate_r(tpstart.tv_usec,rand_state2,sizeof(rand_state2),&r2);
    srandom_r(tpstart.tv_usec, &r2);
    for (i=1; i<max_value+1; i++)
    {
        r = my_rand(&r1, &r2) % max_value; 
        printf("%d,%d,%011llu-%011llu-%011llu-%011llu-%011llu-%011llu-%011llu-%011llu-%011llu-%011llu,%011llu-%011llu-%011llu-%011llu-%011llu\n",
                i,
                r,
                 my_rand(&r1, &r2),
                 my_rand(&r1, &r2),
                 my_rand(&r1, &r2),
                 my_rand(&r1, &r2),
                 my_rand(&r1, &r2),
                 my_rand(&r1, &r2),
                 my_rand(&r1, &r2),
                 my_rand(&r1, &r2),
                 my_rand(&r1, &r2),
                 my_rand(&r1, &r2),
                 my_rand(&r1, &r2),
                 my_rand(&r1, &r2),
                 my_rand(&r1, &r2),
                 my_rand(&r1, &r2),
                 my_rand(&r1, &r2)
              );
    }
    return 0;
}

Compile the C program as follows:

gcc gendata.c -o gendata  

Create a script named copy.lua with the content below.
It calls set_vars() from common.lua to inherit the global variables defined there.
copydata(table_id): creates the table, creates a named pipe, and streams the generated data from the pipe into a psql -c "copy ..." client to load it.
create_index(table_id): creates the secondary index and advances the sequence's next value.
Note that oltp_tables_count must be a multiple of num_threads. In thread_init, each thread starts at i = thread_id + 1 and steps by num_threads, calling copydata(i) and create_index(i) for each of its tables; for example, with 64 tables and 32 threads, thread 0 loads sbtest1 and sbtest33, thread 1 loads sbtest2 and sbtest34, and so on.

$ vi lua/copy.lua
pathtest = string.match(test, "(.*/)") or ""

dofile(pathtest .. "common.lua")

function copydata(table_id)
  local query

  query = [[
CREATE UNLOGGED TABLE sbtest]] .. table_id .. [[ (
id SERIAL NOT NULL,
k INTEGER,
c CHAR(120) DEFAULT '' NOT NULL,
pad CHAR(60) DEFAULT '' NOT NULL,
PRIMARY KEY (id)
) ]]

  db_query(query)

  -- each os.execute() spawns its own shell, so an exported PGPASSWORD would not
  -- survive to the psql call below; pass it inline on the psql command line instead
  os.execute ('rm -f sbtest' .. table_id .. '.dat')
  os.execute ('mknod sbtest' .. table_id .. '.dat p')
  os.execute ('./gendata ' .. oltp_table_size .. ' >> sbtest'..table_id ..'.dat &')
  os.execute ('cat sbtest' .. table_id .. '.dat | PGPASSWORD=' .. pgsql_password .. ' psql -h ' .. pgsql_host .. ' -p ' .. pgsql_port .. ' -U ' .. pgsql_user .. ' -d ' .. pgsql_db .. ' -c "copy sbtest' .. table_id .. ' from stdin with csv"')
  os.execute ('rm -f sbtest' .. table_id .. '.dat')
end

function create_index(table_id)
  db_query("select setval('sbtest" .. table_id .. "_id_seq', " .. (oltp_table_size+1) .. ")" )
  db_query("CREATE INDEX k_" .. table_id .. " on sbtest" .. table_id .. "(k)")
end

function thread_init(thread_id)
   set_vars()

   print("thread prepare"..thread_id)

   for i=thread_id+1, oltp_tables_count, num_threads  do
     copydata(i)
     create_index(i)
   end
end

function event(thread_id)
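   -- all loading happens in thread_init(); end the whole run as soon as the event phase starts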
   os.exit()
end

Usage note: psql must be on the PATH, because the Lua script shells out to the psql command.

export PATH=/home/digoal/pgsql9.5/bin:$PATH

Generate the data; this is much faster than before:

./sysbench_pg --test=lua/copy.lua \
  --db-driver=pgsql \
  --pgsql-host=127.0.0.1 \
  --pgsql-port=1921 \
  --pgsql-user=postgres \
  --pgsql-password=postgres \
  --pgsql-db=postgres \
  --oltp-tables-count=64 \
  --oltp-table-size=1000000 \
  --num-threads=64 \
  run

Clean up the data (drop the tables):

./sysbench_pg --test=lua/copy.lua \
  --db-driver=pgsql \
  --pgsql-host=127.0.0.1 \
  --pgsql-port=1921 \
  --pgsql-user=postgres \
  --pgsql-password=postgres \
  --pgsql-db=postgres \
  --oltp-tables-count=64 \
  --oltp-table-size=1000000 \
  --num-threads=64 \
  cleanup

Where sysbench registers the Lua globals in its source code:

sysbench/scripting/lua/src/lua.h:#define lua_register(L,n,f) (lua_pushcfunction(L, (f)), lua_setglobal(L, (n)))
sysbench/scripting/lua/src/lua.h:#define lua_setglobal(L,s)     lua_setfield(L, LUA_GLOBALSINDEX, (s))
sysbench/scripting/lua/src/lbaselib.c:  lua_setglobal(L, "_G");
sysbench/scripting/lua/src/lbaselib.c:  lua_setglobal(L, "_VERSION");  /* set global _VERSION */
sysbench/scripting/lua/src/lbaselib.c:  lua_setglobal(L, "newproxy");  /* set global `newproxy' */
sysbench/scripting/script_lua.c:    lua_setglobal(state, opt->name);
sysbench/scripting/script_lua.c:  lua_setglobal(state, "sb_rand");
sysbench/scripting/script_lua.c:  lua_setglobal(state, "sb_rand_uniq");
sysbench/scripting/script_lua.c:  lua_setglobal(state, "sb_rnd");
sysbench/scripting/script_lua.c:  lua_setglobal(state, "sb_rand_str");
sysbench/scripting/script_lua.c:  lua_setglobal(state, "sb_rand_uniform");
sysbench/scripting/script_lua.c:  lua_setglobal(state, "sb_rand_gaussian");
sysbench/scripting/script_lua.c:  lua_setglobal(state, "sb_rand_special");
sysbench/scripting/script_lua.c:  lua_setglobal(state, "db_connect");
sysbench/scripting/script_lua.c:  lua_setglobal(state, "db_disconnect");
sysbench/scripting/script_lua.c:  lua_setglobal(state, "db_query");
sysbench/scripting/script_lua.c:  lua_setglobal(state, "db_bulk_insert_init");
sysbench/scripting/script_lua.c:  lua_setglobal(state, "db_bulk_insert_next");
sysbench/scripting/script_lua.c:  lua_setglobal(state, "db_bulk_insert_done");
sysbench/scripting/script_lua.c:  lua_setglobal(state, "db_prepare");
sysbench/scripting/script_lua.c:  lua_setglobal(state, "db_bind_param");
sysbench/scripting/script_lua.c:  lua_setglobal(state, "db_bind_result");
sysbench/scripting/script_lua.c:  lua_setglobal(state, "db_execute");
sysbench/scripting/script_lua.c:  lua_setglobal(state, "db_close");
sysbench/scripting/script_lua.c:  lua_setglobal(state, "db_store_results");
sysbench/scripting/script_lua.c:  lua_setglobal(state, "db_free_results");
sysbench/scripting/script_lua.c:  lua_setglobal(state, "DB_ERROR_NONE");
sysbench/scripting/script_lua.c:  lua_setglobal(state, "DB_ERROR_DEADLOCK");
sysbench/scripting/script_lua.c:  lua_setglobal(state, "DB_ERROR_FAILED");
sysbench/scripting/script_lua.c:  lua_setglobal(L, "db_driver");

Passing parameters: any sysbench_pg command-line option can be referenced inside the Lua script by replacing - with _ in its name, for example (a short usage sketch follows the list):

--pgsql-host=127.0.0.1      -> Lua variable pgsql_host
--pgsql-port=1921           -> Lua variable pgsql_port
--pgsql-user=postgres       -> Lua variable pgsql_user
--pgsql-password=postgres   -> Lua variable pgsql_password
--pgsql-db=postgres         -> Lua variable pgsql_db
--oltp-tables-count=64      -> Lua variable oltp_tables_count
--oltp-table-size=1000000   -> Lua variable oltp_table_size
--num-threads=64            -> Lua variable num_threads
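
As a sketch of how these globals can be consumed inside a script (the variable names are the ones listed above; the print statement is purely illustrative):

function thread_init(thread_id)
   set_vars()   -- from common.lua, as in the scripts above
   -- the command-line options are now ordinary Lua globals
   print("thread " .. thread_id .. " -> " ..
         pgsql_host .. ":" .. pgsql_port .. "/" .. pgsql_db ..
         ", tables=" .. oltp_tables_count ..
         ", rows per table=" .. oltp_table_size)
end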