文档章节

将PostgreSQL数据库的表导入到elasticsearch中

o
 osc_zoa3moe9
发布于 2019/12/07 23:03
字数 722
阅读 7
收藏 0

精选30+云产品,助力企业轻松上云!>>>

1.查看PostgreSQL表结构和数据信息

edbstore=# \d customers
                                          Table "edbstore.customers"
        Column        |         Type          |                           Modifiers                            
----------------------+-----------------------+----------------------------------------------------------------
 customerid           | integer               | not null default nextval('customers_customerid_seq'::regclass)
 firstname            | character varying(50) | not null
 lastname             | character varying(50) | not null
 address1             | character varying(50) | not null
 address2             | character varying(50) | 
 city                 | character varying(50) | not null
 state                | character varying(50) | 
 zip                  | integer               | 
 country              | character varying(50) | not null
 region               | smallint              | not null
 email                | character varying(50) | 
 phone                | character varying(50) | 
 creditcardtype       | integer               | not null
 creditcard           | character varying(50) | not null
 creditcardexpiration | character varying(50) | not null
 username             | character varying(50) | not null
 password             | character varying(50) | not null
 age                  | smallint              | 
 income               | integer               | 
 gender               | character varying(1)  | 
Indexes:
    "customers_pkey" PRIMARY KEY, btree (customerid)
    "ix_cust_username" UNIQUE, btree (username)
Referenced by:
    TABLE "cust_hist" CONSTRAINT "fk_cust_hist_customerid" FOREIGN KEY (customerid) REFERENCES customers(customerid) ON DELETE CASCADE
    TABLE "orders" CONSTRAINT "fk_customerid" FOREIGN KEY (customerid) REFERENCES customers(customerid) ON DELETE SET NULL

edbstore=# select count(1) from customers;
 count 
-------
 20000
(1 row)

2.利用PostgreSQL的row_to_json函数将表结构导出并保存为json格式

edbstore=# \t
Tuples only is on.
edbstore=# \o customer.json
edbstore=# select row_to_json(r) from customers as r;
edbstore=# \q

[postgres@sht-sgmhadoopcm-01 dba]$ ls -lh customer.json 
-rw-r--r-- 1 postgres appuser 7.7M Dec  7 22:37 customer.json

$ head -1 customer.json 
 {"customerid":1,"firstname":"VKUUXF","lastname":"ITHOMQJNYX","address1":"4608499546 Dell Way","address2":null,"city":"QSDPAGD","state":"SD","zip":24101,"country":"US","region":1,"email":"ITHOMQJNYX@dell.com","phone":"4608499546","creditcardtype":1,"creditcard":"1979279217775911","creditcardexpiration":"2012/03","username":"user1","password":"password","age":55,"income":100000,"gender":"M"}

此时customer表虽然转储为json格式文件,但是并不能直接导入到elasticsearch,否则会报错如下

$ curl -H "Content-Type: application/json" -XPOST "172.16.101.55:9200/bank/_bulk?pretty&refresh" --data-binary "@customer.json"
{
  "error" : {
    "root_cause" : [
      {
        "type" : "illegal_argument_exception",
        "reason" : "Malformed action/metadata line [1], expected START_OBJECT or END_OBJECT but found [VALUE_NUMBER]"
      }
    ],
    "type" : "illegal_argument_exception",
    "reason" : "Malformed action/metadata line [1], expected START_OBJECT or END_OBJECT but found [VALUE_NUMBER]"
  },
  "status" : 400
}

 根据文档https://www.elastic.co/guide/en/elasticsearch/reference/current/docs-bulk.html说明,我们的json数据里并未指定每行数据唯一的文档id值

3.为json格式的表数据添加id字段

因为之前我们看到该customer表共有2000行,所以我们需要生成对应的20000个id值,我们借助python实现,新建build_id.py文件,并写入如下内容,看清楚是20001,因为包头不包尾原则,1-20000实际打印出来是1-19999,所以我们写1-20001

for i in range(1,20001):
    print('{"index":{"_id":"%s"}}' %i ) 

为该文件添加可执行权限,然后执行即可

$ python build_id.py > build_id.txt

$ head -3 build_id.txt 
{"index":{"_id":"1"}}
{"index":{"_id":"2"}}
{"index":{"_id":"3"}}

利用linux “paste"命令,将id文件和表文件合并

$ paste -d'\n' build_id.txt customer.json > customer_new.json

$ head -4 customer_new.json 
{"index":{"_id":"1"}}
 {"customerid":1,"firstname":"VKUUXF","lastname":"ITHOMQJNYX","address1":"4608499546 Dell Way","address2":null,"city":"QSDPAGD","state":"SD","zip":24101,"country":"US","region":1,"email":"ITHOMQJNYX@dell.com","phone":"4608499546","creditcardtype":1,"creditcard":"1979279217775911","creditcardexpiration":"2012/03","username":"user1","password":"password","age":55,"income":100000,"gender":"M"}
{"index":{"_id":"2"}}
 {"customerid":2,"firstname":"HQNMZH","lastname":"UNUKXHJVXB","address1":"5119315633 Dell Way","address2":null,"city":"YNCERXJ","state":"AZ","zip":11802,"country":"US","region":1,"email":"UNUKXHJVXB@dell.com","phone":"5119315633","creditcardtype":1,"creditcard":"3144519586581737","creditcardexpiration":"2012/11","username":"user2","password":"password","age":80,"income":40000,"gender":"M"}

 4.此时处理过的json格式的表文件就可以正常导入到elasticsearch中了,测试

$ curl -H "Content-Type: application/json" -XPOST "172.16.101.55:9200/customer/_bulk?pretty&refresh" --data-binary "@customer_new.json"
$ curl http://172.16.101.55:9200/_cat/indices?v
health status index    uuid                   pri rep docs.count docs.deleted store.size pri.store.size
yellow open   customer DvLoM7NjSYyjTwD5BSkK3A   1   1      20000            0       10mb           10mb
o
粉丝 1
博文 500
码字总数 0
作品 0
私信 提问
加载中
请先登录后再评论。
Postgresql与Elasticsearch数据同步提高查询性能

一般来说,影响数据库最大的性能问题有两个,一个是对数据库的读写操作,一个是数据库中的数据太大导致操作慢,对于前者我们可以适当借助缓存来减少一部分读操作,而针对一些复杂的报表分析和...

郭柏雅
2019/06/16
0
0
CentOS安装SonarQube7.9.1

1、准备 SonarQube版本:sonarqube-7.9.1.zip,官网地址:https://www.sonarqube.org/downloads/ jdk版本:jdk-11.0.4linux-x64bin.tar.gz(sonarqube从7.9起,不再支持jdk11以下版本) 数据...

济南刘振
2019/10/03
500
0
CentOS安装SonarQube7.9.1

1、准备 SonarQube版本:sonarqube-7.9.1.zip,官网地址:https://www.sonarqube.org/downloads/ jdk版本:jdk-11.0.4linux-x64bin.tar.gz(sonarqube从7.9起,不再支持jdk11以下版本) 数据...

osc_lrhq0eax
2019/08/29
5
0
DB-Engines 2017 年度数据库:PostgreSQL 实至名归

DB-Engines 网站宣布 PostgreSQL 为 2017 年度数据库管理系统。 DB-Engines 表示,PostgreSQL 在 2017 年的数据库排名中,比其他监测到的 341 个数据库管理系统都更受欢迎。因此,决定宣布 ...

局长
2018/01/10
3.6K
10
做日志系统,用Elasticsearch,mongodb,postgresql做日志数据库的话,优缺点各方面有哪些呢?

目前系统里有各种log,每次处理比较麻烦,想做一个专有的日志服务web程序,用数据库存日志,但日志可视化显示定制为自己的网页。关于数据库,我想到了postgresql,Elasticsearch,mongodb其中...

齿轮1
04/14
6.5K
17

没有更多内容

加载失败,请刷新页面

加载更多

使用命名管道承载gRPC

最近GRPC很火,感觉整RPC不用GRPC都快跟不上时髦了。 gRPC设计 gRPC是一种与语言无关的高性能远程过程调用 (RPC) 框架。刚好需要使用一个的RPC应用系统,自然而然就盯上了它,但是它真能够解...

osc_nq69o22c
35分钟前
16
0
06-敏捷开发框架-apis 脚本库 引用位置无关性设计

动态引入技术的设计,对我们来说非常重要。 同时也说明动态语言的使用对我们来说也是非常重要。 没有动态语言的支撑,有些想法可能不容易实现,或者有替代方案,可能会花更大的代价。 前端开...

osc_5zg9z6t1
37分钟前
21
0
(三)学习了解OrchardCore笔记——灵魂中间件ModularTenantContainerMiddleware的第一行①的模块部分

  了解到了OrchardCore主要由两个中间件(ModularTenantContainerMiddleware和ModularTenantRouterMiddleware)构成,下面开始了解ModularTenantContainerMiddleware中间件第一行代码。   ...

osc_kdarxvx0
38分钟前
15
0
50Mn18Cr4V锻锻环件

电机无磁护环怎么锻性能才能《高高》?50Mn18Cr4V高锰无磁钢在变形温度为900~1 100℃、应变速率为0.1 ~10s-1条件下的热变形行为. 结果,VC第二相的应变诱导析出对50Mn18Cr4V的热变形行为产生...

无磁钢
39分钟前
16
0
【遇见offer】一汽-大众实习生专场来啦!成长+学习+福利,一个也不能少~

在上次一汽-大众的社招直播之后,实习生的专场招聘也终于来啦! 针对2020年暑期,我们提供了非常多的实习岗位给大家选择。 如果你想得到大厂实习的宝贵经验,如果你想得到更快速的成长,如果...

osc_b88oux8w
40分钟前
25
0

没有更多内容

加载失败,请刷新页面

加载更多

返回顶部
顶部