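Repairing Segment Nodes with gprecoverseg
While testing in a Greenplum environment, segment host sdw2 ran out of disk space and was reported as down. When it was restarted, its segment instances reported errors and would not start.
gpstate -m shows the sdw2 mirrors as failed: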
[gpadmin@dw01 gpmaster]$ gpstate -m
gpstate:dw01:gpadmin-[INFO]:-Starting gpstate with args: -m
gpstate:dw01:gpadmin-[INFO]:-local Greenplum Version: 'postgres (Greenplum Database) 4.3.12.0 build 1'
gpstate:dw01:gpadmin-[INFO]:-master Greenplum Version: 'PostgreSQL 8.2.15 (Greenplum Database 4.3.12.0 build 1) on x86_64-unknown-linux-gnu, compiled by GCC gcc (GCC) 4.4.2 compiled on Feb 27 2017 20:45:12'
gpstate:dw01:gpadmin-[INFO]:-Obtaining Segment details from master...
gpstate:dw01:gpadmin-[INFO]:--------------------------------------------------------------
gpstate:dw01:gpadmin-[INFO]:--Current GPDB mirror list and status
gpstate:dw01:gpadmin-[INFO]:--Type = Group
gpstate:dw01:gpadmin-[INFO]:--------------------------------------------------------------
gpstate:dw01:gpadmin-[INFO]:- Mirror Datadir Port Status Data Status
gpstate:dw01:gpadmin-[WARNING]:-sdw2 /data/gpdata/gpdatam1/gpseg0 50000 Failed <<<<<<<<
gpstate:dw01:gpadmin-[WARNING]:-sdw2 /data/gpdata/gpdatam1/gpseg1 50001 Failed <<<<<<<<
gpstate:dw01:gpadmin-[INFO]:- sdw1 /data/gpdata/gpdatam1/gpseg2 50000 Passive Synchronized
gpstate:dw01:gpadmin-[INFO]:- sdw1 /data/gpdata/gpdatam1/gpseg3 50001 Passive Synchronized
gpstate:dw01:gpadmin-[INFO]:--------------------------------------------------------------
gpstate:dw01:gpadmin-[WARNING]:-2 segment(s) configured as mirror(s) have failed
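The check above can be scripted. The following is a minimal sketch, assuming the gpstate -m line layout shown in this output (a "[WARNING]:-" prefix followed by host, data directory, port and the word Failed). The function name list_failed_mirrors is hypothetical and not part of Greenplum.

```shell
#!/usr/bin/env bash
# Print "host datadir port" for every mirror that gpstate -m reports as Failed.
# Reads gpstate -m output on stdin, so it can be tried without a live cluster.
list_failed_mirrors() {
  awk '/\[WARNING\]:-/ && / Failed( |$)/ {
         # Drop everything up to and including the "[WARNING]:-" log prefix;
         # awk re-splits the fields after $0 is modified.
         sub(/^.*\[WARNING\]:-[ ]*/, "")
         print $1, $2, $3
       }'
}

# Typical use on the master host:
#   gpstate -m | list_failed_mirrors
```

The summary line ("2 segment(s) configured as mirror(s) have failed") is skipped because the match on "Failed" is case-sensitive.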
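gprecoverseg options
-a (do not prompt)
Do not prompt the user for confirmation.
-B parallel_processes
The number of segments to recover in parallel. If not specified, the utility starts up to four parallel processes, depending on how many segment instances need to be recovered.
-d master_data_directory
Optional. The master host data directory. If not specified, the value set for $MASTER_DATA_DIRECTORY is used.
-F (full recovery)
Optional. Perform a full copy of the active segment instance in order to recover the failed segment. By default, only the incremental changes that occurred while the segment was down are copied.
-i recover_config_file
Specifies the name of a file with the details of the failed segments to recover. Each line in the file uses the following format. The SPACE keyword marks the position of a required space. Do not add additional spaces.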
filespaceOrder=[filespace1_fsname[, filespace2_fsname[, ...]]
<failed_host_address>:<port>:<data_directory>SPACE
<recovery_host_address>:<port>:<replication_port>:<data_directory>
[:<fselocation>:...]
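To make the line format concrete, here is a minimal sketch that assembles one recover_config_file entry from its parts. The helper name make_recover_line and the spare host sdw3 are hypothetical. The optional filespaceOrder line and the trailing fselocation fields are left out on purpose.

```shell
#!/usr/bin/env bash
# Build one recover_config_file line: the failed instance, exactly ONE space,
# then the recovery target.
# Args: failed_host failed_port failed_dir new_host new_port new_repl_port new_dir
make_recover_line() {
  if [ "$#" -ne 7 ]; then
    echo "usage: make_recover_line failed_host failed_port failed_dir new_host new_port new_repl_port new_dir" >&2
    return 1
  fi
  printf '%s:%s:%s %s:%s:%s:%s\n' "$1" "$2" "$3" "$4" "$5" "$6" "$7"
}

# Example (sdw3 is an assumed spare host, not one from this cluster):
#   make_recover_line sdw2 50000 /data/gpdata/gpdatam1/gpseg0 \
#                     sdw3 50000 51000 /data/gpdata/gpdatam1/gpseg0 > recover_config_file
```

Generating the line with printf keeps the separator to a single space, which is what the format requires.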
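Recover all failed segment instances:
$ gprecoverseg
After a recovery, rebalance the Greenplum Database system by resetting all segments to their preferred roles. First check that all segments are up and synchronized:
$ gpstate -m
$ gprecoverseg -r
Recover any failed segment instances to a newly configured spare segment host:
$ gprecoverseg -i recover_config_file
This case was repaired with a plain gprecoverseg run: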
[gpadmin@dw01 pg_log]$ gprecoverseg
20180420:21:50:37:002098 gprecoverseg:dw01:gpadmin-[INFO]:-Starting gprecoverseg with args:
20180420:21:50:37:002098 gprecoverseg:dw01:gpadmin-[INFO]:-local Greenplum Version: 'postgres (Greenplum Database) 4.3.12.0 build 1'
20180420:21:50:37:002098 gprecoverseg:dw01:gpadmin-[INFO]:-master Greenplum Version: 'PostgreSQL 8.2.15 (Greenplum Database 4.3.12.0 build 1) on x86_64-unknown-linux-gnu, compiled by GCC gcc (GCC) 4.4.2 compiled on Feb 27 2017 20:45:12'
20180420:21:50:37:002098 gprecoverseg:dw01:gpadmin-[INFO]:-Checking if segments are ready to connect
20180420:21:50:37:002098 gprecoverseg:dw01:gpadmin-[INFO]:-Obtaining Segment details from master...
20180420:21:50:37:002098 gprecoverseg:dw01:gpadmin-[INFO]:-Obtaining Segment details from master...
20180420:21:50:38:002098 gprecoverseg:dw01:gpadmin-[INFO]:-Greenplum instance recovery parameters
20180420:21:50:38:002098 gprecoverseg:dw01:gpadmin-[INFO]:----------------------------------------------------------
20180420:21:50:38:002098 gprecoverseg:dw01:gpadmin-[INFO]:-Recovery type = Standard
20180420:21:50:38:002098 gprecoverseg:dw01:gpadmin-[INFO]:----------------------------------------------------------
20180420:21:50:38:002098 gprecoverseg:dw01:gpadmin-[INFO]:-Recovery 1 of 2
20180420:21:50:38:002098 gprecoverseg:dw01:gpadmin-[INFO]:----------------------------------------------------------
20180420:21:50:38:002098 gprecoverseg:dw01:gpadmin-[INFO]:- Synchronization mode = Incremental
20180420:21:50:38:002098 gprecoverseg:dw01:gpadmin-[INFO]:- Failed instance host = dw04
20180420:21:50:38:002098 gprecoverseg:dw01:gpadmin-[INFO]:- Failed instance address = sdw2
20180420:21:50:38:002098 gprecoverseg:dw01:gpadmin-[INFO]:- Failed instance directory = /data/gpdata/gpdatam1/gpseg0
20180420:21:50:38:002098 gprecoverseg:dw01:gpadmin-[INFO]:- Failed instance port = 50000
20180420:21:50:38:002098 gprecoverseg:dw01:gpadmin-[INFO]:- Failed instance replication port = 51000
20180420:21:50:38:002098 gprecoverseg:dw01:gpadmin-[INFO]:- Recovery Source instance host = dw03
20180420:21:50:38:002098 gprecoverseg:dw01:gpadmin-[INFO]:- Recovery Source instance address = sdw1
20180420:21:50:38:002098 gprecoverseg:dw01:gpadmin-[INFO]:- Recovery Source instance directory = /data/gpdata/gpdatap1/gpseg0
20180420:21:50:38:002098 gprecoverseg:dw01:gpadmin-[INFO]:- Recovery Source instance port = 40000
20180420:21:50:38:002098 gprecoverseg:dw01:gpadmin-[INFO]:- Recovery Source instance replication port = 41000
20180420:21:50:38:002098 gprecoverseg:dw01:gpadmin-[INFO]:- Recovery Target = in-place
20180420:21:50:38:002098 gprecoverseg:dw01:gpadmin-[INFO]:----------------------------------------------------------
20180420:21:50:38:002098 gprecoverseg:dw01:gpadmin-[INFO]:-Recovery 2 of 2
20180420:21:50:38:002098 gprecoverseg:dw01:gpadmin-[INFO]:----------------------------------------------------------
20180420:21:50:38:002098 gprecoverseg:dw01:gpadmin-[INFO]:- Synchronization mode = Incremental
20180420:21:50:38:002098 gprecoverseg:dw01:gpadmin-[INFO]:- Failed instance host = dw04
20180420:21:50:38:002098 gprecoverseg:dw01:gpadmin-[INFO]:- Failed instance address = sdw2
20180420:21:50:38:002098 gprecoverseg:dw01:gpadmin-[INFO]:- Failed instance directory = /data/gpdata/gpdatam1/gpseg1
20180420:21:50:38:002098 gprecoverseg:dw01:gpadmin-[INFO]:- Failed instance port = 50001
20180420:21:50:38:002098 gprecoverseg:dw01:gpadmin-[INFO]:- Failed instance replication port = 51001
20180420:21:50:38:002098 gprecoverseg:dw01:gpadmin-[INFO]:- Recovery Source instance host = dw03
20180420:21:50:38:002098 gprecoverseg:dw01:gpadmin-[INFO]:- Recovery Source instance address = sdw1
20180420:21:50:38:002098 gprecoverseg:dw01:gpadmin-[INFO]:- Recovery Source instance directory = /data/gpdata/gpdatap1/gpseg1
20180420:21:50:38:002098 gprecoverseg:dw01:gpadmin-[INFO]:- Recovery Source instance port = 40001
20180420:21:50:38:002098 gprecoverseg:dw01:gpadmin-[INFO]:- Recovery Source instance replication port = 41001
20180420:21:50:38:002098 gprecoverseg:dw01:gpadmin-[INFO]:- Recovery Target = in-place
20180420:21:50:38:002098 gprecoverseg:dw01:gpadmin-[INFO]:----------------------------------------------------------

Continue with segment recovery procedure Yy|Nn (default=N):
> y
20180420:21:50:40:002098 gprecoverseg:dw01:gpadmin-[INFO]:-2 segment(s) to recover
20180420:21:50:40:002098 gprecoverseg:dw01:gpadmin-[INFO]:-Ensuring 2 failed segment(s) are stopped

20180420:21:50:41:002098 gprecoverseg:dw01:gpadmin-[INFO]:-Ensuring that shared memory is cleaned up for stopped segments
updating flat files
20180420:21:50:46:002098 gprecoverseg:dw01:gpadmin-[INFO]:-Updating configuration with new mirrors
20180420:21:50:46:002098 gprecoverseg:dw01:gpadmin-[INFO]:-Updating mirrors
.
20180420:21:50:47:002098 gprecoverseg:dw01:gpadmin-[INFO]:-Starting mirrors
20180420:21:50:48:002098 gprecoverseg:dw01:gpadmin-[INFO]:-Commencing parallel primary and mirror segment instance startup, please wait...
....
20180420:21:50:52:002098 gprecoverseg:dw01:gpadmin-[INFO]:-Process results...
20180420:21:50:52:002098 gprecoverseg:dw01:gpadmin-[INFO]:-Updating configuration to mark mirrors up
20180420:21:50:52:002098 gprecoverseg:dw01:gpadmin-[INFO]:-Updating primaries
20180420:21:50:52:002098 gprecoverseg:dw01:gpadmin-[INFO]:-Commencing parallel primary conversion of 2 segments, please wait...
..
20180420:21:50:54:002098 gprecoverseg:dw01:gpadmin-[INFO]:-Process results...
20180420:21:50:54:002098 gprecoverseg:dw01:gpadmin-[INFO]:-Done updating primaries
20180420:21:50:54:002098 gprecoverseg:dw01:gpadmin-[INFO]:-******************************************************************
20180420:21:50:54:002098 gprecoverseg:dw01:gpadmin-[INFO]:-Updating segments for resynchronization is completed.
20180420:21:50:54:002098 gprecoverseg:dw01:gpadmin-[INFO]:-For segments updated successfully, resynchronization will continue in the background.
20180420:21:50:54:002098 gprecoverseg:dw01:gpadmin-[INFO]:-
20180420:21:50:54:002098 gprecoverseg:dw01:gpadmin-[INFO]:-Use gpstate -s to check the resynchronization progress.
20180420:21:50:54:002098 gprecoverseg:dw01:gpadmin-[INFO]:-******************************************************************
After the repair completes, check the node status:
[gpadmin@dw01 pg_log]$ gpstate -m
20180420:21:51:10:002350 gpstate:dw01:gpadmin-[INFO]:-Starting gpstate with args: -m
20180420:21:51:10:002350 gpstate:dw01:gpadmin-[INFO]:-local Greenplum Version: 'postgres (Greenplum Database) 4.3.12.0 build 1'
20180420:21:51:10:002350 gpstate:dw01:gpadmin-[INFO]:-master Greenplum Version: 'PostgreSQL 8.2.15 (Greenplum Database 4.3.12.0 build 1) on x86_64-unknown-linux-gnu, compiled by GCC gcc (GCC) 4.4.2 compiled on Feb 27 2017 20:45:12'
20180420:21:51:10:002350 gpstate:dw01:gpadmin-[INFO]:-Obtaining Segment details from master...
20180420:21:51:10:002350 gpstate:dw01:gpadmin-[INFO]:--------------------------------------------------------------
20180420:21:51:10:002350 gpstate:dw01:gpadmin-[INFO]:--Current GPDB mirror list and status
20180420:21:51:10:002350 gpstate:dw01:gpadmin-[INFO]:--Type = Group
20180420:21:51:10:002350 gpstate:dw01:gpadmin-[INFO]:--------------------------------------------------------------
20180420:21:51:10:002350 gpstate:dw01:gpadmin-[INFO]:- Mirror Datadir Port Status Data Status
20180420:21:51:10:002350 gpstate:dw01:gpadmin-[INFO]:- sdw2 /data/gpdata/gpdatam1/gpseg0 50000 Passive Resynchronizing
20180420:21:51:10:002350 gpstate:dw01:gpadmin-[INFO]:- sdw2 /data/gpdata/gpdatam1/gpseg1 50001 Passive Resynchronizing
20180420:21:51:10:002350 gpstate:dw01:gpadmin-[INFO]:- sdw1 /data/gpdata/gpdatam1/gpseg2 50000 Passive Synchronized
20180420:21:51:10:002350 gpstate:dw01:gpadmin-[INFO]:- sdw1 /data/gpdata/gpdatam1/gpseg3 50001 Passive Synchronized
20180420:21:51:10:002350 gpstate:dw01:gpadmin-[INFO]:--------------------------------------------------------------
All nodes are up and the sdw2 mirrors are resynchronizing. How long this takes depends on the data volume; it usually finishes within a few minutes.
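Waiting for resynchronization can be automated by polling gpstate -m until no mirror is listed as Failed or Resynchronizing. This is only a sketch under stated assumptions: the function names, the STATE_CMD override and the 30-second default interval are illustrative and do not come from Greenplum. It relies on the status words appearing as in the output above.

```shell
#!/usr/bin/env bash
# Succeeds (exit 0) only when no mirror in the gpstate -m output read from
# stdin is still Failed or Resynchronizing.
mirrors_in_sync() {
  ! grep -Eq ' (Failed|Resynchronizing)( |$)'
}

# Poll until every mirror is synchronized.
# $1: seconds between polls (assumed default: 30).
# STATE_CMD: command that prints mirror status (assumed default: gpstate -m).
wait_for_resync() {
  local interval="${1:-30}"
  local state_cmd="${STATE_CMD:-gpstate -m}"
  # $state_cmd is left unquoted on purpose so "gpstate -m" splits into words.
  until $state_cmd | mirrors_in_sync; do
    sleep "$interval"
  done
}

# Typical use on the master host:
#   wait_for_resync 60 && echo "all mirrors synchronized"
```

The loop has no timeout, so a mirror that stays Failed keeps it running. In practice a retry limit would be a sensible addition.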
References:
https://gp-docs-cn.github.io/docs/utility_guide/admin_utilities/gprecoverseg.html
http://mysql.taobao.org/monthly/2016/04/03/
Tags: gprecoverseg, Segment, node. Updated: 2021-11-05 01:12:12