Category: PostgreSQL
Repairing Segment Nodes with gprecoverseg

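While testing in a Greenplum environment, the segment host sdw2 ran out of disk space and went down. When we tried to restart it, the node reported errors and would not start.
Checking the segment status with gpstate -m shows that the mirrors on sdw2 have failed: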

[gpadmin@dw01 gpmaster]$ gpstate -m

gpstate:dw01:gpadmin-[INFO]:-Starting gpstate with args: -m
gpstate:dw01:gpadmin-[INFO]:-local Greenplum Version: 'postgres (Greenplum Database) 4.3.12.0 build 1'
gpstate:dw01:gpadmin-[INFO]:-master Greenplum Version: 'PostgreSQL 8.2.15 (Greenplum Database 4.3.12.0 build 1) on x86_64-unknown-linux-gnu, compiled by GCC gcc (GCC) 4.4.2 compiled on Feb 27 2017 20:45:12'
gpstate:dw01:gpadmin-[INFO]:-Obtaining Segment details from master...
gpstate:dw01:gpadmin-[INFO]:--------------------------------------------------------------
gpstate:dw01:gpadmin-[INFO]:--Current GPDB mirror list and status
gpstate:dw01:gpadmin-[INFO]:--Type = Group
gpstate:dw01:gpadmin-[INFO]:--------------------------------------------------------------
gpstate:dw01:gpadmin-[INFO]:-   Mirror   Datadir                        Port    Status    Data Status    
gpstate:dw01:gpadmin-[WARNING]:-sdw2     /data/gpdata/gpdatam1/gpseg0   50000   Failed                   <<<<<<<<
gpstate:dw01:gpadmin-[WARNING]:-sdw2     /data/gpdata/gpdatam1/gpseg1   50001   Failed                   <<<<<<<<
gpstate:dw01:gpadmin-[INFO]:-   sdw1     /data/gpdata/gpdatam1/gpseg2   50000   Passive   Synchronized
gpstate:dw01:gpadmin-[INFO]:-   sdw1     /data/gpdata/gpdatam1/gpseg3   50001   Passive   Synchronized
gpstate:dw01:gpadmin-[INFO]:--------------------------------------------------------------
gpstate:dw01:gpadmin-[WARNING]:-2 segment(s) configured as mirror(s) have failed
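To script this check, you can pick the failed mirrors out of the gpstate -m text. Below is a minimal Python sketch. It assumes the layout shown above: a log prefix ending in "[INFO]:-" or "[WARNING]:-", and columns separated by two or more spaces. The function names are hypothetical, and the output layout may differ between Greenplum versions.

```python
import re
from typing import List, NamedTuple


class MirrorRow(NamedTuple):
    host: str
    datadir: str
    port: int
    status: str        # e.g. "Passive" or "Failed"
    data_status: str   # e.g. "Synchronized"; empty for a failed mirror


# Everything up to and including "[INFO]:-" or "[WARNING]:-" is the log prefix.
_PREFIX = re.compile(r"\[(?:INFO|WARNING)\]:-")
_COLUMN_SEP = re.compile(r"\s{2,}")


def parse_mirrors(output: str) -> List[MirrorRow]:
    """Return the mirror rows found in 'gpstate -m' output."""
    rows = []
    for line in output.splitlines():
        m = _PREFIX.search(line)
        if not m:
            continue
        cols = _COLUMN_SEP.split(line[m.end():].strip())
        # Keep only data rows: host, absolute data directory, numeric port, status.
        if len(cols) < 4 or not cols[1].startswith("/") or not cols[2].isdigit():
            continue
        # Failed rows end with a "<<<<<<<<" marker instead of a data status.
        extra = cols[4] if len(cols) > 4 else ""
        data_status = "" if extra.startswith("<") else extra
        rows.append(MirrorRow(cols[0], cols[1], int(cols[2]), cols[3], data_status))
    return rows


def failed_mirrors(output: str) -> List[MirrorRow]:
    """Mirror rows whose Status column is 'Failed'."""
    return [r for r in parse_mirrors(output) if r.status == "Failed"]


def all_synchronized(output: str) -> bool:
    """True only if at least one mirror row was found and every row is Synchronized."""
    rows = parse_mirrors(output)
    return bool(rows) and all(r.data_status == "Synchronized" for r in rows)
```

Feed it the captured text of gpstate -m. For the output above, failed_mirrors returns the two sdw2 rows (ports 50000 and 50001). all_synchronized can serve as the "all segments are up and synchronized" check before the rebalance step described below.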

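gprecoverseg options

-a (do not prompt)
Do not prompt the user for confirmation.
-B parallel_processes
The number of segments to recover in parallel. If not specified, the utility starts up to four parallel processes, depending on how many segment instances need to be recovered.
-d master_data_directory
Optional. The data directory of the master host. If not specified, the value set for $MASTER_DATA_DIRECTORY is used.
-F (full recovery)
Optional. Perform a full copy of the active segment instance to recover the failed segment. By default, only the incremental changes that occurred while the segment was down are copied.
-i recover_config_file
Specifies the name of a file that contains the details of the failed segments to recover. Each line in the file has the following format. The SPACE keyword marks the position of a single required space. Do not add any extra spaces.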
filespaceOrder=[filespace1_fsname[, filespace2_fsname[, ...]]
<failed_host_address>:<port>:<data_directory>SPACE 
<recovery_host_address>:<port>:<replication_port>:<data_directory>
[:<fselocation>:...]
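The single-space rule makes the recovery file easy to get wrong by hand. Below is a minimal Python sketch that writes the file for the case with no user-defined filespaces. In that case the filespaceOrder line is left empty and no fselocation fields are appended. That case, the helper names, and the spare host sdw3 used in the test are assumptions for illustration. Compare the result with a sample file generated by gprecoverseg -o before using it.

```python
from typing import Iterable, NamedTuple, Tuple


class FailedSegment(NamedTuple):
    address: str
    port: int
    datadir: str


class RecoveryTarget(NamedTuple):
    address: str
    port: int
    replication_port: int
    datadir: str


def _check(*fields: object) -> None:
    # The two halves of a line are separated by exactly one space,
    # so no individual field may contain whitespace.
    for f in fields:
        if any(ch.isspace() for ch in str(f)):
            raise ValueError(f"whitespace not allowed in field: {f!r}")


def recovery_line(failed: FailedSegment, target: RecoveryTarget) -> str:
    """<failed_addr>:<port>:<dir>SPACE<recovery_addr>:<port>:<repl_port>:<dir>"""
    _check(*failed, *target)
    left = f"{failed.address}:{failed.port}:{failed.datadir}"
    right = (f"{target.address}:{target.port}:"
             f"{target.replication_port}:{target.datadir}")
    return f"{left} {right}"


def recover_config(pairs: Iterable[Tuple[FailedSegment, RecoveryTarget]]) -> str:
    # Assumption: no user-defined filespaces, so filespaceOrder is empty
    # and no :<fselocation> fields are added to the lines.
    lines = ["filespaceOrder="]
    lines.extend(recovery_line(f, t) for f, t in pairs)
    return "\n".join(lines) + "\n"
```

Write the returned text to a file and pass that file to gprecoverseg -i.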

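Recover all failed segment instances:

gprecoverseg

After recovery, rebalance your Greenplum Database system so that every segment returns to its preferred role. First check that all segments are up and synchronized:

$ gpstate -m
$ gprecoverseg -r

Recover any failed segment instances to a newly configured spare segment host: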

$ gprecoverseg -i recover_config_file

In this example, the failed mirrors are repaired with gprecoverseg:

[gpadmin@dw01 pg_log]$ gprecoverseg
20180420:21:50:37:002098 gprecoverseg:dw01:gpadmin-[INFO]:-Starting gprecoverseg with args: 
20180420:21:50:37:002098 gprecoverseg:dw01:gpadmin-[INFO]:-local Greenplum Version: 'postgres (Greenplum Database) 4.3.12.0 build 1'
20180420:21:50:37:002098 gprecoverseg:dw01:gpadmin-[INFO]:-master Greenplum Version: 'PostgreSQL 8.2.15 (Greenplum Database 4.3.12.0 build 1) on x86_64-unknown-linux-gnu, compiled by GCC gcc (GCC) 4.4.2 compiled on Feb 27 2017 20:45:12'
20180420:21:50:37:002098 gprecoverseg:dw01:gpadmin-[INFO]:-Checking if segments are ready to connect
20180420:21:50:37:002098 gprecoverseg:dw01:gpadmin-[INFO]:-Obtaining Segment details from master...
20180420:21:50:37:002098 gprecoverseg:dw01:gpadmin-[INFO]:-Obtaining Segment details from master...
20180420:21:50:38:002098 gprecoverseg:dw01:gpadmin-[INFO]:-Greenplum instance recovery parameters
20180420:21:50:38:002098 gprecoverseg:dw01:gpadmin-[INFO]:----------------------------------------------------------
20180420:21:50:38:002098 gprecoverseg:dw01:gpadmin-[INFO]:-Recovery type              = Standard
20180420:21:50:38:002098 gprecoverseg:dw01:gpadmin-[INFO]:----------------------------------------------------------
20180420:21:50:38:002098 gprecoverseg:dw01:gpadmin-[INFO]:-Recovery 1 of 2
20180420:21:50:38:002098 gprecoverseg:dw01:gpadmin-[INFO]:----------------------------------------------------------
20180420:21:50:38:002098 gprecoverseg:dw01:gpadmin-[INFO]:-   Synchronization mode                        = Incremental
20180420:21:50:38:002098 gprecoverseg:dw01:gpadmin-[INFO]:-   Failed instance host                        = dw04
20180420:21:50:38:002098 gprecoverseg:dw01:gpadmin-[INFO]:-   Failed instance address                     = sdw2
20180420:21:50:38:002098 gprecoverseg:dw01:gpadmin-[INFO]:-   Failed instance directory                   = /data/gpdata/gpdatam1/gpseg0
20180420:21:50:38:002098 gprecoverseg:dw01:gpadmin-[INFO]:-   Failed instance port                        = 50000
20180420:21:50:38:002098 gprecoverseg:dw01:gpadmin-[INFO]:-   Failed instance replication port            = 51000
20180420:21:50:38:002098 gprecoverseg:dw01:gpadmin-[INFO]:-   Recovery Source instance host               = dw03
20180420:21:50:38:002098 gprecoverseg:dw01:gpadmin-[INFO]:-   Recovery Source instance address            = sdw1
20180420:21:50:38:002098 gprecoverseg:dw01:gpadmin-[INFO]:-   Recovery Source instance directory          = /data/gpdata/gpdatap1/gpseg0
20180420:21:50:38:002098 gprecoverseg:dw01:gpadmin-[INFO]:-   Recovery Source instance port               = 40000
20180420:21:50:38:002098 gprecoverseg:dw01:gpadmin-[INFO]:-   Recovery Source instance replication port   = 41000
20180420:21:50:38:002098 gprecoverseg:dw01:gpadmin-[INFO]:-   Recovery Target                             = in-place
20180420:21:50:38:002098 gprecoverseg:dw01:gpadmin-[INFO]:----------------------------------------------------------
20180420:21:50:38:002098 gprecoverseg:dw01:gpadmin-[INFO]:-Recovery 2 of 2
20180420:21:50:38:002098 gprecoverseg:dw01:gpadmin-[INFO]:----------------------------------------------------------
20180420:21:50:38:002098 gprecoverseg:dw01:gpadmin-[INFO]:-   Synchronization mode                        = Incremental
20180420:21:50:38:002098 gprecoverseg:dw01:gpadmin-[INFO]:-   Failed instance host                        = dw04
20180420:21:50:38:002098 gprecoverseg:dw01:gpadmin-[INFO]:-   Failed instance address                     = sdw2
20180420:21:50:38:002098 gprecoverseg:dw01:gpadmin-[INFO]:-   Failed instance directory                   = /data/gpdata/gpdatam1/gpseg1
20180420:21:50:38:002098 gprecoverseg:dw01:gpadmin-[INFO]:-   Failed instance port                        = 50001
20180420:21:50:38:002098 gprecoverseg:dw01:gpadmin-[INFO]:-   Failed instance replication port            = 51001
20180420:21:50:38:002098 gprecoverseg:dw01:gpadmin-[INFO]:-   Recovery Source instance host               = dw03
20180420:21:50:38:002098 gprecoverseg:dw01:gpadmin-[INFO]:-   Recovery Source instance address            = sdw1
20180420:21:50:38:002098 gprecoverseg:dw01:gpadmin-[INFO]:-   Recovery Source instance directory          = /data/gpdata/gpdatap1/gpseg1
20180420:21:50:38:002098 gprecoverseg:dw01:gpadmin-[INFO]:-   Recovery Source instance port               = 40001
20180420:21:50:38:002098 gprecoverseg:dw01:gpadmin-[INFO]:-   Recovery Source instance replication port   = 41001
20180420:21:50:38:002098 gprecoverseg:dw01:gpadmin-[INFO]:-   Recovery Target                             = in-place
20180420:21:50:38:002098 gprecoverseg:dw01:gpadmin-[INFO]:----------------------------------------------------------

Continue with segment recovery procedure Yy|Nn (default=N):
> y
20180420:21:50:40:002098 gprecoverseg:dw01:gpadmin-[INFO]:-2 segment(s) to recover
20180420:21:50:40:002098 gprecoverseg:dw01:gpadmin-[INFO]:-Ensuring 2 failed segment(s) are stopped

20180420:21:50:41:002098 gprecoverseg:dw01:gpadmin-[INFO]:-Ensuring that shared memory is cleaned up for stopped segments
updating flat files
20180420:21:50:46:002098 gprecoverseg:dw01:gpadmin-[INFO]:-Updating configuration with new mirrors
20180420:21:50:46:002098 gprecoverseg:dw01:gpadmin-[INFO]:-Updating mirrors
.
20180420:21:50:47:002098 gprecoverseg:dw01:gpadmin-[INFO]:-Starting mirrors
20180420:21:50:48:002098 gprecoverseg:dw01:gpadmin-[INFO]:-Commencing parallel primary and mirror segment instance startup, please wait...
....
20180420:21:50:52:002098 gprecoverseg:dw01:gpadmin-[INFO]:-Process results...
20180420:21:50:52:002098 gprecoverseg:dw01:gpadmin-[INFO]:-Updating configuration to mark mirrors up
20180420:21:50:52:002098 gprecoverseg:dw01:gpadmin-[INFO]:-Updating primaries
20180420:21:50:52:002098 gprecoverseg:dw01:gpadmin-[INFO]:-Commencing parallel primary conversion of 2 segments, please wait...
..
20180420:21:50:54:002098 gprecoverseg:dw01:gpadmin-[INFO]:-Process results...
20180420:21:50:54:002098 gprecoverseg:dw01:gpadmin-[INFO]:-Done updating primaries
20180420:21:50:54:002098 gprecoverseg:dw01:gpadmin-[INFO]:-******************************************************************
20180420:21:50:54:002098 gprecoverseg:dw01:gpadmin-[INFO]:-Updating segments for resynchronization is completed.
20180420:21:50:54:002098 gprecoverseg:dw01:gpadmin-[INFO]:-For segments updated successfully, resynchronization will continue in the background.
20180420:21:50:54:002098 gprecoverseg:dw01:gpadmin-[INFO]:-
20180420:21:50:54:002098 gprecoverseg:dw01:gpadmin-[INFO]:-Use  gpstate -s  to check the resynchronization progress.
20180420:21:50:54:002098 gprecoverseg:dw01:gpadmin-[INFO]:-******************************************************************

After the repair, check the segment status again:

[gpadmin@dw01 pg_log]$ gpstate -m
20180420:21:51:10:002350 gpstate:dw01:gpadmin-[INFO]:-Starting gpstate with args: -m
20180420:21:51:10:002350 gpstate:dw01:gpadmin-[INFO]:-local Greenplum Version: 'postgres (Greenplum Database) 4.3.12.0 build 1'
20180420:21:51:10:002350 gpstate:dw01:gpadmin-[INFO]:-master Greenplum Version: 'PostgreSQL 8.2.15 (Greenplum Database 4.3.12.0 build 1) on x86_64-unknown-linux-gnu, compiled by GCC gcc (GCC) 4.4.2 compiled on Feb 27 2017 20:45:12'
20180420:21:51:10:002350 gpstate:dw01:gpadmin-[INFO]:-Obtaining Segment details from master...
20180420:21:51:10:002350 gpstate:dw01:gpadmin-[INFO]:--------------------------------------------------------------
20180420:21:51:10:002350 gpstate:dw01:gpadmin-[INFO]:--Current GPDB mirror list and status
20180420:21:51:10:002350 gpstate:dw01:gpadmin-[INFO]:--Type = Group
20180420:21:51:10:002350 gpstate:dw01:gpadmin-[INFO]:--------------------------------------------------------------
20180420:21:51:10:002350 gpstate:dw01:gpadmin-[INFO]:-   Mirror   Datadir                        Port    Status    Data Status       
20180420:21:51:10:002350 gpstate:dw01:gpadmin-[INFO]:-   sdw2     /data/gpdata/gpdatam1/gpseg0   50000   Passive   Resynchronizing
20180420:21:51:10:002350 gpstate:dw01:gpadmin-[INFO]:-   sdw2     /data/gpdata/gpdatam1/gpseg1   50001   Passive   Resynchronizing
20180420:21:51:10:002350 gpstate:dw01:gpadmin-[INFO]:-   sdw1     /data/gpdata/gpdatam1/gpseg2   50000   Passive   Synchronized
20180420:21:51:10:002350 gpstate:dw01:gpadmin-[INFO]:-   sdw1     /data/gpdata/gpdatam1/gpseg3   50001   Passive   Synchronized
20180420:21:51:10:002350 gpstate:dw01:gpadmin-[INFO]:--------------------------------------------------------------

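All segments are now up, and the mirrors on sdw2 are resynchronizing. How long this takes depends on the amount of data, but it usually finishes within a few minutes. As the gprecoverseg output notes, gpstate -s shows the resynchronization progress.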
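Instead of re-running gpstate -m by hand until sdw2 catches up, you can script the wait. Below is a minimal Python sketch. It assumes that gpstate is on the PATH of the gpadmin user and that the mirror rows look like the output above. The completion check is a crude text match: the wait ends once no row reports Resynchronizing or Failed. The polling interval and the attempt limit are arbitrary, and the function names are hypothetical.

```python
import re
import subprocess
import time
from typing import Callable


def run_gpstate_m() -> str:
    """Run 'gpstate -m' and return its combined stdout/stderr text."""
    # stderr is merged into stdout so the log lines are captured
    # whichever stream gpstate writes them to.
    proc = subprocess.run(
        ["gpstate", "-m"],
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        universal_newlines=True,
        check=False,
    )
    return proc.stdout


# Case-sensitive on purpose: the summary line "have failed" is lowercase.
_PENDING = re.compile(r"\b(?:Resynchronizing|Failed)\b")


def resync_finished(output: str) -> bool:
    """Crude check: some row is Synchronized and no row is still pending."""
    return "Synchronized" in output and _PENDING.search(output) is None


def wait_for_resync(
    run: Callable[[], str] = run_gpstate_m,
    interval: float = 30.0,
    max_checks: int = 120,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll until resync_finished() is true; give up after max_checks polls."""
    for attempt in range(max_checks):
        if resync_finished(run()):
            return True
        if attempt + 1 < max_checks:
            sleep(interval)
    return False
```

Calling wait_for_resync() on the master host blocks until the rows report Synchronized, or returns False after about an hour with the default settings. The run and sleep parameters are injectable so the loop can be tested without a cluster.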
References:
https://gp-docs-cn.github.io/docs/utility_guide/admin_utilities/gprecoverseg.html
http://mysql.taobao.org/monthly/2016/04/03/