本文为墨天轮数据库管理服务团队第 137 期技术分享,作者为技术顾问罗海鸥,内容原创,如需转载请联系小墨(VX:modb666)并注明来源。
PG数据库发生误操作,造成数据丢失后如何恢复数据。
一、实验环境
|
|
|
|
|---|---|---|
|
|
|
|
|
|
|
|
|
|
|
|
源库需要提前打开如下参数:
db5=# select name,setting from pg_settings where name in ('wal_level','archive_mode','archive_command','restore_command');name | setting-----------------+-------------------------archive_command | cp %p /opt/pgwal/arc/%farchive_mode | onrestore_command | cp /opt/pgwal/arc/%f %pwal_level | replica(4 rows)
二、在目标库服务器上对源库做一个备份。
[postgres@pg2 opt]$ pg_basebackup -Fp -Xs -P -v -D /opt/pgdata -h luoPassword:pg_basebackup: initiating base backup, waiting for checkpoint to completepg_basebackup: checkpoint completedpg_basebackup: write-ahead log start point: 0/2C000028 on timeline 1pg_basebackup: starting background WAL receiverpg_basebackup: created temporary replication slot "pg_basebackup_13727"55973/55973 kB (100%), 1/1 tablespacepg_basebackup: write-ahead log end point: 0/2C000158pg_basebackup: waiting for background process to finish streaming ...pg_basebackup: syncing data to disk ...pg_basebackup: renaming backup_manifest.tmp to backup_manifestpg_basebackup: base backup completed
三、源库模拟生产,写入数据
postgres=# create database db5; --新建一个数据库db5CREATE DATABASEpostgres=#postgres=#postgres=#postgres=# \c db5You are now connected to database "db5" as user "postgres".db5=# create table t1 as select * from pg_class;SELECT 415db5=#db5=#db5=# create table t2 (id int);CREATE TABLEdb5=# insert into t2 values(1);INSERT 0 1db5=# insert into t2 values(2);INSERT 0 1db5=#db5=#db5=#db5=# select * from t2;id----12(2 rows)db5=# delete from t2 where id=2; --模拟误操作,删除t2表上一行数据DELETE 1db5=# select * from t2;id----1(1 row)db5=# create table t3(id int); --新建t3表CREATE TABLEdb5=# checkpoint ;CHECKPOINTdb5=# select pg_switch_wal(); --切换WAL日志pg_switch_wal---------------0/2D46A8A0(1 row)db5=#db5=# select pg_relation_filepath('t2'); --查看t2表文件名,文件名是16485pg_relation_filepath----------------------base/16479/16485(1 row)db5=# \dList of relationsSchema | Name | Type | Owner--------+------+-------+----------public | t1 | table | postgrespublic | t2 | table | postgrespublic | t3 | table | postgres(3 rows)db5=# select * from t2;id----1(1 row)
四、源库上pg_waldump分析wal日志,定位删除的具体lsn
[postgres@luo arc]$[postgres@luo arc]$ pwd --归档目录/opt/pgwal/arc[postgres@luo arc]$ ls -lrt --根据时间列出WAL日志总用量 65540-rw------- 1 postgres dba 16777216 8月 27 16:08 00000001000000000000002A-rw------- 1 postgres dba 16777216 8月 27 16:10 00000001000000000000002B-rw------- 1 postgres dba 16777216 8月 27 16:10 00000001000000000000002C-rw------- 1 postgres dba 341 8月 27 16:10 00000001000000000000002C.00000028.backup-rw------- 1 postgres dba 16777216 8月 27 16:13 00000001000000000000002D[postgres@luo arc]$
WAL日志定位到如下关键信息:误操作是DELETE,文件名是16485,XID=833,lsn是0/2D469640,上一个lsn是0/2D469608。
[postgres@luo arc]$ pg_waldump 00000001000000000000002D|grep -i delete|grep 16485rmgr: Heap len (rec/tot): 54/ 54, tx: 833, lsn: 0/2D469640, prev 0/2D469608, desc: DELETE xmax: 833, off: 2, infobits: [KEYS_UPDATED], flags: 0x00, blkref#0: rel 1663/16479/16485 blk 0[postgres@luo arc]$
五、目标库开始恢复数据
1.配置recovery_target_lsn参数
vi postgresql.confrecovery_target_lsn = '0/2D469608' --lsn为上一个lsn
2.创建standby.signal文件
[postgres@pg2 pgdata]$ cd /opt/pgdata/[postgres@pg2 pgdata]$ touch standby.signal[postgres@pg2 pgdata]$[postgres@pg2 pgdata]$ ll总用量 388-rw------- 1 postgres dba 227 8月 27 16:10 backup_label.old-rw------- 1 postgres dba 318484 8月 27 16:10 backup_manifestdrwx------ 6 postgres dba 46 8月 27 16:18 base-rw------- 1 postgres dba 47 8月 27 16:17 current_logfilesdrwx------ 2 postgres dba 4096 8月 27 16:18 globaldrwx------ 2 postgres dba 6 8月 27 16:10 pg_commit_tsdrwx------ 2 postgres dba 6 8月 27 16:10 pg_dynshmem-rw------- 1 postgres dba 5867 8月 27 16:10 pg_hba.conf-rw------- 1 postgres dba 2640 8月 27 16:10 pg_ident.confdrwx------ 2 postgres dba 4096 8月 27 16:17 pg_logdrwx------ 4 postgres dba 68 8月 27 16:22 pg_logicaldrwx------ 4 postgres dba 36 8月 27 16:10 pg_multixactdrwx------ 2 postgres dba 6 8月 27 16:10 pg_notifydrwx------ 2 postgres dba 6 8月 27 16:10 pg_replslotdrwx------ 2 postgres dba 6 8月 27 16:10 pg_serialdrwx------ 2 postgres dba 6 8月 27 16:10 pg_snapshotsdrwx------ 2 postgres dba 6 8月 27 16:10 pg_statdrwx------ 2 postgres dba 6 8月 27 16:10 pg_stat_tmpdrwx------ 2 postgres dba 18 8月 27 16:22 pg_subtransdrwx------ 2 postgres dba 6 8月 27 16:10 pg_tblspcdrwx------ 2 postgres dba 6 8月 27 16:10 pg_twophase-rw------- 1 postgres dba 3 8月 27 16:10 PG_VERSIONdrwx------ 4 postgres dba 77 8月 27 16:24 pg_waldrwx------ 2 postgres dba 18 8月 27 16:10 pg_xact-rw------- 1 postgres dba 88 8月 27 16:10 postgresql.auto.conf-rw------- 1 postgres dba 30969 8月 27 16:17 postgresql.conf-rw------- 1 postgres dba 23 8月 27 16:17 postmaster.opts-rw------- 1 postgres dba 70 8月 27 16:17 postmaster.pid-rw-r--r-- 1 postgres dba 0 8月 27 16:11 standby.signal[postgres@pg2 pgdata]$
打开目标库
[postgres@pg2 pgdata]$ pg_ctl startwaiting for server to start....2025-08-27 16:17:42.609 CST [14902] :[ [txid=0] user= ,db=,app=,client=LOG: redirecting log output to logging collector process2025-08-27 16:17:42.609 CST [14902] :[ [txid=0] user= ,db=,app=,client=HINT: Future log output will appear in directory "pg_log".doneserver started
目标库缺少最新WAL日志。
2025-08-27 17:17:40.417 CSTcp: 无法获取"/opt/pgwal/arc/00000001000000000000002D" 的文件状态(stat): 没有那个文件或目录
从源库传输过去
[postgres arc]$ scp 00000001000000000000002D pg2:/opt/pgwal/arcpostgres's password:00000001000000000000002D 100% 16MB 13.9MB/s 00:01[postgres arc]$
目标库不再报错了,recovery stopping after WAL location (LSN) “0/2D469608”,说明数据库恢复到我们指定的lsn了,pausing at the end of recovery,数据库处于暂停恢复状态。
cp: 无法获取"/opt/pgwal/arc/00000002.history" 的文件状态(stat): 没有那个文件或目录2025-08-27 17:20:00.706 CST [15581] :[ [txid=0] user= ,db=,app=,client=LOG: waiting for WAL to become available at 0/2D0000182025-08-27 17:20:05.694 CST [15581] :[ [txid=0] user= ,db=,app=,client=LOG: restored log file "00000001000000000000002D" from archivecp: 无法获取"/opt/pgwal/arc/00000001000000000000002E" 的文件状态(stat): 没有那个文件或目录2025-08-27 17:20:05.740 CST [15581] :[ [txid=0] user= ,db=,app=,client=LOG: recovery stopping after WAL location (LSN) "0/2D469608"2025-08-27 17:20:05.741 CST [15581] :[ [txid=0] user= ,db=,app=,client=LOG: pausing at the end of recovery2025-08-27 17:20:05.741 CST [15581] :[ [txid=0] user= ,db=,app=,client=HINT: Execute pg_wal_replay_resume() to promote.
六、登录数据库,检查恢复是否符合预期。
源库
[postgres@luo ~]$ psql db5psql (17.0)Type "help" for help.db5=# \dList of relationsSchema | Name | Type | Owner--------+------+-------+----------public | t1 | table | postgrespublic | t2 | table | postgrespublic | t3 | table | postgres(3 rows)db5=# select * from t2;id----1(1 row)db5=#
目标库
[postgres@pg2 pgdata]$ psql db5psql (17.0)Type "help" for help.db5=# \dList of relationsSchema | Name | Type | Owner--------+------+-------+----------public | t1 | table | postgrespublic | t2 | table | postgres(2 rows)db5=# select * from t2;id----12(2 rows)db5=#
目标库基于备份和WAL日志恢复已完成。
THE END
墨天轮从乐知乐享的数据库技术社区蓄势出发,全面升级,提供多类型数据库管理服务。墨天轮数据库管理服务旨在为用户构建信赖可托付的数据库环境,并为数据库厂商提供中立的生态支持。
服务官网:https://www.modb.pro/service
点击进入服务团队官网

