Last modified: January 26, 2018
This article is written in: 🇺🇸
Backup and recovery strategies are essential components of any robust database management plan, ensuring that data remains durable, available, and that business operations can continue uninterrupted. One of the significant challenges in designing these strategies is performing backups without disrupting or slowing down production read and write operations. In this discussion, we'll delve into various backup types, explore methods to minimize production impact, examine recovery strategies, and highlight best practices and tools.
After reading the material, you should be able to answer the following questions:

1. What distinguishes full, incremental, and differential backups, and what are the trade-offs of each?
2. How do logical backups differ from physical backups, and when is each appropriate?
3. Which techniques allow backups to run without disrupting production reads and writes, such as online backups, snapshots, and backing up from a standby?
4. How does point-in-time recovery work, and what does it require?
5. What roles do continuous data protection, standby databases, and failover play in a recovery strategy?
The primary goals of effective backup and recovery strategies revolve around three main objectives: durability, so that committed data survives hardware failures, corruption, and human error; availability, so that data remains accessible to applications and users; and business continuity, so that operations can resume quickly and with minimal data loss after an incident.
Designing backup processes that don't interfere with production systems presents several challenges. Backup operations can consume significant resources, leading to competition with production workloads for I/O, CPU, and network bandwidth. Ensuring data consistency during backups, especially without locking tables or halting transactions, adds another layer of complexity. Additionally, as databases grow in size, scalability becomes a concern, requiring efficient methods to handle large volumes of data without affecting application performance.
Understanding the different types of backups is fundamental to creating an effective backup strategy. Each type has its advantages and trade-offs, and often, a combination of these methods is employed to balance recovery needs and resource utilization.
A full backup involves creating a complete copy of the entire database at a specific point in time. This method is straightforward and simplifies the recovery process since all the data is contained in a single backup set. However, full backups can be time-consuming to create, especially for large databases, and require substantial storage space. If not managed properly, performing full backups can impact production systems due to the resources they consume.
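As a concrete sketch, a full logical backup of a single MySQL database can be taken with `mysqldump` (the database name and output path are placeholders):

```bash
# Full backup of one database; --single-transaction takes a consistent
# snapshot of InnoDB tables without locking them for the whole dump
mysqldump --single-transaction --routines --triggers mydb > /backup/mydb_full.sql
```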
Incremental backups save only the data that has changed since the last backup, whether it was a full or another incremental backup. This approach reduces the amount of data that needs to be backed up, resulting in faster backup times and reduced storage requirements. While incremental backups are efficient in terms of backup resources, the recovery process can be more complex. Restoring from incremental backups requires the last full backup and all subsequent incremental backups, which can increase recovery time.
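As an illustration, a physical backup tool such as Percona XtraBackup implements incremental backups by copying only the pages changed since a previous backup (the directories below are placeholders):

```bash
# Full (base) backup first
xtrabackup --backup --target-dir=/backup/full

# First incremental: only changes since the base backup
xtrabackup --backup --target-dir=/backup/inc1 --incremental-basedir=/backup/full

# Second incremental: only changes since the first incremental
xtrabackup --backup --target-dir=/backup/inc2 --incremental-basedir=/backup/inc1
```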
Differential backups capture all the data that has changed since the last full backup. Unlike incremental backups, differential backups do not consider previous differential backups, which simplifies the recovery process. Restoring from a differential backup requires only the last full backup and the latest differential backup. However, as time passes, differential backups can grow in size, potentially approaching the size of a full backup, which may impact backup windows and storage needs.
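Not every database ships a native differential mode, but the idea can be sketched at the filesystem level with GNU tar's snapshot files (paths are placeholders, and a real database backup would still need the consistency measures discussed below):

```bash
# Level-0 (full) backup; tar records file state in full.snar
tar --listed-incremental=/backup/full.snar -czf /backup/full.tar.gz /var/lib/db

# Differential: work from a copy of the level-0 snapshot file so each
# run captures everything changed since the full backup, not merely
# since the previous differential
cp /backup/full.snar /backup/diff.snar
tar --listed-incremental=/backup/diff.snar -czf /backup/diff1.tar.gz /var/lib/db
```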
Backups can also be categorized based on whether they are logical or physical.
I. Logical Backups:
These backups involve exporting database objects such as schemas and data using database utilities like `mysqldump` for MySQL or `pg_dump` for PostgreSQL. Logical backups are portable across different database versions and platforms and allow for selective restoration of specific objects. However, they can be slower to create and may impact performance due to the need to scan tables. Additionally, they may not capture certain database-specific features or configurations.
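For instance, a sketch of a logical backup and selective restore with PostgreSQL's tools, assuming placeholder database and table names:

```bash
# Dump one database in the custom format, which supports selective restore
pg_dump -Fc mydb -f /backup/mydb.dump

# Restore only a single table from the dump into another database
pg_restore -d mydb_copy --table=orders /backup/mydb.dump
```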
II. Physical Backups:
Physical backups involve copying the physical files that store the database data, such as data files and logs. This method tends to be faster for both backup and restoration and generally has less impact on performance if managed correctly. However, physical backups are usually platform-specific and require measures to ensure data consistency, such as pausing writes or using snapshots.
To prevent backups from interfering with production operations, it's important to employ methods designed to minimize resource contention and maintain data consistency without halting read or write activities.
Online backups, also called hot backups, are performed while the database is running and accessible to users. This method leverages database features that allow for consistent backups without locking tables. The backup tool reads data files while tracking changes, often through mechanisms like write-ahead logs. It's essential to ensure that the backup tool supports online backups and to monitor resource utilization during the process to prevent performance degradation.
Example with PostgreSQL:
Using `pg_basebackup`, you can perform an online backup without stopping the database:

```bash
pg_basebackup -h localhost -D /backup/data -U replicator -Fp -Xs -P
```
Example Output:
```
Password:
32768/32768 kB (100%), 1/1 tablespace
Base backup completed.
```
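In this invocation, `-Fp` writes the backup as a plain copy of the data directory, `-Xs` streams the write-ahead log alongside the data so the backup is internally consistent, and `-P` reports progress. The `replicator` role is assumed to have replication privileges.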
Snapshot-based backups utilize filesystem or storage-level snapshots to capture the state of the database at a specific point in time. Filesystem snapshots, like those provided by LVM, ZFS, or Btrfs, and storage snapshots from SAN or NAS systems can create instant snapshots with minimal impact on the running system.
```
+------------------------+
|  Production Database   |
|       (Running)        |
+-----------+------------+
            |
    Initiate Snapshot
            |
            v
+------------------------+
|    Snapshot Created    |
|  (Point-in-Time Copy)  |
+------------------------+
```
In this setup, the production database continues to operate without interruption while the snapshot is created. The snapshot can then be backed up or moved without affecting the production workload.
Example with LVM Snapshots:
```bash
# Create LVM snapshot
lvcreate --size 10G --snapshot --name db_snapshot /dev/vg0/db_volume

# Mount snapshot and perform backup
mount /dev/vg0/db_snapshot /mnt/db_snapshot
tar -czf db_backup.tar.gz /mnt/db_snapshot

# Remove snapshot after backup
umount /mnt/db_snapshot
lvremove /dev/vg0/db_snapshot
```
Here, `lvcreate` creates a copy-on-write snapshot named `db_snapshot`, allocating 10G of space for the snapshot from the original logical volume `/dev/vg0/db_volume`. Mounting the snapshot at `/mnt/db_snapshot` enables access to its files for further operations, and `tar` compresses the snapshot contents into a single archive, `db_backup.tar.gz`, for efficient storage and transfer. Once the archive is written, the snapshot is unmounted and removed to release the space reserved for it.

Using a replica or standby server to perform backups can offload the backup workload from the primary server. Replication keeps the standby server synchronized with the primary, and backups are performed on the standby, avoiding any impact on the primary server's performance.
```
+-------------------+
|  Primary Server   |
|   (Production)    |
+---------+---------+
          |
     Replication
          |
          v
+-------------------+
|  Standby Server   |
|     (Backup)      |
+---------+---------+
          |
   Backup Process
          |
          v
+-------------------+
|  Backup Storage   |
+-------------------+
```
In this configuration, the standby server receives updates from the primary server and serves as the source for backups. This approach ensures that the backup process does not consume resources on the primary server, thereby preserving performance.
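A minimal sketch of this pattern with PostgreSQL, assuming a standby reachable at the hypothetical host `standby.example.com`:

```bash
# Run the base backup against the standby rather than the primary, so
# the primary's I/O, CPU, and network bandwidth are left untouched
pg_basebackup -h standby.example.com -U replicator -D /backup/standby_copy -Fp -Xs -P
```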
Considerations: the standby must be monitored for health and replication lag, since a backup taken from a lagging replica reflects a slightly older state of the primary; the standby also needs enough I/O and storage capacity to absorb the backup workload without falling further behind.
An effective recovery strategy is essential to restore operations quickly after data loss or corruption. Recovery strategies should align with business requirements, such as acceptable downtime and data loss thresholds.
Point-in-Time Recovery allows the database to be restored to a specific moment by using backups and transaction logs. The process involves restoring the latest full backup, applying any incremental backups, and replaying transaction logs up to the desired point.
Requirements: a recent full (base) backup and an unbroken chain of archived transaction logs, such as WAL files in PostgreSQL or binary logs in MySQL, covering the interval from that backup up to the desired recovery point.
Example with PostgreSQL:
I. Configure Continuous Archiving:
In the `postgresql.conf` file:

```
archive_mode = on
archive_command = 'cp %p /archive_location/%f'
```
This setup ensures that each completed transaction log segment is copied to a designated archive location; in `archive_command`, `%p` expands to the path of the log file to archive and `%f` to its file name.
II. Perform Recovery:
Create a `recovery.conf` file specifying the target recovery time:

```
restore_command = 'cp /archive_location/%f %p'
recovery_target_time = '2023-09-14 12:34:56'
```
Start the PostgreSQL server, which will enter recovery mode and apply logs up to the specified time.
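Note that on PostgreSQL 12 and later, `recovery.conf` has been removed: the same `restore_command` and `recovery_target_time` settings go in `postgresql.conf`, and recovery mode is requested by creating an empty signal file in the data directory (the path below is an assumption; adjust it to your installation):

```bash
# Ask PostgreSQL 12+ to start in targeted recovery mode
touch /var/lib/postgresql/data/recovery.signal
```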
Continuous Data Protection involves capturing and storing all changes in real-time or near-real-time, allowing for recovery to any point. This method provides the most granular recovery option but requires significant storage for logs and may impact performance due to the overhead of continuous logging.
Maintaining one or more standby databases synchronized with the primary via replication offers immediate failover capabilities. In case of a primary server failure, a standby can be promoted to become the new primary, minimizing downtime.
Failover Process Illustration:
```
[Normal Operation]

+-------------------+       Replication        +-------------------+
|  Primary Server   | -----------------------> |  Standby Server   |
|     (Active)      |                          |     (Passive)     |
+-------------------+                          +-------------------+

[After Primary Failure]

+-------------------+     Promote Standby      +-------------------+
|  Primary Server   | -----------------------> |  Standby Server   |
|     (Failed)      |                          |   (Now Active)    |
+-------------------+                          +-------------------+
```
Clients are redirected to the new primary server, ensuring continuity of service.
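The promotion itself is typically a single command; a sketch with PostgreSQL, assuming a default data directory path:

```bash
# Promote the standby so it stops replaying WAL and begins accepting writes
pg_ctl promote -D /var/lib/postgresql/data

# On PostgreSQL 12 and later, the same can be done from a SQL session:
# SELECT pg_promote();
```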
Implementing best practices enhances the effectiveness of backup and recovery strategies: schedule and automate backups so they run during periods of low activity; keep multiple copies on independent storage, including at least one offsite or cloud copy; encrypt backups at rest and in transit; monitor backup jobs and alert on failures; define and document recovery time and recovery point objectives (RTO/RPO) with the business; and, above all, test restores regularly, since a backup that has never been restored is unverified.
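As a small illustration of the testing point, a custom-format PostgreSQL dump can be sanity-checked and test-restored as follows (database and file names are placeholders):

```bash
# List the archive's contents without restoring anything
pg_restore --list /backup/mydb.dump

# Restore into a scratch database and run application-level checks there
createdb restore_test
pg_restore -d restore_test /backup/mydb.dump
```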
Efficient backups require tools and technologies that minimize disruption to production systems while ensuring data reliability and accessibility.