NAS backup: compression, encryption, bandwidth throttle, integrity check#12898

Open
jmsperu wants to merge 2 commits into apache:4.22 from
jmsperu:fix/nasbackup-enhancements-combined

@jmsperu jmsperu commented Mar 26, 2026

Summary

Adds four optional, zone-scoped features to NAS backup operations on KVM, all disabled by default:

  • Compression (-c): Uses qcow2 internal compression (qemu-img convert -c) to reduce backup size
  • LUKS Encryption (-e): Encrypts backup files at rest using LUKS via qemu-img convert --object secret
  • Bandwidth Throttle (-b): Limits backup I/O — virsh blockjob --bandwidth for running VMs, qemu-img convert -r + ionice for stopped VMs
  • Integrity Check (--verify): Runs qemu-img check on each backup file after creation
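
As a rough illustration of how the compression and throttle flags could combine on the stopped-VM path, the sketch below assembles a qemu-img convert command line from two settings. The helper name build_convert_cmd and its argument order are assumptions made for this example, not code from nasbackup.sh; it only prints the command, one argument per line, and never runs qemu-img.

```sh
# Hypothetical helper (not from nasbackup.sh): print the stopped-VM convert
# command, one argument per line.
#   $1 = "true" to enable qcow2 compression
#   $2 = bandwidth limit in MiB/s (empty or 0 = unlimited)
#   $3 = source disk, $4 = output file
build_convert_cmd() {
    compress=$1 bandwidth=$2 disk=$3 output=$4
    # Rebuild the positional parameters as the argument list.
    set -- ionice -c 3 qemu-img convert
    if [ "$compress" = "true" ]; then
        set -- "$@" -c
    fi
    if [ -n "$bandwidth" ] && [ "$bandwidth" -gt 0 ]; then
        set -- "$@" -r "${bandwidth}M"
    fi
    set -- "$@" -O qcow2 "$disk" "$output"
    printf '%s\n' "$@"
}
```

Calling `build_convert_cmd true 50 in.qcow2 out.qcow2` lists the arguments with both `-c` and `-r 50M` present. Building the list argument by argument avoids relying on unquoted command substitutions to inject optional flags.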

Configuration Keys (Zone scope)

| Setting | Type | Default | Description |
| --- | --- | --- | --- |
| nas.backup.compression.enabled | Boolean | false | Enable qcow2 compression for backup files |
| nas.backup.encryption.enabled | Boolean | false | Enable LUKS encryption for backup files |
| nas.backup.encryption.passphrase | String (Secure) | "" | Passphrase for LUKS encryption |
| nas.backup.bandwidth.limit.mbps | Integer | 0 | Bandwidth limit in MiB/s (0 = unlimited) |
| nas.backup.integrity.check | Boolean | false | Run qemu-img check after backup |

Architecture

  1. NASBackupProvider reads zone-scoped ConfigKeys and populates a details map on TakeBackupCommand
  2. TakeBackupCommand carries the details map from management server to KVM agent
  3. LibvirtTakeBackupCommandWrapper extracts the details and translates them to nasbackup.sh CLI flags
  4. nasbackup.sh implements the actual compression, encryption, throttling, and verification logic
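
The translation in step 3 lives in Java, but the mapping itself is small enough to sketch in shell. The function below is a hypothetical analogue, not the wrapper's code: it takes the path of an already-written passphrase file plus key=value details and prints the matching nasbackup.sh flags, one argument per line. The detail keys are the ones added in NASBackupProvider; the function name and calling convention are assumptions.

```sh
# Hypothetical analogue of the wrapper's details-to-flags translation.
#   $1     = path of the passphrase file (only used when encryption=true)
#   $2...  = details as key=value pairs
details_to_flags() {
    passfile=$1
    shift
    for kv in "$@"; do
        key=${kv%%=*}      # text before the first '='
        value=${kv#*=}     # text after the first '='
        case "$key" in
            compression)
                if [ "$value" = "true" ]; then printf '%s\n' -c; fi ;;
            encryption)
                if [ "$value" = "true" ]; then printf '%s\n' -e "$passfile"; fi ;;
            bandwidth_limit)
                if [ "$value" -gt 0 ] 2>/dev/null; then printf '%s\n' -b "$value"; fi ;;
            integrity_check)
                if [ "$value" = "true" ]; then printf '%s\n' --verify; fi ;;
        esac
    done
}
```

With no details the function prints nothing, which corresponds to the default case where all four features are disabled and the script is invoked as before.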

Files Changed

  • scripts/vm/hypervisor/kvm/nasbackup.sh — new -c, -b, -e, --verify flags with encrypt_backup() and verify_backup() functions
  • core/.../TakeBackupCommand.java — added details map (HashMap) with getter/setter/addDetail
  • plugins/backup/nas/.../NASBackupProvider.java — 5 new ConfigKeys, populate command details in takeBackup()
  • plugins/hypervisors/kvm/.../LibvirtTakeBackupCommandWrapper.java — extract details, build dynamic CLI args, temp passphrase file lifecycle

Notes

Test plan

  • Verify backup works with all four features disabled (default) — no behavioral change
  • Enable nas.backup.compression.enabled at zone scope, take backup, verify qcow2 files are compressed
  • Set nas.backup.bandwidth.limit.mbps (e.g. 50), take backup of running VM, verify virsh blockjob bandwidth is applied
  • Set nas.backup.bandwidth.limit.mbps, take backup of stopped VM, verify qemu-img convert -r rate limit is applied
  • Enable nas.backup.encryption.enabled with passphrase, take backup, verify files are LUKS encrypted (qemu-img info shows encryption)
  • Enable nas.backup.integrity.check, take backup, verify qemu-img check runs and passes
  • Test with multiple features enabled simultaneously (compression + integrity check)
  • Verify restore still works for backups created with compression/encryption
  • Test with RBD storage pools — verify bandwidth throttle applies correctly

… integrity check

Adds four optional features to NAS backup operations, configurable at
zone scope via CloudStack global settings:

- Compression (-c): qcow2 internal compression of backup files
  Config: nas.backup.compression.enabled (default: false)

- LUKS Encryption (-e): encrypt backup files at rest using qemu-img
  Config: nas.backup.encryption.enabled (default: false)
  Config: nas.backup.encryption.passphrase (Secure category)

- Bandwidth Throttle (-b): limit backup I/O bandwidth via virsh
  blockjob for running VMs or qemu-img -r for stopped VMs
  Config: nas.backup.bandwidth.limit.mbps (default: 0/unlimited)

- Integrity Check (--verify): qemu-img check after backup creation
  Config: nas.backup.integrity.check (default: false)

All features are disabled by default and fully backward compatible.
Settings are read from zone-scoped ConfigKeys in NASBackupProvider,
passed to the KVM agent via TakeBackupCommand details map, and
translated to nasbackup.sh CLI flags in LibvirtTakeBackupCommandWrapper.

Changes:
- nasbackup.sh: add -c, -b, -e, --verify flags with encrypt_backup()
  and verify_backup() helper functions
- TakeBackupCommand.java: add details map for passing config to agent
- NASBackupProvider.java: add 5 ConfigKeys, populate command details
- LibvirtTakeBackupCommandWrapper.java: extract details, build CLI args,
  handle passphrase temp file lifecycle

Combines and supersedes PRs apache#12844, apache#12846, apache#12848, apache#12845

codecov bot commented Mar 27, 2026

Codecov Report

❌ Patch coverage is 18.84058% with 56 lines in your changes missing coverage. Please review.
✅ Project coverage is 17.60%. Comparing base (c1af36f) to head (2c40fdd).

| Files with missing lines | Patch % | Lines |
| --- | --- | --- |
| ...ource/wrapper/LibvirtTakeBackupCommandWrapper.java | 0.00% | 36 Missing ⚠️ |
| ...rg/apache/cloudstack/backup/NASBackupProvider.java | 52.17% | 7 Missing and 4 partials ⚠️ |
| ...rg/apache/cloudstack/backup/TakeBackupCommand.java | 10.00% | 9 Missing ⚠️ |
Additional details and impacted files
@@             Coverage Diff              @@
##               4.22   #12898      +/-   ##
============================================
- Coverage     17.61%   17.60%   -0.01%     
+ Complexity    15676    15671       -5     
============================================
  Files          5917     5917              
  Lines        531537   531601      +64     
  Branches      64985    64997      +12     
============================================
- Hits          93610    93602       -8     
- Misses       427369   427439      +70     
- Partials      10558    10560       +2     
| Flag | Coverage Δ |
| --- | --- |
| uitests | 3.70% <ø> (ø) |
| unittests | 18.67% <18.84%> (-0.01%) ⬇️ |

Copilot AI left a comment

Pull request overview

Adds optional, zone-scoped enhancements for KVM NAS backups (compression, LUKS encryption, bandwidth throttling, and post-backup integrity verification) by plumbing config from management server → TakeBackupCommand details → KVM agent wrapper → nasbackup.sh flags.

Changes:

  • Add new CLI flags and implementation in nasbackup.sh for compression (-c), encryption (-e), bandwidth throttling (-b), and verification (--verify).
  • Extend TakeBackupCommand with a details map to carry optional settings to the agent.
  • Add zone-scoped NAS backup ConfigKeys and populate command details; update KVM wrapper to translate details into script args and manage a temporary passphrase file.

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 10 comments.

| File | Description |
| --- | --- |
| scripts/vm/hypervisor/kvm/nasbackup.sh | Implements compression/encryption/throttle/verify logic and argument parsing for NAS backup operations. |
| core/src/main/java/org/apache/cloudstack/backup/TakeBackupCommand.java | Adds a details map to carry optional backup feature settings from management to agent. |
| plugins/backup/nas/src/main/java/org/apache/cloudstack/backup/NASBackupProvider.java | Introduces zone-scoped ConfigKeys and passes enabled settings into TakeBackupCommand details. |
| plugins/hypervisors/kvm/src/main/java/com/cloud/hypervisor/kvm/resource/wrapper/LibvirtTakeBackupCommandWrapper.java | Builds dynamic nasbackup.sh command args from TakeBackupCommand details and writes an encryption passphrase temp file. |

Comment on lines +99 to +102
if [[ ! -f "$ENCRYPT_PASSFILE" ]]; then
    echo "Encryption passphrase file not found: $ENCRYPT_PASSFILE"
    exit 1
fi
Copilot AI Mar 30, 2026

encrypt_backup calls exit 1 on missing/invalid passphrase file, which bypasses cleanup()/unmount logic and can leave the NAS mount + temp dir behind. Prefer returning a non-zero status and letting callers invoke cleanup() (or add a trap-based cleanup) so failures don’t leak mounts/directories.

Copilot uses AI. Check for mistakes.
Comment on lines +134 to +137
if [[ $failed -ne 0 ]]; then
    echo "One or more backup files failed verification"
    exit 1
fi
Copilot AI Mar 30, 2026

verify_backup exits directly on failure, which skips cleanup()/unmount in the calling backup paths and can leave the NAS store mounted and temp directories behind. Return failure to the caller and perform cleanup/unmount before exiting.
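
The trap-based alternative mentioned in the earlier comment could look roughly like the sketch below. It is a minimal illustration under stated assumptions: backup_with_trap is a hypothetical name, the work directory stands in for the NAS mount and temp dir, and the real cleanup() would also unmount the store. The function body runs in a subshell, so the EXIT trap fires on every exit path, including an early exit 1.

```sh
# Hypothetical sketch: cleanup runs no matter how the backup step ends.
#   $1 = passphrase file, $2 = work directory (stand-in for the NAS mount)
backup_with_trap() (
    passfile=$1
    workdir=$2
    cleanup() { rm -rf "$workdir"; }   # the real script would also umount
    trap cleanup EXIT                  # fires on any exit from this subshell
    if [ ! -f "$passfile" ]; then
        echo "Encryption passphrase file not found: $passfile" >&2
        exit 1                         # the trap still runs cleanup
    fi
    echo "backup steps would run here"
)
```

With this shape a direct exit inside a helper no longer leaks the work directory, at the cost of running each backup in a subshell.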

} catch (IOException e) {
    logger.error("Failed to create encryption passphrase file", e);
    return new BackupAnswer(command, false, "Failed to create encryption passphrase file: " + e.getMessage());
}
Copilot AI Mar 30, 2026

If details indicates encryption is enabled but the passphrase is missing/empty, the wrapper silently skips adding -e and the backup proceeds unencrypted. This should fail the command (or at least log and return an error) to avoid reporting a successful encrypted backup when encryption was requested.

Suggested change
- }
+ }
+ } else {
+     logger.error("Encryption requested for backup but no encryption passphrase was provided");
+     return new BackupAnswer(command, false, "Encryption requested but encryption_passphrase is missing or empty");

Comment on lines +254 to +258
command.addDetail("encryption", "true");
String passphrase = NASBackupEncryptionPassphrase.valueIn(zoneId);
if (passphrase != null && !passphrase.isEmpty()) {
    command.addDetail("encryption_passphrase", passphrase);
}
Copilot AI Mar 30, 2026

When encryption is enabled at zone scope, the code sets the encryption detail even if the passphrase is missing/empty, which will currently result in an unencrypted backup on the agent side. Since the passphrase is required, fail fast here with a clear error instead of proceeding (e.g., throw a CloudRuntimeException / mark backup failed).

Suggested change
- command.addDetail("encryption", "true");
- String passphrase = NASBackupEncryptionPassphrase.valueIn(zoneId);
- if (passphrase != null && !passphrase.isEmpty()) {
-     command.addDetail("encryption_passphrase", passphrase);
- }
+ String passphrase = NASBackupEncryptionPassphrase.valueIn(zoneId);
+ if (passphrase == null || passphrase.trim().isEmpty()) {
+     throw new CloudRuntimeException(String.format(
+             "NAS backup encryption is enabled for zone %d but no encryption passphrase is configured",
+             zoneId));
+ }
+ command.addDetail("encryption", "true");
+ command.addDetail("encryption_passphrase", passphrase);

Comment on lines +248 to +266
// Pass optional backup enhancement settings from zone-scoped configs
Long zoneId = vm.getDataCenterId();
if (Boolean.TRUE.equals(NASBackupCompressionEnabled.valueIn(zoneId))) {
    command.addDetail("compression", "true");
}
if (Boolean.TRUE.equals(NASBackupEncryptionEnabled.valueIn(zoneId))) {
    command.addDetail("encryption", "true");
    String passphrase = NASBackupEncryptionPassphrase.valueIn(zoneId);
    if (passphrase != null && !passphrase.isEmpty()) {
        command.addDetail("encryption_passphrase", passphrase);
    }
}
Integer bandwidthLimit = NASBackupBandwidthLimitMbps.valueIn(zoneId);
if (bandwidthLimit != null && bandwidthLimit > 0) {
    command.addDetail("bandwidth_limit", String.valueOf(bandwidthLimit));
}
if (Boolean.TRUE.equals(NASBackupIntegrityCheckEnabled.valueIn(zoneId))) {
    command.addDetail("integrity_check", "true");
}
Copilot AI Mar 30, 2026

The new zone-scoped feature flags are translated into TakeBackupCommand details here, but the existing NASBackupProviderTest.takeBackupSuccessfully doesn’t assert the details map contents. Add/extend unit tests to verify the correct details are added for each config (compression, bandwidth limit, integrity check, and encryption+passphrase; and that encryption without passphrase fails).

@@ -35,6 +37,7 @@ public class TakeBackupCommand extends Command {
private Boolean quiesce;
@LogLevel(LogLevel.Log4jLevel.Off)
private String mountOptions;
Copilot AI Mar 30, 2026

details may carry sensitive values (e.g., an encryption passphrase). CloudStack’s Gson logging uses LoggingExclusionStrategy with @LogLevel to exclude fields, so leaving this unannotated can leak secrets in debug logs. Annotate details with @LogLevel(Off) (or avoid putting secrets in this map).

Suggested change
- private String mountOptions;
+ private String mountOptions;
+ @LogLevel(LogLevel.Log4jLevel.Off)

}

public void setDetails(Map<String, String> details) {
this.details = details;
Copilot AI Mar 30, 2026

setDetails assigns the map directly and can set it to null; later addDetail will NPE. Normalize null to an empty map inside setDetails (or remove the setter / keep details final) to make the command robust.

Suggested change
- this.details = details;
+ this.details = details != null ? details : new HashMap<>();

- if ! qemu-img convert -O qcow2 "$disk" "$output" > "$logFile" 2> >(cat >&2); then
+ if ! ionice -c 3 qemu-img convert $([[ "$COMPRESS" == "true" ]] && echo "-c") $([[ -n "$BANDWIDTH" ]] && echo "-r" "${BANDWIDTH}M") -O qcow2 "$disk" "$output" > "$logFile" 2> >(cat >&2); then
      echo "qemu-img convert failed for $disk $output"
      cleanup
Copilot AI Mar 30, 2026

On qemu-img convert failure, this calls cleanup but then continues execution (no exit/return). If cleanup succeeds, the function proceeds to later steps with an unmounted/removed dest, which can cause confusing follow-on failures and potentially report incorrect results. Exit the script (or return 1) after cleanup here.

Suggested change
- cleanup
+ cleanup
+ return 1

  fi
  output="$dest/$name.$volUuid.qcow2"
- if ! qemu-img convert -O qcow2 "$disk" "$output" > "$logFile" 2> >(cat >&2); then
+ if ! ionice -c 3 qemu-img convert $([[ "$COMPRESS" == "true" ]] && echo "-c") $([[ -n "$BANDWIDTH" ]] && echo "-r" "${BANDWIDTH}M") -O qcow2 "$disk" "$output" > "$logFile" 2> >(cat >&2); then
Copilot AI Mar 30, 2026

This redirects stdout to > "$logFile", which truncates /var/log/cloudstack/agent/agent.log each time a stopped-VM disk conversion runs (and for every disk). Use append (>>) or pipe through tee -a to avoid destroying the agent log.
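
The difference between the two redirections can be reproduced on a scratch file. This is a small self-contained sketch; the temp file merely stands in for agent.log, and the variable names are illustrative.

```sh
# Scratch file standing in for /var/log/cloudstack/agent/agent.log
log=$(mktemp)

echo "existing agent log line" > "$log"
echo "convert run 1" > "$log"            # '>' truncates: the earlier line is lost
truncated_lines=$(( $(wc -l < "$log") ))

echo "existing agent log line" > "$log"
echo "convert run 1" >> "$log"           # '>>' appends: both lines survive
appended_lines=$(( $(wc -l < "$log") ))

rm -f "$log"
echo "truncate: $truncated_lines line(s), append: $appended_lines line(s)"
```

Because the convert step runs once per disk, truncation would repeat for every disk of every stopped-VM backup.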

Suggested change
- if ! ionice -c 3 qemu-img convert $([[ "$COMPRESS" == "true" ]] && echo "-c") $([[ -n "$BANDWIDTH" ]] && echo "-r" "${BANDWIDTH}M") -O qcow2 "$disk" "$output" > "$logFile" 2> >(cat >&2); then
+ if ! ionice -c 3 qemu-img convert $([[ "$COMPRESS" == "true" ]] && echo "-c") $([[ -n "$BANDWIDTH" ]] && echo "-r" "${BANDWIDTH}M") -O qcow2 "$disk" "$output" >> "$logFile" 2> >(cat >&2); then

Comment on lines +97 to +106
    passphraseFile = File.createTempFile("cs-backup-enc-", ".key");
    passphraseFile.deleteOnExit();
    try (FileWriter fw = new FileWriter(passphraseFile)) {
        fw.write(passphrase);
    }
    cmdArgs.add("-e"); cmdArgs.add(passphraseFile.getAbsolutePath());
} catch (IOException e) {
    logger.error("Failed to create encryption passphrase file", e);
    return new BackupAnswer(command, false, "Failed to create encryption passphrase file: " + e.getMessage());
}
Copilot AI Mar 30, 2026

The temp passphrase file can be left behind if an exception is thrown after createTempFile succeeds (e.g., FileWriter fails) because the catch returns without deleting it. Also consider setting strict permissions (0600) and using an explicit charset (UTF-8) when writing the passphrase.
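
The lifecycle being asked for (create with owner-only permissions, use, always delete) is language-independent. The sketch below expresses it in shell purely as an illustration; the wrapper itself is Java, and with_passphrase_file is a hypothetical name. The charset concern from the comment applies to the Java writer and is not modelled here.

```sh
# Hypothetical helper: write the passphrase to a 0600 temp file, run a
# command with the file path appended, and remove the file afterwards.
#   $1 = passphrase, $2... = command to run
with_passphrase_file() {
    passphrase=$1
    shift
    old_umask=$(umask)
    umask 077                                  # new files get owner-only access
    passfile=$(mktemp) || { umask "$old_umask"; return 1; }
    umask "$old_umask"
    printf '%s' "$passphrase" > "$passfile"
    "$@" "$passfile"
    rc=$?
    rm -f "$passfile"                          # removed on success and on failure
    return "$rc"
}
```

The command's exit status is preserved, so a failing backup step still reports failure after the file has been removed.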

- nasbackup.sh: Replace exit 1 with return 1 in encrypt_backup and
  verify_backup so callers can run cleanup before terminating
- nasbackup.sh: Append (>>) instead of truncate (>) agent.log in
  qemu-img convert for stopped VM backups
- nasbackup.sh: Add return 1 after cleanup on qemu-img convert failure
  to stop execution
- nasbackup.sh: Callers of encrypt_backup/verify_backup now check
  return code and run cleanup on failure
- LibvirtTakeBackupCommandWrapper: Fail with error when encryption is
  enabled but passphrase is missing instead of silently skipping
- LibvirtTakeBackupCommandWrapper: Delete temp passphrase file in
  finally block, set 0600 permissions, use explicit UTF-8 charset
- NASBackupProvider: Throw CloudRuntimeException when encryption is
  enabled but passphrase is null/empty
- NASBackupProviderTest: Add tests for compression, bandwidth,
  integrity check, encryption+passphrase, and encryption-without-
  passphrase failure scenarios
- TakeBackupCommand: Add @LogLevel(Off) to details field to prevent
  passphrase leaking in debug logs
- TakeBackupCommand: Normalize null to empty HashMap in setDetails
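
The return-code pattern described for nasbackup.sh can be sketched as follows. This is a minimal illustration, not the script's code: cleanup and verify_backup are stand-ins with the same names, backup_stopped_vm is a hypothetical caller, and the CLEANED variable exists only so the effect is observable.

```sh
CLEANED=0
cleanup() { CLEANED=1; }        # stand-in: the real one unmounts and removes the temp dir

# Stand-in check: fails when the backup file does not exist.
verify_backup() {
    [ -f "$1" ] || { echo "verification failed for $1" >&2; return 1; }
}

# Hypothetical caller: checks the helper's status and cleans up before stopping.
backup_stopped_vm() {
    if ! verify_backup "$1"; then
        cleanup
        return 1                # stop here instead of falling through
    fi
    echo "backup of $1 verified"
    cleanup
}
```

Returning instead of exiting keeps the decision about teardown in one place, the caller, which is what lets the failure path unmount before the script terminates.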