NAS backup: compression, encryption, bandwidth throttle, integrity check#12898
NAS backup: compression, encryption, bandwidth throttle, integrity check#12898jmsperu wants to merge 2 commits intoapache:4.22from
Conversation
… integrity check Adds four optional features to NAS backup operations, configurable at zone scope via CloudStack global settings: - Compression (-c): qcow2 internal compression of backup files Config: nas.backup.compression.enabled (default: false) - LUKS Encryption (-e): encrypt backup files at rest using qemu-img Config: nas.backup.encryption.enabled (default: false) Config: nas.backup.encryption.passphrase (Secure category) - Bandwidth Throttle (-b): limit backup I/O bandwidth via virsh blockjob for running VMs or qemu-img -r for stopped VMs Config: nas.backup.bandwidth.limit.mbps (default: 0/unlimited) - Integrity Check (--verify): qemu-img check after backup creation Config: nas.backup.integrity.check (default: false) All features are disabled by default and fully backward compatible. Settings are read from zone-scoped ConfigKeys in NASBackupProvider, passed to the KVM agent via TakeBackupCommand details map, and translated to nasbackup.sh CLI flags in LibvirtTakeBackupCommandWrapper. Changes: - nasbackup.sh: add -c, -b, -e, --verify flags with encrypt_backup() and verify_backup() helper functions - TakeBackupCommand.java: add details map for passing config to agent - NASBackupProvider.java: add 5 ConfigKeys, populate command details - LibvirtTakeBackupCommandWrapper.java: extract details, build CLI args, handle passphrase temp file lifecycle Combines and supersedes PRs apache#12844, apache#12846, apache#12848, apache#12845
Codecov Report❌ Patch coverage is Additional details and impacted files@@ Coverage Diff @@
## 4.22 #12898 +/- ##
============================================
- Coverage 17.61% 17.60% -0.01%
+ Complexity 15676 15671 -5
============================================
Files 5917 5917
Lines 531537 531601 +64
Branches 64985 64997 +12
============================================
- Hits 93610 93602 -8
- Misses 427369 427439 +70
- Partials 10558 10560 +2
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
There was a problem hiding this comment.
Pull request overview
Adds optional, zone-scoped enhancements for KVM NAS backups (compression, LUKS encryption, bandwidth throttling, and post-backup integrity verification) by plumbing config from management server → TakeBackupCommand details → KVM agent wrapper → nasbackup.sh flags.
Changes:
- Add new CLI flags and implementation in
nasbackup.shfor compression (-c), encryption (-e), bandwidth throttling (-b), and verification (--verify). - Extend
TakeBackupCommandwith adetailsmap to carry optional settings to the agent. - Add zone-scoped NAS backup ConfigKeys and populate command details; update KVM wrapper to translate details into script args and manage a temporary passphrase file.
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 10 comments.
| File | Description |
|---|---|
scripts/vm/hypervisor/kvm/nasbackup.sh |
Implements compression/encryption/throttle/verify logic and argument parsing for NAS backup operations. |
core/src/main/java/org/apache/cloudstack/backup/TakeBackupCommand.java |
Adds a details map to carry optional backup feature settings from management to agent. |
plugins/backup/nas/src/main/java/org/apache/cloudstack/backup/NASBackupProvider.java |
Introduces zone-scoped ConfigKeys and passes enabled settings into TakeBackupCommand details. |
plugins/hypervisors/kvm/src/main/java/com/cloud/hypervisor/kvm/resource/wrapper/LibvirtTakeBackupCommandWrapper.java |
Builds dynamic nasbackup.sh command args from TakeBackupCommand details and writes an encryption passphrase temp file. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| if [[ ! -f "$ENCRYPT_PASSFILE" ]]; then | ||
| echo "Encryption passphrase file not found: $ENCRYPT_PASSFILE" | ||
| exit 1 | ||
| fi |
There was a problem hiding this comment.
encrypt_backup calls exit 1 on missing/invalid passphrase file, which bypasses cleanup()/unmount logic and can leave the NAS mount + temp dir behind. Prefer returning a non-zero status and letting callers invoke cleanup() (or add a trap-based cleanup) so failures don’t leak mounts/directories.
| if [[ $failed -ne 0 ]]; then | ||
| echo "One or more backup files failed verification" | ||
| exit 1 | ||
| fi |
There was a problem hiding this comment.
verify_backup exits directly on failure, which skips cleanup()/unmount in the calling backup paths and can leave the NAS store mounted and temp directories behind. Return failure to the caller and perform cleanup/unmount before exiting.
| } catch (IOException e) { | ||
| logger.error("Failed to create encryption passphrase file", e); | ||
| return new BackupAnswer(command, false, "Failed to create encryption passphrase file: " + e.getMessage()); | ||
| } |
There was a problem hiding this comment.
If details indicates encryption is enabled but the passphrase is missing/empty, the wrapper silently skips adding -e and the backup proceeds unencrypted. This should fail the command (or at least log and return an error) to avoid reporting a successful encrypted backup when encryption was requested.
| } | |
| } | |
| } else { | |
| logger.error("Encryption requested for backup but no encryption passphrase was provided"); | |
| return new BackupAnswer(command, false, "Encryption requested but encryption_passphrase is missing or empty"); |
| command.addDetail("encryption", "true"); | ||
| String passphrase = NASBackupEncryptionPassphrase.valueIn(zoneId); | ||
| if (passphrase != null && !passphrase.isEmpty()) { | ||
| command.addDetail("encryption_passphrase", passphrase); | ||
| } |
There was a problem hiding this comment.
When encryption is enabled at zone scope, the code sets the encryption detail even if the passphrase is missing/empty, which will currently result in an unencrypted backup on the agent side. Since the passphrase is required, fail fast here with a clear error instead of proceeding (e.g., throw a CloudRuntimeException / mark backup failed).
| command.addDetail("encryption", "true"); | |
| String passphrase = NASBackupEncryptionPassphrase.valueIn(zoneId); | |
| if (passphrase != null && !passphrase.isEmpty()) { | |
| command.addDetail("encryption_passphrase", passphrase); | |
| } | |
| String passphrase = NASBackupEncryptionPassphrase.valueIn(zoneId); | |
| if (passphrase == null || passphrase.trim().isEmpty()) { | |
| throw new CloudRuntimeException(String.format( | |
| "NAS backup encryption is enabled for zone %d but no encryption passphrase is configured", | |
| zoneId)); | |
| } | |
| command.addDetail("encryption", "true"); | |
| command.addDetail("encryption_passphrase", passphrase); |
| // Pass optional backup enhancement settings from zone-scoped configs | ||
| Long zoneId = vm.getDataCenterId(); | ||
| if (Boolean.TRUE.equals(NASBackupCompressionEnabled.valueIn(zoneId))) { | ||
| command.addDetail("compression", "true"); | ||
| } | ||
| if (Boolean.TRUE.equals(NASBackupEncryptionEnabled.valueIn(zoneId))) { | ||
| command.addDetail("encryption", "true"); | ||
| String passphrase = NASBackupEncryptionPassphrase.valueIn(zoneId); | ||
| if (passphrase != null && !passphrase.isEmpty()) { | ||
| command.addDetail("encryption_passphrase", passphrase); | ||
| } | ||
| } | ||
| Integer bandwidthLimit = NASBackupBandwidthLimitMbps.valueIn(zoneId); | ||
| if (bandwidthLimit != null && bandwidthLimit > 0) { | ||
| command.addDetail("bandwidth_limit", String.valueOf(bandwidthLimit)); | ||
| } | ||
| if (Boolean.TRUE.equals(NASBackupIntegrityCheckEnabled.valueIn(zoneId))) { | ||
| command.addDetail("integrity_check", "true"); | ||
| } |
There was a problem hiding this comment.
The new zone-scoped feature flags are translated into TakeBackupCommand details here, but the existing NASBackupProviderTest.takeBackupSuccessfully doesn’t assert the details map contents. Add/extend unit tests to verify the correct details are added for each config (compression, bandwidth limit, integrity check, and encryption+passphrase; and that encryption without passphrase fails).
| @@ -35,6 +37,7 @@ public class TakeBackupCommand extends Command { | |||
| private Boolean quiesce; | |||
| @LogLevel(LogLevel.Log4jLevel.Off) | |||
| private String mountOptions; | |||
There was a problem hiding this comment.
details may carry sensitive values (e.g., an encryption passphrase). CloudStack’s Gson logging uses LoggingExclusionStrategy with @LogLevel to exclude fields, so leaving this unannotated can leak secrets in debug logs. Annotate details with @LogLevel(Off) (or avoid putting secrets in this map).
| private String mountOptions; | |
| private String mountOptions; | |
| @LogLevel(LogLevel.Log4jLevel.Off) |
| } | ||
|
|
||
| public void setDetails(Map<String, String> details) { | ||
| this.details = details; |
There was a problem hiding this comment.
setDetails assigns the map directly and can set it to null; later addDetail will NPE. Normalize null to an empty map inside setDetails (or remove the setter / keep details final) to make the command robust.
| this.details = details; | |
| this.details = details != null ? details : new HashMap<>(); |
| if ! qemu-img convert -O qcow2 "$disk" "$output" > "$logFile" 2> >(cat >&2); then | ||
| if ! ionice -c 3 qemu-img convert $([[ "$COMPRESS" == "true" ]] && echo "-c") $([[ -n "$BANDWIDTH" ]] && echo "-r" "${BANDWIDTH}M") -O qcow2 "$disk" "$output" > "$logFile" 2> >(cat >&2); then | ||
| echo "qemu-img convert failed for $disk $output" | ||
| cleanup |
There was a problem hiding this comment.
On qemu-img convert failure, this calls cleanup but then continues execution (no exit/return). If cleanup succeeds, the function proceeds to later steps with an unmounted/removed dest, which can cause confusing follow-on failures and potentially report incorrect results. Exit the script (or return 1) after cleanup here.
| cleanup | |
| cleanup | |
| return 1 |
| fi | ||
| output="$dest/$name.$volUuid.qcow2" | ||
| if ! qemu-img convert -O qcow2 "$disk" "$output" > "$logFile" 2> >(cat >&2); then | ||
| if ! ionice -c 3 qemu-img convert $([[ "$COMPRESS" == "true" ]] && echo "-c") $([[ -n "$BANDWIDTH" ]] && echo "-r" "${BANDWIDTH}M") -O qcow2 "$disk" "$output" > "$logFile" 2> >(cat >&2); then |
There was a problem hiding this comment.
This redirects stdout to > "$logFile", which truncates /var/log/cloudstack/agent/agent.log each time a stopped-VM disk conversion runs (and for every disk). Use append (>>) or pipe through tee -a to avoid destroying the agent log.
| if ! ionice -c 3 qemu-img convert $([[ "$COMPRESS" == "true" ]] && echo "-c") $([[ -n "$BANDWIDTH" ]] && echo "-r" "${BANDWIDTH}M") -O qcow2 "$disk" "$output" > "$logFile" 2> >(cat >&2); then | |
| if ! ionice -c 3 qemu-img convert $([[ "$COMPRESS" == "true" ]] && echo "-c") $([[ -n "$BANDWIDTH" ]] && echo "-r" "${BANDWIDTH}M") -O qcow2 "$disk" "$output" >> "$logFile" 2> >(cat >&2); then |
| passphraseFile = File.createTempFile("cs-backup-enc-", ".key"); | ||
| passphraseFile.deleteOnExit(); | ||
| try (FileWriter fw = new FileWriter(passphraseFile)) { | ||
| fw.write(passphrase); | ||
| } | ||
| cmdArgs.add("-e"); cmdArgs.add(passphraseFile.getAbsolutePath()); | ||
| } catch (IOException e) { | ||
| logger.error("Failed to create encryption passphrase file", e); | ||
| return new BackupAnswer(command, false, "Failed to create encryption passphrase file: " + e.getMessage()); | ||
| } |
There was a problem hiding this comment.
The temp passphrase file can be left behind if an exception is thrown after createTempFile succeeds (e.g., FileWriter fails) because the catch returns without deleting it. Also consider setting strict permissions (0600) and using an explicit charset (UTF-8) when writing the passphrase.
- nasbackup.sh: Replace exit 1 with return 1 in encrypt_backup and verify_backup so callers can run cleanup before terminating - nasbackup.sh: Append (>>) instead of truncate (>) agent.log in qemu-img convert for stopped VM backups - nasbackup.sh: Add return 1 after cleanup on qemu-img convert failure to stop execution - nasbackup.sh: Callers of encrypt_backup/verify_backup now check return code and run cleanup on failure - LibvirtTakeBackupCommandWrapper: Fail with error when encryption is enabled but passphrase is missing instead of silently skipping - LibvirtTakeBackupCommandWrapper: Delete temp passphrase file in finally block, set 0600 permissions, use explicit UTF-8 charset - NASBackupProvider: Throw CloudRuntimeException when encryption is enabled but passphrase is null/empty - NASBackupProviderTest: Add tests for compression, bandwidth, integrity check, encryption+passphrase, and encryption-without- passphrase failure scenarios - TakeBackupCommand: Add @loglevel(Off) to details field to prevent passphrase leaking in debug logs - TakeBackupCommand: Normalize null to empty HashMap in setDetails
Summary
Adds four optional, zone-scoped features to NAS backup operations on KVM, all disabled by default:
-c): Uses qcow2 internal compression (qemu-img convert -c) to reduce backup size-e): Encrypts backup files at rest using LUKS viaqemu-img convert --object secret-b): Limits backup I/O —virsh blockjob --bandwidthfor running VMs,qemu-img convert -r+ionicefor stopped VMs--verify): Runsqemu-img checkon each backup file after creationConfiguration Keys (Zone scope)
nas.backup.compression.enablednas.backup.encryption.enablednas.backup.encryption.passphrasenas.backup.bandwidth.limit.mbpsnas.backup.integrity.checkArchitecture
detailsmap onTakeBackupCommandnasbackup.shCLI flagsFiles Changed
scripts/vm/hypervisor/kvm/nasbackup.sh— new-c,-b,-e,--verifyflags withencrypt_backup()andverify_backup()functionscore/.../TakeBackupCommand.java— addeddetailsmap (HashMap) with getter/setter/addDetailplugins/backup/nas/.../NASBackupProvider.java— 5 new ConfigKeys, populate command details intakeBackup()plugins/hypervisors/kvm/.../LibvirtTakeBackupCommandWrapper.java— extract details, build dynamic CLI args, temp passphrase file lifecycleNotes
Test plan
nas.backup.compression.enabledat zone scope, take backup, verify qcow2 files are compressednas.backup.bandwidth.limit.mbps(e.g. 50), take backup of running VM, verifyvirsh blockjobbandwidth is appliednas.backup.bandwidth.limit.mbps, take backup of stopped VM, verifyqemu-img -rrate limit is appliednas.backup.encryption.enabledwith passphrase, take backup, verify files are LUKS encrypted (qemu-img infoshows encryption)nas.backup.integrity.check, take backup, verifyqemu-img checkruns and passes