# IMPROVEMENTS GUIDE: Ansible Proxmox VM Role

## Summary of Changes

This document outlines the improvements made to your Ansible role for robustness, maintainability, and best practices.

### What Was Improved

1. **Task Modularization** - Split monolithic tasks into 6 logical stages
2. **Error Handling** - Added `block`/`rescue` sections (Ansible's equivalent of try-catch) with recovery strategies
3. **Idempotency** - Ensured all operations are safe to re-run
4. **Pre-flight Validation** - Comprehensive environment checks before execution
5. **Documentation** - Extensive inline comments and variable documentation
6. **Logging** - Rich task names and debug output for troubleshooting

---

## File Structure

### New/Modified Files

```
tasks/
├─ main.yml                    # REFACTORED: Now orchestrates subtasks
├─ preflight-checks.yml        # NEW: Environment validation
├─ download-image.yml          # IMPROVED: Better error handling & caching
├─ create-vm.yml               # IMPROVED: Idempotent VM creation
├─ configure-vm.yml            # IMPROVED: Disk, Cloud-Init, TPM, GPU with error handling
├─ create-template.yml         # IMPROVED: Idempotent template conversion
├─ create-clones.yml           # IMPROVED: Clone creation with validation
└─ helpers.yml                 # NEW: Utility tasks for common operations

defaults/
└─ main.yml                    # IMPROVED: Complete documentation & new options

templates/
├─ cloudinit_userdata.yaml.j2  # No changes
└─ cloudinit_vendor.yaml.j2    # No changes
```

---

## 1. TASK MODULARIZATION

### Before

All tasks were in a single `main.yml` file (150+ lines), making it:

- Difficult to debug
- Hard to extend
- Not reusable

### After

Each stage has its own file:

| File | Purpose | Key Features |
|------|---------|--------------|
| `preflight-checks.yml` | Validate environment | Checks Proxmox, storage, SSH keys, IPs |
| `download-image.yml` | Get Debian image | Caching, retry logic, size verification |
| `create-vm.yml` | Create VM | Idempotent, error handling |
| `configure-vm.yml` | Configure VM | Disk, Cloud-Init, TPM, GPU all in one |
| `create-template.yml` | Make template | Skip if already templated |
| `create-clones.yml` | Deploy clones | Loop through clone list with validation |
| `helpers.yml` | Utilities | Reusable helper functions |

### Running Specific Stages

The commands in this guide pass `tasks/main.yml` to `ansible-playbook`. That only works if the file is a playbook; if it is a plain role task list, substitute the playbook that applies this role.

```bash
# Run only pre-flight checks
ansible-playbook tasks/main.yml --tags preflight

# Run everything except template/clone
ansible-playbook tasks/main.yml --skip-tags template,clones

# Run only clone creation
ansible-playbook tasks/main.yml --tags clones

# Run image download and VM creation only
ansible-playbook tasks/main.yml --tags image,vm
```

---

## 2. ERROR HANDLING

### Before

- Minimal error checking
- Tasks would fail silently or with generic errors
- No recovery paths

### After

Each major operation has:

**Block/Rescue Structure**

```yaml
- name: "[CONFIG] Import disk"
  block:
    - name: "[CONFIG] Try to import disk"
      command: qm importdisk ...
  rescue:
    - name: "[CONFIG] Handle import failure"
      fail:
        msg: "Clear error message with context"
```

**Retry Logic**

```yaml
# Task keywords added to any task that may fail transiently
register: result
retries: 3
delay: 5
until: result is succeeded
```

**Validation Checks**

```yaml
- name: "[VM] Verify VM was created"
  stat:
    path: "/etc/pve/qemu-server/{{ vm_id }}.conf"
  register: vm_verify
  failed_when: not vm_verify.stat.exists
```

### Error Messages Include

- What went wrong
- Which VM/resource was affected
- Next steps to fix

---

## 3. IDEMPOTENCY

### Before

- Running the playbook twice would fail or cause issues
- Template conversion would fail if already templated
- No checks for existing resources

### After

All operations are idempotent:

**Check Before Action**

```yaml
- name: "Check if VM already exists"
  stat:
    path: "/etc/pve/qemu-server/{{ vm_id }}.conf"
  register: vm_conf

- name: "Create VM"
  command: qm create ...
  when: not vm_conf.stat.exists
```

**Safe Re-runs**

- Already-created VMs are skipped
- Already-converted templates are skipped
- Already-deployed clones are skipped
- Image is cached and reused

**Result**: You can re-run the playbook any number of times safely.

---

## 4. PRE-FLIGHT CHECKS

### New `preflight-checks.yml`

Validates before starting:

- ✓ Proxmox is installed (`qm` command exists)
- ✓ User can run Proxmox commands (permissions)
- ✓ Storage pool exists and is accessible
- ✓ SSH key file exists and is readable
- ✓ VM IDs are unique (warns if conflict)
- ✓ Clone IDs are unique (warns if conflict)
- ✓ IP addresses are in a valid format
- ✓ Gateway and DNS are valid IPs
- ✓ Snippets directory exists

### Sample Output

```
[PREFLIGHT] Check if running on Proxmox host ... ok
[PREFLIGHT] Verify qm command is available ... ok
[PREFLIGHT] Check if user can run qm commands ... ok
[PREFLIGHT] Verify storage pool exists ... ok
[PREFLIGHT] Summary - All checks passed
```

---

## 5. IMPROVED DEFAULTS

### New Variables in `defaults/main.yml`

```yaml
# Retry settings
max_retries: 3
retry_delay: 5

# Timeout settings (seconds)
image_download_timeout: 300
vm_boot_timeout: 60
cloud_init_timeout: 120

# Debug mode
debug_mode: false
```

### Better Documentation

Each variable has:

- Purpose explanation
- Valid values
- Examples
- Security warnings

---

## 6. IDEMPOTENT TEMPLATE CONVERSION

### Before

```yaml
- name: Convert VM to template
  command: qm template {{ vm_id }}
  args:
    creates: "/etc/pve/qemu-server/{{ vm_id }}.conf.lock"
```

❌ The `.lock` file never exists, so the task always runs

### After

```yaml
- name: "[TEMPLATE] Check if VM is already a template"
  shell: "qm config {{ vm_id }} | grep -q '^template: 1'"
  register: is_template
  failed_when: false
  changed_when: false  # read-only check, never reports a change

- name: "[TEMPLATE] Convert VM to template"
  command: "qm template {{ vm_id }}"
  when: is_template.rc != 0
```

✅ Checks actual template status; skips if already templated

---

## 7. BETTER CLOUD-INIT HANDLING

### Before

- Snippets not validated
- SSH key lookup could fail silently

### After

```yaml
- name: "[CONFIG] Verify SSH key is readable"
  stat:
    path: "{{ ssh_key_path | expanduser }}"
  register: ssh_key_stat
  # Check existence first: `readable` is undefined for a missing file
  failed_when: not ssh_key_stat.stat.exists or not ssh_key_stat.stat.readable

- name: "[CONFIG] Copy SSH public key to snippets"
  copy:
    src: "{{ ssh_key_path | expanduser }}"
    dest: "/var/lib/vz/snippets/{{ vm_id }}-sshkey.pub"
```

- ✓ Validates before use
- ✓ Proper error messages if missing

---

## 8. HELPER FUNCTIONS

### New `helpers.yml`

Reusable utility tasks:

| Helper | Function |
|--------|----------|
| `check_vm_exists` | Check if VM exists |
| `check_template` | Check if VM is template |
| `check_vm_status` | Get VM running status |
| `check_storage` | Check storage space |
| `validate_vm_id` | Validate VM ID format |
| `get_vm_info` | Read VM configuration |
| `list_vms` | List all VMs |
| `cleanup_snippets` | Remove old Cloud-Init snippets |

### Usage Example

```yaml
- name: "Verify VM exists"
  include_tasks: helpers.yml
  vars:
    helper_task: check_vm_exists
    target_vm_id: "{{ vm_id }}"

- name: "Print result"
  debug:
    msg: "VM exists: {{ vm_exists }}"
```

---

## 9. IMPROVED CLONE CREATION

### Before

- No validation of clone IDs
- No error handling per clone
- All-or-nothing approach

### After

```yaml
# create-clones.yml
# Ansible does not allow `loop` on a `block`, so loop over an included file.
# The file name clone-single.yml is illustrative.
- name: "[CLONES] Process each clone"
  include_tasks: clone-single.yml
  loop: "{{ clones }}"
  loop_control:
    loop_var: clone

# clone-single.yml
- name: "[CLONES] Create clone {{ clone.id }}"
  block:
    - name: "[CLONES] Check if clone already exists"
      stat:
        path: "/etc/pve/qemu-server/{{ clone.id }}.conf"
      register: clone_conf

    - name: "[CLONES] Clone VM"
      command: "qm clone {{ vm_id }} {{ clone.id }}"
      when: not clone_conf.stat.exists
  rescue:
    - name: "[CLONES] Handle error for this clone"
      debug:
        msg: "WARNING: Clone {{ clone.id }} failed, continuing with next..."
```

- ✓ Each clone is independent
- ✓ One failed clone doesn't stop others
- ✓ Clear logging of what succeeded/failed

---

## 10. RICH LOGGING AND PROGRESS

### Task Naming Convention

```
[STAGE] Action: description
├─ [PREFLIGHT] Check if running on Proxmox
├─ [IMAGE] Download Debian GenericCloud
├─ [VM] Create base VM
├─ [CONFIG] Configure disk
├─ [TEMPLATE] Convert to template
└─ [CLONES] Create clone 301
```

### Progress Display

**Start**

```
╔════════════════════════════════════════════════════════════╗
║  Proxmox VM Template & Clone Manager                       ║
║  Template VM: debian-template-base (ID: 150)               ║
║  Storage: local-lvm                                        ║
║  CPU: 4 cores | RAM: 4096MB                                ║
╚════════════════════════════════════════════════════════════╝
```

**End**

```
╔════════════════════════════════════════════════════════════╗
║  ✓ Playbook execution completed                            ║
║  Template VM: debian-template-base (ID: 150)               ║
║  ✓ Converted to template                                   ║
║  ✓ 2 clone(s) created                                      ║
║  Next steps:                                               ║
║  - Verify VMs: qm list                                     ║
║  - Connect: ssh debian@                                    ║
║  - Check Cloud-Init: cloud-init status                     ║
╚════════════════════════════════════════════════════════════╝
```

---

## Usage Examples

### 1. Full Deployment

```bash
ansible-playbook tasks/main.yml -i inventory
```

Runs all stages: preflight → image → VM → configure → template → clones

### 2. Re-run Safely (Idempotent)

```bash
ansible-playbook tasks/main.yml -i inventory
```

The second run skips already-completed operations.

### 3. Template Only

If you want to update the template without re-downloading the image:

```bash
ansible-playbook tasks/main.yml \
  -i inventory \
  --skip-tags image,vm,clones
```

### 4. Clone Only

After the template is created, add new clones:

```yaml
# Update defaults/main.yml
clones:
  - id: 303
    hostname: app03
    ip: "192.168.1.83/24"
    gateway: "192.168.1.1"
```

Then run:

```bash
ansible-playbook tasks/main.yml \
  -i inventory \
  --tags clones
```

### 5. Debug Output

```bash
ansible-playbook tasks/main.yml \
  -i inventory \
  -vvv
```

Shows all task details, command output, and variable values.

---

## Migration from Old Version

### Step 1: Backup

```bash
cp -r ansible_proxmox_VM ansible_proxmox_VM.backup
```

### Step 2: Replace Files

Use the new versions:

- `tasks/main.yml` → orchestrator
- All `tasks/*.yml` files → new implementations
- `defaults/main.yml` → improved defaults

### Step 3: Test with Dry-Run

```bash
ansible-playbook tasks/main.yml \
  -i inventory \
  --check
```

Shows what would happen without making changes. Note that `command` and `shell` tasks are skipped in check mode, so the `qm` operations themselves are not simulated.

### Step 4: Run Normally

```bash
ansible-playbook tasks/main.yml -i inventory
```

---

## Best Practices Going Forward

1. **Always use tags** for partial execution
2. **Run preflight checks** before major changes
3. **Test with `--check`** before production
4. **Use `--skip-tags`** to avoid re-downloading images
5. **Monitor Cloud-Init** inside VMs: `cloud-init status`
6. **Keep backups** of `.orig` files (already present)
7. **Review error messages** carefully for context

---

## Security Improvements

### Password Management

```yaml
# OLD
ci_password: "SecurePass123"

# NEW - Use Vault
ci_password: "{{ vault_debian_password }}"
```

Create the vault file:

```bash
ansible-vault create group_vars/proxmox/vault.yml
```

Add:

```yaml
vault_debian_password: "YourSecurePassword"
```

### SSH Key Validation

- Before: SSH key could be missing → confusing error
- After: Validates that the key exists and is readable

---

## Troubleshooting

### Problem: Playbook fails at preflight

**Solution**: Run preflight checks manually to see what's missing

```bash
ansible-playbook tasks/main.yml -i inventory --tags preflight -vvv
```

### Problem: VM already exists, need to recreate

**Solution**: Delete the old VM first

```bash
qm destroy 150   # replace 150 with your vm_id
```

Then re-run the playbook (idempotent).

### Problem: Clone creation fails

**Solution**: Check clone configuration and IDs

```bash
qm list  # See all VMs
```

Ensure clone IDs don't conflict with existing VMs.

### Problem: Cloud-Init not applying

**Solution**: Check that the snippets directory exists

```bash
ls -la /var/lib/vz/snippets/
```

Verify permissions are correct (644 for YAML files).

---

## Next Steps

Consider these additional improvements:

1. **Molecule Testing** - Add automated tests
2. **Vault Integration** - Secure password management
3. **Role Packaging** - Create Ansible Galaxy package
4. **Custom Filters** - For more complex logic
5. **Notification** - Send completion alerts (Slack, email)
6. **Metrics** - Track VM creation time, resource usage
7. **Cleanup Role** - Destroy VMs and templates
8. **Backup/Restore** - Template and clone backup

---

## Questions?

Refer to task inline comments for specifics. Each task file has extensive documentation.
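
### Reference Sketch: Orchestrating `tasks/main.yml`

The guide describes `tasks/main.yml` as an orchestrator for the six stage files but does not show it. The following is a minimal sketch of how that wiring might look, assuming each stage is pulled in with `import_tasks` and tagged to match the `--tags` and `--skip-tags` commands used earlier. The `config` tag is an assumption (the guide does not name a tag for `configure-vm.yml`), and the role's actual file may differ.

```yaml
# tasks/main.yml: illustrative orchestration sketch, not the role's actual file.
# import_tasks is static, so each tag is inherited by every task in the imported file.
- name: "[PREFLIGHT] Validate environment"
  import_tasks: preflight-checks.yml
  tags: [preflight]

- name: "[IMAGE] Download Debian image"
  import_tasks: download-image.yml
  tags: [image]

- name: "[VM] Create base VM"
  import_tasks: create-vm.yml
  tags: [vm]

- name: "[CONFIG] Configure VM"
  import_tasks: configure-vm.yml
  tags: [config]  # assumed tag name; not stated in this guide

- name: "[TEMPLATE] Convert VM to template"
  import_tasks: create-template.yml
  tags: [template]

- name: "[CLONES] Create clones"
  import_tasks: create-clones.yml
  tags: [clones]
```

Static imports are used here because a tag on `include_tasks` applies only to the include statement itself, not to the tasks inside the included file. With `import_tasks`, a run such as `--tags clones` selects every task in `create-clones.yml` without further configuration.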