ansible_proxmox_VM/IMPROVEMENTS.md
Jose f62750fe2f feat: Implement Debian VM template creation and cloning on Proxmox
- Added default configuration for VM creation in defaults/main.yml.
- Created tasks for configuring the VM with UEFI, TPM, disks, GPU, and Cloud-Init in tasks/configure-vm.yml.
- Implemented clone creation and configuration logic in tasks/create-clones.yml.
- Added template conversion functionality in tasks/create-template.yml.
- Developed base VM creation logic in tasks/create-vm.yml.
- Included image download and caching tasks in tasks/download-image.yml.
- Introduced utility tasks for common operations in tasks/helpers.yml.
- Organized main orchestration logic in tasks/main.yml, with clear stages for each operation.
- Added pre-flight checks to validate the environment before execution in tasks/preflight-checks.yml.
2025-11-15 17:22:21 +01:00

IMPROVEMENTS GUIDE: Ansible Proxmox VM Role

Summary of Changes

This document outlines the improvements made to your Ansible role for robustness, maintainability, and best practices.

What Was Improved

  1. Task Modularization - Split monolithic tasks into 6 logical stages
  2. Error Handling - Added block/rescue error handling with recovery strategies
  3. Idempotency - Ensured all operations are safe to re-run
  4. Pre-flight Validation - Comprehensive environment checks before execution
  5. Documentation - Extensive inline comments and variable documentation
  6. Logging - Rich task names and debug output for troubleshooting

File Structure

New/Modified Files

tasks/
├─ main.yml                    # REFACTORED: Now orchestrates subtasks
├─ preflight-checks.yml        # NEW: Environment validation
├─ download-image.yml          # IMPROVED: Better error handling & caching
├─ create-vm.yml              # IMPROVED: Idempotent VM creation
├─ configure-vm.yml           # IMPROVED: Disk, Cloud-Init, TPM, GPU with error handling
├─ create-template.yml        # IMPROVED: Idempotent template conversion
├─ create-clones.yml          # IMPROVED: Clone creation with validation
└─ helpers.yml                # NEW: Utility tasks for common operations

defaults/
└─ main.yml                    # IMPROVED: Complete documentation & new options

templates/
├─ cloudinit_userdata.yaml.j2  # No changes
└─ cloudinit_vendor.yaml.j2    # No changes

1. TASK MODULARIZATION

Before

All tasks lived in a single main.yml file (over 150 lines), which made the role:

  • Difficult to debug
  • Hard to extend
  • Not reusable

After

Each stage has its own file:

| File | Purpose | Key Features |
|------|---------|--------------|
| preflight-checks.yml | Validate environment | Checks Proxmox, storage, SSH keys, IPs |
| download-image.yml | Get Debian image | Caching, retry logic, size verification |
| create-vm.yml | Create VM | Idempotent, error handling |
| configure-vm.yml | Configure VM | Disk, Cloud-Init, TPM, GPU all in one |
| create-template.yml | Make template | Skip if already templated |
| create-clones.yml | Deploy clones | Loop through clone list with validation |
| helpers.yml | Utilities | Reusable helper functions |
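
As a rough sketch of how the orchestrator might wire these files together, tasks/main.yml could import each stage and tag it. The tag names preflight, image, vm, template and clones are the ones used in the commands in this guide; the config tag and the choice of import_tasks over include_tasks are assumptions, not taken from the role itself.

```yaml
# tasks/main.yml (illustrative sketch, not the role's actual file)
- name: "[PREFLIGHT] Validate environment"
  import_tasks: preflight-checks.yml
  tags: [preflight]

- name: "[IMAGE] Download and cache the Debian image"
  import_tasks: download-image.yml
  tags: [image]

- name: "[VM] Create base VM"
  import_tasks: create-vm.yml
  tags: [vm]

- name: "[CONFIG] Configure disk, Cloud-Init, TPM, GPU"
  import_tasks: configure-vm.yml
  tags: [config]          # assumed tag name

- name: "[TEMPLATE] Convert VM to template"
  import_tasks: create-template.yml
  tags: [template]

- name: "[CLONES] Create clones"
  import_tasks: create-clones.yml
  tags: [clones]
```

With import_tasks, the tag on the import is inherited by every task in the imported file, which is what lets --tags and --skip-tags select whole stages.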

Running Specific Stages

# Run only pre-flight checks
ansible-playbook tasks/main.yml --tags preflight

# Run everything except template/clone
ansible-playbook tasks/main.yml --skip-tags template,clones

# Run only clone creation
ansible-playbook tasks/main.yml --tags clones

# Run image download and VM creation only
ansible-playbook tasks/main.yml --tags image,vm
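
Note that ansible-playbook expects a playbook (a list of plays). If tasks/main.yml is a plain role task list, the commands above need a small wrapper playbook. A minimal sketch, assuming the role directory ansible_proxmox_VM is on the roles path and the inventory defines a group named proxmox; site.yml is a hypothetical file name:

```yaml
# site.yml (hypothetical wrapper playbook)
- name: Build Debian template and clones on Proxmox
  hosts: proxmox            # assumed inventory group
  become: true              # qm normally needs root; drop if already connecting as root
  roles:
    - role: ansible_proxmox_VM
```

The tag-based commands would then target the wrapper, for example: ansible-playbook site.yml -i inventory --tags preflight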

2. ERROR HANDLING

Before

  • Minimal error checking
  • Tasks would fail silently or with generic errors
  • No recovery paths

After

Each major operation has:

Block/Rescue Structure

- name: "[CONFIG] Import disk with error handling"
  block:
    - name: "[CONFIG] Try to import disk"
      command: qm importdisk ...

  rescue:
    - name: "[CONFIG] Handle import failure"
      fail:
        msg: "Clear error message with context"

Retry Logic

register: result
retries: 3
delay: 5
until: result is succeeded
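
For context, the retry keywords above attach to a single task. A minimal sketch of a complete download task that takes its limits from the role defaults (max_retries, retry_delay, image_download_timeout); the variables debian_image_url and image_cache_path are hypothetical names used only for illustration:

```yaml
- name: "[IMAGE] Download Debian cloud image"
  get_url:
    url: "{{ debian_image_url }}"          # hypothetical variable
    dest: "{{ image_cache_path }}"         # hypothetical variable
    timeout: "{{ image_download_timeout }}"
    mode: "0644"
  register: result
  retries: "{{ max_retries }}"             # attempts before giving up
  delay: "{{ retry_delay }}"               # seconds between attempts
  until: result is succeeded
```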

Validation Checks

- name: "[VM] Verify VM was created"
  stat:
    path: "/etc/pve/qemu-server/{{ vm_id }}.conf"
  register: vm_verify
  failed_when: not vm_verify.stat.exists

Error Messages Include

  • What went wrong
  • Which VM/resource was affected
  • Next steps to fix
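
One way to carry all three pieces of information in a rescue section is sketched below. It relies on ansible_failed_result, which Ansible sets inside rescue; image_cache_path and vm_storage are hypothetical variable names, not taken from the role.

```yaml
- name: "[CONFIG] Import disk with a descriptive failure"
  block:
    - name: "[CONFIG] Import disk into storage"
      command: "qm importdisk {{ vm_id }} {{ image_cache_path }} {{ vm_storage }}"

  rescue:
    - name: "[CONFIG] Report what failed, where, and what to try"
      fail:
        msg: >-
          Disk import failed for VM {{ vm_id }} on storage {{ vm_storage }}.
          Command error: {{ ansible_failed_result.stderr | default('no stderr captured') }}.
          Next steps: check free space with 'pvesm status', confirm that
          {{ image_cache_path }} exists, then re-run the playbook.
```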

3. IDEMPOTENCY

Before

  • Running playbook twice would fail or cause issues
  • Template conversion would fail if already templated
  • No checks for existing resources

After

All operations are idempotent:

Check Before Action

- name: "Check if VM already exists"
  stat:
    path: "/etc/pve/qemu-server/{{ vm_id }}.conf"
  register: vm_conf

- name: "Create VM"
  command: qm create ...
  when: not vm_conf.stat.exists

Safe Re-runs

  • Already-created VMs are skipped
  • Already-converted templates are skipped
  • Already-deployed clones are skipped
  • Image is cached and reused

Result: re-running the playbook any number of times leaves existing VMs, templates, and clones untouched.
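
The same check-before-action pattern can cover the cached image. A minimal sketch, assuming hypothetical variables image_cache_path and debian_image_url; treating a zero-byte file as "not cached" is an illustrative choice, not necessarily what download-image.yml does:

```yaml
- name: "[IMAGE] Check for a cached image"
  stat:
    path: "{{ image_cache_path }}"         # hypothetical variable
  register: cached_image

- name: "[IMAGE] Download only when no usable copy is cached"
  get_url:
    url: "{{ debian_image_url }}"          # hypothetical variable
    dest: "{{ image_cache_path }}"
    mode: "0644"
  # 'or' short-circuits, so size is only read when the file exists
  when: not cached_image.stat.exists or cached_image.stat.size == 0
```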


4. PRE-FLIGHT CHECKS

New preflight-checks.yml

Validates before starting:

✓ Proxmox is installed (qm command exists)
✓ User can run Proxmox commands (permissions)
✓ Storage pool exists and is accessible
✓ SSH key file exists and is readable
✓ VM IDs are unique (warns if conflict)
✓ Clone IDs are unique (warns if conflict)
✓ IP addresses are valid format
✓ Gateway and DNS are valid IPs
✓ Snippets directory exists
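
A few of these checks could be expressed roughly as follows. This is a sketch under assumptions: vm_storage is a hypothetical variable name, the IP test only checks the a.b.c.d/nn shape rather than numeric ranges, and assert fails the run where the real checks are described as warning on conflicts.

```yaml
- name: "[PREFLIGHT] Verify qm command is available"
  command: which qm
  changed_when: false

- name: "[PREFLIGHT] Verify storage pool exists"
  command: "pvesm status --storage {{ vm_storage }}"   # hypothetical variable
  changed_when: false

- name: "[PREFLIGHT] Validate clone IDs"
  assert:
    that:
      - clones | map(attribute='id') | list | unique | length == clones | length
      - vm_id not in (clones | map(attribute='id') | list)
    fail_msg: "Clone IDs must be unique and must differ from template ID {{ vm_id }}"

- name: "[PREFLIGHT] Validate clone IP format"
  assert:
    that: clone.ip is match('^(\d{1,3}\.){3}\d{1,3}/\d{1,2}$')
    fail_msg: "Clone {{ clone.id }} has an invalid address: {{ clone.ip }}"
  loop: "{{ clones }}"
  loop_control:
    loop_var: clone
```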

Sample Output

[PREFLIGHT] Check if running on Proxmox host ... ok
[PREFLIGHT] Verify qm command is available ... ok
[PREFLIGHT] Check if user can run qm commands ... ok
[PREFLIGHT] Verify storage pool exists ... ok
[PREFLIGHT] Summary - All checks passed

5. IMPROVED DEFAULTS

New Variables in defaults/main.yml

# Retry settings
max_retries: 3
retry_delay: 5

# Timeout settings (seconds)
image_download_timeout: 300
vm_boot_timeout: 60
cloud_init_timeout: 120

# Debug mode
debug_mode: false
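
As an illustration of how debug_mode might gate extra output, the sketch below dumps the VM configuration only when the flag is set. The task names and the vm_config_out register are hypothetical.

```yaml
- name: "[VM] Read VM configuration (debug mode only)"
  command: "qm config {{ vm_id }}"
  register: vm_config_out
  changed_when: false          # read-only command
  when: debug_mode | bool

- name: "[VM] Print VM configuration"
  debug:
    var: vm_config_out.stdout_lines
  when: debug_mode | bool
```

Enable it for a single run with -e debug_mode=true rather than editing defaults/main.yml.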

Better Documentation

Each variable has:

  • Purpose explanation
  • Valid values
  • Examples
  • Security warnings

6. IDEMPOTENT TEMPLATE CONVERSION

Before

- name: Convert VM to template
  command: qm template {{ vm_id }}
  args:
    creates: "/etc/pve/qemu-server/{{ vm_id }}.conf.lock"

The .conf.lock file never exists, so the creates guard has no effect and the task runs on every execution.

After

- name: "[TEMPLATE] Check if VM is already a template"
  shell: "qm config {{ vm_id }} | grep -q '^template: 1'"
  register: is_template
  changed_when: false   # read-only check, never report a change
  failed_when: false    # rc 1 only means "not a template yet"

- name: "[TEMPLATE] Convert VM to template"
  command: "qm template {{ vm_id }}"
  when: is_template.rc != 0

Checks actual template status; skips if already templated


7. BETTER CLOUD-INIT HANDLING

Before

  • Snippets not validated
  • SSH key lookup could fail silently

After

- name: "[CONFIG] Verify SSH key is readable"
  stat:
    path: "{{ ssh_key_path | expanduser }}"
  register: ssh_key_stat
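  # stat.readable is absent when the file is missing, so test exists first
  failed_when: not (ssh_key_stat.stat.exists and ssh_key_stat.stat.readable)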

- name: "[CONFIG] Copy SSH public key to snippets"
  copy:
    src: "{{ ssh_key_path | expanduser }}"
    dest: "/var/lib/vz/snippets/{{ vm_id }}-sshkey.pub"

✓ Validates before use
✓ Proper error messages if missing
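
For the snippets themselves, one possible flow is to render the role's two templates into the snippets directory and attach them with qm set --cicustom. This is a sketch, not the role's confirmed implementation: the output file names are hypothetical, and it assumes /var/lib/vz/snippets belongs to the storage named local with the snippets content type enabled.

```yaml
- name: "[CONFIG] Render Cloud-Init user-data snippet"
  template:
    src: cloudinit_userdata.yaml.j2
    dest: "/var/lib/vz/snippets/{{ vm_id }}-user.yaml"     # hypothetical file name
    mode: "0644"

- name: "[CONFIG] Render Cloud-Init vendor-data snippet"
  template:
    src: cloudinit_vendor.yaml.j2
    dest: "/var/lib/vz/snippets/{{ vm_id }}-vendor.yaml"   # hypothetical file name
    mode: "0644"

- name: "[CONFIG] Attach snippets to the VM"
  command: >-
    qm set {{ vm_id }} --cicustom
    "user=local:snippets/{{ vm_id }}-user.yaml,vendor=local:snippets/{{ vm_id }}-vendor.yaml"
```

As written, the last task reports a change on every run; a stricter version would compare against qm config output first.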


8. HELPER FUNCTIONS

New helpers.yml

Reusable utility tasks:

| Helper | Function |
|--------|----------|
| check_vm_exists | Check if VM exists |
| check_template | Check if VM is template |
| check_vm_status | Get VM running status |
| check_storage | Check storage space |
| validate_vm_id | Validate VM ID format |
| get_vm_info | Read VM configuration |
| list_vms | List all VMs |
| cleanup_snippets | Remove old Cloud-Init snippets |

Usage Example

- name: "Verify VM exists"
  include_tasks: helpers.yml
  vars:
    helper_task: check_vm_exists
    target_vm_id: "{{ vm_id }}"

- name: "Print result"
  debug:
    msg: "VM exists: {{ vm_exists }}"
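
Inside helpers.yml, a dispatch on helper_task could look roughly like this. Only helper_task, target_vm_id and vm_exists come from the usage example above; the register names and the is_template_vm fact are hypothetical.

```yaml
# tasks/helpers.yml (illustrative sketch of two helpers)
- name: "[HELPER] Check if VM exists"
  stat:
    path: "/etc/pve/qemu-server/{{ target_vm_id }}.conf"
  register: helper_vm_conf
  when: helper_task == 'check_vm_exists'

- name: "[HELPER] Expose result as vm_exists"
  set_fact:
    vm_exists: "{{ helper_vm_conf.stat.exists }}"
  when: helper_task == 'check_vm_exists'

- name: "[HELPER] Check if VM is a template"
  shell: "qm config {{ target_vm_id }} | grep -q '^template: 1'"
  register: helper_template
  changed_when: false
  failed_when: false
  when: helper_task == 'check_template'

- name: "[HELPER] Expose result as is_template_vm"
  set_fact:
    is_template_vm: "{{ helper_template.rc == 0 }}"   # hypothetical fact name
  when: helper_task == 'check_template'
```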

9. IMPROVED CLONE CREATION

Before

  • No validation of clone IDs
  • No error handling per clone
  • All-or-nothing approach

After

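# create-clones.yml: a block cannot take a loop, so loop over an included file
- name: "[CLONES] Process each clone"
  include_tasks: clone-single.yml   # hypothetical file holding the block below
  loop: "{{ clones }}"
  loop_control:
    loop_var: clone

# clone-single.yml (hypothetical file name)
- name: "[CLONES] Create clone {{ clone.id }}"
  block:
    - name: "[CLONES] Check if clone already exists"
      stat:
        path: "/etc/pve/qemu-server/{{ clone.id }}.conf"
      register: clone_conf

    - name: "[CLONES] Clone VM"
      command: qm clone {{ vm_id }} {{ clone.id }}
      when: not clone_conf.stat.exists

  rescue:
    - name: "[CLONES] Handle error for this clone"
      debug:
        msg: "WARNING: Clone {{ clone.id }} failed, continuing with next..."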

✓ Each clone is independent
✓ One failed clone doesn't stop others
✓ Clear logging of what succeeded/failed
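
After cloning, each clone still needs its own identity. A minimal sketch of that step, using the hostname, ip and gateway fields from the clones list shown under Usage Examples; restricting the tasks to newly created clones is an illustrative choice, not necessarily what create-clones.yml does:

```yaml
- name: "[CLONES] Set hostname and network for clone"
  command: >-
    qm set {{ clone.id }}
    --name {{ clone.hostname }}
    --ipconfig0 ip={{ clone.ip }},gw={{ clone.gateway }}
  when: not clone_conf.stat.exists     # only configure clones created in this run

- name: "[CLONES] Start clone"
  command: "qm start {{ clone.id }}"
  when: not clone_conf.stat.exists
```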


10. RICH LOGGING AND PROGRESS

Task Naming Convention

[STAGE] Action: description
├─ [PREFLIGHT] Check if running on Proxmox
├─ [IMAGE] Download Debian GenericCloud
├─ [VM] Create base VM
├─ [CONFIG] Configure disk
├─ [TEMPLATE] Convert to template
└─ [CLONES] Create clone 301

Progress Display

Start

╔════════════════════════════════════════════════════════════╗
║  Proxmox VM Template & Clone Manager                       ║
║  Template VM: debian-template-base (ID: 150)               ║
║  Storage: local-lvm                                        ║
║  CPU: 4 cores | RAM: 4096MB                                ║
╚════════════════════════════════════════════════════════════╝

End

╔════════════════════════════════════════════════════════════╗
║  ✓ Playbook execution completed                            ║
║  Template VM: debian-template-base (ID: 150)               ║
║  ✓ Converted to template                                   ║
║  ✓ 2 clone(s) created                                      ║
║  Next steps:                                               ║
║  - Verify VMs: qm list                                     ║
║  - Connect: ssh debian@<vm-ip>                             ║
║  - Check Cloud-Init: cloud-init status                     ║
╚════════════════════════════════════════════════════════════╝

Usage Examples

1. Full Deployment

ansible-playbook tasks/main.yml -i inventory

Runs all stages: preflight → image → VM → configure → template → clones

2. Re-run Safely (Idempotent)

ansible-playbook tasks/main.yml -i inventory

Second run skips already-completed operations.

3. Template Only

To update the template without re-downloading the image or recreating the VM:

ansible-playbook tasks/main.yml \
  -i inventory \
  --skip-tags image,vm,clones

4. Clone Only

After the template is created, add new clones:

# Update defaults/main.yml
clones:
  - id: 303
    hostname: app03
    ip: "192.168.1.83/24"
    gateway: "192.168.1.1"

Then run:

ansible-playbook tasks/main.yml \
  -i inventory \
  --tags clones

5. Debug Output

ansible-playbook tasks/main.yml \
  -i inventory \
  -vvv

Shows all task details, command output, variable values.


Migration from Old Version

Step 1: Backup

cp -r ansible_proxmox_VM ansible_proxmox_VM.backup

Step 2: Replace Files

Use the new versions:

  • tasks/main.yml → orchestrator
  • All tasks/*.yml files → new implementations
  • defaults/main.yml → improved defaults

Step 3: Test with Dry-Run

ansible-playbook tasks/main.yml \
  -i inventory \
  --check

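Shows what would happen without making changes. Note that command and shell tasks (the qm calls) are skipped in check mode, so a dry run mainly exercises the stat checks and variable validation.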
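
Read-only checks can be made to run even under --check so that later conditions evaluate against real state instead of a skipped result. A sketch of that pattern applied to the template check, assuming the role does not already do this:

```yaml
- name: "[TEMPLATE] Check if VM is already a template"
  shell: "qm config {{ vm_id }} | grep -q '^template: 1'"
  register: is_template
  changed_when: false
  failed_when: false
  check_mode: false        # read-only, so safe to execute during a dry run

- name: "[TEMPLATE] Convert VM to template"
  command: "qm template {{ vm_id }}"
  when: is_template.rc != 0   # still skipped by check mode, but the decision is real
```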

Step 4: Run Normally

ansible-playbook tasks/main.yml -i inventory

Best Practices Going Forward

  1. Always use tags for partial execution
  2. Run preflight checks before major changes
  3. Test with --check before production
  4. Use --skip-tags to avoid re-downloading images
  5. Monitor Cloud-Init inside VMs: cloud-init status
  6. Keep backups of .orig files (already present)
  7. Review error messages carefully for context

Security Improvements

Password Management

# OLD
ci_password: "SecurePass123"

# NEW - Use Vault
ci_password: "{{ vault_debian_password }}"

Create vault file:

ansible-vault create group_vars/proxmox/vault.yml

Add:

vault_debian_password: "YourSecurePassword"
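
A vaulted value only stays secret if the task that uses it does not print it. A minimal sketch of consuming ci_password, assuming the password is applied with qm set --cipassword (the role may set it differently, for example through the Cloud-Init templates):

```yaml
- name: "[CONFIG] Set Cloud-Init password"
  command: "qm set {{ vm_id }} --cipassword {{ ci_password | quote }}"
  no_log: true             # keep the password out of logs and -vvv output
```

Run the playbook with --ask-vault-pass or --vault-password-file so the vault file can be decrypted.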

SSH Key Validation

Before: SSH key could be missing → confusing error
After: Validates key exists and is readable


Troubleshooting

Problem: Playbook fails at preflight

Solution: Run preflight checks manually to see what's missing

ansible-playbook tasks/main.yml -i inventory --tags preflight -vvv

Problem: VM already exists, need to recreate

Solution: Delete the old VM first

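qm stop <vm_id>      # only needed if the VM is running
qm destroy <vm_id>   # replace <vm_id> with the numeric VM ID

Then re-run the playbook; the VM is recreated and any resources that still exist are skipped.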

Problem: Clone creation fails

Solution: Check clone configuration and IDs

qm list  # See all VMs

Ensure clone IDs don't conflict with existing VMs.

Problem: Cloud-Init not applying

Solution: Check that the snippets directory exists and contains the rendered files

ls -la /var/lib/vz/snippets/

Verify permissions are correct (644 for YAML files).


Next Steps

Consider these additional improvements:

  1. Molecule Testing - Add automated tests
  2. Vault Integration - Secure password management
  3. Role Packaging - Create Ansible Galaxy package
  4. Custom Filters - For more complex logic
  5. Notification - Send completion alerts (Slack, email)
  6. Metrics - Track VM creation time, resource usage
  7. Cleanup Role - Destroy VMs and templates
  8. Backup/Restore - Template and clone backup

Questions?

Refer to task inline comments for specifics. Each task file has extensive documentation.