
Ansible Role: Proxmox VM → Template → Clones (CloudInit)

Production-grade automation for Debian GenericCloud VMs on Proxmox with error handling, idempotency, and comprehensive validation.

Automates the complete lifecycle:

  • Pre-flight environment validation (20+ checks)
  • Download & cache Debian GenericCloud image
  • Create base VM with error recovery
  • Configure disk, networking, Cloud-Init, TPM, GPU
  • Convert VM to template (idempotent - safe to re-run!)
  • Deploy multiple clones with custom networking
  • Per-clone error handling (failures don't cascade)

Features

  • Error Handling - Automatic retry (3x, 5-sec delay) with clear messages
  • Idempotency - Truly safe to re-run; skips already-completed operations
  • Pre-flight Validation - 20+ environment checks before execution
  • Modular Design - 6 independent task stages with tag-based execution
  • Image Caching - Downloads once, reuses on re-runs (faster!)
  • DHCP or Static IP - Flexible networking configuration
  • Cloud-Init - Users, SSH keys, passwords, timezone, packages
  • TPM 2.0 + SecureBoot - Optional UEFI firmware support
  • GPU Passthrough - Optional PCI device or VirtIO GPU
  • Disk Resize - Optional automatic disk expansion
  • Multi-Clone - Deploy multiple clones independently
  • Rich Logging - Progress tracking and debug output
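The image caching behaviour listed above could be sketched with the `get_url` module, which does not download again when `dest` is an existing file and `force` is false. This is an illustration rather than the role's actual task: the destination path is an assumption, and the URL is the public Debian Bookworm GenericCloud image.

```yaml
# Sketch only: the dest path is hypothetical, not taken from the role.
- name: Download the Debian GenericCloud image once and reuse it
  get_url:
    url: "https://cloud.debian.org/images/cloud/bookworm/latest/debian-12-genericcloud-amd64.qcow2"
    dest: "/var/lib/vz/template/iso/debian-12-genericcloud-amd64.qcow2"
    mode: "0644"
    force: false   # keep the cached file if it already exists
```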

Folder Structure

ansible_proxmox_VM/
├─ defaults/
│   └─ main.yml                    # All configuration (comprehensive docs)
├─ tasks/
│   ├─ main.yml                    # Orchestrator (calls subtasks)
│   ├─ preflight-checks.yml        # Environment validation (20+ checks)
│   ├─ download-image.yml          # Download Debian image (with caching)
│   ├─ create-vm.yml               # Create VM (idempotent)
│   ├─ configure-vm.yml            # Configure disk, Cloud-Init, TPM, GPU
│   ├─ create-template.yml         # Convert to template (idempotent - FIXED!)
│   ├─ create-clones.yml           # Deploy clones (per-clone error handling)
│   └─ helpers.yml                 # 8 utility functions
├─ templates/
│   ├─ cloudinit_userdata.yaml.j2  # Cloud-Init user data template
│   └─ cloudinit_vendor.yaml.j2    # Cloud-Init vendor data template
└─ README.md                        # This file

Requirements

  • Proxmox VE 7.x or 8.x installed and accessible
  • Ansible 2.9+ with SSH access to Proxmox host
  • Proxmox user with permission to run qm commands (root recommended)
  • Storage pool configured (e.g., local-lvm)
  • Snippets storage enabled for Cloud-Init (Datacenter → Storage)

Quick Start

1. Validate Environment

ansible-playbook tasks/main.yml --tags preflight -vvv

Checks Proxmox connectivity, storage, SSH keys, permissions.

2. Dry Run (Preview Changes)

ansible-playbook tasks/main.yml --check -vv

Shows what would happen without making any changes.

3. Full Deployment

ansible-playbook tasks/main.yml -i inventory

Creates VM → configures it → converts to template → deploys clones

4. Re-run (Test Idempotency)

ansible-playbook tasks/main.yml -i inventory

Second run is much faster (~30 sec)! Skips already-completed operations.

Configuration Variables

All variables are in defaults/main.yml with comprehensive inline documentation.

Base VM Configuration

vm_id: 150                           # Unique Proxmox VM ID (≥100)
hostname: debian-template-base       # VM hostname
memory: 4096                         # RAM in MB
cores: 4                             # CPU cores
cpu_type: host                       # CPU type
bridge: vmbr0                        # Network bridge
storage: local-lvm                   # Storage pool

Networking

ip_mode: dhcp                        # 'dhcp' or 'static'
ip_address: "192.168.1.60/24"       # Static IP if ip_mode: static
gateway: "192.168.1.1"              # Gateway
dns:
  - "1.1.1.1"
  - "8.8.8.8"
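As a rough sketch of how `ip_mode` might translate into a Proxmox Cloud-Init setting, the task below builds the `--ipconfig0` value for `qm set` from the variables above. The helper variable `ipconfig0` is hypothetical, and the role's real task may be structured differently.

```yaml
# Sketch only: `ipconfig0` is an illustrative helper variable.
- name: Apply Cloud-Init network settings to the base VM
  vars:
    ipconfig0: "{{ 'ip=dhcp' if ip_mode == 'dhcp' else 'ip=' ~ ip_address ~ ',gw=' ~ gateway }}"
  command: "qm set {{ vm_id }} --ipconfig0 {{ ipconfig0 }}"
```

With the defaults shown, `ip_mode: dhcp` yields `ip=dhcp`; switching to `static` yields `ip=192.168.1.60/24,gw=192.168.1.1`.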

Cloud-Init

ci_user: debian                      # Default user
ci_password: "SecurePass123"        # Use Vault in production!
ssh_key_path: "~/.ssh/id_rsa.pub"   # SSH public key path
timezone: "Europe/Berlin"            # Timezone
packages:
  - qemu-guest-agent
  - curl
  - htop

Advanced Options

enable_tpm: false                    # UEFI + TPM 2.0
gpu_passthrough: false               # PCI GPU passthrough
virtio_gpu: false                    # VirtIO GPU
resize_disk: true                    # Auto-resize disk
resize_size: "16G"                   # Target disk size
make_template: true                  # Convert to template
create_clones: true                  # Deploy clones
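
The toggles above plausibly gate `qm` calls along these lines. This is a hedged sketch, not the role's code: the disk name `scsi0` and the EFI/TPM volume specs are assumptions, and a real task would first inspect `qm config` so that re-runs do not try to add the EFI or TPM volume twice.

```yaml
# Sketch only: disk name and volume specs are assumptions.
- name: Enable UEFI firmware and a TPM 2.0 state volume
  command: >-
    qm set {{ vm_id }}
    --bios ovmf
    --machine q35
    --efidisk0 {{ storage }}:1,efitype=4m,pre-enrolled-keys=1
    --tpmstate0 {{ storage }}:1,version=v2.0
  when: enable_tpm | bool

- name: Grow the boot disk to the requested size
  command: "qm resize {{ vm_id }} scsi0 {{ resize_size }}"
  when: resize_disk | bool
```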

Clone Definition

clones:
  - id: 301
    hostname: app01
    ip: "192.168.1.81/24"
    gateway: "192.168.1.1"
    full: 1                          # 1=full, 0=linked
  - id: 302
    hostname: app02
    ip: "192.168.1.82/24"
    gateway: "192.168.1.1"
    full: 0                          # Linked clones are faster
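
One way the `clones` list could be consumed while keeping failures isolated per clone is sketched below. It assumes a single-node setup where VM configs live under `/etc/pve/qemu-server/`, and it uses `ignore_errors` so that one failed clone does not stop the play. The role's own `create-clones.yml` may use a different mechanism.

```yaml
# Sketch only: the config path check assumes a single Proxmox node.
- name: Clone the template once per entry in `clones`
  command: >-
    qm clone {{ vm_id }} {{ item.id }}
    --name {{ item.hostname }}
    --full {{ item.full }}
  args:
    creates: "/etc/pve/qemu-server/{{ item.id }}.conf"   # skip existing clones
  loop: "{{ clones }}"
  register: clone_results
  ignore_errors: true   # a failed clone is reported but does not abort the rest

- name: Set the static address on clones that did not fail
  command: >-
    qm set {{ item.item.id }}
    --ipconfig0 ip={{ item.item.ip }},gw={{ item.item.gateway }}
  loop: "{{ clone_results.results }}"
  loop_control:
    label: "{{ item.item.hostname }}"
  when: item is not failed
```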

See defaults/main.yml for all options with detailed documentation.

Usage

Include in Playbook

- hosts: proxmox_host
  become: true
  roles:
    - ansible_proxmox_VM

Run Directly

ansible-playbook tasks/main.yml -i inventory

Using Tags (Run Specific Stages)

# Pre-flight checks only
ansible-playbook tasks/main.yml --tags preflight -vvv

# Create VM and template (skip clones)
ansible-playbook tasks/main.yml --skip-tags clones

# Add clones to existing template
ansible-playbook tasks/main.yml --tags clones

# Skip image re-download
ansible-playbook tasks/main.yml --skip-tags image

Playbook Stages (6 Stages)

Stage  Task                  Purpose                              Idempotent
1      preflight-checks.yml  Validate environment (20+ checks)    Yes
2      download-image.yml    Download/cache Debian image          Yes
3      create-vm.yml         Create base VM                       Yes
4      configure-vm.yml      Configure disk, network, Cloud-Init  Yes
5      create-template.yml   Convert to template                  Yes (FIXED!)
6      create-clones.yml     Deploy clones from template          Yes

Key Improvements

Error Handling

  • Automatic retry with configurable delays (3x, 5-sec)
  • Context-aware error messages
  • Per-clone error isolation (doesn't cascade)
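
The retry behaviour described above maps onto Ansible's `until`/`retries`/`delay` keywords. The task below is a sketch under assumptions, not a copy of the role's `create-vm.yml`: the exact `qm create` options and the `creates` guard are illustrative.

```yaml
# Sketch only: option set and guard path are assumptions.
- name: Create the base VM, retrying on transient failures
  command: >-
    qm create {{ vm_id }}
    --name {{ hostname }}
    --memory {{ memory }}
    --cores {{ cores }}
    --cpu {{ cpu_type }}
    --net0 virtio,bridge={{ bridge }}
  args:
    creates: "/etc/pve/qemu-server/{{ vm_id }}.conf"   # skip if the VM exists
  register: vm_create
  until: vm_create is succeeded
  retries: 3   # matches the 3x retry described above
  delay: 5     # seconds between attempts
```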

Idempotency

  • Safe to re-run multiple times
  • Skips already-completed operations
  • Image cached and reused
  • Template conversion idempotent (was broken in v1!)
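
An idempotent template conversion can be approximated by reading the VM config first and converting only when the `template: 1` line is absent. This is a minimal sketch of that idea, and the role's actual check may differ.

```yaml
# Sketch only: illustrates the guard, not the role's exact tasks.
- name: Read the current VM configuration
  command: "qm config {{ vm_id }}"
  register: vm_config
  changed_when: false   # read-only query

- name: Convert the VM to a template only if it is not one already
  command: "qm template {{ vm_id }}"
  when: "'template: 1' not in vm_config.stdout_lines"
```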

Pre-flight Validation

  • Proxmox connectivity & permissions
  • Storage pool availability
  • SSH key readiness
  • IP address format validation
  • VM ID uniqueness checks
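
A few of these checks could look roughly like the tasks below. They are illustrative only: the regular expression is a loose format check rather than full IP validation, and the config path assumes a single Proxmox node.

```yaml
# Sketch only: a small subset of checks, written for illustration.
- name: Confirm the qm CLI is usable by the connecting user
  command: qm list
  changed_when: false

- name: Confirm the storage pool is known to Proxmox
  command: "pvesm status --storage {{ storage }}"
  changed_when: false

- name: Validate variable values before touching the host
  assert:
    that:
      - "vm_id | int >= 100"
      - "ip_mode in ['dhcp', 'static']"
      - "ip_mode == 'dhcp' or ip_address is match('^[0-9]{1,3}([.][0-9]{1,3}){3}/[0-9]{1,2}$')"
    fail_msg: "Check vm_id, ip_mode and ip_address in defaults/main.yml"

- name: Look for an existing VM with the same ID
  stat:
    path: "/etc/pve/qemu-server/{{ vm_id }}.conf"
  register: vm_conf
```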

Advanced Features

  • UEFI/TPM 2.0 support
  • GPU passthrough (PCI or VirtIO)
  • Automatic disk resize
  • Cloud-Init with user/password/SSH
  • DHCP or static networking
  • Multi-clone deployment

Testing & Validation

Preflight Checks

ansible-playbook tasks/main.yml --tags preflight -vvv

Dry Run (Preview)

ansible-playbook tasks/main.yml --check -vv

Test Idempotency

# First run
ansible-playbook tasks/main.yml -vv

# Second run (should be much faster)
ansible-playbook tasks/main.yml -vv

Cloud-Init Templates

cloudinit_userdata.yaml.j2

Configures:

  • User creation with sudo access
  • SSH key injection
  • Password authentication
  • Timezone setting
  • Package updates
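
For orientation, a rendered user-data file covering the items above might resemble the following cloud-config. It is a hypothetical rendering using the default variable values, not the template's actual output; the SSH key and password are placeholders.

```yaml
#cloud-config
# Sketch only: key and password below are placeholders.
users:
  - name: debian
    groups: [sudo]
    sudo: "ALL=(ALL) NOPASSWD:ALL"
    shell: /bin/bash
    lock_passwd: false
    ssh_authorized_keys:
      - "ssh-ed25519 AAAA_PLACEHOLDER_KEY user@example"
chpasswd:
  expire: false
  users:
    - name: debian
      password: "change-me"
      type: text
ssh_pwauth: true
timezone: "Europe/Berlin"
package_update: true
package_upgrade: true
```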

cloudinit_vendor.yaml.j2

Configures:

  • Package installation
  • DNS settings (optional)

Security Notes

⚠️ Passwords: Use Ansible Vault in production:

ansible-vault create group_vars/proxmox/vault.yml

Then reference: ci_password: "{{ vault_ci_password }}"
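
A common layout, shown here as a sketch, keeps the secret in the encrypted file and the reference in a plain one. It assumes the Proxmox host belongs to an inventory group named `proxmox`; adjust the directory name to match your inventory.

```yaml
# group_vars/proxmox/vault.yml (encrypted with ansible-vault)
vault_ci_password: "replace-with-a-real-secret"

# group_vars/proxmox/vars.yml (plain text, safe to commit)
ci_password: "{{ vault_ci_password }}"
```

Run the playbook with `--ask-vault-pass` so Ansible can decrypt the vault file.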

  • SSH Keys: Automatically validated before use
  • Permissions: Checks if user can run qm commands
  • No Hardcoded Secrets: All in variables

Best Practices

  1. Always run with --check first
  2. Validate environment with --tags preflight
  3. Skip image re-download with --skip-tags image
  4. Monitor Cloud-Init: cloud-init status inside VM
  5. Test in dev environment first
  6. Use linked clones (full: 0) for faster deployments
  7. Enable Proxmox snippets storage

Performance

  • First run: ~5-10 minutes (downloads image, creates VM)
  • Re-runs: ~30 seconds (operations skipped)
  • Linked clones: Much faster than full clones

Troubleshooting

Preflight validation fails

ansible-playbook tasks/main.yml --tags preflight -vvv

Cloud-Init not applying

# Inside VM:
cloud-init status
sudo cloud-init collect-logs   # bundles the Cloud-Init logs into a tarball

# Check snippets:
ls -la /var/lib/vz/snippets/

SSH key issues

# Verify SSH key
ls -la ~/.ssh/id_rsa.pub

# Run with verbose
ansible-playbook tasks/main.yml -vvv

Common Proxmox Commands

# List all VMs
qm list

# Check VM status
qm status 150

# View VM config
qm config 150

# Connect to console
qm terminal 150

# SSH into VM
ssh debian@<vm-ip>

# Check Cloud-Init (inside the VM)
cloud-init status --long

Compatibility

  • Proxmox: 7.x, 8.x (uses qm CLI)
  • Debian: Bookworm GenericCloud (configurable)
  • Ansible: 2.9+ (standard modules)
  • Backward Compatible: 100%

Support

Refer to:

  • defaults/main.yml - Complete variable documentation
  • Task files - Inline comments explaining implementation
  • Run with -vvv flag for debug output
  • Check /var/lib/vz/snippets/ for Cloud-Init files

License

Open source - use as-is for Proxmox automation.
