- Added default configuration for VM creation in defaults/main.yml.
- Created tasks for configuring the VM with UEFI, TPM, disks, GPU, and Cloud-Init in tasks/configure-vm.yml.
- Implemented clone creation and configuration logic in tasks/create-clones.yml.
- Added template conversion functionality in tasks/create-template.yml.
- Developed base VM creation logic in tasks/create-vm.yml.
- Included image download and caching tasks in tasks/download-image.yml.
- Introduced utility tasks for common operations in tasks/helpers.yml.
- Organized main orchestration logic in tasks/main.yml, with clear stages for each operation.
- Added pre-flight checks to validate the environment before execution in tasks/preflight-checks.yml.
IMPROVEMENTS GUIDE: Ansible Proxmox VM Role
Summary of Changes
This document outlines the improvements made to your Ansible role for robustness, maintainability, and best practices.
What Was Improved
- Task Modularization - Split monolithic tasks into 6 logical stages
- Error Handling - Added block/rescue structures (Ansible's equivalent of try/catch) with recovery strategies
- Idempotency - Ensured all operations are safe to re-run
- Pre-flight Validation - Comprehensive environment checks before execution
- Documentation - Extensive inline comments and variable documentation
- Logging - Rich task names and debug output for troubleshooting
File Structure
New/Modified Files
tasks/
├─ main.yml # REFACTORED: Now orchestrates subtasks
├─ preflight-checks.yml # NEW: Environment validation
├─ download-image.yml # IMPROVED: Better error handling & caching
├─ create-vm.yml # IMPROVED: Idempotent VM creation
├─ configure-vm.yml # IMPROVED: Disk, Cloud-Init, TPM, GPU with error handling
├─ create-template.yml # IMPROVED: Idempotent template conversion
├─ create-clones.yml # IMPROVED: Clone creation with validation
└─ helpers.yml # NEW: Utility tasks for common operations
defaults/
└─ main.yml # IMPROVED: Complete documentation & new options
templates/
├─ cloudinit_userdata.yaml.j2 # No changes
└─ cloudinit_vendor.yaml.j2 # No changes
1. TASK MODULARIZATION
Before
All tasks lived in a single main.yml file (more than 150 lines), which made the role:
- Difficult to debug
- Hard to extend
- Not reusable
After
Each stage has its own file:
| File | Purpose | Key Features |
|---|---|---|
| preflight-checks.yml | Validate environment | Checks Proxmox, storage, SSH keys, IPs |
| download-image.yml | Get Debian image | Caching, retry logic, size verification |
| create-vm.yml | Create VM | Idempotent, error handling |
| configure-vm.yml | Configure VM | Disk, Cloud-Init, TPM, GPU all in one |
| create-template.yml | Make template | Skip if already templated |
| create-clones.yml | Deploy clones | Loop through clone list with validation |
| helpers.yml | Utilities | Reusable helper functions |
Running Specific Stages
# Run only pre-flight checks
ansible-playbook tasks/main.yml --tags preflight
# Run everything except template/clone
ansible-playbook tasks/main.yml --skip-tags template,clones
# Run only clone creation
ansible-playbook tasks/main.yml --tags clones
# Run image download and VM creation only
ansible-playbook tasks/main.yml --tags image,vm
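The tag-based commands above only work if each stage carries a tag in main.yml. The following is a minimal sketch of how the orchestrator might be laid out, not the actual file. The tag names preflight, image, vm, template, and clones are taken from the commands above; the tag `configure` is an assumed name for the configuration stage.

```yaml
# tasks/main.yml (illustrative sketch)
- name: "[PREFLIGHT] Validate environment"
  ansible.builtin.import_tasks: preflight-checks.yml
  tags: [preflight]

- name: "[IMAGE] Download and cache image"
  ansible.builtin.import_tasks: download-image.yml
  tags: [image]

- name: "[VM] Create base VM"
  ansible.builtin.import_tasks: create-vm.yml
  tags: [vm]

- name: "[CONFIG] Configure VM"
  ansible.builtin.import_tasks: configure-vm.yml
  tags: [configure]  # assumed tag name

- name: "[TEMPLATE] Convert to template"
  ansible.builtin.import_tasks: create-template.yml
  tags: [template]

- name: "[CLONES] Create clones"
  ansible.builtin.import_tasks: create-clones.yml
  tags: [clones]
```

With `import_tasks`, a tag set on the import is inherited by every task in the imported file, so `--tags` and `--skip-tags` select whole stages.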
2. ERROR HANDLING
Before
- Minimal error checking
- Tasks would fail silently or with generic errors
- No recovery paths
After
Each major operation has:
Block/Rescue Structure
block:
  - name: "[CONFIG] Try to import disk"
    command: qm importdisk ...
rescue:
  - name: "[CONFIG] Handle import failure"
    fail:
      msg: "Clear error message with context"
Retry Logic
register: result
retries: 3
delay: 5
until: result is succeeded
Validation Checks
- name: "[VM] Verify VM was created"
  stat:
    path: "/etc/pve/qemu-server/{{ vm_id }}.conf"
  register: vm_verify
  failed_when: not vm_verify.stat.exists
Error Messages Include
- What went wrong
- Which VM/resource was affected
- Next steps to fix
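As an illustration of a message that covers all three points, here is a hedged sketch of a rescue handler. The variables `image_path` and `storage` are assumed names, not confirmed role variables; `ansible_failed_task` and `ansible_failed_result` are the special variables Ansible exposes inside a rescue section.

```yaml
- name: "[CONFIG] Import disk with a descriptive failure"
  block:
    - name: "[CONFIG] Import disk image"
      # image_path and storage are hypothetical variable names
      ansible.builtin.command: "qm importdisk {{ vm_id }} {{ image_path }} {{ storage }}"
  rescue:
    - name: "[CONFIG] Report what failed, which VM, and what to try"
      ansible.builtin.fail:
        msg: >-
          Disk import failed for VM {{ vm_id }} on storage '{{ storage }}'
          (task: {{ ansible_failed_task.name }}).
          Output: {{ ansible_failed_result.stderr | default('none captured') }}.
          Next steps: check that {{ image_path }} exists and that
          'pvesm status' lists the storage as active.
```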
3. IDEMPOTENCY
Before
- Running playbook twice would fail or cause issues
- Template conversion would fail if already templated
- No checks for existing resources
After
All operations are idempotent:
Check Before Action
- name: "Check if VM already exists"
  stat:
    path: "/etc/pve/qemu-server/{{ vm_id }}.conf"
  register: vm_conf
- name: "Create VM"
  command: qm create ...
  when: not vm_conf.stat.exists
Safe Re-runs
- Already-created VMs are skipped
- Already-converted templates are skipped
- Already-deployed clones are skipped
- Image is cached and reused
Result: Re-running the playbook any number of times leaves existing VMs, templates, and clones untouched.
4. PRE-FLIGHT CHECKS
New preflight-checks.yml
Validates before starting:
✓ Proxmox is installed (qm command exists)
✓ User can run Proxmox commands (permissions)
✓ Storage pool exists and is accessible
✓ SSH key file exists and is readable
✓ VM IDs are unique (warns if conflict)
✓ Clone IDs are unique (warns if conflict)
✓ IP addresses are valid format
✓ Gateway and DNS are valid IPs
✓ Snippets directory exists
Sample Output
[PREFLIGHT] Check if running on Proxmox host ... ok
[PREFLIGHT] Verify qm command is available ... ok
[PREFLIGHT] Check if user can run qm commands ... ok
[PREFLIGHT] Verify storage pool exists ... ok
[PREFLIGHT] Summary - All checks passed
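A few of these checks could be written roughly as follows. This is a sketch under assumptions: `storage` is a hypothetical variable holding the storage pool name, and the storage check relies on `pvesm status` exiting non-zero for an unknown pool.

```yaml
- name: "[PREFLIGHT] Check if user can run qm commands"
  ansible.builtin.command: qm list
  register: qm_list
  changed_when: false  # read-only probe

- name: "[PREFLIGHT] Verify storage pool exists"
  # storage is a hypothetical variable name
  ansible.builtin.command: "pvesm status --storage {{ storage }}"
  register: storage_status
  changed_when: false

- name: "[PREFLIGHT] Verify snippets directory exists"
  ansible.builtin.stat:
    path: /var/lib/vz/snippets
  register: snippets_dir
  # isdir is only defined when the path exists, so test exists first
  failed_when: not (snippets_dir.stat.exists and snippets_dir.stat.isdir)

- name: "[PREFLIGHT] Summary - All checks passed"
  ansible.builtin.debug:
    msg: "Environment looks ready for VM creation"
```

Because any failing check stops the play, the summary task is only reached when every earlier check succeeded.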
5. IMPROVED DEFAULTS
New Variables in defaults/main.yml
# Retry settings
max_retries: 3
retry_delay: 5
# Timeout settings (seconds)
image_download_timeout: 300
vm_boot_timeout: 60
cloud_init_timeout: 120
# Debug mode
debug_mode: false
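One way these settings might be consumed is in the image download task. The sketch below assumes two hypothetical variables, `image_url` and `image_path`; the retry and timeout variables are the ones defined above.

```yaml
- name: "[IMAGE] Download image with retries"
  ansible.builtin.get_url:
    url: "{{ image_url }}"    # hypothetical variable
    dest: "{{ image_path }}"  # hypothetical variable
    timeout: "{{ image_download_timeout }}"
    mode: "0644"
  register: image_download
  retries: "{{ max_retries }}"
  delay: "{{ retry_delay }}"
  until: image_download is succeeded
```

When `dest` is a file path and `force` is left at its default, `get_url` skips the download if the file already exists, which gives a simple form of caching.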
Better Documentation
Each variable has:
- Purpose explanation
- Valid values
- Examples
- Security warnings
6. IDEMPOTENT TEMPLATE CONVERSION
Before
- name: Convert VM to template
  command: qm template {{ vm_id }}
  args:
    creates: "/etc/pve/qemu-server/{{ vm_id }}.conf.lock"
❌ The .conf.lock file never exists, so the creates guard never skips the task and the conversion always runs
After
- name: "[TEMPLATE] Check if VM is already a template"
  shell: "qm config {{ vm_id }} | grep -q 'template: 1'"
  register: is_template
  failed_when: false
  changed_when: false  # read-only check, should never report a change
- name: "[TEMPLATE] Convert VM to template"
  command: "qm template {{ vm_id }}"
  when: is_template.rc != 0
✅ Checks actual template status; skips if already templated
7. BETTER CLOUD-INIT HANDLING
Before
- Snippets not validated
- SSH key lookup could fail silently
After
- name: "[CONFIG] Verify SSH key is readable"
  stat:
    path: "{{ ssh_key_path | expanduser }}"
  register: ssh_key_stat
  # stat.readable is only defined when the file exists, so test exists first
  failed_when: not ssh_key_stat.stat.exists or not ssh_key_stat.stat.readable
- name: "[CONFIG] Copy SSH public key to snippets"
  copy:
    src: "{{ ssh_key_path | expanduser }}"
    dest: "/var/lib/vz/snippets/{{ vm_id }}-sshkey.pub"
✓ Validates before use
✓ Proper error messages if missing
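For the snippet side of Cloud-Init, a possible pair of tasks is sketched below. It assumes that the Proxmox storage named `local` maps to /var/lib/vz with snippet content enabled, and the file name `{{ vm_id }}-user.yaml` is invented for illustration; only the template name cloudinit_userdata.yaml.j2 comes from this role.

```yaml
- name: "[CONFIG] Render Cloud-Init user-data into snippets"
  ansible.builtin.template:
    src: cloudinit_userdata.yaml.j2
    dest: "/var/lib/vz/snippets/{{ vm_id }}-user.yaml"  # hypothetical file name
    mode: "0644"

- name: "[CONFIG] Attach the snippet to the VM"
  # assumes storage 'local' serves /var/lib/vz/snippets
  ansible.builtin.command: >-
    qm set {{ vm_id }}
    --cicustom user=local:snippets/{{ vm_id }}-user.yaml
```

The `qm set` task as written reports a change on every run; guarding it with a prior `qm config` check would be one way to keep the output quiet on re-runs.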
8. HELPER FUNCTIONS
New helpers.yml
Reusable utility tasks:
| Helper | Function |
|---|---|
| check_vm_exists | Check if VM exists |
| check_template | Check if VM is template |
| check_vm_status | Get VM running status |
| check_storage | Check storage space |
| validate_vm_id | Validate VM ID format |
| get_vm_info | Read VM configuration |
| list_vms | List all VMs |
| cleanup_snippets | Remove old Cloud-Init snippets |
Usage Example
- name: "Verify VM exists"
  include_tasks: helpers.yml
  vars:
    helper_task: check_vm_exists
    target_vm_id: "{{ vm_id }}"
- name: "Print result"
  debug:
    msg: "VM exists: {{ vm_exists }}"
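Inside helpers.yml, the dispatch on `helper_task` could look something like the sketch below. This is an assumption about the file's internals, shown for a single helper; the registered name `helper_vm_conf` is hypothetical.

```yaml
# tasks/helpers.yml (illustrative excerpt)
- name: "[HELPER] check_vm_exists"
  when: helper_task == 'check_vm_exists'
  block:
    - name: "[HELPER] Stat the VM config file"
      ansible.builtin.stat:
        path: "/etc/pve/qemu-server/{{ target_vm_id }}.conf"
      register: helper_vm_conf  # hypothetical name

    - name: "[HELPER] Publish result as vm_exists"
      ansible.builtin.set_fact:
        vm_exists: "{{ helper_vm_conf.stat.exists }}"
```

Each helper gets its own guarded block, so a single include runs only the requested helper and leaves its result in a fact the caller can read.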
9. IMPROVED CLONE CREATION
Before
- No validation of clone IDs
- No error handling per clone
- All-or-nothing approach
After
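# Ansible does not allow `loop` on a `block`, so the loop wraps an include.
# "clone-single.yml" is a hypothetical file name used for illustration.
- name: "[CLONES] Process each clone"
  include_tasks: clone-single.yml
  loop: "{{ clones }}"
  loop_control:
    loop_var: clone

# clone-single.yml
- block:
    - name: "[CLONES] Check if clone already exists"
      stat:
        path: "/etc/pve/qemu-server/{{ clone.id }}.conf"
      register: clone_conf
    - name: "[CLONES] Clone VM"
      command: qm clone {{ vm_id }} {{ clone.id }}
      when: not clone_conf.stat.exists
  rescue:
    - name: "[CLONES] Handle error for this clone"
      debug:
        msg: "WARNING: Clone {{ clone.id }} failed, continuing with next..."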
✓ Each clone is independent
✓ One failed clone doesn't stop others
✓ Clear logging of what succeeded/failed
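After a clone is created, its hostname and network settings still need to be applied. A hedged sketch of such a per-clone task follows, using the `clone` loop variable and the id, hostname, ip, and gateway fields from the clone list; whether the role does it exactly this way is an assumption.

```yaml
- name: "[CLONES] Apply hostname and network settings"
  ansible.builtin.command: >-
    qm set {{ clone.id }}
    --name {{ clone.hostname }}
    --ipconfig0 ip={{ clone.ip }},gw={{ clone.gateway }}
```

The `ip` value is expected in CIDR notation (for example 192.168.1.83/24), which matches the format used in the clone definitions.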
10. RICH LOGGING AND PROGRESS
Task Naming Convention
[STAGE] Action: description
├─ [PREFLIGHT] Check if running on Proxmox
├─ [IMAGE] Download Debian GenericCloud
├─ [VM] Create base VM
├─ [CONFIG] Configure disk
├─ [TEMPLATE] Convert to template
└─ [CLONES] Create clone 301
Progress Display
Start
╔════════════════════════════════════════════════════════════╗
║ Proxmox VM Template & Clone Manager ║
║ Template VM: debian-template-base (ID: 150) ║
║ Storage: local-lvm ║
║ CPU: 4 cores | RAM: 4096MB ║
╚════════════════════════════════════════════════════════════╝
End
╔════════════════════════════════════════════════════════════╗
║ ✓ Playbook execution completed ║
║ Template VM: debian-template-base (ID: 150) ║
║ ✓ Converted to template ║
║ ✓ 2 clone(s) created ║
║ Next steps: ║
║ - Verify VMs: qm list ║
║ - Connect: ssh debian@<vm-ip> ║
║ - Check Cloud-Init: cloud-init status ║
╚════════════════════════════════════════════════════════════╝
Usage Examples
1. Full Deployment
ansible-playbook tasks/main.yml -i inventory
Runs all stages: preflight → image → VM → configure → template → clones
2. Re-run Safely (Idempotent)
ansible-playbook tasks/main.yml -i inventory
Second run skips already-completed operations.
3. Template Only
To update the template without re-downloading the image or recreating the VM and clones:
ansible-playbook tasks/main.yml \
-i inventory \
--skip-tags image,vm,clones
4. Clone Only
After the template is created, add new clones:
# Update defaults/main.yml
clones:
  - id: 303
    hostname: app03
    ip: "192.168.1.83/24"
    gateway: "192.168.1.1"
Then run:
ansible-playbook tasks/main.yml \
-i inventory \
--tags clones
5. Debug Output
ansible-playbook tasks/main.yml \
-i inventory \
-vvv
Shows all task details, command output, variable values.
Migration from Old Version
Step 1: Backup
cp -r ansible_proxmox_VM ansible_proxmox_VM.backup
Step 2: Replace Files
Use the new versions:
- tasks/main.yml → orchestrator
- All tasks/*.yml files → new implementations
- defaults/main.yml → improved defaults
Step 3: Test with Dry-Run
ansible-playbook tasks/main.yml \
-i inventory \
--check
Shows what would happen without making changes.
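One caveat with `--check`: the command and shell modules are skipped in check mode, so a registered result such as `is_template` has no `rc` and later conditions that read it can fail. A possible remedy, sketched here on the template check, is to mark read-only probes with `check_mode: false` so they still run during a dry run.

```yaml
- name: "[TEMPLATE] Check if VM is already a template"
  ansible.builtin.shell: "qm config {{ vm_id }} | grep -q 'template: 1'"
  register: is_template
  failed_when: false
  changed_when: false
  check_mode: false  # run this read-only probe even under --check

- name: "[TEMPLATE] Convert VM to template"
  ansible.builtin.command: "qm template {{ vm_id }}"
  when: is_template.rc != 0
```

The conversion task itself is still skipped under `--check`, so the dry run stays free of side effects.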
Step 4: Run Normally
ansible-playbook tasks/main.yml -i inventory
Best Practices Going Forward
- Always use tags for partial execution
- Run preflight checks before major changes
- Test with --check before production
- Use --skip-tags to avoid re-downloading images
- Monitor Cloud-Init inside VMs: cloud-init status
- Keep backups of .orig files (already present)
- Review error messages carefully for context
Security Improvements
Password Management
# OLD
ci_password: "SecurePass123"
# NEW - Use Vault
ci_password: "{{ vault_debian_password }}"
Create vault file:
ansible-vault create group_vars/proxmox/vault.yml
Add:
vault_debian_password: "YourSecurePassword"
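Keeping the password in Vault only helps if it also stays out of the task output. Below is a sketch of a task that applies it with `no_log`, assuming a hypothetical `ci_user` variable alongside `ci_password`; the actual task in configure-vm.yml may differ.

```yaml
- name: "[CONFIG] Set Cloud-Init user and password"
  ansible.builtin.command: >-
    qm set {{ vm_id }}
    --ciuser {{ ci_user }}
    --cipassword {{ ci_password | quote }}
  no_log: true  # keep the password out of logs and -vvv output
```

When the vault file is encrypted, the playbook needs the vault password at run time, for example via `--ask-vault-pass`.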
SSH Key Validation
Before: SSH key could be missing → confusing error
After: Validates key exists and is readable
Troubleshooting
Problem: Playbook fails at preflight
Solution: Run preflight checks manually to see what's missing
ansible-playbook tasks/main.yml -i inventory --tags preflight -vvv
Problem: VM already exists, need to recreate
Solution: Delete the old VM first
qm destroy {{ vm_id }}
Then re-run playbook (idempotent).
Problem: Clone creation fails
Solution: Check clone configuration and IDs
qm list # See all VMs
Ensure clone IDs don't conflict with existing VMs.
Problem: Cloud-Init not applying
Solution: Check snippets directory exists
ls -la /var/lib/vz/snippets/
Verify permissions are correct (644 for YAML files).
Next Steps
Consider these additional improvements:
- Molecule Testing - Add automated tests
- Vault Integration - Secure password management
- Role Packaging - Create Ansible Galaxy package
- Custom Filters - For more complex logic
- Notification - Send completion alerts (Slack, email)
- Metrics - Track VM creation time, resource usage
- Cleanup Role - Destroy VMs and templates
- Backup/Restore - Template and clone backup
Questions?
Refer to task inline comments for specifics. Each task file has extensive documentation.