# Ansible Role: Proxmox VM → Template → Clones (Cloud‑Init)

**Production-grade automation** for Debian GenericCloud VMs on Proxmox with error handling, idempotency, and comprehensive validation.

Automates the complete lifecycle:
- ✅ Pre-flight environment validation (20+ checks)
- ✅ Download & cache Debian GenericCloud image
- ✅ Create base VM with error recovery
- ✅ Configure disk, networking, Cloud-Init, TPM, GPU
- ✅ Convert VM to template (**idempotent** - safe to re-run!)
- ✅ Deploy multiple clones with custom networking
- ✅ Per-clone error handling (failures don't cascade)

## Features

- ✅ **Error Handling** - Automatic retry (3x, 5-sec delay) with clear messages
- ✅ **Idempotency** - Truly safe to re-run; skips already-completed operations
- ✅ **Pre-flight Validation** - 20+ environment checks before execution
- ✅ **Modular Design** - 6 independent task stages with tag-based execution
- ✅ **Image Caching** - Downloads once, reuses on re-runs (faster!)
- ✅ **DHCP or Static IP** - Flexible networking configuration
- ✅ **Cloud-Init** - Users, SSH keys, passwords, timezone, packages
- ✅ **TPM 2.0 + SecureBoot** - Optional UEFI firmware support
- ✅ **GPU Passthrough** - Optional PCI device or VirtIO GPU
- ✅ **Disk Resize** - Optional automatic disk expansion
- ✅ **Multi-Clone** - Deploy multiple clones independently
- ✅ **Rich Logging** - Progress tracking and debug output

## Folder Structure

```
ansible_proxmox_VM/
├─ defaults/
│  └─ main.yml                     # All configuration (comprehensive docs)
├─ tasks/
│  ├─ main.yml                     # Orchestrator (calls subtasks)
│  ├─ preflight-checks.yml         # Environment validation (20+ checks)
│  ├─ download-image.yml           # Download Debian image (with caching)
│  ├─ create-vm.yml                # Create VM (idempotent)
│  ├─ configure-vm.yml             # Configure disk, Cloud-Init, TPM, GPU
│  ├─ create-template.yml          # Convert to template (idempotent - FIXED!)
│  ├─ create-clones.yml            # Deploy clones (per-clone error handling)
│  └─ helpers.yml                  # 8 utility functions
├─ templates/
│  ├─ cloudinit_userdata.yaml.j2   # Cloud-Init user data template
│  └─ cloudinit_vendor.yaml.j2     # Cloud-Init vendor data template
└─ README.md                       # This file
```

## Requirements

- **Proxmox VE** 7.x or 8.x installed and accessible
- **Ansible** 2.9+ with SSH access to Proxmox host
- **Proxmox user** with permission to run `qm` commands (root recommended)
- **Storage pool** configured (e.g., `local-lvm`)
- **Snippets storage** enabled for Cloud-Init (`Datacenter → Storage`)

## Quick Start

### 1. Validate Environment
```bash
ansible-playbook tasks/main.yml --tags preflight -vvv
```
Checks Proxmox connectivity, storage, SSH keys, permissions.

### 2. Dry Run (Preview Changes)
```bash
ansible-playbook tasks/main.yml --check -vv
```
Shows what would happen without making any changes.

### 3. Full Deployment
```bash
ansible-playbook tasks/main.yml -i inventory
```
Creates VM → configures it → converts to template → deploys clones.

### 4. Re-run (Test Idempotency)
```bash
ansible-playbook tasks/main.yml -i inventory
```
Second run is much faster (~30 sec)! Skips already-completed operations.

## Configuration Variables

All variables are in `defaults/main.yml` with comprehensive inline documentation.

### Base VM Configuration
```yaml
vm_id: 150                        # Unique Proxmox VM ID (≥100)
hostname: debian-template-base    # VM hostname
memory: 4096                      # RAM in MB
cores: 4                          # CPU cores
cpu_type: host                    # CPU type
bridge: vmbr0                     # Network bridge
storage: local-lvm                # Storage pool
```

### Networking
```yaml
ip_mode: dhcp                     # 'dhcp' or 'static'
ip_address: "192.168.1.60/24"     # Static IP if ip_mode: static
gateway: "192.168.1.1"            # Gateway
dns:
  - "1.1.1.1"
  - "8.8.8.8"
```
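
As an illustration of how these networking variables could map onto Proxmox's Cloud-Init IP setting, the sketch below builds the value passed to `qm set --ipconfig0`. The function name `ipconfig0_value` is hypothetical, and the role's own tasks may assemble the string differently. Only the `ip=dhcp` and `ip=<CIDR>,gw=<gateway>` formats are taken from Proxmox.

```bash
#!/usr/bin/env bash
# Hypothetical helper: turn ip_mode / ip_address / gateway into an ipconfig0 value.
ipconfig0_value() {
  local mode="$1" ip="${2:-}" gw="${3:-}"
  case "$mode" in
    dhcp)
      printf 'ip=dhcp\n'
      ;;
    static)
      # Static mode needs at least an address in CIDR notation.
      if [ -z "$ip" ]; then
        echo "static mode requires an IP address" >&2
        return 1
      fi
      if [ -n "$gw" ]; then
        printf 'ip=%s,gw=%s\n' "$ip" "$gw"
      else
        printf 'ip=%s\n' "$ip"
      fi
      ;;
    *)
      echo "unknown ip_mode: $mode" >&2
      return 1
      ;;
  esac
}
```

On a Proxmox host this could be used as `qm set 150 --ipconfig0 "$(ipconfig0_value static 192.168.1.60/24 192.168.1.1)"`.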

### Cloud-Init
```yaml
ci_user: debian                       # Default user
ci_password: "SecurePass123"          # Use Vault in production!
ssh_key_path: "~/.ssh/id_rsa.pub"     # SSH public key path
timezone: "Europe/Berlin"             # Timezone
packages:
  - qemu-guest-agent
  - curl
  - htop
```

### Advanced Options
```yaml
enable_tpm: false                 # UEFI + TPM 2.0
gpu_passthrough: false            # PCI GPU passthrough
virtio_gpu: false                 # VirtIO GPU
resize_disk: true                 # Auto-resize disk
resize_size: "16G"                # Target disk size
make_template: true               # Convert to template
create_clones: true               # Deploy clones
```

### Clone Definition
```yaml
clones:
  - id: 301
    hostname: app01
    ip: "192.168.1.81/24"
    gateway: "192.168.1.1"
    full: 1                       # 1=full, 0=linked
  - id: 302
    hostname: app02
    ip: "192.168.1.82/24"
    gateway: "192.168.1.1"
    full: 0                       # Linked clones are faster
```
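
To make the idea of per-clone error isolation concrete, here is a minimal shell sketch of a loop that clones a template once per entry and keeps going when one entry fails. It is not the role's actual implementation: `deploy_clones`, the `QM` override variable, and the colon-separated entry format are assumptions made for illustration. `qm clone` (with `--name` and `--full`) and `qm set --ipconfig0` are standard Proxmox commands.

```bash
#!/usr/bin/env bash
# Hypothetical sketch: clone a template once per entry and continue on failure.
# Each entry is "id:hostname:ip_cidr:gateway:full", mirroring the YAML keys above
# (IPv4 only, since the fields are split on colons).
QM="${QM:-qm}"   # can be pointed at a stub when no Proxmox host is available

deploy_clones() {
  local template_id="$1"; shift
  local entry id name ip gw full failed=0
  for entry in "$@"; do
    IFS=: read -r id name ip gw full <<EOF
$entry
EOF
    if "$QM" clone "$template_id" "$id" --name "$name" --full "$full" \
       && "$QM" set "$id" --ipconfig0 "ip=${ip},gw=${gw}"; then
      echo "ok: $name ($id)"
    else
      # Record the failure, then move on to the remaining clones.
      echo "failed: $name ($id)" >&2
      failed=$((failed + 1))
    fi
  done
  # Succeed only if every clone succeeded.
  [ "$failed" -eq 0 ]
}
```

For the two clones defined above, a call might look like `deploy_clones 150 "301:app01:192.168.1.81/24:192.168.1.1:1" "302:app02:192.168.1.82/24:192.168.1.1:0"`.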

See `defaults/main.yml` for all options with detailed documentation.

## Usage

### Include in Playbook
```yaml
- hosts: proxmox_host
  become: true
  roles:
    - ansible_proxmox_VM    # must match the role directory name
```

### Run Directly
```bash
ansible-playbook tasks/main.yml -i inventory
```

### Using Tags (Run Specific Stages)
```bash
# Pre-flight checks only
ansible-playbook tasks/main.yml --tags preflight -vvv

# Create VM and template (skip clones)
ansible-playbook tasks/main.yml --skip-tags clones

# Add clones to existing template
ansible-playbook tasks/main.yml --tags clones

# Skip image re-download
ansible-playbook tasks/main.yml --skip-tags image
```

## Playbook Stages (6 Stages)

| Stage | Task | Purpose | Idempotent |
|-------|------|---------|-----------|
| 1 | `preflight-checks.yml` | Validate environment (20+ checks) | ✅ Yes |
| 2 | `download-image.yml` | Download/cache Debian image | ✅ Yes |
| 3 | `create-vm.yml` | Create base VM | ✅ Yes |
| 4 | `configure-vm.yml` | Configure disk, network, Cloud-Init | ✅ Yes |
| 5 | `create-template.yml` | Convert to template | ✅ Yes (FIXED!) |
| 6 | `create-clones.yml` | Deploy clones from template | ✅ Yes |
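
Stage 2 is idempotent because the image is downloaded once and then reused. A minimal sketch of that download-once pattern in shell follows. The function name `fetch_image` and the use of `curl` are assumptions for illustration; the role may use a different mechanism, such as an Ansible module.

```bash
#!/usr/bin/env bash
# Hypothetical sketch of download-once caching for the cloud image.
fetch_image() {
  local url="$1" dest="$2"
  # A non-empty file at the destination counts as a cache hit.
  if [ -s "$dest" ]; then
    echo "cached: $dest"
    return 0
  fi
  # Download under a temporary name so an interrupted transfer never looks cached.
  if curl -fsSL --retry 3 --retry-delay 5 -o "${dest}.part" "$url"; then
    mv "${dest}.part" "$dest"
    echo "downloaded: $dest"
  else
    rm -f "${dest}.part"
    return 1
  fi
}
```

Writing to a `.part` file first is a design choice: it keeps a partial download from being mistaken for a valid cached image on the next run.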

## Key Improvements

### ✅ Error Handling
- Automatic retry with configurable delays (3x, 5-sec)
- Context-aware error messages
- Per-clone error isolation (doesn't cascade)
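
The retry behaviour described above (three attempts, five seconds apart) can be sketched as a small shell helper. The name `retry` and its argument order are assumptions for illustration. In Ansible itself the same pattern is usually written with the task keywords `retries`, `delay`, and `until`.

```bash
#!/usr/bin/env bash
# Hypothetical helper: run a command up to N times with a fixed delay between tries.
retry() {
  local attempts="$1" delay="$2"; shift 2
  local n=1
  while true; do
    if "$@"; then
      return 0
    fi
    if [ "$n" -ge "$attempts" ]; then
      echo "giving up after $n attempts: $*" >&2
      return 1
    fi
    echo "attempt $n failed, retrying in ${delay}s: $*" >&2
    sleep "$delay"
    n=$((n + 1))
  done
}
```

For example, `retry 3 5 qm start 150` would try to start VM 150 up to three times, waiting five seconds between attempts.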

### ✅ Idempotency
- Safe to re-run multiple times
- Skips already-completed operations
- Image cached and reused
- **Template conversion idempotent** (was broken in v1!)
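
One way to make template conversion safe to re-run is to check the VM's configuration before calling `qm template`. The sketch below assumes that `qm config` prints a `template: 1` line for VMs that are already templates. The function names are hypothetical, and the role's actual check may differ.

```bash
#!/usr/bin/env bash
# Hypothetical guard: decide from `qm config` output whether a VM is already a template.
is_template() {
  # Reads the config text on stdin; assumes templates carry a "template: 1" line.
  grep -Eq '^template:[[:space:]]*1[[:space:]]*$'
}

convert_to_template() {
  local vmid="$1"
  if qm config "$vmid" | is_template; then
    echo "VM $vmid is already a template, skipping"
  else
    qm template "$vmid"
  fi
}
```

Calling `convert_to_template 150` a second time would then print the skip message and leave the template untouched.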

### ✅ Pre-flight Validation
- Proxmox connectivity & permissions
- Storage pool availability
- SSH key readiness
- IP address format validation
- VM ID uniqueness checks
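
Two of these checks, IP address format and VM ID range, need no Proxmox access and can be sketched in plain shell. The function names below are hypothetical and only illustrate the kind of validation involved. A uniqueness check would also need to query Proxmox (for example via `qm list`) and is omitted here.

```bash
#!/usr/bin/env bash
# Hypothetical pre-flight helpers: validate an IPv4 CIDR string and a VM ID.
valid_ipv4_cidr() {
  local value="$1" addr prefix octet
  # Shape check: four dot-separated numbers, a slash, and a prefix length.
  echo "$value" | grep -Eq '^([0-9]{1,3}\.){3}[0-9]{1,3}/[0-9]{1,2}$' || return 1
  addr="${value%/*}"
  prefix="${value#*/}"
  [ "$prefix" -le 32 ] || return 1
  # Range check: every octet must be 0-255.
  for octet in $(echo "$addr" | tr '.' ' '); do
    [ "$octet" -le 255 ] || return 1
  done
}

valid_vm_id() {
  # Proxmox VM IDs start at 100.
  echo "$1" | grep -Eq '^[0-9]+$' && [ "$1" -ge 100 ]
}
```

With the defaults shown earlier, `valid_ipv4_cidr "192.168.1.60/24"` and `valid_vm_id 150` both succeed, while a value such as `192.168.1.300/24` is rejected.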

### ✅ Advanced Features
- UEFI/TPM 2.0 support
- GPU passthrough (PCI or VirtIO)
- Automatic disk resize
- Cloud-Init with user/password/SSH
- DHCP or static networking
- Multi-clone deployment

## Testing & Validation

### Preflight Checks
```bash
ansible-playbook tasks/main.yml --tags preflight -vvv
```

### Dry Run (Preview)
```bash
ansible-playbook tasks/main.yml --check -vv
```

### Test Idempotency
```bash
# First run
ansible-playbook tasks/main.yml -vv

# Second run (should be much faster)
ansible-playbook tasks/main.yml -vv
```

## Cloud-Init Templates

### `cloudinit_userdata.yaml.j2`
Configures:
- User creation with sudo access
- SSH key injection
- Password authentication
- Timezone setting
- Package updates

### `cloudinit_vendor.yaml.j2`
Configures:
- Package installation
- DNS settings (optional)

## Security Notes

⚠️ **Passwords**: Use Ansible Vault in production:
```bash
ansible-vault create group_vars/proxmox/vault.yml
```
Then reference: `ci_password: "{{ vault_ci_password }}"`

- ✅ **SSH Keys**: Automatically validated before use
- ✅ **Permissions**: Checks if user can run `qm` commands
- ✅ **No Hardcoded Secrets**: All in variables

## Best Practices

1. Always run with `--check` first
2. Validate environment with `--tags preflight`
3. Skip image re-download with `--skip-tags image`
4. Monitor Cloud-Init: `cloud-init status` inside VM
5. Test in dev environment first
6. Use linked clones (`full: 0`) for faster deployments
7. Enable Proxmox snippets storage

## Performance

- **First run**: ~5-10 minutes (downloads image, creates VM)
- **Re-runs**: ~30 seconds (operations skipped)
- **Linked clones**: Much faster than full clones

## Troubleshooting

### Preflight validation fails
```bash
ansible-playbook tasks/main.yml --tags preflight -vvv
```

### Cloud-Init not applying
```bash
# Inside VM:
cloud-init status --long
sudo cat /var/log/cloud-init.log
sudo cat /var/log/cloud-init-output.log

# Check snippets (on the Proxmox host):
ls -la /var/lib/vz/snippets/
```

### SSH key issues
```bash
# Verify SSH key
ls -la ~/.ssh/id_rsa.pub

# Run with verbose
ansible-playbook tasks/main.yml -vvv
```

## Common Proxmox Commands

```bash
# List all VMs
qm list

# Check VM status
qm status 150

# View VM config
qm config 150

# Connect to console
qm terminal 150

# SSH into VM
ssh debian@<vm-ip>

# Check Cloud-Init (inside the VM)
cloud-init status --long
```

## Compatibility

- **Proxmox**: 7.x, 8.x (uses `qm` CLI)
- **Debian**: Bookworm GenericCloud (configurable)
- **Ansible**: 2.9+ (standard modules)
- **Backward Compatible**: 100% ✅

## Support

Refer to:
- `defaults/main.yml` - Complete variable documentation
- Task files - Inline comments explaining implementation
- Run with `-vvv` flag for debug output
- Check `/var/lib/vz/snippets/` for Cloud-Init files

## License

Open source - use as-is for Proxmox automation.