# IMPROVEMENTS GUIDE: Ansible Proxmox VM Role

## Summary of Changes

This document outlines the improvements made to your Ansible role to make it more robust, easier to maintain, and closer to Ansible best practices.

### What Was Improved

1. **Task Modularization** - Split the monolithic task file into 6 logical stages
2. **Error Handling** - Added `block`/`rescue` sections with recovery strategies
3. **Idempotency** - Ensured all operations are safe to re-run
4. **Pre-flight Validation** - Comprehensive environment checks before execution
5. **Documentation** - Extensive inline comments and variable documentation
6. **Logging** - Rich task names and debug output for troubleshooting

---

## File Structure

### New/Modified Files

```
tasks/
├─ main.yml               # REFACTORED: Now orchestrates subtasks
├─ preflight-checks.yml   # NEW: Environment validation
├─ download-image.yml     # IMPROVED: Better error handling & caching
├─ create-vm.yml          # IMPROVED: Idempotent VM creation
├─ configure-vm.yml       # IMPROVED: Disk, Cloud-Init, TPM, GPU with error handling
├─ create-template.yml    # IMPROVED: Idempotent template conversion
├─ create-clones.yml      # IMPROVED: Clone creation with validation
└─ helpers.yml            # NEW: Utility tasks for common operations

defaults/
└─ main.yml               # IMPROVED: Complete documentation & new options

templates/
├─ cloudinit_userdata.yaml.j2   # No changes
└─ cloudinit_vendor.yaml.j2     # No changes
```

---

## 1. TASK MODULARIZATION

### Before

All tasks were in a single `main.yml` file (150+ lines), making it:
- Difficult to debug
- Hard to extend
- Not reusable

### After

Each stage has its own file:

| File | Purpose | Key Features |
|------|---------|--------------|
| `preflight-checks.yml` | Validate environment | Checks Proxmox, storage, SSH keys, IPs |
| `download-image.yml` | Get Debian image | Caching, retry logic, size verification |
| `create-vm.yml` | Create VM | Idempotent, error handling |
| `configure-vm.yml` | Configure VM | Disk, Cloud-Init, TPM, GPU all in one |
| `create-template.yml` | Make template | Skip if already templated |
| `create-clones.yml` | Deploy clones | Loop through clone list with validation |
| `helpers.yml` | Utilities | Reusable helper tasks |

### Running Specific Stages

```bash
# Run only pre-flight checks
ansible-playbook tasks/main.yml --tags preflight

# Run everything except template/clone
ansible-playbook tasks/main.yml --skip-tags template,clones

# Run only clone creation
ansible-playbook tasks/main.yml --tags clones

# Run image download and VM creation only
ansible-playbook tasks/main.yml --tags image,vm
```
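
The tag-based commands above only work if each stage carries a tag where `main.yml` pulls it in. The following is a minimal sketch of such an orchestrator, not the role's actual file. It assumes static imports so that the tag is inherited by every task in the imported file. The tag names `preflight`, `image`, `vm`, `template`, and `clones` are taken from the commands above; `configure` is an assumed name for the configuration stage.

```yaml
# tasks/main.yml (sketch): one static import per stage, each with its own tag.
# import_tasks is resolved at parse time, so the tag applies to every task
# inside the imported file.
- name: "[PREFLIGHT] Validate environment"
  ansible.builtin.import_tasks: preflight-checks.yml
  tags: [preflight]

- name: "[IMAGE] Download image"
  ansible.builtin.import_tasks: download-image.yml
  tags: [image]

- name: "[VM] Create base VM"
  ansible.builtin.import_tasks: create-vm.yml
  tags: [vm]

- name: "[CONFIG] Configure VM"
  ansible.builtin.import_tasks: configure-vm.yml
  tags: [configure]   # assumed tag name

- name: "[TEMPLATE] Convert to template"
  ansible.builtin.import_tasks: create-template.yml
  tags: [template]

- name: "[CLONES] Create clones"
  ansible.builtin.import_tasks: create-clones.yml
  tags: [clones]
```

With `include_tasks` instead, a tag on the include statement would not reach the included tasks unless it were passed through `apply`, which is why the sketch uses `import_tasks`.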

---

## 2. ERROR HANDLING

### Before

- Minimal error checking
- Tasks would fail silently or with generic errors
- No recovery paths

### After

Each major operation has:

**Block/Rescue Structure**
```yaml
- block:
    - name: "[CONFIG] Try to import disk"
      command: qm importdisk ...

  rescue:
    - name: "[CONFIG] Handle import failure"
      fail:
        msg: "Clear error message with context"
```

**Retry Logic**
```yaml
# Task-level keywords; attach them to any task that can fail transiently
register: result
retries: 3
delay: 5
until: result is succeeded
```
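
For context, the retry keywords belong on an ordinary task. The sketch below shows them on an image download using `ansible.builtin.get_url`. It is an illustration under assumptions: `image_url` and `image_dest` are hypothetical variable names, while `max_retries`, `retry_delay`, and `image_download_timeout` are the defaults listed in section 5.

```yaml
- name: "[IMAGE] Download cloud image (with retries)"
  ansible.builtin.get_url:
    url: "{{ image_url }}"      # hypothetical variable
    dest: "{{ image_dest }}"    # hypothetical variable, full path to the image file
    mode: "0644"
    timeout: "{{ image_download_timeout }}"
  register: image_download
  retries: "{{ max_retries }}"
  delay: "{{ retry_delay }}"
  until: image_download is succeeded
```

Because `dest` names a file and `force` is left at its default, `get_url` downloads only when that file is missing, which is one way to get the caching behavior described for `download-image.yml`.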

**Validation Checks**
```yaml
- name: "[VM] Verify VM was created"
  stat:
    path: "/etc/pve/qemu-server/{{ vm_id }}.conf"
  register: vm_verify
  failed_when: not vm_verify.stat.exists
```

### Error Messages Include

- What went wrong
- Which VM/resource was affected
- Next steps to fix
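
A `rescue` section can build such a message from the variables Ansible sets when a block fails, `ansible_failed_task` and `ansible_failed_result`. The sketch below is one possible shape, not the role's actual code. The `image_dest` and `storage` variables are hypothetical, and the `qm importdisk` arguments follow the `<vmid> <source> <storage>` form.

```yaml
- name: "[CONFIG] Import disk with a contextual error"
  block:
    - name: "[CONFIG] Try to import disk"
      ansible.builtin.command: qm importdisk {{ vm_id }} {{ image_dest }} {{ storage }}

  rescue:
    - name: "[CONFIG] Report what failed, where, and what to do next"
      ansible.builtin.fail:
        msg: >-
          Task '{{ ansible_failed_task.name }}' failed for VM {{ vm_id }}:
          {{ ansible_failed_result.stderr
             | default(ansible_failed_result.msg | default('no details')) }}.
          Next step: confirm that storage '{{ storage }}' exists
          and that {{ image_dest }} is present on the host.
```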

---

## 3. IDEMPOTENCY

### Before

- Running the playbook twice would fail or cause issues
- Template conversion would fail if already templated
- No checks for existing resources

### After

All operations are idempotent:

**Check Before Action**
```yaml
- name: "Check if VM already exists"
  stat:
    path: "/etc/pve/qemu-server/{{ vm_id }}.conf"
  register: vm_conf

- name: "Create VM"
  command: qm create ...
  when: not vm_conf.stat.exists
```

**Safe Re-runs**
- Already-created VMs are skipped
- Already-converted templates are skipped
- Already-deployed clones are skipped
- Image is cached and reused

**Result**: The playbook can be re-run any number of times without recreating or modifying resources that already exist.
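
An alternative to the explicit `stat` check is the `creates` argument of `ansible.builtin.command`, which skips the command when the named path already exists. The sketch below assumes the VM's config file is a reliable marker that the VM exists. The `vm_name`, `vm_memory`, and `vm_cores` variables are illustrative names, not necessarily the role's.

```yaml
- name: "[VM] Create base VM (skipped when its config file exists)"
  ansible.builtin.command:
    cmd: >-
      qm create {{ vm_id }}
      --name {{ vm_name }}
      --memory {{ vm_memory }}
      --cores {{ vm_cores }}
    # The command is not run if this path is already present
    creates: "/etc/pve/qemu-server/{{ vm_id }}.conf"
```

This folds the check and the action into one task. The two-task form shown earlier is still useful when later tasks need the registered `vm_conf` result.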

---

## 4. PRE-FLIGHT CHECKS

### New `preflight-checks.yml`

Validates before starting:

- ✓ Proxmox is installed (`qm` command exists)
- ✓ User can run Proxmox commands (permissions)
- ✓ Storage pool exists and is accessible
- ✓ SSH key file exists and is readable
- ✓ VM IDs are unique (warns if conflict)
- ✓ Clone IDs are unique (warns if conflict)
- ✓ IP addresses are valid format
- ✓ Gateway and DNS are valid IPs
- ✓ Snippets directory exists

### Sample Output

```
[PREFLIGHT] Check if running on Proxmox host ... ok
[PREFLIGHT] Verify qm command is available ... ok
[PREFLIGHT] Check if user can run qm commands ... ok
[PREFLIGHT] Verify storage pool exists ... ok
[PREFLIGHT] Summary - All checks passed
```
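
A few of the checks in this section could be sketched as follows. This is an illustration, not the contents of `preflight-checks.yml`. It assumes a `storage` variable holding the pool name, that `pvesm status --storage` exits non-zero for an unknown pool, and that the `ansible.utils` collection (with the `netaddr` Python library) is installed for the `ipaddr` filter.

```yaml
- name: "[PREFLIGHT] Verify qm command is available"
  ansible.builtin.command: which qm
  changed_when: false

- name: "[PREFLIGHT] Verify storage pool exists"
  ansible.builtin.command: pvesm status --storage {{ storage }}
  changed_when: false

- name: "[PREFLIGHT] Verify clone IDs are unique"
  ansible.builtin.assert:
    that:
      - (clones | map(attribute='id') | list | unique | length) == (clones | length)
    fail_msg: "Duplicate IDs found in the clones list"

- name: "[PREFLIGHT] Validate each clone's IP and gateway"
  ansible.builtin.assert:
    that:
      # ipaddr returns the address when valid and false otherwise
      - (clone.ip | ansible.utils.ipaddr) is truthy
      - (clone.gateway | ansible.utils.ipaddr) is truthy
    fail_msg: "Clone {{ clone.id }} has an invalid IP or gateway"
  loop: "{{ clones }}"
  loop_control:
    loop_var: clone
```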

---

## 5. IMPROVED DEFAULTS

### New Variables in `defaults/main.yml`

```yaml
# Retry settings
max_retries: 3
retry_delay: 5

# Timeout settings (seconds)
image_download_timeout: 300
vm_boot_timeout: 60
cloud_init_timeout: 120

# Debug mode
debug_mode: false
```

### Better Documentation

Each variable has:
- Purpose explanation
- Valid values
- Examples
- Security warnings
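
As an illustration of that comment layout, a documented entry might look like the sketch below. The wording of the comments is an assumption; `ci_password` and `vault_debian_password` are the names used in the Security Improvements section.

```yaml
# ci_password
#   Purpose:  password for the Cloud-Init user created inside each VM
#   Valid:    any non-empty string
#   Example:  "{{ vault_debian_password }}"
#   SECURITY: never commit a plaintext value; reference an Ansible Vault variable
ci_password: "{{ vault_debian_password }}"
```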

---

## 6. IDEMPOTENT TEMPLATE CONVERSION

### Before
```yaml
- name: Convert VM to template
  command: qm template {{ vm_id }}
  args:
    creates: "/etc/pve/qemu-server/{{ vm_id }}.conf.lock"
```
❌ The `.lock` file is never created, so the `creates` guard never triggers and the task runs every time

### After
```yaml
- name: "[TEMPLATE] Check if VM is already a template"
  shell: "qm config {{ vm_id }} | grep -q 'template: 1'"
  register: is_template
  failed_when: false
  changed_when: false   # read-only check, never reports a change

- name: "[TEMPLATE] Convert VM to template"
  command: "qm template {{ vm_id }}"
  when: is_template.rc != 0
```
✅ Checks actual template status; skips if already templated

---

## 7. BETTER CLOUD-INIT HANDLING

### Before
- Snippets not validated
- SSH key lookup could fail silently

### After
```yaml
- name: "[CONFIG] Verify SSH key is readable"
  stat:
    path: "{{ ssh_key_path | expanduser }}"
  register: ssh_key_stat
  # `readable` is only defined when the file exists, so test `exists` first
  failed_when: not (ssh_key_stat.stat.exists and ssh_key_stat.stat.readable)

- name: "[CONFIG] Copy SSH public key to snippets"
  copy:
    src: "{{ ssh_key_path | expanduser }}"
    dest: "/var/lib/vz/snippets/{{ vm_id }}-sshkey.pub"
```

- ✓ Validates before use
- ✓ Proper error messages if missing
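
Once the snippets directory is validated, a user-data snippet can be rendered and attached to the VM. The sketch below is one way to do that, under assumptions: it uses the role's `cloudinit_userdata.yaml.j2` template, an illustrative output file name, and the storage ID `local` for `/var/lib/vz` with the snippets content type enabled.

```yaml
- name: "[CONFIG] Render Cloud-Init user-data snippet"
  ansible.builtin.template:
    src: cloudinit_userdata.yaml.j2
    dest: "/var/lib/vz/snippets/{{ vm_id }}-user.yaml"   # illustrative file name
    mode: "0644"

- name: "[CONFIG] Attach the snippet to the VM"
  # `local` is assumed to be the storage ID that maps to /var/lib/vz
  ansible.builtin.command: >-
    qm set {{ vm_id }}
    --cicustom "user=local:snippets/{{ vm_id }}-user.yaml"
```

`qm set` reapplies the same value on every run, so this task reports a change each time even though the resulting configuration is the same.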

---

## 8. HELPER FUNCTIONS

### New `helpers.yml`

Reusable utility tasks:

| Helper | Function |
|--------|----------|
| `check_vm_exists` | Check if VM exists |
| `check_template` | Check if VM is template |
| `check_vm_status` | Get VM running status |
| `check_storage` | Check storage space |
| `validate_vm_id` | Validate VM ID format |
| `get_vm_info` | Read VM configuration |
| `list_vms` | List all VMs |
| `cleanup_snippets` | Remove old Cloud-Init snippets |

### Usage Example

```yaml
- name: "Verify VM exists"
  include_tasks: helpers.yml
  vars:
    helper_task: check_vm_exists
    target_vm_id: "{{ vm_id }}"

- name: "Print result"
  debug:
    msg: "VM exists: {{ vm_exists }}"
```
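
Inside `helpers.yml`, the `helper_task` variable has to select which helper runs. A minimal sketch of that dispatch follows, assuming one conditional block per helper. The registered names `helper_vm_conf` and `helper_vm_config` and the fact `vm_is_template` are hypothetical; `vm_exists` matches the usage example above.

```yaml
# tasks/helpers.yml (sketch): only the block matching helper_task runs
- name: "[HELPER] check_vm_exists"
  when: helper_task == 'check_vm_exists'
  block:
    - name: "[HELPER] Stat the VM config file"
      ansible.builtin.stat:
        path: "/etc/pve/qemu-server/{{ target_vm_id }}.conf"
      register: helper_vm_conf

    - name: "[HELPER] Expose the result as vm_exists"
      ansible.builtin.set_fact:
        vm_exists: "{{ helper_vm_conf.stat.exists }}"

- name: "[HELPER] check_template"
  when: helper_task == 'check_template'
  block:
    - name: "[HELPER] Read the VM configuration"
      ansible.builtin.command: qm config {{ target_vm_id }}
      register: helper_vm_config
      changed_when: false

    - name: "[HELPER] Expose the result as vm_is_template"
      ansible.builtin.set_fact:
        vm_is_template: "{{ 'template: 1' in helper_vm_config.stdout }}"
```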

---

## 9. IMPROVED CLONE CREATION

### Before
- No validation of clone IDs
- No error handling per clone
- All-or-nothing approach

### After
```yaml
# create-clones.yml: a `block` cannot be looped directly, so each clone is
# handled by an included task file (the file name here is illustrative)
- name: "[CLONES] Process each clone"
  include_tasks: clone-single.yml
  loop: "{{ clones }}"
  loop_control:
    loop_var: clone

# clone-single.yml: runs once per clone
- block:
    - name: "[CLONES] Check if clone already exists"
      stat:
        path: "/etc/pve/qemu-server/{{ clone.id }}.conf"
      register: clone_conf

    - name: "[CLONES] Clone VM"
      command: qm clone {{ vm_id }} {{ clone.id }}
      when: not clone_conf.stat.exists

  rescue:
    - name: "[CLONES] Handle error for this clone"
      debug:
        msg: "WARNING: Clone {{ clone.id }} failed, continuing with next..."
```

- ✓ Each clone is independent
- ✓ One failed clone doesn't stop others
- ✓ Clear logging of what succeeded/failed
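
Each entry in `clones` also carries a hostname, IP, and gateway (see the Clone Only usage example). The sketch below shows how those fields could be applied per clone. It is an assumption about how the role uses them, relying on the `--name` option of `qm clone` and the `--ipconfig0` option of `qm set`.

```yaml
- name: "[CLONES] Clone VM with its hostname"
  ansible.builtin.command: qm clone {{ vm_id }} {{ clone.id }} --name {{ clone.hostname }}
  args:
    # Skip when the clone's config file already exists
    creates: "/etc/pve/qemu-server/{{ clone.id }}.conf"

- name: "[CLONES] Apply Cloud-Init network settings"
  ansible.builtin.command: >-
    qm set {{ clone.id }}
    --ipconfig0 ip={{ clone.ip }},gw={{ clone.gateway }}
```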

---

## 10. RICH LOGGING AND PROGRESS

### Task Naming Convention

```
[STAGE] Action: description
├─ [PREFLIGHT] Check if running on Proxmox
├─ [IMAGE] Download Debian GenericCloud
├─ [VM] Create base VM
├─ [CONFIG] Configure disk
├─ [TEMPLATE] Convert to template
└─ [CLONES] Create clone 301
```

### Progress Display

**Start**
```
╔════════════════════════════════════════════════════════════╗
║  Proxmox VM Template & Clone Manager                       ║
║  Template VM: debian-template-base (ID: 150)               ║
║  Storage: local-lvm                                        ║
║  CPU: 4 cores | RAM: 4096MB                                ║
╚════════════════════════════════════════════════════════════╝
```

**End**
```
╔════════════════════════════════════════════════════════════╗
║  ✓ Playbook execution completed                            ║
║  Template VM: debian-template-base (ID: 150)               ║
║  ✓ Converted to template                                   ║
║  ✓ 2 clone(s) created                                      ║
║  Next steps:                                               ║
║    - Verify VMs: qm list                                   ║
║    - Connect: ssh debian@<vm-ip>                           ║
║    - Check Cloud-Init: cloud-init status                   ║
╚════════════════════════════════════════════════════════════╝
```
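
Output of this kind can be produced with `ansible.builtin.debug`. The sketch below prints the start summary as a list of lines and gates extra output on `debug_mode`. It omits the box drawing, and `vm_name`, `storage`, `vm_cores`, and `vm_memory` are assumed variable names.

```yaml
- name: "[MAIN] Show start summary"
  ansible.builtin.debug:
    msg:
      - "Proxmox VM Template & Clone Manager"
      - "Template VM: {{ vm_name }} (ID: {{ vm_id }})"
      - "Storage: {{ storage }}"
      - "CPU: {{ vm_cores }} cores | RAM: {{ vm_memory }}MB"

- name: "[MAIN] Dump clone definitions when debug_mode is enabled"
  ansible.builtin.debug:
    var: clones
  when: debug_mode | bool
```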

---

## Usage Examples

### 1. Full Deployment

```bash
ansible-playbook tasks/main.yml -i inventory
```

Runs all stages: preflight → image → VM → configure → template → clones

### 2. Re-run Safely (Idempotent)

```bash
ansible-playbook tasks/main.yml -i inventory
```

A second run skips operations that have already completed.

### 3. Template Only

To update the template without re-downloading the image:

```bash
ansible-playbook tasks/main.yml \
  -i inventory \
  --skip-tags image,vm,clones
```

### 4. Clone Only

After the template is created, add new clones:

```yaml
# Update defaults/main.yml
clones:
  - id: 303
    hostname: app03
    ip: "192.168.1.83/24"
    gateway: "192.168.1.1"
```

Then run:
```bash
ansible-playbook tasks/main.yml \
  -i inventory \
  --tags clones
```

### 5. Debug Output

```bash
ansible-playbook tasks/main.yml \
  -i inventory \
  -vvv
```

Shows all task details, command output, and variable values.
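
If the role is applied from a playbook rather than by passing the task file directly, a minimal wrapper could look like the sketch below. The file name `site.yml` is hypothetical, the host group `proxmox` is inferred from the `group_vars/proxmox/` path used for the vault file, and the role name is taken from the `ansible_proxmox_VM` directory in the migration steps.

```yaml
# site.yml (hypothetical file name)
- name: Build Proxmox template and clones
  hosts: proxmox            # assumed inventory group
  become: true
  roles:
    - role: ansible_proxmox_VM
```

The `--tags` and `--skip-tags` options shown above select tasks inside the role in the same way when this playbook is run instead.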

---

## Migration from Old Version

### Step 1: Backup

```bash
cp -r ansible_proxmox_VM ansible_proxmox_VM.backup
```

### Step 2: Replace Files

Use the new versions:
- `tasks/main.yml` → orchestrator
- All `tasks/*.yml` files → new implementations
- `defaults/main.yml` → improved defaults

### Step 3: Test with Dry-Run

```bash
ansible-playbook tasks/main.yml \
  -i inventory \
  --check
```

Shows what would happen without making changes. Tasks that use `command` or `shell` are skipped in check mode, so the `qm` calls themselves are not exercised.

### Step 4: Run Normally

```bash
ansible-playbook tasks/main.yml -i inventory
```

---

## Best Practices Going Forward

1. **Always use tags** for partial execution
2. **Run preflight checks** before major changes
3. **Test with `--check`** before production
4. **Use `--skip-tags`** to avoid re-downloading images
5. **Monitor Cloud-Init** inside VMs: `cloud-init status`
6. **Keep backups** of `.orig` files (already present)
7. **Review error messages** carefully for context

---

## Security Improvements

### Password Management
```yaml
# OLD
ci_password: "SecurePass123"

# NEW - Use Vault
ci_password: "{{ vault_debian_password }}"
```

Create the vault file:
```bash
ansible-vault create group_vars/proxmox/vault.yml
```

Add:
```yaml
vault_debian_password: "YourSecurePassword"
```

### SSH Key Validation
- Before: SSH key could be missing → confusing error
- After: Validates that the key exists and is readable
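
Keeping the password in Vault protects it at rest, but it can still appear in task output at run time. A sketch of a task that sets the Cloud-Init credentials while suppressing that output is shown below. The user name `debian` is taken from the `ssh debian@<vm-ip>` hint in the progress display; whether the role sets the password this way is an assumption.

```yaml
- name: "[CONFIG] Set Cloud-Init user and password"
  ansible.builtin.command: >-
    qm set {{ vm_id }}
    --ciuser debian
    --cipassword {{ ci_password | quote }}
  no_log: true   # keeps the password out of logs and -vvv output
```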

---

## Troubleshooting

### Problem: Playbook fails at preflight
**Solution**: Run preflight checks manually to see what's missing
```bash
ansible-playbook tasks/main.yml -i inventory --tags preflight -vvv
```

### Problem: VM already exists, need to recreate
**Solution**: Delete the old VM first, replacing `<vm_id>` with the VM's ID
```bash
qm destroy <vm_id>
```

Then re-run the playbook; because it is idempotent, only the missing VM is recreated.

### Problem: Clone creation fails
**Solution**: Check clone configuration and IDs
```bash
qm list  # See all VMs
```

Ensure clone IDs don't conflict with existing VMs.

### Problem: Cloud-Init not applying
**Solution**: Check that the snippets directory exists
```bash
ls -la /var/lib/vz/snippets/
```

Verify permissions are correct (644 for YAML files).
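
The same check can be made part of the role so the directory is created when it is missing. This is a minimal sketch; the directory mode `0755` is an assumption, and the path is the one used throughout this guide.

```yaml
- name: "[CONFIG] Ensure snippets directory exists"
  ansible.builtin.file:
    path: /var/lib/vz/snippets
    state: directory
    mode: "0755"   # assumed directory mode

- name: "[CONFIG] List storages that accept snippets"
  ansible.builtin.command: pvesm status --content snippets
  register: snippet_storages
  changed_when: false

- name: "[CONFIG] Show snippet-capable storages"
  ansible.builtin.debug:
    var: snippet_storages.stdout_lines
```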

---

## Next Steps

Consider these additional improvements:

1. **Molecule Testing** - Add automated tests
2. **Vault Integration** - Secure password management
3. **Role Packaging** - Create Ansible Galaxy package
4. **Custom Filters** - For more complex logic
5. **Notification** - Send completion alerts (Slack, email)
6. **Metrics** - Track VM creation time, resource usage
7. **Cleanup Role** - Destroy VMs and templates
8. **Backup/Restore** - Template and clone backup

---

## Questions?

Refer to the inline comments in each task file for specifics; every task file is extensively documented.