WinRM Molecule Forkbomb Mechanism¶
The Problem¶
When Ansible runs molecule tests against Windows hosts using WinRM, parallel execution creates a cascade of connection failures that can lock out Active Directory accounts.
Root Cause Chain¶
Ansible forks=N
× tasks per role (~15-50)
× concurrent molecule scenarios
= Total WinRM shell creation attempts
Layer 1: Ansible Forks¶
Ansible's forks setting (default: 5) controls how many hosts are managed simultaneously.
Each fork creates an independent WinRM connection with its own authentication.
Layer 2: pywinrm Connection Model¶
The pywinrm library (Ansible's default WinRM transport) creates a new WinRM shell
for every task. There is no connection pooling or session reuse across tasks.
Task 1: New HTTP connection → NTLM auth → WinRM shell → execute → close shell
Task 2: New HTTP connection → NTLM auth → WinRM shell → execute → close shell
Task 3: New HTTP connection → NTLM auth → WinRM shell → execute → close shell
...
This is documented in pywinrm#277.
Layer 3: WinRM Quota Limits¶
Windows enforces quotas on WinRM resources:
| Quota | Default | Description |
|---|---|---|
| MaxShellsPerUser | 30 | Max concurrent shells per user |
| MaxConcurrentUsers | 10 | Max concurrent user sessions |
| MaxProcessesPerShell | 25 | Max processes per shell |
| MaxMemoryPerShellMB | 1024 | Max memory per shell |
| IdleTimeout | 7200000ms | Shell auto-cleanup after 2 hours |
When forks × concurrent_tasks > MaxShellsPerUser, new shell creation fails.
Layer 4: Authentication Failure Cascade¶
When WinRM rejects a shell creation request due to quota exhaustion:
- Ansible receives "credentials rejected" error (misleading error message)
- Ansible retries authentication (default retry behavior)
- Each retry = a failed NTLM auth attempt against Active Directory
- AD lockout threshold (typically 5 failed attempts in 15 minutes)
- Account locks → ALL hosts using that account are locked out
Layer 5: Molecule Amplification¶
When molecule tests run in parallel (e.g., parallel -j4 molecule test):
4 molecule processes × forks=5 × ~15 tasks = 300 shell creation attempts
MaxShellsPerUser = 30 → 270 failures → 270 failed auth attempts
AD lockout threshold = 5 → LOCKED after first 5 failures
The Compounding Effect¶
flowchart TD
A["parallel -j4<br/>molecule test"] --> B["Scenario 1<br/>forks=5"]
A --> C["Scenario 2<br/>forks=5"]
A --> D["Scenario 3<br/>forks=5"]
B --> E
C --> E
D --> E
E["win-target<br/>MaxShellsPerUser=30"]
E --> F{"45 shells attempted<br/>30 max slots"}
F -->|"30 accepted"| G["Successful tasks"]
F -->|"15 overflow"| H["'credentials rejected'<br/>15 failed NTLM attempts"]
H --> I["AD lockout threshold=5<br/>LOCKED"]
style E fill:#f9f,stroke:#333,stroke-width:2px
style H fill:#f66,stroke:#333,stroke-width:2px
style I fill:#900,stroke:#333,stroke-width:2px,color:#fff
The Fix (Two-Part)¶
Part 1: Raise WinRM Quotas (Admin Toggle)¶
The winrm_quota_config role elevates quotas before parallel testing:
Set-Item WSMan:\localhost\Shell\MaxShellsPerUser -Value 100
Set-Item WSMan:\localhost\Service\MaxConcurrentUsers -Value 25
Set-Item WSMan:\localhost\Shell\MaxProcessesPerShell -Value 50
Set-Item WSMan:\localhost\Shell\IdleTimeout -Value 300000
Part 2: Session Cleanup¶
The winrm_session_cleanup role installs a scheduled task that:
- Enumerates active WinRM shells every 15 minutes
- Terminates shells older than 30 minutes
- Prevents stale shell accumulation that erodes quota headroom
Part 3: Use pypsrp Instead of pywinrm¶
The PSRP connection plugin (ansible_connection=psrp) uses PowerShell Remoting Protocol
which maintains persistent connections via a Runspace Pool. Multiple commands execute over
a single authenticated connection, dramatically reducing auth attempts.
See ansible.windows#597.
Upstream Issues¶
- pywinrm#277 - Multi-threaded requests fail
- molecule#607 - WinRM connection plugin issues
- ansible.windows#597 - Intermittent WinRM failures