Good to mention
- AWS region and account ownership model
- Existing Grafana / Prometheus / Loki stack, or greenfield
- Linux fleet size and config audit requirements
- Appetite for GPU capacity to run on-prem-style inference
- Timeline and internal stakeholders (CTO, platform, security)