Leaderboard · Updated May 24, 2026 · 4 min read
Cheapest GPU Instances 2026 (AWS, GCP, Azure)
TL;DR
For inference, AWS g5.xlarge (A10G) is the cheapest per inference token. For training, GCP a2-highgpu-1g (A100) leads on $/training-token. Azure's T4-based NC4as_T4_v3 wins for low-cost batch inference.
Equivalent SKUs · monthly cost
us-east-1 / us-central1 · Linux · on-demand
| Workload | AWS | $/mo | Alternative | $/mo | Winner |
|---|---|---|---|---|---|
| Inference (small batch) | g5.xlarge | $734 | GCP a2-highgpu-1g | $2,145 | AWS −66% |
| Batch inference | g5.xlarge | $734 | Azure NC4as_T4_v3 | $384 | Azure −48% |
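
The Winner percentages can be reproduced from the monthly figures in the table. The sketch below is a minimal check of that arithmetic; the function name `pct_delta` and the row layout are illustrative choices, not part of the original tool.

```python
def pct_delta(cheaper: float, pricier: float) -> int:
    """Percent by which the cheaper option undercuts the pricier one, rounded."""
    return round((cheaper - pricier) / pricier * 100)

# (workload, AWS $/mo, other provider $/mo), taken from the table above
rows = [
    ("Inference (small batch)", 734, 2145),
    ("Batch inference", 734, 384),
]

for workload, aws, other in rows:
    side = "AWS" if aws < other else "other provider"
    delta = pct_delta(min(aws, other), max(aws, other))
    print(f"{workload}: {side} is cheaper by {delta}%")
```

Both rows come out at the values shown in the Winner column (−66% and −48%).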
Best by workload
Cheapest A100 hour: GCP wins
GCP a2-highgpu-1g with A100 40GB is consistently 15–20% cheaper than equivalents elsewhere.
Cheapest T4 hour: Azure wins
Azure NC4as_T4_v3 has the lowest sticker price per T4 hour.
Best $/inference: AWS wins
The A10G in g5.xlarge hits the sweet spot for small-batch inference workloads.
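Cost per inference depends on throughput as well as sticker price, so a rough way to compare instances is to divide monthly cost by the tokens served in a month. The sketch below assumes 730 billable hours per month and full utilization. The throughput value is a hypothetical placeholder, not a benchmark, and `dollars_per_million_tokens` is an illustrative name.

```python
HOURS_PER_MONTH = 730  # assumption: a common billing convention, not stated in the article


def dollars_per_million_tokens(monthly_cost: float, tokens_per_second: float) -> float:
    """Cost of one million tokens, assuming the instance is fully utilized all month."""
    tokens_per_month = tokens_per_second * 3600 * HOURS_PER_MONTH
    return monthly_cost / tokens_per_month * 1_000_000


# 500 tokens/s is a made-up throughput for illustration; measure your own model.
print(f"${dollars_per_million_tokens(734, 500):.4f} per 1M tokens")
```

A pricier instance can still win on this metric if its measured throughput is proportionally higher, so it is worth plugging in real numbers for each candidate SKU.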
Frequently asked
Should I use Spot/Preemptible for GPU workloads?
For training: yes, with checkpointing. Savings are typically around 70%. For real-time inference: no, because an eviction breaks latency SLOs.
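As a rough illustration of the training case, the sketch below applies a 70% discount to the $2,145 on-demand monthly figure for a2-highgpu-1g and then inflates the result for work repeated after evictions. The 10% rework fraction is a hypothetical value chosen for the example; the real overhead depends on checkpoint frequency and eviction rate.

```python
def effective_spot_cost(on_demand: float, discount: float, rework_fraction: float) -> float:
    """Monthly spot cost, inflated for compute redone since the last checkpoint.

    rework_fraction is a hypothetical share of work lost to evictions.
    """
    spot = on_demand * (1 - discount)
    return spot * (1 + rework_fraction)


on_demand = 2145  # a2-highgpu-1g on-demand $/mo, from the comparison table
cost = effective_spot_cost(on_demand, discount=0.70, rework_fraction=0.10)
net_savings = 1 - cost / on_demand
print(f"effective cost: ${cost:,.2f}/mo, net savings: {net_savings:.0%}")
```

Under these assumptions the net savings stay close to the headline discount, which is why frequent checkpointing makes Spot attractive for training.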
What about Lambda Labs, RunPod, and CoreWeave?
Specialized GPU clouds often beat hyperscaler pricing by 30–50% for raw GPU hours. The trade-off is fewer integrations (no managed databases, no IAM, less ops tooling).