Cloud Codex / 04
Identity & IAM.
The one cloud topic where a small mistake has a planet-sized blast radius. A leaked access key, a too-permissive role, an S3 bucket policy with the wrong wildcard — these are the kinds of misconfigurations that show up on the front page of Hacker News. The good news: the mental model is short. There are principals, there are resources, and policies decide what one is allowed to do to the other.
1 · The model
- Principal. Who is doing something. A human user, a service, a federated identity from your IdP.
- Resource. What is being acted on. An S3 bucket, an EC2 instance, a DynamoDB table.
- Action. The verb: `s3:GetObject`, `ec2:StartInstances`, `dynamodb:Query`.
- Policy. A document, usually JSON, that says "this principal may (or may not) perform this action on this resource, optionally under this condition."
- Role. A principal that nobody owns directly. Services assume it; humans assume it. The role's policies decide what the assumer can do for the duration of the session.
Everything else — AssumeRole, STS, federation, instance profiles, service-linked roles, permissions boundaries — is plumbing on top of those five concepts.
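To make the five concepts concrete, here is a minimal Python sketch of a policy statement being matched against a request. It is a toy model, not how any cloud provider implements evaluation: the `Request` and `Statement` classes, the account ID, the role and bucket names, and the glob-style matching via `fnmatch` are all assumptions for illustration (real AWS matching has its own wildcard and condition rules).

```python
from dataclasses import dataclass
from fnmatch import fnmatchcase


@dataclass(frozen=True)
class Request:
    principal: str  # who is acting
    action: str     # the verb, e.g. "s3:GetObject"
    resource: str   # what is being acted on


@dataclass(frozen=True)
class Statement:
    effect: str                  # "Allow" or "Deny"
    principals: tuple[str, ...]  # glob patterns for principals
    actions: tuple[str, ...]     # glob patterns for actions
    resources: tuple[str, ...]   # glob patterns for resources

    def matches(self, req: Request) -> bool:
        """True when this statement applies to the request."""
        return (
            any(fnmatchcase(req.principal, p) for p in self.principals)
            and any(fnmatchcase(req.action, a) for a in self.actions)
            and any(fnmatchcase(req.resource, r) for r in self.resources)
        )


# Hypothetical role and bucket names.
READ_REPORTS = Statement(
    effect="Allow",
    principals=("arn:aws:iam::111122223333:role/report-reader",),
    actions=("s3:GetObject",),
    resources=("arn:aws:s3:::example-reports/*",),
)

read = Request(
    principal="arn:aws:iam::111122223333:role/report-reader",
    action="s3:GetObject",
    resource="arn:aws:s3:::example-reports/2024/q1.csv",
)
delete = Request(read.principal, "s3:DeleteObject", read.resource)

print(READ_REPORTS.matches(read))    # True
print(READ_REPORTS.matches(delete))  # False
```

A role fits this picture as just another principal string: whoever assumes it acts as that principal for the length of the session.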
2 · The AWS canonical version
| Thing | AWS | What it does |
|---|---|---|
| Human identity | IAM Identity Center (formerly SSO) | One front door for humans across all your accounts. Backed by your IdP (Okta, Google Workspace, Entra). |
| Programmatic identity (legacy) | IAM User + access key | Avoid for new use. Static credentials sitting in someone's ~/.aws/credentials are the #1 leaked-credential class. |
| Programmatic identity (modern) | IAM Role + AssumeRole | Temporary credentials minted on demand. Workload assumes a role via the EC2 instance profile / Lambda execution role / IRSA (for EKS pods). |
| Cross-account | AssumeRole across account boundaries | The pattern for "service in account A reads from account B." Trust policy on the target role names account A as the trusted principal. |
| Temporary credentials | STS | The service that mints role credentials. AssumeRole returns a 1-hour session by default; the role's maximum session duration can raise that to as much as 12 hours. |
| Policy types | Identity vs Resource vs SCP vs Permissions Boundary | Four kinds of policy. Identity and resource policies grant access; SCPs and permissions boundaries only cap what can be granted. An explicit deny in any of them wins. |
| Org-wide guardrails | SCPs (Service Control Policies) | Set at the AWS Organizations level. "Nobody in any account can disable CloudTrail" — that kind of rule. |
| Secrets in code | Secrets Manager / Parameter Store | So credentials never live in your AMI or environment variables in plaintext. |
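The "Policy types" row compresses a lot. A rough Python sketch of how the four layers might combine for a same-account request is shown below. It deliberately simplifies the real evaluation logic (no conditions, no session policies, no cross-account rules, and none of the documented exceptions for resource policies), and the rule tuples, the `is_allowed` function, and the bucket name are invented for illustration.

```python
from fnmatch import fnmatchcase

# A rule is (effect, action_pattern, resource_pattern).
Rule = tuple[str, str, str]


def _hits(rules, effect, action, resource):
    """True if any rule with this effect matches the action and resource."""
    return any(
        e == effect and fnmatchcase(action, a) and fnmatchcase(resource, r)
        for e, a, r in rules
    )


def is_allowed(action, resource, *, identity, resource_policy=(), scps=None, boundary=None):
    """Simplified same-account decision.

    scps=None or boundary=None means "no such policy attached".
    """
    layers = [identity, resource_policy, scps or (), boundary or ()]
    # 1. An explicit Deny in any layer wins.
    if any(_hits(layer, "Deny", action, resource) for layer in layers):
        return False
    # 2. SCPs and boundaries grant nothing; they only cap what can be allowed.
    if scps is not None and not _hits(scps, "Allow", action, resource):
        return False
    if boundary is not None and not _hits(boundary, "Allow", action, resource):
        return False
    # 3. Something must actually grant the access.
    return _hits(identity, "Allow", action, resource) or _hits(
        resource_policy, "Allow", action, resource
    )


OBJ = "arn:aws:s3:::example-data/report.csv"  # hypothetical object ARN
identity = [("Allow", "s3:*", "arn:aws:s3:::example-data/*")]
scps = [("Allow", "*", "*"), ("Deny", "s3:DeleteObject", "*")]
boundary = [("Allow", "s3:GetObject", "*")]

print(is_allowed("s3:GetObject", OBJ, identity=identity, scps=scps, boundary=boundary))  # True
print(is_allowed("s3:PutObject", OBJ, identity=identity, scps=scps, boundary=boundary))  # False (outside the boundary)
print(is_allowed("s3:DeleteObject", OBJ, identity=identity, scps=scps))                  # False (SCP deny)
```

The ordering is the point: denies are checked first, caps second, grants last. The Policy Evaluation Logic page in the IAM User Guide covers the cases this sketch leaves out.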
3 · GCP and Azure equivalents
| Concept | AWS | GCP | Azure |
|---|---|---|---|
| Identity model | IAM principals + policies | IAM principals + role bindings (additive) | Entra ID + RBAC (additive) |
| Human SSO | IAM Identity Center | Cloud Identity / Workspace | Entra ID |
| Workload identity | IAM Role (assumed by service) | Service Account + Workload Identity Federation | Managed Identity (system or user-assigned) |
| Temporary credentials | STS | IAM Credentials API | Token endpoint on Entra ID |
| Cross-account / cross-project | AssumeRole across accounts | Cross-project IAM bindings | Cross-subscription RBAC |
| Org-wide guardrails | SCPs | Organization Policies | Azure Policy |
| Secrets manager | Secrets Manager / Parameter Store | Secret Manager | Key Vault |
GCP's IAM allow policies are additive: you grant permissions, and the absence of a grant is the deny. Simpler in some ways, less expressive in others. GCP has since added separate deny policies, and Azure has deny assignments, but both are narrower than AWS's model, where an explicit Deny can sit in any policy and always wins. That deny-wins behaviour is what most large orgs need for "nobody can touch this even by accident."
4 · How to do it well
- Humans assume roles via SSO, always. No standing IAM users with long-lived access keys for humans. Identity Center / Entra / Google Workspace → role assumption → time-limited session credentials.
- Workloads assume roles via instance profiles / IRSA / Managed Identity. No static credentials in code, in config, or in environment variables. The workload picks up credentials from the platform.
- Least privilege at the policy level. Start narrow, widen when something breaks. The opposite — start with admin, narrow later — never gets narrowed.
- Permissions boundaries for delegated admins. If you want a service team to manage its own IAM, attach a boundary that caps the maximum they can grant.
- SCPs / Org Policies for the things that should be impossible. Disabling CloudTrail. Opening resources to the public internet. Creating IAM users (many orgs now ban this outright).
- Audit the trail. CloudTrail, Cloud Audit Logs, Entra audit logs — turned on, shipped to a separate account, retained for compliance windows. The first thing every incident response needs.
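As an illustration of the guardrail bullet, the sketch below builds an SCP-style JSON document that denies a handful of actions everywhere. The action names are real IAM action identifiers, but the statement IDs and the choice of actions are assumptions, and a production SCP would likely need exemptions (for example, a break-glass role) that are omitted here.

```python
import json


def deny_statement(sid, actions):
    """Build one Deny statement that applies to every resource."""
    return {"Sid": sid, "Effect": "Deny", "Action": sorted(actions), "Resource": "*"}


def build_guardrail_scp():
    """Assemble a policy document in the standard IAM JSON shape."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            deny_statement(
                "ProtectCloudTrail",
                ["cloudtrail:StopLogging", "cloudtrail:DeleteTrail", "cloudtrail:UpdateTrail"],
            ),
            deny_statement(
                "NoLongLivedIamUsers",
                ["iam:CreateUser", "iam:CreateAccessKey"],
            ),
        ],
    }


scp = build_guardrail_scp()
print(json.dumps(scp, indent=2))
```

Note that `"Resource": "*"` is the risky pattern on Allow statements; on a Deny it only makes the guardrail broader.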
5 · What breaks
- The leaked access key. A long-lived AWS access key checked into a public repo. Within minutes, scrapers find it; within hours, an attacker has spun up crypto-mining EC2s. AWS will email you, but the bill is yours. Mitigation: don't use long-lived keys. If you must, scope tightly with permissions boundaries and rotate weekly.
- The "AdministratorAccess for now" role. Created during a debug session, never narrowed, ends up attached to every service. The single biggest source of over-permissioning in real orgs.
- Wildcard in resource ARN. `"Resource": "*"` on a sensitive action. Common with copy-pasted policies from Stack Overflow. Use IAM Access Analyzer to find these.
- Public S3 bucket. Still happens. Mitigation: turn on Block Public Access at the account level (new buckets have had it on by default since April 2023); resist the urge to turn it off "just for this one bucket."
- Cross-account confused deputy. Another customer of your vendor tricks the vendor into assuming the role you created for it, and reads your S3 bucket, because the trust policy didn't include an `sts:ExternalId` condition. Mitigation: treat external IDs as required, not optional, for third-party roles.
- Stale roles. The intern's IAM user from three years ago, still active, still has S3 access. Audit unused identities quarterly; AWS IAM Access Analyzer surfaces the candidates.
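Two of these failure modes are mechanical enough to check with a script. The following is a rough sketch of a linter over policy JSON, not a substitute for IAM Access Analyzer: the `SENSITIVE_PREFIXES` list, the function names, and the account IDs are made up for the example, and it handles only the common policy shapes (for instance, it compares the condition key case-sensitively, which real IAM does not).

```python
# Assumption: what counts as "sensitive" is an org-specific choice.
SENSITIVE_PREFIXES = ("iam:", "kms:", "s3:Delete", "s3:Put")


def _as_list(value):
    """IAM JSON allows a single value where a list is expected."""
    return value if isinstance(value, list) else [value]


def wildcard_findings(policy):
    """Flag Allow statements that pair Resource "*" with a sensitive action."""
    findings = []
    for stmt in _as_list(policy["Statement"]):
        if stmt.get("Effect") != "Allow":
            continue
        if "*" not in _as_list(stmt.get("Resource", [])):
            continue
        for action in _as_list(stmt.get("Action", [])):
            if action == "*" or action.startswith(SENSITIVE_PREFIXES):
                findings.append(f"{action} allowed on Resource '*'")
    return findings


def missing_external_id(trust_policy, own_account_id):
    """Return principals outside our account that are trusted without sts:ExternalId."""
    risky = []
    for stmt in _as_list(trust_policy["Statement"]):
        if stmt.get("Effect") != "Allow":
            continue
        principal = stmt.get("Principal", {})
        aws = principal.get("AWS", []) if isinstance(principal, dict) else [principal]
        conditions = stmt.get("Condition", {})
        has_external_id = any("sts:ExternalId" in keys for keys in conditions.values())
        for arn in _as_list(aws):
            if own_account_id not in arn and not has_external_id:
                risky.append(arn)
    return risky


policy = {
    "Version": "2012-10-17",
    "Statement": {"Effect": "Allow", "Action": ["s3:GetObject", "iam:PassRole"], "Resource": "*"},
}
trust = {
    "Version": "2012-10-17",
    "Statement": [{
        "Effect": "Allow",
        "Principal": {"AWS": "arn:aws:iam::999988887777:root"},  # hypothetical vendor account
        "Action": "sts:AssumeRole",
    }],
}

print(wildcard_findings(policy))                   # ["iam:PassRole allowed on Resource '*'"]
print(missing_external_id(trust, "111122223333"))  # ['arn:aws:iam::999988887777:root']
```

A check like this fits naturally in CI, so a copy-pasted wildcard is caught in review rather than in production.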
6 · Cost note
IAM itself is free. The cost shows up indirectly:
- Misconfiguration → incident → six-figure remediation. The 2017 Verizon S3 misconfiguration. The 2019 Capital One IAM-via-SSRF breach. The 2014 Code Spaces (the code-hosting company, not GitHub Codespaces) deletion-of-everything incident. These are the real IAM cost.
- Tooling. AWS IAM Access Analyzer's external-access findings are free (its unused-access analysis is billed per identity). Wiz, Lacework, Orca, and Prisma Cloud (third-party CSPM tools) start at a few thousand a month and pay for themselves by detecting the boring misconfigurations before someone with worse intent does.
- Compliance audit prep. SOC 2, ISO 27001, PCI — the auditor will ask for IAM evidence. Investing in well-organised roles and SCPs is cheaper than the audit-prep scramble.
Further reading
- AWS IAM User Guide. Long, dry, authoritative. The Policy Evaluation Logic page is required reading.
- Cloud Security Alliance, "Egregious Eleven". One edition of the CSA's recurring Top Threats list, in which misconfiguration and identity issues sit near the top year after year.
- "AWS Security Best Practices" (whitepaper). Free, often updated, surprisingly readable.
- Adjacent: OAuth / OIDC. The protocols underneath workload identity federation.
- Adjacent: Networking. Network-level isolation pairs with IAM-level isolation. Both, not either.