Cloud Codex / 04

Identity & IAM.

The one cloud topic where a small mistake has a planet-sized blast radius. A leaked access key, a too-permissive role, an S3 bucket policy with the wrong wildcard — these are the kinds of misconfigurations that show up on the front page of Hacker News. The good news: the mental model is short. There are principals, there are resources, and policies decide what one is allowed to do to the other.


1 · The model

  • Principal. Who is doing something. A human user, a service, a federated identity from your IdP.
  • Resource. What is being acted on. An S3 bucket, an EC2 instance, a DynamoDB table.
  • Action. The verb. s3:GetObject, ec2:StartInstances, dynamodb:Query.
  • Policy. A document — usually JSON — that says "this principal may (or may not) perform this action on this resource, optionally under this condition."
  • Role. A principal that nobody owns directly. Services assume it; humans assume it. The role's policies decide what the assumer can do for the duration of the session.

Everything else — AssumeRole, STS, federation, instance profiles, service-linked roles, permissions boundaries — is plumbing on top of those five concepts.
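
As a minimal sketch of how those five concepts map onto an actual policy document, the Python below assembles an AWS-style identity policy as a plain dictionary and serialises it to JSON. The bucket name `example-bucket`, the helper `make_statement`, and the source-IP condition are illustrative assumptions, not taken from any real account. The principal is implicit here: an identity policy applies to whichever user or role it is attached to.

```python
import json


def make_statement(effect, actions, resources, condition=None):
    """Build one policy statement (hypothetical helper, for illustration only)."""
    statement = {"Effect": effect, "Action": actions, "Resource": resources}
    if condition is not None:
        statement["Condition"] = condition
    return statement


# Identity policy: the principal is whoever this policy gets attached to.
read_only_policy = {
    "Version": "2012-10-17",
    "Statement": [
        make_statement(
            effect="Allow",                                   # may (or may not)
            actions=["s3:GetObject"],                         # the verb
            resources=["arn:aws:s3:::example-bucket/*"],      # the resource
            condition={"IpAddress": {"aws:SourceIp": "203.0.113.0/24"}},  # optional condition
        )
    ],
}

policy_json = json.dumps(read_only_policy, indent=2)
print(policy_json)
```

A role is then just a principal with documents like this attached, plus a trust policy saying who may assume it.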

2 · The AWS canonical version

| Thing | AWS | What it does |
|---|---|---|
| Human identity | IAM Identity Center (formerly AWS SSO) | One front door for humans across all your accounts. Backed by your IdP (Okta, Google Workspace, Entra). |
| Programmatic identity (legacy) | IAM User + access key | Avoid for new use. Static credentials sitting in someone's ~/.aws/credentials are the #1 leaked-credential class. |
| Programmatic identity (modern) | IAM Role + AssumeRole | Temporary credentials minted on demand. The workload assumes a role via the EC2 instance profile / Lambda execution role / IRSA (for EKS pods). |
| Cross-account | AssumeRole across account boundaries | The pattern for "service in account A reads from account B." The trust policy on the target role names account A as the trusted principal. |
| Temporary credentials | STS | The service that mints role credentials. AssumeRole returns a 1-hour session by default. |
| Policy types | Identity vs Resource vs SCP vs Permissions Boundary | Four kinds of policy. An explicit deny in any of them always wins. Otherwise an identity or resource policy has to allow the action, and any SCP or permissions boundary in scope has to permit it too. |
| Org-wide guardrails | SCPs (Service Control Policies) | Set at the AWS Organizations level. Rules of the kind "nobody in any account can disable CloudTrail." |
| Secrets in code | Secrets Manager / Parameter Store | So credentials never live in your AMI or environment variables in plaintext. |
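
The cross-account row can be made concrete with a trust policy. The sketch below builds one in Python; the account ID `111111111111`, the external ID value, and the function name `build_trust_policy` are placeholders chosen for illustration. The `sts:AssumeRole` action and the `sts:ExternalId` condition key are the standard AWS names.

```python
import json


def build_trust_policy(trusted_account_id, external_id=None):
    """Return a trust policy letting another account assume the role.

    Hypothetical helper: both arguments are placeholders, not real values.
    """
    statement = {
        "Effect": "Allow",
        # Account A, named as the trusted principal on the role in account B.
        "Principal": {"AWS": f"arn:aws:iam::{trusted_account_id}:root"},
        "Action": "sts:AssumeRole",
    }
    if external_id is not None:
        # The caller must present this value when calling AssumeRole.
        statement["Condition"] = {"StringEquals": {"sts:ExternalId": external_id}}
    return {"Version": "2012-10-17", "Statement": [statement]}


trust_policy = build_trust_policy("111111111111", external_id="example-external-id")
print(json.dumps(trust_policy, indent=2))
```

The role in account B still needs its own permissions policy; the trust policy only decides who may assume the role, not what the session can do.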

3 · GCP and Azure equivalents

| Concept | AWS | GCP | Azure |
|---|---|---|---|
| Identity model | IAM principals + policies | IAM principals + role bindings (additive) | Entra ID + RBAC (additive) |
| Human SSO | IAM Identity Center | Cloud Identity / Workspace | Entra ID |
| Workload identity | IAM Role (assumed by service) | Service Account + Workload Identity Federation | Managed Identity (system or user-assigned) |
| Temporary credentials | STS | IAM Credentials API | Token endpoint on Entra ID |
| Cross-account / cross-project | AssumeRole across accounts | Cross-project IAM bindings | Cross-subscription RBAC |
| Org-wide guardrails | SCPs | Organization Policies | Azure Policy |
| Secrets manager | Secrets Manager / Parameter Store | Secret Manager | Key Vault |

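GCP's IAM is additive at its core. You grant permissions through role bindings, and the absence of a grant is the deny. Simpler in some ways, less expressive in others. GCP does now offer separate deny policies, but they are an add-on rather than the default model. AWS lets you write an explicit deny in any policy, with deny-wins semantics, which is what most large orgs need for "nobody can touch this even by accident." Azure RBAC is additive in day-to-day use; its deny assignments exist but are more limited than AWS's explicit denies.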
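
The difference between an additive model and deny-wins semantics can be sketched with a toy evaluator. This is a deliberately simplified model, not any provider's real evaluation logic: it ignores principals, conditions, resource policies, SCPs, and permissions boundaries, and it matches wildcards case-sensitively with the standard library's `fnmatchcase`. The bucket name and function names are assumptions made for the example.

```python
from fnmatch import fnmatchcase


def matches(patterns, value):
    """True if value matches any wildcard pattern in the list."""
    return any(fnmatchcase(value, pattern) for pattern in patterns)


def evaluate(statements, action, resource):
    """Toy deny-wins evaluation over a flat list of statements."""
    applicable = [
        s for s in statements
        if matches(s["Action"], action) and matches(s["Resource"], resource)
    ]
    if any(s["Effect"] == "Deny" for s in applicable):
        return "ExplicitDeny"   # a single matching deny beats every allow
    if any(s["Effect"] == "Allow" for s in applicable):
        return "Allow"
    return "ImplicitDeny"       # no grant at all: denied by default


statements = [
    {"Effect": "Allow", "Action": ["s3:*"], "Resource": ["*"]},
    {"Effect": "Deny", "Action": ["s3:DeleteObject"],
     "Resource": ["arn:aws:s3:::audit-logs/*"]},
]
log_object = "arn:aws:s3:::audit-logs/2024/01.json"

print(evaluate(statements, "s3:DeleteObject", log_object))
print(evaluate(statements, "s3:GetObject", log_object))

# Additive-only model: there is no Deny statement to fall back on, so the
# broad grant has to be narrowed instead.
additive_only = [s for s in statements if s["Effect"] == "Allow"]
print(evaluate(additive_only, "s3:DeleteObject", log_object))
```

In the additive-only case the delete goes through, which is why a grant-only model pushes you to keep every grant narrow rather than carve out exceptions afterwards.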

4 · How to do it well

  1. Humans assume roles via SSO, always. No standing IAM users with long-lived access keys for humans. Identity Center / Entra / Google Workspace → role assumption → time-limited session credentials.
  2. Workloads assume roles via instance profiles / IRSA / Managed Identity. No static credentials in code, in config, or in environment variables. The workload picks up credentials from the platform.
  3. Least privilege at the policy level. Start narrow, widen when something breaks. The opposite — start with admin, narrow later — never gets narrowed.
  4. Permissions boundaries for delegated admins. If you want a service team to manage its own IAM, attach a boundary that caps the maximum they can grant.
  5. SCPs / Org Policies for the things that should be impossible. Disabling CloudTrail. Exposing resources to the public internet. Creating IAM users (many orgs now ban this outright).
  6. Audit the trail. CloudTrail, Cloud Audit Logs, Entra audit logs — turned on, shipped to a separate account, retained for compliance windows. The first thing every incident response needs.
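
Item 5 can be sketched as a Service Control Policy. The Python below builds one as a dictionary; the statement IDs and the helper name `deny_statement` are hypothetical, and the exact action list is an assumption about what an org might choose to block. The action names themselves (`cloudtrail:StopLogging`, `cloudtrail:DeleteTrail`, `iam:CreateUser`, `iam:CreateAccessKey`) are real AWS actions.

```python
import json


def deny_statement(sid, actions):
    """Build an SCP statement denying the given actions everywhere (illustrative helper)."""
    return {"Sid": sid, "Effect": "Deny", "Action": actions, "Resource": "*"}


guardrail_scp = {
    "Version": "2012-10-17",
    "Statement": [
        # Nobody in any member account can switch off the audit trail.
        deny_statement("ProtectCloudTrail",
                       ["cloudtrail:StopLogging", "cloudtrail:DeleteTrail"]),
        # No new IAM users or long-lived access keys.
        deny_statement("NoIamUsers",
                       ["iam:CreateUser", "iam:CreateAccessKey"]),
    ],
}

print(json.dumps(guardrail_scp, indent=2))
```

An SCP only limits what accounts can do; it never grants anything, so identity policies are still needed for whatever remains allowed.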

5 · What breaks

  • The leaked access key. A long-lived AWS access key checked into a public repo. Within minutes, scrapers find it; within hours, an attacker has spun up crypto-mining EC2s. AWS will email you, but the bill is yours. Mitigation: don't use long-lived keys. If you must, scope tightly with permissions boundaries and rotate weekly.
  • The "AdministratorAccess for now" role. Created during a debug session, never narrowed, ends up attached to every service. The single biggest source of over-permissioning in real orgs.
  • Wildcard in resource ARN. "Resource": "*" on a sensitive action. Common with copy-pasted policies from Stack Overflow. Use IAM Access Analyzer to find these.
  • Public S3 bucket. Still happens. Mitigation: turn on Block Public Access at the account level (new buckets have had it on by default since April 2023); resist the urge to turn it off "just for this one bucket."
  • Cross-account confused deputy. A vendor's role gets to read your S3 bucket because the trust policy didn't include an ExternalId condition. Mitigation: external IDs are required, not optional.
  • Stale identities. The intern's IAM user from three years ago, still active, still has S3 access. Audit unused identities quarterly; IAM Access Analyzer's unused-access findings surface the candidates.
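
The wildcard failure mode above lends itself to a quick local check. The following is a rough sketch, not a substitute for IAM Access Analyzer: the function name `find_wildcard_risks` and the list of "sensitive" action prefixes are assumptions made for illustration, and the check ignores `NotAction`, `NotResource`, and conditions.

```python
def as_list(value):
    """Policy fields may be a single string or a list; normalise to a list."""
    return value if isinstance(value, list) else [value]


# Assumption: which actions count as sensitive is a local judgement call.
SENSITIVE_PREFIXES = ("iam:", "kms:", "s3:Delete", "s3:Put")


def find_wildcard_risks(policy):
    """Return (statement index, risky actions) for Allow statements on Resource '*'."""
    findings = []
    for index, statement in enumerate(as_list(policy.get("Statement", []))):
        if statement.get("Effect") != "Allow":
            continue
        if "*" not in as_list(statement.get("Resource", [])):
            continue
        risky = [
            action for action in as_list(statement.get("Action", []))
            if action == "*" or action.endswith(":*")
            or action.startswith(SENSITIVE_PREFIXES)
        ]
        if risky:
            findings.append((index, risky))
    return findings


copy_pasted_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {"Effect": "Allow", "Action": "iam:PassRole", "Resource": "*"},
        {"Effect": "Allow", "Action": ["s3:GetObject"],
         "Resource": "arn:aws:s3:::example-bucket/*"},
    ],
}

print(find_wildcard_risks(copy_pasted_policy))
```

Run against policy files in CI, a check like this catches the obvious cases before they are deployed; the cloud-side analyzers remain the authority on what is actually reachable.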

6 · Cost note

IAM itself is free. The cost shows up indirectly:

  • Misconfiguration → incident → six-figure remediation. The 2014 Code Spaces (the original) deletion-of-everything incident. The 2017 Verizon S3 misconfiguration. The 2019 Capital One IAM-via-SSRF breach. These are the IAM cost.
  • Tooling. IAM Access Analyzer's external-access findings are free. Third-party CSPM tools such as Wiz, Lacework, Orca, and Prisma Cloud start at a few thousand a month and pay for themselves by detecting the boring misconfigurations before someone with worse intent does.
  • Compliance audit prep. SOC 2, ISO 27001, PCI — the auditor will ask for IAM evidence. Investing in well-organised roles and SCPs is cheaper than the audit-prep scramble.

Further reading

  • AWS IAM User Guide. Long, dry, authoritative. The Policy Evaluation Logic page is required reading.
  • Cloud Security Alliance, "Egregious Eleven". The list of top cloud security threats, with misconfiguration and identity issues near the top.
  • "AWS Security Best Practices" (whitepaper). Free, often updated, surprisingly readable.
  • Adjacent: OAuth / OIDC. The protocols underneath workload identity federation.
  • Adjacent: Networking. Network-level isolation pairs with IAM-level isolation. Both, not either.