IAM, the hard parts.
The basic IAM model — users, policies, roles — gets a paragraph in every AWS tutorial. The model that actually runs production is bigger: STS-issued temporary credentials instead of static keys, AssumeRole as the bridge between workloads, federation for human identity, IRSA for Kubernetes, SCPs and permission boundaries as guardrails, and a policy evaluation order that explicit-denies, then explicit-allows, then implicit-denies. This page is that bigger model.
1 · What IAM actually evaluates
Every AWS API call walks the same evaluation pipeline before it's allowed to do anything. There are six policy layers, evaluated in a specific order, and the first hard rule of IAM is: an explicit DENY at any layer wins, full stop. The second rule: a request needs at least one explicit ALLOW and no DENY at any applicable layer. "No policy mentions me" defaults to deny.
Cross-account access has one extra wrinkle: both the resource policy in the target account and the identity policy in the caller's account must allow the call. Same-account calls only need one of them. This is why you can run aws s3 ls s3://my-bucket with no bucket policy when the bucket is in your account, but a partner account can't see it until you add a "Principal": {"AWS": "arn:aws:iam::PARTNER:role/X"} grant to the bucket policy and the partner role has s3:ListBucket in its identity policy.
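The two halves of that cross-account grant can be sketched as a pair of policy documents. This is a minimal illustration, not a production policy: the account ID 111122223333, the output directory, and the file names are placeholders, while my-bucket and role X are the example names used above.

```shell
# Sketch only: PARTNER_ACCOUNT_ID and the output paths are placeholders.
PARTNER_ACCOUNT_ID="111122223333"
BUCKET="my-bucket"
OUT_DIR="$(mktemp -d)"

# Half 1: resource policy on the bucket, applied in the account that owns it.
cat > "$OUT_DIR/bucket-policy.json" <<EOF
{
  "Version": "2012-10-17",
  "Statement": [{
    "Sid": "AllowPartnerList",
    "Effect": "Allow",
    "Principal": { "AWS": "arn:aws:iam::${PARTNER_ACCOUNT_ID}:role/X" },
    "Action": "s3:ListBucket",
    "Resource": "arn:aws:s3:::${BUCKET}"
  }]
}
EOF

# Half 2: identity policy attached to role X, applied in the partner account.
cat > "$OUT_DIR/partner-identity-policy.json" <<EOF
{
  "Version": "2012-10-17",
  "Statement": [{
    "Effect": "Allow",
    "Action": "s3:ListBucket",
    "Resource": "arn:aws:s3:::${BUCKET}"
  }]
}
EOF

echo "Wrote both policy documents to $OUT_DIR"
```

If either document is missing, the cross-account call is denied; inside a single account, either one alone would be enough.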
| If you see… | Look here |
|---|---|
| "AccessDenied" with no context | (1) SCP at the org/OU level (2) bucket / resource policy explicit deny (3) permission boundary missing the action |
| Works from my user, fails from the role | Trust policy on the role + identity policy on the user. The role's own identity policy is irrelevant to the AssumeRole call. |
| Works in console, fails from CLI | The console signs requests as your IAM/SSO principal; the CLI may be using a different profile. aws sts get-caller-identity first. |
| Works for actions, fails for resource-level | Resource ARN doesn't match. arn:aws:s3:::bucket vs arn:aws:s3:::bucket/* are different resources. |
| Worked yesterday, denied today | (1) A new SCP was applied (2) permission boundary was attached (3) condition key context changed (e.g., SourceVpc / SourceIp / time-of-day) |
| Allowed in one region, denied in another | Condition key aws:RequestedRegion somewhere in the chain, or the role's STS regional endpoint behaviour (see §11). |
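The bucket versus bucket/* row is the one that most often needs a concrete picture. The sketch below writes an identity policy that keeps bucket-level and object-level actions in separate statements; the bucket name "bucket" is the placeholder from the table and the Sid values are made up for illustration.

```shell
# Sketch only: "bucket" is a placeholder name, not a real bucket.
POLICY_FILE="$(mktemp)"

cat > "$POLICY_FILE" <<'EOF'
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "BucketLevel",
      "Effect": "Allow",
      "Action": "s3:ListBucket",
      "Resource": "arn:aws:s3:::bucket"
    },
    {
      "Sid": "ObjectLevel",
      "Effect": "Allow",
      "Action": ["s3:GetObject", "s3:PutObject"],
      "Resource": "arn:aws:s3:::bucket/*"
    }
  ]
}
EOF

# Count the statements that target the bucket itself and its objects.
echo "bucket-level statements: $(grep -c '"arn:aws:s3:::bucket"' "$POLICY_FILE")"
echo "object-level statements: $(grep -c '"arn:aws:s3:::bucket/\*"' "$POLICY_FILE")"
```

Putting s3:GetObject on the bucket ARN, or s3:ListBucket on the /* ARN, produces a policy that validates but grants nothing useful.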
2 · The four kinds of principal
A principal is anything that makes an authenticated API call to AWS. There are four practical kinds, and modern AWS estates use mostly the last two:
| Principal | How it authenticates | When to use |
|---|---|---|
| IAM user | Long-lived access keys | Avoid. CI/CD tokens are the last remaining excuse, and most CI now supports OIDC instead. |
| Federated identity | SAML / OIDC token from an external IdP | Humans. Workforce SSO via Identity Center → Okta / Entra ID / Google. |
| Service role | AWS service is the trust principal | EC2 instance profile, Lambda execution role, ECS task role. |
| OIDC-federated workload | JWT from external OIDC provider | GitHub Actions, GitLab, Buildkite, EKS pods (IRSA), Pod Identity. |
3 · STS and AssumeRole — the bridge
AWS Security Token Service (STS) is the service that issues temporary credentials — an access key, secret key, and session token that expire (typically 1 hour, configurable up to 12). Every modern IAM pattern flows through STS:
- sts:AssumeRole: caller has IAM credentials, wants to act as a role in this account or another.
- sts:AssumeRoleWithSAML: caller has a SAML assertion from a corporate IdP.
- sts:AssumeRoleWithWebIdentity: caller has a JWT from an OIDC provider (the GitHub Actions case, also IRSA).
- sts:GetSessionToken: re-issue temporary credentials for an MFA-protected user.
A role has two policies attached. The trust policy answers "who can assume me?" — it lists the AWS principals or federated identities allowed to call AssumeRole on this role. The identity policy answers "what can the assumer do?" — the actual permissions granted once the role is assumed.
4 · How AssumeRole actually works
A successful AssumeRole is a four-actor dance: the caller, STS in the calling account, the target role's trust policy, and (eventually) the AWS service receiving the API call from the assumed session. Walking through it once removes a lot of mystery from "why does my credential expire mid-request":
Three operational consequences fall out of this shape. First, the credentials returned are not the caller's: they're a fresh short-lived (15-minute to 12-hour) triple. Anyone holding them can act as the role until they expire, so purge any logs / dumps that accidentally include them and, if the credentials are still valid, revoke the role's active sessions. Second, the session principal is recorded in CloudTrail as arn:aws:sts::ACCT:assumed-role/ROLE/SESSION-NAME, so naming sessions matters for auditing. Third, the SDK caches the session credentials and refreshes them a few minutes before they expire (the exact window varies by SDK), so the typical pattern (one role assumed in a Lambda, used for the function's whole lifetime) does not need explicit refresh logic in your code.
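Because the session name is the last segment of the assumed-role ARN, it can be pulled apart with plain shell when grepping audit logs. A minimal sketch, assuming the ARN shape quoted above; the account ID, role name, and session name here are invented for illustration.

```shell
# Sketch only: this ARN is a made-up example in the assumed-role shape.
SESSION_ARN="arn:aws:sts::123456789012:assumed-role/LabReadOnly/deploy-build-4711"

# Everything after the last colon is "assumed-role/<role>/<session-name>".
RESOURCE="${SESSION_ARN##*:}"
ROLE_AND_SESSION="${RESOURCE#assumed-role/}"
ROLE_NAME="${ROLE_AND_SESSION%%/*}"
SESSION_NAME="${ROLE_AND_SESSION#*/}"

# The account ID is the fifth colon-separated field (the region field is empty).
ACCOUNT_ID="$(printf '%s' "$SESSION_ARN" | cut -d: -f5)"

echo "account=$ACCOUNT_ID role=$ROLE_NAME session=$SESSION_NAME"
```

A session name that encodes the build, ticket, or user makes the same split useful for attributing activity after the fact.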
Role chaining. If session credentials AssumeRole into yet another role, the new session is capped at one hour regardless of the target role's MaxSessionDuration, a hard AWS limit to prevent indefinite chaining. Long-running cross-account agents (CI runners, data-platform syncs) that need > 1 hour must get their session in a single hop: AssumeRole from an IAM user, or federation straight into the target role (AssumeRoleWithWebIdentity / AssumeRoleWithSAML). Using an assumed-role session, including an IRSA-derived one, to assume a further role counts as chaining.
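The chaining rule can be checked locally before a script asks STS for a long session. The helper names below (is_chained, check_duration) are hypothetical, not part of the AWS CLI; the sketch assumes the caller ARN comes from aws sts get-caller-identity and only handles the standard aws partition. It does not check the target role's MaxSessionDuration.

```shell
# Sketch only: a local pre-flight check, not an AWS API call.
CHAINED_MAX_SECONDS=3600

# is_chained CALLER_ARN: succeeds when the caller is already an assumed-role session.
is_chained() {
  case "$1" in
    arn:aws:sts::*:assumed-role/*) return 0 ;;
    *) return 1 ;;
  esac
}

# check_duration CALLER_ARN REQUESTED_SECONDS
# Prints "ok", or an explanation and a non-zero status when a chained caller
# asks for more than one hour.
check_duration() {
  if is_chained "$1" && [ "$2" -gt "$CHAINED_MAX_SECONDS" ]; then
    echo "chained session: request at most ${CHAINED_MAX_SECONDS}s, got ${2}s"
    return 1
  fi
  echo "ok"
}

# An IAM user asking for 12 hours passes this check.
check_duration "arn:aws:iam::123456789012:user/ci" 43200
# An assumed-role session asking for 12 hours does not.
check_duration "arn:aws:sts::123456789012:assumed-role/Build/run-1" 43200 || true
```

Failing early in the script is easier to debug than a session that turns out shorter than the job it was meant to cover.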
5 · Workload identity — IRSA, Pod Identity, GitHub OIDC
Six patterns, three long-standing and three modern, that all do the same thing: let a workload assume an IAM role without being given a long-lived secret.
| Pattern | Where it runs | How it works |
|---|---|---|
| EC2 instance profile | EC2, ECS-on-EC2 | EC2 metadata service (IMDSv2) at 169.254.169.254 returns role credentials. The SDK auto-discovers. |
| Lambda execution role | Lambda functions | Lambda runtime injects creds as env vars (AWS_ACCESS_KEY_ID, etc.). Auto-discovered. |
| ECS task role | Fargate / ECS tasks | Each task gets a metadata endpoint that returns role creds. |
| IRSA (EKS) | EKS pods (mature pattern) | OIDC provider in front of EKS; pod's service-account token gets exchanged via AssumeRoleWithWebIdentity. |
| EKS Pod Identity | EKS pods (2024+) | Like IRSA, but configured via EKS API instead of trust-policy JSON. Recommended for new clusters. |
| GitHub OIDC | GitHub Actions | GH issues a JWT per workflow; AWS role trusts token.actions.githubusercontent.com; the action calls AssumeRoleWithWebIdentity. |
In each case the result is the same: short-lived STS credentials, no static AWS key checked into the workload's config, and a fine-grained trust policy that lets you say "only the my-service service account in the prod namespace can assume the my-service-role in account 1234." That last sentence is what production-grade IAM actually looks like.
6 · IRSA — the OIDC token flow, end to end
IRSA (IAM Roles for Service Accounts) is the original Kubernetes workload-identity pattern. It's worth tracing once because the same shape recurs in GitHub Actions, GitLab, Buildkite, and now EKS Pod Identity:
Four things make this pattern robust in production. First, the pod never holds a long-lived AWS secret, only a projected ServiceAccount token that the kubelet rotates before it expires. Second, the trust policy on the AWS role specifies both the OIDC issuer URL and a StringEquals condition that pins the token's sub claim to system:serviceaccount:<namespace>:<sa-name>, so pods can only assume the role if their Kubernetes namespace and SA name match. Third, the OIDC issuer is per-cluster, so a stolen JWT from cluster A can't assume roles trusted to cluster B's issuer. Fourth, the AWS SDK auto-discovers the AWS_WEB_IDENTITY_TOKEN_FILE and AWS_ROLE_ARN environment variables injected by the EKS pod-identity webhook; application code doesn't change.
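The second point is easier to see as a document. The sketch below generates an IRSA-style trust policy; the account ID, the issuer ID, and the region are placeholders, and the prod namespace and my-service service account are the example names used earlier. OIDC_ISSUER is assumed to be the cluster's issuer URL without the https:// prefix.

```shell
# Sketch only: every value below is a placeholder.
ACCOUNT_ID="123456789012"
OIDC_ISSUER="oidc.eks.eu-west-2.amazonaws.com/id/EXAMPLED539D4633E53DE1B71EXAMPLE"
NAMESPACE="prod"
SERVICE_ACCOUNT="my-service"
TRUST_FILE="$(mktemp)"

# Both condition keys are prefixed with the issuer, not with sts.amazonaws.com.
cat > "$TRUST_FILE" <<EOF
{
  "Version": "2012-10-17",
  "Statement": [{
    "Effect": "Allow",
    "Principal": {
      "Federated": "arn:aws:iam::${ACCOUNT_ID}:oidc-provider/${OIDC_ISSUER}"
    },
    "Action": "sts:AssumeRoleWithWebIdentity",
    "Condition": {
      "StringEquals": {
        "${OIDC_ISSUER}:aud": "sts.amazonaws.com",
        "${OIDC_ISSUER}:sub": "system:serviceaccount:${NAMESPACE}:${SERVICE_ACCOUNT}"
      }
    }
  }]
}
EOF

cat "$TRUST_FILE"
```

Generating the document from variables keeps the issuer string identical in the Principal and in both condition keys, which is where hand-edited copies tend to drift.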
EKS Pod Identity (2024+) is the same outcome with less plumbing — no OIDC provider to create, no trust-policy JSON to author. EKS holds the mapping (cluster + namespace + SA → role) as an API resource. Pod Identity is the recommended pattern for new clusters; IRSA remains supported and is fine to keep on existing ones.
| Identity mechanism | Where it runs | Best for | Trap |
|---|---|---|---|
| IAM user + access key | Anywhere — but ideally nowhere | One-off CI before OIDC was supported; legacy scripts | Static secret that leaks. Rotate aggressively, or migrate to OIDC. |
| IAM role + AssumeRole | Cross-account from any AWS principal | Centralised platform / observability roles that read from many workload accounts | Role chaining = 1-hour cap on the new session |
| EC2 instance profile | EC2 / ECS-on-EC2 hosts | Anything that runs on a known instance | IMDSv2 must be required (HttpTokens=required) — see Capital One §11 |
| Lambda execution role | Lambda functions | Default for serverless workloads | One role per function; concurrency means many sessions at once |
| IRSA | EKS pods (any version) | Mature clusters; per-namespace fine-grained mapping | Trust policy typos in aud / sub claims fail silently |
| EKS Pod Identity | EKS pods (1.27+) | New clusters; less YAML, no OIDC provider to manage | Requires the Pod Identity agent DaemonSet |
| Identity Center permission set | Humans accessing AWS | Workforce SSO with Okta / Entra / Google | Session times out (default 8h); CLI uses aws sso login |
| GitHub Actions OIDC | CI/CD from GitHub | All deployment automation | Trust policy must pin repo:<org>/<repo>:ref:refs/heads/<branch> or any repo can assume |
7 · Guardrails — SCPs, permission boundaries, session policies
Beyond identity and resource policies, three other policy types exist to cap what a principal can do, regardless of what its identity policies grant. They never grant permissions, only restrict them.
- SCPs (Service Control Policies) live on an AWS Organizations OU or account. Common SCP: "deny iam:CreateUser for everyone in this OU" so prod can't have any human-shaped IAM users at all.
- Permission boundaries attach to an individual IAM role or user. "This role can have any policy attached but its effective permissions are capped at this set" is a delegation pattern: dev teams self-serve role creation within a guardrail.
- Session policies are passed at AssumeRole time as an extra restriction on the assumed session. CI templates use this: a base "build" role attached, then per-build a session policy that scopes it to one S3 prefix.
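Two of these guardrails can be sketched as documents plus one CLI call. This is an illustration under stated assumptions: the bucket name, build ID, and the function name assume_scoped_build_role are hypothetical, and the function is defined but not called because it needs real credentials and a real role ARN.

```shell
# Sketch only: bucket name, build ID and file locations are hypothetical.
ARTIFACT_BUCKET="example-build-artifacts"
BUILD_ID="4711"
WORK_DIR="$(mktemp -d)"
SCP_FILE="$WORK_DIR/scp-deny-iam-users.json"
SESSION_POLICY_FILE="$WORK_DIR/session-policy.json"

# Guardrail 1: an SCP for the OU that denies creating IAM users.
cat > "$SCP_FILE" <<'EOF'
{
  "Version": "2012-10-17",
  "Statement": [{
    "Sid": "DenyIamUserCreation",
    "Effect": "Deny",
    "Action": "iam:CreateUser",
    "Resource": "*"
  }]
}
EOF

# Guardrail 2: a per-build session policy that narrows the base build role
# to a single S3 prefix. It can only restrict what the role already allows.
cat > "$SESSION_POLICY_FILE" <<EOF
{
  "Version": "2012-10-17",
  "Statement": [{
    "Effect": "Allow",
    "Action": ["s3:GetObject", "s3:PutObject"],
    "Resource": "arn:aws:s3:::${ARTIFACT_BUCKET}/builds/${BUILD_ID}/*"
  }]
}
EOF

# Passes the session policy at AssumeRole time. First argument: the role ARN.
assume_scoped_build_role() {
  aws sts assume-role \
    --role-arn "$1" \
    --role-session-name "build-${BUILD_ID}" \
    --policy "file://${SESSION_POLICY_FILE}"
}
```

The resulting session gets the intersection of the role's identity policy and the session policy, so a build can never reach another build's prefix even though the base role could.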
8 · Condition keys — IAM beyond "service-action-resource"
Every IAM policy statement can include a Condition block that adds context-aware checks. The most useful ones for production:
| Condition key | What it constrains | Example |
|---|---|---|
| aws:SourceIp | Caller's source IP | "Only allow from corporate egress IP range." |
| aws:SourceVpc / aws:SourceVpce | VPC / VPC endpoint of the caller | "S3 bucket only accessible via this VPC endpoint." Stops data exfil. |
| aws:PrincipalOrgID | Caller's AWS org | "Only allow if the caller is in my organization." |
| aws:MultiFactorAuthPresent | MFA was used in the session | "Require MFA to delete IAM users." |
| aws:RequestedRegion | The region the call is destined for | "Deny all writes outside eu-west-2." Data residency. |
| aws:ResourceTag/<key> | A tag on the target resource | "Only let this role write to S3 buckets tagged env=prod." |
| aws:PrincipalTag/<key> | A tag on the caller | ABAC: "user with tag team=payments can access resources with the same tag." |
| kms:ViaService | Which AWS service is calling KMS on the caller's behalf | "Only S3 can use this KMS key": prevents direct decrypt by users. |
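The two tag rows combine into a single statement. The sketch below writes an identity policy that lets a principal read only those Secrets Manager secrets whose team tag matches its own; the tag key team and the choice of Secrets Manager as the example service are assumptions made for illustration.

```shell
# Sketch only: the tag key "team" and the example service are illustrative.
ABAC_FILE="$(mktemp)"

# The quoted 'EOF' matters: it stops the shell from trying to expand
# ${aws:PrincipalTag/team}, which is an IAM policy variable, not a shell one.
cat > "$ABAC_FILE" <<'EOF'
{
  "Version": "2012-10-17",
  "Statement": [{
    "Sid": "SameTeamSecretsOnly",
    "Effect": "Allow",
    "Action": "secretsmanager:GetSecretValue",
    "Resource": "*",
    "Condition": {
      "StringEquals": {
        "aws:ResourceTag/team": "${aws:PrincipalTag/team}"
      }
    }
  }]
}
EOF

cat "$ABAC_FILE"
```

With an unquoted heredoc delimiter the shell would reject the policy variable as a bad substitution, which is a common stumbling block when generating ABAC policies from scripts.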
PrincipalTag + ResourceTag lets you write one policy that scales: "any role tagged with team X can access any resource tagged team X." RBAC requires a new role per team. ABAC is harder to set up but scales better past a couple of dozen teams. AWS Identity Center supports passing attributes from the IdP into the session as principal tags.
9 · Workforce identity — Identity Center
AWS IAM Identity Center (formerly AWS SSO) is the modern way humans access AWS:
- Identity Center is enabled at the AWS Organizations management account.
- An identity source is connected — Okta, Entra ID (Azure AD), Google Workspace, or Identity Center's own directory.
- Permission sets are defined — each permission set is "a name + a set of managed/inline IAM policies."
- Users / groups are assigned permission sets in particular AWS accounts.
- Users log in to the Identity Center portal, click the account / role they need, and get an STS session in the browser or via aws sso login.
Behind the scenes Identity Center creates an IAM role per permission set per account, with the permission set's policies attached and a trust policy pointing back at Identity Center's identity provider. The user-facing experience is "pick an account and a role." The underlying machinery is the same federated-AssumeRole story, with the trust policy naming a SAML provider that Identity Center manages in each account.
10 · Build it yourself — cross-account AssumeRole lab
The fastest way to internalise AssumeRole is to do it. This lab uses one account but makes the pattern explicit; replace the account ID with a second sandbox account if you have one.
- Note your account ID.

      ACCOUNT_ID=$(aws sts get-caller-identity --query Account --output text)
      echo "Account: $ACCOUNT_ID"

- Create a target role with a trust policy that lets the current caller assume it.

      CURRENT_ARN=$(aws sts get-caller-identity --query Arn --output text)
      cat > /tmp/trust.json <<EOF
      {
        "Version": "2012-10-17",
        "Statement": [{
          "Effect": "Allow",
          "Principal": { "AWS": "$CURRENT_ARN" },
          "Action": "sts:AssumeRole"
        }]
      }
      EOF
      aws iam create-role --role-name LabReadOnly \
        --assume-role-policy-document file:///tmp/trust.json

- Attach a read-only managed policy.

      aws iam attach-role-policy --role-name LabReadOnly \
        --policy-arn arn:aws:iam::aws:policy/ReadOnlyAccess

- Assume the role and inspect the returned credentials.

      aws sts assume-role \
        --role-arn arn:aws:iam::${ACCOUNT_ID}:role/LabReadOnly \
        --role-session-name lab-1 \
        --duration-seconds 900
      # Returns an AccessKeyId / SecretAccessKey / SessionToken triple, valid 15 minutes.

- Use the temporary creds.

      export AWS_ACCESS_KEY_ID=<from above>
      export AWS_SECRET_ACCESS_KEY=<from above>
      export AWS_SESSION_TOKEN=<from above>
      aws sts get-caller-identity        # Should show the assumed role's session ARN, not your user.
      aws s3 ls                          # Allowed (ReadOnly).
      aws s3 mb s3://test-bucket-$RANDOM # Denied: ReadOnly doesn't include CreateBucket.

- Reset and tear down.

      unset AWS_ACCESS_KEY_ID AWS_SECRET_ACCESS_KEY AWS_SESSION_TOKEN
      aws iam detach-role-policy --role-name LabReadOnly \
        --policy-arn arn:aws:iam::aws:policy/ReadOnlyAccess
      aws iam delete-role --role-name LabReadOnly
Variation: replace the trust policy Principal with "Federated": "arn:aws:iam::ACCT:oidc-provider/token.actions.githubusercontent.com" and a condition matching token.actions.githubusercontent.com:sub to the repo you want — that's the GitHub Actions OIDC pattern, end-to-end.
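That variation can be sketched as a generated trust policy. The account ID, organisation, repository, and branch below are placeholders, and the sketch assumes the GitHub OIDC provider has already been registered in the account.

```shell
# Sketch only: account ID, org, repo and branch are placeholders.
ACCOUNT_ID="123456789012"
GITHUB_ORG="example-org"
GITHUB_REPO="example-repo"
BRANCH="main"
GH_TRUST_FILE="$(mktemp)"

# The sub condition pins the role to one repository and one branch.
cat > "$GH_TRUST_FILE" <<EOF
{
  "Version": "2012-10-17",
  "Statement": [{
    "Effect": "Allow",
    "Principal": {
      "Federated": "arn:aws:iam::${ACCOUNT_ID}:oidc-provider/token.actions.githubusercontent.com"
    },
    "Action": "sts:AssumeRoleWithWebIdentity",
    "Condition": {
      "StringEquals": {
        "token.actions.githubusercontent.com:aud": "sts.amazonaws.com",
        "token.actions.githubusercontent.com:sub": "repo:${GITHUB_ORG}/${GITHUB_REPO}:ref:refs/heads/${BRANCH}"
      }
    }
  }]
}
EOF

cat "$GH_TRUST_FILE"
```

Swapping /tmp/trust.json in the lab for a file like this one would turn LabReadOnly into a role that only the named repository's workflows on that branch can assume.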
11 · Real-world case studies
Three publicly-documented stories make the IAM model concrete — one cautionary, two aspirational.
Capital One (2019) — over-permissive IAM + SSRF. A misconfigured ModSecurity WAF on an EC2 instance was exploited via server-side request forgery to query the EC2 metadata endpoint (IMDSv1, before the metadata service required session tokens), which returned the instance's IAM role credentials. The role had broad S3 permissions, and the attacker used those creds to enumerate and exfiltrate ~106 million customer records. The chain of failures is documented in Krebs's writeup and in DOJ court filings. The lessons that landed across the industry: require IMDSv2 (which needs a session token before returning creds, breaking the simple SSRF path), scope EC2 roles to least privilege, never give a web-tier role wholesale S3 list/read across all buckets, and add bucket-policy conditions on aws:SourceVpce so the buckets can only be read through the expected VPC endpoint. AWS introduced IMDSv2 in November 2019, a few months after this incident, and has since moved toward requiring it by default on new instances.
Mozilla — least-privilege as code. Mozilla publishes its IAM least-privilege guidelines and the tooling that enforces them. The pattern: every IAM role in their AWS estate is generated from a code repository, with a permission boundary attached automatically by the platform; policy changes are reviewed by humans and re-validated by Access Analyzer on every PR. The argument that travels well: in any sufficiently large org, no IAM role survives manual review unless creating the role itself goes through code review. The Mozilla docs walk through SAR-style (Service / Action / Resource) policies derived from CloudTrail-observed usage, which is the same idea behind AWS's own IAM Access Advisor and the "Generate policy" feature in the console.
Netflix — ConsoleMe and shared-account workflows. Netflix open-sourced ConsoleMe, a self-service portal that lets engineers request a temporary AssumeRole into any of the hundreds of AWS accounts Netflix operates. The post describes a one-click "raise to this role for 1 hour, here's why" workflow that issues a short-lived session, records the justification in an audit log, and lets the session expire on time-out. The model that survived: humans don't get long-lived credentials, but they get a frictionless on-demand path to the access they need, with full provenance. Multiple companies have since built variants. Netflix's earlier tooling, Aardvark and Repokid, automatically reduces unused IAM permissions in production: Aardvark collects each role's Access Advisor (service last accessed) data, and Repokid removes permissions nobody has used in 90 days.
The through-line: in 2026 production AWS, the IAM that works is "no long-lived secrets, federation everywhere, automated guardrails, and reviewable provenance for every elevated session."
12 · What breaks
- "User can't assume role." Almost always one of: (a) trust policy doesn't list them, (b) their identity policy doesn't have sts:AssumeRole on the target, (c) an SCP blocks it, (d) MFA condition requires MFA they don't have, (e) the role has an external-ID condition they're not passing.
- "Access denied" with no clue. Use the IAM Policy Simulator or CloudTrail; the denied call shows up there with its error code and message. AWS tells the caller little about why (information-leak risk), so the operator has to look on the AWS side.
- Silent permission-boundary cut. When platform teams attach a boundary to a role, any action the role's identity policy grants but the boundary doesn't silently stops working. The role still exists, the identity policy still lists the action, and the call returns AccessDenied with no indication that a boundary is in play. Always check aws iam get-role --role-name X for PermissionsBoundary.
- Trust-policy typos in aud / sub. IRSA's OIDC trust policy keys are <oidc-issuer>:aud and <oidc-issuer>:sub. A common copy-paste error puts sts.amazonaws.com:aud instead of oidc.eks.<region>.amazonaws.com/id/XXXX:aud; the policy looks correct but matches nothing. GitHub Actions has the same trap with token.actions.githubusercontent.com:sub: the value must be exactly repo:<org>/<repo>:ref:refs/heads/main or a wildcard you trust.
- STS regional endpoints. Calls to the global sts.amazonaws.com endpoint have historically been served from us-east-1: fast for North America, painful from Asia, and unavailable during a us-east-1 control-plane event. Use regional STS endpoints (sts.<region>.amazonaws.com) in the SDK config (AWS_STS_REGIONAL_ENDPOINTS=regional) so AssumeRole stays in-region. AWS's own SDKs default to regional in newer versions; older SDKs and many CI tools still default to global.
- The 1-hour role-chaining limit. If your code path AssumeRoles from a session that was itself produced by AssumeRole, the new session's TTL is capped at 1 hour regardless of the target role's MaxSessionDuration, and a --duration-seconds above 3600 is rejected. Long-running agents either re-authenticate every hour or AssumeRole directly from a non-chained principal.
- Static keys leaked. If you see AKIA... in a commit, rotate immediately, scan the public web for the key (it's almost certainly already scraped), then audit CloudTrail for that key's API activity. AWS scans GitHub and emails you when it spots a key, but the bots get there first.
- "My Lambda can't access S3 even though I gave it permission." Confirm (1) the Lambda's execution role has the S3 permission, (2) the S3 bucket policy doesn't deny the role, (3) the bucket isn't behind a KMS key the role can't decrypt with, (4) you're calling the right region.
- IAM eventual consistency. Policy changes propagate over seconds, occasionally tens of seconds. CI scripts that create a role and immediately call it sometimes fail; retry with backoff.
- Identity Center session expiry mid-CLI-command. Default Identity Center session is 8 hours; long-running CLI scripts will start seeing ExpiredToken mid-run. Re-run aws sso login and resume, or wrap the script in a retry that detects the error and re-authenticates.
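For the eventual-consistency case, "retry with backoff" can be as small as one shell function. This is a generic sketch rather than an AWS-provided helper: the name retry_with_backoff and its argument order are made up for illustration, and it retries on any non-zero exit status instead of inspecting the error.

```shell
# retry_with_backoff MAX_ATTEMPTS INITIAL_DELAY_SECONDS COMMAND [ARGS...]
# Runs COMMAND until it succeeds or MAX_ATTEMPTS is reached, doubling the
# delay after each failure. Sketch only: it does not distinguish error types.
retry_with_backoff() {
  local max_attempts="$1" delay="$2" attempt=1
  shift 2
  while true; do
    if "$@"; then
      return 0
    fi
    if [ "$attempt" -ge "$max_attempts" ]; then
      echo "giving up after ${attempt} attempts: $*" >&2
      return 1
    fi
    echo "attempt ${attempt} failed; retrying in ${delay}s" >&2
    sleep "$delay"
    attempt=$((attempt + 1))
    delay=$((delay * 2))
  done
}

# Example usage (not run here because it needs AWS credentials and a role ARN):
#   retry_with_backoff 5 2 aws sts assume-role \
#     --role-arn "$ROLE_ARN" --role-session-name lab-1
```

In a real pipeline it is worth retrying only on the errors that indicate propagation delay, since a genuine AccessDenied will never succeed no matter how long the script waits.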
13 · Further reading
- IAM policy evaluation logic. The canonical flowchart. Read this once and refer back; it's where every "why was this denied?" question ends.
- Permissions boundaries. The mechanism for delegated role creation. Worth knowing for platform-team interviews.
- IRSA. The original EKS workload-identity pattern.
- EKS Pod Identity. The 2024 replacement; less plumbing, recommended for new clusters.
- Capital One breach analysis (Krebs). The IMDSv1 + over-permissive role chain that drove the IMDSv2 default.
- Mozilla IAM Least Privilege Guidelines. A public, mature take on least-privilege-as-code.
- Netflix ConsoleMe. The self-service AssumeRole pattern at scale.
- Netflix Aardvark + Repokid. Automated permission-pruning tooling that works from Access Advisor (service last accessed) data.
- Identity & IAM concepts. The conceptual companion — the principal/policy/role model, JSON anatomy, and SCPs.
- Authentication primitives. The broader auth landscape — JWTs, OIDC, OAuth — that this page assumes you know.