Which AWS auth method should I use for self-hosted K3s in 2026?

For multi-tenant production workloads with security as priority: IRSA via CloudFront with custom domain (Setup C). For edge clusters with no public OIDC issuer accessible (AWS access via VPC endpoints or NAT): EC2 Instance Profile (Setup A). For mixed Kubernetes plus non-Kubernetes workloads or when reusing existing corporate PKI: IAM Roles Anywhere with self-signed CA (Setup D). The IRSA via S3 public bucket pattern from 2021 tutorials still works but creates compliance noise from S3 Access Analyzer.

Is IRSA via S3 public bucket safe?

Mechanically yes - the bucket only hosts the OIDC discovery JSON and JWKS public keys, neither of which are secrets. The issue is unnecessary public S3 exposure plus AWS S3 Access Analyzer flags this configuration, which creates compliance noise. CloudFront with private S3 bucket and Origin Access Control (OAC) eliminates both problems while preserving the same OIDC mechanic.

Does IAM Roles Anywhere require AWS Private CA?

No. AWS Roles Anywhere accepts any X.509 certificate via the Trust Anchor, including self-signed CAs generated with openssl. AWS Private CA costs 50 USD per month for short-lived certificate mode and 400 USD per month for general mode. A self-signed openssl CA costs 0 USD and works fine for production with proper key management. The CA cert just needs explicit basicConstraints CA:TRUE plus keyUsage keyCertSign,cRLSign extensions.

What is the cold start latency difference between IRSA and Instance Profile on K3s?

In a benchmark of 10 runs per setup (40 runs total) with K3s 1.35.4 and amazon/aws-cli:latest as the test image: Instance Profile median was 3.18 seconds, IRSA via S3 public bucket median was 2.59 seconds (fastest), IRSA via CloudFront median was 3.97 seconds with p95 hitting 13.68 seconds due to CloudFront edge cache cold misses, and IAM Roles Anywhere median was 6.14 seconds (slowest due to initContainer download plus credential_process subprocess fork). Note that because the test image used the :latest tag, kubelet performed registry digest resolution on every pod create, so cold-start times include some image-pull overhead, not pure auth-method timings.

What breaks when I change service-account-issuer in K3s?

All existing IRSA pods whose IAM trust policy references the old issuer URL will get AccessDenied immediately. Cached credentials do not save you - the projected service account token itself is signed under the new issuer URL and rejected by AWS STS. The only safe migration pattern is blue-green: maintain both OIDC providers during cutover, update all trust policies to reference both issuers, then remove the old issuer once no pod uses it. K3s service certs may also regenerate during issuer change, requiring worker re-join with a fresh node-token.

Why is aws_signing_helper serve mode broken with IAM Roles Anywhere?

In testing with helper version 1.8.2 plus amazon/aws-cli 2.34.43 plus K3s 1.35.4, the IMDSv2 listing endpoint at /latest/meta-data/iam/security-credentials/ returns 400 Bad Request, which causes botocore to throw OverflowError on _evaluate_expiration when parsing the response. The credential-process mode works reliably with the same helper version. AWS documentation still describes serve mode as supported, so it may work with different version combinations - check helper version and SDK combination before assuming serve works.

K3s on AWS in 2026: 4 IAM auth methods benchmarked

TL;DR: I have 3 EC2 nodes running K3s. My pod needs to write to S3. Which AWS auth should I use? I built all four (Instance Profile, IRSA via S3, IRSA via CloudFront, IAM Roles Anywhere) on the same cluster, ran benchmarks, killed dependencies on purpose, and recorded everything that broke. Spoiler: auth method picks sub-second median delta, but blast radius differs by 100x.

The problem (and why your tutorial is wrong)

Most “K3s + AWS” tutorials show you one of two patterns:

EC2 Instance Profile - default fallback, every pod inherits node-level AWS perms
IRSA via S3 public bucket - the classic reece.tech 2021 pattern you’ll find replicated across 50+ blog posts

Both have problems in 2026:

Instance Profile blast radius = whole node. Pod compromise = AWS access for every other pod on the host. CloudTrail session name is the EC2 instance ID - you can’t tell which pod made the call.
S3 public bucket smell = AWS S3 Access Analyzer warns. Anyone with the bucket name can see your OIDC discovery doc + JWKS public keys. Not a secret, but unnecessary exposure.

There are at least two more options nobody benchmarks against each other:

IRSA via CloudFront + custom domain - private S3 bucket, OAC, custom branding
IAM Roles Anywhere with self-signed CA - X.509 cert auth, completely different paradigm

I built all four on the same 3-node K3s cluster, ran cold-start benchmarks, then started breaking things on purpose. Here’s what I learned.

Architecture: 4 setups at a glance

K3s pod to AWS authentication architecture: 4 setups compared - Instance Profile, IRSA via S3, IRSA via CloudFront, IAM Roles Anywhere

All four end at AWS STS. The difference is what AWS validates (IMDSv2 token vs OIDC JWT vs X.509 cert) and where the trust anchor sits (EC2 metadata vs OIDC provider vs Roles Anywhere Trust Anchor).

Cluster: K3s v1.35.4+k3s1 on 3x EC2 t3.medium in eu-central-1, single AZ.

Setup	Auth pattern	Per-pod identity?	External deps
A Instance Profile	IMDSv2 fallback	NO (per-node)	none
B IRSA + S3	OIDC federation	YES	S3 bucket public
C IRSA + CloudFront	OIDC federation	YES	S3 + CloudFront + ACM + DNS
D Roles Anywhere	X.509-signed CreateSession	YES	K8s Secret + CA

Key insight before we dive in: all four ultimately call AWS STS. The difference is what AWS validates (IMDSv2 token vs OIDC JWT vs X.509 cert) and where the trust anchor sits (EC2 metadata vs OIDC provider vs Roles Anywhere Trust Anchor).

Setup A: EC2 Instance Profile (baseline)

The default. Attach IAM role to EC2, every pod on that node falls back to IMDSv2 when the SDK doesn’t find creds elsewhere.

resource "aws_iam_role" "k3s_node" {
  name = "lab-irsa-k3s-node"
  assume_role_policy = jsonencode({
    Statement = [{
      Effect = "Allow"
      Principal = { Service = "ec2.amazonaws.com" }
      Action = "sts:AssumeRole"
    }]
  })
}

resource "aws_iam_instance_profile" "k3s_node" {
  name = "lab-irsa-k3s-node"
  role = aws_iam_role.k3s_node.name
}

Pod manifest is identical to “no auth” - SDK auto-discovers via IMDSv2:

apiVersion: v1
kind: Pod
metadata:
  name: aws-test-baseline
spec:
  containers:
  - name: aws-cli
    image: amazon/aws-cli:latest
    command: ["sleep", "3600"]

ARN returned by sts:GetCallerIdentity:

arn:aws:sts::498127364509:assumed-role/lab-irsa-lab-k3s-node/i-026dc1726f1e157e9
                                                              ^^^^^^^^^^^^^^^^^^^
                                                              EC2 instance ID

The audit trail problem: session name is the instance ID, not the pod or service account. CloudTrail shows you which node made the call, not which pod. In a 100-pod cluster, you’ve lost workload-level traceability.

Pros:

Zero config (default fallback)
Cold start ~3.4s (we’ll see this beats IRSA by ~400ms)
No external deps

Cons:

Per-node permissions = blast radius is the whole host
No per-pod IAM granularity
Audit trail at instance level only

When to use: edge clusters with 1-2 pods, dev labs, single-tenant single-purpose nodes.

Setup B: IRSA via S3 public bucket

The classic pattern from 2021 tutorials. K3s API server signs SA tokens with an RSA key. The matching public key (JWKS) lives in a public S3 bucket. AWS STS fetches it to verify pod-issued tokens.

# Generate RSA keypair locally
openssl genrsa -out sa-signer.key 2048
openssl rsa -in sa-signer.key -pubout -out sa-signer.pub

# Build JWKS + OIDC discovery JSON (Python helper script in repo)
python gen-irsa-keys.py --issuer "https://your-bucket.s3.eu-central-1.amazonaws.com"

# Public S3 bucket - smell, intentional in Setup B
resource "aws_s3_bucket_public_access_block" "oidc" {
  block_public_acls       = false
  block_public_policy     = false
  ignore_public_acls      = false
  restrict_public_buckets = false
}

# Dynamic TLS thumbprint - never hardcode (rotates with AWS S3 cert)
data "tls_certificate" "s3" {
  url = "https://your-bucket.s3.eu-central-1.amazonaws.com"
}

resource "aws_iam_openid_connect_provider" "k3s" {
  url             = "https://your-bucket.s3.eu-central-1.amazonaws.com"
  client_id_list  = ["sts.amazonaws.com"]
  thumbprint_list = [data.tls_certificate.s3.certificates[0].sha1_fingerprint]
}

K3s reconfig (CP node):

# /etc/rancher/k3s/config.yaml
kube-apiserver-arg:
  - "service-account-issuer=https://your-bucket.s3.eu-central-1.amazonaws.com"
  - "service-account-signing-key-file=/etc/rancher/k3s/keys/sa-signer.key"
  - "service-account-key-file=/etc/rancher/k3s/keys/sa-signer.pub"
  - "api-audiences=sts.amazonaws.com,https://kubernetes.default.svc"

Pod manifest changes - per-pod IAM via SA annotation + projected token:

apiVersion: v1
kind: ServiceAccount
metadata:
  name: irsa-test-s3-sa
  annotations:
    eks.amazonaws.com/role-arn: arn:aws:iam::ACCT:role/lab-irsa-pod-s3
---
apiVersion: v1
kind: Pod
spec:
  serviceAccountName: irsa-test-s3-sa
  containers:
  - name: aws-cli
    image: amazon/aws-cli:latest
    env:
    - name: AWS_ROLE_ARN
      value: arn:aws:iam::ACCT:role/lab-irsa-pod-s3
    - name: AWS_WEB_IDENTITY_TOKEN_FILE
      value: /var/run/secrets/eks.amazonaws.com/serviceaccount/token
    volumeMounts:
    - name: aws-iam-token
      mountPath: /var/run/secrets/eks.amazonaws.com/serviceaccount
      readOnly: true
  volumes:
  - name: aws-iam-token
    projected:
      sources:
      - serviceAccountToken:
          audience: sts.amazonaws.com
          expirationSeconds: 86400
          path: token

Note: I deliberately skipped the eks-pod-identity-webhook. The webhook just automates this manifest. Showing it manually makes the mechanics obvious.

ARN returned:

arn:aws:sts::498127364509:assumed-role/lab-irsa-lab-irsa-pod-s3/botocore-session-1778013011
                                       ^^^^^^^^^^^^^^^^^^^^^^^^ ^^^^^^^^^^^^^^^^^^^^^^^^^
                                       Per-pod IAM role         boto3 default session name

Now we have per-pod identity - different role for different pod, different CloudTrail entry.

The catch: S3 public bucket. Anyone running curl https://your-bucket.s3.eu-central-1.amazonaws.com/.well-known/openid-configuration can read your OIDC config and JWKS. JWKS are public by design (they hold public keys, not secrets) - the issue is unnecessary public S3 exposure plus compliance noise: AWS S3 Access Analyzer flags this and your security team will ask.

Setup C: IRSA via CloudFront + custom domain

Same OIDC mechanic, better security posture. S3 bucket private, CloudFront fronts it with Origin Access Control (OAC), custom domain (oidc.lab.haitmg.pl) with ACM cert.

The setup is 3-stage with manual DNS steps in between:

Stage 1 - ACM cert (issuer creates), S3 bucket, CloudFront stub:

terraform apply -target=aws_acm_certificate.oidc \
                -target=aws_s3_bucket.oidc \
                -target=aws_cloudfront_origin_access_control.oidc \
                -target=aws_s3_object.oidc_config \
                -target=aws_s3_object.oidc_jwks

Manual DNS step: add validation CNAME (5-15 min propagation).

Stage 2 - validation passes, CloudFront deploys:

terraform apply -target=aws_acm_certificate_validation.oidc \
                -target=aws_cloudfront_distribution.oidc \
                -target=aws_s3_bucket_policy.oidc_cf_only

Manual DNS step: add CF alias CNAME (oidc.lab → dXXX.cloudfront.net).

Stage 3 - OIDC provider + IAM role:

terraform apply  # full

Two important quirks:

ACM cert MUST be in us-east-1 for CloudFront - regardless of where your other resources live. The TF code needs an aliased provider:

provider "aws" {
  alias  = "us_east_1"
  region = "us-east-1"
}

resource "aws_acm_certificate" "oidc" {
  provider = aws.us_east_1  # not the default eu-central-1
  domain_name = "oidc.lab.haitmg.pl"
}

Reference aws_acm_certificate_validation.cert_arn, not aws_acm_certificate.arn in the CloudFront viewer_certificate. The validation resource blocks until the cert is ISSUED. Skip this and you get InvalidViewerCertificate errors.

S3 bucket is now private:

resource "aws_s3_bucket_public_access_block" "oidc" {
  block_public_acls       = true   # all 4 flags TRUE
  block_public_policy     = true
  ignore_public_acls      = true
  restrict_public_buckets = true
}

resource "aws_s3_bucket_policy" "oidc_cf_only" {
  policy = jsonencode({
    Statement = [{
      Principal = { Service = "cloudfront.amazonaws.com" }
      Condition = {
        StringEquals = {
          "AWS:SourceArn" = aws_cloudfront_distribution.oidc.arn
        }
      }
    }]
  })
}

Cold start was slower than Setup B in my benchmark (3.97 s median vs 2.59 s, see numbers below). Likely CloudFront edge cache cold miss on the first JWKS fetch from each STS request vs direct S3 regional endpoint hit. The win is not performance - it’s security posture: S3 Access Analyzer happy, custom domain branding, optional WAF on CloudFront, ACM cert auto-rotation.

The cost: 3 Terraform applies, 2 manual DNS steps, 20 minutes of DNS propagation waits.

Setup D: IAM Roles Anywhere + self-signed CA

Completely different paradigm. No K8s OIDC, no S3 bucket, no CloudFront. Just X.509 cert auth.

# Self-signed CA (10 years, REQUIRES explicit basicConstraints + keyUsage extensions)
openssl req -x509 -new -nodes -key ca.key -sha256 -days 3650 \
  -addext "basicConstraints=critical,CA:TRUE" \
  -addext "keyUsage=critical,keyCertSign,cRLSign" \
  -out ca.crt -subj "/CN=lab-irsa-benchmark-ca/O=HAIT/C=PL"

# Pod cert (90 days, REQUIRES specific extensions for AWS Roles Anywhere)
cat > pod-cert.ext <<EOF
basicConstraints=critical,CA:FALSE
keyUsage=critical,digitalSignature
extendedKeyUsage=clientAuth
subjectKeyIdentifier=hash
authorityKeyIdentifier=keyid,issuer
EOF

openssl x509 -req -in pod-cert.csr -CA ca.crt -CAkey ca.key -CAcreateserial \
  -out pod-cert.crt -days 90 -sha256 -extfile pod-cert.ext

Three landmines I stepped on, in order:

AWS Private CA costs $50/mo for short-lived mode, $400/mo for general mode. Self-signed CA = $0 and AWS Roles Anywhere accepts external CA via Trust Anchor.
Default openssl req -x509 doesn’t add basicConstraints CA:TRUE to the CA cert. AWS rejects with Incorrect basic constraints for CA certificate. Add -addext "basicConstraints=critical,CA:TRUE".
Default openssl x509 -req doesn’t add keyUsage or extendedKeyUsage to the leaf cert. AWS rejects with Untrusted certificate. Insufficient certificate. The extfile above adds them.

Trust Anchor + Profile + Role:

resource "aws_rolesanywhere_trust_anchor" "lab_ca" {
  name = "lab-irsa-ca"
  source {
    source_data {
      x509_certificate_data = file("ca/ca.crt")
    }
    source_type = "CERTIFICATE_BUNDLE"
  }
}

resource "aws_iam_role" "ra_pod" {
  assume_role_policy = jsonencode({
    Statement = [{
      Principal = { Service = "rolesanywhere.amazonaws.com" }
      Action = ["sts:AssumeRole", "sts:TagSession", "sts:SetSourceIdentity"]
      Condition = {
        StringEquals = {
          "aws:PrincipalTag/x509Subject/CN" = "lab-pod-1"
        }
      }
    }]
  })
}

resource "aws_rolesanywhere_profile" "lab_pod" {
  role_arns = [aws_iam_role.ra_pod.arn]
  duration_seconds = 3600
}

Pod has 3 containers (initContainer downloads helper, sidecar runs it, app talks to it):

spec:
  initContainers:
  - name: download-helper
    image: amazon/aws-cli:latest
    command: ["sh", "-c"]
    args:
      - |
        curl -sSL "https://rolesanywhere.amazonaws.com/releases/1.8.2/X86_64/Linux/Amzn2023/aws_signing_helper" \
          -o /shared/aws_signing_helper
        chmod +x /shared/aws_signing_helper

  - name: setup-aws-config
    image: amazon/aws-cli:latest
    command: ["sh", "-c"]
    args:
      - |
        cat > /aws-config/config <<EOF
        [default]
        credential_process = /shared/aws_signing_helper credential-process \
          --certificate /certs/cert.pem --private-key /certs/key.pem \
          --trust-anchor-arn ARN --profile-arn ARN --role-arn ARN
        EOF

  containers:
  - name: aws-cli
    image: amazon/aws-cli:latest
    env:
    - name: AWS_CONFIG_FILE
      value: /aws-config/config

Why credential-process and not serve (IMDSv2 mock)?

The helper has two modes. serve exposes an IMDSv2-compatible endpoint on localhost:9911. AWS docs describe both as supported.

In my lab (helper 1.8.2 + amazon/aws-cli:latest 2.34.43 + K3s 1.35.4), serve mode returned 400 Bad Request on the listing endpoint (/latest/meta-data/iam/security-credentials/), which made botocore overflow on _evaluate_expiration:

File "awscli/botocore/utils.py", line 640, in _evaluate_expiration
OverflowError: date value out of range

credential-process was reliable. Your mileage may vary - check helper version and SDK combination before assuming serve works. If it does for you, serve saves a config file mount.

ARN returned:

arn:aws:sts::498127364509:assumed-role/lab-irsa-lab-roles-anywhere-pod/228cf3882735f2c00354ea7571dcf6680666dca0
                                                                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
                                                                       X.509 cert serial number

Best audit trail of the four setups. Session name = cert serial = unique per cert issuance = correlatable with your PKI logs.

Trade-off: sidecar adds ~50MB RAM per pod. Each AWS API call forks a subprocess (credential-process runs aws_signing_helper on demand). Cold start is ~85% slower than Setup A.

Benchmark: cold start

10 runs per setup, measuring kubectl apply pod to first successful aws sts get-caller-identity.

Methodology:

K3s v1.35.4+k3s1 on 3x EC2 t3.medium in eu-central-1a
Ubuntu 24.04 LTS noble (AMI ami-0596cf3199908321b)
Container image: amazon/aws-cli:latest (resolved to 2.34.43)
Note on image pull: because the tag is :latest, Kubernetes defaults imagePullPolicy to Always, not IfNotPresent. Image layers were cached after the first run on each node, but kubelet still performed registry digest resolution on every pod create. Cold-start numbers therefore include some image-pull / registry overhead - they are not pure auth-method timings. To isolate auth-only overhead in your own runs, set imagePullPolicy: IfNotPresent explicitly in the pod spec.
Each iteration: kubectl delete pod --grace-period=0 --force → wait until gone → kubectl apply → poll aws sts get-caller-identity every 200ms until "Account" appears in output
Total wall time = kubectl apply to first successful STS response
10 runs per setup, 40 runs total. Stats computed via Python statistics.median() and standard p95 = sorted[int(n*0.95)]
Raw CSV (40 runs, all 4 setups): cold-start-results-final.csv · Aggregated stats: cold-start-summary.csv

Setup	Median	p95	Mean
A Instance Profile	3.18 s	4.75 s	3.12 s
B IRSA-S3 (public bucket)	2.59 s	3.69 s	2.88 s
C IRSA-CloudFront	3.97 s	13.68 s	5.69 s
D Roles Anywhere	6.14 s	8.08 s	6.27 s

K3s cold start benchmark chart: Instance Profile vs IRSA-S3 vs IRSA-CloudFront vs IAM Roles Anywhere - median, mean, p95 across 10 runs each, showing Setup C bimodal CloudFront cold cache outliers

Four insights:

Setup C has the worst p95 (13.68 s) and highest variance. Two of 10 runs hit ~13.5 s vs the others sitting around 2.3-5.5 s. Hypothesis: CloudFront edge cache cold misses for the JWKS fetch from AWS STS - first request pays the full origin fetch latency; subsequent requests use cached JWKS at the edge until TTL expires. Caveat: I did not have visibility into AWS STS internal JWKS fetch timing, so this is inferred from distribution shape, not directly observed in CloudFront/STS logs. If your IRSA pod creation is bursty (CronJob spawning every minute), expect this variance in Setup C, not in Setup B.
B vs C median delta = 1.38 s (B faster). Setup B and C share the same OIDC mechanic but differ in where AWS STS fetches JWKS from: regional S3 endpoint (B) vs CloudFront edge (C). On B, every STS validation hits the same regional endpoint with consistent latency. On C, you’re paying CDN cold cache misses on first hit. Don’t read “B is faster” as “always pick B” - the security trade-off (public bucket vs private+OAC) outweighs the median delta for most prod workloads.
A vs C median delta = 790 ms (~25%). OIDC federation does cost noticeable overhead vs Instance Profile fallback, but most cold-start time is still pod scheduling, image pull, and registry resolution - not the auth handshake itself.
Setup D is 93% slower than A because of two factors: initContainer downloads ~25 MB helper binary at pod start, and every AWS API call forks a credential-process subprocess. Optimization: bake helper into a custom Dockerfile (FROM amazon/aws-cli + RUN curl helper). Eliminates init download, saves ~2 s.
With n=10, take medians, not means. Setup C’s mean (5.69 s) is dragged up by 2 cold-cache outliers. Median (3.97 s) is more representative. Real production cold-start distribution is bimodal for CDN-fronted IRSA - plan capacity around p95, not median.

Failure scenarios: what breaks when

I deliberately broke things and recorded what happened. Here’s the matrix:

Test setup detail for issuer change: I changed K3s service-account-issuer from S3 (Setup B’s expected) to CloudFront (Setup C’s expected). After the change, K3s signs SA tokens with iss: <CloudFront URL>. Setup B’s IAM trust policy still expects iss: <S3 URL> → mismatch → AccessDenied. Setup C now matches → OK. The relationship is symmetric - if I had gone the other direction (CF→S3), Setup C would have failed and B would have worked. The lesson is “issuer change breaks any IRSA pod whose trust policy references the old issuer”, regardless of direction.

ID	Failure scenario	A	B	C	D	Cached creds save?
F-01	K3s issuer change (B→C)	✓	✗	✓	✓	NO
F-02	Clock skew +5h on node	✗	✗	✗	✗	NO
F-03	K8s Secret deleted (pod cert)	n/a	n/a	n/a	⚠️→✗	YES (existing only)
F-04	IAM Role deleted mid-flight	n/a	n/a	⚠️→✗	n/a	YES (1h grace)
F-05	OIDC Provider deleted in IAM	n/a	n/a	⚠️→✗	n/a	YES (1h grace)
F-06	Trust policy `:sub` mismatch	n/a	n/a	✗	n/a	NO

Legend: ✓ unaffected · ✗ fails immediately · ⚠️→✗ degraded (cached creds save existing pods 1h, then fail) · n/a scenario doesn’t apply

Per-failure detail

F-01 - K3s issuer change (B→C): Setup B’s IAM trust policy expects iss: <S3 URL>, but K3s now signs SA tokens with iss: <CloudFront URL>. Token rejected. Setup C matches new issuer → OK. Setup A unaffected (Instance Profile, no OIDC). Setup D unaffected (cert auth). Recovery: update all trust policies to new issuer URL OR rollback K3s. Safe migration pattern: blue-green dual-issuer (maintain both OIDC providers during cutover).

F-02 - Clock skew on node: All setups fail. AWS STS rejects SigV4 signatures with timestamp >15min off server clock. Setup A: SignatureDoesNotMatch. Setup B/C: token iat/exp invalid. Setup D: Invalid signature on CreateSession. Recovery: systemctl restart chrony → 5s NTP resync, all setups auto-recover without pod restart.

F-03 - K8s Secret deleted: Setup D only. Existing pod with cert already mounted via projected volume keeps running on cached creds. New pod attempting to start: MountVolume.SetUp failed for volume "cert" : secret not found. Recovery: kubectl create secret again, kubelet auto-retries mount.

F-04 - IAM Role deleted mid-flight: Setup C only (illustrative; same applies to B). Existing pod has 1h-cached AssumeRoleWithWebIdentity creds and keeps working until refresh. New pod: AccessDenied: Not authorized to perform sts:AssumeRoleWithWebIdentity. Recovery: terraform apply re-creates role + policy attachment.

F-05 - OIDC Provider deleted in IAM: Setup C only. Existing pod cached. New pod: InvalidIdentityToken: No OpenIDConnect provider found in your account for https://oidc.lab.haitmg.pl. Notably different error code than F-04 (role deleted) - useful for fast triage. Recovery: terraform apply re-creates OIDC provider.

F-06 - Trust policy :sub condition mismatch: Setup C only. Pod with wrong SA name fails immediately: AccessDenied. Same error code as F-04 but different cause - the trust policy condition system:serviceaccount:default:irsa-test-cf-sa doesn’t match token claim. Recovery: fix SA name in pod spec OR fix trust policy condition.

Three meta-insights:

1. Cached creds = grace period (4 of 6 scenarios).

AssumeRoleWithWebIdentity creds cache for 1 hour by default
Existing pod survives role deletion, OIDC deletion, K8s Secret deletion (for Setup D)
New pods (CronJob spawning every minute) hit the failure immediately
First casualty: short-lived workloads.

2. Error message decoder ring:

SignatureDoesNotMatch → clock skew
InvalidIdentityToken: No OpenIDConnect provider found → OIDC provider deleted
AccessDenied: Not authorized to perform sts:AssumeRoleWithWebIdentity → role deleted OR trust policy mismatch (must check both!)
MountVolume.SetUp failed → K8s Secret missing
Credentials were refreshed, but still expired → clock skew (SDK’s confused way of saying it)

3. Recovery time differs wildly:

NTP fix: 5 seconds (systemctl restart chrony)
TF apply re-create: ~30 seconds (role, OIDC provider)
K8s Secret recreate + kubelet retry: ~5 seconds

The most damaging failure is K3s issuer change because it’s instant for all existing IRSA pods and recovery requires either rolling back the issuer or updating every trust policy. Blue-green issuer rollout is the only safe migration pattern.

Cost reality

I expected meaningful cost differences. There aren’t any.

Lab cost (3-day run including troubleshooting): ~$5 total. EC2 dominates ($0.0418/h × 3 nodes × ~24h = ~$3). Everything else fits in free tiers.

Production projection (100 pods, 24/7, ~10 AWS API calls/h per pod):

Setup	Monthly cost	Comment
A Instance Profile	$0	Built into EC2
B IRSA-S3 public	~$0.05	72k S3 GETs/mo, free tier covers
C IRSA-CloudFront	~$0.05	72k CF requests, free tier 10M covers
D Roles Anywhere	$0	No service charge for CreateSession

Cost is never the deciding factor. Engineering time is - Setup C took me 30 minutes the first time (DNS waits), Setup D took 60 minutes (5 different cert/version errors). At $200/h engineer rate, that’s $200+ in invisible cost per first deploy.

The hidden trap: AWS Private CA in Roles Anywhere general mode = $400/mo. Don’t fall for that. Self-signed CA is a feature, not a workaround.

Decision matrix

Edge cluster with no public OIDC issuer (AWS access via VPC endpoints or NAT)
  → Instance Profile (Setup A). Skip the rest.

Multi-tenant K3s in prod, blast radius matters
  → IRSA via CloudFront (Setup C). Sub-second median cold-start cost is irrelevant for long-running workloads.

Mix of K8s and non-K8s workloads (VMs, on-prem servers)
  → IAM Roles Anywhere (Setup D). One auth pattern across everything.

Existing corporate PKI (Microsoft AD CS, Vault PKI, EJBCA)
  → IAM Roles Anywhere (Setup D). Reuse what you have.

Air-gapped cluster with no public OIDC issuer
  → IAM Roles Anywhere (Setup D). Cert distribution can be offline.

GitOps-managed cluster with frequent Secret churn
  → AVOID Setup D. K8s Secret = critical path dependency.

Short-lived CronJob workloads (creds churn every minute)
  → Setup A or D. IRSA's 1h cache is wasted on short pods.

You read about IRSA on someone's 2021 blog and want to copy it
  → Setup B with S3 public bucket. **Don't.** Use C with CloudFront.

Conclusion

My pick for production self-hosted K3s: Setup C (IRSA via CloudFront). Per-pod identity, no S3 public bucket smell, and the 30-minute first deploy is one-time pain.

My pick for hybrid (K8s + non-K8s) prod: Setup D (Roles Anywhere). One mental model for cert auth across containers, VMs, and lambdas.

My pick for “I’m just learning”: Setup A (Instance Profile). Skip the OIDC complexity until you have a multi-pod cluster that actually needs per-pod IAM.

What this lab convinced me of:

Auth method matters less than people think for cold start (sub-second median delta, $0 cost difference for typical workloads)
Failure mode coverage matters way more than auth pattern complexity
Self-signed CA for Roles Anywhere is the unlock - $50-400/mo savings vs AWS Private CA
Tutorials from 2021 are subtly broken in 2026 (awscli v1 EOL, helper IMDSv2 mock broken, openssl version skews)

Reproduce this lab

Full Terraform stacks, K8s manifests, helper scripts, raw benchmark CSVs, and step-by-step runbooks are public on GitHub:

github.com/gebalamariusz/lab-irsa-benchmark

What’s in the repo:

4 Terraform stacks (01-baseline through 04-roles-anywhere)
K8s manifests for each setup’s test pod (with manual projected SA token, no Pod Identity Webhook required)
Cold start benchmark scripts (PowerShell + bash)
Self-signed CA + cert generation script with all required X.509 v3 extensions for AWS Roles Anywhere
Raw benchmark CSVs and chart generator (matplotlib)

Cost to reproduce: ~$3-5 if you run all 4 setups for a few hours and destroy.

What I’m NOT covering (yet)

EKS Pod Identity (newer than IRSA, no projected token, ~simpler) - separate post coming
EKS Auto Mode with Karpenter integration - haven’t benchmarked
VPC Lattice + IAM auth for service-to-service - different problem space

If you found this useful, those are next.

Acknowledgments

This is a follow-up to my AWS Network Firewall vs Palo Alto VM-Series benchmark and AWS Bedrock AgentCore network mode analysis. Same methodology: build it, break it, measure it.

If your team needs help with AWS security audit (IAM, OIDC federation, Roles Anywhere setup review), I run HAIT Consulting - book a 20-min triage call.