Your team just shipped a new personalization API. Marketing ran a campaign. Traffic spiked. Latency tripled. Dashboards lit up. Postmortem pain followed.

The root cause wasn’t a missing feature – it was missing evidence. No baseline, no capacity model, no load profile, no automated performance guardrails.

This week, you decide to change that. Enter Grafana k6 – an open source, developer-centric performance testing tool that feels like writing integration tests, but for traffic at scale.

In this guide, we’ll rebuild performance confidence step‑by‑step – the same way a real team would.


Why k6?

Before diving into code, let’s understand what makes k6 special:

| Need | k6 Solution |
| --- | --- |
| Developer-friendly | Write tests in JavaScript |
| Shift-left testing | Run locally, in CI, Docker, or k6 Cloud |
| Production realism | Model real traffic with scenarios, arrival-rate, ramping patterns |
| Built-in validation | Checks, thresholds, tags, custom metrics out of the box |
| Real-time observability | Native outputs to Prometheus, InfluxDB, Grafana |
| CI/CD ready | Deterministic exit codes based on SLO thresholds |
| Extensible | xk6 extensions for gRPC, Redis, Kafka, WebSockets, browser testing |

The k6 Philosophy:

graph LR
    A[Write Test<br/>JavaScript] --> B[Define Scenarios<br/>Traffic Patterns]
    B --> C[Set Thresholds<br/>SLO Gates]
    C --> D[Run Test<br/>Local or CI]
    D --> E[Stream Metrics<br/>Grafana/Prometheus]
    E --> F[Pass/Fail<br/>Based on SLOs]
    
    style A fill:#e1f5ff
    style C fill:#fff4e1
    style F fill:#ffe1e1

First Script: Baseline Request

Create scripts/smoke.js:

import http from 'k6/http';
import { sleep, check } from 'k6';

export const options = {
  vus: 5,              // 5 virtual users
  duration: '30s',     // Run for 30 seconds
};

export default function () {
  const res = http.get('https://api.example.com/health');
  check(res, {
    'status is 200': r => r.status === 200,
    'response < 200ms': r => r.timings.duration < 200,
  });
  sleep(1);
}

Run it:

k6 run scripts/smoke.js

What You’ll See:

     ✓ status is 200
     ✓ response < 200ms

     checks.........................: 100.00% ✓ 150       ✗ 0  
     http_req_duration..............: avg=145ms p(95)=178ms
     http_reqs......................: 150     5/s
     iterations.....................: 150     5/s
     vus............................: 5       min=5 max=5

Understanding the Output:

  • checks: Your validation rules (100% pass rate = healthy)
  • http_req_duration: Response time stats (p95 is your friend)
  • http_reqs: Total requests and requests per second
  • iterations: Total completed executions of the default function across all VUs

sequenceDiagram
    participant K6
    participant API
    
    loop Every VU (5 times in parallel)
        K6->>API: GET /health
        API-->>K6: 200 OK (145ms)
        K6->>K6: check(status == 200)
        K6->>K6: check(duration < 200ms)
        K6->>K6: sleep(1s)
    end
    
    Note over K6: Repeat for 30 seconds

Congratulations! You just ran your first load test. But we’re not stopping here—let’s add some teeth to it.


Add SLO Guardrails with Thresholds

Thresholds fail the test if SLOs degrade – perfect for CI.

export const options = {
  vus: 10,
  duration: '1m',
  thresholds: {
    // Error rate must be < 1%
    http_req_failed: ['rate<0.01'],
    
    // P95 latency must be < 400ms AND average < 250ms
    http_req_duration: ['p(95)<400', 'avg<250'],
    
    // At least 99% of checks must pass
    checks: ['rate>0.99'],
  },
};

If any threshold is violated, k6 exits with a non-zero status and the pipeline fails.
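
In a shell or CI step you can gate on that exit code directly. A minimal sketch (the script name is a placeholder):

k6 run gated.js
echo $?   # 0 when all thresholds pass; non-zero on breach (99 in recent k6 releases)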


k6 Executors Explained

k6 provides several executor types to model different traffic patterns. Here’s a complete scenario combining multiple executors:

scenarios.js:

import http from 'k6/http';
import { check } from 'k6';

export const options = {
  scenarios: {
    ramp_up: {
      executor: 'ramping-vus',
      startVUs: 0,
      stages: [
        { duration: '1m', target: 50 },
        { duration: '2m', target: 50 },
        { duration: '30s', target: 0 },
      ],
      exec: 'browse',
    },
    sustained_api: {
      executor: 'constant-arrival-rate',
      rate: 100,               // requests per second
      timeUnit: '1s',
      duration: '3m',
      preAllocatedVUs: 60,
      maxVUs: 120,
      exec: 'api',
    },
    spike: {
      executor: 'per-vu-iterations',
      vus: 100,
      iterations: 1,
      gracefulStop: '30s',
      exec: 'login',
    },
  },
  thresholds: {
    'http_req_duration{type:api}': ['p(95)<500'],
    'checks{scenario:spike}': ['rate>0.99'],  // k6 auto-tags metrics with the scenario name
  },
};

export function browse () {
  http.get('https://example.com/');
}

export function api () {
  // tag requests so the {type:api} threshold above can match them
  const res = http.get('https://api.example.com/v1/products', { tags: { type: 'api' } });
  check(res, { '200': r => r.status === 200 });
}

export function login () {
  const payload = JSON.stringify({ user: 'test', pass: 'secret' });
  const headers = { 'Content-Type': 'application/json' };
  // no manual scenario tag needed: k6 tags these metrics scenario=spike automatically
  const res = http.post('https://api.example.com/login', payload, { headers });
  check(res, { 'login ok': r => r.status === 200 });
}

Understanding the Three Scenarios

This test simulates three real-world traffic patterns running simultaneously:

| Scenario | What It Does | Why It Matters |
| --- | --- | --- |
| ramp_up | 0→50→50→0 users over 3.5 minutes | Tests warm-up, cache loading, connection pools |
| sustained_api | 100 requests/second for 3 minutes | Validates steady-state capacity and SLA compliance |
| spike | 100 users log in simultaneously | Tests auth system under sudden traffic burst |

k6 ramping-vus Executor

The ramping-vus executor controls the number of virtual users over time. Use it to simulate gradual user growth.

{
  executor: 'ramping-vus',
  startVUs: 0,
  stages: [
    { duration: '1m', target: 50 },   // Gradual increase to 50 users
    { duration: '2m', target: 50 },   // Hold steady at 50 users
    { duration: '30s', target: 0 },   // Ramp down to 0
  ],
}


| Setting | Description |
| --- | --- |
| startVUs | Initial number of virtual users |
| stages | Array of ramp stages with duration and target VU count |
| gracefulRampDown | Time to wait for running iterations to finish when ramping down |


When to use: Morning traffic simulation, cache warm-up testing, gradual load increase.

Behavior: If your API slows down, users wait longer and RPS drops naturally—just like real users.
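
gracefulRampDown from the table above isn't shown in the snippet; here is where it would sit (15s is an arbitrary choice; the default is 30s):

{
  executor: 'ramping-vus',
  startVUs: 0,
  stages: [
    { duration: '1m', target: 50 },
    { duration: '30s', target: 0 },
  ],
  gracefulRampDown: '15s',   // let interrupted iterations finish for up to 15s
}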


k6 constant-arrival-rate Executor

The constant-arrival-rate executor maintains a fixed requests-per-second (RPS) regardless of response time.

{
  executor: 'constant-arrival-rate',
  rate: 100,              // 100 requests per second
  timeUnit: '1s',         // Rate is per second
  duration: '3m',         // Run for 3 minutes
  preAllocatedVUs: 60,    // Start with 60 VUs
  maxVUs: 120,            // Scale up to 120 if needed
}


| Setting | Description |
| --- | --- |
| rate | Number of iterations to start per timeUnit |
| timeUnit | Period over which the rate applies (default: '1s') |
| preAllocatedVUs | Initial VU pool size |
| maxVUs | Maximum VUs k6 can scale to |


When to use: SLA validation, capacity planning, “can we handle X RPS?” testing.

Behavior: If your API slows down, k6 spins up more VUs to maintain the target RPS. This reveals latency degradation under load.
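
A minimal runnable version of this pattern (the endpoint is a placeholder):

import http from 'k6/http';
import { check } from 'k6';

export const options = {
  scenarios: {
    steady_load: {
      executor: 'constant-arrival-rate',
      rate: 100,            // 100 iterations per second, regardless of latency
      timeUnit: '1s',
      duration: '3m',
      preAllocatedVUs: 60,
      maxVUs: 120,
    },
  },
  thresholds: {
    http_req_duration: ['p(95)<500'],  // latency SLO at fixed throughput
  },
};

export default function () {
  const res = http.get('https://api.example.com/v1/products');
  check(res, { 'status 200': r => r.status === 200 });
}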


k6 ramping-arrival-rate Executor

The ramping-arrival-rate executor gradually increases or decreases RPS through defined stages—combining the best of ramping patterns with fixed throughput guarantees.

{
  executor: 'ramping-arrival-rate',
  startRate: 10,          // Start at 10 RPS
  timeUnit: '1s',
  preAllocatedVUs: 50,
  maxVUs: 200,
  stages: [
    { duration: '2m', target: 50 },   // Ramp to 50 RPS
    { duration: '3m', target: 100 },  // Ramp to 100 RPS
    { duration: '1m', target: 100 },  // Hold at 100 RPS
    { duration: '1m', target: 0 },    // Ramp down
  ],
}


| Setting | Description |
| --- | --- |
| startRate | Initial iterations per timeUnit |
| stages | Array of stages with duration and target rate |
| preAllocatedVUs | Initial VU pool (roughly target RPS ÷ iterations per VU per second) |
| maxVUs | Maximum VUs for scaling (a 2–3× multiple of preAllocatedVUs is a sane default) |


When to use: Black Friday traffic simulation, realistic ramp-up to peak load, gradual stress testing.

Behavior: k6 dynamically adjusts VUs to maintain the target RPS at each stage. Perfect for finding the breaking point.
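
A quick sizing rule of thumb, with illustrative numbers:

// target peak: 100 iterations/s; observed iteration time ≈ 0.5s
// iterations per VU per second = 1 / 0.5 = 2
// preAllocatedVUs ≈ 100 / 2 = 50
// maxVUs ≈ 2–3 × 50 = 100–150 (headroom for latency spikes)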

Complete Example:

import http from 'k6/http';
import { check } from 'k6';

export const options = {
  scenarios: {
    ramp_to_peak: {
      executor: 'ramping-arrival-rate',
      startRate: 10,
      timeUnit: '1s',
      preAllocatedVUs: 50,
      maxVUs: 300,
      stages: [
        { duration: '1m', target: 50 },    // Warm up to 50 RPS
        { duration: '2m', target: 100 },   // Scale to 100 RPS
        { duration: '3m', target: 200 },   // Push to 200 RPS
        { duration: '2m', target: 200 },   // Hold peak
        { duration: '1m', target: 0 },     // Ramp down
      ],
    },
  },
  thresholds: {
    http_req_duration: ['p(95)<500'],
    http_req_failed: ['rate<0.01'],
  },
};

export default function () {
  const res = http.get('https://api.example.com/products');
  check(res, { 'status 200': (r) => r.status === 200 });
}

k6 per-vu-iterations Executor (Spike Testing)

The per-vu-iterations executor runs each VU for a fixed number of iterations—perfect for instant traffic bursts.

{
  executor: 'per-vu-iterations',
  vus: 100,         // 100 users
  iterations: 1,    // Each runs once = instant burst
  gracefulStop: '30s',
}

When to use: Login storms, push notification spikes, flash sale simulations.
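
A minimal runnable spike test might look like this (endpoint and credentials are placeholders):

import http from 'k6/http';
import { check } from 'k6';

export const options = {
  scenarios: {
    login_storm: {
      executor: 'per-vu-iterations',
      vus: 100,
      iterations: 1,        // every VU fires exactly once: an instant burst
      gracefulStop: '30s',  // let in-flight logins finish
    },
  },
  thresholds: {
    http_req_failed: ['rate<0.05'],  // allow a little slack under burst
  },
};

export default function () {
  const res = http.post(
    'https://api.example.com/login',
    JSON.stringify({ user: 'test', pass: 'secret' }),
    { headers: { 'Content-Type': 'application/json' } }
  );
  check(res, { 'login ok': r => r.status === 200 });
}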


ramping-vus vs ramping-arrival-rate: Which Should You Use?

| Executor | Controls | When Latency Increases… | Best For |
| --- | --- | --- | --- |
| ramping-vus | Number of concurrent users | RPS decreases naturally | Simulating realistic user behavior |
| constant-arrival-rate | Fixed RPS | k6 adds more VUs | SLA validation at a specific throughput |
| ramping-arrival-rate | Gradually changing RPS | k6 adjusts VUs dynamically | Realistic ramp to peak traffic |

Quick Decision Guide:

  • “How does my app handle growing users?” → Use ramping-vus
  • “Can we sustain exactly 500 RPS?” → Use constant-arrival-rate
  • “Simulate Black Friday traffic ramp” → Use ramping-arrival-rate
  • “What happens in a login storm?” → Use per-vu-iterations

Pro Tip: Start with one scenario, validate it works, then add more. Don’t try to build the perfect test on day one.


Data-Driven & Parameterized Tests

data.js:

import http from 'k6/http';
import { check } from 'k6';
import { SharedArray } from 'k6/data';

const users = new SharedArray('users', () => JSON.parse(open('./users.json')));

export const options = { vus: 20, duration: '45s' };

export default function () {
  // mix VU id and iteration so parallel VUs don't all hit the same record
  const user = users[(__VU + __ITER) % users.length];
  const res = http.get(`https://api.example.com/users/${user.id}`);
  check(res, { 'user fetched': r => r.status === 200 });
}

users.json example:

[
  { "id": 101 },
  { "id": 102 },
  { "id": 103 }
]
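
If each iteration must use a distinct record (to avoid duplicate-login errors or cache-hit distortion), k6's execution API provides a test-wide iteration counter. A sketch, assuming a reasonably recent k6 (v0.35+):

import http from 'k6/http';
import exec from 'k6/execution';
import { SharedArray } from 'k6/data';

const users = new SharedArray('users', () => JSON.parse(open('./users.json')));

export default function () {
  // iterationInTest is unique across all VUs, so no two
  // concurrent iterations pick the same index until the list wraps
  const user = users[exec.scenario.iterationInTest % users.length];
  http.get(`https://api.example.com/users/${user.id}`);
}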

Use environment variables for secrets:

API_BASE=https://api.example.com \
TOKEN=$(op read op://secrets/api_token) \
k6 run data.js

In script:

const BASE = __ENV.API_BASE;
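
Combining the two, a minimal sketch (API_BASE and TOKEN as exported above; the Bearer scheme is an assumption about your API):

import http from 'k6/http';

const BASE = __ENV.API_BASE;
const TOKEN = __ENV.TOKEN;  // injected at runtime; never commit secrets

export default function () {
  http.get(`${BASE}/users/101`, {
    headers: { Authorization: `Bearer ${TOKEN}` },
  });
}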

Checks vs Thresholds vs Assertions

| Concept | Scope | Purpose |
| --- | --- | --- |
| check() | Per response | Functional validation; contributes to the checks metric |
| Threshold | Aggregated metric | Enforce SLOs; fails the run on breach |
| Custom logic (throw / abort) | Immediate | throw fails the current iteration; exec.test.abort() hard-stops the run |
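
For the hard-stop case, recent k6 versions (v0.34+) expose exec.test.abort(). A sketch:

import http from 'k6/http';
import exec from 'k6/execution';

export default function () {
  const res = http.get('https://api.example.com/health');
  if (res.status >= 500) {
    // stops the whole run immediately and exits non-zero
    exec.test.abort('critical: backend returning 5xx');
  }
}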

Custom & Trend Metrics

import http from 'k6/http';
import { sleep } from 'k6';
import { Trend, Counter } from 'k6/metrics';

const queueDelay = new Trend('queue_delay_ms');
const authFailures = new Counter('auth_failures');

export default function () {
  const start = Date.now();
  // simulate internal queue wait
  sleep(Math.random() * 0.05);
  queueDelay.add(Date.now() - start);

  const res = http.get('https://api.example.com/auth/ping');
  if (res.status !== 200) authFailures.add(1);
}

Visualize custom metrics in Grafana (Prometheus / Influx pipeline described next).


Observability: Streaming to Grafana

Option A: Prometheus Remote Write

Run k6 with output:

k6 run --out experimental-prometheus-rw --tag test=baseline scenarios.js \
  --address 0.0.0.0:6565 \
  -e API_BASE=https://api.example.com

Point to your Prometheus remote-write endpoint via env vars (K6_PROMETHEUS_RW_SERVER_URL).
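
For example, against a local Prometheus started with --web.enable-remote-write-receiver (the URL is illustrative):

export K6_PROMETHEUS_RW_SERVER_URL=http://localhost:9090/api/v1/write
k6 run --out experimental-prometheus-rw scenarios.js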

Option B: InfluxDB + Grafana

# Note: k6's built-in influxdb output speaks the InfluxDB v1 protocol.
# For InfluxDB v2 (as below), build k6 with the xk6-output-influxdb extension.
docker run -d --name influx -p 8086:8086 influxdb:2
export K6_INFLUXDB_ORGANIZATION=myorg
export K6_INFLUXDB_BUCKET=perf
export K6_INFLUXDB_TOKEN=secret
k6 run --out xk6-influxdb=http://localhost:8086 scenarios.js

Import a k6 Grafana dashboard (ID: 2587 or community variants) and correlate with application metrics.

Correlation Workflow

  1. Run k6 scenario tagged with scenario, type.
  2. Use Grafana to filter: http_req_duration{scenario="sustained_api"}
  3. Overlay with service latency & DB CPU.
  4. Pin p95 regression panels.

GitHub Actions CI Integration

.github/workflows/perf.yml:

name: performance
on:
  pull_request:
    paths: ['api/**']
  workflow_dispatch: {}

jobs:
  k6:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Install k6
        uses: grafana/setup-k6-action@v1
      - name: Run smoke performance test
        run: k6 run scripts/smoke.js
      - name: Run gated scenario test
        run: |
          k6 run scenarios.js || { echo "Performance regression detected"; exit 1; }
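
Optionally, export the end-of-test summary and keep it as a build artifact for trend analysis. A variant of the gated step using k6's --summary-export flag:

      - name: Run gated scenario test
        run: |
          k6 run --summary-export=summary.json scenarios.js || { echo "Performance regression detected"; exit 1; }
      - name: Upload k6 summary
        if: always()
        uses: actions/upload-artifact@v4
        with:
          name: k6-summary
          path: summary.json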

Failing thresholds block the merge before it lands: shift-left testing in practice.


Scaling Beyond One Machine

| Need | Approach |
| --- | --- |
| Higher concurrency | k6 Cloud (managed scaling) |
| Kubernetes-native | k6 Operator (CRDs define tests) |
| Protocol diversity | xk6 extensions (Kafka, Redis, Browser) |
| Browser + API mix | k6 Browser module (drives Chromium) |

k6 Operator Example (CRD excerpt):

apiVersion: k6.io/v1alpha1
kind: K6
metadata:
  name: api-load
spec:
  parallelism: 4
  script:
    configMap:
      name: k6-script
      file: scenarios.js
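
To run it, the script is mounted from a ConfigMap; assuming the manifest above is saved as api-load.yaml:

kubectl create configmap k6-script --from-file=scenarios.js
kubectl apply -f api-load.yaml
kubectl get k6 api-load   # inspect test lifecycle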

Best Practices Checklist

  • Start with small smoke tests on every PR.
  • Define SLO-aligned thresholds (p95, error rate) early.
  • Model realistic traffic mix (browsing vs API vs auth spike).
  • Keep scripts versioned alongside code; treat as test artifacts.
  • Tag everything: scenario, endpoint, version, commit SHA.
  • Stream to Grafana; correlate with infra + APM traces.
  • Use arrival-rate executors for RPS targets (not just VUs).
  • Add custom business metrics (e.g., orders_per_second).
  • Run capacity tests before major launches.
  • Automate regression gates in CI.
  • Periodically refresh test data to avoid caching distortion.


Final Thought: Performance isn’t a phase, it’s a habit. k6 lets you encode that habit as code, observable data, and enforceable standards.

If you found this helpful, share it or adapt the snippets to your stack. Have a twist on these patterns? Drop a comment.