Health Check Endpoints and Ping Monitoring Explained

A health check endpoint is a URL on your application that reports whether the service is working. Load balancers hit it to decide where to route traffic. Monitoring tools hit it to decide whether to page you at 3 AM. Orchestrators like Kubernetes hit it to decide whether to restart your container.

The concept is simple: send a request to a known URL, get back a response that says "healthy" or "unhealthy." But the details matter. What should the endpoint actually check? How deep should it go? And how does this relate to basic ping monitoring?

This guide covers health check endpoints from design to implementation, then explains where ping monitoring fits in and how the two approaches complement each other.

What Is a Health Check Endpoint?

A health check endpoint is a dedicated route in your application -- typically /health, /healthz, or /status -- that returns information about the application's current state. At minimum, it returns an HTTP 200 status code when things are working and a non-200 code (usually 503) when something is wrong.

A basic health check looks like this:

// Express.js example
const express = require('express')
const app = express()

app.get('/health', (req, res) => {
  res.status(200).json({ status: 'ok' })
})

That is a shallow health check. It confirms the application process is running and can handle HTTP requests. It does not tell you whether the database is reachable, the cache is responding, or downstream APIs are available.

A deeper health check queries those dependencies:

app.get('/health', async (req, res) => {
  const checks = {}

  // Database check
  try {
    await db.query('SELECT 1')
    checks.database = 'ok'
  } catch (err) {
    checks.database = 'error'
  }

  // Redis check
  try {
    await redis.ping()
    checks.cache = 'ok'
  } catch (err) {
    checks.cache = 'error'
  }

  const healthy = Object.values(checks).every((v) => v === 'ok')

  res.status(healthy ? 200 : 503).json({
    status: healthy ? 'ok' : 'degraded',
    checks,
    timestamp: new Date().toISOString(),
  })
})

This tells you not just that the process is alive but that it can actually do useful work.

The /health and /healthz Conventions

There is no official standard for health check URLs, but two conventions dominate.

/health is the most common general-purpose path. It is readable, obvious, and works across any framework or platform.

/healthz comes from the Kubernetes ecosystem. The "z" suffix was adopted by Google's internal systems and carried over into Kubernetes conventions. If you are running on Kubernetes, you will see /healthz, /readyz, and /livez used frequently.

Some applications expose multiple endpoints:

/health or /healthz -- Liveness check. Is the process alive?
/ready or /readyz -- Readiness check. Can the service handle requests right now?
/startup -- Startup check. Has the service finished initializing?

The distinction between liveness and readiness matters in orchestrated environments. A service might be alive (the process is running) but not ready (it is still loading configuration or warming a cache). Kubernetes acts on these signals differently: a failed liveness check restarts the container, while a failed readiness check removes the pod from the Service's endpoints, so it stops receiving traffic without being killed.
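The alive-but-not-ready case can be sketched in a few lines of Python. This is only an illustration using the standard library: the `AppState` class, its `mark_ready` method, and the `probe_response` helper are hypothetical names, and a real service would wire the same logic into its web framework's routes.

```python
import threading


class AppState:
    """Tracks whether startup work (config load, cache warm-up) has finished."""

    def __init__(self):
        self._ready = threading.Event()

    def mark_ready(self):
        self._ready.set()

    def is_ready(self):
        return self._ready.is_set()


def probe_response(path, state):
    """Return (http_status, body) for a probe path."""
    if path in ("/healthz", "/livez"):
        # Liveness: being able to answer at all shows the process is running.
        return 200, {"status": "ok"}
    if path == "/readyz":
        # Readiness: report ready only once initialization is complete.
        if state.is_ready():
            return 200, {"status": "ok"}
        return 503, {"status": "not ready"}
    return 404, {"status": "unknown probe"}


state = AppState()
print(probe_response("/healthz", state)[0])  # 200: alive during warm-up
print(probe_response("/readyz", state)[0])   # 503: not ready yet
state.mark_ready()
print(probe_response("/readyz", state)[0])   # 200: ready once warm-up is done
```

Keeping the liveness path free of dependency checks means a slow database cannot get an otherwise working process restarted.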

What to Check in a Health Endpoint

The right level of depth depends on who is consuming the health check and what action they will take based on the result.

Shallow Checks (Liveness)

A shallow check confirms the application process is running and can respond to HTTP requests. It does not query any external dependencies.

Use for: Liveness probes in Kubernetes, basic "is the process alive" monitoring.

# Flask example
from flask import Flask

app = Flask(__name__)

@app.route('/healthz')
def healthz():
    return {'status': 'ok'}, 200

Dependency Checks (Readiness)

A readiness check verifies that the application can serve real traffic by testing its critical dependencies.

Common dependencies to check:

Database -- Can you execute a simple query? (SELECT 1 or equivalent)
Cache (Redis/Memcached) -- Does a PING command return PONG?
Message queue -- Can you connect and check the queue status?
File storage -- Is the storage service reachable?
Downstream APIs -- Can you reach critical third-party services?

Use for: Load balancer health checks, readiness probes, monitoring systems that need to know if the service is functional.

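// Go example
import (
    "database/sql"
    "encoding/json"
    "log"
    "net/http"

    "github.com/redis/go-redis/v9" // assumed client library; adjust to the one you use
)

// db and rdb are assumed to be initialized elsewhere at startup.
var (
    db  *sql.DB
    rdb *redis.Client
)

func healthHandler(w http.ResponseWriter, r *http.Request) {
    // Tie the checks to the request so they stop if the caller gives up.
    ctx := r.Context()
    checks := map[string]string{}

    // Check database. Log the detail server-side; driver errors can contain
    // hostnames or IP addresses that should not appear in the response.
    if err := db.PingContext(ctx); err != nil {
        log.Printf("health: database check failed: %v", err)
        checks["database"] = "error"
    } else {
        checks["database"] = "ok"
    }

    // Check Redis
    if _, err := rdb.Ping(ctx).Result(); err != nil {
        log.Printf("health: redis check failed: %v", err)
        checks["redis"] = "error"
    } else {
        checks["redis"] = "ok"
    }

    healthy := true
    for _, v := range checks {
        if v != "ok" {
            healthy = false
            break
        }
    }

    status, label := http.StatusOK, "ok"
    if !healthy {
        status, label = http.StatusServiceUnavailable, "degraded"
    }

    w.Header().Set("Content-Type", "application/json")
    w.WriteHeader(status)
    json.NewEncoder(w).Encode(map[string]interface{}{
        "status": label,
        "checks": checks,
    })
}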

What Not to Check

Not every dependency should block your health check.

Non-critical services. If your analytics provider is down, your application can still serve users. Do not let an analytics API timeout cause your health check to report unhealthy.

Slow external APIs. If a dependency check takes 5 seconds, your health check takes 5 seconds, and your load balancer might time out and mark the instance as unhealthy even though your application is fine. Set aggressive timeouts (1-2 seconds) on dependency checks.

Expensive operations. Do not run a full database query that scans millions of rows. A SELECT 1 is enough to confirm connectivity. The health check should be lightweight and fast.
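One way to apply these rules is to run every check against a shared deadline and let only critical checks decide the overall result. The sketch below assumes each check is a plain callable that raises on failure; `evaluate` and the check names are hypothetical, and the 1-second default is only an example value.

```python
from concurrent.futures import ThreadPoolExecutor, TimeoutError as FutureTimeout
import time


def evaluate(checks, timeout_s=1.0):
    """checks maps name -> (callable, is_critical).

    Returns (healthy, results). Only critical checks affect `healthy`.
    """
    executor = ThreadPoolExecutor(max_workers=max(1, len(checks)))
    # Start all checks at once so one slow dependency does not delay the rest.
    futures = {name: executor.submit(fn) for name, (fn, _) in checks.items()}
    deadline = time.monotonic() + timeout_s
    results = {}
    for name, future in futures.items():
        remaining = max(0.0, deadline - time.monotonic())
        try:
            future.result(timeout=remaining)
            results[name] = "ok"
        except FutureTimeout:
            results[name] = "timeout"
        except Exception:
            results[name] = "error"
    # Respond now; do not wait for hung checks to finish.
    executor.shutdown(wait=False)
    healthy = all(
        results[name] == "ok"
        for name, (_, critical) in checks.items()
        if critical
    )
    return healthy, results


def failing_analytics():
    raise RuntimeError("analytics provider unreachable")


healthy, results = evaluate({
    "database": (lambda: None, True),       # critical
    "analytics": (failing_analytics, False),  # non-critical
})
print(healthy, results)  # True {'database': 'ok', 'analytics': 'error'}
```

The failed analytics check still shows up in the results for debugging, but it does not flip the status code that the load balancer acts on.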

Ping Monitoring vs. Health Check Monitoring

Ping monitoring and health check monitoring are related but different.

Ping Monitoring

A ping monitor sends an ICMP ping or an HTTP request to your server at regular intervals. If the server does not respond within a timeout, it is flagged as down.

Ping monitoring answers one question: is this host reachable? It operates at the network and infrastructure level. It does not know anything about your application's internal state.

Strengths:

Simple to set up -- no application changes required
Catches total outages, network failures, and DNS problems
Works for any server, service, or device

Limitations:

Cannot detect application-level failures (a server can respond to pings while the app is crashing)
Cannot detect partial degradation
No insight into why something is failing
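To make the reachability-only model concrete, here is a minimal sketch of an HTTP-style ping in Python. The `http_ping` function and its return shape are hypothetical. It deliberately counts any HTTP response as "up", including a 503, to mirror a check that only asks whether something answered; many real HTTP monitors also inspect the status code.

```python
import http.client
import time
from urllib.parse import urlsplit


def http_ping(url, timeout_s=5.0):
    """Send one GET and report whether anything answered within the timeout."""
    parts = urlsplit(url)
    if parts.scheme == "https":
        conn = http.client.HTTPSConnection(parts.hostname, parts.port, timeout=timeout_s)
    else:
        conn = http.client.HTTPConnection(parts.hostname, parts.port, timeout=timeout_s)
    started = time.monotonic()
    try:
        conn.request("GET", parts.path or "/")
        status = conn.getresponse().status
    except (OSError, http.client.HTTPException):
        # DNS failure, refused connection, timeout, or a garbled reply.
        return {"up": False, "status": None, "latency_ms": None}
    finally:
        conn.close()
    latency_ms = round((time.monotonic() - started) * 1000)
    return {"up": True, "status": status, "latency_ms": latency_ms}
```

Calling `http_ping` on a schedule and alerting when `up` is false covers total outages, but a server that answers every request with an error page would still count as up.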

Health Check Monitoring

Health check monitoring sends HTTP requests to your application's health endpoint. It gets a structured response that indicates not just reachability but functional status.

Strengths:

Detects application-level problems (database down, cache unreachable)
Can differentiate between total failure and partial degradation
Provides diagnostic information in the response body

Limitations:

Requires you to build and maintain the health endpoint
Only as good as the checks you implement
Adds a small amount of load to your application

Using Both Together

The best monitoring setup uses both approaches.

Ping monitoring catches infrastructure-level problems: the server is unreachable, DNS is broken, the network path is down. Health check monitoring catches application-level problems: the database connection pool is exhausted, a critical dependency is unreachable, the disk is full.
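Combining the two signals narrows down where a problem lives. The following is a simplified sketch; the `diagnose` function and its wording are hypothetical, and a real alerting setup would likely carry more states.

```python
def diagnose(ping_ok, health_status):
    """Combine a ping result with a health endpoint result.

    ping_ok: True if the host answered the ping.
    health_status: HTTP status code from the health endpoint,
                   or None if it did not respond.
    """
    if not ping_ok:
        return "infrastructure: host unreachable (server, network, or DNS)"
    if health_status is None:
        return "application: host reachable but the service is not answering"
    if health_status == 200:
        return "healthy"
    return "application: service is up but reports a failing dependency"


print(diagnose(True, 200))   # healthy
print(diagnose(True, 503))   # application: service is up but reports a failing dependency
print(diagnose(False, None)) # infrastructure: host unreachable (server, network, or DNS)
```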

Together, they give you full visibility. For a broader overview of how these fit into a monitoring strategy, see our uptime monitoring guide and our article on endpoint monitoring.

Start with ping, add health checks as you grow

If you are just getting started with monitoring, a simple HTTP check against your homepage is a perfectly good first step. It catches the most common failure mode: the site is completely down. Add health check endpoints when you need deeper visibility into application-level issues.

Implementation Patterns

Response Format

There is no official standard for health check response bodies, but a common pattern looks like this:

{
  "status": "ok",
  "checks": {
    "database": { "status": "ok", "latency_ms": 3 },
    "cache": { "status": "ok", "latency_ms": 1 },
    "storage": { "status": "ok", "latency_ms": 12 }
  },
  "version": "2.4.1",
  "uptime": "72h14m"
}

Including the application version and uptime is optional but helpful for debugging. It lets you correlate health issues with specific deployments.
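A response in this shape could be assembled roughly as follows. This is a sketch under stated assumptions: each check is a callable that raises on failure, and `timed_check`, `format_uptime`, and `build_report` are hypothetical helper names.

```python
import time


def timed_check(check):
    """Run one check and record its status and latency in milliseconds."""
    started = time.perf_counter()
    try:
        check()
        status = "ok"
    except Exception:
        status = "error"
    latency_ms = round((time.perf_counter() - started) * 1000)
    return {"status": status, "latency_ms": latency_ms}


def format_uptime(seconds):
    """Format a duration in seconds as hours and minutes, e.g. '72h14m'."""
    hours, remainder = divmod(int(seconds), 3600)
    return f"{hours}h{remainder // 60}m"


def build_report(checks, version, started_at, now=None):
    """checks maps name -> callable; started_at and now are Unix timestamps."""
    now = time.time() if now is None else now
    results = {name: timed_check(check) for name, check in checks.items()}
    healthy = all(r["status"] == "ok" for r in results.values())
    return {
        "status": "ok" if healthy else "degraded",
        "checks": results,
        "version": version,
        "uptime": format_uptime(now - started_at),
    }


print(format_uptime(72 * 3600 + 14 * 60))  # 72h14m
```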

Timeouts

Set a timeout on the entire health check handler. If any dependency check hangs, you want the health endpoint to respond within a reasonable window (2-5 seconds) rather than blocking indefinitely.

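app.get('/health', async (req, res) => {
  let timer
  const timeout = new Promise((_, reject) => {
    timer = setTimeout(() => reject(new Error('Health check timeout')), 3000)
  })

  try {
    // runChecks() is assumed to resolve to an object whose status is 'ok' or 'degraded'
    const result = await Promise.race([runChecks(), timeout])
    // Derive the HTTP status from the result instead of always sending 200
    res.status(result.status === 'ok' ? 200 : 503).json(result)
  } catch (err) {
    // Reached on timeout or if runChecks() itself rejects
    res.status(503).json({ status: 'error', error: err.message })
  } finally {
    clearTimeout(timer) // do not leave the timer pending once the race settles
  }
})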

Authentication

Health check endpoints should generally not require authentication. Load balancers and monitoring tools need to hit them without credentials. If you are concerned about exposing internal state, return minimal information (just the status code) on the public endpoint and put detailed diagnostics on a separate authenticated endpoint.
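The split between a minimal public endpoint and an authenticated diagnostics endpoint might look like this. It is a sketch only: `public_view`, `detailed_view`, and the shared-token scheme are assumptions for illustration, and any authentication mechanism your platform already provides would serve the same purpose.

```python
import hmac


def public_view(report):
    """Body for the unauthenticated endpoint: overall status only."""
    return {"status": report["status"]}


def detailed_view(report, presented_token, expected_token):
    """Return (http_status, body) for the authenticated diagnostics endpoint."""
    authorized = presented_token is not None and hmac.compare_digest(
        presented_token.encode(), expected_token.encode()
    )
    if not authorized:
        return 401, {"error": "unauthorized"}
    return (200 if report["status"] == "ok" else 503), report


report = {"status": "degraded", "checks": {"database": "error", "cache": "ok"}}
print(public_view(report))                          # {'status': 'degraded'}
print(detailed_view(report, None, "s3cret")[0])     # 401
print(detailed_view(report, "s3cret", "s3cret")[0]) # 503
```

`hmac.compare_digest` is used for the token comparison so that the check takes the same time whether or not the token matches.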

Caching

Do not cache health check responses. The whole point is to reflect the current state of the system. A cached "healthy" response that is 60 seconds old could mask a failure that happened 30 seconds ago.

Health Checks in Production Environments

Kubernetes

Kubernetes supports three probe types: liveness, readiness, and startup. Each is configured per container:

# Deployment spec
containers:
  - name: myapp
    livenessProbe:
      httpGet:
        path: /healthz
        port: 8080
      initialDelaySeconds: 10
      periodSeconds: 15
    readinessProbe:
      httpGet:
        path: /readyz
        port: 8080
      periodSeconds: 5
    startupProbe:
      httpGet:
        path: /healthz
        port: 8080
      failureThreshold: 30
      periodSeconds: 10

The startup probe runs first and prevents liveness checks from killing a slow-starting container. Once the startup probe succeeds, the liveness and readiness probes take over.
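The numbers in the spec above translate into time budgets with simple arithmetic. The helper names below are hypothetical, and the results are rough upper bounds that ignore `timeoutSeconds` and scheduling jitter. The liveness figure relies on the Kubernetes default `failureThreshold` of 3, since the spec above does not set one.

```python
def startup_budget_seconds(failure_threshold, period_seconds):
    """Longest a container may take to start before it is killed."""
    return failure_threshold * period_seconds


def liveness_detection_seconds(period_seconds, failure_threshold=3):
    """Rough upper bound on time from first failure to restart.

    failureThreshold defaults to 3 when the probe spec omits it.
    """
    return failure_threshold * period_seconds


print(startup_budget_seconds(30, 10))  # 300
print(liveness_detection_seconds(15))  # 45
```

So this configuration gives a slow-starting container about five minutes to come up, and a hung container is restarted within roughly 45 seconds.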

AWS Elastic Load Balancer

AWS Elastic Load Balancing health checks request a configurable path and, by default, treat a 200 response as healthy. A typical configuration:

Health check path: /health
Healthy threshold: 3
Unhealthy threshold: 2
Timeout: 5 seconds
Interval: 30 seconds

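With these settings, an instance must pass three consecutive checks to be marked healthy and is marked unhealthy after two consecutive failures. Requiring consecutive results prevents flapping -- a single slow response does not remove an instance from the pool, and a single successful response does not put a failing instance back. At a 30-second interval, that works out to roughly 60 seconds to detect a failure and 90 seconds to restore an instance.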
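The threshold behavior can be modeled as a small state machine. This is a simplified sketch of consecutive-result thresholds in general, not AWS's actual implementation; the `TargetHealth` class is a hypothetical name.

```python
class TargetHealth:
    """Flip state only after enough consecutive results disagree with it."""

    def __init__(self, healthy_threshold=3, unhealthy_threshold=2, healthy=True):
        self.healthy_threshold = healthy_threshold
        self.unhealthy_threshold = unhealthy_threshold
        self.healthy = healthy
        self._streak = 0  # consecutive results that contradict the current state

    def record(self, passed):
        """Record one check result and return the resulting state."""
        if passed == self.healthy:
            # Result agrees with the current state: reset the streak.
            self._streak = 0
            return self.healthy
        self._streak += 1
        needed = self.unhealthy_threshold if self.healthy else self.healthy_threshold
        if self._streak >= needed:
            self.healthy = passed
            self._streak = 0
        return self.healthy


target = TargetHealth()
# One failure, a recovery, two failures in a row, then three passes.
outcomes = [target.record(p) for p in [False, True, False, False, True, True, True]]
print(outcomes)  # [True, True, True, False, False, False, True]
```

The isolated failure at the start never changes the state, while the two consecutive failures do.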

Common Mistakes

Checking too many dependencies. If your health check queries 15 external services, any one of them being slow makes your application appear unhealthy. Only check dependencies that are truly critical for serving requests.

No timeouts on checks. A database check that hangs for 30 seconds blocks your health endpoint and may cause cascading failures as the load balancer marks instances unhealthy.

Returning 200 when unhealthy. If your health check returns a 200 status code with {"status": "error"} in the body, load balancers and simple monitoring tools will think everything is fine. Always use the HTTP status code to signal health: 200 for healthy, 503 for unhealthy.

Exposing sensitive information. Do not include database connection strings, internal IP addresses, or credentials in your health check response. Stick to status indicators and latency numbers.

For more on monitoring approaches and tools, see our articles on what uptime monitoring is and server monitoring tools.
