Health checks with ASP.NET Core and Kubernetes
Health checks are a fundamental part of our APIs. I guess they fall in that category of “non-functional-but-heavily-required” things. More or less like a good part of the infrastructure code.
They don’t add business value per se, but they have an enormous impact on IT people. More or less like DDD and Design Patterns. You can normally see them in conjunction with container orchestration or monitoring tools to ensure that the system is alive and kicking.
There are mainly two categories of health checks: readiness and liveness.
Readiness health checks perform an in-depth check of all the application dependencies, such as databases, external services and so on and so forth. While they fail, the system is booting: alive, but not yet ready to serve incoming requests.
Liveness health checks are instead used to signal that the application is up and running. They should execute fairly quickly and serve as an immediate probe to ensure everything is fine.
The idea is to run the readiness checks first. Once they pass, rely mostly on the (cheaper) liveness ones, which run more frequently.
A successful health check should return a 200 HTTP status and a basic report, especially for the readiness ones.
Setting up checks in an ASP.NET Core project is fairly easy: just add a call to services.AddHealthChecks() in the ConfigureServices() method of our Startup.cs.
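A minimal sketch of that registration, assuming the classic Startup-based bootstrapping, could look like this:

public void ConfigureServices(IServiceCollection services)
{
    services.AddControllers();

    // registers the health check services on the DI Container
    services.AddHealthChecks();
}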
On GitHub there are a few interesting repositories that add some nice extension methods. AspNetCore.Diagnostics.HealthChecks is one of the most famous, exposing checks for a wide range of systems like SQL Server, MySQL, Oracle, Kafka, Redis, and many others.
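For example, a sketch of a SQL Server check registration using the AspNetCore.HealthChecks.SqlServer package might look like this (the connection string name and the “ready” tag are just assumptions, the tag will come in handy later):

services.AddHealthChecks()
    // "db" is the name that will show up in the report,
    // "ready" is a tag we can filter on later
    .AddSqlServer(
        Configuration.GetConnectionString("Default"),
        name: "db",
        tags: new[] { "ready" });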
Once you’ve registered the checks on the DI Container, the next step is to expose the endpoint:
public void Configure(IApplicationBuilder app)
{
    // UseRouting() is required before UseEndpoints()
    app.UseRouting();

    app.UseEndpoints(endpoints =>
    {
        endpoints.MapHealthChecks("/ops/health");
    });
}
This is the simplest example possible; however, the MapHealthChecks() method also gives us the possibility to customize the output by specifying a ResponseWriter:
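A possible writer, just a sketch that roughly matches the output shown below (it relies on System.Text.Json and System.Linq), could be something like this:

var healthCheckOptions = new HealthCheckOptions()
{
    ResponseWriter = async (context, report) =>
    {
        context.Response.ContentType = "application/json";

        // project the HealthReport into a small anonymous object and serialize it
        var json = JsonSerializer.Serialize(new
        {
            status = report.Status.ToString(),
            results = report.Entries.ToDictionary(
                e => e.Key,
                e => new
                {
                    status = e.Value.Status.ToString(),
                    description = e.Value.Description,
                    data = e.Value.Data
                })
        });

        await context.Response.WriteAsync(json);
    }
};

endpoints.MapHealthChecks("/ops/health", healthCheckOptions);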
Based on the checks you’ve added, this should return something like this:
{
  "status": "Healthy",
  "results": {
    "db": {
      "status": "Healthy",
      "description": null,
      "data": {}
    }
  }
}
Now, I mentioned “container orchestration” at the beginning of this article. Nowadays this tends to rhyme with Kubernetes, which has its own configuration for health checks. In your configuration.yml file you can specify both liveness and readiness probes:
readinessProbe:
  httpGet:
    path: /health/readiness
    port: 80
  initialDelaySeconds: 10
  timeoutSeconds: 30
  periodSeconds: 60
  successThreshold: 1
  failureThreshold: 5
livenessProbe:
  httpGet:
    path: /health/liveness
    port: 80
  initialDelaySeconds: 10
  timeoutSeconds: 5
  periodSeconds: 15
  successThreshold: 1
  failureThreshold: 3
A few things to note here. First of all, the endpoints are different. As we discussed previously, we can (and should) split our checks so that the liveness ones run as quickly as possible.
This can be accomplished, for example, by simply skipping all the checks and returning a 200 right away:
endpoints.MapHealthChecks("/health/readiness", healthCheckOptions);
endpoints.MapHealthChecks("/health/liveness", new HealthCheckOptions(){
Predicate = (_) => false
});
That Predicate allows filtering the checks based on various conditions like name or tags. Yes, tags are a thing and can be specified when registering the checks. More details here.
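For instance, if the checks were registered with a “ready” tag (as in the earlier sketch), the readiness endpoint could run only those:

endpoints.MapHealthChecks("/health/readiness", new HealthCheckOptions()
{
    // run only the checks tagged as "ready", e.g. the db one
    Predicate = check => check.Tags.Contains("ready")
});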
Going back to our k8s config, another thing worth mentioning is the different settings used for the two probes. For example, timeoutSeconds is higher when probing for readiness, as we are making sure that all our dependencies are alive. Same thing for periodSeconds: it is higher for readiness because we want the liveness checks to be executed more often.
Moreover, don’t forget that if the failureThreshold is exceeded for liveness, the Pod will be killed and restarted. Failing readiness will instead cause the Pod to be marked as Unhealthy and stop receiving traffic.