Prometheus for slow stats

An article, posted about 6 years ago filed in development, engineering, cluster, management, devops, rails, ruby on rails, ruby, logging & monitoring.

Prometheus is a metrics collection tool that originated at SoundCloud. Designed for high-performance environments, it is built to be blazingly fast. Hence, the client is typically expected to be blazingly fast as well, gathering and presenting data within nanoseconds. For Ruby on Rails applications, however, this has led to an unresolved issue with the Prometheus ruby-client when the same application is forked (typical for Puma, Passenger and other popular Ruby servers). The Prometheus client collects data within its own fork before serving it on the exporter endpoint, so each scrape only sees the numbers of whichever fork happens to answer. This may or may not be a problem. When you are measuring response times, running averages from a random fork may be good enough. However, when you are also counting data over time, you end up with separate counters in every fork. The solution?

Either use something like prometheus_exporter by Sam Saffron (from Discourse) or the mmap fork of the Prometheus ruby-client made by GitLab, or wait for the GoCardless people to finish their PR that adds, among other things, a file-based backend as a data store.

Or … if you have ‘slow’ metrics, like counts of the number of registrations, where nanoseconds don’t matter, you could also just make them available through a custom controller that outputs Prometheus-compatible data. Prometheus simply expects a text response. This is the route I went with. Prometheus expects little more than something like the example below, which can be served as plain text on an arbitrary endpoint (the endpoint has to be registered with the Prometheus scraper):

# TYPE var_name counter
# HELP var_name Human description of the var_name 
var_name{country_code=""} 142
var_name{country_code="nl"} 121
# TYPE other_var_name counter
# HELP other_var_name Human description of the other_var_name 
other_var_name 2122
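To give an idea of how little is needed to produce output like the above, here is a minimal sketch in plain Ruby. It is not the helper code mentioned in this article. The `SlowMetrics` module, the `Metric` struct and the `render` and `escape_label` methods are all hypothetical names made up for this illustration, and the sketch assumes the values have already been computed elsewhere (for example with a database count).

```ruby
# Minimal sketch of a renderer for the Prometheus text format.
# SlowMetrics, Metric, render and escape_label are hypothetical names,
# invented for this example only.
module SlowMetrics
  # samples is an array of [labels_hash, value] pairs.
  Metric = Struct.new(:name, :type, :help, :samples)

  # Escape backslashes, double quotes and newlines in a label value.
  # Block form of gsub is used so the replacement is taken literally.
  def self.escape_label(value)
    value.to_s
         .gsub("\\") { "\\\\" }
         .gsub('"') { '\\"' }
         .gsub("\n") { "\\n" }
  end

  # Turn a list of metrics into one plain-text response body.
  def self.render(metrics)
    lines = metrics.flat_map do |metric|
      header = [
        "# TYPE #{metric.name} #{metric.type}",
        "# HELP #{metric.name} #{metric.help}"
      ]
      samples = metric.samples.map do |labels, value|
        if labels.empty?
          "#{metric.name} #{value}"
        else
          pairs = labels.map { |key, val| %(#{key}="#{escape_label(val)}") }
          "#{metric.name}{#{pairs.join(',')}} #{value}"
        end
      end
      header + samples
    end
    lines.join("\n") + "\n"
  end
end

# Example data mirroring the sample output above.
metrics = [
  SlowMetrics::Metric.new(
    "var_name", "counter", "Human description of the var_name",
    [[{ country_code: "" }, 142], [{ country_code: "nl" }, 121]]
  ),
  SlowMetrics::Metric.new(
    "other_var_name", "counter", "Human description of the other_var_name",
    [[{}, 2122]]
  )
]

puts SlowMetrics.render(metrics)
```

In a Rails controller, the resulting string could be returned with something like `render plain: SlowMetrics.render(metrics)`. Prometheus documents `text/plain; version=0.0.4` as the content type for this format, so setting that explicitly is probably a good idea.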

I built a few helper classes, which I sadly cannot release due to a copyright clause in the contract, but most of the time went into figuring out whether and how this strategy could actually work. Once the plan was ready, the actual coding took just a couple of hours.

It may not be what Prometheus was intended for, and it may not scale infinitely, but when you’re already using Prometheus and you’d like to gather a few more stats where timing is not an issue, this might be a worthwhile and, most importantly, simple route to consider.
