Of all the new tools spawned by the DevOps movement, I find Etsy’s open-source tool, statsd, the most interesting. The enterprise software market is being shaken to its foundation, and statsd is one of the tools providing the vibrations. Instead of relying on the more generic metrics provided by application performance management (APM) vendors, Etsy and others like them are delivering highly specific and highly relevant metrics directly from their code with statsd. With just a few lines of code, developers can measure any part of their application they choose, in the way they choose. This is very similar to the freedom that developers gain with a proper log analysis tool: they can dump any data they want to a log and analyze it later. Freed from the issue of storage and from the mechanics of log analysis, they can focus on using the data to enhance performance management, troubleshooting, business intelligence, and so on.
For current users of statsd, the question might be: why would I want to put this in Sumo Logic, as opposed to using a tool like Graphite for dashboards? First of all, Sumo Logic provides analytics that supplement basic statsd metrics very well. For example, if you are watching your error count skyrocket and your user performance plummet, your next step will usually be to look for specific application errors and do root cause analysis, which is a perfect use case for Sumo Logic. Secondly, there is a lot of value in having both the statsd and Sumo Logic metrics in a “single pane of glass”, where performance metrics can be viewed alongside more complex analytics. Finally, for current users of Sumo Logic, statsd is a simple way to push application performance data straight into Sumo Logic, without filling up log files or worrying about data volumes.
Background for Statsd
First a little background on StatsD. The basis for the project started at Flickr, and was expanded at Etsy. This is appropriate since John Allspaw and his team helped kick-start the DevOps movement at Flickr, before coming over to Etsy. From the technical perspective statsd is, in their own words:
A network daemon that runs on the Node.js platform and listens for statistics, like counters and timers, sent over UDP and sends aggregates to one or more pluggable backend services.
So, statsd client modules forward clear-text metrics over UDP. StatsD supports a few different types of metrics, as well as analytics, but for the sake of simplicity, I will only cover two areas here: counting and timing. The counting metric sends the metric name, the amount to increment or decrement, and optionally the sample rate:
counter.sample:1|c
The timing metric looks very similar, with a metric name and value:
timing.sample:320|ms
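To make the wire format concrete, here is a minimal sketch in Python (not taken from the statsd codebase or from any client module) that builds the two message types shown above and sends each one as a single UDP datagram. The helper names `format_counter`, `format_timing`, and `send_metric` are hypothetical, and the host and port are assumptions: 8125 is the port a statsd daemon conventionally listens on.

```python
import socket


def format_counter(name, delta=1, sample_rate=None):
    """Build a counter message such as 'counter.sample:1|c'."""
    message = f"{name}:{delta}|c"
    if sample_rate is not None:
        # The optional sample rate is appended as '|@<rate>'.
        message += f"|@{sample_rate}"
    return message


def format_timing(name, milliseconds):
    """Build a timing message such as 'timing.sample:320|ms'."""
    # Truncate to whole milliseconds so the value is a plain integer.
    return f"{name}:{int(milliseconds)}|ms"


def send_metric(message, host="127.0.0.1", port=8125):
    """Send one metric as clear text over UDP (fire and forget)."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(message.encode("ascii"), (host, port))


if __name__ == "__main__":
    send_metric(format_counter("counter.sample"))
    send_metric(format_timing("timing.sample", 320))
```

Because UDP is connectionless, the sender never waits for a reply, which is why instrumenting code this way adds so little overhead to the application.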
Generating the Metrics
To generate the data, I created a simple Perl script using the statsd Perl module Net::Statsd. I then created a Syslog Source on a Linux Collector, listening on the standard syslog port, 514. The Sumo Logic Syslog Source, essentially a listener for text over UDP, can receive the statsd messages just fine. One caveat, though: since the statsd messages do not include a timestamp, Sumo Logic will assign the ingest time as the timestamp. This means that it is essential to set the timezone setting correctly. I tested this with thousands of events, and there were no issues. To produce some interesting and relevant metrics, I added extra logic to my Perl script to create some patterns with the rand() function and some math:
use strict;
use warnings;
use Net::Statsd;

# Configure where to send events.
# Here that is the Sumo Logic Syslog Source rather than a statsd daemon.
$Net::Statsd::HOST = 'localhost'; # Default
$Net::Statsd::PORT = 514;         # Syslog Source port (the module's default is 8125)

# Initial values
my $basepercent = 0.50;
my $webTime     = 50;
my $appTime     = 100;
my $dbTime      = 150;
my $basecount   = 5;

# Infinite loop
while (1) {
    # Drift a load factor: average the previous value with a random one between 0.5 and 1.5
    $basepercent = ($basepercent + (rand(100) + 50)/100)/2;

    # Scale each tier's response time by the load factor
    $webTime = $basepercent*($webTime + 50 + rand(750))/2;
    $appTime = $basepercent*($appTime + 100 + rand(1000))/2;
    $dbTime  = $basepercent*($dbTime + 150 + rand(1200))/2;
    Net::Statsd::timing('web.time', $webTime);
    Net::Statsd::timing('app.time', $appTime);
    Net::Statsd::timing('db.time', $dbTime);

    # Send a varying number of login increments
    my $k = 0;
    $basecount = $basepercent*($basecount + rand(5))/2;
    while ($k < $basecount) {
        Net::Statsd::increment('site.logins');
        $k++;
    }

    sleep(5 + rand(10));
}
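Before pointing a script like this at a collector, it can help to see exactly what arrives on the wire. The following is a minimal Python sketch of a stand-in UDP listener for local debugging; it is not how the Sumo Logic Syslog Source is implemented. The function names `open_listener` and `receive_metrics` are hypothetical, and the demo at the bottom sends itself two sample messages over the loopback interface on a port chosen by the operating system.

```python
import socket


def open_listener(host="127.0.0.1", port=0):
    """Bind a UDP socket; port 0 lets the OS pick a free port."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind((host, port))
    return sock


def receive_metrics(sock, count, timeout=5.0):
    """Read up to `count` datagrams and return them as decoded strings."""
    sock.settimeout(timeout)
    messages = []
    for _ in range(count):
        try:
            data, _addr = sock.recvfrom(1024)
        except socket.timeout:
            break  # nothing more arrived within the timeout
        messages.append(data.decode("ascii", errors="replace").strip())
    return messages


if __name__ == "__main__":
    listener = open_listener()
    port = listener.getsockname()[1]

    # Send two sample statsd messages to ourselves over loopback.
    sender = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sender.sendto(b"web.time:320|ms", ("127.0.0.1", port))
    sender.sendto(b"site.logins:1|c", ("127.0.0.1", port))
    sender.close()

    for line in receive_metrics(listener, 2):
        print(line)
    listener.close()
```

To inspect the real script's output, one could bind the listener to a fixed unprivileged port and set `$Net::Statsd::PORT` to the same value for the duration of the test.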
Making sense of the Metrics
Once the metrics were successfully being ingested into Sumo Logic, I needed to create some useful searches and Dashboard Monitors. With the statsd counter function, I simply wanted to extract the data, drop it into 1m buckets, and sum up the number of increments to the counter over each minute. The key-value structure of a statsd message can be easily parsed with our keyvalue operator. Basically, I just told Sumo Logic to look for a lowercase key name with “.” in it ([a-z\.]+) and a numerical value (\d+). I only searched for “site.logins”, but you could use the same statement to look for any number of different counters in the same dashboard.
_sourceCategory=*statsd*
| keyvalue regex "([a-z\.]+?):(\d+?)\|c" "site.logins" as logins
| timeslice by 1m
| sum(logins) by _timeslice
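For readers who want to see the logic of that search spelled out, here is a rough Python sketch of the same steps: extract the key and value with a regular expression, bucket events by minute, and sum the increments per bucket. This only emulates the idea and is not how Sumo Logic evaluates the query. The function name `sum_counter_per_minute` and the `(epoch_seconds, raw_message)` input shape are assumptions made for illustration; the timestamp stands in for the ingest time that Sumo Logic assigns.

```python
import re
from collections import defaultdict

# Same shape as the search: lowercase dotted key, integer value, '|c' suffix.
COUNTER_RE = re.compile(r"([a-z.]+):(\d+)\|c")


def sum_counter_per_minute(events, key="site.logins"):
    """Sum counter increments per one-minute bucket.

    events: iterable of (epoch_seconds, raw_message) pairs.
    Returns {minute_start_epoch: total_increment}.
    """
    buckets = defaultdict(int)
    for timestamp, message in events:
        match = COUNTER_RE.search(message)
        if match and match.group(1) == key:
            # Round the timestamp down to the start of its minute.
            minute_start = int(timestamp) - int(timestamp) % 60
            buckets[minute_start] += int(match.group(2))
    return dict(sorted(buckets.items()))


if __name__ == "__main__":
    sample = [
        (0, "site.logins:1|c"),
        (30, "site.logins:1|c"),
        (61, "site.logins:1|c"),
        (62, "web.time:320|ms"),  # a timing message, ignored by this search
    ]
    print(sum_counter_per_minute(sample))
```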
With the timing metrics, an average over each minute seems most relevant (though other functions like max, min, or standard deviation could be useful here). I pulled out all three timings together by looking for a key that looks like *.time, using the pattern (?<tier>[a-z]+).time. Since I named my metrics web.time, app.time, and db.time, I was able to put each of the “tier” metrics on the same graph.
_sourceCategory=*statsd* AND time
| parse regex "(?<tier>[a-z]+).time:(?<test_time>\d+)\|ms"
| timeslice by 1m
| avg(test_time) by _timeslice, tier
| transpose row _timeslice column tier
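The same idea can be sketched in Python: capture the tier and the time with named groups, average per minute and per tier, and return one row per timeslice with one column per tier, which is roughly what the transpose step produces. This is an illustration only, not Sumo Logic's implementation. The function name `average_time_per_minute` and the `(epoch_seconds, raw_message)` input shape are assumptions, and the dot before `time` is escaped here so that it matches only a literal period.

```python
import re
from collections import defaultdict

# Named groups mirror the fields extracted in the search.
TIMING_RE = re.compile(r"(?P<tier>[a-z]+)\.time:(?P<test_time>\d+)\|ms")


def average_time_per_minute(events):
    """Average timings per one-minute bucket and tier.

    events: iterable of (epoch_seconds, raw_message) pairs.
    Returns {minute_start_epoch: {tier: average_ms}}.
    """
    samples = defaultdict(lambda: defaultdict(list))
    for timestamp, message in events:
        match = TIMING_RE.search(message)
        if match:
            # Round the timestamp down to the start of its minute.
            minute_start = int(timestamp) - int(timestamp) % 60
            samples[minute_start][match.group("tier")].append(
                int(match.group("test_time"))
            )
    return {
        minute: {
            tier: sum(values) / len(values)
            for tier, values in sorted(tiers.items())
        }
        for minute, tiers in sorted(samples.items())
    }


if __name__ == "__main__":
    sample = [
        (0, "web.time:100|ms"),
        (10, "web.time:300|ms"),
        (20, "db.time:150|ms"),
        (65, "app.time:200|ms"),
    ]
    for minute, row in average_time_per_minute(sample).items():
        print(minute, row)
```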
As I ran each of these searches, I clicked the “Add to Dashboard” button on the far right to add them to a newly created StatsD dashboard. I included a screenshot below (the tier metrics are on the left, and the counter is on the right):
Wrapping Up
You can see from this example how easy it is to analyze data in the statsd format. Once the data is in Sumo Logic, the sky is the limit for what you can do with it. There are other metrics and backend functions that Sumo Logic can support over the long term, but this simple integration provides the majority of the functionality needed. Let us know what you think, and sign up for a free account to try it out yourself.