In the good old days of monolithic services, basic load testing was relatively straightforward. You’d start your service on a production-like server, perhaps alongside a database. Then you’d point a load generation tool at it and measure how much load the service could handle. Unfortunately, things are more complicated in microservices architectures, especially in Amazon Web Services (AWS), because of issues like randomly assigned IP addresses and security groups.
For starters, if the service you’re trying to load test isn’t a leaf node in your architecture, it’ll depend on other services. Those, in turn, will depend on more services. And even once you manage to boot up all the services needed to run the test, you still have the challenge of deploying, starting and managing your load testing tool. Once you need a load testing tool that runs on more than one host, you have to work pretty hard.
Distributed load testing setups require the following parts:
- Remote Control: Manage the fleet of load generators without touching every host.
- Load Generator: Create the load on the microservice.
- Measurements: Measure the performance of the tested microservice and summarize it.
I recently faced this problem when I wanted to load test one of our microservices on short notice. Through sheer luck, I came up with a simple solution that turned out to solve all these problems.
Meet sleeper cells
It struck me that we already have all the tooling to deploy, manage and monitor load generators. We just didn’t call them that; we called them clients. In other words, every deployment of Sumo Logic already had clusters of clients of the microservice. We just didn’t have a way to get them to generate synthetic load.
Instead of hacking it and dealing with one-off throwaway builds, I decided to turn those clients into load generators. Every client would now include a sleeper agent, ready to spring into action.
Here is the base class for the sleeper cells:
package com.sumologic.util.scala.bench

import java.util.concurrent.atomic.{AtomicInteger, AtomicLong}

import com.netflix.config.scala.DynamicProperties
import com.sumologic.util.scala.env.Environment
import com.sumologic.util.scala.log.Logging
import com.sumologic.util.scala.rateLimiter.FixedRateLimiter
import com.sumologic.util.scala.time.{TimeConstants, TimeFormats, TimeSource}

import scala.util.control.NonFatal

abstract class SleeperCell(name: String, assemblyName: String)
  extends DynamicProperties
  with Logging
  with TimeSource
  with TimeConstants {

  // API to implement by subclasses.
  protected def makeRequest(): Unit
  protected def logStats(): Unit
  protected def resetStats(): Unit

  // Remote control.
  private val configUpdateCallback = new Runnable() {
    override def run(): Unit = checkForConfigurationUpdate()
  }

  private val activatedAssemblies =
    dynamicStringListProperty(s"sleeper.cell.$name.assemblies", List[String]())
  activatedAssemblies.addCallback(configUpdateCallback)

  protected val requestsPerSecond = dynamicIntProperty(s"sleeper.cell.$name.rate", Int.MaxValue)
  requestsPerSecond.addCallback(configUpdateCallback)

  protected val agentThreads = dynamicIntProperty(s"sleeper.cell.$name.agents", 64)
  agentThreads.addCallback(configUpdateCallback)

  // Stats.
  protected val lastLog = new AtomicLong(now)
  protected val requestCount = new AtomicInteger(0)
  protected val failedRequestCount = new AtomicInteger(0)

  // State.
  private var activeAgents: Seq[SleeperAgent] = Seq.empty[SleeperAgent]

  checkForConfigurationUpdate()
  prefix(s"$name sleeper cell")
  info("Initialized and awaiting instructions.")

  private def checkForConfigurationUpdate() {
    this synchronized {
      // Safety check: the cell can never be activated in production.
      val cellActivated = !Environment().isProd && activatedAssemblies.get().contains(assemblyName)
      if (cellActivated && activeAgents.isEmpty) {
        activateCell()
      } else if (!cellActivated && activeAgents.nonEmpty) {
        goToSleep()
      } else if (cellActivated && activeAgents.size != agentThreads.get()) {
        info(s"Agent count changed from ${activeAgents.size} to ${agentThreads.get()} - restarting.")
        goToSleep()
        activateCell()
      }
    }
  }

  private def activateCell() {
    info(s"We have been activated. Activating ${agentThreads.get()} agents.")
    activeAgents = (1 to agentThreads.get()).map(new SleeperAgent(_))
    activeAgents.foreach(_.start())
  }

  private def goToSleep() {
    info(s"We have been told to go back to sleep. Shutting down ${activeAgents.size} agents.")
    activeAgents.foreach(_.keepRunning = false)
    activeAgents.foreach(_.join())
    requestCount.set(0)
    failedRequestCount.set(0)
    resetStats()
  }

  private class SleeperAgent(id: Int)
    extends Thread(s"Sleeper-Agent-$name-$id")
    with TimeConstants
    with TimeFormats {

    // Written by the controlling thread and read by the agent thread,
    // so it is marked volatile to make the stop signal visible.
    @volatile var keepRunning = true

    val rateLimiter = new FixedRateLimiter(requestsPerSecond.get(), 1.second)

    override def run() {
      while (keepRunning) {
        // Wait until the rate limiter permits another request.
        while (!rateLimiter.isActionAllowed) {
          Thread.sleep(50)
        }

        try {
          rateLimiter.recordAction()
          requestCount.incrementAndGet()
          makeRequest()
        } catch {
          case NonFatal(e) => failedRequestCount.incrementAndGet()
        }

        // Double-checked so that only one agent logs per 15-second window.
        def timeToLogStats: Boolean = (now - lastLog.get()) > 15.seconds
        if (timeToLogStats) {
          lastLog synchronized {
            if (timeToLogStats) {
              logStats()
              lastLog.set(now)
            }
          }
        }
      }
    }
  }
}
Remote Control
Under normal circumstances, the sleeper agents simply watch out for a particular property in Archaius. If that property is set, the sleeper agents wake up and start attacking the target microservice with requests. For safety, the code includes a check to prevent it from being activated in production. A different configuration property controls the amount of load generated.
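To make the activation rules easier to see in isolation, here is a minimal sketch of the decision that checkForConfigurationUpdate() makes, pulled out as a pure function. The names CellAction, SleeperCellDecision and decide are hypothetical and do not appear in the class above; the Archaius properties are replaced by plain parameters so the sketch runs without any of our internal libraries.

```scala
// Hypothetical, simplified model of the activation decision.
// The real class reads these values from dynamic properties instead.
sealed trait CellAction
case object Activate extends CellAction
case object GoToSleep extends CellAction
case object Restart extends CellAction
case object NoChange extends CellAction

object SleeperCellDecision {
  def decide(
      isProd: Boolean,
      activatedAssemblies: Seq[String],
      assemblyName: String,
      activeAgents: Int,
      configuredAgents: Int): CellAction = {
    // Never activate in production, whatever the property says.
    val cellActivated = !isProd && activatedAssemblies.contains(assemblyName)

    if (cellActivated && activeAgents == 0) Activate
    else if (!cellActivated && activeAgents > 0) GoToSleep
    else if (cellActivated && activeAgents != configuredAgents) Restart
    else NoChange
  }
}
```

For example, decide(isProd = false, Seq("frontend"), "frontend", activeAgents = 0, configuredAgents = 64) yields Activate, while the same call with isProd = true yields NoChange. The assembly name "frontend" is made up for illustration.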
Load Generator
Sleeper agents are threads that call a custom makeRequest() function at a pre-set rate limit. Each cell contains a configurable number of agent threads. The number of threads can be changed at runtime (again, via the remote control).
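The agent loop can be sketched without our internal libraries, roughly as follows. WindowRateLimiter is a hypothetical stand-in for FixedRateLimiter (whose actual implementation is not shown in this post); it assumes a simple fixed-window policy and takes an injected clock so it can be exercised deterministically. Agent is likewise a simplified, hypothetical version of SleeperAgent without the stats logging.

```scala
import scala.util.control.NonFatal

// Hypothetical stand-in for FixedRateLimiter: permits at most `maxActions`
// per fixed window of `windowMillis`. The clock is injected for testing.
final class WindowRateLimiter(maxActions: Int, windowMillis: Long, clock: () => Long) {
  private var windowStart = clock()
  private var actionsInWindow = 0

  def isActionAllowed: Boolean = synchronized {
    rollWindow()
    actionsInWindow < maxActions
  }

  def recordAction(): Unit = synchronized {
    rollWindow()
    actionsInWindow += 1
  }

  // Start a fresh window once the current one has expired.
  private def rollWindow(): Unit = {
    val now = clock()
    if (now - windowStart >= windowMillis) {
      windowStart = now
      actionsInWindow = 0
    }
  }
}

// Simplified agent: calls makeRequest() whenever the limiter permits it.
final class Agent(limiter: WindowRateLimiter, makeRequest: () => Unit) extends Thread {
  // Volatile so that a stop signal from another thread is seen by the loop.
  @volatile var keepRunning = true

  override def run(): Unit = {
    while (keepRunning) {
      if (limiter.isActionAllowed) {
        limiter.recordAction()
        try {
          makeRequest()
        } catch {
          case NonFatal(_) => () // A real agent would count the failure here.
        }
      } else {
        Thread.sleep(50)
      }
    }
  }
}
```

In this sketch an agent is stopped the same way as in the base class: set keepRunning to false, then join() the thread.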
Measurements
Each of the sleeper agents logs a set of measurements every 15 seconds into the logs of its host, which we already collect into Sumo Logic. Based on the logs, we can aggregate the results and determine how our target behaved. Bonus tip: Log the settings of the load generator alongside the results, so you don’t need to track those externally.
2015-06-11 18:29:17,026 -0700 INFO [logger=scala.config.util.ConfigClientSleeperCell] [settings: 64 threads, 10000 requests/s] 5979 requests sent in 15s at 398 requests/sec. 5696 requests failed. (loadById: 4989, loadByUri: 5687, findByUriPattern: 119, failing loadById: 163)
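If you want to post-process lines like this outside of Sumo Logic, a small parser is enough. The following is a sketch only: SleeperStats, Summary and parse are hypothetical names, and the pattern assumes the exact message layout of the sample line above, which a different logStats() implementation would not necessarily produce.

```scala
object SleeperStats {
  // Settings and results extracted from one stats line.
  final case class Summary(
      threads: Int,
      targetRate: Int,
      sent: Int,
      windowSeconds: Int,
      failed: Int) {
    // Fraction of requests in the window that failed.
    def failureRatio: Double = if (sent == 0) 0.0 else failed.toDouble / sent
    // Whole requests per second actually achieved in the window.
    def achievedRate: Int = if (windowSeconds == 0) 0 else sent / windowSeconds
  }

  // Assumes the message layout of the sample log line shown above.
  private val Pattern =
    """\[settings: (\d+) threads, (\d+) requests/s\] (\d+) requests sent in (\d+)s at \d+ requests/sec\. (\d+) requests failed\.""".r

  // Returns None for lines that do not match the assumed layout.
  def parse(line: String): Option[Summary] =
    Pattern.findFirstMatchIn(line).map { m =>
      Summary(
        threads = m.group(1).toInt,
        targetRate = m.group(2).toInt,
        sent = m.group(3).toInt,
        windowSeconds = m.group(4).toInt,
        failed = m.group(5).toInt)
    }
}
```

Because the settings travel with every line, each parsed Summary carries both the configured load (threads, target rate) and the observed outcome, so results from differently configured runs can be grouped without any external bookkeeping.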
Conclusion
This Sleeper Agent pattern was a quick and easy way to get a load test going. We’ve since replicated it a number of times, and all of our environments contain several sleeper cells.
The post Using Sleeper Cells to Load Test Microservices appeared first on Sumo Logic.