What is “anomaly detection”?
Here is how the peeps on the interweb and Wikipedia define it: Anomaly detection (also known as outlier detection) is the search for events that do not conform to an expected pattern. The detected patterns are called anomalies and often translate to critical and actionable insights that, depending on the application domain, are referred to as outliers, changes, deviations, surprises, intrusions, etc.
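To make that definition concrete, here is a minimal sketch of the core idea in Python (a toy rolling z-score detector; the data, window, and threshold are made up for illustration, and no production engine works this simply): the "expected pattern" is a rolling baseline, and anything far outside it gets flagged.

```python
from statistics import mean, stdev

def find_anomalies(values, window=20, threshold=3.0):
    """Return (index, value) pairs that deviate from the rolling baseline."""
    anomalies = []
    for i in range(window, len(values)):
        baseline = values[i - window:i]          # the "expected pattern"
        mu, sigma = mean(baseline), stdev(baseline)
        if sigma > 0 and abs(values[i] - mu) / sigma > threshold:
            anomalies.append((i, values[i]))     # far from expected: anomaly
    return anomalies

# Example: steady response times with one sudden spike
latencies = [100 + (i % 5) for i in range(50)] + [900]
print(find_anomalies(latencies))  # -> [(50, 900)]
```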
The domain: Machine Data
Machine data (most frequently referred to as log data) is generated by applications, servers, infrastructure, mobile devices, web servers, and more. It is the data machines generate in order to communicate to humans or other machines exactly what they are doing (e.g., activity), the status of that activity (e.g., errors, security issues, performance), and the results of their activity (e.g., business metrics).
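For instance, a web server's access log records activity and its status, while an application log records errors. The two lines below are made-up examples of the raw, semi-structured text this data typically takes (an Apache-style access line and a Java-style application error):

```
10.1.2.3 - - [07/Nov/2012:10:31:22 -0800] "GET /checkout HTTP/1.1" 500 1843
2012-11-07 10:31:22,781 ERROR [PaymentService] connection to db-3 timed out
```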
The problem of unknown unknowns
Most problems with analyzing machine data orbit around the fact that existing operational analytics technologies enable users to find only those things they know to look for. I repeat, only things they KNOW they need to look for. Nothing in these technologies helps users proactively discover events they don’t anticipate: events that have never occurred before, events that have occurred before but are not understood, or complex events that are difficult or even impossible to encode into queries and searches.
Our infrastructure and applications are desperately, and constantly, trying to tell us what’s going on through the massive real-time stream of data they relentlessly throw our way. And instead of listening, we ask a limited set of questions from some playbook. This is as effective as a patient seeking advice about massive chest pain from a doctor who, instead of listening, runs through a checklist containing skin rash, fever, and runny nose, and then sends the patient home with a clean bill of health.
This is not a good place to be; these previously unknown events hurt us by repeatedly causing downtime, performance degradations, poor user experience, security breaches, compliance violations, and more. Existing monitoring tools would be sufficient if we lived in static, three-system environments where we could enumerate all possible failure conditions and attack vectors. But we don’t.
We operate in environments with thousands of sources across servers, networks, and applications, and the amount of data they generate is growing exponentially. These sources come from a variety of vendors, run a variety of versions, are geographically distributed, and, on top of that, are constantly updated, upgraded, and replaced. How, then, can we rely on hard-coded rules, queries, and known-condition tools to ensure our applications and infrastructure are healthy and secure? We can’t – it is a fairy tale.
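To make the limitation concrete, here is what a hard-coded rule boils down to (a toy Python sketch; the patterns and log lines are made up): it fires only on the failure modes its author already knew to encode, and everything else, including the failure you have never seen before, passes silently.

```python
# A typical hard-coded monitoring rule: it can only catch the failure
# modes its author anticipated. Everything novel slips through unnoticed.
KNOWN_PATTERNS = ["OutOfMemoryError", "Connection refused", "disk full"]

def check(log_line: str) -> bool:
    """Alert only if the line matches a condition we already know about."""
    return any(pattern in log_line for pattern in KNOWN_PATTERNS)

print(check("java.lang.OutOfMemoryError: Java heap space"))        # True: known failure
print(check("PaymentService: settlement batch is 4 hours behind")) # False: unknown unknown
```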
We believe that three major things are required in order to solve the problem of unknown unknowns at a multi-terabyte scale:
- Cloud: enables elastic compute at the massive scale needed to analyze this volume of data in real time, across all vectors
- Big Data technologies: enable a holistic approach to analyzing all data without being bound by schemas, volumes, or batch analytics
- Machine learning engine: advanced algorithms that analyze and learn from both data and human experts in order to get smarter over time
Sumo Logic Real-Time Anomaly Detection
Today we announced Beta access to our Anomaly Detection engine, an engine that uses thousands of machines in the cloud to continuously analyze ALL of your data in real time and proactively detect important changes and events in your infrastructure. It does this without requiring users to configure or tune the engine, write queries or rules, set thresholds, or write and apply data parsers. As it detects changes and events, it bubbles them up to users for investigation, so they can add knowledge, classify events, and assign relevance and severity. It is in fact this combination of a powerful machine learning algorithm and human expert knowledge that is the real power of our Anomaly Detection engine.
So, in essence, Sumo Logic Anomaly Detection continuously turns unknown events into known events. And that’s what we want: to make events known, because we know what to do with known events. We can alert on them, we can create playbooks and remediation steps, we can prevent them, we can anticipate their impact, and, at least in some cases, we can make them someone else’s problem.
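The mechanics of "unknown becomes known" can be sketched in a few lines (illustrative Python only; the event signatures, labels, and ask_human hook are all made up, and this is not how the actual engine is implemented): the first time an unexpected event is surfaced, a human classifies it; from then on it is a known event with a label and a severity we can act on.

```python
from dataclasses import dataclass, field

@dataclass
class EventKnowledgeBase:
    known: dict = field(default_factory=dict)  # event signature -> human label

    def handle(self, signature: str, ask_human) -> str:
        if signature in self.known:
            # Already classified: alerts, playbooks, and severity apply.
            return f"known event: {self.known[signature]}"
        label = ask_human(signature)       # investigate, classify, set severity
        self.known[signature] = label      # unknown becomes known from now on
        return f"new event, classified as: {label}"

kb = EventKnowledgeBase()
print(kb.handle("spike:checkout-500s", lambda s: "payment outage, sev1"))
print(kb.handle("spike:checkout-500s", lambda s: "unused"))  # now a known event
```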
In conclusion
Sumo Logic Anomaly Detection has been more than three years in the making. During that time, it has had the energy of the whole company and our backers behind it. Sumo Logic was founded with the belief that this capability is transformational in the face of exponential data growth and infrastructure sprawl. We developed an architecture and adopted a business model that enable us to implement an analytics engine capable of solving the most complex problems of the Big Data decade.
We look forward to learning from the experience of our Beta customers, and soon from all of our customers, about how to continue to grow this game-changing capability. Read more here and join us.