I’m glad you asked, because I just might. In fact, we started collecting data about machine data some nine months ago, when we attended the AWS Big Data conference in Boston. Since then, we have continued collecting the same data at a variety of industry shows and conferences, such as VMworld, AWS re:Invent, Velocity, GlueCon, Cloud Slam, Defrag, DataWeek, and others.
The original survey was printed on my home printer, four surveys per page, then inexpertly cut with kitchen scissors the night before the conference. Startup style, oh yeah! The new version made it onto a shiny new iPad as an iOS app. The improved method, Apple cachet, and a wider reach gave us more than 300 data points and, incidentally, cost us more than 300 Sumo Logic T-shirts, which we were more than happy to give up in exchange for data. (Btw, if you want one, come to one of our events; the next one coming up is the Strata Conference.)
As a data junkie, I’ve been slicing and dicing the responses, and I thought the end of our fiscal year would be the right moment to revisit the data and share some reflections in my first blog post on this data set.
Here is what we asked:
- Which business problems do you solve by using machine data?
- Which tools do you use to analyze machine data in order to solve those business problems?
- What issues do you experience solving those problems with the chosen tools?
The survey was partly designed to help us better understand Sumo Logic’s segment of the IT Operations Management (or IT Management) market, as defined by Gartner, Forrester, and other analysts. I think the sample set is reasonably representative. Respondents come from shows with varied audiences: developers at Velocity and GlueCon, data center operators at VMworld, and folks investigating a move to the cloud at AWS re:Invent and Cloud Slam. Answers were actually pretty consistent across these different “cohorts”. We have a reasonably large number of responses (more than 300), and finally, the respondents were not our customers or direct prospects. So let’s dive in and see what we’ve got, starting at the top:
Which business problems do you solve by using logs and other machine data?
- Applications management, monitoring, and troubleshooting (46%)
- IT operations management, monitoring, and troubleshooting (33%)
- Security management, monitoring, and alerting (21%)
Does anything in there surprise you? I guess it depends on your point of reference. Let me compare it to the overall “IT Management” or “IT Operations Management” market. The consensus (if such a thing exists) on market size by segment is roughly:
- IT infrastructure (servers, networks, etc.) is 50-60% of the total market
- Applications (internal, external, etc.) are 30-40%, or just north of that
- Security is around 10%
Source: Sumo Logic analysis of aggregated data from various industry analysts who cover the IT Management space.
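One quick way to check that this gap is more than sampling noise is to put a rough 95% confidence interval around each survey share and see whether it overlaps the analyst range. The Python sketch below does that under loudly stated assumptions: n = 300 (the lower bound of our "more than 300" responses, so the real intervals would be slightly narrower), one answer per respondent, and a normal-approximation (Wald) interval that treats the conference crowd as a simple random sample, which it is not. It speaks only to noise, not to bias, and the helper names `margin_of_error` and `compare` are illustrative.

```python
import math

# Assumption: n = 300, a conservative lower bound for "more than 300" responses.
N = 300
Z_95 = 1.96  # two-sided 95% quantile of the standard normal distribution

# Survey shares from the first question, as fractions of respondents.
survey = {
    "Applications": 0.46,
    "IT infrastructure": 0.33,
    "Security": 0.21,
}

# Analyst consensus by segment, as (low, high) fractions of the market.
# "Around 10%" for security is treated as a single point.
market = {
    "Applications": (0.30, 0.40),
    "IT infrastructure": (0.50, 0.60),
    "Security": (0.10, 0.10),
}


def margin_of_error(p, n, z=Z_95):
    """Normal-approximation (Wald) margin of error for a sample proportion."""
    return z * math.sqrt(p * (1 - p) / n)


def compare(survey, market, n):
    """Return (low, high, outside) per segment, where outside is True when
    the survey interval does not overlap the analyst range at all."""
    results = {}
    for segment, p in survey.items():
        moe = margin_of_error(p, n)
        lo, hi = p - moe, p + moe
        m_lo, m_hi = market[segment]
        outside = hi < m_lo or lo > m_hi
        results[segment] = (lo, hi, outside)
    return results


if __name__ == "__main__":
    for segment, (lo, hi, outside) in compare(survey, market, N).items():
        print(f"{segment}: {lo:.1%} to {hi:.1%}, outside market range: {outside}")
```

Under these assumptions, every survey interval falls outside the corresponding analyst range, although the applications interval only just clears 40%. Sampling noise alone doesn't seem to account for the gap, which is why the explanations have to come from somewhere else.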
Our subsegment leans much more toward applications, and much less toward IT infrastructure, than the overall market does. A few things could explain this big difference:
- (hypothesis #1) analysts derive market size from total product sold, which might not match the effort people apply to these use cases.
- (hypothesis #2) there is more shelfware in IT infrastructure, which overrepresents effort there.
- (hypothesis #3) there are more home-grown solutions in application management, which underrepresents effort there, since home-grown tools don’t show up as product sold.
- (hypothesis #4) our data is an indicator or a result of a shift in the market (e.g., as enterprises shift toward IaaS, they spend less time managing IT infrastructure and more on their core competency, their applications).
- (obnoxious hypothesis #5) intuitively, it’s the software, stupid: nobody buys hardware because they love it. Hardware exists to run software (applications), and we care more about applications; that’s why.
OK, OK, let’s check which of these hypotheses our narrow response set can help test. I don’t think our data can help us validate hypothesis #1 or hypothesis #2. I’ll try to come up with additional survey questions that will help test these two hypotheses in the future.
Hypothesis #3, on the other hand, might be partially testable. If we compare responses from those who use commercial tools with responses from those who use home-grown tools, we are left with the following:
There is no significant difference between respondents who use commercial tools and respondents who use home-grown tools. Hypothesis #3 explains only a couple of percentage points of the difference.
Hypothesis #4: I think we can use a proxy to test it. Let’s assume that respondents from VMworld are focused on internal data centers and private clouds, so they would not rely as much on IaaS providers for IT infrastructure operations. On the other hand, let’s also assume that attendees of AWS and other cloud conferences are more likely to rely on IaaS for IT infrastructure operations. Data, please:
Interesting: this seems to explain some shift between security and infrastructure, but not toward applications. So we’re left with:
- hypothesis #1 – spend vs. reported effort is skewed – perhaps
- hypothesis #2 – there is more shelfware in IT infrastructure – unlikely
- obnoxious hypothesis #5 – it’s the software, stupid – getting warmer
That should do it for one blog post. I’ve barely scratched the surface, stopping at the responses to the first question. I will work to see whether I can test the outstanding hypotheses and, if successful, will write about the findings. I will also follow up with another post looking at the rest of the data. I welcome your comments and thoughts.
While you’re at it, try Sumo Logic for free.