In the past twelve months the phrase “real time analytics” has gone from being largely unrecognised to representing a host of big data analytics use cases – ones that are not addressed by long-running, complex algorithms processing large data sets at rest in a distributed data store. But in the world of big data we seem determined to use ambiguous terminology, and “real time analytics” is no different.
Many people have used the term to mean tooling that runs analytics queries very quickly – in real time. The query would run in real time, regardless of how old the data was. And if that data had travelled via a number of batch processes from operational systems into ETL systems, then to a data warehouse, and finally into the in-memory cache of a BI tool, then the data was certainly not real time. In this usage, “real time” means I don’t have time to get coffee before the results come back, but I probably have time to get coffee before I act on the results. After all, if it took a day or two to get the data to me, ten minutes to get my coffee isn’t going to make much difference.
But while I was drafting this I thought I’d check on how people are using the term now, and I was gratified to find that pretty much all of the top ten search hits referred to content that uses the term “real time analytics” to mean analytics on real time data. Increasingly, it seems, there is wide recognition that low latency – from the origin of the data (the event that generates it) to that data being reflected in queries – is critical. If you think about it, most, if not all, industries are going to have some KPIs that need monitoring in real time – minute by minute, if not second by second. It is a by-product of the internet of things, and also of the increased connectedness of people, that huge volumes of event data are now being generated. And there is value in that data. For example, web clicks can tell marketers how their A/B testing is going right now, so they can optimise campaigns more surely and more swiftly. Infrastructure managers can not only see when critical service levels are breached, they can watch the trends emerging, compare them with historical trends and anticipate issues to pre-empt problems.
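To make the idea concrete, here is a minimal sketch of the kind of minute-by-minute aggregation described above – bucketing a stream of A/B test click events into one-minute windows keyed by event time. The event data and field names are invented for illustration; a production system would consume a live stream rather than a list.

```python
from collections import Counter
from datetime import datetime

# Hypothetical click events from an A/B test: (ISO timestamp, variant) pairs.
events = [
    ("2024-01-01T09:00:05", "A"),
    ("2024-01-01T09:00:42", "B"),
    ("2024-01-01T09:01:10", "A"),
    ("2024-01-01T09:01:30", "A"),
    ("2024-01-01T09:02:02", "B"),
]

def clicks_per_minute(events):
    """Count clicks per variant in one-minute windows, keyed by event time."""
    counts = Counter()
    for ts, variant in events:
        # Truncate the event timestamp to the minute it falls in.
        minute = datetime.fromisoformat(ts).strftime("%Y-%m-%dT%H:%M")
        counts[(minute, variant)] += 1
    return counts

counts = clicks_per_minute(events)
print(counts[("2024-01-01T09:01", "A")])  # two "A" clicks in the 09:01 window
```

The key design point is that the window is derived from the event’s own timestamp (event time), not from when the record happens to be processed – which is what lets the marketer see how the test “is going right now” even if some events arrive slightly late.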
Batch analytics may have been the first wave of big data, but low latency, real time analytics is emerging as the second.
VP Product Marketing
Dai Clegg will be speaking at UNICOM’s Big Data Event on 5 December in London. For more information and to book your place please click here.