In the information age, it’s the data-driven bird that gets the worm. Giant companies like Google, Facebook, and Apple hoard data, because it’s the information equivalent of gold.
But merely hoarding data isn’t enough. You need to be adept at sifting through, tying together, and making sense of all the data spilling out of your data lakes. Only then can you act on data to make better decisions and build smarter products.
Yet in the crowded and overfunded analytics market, seeing through the stupefying vendor smog can be all but impossible. To help you make sense of the vast and confusing analytics space, I’ve put together a list of my top predictions for the next five years.
With any luck, these predictions will help you steer your organization toward data-driven bliss.
…, which can deploy computations to GPUs or CPUs, …, and ….
Compilers are vastly more flexible than engines because they can take number-crunching recipes and translate them to run in different infrastructures (in-database, on Spark, in a GPU, whatever!). Compilers can also, in theory, generate workflows that run way faster than any interpreted engine.
Even Spark has been acquiring basic compilation facilities, which is a sure sign that compilers are here to stay, and may eventually eclipse legacy pure computational engines.
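To make the compiler idea concrete, here is a minimal sketch in Python. It is not how any particular product works: the `Recipe` class, both `compile_to_*` functions, and the `orders` table are hypothetical names invented for illustration. The point is only that one backend-neutral recipe can be translated into different targets, here a SQL string for in-database execution and a plain Python function for in-memory execution.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class Recipe:
    """Hypothetical backend-neutral recipe: filter rows, then sum a column per group."""
    table: str
    group_by: str
    sum_column: str
    min_value: int  # keep rows where sum_column >= min_value


def compile_to_sql(recipe: Recipe) -> str:
    """Translate the recipe into a SQL string for in-database execution."""
    return (
        f"SELECT {recipe.group_by}, SUM({recipe.sum_column}) AS total "
        f"FROM {recipe.table} "
        f"WHERE {recipe.sum_column} >= {recipe.min_value} "
        f"GROUP BY {recipe.group_by}"
    )


def compile_to_python(recipe: Recipe) -> Callable[[List[dict]], Dict[str, int]]:
    """Translate the same recipe into a function that runs over in-memory rows."""
    def run(rows: List[dict]) -> Dict[str, int]:
        totals: Dict[str, int] = {}
        for row in rows:
            value = row[recipe.sum_column]
            if value >= recipe.min_value:
                key = row[recipe.group_by]
                totals[key] = totals.get(key, 0) + value
        return totals
    return run


# One recipe, two targets.
recipe = Recipe(table="orders", group_by="region", sum_column="amount", min_value=10)
sql = compile_to_sql(recipe)
run_in_memory = compile_to_python(recipe)

rows = [
    {"region": "east", "amount": 20},
    {"region": "east", "amount": 5},   # filtered out: below min_value
    {"region": "west", "amount": 15},
]
totals = run_in_memory(rows)
```

A real analytic compiler would also optimize the recipe and handle far richer operations, but the separation shown here (describe the computation once, pick the target later) is the part that an engine tied to a single runtime cannot offer.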
3. ETL diversifies
Few data acronyms can strike more fear into the hearts of executives than the dreaded “ETL.” Extract-transform-load is the necessary evil by which piles of incomplete, duplicated, unrelated, messy slop are pulled out, cleaned up, and shoved somewhere the data Vulcans can mind-meld with them.
ETL is the antithesis of modern, agile, and data-driven. ETL means endlessly replicated data, countless delays, and towering expenses. It means not being able to answer the questions that matter when they matter.
In an attempt to make ETL more agile, the industry has developed a variety of alternatives, most of them currently well funded by venture capital. These solutions range from high-level ETL tools that make it easier to do ETL into Hadoop or a data warehouse, to streaming ETL solutions, to ETL solutions that leverage machine learning to cross-reference and deduplicate.
Another very interesting class of technology includes tools like Dremio and Xcalar, which reimagine ETL as extract-load-transform (or ELT). In essence, they push transformation to the end and make it lazy, so you don’t have to do any upfront extraction, loading, or transformation.
Historically, ELT has been slow, but these next-generation solutions make ELT fast by dynamically reshaping, indexing, and caching common transformations. This gives you the performance of traditional ETL, with the flexibility of late-stage transformations.
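The lazy-transformation-plus-caching idea can be sketched in a few lines of Python. This is an illustration under my own assumptions, not a description of how Dremio or Xcalar are implemented: the `LazyTable` class, its `view` method, and the `normalize` function are hypothetical, and the sketch covers only caching, not reshaping or indexing.

```python
from typing import Callable, Dict, List

Row = dict


class LazyTable:
    """Hypothetical ELT-style table: raw rows are loaded as-is, and a
    transformation runs only when a view is first requested."""

    def __init__(self, raw_rows: List[Row]) -> None:
        self._raw = raw_rows                      # loaded untransformed
        self._cache: Dict[str, List[Row]] = {}    # view name -> transformed rows
        self.transform_runs = 0                   # counts actual transformation passes

    def view(self, name: str, transform: Callable[[Row], Row]) -> List[Row]:
        """Return the transformed rows, computing them once and caching the result."""
        if name not in self._cache:
            self.transform_runs += 1
            self._cache[name] = [transform(row) for row in self._raw]
        return self._cache[name]


def normalize(row: Row) -> Row:
    """Clean up a messy raw record at query time rather than at load time."""
    return {
        "customer": row["customer"].strip().lower(),
        "amount": float(row["amount"]),
    }


raw = [
    {"customer": "  Acme ", "amount": "12.50"},
    {"customer": "ACME", "amount": "7.5"},
]
table = LazyTable(raw)               # load: no transformation has happened yet

first = table.view("normalized", normalize)    # transformation runs here
second = table.view("normalized", normalize)   # served from the cache
total = sum(row["amount"] for row in first)
```

Nothing is cleaned until someone asks a question, and the second request for the same view costs nothing extra. That is the trade the paragraph above describes: late-stage flexibility, with caching to recover the speed.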
No matter how you slice it, ETL is undergoing dramatic evolution that will make it easier than ever for organizations to rapidly leverage data without time-consuming and costly upfront investments in IT.
4. Data silos open up
The big problems at big organizations don’t really involve fancy analytics. Most companies can’t even add up and count their data. Not because sums and counts are hard, but because data in a modern organization is fragmented and scattered in ten thousand silos.
Thanks to the cloud (including the API revolution and managed data solutions) and recent advances in ETL, it’s becoming easier than ever for organizations to access more of their data in a structured way.
Next-generation data management solutions will play an important role in leveraging these technological advances to make all of an organization’s data analytically accessible to all the right people in a timely fashion.
5. Machine learning gets practical
Machine learning is just past the peak of the hype cycle. Or at least we can hope so. Unnamed tech celebrities who don’t understand how machine learning works continue to rant about doomsday Terminator scenarios, even while consumers can’t stop joking about how terrible Siri is.
Machine learning suffers from a fatal combination of imperfection and inculpability. When machine learning goes wrong (as it often and inevitably does), there’s no one to blame, and no one to learn from the mistake.
That’s an absolute no-no for any kind of mission-critical analytics.
So until we are able to train artificial minds on the entirety of knowledge absorbed by society’s brightest, the magical oracle that can answer any question over the data of a business is very far off. Much farther than five years.
Until then, we are likely to see very focused applications of machine learning. Examples include ThoughtSpot’s natural language interface to BI; black-box predictive analytics for structured data sets; and human-assistive technology that lets people see connections between different data sources, correct common errors, and spot anomalies.
These aren’t the superbrains promised in science fiction, but they will make it easier for users to figure out what questions to ask and help guide them toward finding correct answers.
While analytics is a giant market filled with confusing marketing speak, there are big trends shaping the industry that will dictate where organizations invest.
These trends include the ongoing migration of data intelligence into business applications, the advent of analytic compilers that can deploy workflows to ad hoc infrastructure, the rapidly evolving state of ETL, the increased accessibility of data silos to organizations, and the very pragmatic if unsensational ways that machine learning is improving analytics tools.
These overarching trends for the next five years will ripple into the tools that organizations adopt, the analytic startups that get funded, the acquisitions that existing players make, and the innovation that we see throughout the entire analytic stack, from data warehouse to visual analytics front-ends.
When figuring out what your data architecture and technology stack should look like, choose wisely, because the industry is in the process of reinvention, and few stones will be left unturned.
This article is published as part of the IDG Contributor Network.