For some businesses large and small, less data is more

If the tech press is to be believed, the rules of data are simple: First, when in doubt, collect all of it, all the time, the more the better. Second, use the most sophisticated bleeding-edge deep learning algorithms to analyze it all. Third, leverage AI for, uh, something or other at some point.

This industry-wide attitude toward data has instilled an urgent, keeping-up-with-the-Joneses mentality in business owners and executives. But not everyone benefits from keeping pace with the data-obsessed Joneses.

Matt Dancho, founder and CEO of the data consultancy Business Science, calls the twin assumptions that businesses must amass vast warehouses of data and run it all through deep learning algorithms “two of the biggest misconceptions out there.” Bowing slavishly to that pressure can pile up cost and confusion in equal measure.

Companies in the digital marketing space (like Facebook) need big data to craft marketing campaigns with pinpoint accuracy. Financial firms can collect thousands of data points per second, and they should probably stockpile and analyze that data in order to accurately predict market behaviors. And for biomedical companies investigating linkages between genes and disease, Olympian datasets are inescapable.

But for the most part, the rest of us are probably okay.

By way of example, take a recent Business Science client that makes high-end wallets and cases for tech products. Over the past five years it saw nonstop growth, with its website notching upwards of 40,000 hits a day. It warehoused tons of Google Analytics and AdWords data along the way, but what to do with it? “To have that level of data just makes it really difficult to handle,” says Dancho.

The problem is simple: The larger a dataset becomes, the more machines are required to process it. Definitions of “Big Data” vary, but a common one is data too large for any single computer to hold. At that size, companies need complex storage and retrieval mechanisms to manage data across multiple machines, which can be as challenging as it is expensive.

The solution for the wallet-making client holds true for most companies, even those with the potential for massive datasets: Don’t hold it all. Instead, the client took a sample of the data it had collected. A few million rows is often more than enough, and a sample that size usually fits into a few gigabytes on a single machine rather than being spread across numerous computers.
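As a rough sketch of what that sampling can look like in practice, the snippet below draws a uniform random sample from a large CSV export without ever loading the whole file into memory. The file name, target sample size, and choice of pandas are illustrative assumptions, not details from the client engagement.

```python
import pandas as pd

# Hypothetical analytics export; the file name and sample size
# are placeholders, not figures from the article.
SOURCE = "analytics_export.csv"
SAMPLE_ROWS = 2_000_000  # "a few million rows is often more than enough"

# Count data rows once so we know what fraction to keep.
with open(SOURCE) as f:
    total_rows = sum(1 for _ in f) - 1  # subtract the header row
fraction = min(1.0, SAMPLE_ROWS / max(total_rows, 1))

# Stream the file in chunks and keep a uniform fraction of each one,
# so the full dataset never has to fit in memory at once.
sample = pd.concat(
    chunk.sample(frac=fraction, random_state=42)
    for chunk in pd.read_csv(SOURCE, chunksize=500_000)
)
sample.to_csv("analytics_sample.csv", index=False)
print(f"Kept {len(sample):,} of {total_rows:,} rows")
```

The arithmetic behind the “few gigabytes” claim is straightforward: at a few hundred bytes per row, two million rows comes to roughly half a gigabyte, which fits comfortably on one ordinary machine.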

Once a useful sampling of data is collected, businesses don’t need to go all in on deep-learning algorithms, which, as it turns out, are not all that useful for most companies anyway.

“Firms will probably need to invest in deep learning if they’re interested in accuracy,” says Dancho, adding that the models are important for companies developing highly predictive projections. “But if you sit down and think about the job, you’re not always trying to predict something insanely accurately. You’re just trying to understand the data you have.”
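To make that contrast concrete, here is a minimal sketch of the simpler alternative: an ordinary linear regression whose coefficients can be read directly as effect sizes. The variable names and toy data are invented for illustration; none of it comes from Dancho’s client work.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical toy data: daily ad spend and site visits as inputs,
# daily sales as the output, over one year.
rng = np.random.default_rng(0)
ad_spend = rng.uniform(100, 1_000, size=365)
visits = 40 * ad_spend + rng.normal(0, 2_000, size=365)
sales = 0.05 * ad_spend + 0.002 * visits + rng.normal(0, 10, size=365)

X = np.column_stack([ad_spend, visits])
model = LinearRegression().fit(X, sales)

# Unlike a deep net, the fitted coefficients read off directly as
# "extra sales per extra dollar of spend / per extra visit."
for name, coef in zip(["ad_spend", "visits"], model.coef_):
    print(f"{name}: {coef:.4f}")
print(f"R^2 on the training data: {model.score(X, sales):.3f}")
```

A model like this won’t win accuracy contests, but it answers the question most businesses are actually asking: which levers move the number we care about, and by how much.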

All of which, Dancho says, leads to a few important pieces of advice: When tempted to load up on data, take a breath and take a sample; when weighing whether deep learning is right for you, the answer is more often than not “no”; and beware of consultants bearing shiny tools. “We hear from clients time and time again that they don’t like working with consultants because they’re trying to sell them a $50,000 software package,” says Dancho.

Not trying to precisely predict something with the complexity of climate patterns? Save your money.