The Democratization of Data Science

Want to catch tax cheats? The government of Rwanda does — and it’s finding them by studying anomalies in revenue-collection data.

Want to understand how American culture is changing? So does a budding sociologist in Indiana. He’s using data science to find patterns in the massive amounts of text people use each day to express their worldviews — patterns that no individual reader would be able to recognize.

Intelligent people find new uses for data science every day. Still, despite the explosion of interest in the data collected by just about every sector of American business — from financial companies and health care firms to management consultancies and the government — many organizations continue to relegate data-science knowledge to a small number of employees.

That’s a mistake — and in the long run, it’s unsustainable. Think of it this way: Very few companies expect only professional writers to know how to write. So why ask only professional data scientists to understand and analyze data, at least at a basic level?

Relegating all data knowledge to a handful of people within a company is problematic on many levels. Data scientists find it frustrating because it’s hard for them to communicate their findings to colleagues who lack basic data literacy. Business stakeholders are unhappy because data requests take too long to fulfill and often fail to answer the original questions. In some cases, that’s because the questioner failed to explain the question properly to the data scientist.

Why would non–data scientists need to learn data science? That’s like asking why non-accountants should be expected to stay within budget.

These days every industry is drenched in data, and the organizations that succeed are those that most quickly make sense of their data in order to adapt to what’s coming. The best way to enable fast discovery and deeper insights is to disperse data science expertise across an organization.

Companies that want to compete in the age of data need to do three things: share data tools, spread data skills, and spread data responsibility.

Sharing Tools

Most data tools sit with the data science team. While this may seem logical, creating a silo of data tools and restricting access to a narrow group of employees places too heavy a burden on those employees. Most inquiries from other departments — engineering, finance, product, marketing — are relatively simple requests that anyone with basic training could fulfill. By saddling data scientists with basic gatekeeping tasks, organizations divert their attention from the larger projects that require their deep expertise.

Airbnb, a huge believer in the democratization of data science, strives to empower every team member to make data-driven decisions. To that end, the company created its own Data University.

Collaborative tools help too. At Airbnb, anyone can post an article to a Knowledge Repository. The rest of the company sees new analyses in a news feed, which shows (1) which new problem has just been solved and (2) who solved it, so anyone with further questions knows whom to call. In addition to helping the whole company become more effective, such articles give recognition to the people who post them, which encourages others to do the same.

Sharing Skills

Of course, when you share data tools, you also need to enable people to use those tools. Not every company can create its own Data University. Depending on the data tools your organization uses, though, a variety of educational programs, both online and in person, can get your team up to speed. (Admittedly, I’m biased: I cofounded one of them.)

As your team gains the opportunity to learn those skills, they’ll feel more comfortable bringing data to bear on every important decision. It will become clear that some team members are more comfortable using data skills than others are. Encourage the proficient ones to mentor the others. Even at our company, where data science is our business, some people don’t work with data all the time. When they need help on a knotty problem, they pair up with those who do.

A data-literate team makes better requests. Even a basic understanding of tools and resources greatly improves the quality of interaction among colleagues. When the “effort level” of each request (the amount of back-and-forth needed to clarify what is wanted) goes down, speed and quality go up.

Shared skills improve workplace culture and results in another way, too: They improve mutual understanding. If you know how hard it will be to get a particular data output, you’ll adjust the way you interact with the people in charge of giving you that output. Such adjustments improve the workplace for everyone.

Sharing Responsibility

Once an organization is delivering the access and education needed to democratize data among its employees, it may be time to adjust roles and responsibilities. At a minimum, teams should be able to access and understand the data sets most relevant to their own functions. But by equipping more team members with basic coding skills, organizations can also expect non–data science teams to apply this knowledge to departmental problem solving — leading to greatly improved outcomes.

If your workforce is data-literate, for example, your centralized data team can shift its focus from “doing everyone else’s data work” to “building the tools that enable everyone to do their data work faster.” Our own data team doesn’t run analyses every day. Instead, it builds new tools that everyone can use so that 50 projects can move forward as quickly as one project moved before.

Data science isn’t just for data scientists anymore, if it ever was. Smart companies today ensure that many of their employees can speak the language of data and use it to improve work outcomes. By empowering employees with these fundamental skills, companies are realizing tremendous levels of innovation and efficiency.

Source: hbr