Two thought leaders in data analytics discuss the impact of the coronavirus, dark data, and the role of data scientists, and explain why data analysis remains so difficult.
In an extensive conversation with two of the top thought leaders in data analytics, the industry media raised some key questions about data analytics today.
The topics include:
Given the ongoing spread of the coronavirus, how is the epidemic affecting the work of data analysis departments?
What are the differences in roles and key strengths between business intelligence experts and data scientists?
Why is “dark data” important?
What should be an effective strategy for dark data?
Many executives say their companies face difficulties in analyzing data.
Why is data analysis still so difficult?
To provide insight into data analysis, industry media talked to Bill Schmarzo, chief innovation officer of Hitachi Vantara, and Andi Mann, chief technology advocate of Splunk.
How do you see the ongoing epidemic affecting the data analysis industry and data analysis practice?
“The state of data analysis is interesting,” Mann said. “There is now a need to gain more insight from data analysis.
One use of data analysis is to understand where resources can be allocated more efficiently during an economic downturn.
It’s really important that employees at many companies can work remotely from home without disrupting the business.
Sectors such as retail, online services, digital services and marketing services have been affected by the epidemic in different ways.
A better approach they can take is to use data analytics for targeted marketing and targeted customer engagement.
Of course, for nonprofits and government agencies, data can be used to provide services to those most in need in a downturn, such as the unemployed or homeless.
So you can use data analysis to identify whom to target.
Splunk, for example, is providing data sets and analytics to public services.
We’re working with universities to try to track transmission, and we’re working with businesses and governments to try to track coronavirus and other things.
Therefore, data analysis can not only help study the virulence and transmission mechanism of coronavirus, but also help people fight against coronavirus.
Because Splunk is a data analytics platform, instead of creating the data ourselves, we take it from other sources and provide it to state and federal agencies so they can analyze the data set using Splunk.
It’s really powerful.”
“The fact that data analysis can be used not only to combat the spread of coronavirus, but also to analyze what happens after the outbreak is over, is actually very important,” Schmarzo said.
“Countries around the world have incurred incalculable costs in responding to the epidemic, and at some point those costs will have to be repaid.
So I think we have to use data analytics to do more with less.
We’re going to have to look at marketing activities and treatment activities at a very granular level.
Everything will become highly personalized.
Take health care, for example.
Many government departments now make broad policy decisions on health care and overall welfare.
At many organizations there is too much waste in that approach, so you either need to bring in fundamentally more revenue or make the idea of ‘do more with less’ more granular. That would be a good thing for the analytics industry, because we are very skilled at using detailed data and digital trends to really understand the distinctive differences between individual customers, teachers, students, and pieces of equipment.
So I think most organizations have to adopt a ‘do more with less’ mindset, because that’s the only way an organization can change its economic value curve in the face of severe profit pressure. You can raise taxes significantly, but there’s no such thing as a free lunch.”
Have you heard how organizations are doing analytics in these difficult times?
“Pharmaceutical companies are definitely operating 24/7,” Schmarzo said.
“I sat on a panel last week with machine learning engineers from drugmaker GlaxoSmithKline, who said they were working on drugs and vaccines against the coronavirus.
It is a tragedy that we, as data professionals, do not know much about the epidemic.
We didn’t do enough testing, and sometimes we didn’t even have confidence in the results.
What is happening now is a classic example of not doing data science.
When people make predictions and inferences from only a small amount of collected data, the results come out overly optimistic or overly pessimistic, because people just don’t apply good data science rigor to these issues.
People can be thoughtful even about small data sets, but the constraints and assumptions of those data sets must be clearly stated.
These small data sets are not random samples, and no real analysis is being performed on them.
Some people just take a few numbers and extrapolate to some extreme.
In many cases, it’s just because of their own personal agenda.”
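Schmarzo’s point about small samples can be made concrete with a little arithmetic. The following is a minimal Python sketch, not something either speaker described: it uses the standard Wilson score interval for a proportion, and the counts (2 out of 20 versus 200 out of 2,000) are invented purely for illustration. The interval also assumes a random sample, which is exactly the assumption Schmarzo says is often violated, so the real uncertainty may be larger still.

```python
import math


def wilson_interval(successes: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion.

    z = 1.96 corresponds to roughly 95% confidence.
    Assumes the n observations are a random sample.
    """
    if n <= 0:
        raise ValueError("n must be positive")
    p_hat = successes / n
    denom = 1 + z**2 / n
    center = (p_hat + z**2 / (2 * n)) / denom
    half_width = z * math.sqrt(p_hat * (1 - p_hat) / n + z**2 / (4 * n**2)) / denom
    return center - half_width, center + half_width


# Hypothetical counts: the same 10% observed rate from a small and a large sample.
small = wilson_interval(successes=2, n=20)
large = wilson_interval(successes=200, n=2000)

print(f"n=20:   {small[0]:.3f} to {small[1]:.3f}")
print(f"n=2000: {large[0]:.3f} to {large[1]:.3f}")
```

Both samples show the same observed rate, but the plausible range around the small sample is far wider, which is why extrapolating from a few numbers to an extreme is so risky.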
“I talk to a lot of clients, and their data scientists are hard at work,” Mann said. “In health care, a lot of people have been crunching numbers for a long time, trying to figure out how the virus spreads and how to deal with and contain it.
I also see people in finance using data analytics to understand the business.
They are using data science to measure their business metrics and, as I said earlier, to understand where to put resources.
Another area where I see a lot of number crunching is the insurance industry, which has to handle claims.
The insurance industry will face many challenges, so they have done a lot of actuarial math and are applying data science to their actuarial practice.
There are a lot of gaps in the effectiveness of using data analytics, and I don’t think some people realize that.”
What are the differences in roles and key strengths between business intelligence experts and data scientists?
Both business intelligence specialists and data scientists are important, Schmarzo says.
“Without reports that tell you what is happening, you have no idea where to focus resources and data science efforts, so the two are very complementary.
The misconception that data science is BI 3.0 has probably cost vendors in the business intelligence space more than anything else.
The two are very different, and business intelligence experts do strive to clearly communicate the metrics and key performance indicators (KPIs) that organizations use to measure progress and success.
However, data scientists are trying to identify which variables and indicators might be better predictors of performance.
This is a highly exploratory path that is centered on failure: it takes constant trying, constant failing, and continuous learning. You can’t measure progress in data science by how much time has been spent. Only if you understand the costs of false positives and false negatives can you measure yourself, in terms of how effective the model is. So these are, in effect, two different worlds.
And neither is better than the other.
In data science, everything is focused on really understanding the hypotheses you’re trying to prove, for example what the metrics of success and progress are and who the business entities and stakeholders are, and all of those are very different.”
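To illustrate what it means to judge a model by the costs of false positives and false negatives rather than by raw accuracy, here is a small Python sketch. The two models, their counts, and the cost figures are hypothetical, chosen only to show the idea; none of them come from the interview.

```python
from dataclasses import dataclass


@dataclass
class ConfusionCounts:
    """Outcome counts for a binary classifier on a labeled test set."""
    true_pos: int
    false_pos: int
    true_neg: int
    false_neg: int


def accuracy(c: ConfusionCounts) -> float:
    """Share of all predictions that were correct."""
    total = c.true_pos + c.false_pos + c.true_neg + c.false_neg
    return (c.true_pos + c.true_neg) / total


def total_error_cost(c: ConfusionCounts, cost_fp: float, cost_fn: float) -> float:
    """Total cost of mistakes, weighting each error type by its business cost."""
    return c.false_pos * cost_fp + c.false_neg * cost_fn


# Hypothetical models evaluated on the same 1,000 cases.
model_a = ConfusionCounts(true_pos=80, false_pos=10, true_neg=890, false_neg=20)
model_b = ConfusionCounts(true_pos=95, false_pos=60, true_neg=840, false_neg=5)

# Assumed costs: a missed case (false negative) is 50 times worse than a false alarm.
COST_FP, COST_FN = 1.0, 50.0

for name, model in (("A", model_a), ("B", model_b)):
    print(name, f"accuracy={accuracy(model):.3f}",
          f"cost={total_error_cost(model, COST_FP, COST_FN):.0f}")
```

Under these assumed costs, the model with the higher accuracy is the more expensive one to run, which is why the error costs have to be understood before a model’s effectiveness can be measured.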
“It’s interesting to talk about the difference,” Maguire said. “Obviously, I think when companies are choosing between resumes from business intelligence experts and data scientists, many are likely to choose the data scientist because the title sounds good.
And I think the idea of a failure-centric data science expert is interesting too, because that failure may actually be real learning.
Perhaps some corporate executives are saying, ‘Why are we paying this much for this failure-centric specialist?’”
“If you don’t fail enough, that means you’re not trying enough, that means you’re not trying hard enough,” Schmarzo says.
Failure is an effective way to learn.
In business intelligence, that kind of failure is not accepted; the architecture has to be built to work properly.
Data scientists keep experimenting with combinations, transformations, and extensions of different data and data elements, trying to figure out which of these variables and combinations actually provides better predictions.”
“Business intelligence and data science are two very different kinds of science,” Mann says.
“Both are very much a science.
Business intelligence grows with the accumulation of knowledge, which is actually very important for how an enterprise conducts its business.
There are some very big differences between the two sciences.
Data science is about the innovation process. For example, data science talks about innovation as a result of learning from failure.
In my opinion, if there is no failure, then there is no learning. More data and understanding can be obtained by trying. More questions should be asked instead of more answers.
So data scientists seem to be asking a lot of questions, and users are asking even more questions about the data.
Every answer the user gets is just an opportunity to ask more questions.
So that’s another way of thinking about it.
I think it’s a different way of thinking about bringing data from any source to any question rather than trying to find an answer.
So there really is a fundamental difference in the way data scientists think about innovation opportunities.
Think of the data as never having a final answer and always asking more questions.
And business intelligence experts look for answers because their business needs to be done, and that’s what they need to do.
So this innovation mindset is quite separate from the day-to-day running of a business.
This is one of the biggest differences I’ve seen, and it has practical consequences: pre-planned, well-defined deployments on one side, and adding data sources on demand on the other.
In business intelligence, you know what questions to ask, so you can plan the data set in advance; in data science, you cannot.
Therefore, you need to be able to introduce new data sets and continuously enrich them as you go.
The questions you run into really do tie the idea of data science to innovation and open-ended problems.
I think it’s a very interesting way of looking at it.”
“Let me add two more points,” Schmarzo said.
“First, business intelligence experts are really concerned with understanding what’s happening and where it’s happening.
Data scientists are trying to understand why it happens, and when you put the two together, it becomes powerful.
The second point is that I think business intelligence experts will gradually mature in their role.
They really understand where and how data and analytics can drive business growth.
They have greater business acumen and are good at value engineering: identifying and validating sources of value creation.
Combine them with data scientists, and you have a powerful team.
Someone once asked me, what’s the difference between business intelligence and data science?
I’ve spent a lot of time really thinking about how these two work and how each changes the way you think about things.
I came to the conclusion that a team needs both.”
“The other thing that comes to mind,” Mann said, “is getting AI to do a lot of human work.
Business intelligence specialists have deep business knowledge that data scientists may not have, so they draw on their understanding of the business and their own intelligence to work out the problems they are trying to solve.
Data scientists, by contrast, often use machine learning and artificial intelligence for things like dealing with huge data sets.
Humans are really bad at observing at that scale, but machines are really good at it.
Thus, when exposed to large data sets, using machine learning becomes almost an inevitable choice for insight, whereas business intelligence professionals don’t necessarily need to adopt machine learning, just get the right data sets and use them in the right way to get the insights they need.”
“But it’s interesting that when we think about the impact of the coronavirus outbreak, we have to be able to use these machines to help us have a very detailed insight into every aspect of our customers, our employees, our products, our services, our operations,” Schmarzo said.
It is this level of granularity that allows us to get more out of it, and we simply seek to do more with less.
Traditionally, business intelligence has focused on categorizing aggregated data, looking at things at the aggregation level and making decisions.
As we try to do more with less money, we need those machines to tell us which patients are at risk for which diseases, and who is most at risk.”
Why is “dark data” important?
What should be an effective strategy for dark data?
“This is something we are really interested in,” Mann said.
“Splunk is an analytics and data processing company; customers use our analytics platform to process their data.
So data is really important to us, and our theory is that, whatever kind of data it is, the more of it you use, the better you can do.
So we partnered with an independent analyst firm, Enterprise Strategy Group (ESG), and asked them to validate some of our ideas about dark data.
Our basic assumption was that collecting more data makes the business better, and it turned out to hold up.
The ESG analysts looked at how companies can do better.
So they looked at indicators such as revenue, profitability and efficiency, and at what it meant for companies to find and use data.
They also asked about companies’ IT budgets and spending on data analytics, about their commitment to finding dark data, and about how efficiently they operate.
When you compare teams that are able to use more of the organization’s data with those that end up using less and are less committed to data, there are indeed significant differences in results.
When we talk about organizations using their dark data, all the data hidden in databases, log streams, edge devices, turbines or production lines, we find that those that collect more data get more out of it at lower cost.
That really is doing more with less.
They are also able to stay ahead of their competitors and are twice as likely to develop and launch products.
Moreover, they are twice as likely to exceed their customer-focused targets and 10 times as likely to generate more than 20 percent of revenue from new products and services over the next few years.
So data drives innovation directly.”
It’s all about mining unused data, but the question is: if teams are already fully occupied with the data they do use, how do they find the resources to mine that extra data?
“We actually work with our customers to do data source assessments,” Mann said.
“We look at where the data is, what data is available, and what it is used for.
You don’t necessarily need outside help to do this.
Such questions can be addressed by data scientists within the organization, because, as discussed earlier, the role of data scientists is to discover insights that have not yet been gained.
So enabling your data scientists to find the dark data, and to start developing strategies for using these unknowns to make the business better, is another way of looking at the world.”
“There are some very interesting things about dark data,” Schmarzo said.
How do you determine if the data is valuable?
How do you know that you should try to go back and find these data sources and bring them in?
We found that if we let use cases drive it, those use cases will help people distinguish what data is valuable.
It will eventually help distinguish noise from signals in the data.
As a result, many methods are very use-case-centric.
Select a use case, understand the actions to be performed, and then brainstorm which data sources you might want to look at.
This involves mining some old data.
Of course, the most prominent example of using dark data today comes from the coronavirus epidemic: South Korea immediately put its SARS and swine flu data to use.
They had collected a lot of data and made some correct predictions with it. That was 10 years ago, and it looked like useless data.
Who needs that data anymore?
But it turned out to be very valuable; it helped them make really sophisticated decisions.
As a result, organizations have large amounts of data buried in different parts of the organization.
The best way we found to solve the problem was to start from the use cases to be addressed, and then bring all the different stakeholders together to think about what data we had and what data we could work with, and start the process from there.
Many times, we find that business stakeholders and business analysts know what data might be useful.
Data scientists can then tell the company which data actually is useful.”
Even in this day and age, why is data analysis so difficult?
“So I think there are a lot of reasons.
I think it all stems from the notion that humans are generally not that good at numbers.
That’s not to say that some people aren’t very good at math, but numbers are a construct, and most people see things visually.
Humans can also use hearing and smell to learn more.
Moreover, people are not very good at dealing with contradictory thoughts.
So it’s one thing when data tells people something they don’t know, but it’s hard when data tells people something they don’t believe.
As a result, many people discard data that does not support previous claims.
It’s interesting to see, when people talk about the coronavirus epidemic, the recognition that more data needs to be collected and more testing done, and that using more data will change the results of these models.
So I don’t think people naturally gravitate toward data and analysis.
They naturally gravitate toward stories and ideas.
So, as I said before, it takes a unique mindset to be a data scientist.
But it also takes a unique ability on the part of executives to adapt and embrace new ideas from data scientists so that they can drive these initiatives.
Unfortunately, these are unusual human traits.”
“Humans are really bad at numbers and patterns,” Schmarzo said. “If you need any proof, go to a Las Vegas casino and gamble.
It is said that gambling is a tax on people who are not good at mathematics.
In addition, many people are looking for magic in data analysis.
The problem, of course, is the term “magic”.
There’s nothing magical about data analysis. It’s hard work.
There’s nothing magical about what we’re doing in data science, just a lot of hard work.
It’s really a process and a mindset.
We’re going to explore a lot of different ideas, we’re going to try a lot of different things, we’re going to keep failing, we’re going to keep iterating, and we’re going to keep learning in the process.
And that’s why a lot of what we’re trying to do is get executives to think like data scientists.
We have a whole set of approaches to attracting senior executives.
How do you get business people to think like a data scientist so that they embrace data and analytics?
In many cases, it requires them to unlearn what they’ve done before, to give up their old ways of working, and to be ready for a new learning process.”
“I think it’s because people are bad at numbers,” Mann says. “These days, people who are good at Excel count as data scientists, so to speak.
But I think the toolsets in use are also partly to blame.
Because data scientists are very smart people, they don’t mind working with complex and difficult toolsets.
I think as an IT leader, you need to create a simpler toolset.
One of the things we’re doing is getting people to plug open source algorithms into machine learning toolkits.
So people don’t have to be data scientists to adopt data science.
I think as leaders in IT and data, we can do a lot to make data science more accessible.”