
Statistical Thinking and Engineering in a Big Data World
Explore the challenges and opportunities in the Big Data world, highlighting the importance of statistical thinking and engineering for delivering tangible results. Discover insights on the current state of Big Data and the integration of analytics in various sectors.
Presentation Transcript
Statistical Thinking and Engineering in a Big Data World
- NCB Conference, June 24, 2021
- Roger W. Hoerl, Union College
Core Message
- The Big Data world is here, but has not produced the results that many of us were expecting. In short, it has over-promised and under-delivered.
- While many are looking for even more powerful methods to address this shortfall, I don't feel that a lack of powerful methods is the root cause of the problem.
- I argue that the fundamentals of sound statistical thinking have been overlooked, perhaps because they are no longer viewed as important in a Big Data world.
- Further, the engineering aspect of statistics/data science, that is, how to link and integrate tools in a sequential manner to solve complex problems, is grossly under-developed.
- Integrating the principles of statistical thinking, and augmenting statistical science with statistical engineering, can go a long way towards producing the tangible results expected of Big Data.
Outline
- The Big Data world is here
- So what is the problem?
- Is there still a role for statistical thinking?
- What about statistical engineering?
- Michael I. Jordan viewpoint
- A potential way forward
- Summary
The Big Data World is Here
- The Big Data world of AI, machine learning, and data science is not coming, but is clearly already here
- Some things you wouldn't be able to do today without analytics:
  - Be approved for a mortgage or auto loan
  - Obtain life or auto insurance
  - Participate in online dating services
  - Invest in a mutual fund
  - Utilize GPS directions (Garmin, Google Maps, etc.)
  - Perform a Google search
  - Participate in online sports gambling
- Analytics are Already Embedded into Medicine, Science and Global Business
So What is the Problem?
- I have noticed a growing divergence between the technical literature on Big Data and what appears in the business literature (HBR, SML, etc.) and on private blogs
- The technical literature tends to focus on the success stories and promise of the tools; the "half full" part
- Conversely, in the business literature and blogs I am now seeing more articles questioning the actual payoff of Big Data versus the hype; i.e., the "half empty" part
- Is the Big Data World Delivering What We Expected?
Two Examples
- Earnest: I'm talking with the business (CEO, sales, marketing, planning, finance, logistics, manufacturing, ...) managers of the companies, who are complaining that machine learning (ML) and artificial intelligence (AI) are not for them, and pointing to the efforts they made to hire their brilliant data scientists. Yet, not much value came out of that team. From my experience, the main reason is the miscommunication between two very different disciplines. The business is focused on the strengths and weaknesses of psychology and humans, while data science is focused on the strengths and weaknesses of math and computers.
- Source: "Data Scientists are from Mars, Business People from Venus"
Two Examples
- Ross: The mad dash accelerated as quickly as the pandemic. Researchers sprinted to see whether artificial intelligence could unravel Covid-19's many secrets, and for good reason. There was a shortage of tests and treatments for a skyrocketing number of patients. Maybe AI could detect the illness earlier on lung images, and predict which patients were most likely to become severely ill. Hundreds of studies flooded onto preprint servers and into medical journals claiming to demonstrate AI's ability to perform those tasks with high accuracy. It wasn't until many months later that a research team from the University of Cambridge in England began examining the models, more than 400 in total, and reached a much different conclusion: Every single one was fatally flawed.
- Source: "Machine Learning is Booming in Medicine. It's Also Facing a Credibility Crisis"
What's Missing?
- Our quandary:
  - All other things being equal, Big Data is better than little data
  - Newer data mining tools are powerful and can work quite well, e.g., support vector machines (SVM), neural nets ("Deep Learning"), and methods based on bootstrapping, such as Random Forests
  - Yet, the tangible payoff lags expectations; why?
- Clearly, we are missing something in the equation
- Could It Be That the Fundamentals Are Still Important?
Is There Still a Role for Statistical Thinking?
- Statistical Thinking: A philosophy of learning and action based on the following fundamental principles*:
  - All work occurs in a system of interconnected processes
  - Variation exists in all processes
  - Understanding and reducing variation are keys to success
- A key issue noted by Ross is that most COVID modelers did not carefully consider the process that produced the data that they analyzed
  - Documentation of the data pedigrees* involved would have highlighted serious limitations that were overlooked
  - Further, potential sources of variation, such as between adolescents and adults, were neglected
- Earnest highlights the cultural variation between business people and data scientists, which neither group seems to fully understand
- *Hoerl and Snee, Statistical Thinking, 3rd ed.
- Massive Data Sets Cannot Replace Critical Thinking
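As a small illustration of the point about neglected sources of variation, the sketch below splits the total variance of a pooled data set into a within-group part and a between-group part. The readings, the group labels, and the function name `variance_breakdown` are hypothetical and invented for illustration only; they are not taken from the talk or from the COVID studies that Ross describes.

```python
from statistics import mean, pvariance

# Hypothetical measurements, made up for illustration (not real study data).
readings = {
    "adolescent": [2.0, 2.2, 1.8, 2.1, 1.9],
    "adult": [4.0, 4.1, 3.9, 4.2, 3.8],
}


def variance_breakdown(groups):
    """Split the population variance of pooled data into within- and between-group parts."""
    all_values = [v for values in groups.values() for v in values]
    n = len(all_values)
    grand_mean = mean(all_values)

    # Within-group part: size-weighted average of each group's own variance.
    within = sum(len(values) * pvariance(values) for values in groups.values()) / n

    # Between-group part: size-weighted squared distance of group means from the grand mean.
    between = sum(
        len(values) * (mean(values) - grand_mean) ** 2 for values in groups.values()
    ) / n

    return {"total": pvariance(all_values), "within": within, "between": between}


if __name__ == "__main__":
    for name, value in variance_breakdown(readings).items():
        print(f"{name}: {value:.3f}")
```

With these made-up numbers the between-group component is far larger than the within-group component, so a model fitted to the pooled data without regard to group would be describing mostly the difference between the two populations.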
What About Statistical Engineering?
- Another issue noted by Ross, and many other authors, is the distinction between fitting models and solving real problems
- Unfortunately, structured approaches to problem solving are more associated with engineering than statistical science
- Consider the following slide, which was presented by Michael I. Jordan (Berkeley) in 2019
- Is this a crazy idea, or is Jordan perhaps onto something important?
- An Engineering Viewpoint Seems to be Missing
Statistical Engineering
- Presented at the Symposium on Statistics in the Data Science Era, University of Michigan, 9/20/2019
Statistical Engineering
- Michael I. Jordan (Berkeley): society needs us to solve problems, to carry out the statistical analogue of building a bridge or electrifying a city. We're often kidding ourselves regarding discovering truth.
- Xiao-Li Meng (Harvard): Developed new course that emphasizes deep, broad, and creative statistical thinking instead of technical problems that correspond to a recognizable textbook chapter.
- Susan Hockfield (Former President of MIT):
  - Science develops the fundamental parts list (periodic table, human genome, etc.)
  - Engineering figures out how to build something of value to society from the parts list
- Leaders Have Realized that the Engineering Component is Critical
Statistical Engineering
- International Statistical Engineering Association (ISEA) Definition: The discipline of statistical engineering is the study of the systematic integration of statistical concepts, methods, and tools, often with other relevant disciplines, to solve important problems sustainably.
- Key phrases and words:
  - "Discipline" ("the study of"): not a collection of tools
  - "Integration": involves multiple methods/disciplines
  - "Other relevant disciplines": not limited to statistics or engineering
  - "Solve important problems": problem or opportunity oriented versus tool oriented; statistical engineering is tool agnostic
  - "Sustainably": long-term success is key
- The Engineering Component to Complement the Science Component
Statistical Engineering
- An observation: Scientists, engineers, and statisticians have been building something useful from the statistical science parts list of tools for a long time, to address large, complex, unstructured problems.
- However:
  - This was typically done in an ad-hoc manner, with little or no underlying theory documented to provide guidance to others.
  - Applications were typically one-offs, requiring the wheel to be reinvented with each new problem.
- Relying on one-offs and ad-hoc methods may work for a given problem, but is not how an engineering discipline develops.
- Statistical Engineering is a New Discipline, But Built on the Work of Statistical Pioneers
Statistical Engineering
Is | Is Not
Engineering solutions to large, complex, unstructured (LCU) problems | Applied statistics
A holistic approach | A purely technical approach
Tool agnostic | A recommended set of tools
Based on the scientific method | Based on algorithms & number crunching
Viewing data as a means to an end; i.e., data are a "how" | Viewing data as an end in themselves; i.e., data are the "what"
Neutral and broad in application area | Engineering statistics
- Statistical Engineering is to Statistics What Chemical Engineering is to Chemistry
Statistical Engineering
- [Figure: The Typical Phases of Statistical Engineering Projects]
- Phases: Identify Problem; Provide Structure; Understand Context; Develop Strategy; Develop & Execute Tactics; Identify & Deploy Final Solution
- Labels on the diagram: Right Problem; Across Silos; Clarify Mess; Define Problem; Agree on Metrics; History; Politics; Personalities; Identify Alternatives; Select Methods & Apply; How to Attack; Sequential Approach; Use Core Processes; Verify Success; Sustainability
- Ongoing cycle of improvement through the scientific method
- Large, Complex, Unstructured Problems Require a Different Approach
Statistical Engineering
- [Figure: The Typical Phases of Statistical Engineering Projects, repeated from the previous slide]
- Annotation on the diagram: Not historically addressed well in published case studies!
- Structuring, Digging into Context, and Developing an Overall Strategy are Key
International Statistical Engineering Association
- ISEA legally incorporated in 2018
- Over 360 members as of today
- Individual membership is free
- Holds annual Statistical Engineering Summit (November this year)
- Now sponsoring Stu Hunter Research Conference
- Holding regular webinars
- Writing Statistical Engineering Handbook
  - Handbooks are common for engineering disciplines
  - Available to members on ISEA website (isea-change.org)
  - Several chapters available now; completion of 1st ed. planned for September
- From Inception to Over 360 Members in Three Years
Summary
- Our core message is that the statistics/analytics profession has the potential to significantly enhance its impact on society
- The proven principles of statistical thinking are still relevant in a Big Data World
- Michael I. Jordan pointed out the need for an engineering mindset to balance a statistical/data science mindset
  - Integration of multiple methods to solve large, complex, unstructured problems is a unique, and to a large degree, unmet niche
- ISEA has taken some specific steps to address this oversight
- The Statistical Engineering Handbook is nearly finished
  - Most chapters are complete and are available to members on the ISEA website
- Lots of work remains, including further development of the underlying theory of statistical engineering