THE PROCESS OF BIG DATA SOLUTION ADOPTION
An exploratory study within the Dutch telecom and energy utility sector

Master Management of Technology
Faculty Technology, Policy and Management
Delft University of Technology
August 2013

Name: Bas Verheij
Student number: 1178938
1st supervisor: Dr. Laurens Rook
2nd supervisor: Prof.dr.ir. Jan van den Berg
Chair: Prof.dr. Cees van Beers
External company: Accenture
External supervisor: Drs. Paul van der Linden MSc

ACKNOWLEDGEMENT

This thesis covers the development of a conceptual big data solution adoption process model. The model was designed using empirical findings from interviews with firms adopting big data solutions within the Dutch telecommunications and energy utilities sector. This document presents the results of my graduation research, which completes the master program Management of Technology (MOT) at Delft University of Technology.

Firstly, I would like to thank Paul van der Linden from Accenture for giving me the opportunity to conduct my research at Accenture and for helping me shape the research through our many meetings. Accenture provided a very fruitful environment for developing knowledge on the topic of big data. I am thankful to Paul van der Linden and the DD&A department for giving me the opportunity to conduct my research among large and interesting companies within the Netherlands.

Secondly, I would like to thank my first supervisor Laurens Rook for supervising my thesis, and especially for the support and discussions during the research process. I would also like to thank Jan van den Berg for his critical and accurate feedback, which has greatly helped my learning experience in performing scientific research.

Thirdly, I would like to thank my friends who helped me during my time in Delft and in my various study activities. In particular, I would like to thank the friends who helped me finish my Bachelor in Mechanical Engineering, which ultimately made this graduation possible.
Finally, I am very grateful to my family for supporting me throughout my life and my study period, and especially for encouraging me to persevere in my studies.

LIST OF ABBREVIATIONS

Abbreviation   Term
ACID           Atomicity, Consistency, Isolation, Durability
API            Application Programming Interface
AWS            Amazon Web Services
AWS EMR        Amazon Web Services Elastic MapReduce
BASE           Basically Available, Soft state, Eventual consistency
BI             Business Intelligence
BI&A           Business Intelligence & Analytics
CAP theorem    Consistency, Availability, Partition tolerance
DDD            Data-driven Decision Making
DM             Data Mining
DSS            Decision Support System
DW             Data Warehouse
EC2            Amazon Elastic Compute Cloud (AWS EC2)
ETL            Extraction-Transformation-Loading
HDFS           Hadoop Distributed File System
IS             Information System
MVCC           Multi-version Concurrency Control
OLAP           Online Analytical Processing
RDBMS          Relational Database Management System

TABLE OF CONTENTS

1 INTRODUCTION
  1.1 RESEARCH OBJECTIVE
  1.2 RESEARCH QUESTIONS
  1.3 RESEARCH FRAMEWORK
  1.4 STRUCTURE OF THIS REPORT
2 BACKGROUND
  2.1 INTRODUCTION
  2.2 BUSINESS INTELLIGENCE & ANALYTICS AND BIG DATA
  2.3 DATABASE TECHNOLOGY
  2.4 BIG DATA SOLUTIONS
  2.5 BIG DATA AND ORGANIZATIONS
  2.6 THEORY: IT INNOVATION ADOPTION
  2.7 CONCLUSION: BI GENERATIONS AND ADOPTION
3 METHODOLOGY
  3.1 CASE-STUDY RESEARCH APPROACH JUSTIFICATION
  3.2 CASE-STUDY RESEARCH STRATEGY
  3.3 PHASE OF PREPARATION
  3.4 PHASE OF DATA COLLECTION
  3.5 PHASE OF DATA ANALYSIS
  3.6 DESIGN OF CONCEPTUAL MODEL
  3.7 CASE-STUDY VALIDITY CONCERNS
4 BIG PRACTICES IN TWO INDUSTRIES
  4.1 ENERGY SECTOR OVERVIEW
  4.2 TELECOM SECTOR OVERVIEW
  4.3 OTHER SECTORS OVERVIEW
5 RESULTS OF THEMATIC ANALYSIS
  5.1 CASE STUDY RESULTS
  5.2 BUSINESS CASE DEVELOPMENT
  5.3 SOLUTION CHOICE
  5.4 ORGANIZATIONAL CHANGE
  5.5 INFORMATION PRIVACY
  5.6 IMPLEMENTATION AND FINE-TUNING PROBLEMS IN BIG DATA TECHNOLOGY
6 CONCEPTUAL MODEL
  6.1 CONSIDERATIONS: REQUIREMENTS FOR BIG DATA ADOPTION
  6.2 BIG DATA ADOPTION PROCESS PHASES
  6.3 ISSUES PERCEIVED IN THE ADOPTION PROCESS
  6.4 ADOPTION PROCESS MODEL
7 ANALYSIS
  7.1 IT INNOVATION ADOPTION (BUSINESS CASE DEVELOPMENT)
  7.2 RADICAL INNOVATION (ORGANIZATIONAL CHANGE)
  7.3 CHANGES IN IS ARCHITECTURES (SOLUTION CHOICE)
  7.4 INFORMATION PRIVACY AS A MAIN DRIVER FOR IS CHANGE (PRIVACY)
8 REFLECTION
  8.1 REFLECTION ON RESEARCH RESULTS
  8.2 QUALITY OF THE RESULTS
  8.3 REFLECTION ON RESEARCH PROCESS
9 CONCLUSION
  9.1 MAIN FINDINGS
  9.2 RESEARCH IMPLICATIONS
  9.3 MANAGERIAL IMPLICATIONS
  9.4 RESEARCH LIMITATIONS
  9.5 FUTURE RESEARCH
  9.6 CLOSURE
10 REFERENCES
APPENDIX A: PLANNING
APPENDIX B.1: INTERVIEW PROTOCOL
APPENDIX B.2: INTERVIEW CASE STUDY DESCRIPTION QUESTIONS
APPENDIX B.3: INTERVIEW PERCEPTIONS QUESTIONS
APPENDIX D: SAMPLE DETAILS
  D.1 GENERAL FIRM CHARACTERISTICS
  D.2 CHARACTERISTICS FIRM CASES
  D.3 GROUPING OF ADOPTION PROCESSES
APPENDIX F: MOST FREQUENTLY USED PREDICTORS OF ORG. IT ADOPTION (JEYARAJ ET AL., 2006)
APPENDIX G: LITERATURE RESEARCH
APPENDIX H: DATABASE LANDSCAPE (AS OF JUNE 2013)
APPENDIX I: NOSQL FUNDAMENTAL TECHNICAL CONCEPTS
  I.1 FUNDAMENTAL CONCEPTS
  I.2 TYPES OF DATA MODELS
  I.3 PERFORMANCE AND ELASTICITY

EXECUTIVE SUMMARY

Research Objective
The main purpose of this research is to design a conceptual big data solution adoption model by exploring the process of big data solution adoption within organizations. This multiple-case study gives a thorough description of the adoption process of big data solutions and the main issues organizations experience within this process.
Phenomenon
The term “big data” is primarily an umbrella term used within industry. At the time of this research, no clear definition had been found in the scientific literature. Gartner was the first firm within the industry to name the big data phenomenon, defining it in 2001 as “challenges and opportunities in data growth”. Gartner described these challenges and opportunities as having three facets: increasing volume, velocity and variety ("the three Vs"; Pettey & Goasduff, 2011). The McKinsey Global Institute published an industry report in 2011 that uses an intentionally subjective definition to capture the essence of the difference between data and big data. This definition is used as a principle in this thesis (Manyika, Chui, Brown, & Bughin, 2011, p. 1):

“Big data refers to datasets whose size is beyond the ability of typical database software tools to capture, store, manage, and analyze.”

The big data phenomenon is accompanied by a new generation of databases, so-called NoSQL databases, which can analyze large datasets of unstructured data in a distributed manner. Firms are expected to adopt these databases as the growing volume, velocity and variety of the data they need to analyze demands these new database technologies. Big data is widely, and wildly, cited in industry papers. However, analyses of actual activities within firms are scarce. This is surprising, considering the large potential impact of big data on society. This gap is also reflected in the forthcoming December 2013 special issue of MIS Quarterly Executive, titled “How to succeed in a world of big data”1. The issue specifically calls for papers on innovative business uses of data and on organizational approaches to developing and sustaining big data skills and capabilities, and these topics are also addressed within this research.
Methodology
This research was designed as a multiple-case study (Yin, 2009) in which people involved in big data practices in Dutch telecommunications and energy utility firms were interviewed. The case studies were summarized and the interviews were open-coded (Strauss & Corbin, 1998). Main concepts were then extracted from the codes following Eisenhardt's (1989) process of building theory from case studies.

Findings
Using eight cases within two Dutch sectors, five phases within the big data solution adoption process were found: a strategy development phase, a knowledge development phase, a pilot/test-case phase, a platform implementation phase and a fine-tuning phase. Issues described within the process of big data solution adoption were categorized into four main concepts: business case development issues, technical issues, organizational issues, and information privacy issues. These findings are depicted in the conceptual model below.

1 See http://misqe.org/ojs2/MISQE%20Dec%202013%20Special%20Issue.pdf. Retrieved 4/7/2013.
[Figure: Big data adoption process issues. Phases (columns): strategy, knowledge development, pilot/test-cases, platform implementation, fine-tuning. Issue categories (rows): business case development, technical, organizational, information privacy.]
Conceptual model of issues in the big data solution adoption process.

After analysis, these concepts led to twelve main issues in the process of big data solution adoption.

Business case development (strategy, knowledge development, and pilot/test-case phases)
1. A paradigm change is needed for a firm to recognize the value of big data business cases.
2. Business case development can be driven by either IT management support or top management support (in the case of new business development). It is hampered for three main reasons:
   a. The value and the return of the business case are unclear.
   b. High financial and time investments are needed before the value of the business case becomes clear.
   c. Baseline pressure is high, especially in small firms.
3. Information privacy concerns significantly impact business case development, as discussed under information privacy below.

Solution choice (pilot/test-case phase)
4. The choice for big data solutions (NoSQL databases) is mainly driven by the need for scalability. Scalability is also the main limiting factor of currently used IS. Unstructured data was not reported as a main determinant of big data solution choice.
5. Firms that had already adopted parallel databases for other activities currently see parallel databases as a viable alternative to big data (NoSQL) solutions.
6. Ecosystem complexity and management problems are reported in both the experimentation and implementation phases. These problems lead to significant delays in the implementation and fine-tuning phases.

Organizational change (knowledge development, pilot/test-case, and implementation phases)
7. An analytics department was perceived as necessary during the process of big data solution adoption. In all cases where such a department was not yet present, one was set up during the process, which caused significant delays.
8. Business process experts (or business consultants) are perceived as a crucial and necessary role in the process of big data solution adoption. They translate business case needs into information system requirements, taking into account legal and technical constraints.
9. Specific capabilities for setting up, managing and maintaining big data solution infrastructure were perceived as necessary in the process of big data solution adoption.

Information privacy (knowledge development, pilot/test-case, and implementation phases)
10. Information privacy significantly influences the big data solution adoption process in three main ways:
   a. Firms are uncertain about consumer attitudes and fear a negative brand image for new big data related products and services. Together these hamper business case development and experimentation, and they are seen as a main barrier to new business case development in part of the cases where personal data is used.
   b. The risk of changes in laws and regulations hampers business case development.
   c. Business cases can become unviable due to the high costs of privacy tooling in the implementation phase, that is, of connecting opt-in/out APIs to current IS. This hampers both implementation and new business case development.

Implementation and fine-tuning of big data technology (implementation and fine-tuning phases)
11. Data quality and accuracy issues were related to two main problems:
   a. Added complexity layers in big data platforms prevent traditional drill-down when data problems occur.
   b. Scaling effects due to large datasets mean that algorithms have to be retested and recalibrated.
12. Performance (fine-)tuning of big data solutions is perceived as complicated because big data ecosystems are highly complex and lack documentation.