KARELIA UNIVERSITY OF APPLIED SCIENCES
Degree Programme in Applied Computer Sciences

Jonas Lesy
Ruben Vervaeke

APPLYING INTERNET OF THINGS – SMART CITY

Thesis
June 2015

THESIS
June 2015
Degree Programme in Applied Computer Sciences
Tikkarinne 9
80200 JOENSUU
FINLAND
Tel. 358-13-260 600

Author(s): Jonas Lesy & Ruben Vervaeke
Title: Applying Internet of Things – Smart City
Commissioned by: Karelia University of Applied Sciences

Abstract

This thesis describes the progress and results of the Final Project carried out by Ruben Vervaeke and Jonas Lesy during the second semester of 2014-2015. The project consisted of finding a solution for Process Genius, a company that needed an easy way to collect data from various public service providers. From bus schedules to the opening and closing times of the city’s bridges, they wanted all kinds of data to be gathered in a central place. The purpose of the project was therefore to build a data warehouse and a web service to access the required data.

The first part of this thesis describes the context of the project and the working procedures. It gives a more in-depth description of the purpose and outline of the project and explains how the project was handled, including the planning stage and the preliminary steps that were necessary before development could start. This is followed by the theory required to understand the work, which introduces the important concepts on which the system is built. After that, the implementation is described: the steps taken to build the data warehouse and the web services, and how they interact with each other. The complete system is explained and its core components are discussed in detail. The thesis ends with the results and discussion of the project.
That part discusses to what extent the goals of the project were reached, which difficulties were encountered and what future work could be done.

Language: English
Pages: 125
Appendices: 3
Pages of appendices: 7

Keywords: Internet of Things, Hadoop, Big Data, Data warehouse

CONTENTS

1 INTRODUCTION
2 ACTION PLAN
  2.1 Project background
    2.1.1 Organization
    2.1.2 Mission and vision
  2.2 Problem description
  2.3 Goals and project outline
  2.4 Project approach
3 INTERNET OF THINGS
4 BIG DATA
5 DATA WAREHOUSES
6 HADOOP
  6.1 Introduction
  6.2 Components
  6.3 MapReduce
    6.3.1 The basics
    6.3.2 The process of a MapReduce Job run
    6.3.3 Failures in YARN
  6.4 The Hadoop Distributed File System
    6.4.1 Design
    6.4.2 Concepts
    6.4.3 Data flow
  6.5 Hadoop input and output
    6.5.1 Serialization
7 HBASE
  7.1 Data Model
  7.2 Data operations
  7.3 HBase Schemas
8 CLOUDERA
9 DEVELOPING TOOLS
  9.1 Hardware
  9.2 Software
10 PRACTICAL APPLICATION
  10.1 Network setup
  10.2 Cloudera installation
  10.3 Application
    10.3.1 Data retrieval
    10.3.2 Data storage
    10.3.3 Data providing
    10.3.4 Error handling
    10.3.5 Workflow
11 REFLECTION
  11.1 Difficulties
  11.2 Future thoughts
  11.3 What did we learn
  11.4 Workload balancing
REFERENCES

LIST OF IMAGES

Figure 1 - Relational database model example
Figure 2 - Data warehouse overview
Figure 3 - MapReduce workflow
Figure 4 - Reducejob result
Figure 5 - Possible job scheduling assignments
Figure 6 - Map outputs to one reduce input workflow
Figure 7 - Map outputs to multiple reduce inputs workflow
Figure 8 - MapReduce job execution workflow
Figure 9 - MapReduce status check workflow
Figure 10 - MapReduce name- and datanodes
Figure 11 - HDFS file read workflow
Figure 12 - HDFS file write workflow
Figure 13 - HBase table example (myTable)
Figure 14 - Cloudera Manager overview
Figure 15 - Cloudera Health History
Figure 16 - Cloudera statistics example
Figure 17 - X2Go gateway connection
Figure 18 - X2Go connecting to virtual servers
Figure 19 - X2Go machine overview
Figure 20 - Trello overview
Figure 21 - Network setup
Figure 22 - Cloudera role instances
Figure 23 - Application architecture overview
Figure 24 - Data domain class model
Figure 25 - HBase VehicleDetectionReadings schema

LIST OF TABLES

Table 1 - Common Writable implementations
Table 2 - RDBMS versus HBase key sorting mechanism
Table 3 - Cloudera installation paths
Table 4 - Scheduling types
Table 5 - Consequences of resource modification
Table 6 - Consequences of service modification
Table 7 - Consequences of city modification
Table 8 - REST HTTP requests example

APPENDICES

APPENDIX 1 TERMINAL OUTPUT OF MAPREDUCE JOB
APPENDIX 2 DATASCHEDULER
APPENDIX 3 DATASCHEDULEDTASK

ABBREVIATIONS

The abbreviations used throughout this thesis are explained in this list.

BLL    Business Logic Layer, a layer often implemented in programs that connect to a database, which prevents invalid data operations.
CDH    Cloudera’s Distribution including Apache Hadoop, Cloudera’s open source platform built around Apache Hadoop.
CRLF   Carriage Return Line Feed, the character sequence that marks the end of a line. It is often used when transmitting messages so that the receiving system knows that the end of a line or message has been reached.
DAO    Data Access Object, a layer implemented in applications to connect to a database, often used for retrieving data from a database.
DWH    Data Warehouse, a complete system built to answer requests for data immediately without overloading the original sources of the data. It is most commonly used for analytical purposes.
ETL    Extract, Transform, Load, the part of a data warehouse that collects and unifies data from source files to make it usable for analysis.
HDFS   Hadoop Distributed File System, the file system Hadoop uses to store data and files across the machines of a cluster.
HTML   HyperText Markup Language, the standard language for creating web pages.
HTTP   HyperText Transfer Protocol, the protocol that provides communication between a web client and a web server.
IDE    Integrated Development Environment, a software application that combines tools such as an editor, a compiler and a debugger for developing software.
JAR    Java Archive, a ZIP-based package format used to bundle Java class files and their associated resources into a single file.
JDK    Java Development Kit, the software package developers need to program in the Java language.
JSON   JavaScript Object Notation, a standardised format for defining data objects and their attributes. JSON is easy for humans to read and is often used to transmit data.
JVM    Java Virtual Machine, an environment for executing Java bytecode. For example, compiled Java code runs on a JVM.
RDBMS  Relational Database Management System, a database management system used to manage relational databases.
RPC    Remote Procedure Call, a technique that allows an application to execute a procedure on another machine as if it were a local call, without handling the underlying network communication itself.
UI     User Interface, the interface that makes interaction between the user and the system possible.
URL    Uniform Resource Locator, a structured reference that specifies where a resource is located and how to retrieve it, for example a website or a local file.
XML    Extensible Markup Language, a standard for creating formal, structured files that store data such as configuration settings. Its representation is both human-readable and machine-readable.
YARN   Yet Another Resource Negotiator, the resource management and job scheduling layer introduced in Hadoop 2. MapReduce jobs run on top of YARN.
1 INTRODUCTION

This thesis is written by Ruben Vervaeke and Jonas Lesy, two Belgian students, and is the result of our Final Project carried out during our Erasmus exchange. This Final Project accounts for 22 credits and is the final step towards obtaining a degree in Applied Computer Sciences.

This document is formatted according to the instructions of the thesis committee at Karelia University of Applied Sciences, and these guidelines were followed throughout the entire process of writing this thesis. To make the technical aspects easier to follow, a different font type is used for class names, attributes, code samples and commands.

This report is the result of about twelve weeks of work on the final project. The aim of this thesis is to inform the reader about the development and details of the project. It is written so that anyone with some basic experience in the subject can fully understand it. Technical terms, abbreviations and jargon are explained in a way that anyone with basic IT knowledge can comprehend.

To start off, we would like to thank Mr. P. Laitinen for giving us the opportunity to work on this project and for providing us with such an interesting subject. Without him, there would have been no project and this document would not have been written. We would also like to thank Mr. J. Ranta for monitoring our project and taking the time to organize meetings together with Mr. Laitinen.

In addition, we would like to thank everyone at Process Genius for letting us help them find a solution to their problem. They were always friendly and provided us with all of the information we needed to continue the development of this project.

Finally, we want to thank our family and friends, who supported us throughout this project. Their support kept us motivated, and motivation is key to achieving any goal.
2 ACTION PLAN

2.1 Project background

2.1.1 Organization

This project was conducted in co-operation with Process Genius, a company based in Joensuu. The company was founded in 2011 and specializes in cutting-edge 3D online services: it provides 3D models made especially for industrial process plants and the sales organizations that supply them. These models are very useful because Process Genius can display all of the necessary and important data on them. For example, when a customer’s power plant is down or malfunctioning, the cause can be seen immediately on the 3D model, which makes fixing the problem or investigating the malfunction quicker and more efficient. The company provides a complete solution: it conducts research at the customer’s power plants and then delivers the 3D model together with all of the important data. A user-friendly experience is important to them, and so is productivity.

2.1.2 Mission and vision

The founder’s passion is to combine their know-how in scientific topics with methods for developing next-generation tools. These tools are created to optimize the user experience and to boost sales. The company has a wide global partner network that gives it a strong position in a competitive market. In addition, it has a highly skilled team to develop its tools, with particular strengths in graphical design, industrial knowledge and web application development. In short, they have everything they need to deliver high-quality products to their customers.

2.2 Problem description

The employees at Process Genius develop complete solutions for industrial and technological companies. They came up with the idea of applying their project and technology to help the citizens of Joensuu. At the moment, data from many different public services is not easily accessible. For example, there is no easy way to check the bus schedule. The city of Joensuu already has a website for this, but it is very unclear and inefficient.
We tested this website ourselves and agree with this assessment: the website is not user-friendly and the schedules are hard to read. Besides the bus data, Process Genius also wants to display other data, such as when the bridges are raised and where the snow ploughs are. All of this will be presented on their 3D model of the city of Joensuu. The idea is that a user can, for example, click on a bus stop on the map and see when a specific bus will pass there. Like the solutions Process Genius usually delivers, it is intended to be an all-in-one solution.

2.3 Goals and project outline

As Process Genius stated, they want access to the data so they can use it to display information on their 3D model. This is where we, as a project team, came in. Our task was to provide them with the data so they could access it whenever they wanted. In other words, our task was to set up a local data store that they could access. We chose to set up a data warehouse for this purpose.