
Making Things See: 3D vision with Kinect, Processing, Arduino, and MakerBot


Preview of Making Things See: 3D vision with Kinect, Processing, Arduino, and MakerBot

Making Things See
Greg Borenstein
Editor: Brian Jepson

Copyright © 2012 Greg Borenstein

O’Reilly Media books may be purchased for educational, business, or sales promotional use. Online editions are also available for most titles (my.safaribooksonline.com). For more information, contact our corporate/institutional sales department: 800-998-9938 or [email protected].

The O’Reilly logo is a registered trademark of O’Reilly Media, Inc. Many of the designations used by manufacturers and sellers to distinguish their products are claimed as trademarks. Where those designations appear in this book, and O’Reilly Media, Inc., was aware of a trademark claim, the designations have been printed in caps or initial caps.

While every precaution has been taken in the preparation of this book, the publisher and author assume no responsibility for errors or omissions, or for damages resulting from the use of the information contained herein.

Dedication

For Jacob and Ellie and Sophie and Amalia. The future is yours.

Preface

When Microsoft first released the Kinect, Matt Webb, CEO of design and invention firm Berg London, captured the sense of possibility that had so many programmers, hardware hackers, and tinkerers so excited: “WW2 and ballistics gave us digital computers. Cold War decentralization gave us the Internet. Terrorism and mass surveillance: Kinect.”

Why the Kinect Matters

The Kinect announces a revolution in technology akin to those that shaped the most fundamental breakthroughs of the 20th century. Just like the premiere of the personal computer or the Internet, the release of the Kinect was another moment when the fruit of billions of dollars and decades of research that had previously only been available to the military and the intelligence community fell into the hands of regular people. Face recognition, gait analysis, skeletonization, depth imaging—this cohort of technologies that had been developed to detect terrorists in public spaces could now suddenly be used for creative civilian purposes: building gestural interfaces for software, building cheap 3D scanners for personalized fabrication, using motion capture for easy 3D character animation, using biometrics to create customized assistive technologies for people with disabilities, etc.

While this development may seem wide-ranging and diverse, it can be summarized simply: for the first time, computers can see. While we’ve been able to use computers to process still images and video for decades, simply iterating over red, green, and blue pixels misses most of the amazing capabilities that we take for granted in the human vision system: seeing in stereo, differentiating objects in space, tracking people over time and space, recognizing body language, etc. For the first time, with this revolution in camera and image-processing technology, we’re starting to build computing applications that take these same capabilities as a starting point. And, with the arrival of the Kinect, the ability to create these applications is now within the reach of even weekend tinkerers and casual hackers. Just like the personal computer and Internet revolutions before it, this Vision Revolution will surely also lead to an astounding flowering of creative and productive projects.
Comparing the arrival of the Kinect to the personal computer and the Internet may sound absurd. But keep in mind that when the personal computer was first invented, it was a geeky toy for tinkerers and enthusiasts. The Internet began life as a way for government researchers to access one another’s mainframe computers. All of these technologies came to assume their critical roles in contemporary life only slowly, as individuals used them to make creative and innovative applications that eventually became fixtures in our daily lives. Right now it may seem absurd to compare the Kinect with the PC and the Internet, but a few decades from now, we may look back on it and compare it with the Altair or the ARPAnet as the first baby step toward a new technological world.

The purpose of this book is to provide the context and skills needed to build exactly these projects that reveal this newly possible world. Those skills include:

- Working with depth information from 3D cameras
- Analyzing and manipulating point clouds
- Tracking the movement of people’s joints
- Background removal and scene analysis
- Pose and gesture detection

The first three chapters of this book will introduce you to all of these skills. You’ll learn how to implement each of these techniques in the Processing programming environment. We’ll start with the absolute basics of accessing the data from the Kinect and build up your ability to write ever more sophisticated programs throughout the book. Learning these skills means not just mastering a particular software library or API, but understanding the principles behind them so that you can apply them even as the practical details of the technology rapidly evolve.

And yet even mastering these basic skills will not be enough to build the projects that really make the most of this Vision Revolution. To do that, you also need to understand some of the wider context of the fields that will be revolutionized by the cheap, easy availability of depth data and skeleton information. To that end, this book will provide introductions and conceptual overviews of the fields of 3D scanning, digital fabrication, robotic vision, and assistive technology. You can think of these sections as teaching you what you can do with the depth and skeleton information once you’ve gotten it. They will include topics such as:

- Building meshes
- Preparing 3D models for fabrication
- Defining and detecting gestures
- Displaying and manipulating 3D models
- Designing custom input devices for people with limited ranges of motion
- Forward and inverse kinematics

In covering these topics, our focus will expand outward from simply working with the Kinect to using a whole toolbox of software and techniques. The last three chapters of this book will explore these topics through a series of in-depth projects. We’ll write a program that uses the Kinect as a scanner to produce physical objects on a 3D printer, we’ll create a game that will help a stroke patient with physical therapy, and we’ll construct a robot arm that copies the motions of your actual arm. In these projects, we’ll start by introducing the basic principles behind each general field and then see how our newfound knowledge of programming with the Kinect can put those principles into action. But we won’t stop with Processing and the Kinect. We’ll work with whatever tools are necessary to build each application, from 3D modeling programs to microcontrollers.
This book will not be a definitive reference to any of these topics; each is vast, comprehensive, and filled with its own fascinating intricacies. Instead, this book aims to serve as a provocative introduction to each area—giving you enough context and techniques to start using the Kinect to make interesting projects, and hoping that your progress will inspire you to follow the leads provided and investigate further.

Who This Book Is For

At its core, this book is for anyone who wants to learn more about building creative interactive applications with the Kinect, from interaction and game designers who want to build gestural interfaces, to makers who want to work with a 3D scanner, to artists who want to get started with computer vision. That said, you will get the most out of it if you are one of the following: a beginning programmer looking to learn more sophisticated graphics and interaction techniques, specifically how to work in three dimensions, or an advanced programmer who wants a shortcut to learning the ins and outs of working with the Kinect and a guide to some of the specialized areas it enables.

You don’t have to be an expert graphics programmer or an experienced user of Processing to get started with this book, but if you’ve never programmed before, there are probably much better places to start. As a starting point, I’ll assume that you have some exposure to the Processing creative coding language (or can teach yourself that as you go). You should know the basics from Getting Started with Processing by Casey Reas and Ben Fry (http://shop.oreilly.com/product/0636920000570.do), Learning Processing by Dan Shiffman (http://learningprocessing.com), or the equivalent. This book is designed to proceed slowly from introductory topics into more sophisticated code and concepts, giving you a smooth introduction to the fundamentals of making interactive graphical applications while teaching you about the Kinect. At the beginning, I’ll explain nearly everything about each example, and as we go I’ll leave more and more of the details for you to figure out. The goal is for you to level up from a beginner to a confident intermediate interactive graphics programmer.

The Structure of This Book

The goal of this book is to unlock your ability to build interactive applications with the Kinect. It’s meant to make you into a card-carrying member of the Vision Revolution I described at the beginning of this introduction. Membership in this Revolution has a number of benefits. Once you’ve achieved it, you’ll be able to play an invisible drum set that makes real sounds, make 3D scans of objects and print copies of them, and teach robots to copy the motions of your arm. However, membership in this Revolution does not come for free. To gain entry into its ranks, you’ll need to learn a series of fundamental programming concepts and techniques. These skills are the basis of all the more advanced benefits of membership, and all of those cool abilities will be impossible without them. This book is designed to build up those skills one at a time, starting from the simplest and most fundamental and building toward the more complex and sophisticated. We’ll start out with humble pixels and work our way up to intricate three-dimensional gestures.

Toward this end, the first half of this book will act as a kind of primer in these programming skills. Before we dive into controlling robots or 3D printing our faces, we need to start with the basics. The first four chapters of this book cover the fundamentals of writing Processing programs that use the data from the Kinect.
Processing is a creative coding environment that uses the Java programming language to make it easy for beginners to write simple interactive applications that include graphics and other rich forms of media. As mentioned previously, this book assumes basic knowledge of Processing (or equivalent programming chops), but as we go through these first four chapters, I’ll build up your knowledge of some of the more advanced Processing concepts that are most relevant to working with the Kinect. These concepts include looping through arrays of pixels, basic 3D drawing and orientation, and some simple geometric calculations. I will attempt to explain each of these concepts clearly and in depth. The idea is for you not just to have a few project recipes that you can make by rote, but to actually understand enough of the flavor of the basic ingredients to be able to invent your own “dishes” and modify the ones I present here. At times, you may feel that I’m beating some particular subject to death, but stick with it—you’ll frequently find that these details become critically important later on when you’re trying to get your own application ideas to work.

One nice side benefit of this approach is that these fundamental skills are relevant to a lot more than just working with the Kinect. If you master them here in the course of your work with the Kinect, they will serve you well throughout all your other work with Processing, unlocking many new possibilities and pushing you decisively beyond beginner status.

There are three fundamental techniques that we need to build all of the fancy applications that make the Kinect so exciting: processing the depth image, working in 3D, and accessing the skeleton data. From 3D scanning to robotic vision, all of these applications measure the distance of objects using the depth image, reconstruct the image as a three-dimensional scene, and track the movement of individual parts of a user’s body. The first half of this book will serve as an introduction to each of these techniques. I’ll explain how the data provided by the Kinect makes these techniques possible, demonstrate how to implement them in code, and walk you through a few simple examples to show what they might be good for.

Working with the Depth Camera

First off, you’ll learn how to work with the depth data provided by the Kinect. The Kinect uses an IR projector and camera to produce a “depth image” of the scene in front of it. Unlike conventional images, in which each pixel records the color of light that reached the camera from that part of the scene, each pixel of this depth image records the distance from the Kinect of the object in that part of the scene. When we look at depth images, they will look like strangely distorted black-and-white pictures. They look strange because the color of each part of the image indicates not how bright that object is, but how far away it is. The brightest parts of the image are the closest, and the darkest parts are the farthest away. If we write a Processing program that examines the brightness of each pixel in this depth image, we can figure out the distance of every object in front of the Kinect. Using this same technique and a little bit of clever coding, we can also follow the closest point as it moves, which can be a convenient way of tracking a user for simple interactivity.
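As a taste of what that code looks like, here is a minimal sketch of the closest-point idea. It assumes the SimpleOpenNI Processing library for talking to the Kinect; the library name, setup calls, and depth range used here are assumptions that vary by version and platform, so treat this as an outline of the technique rather than finished code.

    import SimpleOpenNI.*;

    SimpleOpenNI kinect;

    void setup() {
      size(640, 480);
      kinect = new SimpleOpenNI(this);
      kinect.enableDepth();                 // ask the Kinect for depth data
    }

    void draw() {
      kinect.update();
      image(kinect.depthImage(), 0, 0);     // show the grayscale depth image

      // depthMap() gives one distance reading (in millimeters) per pixel
      int[] depthValues = kinect.depthMap();
      int closestValue = 8000;              // start beyond the Kinect's useful range
      int closestX = 0;
      int closestY = 0;

      for (int y = 0; y < 480; y++) {
        for (int x = 0; x < 640; x++) {
          int currentDepth = depthValues[x + y * 640];  // row-major index into the array
          // 0 means "no reading"; otherwise keep the smallest distance seen so far
          if (currentDepth > 0 && currentDepth < closestValue) {
            closestValue = currentDepth;
            closestX = x;
            closestY = y;
          }
        }
      }

      // mark the closest point: a crude but effective way to track an outstretched hand
      fill(255, 0, 0);
      ellipse(closestX, closestY, 25, 25);
    }

Pointing a hand toward the Kinect makes it the closest object in the scene, so the red dot follows it around the image: simple interactivity from nothing but a loop over depth pixels.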
Working with Point Clouds

This first approach treats the depth data as if it were only two-dimensional. It looks at the depth information captured by the Kinect as a flat image when really it describes a three-dimensional scene. In the third chapter, we’ll start looking at ways to translate these two-dimensional pixels into points in three-dimensional space. For each pixel in the depth image, we can think of its position within the image as its x-y coordinates. That is, if we’re looking at a pixel that’s 50 pixels in from the top-left corner and 100 pixels down, it has an x-coordinate of 50 and a y-coordinate of 100. But the pixel also has a grayscale value, and we know from our initial discussion of the depth image that each pixel’s grayscale value corresponds to the depth of the object in front of it. Hence, that value will represent the pixel’s z-coordinate.

Once we’ve converted all our two-dimensional grayscale pixels into three-dimensional points in space, we have what is called a point cloud—that is, a bunch of disconnected points floating near each other in three-dimensional space in a way that corresponds to the arrangement of the objects and people in front of the Kinect. You can think of this point cloud as the 3D equivalent of a pixelated image. While it might look solid from far away, if we look closely, the image will break down into a bunch of distinct points with space visible between them. If we wanted to convert these points into a smooth, continuous surface, we’d need to figure out a way to connect them with a large number of polygons to fill in the gaps. This is a process called constructing a mesh, and it’s something we’ll cover extensively later in the book in the chapters on physical fabrication and animation.

For now, though, there’s a lot we can do with the point cloud itself. First of all, the point cloud is just cool. Having a live 3D representation of yourself and your surroundings on your screen that you can manipulate and view from different angles feels a little bit like being in the future. It’s the first time in using the Kinect that you’ll get a view of the world that feels fundamentally different from those you’re used to seeing through conventional cameras. To make the most of this new view, you’re going to learn some of the fundamentals of writing code that navigates and draws in 3D.

When you start working in 3D, there are a number of common pitfalls that I’ll try to help you avoid. For example, it’s easy to get so disoriented as you navigate in 3D space that the shapes you draw end up not being visible. I’ll explain how the 3D axes work in Processing and show you some tools for navigating and drawing within them without getting confused. Another frequent area of confusion in 3D drawing is the concept of the camera. To translate our 3D points from the Kinect into a 2D image that we can actually draw on our flat computer screens, Processing uses the metaphor of a camera. After we’ve arranged our points in 3D space, we place a virtual camera at a particular spot in that space, aim it at the points we’ve drawn, and, basically, take a picture. Just as a real camera flattens the objects in front of it into a 2D image, this virtual camera does the same with our 3D geometry. Everything that the camera sees gets rendered onto the screen from the angle and in the way that it sees it. Anything that’s out of the camera’s view doesn’t get rendered. I’ll show you how to control the position of the camera so that all of the 3D points from the Kinect that you want to see end up rendered on the screen. I’ll also demonstrate how to move the camera around so we can look at our point cloud from different angles without ever having to physically move the Kinect.
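Here is a minimal sketch of that pixel-to-point translation, again assuming the SimpleOpenNI library. The spacing, scaling, and rotation values are arbitrary choices made for illustration; the point is simply that image position becomes x and y while measured distance becomes z.

    import SimpleOpenNI.*;

    SimpleOpenNI kinect;

    void setup() {
      size(1024, 768, P3D);                 // the P3D renderer lets us draw in three dimensions
      kinect = new SimpleOpenNI(this);
      kinect.enableDepth();
    }

    void draw() {
      background(0);
      kinect.update();

      // push the scene away from the virtual camera, and let the mouse
      // spin it so we can view the cloud from different angles
      translate(width/2, height/2, -600);
      rotateY(map(mouseX, 0, width, -PI/4, PI/4));

      int[] depthValues = kinect.depthMap();
      stroke(255);

      // every depth pixel becomes one point: its position in the image supplies
      // x and y, and its measured distance supplies z
      for (int y = 0; y < 480; y += 4) {            // skip pixels to keep the frame rate up
        for (int x = 0; x < 640; x += 4) {
          int depth = depthValues[x + y * 640];     // distance in millimeters
          if (depth > 0) {                          // 0 means "no reading"
            point(x - 320, y - 240, -depth * 0.5f); // nearer objects end up closer to the viewer
          }
        }
      }
    }

Dragging the mouse left and right swings the scene in front of the virtual camera, which is often the quickest way to convince yourself that the flat depth image really does describe a three-dimensional space.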
Working with the Skeleton Data

The third technique is in some ways both the simplest to work with and the most powerful. In addition to the raw depth information we’ve been working with so far, the Kinect can, with the help of some additional software, recognize people and tell us where they are in space. Specifically, our Processing code can access the location of each part of a user’s body in 3D: we can get the exact position of hands, head, elbows, feet, etc.

One of the big advantages of depth images is that computer vision algorithms work better on them than on conventional color images. The reason Microsoft developed and shipped a depth camera as a controller for the Xbox was not to show players cool-looking point clouds, but because software running on the Xbox could process the depth image to locate people and find the positions of their body parts. This process is known as skeletonization because the software infers the position of a user’s skeleton (specifically, the joints and the bones that connect them) from the data in the depth image. By using the right Processing library, we can get access to this user position data without having to implement this incredibly sophisticated skeletonization algorithm ourselves. We can simply ask for the 3D position of any joint we’re interested in and then use that data to make our applications interactive. In Chapter 4, I’ll demonstrate how to access the skeleton data from the Kinect Processing library and how to put it to work.

To create truly rich interactions, we’ll need to learn some more sophisticated 3D programming. In Chapter 3, when working with point clouds, we’ll cover the basics of 3D drawing and navigation. Then we’ll add to those skills by learning more advanced tools for comparing 3D points with each other, tracking their movement, and even recording it for later playback. These new techniques will serve as the basic vocabulary for some exciting new interfaces we can use in our sketches, letting users communicate with us by striking poses, doing dance moves, and performing exercises (among many other natural human movements).
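To give a sense of how direct that access is, here is a minimal sketch that asks for one joint position, again assuming the SimpleOpenNI library. The user-detection and calibration callbacks differ between library versions (older releases required the user to hold a calibration pose first), so this is an outline of the approach rather than copy-and-paste code.

    import SimpleOpenNI.*;

    SimpleOpenNI kinect;

    void setup() {
      size(640, 480);
      kinect = new SimpleOpenNI(this);
      kinect.enableDepth();
      // turn on user tracking (newer library versions take no argument here)
      kinect.enableUser(SimpleOpenNI.SKEL_PROFILE_ALL);
    }

    void draw() {
      kinect.update();
      image(kinect.depthImage(), 0, 0);

      IntVector userList = new IntVector();
      kinect.getUsers(userList);            // which users has the Kinect detected?
      if (userList.size() < 1) {
        return;                             // no one is in front of the Kinect yet
      }

      int userId = (int) userList.get(0);
      if (kinect.isTrackingSkeleton(userId)) {
        // ask for the 3D position (in real-world millimeters) of a single joint
        PVector leftHand = new PVector();
        kinect.getJointPositionSkeleton(userId, SimpleOpenNI.SKEL_LEFT_HAND, leftHand);

        // convert that 3D position into a 2D pixel position so we can draw it
        PVector projected = new PVector();
        kinect.convertRealWorldToProjective(leftHand, projected);
        fill(255, 0, 0);
        ellipse(projected.x, projected.y, 20, 20);
      }
    }

    // called when a new user appears; the calibration steps (and this callback's
    // exact signature) vary between library versions
    void onNewUser(int userId) {
      kinect.startTrackingSkeleton(userId);
    }

Swap SKEL_LEFT_HAND for any other joint constant, or read several joints per frame, and you have the raw material for gestural interfaces.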
Once we’ve covered all three of these fundamental techniques for working with the Kinect, we’ll be ready to move on to the cool applications that probably drew you to this book in the first place. This book’s premise is that what’s truly exciting about the Kinect is that it unlocks areas of computer interaction that were previously accessible only to researchers with labs full of expensive experimental equipment. Things like 3D scanning and advanced robotic vision are suddenly available to anyone with a Kinect and an understanding of the fundamentals described here. But to make the most of these new possibilities, you need a bit of background in the actual application areas. To build robots that mimic human movements, it’s not enough just to know how to access the Kinect’s skeleton data; you also need some familiarity with inverse kinematics, the study of how to position a robot’s joints in order to achieve a particular pose. To create 3D scans that can be used for fabrication or computer graphics, it’s not enough to understand how to work with the point cloud from the Kinect; you need to know how to build up a mesh from those points and how to prepare and process it for fabrication on a MakerBot, a CNC machine, or a 3D printer. The final two chapters will provide introductions to exactly these topics: 3D scanning for fabrication and 3D vision for robotics.

3D Scanning for Digital Fabrication

Description:
This detailed, hands-on guide provides the technical and conceptual information you need to build cool applications with Microsoft’s Kinect, the amazing motion-sensing device that enables computers to see. Through half a dozen meaty projects, you’ll learn how to create gestural interfaces for software…
