Ph.D. Thesis

Appearance-based Mapping and Localization using Feature Stability Histograms for Mobile Robot Navigation

Presented by Eval Bladimir Bacca Cortes
2012

Institute of Informatics and Applications
Computer Vision and Robotics Group (VICOROB)

Thesis submitted for the degree of Ph.D. in Technology

Thesis Advisors:
Dr. Xavier Cufí, VICOROB, University of Girona
Prof. Dr. Joaquim Salvi, VICOROB, University of Girona

We, Xavier Cufí and Joaquim Salvi, professors of the Computer Vision and Robotics Group at the University of Girona, ATTEST: That this thesis, "Appearance-based Mapping and Localization using Feature Stability Histograms for Mobile Robot Navigation", submitted by Eval Bladimir Bacca Cortes for the degree of European Ph.D. in Technology, was carried out under our supervision.

Signatures

Authorship Declaration

I hereby declare that this thesis contains no material which has been accepted for the award of any other degree or diploma in any university. To the best of my knowledge and belief, this thesis contains no material previously published or written by another person, except where due reference has been made.

Bladimir Bacca Cortes.

Funding

This work has been partially funded by the Commission of Science and Technology of Spain (CICYT) through the coordinated project DPI-2007-66796-C03-02 and the project RAIMON (Autonomous Underwater Robot for Marine Fish Farms Inspection and Monitoring), CTM2011-29691-C02-02, funded by the Spanish Ministry of Science and Innovation; the LASPAU-COLCIENCIAS grant 136-2008; the University of Valle contract 644-19-04-95; and the consolidated research group grant 2009SGR380.

Abstract

These days, mobile robots need to interact with non-structured environments. They must deal with people, moving obstacles, perceptual aliasing, weather changes, occlusions, long-term navigation and human-robot interaction in order to reach high levels of autonomy from a decision-making point of view.
These requirements are useful for service robots designed to conduct surveillance, inspection, delivery, cleaning and exploration tasks: applications where robots need to collect sensor measurements from complex environments and extract meaningful information to achieve their goals. Simultaneous Localization and Mapping (SLAM) is considered essential for mobile robots immersed in real-world applications, without requiring any prior information about the environment. The robotics community has been trying to solve the SLAM problem in many ways, using appearance or metric information to represent the environment. This thesis is concerned with the problem of appearance-based mapping and localization for mobile robots in changing environments. This introduces our research question: how can a mobile robot update its internal representation of the environment and its location in it when the appearance of the environment is changing? This work proposes an appearance-based mapping and localization method whose main contribution is the Feature Stability Histogram (FSH). The FSH is built using a voting scheme: if a feature is re-observed, it is promoted; otherwise, its corresponding FSH value progressively decreases. The FSH is based on the human memory model in order to deal with changing environments and long-term mapping and localization. The human memory model introduces the concepts of Short-Term Memory (STM), which retains information long enough to use it, and Long-Term Memory (LTM), which retains information for longer periods of time or for a lifetime. If the entries in the STM are continuously rehearsed, they become part of the LTM (i.e. they become more stable). However, this work proposes a change in the pipeline of this model, allowing any input to become part of the STM or the LTM depending on the input strength (e.g. its uncertainty, the Hessian value in the SURF descriptor, or the matching distance).
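The promote/decay voting scheme and the STM/LTM distinction described above can be sketched as follows. This is a minimal illustration, not the thesis implementation: the class and parameter names (`promote`, `decay`, `ltm_threshold`) and the particular additive update rule are assumptions chosen for clarity.

```python
# Minimal sketch of a Feature Stability Histogram (FSH) update, assuming
# an additive promote/decay voting rule; all names and constants are
# illustrative, not taken from the thesis.

class FeatureStabilityHistogram:
    """Tracks one stability value per feature id.

    Re-observed features are promoted; missed features decay. Features
    whose value reaches `ltm_threshold` are treated as Long-Term Memory
    (stable) and would be the ones used for localization and mapping.
    """

    def __init__(self, promote=1.0, decay=0.5, ltm_threshold=3.0):
        self.values = {}              # feature id -> stability value
        self.promote = promote
        self.decay = decay
        self.ltm_threshold = ltm_threshold

    def update(self, observed_ids):
        observed = set(observed_ids)
        # Promote re-observed features; unseen ids enter as STM entries.
        for fid in observed:
            self.values[fid] = self.values.get(fid, 0.0) + self.promote
        # Decay features that were expected but not observed this time.
        for fid in list(self.values):
            if fid not in observed:
                self.values[fid] -= self.decay
                if self.values[fid] <= 0.0:
                    del self.values[fid]   # forgotten entirely

    def stable_features(self):
        """Features promoted into the LTM, i.e. stable enough to map."""
        return [f for f, v in self.values.items() if v >= self.ltm_threshold]


fsh = FeatureStabilityHistogram()
for _ in range(4):
    fsh.update(["door"])      # persistently re-observed feature
fsh.update(["window"])        # "door" misses one observation and decays
```

After these updates, "door" (value 4.0 − 0.5 = 3.5) has crossed the LTM threshold while the newly seen "window" (value 1.0) remains in the STM, mirroring the rehearsal behavior described above.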
The FSH stores the stability values of local features, and only stable features are used for localization and mapping. This innovative feature management approach is able to cope with changing environments and long-term mapping and localization, and also contributes to the semantic representation of the environment. The mobile robot perception system plays an important role in SLAM; it must provide reliable information about the robot's environment, taking advantage of its surroundings in order to reconstruct a consistent representation of the environment. Taking into account requirements such as precision, real-time operation, a wide field of view, long-term landmark tracking and robustness to occlusions, this work considers a perception system composed of the combination of a 2D Laser Range Finder (LRF) and an omnidirectional camera. Monocular vision sensors have a limited field of view, which makes them prone to occlusions and limits feature tracking. Omnidirectional vision solves these problems, but it introduces additional non-linearity due to the mirror projection model. Although 2D LRFs are limited to planar motion, they can be combined with omnidirectional vision sensors, providing a sensor model with enhanced information about the environment. This work describes a sensor model based on the extrinsic calibration between a 2D LRF and an omnidirectional camera in order to extract the 3D locations of vertical edges. Vertical edges are the features used to describe the appearance of the environment. Data association is of crucial importance for any SLAM algorithm; this work proposes a matching method based on the unified spherical model for catadioptric sensors. This matching process improves the Joint Compatibility Branch and Bound (JCBB) test for data association by considering the local appearance of the environment's vertical edges.
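The unified spherical model on which the matching is based can be sketched as follows: a 3D point is first projected onto a unit sphere centered at the mirror focus and then re-projected onto the image plane through a point displaced by the mirror parameter ξ. The function below is a generic illustration of that two-step projection; the values of `xi` and `K` are placeholders, not calibration results from this work.

```python
# Hedged sketch of the unified spherical projection model for central
# catadioptric cameras; `xi` (mirror parameter) and `K` (intrinsics)
# are illustrative values, not the thesis calibration.
import numpy as np

def unified_projection(X, xi, K):
    """Project a 3D point X (in the mirror frame) to the image plane.

    Step 1: normalize X onto the unit sphere centered at the focus.
    Step 2: re-project from the point (0, 0, -xi) onto the normalized
            image plane (division by Xs_z + xi).
    Step 3: apply the camera intrinsic matrix K.
    """
    Xs = np.asarray(X, dtype=float)
    Xs = Xs / np.linalg.norm(Xs)                 # point on the unit sphere
    m = np.array([Xs[0] / (Xs[2] + xi),          # normalized coordinates
                  Xs[1] / (Xs[2] + xi),
                  1.0])
    return K @ m                                 # homogeneous pixel coords

# With xi = 0 the model degenerates to a standard pinhole projection,
# which gives a quick sanity check of the formulation.
p = unified_projection([1.0, 0.0, 1.0], xi=0.0, K=np.eye(3))
```

For xi = 0 and identity intrinsics, the point (1, 0, 1) projects to (1, 0) as a pinhole camera would, which is the expected degenerate behavior of the unified model.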
Experimental validation of this approach was conducted using two different SLAM algorithms and a long-term dataset collected over a period of one year. The analysis of the experiments carried out shows that the FSH model is able to: filter out dynamic objects from laser scans and from the features present in the environment, increase the map accuracy over successive map updates, maintain a model of the environment embedding its most stable appearance, increase the localization accuracy over successive map updates, deal well with large environments, and reduce the data association effort in long-term runs.

Resumen

Nowadays, mobile robots need to interact with non-structured environments. Mobile robots must deal with people, moving obstacles, highly similar environments (perceptual aliasing), weather changes, occlusions, long-term navigation and human-robot interaction in order to reach high levels of autonomy in their decision making. These requirements are useful for service robots designed to carry out surveillance, inspection, package delivery, cleaning and exploration tasks. In these applications, robots need to collect sensor measurements from complex environments and extract meaningful information to carry out their tasks. Simultaneous Localization and Mapping (SLAM) is considered an essential task for mobile robots immersed in real-world applications, without requiring prior information about the environment. The robotics community has tried to solve the SLAM problem in different ways, using metric or appearance-based information to represent the environment. This thesis addresses the problem of appearance-based mapping and localization for mobile robots in complex environments.
This introduces the research question: how can a mobile robot update its internal representation of the environment and its location in it when the appearance of the environment is changing? This work proposes an appearance-based mapping and localization method whose main contribution is the Feature Stability Histogram (FSH). The FSH is built using a voting scheme: if a feature in the environment is re-observed, it is promoted; otherwise, its corresponding FSH value progressively decreases. The FSH is based on the human memory model in order to cope with changing environments and long-term mapping and localization tasks. The human memory model introduces concepts such as Short-Term Memory (STM), which retains information long enough for it to be used, and Long-Term Memory (LTM), which retains information for long periods of time or for a lifetime. If the features stored in the STM are continuously re-observed, they become part of the LTM (i.e. these features are more stable). However, this work proposes a change in the human memory pipeline, allowing any input to become part of the STM or the LTM depending on the strength of the input feature (e.g. its uncertainty, the Hessian value in the SURF descriptor, or the matching distance). The FSH stores the stability values of local features, and only stable features are used for localization and mapping. This innovative way of managing environment features is able to cope with changing environments and long-term mapping and localization, and also contributes to a semantic representation of the environment.
The robot perception system plays an important role in SLAM; it must provide reliable information about the robot's environment, taking advantage of its surroundings in order to build a consistent representation of the environment. Taking into account requirements such as precision, real-time operation, a wide field of view, long-term feature tracking and robustness to occlusions, this work considers a perception system composed of the combination of a 2D Laser Range Finder (LRF) and an omnidirectional camera. Monocular vision sensors have a limited field of view, which makes them prone to occlusions and to limited feature tracking. Omnidirectional vision solves these problems but introduces others, such as non-linearity due to the mirror projection model. Although 2D LRFs are limited to planar motion, they can be combined with omnidirectional vision, providing a sensor model with enhanced information about the environment. This work describes a sensor model based on the extrinsic calibration between a 2D LRF and an omnidirectional camera in order to extract the 3D position of vertical edges. Vertical edges are the features used to describe the appearance of the environment. Data association is of crucial importance for any SLAM algorithm; this work proposes a feature matching method based on the unified projection model for catadioptric sensors. This matching process improves the Joint Compatibility Branch and Bound (JCBB) test for data association by considering the local appearance of the environment's vertical edges. The experimental validation of this method was carried out using two different SLAM algorithms and a dataset acquired over a long period of one year.
The analysis of the experiments carried out shows that the FSH model is able to: filter out dynamic objects from the laser sensor data and from the environment features, increase the map accuracy over successive map updates, maintain a model of the environment embedding its most stable appearance, increase the localization accuracy over successive map updates, deal with large environments, and reduce the data association effort in long-term runs.

Acknowledgements

I would like to thank my supervisors Xevi and Quim, whose helpful guidance and support encouraged me during my PhD studies. They were always willing to help with any problem I could have. I was very pleased to be part of the Computer Vision and Robotics Group (VICOROB); thanks to all of them who were available to offer their collaboration. I would especially like to show my gratitude to Javi, Joseph, Ricard, Tudor, Angelos (please, do not use your compass within a city downtown), Simone, Pere, Marc, Joseta, Jordi Freixenet, and Lluís Magi for his support with the mechanical work. Many thanks to the secretaries of the institute (Anna, Montse and Rosa), who managed all the paperwork needed for the trips. It is a real pleasure to thank the colleagues from the MIS (Modélisation, Information et Systèmes) laboratory: Ashu for his friendship and for teaching me about Indian culture, Sang for his technical support, Guillaume and Damien for attending my talks, and Pauline. I especially thank Professor El Mustapha Mouaddib for his guidance, collaboration and concern during my stay. Many thanks to my colleague Eduardo Caicedo, who helped a lot at the beginning of my studies by taking care of the paperwork for my grant and, as the head of the Perception and Intelligent Systems group, supported me before the Electrical and Electronic Engineering School board.
Finally, but most importantly, this thesis would not have been possible without the tremendous support of my family: my wife, always present in good and bad moments; my father, who was in charge of all my affairs in Colombia; my mother, my sister, my brother, my parents-in-law and sister-in-law, who watched over us; and especially my little girl Catalina: seeing you every morning encouraged me to go on.

Table of Contents

1. INTRODUCTION .................................................. 1
1.1. MOTIVATION ................................................... 1
1.2. OBJECTIVE .................................................... 6
1.3. STRUCTURE OF THE THESIS ...................................... 7
2. BACKGROUND ..................................................... 8
2.1. INTRODUCTION ................................................. 8
2.2. SLAM TECHNIQUES .............................................. 10
2.2.1. Problem Formulation ........................................ 11
2.2.2. Kalman Filter-based SLAM ................................... 12
2.2.3. Particle Filter SLAM ....................................... 13
2.2.4. Appearance-based SLAM ...................................... 14
2.2.5. Map Representation ......................................... 16
2.3. LIFELONG MAPPING AND LOCALIZATION ............................ 16
2.4. ENVIRONMENT MODELING ......................................... 19
2.4.1. Range Data Features ........................................ 20
2.4.2. Image Features ............................................. 21
2.5. PLATFORM DESCRIPTION AND DATASETS ............................ 23
2.5.1. The Mobile Robot and the Perception System ................. 23
2.5.2. Datasets ................................................... 25
2.6. DISCUSSION ................................................... 28
3. FEATURE EXTRACTION AND ENVIRONMENT MODELING .................... 31
3.1. INTRODUCTION ................................................. 32
3.2. LASER RANGE FINDER FEATURES .................................. 32
3.2.1. Laser Range Finder Calibration ............................. 32
3.2.1.1. Laser alignment .......................................... 33
3.2.1.2. Drift effect ............................................. 34
3.2.1.3. LRF linear model ......................................... 35
3.2.2. Breakpoint Detection ....................................... 36
3.2.3. Laser Lines Detection ...................................... 38
3.3. OMNIDIRECTIONAL VISION FEATURES .............................. 41
3.3.1. Central Catadioptric Edge Detection Algorithms ............. 41
3.3.2. Vertical Edge Detection .................................... 42
3.3.3. Results .................................................... 45
3.4. RANGE-AUGMENTED OMNIDIRECTIONAL VISION SENSOR ................ 47
3.4.1. Problem Formulation ........................................ 48
3.4.2. Simultaneous Parameter Estimation .......................... 49
3.4.3. Non-simultaneous Parameter Estimation ...................... 50
3.4.4. Results .................................................... 51
3.5. TEXTURED VERTICAL EDGE FEATURES .............................. 54
3.5.1. Sensor Model ............................................... 55
3.5.2. Data association ........................................... 61
3.5.3. Results .................................................... 62
3.6. DISCUSSION ................................................... 66
4. FEATURE STABILITY HISTOGRAM MODEL .............................. 68
4.1. INTRODUCTION ................................................. 69
4.2. HUMAN MEMORY MODEL ........................................... 69
4.3. METHOD OVERVIEW .............................................. 70
4.4. LOCALIZATION AND MAPPING USING THE FEATURE STABILITY HISTOGRAM ... 71