@phdthesis{oai:sucra.repo.nii.ac.jp:00010287, author = {Cao, Lu}, month = {}, note = {105 p.
Abstract: Humans can effortlessly express their spatial experience and describe where objects are located in relation to other, underlying objects. Since it is impossible to learn every object in advance, such information is critical for exploring the visual world: intuitively, if we know where objects are, recognizing them becomes easier. In computer vision, an important and open problem is to endow robotic systems with a human-like ability to comprehend spatial relations, much as a school child does when learning to write a descriptive sentence such as ``the CD is to the left of the book''. The primary goal of this work is to design and demonstrate spatial recognition methods that bridge the gap between visual information and human cognition. Towards this goal, we treat spatial relations as a feature alongside other visual features such as color and size, and develop computational templates to represent them. We propose a novel model for encoding linguistic spatial expressions. We first investigate how humans structure space through natural language and classify the basic classes of spatial relation; we then carry these observations from cognitive studies over to computer vision applications. We propose templates that recognize spatial relations, translate linguistic expressions into visual information, and represent spatial terms in an angular fashion. The templates have been tested over 720 scenarios, each containing one to three unknown objects. Comprehending spatial relations goes beyond simply distinguishing them. A spatial relation requires a pair of objects, and in determining the class of relation the underlying objects, called reference objects, play a decisive role. Concretely, objects such as humans, animals, and computer displays differ from objects such as balls, boxes, and cups in that they have an intrinsic front side: the former's front is independent of the interlocutor's viewpoint, whereas the latter's is not. The front orientation attached to such objects therefore changes accordingly when they are rotated away from the frontal view. We thus introduce an estimation model for these objects, from estimating pose transformations to adjusting the intrinsic front orientation. The first step studies one prominent type of pose variation under viewpoint transformation in a supervised fashion, followed by a naive Bayesian classifier for prediction. The estimator performs highly competitively with the state of the art on the ETH-80 database and on an everyday-object database that we collected ourselves. The models benefit from an interactive interface developed to understand simple English words and grammatical structures, which brings them closer to the way of human-human interaction. Finally, we conduct experiments with the integrated system, which consists of an object detector, a spatial recognition model, a pose estimator, and a user interface; the goal is to recognize unknown objects by comprehending spatial relations through interaction. The simple yet effective models perform strongly on recognition tasks on the author's database.
Contents: Abstract;
Acknowledgment; List of Figures; List of Tables;
Chapter 1 Introduction (1.1 Spatial Relations in Visual Recognition; 1.2 Related Work; 1.2.1 Spatial Comprehension in Psychology, Linguistics, and Philosophy; 1.2.2 Learning Spatial Relations for Robotic Systems);
Chapter 2 Towards Spatial Comprehension (2.1 Understanding Spatial Knowledge; 2.1.1 Terminology; 2.1.2 Classification of Frames of Reference; 2.1.3 Spatial Templates and Their Acceptance Regions; 2.2 Computational Model for Human Spatial Linguistic Expressions; 2.2.1 The 2-D Projective Model of Intrinsic and Relative Frames of Reference; 2.2.2 Modifications: The 3-D Computational Model; 2.2.3 The Model of Group-based Frame of Reference; 2.3 Conclusion);
Chapter 3 Pose Estimation (3.1 Introduction; 3.2 Related Work; 3.3 The Model; 3.3.1 Building Key-Pose Structure; 3.3.2 Image Feature; 3.3.4 Adjusting Front Orientation; 3.4 Experiment; 3.4.1 Pose Estimation Result; 3.4.2 Adjusting Front Orientation; 3.4.3 Spatial Recognition Experiment; 3.5 Conclusion);
Chapter 4 Constructing the Database (4.1 Introduction; 4.2 Related Work; 4.3 Collecting Candidate Objects; 4.3.1 Collecting Candidate Objects for Visual Recognition; 4.3.2 Designing Scenarios for Spatial Recognition; 4.4 Conclusion);
Chapter 5 Interactive Object Recognition (5.1 Integral System Overview; 5.2 The Role of Natural Language; 5.3 Experiment 1: Close Linguistic Form; 5.4 Experiment 2: Comparison with the Original Model; 5.5 Failure Case Study);
Chapter 6 Conclusion;
Related Publications;
Bibliography.
Principal supervisor: 久野義徳}, school = {埼玉大学}, title = {Comprehending Spatial Relations for Interactive Object Recognition}, year = {2013}, yomi = {ソウ, ロ} }