@phdthesis{oai:sucra.repo.nii.ac.jp:00019696,
  author = {MD, KAMAL UDDIN},
  note   = {xii, 78p. Person re-identification (Re-ID) is one of the most important tools of intelligent video-surveillance systems; it aims to recognize an individual across the non-overlapping sensors of a camera network. It is a very challenging task in computer vision because the visual appearance of an individual changes with viewing angle, illumination intensity, pose, occlusion, and cluttered backgrounds. The general objective of this thesis is to tackle some of these constraints by proposing approaches that exploit the additional information provided by modern RGB-D sensors. First, we present a novel cross-modal person re-identification technique that exploits the local shape information of an individual to bridge the domain gap between the two modalities (RGB and depth). The core motivation is that most existing Re-ID systems rely heavily on RGB-based appearance cues, which are unsuitable when lighting conditions are very poor. However, for security reasons, continuous camera surveillance under low lighting is sometimes unavoidable. To overcome this problem, we take advantage of depth-sensor-based cameras (e.g., Microsoft Kinect and Intel RealSense depth cameras), which can be installed in dark places to capture video, while RGB-based cameras are installed where lighting conditions are good. Such heterogeneous camera networks are advantageous because of the different sensing modalities available, but they face the challenge of recognizing people across depth and RGB cameras. In this approach, we propose a body-partitioning method and a novel HOG-based feature extraction technique for both modalities, which extract local shape information from regions within an image. We find that combining the features estimated from the two modalities can sometimes help reduce the visual ambiguities of appearance features caused by lighting conditions and clothing. We also exploit an effective metric-learning approach that achieves better re-identification accuracy across the RGB and depth domains. This dissertation also presents two novel multi-modal person re-identification methods. In the first method, we introduce a depth-guided attention-based person re-identification method for the multi-modal scenario, which takes depth-based additional information into account in the form of an attention mechanism. Most existing methods rely on complex, dedicated attention-based architectures for feature fusion and are therefore unsuitable for real-time deployment. In our approach, we propose a depth-guided foreground-extraction mechanism that helps the model dynamically select the more relevant convolutional filters of the backbone CNN architecture for enhanced feature representation and inference. In the second method, we propose a novel person re-identification technique that exploits the advantages of multi-modal data by fusing in dissimilarity space, designing a 4-channel RGB-D image input for the Re-ID framework. Additionally, the lack of a proper RGB-D Re-ID dataset prompted us to collect a new one, named SUCVL RGBD-ID, comprising RGB and depth images of 58 identities from three cameras: one installed under poor illumination and the other two in different indoor locations with different indoor lighting environments.
Finally, extensive experimental evaluations on our dataset and on publicly available datasets demonstrate that our proposed methods are efficient and outperform related state-of-the-art methods.

Contents:
1 Introduction 1
  1.1 Motivation 1
  1.2 Person Re-identification 2
  1.3 Challenges of Person Re-ID 3
  1.4 Objectives 5
  1.5 Research Contributions 7
  1.6 Thesis Overview 8
2 Literature Review 10
  2.1 Single-modality Person Re-identification 11
    2.1.1 Feature Learning approach 12
    2.1.2 Metric Learning approach 12
    2.1.3 Deep Learning approach 13
  2.2 Cross-modality Person Re-identification 14
  2.3 Multi-modality Person Re-identification 15
3 Cross-modal Person Re-identification using Local Shape Information 19
  3.1 Introduction 19
  3.2 Methodology 21
    3.2.1 Feature extraction 22
    3.2.2 Metric learning 23
    3.2.3 Feature matching/classification 24
  3.3 Experiments 24
    3.3.1 Datasets 25
    3.3.2 Evaluation Metrics 27
    3.3.3 Compared Methods 27
    3.3.4 Evaluation on BIWI RGBD-ID 27
    3.3.5 Evaluation on IAS-Lab RGBD-ID 29
  3.4 Conclusion 31
4 Depth Guided Attention for Person Re-identification in Multi-modal Scenario 32
  4.1 Introduction 32
  4.2 Methodology 35
    4.2.1 The Overall Framework 35
    4.2.2 Triplet Loss 36
  4.3 Experiments 37
    4.3.1 Dataset 37
    4.3.2 Evaluation Protocol 38
    4.3.3 Implementation Details 39
    4.3.4 Experimental Evaluation 39
  4.4 Conclusion 42
5 Fusion in Dissimilarity Space for RGB-D Person Re-identification 43
  5.1 Introduction 43
  5.2 Methodology 46
    5.2.1 Model Training 47
    5.2.2 Fusion Technique 50
  5.3 SUCVL RGBD-ID Dataset Description 52
  5.4 Experiments 54
    5.4.1 Datasets 54
    5.4.2 Evaluation Protocol 55
    5.4.3 Implementation Details 55
    5.4.4 Experimental Evaluation 56
    5.4.5 Runtime Performance Evaluation 63
  5.5 Discussion 63
    5.5.1 General Observations 63
    5.5.2 Failure Cases Analysis 64
  5.6 Conclusions 65
6 Conclusions and Future Work 66
  6.1 Conclusions 66
  6.2 Future Work 67
Publication List 68
Bibliography 68

Supervisor: KOBAYASHI Yoshinori},
  school = {埼玉大学},
  title  = {Cross-Modal and Multi-Modal Person Re-identification with RGB-D Sensors},
  year   = {2021},
  yomi   = {エムディ, カマル ウディン}
}