Combining Data-Driven 2D and 3D Human Appearance Models

Abstract

Detailed 2D and 3D estimation of the human body has many applications in everyday life: interaction with machines, virtual try-on of fashion, and product adjustments based on a body size estimate are just some examples. Two key components of such systems are (1) detailed pose and shape estimation and (2) generation of images. Ideally, they should use 2D images as the input signal so that they can be applied easily to arbitrary digital images. Due to the high complexity of human appearance and the depth ambiguities in 2D space, data-driven models are the natural tool for designing such methods. In this work, we consider two aspects of such systems: in the first part, we propose general optimization and implementation techniques for machine learning models and make them available as software packages. In the second part, we present, in multiple steps, how the detailed analysis and generation of human appearance based on digital 2D images can be realized. We work with two machine learning methods: Decision Forests and Artificial Neural Networks. The contribution of this thesis to the theory of Decision Forests is the introduction of a generalized entropy function that is efficient to evaluate and tunable to specific tasks, and that allows us to establish relations to frequently used heuristics. For both Decision Forests and Neural Networks, we present implementation methods and a software package. Existing methods for 3D body estimation from images usually estimate the 14 most important pose-defining points in 2D and convert them to a 3D 'skeleton'. In this work, we show that a carefully crafted energy function is sufficient to recover a full 3D body shape automatically from the keypoints. In this way, we devise the first fully automatic method for estimating 3D body pose and shape from a 2D image. While this method successfully recovers a coarse 3D pose and shape, recovering details such as body part rotations remains a challenge.
However, for more detailed models, it would be necessary to annotate data with a very rich set of cues. This approach does not scale to large datasets, since the effort per image would be too high and the required quality could not be reached, given how hard it is to estimate the position of keypoints on the body surface. To solve this problem, we develop a method that alternates between optimizing the 2D and 3D models, improving them iteratively, while the labeling effort for humans remains low. At the same time, we create 2D models that reason about several times more items than existing methods, and we extend the 3D pose and body shape estimation to rotation and body extent. To generate images of people, existing methods usually work with 3D models that are hard to adjust and to use. In contrast, we develop a method that builds on the possibilities for automatic 3D body estimation: we use it to create a dataset of 3D bodies together with 2D clothes and clothing segments. With this information, we develop a data-driven model that directly produces 2D images of people. Only the broad interplay of 2D and 3D body and appearance models in different forms makes it possible to achieve a high level of detail for the analysis and generation of human appearance. The developed techniques can, in principle, also be used for the analysis and generation of images of other creatures and objects.