The importance of gestures in human interaction has catalyzed research breakthroughs in areas such as human-computer interaction, multi-modal technologies, and gesture recognition and understanding. Nonetheless, while mainstream techniques such as deep learning keep pushing the boundaries of gesture recognition algorithms, problems that are more fundamental to gestures, such as gesture perception and production, are yet to be addressed. In other words, current research should focus not only on gestures as a means to interact with systems, but also on what makes and constitutes a gesture. One of the classical ways of addressing this problem lies in exploring paradigms such as zero-shot and representation learning to recognize the semantic or high-level characteristics of gestures in conjunction with gesture classification. In this regard, understanding the pragmatic, semantic and morphological attributes of gestures could lead to novel approaches to standard gesture recognition problems.

This special session will focus on these fundamental challenges in modeling and representing gestures. Specifically, we are interested in the challenges associated with the gestures’ (1) morphology (shape, movement); (2) phonology (tempo, synchrony, sonification); (3) semantics (meaning); (4) affective properties (can gestures convey emotions in a similar way to action units?); (5) motor characteristics (beat, how are gestures produced?); (6) cognitive aspects (gestures as enablers of understanding and learning); (7) pragmatics (the context in which they were generated); and (8) singularity (zero-shot or few-shot learning).

Last year, the special session on “Fundamental Challenges in Modeling, Representation and Synthesis of Gestures” brought together a diverse community of linguists, psychologists, computer scientists, engineers and roboticists to address these questions and propose solutions. Similarly, this year’s special session will act as a forum that unifies these challenges and questions into one coherent framework. With this session, we expect to gather knowledge related to gestures as enablers of understanding, learning and cognitive offloading during human interactions. Additionally, we expect to gain new knowledge on how gestures can be represented in ways that address the eight aforementioned challenges. The insights generated from this session can shed light on how gestures are associated with emotion, learning and understanding; on novel ways to integrate gestures’ semantic attributes into machine learning frameworks; and on how gestures can be abstracted into more detailed yet comprehensive data structures.


  • Juan P Wachs
  • Edgar J. Rojas-Muñoz