  • Haibo Li, KTH Royal Institute of Technology, Sweden
  • Lena Klasén, Research Director, National Forensic Center, Sweden


Deep learning has transformed computer vision research and revolutionized the landscape of techniques and solutions for human behavior understanding and human action recognition. With deep learning techniques such as OpenPose, machines can reliably detect human actions from a video camera by analyzing body gestures in real time. In many practical applications, such as forensics, it is useful to know what a person is doing now, but it is more important to know what the person will do next. This raises a fundamental question in computer vision: human intention prediction.

Predicting human intentions, particularly over the long term, is extremely important but also very challenging, and it is one of the hot research topics in the field. Most related research uses the human body to infer a person's intention. The human body is usually treated as a physical object viewed from a third-person perspective, under the assumption that human intention is quantitatively and objectively measurable. Building motion models of the human body in action, or observing and computing head orientation and body gestures, has been shown to be effective for short-term prediction of human intention, say several hundred milliseconds ahead, but these methods fail at long-term prediction.

There are alternative ways to treat human intention that make long-term prediction possible. One such solution is a phenomenological approach, which treats human intention not as objectively measurable but characterizes it in a fundamentally different way, offering alternatives for long-term intention prediction that have not been possible with existing approaches. In this tutorial we will give a detailed explanation of the phenomenology of human intention and Gibson’s ecological approach to visual perception, and use real forensic cases, for example a recent phenomenological investigation of the 2017 Stockholm truck attack, to demonstrate how these theories change the way human intention is predicted.

More specifically, the topics to be covered are:

  • Human intention prediction through the human body and body gestures: the state of the art and new solutions in 2D and 3D that cope with body-part occlusions.

  • The requirements and challenges of measuring and predicting human behavior in the forensic field. State-of-the-art forensic tools for human behavior understanding and intention prediction.

  • Phenomenology of human intention and Gibson’s ecological approach to visual perception of human actions.

  • Egocentric perception and computing of human intention.

  • Visual affordance: definition and computation. Examples, including sittability and walkability, demonstrate how to use visual affordance to make
    long-term predictions of human intention.