ABSTRACT

The term “telepresence” is used broadly to describe multimedia conferencing systems that, at a minimum, provide high-definition, high-quality audio and video. These systems give users or user groups who are not co-located the experience of “being there,” a sense that all participants share the same space. Key aspects of telepresence include gaze awareness, eye contact, and life-size rendering. One example is an immersive telepresence system that uses specially designed, purpose-built conference rooms with multiple displays for life-size image reproduction, together with multiple cameras, encoders, decoders, microphones, and loudspeakers.

Telepresence can be formally defined as a real-time, interactive communications experience, using audio, video, and data applications, that gives all participants in a multipoint conference the sense of being in the same space, with a strong degree of realism and presence. This experience is achieved by optimizing a variety of attributes such as life-size images, high-fidelity audio and video, spatial audio, preservation of spatial relationships between streams, eye contact, gaze awareness, and body language.

A number of techniques for handling multiple audio and video streams are used in specially designed conference rooms to create this experience. However, telepresence systems do not all use the same techniques, and these disparate systems cannot interoperate unless they implement common standards. Fortunately, the Internet Engineering Task Force (IETF) Controlling Multiple Streams for Telepresence (CLUE) working group has developed such standards for conveying information about the relationships among multiple media streams (audio, video, and/or data applications). This information enables senders and receivers to make informed choices so that telepresence systems can interwork. In fact, the CLUE framework (see Section 16.3) applies well beyond telepresence.

In this chapter, we describe the CLUE requirements and use cases on which the CLUE framework has been developed. The CLUE data model, signaling, and real-time transport protocol (RTP) usage are in turn defined on the basis of this framework. Although the session initiation protocol (SIP) is used as the call control protocol, the session description protocol (SDP), alone or with its extensions, is not sufficient to meet the CLUE requirements. For this reason, a new CLUE signaling protocol for session description, which defines CLUE-specific session attributes, has been specified. The overall signaling scheme for telepresence therefore includes SIP, SDP, and CLUE. The subsequent sections describe the CLUE data model, the CLUE signaling protocol for session description, RTP usage, and call flows.