ABSTRACT

This chapter describes our work on real-time two-party and multi-party voice-over-IP (VoIP) systems that can achieve high perceptual conversational quality. It focuses on a fundamental understanding of conversational quality and of the trade-offs it imposes on the design of speech codecs and on strategies for network control, playout scheduling (POS), and loss concealment (LC). We have studied three key aspects that address the limitations of existing work and improve the perceptual quality of VoIP systems. Firstly, we have developed a statistical approach based on the just-noticeable difference (JND) that significantly reduces the number of subjective tests required, together with a classification method that automatically learns from the test results and generalizes them to unseen conditions. Using network and conversational conditions measured at run time, the learned classifier helps adjust the control algorithms to achieve high perceptual conversational quality. Secondly, we have designed a cross-layer speech codec that interfaces with the LC and POS algorithms in the packet-stream layer in order to be more robust and effective against packet losses. Thirdly, we have developed a distributed algorithm for equalizing mutual

CONTENTS

2.1 Introduction
2.2 Evaluating Conversational Quality
  2.2.1 Previous Work
  2.2.2 Evaluations-Generalization of Conversational Quality
2.3 Cross-Layer Speech Codecs for VoIP
  2.3.1 Previous Work on Speech Codecs
  2.3.2 Cross-Layer Designs of Speech Codecs