ABSTRACT

As video conferencing finds wide application in business, medical treatment, and education, modern audio/video conferencing places new demands on voice audio: first, participants must be able to hear every speaker's voice simultaneously and in real time; second, participants should be able to perceive each speaker's tone, reaction, and attitude (Wang 2013). In current voice-processing schemes, each speaker in a meeting has a separate voice input device, and the audio is transmitted over the network to every terminal, which mixes the streams and then plays the result. The receiving terminal must use multiple threads to receive and process the audio data, which consumes a large amount of network bandwidth. If the network jitters, the mixed audio plays back poorly, and a higher-quality output device is needed to compensate; mixing and playback on an ordinary PC cannot meet the requirements of a modern video conference. For this reason, the ITU-T (International Telecommunication Union - Telecommunication Standardization Sector) (Toga 1999) proposed a centralized conference mode: an MCU (Multipoint Control Unit) mixes the speakers' audio and then transmits the processed stream to the participants for playback. This reduces the load on terminal devices and the network bandwidth requirement.
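
The centralized mixing step described above can be illustrated with a minimal sketch, assuming 16-bit signed PCM frames; the function name and the simple sum-and-clip strategy are illustrative assumptions, not the paper's actual algorithm (real MCUs typically add normalization or gain control):

```python
def mix_streams(streams):
    """Mix several 16-bit signed PCM streams sample-by-sample.

    streams: list of equal-length lists of int samples in [-32768, 32767].
    Samples at each position are summed and clipped to the 16-bit range,
    the simplest possible mixing strategy.
    """
    if not streams:
        return []
    length = min(len(s) for s in streams)
    mixed = []
    for i in range(length):
        total = sum(s[i] for s in streams)
        # Clip to the 16-bit signed range to avoid integer overflow on playback
        mixed.append(max(-32768, min(32767, total)))
    return mixed

# Example: the MCU mixes two speakers' frames into one output frame
a = [1000, -2000, 30000]
b = [500, -500, 10000]
print(mix_streams([a, b]))  # [1500, -2500, 32767]
```

In the centralized mode, this mixing runs once at the MCU and a single mixed stream is sent to each participant, instead of every terminal receiving all speakers' streams and mixing locally.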