
Novel Frequency Domain BWE Techniques for Enhanced Voice Services
Explore novel Frequency Domain Bandwidth Extension (BWE) techniques for improved voice quality, together with the synchronization and switching mechanisms used in the 3GPP Enhanced Voice Services (EVS) codec. Learn how BWE evolved from prior-art techniques to multi-mode FD BWE, and which constraints the EVS codec design imposes on it.
Presentation Transcript
A Novel Frequency Domain BWE with Relaxed Synchronization and Associated BWE Switching. Lei Miao, Zexin Liu, Xingtao Zhang, Chen Hu, Jon Gibbs (Huawei Technologies Co. Ltd, Beijing, China); Kihyun Choo, Eunmi Oh (Samsung Electronics Co., Ltd., Seoul, Korea); Václav Eksler (VoiceAge Corp., Montreal, QC, Canada)
Agenda: 3GPP EVS codec overview; Prior-art review of BWE techniques; Multi-mode FD BWE; Multi-mode FD BWE with relaxed synchronization; BWE switching mechanism; Quality evaluation; Conclusions
3GPP EVS codec overview: The 3GPP Enhanced Voice Services (EVS) codec was standardized in Sep. 2014 and significantly improves the user experience. The EVS codec encodes the signal based on its content, using time-domain LP coding techniques (ACELP and GSC, where GSC, Generic audio Signal Coding, improves the quality of music/mixed segments in the LP domain) and frequency-domain coding techniques.
Prior-art Review of BWE techniques: BWE exploits the intrinsic correlation between the low and high frequency parts of a signal's spectrum in order to reconstruct the high frequency part. Examples are Spectral Band Replication (SBR) in MPEG-4 HE-AAC and the multi-mode bandwidth extension scheme in Recommendations ITU-T G.711.1 Annex D and G.722 Annex B. Super-wideband (SWB) coding is a key feature of EVS as a new 3GPP codec, so extending the bandwidth from WB to SWB/FB is important. EVS uses time domain (TD) or frequency domain (FD) BWE for different types of input signals. The paper focuses on FD BWE on top of either ACELP or GSC. This switched bandwidth extension approach improves the coding efficiency of the LP-based coding in EVS.
Multi-mode FD BWE: The concept was standardized for the first time in Recommendations ITU-T G.711.1 Annex D and G.722 Annex B. A transient detector identifies rapid variations of the high band signal over time, and each frame is assigned to one of four classes: TRANSIENT (TS), HARMONIC (HM), NORMAL (NM) or NOISE (NS). The high band is coded with a combination of adaptive spectral envelope and time envelope coding, derived from the high band signal: TRANSIENT frames use four spectral envelopes and four time envelopes, while non-TRANSIENT frames use fourteen spectral envelopes and no time envelope. The high frequency band excitation is generated either by normalizing a selected region of the low frequency band with an adaptive normalization length or from random noise.
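The per-class envelope layout described above can be summarized in a few lines of Python. This is a minimal sketch, not EVS reference code: the names `FrameClass` and `envelope_layout` are hypothetical, and the counts are the SWB figures from this slide (WB FD BWE codes fewer envelopes).

```python
from enum import Enum


class FrameClass(Enum):
    """The four multi-mode FD BWE frame classes listed on the slide."""
    TRANSIENT = "TS"
    HARMONIC = "HM"
    NORMAL = "NM"
    NOISE = "NS"


def envelope_layout(frame_class: FrameClass) -> tuple[int, int]:
    """Return (spectral envelopes, time envelopes) coded per frame (SWB case)."""
    if frame_class is FrameClass.TRANSIENT:
        # Transient frames trade spectral resolution for time resolution.
        return 4, 4
    # HARMONIC, NORMAL and NOISE frames: finer spectral detail, no time envelope.
    return 14, 0


for c in FrameClass:
    print(c.value, envelope_layout(c))
```

The split reflects the usual trade-off: a fixed bit budget buys either temporal detail (transients) or spectral detail (stationary content).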
FD BWE constraints in EVS: The design constraints imposed on the EVS codec require that the total algorithmic delay of the codec not exceed 32 ms. The transform used by FD BWE employs an Asymmetric Low Delay Optimized (ALDO) window with a time support of 40 ms and a non-zero window length of 28.75 ms. Even so, the remaining delay allowance is insufficient for the FD BWE to meet the overall 32 ms delay requirement. The multi-mode BWE in ITU-T G.711.1 Annex D and G.722 Annex B is therefore taken as a baseline, and the new low delay FD BWE approach builds on it with a relaxed synchronization scheme.
Relaxed synchronization: The relaxed synchronization of multi-mode FD BWE is achieved by exploiting the time difference between the high frequency band excitation and the high frequency band envelope. Assume D1 is the delay of the low frequency band coding and D2 is the delay of the high frequency band coding, where D2 is introduced by the windowing prior to the MDCT. The total algorithmic delay of prior bandwidth extension algorithms is (D1 + D2). With the proposed scheme, the delay can be adaptively reduced to any value in the range [max(D1, D2), (D1 + D2)] with minimal impact on perceptual quality. The reasoning is that when the spectrum is relatively stable, the phase of the high frequency band signal is of minor perceptual significance compared to its energy. When the delay is reduced, the time alignment between the energy envelopes of the low and high frequency band signals is therefore maintained, while a short time misalignment (a relaxation) of the synchronization between the high frequency band excitation and the envelopes is permitted.
Time alignment: An asymmetric low delay optimized (ALDO) window is used for FD BWE. Assume that the target overall delay is Dt, which lies in the range [max(D1, D2), (D1 + D2)]. The input time domain signal may then be delayed by (Dt - D2).
[Figure: encoder/decoder timing diagram for frames m-1, m and m+1, showing the input signal, the high frequency band spectral envelope, the decoded and delayed low frequency band signals, the high frequency band excitation and the decoded high frequency band signal, with offsets Dt - D2, D1, D2, Dt and D1 + D2.]
The time misalignment between the high frequency band excitation and the high frequency band envelope is (D1 + D2) - Dt. Consequently, the proposed FD BWE achieves a lower delay than (D1 + D2).
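The delay bookkeeping of the relaxed synchronization can be written as a small helper. This is an illustrative sketch: the function name `relaxed_sync_delays` and its dictionary keys are hypothetical, and it only encodes the relations stated on the two slides above (valid range of Dt, input delay Dt - D2, misalignment (D1 + D2) - Dt).

```python
def relaxed_sync_delays(d1_ms: float, d2_ms: float, dt_ms: float) -> dict[str, float]:
    """Delays implied by a target overall delay Dt in the relaxed-sync FD BWE.

    d1_ms: delay of the low frequency band coding (D1)
    d2_ms: delay of the high frequency band coding, i.e. the MDCT windowing (D2)
    dt_ms: target overall delay (Dt)
    """
    lowest, highest = max(d1_ms, d2_ms), d1_ms + d2_ms
    if not lowest <= dt_ms <= highest:
        raise ValueError(f"Dt must lie in [{lowest}, {highest}] ms, got {dt_ms} ms")
    return {
        # Delay applied to the input time domain signal in the encoder.
        "input_delay_ms": dt_ms - d2_ms,
        # Offset between the HB excitation and the HB envelope; this is also
        # exactly how much delay is saved relative to the (D1 + D2) baseline.
        "misalignment_ms": highest - dt_ms,
    }


print(relaxed_sync_delays(9.6875, 8.75, 12.0))
```

With the EVS values (D1 = 9.6875 ms, D2 = 8.75 ms, Dt = 12 ms) this yields an input delay of 3.25 ms and a misalignment of 6.4375 ms; choosing Dt = D1 + D2 gives zero misalignment, i.e. the fully synchronized baseline.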
Transient frames: The time envelope is calculated from the delayed high frequency band time domain signal s_hb(n), over four segments of 80 samples:
$$t_{rms}(j) = \sqrt{\frac{1}{80}\sum_{n=0}^{79} s_{hb}^{2}(80j + n)}, \quad j = 0, \dots, 3$$
It is then adjusted by an attenuation factor R, which represents the energy attenuation of the low frequency band due to the LP-based low band coding:
$$R = \frac{\sum_{n=0}^{N-1} \left(s^{LF}_{syn}(n)\right)^{2}}{\sum_{n=0}^{N-1} \left(s^{LF}_{ori}(n)\right)^{2}}$$
Finally, the time envelope is adjusted:
$$t_{rms}(j) \leftarrow \begin{cases} R^{1.5}\, t_{rms}(j), & \text{if } R < 0.5 \\ R\, t_{rms}(j), & \text{else if } R < 1 \\ t_{rms}(j), & \text{otherwise} \end{cases} \qquad j = 0, \dots, 3$$
Spectral envelopes are quantized with a multi-stage split VQ. The envelopes at even positions are quantized by split VQs. The prediction errors at odd positions are calculated with interpolation and quantized by another stage of the split VQ.
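The transient-frame computations can be sketched in NumPy. This is a minimal sketch, assuming the time envelope is the RMS of four 80-sample segments of the delayed high band signal and that R is the ratio of synthesized to original low band energy; the subsequent adjustment of the envelope by R is not reproduced. For the spectral envelopes, a uniform scalar quantizer (the hypothetical `quantize`) stands in for each split VQ stage, so only the even/odd interpolation structure is illustrated. All function names are illustrative, not EVS reference code.

```python
import numpy as np

SEGMENT = 80  # samples per time-envelope segment (the slide sums over n = 0..79)


def time_envelope(s_hb, n_segments: int = 4) -> np.ndarray:
    """RMS of each 80-sample segment of the delayed high band signal, j = 0..3."""
    seg = np.asarray(s_hb, dtype=float)[: n_segments * SEGMENT]
    return np.sqrt(np.mean(seg.reshape(n_segments, SEGMENT) ** 2, axis=1))


def lowband_attenuation(s_syn, s_ori) -> float:
    """Energy of the synthesized low band divided by that of the original."""
    e_ori = float(np.sum(np.square(s_ori)))
    return float(np.sum(np.square(s_syn))) / e_ori if e_ori > 0.0 else 1.0


def quantize(x, step: float) -> np.ndarray:
    """Uniform scalar quantizer: a hypothetical stand-in for one split VQ stage."""
    return np.round(np.asarray(x, dtype=float) / step) * step


def code_spectral_envelopes(env, step: float = 0.5) -> np.ndarray:
    """Two-stage coding: even positions directly, odd positions as interpolation residuals."""
    env = np.asarray(env, dtype=float)
    n = len(env)
    out = np.zeros(n)
    out[0::2] = quantize(env[0::2], step)              # stage 1: even positions
    odd = np.arange(1, n, 2)
    right = np.where(odd + 1 < n, odd + 1, odd - 1)    # last odd slot reuses its left neighbour
    pred = 0.5 * (out[odd - 1] + out[right])           # linear interpolation from quantized evens
    out[odd] = pred + quantize(env[odd] - pred, step)  # stage 2: prediction errors
    return out
```

Because the odd positions are predicted from already quantized even positions, the decoder can form the same prediction, and only the (usually small) interpolation residual has to be coded.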
Other techniques: For non-transient frames, energy control is applied in each sub-band to prevent unpleasant distortion. This distortion may occur because the characteristics of the original and the generated spectra do not match; energy control adjusts the energies depending on a comparison of the tonalities of the two spectra. The high frequency band excitation is generated with an adaptive normalization length: the more harmonic the high frequency band is, the longer the normalization length. The length L depends on the frame class and on n_h, the number of low frequency band sub-bands whose peak-to-average ratio is larger than a threshold:
$$L = \begin{cases} \max(32,\ 24 + 2 n_h), & \text{if mode} = \mathrm{HM} \\ 4 + 0.25\, n_h, & \text{if mode} = \mathrm{TS} \\ 8 + 0.5\, n_h, & \text{if mode} = \mathrm{NM} \text{ or } \mathrm{NS} \end{cases}$$
A pre-echo reduction is performed for non-transient frames to improve the quality of fricatives.
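The two ingredients of the adaptive normalization, counting n_h and normalizing the low band with a window of length L, can be sketched as follows. This is a minimal sketch under stated assumptions: the sub-band width (16 bins) and the peak-to-average threshold (4.0) are hypothetical because the slide gives no values, and "normalization" is taken to mean dividing each low band coefficient by its local mean magnitude over L bins. The mode-dependent mapping from n_h to L is not encoded here; L is simply passed in.

```python
import numpy as np


def count_peaky_subbands(lf_mag, subband_width: int = 16, threshold: float = 4.0) -> int:
    """n_h: low band sub-bands whose peak-to-average magnitude ratio exceeds threshold."""
    mag = np.abs(np.asarray(lf_mag, dtype=float))
    n_sb = len(mag) // subband_width
    bands = mag[: n_sb * subband_width].reshape(n_sb, subband_width)
    avg = bands.mean(axis=1)
    ratio = np.divide(bands.max(axis=1), avg, out=np.zeros_like(avg), where=avg > 0)
    return int(np.count_nonzero(ratio > threshold))


def normalize_excitation(lf_region, length: int) -> np.ndarray:
    """Divide each coefficient by the mean magnitude over a sliding window of `length` bins."""
    x = np.asarray(lf_region, dtype=float)
    length = int(min(max(length, 1), len(x)))  # keep np.convolve output the same size as x
    local_mean = np.convolve(np.abs(x), np.ones(length) / length, mode="same")
    return np.divide(x, local_mean, out=np.zeros_like(x), where=local_mean > 0)
```

A short window makes the local mean track individual peaks, flattening the excitation toward a noise-like spectrum; a long window averages across peaks, so harmonic structure survives normalization. That is presumably why more harmonic high bands are given longer normalization lengths.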
EVS implementation details: The proposed relaxed synchronization FD BWE scheme is applied in the EVS codec for WB at 13.2 kbps and for SWB at 13.2 kbps and 32 kbps. The BWE bit budget is 31 bits for SWB and 6 bits for WB. For WB coding, fewer spectral envelopes and no time envelopes are encoded, since the WB FD BWE covers only frequencies from 6 to 8 kHz. In the EVS codec, the delay parameters are Dt = 12 ms (the overall delay constraint minus the frame length), D1 = 9.6875 ms (encoder look-ahead plus encoder resampling delay) and D2 = 8.75 ms (overlap length). The resulting time misalignment between the high frequency band excitation and the high frequency band envelope is (D1 + D2) - Dt = 6.4375 ms.
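The delay figures can be checked with a short worked calculation. It assumes the 20 ms EVS frame length implied by "overall delay constraint minus the frame length", and the last line applies the (Dt - D2) input delay rule from the time alignment slide:

```latex
\begin{aligned}
D_t &= 32\ \text{ms} - 20\ \text{ms} = 12\ \text{ms} \\
D_1 + D_2 &= 9.6875\ \text{ms} + 8.75\ \text{ms} = 18.4375\ \text{ms} \\
\max(D_1, D_2) &= 9.6875\ \text{ms} \;\le\; D_t \;\le\; 18.4375\ \text{ms} \\
(D_1 + D_2) - D_t &= 18.4375\ \text{ms} - 12\ \text{ms} = 6.4375\ \text{ms} \\
D_t - D_2 &= 12\ \text{ms} - 8.75\ \text{ms} = 3.25\ \text{ms}
\end{aligned}
```

The chosen Dt therefore sits inside the admissible range, and the 6.4375 ms misalignment is exactly the delay saved relative to a fully synchronized (D1 + D2) design.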
BWE Switching Mechanism: In general, TD BWE on top of ACELP performs well for active speech segments, while FD BWE on top of GSC performs well for inactive and mixed/music segments. However, some mixed/music segments are better coded with ACELP and FD BWE. If the input signal is classified as music, or the low frequency band signal is classified as inactive, multi-mode FD BWE is used regardless of whether the low frequency band is coded with ACELP or GSC. Otherwise, if the input signal is classified as speech, TD BWE is used regardless of how the low frequency band has been coded. When the high frequency band signal is judged to contain inactive or mixed/music content, FD BWE is used as the high band coding technology.
[Figure: three panels (bandwidth versus technology) showing TD-BWE on top of ACELP and GSC coding, the switched combinations of TD-BWE and FD-BWE on top of ACELP and GSC coding, and FD-BWE on top of ACELP and GSC coding.]
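The switching rule can be condensed into a single decision function. This is a simplified sketch: the enums and `select_bwe` are hypothetical names, and treating MIXED content like MUSIC is our reading of the slide's last sentence rather than a documented EVS rule. The low band core is passed in only to make explicit that the decision ignores it.

```python
from enum import Enum


class Core(Enum):
    ACELP = "ACELP"
    GSC = "GSC"


class Content(Enum):
    SPEECH = "speech"
    MIXED = "mixed"
    MUSIC = "music"


def select_bwe(content: Content, lowband_inactive: bool, core: Core) -> str:
    """Pick the high band technology; `core` (ACELP or GSC) does not affect the result."""
    if lowband_inactive or content in (Content.MUSIC, Content.MIXED):
        return "FD-BWE"  # inactive, music or mixed content: multi-mode FD BWE
    return "TD-BWE"      # active speech: time domain BWE


print(select_bwe(Content.MUSIC, False, Core.ACELP))
```

Decoupling the high band choice from the core choice is what allows the ACELP + FD BWE combination for mixed/music segments that ACELP codes better than GSC.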
Quality Evaluation: MUSHRA test with 95% confidence intervals, 16 expert listeners, 16 mixed content items and 16 music items, EVS SWB at 13.2 kbps. Two variants were evaluated: a low delay (LD) FD BWE configured for an overall delay of Dt = 12 ms, and a high delay (HD) FD BWE configured for an overall delay of Dt = (D1 + D2) = 18.4375 ms. The low delay FD BWE is statistically equivalent to the high delay FD BWE.
Conclusions: A novel multi-mode FD BWE scheme with relaxed synchronization, optimized for inactive and mixed/music content signals, is presented. It forms part of the LP-based coding of the 3GPP EVS codec. High subjective quality and low algorithmic delay are achieved by relaxing the time alignment constraint between the high frequency band excitation and its envelope. A switching mechanism between the two BWE technologies (TD BWE and FD BWE) shows a performance advantage.