
CodecFlow: Efficient Bandwidth Extension via Conditional Flow Matching in Neural Codec Latent Space

Paper Link: arXiv

Abstract of the paper

Speech bandwidth extension (BWE) improves clarity and intelligibility by restoring or inferring appropriate high-frequency content for low-bandwidth speech. Existing methods often rely on spectrogram or waveform modeling, which can incur higher computational cost and offer limited high-frequency fidelity. Neural audio codecs offer compact latent representations that better preserve acoustic detail, yet accurately recovering high-resolution latent information remains challenging due to representation mismatch. We present CodecFlow, a neural codec-based BWE framework that performs efficient speech reconstruction in a compact latent space. CodecFlow employs a voicing-aware conditional flow converter on continuous codec embeddings and a structure-constrained residual vector quantizer to improve latent alignment stability. Optimized end-to-end, CodecFlow achieves strong spectral fidelity and enhanced perceptual quality on the 8 kHz to 16 kHz and 8 kHz to 44.1 kHz speech BWE tasks.
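To make the term "conditional flow matching" concrete, here is a minimal sketch of the generic technique on a single toy latent frame. It assumes a straight-line path between a noise sample and the target embedding, which is a common choice but is not confirmed by this page as CodecFlow's formulation. The names `cfm_training_pair`, `euler_sample`, and `oracle`, and the toy values, are hypothetical, and the `oracle` function stands in for a trained, conditioned network.

```python
import random

def cfm_training_pair(x0, x1, t):
    """Point on the straight path from x0 to x1 at time t, plus the target velocity."""
    x_t = [(1.0 - t) * a + t * b for a, b in zip(x0, x1)]
    velocity = [b - a for a, b in zip(x0, x1)]  # constant along a straight path
    return x_t, velocity

def mse(pred, target):
    """Mean squared error, the usual flow matching regression loss."""
    return sum((p - q) ** 2 for p, q in zip(pred, target)) / len(pred)

def euler_sample(velocity_fn, x0, num_steps=8):
    """Integrate dx/dt = velocity_fn(x, t) from t=0 to t=1 with fixed Euler steps."""
    x = list(x0)
    dt = 1.0 / num_steps
    for k in range(num_steps):
        v = velocity_fn(x, k * dt)
        x = [xi + dt * vi for xi, vi in zip(x, v)]
    return x

rng = random.Random(0)
hr_embedding = [0.5, -1.0, 2.0]  # made-up stand-in for one high-resolution latent frame
noise = [rng.gauss(0.0, 1.0) for _ in hr_embedding]

# Training side: sample a time, build the regression target.
t = rng.random()
x_t, target_v = cfm_training_pair(noise, hr_embedding, t)

# Assumption: an "oracle" field that returns the exact target velocity,
# in place of a learned network conditioned on the low-resolution input.
def oracle(x, t):
    return target_v

loss = mse(oracle(x_t, t), target_v)
generated = euler_sample(oracle, noise)
print(loss, generated)
```

With the oracle field, the loss is zero and Euler integration carries the noise sample onto the target frame. A learned model would approximate this field from the low-bandwidth conditioning instead of reading the target directly.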

Method

Voiced/unvoiced (V/UV) segmentation and cosine similarity between low-resolution (LR) and high-resolution (HR) embeddings over time. Upper: spectrogram with word boundaries and V/UV labels. Lower: frame-wise cosine similarity; orange dashed regions mark drops aligned with unvoiced segments, and the blue dashed line shows the global mean.
An overview of the proposed CodecFlow framework. (a) the overall model pipeline, (b) the architecture of the voicing extractor, and (c) the architecture of the flow prediction network from the flow embedding converter (FEC).
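The frame-wise cosine similarity shown in the first Method figure can be sketched as follows. This is an illustrative computation only: the embeddings are made-up toy vectors, not codec outputs, and the helper names (`cosine_similarity`, `framewise_similarity`, `below_mean_frames`) are hypothetical. Flagging frames below the global mean is one simple way to locate drops like those highlighted in the figure; the page does not say how the authors selected the marked regions.

```python
import math

def cosine_similarity(u, v, eps=1e-8):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / max(norm_u * norm_v, eps)

def framewise_similarity(lr_frames, hr_frames):
    """One similarity value per time frame; inputs are lists of frame vectors."""
    if len(lr_frames) != len(hr_frames):
        raise ValueError("LR and HR sequences must have the same number of frames")
    return [cosine_similarity(l, h) for l, h in zip(lr_frames, hr_frames)]

def below_mean_frames(similarities):
    """Global mean similarity and the indices of frames that fall below it."""
    mean = sum(similarities) / len(similarities)
    return mean, [i for i, s in enumerate(similarities) if s < mean]

# Toy data (made up for illustration): 4 frames of 3-dimensional embeddings.
lr = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [1.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
hr = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [1.0, -1.0, 0.0], [0.0, 0.0, 2.0]]

sims = framewise_similarity(lr, hr)
mean_sim, low_frames = below_mean_frames(sims)
print(sims, mean_sim, low_frames)
```

In this toy case only the third frame, whose LR and HR vectors are orthogonal, falls below the mean. Cosine similarity ignores magnitude, so the last frame still scores as a perfect match even though its HR vector is twice as long.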

Results

Spectrogram comparison for the 8 kHz to 16 kHz bandwidth extension task: the 8 kHz input, the 16 kHz ground truth (GT), and the outputs of Nu-Wave2, AP-BWE, FlowHigh, and the proposed CodecFlow.
Spectrogram comparison for the 8 kHz to 44.1 kHz bandwidth extension task: the 8 kHz input, the 44.1 kHz ground truth (GT), and the outputs of Nu-Wave2, AP-BWE, FlowHigh, and the proposed CodecFlow.

Baseline Comparison

Bandwidth Extension from 8 kHz to 16 kHz
Target
Input
Nu-Wave2
AP-BWE
Fre-Painter
FlowHigh
CodecFlow
Bandwidth Extension from 8 kHz to 44.1 kHz
Target
Input
Nu-Wave2
AP-BWE
Fre-Painter
FlowHigh
CodecFlow

Ablation Study

Bandwidth Extension from 8 kHz to 16 kHz
Target
Input
CodecReg
CFM-Conf
CFM-UConf
CodecFlow
Bandwidth Extension from 8 kHz to 44.1 kHz
Target
Input
CodecReg
CFM-Conf
CFM-UConf
CodecFlow