On this page you can find the results presented in the article as well as additional ones that explore the performance of the system according to music genre, type of headphones, environment etc. The models evaluated are:
- Baseline model by Estreder et al.
- Proposed model DPNMM without any power constraint
- DPNMM with $\Delta \mathcal{P}_{max} = $ 2, 1, 0.5 dBA
Metrics
Two objective metrics are considered to perform the evaluation of the models. We compute a mean Noise-to-Mask Ratio (NMR) per audio sample of the test set, only selecting the Bark bands where the initial music masking threshold is below the noise level :
\[\text{NMR} = \frac{1}{M} \sum_{n, \nu} (1-m_\nu(n)) | P_{dB}^{noise}(n,\nu) - \hat{T}_{dB}(n,\nu) |,\]where $M = \sum_{n, \nu} (1-m_\nu(n))$ with $m_\nu$ a mask such that $m_\nu(n) = 0$ if the initial threshold is below the noise, and $m_\nu(n) = 1$ otherwise. The obtained NMR is compared to the initial NMR with the unprocessed music to evaluate how much the system can improve the masking effect on the bands where it is required. However, the system may as well induce power variations in the other bands. To evaluate this effect we also compute a mean Global Level Difference (GLD) :
\[\text{GLD} = \frac{1}{N} \sum | \hat{\mathcal{P}}_{dBA}^{music}(n) - \mathcal{P}_{dBA}^{music}(n) | .\]Both metrics are computed by frequency ranges: broadband, first third of Bark bands (low), second third (medium), and last third (high).
Results
General results (presented in the article)


In terms of NMR, all three versions of PDNMM outperform Estreder’s model on the broadband metric statistically significantly, except DPNMM with $\Delta \mathcal{P}_{max} =$ 0.5 dBA.
The version of the neural model without any power constraint performs the best compared to the baseline (p-value = $7 \cdot 10^{-8}$). Applying a power constraint results in a decrease in performance all the more important the stricter the constraint (low constraint threshold value) particularly in the low-frequency range to the point of becoming less performant than Estreder’s PEQ. This outcome is expected, given the relatively low weight of high frequencies in the power measurement. When the power constraint is strict, the low and mid frequencies are more significantly affected. This trend is confirmed when examining the GLD. Without a power constraint, the neural model achieves excellent NMR performance by significantly amplifying the musical signal compared to Estreder’s model. Adding the power constraint has then a clear beneficial effect on the GLD measure, thus achieving significantly better results compared to the baseline model, except at high frequencies where the model is less affected by the constraint. In particular, both neural models with constraints $\Delta \mathcal{P}_{max} =$ 2, 1 dBA achieve a better NMR than Estreder’s model (p-value of $10^{-6}$ and $0.01$) and a better broadband GLD (p-value of $1.5 \cdot 10^{-4}$ and $3.3 \cdot 10^{-9}$).
Earbuds impact
The noises in the test set are filtered with the frequency responses of 3 models of earbuds to reproduce their respective passive attenuations :
- Bose headphones QuietComfort
- Sony earbuds WF-1000XM4 with sound isolating sleeves
- Apple Airpods with smooth tips
The Bose and Sony headphones act as low-pass filters while the Airpods have a much flatter effect.
NMR
GLD
The airpods clearly reduce less the noise than the two other headphones models. Therefore the initial NMR is greater in this case and the performances for all models are reduced. This can be explained by a closer look at examples of results: the system concentrates primarily on the Bark bands where the NMR is highest (generally mid and high frequencies in the case of airpods), even if this means leaving out other bands.
Environments
The noise set is composed of samples from defined environments :
- Urban
- Transportation (train / plane / boat)
- Cocktail party (restaurant / café)
- Construction site
- Beach
- Indoor office
NMR
GLD
The performance obtained is broadly as expected, with poorer results in terms of both NMR and GLD in the noisiest environments (all the more so as the music level is contained in the [45, 100] dBA range).
SNR
We can also view the metrics by SNR band (in dB) between unprocessed music and noise.