2013 ComParE Feature Set

This blog will introduce a well-evolved feature set for automatic recognition of audio emotion, ie. 2013 ComParE Feature Set[1]. The ComParE feature set contains 6373 features. It consists of some generic acoustic emotion descriptors and their statistical functionals.

65 Acoustic low-level descriptors(LLDs)

The low-level descriptors cover a broad set of descriptors from the fields of speech processing, Music Information Retrieval, and general sound analysis. The set includes energy, spectral, and voicing related low-level descriptors (LLDs) including logarithmic harmonic-to-noise ratio (HNR), spectral harmonicity, and psychoacoustic spectral sharpness. Details are shown below.[2]

  • Sum of auditory spectrum (loudness)
  • Sum of RASTA-style filtered auditory spectrum
  • RMS energy
  • zero-crossing rate

55 SPECTRAL LLD

  • RASTA-style auditory spectrum, bands 1–26 (0–8 kHz)
  • MFCC 1–14
  • Spectral energy 250–650 Hz, 1 k–4 kHz
  • Spectral roll off point 0.25, 0.50, 0.75, 0.90
  • Spectral flux, centroid, entropy, slope
  • Psychoacoustic sharpness, harmonicity
  • Spectral variance, skewness, kurtosis
  • F 0 (SHS and viterbi smoothing)
  • Prob. of voice
  • Log. HNR, Jitter (local, delta), Shimmer (local)

Statistical functionals

Statistical functionals include mean, moments, quartiles, 1- and 99-percentiles, as well as contour related measurements such as (relative) rise and fall times, amplitudes and standard deviations of local maxima (‘peaks’), and linear and quadratic regression coefficients.

FUNCTIONALS APPLIED TO LLD/deta_LLD

  • Quartiles 1–3, 3 inter-quartile ranges
  • 1% Percentile (=min), 99% percentile (=max)
  • Percentile range 1–99%
  • Position of min/max, range (max min)
  • Arithmetic mean, root quadratic mean
  • Contour centroid, flatness
  • Standard deviation, skewness, kurtosis
  • Rel. duration LLD is above 25/50/75/90% range
  • Rel. duration LLD is rising
  • Rel. duration LLD has positive curvature
  • Gain of linear prediction (LP), LP coefficients 1–5
  • Mean, max, min, SD of segment length

FUNCTIONALS APPLIED TO LLD ONLY

  • Mean value of peaks
  • Mean value of peaks – arithmetic mean
  • Mean/SD of inter peak distances
  • Amplitude mean of peaks, of minima
  • Amplitude range of peaks
  • Mean/SD of rising/falling slopes
  • Linear regression slope, offset, quadratic error
  • Quadratic regression a, b, offset, quadratic error
  • Percentage of non-zero frames

Reference

[1] Schuller B, Steidl S, Batliner A, et al. The INTERSPEECH 2013 computational paralinguistics challenge: social signals, conflict, emotion, autism[J]. 2013.
[2] Weninger F, Eyben F, Schuller B W, et al. On the acoustics of emotion in audio: what speech, music, and sound have in common[J]. 2013.