Features in MER

Analysis Content

  • Text-Content (Web-Documents, Social-Tags, Lyrics)
  • Audio-Content (Acoustic Features)

Audio-Content

P1. Measurement and time series analysis of emotion in music (Schubert 1999, Cited: 118)
An early book that introduces the measurement and time series analysis of emotion in music.

Types | Features
Loudness related | Dynamics
Pitch related | Mean pitch, Pitch range, Variation in pitch, Melodic contour, Register, Mode, Timbre, Harmony
Duration related | Tempo, Articulation, Note onset, Vibrato, Rhythm, Metre

P2. Automatic mood detection from acoustic music data (ISMIR 2003, Cited: 233)
“It was indicated that mode, intensity, timbre and rhythm are of great significance in arousing different music moods. However, mode is very difficult to obtain from acoustic data (Hinn, 1996). Therefore, only the rest three features are extracted and used in our mood detection system.”

Types | Features
Intensity | Root mean-square (RMS) level in decibels
Rhythm | Average strength, Average correlation peak, Average tempo
Timbre | Spectral shape features: Centroid, Bandwidth, Roll-off, Spectral flux; Spectral contrast features: Sub-band peak, Sub-band valley, Sub-band average
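
The intensity feature in the table above can be illustrated with a minimal sketch. It assumes a frame of samples normalized to the range [-1, 1] and a reference level of 1.0 (full scale); the function name `rms_level_db` and the -120 dB floor for silent frames are illustrative choices, not taken from the paper.

```python
import math

def rms_level_db(samples, ref=1.0, floor_db=-120.0):
    """Root mean-square level of one frame, in decibels relative to `ref`."""
    if not samples:
        raise ValueError("frame is empty")
    mean_square = sum(x * x for x in samples) / len(samples)
    rms = math.sqrt(mean_square)
    if rms <= 0.0:
        return floor_db  # avoid log10(0) on digital silence
    return max(floor_db, 20.0 * math.log10(rms / ref))

# A full-scale sine has an RMS of 1/sqrt(2), which is roughly 3 dB below full scale.
sine = [math.sin(2 * math.pi * 5 * n / 1000) for n in range(1000)]
level = rms_level_db(sine)
```

In practice the level would be computed per frame and then averaged or otherwise summarized over the clip.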

P3. Disambiguating Music Emotion Using Software Agents (ISMIR 2004, Cited: 118)
This paper confirmed the finding of P2 that emotional intensity is highly correlated with rhythm and timbre features.

Types | Features
Tempo | Beats per minute (BPM)
LLD | Low-level standard descriptors from the MPEG-7 audio standard (12 attributes)
Timbre | Spectral centroid, Spectral rolloff, Spectral flux, Spectral kurtosis
Intensity | Labels of intensity from 0 to 9, applied to instances by a human listener
Another 12 attributes | Generated by a genetic algorithm using the Sony Extractor Discovery System (EDS)

Tools recommended: Wavelet tools, MPEG-7 Low Level Descriptors, Sony Extractor Discovery System (EDS)

P4. Modeling emotional content of music using system identification (TSMC 2005, Cited: 104)

Types | Features
Dynamics | Loudness level, Short-term max. loudness
Mean pitch | Power spectrum centroid, Mean STFT centroid
Pitch variation | Mean STFT flux, Std. dev. STFT flux, Std. dev. STFT centroid
Timbre | Timbral width, Mean STFT rolloff, Std. dev. STFT rolloff, Sharpness (Zwicker and Fastl)
Harmony | Spectral dissonance (Hutchinson and Knopoff), Spectral dissonance (Sethares), Tonal dissonance (Hutchinson and Knopoff), Tonal dissonance (Sethares), Complex tonalness
Tempo | Beats per minute
Texture | Multiplicity

Tools recommended: PsySound, Marsyas

P5. Music Emotion Classification: A Fuzzy Approach (ACM MM 2006, Cited: 142)
This paper used PsySound2 to extract music features and chose 15 features as recommended in P1. Feature selection “begins with all 15 features and then greedily removes the worst feature sequentially until no more accuracy improvement can be obtained.” The same approach is used in Detecting and Classifying Emotion in Popular Music (JCIS 2006, Cited: 22).

Types | Features
Loudness related | Dynamics
Pitch related | Mean pitch, Pitch range, Variation in pitch, Melodic contour, Register, Mode, Timbre, Harmony
Duration related | Tempo, Articulation, Note onset, Vibrato, Rhythm, Metre

Tools recommended: PsySound2
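
The greedy removal procedure quoted for P5 can be sketched as backward elimination. This is a minimal sketch under stated assumptions, not the authors' implementation: `score` is a hypothetical callable that returns an accuracy estimate for a subset of feature names (for example, cross-validated accuracy), and the toy scorer and feature names below are invented purely to make the example runnable.

```python
def backward_elimination(features, score):
    """Start from all features and greedily drop one at a time
    for as long as the accuracy estimate strictly improves."""
    selected = list(features)
    best = score(tuple(selected))
    while len(selected) > 1:
        # Score every subset obtained by removing a single feature.
        candidates = []
        for f in selected:
            subset = tuple(x for x in selected if x != f)
            candidates.append((score(subset), f))
        cand_score, worst = max(candidates, key=lambda c: c[0])
        if cand_score <= best:
            break  # no removal improves accuracy any further
        selected.remove(worst)
        best = cand_score
    return selected, best

# Toy scorer (an assumption for illustration): two features help, the rest hurt.
USEFUL = {"tempo", "loudness"}

def toy_score(subset):
    s = set(subset)
    return 0.5 + 0.2 * len(s & USEFUL) - 0.05 * len(s - USEFUL)

kept, accuracy = backward_elimination(
    ["tempo", "loudness", "noise_a", "noise_b"], toy_score
)
```

With the toy scorer, the two noise features are removed one after the other and the loop stops once dropping any remaining feature would lower the score.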

P6. Multi-Label Classification of Music into Emotions (ISMIR 2008, Cited: 529)

Types | Features
Rhythm | Amplitudes and BPMs (beats per minute) of the two highest histogram peaks, and the high-to-low ratio of their BPMs; sums of the histogram bins in 40-90, 90-140 and 140-250 BPM respectively -> 8
Timbre | First 13 MFCCs, spectral centroid, spectral rolloff and spectral flux per frame -> 16; then the mean, std, mean std and std std over all frames -> 64

Tools recommended: Marsyas
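
The eight rhythm features listed for P6 can be sketched as follows. This is an illustration rather than Marsyas code, and it makes several assumptions: the beat histogram is given as a dict from integer BPM to amplitude, a peak is a bin strictly higher than its left neighbour and at least as high as its right neighbour, the ratio is the higher BPM divided by the lower one, and the three bands are [40, 90), [90, 140) and [140, 250].

```python
def rhythm_features(hist):
    """Return 8 rhythm features from a beat histogram {bpm: amplitude}."""
    def amp(bpm):
        return hist.get(bpm, 0.0)

    # Local maxima of the histogram, strongest first.
    peaks = [b for b in hist if amp(b) > amp(b - 1) and amp(b) >= amp(b + 1)]
    peaks.sort(key=amp, reverse=True)
    if len(peaks) < 2:
        raise ValueError("need at least two histogram peaks")
    p1, p2 = peaks[0], peaks[1]

    def band(lo, hi):
        # Sum of the histogram bins with lo <= bpm < hi.
        return sum(a for b, a in hist.items() if lo <= b < hi)

    return [
        amp(p1), amp(p2),            # amplitudes of the two highest peaks
        float(p1), float(p2),        # their BPMs
        max(p1, p2) / min(p1, p2),   # high-to-low BPM ratio
        band(40, 90), band(90, 140), band(140, 251),
    ]

# Made-up histogram with a main peak at 120 BPM and a secondary one at 240 BPM.
hist = {60: 0.2, 119: 0.3, 120: 1.0, 121: 0.4, 239: 0.1, 240: 0.6, 241: 0.2}
feats = rhythm_features(hist)
```

The result has 2 + 2 + 1 + 3 = 8 entries, matching the count given in the table.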

P7. A regression approach to music emotion recognition (TASLP 2008, Cited: 319)
“extract musical features and construct a 114-dimension feature space”

Types | Features
PsySound | Loudness, Level, Dissonance, Pitch -> 44
Marsyas | Spectral centroid, Spectral rolloff, Spectral flux, Time-domain zero-crossing and Mel-frequency cepstral coefficients (MFCC) -> 19; 6 rhythmic content features (by beat and tempo detection); 5 pitch content features (by multi-pitch detection) -> 30
Spectral contrast | Captures the relative spectral information in each subband and uses the spectral peak, spectral valley, and their dynamics as features -> 12
DWCH | Histograms of Daubechies wavelet coefficients at different frequency subbands with different resolutions -> 28

Tools recommended: PsySound, Marsyas, Matlab
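
The 114 dimensions quoted for P7 are the sum of the four blocks in the table (44 + 30 + 12 + 28). A minimal sketch of that bookkeeping follows; `BLOCK_SIZES` and `concat_features` are hypothetical names for illustration and are not part of PsySound or Marsyas, and the zero vectors stand in for real extractor output.

```python
# Block sizes as listed in the table for P7.
BLOCK_SIZES = {
    "PsySound": 44,
    "Marsyas": 19 + 6 + 5,   # spectral/MFCC + rhythmic content + pitch content
    "Spectral contrast": 12,
    "DWCH": 28,
}

def concat_features(blocks):
    """Concatenate per-tool feature lists into one vector, checking sizes."""
    vector = []
    for name, size in BLOCK_SIZES.items():
        values = blocks[name]
        if len(values) != size:
            raise ValueError(f"{name}: expected {size} values, got {len(values)}")
        vector.extend(values)
    return vector

# Placeholder zeros stand in for real extractor output.
dummy = {name: [0.0] * size for name, size in BLOCK_SIZES.items()}
vector = concat_features(dummy)
total = sum(BLOCK_SIZES.values())
```

Checking the block sizes at concatenation time catches a misconfigured extractor before the vectors reach the regressor.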

P8. Music emotion recognition: A state of the art review (ISMIR 2010, Cited: 268)
“An overview of the most common acoustic features used for mood recognition”

Types | Features
Dynamics | RMS-Energy
Timbre | MFCCs, Spectral-Shape, Spectral-Contrast
Harmony | Roughness, Harmonic-Change, Key-Clarity, Majorness
Register | Chromagram, Chroma-Centroid and Deviation
Rhythm | Rhythm-Strength, Regularity, Tempo, Beat-Histograms
Articulation | Event-Density, Attack-Slope, Attack-Time

Tools recommended: MIRtoolbox

P9. Machine recognition of music emotion: A review (TIST 2012, Cited: 139)
“briefly review some features that have been utilized in MER”

Types | Features
Energy | Audio power, Total loudness, Specific loudness sensation coefficients (SONE)
Rhythm | Rhythm strength, Rhythm regularity, Rhythm clarity, Average onset frequency, Average tempo
Melody | Salient pitch, Chromagram center, Key clarity, Mode, Harmonic change
Timbre | MFCC

Tools recommended: MA Toolbox, MIRtoolbox, Marsyas

P10. Developing a benchmark for emotional analysis of music (PLoS ONE 2017)
This is an interesting competitive workshop.
“Performance of the different feature-sets on valence, development and evaluation-sets of 2015, 20 fold cross-validation”

“Performance of the different feature-sets on arousal, development and evaluation-sets of 2015, 20 fold cross-validation”

Tools recommended: OpenSMILE

Findings

  1. Every lab has its own emotion feature set for music. The most commonly used features are:
    MFCCs, Loudness, Spectral features (centroid, flux, rolloff, flatness), Timbre, Rhythm, Pitch, Harmony, Zero crossing rate

  2. Acoustic feature extraction is best done with several tools, which gives a broad mix from which to select the best features:
    Marsyas, MIRtoolbox, PsySound, OpenSMILE
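
Several of the commonly used low-level features in finding 1 are simple to compute. The sketch below uses plain Python and assumes a magnitude spectrum `mags` with matching bin frequencies `freqs` is already available (for example from an STFT). Exact definitions differ between toolboxes, so the 85% rolloff fraction, the unnormalized flux and the function names are assumptions for illustration, not the formulas of any particular tool.

```python
import math

def spectral_centroid(mags, freqs):
    """Magnitude-weighted mean frequency."""
    total = sum(mags)
    return sum(f * m for f, m in zip(freqs, mags)) / total if total > 0 else 0.0

def spectral_rolloff(mags, freqs, fraction=0.85):
    """Lowest bin frequency below which `fraction` of the magnitude lies."""
    target = fraction * sum(mags)
    running = 0.0
    for f, m in zip(freqs, mags):
        running += m
        if running >= target:
            return f
    return freqs[-1]

def spectral_flatness(mags, eps=1e-12):
    """Geometric mean divided by arithmetic mean of the magnitudes."""
    log_mean = sum(math.log(m + eps) for m in mags) / len(mags)
    return math.exp(log_mean) / (sum(mags) / len(mags) + eps)

def spectral_flux(prev_mags, mags):
    """Squared difference between two successive magnitude spectra."""
    return sum((a - b) ** 2 for a, b in zip(mags, prev_mags))

def zero_crossing_rate(samples):
    """Fraction of adjacent sample pairs whose signs differ."""
    crossings = sum(
        1 for a, b in zip(samples, samples[1:]) if (a >= 0) != (b >= 0)
    )
    return crossings / (len(samples) - 1)

freqs = [100.0, 200.0, 300.0, 400.0]
flat = [1.0, 1.0, 1.0, 1.0]      # white-noise-like frame
peaky = [0.0, 0.0, 4.0, 0.0]     # single dominant partial

centroid_flat = spectral_centroid(flat, freqs)
centroid_peaky = spectral_centroid(peaky, freqs)
rolloff_flat = spectral_rolloff(flat, freqs)
flatness_flat = spectral_flatness(flat)
flux = spectral_flux(flat, peaky)
zcr = zero_crossing_rate([1.0, -1.0, 1.0, -1.0])
```

Frame-level values like these are usually summarized over a clip (for example by mean and standard deviation) before being used as MER features.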