Features in MER

Posted on 2017-04-27 Edited on 2021-10-21

Analysis Content

Text-Content (Web-Documents, Social-Tags, Lyrics)
Audio-Content (Acoustic Features)

Audio-Content

P1. Measurement and time series analysis of emotion in music(Schubert1999 Cited: 118’)
A very old book introducing measurement and time series analysis of emotion in music.

Types	Features
Loudness related	Dynamics
Pitch related	Mean pitch, Pitch range, Variation in pitch, Melodic contour, Register, Mode, Timbre, Harmony
Duration related	Tempo, Articulation, Note onset, Vibrato, Rhythm, Metre

P2. Automatic mood detection from acoustic music data(ISMIR2003 Cited: 233’)
“It was indicated that mode, intensity, timbre and rhythm are of great significance in arousing different music moods. However, mode is very difficult to obtain from acoustic data (Hinn, 1996). Therefore, only the rest three features are extracted and used in our mood detection system.”

Types	Features
Intensity	Root mean-square (RMS) level in decibels
Rhythm	Average strength, Average correlation peak, Average tempo
Timbre	Spectral Shape Features: Centroid, Bandwidth, Roll off, Spectral Flux; Spectral Contrast Features: Sub-band Peak, Sub-band Valley, Sub-band Average
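
As a rough illustration of the intensity feature in the table above, the sketch below computes a per-frame RMS level in decibels. The frame length, hop size, the `eps` floor and the reference level (full scale = 1.0) are assumptions made for the example, not values taken from the paper.

```python
import math

def frame_signal(samples, frame_len, hop):
    """Split a sample list into overlapping frames (incomplete tail is dropped)."""
    return [samples[i:i + frame_len]
            for i in range(0, len(samples) - frame_len + 1, hop)]

def rms_db(frame, eps=1e-12):
    """RMS level of one frame in dB relative to full scale (amplitude 1.0)."""
    mean_square = sum(x * x for x in frame) / len(frame)
    # 10*log10 of the mean square equals 20*log10 of the RMS value.
    return 10.0 * math.log10(mean_square + eps)

# Toy input: one second of a 440 Hz sine at amplitude 0.5, sampled at 8 kHz.
sr = 8000
signal = [0.5 * math.sin(2 * math.pi * 440 * n / sr) for n in range(sr)]
levels = [rms_db(f) for f in frame_signal(signal, frame_len=1024, hop=512)]
mean_level = sum(levels) / len(levels)  # close to 20*log10(0.5 / sqrt(2))
```

Averaging the per-frame levels gives one intensity value per clip; a real system might also keep the variance across frames.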

P3. Disambiguating Music Emotion Using Software Agents(ISMIR2004 Cited: 118’)
This paper confirmed the results of P2, which found that emotional intensity was highly correlated with rhythm and timbre features.

Types	Features
Tempo	Beats per Minute (BPM)
LLD	Low-level standard descriptors from the MPEG-7 audio standard (12 attributes)
Timbre	Spectral centroid, Spectral rolloff, Spectral flux, Spectral kurtosis
Intensity	Labels of intensity from 0 to 9 were applied to instances by a human listener
Another 12 attributes	Generated by a genetic algorithm using the Sony Extractor Discovery System (EDS)

Tools recommended: Wavelet tools, MPEG-7 Low Level Descriptors, Sony Extractor Discovery System (EDS)
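
Three of the timbre descriptors listed for P3 (spectral centroid, rolloff and flux, which also appear among the spectral shape features of P2) can be sketched in plain Python. Definitions vary between toolkits: the 85% rolloff fraction, the magnitude weighting and the sum-normalised flux used here are assumptions, and the naive DFT only stands in for a real FFT.

```python
import cmath
import math

def magnitude_spectrum(frame):
    """Naive DFT magnitudes for bins 0..N/2 (fine for short frames only)."""
    n = len(frame)
    return [abs(sum(x * cmath.exp(-2j * math.pi * k * t / n)
                    for t, x in enumerate(frame)))
            for k in range(n // 2 + 1)]

def spectral_centroid(mag, sr, n_fft):
    """Magnitude-weighted mean frequency in Hz."""
    total = sum(mag)
    if total == 0:
        return 0.0
    return sum(k * sr / n_fft * m for k, m in enumerate(mag)) / total

def spectral_rolloff(mag, sr, n_fft, fraction=0.85):
    """Lowest bin frequency below which `fraction` of the magnitude lies."""
    threshold = fraction * sum(mag)
    running = 0.0
    for k, m in enumerate(mag):
        running += m
        if running >= threshold:
            return k * sr / n_fft
    return (len(mag) - 1) * sr / n_fft

def spectral_flux(mag, prev_mag):
    """Squared difference between two sum-normalised spectra."""
    def normalise(values):
        total = sum(values) or 1.0
        return [v / total for v in values]
    return sum((a - b) ** 2
               for a, b in zip(normalise(mag), normalise(prev_mag)))

# Toy frames: pure tones that fall exactly on DFT bins (1000 Hz and 2000 Hz).
sr, n = 8000, 64
tone_1k = [math.sin(2 * math.pi * 1000 * t / sr) for t in range(n)]
tone_2k = [math.sin(2 * math.pi * 2000 * t / sr) for t in range(n)]
mag_1k = magnitude_spectrum(tone_1k)
mag_2k = magnitude_spectrum(tone_2k)
centroid = spectral_centroid(mag_1k, sr, n)
rolloff = spectral_rolloff(mag_1k, sr, n)
flux = spectral_flux(mag_2k, mag_1k)
```

For a single tone the centroid and rolloff both sit at the tone's frequency, and the flux between the two tones is large because the energy moves to a different bin.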

P4. Modeling emotional content of music using system identification(TSMC2005 Cited: 104’)

Types	Features
Dynamics	Loudness level, Short term max. loudness
Mean Pitch	Power spectrum centroid, Mean STFT centroid
Pitch Variation	Mean STFT Flux, Std dev. STFT flux, Std dev. STFT centroid
Timbre	Timbral Width, Mean STFT rolloff, Std. dev. STFT rolloff, Sharpness(Zwicker and Fastl)
Harmony	Spectral dissonance(Hutchinson and Knopoff), Spectral dissonance(Sethares), Tonal dissonance(Hutchinson and Knopoff), Tonal dissonance(Sethares), Complex tonalness
Tempo	Beats Per Minute
Texture	Multiplicity

Tools recommended: PsySound, Marsyas

P5. Music Emotion Classification: A Fuzzy Approach(ACM MM2006 Cited: 142’)
This paper used PsySound2 to extract music features and chose 15 features as recommended in P1. Feature selection “begins with all 15 features and then greedily removes the worst feature sequentially until no more accuracy improvement can be obtained.” The same applies to Detecting and Classifying Emotion in Popular Music(JCIS2006 Cited: 22’).

Types	Features
Loudness related	Dynamics
Pitch related	Mean pitch, Pitch range, Variation in pitch, Melodic contour, Register, Mode, Timbre, Harmony
Duration related	Tempo, Articulation, Note onset, Vibrato, Rhythm, Metre

Tools recommended: PsySound2
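
The greedy procedure quoted in P5 is a form of backward feature elimination. A minimal sketch, assuming a hypothetical `evaluate(subset)` callback that returns an accuracy for a list of feature names; the toy scorer and the feature names below are invented purely for illustration and are not the paper's classifier.

```python
def backward_elimination(features, evaluate):
    """Greedily drop one feature at a time while accuracy keeps improving."""
    selected = list(features)
    best_score = evaluate(selected)
    while len(selected) > 1:
        # Score every subset obtained by removing exactly one feature.
        candidates = [(evaluate([f for f in selected if f != drop]), drop)
                      for drop in selected]
        score, worst = max(candidates, key=lambda c: c[0])
        if score <= best_score:
            break  # no single removal improves accuracy any further
        selected.remove(worst)
        best_score = score
    return selected, best_score

# Hypothetical scorer: useful features help, noisy ones hurt slightly.
USEFUL = {"tempo", "mode", "dynamics"}

def toy_accuracy(subset):
    useful = sum(1 for f in subset if f in USEFUL)
    noisy = len(subset) - useful
    return 0.5 + 0.1 * useful - 0.02 * noisy

kept, accuracy = backward_elimination(
    ["tempo", "mode", "dynamics", "noise_a", "noise_b"], toy_accuracy)
```

In practice `evaluate` would wrap a cross-validated classifier, which makes each round cost one training run per remaining feature.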

P6. Multi-Label Classification of Music into Emotions(ISMIR2008 Cited: 529’)

Types	Features
Rhythm	The two highest peaks of the beat histogram: their amplitudes, their BPMs (beats per minute) and the high-to-low ratio of their BPMs; plus the sums of the histogram bins between 40-90, 90-140 and 140-250 BPM respectively. -> 8
Timbre	The first 13 MFCCs, spectral centroid, spectral rolloff and spectral flux per frame -> 16 -> the mean, std, mean std and std std over all frames -> 64
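
The eight rhythm features in the table above can be sketched from a beat histogram as follows. The histogram representation (a dict from integer BPM to strength), the simple local-maximum peak picker and the half-open band boundaries are assumptions for the example; Marsyas itself may compute them differently.

```python
def rhythm_features(hist):
    """Return 8 rhythm features from a beat histogram {BPM: strength}."""
    bpms = sorted(hist)
    # Interior strict local maxima only; a deliberately simple peak picker.
    peaks = [b for prev, b, nxt in zip(bpms, bpms[1:], bpms[2:])
             if hist[b] > hist[prev] and hist[b] > hist[nxt]]
    if len(peaks) < 2:
        raise ValueError("need at least two histogram peaks")
    first, second = sorted(peaks, key=lambda b: hist[b], reverse=True)[:2]
    ratio = max(first, second) / min(first, second)  # high-to-low BPM ratio

    def band_sum(lo, hi):
        # Sum of bins with lo <= BPM < hi.
        return sum(v for b, v in hist.items() if lo <= b < hi)

    return [hist[first], hist[second], first, second, ratio,
            band_sum(40, 90), band_sum(90, 140), band_sum(140, 251)]

# Toy histogram: a strong peak at 120 BPM and a weaker one at 60 BPM.
hist = {b: 0.0 for b in range(40, 251)}
for offset, weight in [(-2, 0.25), (-1, 0.5), (0, 1.0), (1, 0.5), (2, 0.25)]:
    hist[120 + offset] += weight
    hist[60 + offset] += weight / 2
features = rhythm_features(hist)
```

The result holds two amplitudes, two BPM values, one ratio and three band sums, which matches the count of 8.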

Tools recommended: Marsyas tool
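
One plausible reading of the "16 -> 64" step in the timbre row of P6 is a two-level summary: the mean and std of each per-frame feature inside short windows, then the mean and std of those window statistics over the whole clip. The sketch below follows that reading; the non-overlapping windows, the window length and the output ordering are assumptions, not details confirmed by the paper.

```python
from statistics import fmean, pstdev

def summarise(frames, window=4):
    """Collapse per-frame feature vectors into 4 statistics per feature.

    Output order: mean of window means, std of window means,
    mean of window stds, std of window stds.
    """
    win_means, win_stds = [], []
    for start in range(0, len(frames) - window + 1, window):
        columns = list(zip(*frames[start:start + window]))
        win_means.append([fmean(c) for c in columns])
        win_stds.append([pstdev(c) for c in columns])
    summary = []
    for stats in (win_means, win_stds):
        columns = list(zip(*stats))
        summary += [fmean(c) for c in columns]
        summary += [pstdev(c) for c in columns]
    return summary

# Toy input: 8 frames of 16 features each (e.g. 13 MFCCs + 3 spectral values).
frames = [[float(t + j) for j in range(16)] for t in range(8)]
clip_vector = summarise(frames, window=4)
```

With 16 per-frame features this yields 16 x 4 = 64 values per clip, regardless of the clip length.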

P7. A regression approach to music emotion recognition(TASLP2008 Cited: 319’)
“extract musical features and construct a 114-dimension feature space”

Types	Features
PsySound	Loudness, Level, Dissonance, Pitch -> 44
Marsyas	Spectral centroid, Spectral rolloff, Spectral flux, Time domain zero-crossing and Mel-frequency cepstral coefficient (MFCC) -> 19, 6 rhythmic content features (by beat and tempo detection), 5 pitch content features (by multi-pitch detection) -> 30
Spectral contrast	Capture the relative spectral information in each subband and utilize the spectral peak, spectral valley, and their dynamics as features -> 12
DWCH	Histograms of Daubechies wavelet coefficients at different frequency subbands with different resolutions -> 28

Tools recommended: PsySound, Marsyas, Matlab
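
The arrows in the P7 table give the number of features each source contributes, and these add up to the 114 dimensions quoted from the paper. A small bookkeeping sketch, with hypothetical block names and a made-up `concat_blocks` helper, shows how the blocks might be joined into one vector while checking the sizes:

```python
# Block sizes as listed in the P7 table.
BLOCK_SIZES = {
    "psysound": 44,
    "marsyas": 19 + 6 + 5,      # spectral/MFCC + rhythmic + pitch content
    "spectral_contrast": 12,
    "dwch": 28,
}

def concat_blocks(blocks):
    """Concatenate per-tool feature lists in a fixed order, checking lengths."""
    vector = []
    for name, size in BLOCK_SIZES.items():
        if len(blocks[name]) != size:
            raise ValueError(
                f"{name}: expected {size} values, got {len(blocks[name])}")
        vector.extend(blocks[name])
    return vector

# Placeholder values stand in for real tool output.
dummy = {name: [0.0] * size for name, size in BLOCK_SIZES.items()}
vector = concat_blocks(dummy)
```

Keeping the order fixed matters because a regressor trained on one layout cannot be applied to vectors assembled in another.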

P8. Music emotion recognition: A state of the art review(ISMIR2010 Cited: 268’)
“An overview of the most common acoustic features used for mood recognition”

Types	Features
Dynamics	RMS-Energy
Timbre	MFCCs, Spectral-Shape, Spectral-Contrast
Harmony	Roughness, Harmonic-Change, Key-Clarity, Majorness
Register	Chromagram, Chroma-Centroid and Deviation
Rhythm	Rhythm-Strength, Regularity, Tempo, Beat-Histograms
Articulation	Event-Density, Attack-Slope, Attack-Time

Tools recommended: MIRtoolbox

P9. Machine recognition of music emotion: A review(TIST2012 Cited: 139’)
“briefly review some features that have been utilized in MER”

Types	Features
Energy	Audio power, Total loudness, Specific loudness sensation coefficients(SONE)
Rhythm	Rhythm Strength, Rhythm Regularity, Rhythm Clarity, Average onset frequency, Average tempo
Melody	Salient Pitch, Chromagram center, Key clarity, Mode, Harmonic change
Timbre	MFCC

Tools recommended: MA Toolbox, MIRtoolbox, Marsyas tool

P10. Developing a benchmark for emotional analysis of music(PLoS ONE 2017)
This is an interesting competitive workshop.
“Performance of the different feature-sets on valence, development and evaluation-sets of 2015, 20 fold cross-validation”

“Performance of the different feature-sets on arousal, development and evaluation-sets of 2015, 20 fold cross-validation”

Tools recommended: OpenSMILE

Findings

Every lab uses its own set of emotion-related music features. The most commonly used features are:
MFCCs, Loudness, Spectral features (centroid, flux, rolloff, flatness), Timbre, Rhythm, Pitch, Harmony, Zero crossing rate
It is better to extract acoustic features with several tools, which gives a broad mix from which to select the best features:
Marsyas, MIRtoolbox, PsySound, OpenSMILE
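
Two of the commonly used features listed above, zero crossing rate and spectral flatness, are simple enough to sketch directly. The definitions below are the usual textbook ones (sign changes per sample pair; geometric over arithmetic mean of the power spectrum); the `eps` floor is an assumption, and the tools named above may normalise differently.

```python
import math

def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ."""
    crossings = sum(1 for a, b in zip(frame, frame[1:])
                    if (a >= 0) != (b >= 0))
    return crossings / (len(frame) - 1)

def spectral_flatness(power, eps=1e-12):
    """Geometric mean divided by arithmetic mean of a power spectrum."""
    log_mean = sum(math.log(p + eps) for p in power) / len(power)
    return math.exp(log_mean) / (sum(power) / len(power) + eps)

noisy_zcr = zero_crossing_rate([1.0, -1.0, 1.0, -1.0])   # alternating signs
steady_zcr = zero_crossing_rate([1.0, 1.0, 1.0])         # no sign change
flat = spectral_flatness([2.0, 2.0, 2.0, 2.0])           # noise-like spectrum
peaky = spectral_flatness([0.0, 0.0, 8.0, 0.0])          # tone-like spectrum
```

Flatness is near 1 for noise-like spectra and near 0 for tonal ones, so it complements the centroid and rolloff when separating bright noisy textures from bright tonal ones.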