Features in MER
Analysis Content
- Text-Content (Web-Documents, Social-Tags, Lyrics)
- Audio-Content (Acoustic Features)
Audio-Content
P1. Measurement and time series analysis of emotion in music(Schubert1999 Cited: 118’)
An early work (1999) introducing the measurement and time series analysis of emotion in music.
Types | Features |
---|---|
Loudness related | Dynamics |
Pitch related | Mean pitch, Pitch range, Variation in pitch, Melodic contour, Register, Mode, Timbre, Harmony |
Duration related | Tempo, Articulation, Note onset, Vibrato, Rhythm, Metre |
P2. Automatic mood detection from acoustic music data(ISMIR2003 Cited: 233’)
“It was indicated that mode, intensity, timbre and rhythm are of great significance in arousing different music moods. However, mode is very difficult to obtain from acoustic data (Hinn, 1996). Therefore, only the rest three features are extracted and used in our mood detection system.”
Types | Features |
---|---|
Intensity | Root mean-square (RMS) level in decibels |
Rhythm | Average strength, Average correlation peak, Average tempo |
Timbre | Spectral Shape Features: Centroid, Bandwidth, Roll off, Spectral Flux; Spectral Contrast Features: Sub-band Peak, Sub-band Valley, Sub-band Average |
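
To make the intensity and spectral shape rows concrete, here is a minimal sketch of how such features can be computed for a single frame. It is not the extraction code from P2: the function names, the 0.85 rolloff fraction, the full-scale reference of 1.0 for decibels and the unnormalised flux are all assumptions, and exact definitions vary between papers. A naive DFT is used only to keep the sketch dependency-free.

```python
import cmath
import math


def rms_db(frame, eps=1e-12):
    """RMS level of one frame in dB, relative to an assumed full scale of 1.0."""
    mean_square = sum(x * x for x in frame) / len(frame)
    return 10.0 * math.log10(mean_square + eps)


def magnitude_spectrum(frame):
    """Magnitudes of the non-negative-frequency DFT bins (naive O(n^2) DFT)."""
    n = len(frame)
    return [
        abs(sum(x * cmath.exp(-2j * math.pi * k * t / n)
                for t, x in enumerate(frame)))
        for k in range(n // 2 + 1)
    ]


def spectral_centroid(mags, sample_rate, n):
    """Magnitude-weighted mean frequency in Hz (n is the frame length)."""
    total = sum(mags)
    if total == 0:
        return 0.0
    return sum(k * sample_rate / n * m for k, m in enumerate(mags)) / total


def spectral_rolloff(mags, sample_rate, n, fraction=0.85):
    """Lowest bin frequency below which `fraction` of the magnitude lies."""
    threshold = fraction * sum(mags)
    running = 0.0
    for k, m in enumerate(mags):
        running += m
        if running >= threshold:
            return k * sample_rate / n
    return (len(mags) - 1) * sample_rate / n


def spectral_flux(prev_mags, mags):
    """Squared Euclidean distance between two successive magnitude spectra."""
    return sum((a - b) ** 2 for a, b in zip(mags, prev_mags))


# Usage: one 64-sample frame of a 1 kHz sine at half amplitude, 8 kHz sampling.
sample_rate, n = 8000, 64
frame = [0.5 * math.sin(2 * math.pi * 1000 * t / sample_rate) for t in range(n)]
mags = magnitude_spectrum(frame)
print(rms_db(frame), spectral_centroid(mags, sample_rate, n))
```

For this test tone the RMS level is about -9.03 dB and the centroid sits at about 1000 Hz. In practice a toolbox such as Marsyas would compute these per frame over the whole track.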
P3. Disambiguating Music Emotion Using Software Agents(ISMIR2004 Cited: 118’)
This paper confirmed the results of P2, which found that emotional intensity was highly correlated with rhythm and timbre features.
Types | Features |
---|---|
Tempo | Beats per Minute (BPM) |
LLD | Low-level standard descriptors from the MPEG-7 audio standard (12 attributes) |
Timbre | Spectral centroid, Spectral rolloff, Spectral flux, Spectral kurtosis |
Intensity | Labels of intensity from 0 to 9 were applied to instances by a human listener |
Another 12 attributes | Generated by a genetic algorithm using the Sony Extractor Discovery System (EDS) |
Tools recommended: Wavelet tools, MPEG-7 Low Level Descriptors, Sony Extractor Discovery System (EDS)
P4. Modeling emotional content of music using system identification(TSMC2005 Cited: 104’)
Types | Features |
---|---|
Dynamics | Loudness level, Short term max. loudness |
Mean Pitch | Power spectrum centroid, Mean STFT centroid |
Pitch Variation | Mean STFT Flux, Std dev. STFT flux, Std dev. STFT centroid |
Timbre | Timbral Width, Mean STFT rolloff, Std. dev. STFT rolloff, Sharpness (Zwicker and Fastl) |
Harmony | Spectral dissonance (Hutchinson and Knopoff), Spectral dissonance (Sethares), Tonal dissonance (Hutchinson and Knopoff), Tonal dissonance (Sethares), Complex tonalness |
Tempo | Beats Per Minute |
Texture | Multiplicity |
Tools recommended: PsySound, Marsyas
P5. Music Emotion Classification: A Fuzzy Approach(ACM MM2006 Cited: 142’)
This paper used PsySound2 to extract music features and chose the 15 features recommended in P1. Feature selection “begins with all 15 features and then greedily removes the worst feature sequentially until no more accuracy improvement can be obtained.” The same applies to Detecting and Classifying Emotion in Popular Music(JCIS2006 Cited: 22’).
Types | Features |
---|---|
Loudness related | Dynamics |
Pitch related | Mean pitch, Pitch range, Variation in pitch, Melodic contour, Register, Mode, Timbre, Harmony |
Duration related | Tempo, Articulation, Note onset, Vibrato, Rhythm, Metre |
Tools recommended: PsySound2
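
The greedy removal quoted in P5 can be sketched as backward elimination. This is an illustration under assumptions, not the paper's code: `backward_elimination` and the `evaluate` callback are hypothetical names, and `evaluate` stands in for training and validating whatever classifier is used.

```python
def backward_elimination(features, evaluate):
    """Greedy backward feature selection.

    features: iterable of feature names.
    evaluate: callable taking a tuple of feature names and returning an
              accuracy-like score, higher is better (hypothetical stand-in
              for training and validating a classifier).
    Returns the surviving feature list and its score.
    """
    selected = list(features)
    best_score = evaluate(tuple(selected))
    while len(selected) > 1:
        # Score every subset obtained by dropping exactly one feature.
        candidates = [
            (evaluate(tuple(f for f in selected if f != drop)), drop)
            for drop in selected
        ]
        score, worst = max(candidates, key=lambda c: c[0])
        if score <= best_score:
            break  # no single removal improves accuracy any more
        selected.remove(worst)
        best_score = score
    return selected, best_score


# Usage with a toy scorer: two informative features, two that only hurt.
def toy_accuracy(subset):
    useful = {"tempo", "loudness"}
    hits = sum(f in useful for f in subset)
    noise = sum(f not in useful for f in subset)
    return 50 + 20 * hits - 5 * noise  # accuracy in percent, made up


print(backward_elimination(
    ["tempo", "loudness", "noise_a", "noise_b"], toy_accuracy))
```

Each round costs one evaluation per remaining feature, so starting from 15 features the search stays cheap compared with trying all subsets.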
P6. Multi-Label Classification of Music into Emotions(ISMIR2008 Cited: 529’)
Types | Features |
---|---|
Rhythm | The two highest peaks of the beat histogram: their amplitudes, their BPMs (beats per minute) and the high-to-low ratio of their BPMs; the sums of the histogram bins between 40-90, 90-140 and 140-250 BPM respectively. -> 8 |
Timbre | The first 13 MFCCs, spectral centroid, spectral rolloff and spectral flux per frame -> 16 -> the mean, std, mean std and std std over all frames -> 64 |
Tools recommended: Marsyas tool
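
The feature counts in the P6 table (8 rhythm, 64 timbre) can be reproduced with a short sketch. Several points are assumptions rather than details from the paper: the two "peaks" are simply the two strongest histogram bins, the BPM ratio is taken as larger BPM over smaller BPM, the bands are half-open intervals, and "mean std" and "std std" are read as the mean and std of standard deviations computed over non-overlapping texture windows of an assumed length.

```python
from statistics import fmean, pstdev


def rhythm_features(beat_histogram):
    """Eight rhythm features from a beat histogram given as {bpm: strength}."""
    (bpm1, amp1), (bpm2, amp2) = sorted(
        beat_histogram.items(), key=lambda item: item[1], reverse=True
    )[:2]
    ratio = max(bpm1, bpm2) / min(bpm1, bpm2)
    bands = [(40, 90), (90, 140), (140, 250)]  # treated as [low, high)
    band_sums = [
        sum(s for bpm, s in beat_histogram.items() if low <= bpm < high)
        for low, high in bands
    ]
    return [amp1, amp2, bpm1, bpm2, ratio, *band_sums]


def windowed_stds(values, window):
    """Population std of each consecutive, non-overlapping window."""
    return [
        pstdev(values[i:i + window])
        for i in range(0, len(values) - window + 1, window)
    ]


def timbre_summary(frames, window=4):
    """Collapse per-frame feature vectors into 4 statistics per feature.

    frames: list of equal-length per-frame vectors, at least `window` long.
    window: texture-window length in frames (an assumed value).
    """
    summary = []
    for column in zip(*frames):  # one column per feature
        stds = windowed_stds(column, window)
        summary += [fmean(column), pstdev(column), fmean(stds), pstdev(stds)]
    return summary


# Usage: 8 dummy frames of 16 features each give 16 * 4 = 64 timbre values.
frames = [[float(i + j) for j in range(16)] for i in range(8)]
print(len(timbre_summary(frames)))
print(len(rhythm_features({60: 0.2, 120: 1.0, 180: 0.5, 240: 0.1})))
```

Together the two functions yield the 72 values per track implied by the table.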
P7. A regression approach to music emotion recognition(TASLP2008 Cited: 319’)
“extract musical features and construct a 114-dimension feature space”
Types | Features |
---|---|
PsySound | Loudness, Level, Dissonance, Pitch -> 44 |
Marsyas | Spectral centroid, Spectral rolloff, Spectral flux, Time domain zero-crossing and Mel-frequency cepstral coefficient (MFCC) -> 19, 6 rhythmic content features (by beat and tempo detection), 5 pitch content features (by multi-pitch detection) -> 30 |
Spectral contrast | Capture the relative spectral information in each subband and utilize the spectral peak, spectral valley, and their dynamics as features -> 12 |
DWCH | Histograms of Daubechies wavelet coefficients at different frequency subbands with different resolutions -> 28 |
Tools recommended: PsySound, Marsyas, Matlab
P8. Music emotion recognition: A state of the art review(ISMIR2010 Cited: 268’)
“An overview of the most common acoustic features used for mood recognition”
Types | Features |
---|---|
Dynamics | RMS-Energy |
Timbre | MFCCs, Spectral-Shape, Spectral-Contrast |
Harmony | Roughness, Harmonic-Change, Key-Clarity, Majorness |
Register | Chromagram, Chroma-Centroid and Deviation |
Rhythm | Rhythm-Strength, Regularity, Tempo, Beat-Histograms |
Articulation | Event-Density, Attack-Slope, Attack-Time |
Tools recommended: MIRtoolbox
P9. Machine recognition of music emotion: A review(TIST2012 Cited: 139’)
“briefly review some features that have been utilized in MER”
Types | Features |
---|---|
Energy | Audio power, Total loudness, Specific loudness sensation coefficients (SONE) |
Rhythm | Rhythm Strength, Rhythm Regularity, Rhythm Clarity, Average onset frequency, Average tempo |
Melody | Salient Pitch, Chromagram center, Key clarity, Mode, Harmonic change |
Timbre | MFCC |
Tools recommended: MA Toolbox, MIRtoolbox, Marsyas tool
P10. Developing a benchmark for emotional analysis of music(PLoS ONE 2017)
This is an interesting competitive workshop.
“Performance of the different feature-sets on valence, development and evaluation-sets of 2015, 20 fold cross-validation”
“Performance of the different feature-sets on arousal, development and evaluation-sets of 2015, 20 fold cross-validation”
Tools recommended: OpenSMILE
Findings
Every lab has its own emotion feature set for music. The most commonly used features:
MFCCs, Loudness, Spectral features (centroid, flux, rolloff, flatness), Timbre, Rhythm, Pitch, Harmony, Zero crossing rate
Acoustic feature extraction is best done with a number of tools, which gives a broad mix from which to select the best features:
Marsyas, MIRtoolbox, PsySound, OpenSMILE