
PMEMO/2019

794 pop-music chorus clips, annotated by 457 listeners along the valence-arousal plane, with simultaneous electrodermal-activity (EDA / skin conductance) recordings. An open dataset for music emotion recognition, published at ICMR 2018 (Yokohama).

authors: Zhang K. · Zhang H. · Li S. · Yang C. · Sun L.
venue: ICMR '18 / Proc. of the 2018 ACM Int. Conf. on Multimedia Retrieval, pp. 135–142
repo: github.com/HuiZhangDB/PMEmo
lab: next.zju.edu.cn · NEXT Lab, Zhejiang University
tags: music · emotion · physiology · valence-arousal · pop-songs · multi-modal

[01]

OVERVIEW · BY THE NUMBERS

last updated 2019-06-26

SOURCE From the Billboard Hot 100, iTunes Top 100 and UK Top 40 Singles charts (2016–2017), an initial pool of 1,000 popular songs was deduplicated to 794. For each song, music students manually selected the chorus segment. Each chorus was then annotated by at least 10 listeners drawn from a pool of 457 subjects (366 Chinese non-music students, 44 music majors and 47 native English speakers). Skin conductance was recorded simultaneously at 50 Hz.

FIG.01

SONGS_PER_SOURCE

n = 1329 (with overlap)

3 charts contributed the original 1,000-song pool. Overlap removed → 794 unique chorus clips.
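The 1,000 → 794 reduction amounts to merging the three chart lists and dropping repeats. A minimal sketch, assuming songs are matched on a normalized (title, artist) key. The normalization rule, the `normalize`/`dedupe` helpers and the toy entries are hypothetical, since the actual matching procedure is not described here.

```python
import re


def normalize(text):
    """Lower-case, drop bracketed parts such as '(feat. X)', collapse punctuation to spaces.

    ASCII-only simplification: non-Latin characters are discarded too.
    """
    text = re.sub(r"[\(\[].*?[\)\]]", "", text.lower())
    return re.sub(r"[^a-z0-9]+", " ", text).strip()


def dedupe(charts):
    """Merge {chart_name: [(title, artist), ...]}, keeping the first copy of each song."""
    seen = set()
    unique = []
    for entries in charts.values():
        for title, artist in entries:
            key = (normalize(title), normalize(artist))
            if key not in seen:
                seen.add(key)
                unique.append((title, artist))
    return unique


# Toy chart lists (hypothetical entries, not the real PMEmo song list)
charts = {
    "billboard_hot_100": [("Song A", "Artist X"), ("Song B", "Artist Y")],
    "itunes_top_100": [("Song A (feat. Z)", "Artist X")],
    "uk_top_40": [("Song B", "Artist Y"), ("Song C", "Artist W")],
}
unique = dedupe(charts)
```

Because bracketed text is discarded, a "(Remix)" tag also collapses onto the original title; whether that is desirable depends on the curation policy.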

FIG.02

QUADRANT_DISTRIBUTION

n = 767

Pop music skews toward Q1 (high V, high A): bright and energetic. Q4 (high V, low A; calm) is rare.
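Quadrant labels follow from thresholding V and A. A minimal sketch, assuming the split is at 0.5 (the midpoint of [0, 1]) and that a point exactly on the threshold counts as "high". The helper `most_extreme` shows one hypothetical reading of "most extreme V/A values" used for the samples in [03]: the largest Euclidean distance from the centre (0.5, 0.5).

```python
from collections import Counter


def quadrant(v, a, mid=0.5):
    """Map a (valence, arousal) point on [0, 1] x [0, 1] to a quadrant label.

    Q1 = high V / high A, Q2 = low V / high A, Q3 = low V / low A, Q4 = high V / low A.
    Points exactly on the assumed 0.5 threshold count as "high".
    """
    if a >= mid:
        return "Q1" if v >= mid else "Q2"
    return "Q4" if v >= mid else "Q3"


def quadrant_counts(points, mid=0.5):
    """Number of clips per quadrant (the quantity behind a quadrant bar chart)."""
    return Counter(quadrant(v, a, mid) for v, a in points)


def most_extreme(points, mid=0.5):
    """Per quadrant, index of the point farthest (Euclidean) from the centre (mid, mid)."""
    best = {}
    for i, (v, a) in enumerate(points):
        q = quadrant(v, a, mid)
        d2 = (v - mid) ** 2 + (a - mid) ** 2  # squared distance preserves the ordering
        if q not in best or d2 > best[q][1]:
            best[q] = (i, d2)
    return {q: i for q, (i, _) in best.items()}


# Toy static (V, A) annotations (synthetic, not PMEmo values)
points = [(0.8, 0.9), (0.6, 0.7), (0.2, 0.8), (0.3, 0.2), (0.7, 0.3)]
counts = quadrant_counts(points)
extremes = most_extreme(points)
```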

FIG.03

TOP_ARTISTS · n=15

by clip count

Drake has the most clips, consistent with his chart dominance in 2016–2017.

[02]

VA_ATLAS · VALENCE-AROUSAL SPACE


PROTOCOL Each clip is a point on the (V, A) plane, where V ∈ [0, 1] is perceived pleasantness and A ∈ [0, 1] is perceived energy. Background heatmap shows density (20×20 bins). Replicates Fig. 3 of the paper.
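The 20×20 density layer is a 2-D histogram over the unit square. A pure-Python sketch, assuming equal-width bins on [0, 1] with values of exactly 1.0 assigned to the last bin; `numpy.histogram2d` with `bins=20` and `range=[[0, 1], [0, 1]]` computes the same kind of grid (with valence on the first axis). The helper names are hypothetical.

```python
def density_grid(points, bins=20):
    """Count (valence, arousal) pairs in a bins x bins grid over [0, 1] x [0, 1].

    grid[row][col]: row indexes arousal, col indexes valence.
    Indices are clamped into range, so a value of 1.0 falls in the last bin.
    """
    grid = [[0] * bins for _ in range(bins)]
    for v, a in points:
        col = max(0, min(int(v * bins), bins - 1))
        row = max(0, min(int(a * bins), bins - 1))
        grid[row][col] += 1
    return grid


def as_fractions(grid):
    """Normalise counts so the whole grid sums to 1 (suitable for a heatmap colour scale)."""
    total = sum(map(sum, grid))
    return [[c / total for c in row] for row in grid] if total else grid


grid = density_grid([(0.0, 0.0), (1.0, 1.0), (0.51, 0.49)])
fractions = as_fractions(grid)
```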

FIG.04

VA_SCATTER + DENSITY

n = 767

[03]

SAMPLES · ONE PER QUADRANT

audio + dynamic VA + EDA

MODALITIES For each quadrant, the chorus clip with the most extreme V/A values is shown with its audio, its dynamic VA curve (2 Hz) and the multi-subject EDA traces (50 Hz, downsampled and z-scored).
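The signal handling behind these traces, using the parameters listed in the notes (15 s Initial Orientation Time dropped from the 2 Hz V/A series; 50 Hz EDA, 0.6 Hz low-pass, z-score), can be sketched as follows. Assumptions: a first-order RC low-pass stands in for whatever filter the authors used, and EDA is block-averaged down to 2 Hz to match the V/A rate, since the downsampling factor is not stated. Helper names and the toy signal are hypothetical.

```python
import math
import statistics

VA_RATE_HZ = 2     # dynamic V/A sampling rate (from the notes)
EDA_RATE_HZ = 50   # EDA sampling rate (from the notes)
IOT_SECONDS = 15   # Initial Orientation Time dropped from dynamic V/A


def trim_orientation(va_series, rate=VA_RATE_HZ, seconds=IOT_SECONDS):
    """Drop the first `seconds` of a dynamic V/A series (15 s at 2 Hz = 30 samples)."""
    return va_series[int(rate * seconds):]


def lowpass(signal, cutoff_hz=0.6, rate=EDA_RATE_HZ):
    """First-order RC low-pass filter; an assumed stand-in for the paper's 0.6 Hz filter."""
    if not signal:
        return []
    dt = 1.0 / rate
    rc = 1.0 / (2 * math.pi * cutoff_hz)
    alpha = dt / (rc + dt)  # smoothing factor of the discrete RC filter
    y = signal[0]
    out = []
    for x in signal:
        y += alpha * (x - y)
        out.append(y)
    return out


def downsample(signal, factor):
    """Average non-overlapping blocks of `factor` samples; a trailing partial block is dropped."""
    return [sum(signal[i:i + factor]) / factor
            for i in range(0, len(signal) - factor + 1, factor)]


def zscore(signal):
    """Shift to zero mean and scale to unit (population) standard deviation."""
    mu = statistics.fmean(signal)
    sd = statistics.pstdev(signal)
    return [(x - mu) / sd for x in signal] if sd > 0 else [0.0] * len(signal)


# Toy 40 s EDA trace: slow drift plus 10 Hz noise at 50 Hz (synthetic, not PMEmo data)
eda = [math.sin(2 * math.pi * 0.1 * t / EDA_RATE_HZ)
       + 0.3 * math.sin(2 * math.pi * 10 * t / EDA_RATE_HZ)
       for t in range(40 * EDA_RATE_HZ)]
eda_2hz = zscore(downsample(lowpass(eda), EDA_RATE_HZ // VA_RATE_HZ))
va = trim_orientation([0.5] * (40 * VA_RATE_HZ))
```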


[04]

TEXT · LYRICS & LISTENER COMMENTS

parsed + tokenised

SOURCES Lyrics scraped as .lrc files (timestamps stripped); comments crawled from Netease Cloud Music (Chinese) and SoundCloud (English). Word frequencies are shown below; note the difference in tone between the two listener communities.
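The lyric cleaning and counting steps can be sketched as below. This assumes standard LRC syntax (`[mm:ss.xx]` time tags and `[ar:...]`-style metadata lines), a simple regex tokenizer and a tiny illustrative stopword list; all helper names are hypothetical. Chinese Netease comments would need a word segmenter (for example, jieba) rather than this regex tokenizer.

```python
import re
from collections import Counter

# LRC time tags such as [01:23.45], [01:23:45] or [01:23]
TIME_TAG = re.compile(r"\[\d{1,2}:\d{2}(?:[.:]\d{1,3})?\]")
# LRC metadata lines such as [ar:Artist] or [ti:Title]
META_LINE = re.compile(r"^\[[a-z]+:.*\]$", re.IGNORECASE)
# Tiny illustrative stopword list (hypothetical; a real run would use a fuller list)
STOPWORDS = {"a", "and", "i", "in", "it", "me", "my", "of", "the", "to", "you"}


def strip_lrc(lrc_text):
    """Return the lyric lines of an LRC file with metadata lines and time tags removed."""
    lines = []
    for raw in lrc_text.splitlines():
        line = raw.strip()
        if META_LINE.match(line):
            continue
        text = TIME_TAG.sub("", line).strip()
        if text:
            lines.append(text)
    return lines


def word_frequencies(lines, stopwords=STOPWORDS):
    """Lower-case, keep runs of ASCII letters and apostrophes, drop stopwords, count."""
    words = re.findall(r"[a-z']+", " ".join(lines).lower())
    return Counter(w for w in words if w not in stopwords)


sample = """[ti:Example]
[ar:Someone]
[00:12.00]I feel the fire
[00:15.50]Feel it burning
[00:18.00]"""
lines = strip_lrc(sample)
freq = word_frequencies(lines)
```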

FIG.05

LYRICS_FREQUENCY

FIG.06

SOUNDCLOUD_COMMENTS

FIG.07

NETEASE_COMMENTS

FIG.08

NE_VS_SC · TOP-14 OVERLAY

cross-platform

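Chinese comments tend to be lyrical, English comments more direct: Netease Cloud Music comments often use "感觉" (feel), "喜欢" (like) and "听到" (hear), while SoundCloud comments often use "follow", "love", "fire" and "song".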

[05]

ML · MODEL EXPERIMENTS

SVR · 60-d PCA

SETUP Original paper: 6373-d ComParE features → MLR / SVR → static V/A regression. Reproduced here in compressed form: PCA-60 features → RBF-SVR (C=1.0). Predictions on a held-out test split are plotted below; their r values are close to the paper's SVR baselines in TAB.01.

FIG.09

PRED_VS_TRUE · VALENCE

FIG.10

PRED_VS_TRUE · AROUSAL

FIG.11

LISTENER_QUIZ · INTERACTIVE

human vs annotators

TAB.01

PAPER_BASELINES

10-fold CV · 6373-d

task	model	RMSE	r
static V	MLR	0.136	0.546
static V	SVR	0.124	0.638
static A	MLR	0.111	0.719
static A	SVR	0.102	0.764
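TAB.01 reports RMSE and r. A minimal pure-Python sketch of both metrics, assuming r is the Pearson correlation coefficient between predicted and annotated values. Whether the paper averages these per fold or pools predictions across the 10 folds is not stated here, and the toy values are synthetic.

```python
import math


def rmse(y_true, y_pred):
    """Root-mean-square error between two equal-length, non-empty sequences."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("inputs must be non-empty and of equal length")
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        raise ValueError("r is undefined for a constant sequence")
    return cov / (sx * sy)


# Toy static-valence annotations and predictions on the [0, 1] scale (synthetic)
truth = [0.2, 0.4, 0.6, 0.8]
pred = [0.25, 0.35, 0.65, 0.75]
err = rmse(truth, pred)
r = pearson_r(truth, pred)
```

RMSE is in the same units as the annotations (here the [0, 1] V/A scale), so lower is better; r is unitless, and higher is better.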

NOTES

METHODOLOGY · 4 KEY POINTS

see §3

457 subjects: 366 Chinese non-music students + 44 music majors + 47 native English speakers. Each clip was annotated by at least 10 listeners.
Dynamic V/A sampled at 2 Hz. First 15 s discarded (Initial Orientation Time, cf. Schubert 2007).
EDA via Biopac MP150 + BSL-SS3LA finger electrodes, 50 Hz sampling, 0.6 Hz low-pass + z-score.
Cronbach's α ≈ 0.998 on both V and A — exceptional inter-annotator reliability.
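Cronbach's α for raters treats each annotator as an "item" and each clip as a "case": α = k/(k − 1) · (1 − Σ var(rater_j) / var(clip totals)). A minimal sketch, assuming a complete clips × raters matrix. PMEmo's actual design (each clip rated by a subset of at least 10 of the 457 listeners) is incomplete, and how the authors handled that is not stated here; `cronbach_alpha` and the toy ratings are hypothetical.

```python
import statistics


def cronbach_alpha(ratings):
    """Cronbach's alpha for a complete matrix ratings[clip][rater].

    Raters are the 'items', clips the 'cases':
        alpha = k / (k - 1) * (1 - sum_j var(rater_j) / var(clip totals))
    Sample variances (n - 1 denominator) are used throughout.
    """
    k = len(ratings[0])
    if k < 2 or len(ratings) < 2:
        raise ValueError("need at least two raters and two clips")
    rater_vars = [statistics.variance([row[j] for row in ratings]) for j in range(k)]
    total_var = statistics.variance([sum(row) for row in ratings])
    if total_var == 0:
        raise ValueError("clip totals have zero variance; alpha is undefined")
    return k / (k - 1) * (1 - sum(rater_vars) / total_var)


# Toy valence ratings, 3 clips x 3 raters who agree perfectly (synthetic)
consistent = [[0.1, 0.1, 0.1], [0.5, 0.5, 0.5], [0.9, 0.9, 0.9]]
# Toy ratings, 3 clips x 2 raters who roughly agree (synthetic)
noisy = [[0.1, 0.2], [0.5, 0.4], [0.9, 0.9]]
alpha_consistent = cronbach_alpha(consistent)
alpha_noisy = cronbach_alpha(noisy)
```

Perfect agreement gives α = 1; any disagreement pulls α below 1, which is why a value near 0.998 indicates very consistent raters.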