MuCUE Bench - Music Comprehensive Understanding Evaluation
Overview of the MuCUE Dataset.
Cross-genre & Multilingual Music Corpus
Covering key identification, pitch detection ....
Expert-validated and dedicated to addressing sophisticated challenges in music audio comprehension, with intentionally designed scalability for future enhancements.
Instrument Recognition in Music
Please listen and name the instruments in this clip.
A. guitar
B. horn
C. saxophone
D. hi-hat
Music Genre Classification (e.g., Rock, Classical, Jazz, Electronic)
What type of music is this track classified as?
A. metal
B. disco
C. classical
D. country
Mood/Theme Identification in Music
What’s the mood and thematic focus of this track?
A. action
B. dark
C. upbeat
D. deep
Lyric Analysis (The process of interpreting a song's underlying story, emotional drive, or deeper meaning by analyzing its word choice, flow, and hidden metaphors)
What is the main message of the lyrics?
A. To give up when things get tough
B. To rely on others for happiness
C. To live freely and embrace one's journey
D. To seek revenge on haters
Comprehensive Music Audio Understanding
What is the primary characteristic of the bass line in the song?
A. Melodic and flowing
B. Percussive and rhythmic
C. Distorted and chaotic
D. Absent in the excerpt
A chord – the fundamental harmonic unit formed by simultaneous pitches – plays critical roles in music analysis, AI composition, and MIR systems
I need to know the chord being used in this guitar section - can you identify it?
A. D#:maj7/1
B. A#:maj(#11)/1
C. F#:min9(*5)/2
D. D:min9(*5)/1
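The answer options above look like Harte-style chord labels of the form `root:quality(modifiers)/bass`. The following is a minimal sketch of splitting such a label into its parts, assuming that reading is correct. The `Chord` type and `parse_chord` function are hypothetical names for illustration and are not part of MuCUE Bench; the pattern covers only the label shapes shown in the options, not the full notation.

```python
import re
from typing import NamedTuple, Optional, Tuple


class Chord(NamedTuple):
    root: str                   # e.g. "F#"
    quality: str                # shorthand such as "maj7" or "min9"
    modifiers: Tuple[str, ...]  # parenthesised items, e.g. ("*5",) for an omitted fifth
    bass: Optional[str]         # degree after "/", assumed relative to the root


# Assumed layout: root ":" quality, optional "(...)" list, optional "/bass".
_CHORD_RE = re.compile(
    r"^([A-G][#b]*):([a-z0-9]+)(?:\(([^)]*)\))?(?:/([#b]*\d+))?$"
)


def parse_chord(label: str) -> Chord:
    """Split a chord label such as 'F#:min9(*5)/2' into its components."""
    match = _CHORD_RE.match(label.strip())
    if match is None:
        raise ValueError(f"unrecognised chord label: {label!r}")
    root, quality, mods, bass = match.groups()
    modifiers = tuple(m.strip() for m in mods.split(",")) if mods else ()
    return Chord(root, quality, modifiers, bass)


if __name__ == "__main__":
    for option in ["D#:maj7/1", "A#:maj(#11)/1", "F#:min9(*5)/2", "D:min9(*5)/1"]:
        print(option, "->", parse_chord(option))
```

Comparing parsed fields rather than raw strings makes it easier to see how close a wrong option is to the right one, for example same quality but a different root.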
10,000+ music samples; total duration exceeds 100 hours, with clip lengths ranging from 10 seconds to 10 minutes
44.1kHz sampling rate, 16-bit depth
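As an illustration of the audio format above, here is a minimal sketch that checks whether a clip is 44.1 kHz and 16-bit. It assumes the clips are available as uncompressed WAV files, which the page does not state, and uses only Python's standard `wave` module. The helper names `matches_spec` and `make_silent_wav` are hypothetical.

```python
import io
import wave

EXPECTED_RATE_HZ = 44_100       # 44.1 kHz sampling rate
EXPECTED_SAMPLE_WIDTH = 2       # 16-bit depth = 2 bytes per sample


def matches_spec(wav_source) -> bool:
    """Return True if a WAV file (path or binary file object) is 44.1 kHz / 16-bit."""
    with wave.open(wav_source, "rb") as wav:
        return (
            wav.getframerate() == EXPECTED_RATE_HZ
            and wav.getsampwidth() == EXPECTED_SAMPLE_WIDTH
        )


def make_silent_wav(rate_hz: int, sample_width: int, seconds: float = 0.1) -> io.BytesIO:
    """Build a short mono clip of zero-valued samples in memory (demo only)."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(sample_width)
        wav.setframerate(rate_hz)
        wav.writeframes(b"\x00" * (int(rate_hz * seconds) * sample_width))
    buffer.seek(0)  # rewind so the buffer can be read back
    return buffer


if __name__ == "__main__":
    print(matches_spec(make_silent_wav(44_100, 2)))  # clip in the stated format
    print(matches_spec(make_silent_wav(22_050, 2)))  # clip at a different sampling rate
```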
Metric | Tasks | Computation |
---|---|---|
Accuracy | All | Correct predictions / Total samples |
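
The accuracy metric in the table can be sketched as follows for multiple-choice answers. This assumes each prediction and each reference answer is a single option letter (A to D); the function name `accuracy` and the case-insensitive comparison are illustrative choices, not the benchmark's official scoring script.

```python
from typing import Sequence


def accuracy(predictions: Sequence[str], answers: Sequence[str]) -> float:
    """Fraction of samples whose predicted option letter matches the reference."""
    if len(predictions) != len(answers):
        raise ValueError("predictions and answers must have the same length")
    if not answers:
        raise ValueError("cannot compute accuracy on an empty set")
    correct = sum(
        pred.strip().upper() == gold.strip().upper()
        for pred, gold in zip(predictions, answers)
    )
    return correct / len(answers)


if __name__ == "__main__":
    # Three of the four predictions match the reference letters.
    print(accuracy(["A", "b", "C", "D"], ["A", "B", "D", "D"]))
```

Multiplying the result by 100 gives a score on the same 0 to 100 scale as the results table.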
Annotation:
tonal center: B major
Annotation:
primary characteristic of the bass line: Percussive and rhythmic
Tasks | Datasets |
---|---|
Key Identification | gs_key, gs_key_30s |
Pitch Identification | nsyn_pitch |
Chord Identification | guitarset |
Instrument Classification | ins_cls, nsyn_ins, mtg_ins |
Genre Classification | gtzan, fma-small, fma-medium, mtg_genre |
Mood/Theme Identification | mtg_mood |
Lyrical Reasoning | ly_m163 |
Comprehensive | mmau-music, tat, mucho, mcaps, mqa_m163 |
Below is the accuracy of each model on each dataset of the MuCUE Bench test set; the final row gives each model's average across datasets.
Datasets | Gemini-2.0-flash | Qwen2.5-Omni | Kimi-Audio | Qwen2-Audio | Ours |
---|---|---|---|---|---|
gs key 30s | 33.6 | 23.8 | 26.0 | 18.2 | 50.4 |
gtzan key | 33.7 | 28.7 | 28.3 | 22.0 | 34.1 |
nsyn pitch | 30.8 | 36.8 | 31.8 | 31.2 | 77.2 |
guitarset | 25.2 | 13.2 | 27.2 | 19.2 | 58.8 |
ballroom tempo | 31.7 | 28.9 | 24.4 | 31.1 | 29.4 |
gtzan tempo | 41.3 | 32.4 | 22.9 | 27.1 | 40.7 |
ins cls | 26.0 | 66.8 | 79.4 | 39.8 | 91.2 |
nsyn ins | 32.4 | 40.6 | 44.4 | 22.4 | 74.0 |
mtg ins | 19.8 | 55.8 | 51.2 | 24.0 | 68.6 |
gtzan | 72.2 | 88.6 | 77.8 | 83.9 | 81.3 |
fma-small | 63.4 | 66.2 | 55.8 | 65.6 | 72.4 |
fma-medium | 62.8 | 78.0 | 59.8 | 77.0 | 85.2 |
mtg genre | 57.2 | 61.6 | 55.8 | 46.4 | 81.4 |
ballroom genres | 57.0 | 45.8 | 44.0 | 35.2 | 52.4 |
mtg mood | 38.2 | 43.4 | 39.4 | 29.2 | 52.8 |
md4q | 71.9 | 47.6 | 61.3 | 57.8 | 65.9 |
salami segd | 40.6 | 18.6 | 27.2 | 19.4 | 64.8 |
salami pred | 37.6 | 32.2 | 34.6 | 31.2 | 64.8 |
salami cnt | 49.8 | 36.8 | 37.8 | 30.2 | 43.2 |
salami overall | 62.1 | 55.8 | 45.3 | 42.6 | 48.7 |
lyr | 88.2 | 87.4 | 87.0 | 60.0 | 90.8 |
mmau-music | 67.1 | 63.8 | 66.2 | 57.8 | 66.5 |
tat | 61.2 | 59.4 | 54.0 | 61.4 | 80.6 |
mucho | 69.6 | 66.5 | 69.7 | 66.7 | 63.9 |
mcaps | 62.2 | 65.6 | 68.0 | 74.0 | 80.0 |
mqa | 58.0 | 76.0 | 79.0 | 60.8 | 88.4 |
Avg | 49.8 | 50.8 | 49.9 | 43.6 | 65.7 |
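
The Avg row appears to be the unweighted mean of the 26 dataset rows, rounded to one decimal. The sketch below checks that reading for two of the five columns, with values copied from the table; the names `SCORES` and `macro_average` are hypothetical, and the unweighted-mean interpretation is an inference from the numbers rather than something the page states.

```python
from statistics import mean

# Per-dataset accuracy copied from the results table, in row order
# (gs key 30s ... mqa), for two of the five model columns.
SCORES = {
    "Gemini-2.0-flash": [
        33.6, 33.7, 30.8, 25.2, 31.7, 41.3, 26.0, 32.4, 19.8, 72.2,
        63.4, 62.8, 57.2, 57.0, 38.2, 71.9, 40.6, 37.6, 49.8, 62.1,
        88.2, 67.1, 61.2, 69.6, 62.2, 58.0,
    ],
    "Ours": [
        50.4, 34.1, 77.2, 58.8, 29.4, 40.7, 91.2, 74.0, 68.6, 81.3,
        72.4, 85.2, 81.4, 52.4, 52.8, 65.9, 64.8, 64.8, 43.2, 48.7,
        90.8, 66.5, 80.6, 63.9, 80.0, 88.4,
    ],
}


def macro_average(scores):
    """Unweighted mean over datasets, rounded to one decimal place."""
    return round(mean(scores), 1)


if __name__ == "__main__":
    for model, scores in SCORES.items():
        print(f"{model}: {macro_average(scores)} over {len(scores)} datasets")
```

Because every dataset counts equally in this average, small datasets influence the overall score as much as large ones.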
We welcome researchers from academia and industry to contribute to the improvement of the MuCUE Bench dataset and model optimization. Contribution methods include: