Towards Generalizability to Tone and Content Variations in the Transcription of Amplifier Rendered Electric Guitar Audio

Yu-Hua Chen, Yuan-Chiao Cheng, Yen-Tung Yeh, Jui-Te Wu, Jyh-Shing Roger Jang, Yi-Hsuan Yang
National Taiwan University, Positive Grid

Abstract

Electric guitar recordings pose unique challenges for automatic music transcription because of the rich tone-related variations arising from the guitar amplifiers, cabinets, and effect pedals employed, which collectively alter the character of the sound the guitar produces. Transcription models trained on small datasets with a limited palette of guitar tones may not generalize well to unseen tones. Due to the scarcity of labeled data, however, little work has thoroughly studied the effect of tones in electric guitar transcription. In this paper, we present a prototype benchmark and evaluation protocol for electric guitar transcription, training our transcription models using up to 256 presets of commercial-grade amplifier-cabinet combinations with different gain ranges, and testing the models on an unseen set of 6 out-of-domain presets. Moreover, we propose a new Transformer-based transcription model, the tone-informed hierarchy Transformer (Ti-hFT), which incorporates representations of guitar tones as conditions to improve the model's adaptability to tone-related nuances. Experiments demonstrate the effectiveness of this tone-informed model over baselines and prior models, as well as the importance of increasing the tone and content variation of the training data for better generalizability.

Transcription on Neural DSP Quad Cortex rendered audio

This section presents transcription results from our tone-informed hierarchy Transformer (Ti-hFT) model and compares them with the EGDB-PG-finetuned hFT-Transformer and with the guitar and piano tracks of the MT3 output. Below, you can explore audio examples at three gain levels (Low-Gain, Crunch, and High-Gain). Each example shows the ground-truth piano roll alongside the piano roll predicted by each model. The MT3 piano roll may appear blank for some examples, possibly because our plots are cropped to the pitch range 40--90 or because MT3 assigned the pitches to tracks other than guitar or piano. Because our models are trained on the EGDB-PG dataset, the amplifier tones in these Neural DSP-rendered clips are unseen during training.
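The cropping and track selection described above can be sketched in a few lines of Python. This is only an illustration of how a blank MT3 plot can arise, not the plotting code used for this page: the `Note` type, the `piano_roll` function, the frame rate, the inclusive treatment of the 40--90 range, and the use of General MIDI program families to identify guitar and piano tracks are all assumptions.

```python
from dataclasses import dataclass

# Assumption: tracks are identified by 0-indexed General MIDI program numbers,
# where 0-7 is the piano family and 24-31 is the guitar family.
PIANO_PROGRAMS = range(0, 8)
GUITAR_PROGRAMS = range(24, 32)


@dataclass
class Note:
    onset: float   # seconds
    offset: float  # seconds
    pitch: int     # MIDI note number
    program: int   # 0-indexed General MIDI program


def piano_roll(notes, duration, fps=100, pitch_lo=40, pitch_hi=90):
    """Build a binary piano roll (rows = pitches, columns = frames).

    Notes on tracks other than guitar or piano, and notes outside the
    inclusive range [pitch_lo, pitch_hi], are dropped.
    """
    n_frames = int(round(duration * fps))
    roll = [[0] * n_frames for _ in range(pitch_hi - pitch_lo + 1)]
    for note in notes:
        on_shown_track = (note.program in PIANO_PROGRAMS
                          or note.program in GUITAR_PROGRAMS)
        if not on_shown_track or not pitch_lo <= note.pitch <= pitch_hi:
            continue
        start = max(0, int(round(note.onset * fps)))
        # Every kept note occupies at least one frame.
        end = min(n_frames, max(start + 1, int(round(note.offset * fps))))
        for frame in range(start, end):
            roll[note.pitch - pitch_lo][frame] = 1
    return roll
```

Under these assumptions, a clip whose notes were all assigned to, say, a bass program, or whose pitches all fall below 40, yields an all-zero roll even though the model did output notes.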

Low-Gain

[Audio examples with piano rolls, two tables of three rows each. Columns: Label, Ti-hFT, EGDB-PG finetuned hFT-Transformer, MT3-guitar&piano.]

Crunch

[Audio examples with piano rolls, two tables of three rows each. Columns: Label, Ti-hFT, EGDB-PG finetuned hFT-Transformer, MT3-guitar&piano.]

High-Gain

[Audio examples with piano rolls, two tables of three rows each. Columns: Label, Ti-hFT, EGDB-PG finetuned hFT-Transformer, MT3-guitar&piano.]

Transcription in the wild

This section demonstrates the transcription capabilities of our tone-informed hierarchy Transformer (Ti-hFT) model on audio extracted from YouTube recordings. The clips cover a diverse range of tones, and some feature ambient effects such as reverb and delay, which are not considered in our work. For each example, the original audio is played on the left channel, while the transcription, synthesized as piano, is played on the right channel. We recommend listening to these samples with headphones.
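The left/right layout described above amounts to interleaving two mono signals into one stereo file. The following is a minimal sketch using only the Python standard library; the function names, the 16-bit PCM format, and the default sample rate are assumptions made for illustration, and the piano synthesis step itself is not shown.

```python
import struct
import wave


def interleave_stereo(left, right):
    """Pair two mono signals (floats in [-1, 1]) into (left, right) frames.

    The shorter signal is padded with silence so both channels end together.
    """
    n = max(len(left), len(right))
    left = list(left) + [0.0] * (n - len(left))
    right = list(right) + [0.0] * (n - len(right))
    return list(zip(left, right))


def write_stereo_wav(target, frames, sample_rate=44100):
    """Write (left, right) float frames as 16-bit PCM stereo.

    `target` may be a file path or a writable, seekable file-like object.
    """
    data = bytearray()
    for left_sample, right_sample in frames:
        # WAV stores stereo frames with the left sample first.
        for sample in (left_sample, right_sample):
            clipped = max(-1.0, min(1.0, sample))
            data += struct.pack("<h", int(clipped * 32767))
    with wave.open(target, "wb") as wav_file:
        wav_file.setnchannels(2)
        wav_file.setsampwidth(2)  # 2 bytes = 16 bits per sample
        wav_file.setframerate(sample_rate)
        wav_file.writeframes(bytes(data))
```

Here `left` would hold the original recording and `right` the synthesized transcription, both already resampled to the same rate. Keeping the two sources on separate channels, rather than summing them, lets a listener judge pitch and timing agreement by ear.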

BibTeX

TBD