Towards Generalizability to Tone and Content Variations in the Transcription of Amplifier Rendered Electric Guitar Audio

Yu-Hua Chen, Yuan-Chiao Cheng, Yen-Tung Yeh, Jui-Te Wu, Jyh-Shing Roger Jang, Yi-Hsuan Yang
National Taiwan University, Positive Grid

Abstract

Transcribing electric guitar recordings is challenging due to the scarcity of diverse datasets and the complex tone-related variations introduced by amplifiers, cabinets, and effect pedals. To address these issues, we introduce EGDB-PG, a novel dataset designed to capture a wide range of tone-related characteristics across various amplifier-cabinet configurations. In addition, we propose the Tone-informed Transformer (TIT), a Transformer-based transcription model enhanced with a tone embedding mechanism that leverages learned representations to improve the model’s adaptability to tone-related nuances. Experiments demonstrate that TIT, trained on EGDB-PG, outperforms existing baselines across diverse amplifier types, with transcription accuracy improvements driven by the dataset’s diversity and the tone embedding technique. Through detailed benchmarking and ablation studies, we evaluate the impact of tone augmentation, content augmentation, audio normalization, and tone embedding on transcription performance. This work advances electric guitar transcription by overcoming limitations in dataset diversity and tone modeling, providing a robust foundation for future research.

Transcription on Neural DSP Quad Cortex rendered audio

This section presents the transcription results of our Tone-informed Transformer (TIT) model and compares them with two baselines: the EGDB-PG-finetuned hFT-Transformer, and the guitar and piano tracks of the transcription produced by the MT3 model. Below, you can explore audio examples at three gain levels (Low-Gain, Crunch, and High-Gain). Each example includes the ground truth piano roll alongside the piano rolls predicted by each model. Note that an MT3 piano roll may appear blank, possibly because our plots crop the pitch range to 40-90 or because MT3 assigned the pitches to tracks other than guitar or piano. Because our training data consists of the EGDB-PG dataset, the amplifier tones in these clips, which were rendered with the Neural DSP Quad Cortex, were unseen during training.
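The two possible causes of a blank MT3 piano roll can be illustrated with a minimal sketch. It is not the plotting code used for this page: the note tuple layout, the track names, the function names, and the reading of 40-90 as an inclusive range of MIDI-style pitch numbers are all assumptions made for illustration.

```python
import math
from typing import List, Sequence, Tuple

# Hypothetical note format: (onset in seconds, offset in seconds, pitch number, track name).
Note = Tuple[float, float, int, str]


def cropped_piano_roll(
    notes: Sequence[Note],
    tracks: Sequence[str] = ("guitar", "piano"),  # assumed track labels
    pitch_lo: int = 40,  # assumed inclusive lower bound of the plotted range
    pitch_hi: int = 90,  # assumed inclusive upper bound of the plotted range
    frame_sec: float = 0.05,
) -> List[List[int]]:
    """Rasterize notes into a binary grid with one row per pitch and one column per frame.

    Notes on other tracks or outside [pitch_lo, pitch_hi] are skipped, so they
    never appear in the resulting plot.
    """
    # The time axis spans all notes, including the ones that end up hidden.
    duration = max((offset for _, offset, _, _ in notes), default=0.0)
    n_frames = math.ceil(duration / frame_sec)
    roll = [[0] * n_frames for _ in range(pitch_hi - pitch_lo + 1)]
    for onset, offset, pitch, track in notes:
        if track not in tracks or not pitch_lo <= pitch <= pitch_hi:
            continue  # hidden: wrong track or outside the cropped pitch range
        start = int(onset / frame_sec)
        end = min(n_frames, max(start + 1, math.ceil(offset / frame_sec)))
        for frame in range(start, end):
            roll[pitch - pitch_lo][frame] = 1
    return roll


def is_blank(roll: List[List[int]]) -> bool:
    """Return True when no cell of the piano roll is active."""
    return not any(cell for row in roll for cell in row)


notes = [
    (0.0, 0.5, 45, "guitar"),  # visible
    (0.5, 1.0, 95, "guitar"),  # hidden: pitch above the cropped range
    (0.0, 1.0, 50, "bass"),    # hidden: track is neither guitar nor piano
]
roll = cropped_piano_roll(notes, frame_sec=0.25)
```

In this sketch, a transcription whose notes all fall on other tracks or outside the cropped range yields an entirely empty grid, even though the model did output notes.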

Low-Gain

[Two tables of three audio examples each, with one piano roll per cell. Columns: Label, TIT, EGDB-PG finetuned hFT-Transformer, MT3-guitar&piano.]

Crunch

[Two tables of three audio examples each, with one piano roll per cell. Columns: Label, TIT, EGDB-PG finetuned hFT-Transformer, MT3-guitar&piano.]

High-Gain

[Two tables of three audio examples each, with one piano roll per cell. Columns: Label, TIT, EGDB-PG finetuned hFT-Transformer, MT3-guitar&piano.]

Transcription in the wild

This section demonstrates the transcription capabilities of our Tone-informed Transformer (TIT) model on audio extracted from YouTube recordings. The clips cover a diverse range of tones, and some feature ambient effects such as reverb and delay, which are not considered in our work. For each example, the original audio is played on the left channel, while the transcription, synthesized as piano, is played on the right channel. We recommend listening to these samples with headphones.
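The left/right layout described above can be reproduced with a short sketch using only the Python standard library. This is an illustration rather than the script used to prepare these samples: the function names, the zero-padding of the shorter signal, the 16-bit PCM encoding, and the 44100 Hz default sample rate are assumptions.

```python
import struct
import wave
from typing import List, Sequence, Tuple

Frame = Tuple[float, float]  # (left sample, right sample), each in [-1.0, 1.0]


def to_stereo(original: Sequence[float], transcribed: Sequence[float]) -> List[Frame]:
    """Place the original audio on the left channel and the synthesized
    transcription on the right, zero-padding the shorter signal (an assumption)."""
    n = max(len(original), len(transcribed))
    left = list(original) + [0.0] * (n - len(original))
    right = list(transcribed) + [0.0] * (n - len(transcribed))
    return list(zip(left, right))


def encode_pcm16(frames: Sequence[Frame]) -> bytes:
    """Encode float frames as interleaved little-endian 16-bit PCM, clipping to [-1, 1]."""
    chunks = []
    for left, right in frames:
        l = int(max(-1.0, min(1.0, left)) * 32767)
        r = int(max(-1.0, min(1.0, right)) * 32767)
        chunks.append(struct.pack("<hh", l, r))
    return b"".join(chunks)


def write_stereo_wav(path: str, frames: Sequence[Frame], sample_rate: int = 44100) -> None:
    """Write the frames to a two-channel WAV file (sample rate is an assumed default)."""
    with wave.open(path, "wb") as wav:
        wav.setnchannels(2)
        wav.setsampwidth(2)  # 2 bytes per sample = 16-bit
        wav.setframerate(sample_rate)
        wav.writeframes(encode_pcm16(frames))


# Toy signals standing in for the recording and the piano-synthesized transcription.
frames = to_stereo([0.5, -0.5, 0.25], [1.0])
```

Both signals must share one sample rate before they are paired; resampling is outside the scope of this sketch.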

BibTeX

TBD