Practical and Reproducible Symbolic Music Generation by Large Language Models with Structural Embeddings


Listening Samples

The following 12 samples (for each model and for the ground truth) are taken from the listening test in this paper. The first two samples are the ones analyzed in the fitness scape plots of Figure 7 (the first sample in the top row; the second sample in the bottom row). In each sample, the prompt occupies roughly the first 5 seconds.

GPT2

output.gen_for_listening.gpt2-MN-lm-no-embeds.s66.t7619.prompt-64.1689246739.028448.wav

output.gen_for_listening.gpt2-MN-lm-no-embeds.s141.t3899.prompt-64.1689246746.2421086.wav

output.gen_for_listening.gpt2-MN-lm-no-embeds.s446.t3796.prompt-64.1689246753.1930459.wav

output.gen_for_listening.gpt2-MN-lm-no-embeds.s591.t4406.prompt-64.1689246756.2442229.wav

output.gen_for_listening.gpt2-MN-lm-no-embeds.s801.t4389.prompt-64.1689246759.9431634.wav

output.gen_for_listening.gpt2-MN-lm-no-embeds.s880.t3558.prompt-64.1689246763.4680882.wav

output.gen_for_listening.gpt2-MN-lm-no-embeds.s888.t4677.prompt-64.1689246767.1705449.wav

output.gen_for_listening.gpt2-MN-lm-no-embeds.s992.t4495.prompt-64.1689246770.742311.wav

output.gen_for_listening.gpt2-MN-lm-no-embeds.s1059.t5696.prompt-64.1689246773.986176.wav

output.gen_for_listening.gpt2-MN-lm-no-embeds.s1104.t2417.prompt-64.1689246777.9444382.wav

output.gen_for_listening.gpt2-MN-lm-no-embeds.s1124.t8701.prompt-64.1689246780.3382137.wav

output.gen_for_listening.gpt2-MN-lm-no-embeds.s1183.t4920.prompt-64.1689246786.181727.wav

GPT2-RE

output.gen_for_listening.gpt2-MN-lm-all-embeds-init-random.s66.t9619.prompt-64.1689246954.481215.wav

output.gen_for_listening.gpt2-MN-lm-all-embeds-init-random.s141.t2185.prompt-64.1689246960.9225938.wav

output.gen_for_listening.gpt2-MN-lm-all-embeds-init-random.s446.t3585.prompt-64.1689246963.2468724.wav

output.gen_for_listening.gpt2-MN-lm-all-embeds-init-random.s591.t5051.prompt-64.1689246966.140737.wav

output.gen_for_listening.gpt2-MN-lm-all-embeds-init-random.s801.t8459.prompt-64.1689246970.0560102.wav

output.gen_for_listening.gpt2-MN-lm-all-embeds-init-random.s880.t7153.prompt-64.1689246975.5306854.wav

output.gen_for_listening.gpt2-MN-lm-all-embeds-init-random.s888.t3210.prompt-64.1689246987.4033859.wav

output.gen_for_listening.gpt2-MN-lm-all-embeds-init-random.s992.t2600.prompt-64.1689246990.1443334.wav

output.gen_for_listening.gpt2-MN-lm-all-embeds-init-random.s1059.t9262.prompt-64.1689246992.48795.wav

output.gen_for_listening.gpt2-MN-lm-all-embeds-init-random.s1104.t3187.prompt-64.1689246997.9798877.wav

output.gen_for_listening.gpt2-MN-lm-all-embeds-init-random.s1124.t4339.prompt-64.1689247000.548001.wav

output.gen_for_listening.gpt2-MN-lm-all-embeds-init-random.s1183.t4428.prompt-64.1689247003.7612572.wav

GPT2-SE

output.gen_for_listening.gpt2-MN-lm-all-embeds-init-time-pc.s66.t4491.prompt-64.1689246838.4858363.wav

output.gen_for_listening.gpt2-MN-lm-all-embeds-init-time-pc.s141.t2434.prompt-64.1689246842.58001.wav

output.gen_for_listening.gpt2-MN-lm-all-embeds-init-time-pc.s446.t5793.prompt-64.1689246845.5250168.wav

output.gen_for_listening.gpt2-MN-lm-all-embeds-init-time-pc.s591.t9339.prompt-64.1689246849.5457628.wav

output.gen_for_listening.gpt2-MN-lm-all-embeds-init-time-pc.s801.t5260.prompt-64.1689246855.3214364.wav

output.gen_for_listening.gpt2-MN-lm-all-embeds-init-time-pc.s880.t6626.prompt-64.1689246859.3580675.wav

output.gen_for_listening.gpt2-MN-lm-all-embeds-init-time-pc.s888.t2914.prompt-64.1689246863.827006.wav

output.gen_for_listening.gpt2-MN-lm-all-embeds-init-time-pc.s992.t3165.prompt-64.1689246866.7479434.wav

output.gen_for_listening.gpt2-MN-lm-all-embeds-init-time-pc.s1059.t4181.prompt-64.1689246869.4383483.wav

output.gen_for_listening.gpt2-MN-lm-all-embeds-init-time-pc.s1104.t2159.prompt-64.1689246872.8721101.wav

output.gen_for_listening.gpt2-MN-lm-all-embeds-init-time-pc.s1124.t3637.prompt-64.1689246875.2480106.wav

output.gen_for_listening.gpt2-MN-lm-all-embeds-init-time-pc.s1183.t6032.prompt-64.1689246878.1251895.wav

GT

output.gen_for_listening.GT.s66.t4675.prompt-64.1689247121.2217507.wav

output.gen_for_listening.GT.s141.t2555.prompt-64.1689247130.4810164.wav

output.gen_for_listening.GT.s446.t2099.prompt-64.1689247132.7888298.wav

output.gen_for_listening.GT.s591.t2056.prompt-64.1689247134.8781655.wav

output.gen_for_listening.GT.s801.t4983.prompt-64.1689247139.1909335.wav

output.gen_for_listening.GT.s880.t3659.prompt-64.1689247144.1085837.wav

output.gen_for_listening.GT.s888.t2192.prompt-64.1689247147.0149174.wav

output.gen_for_listening.GT.s992.t6129.prompt-64.1689247148.9846585.wav

output.gen_for_listening.GT.s1059.t2379.prompt-64.1689247153.4739404.wav

output.gen_for_listening.GT.s1104.t2639.prompt-64.1689247158.2874134.wav

output.gen_for_listening.GT.s1124.t5772.prompt-64.1689247160.6634374.wav

output.gen_for_listening.GT.s1183.t1657.prompt-64.1689247171.1830833.wav

Generation of Bach

bach.png

Prompt: First 8 measures from the ASAP dataset (Foscarin et al., 2020)

input.bach_fugue_bwv893.100.l0.gpt2-MN-lm-no-embeds.1677051380.389268.wav

Full Song: Piano performance from the ASAP dataset (Foscarin et al., 2020)

bach_fugue_bwv893.1677051537.745666.wav

GPT2

output.bach_fugue_bwv893.100.l0.gpt2-MN-lm-no-embeds.1677051379.848664.wav

GPT2-RE

output.bach_fugue_bwv893.100.l0.gpt2-MN-lm-all-embeds-init-random.1677051414.538455.wav

GPT2-SE

output.bach_fugue_bwv893.100.l0.gpt2-MN-lm-all-embeds-init-time-pc.1677051443.5603802.wav

Interactive Generation

The following samples demonstrate three examples of step-by-step generation loops:

  1. Single-loop generation from the given prompt using only GPT2-RE;

  2. Three-loop generation with multiple models, where the first and third loops use GPT2-RE and the second loop uses GPT2-SE to enhance repetition in the middle of the song;

  3. Three-loop generation with GPT2-RE, where every loop except the last is followed by a human revision.
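The three schedules above differ only in which model runs in each loop and whether a human revises the result before the next loop. The following is a minimal sketch of that orchestration under stated assumptions, not the authors' code: `Loop`, `run_loops`, and `toy_generate` are hypothetical names, notes are modelled as (onset in seconds, MIDI pitch) pairs, and the 15 s / 30 s / 60 s loop targets are assumed from the output-length limits in the table headers below. `toy_generate` is a stand-in that simply appends one note per second; a real setup would call GPT2-RE or GPT2-SE here.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple

# Hypothetical note representation: (onset in seconds, MIDI pitch).
Event = Tuple[float, int]
Generator = Callable[[str, List[Event], float], List[Event]]


@dataclass
class Loop:
    model: str                # which model continues the piece, e.g. "GPT2-RE"
    until_seconds: float      # assumed target length after this loop
    revise: bool = False      # whether a human edits the result before the next loop


def run_loops(prompt: List[Event], loops: List[Loop], generate: Generator,
              revise: Optional[Callable[[List[Event]], List[Event]]] = None) -> List[Event]:
    """Run the loops in order; each loop continues from everything produced so far."""
    piece = list(prompt)
    for loop in loops:
        piece = generate(loop.model, piece, loop.until_seconds)
        if loop.revise and revise is not None:
            piece = revise(piece)  # human revision step (e.g. in a score editor)
    return piece


# The three schedules described above (loop lengths are assumptions).
SINGLE_LOOP = [Loop("GPT2-RE", 60.0)]
MULTI_MODEL = [Loop("GPT2-RE", 15.0), Loop("GPT2-SE", 30.0), Loop("GPT2-RE", 60.0)]
HUMAN_IN_THE_LOOP = [Loop("GPT2-RE", 15.0, revise=True),
                     Loop("GPT2-RE", 30.0, revise=True),
                     Loop("GPT2-RE", 60.0)]


def toy_generate(model: str, piece: List[Event], until_seconds: float) -> List[Event]:
    """Stand-in for a real model: append one note per second until the target length."""
    out = list(piece)
    t = out[-1][0] + 1.0 if out else 0.0
    while t < until_seconds:
        out.append((t, 60))
        t += 1.0
    return out


if __name__ == "__main__":
    prompt = [(float(t), 60) for t in range(5)]  # toy ~5 s prompt
    final = run_loops(prompt, MULTI_MODEL, toy_generate)
    print(len(final))  # 60
```

Keeping the schedule as data rather than code makes it easy to swap the model used in any loop, which is exactly what distinguishes the single-loop, multi-model, and human-in-the-loop examples.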

(“s.” in the parentheses denotes seconds.)

Prompt (~ 5s.)

Generation Type

1st Output (< 15s.)

1st Revision (< 15s.)

2nd Output (< 30s.)

2nd Revision (< 30s.)

Final Output (< 60s.)

input.multi-model.s0.100.l0.1676961835.0617409.wav

Single-Loop