A primer on generative models for music production


Could a software engineer with a background in artificial intelligence and no theoretical or practical knowledge about music make an album? This was the challenge that led me on a multi-year exploration of computational creativity. Along the way, I released an album and learned a lot about the field of machine learning for music production.

In this article, I’ll cover some recent advances in AI for music production, with a focus on tools that you can incorporate into your music-making workflow today. For readers interested in the theoretical side of things, I’ll share a few pointers to additional resources at the end.

Music Composition

When people think about AI and music, they invariably think about a future where machines are performers who can auto-magically create whole music compositions. Those machines will make original songs – using new concepts and ideas – personalized to sound good to you, the listener. While this is the holy grail of the field, it’s still very far from reality. But we are getting closer.

OpenAI’s MuseNet is probably the closest thing that exists to the vision of fully AI-driven music creation. It’s a deep learning model that can generate MIDI songs with up to 10 different instruments in many different styles. The project’s most distinctive characteristic is that the instruments play well together: drum kicks hit in sync with the bass line, and pianos complement your strings. The main models are biased toward classical-sounding music, so they tend to produce powerful orchestral pieces. However, with some tweaking and searching, you can find music in any style.

This song was made in a single MuseNet session in less than one hour, using Serum presets. Notice the complete switch mid-song while keeping concordance with the previous musical structure.

Although the project has not been released as an open-source or commercial product, OpenAI provides a demo in their blog post that you can use to experiment with the model. The demo is minimal, but luckily some people have built UIs that expose the full range of the model’s options. My favorite is MuseTree, an interactive tool that lets you generate songs iteratively.

Google’s Magenta Studio, while older and more limited, provides tools for the generation, continuation, and interpolation of simple melodies and drum beats. Those tools are available as standalone apps or Ableton Live plugins that you can easily incorporate into your music-making workflow.

On the commercial side, one of the most compelling products is AIVA. It can create full-length songs in several musical styles with many instrument options. Songs can be edited and tweaked directly in their tool and later exported as MIDI or as audio rendered with their built-in synth presets. There’s a free plan, so it’s a good option if you want a low-commitment way to get started.

Now, I couldn’t finish a section on music composition without touching on the most ambitious project in this area: OpenAI’s Jukebox. This project skips MIDI generation entirely and produces raw audio that you can listen to directly. The results are still very far from being enjoyable without a lot of experimentation and tweaking. However, even as it is, you can use this tool to generate interesting musical ideas.

Sampling & Remixing

Have you ever imagined being able to sample those sick vocals from your favorite Miley Cyrus song? Me neither 🙂 But the technology is out there today, it’s pretty cool, and it’s not constrained to Miley Cyrus or even vocals.

Recent advances in sound source separation have produced some impressive open-source projects. You can take your favorite samples and extract the drums, bassline, vocals, and more, almost as if you had access to the original master stems. The problem lies in that “almost.” Separations generally have noticeable artifacts, like bits of vocals bleeding into the snares of your drums. These artifacts are typically hard to repair, but if you embrace them, they can actually give your samples a lot of character.
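Under the hood, most separators estimate a soft mask over a time-frequency representation of the mix, multiply, and invert; bleed happens wherever sources overlap and the mask has to split a frequency bin between them. Here is a minimal, self-contained sketch of that masking idea using two sine tones as stand-in sources. Note the “ideal” mask below is computed from ground truth that real models like Spleeter or Demucs must *predict* with a neural network:

```python
import numpy as np

sr = 8000
t = np.arange(sr) / sr
vocal = np.sin(2 * np.pi * 440 * t)       # stand-in "vocal" source
bass = 0.8 * np.sin(2 * np.pi * 110 * t)  # stand-in "bass" source
mix = vocal + bass                        # the only thing a separator sees

# Magnitude spectra (one FFT frame; real separators use STFTs with many frames)
V = np.abs(np.fft.rfft(vocal))
B = np.abs(np.fft.rfft(bass))

# Ideal ratio mask: the fraction of each bin's energy belonging to the vocal.
# Where sources share a bin, the mask sits between 0 and 1 and both leak through,
# which is exactly the "bleed" artifact described above.
mask = V / (V + B + 1e-12)

# Apply the mask to the mixture's spectrum and invert back to a waveform
vocal_est = np.fft.irfft(np.fft.rfft(mix) * mask, n=len(mix))
```

Because these two tones occupy disjoint bins, the recovered “vocal” is nearly perfect; real instruments overlap heavily, which is where the hard modeling work (and the artifacts) come from.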

UMXL separations of Emilíana Torrini’s Unemployed In Summertime

Currently, the three best-known projects in this field are Spleeter from Deezer Research, Demucs from Facebook Research, and Open-Unmix from Inria/Sony. All three provide trained models that you can use directly in your code or through command-line utilities. Fortunately, there are also some UIs available, such as Spleeter Web.

Some companies are developing proprietary source-separation models, such as LALAL.AI. Their models are high quality and can also extract specific instruments, such as synths and guitars. Since it offers a free tier, it’s definitely worth a try.

FX

In a world where it feels like every new Top 40 melody is just a rehash of some song from last year’s Top 40, audio effects have gained traction as a way of making your music stand out from the crowd. In this new world of A E S T H E T I C S, why not go full futuristic and use some AI-based FX plugins?

GuitarML is one of the most interesting projects in this area. It’s a community of developers trying to replicate the sound of well-known physical guitar pedals using AI. The idea is simple: you present a model with examples of how a guitar sounds with and without the effect, and let the algorithm learn how to transform one into the other. The resulting model can then be used as an FX plugin that approximately replicates the original pedal. Models are available as VST plugins or even as Raspberry Pi builds that you can plug your instrument into directly.
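At its core, this training setup is supervised regression from dry samples to wet samples. Here is a deliberately tiny stand-in: instead of GuitarML’s LSTM/WaveNet-style models, we fit a memoryless polynomial waveshaper with least squares, and the “pedal” being learned is a hypothetical tanh-style soft clipper:

```python
import numpy as np

rng = np.random.default_rng(0)
dry = rng.uniform(-1, 1, 2000)   # dry signal samples (stand-in for a recording)
wet = np.tanh(3.0 * dry)         # what our hypothetical drive pedal did to them

# Fit wet ≈ c0*x + c1*x^3 + c2*x^5 (an odd polynomial waveshaper)
X = np.stack([dry, dry**3, dry**5], axis=1)
coeffs, *_ = np.linalg.lstsq(X, wet, rcond=None)

def learned_pedal(x):
    """Apply the learned waveshaper to new audio samples."""
    return coeffs[0] * x + coeffs[1] * x**3 + coeffs[2] * x**5
```

Real pedals have memory (compression, filtering, sag), which is exactly why GuitarML uses recurrent and convolutional models rather than a static curve like this one; but the train-on-paired-audio recipe is the same.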

Bassline from the previous separation passed through GuitarML’s Chameleon

Another interesting area is humanization: you can create a sentimental piano melody in your DAW using the most texturized synth preset ever, and it will still sound mechanical in some way. The intensity and micro-randomness of a player pressing the keys are hard to replicate in modern music-making software.

As you can imagine, there are AI tools to bring back that human vibe. For example, from the previously mentioned Google Magenta, we have Groove. Trained on drummers’ performances, it adjusts the timing and velocity of drum patterns to reproduce the “feel” of the original performances. VirtuosoNet takes a similar approach, but directed toward music scores.
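To make the idea concrete, here is the crudest possible baseline: random jitter applied to a rigidly quantized pattern. Groove’s whole point is that it replaces the random offsets below with offsets *learned* from real performances, but the knobs it turns, onset timing and velocity, are the same (the function and its parameters here are illustrative, not part of any of the tools above):

```python
import random

def humanize(notes, timing_jitter=0.03, velocity_jitter=12, seed=42):
    """Nudge each (time_in_beats, velocity) note off the grid."""
    rng = random.Random(seed)
    out = []
    for time, velocity in notes:
        time += rng.uniform(-timing_jitter, timing_jitter)
        velocity += rng.randint(-velocity_jitter, velocity_jitter)
        # Keep results valid: non-negative time, MIDI velocity in 1..127
        out.append((max(0.0, time), max(1, min(127, velocity))))
    return out

# A four-on-the-floor kick pattern, perfectly quantized at velocity 100
quantized = [(float(beat), 100) for beat in range(4)]
humanized = humanize(quantized)
```

Random jitter sounds sloppy rather than human precisely because real players drift in correlated, style-dependent ways; that correlation structure is what Groove’s model learns from data.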

Science

If you’re interested in digging into the theoretical foundations of these tools, or you have the technical skills to work with bleeding-edge open-source software, this section gives some pointers for further exploration.

“Music Composition with Deep Learning: A Review” and “A Comprehensive Survey on Deep Music Generation” (both on arXiv) give a good overview of the current state of music composition.

The Music Demixing Challenge (MDX) is the best place to stay on top of developments in sound source separation. Every year, all the main research labs and startups compete here. If you want to get deep into the details of implementing an algorithm, this tutorial is an excellent starting point.

This list provides an extensive collection of open-source repositories for many tasks, from interactive piano composition to audio style transfer. It’s a fantastic resource, and by digging around, you’ll definitely find some exciting ideas.

Dear reader, as we all know, you already are a fantastic musician, one of the most creative minds of our generation. I sincerely hope you can incorporate some of these tools into your music-making process on your path to stardom. See you at a future AI Song Contest?


Pedro Oliveira is a machine learning engineer and data scientist at Tribe AI. An expert in knowledge graphs, NLP, recommender engines, and computational creativity, he’s also doing research on the intersection of AI and electronic music production.
