
AI Interactive Music Systems for Video Games

When artificial intelligence comes up, most conversations inevitably turn to generative systems like ChatGPT, Stable Diffusion, or Midjourney. I’d like to highlight another use for AI: creating interactive music systems for video games.

Design

The core idea is a music system that reacts to how the player handles a gamepad controller. Gamepad data is recorded while the player performs a number of tasks; in our case, we recorded three intensity levels of combat against NPC characters, plus an idle state. A neural network is then trained to recognize these states from the controller data.

On the music side, the state predicted by the neural network drives the volume of three layers in a piece of music, dynamically changing the intensity level.
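To make the design concrete, here is a minimal, engine-agnostic sketch of that mapping; the actual project runs in Unity, so the state names, layer count, and fade time below are illustrative assumptions rather than the project’s values:

```python
# Engine-agnostic sketch: map a predicted intensity state to per-layer
# target volumes and fade toward them each frame. All names and values
# here are illustrative assumptions.
STATE_TO_LAYER_GAINS = {
    "idle":     [1.0, 0.0, 0.0],
    "combat_1": [1.0, 1.0, 0.0],
    "combat_2": [0.5, 1.0, 0.5],
    "combat_3": [0.0, 1.0, 1.0],
}

def update_layer_volumes(current, state, dt, fade_time=1.5):
    """Move each layer's volume toward the target gains for `state`.

    current   -- list of current layer volumes (mutated in place)
    state     -- label predicted by the classifier
    dt        -- time since last update, in seconds
    fade_time -- seconds for a layer to fade fully in or out (assumed)
    """
    targets = STATE_TO_LAYER_GAINS[state]
    step = dt / fade_time
    for i, target in enumerate(targets):
        if current[i] < target:
            current[i] = min(current[i] + step, target)
        else:
            current[i] = max(current[i] - step, target)
    return current
```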

Architecture

Originally, this experiment was meant to use a simple SVM or Naïve Bayes classifier. However, popular game engines have no readily available packages for running inference with those models, so a neural network was chosen instead, allowing inference through the Unity Barracuda package.

Hyper-Parameter    Value
Layers             2
Dropout            0.2
Hidden Size        8
Learning Rate      5e-7
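For reference, a classifier with these hyper-parameters could be defined roughly as follows in PyTorch; the input and output sizes, and the exact layer layout, are assumptions on my part rather than the project’s actual definition:

```python
import torch
import torch.nn as nn

# Minimal sketch of a classifier matching the hyper-parameters above.
# NUM_FEATURES and NUM_STATES are assumptions; the real feature count
# depends on how many controller functions survive the filtering step.
NUM_FEATURES = 16   # one RMS value per retained controller function (assumed)
NUM_STATES = 4      # idle + three combat intensity levels

model = nn.Sequential(
    nn.Linear(NUM_FEATURES, 8),   # hidden size 8
    nn.ReLU(),
    nn.Dropout(p=0.2),            # dropout 0.2
    nn.Linear(8, NUM_STATES),     # two layers in total
)

optimizer = torch.optim.Adam(model.parameters(), lr=5e-7)  # learning rate 5e-7
loss_fn = nn.CrossEntropyLoss()
```

Barracuda consumes ONNX models, so a network trained this way would be exported to ONNX (for example with torch.onnx.export) before being imported into Unity.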

Data & Training

Data was recorded for every gameplay function of a DualSense gamepad controller, and functions not applicable to the demo game (Orcs Must Die! 2) were discarded. To retain some sense of time rather than classifying static controller states, I calculated the RMS energy of each function over a fixed time window.
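As a rough sketch of that feature extraction (the window length and sampling assumptions below are illustrative, not the values used in the project):

```python
import math
from collections import deque

class RmsFeature:
    """Rolling RMS energy of one controller function over a fixed window."""

    def __init__(self, window_size=30):
        # window_size is an assumed number of samples, e.g. half a second
        # of input polled at 60 Hz; the project's actual window may differ.
        self.samples = deque(maxlen=window_size)

    def push(self, value):
        """Record one raw reading (button 0/1, or a trigger/stick axis value)."""
        self.samples.append(value)

    def value(self):
        """RMS energy of the samples currently in the window."""
        if not self.samples:
            return 0.0
        return math.sqrt(sum(v * v for v in self.samples) / len(self.samples))
```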

The downside of this method is that the RMS must also be calculated at inference time, which can hurt the game’s frame rate. I recommend calculating the RMS only once every few hundred milliseconds, particularly if most of the controller’s functions are used as features. Another option is to calculate the RMS of each function in sequence, spreading the work across frames to avoid lag spikes, and to run inference only once all features have been refreshed, as sketched below.
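A minimal sketch of that staggered approach, assuming per-function extractors like the RmsFeature above (the scheduling details are illustrative, not the project’s implementation):

```python
class StaggeredExtractor:
    """Recompute one feature's RMS per frame; infer once all are fresh."""

    def __init__(self, features):
        self.features = features              # list of RmsFeature objects
        self.cached = [0.0] * len(features)   # last computed RMS per feature
        self.index = 0

    def tick(self, raw_readings):
        """Record this frame's readings and refresh one RMS value.

        raw_readings -- one raw value per controller function this frame.
        Returns the cached feature vector when a full pass has completed
        (a reasonable moment to run inference), otherwise None.
        """
        # Recording raw samples is cheap, so every feature gets this frame's value.
        for feature, value in zip(self.features, raw_readings):
            feature.push(value)

        # Only one RMS is recomputed per frame to avoid a lag spike.
        self.cached[self.index] = self.features[self.index].value()
        self.index = (self.index + 1) % len(self.features)

        return list(self.cached) if self.index == 0 else None
```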

The model tended to get stuck in a local minimum after epoch 10, as flagged by an early stopping mechanism, and the final model was trained for 14 epochs.
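For illustration, an early-stopping loop along these lines would flag that plateau. It continues the PyTorch sketch from the Architecture section, so model, optimizer, and loss_fn are assumed from there; the synthetic tensors and the patience value are stand-ins, not the project’s data or settings:

```python
import torch

# Synthetic stand-in data; the real features are the recorded controller RMS values.
torch.manual_seed(0)
X_train, y_train = torch.randn(256, 16), torch.randint(0, 4, (256,))
X_val, y_val = torch.randn(64, 16), torch.randint(0, 4, (64,))

best_loss, patience, stale = float("inf"), 4, 0  # patience is assumed
for epoch in range(100):
    model.train()
    optimizer.zero_grad()
    loss_fn(model(X_train), y_train).backward()
    optimizer.step()

    model.eval()
    with torch.no_grad():
        val_loss = loss_fn(model(X_val), y_val).item()

    if val_loss < best_loss - 1e-4:
        best_loss, stale = val_loss, 0
    else:
        stale += 1
        if stale >= patience:
            break  # validation loss has plateaued; stop training
```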

Results

While somewhat unstable, the system works more often than not. Exaggerating the recorded training data yielded better results, which suggests the model is simply having trouble separating the classes cleanly. The stability of the system could be improved with more data, further training, and hyperparameter tuning. Predictions can also be combined with traditional in-game variables to create more complex interactive music systems. The system doesn’t even need to act on music: you could change NPC behavior or alter the game world depending on how the controller is used.
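As a toy example of combining a prediction with a traditional game variable (the health threshold and intensity scale here are invented for illustration):

```python
# Illustrative only: blend the classifier's predicted combat level with a
# conventional game variable (player health) before choosing layer gains.
def music_intensity(predicted_level, player_health):
    """predicted_level: 0 (idle) to 3 (heavy combat); player_health: 0.0-1.0."""
    intensity = predicted_level
    if player_health < 0.25:
        intensity = min(intensity + 1, 3)  # push the score harder when near death
    return intensity
```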

The accessibility of these tools opens a lot of new doors for game development, and I hope this project inspires sound designers and composers to innovate by implementing audio in new ways. Should you wish to read the entire research article, you may download it here.
