On the topic of artificial intelligence, most conversations inevitably land on generative systems like ChatGPT, Stable Diffusion, or Midjourney. I'd like to highlight another use for AI: creating interactive music systems.
The core idea is a music system that reacts to how the player handles a gamepad controller. Gamepad data is recorded while the player performs a number of tasks; in our case, we recorded three intensity levels of fighting NPC characters plus an idle state. A neural network is then trained to recognize these states.
On the music system side, the states predicted by the neural network drive the volume of three layers in a piece of music, dynamically changing the intensity level.
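To make the mapping concrete, here is a minimal sketch (not the project's actual implementation, which lives in Unity) of how a predicted state could translate into layer volumes. The "one extra layer per intensity step" rule is my assumption; the article only says the states drive the layer volumes.

```python
def layer_volumes(state: int, n_layers: int = 3) -> list[float]:
    """Hypothetical mapping: state 0 is idle (all layers muted),
    states 1-3 each unmute one more of the three music layers."""
    return [1.0 if i < state else 0.0 for i in range(n_layers)]

print(layer_volumes(0))  # [0.0, 0.0, 0.0] -> idle
print(layer_volumes(2))  # [1.0, 1.0, 0.0] -> mid intensity
```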
Originally, this experiment was meant to use a simple SVM or Naïve Bayes classifier. However, popular game engines have no readily available packages for running inference with those models, so a neural network was chosen instead, which allowed the Unity Barracuda package to be used.
| Hyper-Parameter | Value |
| --- | --- |
| Layers | 2 |
| Dropout | 0.2 |
| Hidden Size | 8 |
| Learning Rate | 5e-7 |
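The exact architecture isn't spelled out here, but a minimal PyTorch sketch consistent with the table could look like the following. Reading "Layers: 2" as two linear layers, and the input/output sizes, are my assumptions: one RMS feature per controller function in, one class per gameplay state out.

```python
import torch
import torch.nn as nn

class ControllerStateClassifier(nn.Module):
    """Small feed-forward classifier matching the hyper-parameters above."""

    def __init__(self, n_features: int, n_states: int = 4,
                 hidden_size: int = 8, dropout: float = 0.2):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(n_features, hidden_size),  # layer 1
            nn.ReLU(),
            nn.Dropout(dropout),
            nn.Linear(hidden_size, n_states),    # layer 2 -> idle + 3 fight levels
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.net(x)

model = ControllerStateClassifier(n_features=16)  # placeholder feature count
optimizer = torch.optim.Adam(model.parameters(), lr=5e-7)

# Barracuda consumes ONNX models, so the trained network can be exported with:
# torch.onnx.export(model, torch.zeros(1, 16), "controller_state.onnx")
```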
Data was recorded for all gameplay functions of a DualSense gamepad controller; functions not applicable to the demo game (Orcs Must Die 2) were discarded. To retain some sense of time rather than simply classifying static controller states, I calculated the RMS energy of each function within a fixed time window.
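A rolling RMS over a window of recent samples might look like this sketch; the window size and the specific controller functions are placeholders, since the article doesn't state them.

```python
import math
from collections import deque

class RMSFeature:
    """Rolling RMS energy of one controller function over a fixed window."""

    def __init__(self, window_size: int):
        self.window = deque(maxlen=window_size)

    def push(self, value: float) -> None:
        """Record the function's value for the current frame."""
        self.window.append(value)

    def rms(self) -> float:
        if not self.window:
            return 0.0
        return math.sqrt(sum(v * v for v in self.window) / len(self.window))

# One feature per controller function, e.g. stick axes, triggers, face buttons.
features = {name: RMSFeature(window_size=60)
            for name in ("left_stick_x", "r2", "cross")}
```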
The downside of this method is that the RMS must also be calculated at inference time, which can hurt the game's frame rate. I recommend calculating the RMS only once every few hundred milliseconds, particularly if most of the controller's buttons are used as features. Another option is to calculate the RMS of each function in sequence to avoid lag spikes, and to run inference only once all features have been updated, as sketched below.
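One way to stagger the work is a simple round-robin scheduler: refresh one feature per frame and only hand a full feature vector to the classifier once every feature has been updated. This is my own sketch of the idea, not the project's code; the per-function RMS calculators are passed in as placeholder callables.

```python
class StaggeredExtractor:
    """Compute one feature's RMS per frame, spreading the cost across frames."""

    def __init__(self, rms_fns: dict):
        self.rms_fns = rms_fns              # name -> zero-arg callable returning RMS
        self.names = list(rms_fns)
        self.values = {}
        self.index = 0

    def tick(self):
        """Call once per frame. Returns the full feature dict when a pass completes."""
        name = self.names[self.index]
        self.values[name] = self.rms_fns[name]()
        self.index = (self.index + 1) % len(self.names)
        return dict(self.values) if self.index == 0 else None

# Usage with placeholder RMS calculators:
extractor = StaggeredExtractor({"left_stick_x": lambda: 0.4, "r2": lambda: 0.1})
for _ in range(4):                          # four simulated frames
    feature_vector = extractor.tick()
    if feature_vector is not None:
        pass                                # run the classifier on feature_vector
```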
The model tended to get stuck in a local minimum after epoch 10, as determined by an early stopping mechanism, and the final model was trained for 14 epochs.
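A patience-based early stopping loop is one plausible way to arrive at "best at epoch 10, stopped at 14"; the patience of four epochs and the `train_one_epoch`/`evaluate` helpers below are assumptions of this sketch, not details from the article.

```python
import copy

def train_with_early_stopping(model, optimizer, train_one_epoch, evaluate,
                              max_epochs: int = 100, patience: int = 4):
    """Stop when validation loss hasn't improved for `patience` epochs,
    then roll the model back to its best epoch."""
    best_loss, best_state, stale = float("inf"), None, 0
    for epoch in range(max_epochs):
        train_one_epoch(model, optimizer)
        val_loss = evaluate(model)
        if val_loss < best_loss:
            best_loss, best_state, stale = val_loss, copy.deepcopy(model.state_dict()), 0
        else:
            stale += 1
            if stale >= patience:
                break        # e.g. best at epoch 10, training halted after epoch 14
    if best_state is not None:
        model.load_state_dict(best_state)
    return best_loss
```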
While somewhat unstable, it works more often than not. Exaggerating the recorded training data yielded better results, which suggests the model is simply having trouble separating the classes cleanly. The stability of the system could be improved with more data, further training, and hyperparameter tweaking. Predictions can also be combined with other, traditional in-game variables to create more complex interactive music systems (see the sketch below). The system doesn't even need to act on music: you could change NPC behavior or alter the game world depending on how the controller is used.
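As a hypothetical example of combining the prediction with a traditional game variable, the music could be kept from dropping to idle while enemies are nearby; the `nearby_enemies` variable and the blending rule are illustrative assumptions.

```python
def music_intensity(predicted_state: int, nearby_enemies: int) -> int:
    """Blend the controller-based prediction with a game-state floor:
    never drop to idle (0) while enemies are close, and cap at level 3."""
    floor = 1 if nearby_enemies > 0 else 0
    return min(max(predicted_state, floor), 3)
```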
The accessibility of these tools opens a lot of new doors for game development, and I hope this project inspires sound designers and composers to innovate by implementing audio in new ways. Should you wish to read the entire research article, you may download it here.