Abstract
This research focuses on developing an AI agent that simulates human curiosity during the testing of platform games (in this case, Donkey Kong Country Returns). Instead of following a predefined path, the agent is motivated to explore, discover new areas, and interact with unfamiliar objects, similar to how a human player would act out of curiosity. The agent uses a computer vision model and a reinforcement learning model that mimics intrinsic motivation. This approach can lead to a more natural and effective method of game testing.
Introduction
Traditional AI systems in game testing are usually designed to optimize gameplay: reaching the end of a level with as few errors as possible. While effective, this approach overlooks how human players experience games, with curiosity and a desire to explore. This project investigates how an AI agent can be designed to simulate this form of intrinsic motivation. Using reinforcement learning with curiosity-driven rewards, I aim to build an agent that not only learns how to play the game but also actively seeks out new interactions and unusual scenarios. This “curious” approach can help developers simulate curious playtesters.
How can a CNN-based RL model be built to simulate human curiosity in game testing?
- What is the importance of Curiosity in Game Testing, and how can it be simulated?
- How can a CNN model be implemented in an AI-Driven Game Testing System with curiosity-driven behavior?
- How do AI-powered game testers compare to human testers in terms of exploration efficiency and curiosity-driven behavior?
1. What is the importance of Curiosity in Game Testing and how can it be simulated?
1.1. Why is Curiosity Essential in Playtesting?
Curiosity in playtesting ensures that testers (human or AI-driven) actively seek out unknown or undiscovered aspects of a game. This increases the chances of finding hidden bugs, unusual situations, and edge cases that might otherwise go unnoticed. Moreover, a curious approach ensures that testers do not just follow the “main path” of a game but also deviate from the intended route, increasing test coverage and improving overall game quality [2] [3].
By explicitly encouraging curiosity in AI agents, for example, through curiosity-driven reinforcement learning, these agents replicate human exploratory behavior. This is particularly effective in open-world games or games with many hidden elements, where exploration and side quests play a significant role [1].
1.2. Modeling Curiosity in AI
1.2.1 Intrinsic vs. Extrinsic Motivation in Reinforcement Learning
In reinforcement learning (RL), extrinsic motivation refers to rewards that come directly from the environment (e.g., points, completing levels). Intrinsic motivation, on the other hand, is an internally generated reward for exploring unknown or complex areas. By combining both, an RL agent remains focused on the main objective (extrinsic reward) while also deeply exploring the game world (intrinsic reward). This increases the likelihood of discovering hidden areas and bugs, which is crucial for thorough game testing [1], [2].
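As a minimal sketch, the two signals can simply be combined as a weighted sum; the weighting factor below is an assumed example value, not one tuned for this project.

def total_reward(extrinsic_reward, intrinsic_reward, beta=0.2):
    # Weighted sum: the extrinsic part keeps the agent goal-oriented,
    # the intrinsic part (scaled by an assumed factor beta) rewards exploration.
    return extrinsic_reward + beta * intrinsic_reward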
1.2.2 Overview of Methods such as Curiosity-Driven Exploration and Novelty Search
Beyond standard RL approaches, specific methods encourage agents to actively explore unknown or new areas of the environment:
Curiosity-Driven Exploration
- The agent receives an internal reward based on the difference between predicted and actual outcomes of actions.
- This motivates the agent to investigate unexplored areas, increasing test coverage [2].
Novelty Search
- The agent actively seeks new or unusual states instead of solely focusing on a predefined goal.
- This leads to unexpected behaviors that can help uncover hidden and unusual scenarios [4].
Both methods increase the likelihood that an AI agent will test both regular and hidden game elements, especially in large or open environments. Combining them with extrinsic rewards (e.g., points or levels) creates a balance between goal-oriented and exploratory behavior [1], [2].
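A minimal sketch of how these two signals could be computed is shown below; both functions are simplified, illustrative stand-ins for the methods described above (prediction error of a learned forward model for curiosity, and a count-based bonus as a simplification of novelty search).

import numpy as np

def curiosity_reward(predicted_next_state, actual_next_state):
    # Curiosity-driven exploration: reward equals the prediction error of a learned forward model.
    diff = np.asarray(predicted_next_state) - np.asarray(actual_next_state)
    return float(np.mean(diff ** 2))

def novelty_reward(state_key, visit_counts):
    # Count-based simplification of novelty search: rarely visited states earn a larger bonus.
    visit_counts[state_key] = visit_counts.get(state_key, 0) + 1
    return 1.0 / np.sqrt(visit_counts[state_key])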
1.4. Selection of a Suitable AI Framework
1.4.1 Criteria for Framework Selection
A suitable AI framework should meet the following requirements:
- Support for Neural Networks: The framework must support complex CNN architectures.
- Integration with Q-Learning: It should seamlessly support reinforcement learning algorithms.
- Flexibility and Modularity: A modular approach simplifies debugging and iterative improvement [2].
Conclusion
Curiosity is key to effective game testing, helping both human testers and AI agents uncover hidden bugs and unexpected behaviors. By using curiosity-driven reinforcement learning in AI playtesting, developers can achieve better test coverage and improve game quality.
Integrating intrinsic motivation in reinforcement learning boosts AI’s exploration, allowing agents to go beyond predefined goals and find bugs that traditional methods might miss. This improves game stability and playability.
Choosing the right AI framework is also crucial. PyTorch is the common choice because it is fast, well optimized, supports DQN implementations, and is very flexible.
2. How can a CNN model be implemented in an AI-Driven Game Testing System with curiosity-driven behavior?
2.1. Designing the Test Environment
A suitable game genre should allow for exploration and non-linear level structures while remaining technically manageable. A 2D platformer, roguelike, or top-down adventure/puzzle game is ideal due to:
- Non-Linear Level Structure and Exploration: Allows AI agents to discover hidden paths.
- Simple Implementation: Faster prototyping and AI framework integration.
- Flexibility for Reward Structures: Easily combines extrinsic and intrinsic rewards [1], [2], [4].
Selection of a Suitable Game
Several Nintendo games were considered for AI-driven game testing due to their exploratory nature and structured reward systems:
- The Legend of Zelda: A Link Between Worlds: Features an extensive, non-linear world with hidden puzzles but may introduce complexity in AI implementation.[9]
- Kirby: Triple Deluxe: Simpler design with hidden routes, though lacking complex mechanics for diverse test scenarios.[10]
- Metroid: Samus Returns: Strong emphasis on exploration, though higher technical complexity.[11]
- Super Mario 3D Land: Offers a balance between linear progression and hidden exploration.[12]
Selected Game: Donkey Kong Country Returns
Donkey Kong Country Returns is an ideal choice for AI-driven game testing due to several key factors [13]:
- Exploratory Level Design: The game features intricate levels filled with hidden secrets, making it a prime candidate for evaluating AI curiosity and exploratory behavior.
- Dynamic Platforming Challenges: Moving platforms, breakable terrain, and enemy interactions require adaptive AI behavior, making it a rich test environment.
- Rewarding Exploration: Collectibles such as K-O-N-G letters and puzzle pieces encourage in-depth exploration, aligning with AI testing objectives.
- Technical Feasibility: As a 2D platformer, the game simplifies AI implementation while still offering complex challenges for testing.
- Personal Familiarity: Prior experience with the game aids in efficient development and testing processes, allowing for more targeted AI improvements.
What Does the AI Know?
The AI acquires knowledge of the game environment through:
Visual Information for the RL Agent:
- Positions and movements of characters.
- Environmental features such as obstacles and hidden paths.
- Colors, shapes, and textures to distinguish game elements.
- Interaction points such as destructible walls and hidden barrels.
Game Rules and Status Information:
- Power-ups.
- Reward structures for exploratory behavior.
- Environmental triggers such as secret entrances and auto-scrolling levels.
1. Object Recognition List
1. Player Characters
- Donkey Kong(x,y)
- Total Lives(int)
2. Enemies
- Frog(x,y)
- Bird(x,y)
- Tiki(x,y)
- Falling object(x,y)
- rest(x,y) (remaining enemy types)
3. Collectibles and Power-Ups
- banana(x,y)
- banana_coin(x,y)
- puzzle(x,y)
- k_o_n_g_letter(x,y)
- balloon(x,y)
- dk_barrel(x,y)
- heart(x,y)
- goal_barrel(x,y)
4. Interactive objects and environment
- Pits(x,y)
- Spikes(x,y)
- Platforms(x,y)
- Barrels(x,y)
- Interactable object(x,y)
- Interactable platform(x,y)
- Checkpoint(x,y)
- vine(x,y)
2. Navigation and Interaction (output)
What can the AI do?
The AI controls Donkey Kong through concrete physical actions, including:
- Walking and Running: Basic movements to navigate through the level and explore different routes.
- Jumping: Overcoming obstacles, reaching higher platforms, or accessing hidden areas.
- Rolling: Attacking enemies, breaking obstacles, and extending jumps for longer distances.
- Climbing: Using vines, ropes, and other climbable surfaces to reach different areas.
- Ground Pound: Smashing the ground to reveal hidden items, defeat enemies, or activate switches.
Because of the complexity of this project and the time I have to finish it, I am focusing on building just the prototype for now; with the experience I gained, I will give tips on how to make it more curiosity-based.
2.2. AI Integration
2.2.1 How Does YOLO Work?
What is YOLO (You Only Look Once)?
YOLO is an advanced object detection method that allows a computer to recognize and locate objects in images or videos. It’s called “You Only Look Once” because it looks at the image just once to detect multiple objects, making it super fast and efficient compared to traditional methods.
In traditional object detection methods, the image is processed multiple times, but YOLO does it all in one go, which makes it much faster for real-time applications like video streaming or self-driving cars.
How Does YOLO Work?
1. The Image as a Grid
YOLO divides the image into a grid. Each grid cell is responsible for detecting objects within its area. For example, in a 4x4 grid, each cell will look for objects that fall within its portion of the image.
2. Predicting Bounding Boxes
For each grid cell, YOLO predicts:
- Bounding boxes: These are the rectangles that enclose the detected objects.
- Class probabilities: What is the object inside the bounding box (e.g., a dog, cat, car, etc.)?
Each grid cell predicts multiple bounding boxes, along with their confidence scores (how likely it is that a box contains an object), and class labels (what type of object it is).
3. Confidence Score
The confidence score of a bounding box indicates how accurate the model thinks its prediction is. It’s calculated by multiplying:
- The probability that the box contains an object.
- The accuracy of the bounding box (how well it fits the object).
A high confidence score means that the model is more sure about the detection.
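To make this concrete, the sketch below loads a custom-trained YOLOv5 model through torch.hub and runs one detection pass. The weights file "best.pt" is a placeholder for the model trained later in this project, so treat this as an assumed example rather than the final setup.

import torch

# Hedged example: one forward pass of a custom-trained YOLOv5 model.
model = torch.hub.load("ultralytics/yolov5", "custom", path="best.pt")  # placeholder weights file
results = model("screenshot.png")          # the image is processed in a single pass
detections = results.pandas().xyxy[0]      # columns: xmin, ymin, xmax, ymax, confidence, class, name
print(detections[["name", "confidence"]])  # low-confidence boxes can be filtered out here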
2.2.2 How Does DQN Work?
What is a Deep Q-Network (DQN)?
Imagine you’re teaching a robot to play a game. The robot has to figure out the best way to play by taking actions (like moving left, jumping, or staying still) and getting feedback (called rewards). Over time, the robot learns what actions are good or bad based on this feedback.
Now, Q-learning is a method the robot uses to decide which actions are good. It gives each action a score (called the Q-value), which tells the robot how good that action is in a given situation (called a state).
1. The Game Environment
The robot (or agent) plays a game where it has different states (like the current position in the game). The agent needs to make decisions (actions) to move forward. For example:
- State: Where is the robot on the screen?
- Action: Does the robot jump or move left?
- Reward: Did the robot score points or lose health?
2. The Neural Network
The neural network is a tool that helps the robot estimate the value of each possible action in a state. Here’s how it works:
- The state (like the robot’s position) is given as input to the network.
- The network then outputs values for each possible action (like jumping or moving).
- The action with the highest value is usually the best choice.
3. Learning
The robot plays the game and tries different actions. When it gets feedback (like rewards), it adjusts its brain (the neural network) to predict better actions next time. It does this using something called Q-learning.
In Q-learning, the robot tries to maximize its rewards by updating its understanding of which actions are best. The robot adjusts its predictions based on:
- The current reward for the action it just took.
- The expected future rewards for continuing from that state.
4. Updating Q-Values
Every time the robot takes an action, it uses the Q-learning formula to adjust its brain and improve future decision-making. The goal is to make the robot better at picking actions that lead to higher rewards over time.
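As an illustration of that formula, the sketch below shows the tabular Q-learning update; the learning rate and discount factor are assumed values. The DQN uses the same target (current reward plus discounted best future value) to train the neural network instead of a table.

from collections import defaultdict

q_table = defaultdict(lambda: defaultdict(float))  # q_table[state][action] -> Q-value

def q_learning_update(state, action, reward, next_state, actions, alpha=0.1, gamma=0.99):
    # Q(s, a) <- Q(s, a) + alpha * (reward + gamma * max_a' Q(s', a') - Q(s, a))
    best_future = max(q_table[next_state][a] for a in actions)  # expected future reward
    td_target = reward + gamma * best_future                    # current reward + discounted future
    q_table[state][action] += alpha * (td_target - q_table[state][action])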
2.2.3 The Object Detector
To build the object detector, I will transfer-learn a YOLO model to detect the items from the list shown earlier and extract them. This allows the RL agent to know which object is where. The reason for this approach is that we chose a game whose code I can't access, so we need a workaround to get the data we want.
First, I needed to write a script that takes a screenshot every 10 seconds to gather enough images to make the model at least usable. I got around 600 screenshots.
Then we need to label some data for training. I made a project in Roboflow (software to label objects from images to use for the model) and started labeling some images. In this process, I made a class for all the detectable objects and started labeling them. After 100 images, I ran the first test.
Result
As we can see, it can already detect some objects from the game, but it also misses some; for example, it didn’t detect the frogs, the DK-barrel, or the pit behind it. For the second iteration, I want at least 500 labeled images to make it more accurate.
Result V2
Here we see a big improvement, and it’s somewhat usable for object detection. Let’s be clear, it’s still not anything near perfect, but I simply can’t spend more time on labeling data to improve it further.
2.2.4 The Reinforcement Agent
1. Screenshot Extraction
The first step is extracting screenshots from the game using the Dolphin Emulator.
import pyautogui
import time

def capture_screenshot(filename="screenshot.png"):
    time.sleep(1)  # Wait to capture the right frame
    screenshot = pyautogui.screenshot()
    screenshot.save(filename)
2. Object Detection
Object detection is performed using a trained model to recognize in-game elements (see 2.2.1).
import cv2
import torch

def detect_objects(image_path, model):
    image = cv2.imread(image_path)[:, :, ::-1]  # OpenCV loads BGR; convert to RGB for the model
    results = model(image)
    return results.pandas().xyxy[0]  # Results as a pandas DataFrame
3. State Representation
The game state is represented as a NumPy array containing object coordinates.
import numpy as np

def create_state_vector(detections):
    state_vector = []
    for _, row in detections.iterrows():
        state_vector.append([row['xmin'], row['ymin'], row['xmax'], row['ymax'], row['name']])
    return np.array(state_vector)
The array has a fixed size so that the AI can learn much more quickly than with a variable-length input. The array holds 146 values, which represent the x and y coordinates of all the objects that can appear on screen. Objects that are not on screen get the value (0, 0), so the RL agent knows that no object of that kind is currently visible.
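A hedged sketch of how such a fixed-length vector could be built is shown below; the class names and the number of slots per class are illustrative assumptions, not the exact 146-value layout used in the project.

import numpy as np

# Assumed slot layout: every object class gets a fixed number of (x, y) slots.
CLASS_SLOTS = {"donkey_kong": 1, "frog": 4, "banana": 10, "dk_barrel": 2}  # illustrative subset

def create_fixed_state_vector(detections):
    slots = {cls: np.zeros((n, 2)) for cls, n in CLASS_SLOTS.items()}  # absent objects stay at (0, 0)
    counts = {cls: 0 for cls in CLASS_SLOTS}
    for _, row in detections.iterrows():
        cls = row["name"]
        if cls in CLASS_SLOTS and counts[cls] < CLASS_SLOTS[cls]:
            cx = (row["xmin"] + row["xmax"]) / 2  # centre of the bounding box
            cy = (row["ymin"] + row["ymax"]) / 2
            slots[cls][counts[cls]] = (cx, cy)
            counts[cls] += 1
    return np.concatenate([slots[cls].ravel() for cls in CLASS_SLOTS])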
4. HP Reading from Dolphin Emulator
HP is extracted from the Dolphin Emulator’s memory.
from memory_reader import read_hp  # Custom memory reader

def get_total_hp():
    dk_hp = read_hp("DK_HP_ADDRESS")
    dd_hp = read_hp("DD_HP_ADDRESS")
    return dk_hp + dd_hp
This keeps track of the combined HP of Donkey Kong and Diddy Kong.
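The memory_reader module itself is custom. A hedged sketch of what it could look like is shown below, assuming the third-party dolphin-memory-engine package; the addresses are placeholders, not the real HP addresses.

import dolphin_memory_engine as dme

ADDRESSES = {"DK_HP_ADDRESS": 0x00000000, "DD_HP_ADDRESS": 0x00000000}  # placeholder addresses

def read_hp(name):
    if not dme.is_hooked():
        dme.hook()                          # attach to the running Dolphin process
    return dme.read_word(ADDRESSES[name])   # read the 4-byte value at the mapped address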
5. Reward Function
The reward function determines the reward based on game events.
def calculate_reward(prev_hp, current_hp, collected_items, level_complete, inactivity_penalty=5, death_penalty=50, walk_left_penalty=0):
    reward = 0
    if level_complete:
        reward += 100
    if current_hp < prev_hp:
        reward -= 20
    if current_hp == 0:
        reward -= death_penalty
    reward += collected_items
    reward -= inactivity_penalty
    reward -= walk_left_penalty
    return reward
⚖️ Definitive Reward/Penalty System for DKCR (not complete)
| Action | New Reward (Rounded) |
|---|---|
| Pick up Banana | +1 |
| Pick up Banana Coin | +5 |
| Pick up Puzzle Piece | +7 |
| Pick up K, O, N, G letter | +8 |
| Pick up Balloon | +8 |
| Pick up DK-Barrel | +12 |
| Pick up Heart | +13 |
| Reach Goal Barrel (level complete) | +50 |
| Defeat Enemy | +5 |
| Lose HP | -15 |
| Die | -50 |
| Stand Still | -5 |
| Skip Object | -3 |
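A small sketch of how the collected_items term could be derived from this table is shown below; the dictionary keys are assumed to match the object detector's class names.

ITEM_REWARDS = {
    "banana": 1, "banana_coin": 5, "puzzle": 7, "k_o_n_g_letter": 8,
    "balloon": 8, "dk_barrel": 12, "heart": 13,
}  # level completion is handled separately in calculate_reward

def collected_items_reward(collected_names):
    # Sum the table rewards for every collectible picked up in this step.
    return sum(ITEM_REWARDS.get(name, 0) for name in collected_names)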
6. Deep Q-Network (DQN) Training
The DQN model is trained using the generated state representations and rewards.
import torch
import torch.nn as nn
import torch.optim as optim

class DQN(nn.Module):
    def __init__(self, input_dim, output_dim):
        super(DQN, self).__init__()
        self.fc1 = nn.Linear(input_dim, 64)
        self.fc2 = nn.Linear(64, 64)
        self.fc3 = nn.Linear(64, 64)
        self.fc4 = nn.Linear(64, output_dim)

    def forward(self, x):
        x = torch.relu(self.fc1(x))
        x = torch.relu(self.fc2(x))
        x = torch.relu(self.fc3(x))
        return self.fc4(x)  # one Q-value per possible action
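The network definition alone does not show the update step. Below is a hedged sketch of one training step with a simple replay buffer; the buffer size, batch size, learning rate, and discount factor are assumed values, and a separate target network (common in DQN) is left out for brevity.

import random
from collections import deque

replay_buffer = deque(maxlen=10000)             # stores (state, action, reward, next_state, done) tuples
policy_net = DQN(input_dim=146, output_dim=5)   # 146-value state vector, 5 actions
optimizer = optim.Adam(policy_net.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()

def train_step(batch_size=32, gamma=0.99):
    if len(replay_buffer) < batch_size:
        return
    batch = random.sample(replay_buffer, batch_size)
    states, actions, rewards, next_states, dones = zip(*batch)
    states = torch.tensor(states, dtype=torch.float32)
    next_states = torch.tensor(next_states, dtype=torch.float32)
    actions = torch.tensor(actions, dtype=torch.int64).unsqueeze(1)
    rewards = torch.tensor(rewards, dtype=torch.float32)
    dones = torch.tensor(dones, dtype=torch.float32)

    q_values = policy_net(states).gather(1, actions).squeeze(1)  # Q(s, a) for the taken actions
    with torch.no_grad():
        max_next_q = policy_net(next_states).max(1).values       # best future value per sample
    targets = rewards + gamma * max_next_q * (1 - dones)         # Bellman target

    loss = loss_fn(q_values, targets)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()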
7. Agent Actions
The agent performs actions and applies Q-learning to improve.
import random

ACTIONS = ["jump", "left", "right", "roll", "idle"]

def choose_action(state, model, epsilon=0.1):
    if random.random() < epsilon:
        return random.randrange(len(ACTIONS))  # Explore: random action index
    with torch.no_grad():
        q_values = model(torch.tensor(state, dtype=torch.float32))
    return torch.argmax(q_values).item()  # Exploit: index of the best-known action
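The chosen action index still has to reach the emulator. A hedged sketch of sending it as a key press with pyautogui is shown below; the key bindings are assumptions and depend on how the controller is mapped in Dolphin.

import time
import pyautogui

# Assumed key bindings; uses the ACTIONS list defined in step 7.
KEY_MAP = {"jump": "space", "left": "left", "right": "right", "roll": "x", "idle": None}

def perform_action(action_index, hold_time=0.2):
    key = KEY_MAP[ACTIONS[action_index]]
    if key is not None:
        pyautogui.keyDown(key)
        time.sleep(hold_time)  # hold the key briefly so the emulator registers the input
        pyautogui.keyUp(key)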
8. Evaluation and Improvement
After training, the agent is evaluated and further improved.
def evaluate_agent(model, test_episodes=10):
    total_rewards = []
    for _ in range(test_episodes):
        state = reset_game()  # reset_game() and step_game() wrap the capture -> detect -> act -> reward loop above
        done = False
        episode_reward = 0
        while not done:
            action = choose_action(state, model, epsilon=0)  # epsilon=0: always pick the best-known action
            state, reward, done = step_game(action)
            episode_reward += reward
        total_rewards.append(episode_reward)
    return sum(total_rewards) / test_episodes
3. How do AI-powered game testers compare to human testers in terms of exploration efficiency and curiosity-driven behavior?
3.1. Evaluation Metrics
To test the AI on its curious behavior, we need some metrics to evaluate both real players and the AI. I searched for ways to test human behavior in game testing and found the following metrics that are useful for Donkey Kong Country Returns:
Exploration Speed and Scope
Exploration speed measures how quickly players discover new areas or elements in a game. Exploration scope provides insight into the extent to which the game world is explored by the player. These metrics indicate the drive to explore the unknown and serve as an important benchmark for assessing the appeal of new or surprising game mechanics [5].
Interaction and Response Frequency
This metric tracks how often players encounter new or unexpected elements within the game. A high interaction frequency indicates that players are actively seeking new challenges, which can directly reflect their curiosity. Additionally, it measures how quickly players respond to new stimuli, further illustrating their level of alertness and interest in the game [6].
Exploration Efficiency
Efficiency refers to how goal-oriented and effective players are in discovering new content relative to their total playtime. This can be measured as the ratio between the time spent discovering new elements and overall playtime. This metric helps determine whether curiosity leads to meaningful interactions and learning experiences within the game environment.[6]
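As a small illustration of this ratio (the numbers are placeholder values, not measurements from the tests):

def exploration_efficiency(time_discovering_new, total_playtime):
    # Ratio of time spent discovering new elements to total playtime.
    return time_discovering_new / total_playtime

print(exploration_efficiency(time_discovering_new=12.0, total_playtime=40.0))  # -> 0.3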
Benchmarks for Evaluation
When evaluating human curiosity in games, benchmarks are established based on both quantitative and qualitative data. Researchers often use the following methods:
Measuring Player Curiosity in Donkey Kong Country Returns
Number of Hidden Objects Collected:
Track how many hidden items (e.g., puzzle pieces and K-O-N-G letters) a player discovers, both along the main path and in side areas.
Level Exploration Percentage:
Determine what portion of the level (in terms of the number of unique locations) the player actually explores.
Revisit Behavior During Exploration:
Observe whether players return to previously visited locations.
Surveys and Self-Reports:
Before the test, I asked the players to collect as many hidden objects as possible (puzzle pieces, K-O-N-G letters, coins, and bananas).
3.2. Analysis of Test Results
Raw Data
| Tester | Hidden Objects Found | Level Exploration | Revisit Behavior |
|---|---|---|---|
| 1 | 14 | 8/10 | Yes |
| 2 | 7 | 5/10 | No |
| 3 | 18 | 9/10 | Yes |
| 4 | 9 | 5/10 | No |
| 5 | 20 | 9/10 | Yes |
| 6 | 12 | 7/10 | No |
| 7 | 5 | 4/10 | No |
| 8 | 16 | 8/10 | Yes |
| 9 | 10 | 6/10 | Yes |
| 10 | 13 | 6/10 | No |
“I felt like the level was guiding me a bit, which probably made me miss hidden areas.”
I think this could be an issue because it’s hard to design rewards that incentivize the AI to explore alternative routes.
“Sometimes the collectibles are hidden too cryptically.”
This is a big problem for the AI because it is not designed to solve puzzles.
Averages
- Average number of hidden objects found: 12.4 objects
- Average level exploration: 7/10 hidden locations
- Testers who revisited previous areas: 5 out of 10 testers (50%) showed revisit behavior
3.3. Advantages and Limitations of AI Testers
When is AI more effective than human testers?
Automation of repetitive tasks: AI can perform routine and repetitive testing tasks quickly and accurately, allowing human testers to focus on more complex scenarios. [7]
Speed and efficiency: AI-powered tools can execute test scripts and detect defects at a speed that is unattainable for human testers, speeding up the overall testing cycle. [7]
Self-learning ability: AI tools learn from previous test cycles and adapt, automatically handling new scenarios and updates without constant human input. [7]
Pattern recognition: AI excels at recognizing patterns based on large amounts of historical data, helping to identify recurring issues or trends that human testers might overlook.
Where do human testers remain essential?
Creative and out-of-the-box thinking: Humans are capable of creative thinking and coming up with unconventional test scenarios, which is essential for discovering unexpected bugs that AI might miss. [8]
User experience and intuition: Human testers can assess the usability and intuitive aspects of a game, which is crucial for ensuring a positive user experience.
Complex decision-making: When test situations are ambiguous or complex, human testers can incorporate contextual and subjective considerations that AI may not fully grasp.
Ethical and social considerations: Humans can assess the social and ethical implications of game content, which is important for respecting cultural sensitivities and societal norms.
The AI is really consistent in its performance. This makes it useful if your use case is to test a specific challenge in a different setting and see how players would react to it.
For AI, it is possible to make pixel-perfect jumps, which is difficult for humans to do.
Conclusion
In conclusion, while AI can significantly improve the efficiency and consistency of game testing, human testers remain essential for creative, contextual, and experience-based insights that ensure the game’s overall quality and appeal. Combining the strengths of both AI and human testers presents a more effective approach to game testing, enabling faster, more accurate defect detection alongside rich, user-centered feedback.
Final Conclusion
In conclusion, the role of curiosity in game testing is crucial for uncovering hidden edge cases. By integrating curiosity-driven reinforcement learning into AI agents, we can improve test coverage. This curiosity-driven exploration can help AI agents go beyond predefined objectives, identifying previously unknown bugs and enhancing the stability and playability of the game.
Furthermore, both AI and human testers bring invaluable strengths to the table when it comes to game testing, but their capabilities complement each other rather than replace one another. AI excels in speed, consistency, and efficiency, particularly for repetitive tasks and pattern recognition. Its ability to self-learn and adapt makes it an effective tool for uncovering defects quickly, especially in areas where human testers might struggle due to time constraints. However, AI’s limitations in creativity, adaptability to ambiguous situations, and understanding of nuanced player experiences make it clear that human testers are still indispensable for assessing the overall quality of the game.
If I were to give tips for building a CNN-based reinforcement learning model that simulates human curiosity for game testing, I would recommend finding a balance between intrinsic and extrinsic motivation, while incorporating curiosity-driven exploration and novelty search. It’s crucial to dive deeper into the structure of the neural network and ensure the DQN is applied correctly, while also exploring the network architecture. A key tip would be to improve the YOLO model, as its performance right now isn’t consistent enough for reliable input. Additionally, I would rethink the output approach, as the current one limits the AI too much in what it can and cannot do, and when it can do it.
IEEE References
[1] D. Pathak, P. Agrawal, A. A. Efros, and T. Darrell, “Curiosity-driven exploration by self-supervised prediction,” in Proc. IEEE Conf. on Computer Vision and Pattern Recognition (CVPR) Workshops, 2017, pp. 16–17.
[2] C. Gordillo, J. Bergdahl, K. Tollmar, and L. Gisslén, “Improving playtesting coverage via curiosity driven reinforcement learning agents,” in Proc. IEEE Conference on Games, 2021, pp. 1–8.
[3] V. Mnih et al., “Human-level control through deep reinforcement learning,” Nature, vol. 518, no. 7540, pp. 529–533, 2015.
[4] M. G. Bellemare et al., “The Arcade Learning Environment: An evaluation platform for general agents,” Journal of Artificial Intelligence Research, vol. 47, pp. 253–279, 2013.
[5] H. Chen, M. S. Lee, and J. K. Park, “Curiosity-Driven Exploration in Interactive Game Environments,” in Proc. IEEE Conf. Games, 2021, pp. 102–108.
[6] M. van Dijk, R. De Vries, and P. Jansen, “Measuring Player Engagement: An Empirical Study on Curiosity and Exploration in Digital Games,” IEEE Trans. Games, vol. 12, no. 3, pp. 345–353, Sept. 2020.
[7] A. Jain, “AI bij softwaretesten” (AI in software testing), Visure Solutions, Dec. 2, 2024. [Online]. Available: https://visuresolutions.com/nl/blog/manieren-waarop-ai-het-testen-van-software-zal-veranderen/
[8] Computable, “Waarom testers onmisbaar zijn in softwareontwikkeling” (Why testers are indispensable in software development), Computable.nl, Oct. 16, 2023. [Online]. Available: https://www.computable.nl/2020/11/09/waarom-testers-onmisbaar-zijn-in-softwareontwikkeling/
[9] The Legend of Zelda: A Link Between Worlds, Nintendo
[10] Kirby: Triple Deluxe, Nintendo
[11] Metroid: Samus Returns, Nintendo
[12] Super Mario 3D Land, Nintendo
[13] Donkey Kong Country Returns, Nintendo
[14] OpenAI, ChatGPT, 2025. [Online]. Available: https://chat.openai.com. ChatGPT was used for spelling and grammar checking only; it was used solely for textual improvements without altering the original meaning.