I am interested in learning mechanisms. Most of my research focuses on learning algorithms (Artificial Intelligence and Machine Learning), but I have also done some research on the cognitive processes of learning (Cognitive Psychology and Cognitive Modeling).

Feel free to contact me if you have questions or comments about my research.

Artificial Intelligence / Machine Learning

The Independent Gamma Ensemble - A Modular Discounting Framework for Reinforcement Learning 


Reinforcement learning (RL) studies algorithms that learn to take optimal actions during multi-step decision making. The goal is to learn actions that lead to the maximum reward over time. For example, when starting a new job we need to learn the way to a good restaurant for lunch. At each street junction (our state) we need to decide which street to follow (our action). On reaching a restaurant we get a reward that signals its food quality. Our goal is to learn the shortest way to a restaurant with a high reward.

Many RL methods learn, for each state-action pair, a value that encodes the discounted sum of future rewards that can be gained by taking this action and following the optimal actions afterward. Future rewards are discounted exponentially over time: the further away a reward is, the more strongly it is discounted. A discount factor parameter (gamma) controls the strength of the discounting. If there are several restaurants to choose from, weak discounting prefers a restaurant with a high reward even if it is far away. Strong discounting prefers a nearby restaurant even if it gives a smaller reward. Depending on our choice of the discount factor, the RL algorithm will learn the way to a different restaurant.
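The effect of the discount factor can be illustrated with a minimal sketch. It assumes a single reward at the end of the path that is discounted by gamma once per step; the restaurant names, rewards and step counts are made up for illustration.

```python
def discounted_value(reward, steps, gamma):
    """Value of a single reward received after `steps` steps, discounted by gamma per step."""
    return (gamma ** steps) * reward

# Hypothetical restaurants: name -> (reward, steps needed to reach it)
restaurants = {"nearby": (5.0, 3), "far": (10.0, 12)}

def preferred_restaurant(gamma):
    """Return the restaurant with the highest discounted value for this gamma."""
    return max(restaurants, key=lambda name: discounted_value(*restaurants[name], gamma))

print(preferred_restaurant(0.99))  # weak discounting: prints "far"
print(preferred_restaurant(0.80))  # strong discounting: prints "nearby"
```

With gamma = 0.99 the far restaurant keeps most of its reward (about 8.9 versus 4.9), while with gamma = 0.80 its value shrinks to about 0.7 and the nearby restaurant (about 2.6) wins.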

Classical algorithms have only one discount factor. In contrast, I propose the Independent Gamma Ensemble (IGE) which consists of several modules, each learning values with a different discount factor. This idea is based on neuroscientific research by Tanaka et al. (2007) which suggests that our brain also learns values for actions with different discount factors in different brain areas.

The IGE has two interesting properties. First, in tasks with several possible goals, such as several restaurants, the IGE learns and represents different behaviors (policies) in parallel. The IGE can learn the way to different restaurants, whereas classical RL methods only learn the way to one restaurant. Having learned several behaviors allows the IGE to switch between them depending on the context. For example, under stress, a nearby restaurant can be chosen by using a module with strong discounting, whereas without stress a module with weak discounting can be used to go to a nice restaurant that is further away.
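The idea of several modules with different discount factors can be sketched on a toy corridor with a restaurant at each end. This is an illustrative sketch, not the IGE implementation: the corridor, the rewards and the function names are assumptions, and deterministic sweeps over all state-action pairs stand in for experience gathered by exploration.

```python
N_STATES = 8                    # corridor positions 0..7
REWARDS = {0: 2.0, 7: 10.0}     # hypothetical restaurants at both ends (terminal states)
ACTIONS = (-1, +1)              # move left or move right
START = 2

def step(state, action):
    """Deterministic move; the reward is given only on arrival at a restaurant."""
    next_state = state + action
    return next_state, REWARDS.get(next_state, 0.0), next_state in REWARDS

def learn_module(gamma, sweeps=50, alpha=1.0):
    """Tabular Q-learning updates for one module with its own discount factor."""
    q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}
    for _ in range(sweeps):
        for s in range(1, N_STATES - 1):
            for a in ACTIONS:
                s2, r, done = step(s, a)
                target = r if done else r + gamma * max(q[(s2, b)] for b in ACTIONS)
                q[(s, a)] += alpha * (target - q[(s, a)])
    return q

def greedy_path(q, state=START):
    """Follow the greedy policy of one module until a restaurant is reached."""
    path = [state]
    while state not in REWARDS:
        state += max(ACTIONS, key=lambda a: q[(state, a)])
        path.append(state)
    return path

# One module per discount factor, each learned independently of the others
ensemble = {gamma: learn_module(gamma) for gamma in (0.5, 0.9)}
for gamma, q in ensemble.items():
    print(gamma, greedy_path(q))
```

In this toy task the module with gamma = 0.5 walks to the nearby restaurant with the small reward, and the module with gamma = 0.9 walks to the distant restaurant with the large reward, so both behaviors are available at the same time.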

The second interesting property is that the IGE has more information about its learned behaviors than classical algorithms. Modules with similar discount factors usually learn the same behavior. When several modules have learned the same behavior, more information about its outcome can be decoded from their values. For tasks where rewards are only given at the end, such as when we arrive at a restaurant, it is possible to decode, for each learned behavior, the reward of the restaurant and the time needed to reach it. Classical RL algorithms cannot do this. This allows the IGE to choose the best behavior depending on its current goal. For example, one day we want to go to the restaurant with the best food, but the next day we have a time constraint of 30 min. On the first day, the IGE can select the behavior that results in the overall highest reward. On the next day, it can switch to the behavior that gives the best reward among those that need at most 30 min. The IGE can do this because it can decode the reward and the time information for each learned behavior.
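One way such a decoding can work is sketched below. It assumes that a behavior yields a single final reward r after n steps, and that a module with discount factor gamma has learned the value gamma^n * r for it. Two modules that follow the same behavior then give two equations with two unknowns. The function name and the numbers are hypothetical, and the actual decoding procedure of the IGE may differ.

```python
import math

def decode_outcome(value_a, gamma_a, value_b, gamma_b):
    """Recover (reward, steps) from the values of two modules that share one behavior.

    Assumes value = gamma ** steps * reward for both modules, so that
    value_a / value_b = (gamma_a / gamma_b) ** steps.
    """
    steps = math.log(value_a / value_b) / math.log(gamma_a / gamma_b)
    reward = value_a / gamma_a ** steps
    return reward, steps

# Hypothetical behavior: reward 10 reached after 5 steps
true_reward, true_steps = 10.0, 5
value_a = 0.9 ** true_steps * true_reward
value_b = 0.8 ** true_steps * true_reward

reward, steps = decode_outcome(value_a, 0.9, value_b, 0.8)
print(round(reward, 6), round(steps, 6))
```

With the reward and the number of steps decoded for every learned behavior, selecting "the best reward within a time limit" becomes a simple filter over the behaviors.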

The IGE is a new framework in RL with interesting features. The goal of my PhD project is to analyze its properties and to find potential application scenarios for it in machine learning.

Adaptation of Optimization Algorithms to Problem Domains by Transfer Learning


Optimization is one of the most important problems in science and engineering. Its goal is to find the best solution for a function. For example, we want to find the parameters of a walking controller for a robot to achieve fast locomotion. Our function takes the parameters of the controller as input and outputs the measured speed of the robot. Common optimization algorithms are designed to work for a large set of functions, but not necessarily to be efficient for specific domains. A domain is a set of functions that share similar features. For example, the walking speed function is different for each humanoid robot, but good parameters often lie in similar regions of the parameter space. Common optimization algorithms do not make use of such information and often need many function evaluations to find a good solution, which can be very time-consuming. I proposed a transfer learning approach that adapts optimization algorithms to specific problem domains to improve their performance.

My approach analyzes solved problems of a domain and learns a solution space model for the domain. The model describes areas in the parameter space of the domain where good solutions are expected. Knowledge of these areas is used to improve the performance of the optimization algorithm on unseen problems of the same domain. Because of its general design, my method can be applied to a wide range of problems and algorithms.
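The general idea can be illustrated with a deliberately simple stand-in for a solution space model. The sketch assumes that the model is a per-dimension Gaussian fitted to the best solutions of already solved problems, and that it is used to draw start points for a new problem. Both the form of the model and all names are assumptions made for illustration, not the model used in the project.

```python
import random
import statistics

def fit_solution_space_model(best_solutions):
    """Per-dimension (mean, standard deviation) of known good solutions."""
    dimensions = list(zip(*best_solutions))
    return [(statistics.mean(d), statistics.stdev(d)) for d in dimensions]

def sample_start_point(model, rng):
    """Draw a start point for an unseen problem from the learned model."""
    return [rng.gauss(mean, std) for mean, std in model]

# Hypothetical best solutions of four solved problems from one 2-D domain
solved = [[1.9, -0.8], [2.1, -1.1], [2.0, -1.0], [2.2, -0.9]]

model = fit_solution_space_model(solved)
rng = random.Random(0)            # fixed seed so the sketch is repeatable
start = sample_start_point(model, rng)
print(model)
print(start)
```

An optimizer that starts from such a point begins its search in a region where good solutions were found before, instead of at an arbitrary location.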

I used it to adapt Powell's Conjugate Direction Set algorithm and the CMA-ES algorithm to artificially generated problem domains with two and six dimensions. My approach successfully learned the solution space model of each domain and improved the optimization performance on unseen problems of the same domain.

Artificial Evolution of a Walking Controller for a Robot Dog


I developed a walking controller for a new quadrupedal robotic platform, the Spring-Dog (spDog). The spDog is a "dog" robot created by the Neural Computations Unit at OIST for their Reinforcement Learning and Artificial Evolution experiments. The walking controller that I developed is a central pattern generator composed of modified Hopf oscillators and has six parameters. The controller allowed the robot to walk in a desired direction.
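For readers unfamiliar with Hopf oscillators, the following minimal sketch integrates a single standard Hopf oscillator with Euler steps. It is not the modified oscillator or the six-parameter controller of the spDog; the parameter names `mu` (squared target amplitude) and `omega` (angular frequency) and all values are assumptions.

```python
import math

def hopf_step(x, y, mu, omega, dt):
    """One Euler step of the standard Hopf oscillator in Cartesian form."""
    r2 = x * x + y * y
    dx = (mu - r2) * x - omega * y
    dy = (mu - r2) * y + omega * x
    return x + dt * dx, y + dt * dy

def simulate(mu=1.0, omega=2.0 * math.pi, dt=0.001, steps=10000, x=0.1, y=0.0):
    """Integrate the oscillator for steps * dt seconds and return the final state."""
    for _ in range(steps):
        x, y = hopf_step(x, y, mu, omega, dt)
    return x, y

x, y = simulate()
amplitude = math.hypot(x, y)
print(amplitude)  # settles close to sqrt(mu), up to a small Euler integration error
```

The oscillator converges to a stable rhythmic signal from almost any start state, which is what makes this kind of oscillator attractive as a building block for central pattern generators.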

Afterward, I used the Evolution Strategy algorithm to optimize the parameters of the controller and improve its walking speed. Moreover, I developed a simulation model of the robot for the NERD simulator. The simulation model makes it possible to run optimizations and evolutionary experiments much faster and with less supervision than on the physical robot.
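A basic Evolution Strategy can be sketched as follows. The sketch assumes a simple elitist (1+lambda) variant with a fixed mutation step size, which may differ from the variant used in the experiments, and it replaces the walking speed measurement with a made-up function of six parameters.

```python
import random

# Hypothetical optimum of the six controller parameters (illustration only)
TARGET = [0.5, 1.0, -0.3, 0.8, 0.2, 0.0]

def walking_speed(params):
    """Made-up stand-in for a speed measurement; highest (0.0) at params == TARGET."""
    return -sum((p - t) ** 2 for p, t in zip(params, TARGET))

def evolve(fitness, parent, generations=200, offspring=10, sigma=0.1, seed=0):
    """(1+lambda) Evolution Strategy with Gaussian mutation of fixed step size sigma."""
    rng = random.Random(seed)
    for _ in range(generations):
        children = [[p + rng.gauss(0.0, sigma) for p in parent]
                    for _ in range(offspring)]
        # Elitist selection: the parent survives unless a child is better
        parent = max(children + [parent], key=fitness)
    return parent

start = [0.0] * 6
best = evolve(walking_speed, start)
print(walking_speed(start), walking_speed(best))
```

On a real robot each call to the fitness function is a walking trial, which is why a fast simulation model reduces the cost of such experiments so much.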

The evolutionary experiments resulted in a stable and fast walking controller that was further used by the unit to implement behaviors for the spDog.

More Information:
  • Video of the spDog robot with a manually configured and an evolved walking controller
  • Video of the simulated spDog robot with a manually configured and an evolved walking controller

Learning the Motor Dynamics of a Humanoid Robot


The evolution of robot controllers, such as a walking controller for a humanoid, is time-consuming and usually needs human supervision. To avoid these drawbacks, the Neurocybernetics group at the University of Osnabrück has developed a physics simulator (NERD) for their artificial evolution experiments. Nonetheless, the motor dynamics of the simulator need to be carefully tuned so that the simulation can successfully reproduce the behavior of the physical robot.

I implemented a system identification procedure to learn the dynamics of robot simulation models. The behavior of the real robot is recorded for several complex movements. It is then compared to the behavior of the simulation model performing the same movements. Optimization methods such as Powell’s Conjugate Direction Set method and the Evolution Strategy are then applied to optimize the parameters of the model so that it matches the behavior of the real robot.
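The structure of such a procedure can be sketched with a toy example. It assumes a first-order servo model with a single `gain` parameter, generates the "recorded" data from the model itself instead of a real robot, and uses a plain grid search in place of Powell's method or the Evolution Strategy. All names and values are hypothetical.

```python
def simulate_servo(gain, targets, angle=0.0):
    """Toy servo model: each step moves the angle a fraction `gain` toward the target."""
    trajectory = []
    for target in targets:
        angle += gain * (target - angle)
        trajectory.append(angle)
    return trajectory

def model_error(gain, targets, recorded):
    """Sum of squared differences between simulated and recorded angles."""
    simulated = simulate_servo(gain, targets)
    return sum((s - r) ** 2 for s, r in zip(simulated, recorded))

# Movement commands and a stand-in for the robot recording (true gain 0.3)
targets = [1.0] * 20 + [-0.5] * 20
recorded = simulate_servo(0.3, targets)

# Grid search over candidate gains; a real setup would use a proper optimizer
candidates = [i / 100 for i in range(1, 100)]
best_gain = min(candidates, key=lambda g: model_error(g, targets, recorded))
print(best_gain)
```

The important part is the error function: it runs the same movements in simulation and scores the mismatch with the recording, so any optimizer can be plugged in to minimize it.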

The procedure and the optimization methods were evaluated on a humanoid robot, for which the parameters of a servomotor model were learned. Powell’s method and the Evolution Strategy were able to identify parameters that yield an acceptable model quality, better than that of a manually tuned parameter set. The identified motor model was further used by the Neurocybernetics group for their experiments.

Cognitive Psychology / Cognitive Modeling

A Critical View on the Model-free / Model-based Decision Learning Theory


Decision making is a cognitive process to select an option (action) from a set of alternatives. Learning becomes important when the outcomes of decisions are unknown and need to be learned. Decision learning describes this process of learning to make decisions.

Reinforcement learning, a field in machine learning, provides a framework to describe the cognitive processes of decision learning. Two theories about decision learning have been developed in the past using the reinforcement learning framework. The model-free/model-based (MF/MB) theory proposes that the general procedure of decision learning is composed of two distinct processes, one using model-free mechanisms and the other using model-based mechanisms. The second theory, the actor-critic theory, proposes the existence of two components to explain model-free mechanisms. The critic learns the values of states, and the actor learns which actions to select to reach states with high values. Current MF/MB models are critic-only models, i.e. actions are selected based only on values and not by a separate actor component. This is in contrast to the actor-critic theory of model-free processes.
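The difference between the two kinds of model can be made concrete with a textbook-style sketch. A critic-only learner derives its action probabilities directly from learned action values, whereas an actor-critic learner keeps separate action preferences (the actor) that are adjusted by the critic's prediction error. The function names and learning rates are assumptions, and the sketch is not one of the MF/MB models discussed here.

```python
import math

def softmax(preferences):
    """Turn preferences into action probabilities."""
    highest = max(preferences)
    exps = [math.exp(p - highest) for p in preferences]
    total = sum(exps)
    return [e / total for e in exps]

def critic_only_policy(q_values, temperature=1.0):
    """Critic-only: action probabilities depend on the action values alone."""
    return softmax([q / temperature for q in q_values])

def actor_critic_update(state_value, preferences, action, reward, next_value,
                        gamma=0.9, alpha_critic=0.1, alpha_actor=0.1):
    """Actor-critic: the critic's TD error updates the state value and the actor."""
    td_error = reward + gamma * next_value - state_value
    new_value = state_value + alpha_critic * td_error      # critic update
    new_preferences = list(preferences)
    new_preferences[action] += alpha_actor * td_error      # actor update
    return new_value, new_preferences

# One learning step: action 1 was taken and led to a reward of 1
value, prefs = actor_critic_update(0.0, [0.0, 0.0], action=1, reward=1.0, next_value=0.0)
print(value, prefs, softmax(prefs))
print(critic_only_policy([0.2, 0.5]))
```

Because the actor's preferences are stored separately from the values, an actor-critic learner can show choice behavior that is not a direct function of the current values, which is the kind of behavior a critic-only model cannot reproduce.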

I created an experimental design for a decision task with the goal of identifying decision learning behavior that cannot be modeled with critic-only methods. I collected preliminary data from human participants and showed that current MF/MB models cannot explain some of their behavior. The results suggest that current MF/MB models need to be adapted, possibly by introducing an actor-critic procedure for their model-free component. Improved models could help to model human decision learning better and improve our understanding of its cognitive processes.

I started this project as part of the preparation for my PhD, but I stopped it after the collection of preliminary data to concentrate on the Independent Gamma Ensemble project.

More Information:
  • Poster that summarizes the project: Reinke, C., Uchibe, E., Doya, K.: A Critical View on the Model based/Model free Learning Theory. The 7th Research Area Meeting, Grant-in-Aid for Scientific Research on Innovative Areas: Elucidation of the Neural Computation for Prediction and Decision Making, Kitakyushu, Japan, June 2014