Research
Most of my research focuses on learning algorithms in Artificial Intelligence and Machine Learning, but I have also done research on the cognitive processes of learning related to Cognitive Psychology and Cognitive Modeling.
Feel free to contact me if you have questions or comments.
Artificial Intelligence / Machine Learning
Transfer and Meta Reinforcement Learning for Social Robotics
(ongoing)
We are investigating Deep Reinforcement Learning (DRL) methods for Social Robotics, such as human-aware navigation, as part of the European H2020 SPRING project (Socially Pertinent Robots in Gerontological Healthcare). DRL uses deep neural networks as function approximators for its value functions and policies to handle such complex tasks. However, DRL requires large amounts of training data, which have to be collected in simulation or on the robot. A further problem for many social robotics tasks is that the definition of the reward function, which DRL optimizes, is not evident. It needs to be designed iteratively: defining an initial guess, learning the optimal behavior for it, and then potentially adapting it because the resulting behavior is not as expected. For example, a reward function for navigation among humans could give a negative reward for colliding with people and a positive reward for reaching the goal position. If the negative reward is too small compared to the positive reward, the robot might still learn a behavior that bumps into people. Seeing this unwanted behavior, we would have to define a stronger negative reward for collisions and train the behavior again. This process may repeat until a good reward function and its associated behavior are found. But training the behavior from scratch at every iteration, especially with DRL, is expensive in terms of data, time, and cost.
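As an illustration of such an iteratively tuned reward function, here is a minimal sketch in Python (the function and its weights are hypothetical, not the reward used in the project):

# Hand-designed navigation reward (hypothetical weights for illustration).
# If collision_penalty is too small relative to goal_reward, the learned
# policy may still bump into people, and the weights have to be adjusted
# and the policy retrained from scratch.
def navigation_reward(collided_with_person, reached_goal,
                      collision_penalty=-1.0, goal_reward=10.0):
    reward = 0.0
    if collided_with_person:
        reward += collision_penalty
    if reached_goal:
        reward += goal_reward
    return reward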
As a solution, we explore Transfer and Meta Learning methods in the context of DRL with the goal of reducing the amount of data needed. These methods reuse knowledge from previously trained reward functions to speed up the learning of a new reward function.
More Information:
Paper about the Software Architecture of the SPRING project and initial results from human-interaction experiments: Alameda-Pineda X., Addlesee A., Hernandez Garcia D., Reinke C., et al.: Socially Pertinent Robots in Gerontological Healthcare. (arXiv: 2404.07560), 2024
Paper about an extension of PEARL (a Meta Learning procedure) and its application to Social Robotics: Ballou A., Alameda-Pineda X., Reinke C.: Variational Meta Reinforcement Learning for Social Robotics. Applied Intelligence, 2023
Paper about a novel Transfer Learning procedure that extends Successor Features allowing transfer between tasks with changing general reward functions (not just linear reward functions as for Successor Features): Reinke C., Alameda-Pineda X.: Successor Feature Representations. Transactions on Machine Learning Research, 2023
Workshop paper about a combination of Successor Features and Neural Episodic Control for Transfer Learning: Emukpere, D., Alameda-Pineda, X., Reinke, C.: Successor Feature Neural Episodic Control. Fifth Workshop on Meta-Learning at NeurIPS, 2021
Automated Discovery of Patterns in Complex Systems
(ongoing)
The understanding of novel systems, for example in physics or chemistry, requires knowledge about their possible behaviors. The goal of this project is to develop AI and ML approaches for the automated discovery of interesting phenomena (patterns or dynamics) that a system can produce. An example is the work by Grizou et al. on identifying chemical compositions of oil droplets to discover their possible movement behaviors on a water film. We develop methods that automatically create experiments to test different compositions in order to find interesting droplet behaviors.
We use intrinsically motivated goal exploration processes (IMGEPs) as exploration methods. They are based on the concept of intrinsic curiosity, i.e. a mechanism behind the development and learning of skills in humans and animals. IMGEPs generate experiments by imagining goals and then trying to achieve them by leveraging their past discoveries. Progressively, they learn which goals are achievable. The goals are defined in a goal space that describes the important features of the system that should be explored. An important part of our research is to learn those features autonomously from raw sensor data of the system, such as video recordings. Deep learning methods such as variational autoencoders are applied to learn the features from the data.
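A minimal, self-contained sketch of the IMGEP loop in Python (the toy system and its two-dimensional outcome space stand in for a real experiment and a learned goal space):

import random

def run_experiment(params):
    # Toy system: maps 3 parameters to a 2-D outcome. In the real setting this
    # would run an experiment and encode its observations (e.g. a video) into a
    # learned goal space, for instance with a variational autoencoder.
    x, y, z = params
    return [x * y, y + z]

def distance(a, b):
    return sum((u - v) ** 2 for u, v in zip(a, b))

history = []  # past discoveries: (parameters, outcome in goal space)

for iteration in range(200):
    if iteration < 20:
        # Bootstrap phase: random parameters.
        params = [random.uniform(-1, 1) for _ in range(3)]
    else:
        # Imagine a goal, reuse the closest past discovery, and perturb it.
        goal = [random.uniform(-1, 1), random.uniform(-2, 2)]
        best_params, _ = min(history, key=lambda h: distance(h[1], goal))
        params = [p + random.gauss(0, 0.1) for p in best_params]
    outcome = run_experiment(params)
    history.append((params, outcome))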
As an initial test, we applied our system to Lenia, a continuous game-of-life cellular automaton, where it discovered a variety of complex self-organized visual patterns.
More Information:
Paper about pattern discovery in Lenia, a cellular automaton: Reinke, C., Etcheverry, M., Oudeyer P.: Intrinsically Motivated Discovery of Diverse Patterns in Self-Organizing Systems, Proceedings of the International Conference on Learning Representations (ICLR), 2020
Recorded presentation at ICLR 2020
Blog post by Mayalen Etcheverry about our project
Adaptive Reinforcement Learning via Modular Discounting
(2015 - 2018)
Reinforcement learning (RL) studies algorithms that learn to take optimal actions in multi-step decision making. The goal is to learn actions that lead to the maximum reward over time. For example, starting at a new job, we need to learn the way to a good restaurant for lunch. At each street junction (our state) we need to decide which street to follow (our action). Reaching a restaurant gives a reward that signals its food quality. Our goal is to learn the shortest way to a restaurant with a high reward.
Many RL methods learn for each state-action pair a value which encodes the discounted sum of future rewards that can be gained by taking this action and following the optimal actions afterward. The future reward is discounted exponentially over time: the further away a reward is, the more strongly it gets discounted. A discount factor parameter (gamma) controls the strength of the discounting. If there are several restaurants to choose from, weak discounting prefers a distant restaurant with a high reward, whereas strong discounting prefers a nearby restaurant that gives a smaller reward. Depending on the choice of the discount factor, the RL algorithm will learn the way to a different restaurant.
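In standard RL notation (not specific to this project), the value of taking action a in state s and then following a policy π is the expected exponentially discounted reward sum:

Q^{\pi}(s,a) = \mathbb{E}_{\pi}\!\left[\, \sum_{t=0}^{\infty} \gamma^{t}\, r_{t+1} \;\middle|\; s_0 = s,\ a_0 = a \right], \qquad 0 \le \gamma < 1

A reward received t steps in the future is weighted by gamma^t, so a small gamma (strong discounting) favors nearby rewards, while a gamma close to 1 (weak discounting) favors larger but more distant rewards.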
Classical algorithms use only one discount factor. In contrast, I propose the Independent Gamma Ensemble (IGE), which consists of several modules, each learning values with a different discount factor. This idea is based on neuroscientific research by Tanaka et al. (2007), which suggests that our brain also learns action values with different discount factors in different brain areas.
The IGE has two interesting properties. First, in tasks with several possible goals, such as several restaurants, the IGE learns and represents different behaviors (policies) in parallel. The IGE can learn the way to several restaurants, whereas classical RL methods only learn the way to one. Having learned several behaviors allows switching between them depending on the context. For example, under stress, a nearby restaurant can be chosen by using a module with strong discounting, whereas without stress a module with weak discounting can be used to go to a nicer restaurant that is further away.
The second interesting property is that the IGE has more information about its learned behaviors than classical algorithms. Modules with similar discount factors usually learn the same behavior, and from several modules that learned the same behavior, more information about its outcome can be decoded. For tasks where a reward is only given at the end, such as when we arrive at a restaurant, it is possible to decode for each learned behavior the reward of the restaurant and the time needed to reach it. Classical RL algorithms cannot do this. This allows the IGE to choose the best behavior depending on its current goal. For example, one day we want to go to the restaurant with the best food, but the next day we have a time constraint of 30 minutes. On the first day, the IGE can select the behavior that results in the overall highest reward; on the next day, it can switch to the behavior that gives the best reward among those that need at most 30 minutes. The IGE can do this because it can decode the reward and the time information for each learned behavior.
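To illustrate the decoding, here is a sketch under the assumption that a single terminal reward r is received after n steps, so that a module's value at the start is V_gamma = gamma^n * r. Two modules i and j with different discount factors that learned the same behavior then allow solving for both n and r:

n = \frac{\ln\left(V_{\gamma_i}/V_{\gamma_j}\right)}{\ln\left(\gamma_i/\gamma_j\right)}, \qquad r = \frac{V_{\gamma_i}}{\gamma_i^{\,n}}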
The IGE is a new framework in RL with interesting features. The goal of my PhD project was to analyze its properties and to find potential application scenarios for it in machine learning.
More Information:
My Ph.D. thesis: Reinke, C.; The Gamma-Ensemble - Adaptive Reinforcement Learning via Modular Discounting, 2018
Short paper about time adaptive reinforcement learning with the IGE: Reinke, C.; Time Adaptive Reinforcement Learning, ICLR 2020 Workshop: Beyond “Tabula Rasa” in Reinforcement Learning (BeTR-RL), 2020
Paper about average reward optimization with the IGE: Reinke, C., Uchibe, E., Doya K.; Average Reward Optimization with Multiple Discounting Reinforcement Learners, Proceedings of the 2017 International Conference on Neural Information Processing (ICONIP), 2017
Short paper about average reward optimization with a similar framework to the IGE: Reinke, C., Uchibe, E., Doya K.; Maximizing the Average Reward in Episodic Reinforcement Learning Tasks, Proceedings of the 2015 International Conference on Intelligent Informatics and Biomedical Sciences (ICIIBMS), pp. 420-421, 2015
Transfer Learning for Optimization Algorithms
(2017)
Optimization is one of the most important problems in science and engineering. Its goal is to find the best solution of a function. For example, we want to find the parameters of a walking controller for a robot that achieve fast locomotion. Our function takes the parameters of the controller as input and outputs the measured walking speed of the robot. Common optimization algorithms are designed to work for a large set of functions, but not necessarily to be efficient for specific domains. A domain is a set of functions which share similar features. For example, the walking-speed functions of different humanoid robots differ, but good parameters often lie in similar regions of the parameter space. Common optimization algorithms do not make use of such information and often need many function evaluations to find a good solution, which can be very time intensive. I proposed a transfer learning approach that adapts optimization algorithms to specific problem domains to improve their performance.
My approach analyzes solved problems of a domain and learns a solution space model for the domain. The model describes areas in the parameter space of the domain where good solutions are expected. Knowledge of these areas is used to improve the performance of the optimization algorithm on unseen problems of the same domain. Because of its general design, my method can be applied to a wide range of problems and algorithms.
I used it to adapt Powell's Conjugate Direction Set algorithm and the CMA-ES algorithm to artificially generated problem domains with two and six dimensions. My approach successfully learned the solution space model of each domain and improved the optimization performance on unseen problems of the same domain.
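An illustrative sketch of the idea in Python (a toy example, not the exact procedure from the thesis): the solution space model is a Gaussian fitted to the optima of previously solved problems, and it seeds the start points when optimizing a new problem of the same domain.

import numpy as np
from scipy.optimize import minimize

# Hypothetical optima found on previously solved 2-D problems of the domain.
previous_optima = np.array([[1.9, -0.4], [2.1, -0.6], [2.0, -0.5], [1.8, -0.55]])
mean = previous_optima.mean(axis=0)
cov = np.cov(previous_optima, rowvar=False)

def new_problem(x):
    # A new (toy) problem from the same domain: optimum near [2.05, -0.45].
    return (x[0] - 2.05) ** 2 + (x[1] + 0.45) ** 2

# Seed the optimizer with start points drawn from the solution space model
# instead of uniform random points.
rng = np.random.default_rng(0)
starts = rng.multivariate_normal(mean, cov, size=5)
best = min((minimize(new_problem, x0, method="Powell") for x0 in starts),
           key=lambda res: res.fun)
print(best.x, best.fun)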
More Information:
My Master thesis explains the project in detail
Short Paper: Reinke, C., Doya K.; Adaptation of Optimization Algorithms to Problem Domains by Transfer Learning, Proceedings of the 2017 International Conference on Intelligent Informatics and Biomedical Sciences, pp. 214-215, 2017
Artificial Evolution of a Walking Controller for a Robot Dog
(2011)
I developed a walking controller for a new quadrupedal robotic platform, the Spring-Dog (spDog). The spDog is a "dog" robot created by the Neural Computation Unit at OIST for its Reinforcement Learning and Artificial Evolution experiments. The walking controller is a central pattern generator composed of modified Hopf oscillators and has six parameters. It allowed the robot to walk in a desired direction.
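As a sketch of the building block of such a controller, here is a standard Hopf oscillator integrated in Python (the modified oscillators used in the project and their coupling across legs differ from this basic form):

import numpy as np

# Standard Hopf oscillator: the limit cycle has amplitude sqrt(mu) and
# angular frequency omega; its output can drive a joint angle of one leg.
mu, omega, dt = 1.0, 2.0 * np.pi, 0.001
x, y = 0.1, 0.0
trajectory = []
for _ in range(5000):
    r2 = x * x + y * y
    dx = (mu - r2) * x - omega * y
    dy = (mu - r2) * y + omega * x
    x, y = x + dt * dx, y + dt * dy
    trajectory.append(x)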
Afterward, I used the Evolution Strategy algorithm to optimize the parameters of the controller to improve its walking speed. Moreover, I developed a simulation model of the robot for the NERD simulator. The simulation model allows optimizations and evolutionary experiments to be performed much faster and with less supervision than on the physical robot.
The evolutionary experiments resulted in a stable and fast walking controller that was further used by the unit to implement behaviors for the spDog.
More Information:
Internal report about the project
Video of the spDog robot with a manually configured and an evolved walking controller
Video of the simulated spDog robot with a manually configured and an evolved walking controller
Learning the Motor Dynamics of a Humanoid Robot
(2010)
The evolution of robot controllers, such as a walking controller for a humanoid, is time intensive and usually needs human supervision. To avoid these drawbacks, the Neurocybernetics group at the University of Osnabrück has developed a physics simulator (NERD) for its artificial evolution experiments. However, the motor dynamics of the simulator need to be carefully tuned so that the simulation can successfully reproduce the behavior of the physical robot.
I implemented a system identification procedure to learn the dynamics of robot simulation models. The behavior of the real robot is recorded for several complex movements and then compared to the simulation model performing the same movements. Optimization methods such as Powell's Conjugate Direction Set method and Evolution Strategy are applied to adjust the parameters of the model so that it matches the behavior of the real robot.
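A self-contained toy sketch of this procedure in Python (a first-order motor model, not the NERD servomotor model): the model parameters are fitted so that its simulated response matches recorded joint angles.

import numpy as np
from scipy.optimize import minimize

def simulate(params, targets, dt=0.01):
    # Toy servomotor model: proportional control with a speed limit.
    gain, max_speed = params
    angle, angles = 0.0, []
    for target in targets:
        velocity = np.clip(gain * (target - angle), -max_speed, max_speed)
        angle += dt * velocity
        angles.append(angle)
    return np.array(angles)

targets = np.sin(np.linspace(0, 4 * np.pi, 400))   # commanded test movement
recorded = simulate([8.0, 3.0], targets)            # stands in for real recordings

def model_error(params):
    # Sum of squared differences between simulation and recorded behavior.
    return np.sum((simulate(params, targets) - recorded) ** 2)

result = minimize(model_error, x0=[1.0, 1.0], method="Powell")
print(result.x)  # identified motor parameters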
The procedure and the optimization methods were evaluated on a humanoid robot, where the parameters of a model for its servomotors were learned. Powell's method and Evolution Strategy identified parameters that yield an acceptable model quality, better than a manually tuned parameter set. The identified motor model was further used by the Neurocybernetics group for its experiments.
More Information:
My Bachelor thesis explains the project in detail
Poster that summarizes the project: Reinke, C.; Automated Identification of Motor Models for a Humanoid Robot, Interdisciplinary College 2010, Günne, Germany, March 2010
Cognitive Psychology / Cognitive Modeling
A Critical View on the Model-free / Model-based Decision Learning Theory
(2014)
Decision making is a cognitive process to select an option (action) from a set of alternatives. Learning becomes important when the outcomes of decisions are unknown and need to be learned. Decision learning describes this process of learning to make decisions. Reinforcement learning, a field in machine learning, provides a framework to describe the cognitive processes of decision learning. Two theories about decision learning have been developed using the reinforcement learning framework. The model-free/model-based (MF/MB) theory proposes that the general procedure of decision learning is composed of two distinct processes, one using model-free mechanisms and the other using model-based mechanisms. The second theory, the actor-critic theory, proposes the existence of two components to explain model-free mechanisms: the critic learns the values of certain states, and the actor learns which actions to select to reach states with high values. Current MF/MB models are critic-only models, i.e. actions are selected based only on values and not by a separate actor component. This is in contrast to the actor-critic theory of model-free processes.
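For illustration, here are the standard textbook forms of the two model-free accounts (not the specific computational models used in this project): a critic-only update, e.g. Q-learning, versus an actor-critic update in which the critic's temporal-difference error delta trains both the state values and the actor's action preferences:

\text{critic-only:}\quad Q(s,a) \leftarrow Q(s,a) + \alpha\left[ r + \gamma \max_{a'} Q(s',a') - Q(s,a) \right]

\text{actor-critic:}\quad \delta = r + \gamma V(s') - V(s), \qquad V(s) \leftarrow V(s) + \alpha\,\delta, \qquad p(s,a) \leftarrow p(s,a) + \beta\,\delta

where p(s,a) is the actor's preference for choosing action a in state s.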
I created an experimental design for a decision task with the goal of identifying decision learning behavior that cannot be modeled with critic-only methods. I collected preliminary data from human participants and showed that current MF/MB models cannot explain some of their behavior. The results suggest that current MF/MB models need to be adapted, possibly by introducing an actor-critic procedure for their model-free component. Improved models could help to model human decision learning better and to facilitate our understanding of its cognitive processes.
I started this project as part of the preparation for my PhD, but stopped it after the collection of preliminary data to concentrate on the Gamma-Ensemble project.
More Information:
Internal report about the project
Poster that summarizes the project: Reinke, C., Uchibe, E.; Doya, K.; A Critical View on the Model based/Model free Learning Theory, The 7th Research Area Meeting Grant-in Aid for Scientific Research on Innovative Areas: Elucidation of the Neural Computation for Prediction and Decision Making, Kitakyushu, Japan, June 2014