Research Ideas

This page lists publications, blog posts and other resources that could be interesting in the context of the Flatland environment. These ideas are currently unexplored, and anyone is welcome to research them. Likewise, feel free to submit a PR if you have interesting ideas that you are willing to share with the community!

Publications

Strong Generalization and Efficiency in Neural Programs (Li et al.)

In this paper, DeepMind trains a neural program (a program “learned” using machine learning) to solve various tasks: sorting lists, searching in ordered lists, solving the 0/1 knapsack problem… In some settings, the resulting algorithms perform better than the original hand-coded ones.

Could we use this for Flatland? Instead of writing a planning solution ourselves, and instead of training an RL agent that learns to make decisions step by step, could we train a full program that learns to plan?

We study the problem of learning efficient algorithms that strongly generalize in the framework of neural program induction. By carefully designing the input/output interfaces of the neural model and through imitation, we are able to learn models that produce correct results for arbitrary input sizes, achieving strong generalization. Moreover, by using reinforcement learning, we optimize for program efficiency metrics, and discover new algorithms that surpass the teacher used in imitation. With this, our approach can learn to outperform custom-written solutions for a variety of problems, as we tested it on sorting, searching in ordered lists and the NP-complete 0/1 knapsack problem, which sets a notable milestone in the field of Neural Program Induction. As highlights, our learned model can perform sorting perfectly on any input data size we tested on, with O(n log n) complexity, whilst outperforming hand-coded algorithms, including quick sort, in number of operations even for list sizes far beyond those seen during training.
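The first stage of the recipe above, imitation of a teacher, maps naturally onto Flatland: collect (observation, action) traces from any hand-coded planner and train a network to reproduce them, before fine-tuning it with RL for efficiency metrics. Below is a minimal, hypothetical sketch of that behavior-cloning stage; the observation size, network shape, and `train_step` helper are illustrative assumptions, not the paper's architecture.

```python
# Sketch of stage 1 of the Li et al. recipe (imitation), assuming a dataset
# of (observation, action) pairs collected by running a hand-coded Flatland
# planner. Stage 2 (RL fine-tuning for efficiency) would reuse this network
# as the policy. Requires PyTorch.
import torch
import torch.nn as nn

OBS_SIZE = 231   # e.g. a flattened tree observation; depends on your setup
N_ACTIONS = 5    # Flatland actions: nothing / left / forward / right / stop

policy = nn.Sequential(
    nn.Linear(OBS_SIZE, 256), nn.ReLU(),
    nn.Linear(256, 256), nn.ReLU(),
    nn.Linear(256, N_ACTIONS),
)
optimizer = torch.optim.Adam(policy.parameters(), lr=1e-4)
loss_fn = nn.CrossEntropyLoss()

def train_step(observations, expert_actions):
    """One behavior-cloning step: match the teacher planner's decisions."""
    logits = policy(observations)            # (batch, N_ACTIONS)
    loss = loss_fn(logits, expert_actions)   # imitate the planner
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

# Dummy batch standing in for real planner traces:
obs = torch.randn(32, OBS_SIZE)
acts = torch.randint(0, N_ACTIONS, (32,))
print(train_step(obs, acts))
```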

Comparison of path planning methods for a multi-robot team (Hvězda)

This master's thesis compares path planning methods for a multi-robot team. The setting is very similar to the Flatland environment, apart from the junction constraints; as far as we can tell, it assumes uniform agent speeds and no failures. The algorithm assumes that the resource graph is constructed such that resources are of two types: intersection resources with capacity 1, and lane resources with capacity 1 or greater. It also assumes that if multiple agents are present on the same resource, they are all traveling in the same direction and their order does not change, meaning they cannot overtake each other. The idea is that the lanes are not wide enough for two agents to drive in parallel, but long enough for agents to drive behind each other (see the sketch after the abstract below).

This master thesis discusses the topic of multi-agent path planning. For this purpose, several algorithms were selected and described in the first part of this thesis. All algorithms were implemented in C++, and based on experience from working with these algorithms, several modifications and improvements were proposed and implemented. The second part of the thesis elaborates on the results of experiments performed on the basic versions of the algorithms as well as the improvements, and discusses their effect. This part also discusses the potential applications of the algorithms. All algorithms were tested on a robotic warehouse map as well as grid maps from PC games.
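To make the resource model concrete, here is a minimal Python sketch (our illustration, not code from the thesis) of the two resource types: intersections with capacity 1, and lanes with larger capacity where agents queue in one direction and cannot overtake.

```python
# Minimal sketch of the resource model described above: intersections hold
# at most one agent; lanes hold up to `capacity` agents that all travel in
# the same direction and keep their order (FIFO, no overtaking).
from collections import deque
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class Resource:
    name: str
    capacity: int = 1                 # 1 for intersections, >= 1 for lanes
    direction: Optional[int] = None   # fixed while occupied; None when empty
    occupants: deque = field(default_factory=deque)  # FIFO queue of agents

    def can_enter(self, agent_direction: int) -> bool:
        if len(self.occupants) >= self.capacity:
            return False
        # All agents on a resource must travel in the same direction.
        return self.direction is None or self.direction == agent_direction

    def enter(self, agent: str, agent_direction: int) -> None:
        assert self.can_enter(agent_direction)
        self.direction = agent_direction
        self.occupants.append(agent)

    def leave(self) -> str:
        # Only the head of the queue may leave: order never changes.
        agent = self.occupants.popleft()
        if not self.occupants:
            self.direction = None
        return agent

# Example: a lane long enough for two agents to drive behind each other.
lane = Resource("lane_AB", capacity=2)
lane.enter("train_1", agent_direction=0)
assert not lane.can_enter(agent_direction=1)  # opposite direction blocked
lane.enter("train_2", agent_direction=0)      # same direction: queue behind
assert lane.leave() == "train_1"              # FIFO: head leaves first
```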

Chip Placement with Deep Reinforcement Learning (Mirhoseini et al.)

This work from Google AI uses reinforcement learning to place the components of a chip netlist onto a chip canvas. This placement problem could be seen as a planning problem, if you consider the electric connections between components as trains which need to reach their destinations without blocking each other.

In this work, we present a learning-based approach to chip placement, one of the most complex and time-consuming stages of the chip design process. Unlike prior methods, our approach has the ability to learn from past experience and improve over time. In particular, as we train over a greater number of chip blocks, our method becomes better at rapidly generating optimized placements for previously unseen chip blocks. To achieve these results, we pose placement as a Reinforcement Learning (RL) problem and train an agent to place the nodes of a chip netlist onto a chip canvas. To enable our RL policy to generalize to unseen blocks, we ground representation learning in the supervised task of predicting placement quality. By designing a neural architecture that can accurately predict reward across a wide variety of netlists and their placements, we are able to generate rich feature embeddings of the input netlists. We then use this architecture as the encoder of our policy and value networks to enable transfer learning. (…)

Shunting Trains with Deep Reinforcement Learning (Peer et al.)

The Train Unit Shunting Problem (TUSP) is a difficult sequential decision making problem faced by Dutch Railways (NS). Current heuristic solutions under study at NS fall short in accounting for uncertainty during plan execution and do not efficiently support replanning. Furthermore, the resulting plans lack consistency. We approach the TUSP by formulating it as a Markov Decision Process and develop an image-like state space representation that allows us to develop a Deep Reinforcement Learning (DRL) solution. The Deep Q-Network efficiently reduces the state space and develops an on-line strategy for the TUSP capable of dealing with uncertainty and delivering significantly more consistent solutions compared to approaches currently being developed by NS.

  • PDF Link - TUE

  • Peer, E., Menkovski, V., Zhang, Y., & Lee, W-J. (2018). Shunting trains with deep reinforcement learning. In 2018 IEEE International Conference on Systems, Man, and Cybernetics (SMC) (pp. 3063-3068). [8616516] IEEE-SMC. https://doi.org/10.1109/SMC.2018.00520

  • Found by: Adrian
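The image-like state representation is what makes a standard Deep Q-Network applicable in the paper above, and the same trick transfers naturally to Flatland's grid world. Below is a minimal sketch (our illustration under assumed feature planes, not the paper's exact encoding) of stacking binary feature planes into a tensor that a convolutional Q-network can consume directly.

```python
# Sketch of an image-like state for a grid rail environment: one channel
# per feature plane, shaped (C, H, W) for a convolutional Deep Q-Network.
# The grid size and choice of planes are illustrative assumptions.
import numpy as np

HEIGHT, WIDTH = 20, 30
N_CHANNELS = 3  # 0: track layout, 1: agent positions, 2: agent targets

def encode_state(tracks, agent_positions, agent_targets):
    """Stack binary feature planes into a (C, H, W) tensor."""
    state = np.zeros((N_CHANNELS, HEIGHT, WIDTH), dtype=np.float32)
    for (r, c) in tracks:
        state[0, r, c] = 1.0
    for (r, c) in agent_positions:
        state[1, r, c] = 1.0
    for (r, c) in agent_targets:
        state[2, r, c] = 1.0
    return state

state = encode_state(tracks=[(0, 0), (0, 1)],
                     agent_positions=[(0, 0)],
                     agent_targets=[(0, 1)])
print(state.shape)  # (3, 20, 30)
```

A convolutional Q-network can then map such a tensor to Q-values over the discrete actions, which is what allows the DQN to reduce the state space efficiently.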