In Flatland, you have full control over the observations that your agents will work with. Three observations are provided as a starting point. However, you are encouraged to implement your own.
The three provided observations are:
Global, local and tree: A visual summary of the three provided observations.
The global observation is the simplest one. In this case, every agent is provided a global view of the full Flatland environment. This can be compared to the full, raw-pixel data used in Atari games. The size of the observation space is h × w × c, where h is the height of the environment, w is the width of the environment and c is the number of channels of the environment. These channels can be modified by the participants, but in the initial configuration we include the following h × w channels:
Transition maps: provide a unique value for each type of transition map and its orientation. Their dimensions are h × w × 16, assuming a 16-bit encoding of transitions. Transition maps represent the movements allowed on a cell.
Agent states: A 3D array h × w × 5 containing:
Channel 0: one-hot representation of the self agent position and direction
Channel 1: other agents’ positions and direction
Channel 2: self and other agents’ malfunctions
Channel 3: self and other agents’ fractional speeds
Channel 4: number of other agents ready to depart from that position
Agent targets: A 3D array h × w × 2 containing, respectively, the position of the current agent's target and the positions of the other agents’ targets. The positions of the other agents’ targets are marked with a simple 1 flag, so this observation doesn’t indicate where each other agent is heading.
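The channel layout above can be sketched in a few lines of plain Python. This is an illustration only, not Flatland's implementation: the function names are hypothetical, and the bit order (most significant bit mapped to channel 0) is an assumption.

```python
def transition_to_channels(transition):
    """Unpack a 16-bit transition value into 16 binary channels.

    Assumption: bit 15 (most significant) maps to channel 0.
    """
    if not 0 <= transition < 2 ** 16:
        raise ValueError("transition must fit in 16 bits")
    return [(transition >> (15 - i)) & 1 for i in range(16)]


def global_observation_shapes(height, width):
    """Shapes of the three default global-observation arrays."""
    return {
        "transition_map": (height, width, 16),
        "agent_states": (height, width, 5),
        "agent_targets": (height, width, 2),
    }


shapes = global_observation_shapes(height=30, width=40)
# Total number of channels per cell: 16 + 5 + 2
total_channels = sum(shape[2] for shape in shapes.values())

example_transition = 0b1000000000100000  # example value with two bits set
bits = transition_to_channels(example_transition)
```

With this layout, each cell carries 23 values in total, and the example transition sets exactly two of the 16 binary channels.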
This observation space is well suited for single-agent navigation but does not provide enough information to solve the multi-agent navigation task, thus participants must improve on this observation space to solve the challenge.
The global observation is defined in flatland.envs.observations.GlobalObsForRailEnv
Local grid observation
The local grid observation has shown limited experimental results and is considered deprecated. We keep it for historical purposes, and because it may be useful when combined with other observations. Be aware that its implementation is not currently supported.
The local grid observation is very similar to the global observation, where we only replace h and w by agent-specific dimensions. The agent is always situated at the position (0, (w + 1)/2) within the observation grid, and the observation grid is rotated according to the agent’s direction such that the full height h of the observation grid is in front of the agent.
The initial local grid view provides the same channels as the initial global view introduced above. This observation space offers benefits over the global view, mostly by reducing the amount of irrelevant information in the observation space. Global navigation with this local observation would be impossible if no general information about the target location were given (especially when the target is outside of view). We therefore compute a distance map for every agent-target and provide this distance map as an additional channel:
Distance map: This additional channel in the local observation grid allows for a sense of direction without the need of a global view.
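To make the idea of a distance map concrete, here is a minimal sketch using breadth-first search on a toy grid. It is a simplification: it treats every rail cell as connected to its four neighbours and ignores the agent's direction and the per-cell transition rules. The grid encoding (`#` for rail, `.` for empty) and the function name are assumptions made for this example.

```python
from collections import deque


def distance_map(rail, target):
    """Breadth-first distances (in cells) from every rail cell to `target`.

    `rail` is a list of strings where '#' marks a rail cell and '.' is empty.
    Empty and unreachable cells keep the value float('inf').
    """
    height, width = len(rail), len(rail[0])
    dist = [[float("inf")] * width for _ in range(height)]
    row, col = target
    dist[row][col] = 0
    queue = deque([target])
    while queue:
        r, c = queue.popleft()
        for dr, dc in ((-1, 0), (0, 1), (1, 0), (0, -1)):
            nr, nc = r + dr, c + dc
            inside = 0 <= nr < height and 0 <= nc < width
            if inside and rail[nr][nc] == "#" and dist[nr][nc] == float("inf"):
                dist[nr][nc] = dist[r][c] + 1
                queue.append((nr, nc))
    return dist


rail = [
    "###.",
    "#.#.",
    "####",
]
dmap = distance_map(rail, target=(2, 3))
```

An agent that sees only part of `dmap` in its local window can still move towards the target by preferring neighbouring cells with a smaller value.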
An abstract visualization of the local field of view of an agent. The green boxes represent visible cells in the agent’s field of view. This field of view is turned as the agent’s direction changes.
The local grid observation is defined in flatland.envs.observations.LocalObsForRailEnv
The tree observation exploits the fact that a railway network is a graph, so the observation is only built along allowed transitions in the graph. The observation is generated by spanning a 4-branched tree from the current position of the agent. Each branch follows the allowed transitions (a backward branch is only allowed at dead-ends) until a cell with multiple allowed transitions is reached. Here the information gathered along the branch is stored as a node in the tree.
Here is a small example of a railway network with an agent in the top left corner. The tree observation is built by following the allowed transitions for that agent:
As we move along the allowed transitions we build up a tree where a new node is created at every cell where the agent has different possibilities (a switch), at a dead-end, or where the target is reached.
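The walk along a single branch can be sketched as follows. This is a simplified illustration, not Flatland's code: the rail is modelled as a plain dictionary of successor cells (`successors`), the agent's direction is ignored, and the function name and return format are assumptions.

```python
def follow_branch(successors, first_cell, target):
    """Walk along a branch until a switch, a dead-end or the target is reached.

    `successors` maps each cell to the list of cells reachable from it.
    `first_cell` is the first cell of the branch, one cell away from the agent.
    Returns (cell, distance_in_cells, reason).
    """
    cell, distance = first_cell, 1
    visited = {first_cell}
    while True:
        if cell == target:
            return cell, distance, "target"
        options = successors.get(cell, [])
        if not options:
            return cell, distance, "dead-end"
        if len(options) > 1:
            return cell, distance, "switch"
        cell = options[0]
        distance += 1
        if cell in visited:
            # Guard against circular track so the walk always terminates.
            return cell, distance, "loop"
        visited.add(cell)


# Toy network: A -> B -> C, where C is a switch leading to D (dead-end) or E -> F.
successors = {"A": ["B"], "B": ["C"], "C": ["D", "E"], "D": [], "E": ["F"], "F": []}
to_switch = follow_branch(successors, "B", target="F")
to_dead_end = follow_branch(successors, "D", target="F")
to_target = follow_branch(successors, "E", target="F")
```

Each returned cell corresponds to one node of the tree, and the walk is repeated from that node for each of its outgoing branches.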
It is important to note that the tree observation is always built according to the orientation of the agent at a given node. This means that each node always has 4 branches coming from it in the directions Left, Forward, Right and Backward. These are illustrated with different colors in the figure below. The tree is built from the example rail above. Nodes where there are no possibilities are filled with -inf and are not all shown here for simplicity. The tree, however, always has the same number of nodes for a given tree depth.
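Because every node has exactly 4 branches, the number of nodes depends only on the depth. A short sketch of the count, assuming the root is level 0 and a tree of depth d has d levels of branching below it (the function name is hypothetical):

```python
def tree_node_count(max_depth, branches=4):
    """Number of nodes in a full tree: the root plus `branches` children per node."""
    return sum(branches ** level for level in range(max_depth + 1))


counts = {depth: tree_node_count(depth) for depth in range(4)}
```

Under these assumptions a tree of depth 2 has 1 + 4 + 16 = 21 nodes, whether or not the corresponding rail branches exist.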
Each node is filled with information gathered along the path to the node. Currently each node contains 12 features:
Channel 0: if own target lies on the explored branch the current distance from the agent in number of cells is stored.
Channel 1: if another agent’s target is detected the distance in number of cells from current agent position is stored.
Channel 2: if another agent is detected the distance in number of cells from current agent position is stored.
Channel 3: possible conflict detected; this relies on a predictor that we will introduce afterward
tot_dist = another agent is predicted to pass along this cell at the same time as the agent; we store the distance in number of cells from the current agent position
0 = no other agent reserves the same cell at a similar time
Channel 4: if an unusable switch (for the agent) is detected we store the distance. An unusable switch is a switch where the agent does not have any choice of path (i.e. the agent is blocked), but other agents coming from different directions might.
Channel 5: This feature stores the distance (in number of cells) to the next node (e.g. switch or target or dead-end)
Channel 6: minimum remaining travel distance from node to the agent’s target given the direction of the agent if this path is chosen
Channel 7: number of agents going in the same direction found on path to node
Channel 8: number of agents going in the opposite direction found on path to node
Channel 9: malfunctioning/blocking agents, returns the number of time steps the observed agent will remain blocked
Channel 10: slowest observed speed of an agent in same direction
1 if no agent is observed
min_fractional speed otherwise
Channel 11: number of agents ready to depart but not yet active
Missing values are handled as follows:
Missing/padding nodes are filled in with -inf
Missing values in present node are filled in with +inf
In case of the root node, the values are [0, 0, 0, 0, distance from agent to target, own malfunction, own speed]
In case the target node is reached, the values are [0, 0, 0, 0, 0].
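One way to turn such a tree into a fixed-length vector is to flatten it depth-first and pad every missing node, together with its whole subtree, with -inf. The sketch below illustrates that idea under stated assumptions: `TreeNode`, its fields, the branch keys and `flatten` are hypothetical names for this example and are not taken from the Flatland code base.

```python
from typing import List, NamedTuple

NUM_FEATURES = 12
BRANCHES = ("L", "F", "R", "B")  # left, forward, right, backward


class TreeNode(NamedTuple):
    features: List[float]  # 12 values, in channel order 0..11
    children: dict  # maps "L"/"F"/"R"/"B" to a TreeNode; absent keys are missing


def flatten(node, depth):
    """Flatten the tree depth-first into a fixed-length list.

    Missing nodes, and everything below them, are padded with -inf.
    """
    if node is None:
        values = [float("-inf")] * NUM_FEATURES
        children = {}
    else:
        values = list(node.features)
        children = node.children
    if depth > 0:
        for branch in BRANCHES:
            values += flatten(children.get(branch), depth - 1)
    return values


# Toy tree: a root whose only existing branch is the forward one.
forward = TreeNode(features=[1.0] * NUM_FEATURES, children={})
root = TreeNode(features=[0.0] * NUM_FEATURES, children={"F": forward})
flat = flatten(root, depth=1)
```

The result always has the same length for a given depth (here 5 nodes × 12 features), which is what a fixed-size policy input needs.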
The tree observation is defined in flatland.envs.observations.TreeObsForRailEnv
Notice how channel 3 indicates if a possible conflict is detected. In order to predict conflicts, the tree observation relies on a predictor, which anticipates where agents will be in the future. We provide a stock predictor that assumes each agent just travels along its shortest path. We will talk in more detail about predictors when introducing custom observations.
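As a rough illustration of how predictions can be turned into a conflict signal, the sketch below compares predicted paths step by step. It is a simplification and not the stock predictor's API: it assumes each agent moves one cell per time step, it only checks for the same cell at exactly the same step, and the function name and path format are made up for this example.

```python
def first_conflict(own_path, other_paths):
    """Return the first step along `own_path` where a conflict is predicted.

    Each path lists the predicted cell of an agent at time steps 0, 1, 2, ...
    A conflict is another agent predicted in the same cell at the same step.
    With one cell per step, the step index equals the distance in cells.
    Returns 0 when no conflict is predicted.
    """
    for step, cell in enumerate(own_path):
        if step == 0:
            continue  # step 0 is the agent's current cell
        for path in other_paths:
            if step < len(path) and path[step] == cell:
                return step
    return 0


own = [(0, 0), (0, 1), (0, 2), (0, 3)]
oncoming = [(0, 4), (0, 3), (0, 2), (0, 1)]  # heading towards the agent
parallel = [(1, 0), (1, 1), (1, 2)]  # on a separate track

conflict_distance = first_conflict(own, [oncoming])
no_conflict = first_conflict(own, [parallel])
```

Here the oncoming agent is predicted to reach cell (0, 2) at the same step as the observing agent, so the stored distance is 2, while the parallel agent never triggers a conflict.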