# Getting started

## Setup

The setup uses conda; install it first if necessary.

    # with GPU support:
    conda env create -f environment-gpu.yml
    conda activate flatland-baseline-gpu-env

    # or, without GPU support:
    conda env create -f environment-cpu.yml
    conda activate flatland-baseline-cpu-env


## Training

Let’s train a policy on a 25x25 environment with 5 agents using Ape-X:

    $ python ./train.py -f baselines/apex_tree_obs/apex.yaml


Training starts and prints status updates from time to time:

    Resources requested: 4/8 CPUs, 0/0 GPUs, 0.0/4.49 GiB heap, 0.0/1.56 GiB objects
    Result logdir: /Users/flaurent/ray_results/flatland-sparse-small-tree-fc-apex
    Number of trials: 1 (1 RUNNING)
    +----------------------------+----------+-------+
    | Trial name                 | status   | loc   |
    |----------------------------+----------+-------|
    | APEX_flatland_sparse_00000 | RUNNING  |       |
    +----------------------------+----------+-------+
    ...
    +----------------------------+----------+--------------------+--------+------------------+-------+----------+
    | Trial name                 | status   | loc                |   iter |   total time (s) |    ts |   reward |
    |----------------------------+----------+--------------------+--------+------------------+-------+----------|
    | APEX_flatland_sparse_00000 | RUNNING  | 192.168.1.22:76819 |      2 |          400.018 | 62878 |  -1937.4 |
    +----------------------------+----------+--------------------+--------+------------------+-------+----------+
    ...


Let’s have a look at baselines/apex_tree_obs/apex.yaml, the experiment configuration file we have used:

    flatland-sparse-small-tree-fc-apex:
        run: APEX
        env: flatland_sparse
        stop:
            timesteps_total: 5000000 # 5e6
        checkpoint_freq: 10
        checkpoint_at_end: True
        keep_checkpoints_num: 5
        checkpoint_score_attr: episode_reward_mean
        config:
            num_workers: 3
            num_envs_per_worker: 5
            num_gpus: 0

            env_config:
                observation: tree
                observation_config:
                    max_depth: 2
                    shortest_path_max_depth: 30

                generator: sparse_rail_generator
                generator_config: small_v0

                wandb:
                    project: <w&b project name>
                    tags: ["small_v0", "tree_obs", "apex"]

            model:
                fcnet_activation: relu
                fcnet_hiddens: [256, 256]
                vf_share_layers: True

• We train for a timesteps_total of 5 million steps using the APEX method.

• We use 3 workers (num_workers), so 3 cores are used for the workers, plus one more for the learning process (hence the 4/8 CPUs in the status output). We don’t use a GPU (num_gpus: 0).

• We use the flatland_sparse environment, which is the standard one that uses the sparse_rail_generator and sparse_schedule_generator.

• We use a tree observation with a max_depth of 2 and a shortest_path_max_depth of 30.

• The model is a simple fully connected neural network with 2 hidden layers of 256 units each and a ReLU non-linearity.

• Optionally, you can export the training metrics to Weights & Biases, in which case you need to specify your username and a project name.
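
As a rough illustration of the points above, the sketch below mirrors part of the config section as a plain Python dictionary and derives a few figures from it. The dictionary is a hand-written copy and the helper name `summarize` is hypothetical; neither is taken from the repository's code.

```python
# Hand-written mirror of part of the `config` section of apex.yaml (illustrative only).
config = {
    "num_workers": 3,
    "num_envs_per_worker": 5,
    "num_gpus": 0,
    "model": {
        "fcnet_activation": "relu",
        "fcnet_hiddens": [256, 256],
    },
}


def summarize(cfg):
    """Derive a few headline figures from the experiment config (hypothetical helper)."""
    return {
        # Every worker steps several copies of the environment.
        "parallel_envs": cfg["num_workers"] * cfg["num_envs_per_worker"],
        # One entry in fcnet_hiddens per hidden layer.
        "hidden_layers": len(cfg["model"]["fcnet_hiddens"]),
        "activation": cfg["model"]["fcnet_activation"],
    }


print(summarize(config))
# {'parallel_envs': 15, 'hidden_layers': 2, 'activation': 'relu'}
```

With 3 workers and 5 environments per worker, experience is collected from 15 environment copies at a time.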

Let’s look more closely at the environment: we use the generator_config called small_v0. The various generator configs are located in envs/flatland/generator_configs. This specific generator config looks as follows:

    width: 25
    height: 25
    number_of_agents: 5
    max_num_cities: 4
    grid_mode: False
    max_rails_between_cities: 2
    max_rails_in_city: 3
    seed: 0
    regenerate_rail_on_reset: True
    regenerate_schedule_on_reset: True


This is the configuration used for all the baseline benchmarks. As stated before, it consists of a 25x25 environment with 5 agents. Storing the environment generator configurations in such files makes it easier to compare various methods on the same task.
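
One way to read the small_v0 file is as three groups of settings: the grid and agent count, the rail layout, and the reset behaviour. The sketch below makes that grouping explicit. The grouping and the helper `split_generator_config` are assumptions made for illustration, not the repository's loader; in Flatland the layout keys appear to correspond to arguments of sparse_rail_generator, but check that against the Flatland version you have installed.

```python
# Values copied from the small_v0 generator config shown above.
small_v0 = {
    "width": 25,
    "height": 25,
    "number_of_agents": 5,
    "max_num_cities": 4,
    "grid_mode": False,
    "max_rails_between_cities": 2,
    "max_rails_in_city": 3,
    "seed": 0,
    "regenerate_rail_on_reset": True,
    "regenerate_schedule_on_reset": True,
}

# Assumed grouping of the keys (not taken from the repository).
ENV_KEYS = ("width", "height", "number_of_agents")
RAIL_KEYS = ("max_num_cities", "grid_mode", "max_rails_between_cities",
             "max_rails_in_city", "seed")


def split_generator_config(cfg):
    """Split a generator config into grid, rail-layout and reset settings."""
    env_kwargs = {key: cfg[key] for key in ENV_KEYS}
    rail_kwargs = {key: cfg[key] for key in RAIL_KEYS}
    reset_flags = {key: value for key, value in cfg.items()
                   if key.startswith("regenerate_")}
    return env_kwargs, rail_kwargs, reset_flags


env_kwargs, rail_kwargs, reset_flags = split_generator_config(small_v0)
print(env_kwargs)   # {'width': 25, 'height': 25, 'number_of_agents': 5}
print(reset_flags)  # both regenerate_* flags are True
```

Comparing two methods on the same task then amounts to checking that both runs point at the same generator config file.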

## Troubleshooting

### “ray.tune.error.TuneError: Insufficient cluster resources to launch trial”

This error means that you don’t have the hardware resources required to run the training. Adjust the values of num_workers and num_gpus to match your hardware. Note that you need one core per worker, plus an extra core for the learning process.
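
The core-count rule above can be checked before launching a run. The following is a minimal sketch using only the Python standard library; the helper names `fits` and `max_workers_for` are hypothetical, and the check ignores GPUs and memory limits.

```python
import os


def fits(num_workers, cpu_count):
    """True if there is one core per worker plus one for the learning process."""
    return num_workers + 1 <= cpu_count


def max_workers_for(cpu_count):
    """Largest num_workers that still leaves one core for the learning process."""
    return max(cpu_count - 1, 0)


cpus = os.cpu_count() or 1  # os.cpu_count() can return None
print(f"{cpus} cores detected, num_workers can be at most {max_workers_for(cpus)}")
print("num_workers: 3 fits on this machine:", fits(3, cpus))
```

For the configuration used in this guide, num_workers: 3 needs 4 cores, which matches the 4/8 CPUs reported in the status output.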