Getting started
Setup
The setup uses conda; install it first if necessary.
# with GPU support:
conda env create -f environment-gpu.yml
conda activate flatland-baseline-gpu-env
# or, without GPU support:
conda env create -f environment-cpu.yml
conda activate flatland-baseline-cpu-env
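To check that the environment resolved correctly, you can try importing the main dependencies from the activated environment (an optional sanity check; the import names below assume the standard flatland-rl and ray packages pulled in by the environment files):
# Optional sanity check, run inside the activated conda environment.
import flatland  # provided by the flatland-rl package
import ray

print("ray", ray.__version__, "and flatland-rl imported successfully")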
Training
Let’s train a policy on a 25x25 environment with 5 agents using Ape-X:
$ python ./train.py -f baselines/apex_tree_obs/apex.yaml
The training will start and provide status updates from time to time:
Resources requested: 4/8 CPUs, 0/0 GPUs, 0.0/4.49 GiB heap, 0.0/1.56 GiB objects
Result logdir: /Users/flaurent/ray_results/flatland-sparse-small-tree-fc-apex
Number of trials: 1 (1 RUNNING)
+----------------------------+----------+-------+
| Trial name | status | loc |
|----------------------------+----------+-------|
| APEX_flatland_sparse_00000 | RUNNING | |
+----------------------------+----------+-------+
...
+----------------------------+----------+--------------------+--------+------------------+-------+----------+
| Trial name | status | loc | iter | total time (s) | ts | reward |
|----------------------------+----------+--------------------+--------+------------------+-------+----------|
| APEX_flatland_sparse_00000 | RUNNING | 192.168.1.22:76819 | 2 | 400.018 | 62878 | -1937.4 |
+----------------------------+----------+--------------------+--------+------------------+-------+----------+
...
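Ray Tune also writes a progress.csv (and TensorBoard event files) for each trial under the result logdir shown above. A quick, optional way to inspect the learning curve, assuming pandas is installed in the environment:
import glob
import os
import pandas as pd

# Trial directories live under the result logdir printed at the start of training.
pattern = os.path.expanduser(
    "~/ray_results/flatland-sparse-small-tree-fc-apex/APEX_*/progress.csv")
for csv_path in glob.glob(pattern):
    df = pd.read_csv(csv_path)
    print(df[["training_iteration", "timesteps_total", "episode_reward_mean"]].tail())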
Let’s have a look at baselines/apex_tree_obs/apex.yaml, the experiment configuration file we just used:
flatland-sparse-small-tree-fc-apex:
    run: APEX
    env: flatland_sparse
    stop:
        timesteps_total: 5000000  # 5e6
    checkpoint_freq: 10
    checkpoint_at_end: True
    keep_checkpoints_num: 5
    checkpoint_score_attr: episode_reward_mean
    config:
        num_workers: 3
        num_envs_per_worker: 5
        num_gpus: 0
        env_config:
            observation: tree
            observation_config:
                max_depth: 2
                shortest_path_max_depth: 30
            generator: sparse_rail_generator
            generator_config: small_v0
            wandb:
                project: <w&b project name>
                entity: <w&b username>
                tags: ["small_v0", "tree_obs", "apex"]
        model:
            fcnet_activation: relu
            fcnet_hiddens: [256, 256]
            vf_share_layers: True
- We train for a timesteps_total of 5 million steps using the APEX method.
- We use 3 workers (num_workers), which means 3 cores will be used. We don’t use a GPU (num_gpus: 0).
- We use the flatland_sparse environment, which is the standard one that uses the sparse_rail_generator and the sparse_schedule_generator.
- We use a tree observation with a max_depth of 2 and a shortest_path_max_depth of 30.
- The model is a simple fully connected 2-layer neural network with a relu non-linearity (see the sketch below).
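For intuition only, fcnet_hiddens and fcnet_activation describe a stack of hidden layers along these lines (a hypothetical PyTorch sketch, not RLlib's actual model; the 231-feature input size assumed here is the flattened depth-2 tree observation):
import torch.nn as nn

OBS_SIZE = 231    # assumed size of the flattened depth-2 tree observation
NUM_ACTIONS = 5   # flatland rail actions

model = nn.Sequential(
    nn.Linear(OBS_SIZE, 256), nn.ReLU(),   # fcnet_hiddens[0] with relu
    nn.Linear(256, 256), nn.ReLU(),        # fcnet_hiddens[1] with relu
    nn.Linear(256, NUM_ACTIONS),           # output head added by RLlib
)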
Optionally, you can export the training metrics to Weights & Biases, in which case you need to specify your username and a project name.
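For context, train.py essentially loads this file and hands it to Ray Tune. A simplified, hypothetical sketch (the real script also registers the custom flatland environments and models, and sets up the optional Weights & Biases logger):
import yaml
import ray
from ray import tune

# Simplified sketch; not a drop-in replacement for ./train.py.
with open("baselines/apex_tree_obs/apex.yaml") as f:
    experiments = yaml.safe_load(f)

ray.init()
# Tune reads the run/env/stop/config and checkpoint_* keys from each experiment spec.
tune.run_experiments(experiments)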
Let’s look more closely at the environment we use: the generator_config is called small_v0. The various generator configs are located in envs/flatland/generator_configs. This specific generator config looks as follows:
width: 25
height: 25
number_of_agents: 5
max_num_cities: 4
grid_mode: False
max_rails_between_cities: 2
max_rails_in_city: 3
seed: 0
regenerate_rail_on_reset: True
regenerate_schedule_on_reset: True
This is the configuration used for all the baseline benchmarks. As stated before, it consists of a 25x25 environment with 5 agents. By storing the environment generator configurations in such files, we make it easier to compare various methods on the same task.
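For reference, here is a rough stand-alone sketch of the environment these parameters describe, built directly with flatland-rl (the baselines construct it through the flatland_sparse wrapper instead, so treat the exact calls as illustrative):
from flatland.envs.observations import TreeObsForRailEnv
from flatland.envs.predictions import ShortestPathPredictorForRailEnv
from flatland.envs.rail_env import RailEnv
from flatland.envs.rail_generators import sparse_rail_generator
from flatland.envs.schedule_generators import sparse_schedule_generator

# Illustrative only: a 25x25, 5-agent environment using the small_v0 parameters.
env = RailEnv(
    width=25,
    height=25,
    number_of_agents=5,
    rail_generator=sparse_rail_generator(
        max_num_cities=4,
        grid_mode=False,
        max_rails_between_cities=2,
        max_rails_in_city=3,
        seed=0,
    ),
    schedule_generator=sparse_schedule_generator(),
    obs_builder_object=TreeObsForRailEnv(
        max_depth=2,
        predictor=ShortestPathPredictorForRailEnv(max_depth=30),
    ),
)
# regenerate_rail_on_reset / regenerate_schedule_on_reset mean a fresh rail
# and schedule are generated on every reset.
env.reset(regenerate_rail=True, regenerate_schedule=True)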
Troubleshooting
“ray.tune.error.TuneError: Insufficient cluster resources to launch trial”
This error means that you don’t have the hardware resources required to run the training. Adjust the values of num_workers and num_gpus to match your hardware. Note that you will need one core per worker, plus an extra core for the learning process.
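As a rule of thumb, you can derive a safe num_workers value from your machine's core count (a hypothetical helper, not part of the repo):
import multiprocessing

# One core per rollout worker, plus one spare core for the learner/driver process.
num_workers = max(1, multiprocessing.cpu_count() - 1)
print("suggested num_workers:", num_workers)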