Environment Configurations

In this challenge, the configuration of all of the evaluation environments is disclosed! The only parameter kept secret is the seed to ensure that the submissions solve the problems in a generale way.

Round 2

In Round 2, your submission has to solve as many environments as possible in 8 hours. The number of environments is such that it is not possible to solve them all in 8 hours (if anyone manages to reach the end, we’ll just generate more 😉).

The environments start very small and have increasingly larger sizes. The evaluation stops when less than 25% of the agents reach their target (averaged over each test of 10 episodes), or after 8h, whichever comes first. Each solved environment awards you points, and the goal is to get as many points as possible.

This means that the challenge is not only to find the best solutions possible, but also to find solutions quickly! This is consistent with the business requirements of railway companies: it’s very important for them to be able to re-route trains as fast as possible when a malfunction occurs.

Each test consists of 10 environments. Each environment in a test has a different malfunction interval (the malfunction interval is the inverse of the malfunction rate):

  • Level_0: no malfunction at all

  • Level_1: malfunction_interval = min_malfunction_interval = 250

  • Level_2: malfunction_interval = 2*min_malfunction_interval = 500

  • Level_9: malfunction_interval = 9*min_malfunction_interval = 2250

For each environment, you get a normalized reward between 0.0 and 1.0 (equal to the normalized reward as defined in Round 1 + 1.0). The final score is the sum of all the normalized rewards. See the Evaluation Metrics page for more details.

All the environment use the following parameters in Round 2:

  • n_envs_run = 10

  • min_malfunction_interval = 250

  • max_rails_in_city = 4

  • malfunction_duration = [20,50]

  • max_rails_between_cities = 2

  • speed_ratios = {1.0: 1.0}

  • grid_mode = False

Remember that the goal is no longer to solve all the tests - this list is infinite! The goal is to solve as many as possible, with the best score possible, within the 8h overall time limit.

The environment parameters are calculated as follow:

  • \(n\_agents_{n+1} = n\_agents_{n}+ceiling(10^{len(n\_agents_{n})}-1)*0.75\)

  • \(n\_cities_{n} = (n\_agents_{n} // 10) + 2\)

  • \(x\_dim_{n} = ceiling(sqrt((2*(ceiling(max\_rails\_in\_city/2) + 3))^2*(1.5*n\_cities_{n})))+7\)

  • \(y\_dim_{n} = x\_dim_{n}\)

You can check out this Google Spreadsheet to calculate the parameters for any environments.

test

n_agents

x_dim

y_dim

n_cities

Test_0

1

25

25

2

Test_1

2

25

25

2

Test_2

3

25

25

2

Test_3

4

25

25

2

Test_4

5

25

25

2

Test_5

6

25

25

2

Test_6

7

25

25

2

Test_7

8

25

25

2

Test_8

9

25

25

2

Test_9

10

29

29

3

Test_10

18

29

29

3

Test_11

26

32

32

4

Test_12

34

35

35

5

Test_13

42

37

37

6

Test_14

50

40

40

7

Test_15

58

40

40

7

Test_16

66

42

42

8

Test_17

74

44

44

9

Test_18

82

46

46

10

Test_19

90

48

48

11

Test_20

98

48

48

11

Test_21

106

50

50

12

Test_22

181

62

62

20

Test_23

256

71

71

27

Test_24

331

80

80

35

Test_25

406

87

87

42

Test_26

481

94

94

50

Test_27

556

100

100

57

Test_28

631

106

106

65

Test_29

706

111

111

72

Test_30

781

117

117

80

Test_31

856

122

122

87

Test_32

931

127

127

95

Test_33

1006

131

131

102

Test_34

1756

170

170

177

Test_35

2506

202

202

252

Test_36

3256

229

229

327

Test_37

4006

253

253

402

Test_38

4756

275

275

477

Test_39

5506

295

295

552

Test_40

6256

314

314

627

Round 1

n_envs_run indicates the number of environments ran for each test. A mean score is calculated for each of the 14 tests. The final score is the mean of these means.

The malfunction interval differs from environment to environment, but it is never smaller than min_malfunction_interval. In each test, some environments have no malfunctions at all.

All the environment use the following parameters in Round 1:

  • malfunction_duration = [20,50]

  • max_rails_between_cities = 2

  • speed_ratios = {1.0: 1.0}

  • grid_mode = False

test

n_agents

x_dim

y_dim

n_cities

max_rails_in_city

min_malfunction_interval

n_envs_run

Test_0

5

25

25

2

3

50

50

Test_1

10

30

30

2

3

100

50

Test_2

20

30

30

3

3

200

50

Test_3

50

20

35

3

3

500

40

Test_4

80

35

20

5

3

800

30

Test_5

80

35

35

5

4

800

30

Test_6

80

40

60

9

4

800

30

Test_7

80

60

40

13

4

800

30

Test_8

80

60

60

17

4

800

20

Test_9

100

80

120

21

4

1000

20

Test_10

100

100

80

25

4

1000

20

Test_11

200

100

100

29

4

2000

10

Test_12

200

150

150

33

4

2000

10

Test_13

400

150

150

37

4

4000

10