Operations Research
===


The following sections describe how solve a Flatland problem with mixed-integer programming (MIP), CP-SAT solvers 
problems. 

It assumes that the Flatland problem can be solved by using shortes path walks only. That means for each agents the 
system has only to decide whether it can make one step along agent's shortes path or not. If the problem can be 
solved under this assumption the problem is a simple ordering-problem. No alternative paths are required. 

### Dependencies
Install OR-Tools for Python

The fastest way to get Google OR-Tools is to install the Python binary version. 
If you already have Python 2.7 or 3.5+ (and the Python package manager PIP) installed, you can do so as follows:
````python
python -m pip install --upgrade --user ortools
````

### Getting started
After the installation is complete, you are ready to use the ```ProblemDescription```. Then you need to import a 
problem solver, such as the ```MIPSolver``` or the ```CPSATSolver```. 

```python
from solver.Google_OR_Tools.cp_sat_solver import CPSATSolver
from solver.Google_OR_Tools.mip_solver import MIPSolver
from solver.Google_OR_Tools.problem_description import ProblemDescription
```


Next step is to create or to load from file a RailEnv environment. 
```python
env = RailEnv(...)
```


Once the Flatland RailEnv is available the problem to solve is defined. To solve the Flatland problem you have to 
create a ```ProblemDescription``` object. The ```ProblemDescription``` first requires a solver and the reference to 
the RailEnv environment (env). Then you have to define the path the agents travel along. Next you can pass the 
maximum allowed steps for solving the problem (max_agents_steps_allowed). 
This parameter declares how many steps an agent can use to reach the target. 
The 3rd parameter (calculation_time_limit_milliseconds) is an optional parameter. 
It limits the computation time for solving the MIP problem. The passed time must be in in milliseconds.

```python
``` 

```python
problem = ProblemDescription(solver,
                       env,
                       get_shortest_paths(env.distance_map),
                       max_agents_steps_allowed=(env.height + env.width) * 1.5,
                       calculation_time_limit_milliseconds=120000)
```
                   
FLATlan main simulation loop gets executed as long as the ```ProblemDescription``` has at least one replay step left to execute.      
````python
while problem.replay_not_ended():
                ...
                # execute one step based on the MIPSolver calculated replay_step  
                action_dict = problem.replay_step()
                obs, all_rewards, done, _ = env.step(action_dict)
                ...
````

To check if the problem was solved just call the ``ìs_solved()`` method:
````python
sovler.is_solved()
````

### Example code 
Just execute ```python solve_tests.py``` to see the ``MIPSolver`` in action. If you like to apply the ``MIPSolver`` to 
many pickel files then you could use the ```solve_envs.py``` script. 


### Brief overview of the implement MIP formulation 

The ```ProblemDescription``` assumes that each agents can reach the target by just walking along a fixed path without 
infinitiy blocking any other agents. 

The ```ProblemDescription``` uses the shortest path as the fixed path. See also ``get_shortest_paths`` method. 
````python
from flatland.envs.rail_env_utils import get_shortest_paths
````

The ``get_shortest_paths`` method returns a ordered list of ``WalkingElement``. Each ``WalkingElement`` has the agent's 
current 
position and direction stored. And it further contains the information of the transition form agent's current position 
to the next neighbor cell which is the next step on the shortest path. This information is stored as 
a ``RailEnvNextAction`` element. 

#### The agents must play the Flatland game according to the following rules:

(1) The agents have to be at least one time step in a resource. 

(2) An agent can only leave a cell when he immediately enters into another cell. 

(3) An agent can only leave a cell when the next cell is reachable from current cell: Thus the agent can only change 
a cell by using valid transition. The pre-computed path ensures this conditions, this rule will not be tested with 
MIP constraints.  

(4) Each Flatland cell is a resource that can only be assigned exclusively to one agent at same time.

(5) Agent can not start before first simulation step.  

(6) Agents must be at their destination before the maximal simulation steps are passed.


##### The MIP formulation of the rules can be found in the ``ProblemDescription`` class
Rule 1: ``_implement_agent_resource``
![min_travel_time_rule](./images/min_travel_time_rule.png)
Flatland supports multi-speed such that each agent can have it's own speed (``agent.speed_data['speed']``). The 
```ProblemDescription``` sets the ```minimal travel time=np.ceil(1/speed)```.

Rule 2  and rule 3 ``_implement_agent_path_edge`` by connecting the right resource chain only one equation is required 
to satisfy the rule 2 and 3.
![resource_chain_rule](./images/resource_chain_rule.png)


Rule 4: ``_implement_agents_ordering_selection``
![agent_ordering_rule](./images/agent_ordering_rule.png)
One time step is required because Flatland updates all agents at same time step. 

The variables ``V a < b`` controls the ordering of  ``Agent a`` and   ``Agent b`` at resource ``Rx``.  
If the ``V a < b`` is set to ``1`` then ``Agent a`` allocates resource ``Rx``  before ``Agent b``. 

The variables ``V b < a`` controls as well the ordering of  ``Agent a`` and   ``Agent b`` at resource ``Rx``.  
If the ``V b < a`` is set to ``1`` then ``Agent b`` allocates resource ``Rx``  before ``Agent a``. 

As the sum of ``V a < b`` and ``V b < a`` must be equal to ``1`` and as both variable only can hold ``0`` or ``1`` so 
only ``V  a < b`` or ``V b < a`` can be set to ``1``. 
This result in that only one of the first two equation will be used. So variables ``V...`` is a switching variable. 
``V...`` can enable/disable one equation. The constant valued variable ``S`` is used to make one equation 
fully satisfied without taking in account the other terms.  

Special case: If agents don't have speed = 1.0 which means the minimal travel time for one resource is longer than one 
simulation step. Otherwise one agent tries to enter the conflicting resource to early and this leads into a synchronisation 
issue where the solution is no longer a Flatland solution.    

Rule 5: ``_implement_agent_earliest_start_time`` 
![first_resource.png](./images/first_resource.png)

Rule 6: ``_implement_agent_latest_arrival_time``
![target_resource](./images/target_resource.png)

##### Special case 
If multiple agents start at the same position. The ```ProblemDescription``` assume that the agent have to be ordered along 
the agents handle (index). Have a look into ``_implement_agents_force_index_sort``.

### More information 
Example of a mixed-integer programming (MIP) with Google OR-Tools can be found following the 
[link to the how-to](https://developers.google.com/optimization/mip/integer_opt#declare-the-solver).


### Application example code 

You have to import Flatland and the ```ProblemDescription``` before you can start writting your own application. 
````python
...
from flatland.envs.rail_env import RailEnv
from flatland.envs.rail_generators import complex_rail_generator
from flatland.envs.schedule_generators import complex_schedule_generator
from flatland.utils.rendertools import RenderTool, AgentRenderVariant
from flatland.envs.rail_env_shortest_paths import get_shortest_paths
...
# ----------------------------- Flatland -----------------------------------------------------------
from solver.Google_OR_Tools.cp_sat_solver import CPSATSolver
from solver.Google_OR_Tools.mip_solver import MIPSolver
from solver.Google_OR_Tools.problem_description import ProblemDescription

````

To generate a new RailEnv with multi-speed you can use the following code snipped. The demostration application uses
the complex_rail_generator and the complex_schedule_generator as topology and Flatland problem generator. 
````python
def generate_multi_speed(env):
    max_speed_time_factor = 1
    speeds = [1.0, 0.5, 0.25, 0.125]
    for i_agent in range(env.get_num_agents()):
        speed = speeds[i_agent % 4]
        env.agents[i_agent].speed_data['speed'] = speed

        max_speed_time_factor = max(max_speed_time_factor, np.ceil(1.0 / speed))
    return max_speed_time_factor


def create_flatland_environment(number_of_agents, width, height, seed_value):
    rail_generator = complex_rail_generator(nr_start_goal=number_of_agents, nr_extra=1, min_dist=10,
                                            max_dist=120, seed=seed_value)
    schedule_generator = complex_schedule_generator()

    environment = RailEnv(width=width,
                          height=height,
                          rail_generator=rail_generator,
                          number_of_agents=number_of_agents,
                          schedule_generator=schedule_generator
                          )

    max_speed_time_factor = generate_multi_speed(environment)

    return environment, max_speed_time_factor
````

##### How to bring the ``ProblemDescription``` together with the Flatland environment:  

````python
... 

# load files
env, max_speed_time_factor = create_flatland_environment(number_of_agents, width, height, seed_value=0)

...

solver = MIPSolver() 

problem = ProblemDescription(solver,
                       env,
                       get_shortest_paths(env.distance_map),
                       max_agents_steps_allowed=(env.height + env.width) * 1.5 * max_speed_time_factor,
                       calculation_time_limit_milliseconds=360000)
````

Once you have to MIPSolver and the Flatland environment, which describes in general a Flatland problem you are ready
to replay the calculated solution. 

You might test to solve the problem defined in ```ProblemDescription``` with MIP use the ```MIPSolver```. You can alos
 use the ``CPSATSolver`` to check the satisfiability.  


Now you can just replay the calculated solution: 

````python
while problem.replay_not_ended():
    obs, all_rewards, done, _ = env.step(problem.replay_step())
    ... 
    # you might write out the replay steps or you just render one step 
    ...
````

#### Implement own solver
To implement you own solver you just need to make the ```AbstractSolver``` concrete. You have to implement the 
interface methods.

#### Results: Flatland Challenge - Round 1 results
[solve_envs_challenge_01_baseline.csv](./images/solve_envs_challenge_01_baseline.csv)

| Measure                 | Value      |
| :---------------------- | ---------: |
| Fraction of done-agents |     0.999  |
| Mean Reward             | -1023.928  | 
|  |  | 
| Feasible solution found |     999 |
| No feasible solution    | 1 |
|  |  | 
| Min calculation time   | 2ms  |
| Mean calculation time   | 116s  |
| Median calculation time   | 352s  |

*) Maximal calculation time was set to 3600.

##### Only one scenario can not be solved without alternative paths
The scenario/level ```./Envs/Test_9/Level_92.pkl``` can not be solved without using alternative to shortest path. The 
solver 
answers 
with 
```No solution exists. MPSolverInterface::result_status_ = MPSOLVER_INFEASIBLE``` solution found. 

##### Number of Agents vs. Calculation Time
The scaling of the x-axis (calculation time) is chosen logarithmically. The y-axis shows the number of agents. 
The poor scaling behavior of the current ```MIPSolver``` can be get out of the chart. (The number of agents determine 
the expected calculation time very well by using the formula ```calc_time ~ x ** num_agents```) or in words 
exponential increase. 


![Number of Agents versus Calculation Time](./images/results_MIP_challenge_01.png)


The whole program can be found in ````solve_test.py````

Author: Adrian Egli