There was a recent question on the OR-Tools forum that asked how to minimize a passenger’s in-vehicle travel time. I’ve done this before, but it has been a while, so this post summarizes what I do in similar problems.

## The issue

To start with a minimal example, I copied the program
`vrp_drop_nodes.py`

from the OR-Tools repository, trimming it down to
just two nodes. Everything is the same in my version, except for three
small changes. First, the data only contains the following distance
matrix:

```
data = {}
data["distance_matrix"] = [
# fmt: off
[0, 548, 776],
[548, 0, 684],
[776, 684, 0],
# fmt: on
]
```

Second, the number of vehicles is decreased to 2, to allow the case of each passenger getting served by their own driver. Finally, the disjunction penalty was bumped up a bit:

```
# Allow to drop nodes.
penalty = 100000
for node in range(1, len(data["distance_matrix"])):
routing.AddDisjunction([manager.NodeToIndex(node)], penalty)
```

Running this program exhibits exactly the behavior the original question wanted to avoid:

```
Objective: 2008
Dropped nodes:
Route for vehicle 0:
0 Load(0) -> 1 Load(1) -> 2 Load(2) -> 0 Load(2)
Distance of the route: 2008m
Load of the route: 2
```

There are some things worth staring at with this solution.

First, the order of nodes is irrelevant. The distance matrix is symmetrical, so it makes no difference to the objective if the vehicle travels first to node 1, then to node 2, then back to the depot (the reported solution) or to node 2, then to node 1, then back to the depot. A trip from [0, 2, 1, 0] has a total distance of 2008m, just like a trip from [0, 1, 2, 0].

Furthermore, using one vehicle is a better solution than using two. If two vehicles are used, the total distance is much greater: vehicle 1 would travel [0, 1, 0] (distance 1,096m) and vehicle 2 would travel from [0, 2, 0] (distance 1,552).

However, from the point of view of the passengers on board the vehicles, the best option is to use two vehicles. With two vehicles, the passenger at 1 is only in a vehicle for a distance of 548, and the passenger at 2 is only in a vehicle for a distance of 776.

For passengers, the second best option is to choose the sequence [0, 2, 1, 0] as the preferred path. In this case the passenger at 2 travels 684 + 548 = 1232, and the passenger at 1 travels just 548 in vehicle, for a total of 1780.

If the sequence [0, 1, 2, 0] is chosen, then passenger 1 travels 684 + 776 = 1460, while the passenger at 2 travels 776, for a total of 2236. So the solution chosen by the solver is in fact the worst solution possible from the point of view of the travelers in the vehicle. The question is how to insert this knowledge into the solver.

## Easiest approach: hard limits

The difficulty with telling the solver to focus on the passenger experience is that the main knob that one uses to tell the routing solver what to do is a dimension. A dimension is simply an accumulator at each location. Those accumulators are passed the current node, the prior node, and the current value of the accumulator on the vehicle when it arrives at the node. And that is it.

From the docs,

```
if j == next(i),
cumuls(j) = cumuls(i) + transits(i) + slacks(i) + state_dependent_transits(i)
```

That is, the accumulated value at `j`

is equal to the value at `i`

plus the transit cost from `i`

to `j`

(there is a typo in the docs
there) plus the slack value at `i`

(how much extra was needed at `i`

).

(In order for the solver to be fast, one is not allowed to do anything
with the `state_dependent_transits`

so just ignore that forever.)

Using this, my very first approach to addressing the per-passenger
needs was to set hard constraints. I know the best effort travel
distance from all nodes to the depot. That is just the straight line
distance in the matrix. I also know the value of the dimension in the
vehicle when the passenger is picked up, and the value at the end of
the trip. Thus the distance for a passenger at some node `n`

is
simply the total distance of the vehicle minus the distance traveled
when the passenger is picked up, or

```
elapsed_distance_n = final_distance[vehicle] - node_n_distance
```

My rule was that this final distance was not allowed to be worse than double the straight line travel time, with a maximum extra travel time of 45 minutes. So a 5 minute trip could be 10 minutes, but a 50 minute trip could be at most 95 minutes.

For this example, converting to distances only, my rule is that the total distance cannot be more than twice the straight line distance.

To make this rule happen, first I need to create the distance dimension.

```
# create a distance dimension
routing.AddDimension(
transit_callback_index,
0, # null capacity slack
9999999, # vehicle maximum capacities
True, # start cumul to zero
"Distance",
)
distance_dimension = routing.GetDimensionOrDie("Distance")
```

Next, I used the dimension in a loop over all vehicles and all nodes.

```
# Set maximum distance per traveler
solver = routing.solver()
for vehicle_id in range(data["num_vehicles"]):
end_index = routing.End(vehicle_id)
total_route_distance = distance_dimension.CumulVar(end_index)
for node in range(1, len(data["distance_matrix"])):
index = manager.NodeToIndex(node)
pickup_distance = distance_dimension.CumulVar(index)
travel_distance = total_route_distance - pickup_distance
# constrain travel distance to be at most twice the
# straight line distance
max_allowed = 2 * data["distance_matrix"][node][data["depot"]]
solver.Add(travel_distance <= max_allowed)
```

As before, the objective is still 2008, but now the solver chooses to go to node 2 first:

```
Objective: 2008
Dropped nodes:
Route for vehicle 0:
0 Load(0) -> 2 Load(1) -> 1 Load(2) -> 0 Load(2)
Distance of the route: 2008m
Load of the route: 2
```

So mission accomplished, apparently.

## A subtle mistake in the previous section

There is actually a mistake in the previous section’s constraint. The problem is that the constraint is applied for each vehicle and each node, regardless of whether the vehicle is being used for the node or not.

This is apparent when I try to achieve the “traveler optimal” solution of one vehicle per traveler.

To do that, it would appear that all I have to do is to require that the maximum allowed distance is equal to the straight line distance from the node to the depot, as follows:

```
solver = routing.solver()
for vehicle_id in range(data["num_vehicles"]):
end_index = routing.End(vehicle_id)
total_route_distance = distance_dimension.CumulVar(end_index)
for node in range(1, len(data["distance_matrix"])):
index = manager.NodeToIndex(node)
pickup_distance = distance_dimension.CumulVar(index)
travel_distance = total_route_distance - pickup_distance
# constrain travel distance
max_allowed = int(1 * data["distance_matrix"][node][data["depot"]])
solver.Add(travel_distance <= max_allowed)
```

This results in node 2 being dropped no matter how big the disjunction value is:

```
Objective: 10001096
Dropped nodes: 2
Route for vehicle 0:
0 Load(0) -> 1 Load(1) -> 0 Load(1)
Distance of the route: 1096m
Load of the route: 1
Route for vehicle 1:
0 Load(0) -> 0 Load(0)
Distance of the route: 0m
Load of the route: 0
```

This situation persists even if you bump up the limit from 1 to 1.1, 1.2, even 1.5.

The problem is the constraints are being applied to both vehicles, not just the vehicle that visits the node.

Furthermore, if
the first vehicle visits node 1, and the second vehicle is attempted
to visit node 2, the constraint on the dimension says that final
distance of *both* vehicles must work for node 2. So the first vehicle’s final
value is 1096 (out and back to node 1). That value is used to create
a `travel_distance`

of 1096 - 776 = 320, and then the constraint
requires that this value must be greater than or equal to the value of
the dimension at node 2. Clearly the only way for this to be true is
if the node is skipped.

(Note also that if a node is not visited, the value of the dimension at that node is not well defined. It should be zero ideally, but in reality there is absolutely no guarantee what it is.)

I get around this by checking if the node is visited by the vehicle in question, as follows:

```
vehicle_active = routing.VehicleVar(index) == vehicle_id
```

Here I create a boolean variable `vehicle_active`

. Unlike the
`CumulVar`

at an unvisited node, the `VehicleVar`

is guaranteed to
have a known value. It will be
-1 if the node is not visited, and the vehicle number that is used if
it is visited. So the expression `routing.VehicleVar(index) == vehicle_id`

evaluates to `false`

if the node is not visited, or if the
vehicle in question is not visiting the node. It evaluates to `true`

only if the node is visited by the current vehicle.

This boolean is useful because it evaluates to 0 or 1, and can multiply other variables. With this trick, the constraint can be rewritten as

```
solver.Add(vehicle_active * travel_distance <= max_allowed)
```

In this case, if the vehicle in question does not visit the node, then
the expression `vehicle_active * travel_distance`

is zero. It is
always the case that 0 will be less than or equal to the “max
allowed”, so irrelevant vehicles are no longer constraining nodes they
do not visit.

The full block is below:

```
# Set maximum distance per traveler
solver = routing.solver()
for vehicle_id in range(data["num_vehicles"]):
end_index = routing.End(vehicle_id)
total_route_distance = distance_dimension.CumulVar(end_index)
for node in range(1, len(data["distance_matrix"])):
index = manager.NodeToIndex(node)
pickup_distance = distance_dimension.CumulVar(index)
travel_distance = total_route_distance - pickup_distance
# constrain travel distance
max_allowed = int(1 * data["distance_matrix"][node][data["depot"]])
vehicle_active = routing.VehicleVar(index) == vehicle_id
solver.Add(vehicle_active * travel_distance <= max_allowed)
```

And the results are what I want:

```
Objective: 2648
Dropped nodes:
Route for vehicle 0:
0 Load(0) -> 1 Load(1) -> 0 Load(1)
Distance of the route: 1096m
Load of the route: 1
Route for vehicle 1:
0 Load(0) -> 2 Load(1) -> 0 Load(1)
Distance of the route: 1552m
Load of the route: 1
```

## Use soft constraints instead

While the previous solution works, it also suffers from the fact that it is inflexible. The hard limit of 2 or 1.5 or whatever is fixed before the solver runs. It would be much nicer to say that the goal is to achieve a straight line distance, but the solver is allowed to do worse if more nodes can be served.

To do this, I want to make use of the soft upper bound API calls that are available in the routing solver. However, they are only available for dimension values at nodes. It is not possible to pick some arbitrary quantity, add a soft upper bound, and add it to the objective.

This means that we have to do some more tinkering with the formulation to allow the use of the soft upper bound API.

### Make it a pickup and delivery problem

My change is to convert the problem from a regular vehicle routing problem to a pickup and delivery problem. Each passenger is picked up at their home node, and delivered to their destination node. In the original case, the “destinations” here are all the same…the depot node. But since no node can be repeated in a routing problem, I have to make one dummy destination node for each pickup node.

First up, a small ugly hack to wedge in the dumm delivery nodes. Because I was lazy and did not know whether I would need to lookup the “real” node, or just look up the paired delivery node, I made both.

```
data["distance_matrix"] = [
# fmt: off
[0, 548, 776],
[548, 0, 684],
[776, 684, 0],
# fmt: on
]
# simple hack to create delivery nodes
data["delivery_nodes"] = {}
data["pickup_delivery"] = {}
num_nodes = len(data["distance_matrix"])
first_delivery_node = num_nodes
for node in range(1, len(data["distance_matrix"])):
paired_delivery_node = first_delivery_node + (node - 1)
data["delivery_nodes"][paired_delivery_node] = 0
data["pickup_delivery"][node] = paired_delivery_node
num_nodes += 1
```

There is some fooling around with index numbers and so on to compute the paired delivery node number, as the node count starts at 1 not zero. The delivery nodes track the “real” node that the dummy delivery node references. In this case the target is always the depot, or node 0. The pickup delivery entry in the data dictionary captures each of the pickup and delivery pairs as a key, value pair.

Next for simplicity’s sake, I decided to expand the distance matrix with the dummy delivery nodes. I could also modify the lookup code to translate a dummy node into a real node, but by doing it this way I have less existing code to modify.

```
for from_node in range(num_nodes):
if from_node in data["delivery_nodes"]:
# initialize the new array
data["distance_matrix"].append([])
for to_node in range(num_nodes):
if to_node in data["delivery_nodes"]:
data["distance_matrix"][from_node].append(
data["distance_matrix"][from_node][data["delivery_nodes"][to_node]]
)
if from_node in data["delivery_nodes"]:
data["distance_matrix"][from_node].append(
data["distance_matrix"][data["delivery_nodes"][from_node]][to_node]
)
```

This nested loop is pretty simple, and works under the assumption that
all of the new delivery nodes are in order. If the next node in the
iteration is in the delivery set, then it is new to the distance
matrix, and I have to `append`

the correct value. That value is based
on the travel distance *to* or *from* the depot, whichever is relevant.

With the delivery destinations for each node in hand, the next step is to modify the maximum per-passenger distance constraint. Instead of the buggy use of the final distance value of each vehicle, we can instead use the distance value of the delivery node to compute the total in-vehicle distance experienced by each passenger.

```
solver = routing.solver()
for node, delivery_node in data["pickup_delivery"].items():
index = manager.NodeToIndex(node)
delivery_index = manager.NodeToIndex(delivery_node)
pickup_distance = distance_dimension.CumulVar(index)
deliver_distance = distance_dimension.CumulVar(delivery_index)
travel_distance = deliver_distance - pickup_distance
# constrain travel distance
max_allowed = int(1 * data["distance_matrix"][node][data["depot"]])
solver.Add(travel_distance <= max_allowed)
```

Note that I am still using the hard constraint here. I’ll get to the soft upper bound usage later on.

Because we now have in hand the actual destination node for the passenger, there is no need to repeat the loop for each vehicle. Instead, each maximum distance constraint is applied just once per passenger pickup node. I no longer need to use the boolean hack about whether the vehicle serves the node or not.

Finally, the usual boilerplate code to create the pickup and delivery node pairs must be inserted as well.

```
# [START pickup_delivery_constraint]
for pickup_node, delivery_node in data["pickup_delivery"].items():
pickup_index = manager.NodeToIndex(pickup_node)
delivery_index = manager.NodeToIndex(delivery_node)
routing.AddPickupAndDelivery(pickup_index, delivery_index)
# make sure both are on the same vehicle, or inactive together (veh = -1)
routing.solver().Add(
routing.VehicleVar(pickup_index) == routing.VehicleVar(delivery_index)
)
# precedence constraint: make sure the pickup happens before the delivery
routing.solver().Add(
distance_dimension.CumulVar(pickup_index)
<= distance_dimension.CumulVar(delivery_index)
)
# [END pickup_delivery_constraint]
```

This code is critical to tell the solver that the pickup node and delivery node must be rotated in and out of a route as a pair, not as singletons.

With that, the code is modified and can run fine. The result is below.

```
Dropped nodes:
Route for vehicle 0:
0 Load(0) -> 2 Load(1) -> 4 Load(0) -> 1 Load(1) -> 3 Load(0) -> 0 Load(0)
Distance of the route: 2648m
Load of the route: 0
Route for vehicle 1:
0 Load(0) -> 0 Load(0)
Distance of the route: 0m
Load of the route: 0
Total Distance of all routes: 2648m
```

Note that this run is *still* using the hard constraint. In this
first run, it is:

```
max_allowed = int(1 * data["distance_matrix"][node][data["depot"]])
```

This result is different than the prior run, but it makes sense. Instead of using both vehicles, just one vehicle is used. However each passenger is picked up and delivered directly to the destination, in order to obey the constraint that the travel distance must equal the straight distance to the depot.

If the constraint is changed to twice:

```
max_allowed = int(2 * data["distance_matrix"][node][data["depot"]])
```

then we get two pickups before any deliveries, and a much shorter total route:

```
Route for vehicle 0:
0 Load(0) -> 2 Load(1) -> 1 Load(2) -> 4 Load(1) -> 3 Load(0) -> 0 Load(0)
Distance of the route: 2008m
```

So we’ve confirmed the hard constraint is doing the right thing in the pickup and delivery formulation. However, the whole point of adding pickup and delivery nodes is to try to use the soft upper bound API.

### Use a soft upper bound

The API signature of the soft upper bound call is as follows:

```
class RoutingDimension {
public:
...
void SetCumulVarSoftUpperBound(int64_t index,
int64_t upper_bound,
int64_t coefficient);
```

What I have been doing so far is to set the constraint as a variable,
based on the traveled distance between the pickup node and the delivery
node. This requires using `IntVar`

values of the dimension as the
pickup and delivery nodes.

In this case, the soft upper bound API does not allow calling with an
`IntVar`

as an upper bound. Instead it needs a integer. Rather than
messing around with trying to create some clever proxy, it is easier
just to set the upper bound at 0, or perhaps at the straight-line
distance from the depot to the pickup node. In either case, it is
impossible for the solver to meet this goal, so it will try its best
to minimize every passenger’s total trip.

```
# Set maximum distance per traveler
solver = routing.solver()
for node, delivery_node in data["pickup_delivery"].items():
index = manager.NodeToIndex(node)
delivery_index = manager.NodeToIndex(delivery_node)
# this is an IntVar, as it is variable within the solver run
pickup_distance = distance_dimension.CumulVar(index)
# constrain travel distance using soft upper bound
# old way was to compute difference from pickup to delivery.
max_allowed = int(1 * data["distance_matrix"][node][data["depot"]])
# not relevant to soft upper bound call as pickup_distance is an IntVar
target_deliver_distance = pickup_distance + max_allowed
# Cannot apply a soft upper bound with that IntVar target distance!
# distance_dimension.SetCumulVarSoftUpperBound(delivery_index, target_deliver_distance, 1)
# instead try just setting to 0
distance_dimension.SetCumulVarSoftUpperBound(delivery_index, 0, 1)
# alternately, try using the depot to pickup distance
# distance_dimension.SetCumulVarSoftUpperBound(delivery_index, max_allowed, 1)
```

The result is the following:

```
Objective: 5296
Dropped nodes:
Route for vehicle 0:
0 Load(0) -> 1 Load(1) -> 3 Load(0) -> 0 Load(0)
Distance of the route: 1096m
Load of the route: 0
Route for vehicle 1:
0 Load(0) -> 2 Load(1) -> 4 Load(0) -> 0 Load(0)
Distance of the route: 1552m
Load of the route: 0
Total Distance of all routes: 2648m
```

This time, two vehicles are used, not just one. Why? Because I’ve told the solver to minimize the distance dimension value at every delivery node. If just one vehicle is used, then the distance dimension for the second delivery will be very high, even though the passenger only experiences a direct trip from pickup to delivery. With two vehicles used, the absolute value of the route at the point of the second delivery is lower, so it is the preferred solution.

So that begs the question whether the new goal is really doing the right thing. I think it is not.

## Analysis of the current solution

The soft upper bound call seems to be doing the right thing, but it is
probably not quite right. As pointed out in the code comments and the
discussion above, the ideal solution would be to set a soft upper
bound on the IntVar that results from knowledge of the *pickup* value.
If a passenger is picked up at total route distance of 1000, then that
passenger *does not see* that initial 1000 value that the vehicle took
trying to get to the passenger.

Further, if the soft upper bound is set to 0, then every delivery node at the end of a circuit will have the same value, and therefore the same penalty in the objective function. That implies that the solver can swap pickup order in a circuit and end up with the same result.

I tried this with 12 nodes, pulling more nodes from the original
`vrp_drop_nodes.py`

code. The result is:

```
Objective: 27280
Dropped nodes:
Route for vehicle 0:
0 Load(0) Distance(0) ->
8 Load(8) Distance(308) ->
5 Load(11) Distance(502) ->
9 Load(12) Distance(742) ->
21 Load(11) Distance(936) ->
20 Load(3) Distance(936) ->
17 Load(0) Distance(936) ->
6 Load(6) Distance(1438) ->
2 Load(7) Distance(1712) ->
10 Load(9) Distance(2112) ->
14 Load(8) Distance(2648) ->
18 Load(2) Distance(2648) ->
22 Load(0) Distance(2648) ->
0 Load(0)
Distance of the route: 2648m
Load of the route: 0
Route for vehicle 1:
0 Load(0) Distance(0) ->
12 Load(2) Distance(388) ->
11 Load(3) Distance(502) ->
3 Load(6) Distance(1016) ->
4 Load(12) Distance(1130) ->
1 Load(13) Distance(1324) ->
13 Load(12) Distance(1872) ->
15 Load(9) Distance(1872) ->
23 Load(8) Distance(1872) ->
24 Load(6) Distance(1872) ->
16 Load(0) Distance(1872) ->
7 Load(8) Distance(2066) ->
19 Load(0) Distance(2260) ->
0 Load(0)
Distance of the route: 2260m
Load of the route: 0
Total Distance of all routes: 4908m
```

This exact routing result is obtained for a soft upper bound of 0, or
a soft upper bound of the `max_allowed`

, being the straight line
distance from the depot to the pickup node. However, with the
`max_allowed`

soft upper bound, the objective function is
`21780`

, not `27280`

.

My feeling is that there may be certain problems that can be arranged to “defeat” this approach, resulting in a sub-optimal arrangement of pickups from the point of view of passenger travel (total in-vehicle distance or time). But I’m also pretty confident that in most cases, the result will be good enough.

In practice, the soft upper bound is likely to be based on a pickup time window. That is, passengers or goods are usually scheduled within an allowed time window. These values are integers, and are known before the solver run starts. By using the start of the pickup time window as the soft upper bound of the delivery time, it is more likely that the solver will consistently do the right thing.

If I was deploying this in real life, I would *also* include the hard
constraint on the absolute maximum in-vehicle distance or time allowed
for a trip from pickup to delivery. In many cases, for example for
paratransit services, that maximum value is statutory. In other
cases, it just makes good business sense. It is always comforting to
know that no matter what, the solver will make sure that in-vehicle
trips will never be greater than some known value.