Using PostgreSQL and friends for a street sweeping solver project

James Marca, Activimetrics LLC

October 21, 2020

Metadata

Focus

Case study
Practical examples, including
- Spatial processing
- “Recursion”
- Window functions
Some theory

My background

Started programming in 1980: Apple ][ Basic
BS 1989 HMC
Systems engineer
MS 1994 UC Irvine
Consulting firm in Boston 3 years
PhD 2002 UC Irvine

My PostgreSQL usage

Started using PostgreSQL in 2001 or so
- Needed to store GPS points for research
- MySQL geo support was lacking
- PostGIS was good enough

My PostgreSQL usage

Started using PostgreSQL in 2001 or so
- Needed to store GPS points for research
- MySQL geo support was lacking
- PostGIS was good enough
GPS → Perl → PostgreSQL/PostGIS

My go-to geo-enabled datastore

PostgreSQL/PostGIS ⇔ R spatial analysis
PostgreSQL/PostGIS ⇔ mod_perl activity survey website
PostgreSQL/PostGIS ⇔ Java accident risk website
PostgreSQL/PostGIS ⇔ node.js truck activity estimator
PostgreSQL/PostGIS ⇔ … lots of things …

Street sweeping

Given:
- a network of streets
- a fleet of vehicles
Make sure every curb is swept
Edge covering problem
Similar problems are trash collection, snow plowing

What do we need?

Solution elements

Geographic data–exact details of each street ✓
A good solver that can tackle NP-Hard problems ✓

Solution elements

Geographic data–exact details of each street ✓
A good solver that can tackle NP-Hard problems ✓

Solution elements

Geographic data–exact details of each street ✓
A good solver that can tackle NP-Hard problems ✓

Tasks for PostgreSQL

Store the data
Clean the data
Process the data to load into solver
Save the solver output
Process the solver output

PostgreSQL is data glue

The solver

What is it?
What does it do?
What does it need?

Google’s OR Tools

https://developers.google.com/optimization/
Open source operations research platform
Several different, related solvers and algorithms

Google’s OR Tools

https://developers.google.com/optimization/
Open source operations research platform
Several different, related solvers and algorithms
We are using the routing solver to solve a routing problem

Routing problems

Find the best path

Routing problems

Find the best path
for one or more vehicles

Routing problems

Find the best path
for one or more vehicles
to visit every node in a network

Routing problems

Find the best path
for one or more vehicles
to visit every node in a network
but the vehicles have a fixed capacity

Routing problems

Find the best path
for one or more vehicles
to visit every node in a network
but the vehicles have a fixed capacity
and need to head back to the depot to unload

Routing problems

Find the best path
for one or more vehicles
to visit every node in a network
but the vehicles have a fixed capacity
and need to head back to the depot to unload
and make sure that every vehicle has the about the same work
oh and can you try to minimize the number of vehicles that are needed too?
also the drivers have an eight hour shift, or they cost time and a half, so factor that in, as well as their contractual working hours and break times
(15 minutes in the morning and afternoon,
plus one hour lunch between 11 and 2)

Routing problems

OR Tools is designed around Constraints

Based on Constraint Programming
Flexible specification of linear constraints
Adding constraints can sometimes even improve solver runtime

The solver needs:

A list of nodes to visit + attributes
An all-to-all travel matrix
- distance and/or time
A list of vehicles + attributes
A list of vehicle depot locations
Plus details needed for other constraints

The solver needs:

A list of nodes to visit + attributes
An all-to-all travel matrix
- distance and/or time

The data

Where does it come from?
What does it contain?

OpenStreetMap

https://www.openstreetmap.org
Global, free, crowd-sourced geographic data
Quality varies country to country
Mistakes can be fixed
Generally excellent

Nodes, Ways, and Relations

Nodes are points in space
- Intersections
- Locations of interest

Nodes, Ways, and Relations

Nodes are points in space
Ways are lines
- Linear features
- Roads between nodes
- Area boundaries

Nodes, Ways, and Relations

Nodes are points in space
Ways are lines
Relations group nodes and ways
- Collect ways for a city boundary
- Define a highway or a bus route

Tags provide semantic meaning

Each tag describes a geographic attribute of the feature being shown by that specific node, way or relation.

Tags are freeform, “key=value”
No fixed standard, but approved conventions
- highway=residential
- bridge=movable + bridge:movable=drawbridge
- barrier=lift gate
Conventions evolve over time

Example node

Node: 7097377424
Version #1

added service roads.
Edited 9 months ago by pandypr · Changeset #79039579
Location: 33.6756646, -117.9193706
Part of
2 ways

    Harbor Boulevard (401292457)
    759766208

Example way

Way: 759766208
Version #1

added service roads.
Edited 9 months ago by pandypr · Changeset #79039579
Tags
highway     service
Nodes
8 nodes

    7097377424 (part of way Harbor Boulevard (401292457))
    7097377425
    7097377426
    7097377427
    7097377428
    7097377429
    7097377430
    7097377431 (part of way Ponderosa Street (158607554))

Example relation

Relation: Harbor Boulevard (2484073)
Version #147

ref
Edited about 1 month ago by Fluffy89502 · Changeset #90212661
Tags
name    Harbor Boulevard
route   road
type    route
Members
▶ 700 members

Nodes hold location information

Only nodes have latitude, longitude
Ways are made up of connected nodes
Relations collect ways and nodes

How is street sweeping a routing problem?

Routing problem inputs

A list of locations that must be reached
A cost matrix for movement between all location pairs

Street Sweeping as a routing problem

Sweep a street = Visit a location
Use network to determine travel times from street to street
Streets must be swept during a time window
Sweepers “pickup” debris along curbs
Sweepers “unload” their hoppers

Street Sweeping as a routing problem

Sweep a street = Visit a location
Use network to determine travel times from street to street
Streets must be swept during a time window
Sweepers “pickup” debris along curbs
Sweepers “unload” their hoppers
Pickup and delivery problem with time windows (PDPTW)

How to get the locations to visit

Sweep a street = Visit a location

How to get the locations to visit

Sweep a street = Visit a location

How to get the locations to visit

Sweep a street = Visit a location
Transform Network into its LineGraph dual

Strategy to get data into the solver

OpenStreetMap → street network → LineGraph → OR Tools
OpenStreetMap → street network → Travel Times → OR Tools

PostgreSQL and “friends”

PostGIS

https://postgis.net/

a spatial database extender for PostgreSQL. It adds support for geographic objects allowing location queries to be run in SQL

pgRouting

https://pgrouting.org/

extends the PostGIS / PostgreSQL geospatial database to provide geospatial routing functionality

Load OSM data

Quick highlights of grabbing OSM data

Don’t download the whole world; just get a region
Use a download service (like https://download.geofabrik.de/)
Navigate to region of interest to download a PBF file
For example, Southern California URL

https://download.geofabrik.de/north-america/us/california/socal-latest.osm.pbf

Limit data to just a city using its boundary

Use osmium https://osmcode.org/
The current best of breed for manipulating OSM data

osmium extract \
    -p glendale-poly.osm \
    -o glendale-latest.osm \
    socal-latest.osm.pbf

Load the data into pgRouting tables

osm2pgrouting \
     --f data/glendale-latest.osm \
     --conf data/map_config_streets.xml \
     --dbname glendale \
     --prefix 'glendale_' \
     --username dbuser \
     --clean

`osm2pgrouting` config file



  
  
    
    
    
    <tag_value name="trunk"             id="104" priority="1.05" maxspeed="50" />
    <tag_value name="trunk_link"        id="105" priority="1.05" maxspeed="50" />
    <tag_value name="primary"           id="106" priority="1.15" maxspeed="50" />
    <tag_value name="primary_link"      id="107" priority="1.15" maxspeed="50" />
    <tag_value name="secondary"         id="108" priority="1.5" maxspeed="50" />
    <tag_value name="secondary_link"    id="109" priority="1.5" maxspeed="50" />
    <tag_value name="tertiary"          id="110" priority="1.75" maxspeed="50" />
    <tag_value name="tertiary_link"     id="111" priority="1.75" maxspeed="50" />
    <tag_value name="residential"       id="112" priority="2.5" maxspeed="50" />
    <tag_value name="living_street"     id="113" priority="3" maxspeed="50" />
    
    <tag_value name="unclassified"      id="117" priority="3" maxspeed="50"/>
    <tag_value name="road"              id="100" priority="5" maxspeed="50" />

Generated ways table

mydb=> \d glendale_ways
    Table "public.glendale_ways"
      Column       |           Type
-------------------+---------------------------
 gid               | bigint
 osm_id            | bigint
 tag_id            | integer
 length            | double precision
 length_m          | double precision
 name              | text
 source            | bigint
 target            | bigint
 source_osm        | bigint
 target_osm        | bigint
 cost              | double precision
 reverse_cost      | double precision
 cost_s            | double precision
 reverse_cost_s    | double precision
 rule              | text
 one_way           | integer
 oneway            | text
 x1                | double precision
 y1                | double precision
 x2                | double precision
 y2                | double precision
 maxspeed_forward  | double precision
 maxspeed_backward | double precision
 priority          | double precision
 the_geom          | geometry(LineString,4326)
Indexes:
    "glendale_ways_pkey" PRIMARY KEY, btree (gid)
    "glendale_ways_the_geom_idx" gist (the_geom)
Foreign-key constraints:
    "glendale_ways_source_fkey" FOREIGN KEY (source) REFERENCES glendale_ways_vertices_pgr(id)
    "glendale_ways_source_osm_fkey" FOREIGN KEY (source_osm) REFERENCES glendale_ways_vertices_pgr(osm_id)
    "glendale_ways_tag_id_fkey" FOREIGN KEY (tag_id) REFERENCES configuration(tag_id)
    "glendale_ways_target_fkey" FOREIGN KEY (target) REFERENCES glendale_ways_vertices_pgr(id)
    "glendale_ways_target_osm_fkey" FOREIGN KEY (target_osm) REFERENCES glendale_ways_vertices_pgr(osm_id)

Sample data from ways table

 length_m | source | target | cost_s | reverse_cost_s | oneway
----------+--------+--------+--------+----------------+---------
    39.63 |   7194 |   1191 |   2.85 |          -2.85 | YES
    48.42 |  15404 |      1 |   3.49 |           3.49 | UNKNOWN
   124.38 |   1195 |     46 |   8.96 |          -8.96 | YES
   211.32 |   6794 |      2 |  15.21 |         -15.21 | YES
    59.75 |   5898 |      3 |   4.30 |          -4.30 | YES
   267.77 |   7762 |      5 |  19.28 |         -19.28 | YES
   147.70 |  11969 |      6 |  10.63 |          10.63 | UNKNOWN
    11.95 |   9845 |      8 |   0.86 |           0.86 | NO
    61.69 |  11183 |      9 |   4.44 |           4.44 | NO
    63.39 |   9838 |     10 |   4.56 |           4.56 | NO
(10 rows)

39.63m at 50km/hr takes 2.85s ✔

What have we accomplished?

Strategy to get data into the solver

OpenStreetMap → street network → LineGraph → OR Tools
OpenStreetMap → street network → Travel Times → OR Tools

Strategy to get data into the solver

OpenStreetMap → street network → LineGraph → OR Tools
OpenStreetMap → street network → Travel Times → OR Tools

Clean OSM data

(The hardest part)

Too many roadway segments

OSM is designed for many things
Some street segments are extraneous for our purposes
Example: intersections for service roads and driveways
Extra street segments make N larger,
adding complexity to solver for no reason

Objective: Combine segments per block

Goal is one segment per block
Need to examine each segment
- Is it an isolated middle segment?
- Can it be linked to another middle segment?
Solution is to recursively connect a segment with its neighbors

WITH RECURSIVE queries

https://www.postgresql.org/docs/13/queries-with.html
WITH statements AKA Common Table Expressions
- Great just to organize long SQL
WITH RECURSIVE statements
- Allows a WITH query to refer to its own output
- Can repeatedly use PostGIS functions to merge all segments for a block

WITH RECURSIVE refresher

WITH RECURSIVE blahquery(arg1, arg2) AS (
  -- non-recursive term --
  select foo, bar from baz
  UNION -- (or UNION ALL) --
  -- recursive term --
  select ... from blahquery ...

WITH RECURSIVE refresher


WITH RECURSIVE t(n) AS (
    VALUES (1)
  UNION ALL
    SELECT n+1 FROM t WHERE n < 10
)
SELECT sum(n) FROM t;

WITH RECURSIVE refresher


WITH RECURSIVE t(n) AS (
    VALUES (1)
  UNION ALL
    SELECT n+1 FROM t WHERE n < 10
)
SELECT sum(n) FROM t;
 sum
-----
  55
(1 row)

WITH RECURSIVE refresher


WITH RECURSIVE t(n) AS (
    VALUES (1)
  UNION ALL
    SELECT n+1 FROM t WHERE n < 10
)
SELECT n FROM t;
 n
----
  1
  2
  3
  4
  5
  6
  7
  8
  9
 10
(10 rows)

WITH RECURSIVE refresher


WITH RECURSIVE t(n) AS (
    VALUES (1)
  UNION ALL
    SELECT n+1 FROM t WHERE n < 10
)
SELECT sum(n) FROM t;
 sum
-----
  55
(1 row)

Segment data


  id  |         name          | source | target | one_way | cost_s | rev_cost_s
------+-----------------------+--------+--------+---------+--------+------------
  433 | Western Avenue        |      1 |    303 |       0 |   1.30 |       1.30
 4725 | Glenoaks Boulevard    |      1 |   4061 |       1 |   2.58 |      -2.58
  299 | Geneva Street         |      2 |    216 |       0 |  26.01 |      26.01
 1735 | Glenoaks Boulevard    |      2 |   1267 |       0 |   8.40 |       8.40

Counting sources and targets

Characteristics of “interior” segments

Each segment has source and target
Interior segment source is seen once
Interior segment target is seen once
So count all sources, all targets

Counts of source, target


WITH RECURSIVE
sources(source,count) as (
   select source,count(*) as count
   from glendale_ways group by source
   ),
targets(target,count) as (
   select target,count(*) as count
   from glendale_ways group by target
   )

Potential interior segments

Any record with target count=1 and source count=1


possible_interiors as (
   select w.*,s.count as scount, t.count as tcount
   from glendale_ways w
   join targets t on (w.target=t.target)
   join sources s on (w.source=s.source)
   where t.count=1 and s.count=1
   ),

“True” interiors

Possible interiors
AND
source is a target of another possible interior segment
AND
target node is a source of another possible interior segment

Interiors table


interiors as (
   select pi.*
   from possible_interiors pi
   join sources s on (pi.target=s.source)
   join targets t on (pi.source=t.target)
   where s.count=1 and t.count=1
   )

Visual explanation using maps

Sequence starts and ends

“Starts” are segments with
- target node is unique (count of 1)
- target is the start node of an interior segment
- source node is not unique (node is source for lots of segments)
“Ends” are segments with
- source node is unique (count of 1)
- source is the end node of an interior segment
- target is not unique (intersection)

Possible starts and starts


possible_starts as (
   select w.*, s.count as scount, t.count as tcount
   from glendale_ways w
   join targets t on (w.target=t.target)
   join sources s on (w.source=s.source)
   where t.count = 1 -- link is only one touching target
   )
starts as (
   select ps.*
   from possible_starts ps
   where ps.scount > 1
   )

Possible starts (blue), then starts (green)

Start segments initial rule is too narrow

Start segments rule: target node count=1, source node count>1

A source is a target

Some sources are also targets
Flow direction is not uniform


   union
   select ps.*
   from possible_starts ps
   join targets t on (ps.source=t.target)
   where ps.scount=1 and  t.count>1
   -- more than one link target == ps.source

Names change, one path

Look at possible interiors to identify a name change


   union
   select ps.*
   from possible_starts ps
   join possible_interiors pi on (ps.source=pi.target)
   where ps.name != pi.name and ps.scount=1 and pi.tcount=1
   -- name change of road

Singletons

Like name change, but not in possible_interiors set


   union
   select ps.*
   from possible_starts ps
   join possible_starts psi on (ps.source = psi.target
                                and ps.name != psi.name)
   -- singletons

(Data is always messier than one would expect)

The full starts query

possible_starts as (
   select w.*, s.count as scount, t.count as tcount
   from glendale_ways w
   join targets t on (w.target=t.target)
   join sources s on (w.source=s.source)
   where t.count = 1 -- link is only one touching target
   ),
starts as (
   select ps.*
   from possible_starts ps
   where ps.scount >1
  union   -- more than one link target == ps.source
   select ps.*
   from possible_starts ps
   join targets t on (ps.source=t.target)
   where ps.scount =1 and  t.count>1
  union   -- name change of road
   select ps.*
   from possible_starts ps
   join possible_interiors pi on (ps.source=pi.target)
   where ps.name != pi.name and ps.scount=1 and pi.tcount=1
  union   -- singletons
   select ps.*
   from possible_starts ps
   join possible_starts psi on (ps.source=psi.target and ps.name != psi.name)
   )

Ends query is similar

Nothing new, as starts and ends are basically the same

Join segments from end to start

The “recursive” part
Use PostGIS functions

Interlude: spatial basics

A quick overview of important spatial processing basics

I am not a geographer

Latitude and Longitude

The earth is a deformed sphere
Latitude is degrees up or down from equator
- South Pole -90°
- North Pole +90°
Longitude is degrees around the equator
- “Prime Meridian” is -180°, 0°, and 180°
- Greenwich, London, England
- East of Greenwich (Europe, Africa, Asia) is positive
- West of Greenwich (North and South America) is negative

Maps are flat, need projections

All maps have a projection
There are a surprising number of projections
Differences arise from the purpose of the map
- Accurate areas for a single country
- Cloropleths (maps colored to show a statistic like Covid-19 infections)
- Accurate straight line distances
- etc

Which projection to use?

The wikipedia article is good: https://en.wikipedia.org/wiki/Map_projection
Mercator
- Used in tiling map libraries (Google maps, etc) (good zooming)
- Not good for cloropleths
- Used for GPS receivers
Transverse Mercator https://en.wikipedia.org/wiki/Transverse_Mercator_projection
Albers
- Equal area projection
- Used for standard USA maps
- Good for cloropleths

Datums

Formal way to fit the ellipsoid to a local or global area
NAD27: Historical USA maps
NAD83: Recent USA maps; corrects NAD27 distortions
WGS84: Worldwide datum, used by GPS

UTM

Universal Transverse Mercator
Worldwide coordinate system
Slices surface into 60 zones
Each zone can be mapped with low distortion

Important points

If you have GPS coordinates or just (latitude, longitude) pairs, you have WGS84
Always write (latitude, longitude)
Programs and functions tend to flip the order, x then y

Simple Features for SQL

OpenGIS Consortium (OGC) specifications
Standard types and functions

Well Known Text (WKT), Binary (WKB)

POINT
LINESTRING
POLYGON
MULTIPOINT
MULTILINESTRING
MULTIPOLYGON
GEOMETRYCOLLECTION

Well Known Text (WKT), Binary (WKB)

POINT
LINESTRING

PostGIS

PostGIS offers types, functions, and indices for spatial data

GIS objects in PostGIS

See https://postgis.net/docs/manual-3.0/
PostGIS uses a superset of the OpenGIS specification
- “Extended” to include the object’s spatial referencing system identifier (SRID).
- WKT becomes EWKT
- WKB becomes EWKB
- SRID gives the object’s projection and datum
- 8500 SRID entries in stock PostGIS

EWKT

point: SRID=4326;POINT(-118.240413 34.160228)
linestring: SRID=4326;LINESTRING(-118.239602 34.159187, -118.239631 34.159161, -118.239673 34.159119)

Merge street segments

st_asewkt
st_makeline
st_geomfromewkt
st_simplifypreservetopology
st_union

Make curbs network

st_offsetcurve
st_transform
st_reverse

Breakup segments for animation

st_linesubstring
st_linemerge
st_length

Create POV points

st_collect
st_centroid

Use this one weird PostGIS trick …

How to measure geometries?

ST_Length(geom)
geom is probably in WGS84 projection
(Latitude, Longitude) in decimal degrees
Using st_length() on degrees is useless

UTM projection gives meters!

st_length(st_transform(geom,32611))

To get meters, transform geometry
WGS84 is SRID=4326
Transform to UTM projection, SRID=32611
Then st_length() call gives meters!

Which zone? 326??

Find your UTM zone https://en.wikipedia.org/wiki/ Universal_Transverse_Mercator_coordinate_system#UTM_zone
Pick the correct SRID


select srid,proj4text
 from spatial_ref_sys where srid between 32600 and 32661
 order by srid;
 srid  |    proj4text
-------+---------------------------------------------------
 32601 | +proj=utm +zone=1 +datum=WGS84 +units=m +no_defs
 32602 | +proj=utm +zone=2 +datum=WGS84 +units=m +no_defs
 32603 | +proj=utm +zone=3 +datum=WGS84 +units=m +no_defs
 32604 | +proj=utm +zone=4 +datum=WGS84 +units=m +no_defs
 32605 | +proj=utm +zone=5 +datum=WGS84 +units=m +no_defs
 …

And now back to the story

Iteration to merge segments

recursive calls are broken into two steps
the first is an initializing step
the second is the recursive part
the recursive part is a union with the initializing step
the recursion needs to have a well-defined stop

Initialization step


-- with recursive ...
search_graph(gid, length_m, name, source, target, depth,
             path, segments, cycle, ...all_the_things...) as (
   select
    g.gid, g.length_m,
    g.name, g.source, g.target,
    1 as depth,
    array[g.gid] as path,
    st_asewkt(g.the_geom) as segments,
    false as cycle,
    ... -- other stuff
   from ends g
   -- initialize with ends, connect interiors to source

Initialization notes

gid is unique identifier for each segment
path is an array of gid’s
Start the recursion from the end
Push new gid’s to the beginning of the array

Why st_asewkt?


st_asewkt(g.the_geom) as segments

Not free to convert geom to text representation
But union of geoms is pickier
By combining geoms as text, can preserve their type of LineString

Recursive part

  union all
   select
    g.gid, g.length_m + sg.length_m,
    sg.name, g.source, sg.target,
    sg.depth+1 as depth,
    g.gid || sg.path as path,
    st_asewkt( st_makeline( g.the_geom, sg.segments )),
    g.gid = ANY(sg.path) as cycle,
    ... -- other stuff
   from interiors g -- recurse on interiors
   join search_graph sg on
     (g.target=sg.source -- interior target -> chain source
      and g.name=sg.name)-- but same street name too please
   where sg.depth < 100 and not sg.cycle -- stop guards

PostGIS notes


st_asewkt( st_makeline( g.the_geom, sg.segments ))

st_makeline() used to avoid array type error
Makes a new line for each segment
Prepends new line bit to growing line
Whole result is dumped as well known text for next recursive loop

Alternate version


  ARRAY[g.the_geom] as segments
  …
  array_prepend(g.the_geom, sg.segments)
              ::geometry(LineString,4326)[],

Cast fixes recursive error re: mismatched array types
EXPLAIN ANALYZE says they’re the same speed:
- st_asewkt 117s vs ARRAY 119s

Example results


WITH RECURSIVE …
select gid,name,source,target,depth
from search_graph order by depth desc,name;
 gid  |         name          | source | target | depth
------+-----------------------+--------+--------+-------
 6344 | North Louise Street   |   5686 |    234 |    19
 6179 | North Louise Street   |   5685 |    234 |    18
 5311 | Emerald Isle Drive    |   4635 |    149 |    17
 6326 | North Louise Street   |   5520 |    234 |    17
 5309 | Emerald Isle Drive    |   4650 |    149 |    16
 5280 | Flintridge Drive      |   4620 |    147 |    16
 6327 | North Louise Street   |   5667 |    234 |    16
 5310 | Emerald Isle Drive    |   4648 |    149 |    15

Need to pick the longest

The longest segment has depth of 19
Need to choose that one, not the shorter ones
Next part of WITH RECURSIVE statement picks off longest segments

Longest groups


gid_paths as (
   select unnest(sg.path) as node,depth
   from search_graph sg ),

gid_max_depth as (
   select node,max(depth) as depth
   from gid_paths group by node ),

distinct_paths as (
  select distinct path
  from search_graph sg
  join gid_max_depth gm
       on (gm.depth=sg.depth and
           gm.node in (select unnest(sg.path))))

Make one record

Pick longest sequence using distinct_paths
Merge starts to add starting node
Convert text geom back to binary geom

Merged segments

segments as (
  select
    c.name, g.source, c.target, c.depth+1 as depth, g.gid || c.path as path,
    ST_SimplifyPreserveTopology(
      ST_GeomFromEWKT(
        ST_AsEWKT(ST_MakeLine(g.the_geom,
                              c.segments))),
      0.0000001) as the_geom,
      … other_columns …
   from search_graph c
   join distinct_paths dp on (c.path=dp.path)
   join starts g -- add start nodes to chain
        on (g.target=c.source  --start.target == source
            and g.name=c.name)) -- same name please

Book-keeping, and finish up

The remaining SQL just tidies up
Make a new table
- Start with the old table
- Drop the components of merged segments
- Add the new, longer merged segments


grouped as (
   select * from keep_ways
  union
   select * from new_ways
)
insert into new_glendale_ways ( … )
select … from grouped;

Final output of segment-joining work

Some notes

Not all segments are fixed properly
Reduced number of segments by 40%
- for Glendale, California
- went from 7653 links to 4597 links
Huge impact on problem size
Absolutely worth the effort to figure this out

Where do we stand now?

Strategy to get data into the solver

OpenStreetMap → street network → LineGraph → OR Tools
OpenStreetMap → street network → Travel Times → OR Tools

Strategy to get data into the solver

OpenStreetMap cleaned → street network → LineGraph → OR Tools
OpenStreetMap cleaned → street network → Travel Times → OR Tools

Converting streets to curbs

Making a routable, directed network

One-way and two-way streets

OSM data is pretty good about one-way streets
pgRouting can analyze OSM data, establish
forward and backward traversal costs
But mixing one-way and two-way streets is buggy

Convert all streets to curbs

Curbs are all one-way
On two-way streets, curb movements are in opposite directions
On one-way streets, curb movements are in same direction
Easier to reason about moving from curb to curb

Big SQL statement


drop sequence if exists curbgraph_v2_serial;
create sequence curbgraph_v2_serial;

drop table if exists curbs_v2_graph cascade;

with
tform as (
    select id, st_transform(the_geom,32611) as geom,reverse_cost
    from new_glendale_ways),
rhs as (
    select ST_Reverse(ST_Transform (
             ST_OffsetCurve(
                geom,
                -2)  -- 2 meters offset, 6 feet-ish, reverse of orig direction
                ,4326)) as geom, --  back to source geometry, (need reverse)
           id
    from tform),
lhs_twoway as (
    select ST_Reverse(ST_Transform (
             ST_OffsetCurve(
                geom,
                2)  -- 2 meters offset, 6 feet-ish, same sense
                ,4326)) as geom, --  back to source geometry, (need reverse because lhs)
           id
    from tform
    where reverse_cost > 0),
lhs_oneway as (
    select ST_Transform (
             ST_OffsetCurve(
                geom,
                2)  -- 2 meters offset, 6 feet-ish, same sense
                ,4326) as geom, --  back to source geometry, no rev. for 1-way str
           id
    from tform
    where reverse_cost < 0),
lhs as (
    select * from lhs_twoway
    union
    select * from lhs_oneway),
lhscurbs as (
  select
    nextval('curbgraph_v2_serial') as curbid,
    'lhs' as curbside,
    a.id,
    a.osm_ids,
    a.tag_ids,
    a.length,
    a.length_m,
    a.name,
    case when a.one_way=1 then  a.source else a.target end  as source,
    case when a.one_way=1 then  a.target else a.source end  as target,
    a.source_osm,
    a.target_osm,
    case when a.one_way=1 then  a.cost else a.reverse_cost end  as cost,
    -1 as reverse_cost,
    case when a.one_way=1 then  a.cost_s else a.reverse_cost_s end  as cost_s,
    -1 as reverse_cost_s,
    a.rule,
    a.one_way,
    a.oneway,
    a.maxspeed_forward,
    a.maxspeed_backward,
    a.priority,
    a.the_geom,
    cg.geom as curb_geom
  from new_glendale_ways a
  left outer join lhs cg on (a.id=cg.id)
  where one_way=1 or reverse_cost>0),
rhscurbs as (
  select
    nextval('curbgraph_v2_serial') as curbid,
    'rhs' as curbside,
    a.id,
    a.osm_ids,
    a.tag_ids,
    a.length,
    a.length_m,
    a.name,
    a.source,
    a.target,
    a.source_osm,
    a.target_osm,
    a.cost,
    -1 as reverse_cost,
    a.cost_s,
    -1 as reverse_cost_s,
    a.rule,
    a.one_way,
    a.oneway,
    a.maxspeed_forward,
    a.maxspeed_backward,
    a.priority,
    a.the_geom,
    cg.geom as curb_geom
  from new_glendale_ways a
  left outer join rhs cg on (a.id=cg.id)),
curbs as (
  select * from rhscurbs
  union
  select * from lhscurbs)
select * into curbs_v2_graph from curbs;

Update on progress

Strategy to get data into the solver

OpenStreetMap cleaned → street network → LineGraph → OR Tools
OpenStreetMap cleaned → street network → Travel Times → OR Tools

Strategy to get data into the solver

OpenStreetMap cleaned → street curb network → LineGraph → OR Tools
OpenStreetMap cleaned → street curb network → Travel Times → OR Tools

Strategy to get data into the solver

OpenStreetMap cleaned → street curb network → LineGraph → OR Tools
OpenStreetMap cleaned → street curb network → Travel Times → OR Tools

Making a linegraph

The usual network graph

Intersections are nodes
Streets are links between nodes

A linegraph is the network dual

Streets are nodes
Links represent legal movements between streets

Legal movements between curbs

Converting curbs to linegraph

Use pgRouting to make linegraph

With curb graph in hand, this is a very easy task


drop table if exists curbs_v2_linegraph;
SELECT * into curbs_v2_linegraph FROM pgr_lineGraph(
    'SELECT curbid as id, source, target, cost_s as cost, reverse_cost_s as reverse_cost FROM curbs_v2_graph'
);

Zooming in on an area

What next?

Strategy to get data into the solver

OpenStreetMap cleaned → street curb network → LineGraph → OR Tools
OpenStreetMap cleaned → street curb network → Travel Times → OR Tools

Strategy to get data into the solver

OpenStreetMap cleaned → street curb network → LineGraph → OR Tools ✓
OpenStreetMap cleaned → street curb network → Travel Times → OR Tools

Strategy to get data into the solver

OpenStreetMap cleaned → street curb network → LineGraph → OR Tools ✓
OpenStreetMap cleaned → street curb network → Travel Times → OR Tools

All to all matrix

Almost a one-liner

pgRouting has an excellent function pgr_dijkstraCostMatrix()
Creates a matrix of distances in one pass

pgr_dijkstracostmatrix

pgr_dijkstraCostMatrix(edges_sql, start_vids)

pgr_dijkstracostmatrix

SELECT * FROM pgr_dijkstraCostMatrix(
    'SELECT id, source, target, cost, reverse_cost FROM edge_table',
    (SELECT array_agg(id) FROM edge_table_vertices_pgr WHERE id < 5)
);

pgr_dijkstracostmatrix

select * FROM pgr_dijkstraCostMatrix(
    'SELECT id, source, target, target_length_m as cost, reverse_cost FROM new_curbs_v2_linegraph',
    (select array_agg(source) from (select distinct source from new_curbs_v2_linegraph))
);

Problem: RAM limits

9,194 curb segments means table with 84,529,636 entries
Process runs out of RAM!

Ugly hacks

Step through the curb table 3,000 at a time
Grab random bunches of under-represented origins
Rinse and repeat

First attempt at function

create or replace function fleshout_2000_curb_linegraph_matrix(starting int)
returns integer as
$BODY$
DECLARE
   i text;
   subsel text := 'SELECT id, source, target, target_length_m as cost, reverse_cost FROM new_curbs_v2_linegraph';
   insert_sql text := '';
   startid int := 0;
BEGIN
   insert_sql := '
   with
   id_block (sid) as (
      select distinct source
      from new_curbs_v2_linegraph nl
      where source > $1
      order by source
      limit 3000
      ),
    nextbatch as (
     select * FROM pgr_dijkstraCostMatrix('
   || quote_literal(subsel)
   ||',
      (select array_agg(sid) from id_block)))
   insert into new_curbs_linegraph_matrix
   select * from nextbatch
   on conflict do nothing';
   FOR startid IN starting..7000 by 1000 LOOP
       RAISE NOTICE 'populate db starting with %', startid;
       EXECUTE  insert_sql using startid;
   END LOOP;
   return  count(*) from new_curbs_linegraph_matrix;
END;
$BODY$
  LANGUAGE plpgsql;

Better approach: Sample from underrepresented

Use Ordered-Set aggregate function
https://www.postgresql.org/docs/13/functions-aggregate.html#FUNCTIONS-ORDEREDSET-TABLE
percentile_cont: Computes the continuous percentile, a value corresponding to the specified fraction within the ordered set of aggregated argument values. This will interpolate between adjacent input items if needed.

Sampling version of function

with
 sid_count (sid,cnt) as (
   select start_vid, count(*)
   from new_curbs_linegraph_matrix
   group by start_vid
   order by count),
 lo_block (sid) as (
   select sid from sid_count
   limit 500),
 pctl (hicount) as (
   SELECT percentile_cont(0.07) WITHIN GROUP (ORDER BY cnt) FROM sid_count ),
 hi_block (sid) as (
   select sid
   from sid_count
   cross join pctl
   where cnt > hicount
   order by random()
   limit 2500),
 id_block (sid) as (
   select sid from lo_block
   union
   select sid from hi_block),

The table is close enough

Each Origin should have 9193 destinations


with counts as (
   select start_vid,count(*) as cnt
   from new_curbs_linegraph_matrix group by start_vid)
select count(*),floor(cnt) from counts group by floor(cnt);
 count | floor
-------+-------
   201 |  9190
  2374 |  9191
  6607 |  9192
    11 |  9193
(4 rows)

Solve the Street Sweeping Problem

OR Tools to the rescue

OR Tools is great
But it isn’t PostgreSQL related
So I’ll talk about it some other time

Some benchmarks

My formulation takes about 20 minutes to generate an initial solution
Can run for hours
Difficult to get the “shape” of a solution right
Difficult to visualize the output

Save the generated paths

After solver finishes, generate a list of nodes “swept”
For deadhead nodes, use pgRouting to find intermediate nodes
- Deadhead meaning drive without sweeping over several streets to get to a street that needs sweeping
Gather the list of all nodes each vehicle visits (sweep plus non-sweep)

Python code to save list of nodes to DB

def sequence_to_table(self,vsequence,table_name):
    sequence = 0
    insert_query_string = """insert into {} (veh,sweep,linkid,geom)
    select %s,%s,%s,c.curb_geom as the_geom
    from curbs_v2_graph c
    where c.curbid =%s"""
    insert_query = sql.SQL(insert_query_string).format(sql.Identifier(table_name))
    with self.conn.cursor() as cur:
        cur.execute(
            sql.SQL("drop table if exists {}").format(sql.Identifier(table_name)))

        cur.execute(
            sql.SQL("CREATE TABLE {} (id serial, linkid integer, veh integer, sweep boolean default TRUE, geom geometry(LINESTRING,4326) )").format(sql.Identifier(table_name)))

        for (veh,sequence) in vsequence.items():
            #print(sequence)
            for row in sequence:
                # basic idea.  Save a table.  Each row is
                # shape,sequence number
                sweep = row[1]
                linkid = row[0]
                cur.execute(insert_query,(veh,sweep,linkid,linkid))
        self.conn.commit()

Aside

Do not use Python string formatting to insert strings and variables into your generated SQL
Doing so is strongly discouraged by psycopg
Instead use sql.SQL, and pass parameters to execute

sql.SQL("drop table {}").format(sql.Identifier(table_name)))
...
cur.execute(insert_query,(veh,sweep,linkid,linkid))

Visualizing the output

QGIS plus PostGIS tables

The real reason I included geometry in output table
QGIS can directly display PostGIS geometry tables

Nice maps, but …

The maps are difficult to view
Routes are on top of each other
No sense of the movement of the vehicle

Try animating!

Helpful blog posts all over
Search term: geogiffery
https://medium.com/@tjukanov/geospatial-animations-with-qgis-atlas-995d7ddb2d67

Use QGIS Atlas functionality

Image stack style animation
Make a print view
Control the print view with an “atlas”
Dump thousands of images to a directory
Use ffmpeg to stitch the images into a movie

Nausea-inducing results

Problem: inconsistent block sizes

Each “frame” of the animation is “the next block”
Blocks have different sizes
Inconsistent feet-per-animation-frame
Not a smooth animation

Use PostGIS to snip the roads

Break up the segments into pieces
Each frame adds a consistent distance to animation
Currently using 25 meters

ST_LineSubstring

ST_LineSubstring(geom, start_frac, end_frac)

geom: the linestring to process
start_frac: starting fraction for snipping the line (0 to 1)
end_frac: ending fraction for snipping the line (0 to 1)

Tip from the PostGIS docs

Use ST_LineSubstring to break line into N parts
Each part is from i to i+1, i = [0 .. N-1]
Use generate_series to generate the i values

PostGIS doc code:

SELECT field1, field2,
       ST_LineSubstring(the_geom, 100.00*n/len,
  CASE
    WHEN 100.00*(n+1) < len THEN 100.00*(n+1)/len
    ELSE 1
  END) AS the_geom
FROM
  (SELECT sometable.field1, sometable.field2,
  ST_LineMerge(sometable.the_geom) AS the_geom,
  ST_Length(sometable.the_geom) As len
  FROM sometable ) AS t
CROSS JOIN generate_series(0,10000) AS n
WHERE n*100.00/len < 1;

My modifications

Construct SQL with WITH statements
Compute required length of series based on longest road / 25 meters

Find the longest segment using one weird trick

with
 lengthshare as (
   select id,linkid,veh,sweep,geom,
          st_length(st_transform(geom,32611)) as len
   from solver_output
   order by id),
 maxlen as (
   select max(len) as len  from lengthshare ),

Determine “maxiter”


maxiter as (
   select (ceil(len/25.00)+1)::int as maxiter
   from maxlen
   )

Divide the longest length by 25, and round

Using maxiter, generate series


series as (
   select maxiter, generate_series(1,maxiter) - 1 as n
   from maxiter
   )

More flexible than the example code to determine maximum of series from data, rather than hard coding some big number.

Snip each line into pieces


snipped as (
   select id,
     id+(n/maxiter::numeric) as frame,
     linkid,veh,sweep,
     st_linesubstring(geom,
          25.00*n/len,
          case
            when 25.00*(n+1) < len then 25.00*(n+1)/len
            else 1
          end) as geom
   from lengthshare l
   cross join series s
   where s.n*25.00/len < 1
   order by frame )

Finally, save to new table


insert into
  solver_output_snipped (id,frame,linkid,veh,sweep,geom)
  select id,frame,linkid,veh,sweep,geom from snipped;

Small road section animation results

Bonus: Arrow heads!

Previous animation just showed current street
With snipped roads, can show progress along street (every 25m)
Arrow appears to move along streets
Looks more like a “real” animation

Still room for improvement

The frame jumps to keep vehicle (arrow) in center of frame
Okay when vehicle is moving straight
Ugly and jarring when vehicle changes direction

Key insight: need a POV table

Can model the point of view (POV) as its own table
POV table is mapped to its own QGIS layer
Keep the layer hidden
“Camera” movement should smoothly follow vehicle movement
The POV for each frame should be a spatial average

Make the POV table


with
  idgeom (uid, geom) as (
    select uid,
      st_collect(geom)
      over (order by uid
            rows between 2 preceding
                    and 10 following)
    from curbtable_globalspan_snipped  ),
  centroids (uid, geom) as (
    select uid,st_centroid(geom) as geom
    from idgeom   )
insert into pov (uid,geom)
select uid as uid,geom from centroids;

The result

A table of points for each frame
Point is the centroid of point window around current point
Centers the atlas window where needed

Leading average, small window: [-2, 10]

Bigger window: [-40, 50]

Even bigger window: [-90, 100]

Smoother animation with [-2, 10] window

Smoother animation with [-40, 50] window

Questions?

Thank you

Extras

FFPMEG

Sample ffmpeg command

ffmpeg -framerate 4 -i 'Maps/output_%04d.png'  -crf 25 -y Maps/out1.webm

-framerate 4: 4 images per second
output_%04d.png: Input image file names are in numerical order

See https://ffmpeg.org/ffmpeg-formats.html#image2-1

How to Download OSM data

How to find OSM data for a city

Go to the OSM website https://www.openstreetmap.org
Pan and zoom to area of interest (visual check)
Go to data download site https://download.geofabrik.de/
Get “smallest” encompassing region

Example: Port-au-Prince, Haiti

Central America: https://download.geofabrik.de/central-america.html (about 404 MB)
Haiti and Dominican Republic: https://download.geofabrik.de/central-america/haiti-and-domrep.html (about 63 MB)
Need to isolate just the city

Download the Port-au-Prince polygon

At OSM website, search for Port-au-Prince
Click on boundary relation and note the relation number
Switch to the data API using the relation number to download the polygon

Search for “Port-au-Prince”

Find relation id

Boundaries contain ways

Access OSM data API

Craft the download URL
Make sure it includes the word “full”
https://www.openstreetmap.org/api/0.6/relation/387318/full
Save as port-au-prince-poly.osm

Using PostgreSQL and friends for a street sweeping solver project

Metadata

Focus

My background

My PostgreSQL usage

My PostgreSQL usage

My go-to geo-enabled datastore

Street sweeping

What do we need?

Solution elements

Solution elements

Solution elements

Tasks for PostgreSQL

PostgreSQL is data glue

The solver

Google’s OR Tools

Google’s OR Tools

Routing problems

Routing problems

Routing problems

Routing problems

Routing problems

Routing problems

Routing problems

Routing problems

Routing problems

Routing problems

Routing problems

OR Tools is designed around Constraints

The solver needs:

The solver needs:

The data

OpenStreetMap

Nodes, Ways, and Relations

Nodes, Ways, and Relations

Nodes, Ways, and Relations

Tags provide semantic meaning

Example node

Example way

Example relation

Nodes hold location information

How is street sweeping a routing problem?

Routing problem inputs

Street Sweeping as a routing problem

Street Sweeping as a routing problem

How to get the locations to visit

How to get the locations to visit

How to get the locations to visit

Strategy to get data into the solver

PostgreSQL and “friends”

PostGIS

pgRouting

Load OSM data

Quick highlights of grabbing OSM data

Limit data to just a city using its boundary

Load the data into pgRouting tables

osm2pgrouting config file

Generated ways table

Sample data from ways table

What have we accomplished?

Strategy to get data into the solver

Strategy to get data into the solver

Clean OSM data

(The hardest part)

Too many roadway segments

Objective: Combine segments per block

WITH RECURSIVE queries

WITH RECURSIVE refresher

WITH RECURSIVE refresher

WITH RECURSIVE refresher

WITH RECURSIVE refresher

WITH RECURSIVE refresher

Segment data

Counting sources and targets

Characteristics of “interior” segments

Counts of source, target

Potential interior segments

“True” interiors

Interiors table

Visual explanation using maps

`osm2pgrouting` config file