Using PostgreSQL and friends for a street sweeping solver project

James Marca, Activimetrics LLC

October 21, 2020

Metadata

Focus

  • Case study
  • Practical examples, including
    • Spatial processing
    • “Recursion”
    • Window functions
  • Some theory

My background

  • Started programming in 1980: Apple ][ Basic
  • BS 1989 HMC
  • Systems engineer
  • MS 1994 UC Irvine
  • Consulting firm in Boston 3 years
  • PhD 2002 UC Irvine

My PostgreSQL usage

  • Started using PostgreSQL in 2001 or so
    • Needed to store GPS points for research
    • MySQL geo support was lacking
    • PostGIS was good enough

My PostgreSQL usage

  • Started using PostgreSQL in 2001 or so
    • Needed to store GPS points for research
    • MySQL geo support was lacking
    • PostGIS was good enough
  • GPS → Perl → PostgreSQL/PostGIS

My go-to geo-enabled datastore

  • PostgreSQL/PostGIS ⇔ R spatial analysis

  • PostgreSQL/PostGIS ⇔ mod_perl activity survey website

  • PostgreSQL/PostGIS ⇔ Java accident risk website

  • PostgreSQL/PostGIS ⇔ node.js truck activity estimator

  • PostgreSQL/PostGIS ⇔ … lots of things …

Street sweeping

  • Given:
    • a network of streets
    • a fleet of vehicles
  • Make sure every curb is swept
  • Edge covering problem
  • Similar problems are trash collection, snow plowing

What do we need?

Solution elements

  • Geographic data–exact details of each street
  • A good solver that can tackle NP-Hard problems

Solution elements

  • Geographic data–exact details of each street
  • A good solver that can tackle NP-Hard problems

Solution elements

  • Geographic data–exact details of each street
  • A good solver that can tackle NP-Hard problems

Tasks for PostgreSQL

  • Store the data
  • Clean the data
  • Process the data to load into solver
  • Save the solver output
  • Process the solver output

PostgreSQL is data glue

The solver

  • What is it?
  • What does it do?
  • What does it need?

Google’s OR Tools

Google’s OR Tools

Routing problems

Routing problems

Find the best path

Routing problems

Find the best path
for one or more vehicles

Routing problems

Find the best path
for one or more vehicles
to visit every node in a network

Routing problems

Find the best path
for one or more vehicles
to visit every node in a network
but the vehicles have a fixed capacity

Routing problems

Find the best path
for one or more vehicles
to visit every node in a network
but the vehicles have a fixed capacity
and need to head back to the depot to unload

Routing problems

Find the best path
for one or more vehicles
to visit every node in a network
but the vehicles have a fixed capacity
and need to head back to the depot to unload
and make sure that every vehicle has the about the same work

Routing problems

Find the best path
for one or more vehicles
to visit every node in a network
but the vehicles have a fixed capacity
and need to head back to the depot to unload
and make sure that every vehicle has the about the same work
oh and can you try to minimize the number of vehicles that are needed too?

Routing problems

Find the best path
for one or more vehicles
to visit every node in a network
but the vehicles have a fixed capacity
and need to head back to the depot to unload
and make sure that every vehicle has the about the same work
oh and can you try to minimize the number of vehicles that are needed too?
also the drivers have an eight hour shift, or they cost time and a half, so factor that in, as well as their contractual working hours and break times
(15 minutes in the morning and afternoon,
plus one hour lunch between 11 and 2)

Routing problems

Find the best path
for one or more vehicles
to visit every node in a network
but the vehicles have a fixed capacity
and need to head back to the depot to unload
and make sure that every vehicle has the about the same work
oh and can you try to minimize the number of vehicles that are needed too?
also the drivers have an eight hour shift, or they cost time and a half, so factor that in, as well as their contractual working hours and break times
(15 minutes in the morning and afternoon,
plus one hour lunch between 11 and 2)

Routing problems

OR Tools is designed around Constraints

  • Based on Constraint Programming
  • Flexible specification of linear constraints
  • Adding constraints can sometimes even improve solver runtime

The solver needs:

  • A list of nodes to visit + attributes
  • An all-to-all travel matrix
    • distance and/or time
  • A list of vehicles + attributes
  • A list of vehicle depot locations
  • Plus details needed for other constraints

The solver needs:

  • A list of nodes to visit + attributes
  • An all-to-all travel matrix
    • distance and/or time

The data

  • Where does it come from?
  • What does it contain?

OpenStreetMap

  • https://www.openstreetmap.org
  • Global, free, crowd-sourced geographic data
  • Quality varies country to country
  • Mistakes can be fixed
  • Generally excellent

Nodes, Ways, and Relations

  • Nodes are points in space
    • Intersections
    • Locations of interest

Nodes, Ways, and Relations

  • Nodes are points in space

  • Ways are lines

    • Linear features
    • Roads between nodes
    • Area boundaries

Nodes, Ways, and Relations

  • Nodes are points in space

  • Ways are lines

  • Relations group nodes and ways

    • Collect ways for a city boundary
    • Define a highway or a bus route

Tags provide semantic meaning

Each tag describes a geographic attribute of the feature being shown by that specific node, way or relation.

  • Tags are freeform, “key=value”
  • No fixed standard, but approved conventions
    • highway=residential
    • bridge=movable + bridge:movable=drawbridge
    • barrier=lift gate
  • Conventions evolve over time

Example node

Node: 7097377424
Version #1

added service roads.
Edited 9 months ago by pandypr · Changeset #79039579
Location: 33.6756646, -117.9193706
Part of
2 ways

    Harbor Boulevard (401292457)
    759766208

Example way

Way: 759766208
Version #1

added service roads.
Edited 9 months ago by pandypr · Changeset #79039579
Tags
highway     service
Nodes
8 nodes

    7097377424 (part of way Harbor Boulevard (401292457))
    7097377425
    7097377426
    7097377427
    7097377428
    7097377429
    7097377430
    7097377431 (part of way Ponderosa Street (158607554))

Example relation

Relation: Harbor Boulevard (2484073)
Version #147

ref
Edited about 1 month ago by Fluffy89502 · Changeset #90212661
Tags
name    Harbor Boulevard
route   road
type    route
Members
▶ 700 members

Nodes hold location information

  • Only nodes have latitude, longitude
  • Ways are made up of connected nodes
  • Relations collect ways and nodes

How is street sweeping a routing problem?

Routing problem inputs

  • A list of locations that must be reached
  • A cost matrix for movement between all location pairs

Street Sweeping as a routing problem

  • Sweep a street = Visit a location
  • Use network to determine travel times from street to street
  • Streets must be swept during a time window
  • Sweepers “pickup” debris along curbs
  • Sweepers “unload” their hoppers

Street Sweeping as a routing problem

  • Sweep a street = Visit a location
  • Use network to determine travel times from street to street
  • Streets must be swept during a time window
  • Sweepers “pickup” debris along curbs
  • Sweepers “unload” their hoppers
  • Pickup and delivery problem with time windows (PDPTW)

How to get the locations to visit

  • Sweep a street = Visit a location

How to get the locations to visit

  • Sweep a street = Visit a location

How to get the locations to visit

  • Sweep a street = Visit a location
  • Transform Network into its LineGraph dual

Strategy to get data into the solver

  • OpenStreetMap → street network → LineGraph → OR Tools
  • OpenStreetMap → street network → Travel Times → OR Tools

PostgreSQL and “friends”

PostGIS

a spatial database extender for PostgreSQL. It adds support for geographic objects allowing location queries to be run in SQL

pgRouting

extends the PostGIS / PostgreSQL geospatial database to provide geospatial routing functionality

Load OSM data

Quick highlights of grabbing OSM data

  • Don’t download the whole world; just get a region
  • Use a download service (like https://download.geofabrik.de/)
  • Navigate to region of interest to download a PBF file
  • For example, Southern California URL

https://download.geofabrik.de/north-america/us/california/socal-latest.osm.pbf

Limit data to just a city using its boundary

osmium extract \
    -p glendale-poly.osm \
    -o glendale-latest.osm \
    socal-latest.osm.pbf

Load the data into pgRouting tables

osm2pgrouting \
     --f data/glendale-latest.osm \
     --conf data/map_config_streets.xml \
     --dbname glendale \
     --prefix 'glendale_' \
     --username dbuser \
     --clean

osm2pgrouting config file



  
  
    
    
    
    <tag_value name="trunk"             id="104" priority="1.05" maxspeed="50" />
    <tag_value name="trunk_link"        id="105" priority="1.05" maxspeed="50" />
    <tag_value name="primary"           id="106" priority="1.15" maxspeed="50" />
    <tag_value name="primary_link"      id="107" priority="1.15" maxspeed="50" />
    <tag_value name="secondary"         id="108" priority="1.5" maxspeed="50" />
    <tag_value name="secondary_link"    id="109" priority="1.5" maxspeed="50" />
    <tag_value name="tertiary"          id="110" priority="1.75" maxspeed="50" />
    <tag_value name="tertiary_link"     id="111" priority="1.75" maxspeed="50" />
    <tag_value name="residential"       id="112" priority="2.5" maxspeed="50" />
    <tag_value name="living_street"     id="113" priority="3" maxspeed="50" />
    
    <tag_value name="unclassified"      id="117" priority="3" maxspeed="50"/>
    <tag_value name="road"              id="100" priority="5" maxspeed="50" />
  

Generated ways table

mydb=> \d glendale_ways
    Table "public.glendale_ways"
      Column       |           Type
-------------------+---------------------------
 gid               | bigint
 osm_id            | bigint
 tag_id            | integer
 length            | double precision
 length_m          | double precision
 name              | text
 source            | bigint
 target            | bigint
 source_osm        | bigint
 target_osm        | bigint
 cost              | double precision
 reverse_cost      | double precision
 cost_s            | double precision
 reverse_cost_s    | double precision
 rule              | text
 one_way           | integer
 oneway            | text
 x1                | double precision
 y1                | double precision
 x2                | double precision
 y2                | double precision
 maxspeed_forward  | double precision
 maxspeed_backward | double precision
 priority          | double precision
 the_geom          | geometry(LineString,4326)
Indexes:
    "glendale_ways_pkey" PRIMARY KEY, btree (gid)
    "glendale_ways_the_geom_idx" gist (the_geom)
Foreign-key constraints:
    "glendale_ways_source_fkey" FOREIGN KEY (source) REFERENCES glendale_ways_vertices_pgr(id)
    "glendale_ways_source_osm_fkey" FOREIGN KEY (source_osm) REFERENCES glendale_ways_vertices_pgr(osm_id)
    "glendale_ways_tag_id_fkey" FOREIGN KEY (tag_id) REFERENCES configuration(tag_id)
    "glendale_ways_target_fkey" FOREIGN KEY (target) REFERENCES glendale_ways_vertices_pgr(id)
    "glendale_ways_target_osm_fkey" FOREIGN KEY (target_osm) REFERENCES glendale_ways_vertices_pgr(osm_id)

Sample data from ways table

 length_m | source | target | cost_s | reverse_cost_s | oneway
----------+--------+--------+--------+----------------+---------
    39.63 |   7194 |   1191 |   2.85 |          -2.85 | YES
    48.42 |  15404 |      1 |   3.49 |           3.49 | UNKNOWN
   124.38 |   1195 |     46 |   8.96 |          -8.96 | YES
   211.32 |   6794 |      2 |  15.21 |         -15.21 | YES
    59.75 |   5898 |      3 |   4.30 |          -4.30 | YES
   267.77 |   7762 |      5 |  19.28 |         -19.28 | YES
   147.70 |  11969 |      6 |  10.63 |          10.63 | UNKNOWN
    11.95 |   9845 |      8 |   0.86 |           0.86 | NO
    61.69 |  11183 |      9 |   4.44 |           4.44 | NO
    63.39 |   9838 |     10 |   4.56 |           4.56 | NO
(10 rows)

39.63m at 50km/hr takes 2.85s ✔

What have we accomplished?

Strategy to get data into the solver

  • OpenStreetMap → street network → LineGraph → OR Tools
  • OpenStreetMap → street network → Travel Times → OR Tools

Strategy to get data into the solver

  • OpenStreetMap → street network → LineGraph → OR Tools
  • OpenStreetMap → street network → Travel Times → OR Tools

Clean OSM data

(The hardest part)

Too many roadway segments

  • OSM is designed for many things
  • Some street segments are extraneous for our purposes
  • Example: intersections for service roads and driveways
  • Extra street segments make N larger,
    adding complexity to solver for no reason

Objective: Combine segments per block

  • Goal is one segment per block
  • Need to examine each segment
    • Is it an isolated middle segment?
    • Can it be linked to another middle segment?
  • Solution is to recursively connect a segment with its neighbors

WITH RECURSIVE queries

  • https://www.postgresql.org/docs/13/queries-with.html
  • WITH statements AKA Common Table Expressions
    • Great just to organize long SQL
  • WITH RECURSIVE statements
    • Allows a WITH query to refer to its own output
    • Can repeatedly use PostGIS functions to merge all segments for a block

WITH RECURSIVE refresher

WITH RECURSIVE blahquery(arg1, arg2) AS (
  -- non-recursive term --
  select foo, bar from baz
  UNION -- (or UNION ALL) --
  -- recursive term --
  select ... from blahquery ...

WITH RECURSIVE refresher


WITH RECURSIVE t(n) AS (
    VALUES (1)
  UNION ALL
    SELECT n+1 FROM t WHERE n < 10
)
SELECT sum(n) FROM t;

WITH RECURSIVE refresher


WITH RECURSIVE t(n) AS (
    VALUES (1)
  UNION ALL
    SELECT n+1 FROM t WHERE n < 10
)
SELECT sum(n) FROM t;
 sum
-----
  55
(1 row)

WITH RECURSIVE refresher


WITH RECURSIVE t(n) AS (
    VALUES (1)
  UNION ALL
    SELECT n+1 FROM t WHERE n < 10
)
SELECT n FROM t;
 n
----
  1
  2
  3
  4
  5
  6
  7
  8
  9
 10
(10 rows)

WITH RECURSIVE refresher


WITH RECURSIVE t(n) AS (
    VALUES (1)
  UNION ALL
    SELECT n+1 FROM t WHERE n < 10
)
SELECT sum(n) FROM t;
 sum
-----
  55
(1 row)

Segment data


  id  |         name          | source | target | one_way | cost_s | rev_cost_s
------+-----------------------+--------+--------+---------+--------+------------
  433 | Western Avenue        |      1 |    303 |       0 |   1.30 |       1.30
 4725 | Glenoaks Boulevard    |      1 |   4061 |       1 |   2.58 |      -2.58
  299 | Geneva Street         |      2 |    216 |       0 |  26.01 |      26.01
 1735 | Glenoaks Boulevard    |      2 |   1267 |       0 |   8.40 |       8.40

Counting sources and targets

Lots of sources and targets
Lots of sources and targets
Lots of sources and targets
Lots of sources and targets

Characteristics of “interior” segments

  • Each segment has source and target
  • Interior segment source is seen once
  • Interior segment target is seen once
  • So count all sources, all targets

Counts of source, target


WITH RECURSIVE
sources(source,count) as (
   select source,count(*) as count
   from glendale_ways group by source
   ),
targets(target,count) as (
   select target,count(*) as count
   from glendale_ways group by target
   )

Potential interior segments

Any record with target count=1 and source count=1


possible_interiors as (
   select w.*,s.count as scount, t.count as tcount
   from glendale_ways w
   join targets t on (w.target=t.target)
   join sources s on (w.source=s.source)
   where t.count=1 and s.count=1
   ),

“True” interiors

  • Possible interiors
    AND
  • source is a target of another possible interior segment
    AND
  • target node is a source of another possible interior segment

Interiors table


interiors as (
   select pi.*
   from possible_interiors pi
   join sources s on (pi.target=s.source)
   join targets t on (pi.source=t.target)
   where s.count=1 and t.count=1
   )

Visual explanation using maps

Sequence starts and ends

  • “Starts” are segments with
    • target node is unique (count of 1)
    • target is the start node of an interior segment
    • source node is not unique (node is source for lots of segments)
  • “Ends” are segments with
    • source node is unique (count of 1)
    • source is the end node of an interior segment
    • target is not unique (intersection)

Possible starts and starts


possible_starts as (
   select w.*, s.count as scount, t.count as tcount
   from glendale_ways w
   join targets t on (w.target=t.target)
   join sources s on (w.source=s.source)
   where t.count = 1 -- link is only one touching target
   )
starts as (
   select ps.*
   from possible_starts ps
   where ps.scount > 1
   )

Possible starts (blue), then starts (green)

Start segments initial rule is too narrow

Start segments rule: target node count=1, source node count>1
Start segments rule: target node count=1, source node count>1

A source is a target

  • Some sources are also targets
  • Flow direction is not uniform

   union
   select ps.*
   from possible_starts ps
   join targets t on (ps.source=t.target)
   where ps.scount=1 and  t.count>1
   -- more than one link target == ps.source

Names change, one path

Look at possible interiors to identify a name change


   union
   select ps.*
   from possible_starts ps
   join possible_interiors pi on (ps.source=pi.target)
   where ps.name != pi.name and ps.scount=1 and pi.tcount=1
   -- name change of road

Singletons

Like name change, but not in possible_interiors set


   union
   select ps.*
   from possible_starts ps
   join possible_starts psi on (ps.source = psi.target
                                and ps.name != psi.name)
   -- singletons

(Data is always messier than one would expect)

The full starts query

possible_starts as (
   select w.*, s.count as scount, t.count as tcount
   from glendale_ways w
   join targets t on (w.target=t.target)
   join sources s on (w.source=s.source)
   where t.count = 1 -- link is only one touching target
   ),
starts as (
   select ps.*
   from possible_starts ps
   where ps.scount >1
  union   -- more than one link target == ps.source
   select ps.*
   from possible_starts ps
   join targets t on (ps.source=t.target)
   where ps.scount =1 and  t.count>1
  union   -- name change of road
   select ps.*
   from possible_starts ps
   join possible_interiors pi on (ps.source=pi.target)
   where ps.name != pi.name and ps.scount=1 and pi.tcount=1
  union   -- singletons
   select ps.*
   from possible_starts ps
   join possible_starts psi on (ps.source=psi.target and ps.name != psi.name)
   )

Ends query is similar

Nothing new, as starts and ends are basically the same

Join segments from end to start

  • The “recursive” part
  • Use PostGIS functions

Interlude: spatial basics

A quick overview of important spatial processing basics

I am not a geographer

Latitude and Longitude

  • The earth is a deformed sphere
  • Latitude is degrees up or down from equator
    • South Pole -90°
    • North Pole +90°
  • Longitude is degrees around the equator
    • “Prime Meridian” is -180°, 0°, and 180°
    • Greenwich, London, England
    • East of Greenwich (Europe, Africa, Asia) is positive
    • West of Greenwich (North and South America) is negative

Maps are flat, need projections

  • All maps have a projection
  • There are a surprising number of projections
  • Differences arise from the purpose of the map
    • Accurate areas for a single country
    • Cloropleths (maps colored to show a statistic like Covid-19 infections)
    • Accurate straight line distances
    • etc

Which projection to use?

Datums

  • Formal way to fit the ellipsoid to a local or global area
  • NAD27: Historical USA maps
  • NAD83: Recent USA maps; corrects NAD27 distortions
  • WGS84: Worldwide datum, used by GPS

UTM

  • Universal Transverse Mercator
  • Worldwide coordinate system
  • Slices surface into 60 zones
  • Each zone can be mapped with low distortion

Important points

  • If you have GPS coordinates or just (latitude, longitude) pairs, you have WGS84
  • Always write (latitude, longitude)
  • Programs and functions tend to flip the order, x then y

Simple Features for SQL

  • OpenGIS Consortium (OGC) specifications
  • Standard types and functions

Well Known Text (WKT), Binary (WKB)

  • POINT
  • LINESTRING
  • POLYGON
  • MULTIPOINT
  • MULTILINESTRING
  • MULTIPOLYGON
  • GEOMETRYCOLLECTION

Well Known Text (WKT), Binary (WKB)

  • POINT
  • LINESTRING

PostGIS

PostGIS offers types, functions, and indices for spatial data

GIS objects in PostGIS

  • See https://postgis.net/docs/manual-3.0/
  • PostGIS uses a superset of the OpenGIS specification
    • “Extended” to include the object’s spatial referencing system identifier (SRID).
    • WKT becomes EWKT
    • WKB becomes EWKB
    • SRID gives the object’s projection and datum
    • 8500 SRID entries in stock PostGIS

EWKT

point
SRID=4326;POINT(-118.240413 34.160228)
linestring
SRID=4326;LINESTRING(-118.239602 34.159187, -118.239631 34.159161, -118.239673 34.159119)

Merge street segments

  • st_asewkt
  • st_makeline
  • st_geomfromewkt
  • st_simplifypreservetopology
  • st_union

Make curbs network

  • st_offsetcurve
  • st_transform
  • st_reverse

Breakup segments for animation

  • st_linesubstring
  • st_linemerge
  • st_length

Create POV points

  • st_collect
  • st_centroid

Use this one weird PostGIS trick …

How to measure geometries?

  • ST_Length(geom)
  • geom is probably in WGS84 projection
  • (Latitude, Longitude) in decimal degrees
  • Using st_length() on degrees is useless

UTM projection gives meters!

st_length(st_transform(geom,32611))
  • To get meters, transform geometry
  • WGS84 is SRID=4326
  • Transform to UTM projection, SRID=32611
  • Then st_length() call gives meters!

Which zone? 326??


select srid,proj4text
 from spatial_ref_sys where srid between 32600 and 32661
 order by srid;
 srid  |    proj4text
-------+---------------------------------------------------
 32601 | +proj=utm +zone=1 +datum=WGS84 +units=m +no_defs
 32602 | +proj=utm +zone=2 +datum=WGS84 +units=m +no_defs
 32603 | +proj=utm +zone=3 +datum=WGS84 +units=m +no_defs
 32604 | +proj=utm +zone=4 +datum=WGS84 +units=m +no_defs
 32605 | +proj=utm +zone=5 +datum=WGS84 +units=m +no_defs
 …

And now back to the story

Iteration to merge segments

  • recursive calls are broken into two steps
  • the first is an initializing step
  • the second is the recursive part
  • the recursive part is a union with the initializing step
  • the recursion needs to have a well-defined stop

Initialization step


-- with recursive ...
search_graph(gid, length_m, name, source, target, depth,
             path, segments, cycle, ...all_the_things...) as (
   select
    g.gid, g.length_m,
    g.name, g.source, g.target,
    1 as depth,
    array[g.gid] as path,
    st_asewkt(g.the_geom) as segments,
    false as cycle,
    ... -- other stuff
   from ends g
   -- initialize with ends, connect interiors to source

Initialization notes

  • gid is unique identifier for each segment
  • path is an array of gid’s
  • Start the recursion from the end
  • Push new gid’s to the beginning of the array

Why st_asewkt?


st_asewkt(g.the_geom) as segments
  • Not free to convert geom to text representation
  • But union of geoms is pickier
  • By combining geoms as text, can preserve their type of LineString

Recursive part

  union all
   select
    g.gid, g.length_m + sg.length_m,
    sg.name, g.source, sg.target,
    sg.depth+1 as depth,
    g.gid || sg.path as path,
    st_asewkt( st_makeline( g.the_geom, sg.segments )),
    g.gid = ANY(sg.path) as cycle,
    ... -- other stuff
   from interiors g -- recurse on interiors
   join search_graph sg on
     (g.target=sg.source -- interior target -> chain source
      and g.name=sg.name)-- but same street name too please
   where sg.depth < 100 and not sg.cycle -- stop guards

PostGIS notes


st_asewkt( st_makeline( g.the_geom, sg.segments ))
  • st_makeline() used to avoid array type error
  • Makes a new line for each segment
  • Prepends new line bit to growing line
  • Whole result is dumped as well known text for next recursive loop

Alternate version


  ARRAY[g.the_geom] as segments
  …
  array_prepend(g.the_geom, sg.segments)
              ::geometry(LineString,4326)[],
  • Cast fixes recursive error re: mismatched array types
  • EXPLAIN ANALYZE says they’re the same speed:
    • st_asewkt 117s vs ARRAY 119s

Example results


WITH RECURSIVE …
select gid,name,source,target,depth
from search_graph order by depth desc,name;
 gid  |         name          | source | target | depth
------+-----------------------+--------+--------+-------
 6344 | North Louise Street   |   5686 |    234 |    19
 6179 | North Louise Street   |   5685 |    234 |    18
 5311 | Emerald Isle Drive    |   4635 |    149 |    17
 6326 | North Louise Street   |   5520 |    234 |    17
 5309 | Emerald Isle Drive    |   4650 |    149 |    16
 5280 | Flintridge Drive      |   4620 |    147 |    16
 6327 | North Louise Street   |   5667 |    234 |    16
 5310 | Emerald Isle Drive    |   4648 |    149 |    15

Need to pick the longest

  • The longest segment has depth of 19
  • Need to choose that one, not the shorter ones
  • Next part of WITH RECURSIVE statement picks off longest segments

Longest groups


gid_paths as (
   select unnest(sg.path) as node,depth
   from search_graph sg ),

gid_max_depth as (
   select node,max(depth) as depth
   from gid_paths group by node ),

distinct_paths as (
  select distinct path
  from search_graph sg
  join gid_max_depth gm
       on (gm.depth=sg.depth and
           gm.node in (select unnest(sg.path))))

Make one record

  • Pick longest sequence using distinct_paths
  • Merge starts to add starting node
  • Convert text geom back to binary geom

Merged segments

segments as (
  select
    c.name, g.source, c.target, c.depth+1 as depth, g.gid || c.path as path,
    ST_SimplifyPreserveTopology(
      ST_GeomFromEWKT(
        ST_AsEWKT(ST_MakeLine(g.the_geom,
                              c.segments))),
      0.0000001) as the_geom,
      … other_columns …
   from search_graph c
   join distinct_paths dp on (c.path=dp.path)
   join starts g -- add start nodes to chain
        on (g.target=c.source  --start.target == source
            and g.name=c.name)) -- same name please

Book-keeping, and finish up

  • The remaining SQL just tidies up
  • Make a new table
    • Start with the old table
    • Drop the components of merged segments
    • Add the new, longer merged segments

grouped as (
   select * from keep_ways
  union
   select * from new_ways
)
insert into new_glendale_ways ( … )
select … from grouped;

Final output of segment-joining work

Some notes

  • Not all segments are fixed properly
  • Reduced number of segments by 40%
    • for Glendale, California
    • went from 7653 links to 4597 links
  • Huge impact on problem size
  • Absolutely worth the effort to figure this out

Where do we stand now?

Strategy to get data into the solver

  • OpenStreetMap → street network → LineGraph → OR Tools
  • OpenStreetMap → street network → Travel Times → OR Tools

Strategy to get data into the solver

  • OpenStreetMap cleaned → street network → LineGraph → OR Tools
  • OpenStreetMap cleaned → street network → Travel Times → OR Tools

Converting streets to curbs

Making a routable, directed network

One-way and two-way streets

  • OSM data is pretty good about one-way streets
  • pgRouting can analyze OSM data, establish
    forward and backward traversal costs
  • But mixing one-way and two-way streets is buggy

Convert all streets to curbs

  • Curbs are all one-way
  • On two-way streets, curb movements are in opposite directions
  • On one-way streets, curb movements are in same direction
  • Easier to reason about moving from curb to curb

Big SQL statement


drop sequence if exists curbgraph_v2_serial;
create sequence curbgraph_v2_serial;

drop table if exists curbs_v2_graph cascade;

with
tform as (
    select id, st_transform(the_geom,32611) as geom,reverse_cost
    from new_glendale_ways),
rhs as (
    select ST_Reverse(ST_Transform (
             ST_OffsetCurve(
                geom,
                -2)  -- 2 meters offset, 6 feet-ish, reverse of orig direction
                ,4326)) as geom, --  back to source geometry, (need reverse)
           id
    from tform),
lhs_twoway as (
    select ST_Reverse(ST_Transform (
             ST_OffsetCurve(
                geom,
                2)  -- 2 meters offset, 6 feet-ish, same sense
                ,4326)) as geom, --  back to source geometry, (need reverse because lhs)
           id
    from tform
    where reverse_cost > 0),
lhs_oneway as (
    select ST_Transform (
             ST_OffsetCurve(
                geom,
                2)  -- 2 meters offset, 6 feet-ish, same sense
                ,4326) as geom, --  back to source geometry, no rev. for 1-way str
           id
    from tform
    where reverse_cost < 0),
lhs as (
    select * from lhs_twoway
    union
    select * from lhs_oneway),
lhscurbs as (
  select
    nextval('curbgraph_v2_serial') as curbid,
    'lhs' as curbside,
    a.id,
    a.osm_ids,
    a.tag_ids,
    a.length,
    a.length_m,
    a.name,
    case when a.one_way=1 then  a.source else a.target end  as source,
    case when a.one_way=1 then  a.target else a.source end  as target,
    a.source_osm,
    a.target_osm,
    case when a.one_way=1 then  a.cost else a.reverse_cost end  as cost,
    -1 as reverse_cost,
    case when a.one_way=1 then  a.cost_s else a.reverse_cost_s end  as cost_s,
    -1 as reverse_cost_s,
    a.rule,
    a.one_way,
    a.oneway,
    a.maxspeed_forward,
    a.maxspeed_backward,
    a.priority,
    a.the_geom,
    cg.geom as curb_geom
  from new_glendale_ways a
  left outer join lhs cg on (a.id=cg.id)
  where one_way=1 or reverse_cost>0),
rhscurbs as (
  select
    nextval('curbgraph_v2_serial') as curbid,
    'rhs' as curbside,
    a.id,
    a.osm_ids,
    a.tag_ids,
    a.length,
    a.length_m,
    a.name,
    a.source,
    a.target,
    a.source_osm,
    a.target_osm,
    a.cost,
    -1 as reverse_cost,
    a.cost_s,
    -1 as reverse_cost_s,
    a.rule,
    a.one_way,
    a.oneway,
    a.maxspeed_forward,
    a.maxspeed_backward,
    a.priority,
    a.the_geom,
    cg.geom as curb_geom
  from new_glendale_ways a
  left outer join rhs cg on (a.id=cg.id)),
curbs as (
  select * from rhscurbs
  union
  select * from lhscurbs)
select * into curbs_v2_graph from curbs;

Update on progress

Strategy to get data into the solver

  • OpenStreetMap cleaned → street network → LineGraph → OR Tools
  • OpenStreetMap cleaned → street network → Travel Times → OR Tools

Strategy to get data into the solver

  • OpenStreetMap cleaned → street curb network → LineGraph → OR Tools
  • OpenStreetMap cleaned → street curb network → Travel Times → OR Tools

Strategy to get data into the solver

  • OpenStreetMap cleaned → street curb network → LineGraph → OR Tools
  • OpenStreetMap cleaned → street curb network → Travel Times → OR Tools

Making a linegraph

The usual network graph

  • Intersections are nodes
  • Streets are links between nodes

A linegraph is the network dual

  • Streets are nodes
  • Links represent legal movements between streets

Converting curbs to linegraph

All the combinations
All the combinations
All the combinations
All the combinations
All the combinations
All the combinations
All the combinations
All the combinations
All the combinations
All the combinations

Use pgRouting to make linegraph

  • With curb graph in hand, this is a very easy task

drop table if exists curbs_v2_linegraph;
SELECT * into curbs_v2_linegraph FROM pgr_lineGraph(
    'SELECT curbid as id, source, target, cost_s as cost, reverse_cost_s as reverse_cost FROM curbs_v2_graph'
);

Zooming in on an area

What next?

Strategy to get data into the solver

  • OpenStreetMap cleaned → street curb network → LineGraph → OR Tools
  • OpenStreetMap cleaned → street curb network → Travel Times → OR Tools

Strategy to get data into the solver

  • OpenStreetMap cleaned → street curb network → LineGraph → OR Tools
  • OpenStreetMap cleaned → street curb network → Travel Times → OR Tools

Strategy to get data into the solver

  • OpenStreetMap cleaned → street curb network → LineGraph → OR Tools
  • OpenStreetMap cleaned → street curb network → Travel Times → OR Tools

All to all matrix

Almost a one-liner

  • pgRouting has an excellent function pgr_dijkstraCostMatrix()
  • Creates a matrix of distances in one pass

pgr_dijkstracostmatrix

pgr_dijkstraCostMatrix(edges_sql, start_vids)

pgr_dijkstracostmatrix

SELECT * FROM pgr_dijkstraCostMatrix(
    'SELECT id, source, target, cost, reverse_cost FROM edge_table',
    (SELECT array_agg(id) FROM edge_table_vertices_pgr WHERE id < 5)
);

pgr_dijkstracostmatrix

select * FROM pgr_dijkstraCostMatrix(
    'SELECT id, source, target, target_length_m as cost, reverse_cost FROM new_curbs_v2_linegraph',
    (select array_agg(source) from (select distinct source from new_curbs_v2_linegraph))
);

Problem: RAM limits

  • 9,194 curb segments means table with 84,529,636 entries
  • Process runs out of RAM!

Ugly hacks

  • Step through the curb table 3,000 at a time
  • Grab random bunches of under-represented origins
  • Rinse and repeat

First attempt at function

create or replace function fleshout_2000_curb_linegraph_matrix(starting int)
returns integer as
$BODY$
DECLARE
   i text;
   subsel text := 'SELECT id, source, target, target_length_m as cost, reverse_cost FROM new_curbs_v2_linegraph';
   insert_sql text := '';
   startid int := 0;
BEGIN
   insert_sql := '
   with
   id_block (sid) as (
      select distinct source
      from new_curbs_v2_linegraph nl
      where source > $1
      order by source
      limit 3000
      ),
    nextbatch as (
     select * FROM pgr_dijkstraCostMatrix('
   || quote_literal(subsel)
   ||',
      (select array_agg(sid) from id_block)))
   insert into new_curbs_linegraph_matrix
   select * from nextbatch
   on conflict do nothing';
   FOR startid IN starting..7000 by 1000 LOOP
       RAISE NOTICE 'populate db starting with %', startid;
       EXECUTE  insert_sql using startid;
   END LOOP;
   return  count(*) from new_curbs_linegraph_matrix;
END;
$BODY$
  LANGUAGE plpgsql;

Better approach: Sample from underrepresented

Sampling version of function

with
 sid_count (sid,cnt) as (
   select start_vid, count(*)
   from new_curbs_linegraph_matrix
   group by start_vid
   order by count),
 lo_block (sid) as (
   select sid from sid_count
   limit 500),
 pctl (hicount) as (
   SELECT percentile_cont(0.07) WITHIN GROUP (ORDER BY cnt) FROM sid_count ),
 hi_block (sid) as (
   select sid
   from sid_count
   cross join pctl
   where cnt > hicount
   order by random()
   limit 2500),
 id_block (sid) as (
   select sid from lo_block
   union
   select sid from hi_block),

The table is close enough

  • Each Origin should have 9193 destinations

with counts as (
   select start_vid,count(*) as cnt
   from new_curbs_linegraph_matrix group by start_vid)
select count(*),floor(cnt) from counts group by floor(cnt);
 count | floor
-------+-------
   201 |  9190
  2374 |  9191
  6607 |  9192
    11 |  9193
(4 rows)

Solve the Street Sweeping Problem

OR Tools to the rescue

  • OR Tools is great
  • But it isn’t PostgreSQL related
  • So I’ll talk about it some other time

Some benchmarks

  • My formulation takes about 20 minutes to generate an initial solution
  • Can run for hours
  • Difficult to get the “shape” of a solution right
  • Difficult to visualize the output

Save the generated paths

  • After solver finishes, generate a list of nodes “swept”
  • For deadhead nodes, use pgRouting to find intermediate nodes
    • Deadhead meaning drive without sweeping over several streets to get to a street that needs sweeping
  • Gather the list of all nodes each vehicle visits (sweep plus non-sweep)

Python code to save list of nodes to DB

def sequence_to_table(self,vsequence,table_name):
    sequence = 0
    insert_query_string = """insert into {} (veh,sweep,linkid,geom)
    select %s,%s,%s,c.curb_geom as the_geom
    from curbs_v2_graph c
    where c.curbid =%s"""
    insert_query = sql.SQL(insert_query_string).format(sql.Identifier(table_name))
    with self.conn.cursor() as cur:
        cur.execute(
            sql.SQL("drop table if exists {}").format(sql.Identifier(table_name)))

        cur.execute(
            sql.SQL("CREATE TABLE {} (id serial, linkid integer, veh integer, sweep boolean default TRUE, geom geometry(LINESTRING,4326) )").format(sql.Identifier(table_name)))

        for (veh,sequence) in vsequence.items():
            #print(sequence)
            for row in sequence:
                # basic idea.  Save a table.  Each row is
                # shape,sequence number
                sweep = row[1]
                linkid = row[0]
                cur.execute(insert_query,(veh,sweep,linkid,linkid))
        self.conn.commit()

Aside

  • Do not use Python string formatting to insert strings and variables into your generated SQL
  • Doing so is strongly discouraged by psycopg
  • Instead use sql.SQL, and pass parameters to execute
sql.SQL("drop table {}").format(sql.Identifier(table_name)))
...
cur.execute(insert_query,(veh,sweep,linkid,linkid))

Visualizing the output

QGIS plus PostGIS tables

  • The real reason I included geometry in output table
  • QGIS can directly display PostGIS geometry tables

Nice maps, but …

  • The maps are difficult to view
  • Routes are on top of each other
  • No sense of the movement of the vehicle

Try animating!

Use QGIS Atlas functionality

  • Image stack style animation
  • Make a print view
  • Control the print view with an “atlas”
  • Dump thousands of images to a directory
  • Use ffmpeg to stitch the images into a movie

Nausea-inducing results

Problem: inconsistent block sizes

  • Each “frame” of the animation is “the next block”
  • Blocks have different sizes
  • Inconsistent feet-per-animation-frame
  • Not a smooth animation

Use PostGIS to snip the roads

  • Break up the segments into pieces
  • Each frame adds a consistent distance to animation
  • Currently using 25 meters

ST_LineSubstring

ST_LineSubstring(geom, start_frac, end_frac)

  • geom: the linestring to process
  • start_frac: starting fraction for snipping the line (0 to 1)
  • end_frac: ending fraction for snipping the line (0 to 1)

Tip from the PostGIS docs

  • Use ST_LineSubstring to break line into N parts
  • Each part is from i to i+1, i = [0 .. N-1]
  • Use generate_series to generate the i values

PostGIS doc code:

SELECT field1, field2,
       ST_LineSubstring(the_geom, 100.00*n/len,
  CASE
    WHEN 100.00*(n+1) < len THEN 100.00*(n+1)/len
    ELSE 1
  END) AS the_geom
FROM
  (SELECT sometable.field1, sometable.field2,
  ST_LineMerge(sometable.the_geom) AS the_geom,
  ST_Length(sometable.the_geom) As len
  FROM sometable ) AS t
CROSS JOIN generate_series(0,10000) AS n
WHERE n*100.00/len < 1;

My modifications

  • Construct SQL with WITH statements
  • Compute required length of series based on longest road / 25 meters

Find the longest segment using one weird trick

with
 lengthshare as (
   select id,linkid,veh,sweep,geom,
          st_length(st_transform(geom,32611)) as len
   from solver_output
   order by id),
 maxlen as (
   select max(len) as len  from lengthshare ),

Determine “maxiter”


maxiter as (
   select (ceil(len/25.00)+1)::int as maxiter
   from maxlen
   )

Divide the longest length by 25, and round

Using maxiter, generate series


series as (
   select maxiter, generate_series(1,maxiter) - 1 as n
   from maxiter
   )

More flexible than the example code to determine maximum of series from data, rather than hard coding some big number.

Snip each line into pieces


snipped as (
   select id,
     id+(n/maxiter::numeric) as frame,
     linkid,veh,sweep,
     st_linesubstring(geom,
          25.00*n/len,
          case
            when 25.00*(n+1) < len then 25.00*(n+1)/len
            else 1
          end) as geom
   from lengthshare l
   cross join series s
   where s.n*25.00/len < 1
   order by frame )

Finally, save to new table


insert into
  solver_output_snipped (id,frame,linkid,veh,sweep,geom)
  select id,frame,linkid,veh,sweep,geom from snipped;

Small road section animation results

Bonus: Arrow heads!

  • Previous animation just showed current street
  • With snipped roads, can show progress along street (every 25m)
  • Arrow appears to move along streets
  • Looks more like a “real” animation

Still room for improvement

  • The frame jumps to keep vehicle (arrow) in center of frame
  • Okay when vehicle is moving straight
  • Ugly and jarring when vehicle changes direction

Key insight: need a POV table

  • Can model the point of view (POV) as its own table
  • POV table is mapped to its own QGIS layer
  • Keep the layer hidden
  • “Camera” movement should smoothly follow vehicle movement
  • The POV for each frame should be a spatial average

Make the POV table


with
  idgeom (uid, geom) as (
    select uid,
      st_collect(geom)
      over (order by uid
            rows between 2 preceding
                    and 10 following)
    from curbtable_globalspan_snipped  ),
  centroids (uid, geom) as (
    select uid,st_centroid(geom) as geom
    from idgeom   )
insert into pov (uid,geom)
select uid as uid,geom from centroids;

The result

  • A table of points for each frame
  • Point is the centroid of point window around current point
  • Centers the atlas window where needed

Leading average, small window: [-2, 10]

Bigger window: [-40, 50]

Even bigger window: [-90, 100]

Smoother animation with [-2, 10] window

Smoother animation with [-40, 50] window

Questions?

Thank you

Extras

FFPMEG

Sample ffmpeg command

ffmpeg -framerate 4 -i 'Maps/output_%04d.png'  -crf 25 -y Maps/out1.webm
-framerate 4
4 images per second
output_%04d.png
Input image file names are in numerical order

See https://ffmpeg.org/ffmpeg-formats.html#image2-1

How to Download OSM data

How to find OSM data for a city

Example: Port-au-Prince, Haiti

Download the Port-au-Prince polygon

  • At OSM website, search for Port-au-Prince
  • Click on boundary relation and note the relation number
  • Switch to the data API using the relation number to download the polygon

Search for “Port-au-Prince”

Search results, looking for “boundary”

Find relation id

Port-au-Prince boundary relation

Boundaries contain ways

Port-au-Prince boundary members

Access OSM data API

// reveal.js plugins