
Extracting a City Map

My last post went into a lot of detail on setting things up to solve edge covering problems. Despite the torrent of words I threw at the topic, I wanted to highlight the step of extracting map data from OpenStreetMap for a city. As with many of my posts, this one is motivated by a question on the OR-Tools mailing list. The original poster asked about generating walking routes to conduct a door-to-door survey of residents of Port-au-Prince, Haiti.

Finding OSM data

The first step to using OpenStreetMap (OSM) is figuring out where to find the data in the first place. The root source, of course, is OpenStreetMap itself, which offers the entire world map as a single download (the planet file). As that is a very large file, it is better to download a regional snapshot of the map from an alternative download server. One of my favorites is the Geofabrik download server, as it is regularly updated and well maintained. Using Port-au-Prince as an example, I first clicked on the Central America link and then on the Haiti link. From there I downloaded the haiti-and-domrep-latest.osm.pbf snapshot file. However, I’m interested in Port-au-Prince, not the entire island, and Geofabrik does not offer handy outlines of cities.
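The download can also be scripted. Here is a minimal sketch using only Python’s standard library. It assumes Geofabrik’s URL layout of https://download.geofabrik.de/<region path>-latest.osm.pbf, and the region path central-america/haiti-and-domrep is my reading of where the Haiti link leads, so check it against the site. The helper names are hypothetical.

```python
import shutil
import urllib.request

# Assumed base URL for Geofabrik extracts; verify the region path on the site.
GEOFABRIK_BASE = "https://download.geofabrik.de"


def geofabrik_url(region_path: str) -> str:
    """Build the URL of the latest PBF snapshot for a region path such as
    'central-america/haiti-and-domrep' (hypothetical helper)."""
    return f"{GEOFABRIK_BASE}/{region_path.strip('/')}-latest.osm.pbf"


def download(url: str, dest: str) -> None:
    """Stream the response to disk instead of holding it all in memory."""
    with urllib.request.urlopen(url) as resp, open(dest, "wb") as out:
        shutil.copyfileobj(resp, out)
```

For example, download(geofabrik_url("central-america/haiti-and-domrep"), "haiti-and-domrep-latest.osm.pbf") should fetch the same snapshot file. Streaming with shutil.copyfileobj keeps memory use flat even for large extracts.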

Unbelievable as it may seem, between preparing the original Glendale proposal and writing up the related blog post, I’d forgotten where I found the polygon for the city! (I swear all my tweets and blog posts are really just a way for my future self to remember things my past self already knows.) After a bit of grousing and hair-pulling, I finally remembered.

The OSM API allows direct access to nodes, ways, and relations. Typically, a city is defined as a relation that associates ways and nodes with the city itself. One can find the relation by using the search box on the OSM map interface.

For example, to see the city of Port-au-Prince, you can type “Port-au-Prince” into the search bar. This is shown in the screenshot to the right. Read down the suggestions and look for one that says “boundary”. In this case, it is the second search result, the one that is highlighted in yellow. Clicking on that link will take you to the OSM page showing the boundary relation for the city, as shown in the screenshot below.

The Port-au-Prince boundary relation page zooms to the city, showing its outline as well as the location of its administrative seat. Most cities in OSM have such a boundary relation, but since OSM is entirely the work of volunteers, some smaller cities may not have a boundary polygon defined.
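The search box on the OSM site is backed by the Nominatim geocoder, so the lookup can also be scripted. This is a sketch under a few assumptions: it uses Nominatim’s public /search endpoint with format=json, whose results carry osm_type, osm_id, and class fields, and the function names are hypothetical. Nominatim’s usage policy asks for an identifying User-Agent and a light request rate.

```python
import json
import urllib.parse
import urllib.request

NOMINATIM_SEARCH = "https://nominatim.openstreetmap.org/search"


def search_places(query: str, user_agent: str) -> list:
    """Query Nominatim and return the decoded list of result dicts."""
    params = urllib.parse.urlencode({"q": query, "format": "json", "limit": 10})
    req = urllib.request.Request(f"{NOMINATIM_SEARCH}?{params}",
                                 headers={"User-Agent": user_agent})
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def boundary_relation_ids(results: list) -> list:
    """Keep only boundary relations, mirroring the manual step of
    looking for the search result that says "boundary"."""
    return [int(r["osm_id"]) for r in results
            if r.get("osm_type") == "relation" and r.get("class") == "boundary"]
```

For example, boundary_relation_ids(search_places("Port-au-Prince", "my-survey-script")) should include the city’s boundary relation, though result ranking can change over time, so it is still worth confirming the match on the map.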

The boundary relation lists the ways and nodes that are its members. OpenStreetMap is made up of nodes and ways, with relations used to group together “related” nodes and ways. If you scroll down the data column on the left-hand side, you can see the relation’s members, as shown in the third screenshot. What we need to do is download the boundary relation along with its constituent nodes and ways.

Relations are identified by unique numbers. Those numbers can change, however, so it is best to look up the city or area by name first and then copy the relation number. In this case, the second screenshot shows the number identifying the Port-au-Prince boundary relation: 387318. The page also identifies the changeset that last touched the relation; in this case, it was three years ago.

To download the full relation, we need to switch from the OSM map URL to the URL for the data API. The OSM API is defined and documented on the OSM API wiki page. Specifically, we want the API version 0.6 GET call for reading a relation, as documented there.
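The switch from map URL to API URL is mechanical, so here is a small sketch of it. The helper names are hypothetical; the only assumption about the API is the /api/0.6/relation/<id> path shown in this post.

```python
import re

OSM_SITE = "https://www.openstreetmap.org"
API_VERSION = "0.6"


def relation_id_from_map_url(url: str) -> int:
    """Extract the numeric ID from a map page URL such as
    https://www.openstreetmap.org/relation/387318, ignoring any #map=... suffix."""
    match = re.search(r"/relation/(\d+)", url)
    if match is None:
        raise ValueError(f"no relation ID in {url!r}")
    return int(match.group(1))


def relation_api_url(relation_id: int) -> str:
    """Build the API 0.6 read URL for a relation."""
    return f"{OSM_SITE}/api/{API_VERSION}/relation/{relation_id}"
```

Going through the relation ID, rather than rewriting the URL string directly, means a map URL with a trailing #map=zoom/lat/lon fragment still produces a clean API URL.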

However, there is a little trick that I found somewhere at some point in time but often forget. Specifically, if you follow the API instructions, the URL for the Port-au-Prince relation should be:

https://www.openstreetmap.org/api/0.6/relation/387318

But if you download that URL, you won’t get what you want!

$ wget -O port-au-prince-poly.osm https://www.openstreetmap.org/api/0.6/relation/387318
--2019-11-19 12:29:17--  https://www.openstreetmap.org/api/0.6/relation/387318
Resolving www.openstreetmap.org ...
Connecting to www.openstreetmap.org ...
HTTP request sent, awaiting response... 200 OK
Length: unspecified [text/xml]
Saving to: ‘port-au-prince-poly.osm’

port-au-prince-poly.osm   [ <=>         ]   6.31K  --.-KB/s    in 0.03s

2019-11-19 12:29:19 (214 KB/s) - ‘port-au-prince-poly.osm’ saved [6463]

$ cat port-au-prince-poly.osm
<?xml version="1.0" encoding="UTF-8"?>
<osm version="0.6" generator="CGImap 0.7.5 (28928 thorn-03.openstreetmap.org)" copyright="OpenStreetMap and contributors" attribution="http://www.openstreetmap.org/copyright" license="http://opendatacommons.org/licenses/odbl/1-0/">
 <relation id="387318" visible="true" version="42" changeset="43835965" timestamp="2016-11-21T02:03:47Z" user="nyuriks" uid="339581">
  <member type="way" ref="48436422" role="outer"/>
  <member type="way" ref="48800769" role="outer"/>
  <member type="way" ref="48800768" role="outer"/>
...

All you get is the list of things included in the relation, not the things themselves. The trick is to append /full to the URL in order to download the actual contents of the relation. The URL should look like:

https://www.openstreetmap.org/api/0.6/relation/387318/full

The file at this URL can then be used in OSM-aware software tools as the boundary file defining Port-au-Prince, Haiti. You can download it with a web browser (saving it to a file) or with wget, as in wget -O port-au-prince-poly.osm ....
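Because the plain and /full downloads look alike at first glance, a quick sanity check helps. The following is a minimal sketch (the function name is hypothetical, standard library only) that parses OSM XML and reports which way and node members of a relation are missing from the file, along with any nodes those ways reference but the file lacks. Sub-relation members are not checked.

```python
import xml.etree.ElementTree as ET


def missing_members(osm_xml, relation_id: int) -> dict:
    """Return refs of the relation's way/node members (and of the nodes
    used by member ways) that are absent from the given OSM XML."""
    root = ET.fromstring(osm_xml)  # accepts str or bytes
    nodes = {el.get("id") for el in root.iter("node")}
    ways = {el.get("id"): el for el in root.iter("way")}

    rel = root.find(f"relation[@id='{relation_id}']")
    if rel is None:
        raise ValueError(f"relation {relation_id} not in file")

    missing = {"node": [], "way": []}
    for member in rel.iter("member"):
        kind, ref = member.get("type"), member.get("ref")
        if kind == "way":
            way = ways.get(ref)
            if way is None:
                missing["way"].append(ref)
                continue
            # A way is only usable if the nodes it references are present too.
            missing["node"].extend(nd.get("ref") for nd in way.iter("nd")
                                   if nd.get("ref") not in nodes)
        elif kind == "node" and ref not in nodes:
            missing["node"].append(ref)
    return missing
```

Given the bytes of port-au-prince-poly.osm and the ID 387318, it should return empty lists for the /full download and list every member way for the plain relation download, which is exactly the difference described above.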

So now I have both the area data file (for Haiti and Dominican Republic) and the outline definition for the city I am interested in (Port-au-Prince). The next step is to use the outline file to extract just the city’s data.

Extracting just a city’s data

The best tool these days for manipulating OSM data is Osmium. Because I want to load the city’s map data into PostgreSQL using pgRouting, I want the output file to be in OSM XML format, not PBF format (which osm2pgrouting cannot read). Osmium picks the output format from the .osm file extension. The command is quite simple, using the -p option to specify the Port-au-Prince boundary file as the polygon to extract from the larger OSM file.

osmium extract -p port-au-prince-poly.osm \
    -o port-au-prince-latest.osm \
    haiti-and-domrep-latest.osm.pbf

There are many configurable alternatives for extracting data, specifying whether a way should overlap the polygon, be completely contained within it, and so on; Osmium calls these extraction strategies. I prefer using the defaults, which should include all shapes that overlap the area of interest, and then I can use PostGIS to further refine the included shapes as needed. Consult the Osmium documentation for all of the possible options.
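If the extraction is part of a larger scripted pipeline, the same call can be driven from Python. This is a sketch: the helper names are hypothetical, it shells out to the osmium binary (which must be on the PATH), and the optional strategy is passed with -s, one of simple, complete_ways (the default), or smart. Leaving it out keeps the default, as in the command above.

```python
import shutil
import subprocess
from typing import List, Optional

STRATEGIES = {"simple", "complete_ways", "smart"}


def osmium_extract_cmd(polygon: str, output: str, source: str,
                       strategy: Optional[str] = None,
                       overwrite: bool = False) -> List[str]:
    """Build the argument list for `osmium extract -p POLYGON -o OUTPUT SOURCE`."""
    cmd = ["osmium", "extract", "-p", polygon, "-o", output]
    if strategy is not None:
        if strategy not in STRATEGIES:
            raise ValueError(f"unknown strategy {strategy!r}")
        cmd += ["-s", strategy]
    if overwrite:
        cmd.append("--overwrite")  # replace an existing output file
    cmd.append(source)
    return cmd


def run_extract(polygon: str, output: str, source: str, **options) -> None:
    """Run the extract, failing loudly if osmium is missing or exits non-zero."""
    if shutil.which("osmium") is None:
        raise FileNotFoundError("osmium not found on PATH")
    subprocess.run(osmium_extract_cmd(polygon, output, source, **options), check=True)
```

Building the argument list separately from running it makes the command easy to log or inspect before anything touches the filesystem.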

Loading into PostgreSQL

The next step is to load the data into PostgreSQL using osm2pgrouting. Because osm2pgrouting is pgRouting’s own loading tool, it automatically processes the streets into a proper routable network. The best source of more information is the osm2pgrouting documentation.

osm2pgrouting --file data/port-au-prince-latest.osm \
              --conf data/map_config_streets.xml \
              --dbname portauprince \
              --prefix 'portauprince_' \
              --username dbuser \
              --clean

The one interesting feature here is the map configuration file, map_config_streets.xml. The default configuration loads all roads into the database, and alternate standard configurations exist for loading transit and walking-type networks. In this case, I altered the default road configuration to exclude freeways. I originally wrote it for the Glendale work, after examining the types of roads that exist in that city, and I wanted to exclude freeways because the city was not responsible for cleaning them. Commenting out “motorway”, “motorway_link”, and “motorway_junction” omits all of Glendale’s freeway links (“service” roads are commented out as well). I also wanted to set a common “maxspeed” of 50 km/h for all streets, but I ended up not using it.

My XML configuration file looks like this:

<?xml version="1.0" encoding="UTF-8"?>
<configuration>
  <!-- note maxspeed="50" means kph, not mph -->
  <tag_name name="highway" id="1">
    <!-- <tag_value name="motorway"          id="101" priority="1.0" maxspeed="130" /> -->
    <!-- <tag_value name="motorway_link"     id="102" priority="1.0" maxspeed="130" /> -->
    <!-- <tag_value name="motorway_junction" id="103" priority="1.0" maxspeed="130" /> -->
    <tag_value name="trunk"             id="104" priority="1.05" maxspeed="50" />
    <tag_value name="trunk_link"        id="105" priority="1.05" maxspeed="50" />
    <tag_value name="primary"           id="106" priority="1.15" maxspeed="50" />
    <tag_value name="primary_link"      id="107" priority="1.15" maxspeed="50" />
    <tag_value name="secondary"         id="108" priority="1.5" maxspeed="50" />
    <tag_value name="secondary_link"    id="109" priority="1.5" maxspeed="50"/>
    <tag_value name="tertiary"          id="110" priority="1.75" maxspeed="50" />
    <tag_value name="tertiary_link"     id="111" priority="1.75" maxspeed="50" />
    <tag_value name="residential"       id="112" priority="2.5" maxspeed="50" />
    <tag_value name="living_street"     id="113" priority="3" maxspeed="50" />
    <!-- <tag_value name="service"           id="114" priority="2.5" maxspeed="50" /> -->

    <tag_value name="unclassified"      id="117" priority="3" maxspeed="50"/>
    <tag_value name="road"              id="100" priority="5" maxspeed="50" />
  </tag_name>
</configuration>
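Since the exclusions are done by commenting lines out, it is easy to lose track of what is actually active. Python’s ElementTree discards XML comments by default when parsing, so a few lines list exactly the tag values that remain enabled. This is a sketch; the function names and the file path in the usage note are just examples.

```python
import xml.etree.ElementTree as ET
from typing import Dict, Optional


def active_tag_values(root: ET.Element, tag: str = "highway") -> Dict[str, Optional[float]]:
    """Map each enabled value of `tag` (e.g. 'residential') to its priority."""
    values: Dict[str, Optional[float]] = {}
    for tag_name in root.iter("tag_name"):
        if tag_name.get("name") != tag:
            continue
        for value in tag_name.iter("tag_value"):
            priority = value.get("priority")
            values[value.get("name")] = float(priority) if priority is not None else None
    return values


def load_config(path: str) -> ET.Element:
    """Parse the config file; the default parser drops XML comments, so
    commented-out <tag_value> lines never reach active_tag_values()."""
    return ET.parse(path).getroot()
```

Run against the file above, active_tag_values(load_config("data/map_config_streets.xml")) should report twelve values, trunk through road, with no motorway or service entries.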

Running the above command processes the data and loads it into the database. The larger the data set, the longer it takes. On my laptop, running under Docker, the Port-au-Prince data file takes about 25 seconds to load. The output is as follows (this particular run loaded into my existing glendale database as user slash, so those two values differ from the command above):

Execution starts at: Mon Dec  2 12:49:51 2019

***************************************************
           COMMAND LINE CONFIGURATION             *
***************************************************
Filename = data/port-au-prince-latest.osm
Configuration file = data/map_config_streets.xml
host = localhost
port = 5432
dbname = glendale
username = slash
schema=
prefix = portauprince_
suffix =
Drop tables
Don't create indexes
Don't add OSM nodes
***************************************************
Testing database connection: glendale
database connection successful: glendale
Connecting to the database
connection success

Dropping tables...
TABLE: portauprince_ways dropped ... OK.
TABLE: portauprince_ways_vertices_pgr dropped ... OK.
TABLE: portauprince_pointsofinterest dropped ... OK.
TABLE: configuration dropped ... OK.
TABLE: osm_nodes dropped ... OK.
TABLE: osm_ways dropped ... OK.
TABLE: osm_relations dropped ... OK.

Creating tables...
TABLE: portauprince_ways_vertices_pgr created ... OK.
TABLE: portauprince_ways created ... OK.
TABLE: portauprince_pointsofinterest created ... OK.
TABLE: configuration created ... OK.
Opening configuration file: data/map_config_streets.xml
    Parsing configuration

Exporting configuration ...
  - Done
Counting lines ...
  - Done
Opening data file: data/port-au-prince-latest.osm   total lines: 2725680
    Parsing data
    Finish Parsing data

Adding auxiliary tables to database...

Export Ways ...
    Processing 169321 ways:
[**|                ] (11%) Total processed: 20000  Vertices inserted: 15566    Split ways inserted 15581
[****|              ] (23%) Total processed: 40000  Vertices inserted: 2861 Split ways inserted 4235
[******|            ] (35%) Total processed: 60000  Vertices inserted: 2040 Split ways inserted 3743
[********|          ] (47%) Total processed: 80000  Vertices inserted: 122  Split ways inserted 274
[**********|        ] (59%) Total processed: 100000     Vertices inserted: 342  Split ways inserted 324
[************|      ] (70%) Total processed: 120000     Vertices inserted: 29   Split ways inserted 41
[**************|    ] (82%) Total processed: 140000     Vertices inserted: 10   Split ways inserted 21
[****************|  ] (94%) Total processed: 160000     Vertices inserted: 185  Split ways inserted 340
[******************|] (100%) Total processed: 169321    Vertices inserted: 296  Split ways inserted 495

Creating indexes ...

Processing Points of Interest ...
#########################
size of streets: 169321
Execution started at: Mon Dec  2 12:49:51 2019
Execution ended at:   Mon Dec  2 12:50:16 2019
Elapsed time: 25.153 Seconds.
User CPU time: -> 19.4725 seconds
#########################
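The “Vertices inserted” and “Split ways inserted” counts above come from osm2pgrouting cutting OSM ways into routable edges. The idea is easy to see in miniature: a node becomes a graph vertex if it ends a way or is shared by two or more ways, and each way is split at those vertices. The sketch below is my own illustration on plain Python lists, not osm2pgrouting’s actual code, and the function name is hypothetical.

```python
from collections import Counter
from typing import Dict, List, Set, Tuple

# (source vertex, target vertex, node IDs along the edge)
Edge = Tuple[int, int, List[int]]


def split_ways(ways: Dict[int, List[int]]) -> Tuple[Set[int], Dict[int, List[Edge]]]:
    """Cut each way (an ordered list of node IDs) into edges between vertices.

    A node is a vertex if it is the first or last node of a way, or if it
    appears in more than one way (i.e. it is an intersection).
    """
    # Count distinct ways per node; set() avoids double-counting the
    # repeated start/end node of a closed way.
    usage = Counter(node for nodes in ways.values() for node in set(nodes))

    vertices: Set[int] = set()
    for nodes in ways.values():
        if nodes:
            vertices.update((nodes[0], nodes[-1]))
    vertices.update(node for node, count in usage.items() if count > 1)

    edges: Dict[int, List[Edge]] = {}
    for way_id, nodes in ways.items():
        pieces: List[Edge] = []
        start = 0
        for i in range(1, len(nodes)):
            if nodes[i] in vertices:
                segment = nodes[start:i + 1]
                pieces.append((segment[0], segment[-1], segment))
                start = i
        edges[way_id] = pieces
    return vertices, edges
```

For two ways that cross at node 3, split_ways({10: [1, 2, 3, 4], 20: [5, 3, 6]}) yields five vertices and four edges, with way 10 cut into 1→3 and 3→4.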

And with that, the data for Port-au-Prince is loaded into PostgreSQL. The figure below was rendered in QGIS by drawing the portauprince_ways table directly, layered on top of standard OSM tiles. The line segments are colored by the “priority” values specified in the configuration XML file above.

Port-au-Prince data, rendered in QGIS from the portauprince_ways table and colored by the priority values in the configuration XML file