Network Distance with LION map

We will attempt to use networkx library with the lion map to calculate network distance for people walking between two points - Download the zipfile - Extract and save the zipfile - Open the line shapefile - Extract nodes - Calculate network distance!

We will use the following libraries to achieve our purpose.

[1]:
import geopandas as gpd
import pandas as pd
import matplotlib.pyplot as plt
import networkx as nx
from shapely.geometry import box, LineString, Point,MultiPoint
import os
import sys
import requests
from zipfile import ZipFile as zzip
import fiona
from scipy.spatial import cKDTree
import numpy as np

sys.path.append(os.path.realpath('..'))
[2]:
os.getcwd()
[2]:
'H:\\Work\\GIS_exploration\\code'
[3]:
print(fiona.__version__)
1.8.6

Downloading and Extracting the NYC Lion file

  • This does not need to be run repeatedly

[4]:
url = r"https://www1.nyc.gov/assets/planning/download/zip/data-maps/open-data/nyclion_19b.zip"

# download the file contents in binary format
r = requests.get(url)
# open method to open a file on your system and write the contents
with open("../input_data/nyclion_19b.zip", "wb") as file:
    file.write(r.content)
[7]:
fp = "../input_data/nyclion_19b.zip"
# opening the zip file in READ mode
with zzip(fp, 'r') as file:
    # printing all the contents of the zip file
    file.printdir()

    # extracting all the files
    #rint('Extracting all the files now...')
    file.extractall("../input_data/")
    print('Done!')
File Name                                             Modified             Size
lion/lion.gdb/a00000001.freelist               2019-05-13 12:48:56         4440
lion/lion.gdb/a00000001.gdbindexes             2019-05-13 12:42:56          110
lion/lion.gdb/a00000001.gdbtable               2019-05-13 12:48:56          385
lion/lion.gdb/a00000001.gdbtablx               2019-05-13 12:48:56         5152
lion/lion.gdb/a00000001.TablesByName.atx       2019-05-13 12:48:56         4118
lion/lion.gdb/a00000002.gdbtable               2019-05-13 12:42:56         2055
lion/lion.gdb/a00000002.gdbtablx               2019-05-13 12:42:56         5152
lion/lion.gdb/a00000003.gdbindexes             2019-05-13 12:42:56           42
lion/lion.gdb/a00000003.gdbtable               2019-05-13 12:44:22         1825
lion/lion.gdb/a00000003.gdbtablx               2019-05-13 12:44:22         5152
lion/lion.gdb/a00000004.CatItemsByPhysicalName.atx 2019-05-13 12:48:56         4118
lion/lion.gdb/a00000004.CatItemsByType.atx     2019-05-13 12:48:56         4118
lion/lion.gdb/a00000004.FDO_UUID.atx           2019-05-13 12:48:56         4118
lion/lion.gdb/a00000004.freelist               2019-05-13 12:49:38        57688
lion/lion.gdb/a00000004.gdbindexes             2019-05-13 12:42:56          310
lion/lion.gdb/a00000004.gdbtable               2019-05-13 12:49:38       399741
lion/lion.gdb/a00000004.gdbtablx               2019-05-13 12:49:38         5152
lion/lion.gdb/a00000004.spx                    2019-05-13 12:49:30        45078
lion/lion.gdb/a00000005.CatItemTypesByName.atx 2019-05-13 12:42:56        12310
lion/lion.gdb/a00000005.CatItemTypesByParentTypeID.atx 2019-05-13 12:42:56         4118
lion/lion.gdb/a00000005.CatItemTypesByUUID.atx 2019-05-13 12:42:56         4118
lion/lion.gdb/a00000005.gdbindexes             2019-05-13 12:42:56          296
lion/lion.gdb/a00000005.gdbtable               2019-05-13 12:42:56         1803
lion/lion.gdb/a00000005.gdbtablx               2019-05-13 12:42:56         5152
lion/lion.gdb/a00000006.CatRelsByDestinationID.atx 2019-05-13 12:48:56         4118
lion/lion.gdb/a00000006.CatRelsByOriginID.atx  2019-05-13 12:48:56         4118
lion/lion.gdb/a00000006.CatRelsByType.atx      2019-05-13 12:48:56         4118
lion/lion.gdb/a00000006.FDO_UUID.atx           2019-05-13 12:48:56         4118
lion/lion.gdb/a00000006.freelist               2019-05-13 12:48:56         4440
lion/lion.gdb/a00000006.gdbindexes             2019-05-13 12:42:56          318
lion/lion.gdb/a00000006.gdbtable               2019-05-13 12:48:56          701
lion/lion.gdb/a00000006.gdbtablx               2019-05-13 12:48:56         5152
lion/lion.gdb/a00000007.CatRelTypesByBackwardLabel.atx 2019-05-13 12:42:56         4118
lion/lion.gdb/a00000007.CatRelTypesByDestItemTypeID.atx 2019-05-13 12:42:56         4118
lion/lion.gdb/a00000007.CatRelTypesByForwardLabel.atx 2019-05-13 12:42:56         4118
lion/lion.gdb/a00000007.CatRelTypesByName.atx  2019-05-13 12:42:56         4118
lion/lion.gdb/a00000007.CatRelTypesByOriginItemTypeID.atx 2019-05-13 12:42:56         4118
lion/lion.gdb/a00000007.CatRelTypesByUUID.atx  2019-05-13 12:42:56         4118
lion/lion.gdb/a00000007.gdbindexes             2019-05-13 12:42:56          602
lion/lion.gdb/a00000007.gdbtable               2019-05-13 12:42:56         2504
lion/lion.gdb/a00000007.gdbtablx               2019-05-13 12:42:56         5152
lion/lion.gdb/a0000000a.gdbindexes             2019-05-13 12:44:48          116
lion/lion.gdb/a0000000a.gdbtable               2019-05-13 12:44:48      3247250
lion/lion.gdb/a0000000a.gdbtablx               2019-05-13 12:44:48       665632
lion/lion.gdb/a0000000a.spx                    2019-05-13 12:44:48      1609750
lion/lion.gdb/a0000000b.gdbindexes             2019-05-13 12:44:56           66
lion/lion.gdb/a0000000b.gdbtable               2019-05-13 12:45:24      5491953
lion/lion.gdb/a0000000b.gdbtablx               2019-05-13 12:45:24      1172512
lion/lion.gdb/a0000000c.gdbindexes             2019-05-13 12:45:30           66
lion/lion.gdb/a0000000c.gdbtable               2019-05-13 12:45:42      6406110
lion/lion.gdb/a0000000c.gdbtablx               2019-05-13 12:45:42       527392
lion/lion.gdb/a0000000d.freelist               2019-05-13 12:48:44        20824
lion/lion.gdb/a0000000d.gdbindexes             2019-05-13 12:48:30          116
lion/lion.gdb/a0000000d.gdbtable               2019-05-13 12:48:44    126137796
lion/lion.gdb/a0000000d.gdbtablx               2019-05-13 12:48:44      1136672
lion/lion.gdb/a0000000d.spx                    2019-05-13 12:48:30      4362262
lion/lion.gdb/gdb                              2019-05-13 12:42:56            4
lion/lion.gdb/timestamps                       2019-05-13 12:47:14          400
lion/LION - Street Name Labels.lyr             2013-09-24 18:10:22        18432
lion/LION Streets - Generic.lyr                2017-02-10 07:51:36        21504
lion/LION Streets - Roadbeds.lyr               2017-02-10 07:54:38        21504
lion/LION - Generic.lyr                        2017-02-10 07:48:08        24576
lion/LION - Roadbeds.lyr                       2017-02-10 07:49:32        24576
lion/LION - Nodes.lyr                          2012-08-17 13:19:50        11776
lion/LION - Street Direction Arrows (ArcGIS 10x).lyr 2013-01-15 10:58:06        22528
lion/LION - Street Direction Arrows (Requires Maplex).lyr 2013-03-12 14:22:44        22528
lion/TrafficRepArrows.style                    2012-09-27 08:32:40       598016
lion/ReadMe.txt                                2014-03-17 15:27:04         7758
altnames_metadata.html                         2019-05-13 12:49:22        38976
altnames_metadata.pdf                          2019-05-13 14:35:58       129541
lion_metadata.html                             2019-05-13 12:49:12       199830
lion_metadata.pdf                              2019-05-13 14:35:22       483768
node_metadata.html                             2019-05-13 12:49:32        42396
node_metadata.pdf                              2019-05-13 14:34:28       135027
node_stname_metadata.html                      2019-05-13 12:49:40        32999
node_stname_metadata.pdf                       2019-05-13 14:36:56       123835
Done!

Get layers from GDB

[8]:
gdb_file = r"../input_data/lion/lion.gdb"
layers = fiona.listlayers(gdb_file)

View the layers

[9]:
for layer in layers:
    print(layer)
node
node_stname
altnames
lion

Open the lion layer

It was impossible to read in this layer with the old version of Fiona. Please ensure that your Fiona version is the most up to date.

[10]:
gdb_file = r"../input_data/lion/lion.gdb"
lion_gdf = gpd.read_file(gdb_file, driver='FileGDB', layer='lion')
[11]:
lion_gdf.crs
[11]:
{'init': 'epsg:2263'}
[12]:
lion_gdf = lion_gdf.to_crs({'init': 'epsg:4326'})
[14]:
lion_gdf.plot(figsize = (10,10));
_images/networkdistance_lion_16_0.png
[13]:
lion_gdf.head()
[13]:
Street SAFStreetName FeatureTyp SegmentTyp IncExFlag RB_Layer NonPed TrafDir TrafSrc SpecAddr ... LHi_Hyphen RLo_Hyphen RHi_Hyphen FromLeft ToLeft FromRight ToRight Join_ID SHAPE_Length geometry
0 EAST 168 STREET 0 U B T DOT ... 699 596 716 599 699 596 716 2251001000000 396.030947 (LINESTRING (-73.90346685871668 40.83035379646...
1 WEST 192 STREET 0 U B A DOT ... 98 63 99 58 98 63 99 2798401000000 279.360514 (LINESTRING (-73.9012006771837 40.866613078249...
2 UNION AVENUE 0 U B W DOT ... 1079 1016 1084 1017 1079 1016 1084 2728001000000 618.327133 (LINESTRING (-73.90117669122948 40.82438890917...
3 UNION AVENUE BEHAGEN PLAYGROUND 0 U B W DOT N ... None None None 0 0 0 0 21279501000000N 618.327133 (LINESTRING (-73.90117669122948 40.82438890917...
4 DELAFIELD AVENUE 6 U B T DOT ... 4645 4600 4664 4601 4645 4600 4664 2187601000000 670.281037 (LINESTRING (-73.90695663477764 40.89360793519...

5 rows × 118 columns

We need to clean the lion_gdf a little so that non-pedestrians accessible routes are not included.

FeatureTyp <> 'F' AND FeatureTyp <> '9' AND FeatureTyp <> '1' AND FeatureTyp <> '7' AND FeatureTyp <> '3' AND TrafDir <> '' AND NonPed <> 'V'

Source: https://pointsunknown.nyc/tutorials/04_Q3_SpatialAnalysis.html

[14]:
lion_gdf['Street'].count()
[14]:
226977
[15]:
lion_gdf['todrop'] = (lion_gdf['NonPed'] == 'V')
[16]:
lion_gdf['todrop'].value_counts()
[16]:
False    201908
True      25069
Name: todrop, dtype: int64
[17]:
lion_gdf['todrop'] = lion_gdf['FeatureTyp'].isin(['F','9','1','7','3'])
[18]:
lion_gdf['todrop'].value_counts()
[18]:
False    202573
True      24404
Name: todrop, dtype: int64
[19]:
lion_gdf['TrafDir'].value_counts()
[19]:
T    96024
W    44976
A    40322
     34499
P    11156
Name: TrafDir, dtype: int64
[20]:
lion_gdf['todrop'] = lion_gdf['TrafDir'].isin([' '])
[21]:
lion_gdf['todrop'].value_counts()
[21]:
False    192478
True      34499
Name: todrop, dtype: int64
[22]:
lion_gdf['todrop'] = (lion_gdf['NonPed'] == 'V') |
                    (lion_gdf['FeatureTyp'].isin(['F','9','1','7','3'])) |
                    (lion_gdf['TrafDir'].isin([' ']))
[23]:
lion_gdf['todrop'].value_counts()
[23]:
False    166475
True      60502
Name: todrop, dtype: int64

Save the clean version in a new dataframe

[24]:
clean_lion_gdf = gpd.GeoDataFrame(lion_gdf.loc[lion_gdf['todrop'] == False])

Checking that those observations were dropped

[25]:
(clean_lion_gdf['NonPed'] == 'V').value_counts()
[25]:
False    166475
Name: NonPed, dtype: int64
[26]:
clean_lion_gdf['FeatureTyp'].isin(['F','9','1','7','3']).value_counts()
[26]:
False    166475
Name: FeatureTyp, dtype: int64
[27]:
clean_lion_gdf['TrafDir'].isin([' ']).value_counts()
[27]:
False    166475
Name: TrafDir, dtype: int64
[28]:
clean_lion_gdf.drop(['todrop'], axis = 1, inplace = True)
[113]:
clean_lion_gdf.plot(figsize=(10,10))
[113]:
<matplotlib.axes._subplots.AxesSubplot at 0x23137539a90>
_images/networkdistance_lion_36_1.png

Save the lion layer as a shapefile

[29]:
fp = r"../intermediate_data/clean_lion_line.shp"
clean_lion_gdf.to_file(fp)

Test case

We will use two points that are across from each other, but has a highway between them. The network analysis route should move along the road until it find a road that can cross under the highway.

To see the points:

Destination point: https://www.google.com/maps/place/40%C2%B044’13.9%22N+73%C2%B051’50.6%22W/@40.737199,-73.8662467,17z/data=!3m1!4b1!4m5!3m4!1s0x0:0x0!8m2!3d40.737199!4d-73.864058

Origin point: https://www.google.com/maps/place/40%C2%B043’56.9%22N+73%C2%B051’42.0%22W/@40.732484,-73.8638547,17z/data=!3m1!4b1!4m5!3m4!1s0x0:0x0!8m2!3d40.732484!4d-73.861666

[31]:
fp = r"../intermediate_data/clean_lion_line.shp"
lion_graph = nx.read_shp(fp)
[32]:
lion_graph
[32]:
<networkx.classes.digraph.DiGraph at 0x1fbe3e6bf60>

This is the key to calculating the right network distance. The graph must be undirected for pedestrian distance.

[33]:
undirected_lion_graph = lion_graph.to_undirected()
[34]:
undirected_lion_graph
[34]:
<networkx.classes.graph.Graph at 0x1fc0f464400>

Writing the Graph to shapefile creates the nodes and edges

[35]:
fp = r"../intermediate_data"
nx.write_shp(lion_graph, fp)

Read the node as geopandas dataframes

[36]:
fp = r"../intermediate_data/nodes.shp"
node_gdf = gpd.read_file(fp)
[37]:
node_gdf.head()
[37]:
FID geometry
0 0 POINT (-73.90346685871668 40.83035379646685)
1 1 POINT (-73.9023800533195 40.82964662740077)
2 2 POINT (-73.9012006771837 40.86661307824997)
3 3 POINT (-73.90207336386543 40.86699910556131)
4 4 POINT (-73.90117669122948 40.82438890917424)
[38]:
node_gdf.crs = {'init': 'epsg:4326'}
[41]:
node_gdf.plot(figsize=(10,10),markersize=1);
_images/networkdistance_lion_51_0.png
[42]:
target_xy = (-73.864058, 40.737199)
orig_xy = (-73.861666,40.732484)

We place the geometry of the nodes into the an numpy array and put the array into scipy’s cKDTree. This turns the “index into a set of k-dimensional points which can be used to rapidly look up the nearest neighbors of any point.” Super fast nearest-neighbor matching.

[43]:
nB = np.array(list(zip(node_gdf.geometry.x, node_gdf.geometry.y)) )
btree = cKDTree(nB)

Here we are finding the nearest node to our target_xy.

[44]:
dist, target_node = btree.query(target_xy,k=1)
print(dist, target_node)
3.7438436323677245e-05 66053

The distance measure above will be important for us to calculate the entire distance (distance from origin_xy to orgin_node + network distance + distance from target node to target_xy).

Lets view the node and the geometry of that node.

[45]:
node_gdf['geometry'].loc[node_gdf['FID'] == 66053]
[45]:
66053    POINT (-73.86404739259665 40.73716309568955)
Name: geometry, dtype: object

Repeat for the origin node.

[46]:
dist, orig_node = btree.query(orig_xy,k=1)
print(dist, orig_node)
2.6730270855641812e-05 76256
[47]:
node_gdf['geometry'].loc[node_gdf['FID'] == 76256]
[47]:
76256    POINT (-73.86164235281636 40.73247153733231)
Name: geometry, dtype: object

View the contents in the graph

[67]:
list(undirected_lion_graph.nodes)[0:10]
[67]:
[(-73.90346685871668, 40.83035379646685),
 (-73.9023800533195, 40.82964662740077),
 (-73.9012006771837, 40.86661307824997),
 (-73.90207336386543, 40.866999105561305),
 (-73.90117669122948, 40.82438890917424),
 (-73.90050540472349, 40.826007622001576),
 (-73.90695663477764, 40.89360793519608),
 (-73.90695595783123, 40.89544764345081),
 (-73.90706510075327, 40.89927414148254),
 (-73.9071219824245, 40.89929111645273)]

In the undirected_lion_graph, the labels of the nodes are by default the coordinates. We will change it to intergers for ease of finding them.

[68]:
undirected_lion_graph = nx.convert_node_labels_to_integers(undirected_lion_graph, first_label = 0)
[69]:
list(undirected_lion_graph.nodes)[0:10]
[69]:
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

Checking that undirected_lion_graph has the nodes from above.

[70]:
#target node
undirected_lion_graph.has_node(target_node)
[70]:
True
[71]:
#origin node
undirected_lion_graph.has_node(orig_node)
[71]:
True

Network analysis!

[72]:
route = nx.shortest_path(G=undirected_lion_graph, source=orig_node, target=target_node, weight = None)

Capture the route nodes and turn it into a LineString

[73]:
route_nodes = node_gdf.loc[route]
[74]:
route_line = LineString(list(route_nodes.geometry.values))

Make a geodataframe to store the data

[75]:
route_geom = gpd.GeoDataFrame(crs=node_gdf.crs)
route_geom['geometry'] = None
route_geom['id'] = None

Add the information into the geodataframe

[76]:
route_geom.loc[0, 'geometry'] = route_line
[77]:
route_geom.loc[0, 'id'] = str(list(route_nodes['FID'].values))

Now is the time to convert the data into epsg 2263.

[78]:
route_geom.crs
[78]:
{'init': 'epsg:4326'}
[79]:
route_geom = route_geom.to_crs({'init': 'epsg:2263'})

Calculate network distance of the route!

[80]:
route_geom['length_ft'] = route_geom.length
route_geom['length_miles'] = route_geom['length_ft'] *0.000189394
[81]:
route_geom.head()
[81]:
geometry id length_ft length_miles
0 LINESTRING (1022595.221231206 206165.993610297... [76256, 58177, 66289, 76111, 72542, 72541, 813... 2893.217327 0.547958
[82]:
route_geom.plot(figsize = (10,10));
_images/networkdistance_lion_87_0.png
[83]:
fp =  r"..\intermediate_data\lion_test_route_shapefile.shp"
route_geom[['id','geometry']].to_file(fp)