Network Distance with LION map¶
We will attempt to use networkx library with the lion map to calculate network distance for people walking between two points - Download the zipfile - Extract and save the zipfile - Open the line shapefile - Extract nodes - Calculate network distance!
We will use the following libraries to achieve our purpose.
[1]:
import geopandas as gpd
import pandas as pd
import matplotlib.pyplot as plt
import networkx as nx
from shapely.geometry import box, LineString, Point,MultiPoint
import os
import sys
import requests
from zipfile import ZipFile as zzip
import fiona
from scipy.spatial import cKDTree
import numpy as np
sys.path.append(os.path.realpath('..'))
[2]:
os.getcwd()
[2]:
'H:\\Work\\GIS_exploration\\code'
[3]:
print(fiona.__version__)
1.8.6
Downloading and Extracting the NYC Lion file¶
This does not need to be run repeatedly
[4]:
url = r"https://www1.nyc.gov/assets/planning/download/zip/data-maps/open-data/nyclion_19b.zip"
# download the file contents in binary format
r = requests.get(url)
# open method to open a file on your system and write the contents
with open("../input_data/nyclion_19b.zip", "wb") as file:
file.write(r.content)
[7]:
fp = "../input_data/nyclion_19b.zip"
# opening the zip file in READ mode
with zzip(fp, 'r') as file:
# printing all the contents of the zip file
file.printdir()
# extracting all the files
#rint('Extracting all the files now...')
file.extractall("../input_data/")
print('Done!')
File Name Modified Size
lion/lion.gdb/a00000001.freelist 2019-05-13 12:48:56 4440
lion/lion.gdb/a00000001.gdbindexes 2019-05-13 12:42:56 110
lion/lion.gdb/a00000001.gdbtable 2019-05-13 12:48:56 385
lion/lion.gdb/a00000001.gdbtablx 2019-05-13 12:48:56 5152
lion/lion.gdb/a00000001.TablesByName.atx 2019-05-13 12:48:56 4118
lion/lion.gdb/a00000002.gdbtable 2019-05-13 12:42:56 2055
lion/lion.gdb/a00000002.gdbtablx 2019-05-13 12:42:56 5152
lion/lion.gdb/a00000003.gdbindexes 2019-05-13 12:42:56 42
lion/lion.gdb/a00000003.gdbtable 2019-05-13 12:44:22 1825
lion/lion.gdb/a00000003.gdbtablx 2019-05-13 12:44:22 5152
lion/lion.gdb/a00000004.CatItemsByPhysicalName.atx 2019-05-13 12:48:56 4118
lion/lion.gdb/a00000004.CatItemsByType.atx 2019-05-13 12:48:56 4118
lion/lion.gdb/a00000004.FDO_UUID.atx 2019-05-13 12:48:56 4118
lion/lion.gdb/a00000004.freelist 2019-05-13 12:49:38 57688
lion/lion.gdb/a00000004.gdbindexes 2019-05-13 12:42:56 310
lion/lion.gdb/a00000004.gdbtable 2019-05-13 12:49:38 399741
lion/lion.gdb/a00000004.gdbtablx 2019-05-13 12:49:38 5152
lion/lion.gdb/a00000004.spx 2019-05-13 12:49:30 45078
lion/lion.gdb/a00000005.CatItemTypesByName.atx 2019-05-13 12:42:56 12310
lion/lion.gdb/a00000005.CatItemTypesByParentTypeID.atx 2019-05-13 12:42:56 4118
lion/lion.gdb/a00000005.CatItemTypesByUUID.atx 2019-05-13 12:42:56 4118
lion/lion.gdb/a00000005.gdbindexes 2019-05-13 12:42:56 296
lion/lion.gdb/a00000005.gdbtable 2019-05-13 12:42:56 1803
lion/lion.gdb/a00000005.gdbtablx 2019-05-13 12:42:56 5152
lion/lion.gdb/a00000006.CatRelsByDestinationID.atx 2019-05-13 12:48:56 4118
lion/lion.gdb/a00000006.CatRelsByOriginID.atx 2019-05-13 12:48:56 4118
lion/lion.gdb/a00000006.CatRelsByType.atx 2019-05-13 12:48:56 4118
lion/lion.gdb/a00000006.FDO_UUID.atx 2019-05-13 12:48:56 4118
lion/lion.gdb/a00000006.freelist 2019-05-13 12:48:56 4440
lion/lion.gdb/a00000006.gdbindexes 2019-05-13 12:42:56 318
lion/lion.gdb/a00000006.gdbtable 2019-05-13 12:48:56 701
lion/lion.gdb/a00000006.gdbtablx 2019-05-13 12:48:56 5152
lion/lion.gdb/a00000007.CatRelTypesByBackwardLabel.atx 2019-05-13 12:42:56 4118
lion/lion.gdb/a00000007.CatRelTypesByDestItemTypeID.atx 2019-05-13 12:42:56 4118
lion/lion.gdb/a00000007.CatRelTypesByForwardLabel.atx 2019-05-13 12:42:56 4118
lion/lion.gdb/a00000007.CatRelTypesByName.atx 2019-05-13 12:42:56 4118
lion/lion.gdb/a00000007.CatRelTypesByOriginItemTypeID.atx 2019-05-13 12:42:56 4118
lion/lion.gdb/a00000007.CatRelTypesByUUID.atx 2019-05-13 12:42:56 4118
lion/lion.gdb/a00000007.gdbindexes 2019-05-13 12:42:56 602
lion/lion.gdb/a00000007.gdbtable 2019-05-13 12:42:56 2504
lion/lion.gdb/a00000007.gdbtablx 2019-05-13 12:42:56 5152
lion/lion.gdb/a0000000a.gdbindexes 2019-05-13 12:44:48 116
lion/lion.gdb/a0000000a.gdbtable 2019-05-13 12:44:48 3247250
lion/lion.gdb/a0000000a.gdbtablx 2019-05-13 12:44:48 665632
lion/lion.gdb/a0000000a.spx 2019-05-13 12:44:48 1609750
lion/lion.gdb/a0000000b.gdbindexes 2019-05-13 12:44:56 66
lion/lion.gdb/a0000000b.gdbtable 2019-05-13 12:45:24 5491953
lion/lion.gdb/a0000000b.gdbtablx 2019-05-13 12:45:24 1172512
lion/lion.gdb/a0000000c.gdbindexes 2019-05-13 12:45:30 66
lion/lion.gdb/a0000000c.gdbtable 2019-05-13 12:45:42 6406110
lion/lion.gdb/a0000000c.gdbtablx 2019-05-13 12:45:42 527392
lion/lion.gdb/a0000000d.freelist 2019-05-13 12:48:44 20824
lion/lion.gdb/a0000000d.gdbindexes 2019-05-13 12:48:30 116
lion/lion.gdb/a0000000d.gdbtable 2019-05-13 12:48:44 126137796
lion/lion.gdb/a0000000d.gdbtablx 2019-05-13 12:48:44 1136672
lion/lion.gdb/a0000000d.spx 2019-05-13 12:48:30 4362262
lion/lion.gdb/gdb 2019-05-13 12:42:56 4
lion/lion.gdb/timestamps 2019-05-13 12:47:14 400
lion/LION - Street Name Labels.lyr 2013-09-24 18:10:22 18432
lion/LION Streets - Generic.lyr 2017-02-10 07:51:36 21504
lion/LION Streets - Roadbeds.lyr 2017-02-10 07:54:38 21504
lion/LION - Generic.lyr 2017-02-10 07:48:08 24576
lion/LION - Roadbeds.lyr 2017-02-10 07:49:32 24576
lion/LION - Nodes.lyr 2012-08-17 13:19:50 11776
lion/LION - Street Direction Arrows (ArcGIS 10x).lyr 2013-01-15 10:58:06 22528
lion/LION - Street Direction Arrows (Requires Maplex).lyr 2013-03-12 14:22:44 22528
lion/TrafficRepArrows.style 2012-09-27 08:32:40 598016
lion/ReadMe.txt 2014-03-17 15:27:04 7758
altnames_metadata.html 2019-05-13 12:49:22 38976
altnames_metadata.pdf 2019-05-13 14:35:58 129541
lion_metadata.html 2019-05-13 12:49:12 199830
lion_metadata.pdf 2019-05-13 14:35:22 483768
node_metadata.html 2019-05-13 12:49:32 42396
node_metadata.pdf 2019-05-13 14:34:28 135027
node_stname_metadata.html 2019-05-13 12:49:40 32999
node_stname_metadata.pdf 2019-05-13 14:36:56 123835
Done!
Get layers from GDB
[8]:
gdb_file = r"../input_data/lion/lion.gdb"
layers = fiona.listlayers(gdb_file)
View the layers
[9]:
for layer in layers:
print(layer)
node
node_stname
altnames
lion
Open the lion layer¶
It was impossible to read in this layer with the old version of Fiona
. Please ensure that your Fiona
version is the most up to date.
[10]:
gdb_file = r"../input_data/lion/lion.gdb"
lion_gdf = gpd.read_file(gdb_file, driver='FileGDB', layer='lion')
[11]:
lion_gdf.crs
[11]:
{'init': 'epsg:2263'}
[12]:
lion_gdf = lion_gdf.to_crs({'init': 'epsg:4326'})
[14]:
lion_gdf.plot(figsize = (10,10));
[13]:
lion_gdf.head()
[13]:
Street | SAFStreetName | FeatureTyp | SegmentTyp | IncExFlag | RB_Layer | NonPed | TrafDir | TrafSrc | SpecAddr | ... | LHi_Hyphen | RLo_Hyphen | RHi_Hyphen | FromLeft | ToLeft | FromRight | ToRight | Join_ID | SHAPE_Length | geometry | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | EAST 168 STREET | 0 | U | B | T | DOT | ... | 699 | 596 | 716 | 599 | 699 | 596 | 716 | 2251001000000 | 396.030947 | (LINESTRING (-73.90346685871668 40.83035379646... | ||||
1 | WEST 192 STREET | 0 | U | B | A | DOT | ... | 98 | 63 | 99 | 58 | 98 | 63 | 99 | 2798401000000 | 279.360514 | (LINESTRING (-73.9012006771837 40.866613078249... | ||||
2 | UNION AVENUE | 0 | U | B | W | DOT | ... | 1079 | 1016 | 1084 | 1017 | 1079 | 1016 | 1084 | 2728001000000 | 618.327133 | (LINESTRING (-73.90117669122948 40.82438890917... | ||||
3 | UNION AVENUE | BEHAGEN PLAYGROUND | 0 | U | B | W | DOT | N | ... | None | None | None | 0 | 0 | 0 | 0 | 21279501000000N | 618.327133 | (LINESTRING (-73.90117669122948 40.82438890917... | ||
4 | DELAFIELD AVENUE | 6 | U | B | T | DOT | ... | 4645 | 4600 | 4664 | 4601 | 4645 | 4600 | 4664 | 2187601000000 | 670.281037 | (LINESTRING (-73.90695663477764 40.89360793519... |
5 rows × 118 columns
We need to clean the lion_gdf
a little so that non-pedestrians accessible routes are not included.
FeatureTyp <> 'F' AND FeatureTyp <> '9' AND FeatureTyp <> '1' AND FeatureTyp <> '7' AND FeatureTyp <> '3' AND TrafDir <> '' AND NonPed <> 'V'
Source: https://pointsunknown.nyc/tutorials/04_Q3_SpatialAnalysis.html
[14]:
lion_gdf['Street'].count()
[14]:
226977
[15]:
lion_gdf['todrop'] = (lion_gdf['NonPed'] == 'V')
[16]:
lion_gdf['todrop'].value_counts()
[16]:
False 201908
True 25069
Name: todrop, dtype: int64
[17]:
lion_gdf['todrop'] = lion_gdf['FeatureTyp'].isin(['F','9','1','7','3'])
[18]:
lion_gdf['todrop'].value_counts()
[18]:
False 202573
True 24404
Name: todrop, dtype: int64
[19]:
lion_gdf['TrafDir'].value_counts()
[19]:
T 96024
W 44976
A 40322
34499
P 11156
Name: TrafDir, dtype: int64
[20]:
lion_gdf['todrop'] = lion_gdf['TrafDir'].isin([' '])
[21]:
lion_gdf['todrop'].value_counts()
[21]:
False 192478
True 34499
Name: todrop, dtype: int64
[22]:
lion_gdf['todrop'] = (lion_gdf['NonPed'] == 'V') |
(lion_gdf['FeatureTyp'].isin(['F','9','1','7','3'])) |
(lion_gdf['TrafDir'].isin([' ']))
[23]:
lion_gdf['todrop'].value_counts()
[23]:
False 166475
True 60502
Name: todrop, dtype: int64
Save the clean version in a new dataframe
[24]:
clean_lion_gdf = gpd.GeoDataFrame(lion_gdf.loc[lion_gdf['todrop'] == False])
Checking that those observations were dropped
[25]:
(clean_lion_gdf['NonPed'] == 'V').value_counts()
[25]:
False 166475
Name: NonPed, dtype: int64
[26]:
clean_lion_gdf['FeatureTyp'].isin(['F','9','1','7','3']).value_counts()
[26]:
False 166475
Name: FeatureTyp, dtype: int64
[27]:
clean_lion_gdf['TrafDir'].isin([' ']).value_counts()
[27]:
False 166475
Name: TrafDir, dtype: int64
[28]:
clean_lion_gdf.drop(['todrop'], axis = 1, inplace = True)
[113]:
clean_lion_gdf.plot(figsize=(10,10))
[113]:
<matplotlib.axes._subplots.AxesSubplot at 0x23137539a90>
Save the lion layer as a shapefile¶
[29]:
fp = r"../intermediate_data/clean_lion_line.shp"
clean_lion_gdf.to_file(fp)
Test case¶
We will use two points that are across from each other, but has a highway between them. The network analysis route should move along the road until it find a road that can cross under the highway.
To see the points:
Destination point: https://www.google.com/maps/place/40%C2%B044’13.9%22N+73%C2%B051’50.6%22W/@40.737199,-73.8662467,17z/data=!3m1!4b1!4m5!3m4!1s0x0:0x0!8m2!3d40.737199!4d-73.864058
Origin point: https://www.google.com/maps/place/40%C2%B043’56.9%22N+73%C2%B051’42.0%22W/@40.732484,-73.8638547,17z/data=!3m1!4b1!4m5!3m4!1s0x0:0x0!8m2!3d40.732484!4d-73.861666
[31]:
fp = r"../intermediate_data/clean_lion_line.shp"
lion_graph = nx.read_shp(fp)
[32]:
lion_graph
[32]:
<networkx.classes.digraph.DiGraph at 0x1fbe3e6bf60>
This is the key to calculating the right network distance. The graph must be undirected for pedestrian distance.
[33]:
undirected_lion_graph = lion_graph.to_undirected()
[34]:
undirected_lion_graph
[34]:
<networkx.classes.graph.Graph at 0x1fc0f464400>
Writing the Graph to shapefile creates the nodes and edges
[35]:
fp = r"../intermediate_data"
nx.write_shp(lion_graph, fp)
Read the node
as geopandas dataframes
[36]:
fp = r"../intermediate_data/nodes.shp"
node_gdf = gpd.read_file(fp)
[37]:
node_gdf.head()
[37]:
FID | geometry | |
---|---|---|
0 | 0 | POINT (-73.90346685871668 40.83035379646685) |
1 | 1 | POINT (-73.9023800533195 40.82964662740077) |
2 | 2 | POINT (-73.9012006771837 40.86661307824997) |
3 | 3 | POINT (-73.90207336386543 40.86699910556131) |
4 | 4 | POINT (-73.90117669122948 40.82438890917424) |
[38]:
node_gdf.crs = {'init': 'epsg:4326'}
[41]:
node_gdf.plot(figsize=(10,10),markersize=1);
[42]:
target_xy = (-73.864058, 40.737199)
orig_xy = (-73.861666,40.732484)
We place the geometry of the nodes
into the an numpy array and put the array into scipy’s cKDTree. This turns the “index into a set of k-dimensional points which can be used to rapidly look up the nearest neighbors of any point.” Super fast nearest-neighbor matching.
[43]:
nB = np.array(list(zip(node_gdf.geometry.x, node_gdf.geometry.y)) )
btree = cKDTree(nB)
Here we are finding the nearest node to our target_xy.
[44]:
dist, target_node = btree.query(target_xy,k=1)
print(dist, target_node)
3.7438436323677245e-05 66053
The distance measure above will be important for us to calculate the entire distance (distance from origin_xy to orgin_node + network distance + distance from target node to target_xy).
Lets view the node and the geometry of that node.
[45]:
node_gdf['geometry'].loc[node_gdf['FID'] == 66053]
[45]:
66053 POINT (-73.86404739259665 40.73716309568955)
Name: geometry, dtype: object
Repeat for the origin node.
[46]:
dist, orig_node = btree.query(orig_xy,k=1)
print(dist, orig_node)
2.6730270855641812e-05 76256
[47]:
node_gdf['geometry'].loc[node_gdf['FID'] == 76256]
[47]:
76256 POINT (-73.86164235281636 40.73247153733231)
Name: geometry, dtype: object
View the contents in the graph
[67]:
list(undirected_lion_graph.nodes)[0:10]
[67]:
[(-73.90346685871668, 40.83035379646685),
(-73.9023800533195, 40.82964662740077),
(-73.9012006771837, 40.86661307824997),
(-73.90207336386543, 40.866999105561305),
(-73.90117669122948, 40.82438890917424),
(-73.90050540472349, 40.826007622001576),
(-73.90695663477764, 40.89360793519608),
(-73.90695595783123, 40.89544764345081),
(-73.90706510075327, 40.89927414148254),
(-73.9071219824245, 40.89929111645273)]
In the undirected_lion_graph
, the labels of the nodes are by default the coordinates. We will change it to intergers for ease of finding them.
[68]:
undirected_lion_graph = nx.convert_node_labels_to_integers(undirected_lion_graph, first_label = 0)
[69]:
list(undirected_lion_graph.nodes)[0:10]
[69]:
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
Checking that undirected_lion_graph
has the nodes from above.
[70]:
#target node
undirected_lion_graph.has_node(target_node)
[70]:
True
[71]:
#origin node
undirected_lion_graph.has_node(orig_node)
[71]:
True
Network analysis!¶
[72]:
route = nx.shortest_path(G=undirected_lion_graph, source=orig_node, target=target_node, weight = None)
Capture the route nodes and turn it into a LineString
[73]:
route_nodes = node_gdf.loc[route]
[74]:
route_line = LineString(list(route_nodes.geometry.values))
Make a geodataframe to store the data
[75]:
route_geom = gpd.GeoDataFrame(crs=node_gdf.crs)
route_geom['geometry'] = None
route_geom['id'] = None
Add the information into the geodataframe
[76]:
route_geom.loc[0, 'geometry'] = route_line
[77]:
route_geom.loc[0, 'id'] = str(list(route_nodes['FID'].values))
Now is the time to convert the data into epsg 2263
.
[78]:
route_geom.crs
[78]:
{'init': 'epsg:4326'}
[79]:
route_geom = route_geom.to_crs({'init': 'epsg:2263'})
Calculate network distance of the route!
[80]:
route_geom['length_ft'] = route_geom.length
route_geom['length_miles'] = route_geom['length_ft'] *0.000189394
[81]:
route_geom.head()
[81]:
geometry | id | length_ft | length_miles | |
---|---|---|---|---|
0 | LINESTRING (1022595.221231206 206165.993610297... | [76256, 58177, 66289, 76111, 72542, 72541, 813... | 2893.217327 | 0.547958 |
[82]:
route_geom.plot(figsize = (10,10));
[83]:
fp = r"..\intermediate_data\lion_test_route_shapefile.shp"
route_geom[['id','geometry']].to_file(fp)