An Evaluation of the Impact of Leading Pedestrian Interval Signals in NYC¶
By Jeremy J. Sze
Submitted in partial fulfillment of the requirements for the degree of Master of Arts in Economics, Hunter College The City University of New York 2019
Link to thesis: https://academicworks.cuny.edu/hc_sas_etds/432/
Abstract¶
I evaluated the impact of the phased introduction of Leading Pedestrian Interval Signals (LPIs) on collision and injury outcomes at 12,987 signalized traffic intersections in New York City over the course of 25 quarters from 2012 to 2018. An intersection is treated when a LPIs is installed to give pedestrians lead time to cross the street before vehicles are allowed to move. Outcomes from NYPD’s Motor Vehicle Collisions data were matched to signalized intersections. I hypothesize that LPIs would reduce collisions and reduce injuries for pedestrians at intersections. A difference in difference fixed effects panel regression was used to identify the causal effect of introducing LPIs. This approach accounts for the problem that unobserved heterogeneity that might bias results in simpler regression approaches. The analysis suggests that the introduction of LPIs decreased quarterly collision counts by 5.45% and decreased the quarterly number of pedestrians injured by 14.7% over the same intervention period. LPIs appears to be effective in reducing both collisions and injuries.
Acknowledgements¶
Throughout the analysis and writing of my thesis I had received a great deal of support and assistance. I would like to thank my thesis advisor, Professor Jonathan Conning, who helped me develop my data processing pipeline and analytical plan. Next, I would like to thank Professor Partha Deb and Professor Matthew Baker for their valuable time and input during the model development stage and Professor Randall Filer for his guidance in the bi-weekly thesis workshops. In addition, I would like to thank my sister, Jocelyne Sze and friend Dennis Kim for proofreading my thesis drafts. Finally, I would like to thank my wife, Maria Eugenia Brandão for her wise counsel and support as we discussed ideas for my thesis.
Contents¶
Signal intersection (stata)
LPIs (python)
Linking Signal intersection to other data (python)
NYPD Motor Vehicle Collision data
download (stata)
clean (stata)
link to signal intersection (python)
Calculate collision outcomes (stata)
Set up panel data
monthly analytical panel (stata)
quarterly analytical panel (stata)
Convert analytical panel data into shapefile (python)
Thiessen Polygons (python)
Non-spatial Analysis (stata)
Spatial Analysis (stata)
1. Signal intersection¶
Stata
Explore duplicates
Create intersection IDS
About “signal_controllers.csv”¶
This was obtained through a request to the DOT Commissioner.
[1]:
cd "..\input_data\DOT_traffic_signals_Oct_2018\"
C:\Users\jerem\Box Sync\Policy Evaluation\input_data\DOT_traffic_signals_Oct_2018
[2]:
import delimited using "signal_controllers.csv",clear stringcols(_all)
(6 vars, 13,278 obs)
[4]:
describe
Contains data
obs: 13,278
vars: 7
size: 2,376,762
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
storage display value
variable name type format label variable label
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
y str22 %22s Y
x str22 %22s X
st1_name str31 %31s ST1_Name
st2_name str32 %32s ST2_Name
st3_name str35 %35s ST3_Name
st4_name str36 %36s ST4_Name
dup byte %12.0g
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Sorted by:
Note: Dataset has changed since last saved.
[3]:
duplicates tag y x, gen(dup)
tab dup
Duplicates in terms of y x
dup | Freq. Percent Cum.
------------+-----------------------------------
0 | 13,156 99.08 99.08
1 | 112 0.84 99.92
2 | 3 0.02 99.95
6 | 7 0.05 100.00
------------+-----------------------------------
Total | 13,278 100.00
[5]:
duplicates drop y x, force
Duplicates in terms of y x
(64 observations deleted)
[7]:
gen intersection_id = _n, before(y)
Check to remember why we dropped intersection_id 2799
[8]:
// Nonsensical coordinates
drop if intersection_id == 2799
(1 observation deleted)
[ ]:
save "signal_controllers_clean.dta"
2. Leading Pedestrian Interval Signals (LPIs)¶
Python
Explore the LPIs data using python’s Geopandas library
Create IDs
Save into Stata dta
[1]:
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
import geopandas as gpd
from scipy.spatial import cKDTree
from shapely.geometry import Point, MultiPoint
import pysal as ps
import libpysal
from libpysal.cg.voronoi import voronoi, voronoi_frames
from pysal.contrib.viz import mapping as map
from pylab import *
from pysal.contrib.viz import folium_mapping as fm
import geojson as gj
import seaborn as sns
import mplleaflet as mpll
%matplotlib inline
import os
os.environ["PROJ_LIB"] = "C:\ProgramData\Anaconda3\Library\share" #window
C:\Users\jerem\.conda\envs\geo2\lib\site-packages\pysal\__init__.py:65: VisibleDeprecationWarning: PySAL's API will be changed on 2018-12-31. The last release made with this API is version 1.14.4. A preview of the next API version is provided in the `pysal` 2.0 prelease candidate. The API changes and a guide on how to change imports is provided at https://migrating.pysal.org
), VisibleDeprecationWarning)
[10]:
cd
C:\Users\jerem
Opening the Vision Zero Leading Pedestrian Interval Signals shapefile
[19]:
fp = r"C:\Users\jerem\Box Sync\Policy Evaluation\input_data\VZV_Leading Pedestrian Interval Signals\geo_export_0c63b43f-83c0-4834-aa91-2c564c1bff2c.shp"
[20]:
lpis_df = gpd.read_file(fp)
[21]:
lpis_df.head(3)
[21]:
cross_stre | date_insta | time_insta | lat | long | main_stree | geometry | |
---|---|---|---|---|---|---|---|
0 | Dreiser loop East | 2018-08-01 | 00:00:00.000 | 40.878465 | -73.828273 | Co-op City Blvd | POINT (-73.82827338235253 40.87846542795222) |
1 | West 119 St | 2018-08-08 | 00:00:00.000 | 40.803958 | -73.948271 | Lenox Avenue | POINT (-73.94827133431397 40.80395846158952) |
2 | West 120 St | 2018-08-08 | 00:00:00.000 | 40.804587 | -73.947812 | Lenox Avenue | POINT (-73.94781210658503 40.80458679730486) |
[22]:
lpis_df.plot(figsize=(15, 15));
[22]:
<matplotlib.axes._subplots.AxesSubplot at 0x21e3d70c780>
First, we have to check the CRS (coordinate reference system) of the shapefile. This is important because all the other shapefiles have to be on the same projection in order to do spatial joins and identify nearest neighbors. For more information on projections, you can watch this short video and read this document from Earthdatascience.org.
To see the crs of lpis_df, we use geodataframe.crs
[23]:
lpis_df.crs
[23]:
{'init': 'epsg:4326'}
EPSG 4326 is the coordinate system for the world. It is also known as WGS84. When you have a longitude and latitude from this CRS, you can copy and paste it into Google Maps to find that location.
Since we are working with New York City only, we prefer to use a projection that more accurately protrays the shape of NYC. We will reproject lpis_df to EPSG 2263 (NAD83 / New York Long Island (ftUS)).
[24]:
lpis_df = lpis_df.to_crs({'init': 'epsg:2263'})
We can see that lpis_df now has the right projection.
[25]:
lpis_df.crs
[25]:
{'init': 'epsg:2263'}
[26]:
lpis_df.plot(figsize=(15, 15));
[26]:
<matplotlib.axes._subplots.AxesSubplot at 0x21e3e917128>
If you look at both plots again, you will notice that the axes labels are different. This is due to the different CRS.
I create IDs in the dataframe, since lpis_df has no ID variables.
[27]:
lpis_df['LPIS_ID'] = lpis_df.index
lpis_df.head(3)
[27]:
cross_stre | date_insta | time_insta | lat | long | main_stree | geometry | LPIS_ID | |
---|---|---|---|---|---|---|---|---|
0 | Dreiser loop East | 2018-08-01 | 00:00:00.000 | 40.878465 | -73.828273 | Co-op City Blvd | POINT (1031739.000154228 259373.000004255) | 0 |
1 | West 119 St | 2018-08-08 | 00:00:00.000 | 40.803958 | -73.948271 | Lenox Avenue | POINT (998570.9998893011 232184.9999184268) | 1 |
2 | West 120 St | 2018-08-08 | 00:00:00.000 | 40.804587 | -73.947812 | Lenox Avenue | POINT (998697.9999633889 232414.0001165908) | 2 |
Next, I want to export the dataframe to a Stata dta file. Objects need to be converted to str.
[24]:
# Might be unnecessary
#str_cols = list(lpis_df.select_dtypes(include=['object']).columns)
#for col in str_cols:
# lpis_df[col] = lpis_df[col].astype(str)
[ ]:
#fp = r"C:\Users\jerem\Box Sync\Policy Evaluation\working_data\VZV_LPIS_data.dta"
#lpis_df.to_stata(fp)
3. Linking Signal intersection to other data¶
python
uses scipy’s cKDTree
connects each data to the nearest signal intersection
Define a function to obtain IDs of nearest neighbors and distance measure.
[28]:
# ckdnearest function
# from "https://gist.github.com/jhconning/63a34a51acff83d116adc52308faf240"
def ckdnearest(gdA, gdB, bcol):
"""
This function takes geodataframes: `gdA` and `gdB` and
a column name `bcol`. Both dataframes are assumed to have a `geometry` column.
It finds the nearest neighbor from each location in `gdA` to a
nearest neighbor in `gdB`.
It returns a two-column pandas dataframe with a 'distance' (here rounded to nearest foot)
and the value of the `bcol` in `gdB' (e.g. 'school_name')
"""
nA = np.array(list(zip(gdA.geometry.x, gdA.geometry.y)) )
nB = np.array(list(zip(gdB.geometry.x, gdB.geometry.y)) )
btree = cKDTree(nB)
dist, idx = btree.query(nA,k=1)
df = pd.DataFrame.from_dict({'distance': dist.astype(int),
'bcol' : gdB.loc[idx, bcol].values })
return df
3.a. Signal Intersections¶
Read the cleaned Signal intersection Stata dta file
[3]:
# Open signal intersection stata dta file
fp = r"C:\Users\jerem\Box Sync\Policy Evaluation\input_data\DOT_traffic_signals_Oct_2018\signal_controllers_clean.dta"
[4]:
sig_inters_df = pd.read_stata(fp)
[5]:
sig_inters_df.head(3)
[5]:
intersection_id | y | x | st1_name | st2_name | st3_name | st4_name | dup | |
---|---|---|---|---|---|---|---|---|
0 | 1.0 | 199793.609300002 | 986336.149 | ALLEN STREET | CANAL STREET | 0 | ||
1 | 2.0 | 202206.161899999 | 982769.331 | AVENUE OF THE AMERICAS | LAIGHT STREET | CANAL STREET | 0 | |
2 | 3.0 | 201790.0942 | 982805.618399993 | AVENUE OF THE AMERICAS | LISPENARD STREET | WEST BROADWAY | 0 |
[8]:
sig_inters_df['y'].dtypes
[8]:
dtype('O')
[9]:
sig_inters_df['x'].dtypes
[9]:
dtype('O')
Convert the y and x coordinates into float
[11]:
# Convert coordinates into float
sig_inters_df['y'] = sig_inters_df['y'].astype(float)
sig_inters_df['x'] = sig_inters_df['x'].astype(float)
Create Coordinates by combining x and y together with zip into a list
[14]:
sig_inters_df['Coordinates'] = list(zip(sig_inters_df.x, sig_inters_df.y))
Turn Coordinates into shapely Points
[16]:
sig_inters_df['Coordinates'] = sig_inters_df['Coordinates'].apply(Point)
Create Geopandas dataframe
[17]:
sig_inters_gdf = gpd.GeoDataFrame(sig_inters_df, geometry='Coordinates')
3.b. Leading Pedestrian Signal Interval¶
[29]:
lpis_df.head(3)
[29]:
cross_stre | date_insta | time_insta | lat | long | main_stree | geometry | LPIS_ID | |
---|---|---|---|---|---|---|---|---|
0 | Dreiser loop East | 2018-08-01 | 00:00:00.000 | 40.878465 | -73.828273 | Co-op City Blvd | POINT (1031739.000154228 259373.000004255) | 0 |
1 | West 119 St | 2018-08-08 | 00:00:00.000 | 40.803958 | -73.948271 | Lenox Avenue | POINT (998570.9998893011 232184.9999184268) | 1 |
2 | West 120 St | 2018-08-08 | 00:00:00.000 | 40.804587 | -73.947812 | Lenox Avenue | POINT (998697.9999633889 232414.0001165908) | 2 |
For each LPIs, we want to find the closest signalized intersection
[30]:
sig_inters_gdf[['distance_to_LPIS','nearest_LPIS']] = ckdnearest(sig_inters_gdf, lpis_df,'LPIS_ID')
[31]:
sig_inters_gdf.head(3)
[31]:
intersection_id | y | x | st1_name | st2_name | st3_name | st4_name | dup | Coordinates | distance_to_LPIS | nearest_LPIS | |
---|---|---|---|---|---|---|---|---|---|---|---|
0 | 1.0 | 199793.6093 | 986336.1490 | ALLEN STREET | CANAL STREET | 0 | POINT (986336.149 199793.609300002) | 276 | 1915 | ||
1 | 2.0 | 202206.1619 | 982769.3310 | AVENUE OF THE AMERICAS | LAIGHT STREET | CANAL STREET | 0 | POINT (982769.331 202206.161899999) | 592 | 2016 | |
2 | 3.0 | 201790.0942 | 982805.6184 | AVENUE OF THE AMERICAS | LISPENARD STREET | WEST BROADWAY | 0 | POINT (982805.6183999931 201790.0942) | 413 | 2016 |
See the number of observations in sig_inters_gdf
[32]:
sig_inters_gdf['intersection_id'].count()
[32]:
13213
We want to merge into sig_inters_gdf additional information from lpis_df
[33]:
sig_inters_gdf = sig_inters_gdf.merge(lpis_df[['date_insta','LPIS_ID']], how='left', left_on='nearest_LPIS', right_on='LPIS_ID', validate ="m:1")
[34]:
sig_inters_gdf.head(3)
[34]:
intersection_id | y | x | st1_name | st2_name | st3_name | st4_name | dup | Coordinates | distance_to_LPIS | nearest_LPIS | date_insta | LPIS_ID | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 1.0 | 199793.6093 | 986336.1490 | ALLEN STREET | CANAL STREET | 0 | POINT (986336.149 199793.609300002) | 276 | 1915 | 2015-10-16 | 1915 | ||
1 | 2.0 | 202206.1619 | 982769.3310 | AVENUE OF THE AMERICAS | LAIGHT STREET | CANAL STREET | 0 | POINT (982769.331 202206.161899999) | 592 | 2016 | 1997-08-07 | 2016 | |
2 | 3.0 | 201790.0942 | 982805.6184 | AVENUE OF THE AMERICAS | LISPENARD STREET | WEST BROADWAY | 0 | POINT (982805.6183999931 201790.0942) | 413 | 2016 | 1997-08-07 | 2016 |
Remove the additional variable LPIS_ID
[35]:
sig_inters_gdf.drop(['LPIS_ID'], axis=1, inplace=True)
3.c. School shapefile¶
source: https://www.baruch.cuny.edu/confluence/display/geoportal/NYC+Geodatabase
[36]:
fp = r"C:\Users\jerem\Box Sync\Policy Evaluation\input_data\school_private_public_2263\school_private_public_2263.shp"
[37]:
school_gdf = gpd.read_file(fp)
[38]:
school_gdf.head(3)
[38]:
uid | idagency | facname | opname | address | city | zipcode | bcode | facsubgrp | factype | capacity | util | xcoord | ycoord | geometry | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 192.0 | 321000145390 | Our Lady Of Refuge School | Our Lady Of Refuge School | 2708 Briggs Avenue | Bronx | 10458 | 36005 | Non-Public K-12 Schools | Elementary School - Non-public | NaN | 267.0 | 1.014359e+06 | 254977.5176 | POINT (1014358.6833 254977.5176) |
1 | 294.0 | 332000226225 | Yeshiva Toldos Yesuscher | Yeshiva Toldos Yesuscher | 1531 63 Street | Brooklyn | 11219 | 36047 | Non-Public K-12 Schools | Elementary School - Non-public | NaN | 74.0 | 9.857114e+05 | 166689.5140 | POINT (985711.4182 166689.514) |
2 | 2233.0 | 331400225670 | Ohel Elozer | Ohel Elozer | 263 Classon Ave-Ste 4b | Brooklyn | 11205 | 36047 | Non-Public K-12 Schools | High School - Non-public | NaN | 161.0 | 9.951838e+05 | 191390.1394 | POINT (995183.844 191390.1394) |
[39]:
school_gdf.crs
[39]:
{'proj': 'lcc',
'lat_1': 41.03333333333333,
'lat_2': 40.66666666666666,
'lat_0': 40.16666666666666,
'lon_0': -74,
'x_0': 300000.0000000001,
'y_0': 0,
'datum': 'NAD83',
'units': 'us-ft',
'no_defs': True}
For each school, we want to find the closest signalized intersection
[40]:
sig_inters_gdf[['distance_to_Sch','nearest_Sch']] = ckdnearest(sig_inters_gdf, school_gdf,'uid')
[41]:
sig_inters_gdf.head(3)
[41]:
intersection_id | y | x | st1_name | st2_name | st3_name | st4_name | dup | Coordinates | distance_to_LPIS | nearest_LPIS | date_insta | distance_to_Sch | nearest_Sch | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 1.0 | 199793.6093 | 986336.1490 | ALLEN STREET | CANAL STREET | 0 | POINT (986336.149 199793.609300002) | 276 | 1915 | 2015-10-16 | 529 | 3433.0 | ||
1 | 2.0 | 202206.1619 | 982769.3310 | AVENUE OF THE AMERICAS | LAIGHT STREET | CANAL STREET | 0 | POINT (982769.331 202206.161899999) | 592 | 2016 | 1997-08-07 | 795 | 38150.0 | |
2 | 3.0 | 201790.0942 | 982805.6184 | AVENUE OF THE AMERICAS | LISPENARD STREET | WEST BROADWAY | 0 | POINT (982805.6183999931 201790.0942) | 413 | 2016 | 1997-08-07 | 635 | 38150.0 |
3.d. Vision Zero Left Turn Calming intervention¶
[42]:
fp = r"C:\Users\jerem\Box Sync\Policy Evaluation\input_data\left_turn_traffic_calming_shapefile\left_turn_traffic_calming.shp"
[43]:
ltc_gdf = gpd.read_file(fp)
[44]:
ltc_gdf.head(3)
[44]:
treatment_ | completion | geometry | |
---|---|---|---|
0 | Quick Kurb to X-walk | 2017-12-08 | POINT (-73.97604825017703 40.75144139179798) |
1 | Daylighting, Box markings, (1) 6' Rubber speed... | 2018-04-12 | POINT (-73.82577379991478 40.75635805716421) |
2 | Quick Kurb to X-walk | 2016-06-30 | POINT (-73.97719688637932 40.7643236906894) |
[46]:
ltc_gdf.crs
[46]:
{'init': 'epsg:4326'}
We have to reproject the geodataframe to the right CRS (EPSG 2263: NAD83 / New York Long Island (ftUS))
[47]:
ltc_gdf = ltc_gdf.to_crs({'init': 'epsg:2263'})
Create ID variable for ltc_gdf
[48]:
ltc_gdf['LTC_ID'] = ltc_gdf.index
[49]:
ltc_gdf.head(3)
[49]:
treatment_ | completion | geometry | LTC_ID | |
---|---|---|---|---|
0 | Quick Kurb to X-walk | 2017-12-08 | POINT (990886.2363875993 213047.939427546) | 0 |
1 | Daylighting, Box markings, (1) 6' Rubber speed... | 2018-04-12 | POINT (1032518.714146682 214886.3347008746) | 1 |
2 | Quick Kurb to X-walk | 2016-06-30 | POINT (990566.7655698524 217741.2961825352) | 2 |
For each left turn calming intervention, we want to find the closest signalized intersection
[50]:
sig_inters_gdf[['distance_to_LTC','nearest_LTC']] = ckdnearest(sig_inters_gdf, ltc_gdf,'LTC_ID')
[51]:
sig_inters_gdf.head(3)
[51]:
intersection_id | y | x | st1_name | st2_name | st3_name | st4_name | dup | Coordinates | distance_to_LPIS | nearest_LPIS | date_insta | distance_to_Sch | nearest_Sch | distance_to_LTC | nearest_LTC | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 1.0 | 199793.6093 | 986336.1490 | ALLEN STREET | CANAL STREET | 0 | POINT (986336.149 199793.609300002) | 276 | 1915 | 2015-10-16 | 529 | 3433.0 | 1183 | 191 | ||
1 | 2.0 | 202206.1619 | 982769.3310 | AVENUE OF THE AMERICAS | LAIGHT STREET | CANAL STREET | 0 | POINT (982769.331 202206.161899999) | 592 | 2016 | 1997-08-07 | 795 | 38150.0 | 411 | 77 | |
2 | 3.0 | 201790.0942 | 982805.6184 | AVENUE OF THE AMERICAS | LISPENARD STREET | WEST BROADWAY | 0 | POINT (982805.6183999931 201790.0942) | 413 | 2016 | 1997-08-07 | 635 | 38150.0 | 790 | 77 |
We want to merge into sig_inters_gdf additional information from ltc_gdf
[52]:
sig_inters_gdf = sig_inters_gdf.merge(ltc_gdf[['treatment_', 'completion', 'LTC_ID']], how='left', left_on='nearest_LTC', right_on='LTC_ID', validate ="m:1")
[53]:
sig_inters_gdf.head(3)
[53]:
intersection_id | y | x | st1_name | st2_name | st3_name | st4_name | dup | Coordinates | distance_to_LPIS | nearest_LPIS | date_insta | distance_to_Sch | nearest_Sch | distance_to_LTC | nearest_LTC | treatment_ | completion | LTC_ID | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 1.0 | 199793.6093 | 986336.1490 | ALLEN STREET | CANAL STREET | 0 | POINT (986336.149 199793.609300002) | 276 | 1915 | 2015-10-16 | 529 | 3433.0 | 1183 | 191 | Daylighting, Box markings, Pegatracks, Delinea... | 2017-12-08 | 191 | ||
1 | 2.0 | 202206.1619 | 982769.3310 | AVENUE OF THE AMERICAS | LAIGHT STREET | CANAL STREET | 0 | POINT (982769.331 202206.161899999) | 592 | 2016 | 1997-08-07 | 795 | 38150.0 | 411 | 77 | Quick Kurb to x-walk | 2017-12-08 | 77 | |
2 | 3.0 | 201790.0942 | 982805.6184 | AVENUE OF THE AMERICAS | LISPENARD STREET | WEST BROADWAY | 0 | POINT (982805.6183999931 201790.0942) | 413 | 2016 | 1997-08-07 | 635 | 38150.0 | 790 | 77 | Quick Kurb to x-walk | 2017-12-08 | 77 |
[55]:
sig_inters_gdf.drop(['LTC_ID'], axis=1, inplace=True)
3.e. Vision Zero Street Improvement¶
[57]:
fp = r"C:\Users\jerem\Box Sync\Policy Evaluation\input_data\street_improvement_proj_intersections_point\street_improvement_proj_intersections_point.shp"
[58]:
sip_gdf = gpd.read_file(fp)
[59]:
sip_gdf.head(3)
[59]:
Pjct_Name | SIP_YR | DateComplt | SIPProjTyp | geometry | |
---|---|---|---|---|---|
0 | Jackson Avenue at Pulaski Bridge | 2009 | 2009-07-31 | - | POINT (997734.4838460496 210053.6281589993) |
1 | 12th Ave at W. 135th St | 2009 | 2009-05-09 | Traffic Network Chng - Pedestrian Safety | POINT (995894.2742718621 238501.8740837589) |
2 | Flatbush Ave at Church Ave | 2009 | 2009-09-25 | Pedestrian Safety - School Safety | POINT (995708.027925191 176232.7026909116) |
[60]:
sip_gdf.crs
[60]:
{'proj': 'lcc',
'lat_1': 41.03333333333333,
'lat_2': 40.66666666666666,
'lat_0': 40.16666666666666,
'lon_0': -74,
'x_0': 300000.0000000001,
'y_0': 0,
'datum': 'NAD83',
'units': 'us-ft',
'no_defs': True}
We have to reproject the geodataframe to the right CRS (EPSG 2263: NAD83 / New York Long Island (ftUS))
[63]:
sip_gdf = sip_gdf.to_crs({'init': 'epsg:2263'})
[64]:
sip_gdf.crs
[64]:
{'init': 'epsg:2263'}
Create ID variable for sip_gdf
[65]:
sip_gdf['StImpro_ID'] = sip_gdf.index
[66]:
sip_gdf.head(3)
[66]:
Pjct_Name | SIP_YR | DateComplt | SIPProjTyp | geometry | StImpro_ID | |
---|---|---|---|---|---|---|
0 | Jackson Avenue at Pulaski Bridge | 2009 | 2009-07-31 | - | POINT (997734.4838460496 210053.6281589342) | 0 |
1 | 12th Ave at W. 135th St | 2009 | 2009-05-09 | Traffic Network Chng - Pedestrian Safety | POINT (995894.2742718621 238501.8740837032) | 1 |
2 | Flatbush Ave at Church Ave | 2009 | 2009-09-25 | Pedestrian Safety - School Safety | POINT (995708.027925191 176232.7026908512) | 2 |
For each street improvement project, we want to find the closest signalized intersection
[67]:
sig_inters_gdf[['distance_to_StImpro','nearest_StImpro']] = ckdnearest(sig_inters_gdf, sip_gdf,'StImpro_ID')
We want to merge into sig_inters_gdf additional information from sip_gdf
[68]:
sig_inters_gdf = sig_inters_gdf.merge(sip_gdf[['DateComplt', 'SIPProjTyp', 'StImpro_ID']],
how='left',
left_on='nearest_StImpro',
right_on='StImpro_ID',
validate ="m:1")
[69]:
sig_inters_gdf.head(3)
[69]:
intersection_id | y | x | st1_name | st2_name | st3_name | st4_name | dup | Coordinates | distance_to_LPIS | ... | nearest_Sch | distance_to_LTC | nearest_LTC | treatment_ | completion | distance_to_StImpro | nearest_StImpro | DateComplt | SIPProjTyp | StImpro_ID | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 1.0 | 199793.6093 | 986336.1490 | ALLEN STREET | CANAL STREET | 0 | POINT (986336.149 199793.609300002) | 276 | ... | 3433.0 | 1183 | 191 | Daylighting, Box markings, Pegatracks, Delinea... | 2017-12-08 | 272 | 4 | 2009-10-31 | - | 4 | ||
1 | 2.0 | 202206.1619 | 982769.3310 | AVENUE OF THE AMERICAS | LAIGHT STREET | CANAL STREET | 0 | POINT (982769.331 202206.161899999) | 592 | ... | 38150.0 | 411 | 77 | Quick Kurb to x-walk | 2017-12-08 | 2101 | 245 | 2016-07-23 | VZ Priority Geo - Traffic Calming | 245 | |
2 | 3.0 | 201790.0942 | 982805.6184 | AVENUE OF THE AMERICAS | LISPENARD STREET | WEST BROADWAY | 0 | POINT (982805.6183999931 201790.0942) | 413 | ... | 38150.0 | 790 | 77 | Quick Kurb to x-walk | 2017-12-08 | 2098 | 245 | 2016-07-23 | VZ Priority Geo - Traffic Calming | 245 |
3 rows × 23 columns
[70]:
sig_inters_gdf.drop(['StImpro_ID'], axis=1, inplace=True)
3.f. Bike Routes¶
[71]:
fp = r"C:\Users\jerem\Box Sync\Policy Evaluation\input_data\bike_routes_points_2263\bike_routes_points_2263.shp"
[72]:
br_gdf = gpd.read_file(fp)
[73]:
br_gdf.head(3)
[73]:
tf_facilit | comments | bikedir | ft_facilit | objectid_1 | allclasses | date_instd | time_instd | lanecount | segment_id | boro | street | date_modda | time_modda | tostreet | fromstreet | onoffst | geometry | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | Protected Path | None | L | None | 1.0 | I | None | 00:00:00.000 | 1.0 | 33547 | 1.0 | 9 AV | None | 00:00:00.000 | W 31 ST | W 16 ST | ON | POINT (984139.5802614246 211708.657997195) |
1 | Protected Path | None | L | None | 1.0 | I | None | 00:00:00.000 | 1.0 | 33547 | 1.0 | 9 AV | None | 00:00:00.000 | W 31 ST | W 16 ST | ON | POINT (984140.0661334711 211709.5320271455) |
2 | Protected Path | None | L | None | 1.0 | I | None | 00:00:00.000 | 1.0 | 33547 | 1.0 | 9 AV | None | 00:00:00.000 | W 31 ST | W 16 ST | ON | POINT (984140.5520055175 211710.4060570961) |
[74]:
br_gdf.crs
[74]:
{'proj': 'lcc',
'lat_1': 41.03333333333333,
'lat_2': 40.66666666666666,
'lat_0': 40.16666666666666,
'lon_0': -74,
'x_0': 300000.0000000001,
'y_0': 0,
'datum': 'NAD83',
'units': 'us-ft',
'no_defs': True}
[75]:
br_gdf = br_gdf.to_crs({'init': 'epsg:2263'})
[76]:
sig_inters_gdf[['distance_to_bikeroute','nearest_bikeroute']] = ckdnearest(sig_inters_gdf, br_gdf,'objectid_1')
[77]:
sig_inters_gdf.head(3)
[77]:
intersection_id | y | x | st1_name | st2_name | st3_name | st4_name | dup | Coordinates | distance_to_LPIS | ... | distance_to_LTC | nearest_LTC | treatment_ | completion | distance_to_StImpro | nearest_StImpro | DateComplt | SIPProjTyp | distance_to_bikeroute | nearest_bikeroute | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 1.0 | 199793.6093 | 986336.1490 | ALLEN STREET | CANAL STREET | 0 | POINT (986336.149 199793.609300002) | 276 | ... | 1183 | 191 | Daylighting, Box markings, Pegatracks, Delinea... | 2017-12-08 | 272 | 4 | 2009-10-31 | - | 3 | 8553.0 | ||
1 | 2.0 | 202206.1619 | 982769.3310 | AVENUE OF THE AMERICAS | LAIGHT STREET | CANAL STREET | 0 | POINT (982769.331 202206.161899999) | 592 | ... | 411 | 77 | Quick Kurb to x-walk | 2017-12-08 | 2101 | 245 | 2016-07-23 | VZ Priority Geo - Traffic Calming | 230 | 1778.0 | |
2 | 3.0 | 201790.0942 | 982805.6184 | AVENUE OF THE AMERICAS | LISPENARD STREET | WEST BROADWAY | 0 | POINT (982805.6183999931 201790.0942) | 413 | ... | 790 | 77 | Quick Kurb to x-walk | 2017-12-08 | 2098 | 245 | 2016-07-23 | VZ Priority Geo - Traffic Calming | 1 | 1769.0 |
3 rows × 24 columns
3.g. Borough¶
[78]:
fp = r"C:\Users\jerem\Box Sync\Policy Evaluation\input_data\nyc_boroughs_2263\nyc_boroughs_2263.shp"
[79]:
borough_gdf = gpd.read_file(fp)
[81]:
borough_gdf.head()
[81]:
bcode | bname | name | namelsad | geometry | |
---|---|---|---|---|---|
0 | 36005 | Bronx | Bronx | Bronx County | (POLYGON ((1008982.068976385 272752.8735210547... |
1 | 36047 | Brooklyn | Kings | Kings County | (POLYGON ((978869.3811487257 186863.7807399245... |
2 | 36061 | Manhattan | New York | New York County | (POLYGON ((1007701.483091666 258286.8905491272... |
3 | 36081 | Queens | Queens | Queens County | (POLYGON ((1026830.772887008 155435.7100568501... |
4 | 36085 | Staten Island | Richmond | Richmond County | (POLYGON ((930721.1281812892 156627.9162643671... |
[82]:
borough_gdf.crs = sig_inters_gdf.crs
[83]:
sig_inters_gdf = gpd.sjoin(sig_inters_gdf,borough_gdf[['bname', 'geometry']], how='left', op='intersects')
[84]:
sig_inters_gdf.drop('index_right', axis=1, inplace=True)
[85]:
sig_inters_gdf.head(3)
[85]:
intersection_id | y | x | st1_name | st2_name | st3_name | st4_name | dup | Coordinates | distance_to_LPIS | ... | nearest_LTC | treatment_ | completion | distance_to_StImpro | nearest_StImpro | DateComplt | SIPProjTyp | distance_to_bikeroute | nearest_bikeroute | bname | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 1.0 | 199793.6093 | 986336.1490 | ALLEN STREET | CANAL STREET | 0 | POINT (986336.149 199793.609300002) | 276 | ... | 191 | Daylighting, Box markings, Pegatracks, Delinea... | 2017-12-08 | 272 | 4 | 2009-10-31 | - | 3 | 8553.0 | Manhattan | ||
1 | 2.0 | 202206.1619 | 982769.3310 | AVENUE OF THE AMERICAS | LAIGHT STREET | CANAL STREET | 0 | POINT (982769.331 202206.161899999) | 592 | ... | 77 | Quick Kurb to x-walk | 2017-12-08 | 2101 | 245 | 2016-07-23 | VZ Priority Geo - Traffic Calming | 230 | 1778.0 | Manhattan | |
2 | 3.0 | 201790.0942 | 982805.6184 | AVENUE OF THE AMERICAS | LISPENARD STREET | WEST BROADWAY | 0 | POINT (982805.6183999931 201790.0942) | 413 | ... | 77 | Quick Kurb to x-walk | 2017-12-08 | 2098 | 245 | 2016-07-23 | VZ Priority Geo - Traffic Calming | 1 | 1769.0 | Manhattan |
3 rows × 25 columns
3.h. Priority Intersections¶
[86]:
fp = r"C:\Users\jerem\Box Sync\Policy Evaluation\input_data\vz_priority_intersections_shapefile\vz_priority_intersections_shapefile\vz_priority_intersections.shp"
[87]:
pi_gdf = gpd.read_file(fp)
[88]:
pi_gdf.crs
[88]:
{'init': 'epsg:4326'}
[89]:
pi_gdf = pi_gdf.to_crs({'init': 'epsg:2263'})
[90]:
pi_gdf['priority_inters_ID'] = pi_gdf.index
[91]:
sig_inters_gdf[['distance_to_priorityinters','nearest_priorityinters']] = ckdnearest(sig_inters_gdf, pi_gdf,'priority_inters_ID')
[92]:
sig_inters_gdf.head(3)
[92]:
intersection_id | y | x | st1_name | st2_name | st3_name | st4_name | dup | Coordinates | distance_to_LPIS | ... | completion | distance_to_StImpro | nearest_StImpro | DateComplt | SIPProjTyp | distance_to_bikeroute | nearest_bikeroute | bname | distance_to_priorityinters | nearest_priorityinters | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 1.0 | 199793.6093 | 986336.1490 | ALLEN STREET | CANAL STREET | 0 | POINT (986336.149 199793.609300002) | 276 | ... | 2017-12-08 | 272 | 4 | 2009-10-31 | - | 3 | 8553.0 | Manhattan | 7 | 245 | ||
1 | 2.0 | 202206.1619 | 982769.3310 | AVENUE OF THE AMERICAS | LAIGHT STREET | CANAL STREET | 0 | POINT (982769.331 202206.161899999) | 592 | ... | 2017-12-08 | 2101 | 245 | 2016-07-23 | VZ Priority Geo - Traffic Calming | 230 | 1778.0 | Manhattan | 2289 | 191 | |
2 | 3.0 | 201790.0942 | 982805.6184 | AVENUE OF THE AMERICAS | LISPENARD STREET | WEST BROADWAY | 0 | POINT (982805.6183999931 201790.0942) | 413 | ... | 2017-12-08 | 2098 | 245 | 2016-07-23 | VZ Priority Geo - Traffic Calming | 1 | 1769.0 | Manhattan | 2331 | 191 |
3 rows × 27 columns
3.i. Safe Streets for Seniors¶
[93]:
fp = r"C:\Users\jerem\Box Sync\Policy Evaluation\input_data\safe_streets_for_seniors_shapefile\safe_streets_for_seniors.shp"
[94]:
seniors_gdf = gpd.read_file(fp)
[95]:
seniors_gdf.crs
[95]:
{'init': 'epsg:4326'}
[96]:
seniors_gdf = seniors_gdf.to_crs({'init': 'epsg:2263'})
[97]:
seniors_gdf['seniors_ID'] = seniors_gdf.index
[98]:
seniors_gdf.crs = sig_inters_gdf.crs
Spatial Join to identify intersections within Senior Zones
[99]:
sig_inters_gdf = gpd.sjoin(sig_inters_gdf, seniors_gdf[['Name', 'seniors_ID','geometry']], how='left', op='within')
[100]:
sig_inters_gdf.drop('index_right', axis=1, inplace=True)
[101]:
sig_inters_gdf.head(3)
[101]:
intersection_id | y | x | st1_name | st2_name | st3_name | st4_name | dup | Coordinates | distance_to_LPIS | ... | nearest_StImpro | DateComplt | SIPProjTyp | distance_to_bikeroute | nearest_bikeroute | bname | distance_to_priorityinters | nearest_priorityinters | Name | seniors_ID | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 1.0 | 199793.6093 | 986336.1490 | ALLEN STREET | CANAL STREET | 0 | POINT (986336.149 199793.609300002) | 276 | ... | 4 | 2009-10-31 | - | 3 | 8553.0 | Manhattan | 7 | 245 | Lower East Side | 27.0 | ||
1 | 2.0 | 202206.1619 | 982769.3310 | AVENUE OF THE AMERICAS | LAIGHT STREET | CANAL STREET | 0 | POINT (982769.331 202206.161899999) | 592 | ... | 245 | 2016-07-23 | VZ Priority Geo - Traffic Calming | 230 | 1778.0 | Manhattan | 2289 | 191 | NaN | NaN | |
2 | 3.0 | 201790.0942 | 982805.6184 | AVENUE OF THE AMERICAS | LISPENARD STREET | WEST BROADWAY | 0 | POINT (982805.6183999931 201790.0942) | 413 | ... | 245 | 2016-07-23 | VZ Priority Geo - Traffic Calming | 1 | 1769.0 | Manhattan | 2331 | 191 | NaN | NaN |
3 rows × 29 columns
[102]:
str_cols = list(sig_inters_gdf.select_dtypes(include=['object']).columns)
for col in str_cols:
sig_inters_gdf[col] = sig_inters_gdf[col].astype(str)
[103]:
fp = r"C:\Users\jerem\Box Sync\Policy Evaluation\working_data\signal_intersection.dta"
[ ]:
#sig_inters_gdf.to_stata(fp)
4. NYPD Motor Vehicle Collision data¶
4.a. download¶
stata
[104]:
clear
[105]:
import delimited "https://data.cityofnewyork.us/api/views/h9gi-nx95/rows.csv?accessType=DOWNLOAD"
(29 vars, 1,482,622 obs)
[106]:
notes: "Downloaded $S_DATE $S_TIME"
notes list
_dta:
1. "Downloaded 25 Apr 2019 10:52:22"
save
is commented out so I do not replace my original dataset
[ ]:
//save "..\input_data\NYPD_Motor_Vehicle_Collisions.dta"
4.b. clean¶
stata
[233]:
use "..\input_data\NYPD_Motor_Vehicle_Collisions.dta",clear
[234]:
notes list
//Check that notes is "Downloaded 28 Sep 2018 15:48:00"
_dta:
1. "Downloaded 28 Sep 2018 15:48:00"
Formatting the data
[235]:
%head
date | time | borough | zipcode | latitude | longitude | location | onstreetname | crossstreetname | offstreetname | numberofpersonsinjured | numberofpersonskilled | numberofpedestriansinjured | numberofpedestrianskilled | numberofcyclistinjured | numberofcyclistkilled | numberofmotoristinjured | numberofmotoristkilled | contributingfactorvehicle1 | contributingfactorvehicle2 | contributingfactorvehicle3 | contributingfactorvehicle4 | contributingfactorvehicle5 | uniquekey | vehicletypecode1 | vehicletypecode2 | vehicletypecode3 | vehicletypecode4 | vehicletypecode5 | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
1 | 09/24/2018 | 0:00 | STATEN ISLAND | 10306 | 40.573219 | -74.106995 | (40.57322, -74.106995) | HYLAN BOULEVARD | OTIS AVENUE | . | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | Driver Inattention/Distraction | Unspecified | . | . | . | 3987045 | Station Wagon/Sport Utility Vehicle | Sedan | . | . | . |
2 | 09/24/2018 | 0:00 | . | . | 40.612614 | -74.076958 | (40.612614, -74.07696) | FLETCHER STREET | . | . | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | Driver Inattention/Distraction | Unspecified | . | . | . | 3987187 | Station Wagon/Sport Utility Vehicle | Sedan | . | . | . |
3 | 09/24/2018 | 0:00 | . | . | . | . | . | . | . | 161-18 140 street | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | Unspecified | . | . | . | . | 3986792 | Sedan | . | . | . | . |
4 | 09/24/2018 | 0:00 | . | . | 40.833981 | -73.826347 | (40.83398, -73.82635) | BRUCKNER EXPRESSWAY | . | . | 1 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | Unsafe Speed | Unspecified | . | . | . | 3986861 | Sedan | Sedan | . | . | . |
5 | 09/24/2018 | 0:05 | BROOKLYN | 11249 | 40.720074 | -73.959846 | (40.720074, -73.95985) | WYTHE AVENUE | NORTH 8 STREET | . | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | Failure to Yield Right-of-Way | Unspecified | . | . | . | 3986532 | Sedan | Sedan | . | . | . |
6 | 09/24/2018 | 0:28 | BROOKLYN | 11210 | 40.631283 | -73.94603 | (40.631283, -73.94603) | . | . | 1609 FLATBUSH AVENUE | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | Driver Inattention/Distraction | Driver Inattention/Distraction | . | . | . | 3986656 | Sedan | Sedan | . | . | . |
7 | 09/24/2018 | 0:30 | QUEENS | 11385 | 40.693848 | -73.901184 | (40.693848, -73.901184) | CODY AVENUE | WYCKOFF AVENUE | . | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | Passing or Lane Usage Improper | Unspecified | . | . | . | 3988303 | Station Wagon/Sport Utility Vehicle | . | . | . | . |
8 | 09/24/2018 | 0:45 | . | . | 40.678162 | -73.897484 | (40.67816, -73.897484) | PENNSYLVANIA AVENUE | . | . | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | Aggressive Driving/Road Rage | Unspecified | . | . | . | 3986730 | Sedan | Sedan | . | . | . |
9 | 09/24/2018 | 10:00 | BROOKLYN | 11201 | 40.702908 | -73.981133 | (40.702908, -73.98113) | WATER STREET | HUDSON AVENUE | . | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | Turning Improperly | Unspecified | . | . | . | 3986916 | Station Wagon/Sport Utility Vehicle | Sedan | . | . | . |
10 | 09/24/2018 | 10:00 | BROOKLYN | 11201 | 40.694687 | -73.981743 | (40.694687, -73.98174) | . | . | 38 FLEET WALK | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | Backing Unsafely | Unspecified | . | . | . | 3986791 | Box Truck | Flat Bed | . | . | . |
[236]:
order(uniquekey), before(date)
Create dates that are in the Stata internal form(SIF) that Stata can understand
[237]:
codebook date
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
date DATE
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
type: string (str105)
unique values: 2,278 missing "": 0/1,351,215
examples: "03/22/2016"
"06/05/2015"
"08/10/2015"
"10/17/2014"
warning: variable has leading and embedded blanks
The date variable is a string in the format of month day and year. Therefore, we will use Stata’s mask MDY to create the date in SIF.
[238]:
gen date2 = date(date, "MDY"),after(date)
format date2 %td
(1 missing value generated)
There is an observation with nonsensical values in the date str variable
[239]:
drop date
rename date2 date
[240]:
gen time2 = clock(time, "hm"),after(time)
format time2 %tcHH:MM
drop time
rename time2 time
(1 missing value generated)
We can easily extract year from SIF data
[241]:
gen year = year(date), before(date)
(1 missing value generated)
Create latenight indicator variable
[242]:
gen latenight = (time > tc(23:00:00) & time <= tc(23:59:59)) | (time >= tc(00:00:00) & time < tc(05:00:00)), after(time)
[243]:
tab latenight
latenight | Freq. Percent Cum.
------------+-----------------------------------
0 | 1,218,191 90.16 90.16
1 | 133,024 9.84 100.00
------------+-----------------------------------
Total | 1,351,215 100.00
Rename long variable names
[244]:
rename numberofpersonsinjured personsinjured
rename numberofpersonskilled personskilled
rename numberofpedestriansinjured pedestriansinjured
rename numberofpedestrianskilled pedestrianskilled
rename numberofcyclistinjured cyclistinjured
rename numberofcyclistkilled cyclistkilled
rename numberofmotoristinjured motoristinjured
rename numberofmotoristkilled motoristkilled
Create an indicator for observations with missing Longditude and Latitude points
[245]:
gen mi_latlong = mi(latitude) & mi(longitude), after(longitude)
[246]:
tab mi_latlong year, column
+-------------------+
| Key |
|-------------------|
| frequency |
| column percentage |
+-------------------+
| year
mi_latlong | 2012 2013 2014 2015 2016 2017 2018 | Total
-----------+-----------------------------------------------------------------------------+----------
0 | 85,450 171,914 172,724 182,956 145,733 212,986 155,901 | 1,127,664
| 84.99 84.39 83.84 84.05 63.96 92.84 93.93 | 83.46
-----------+-----------------------------------------------------------------------------+----------
1 | 15,089 31,809 33,302 34,731 82,104 16,437 10,078 | 223,550
| 15.01 15.61 16.16 15.95 36.04 7.16 6.07 | 16.54
-----------+-----------------------------------------------------------------------------+----------
Total | 100,539 203,723 206,026 217,687 227,837 229,423 165,979 | 1,351,214
| 100.00 100.00 100.00 100.00 100.00 100.00 100.00 | 100.00
Use Stata’s duplicates
command to identify duplicates among onstreetname and crossstreetname
[247]:
duplicates report onstreetname crossstreetname if ///
!missing(onstreetname) & !missing(crossstreetname)
Duplicates in terms of onstreetname crossstreetname
--------------------------------------
copies | observations surplus
----------+---------------------------
1 | 42735 0
2 | 42570 21285
3 | 43125 28750
4 | 43116 32337
5 | 40200 32160
6 | 37398 31165
7 | 34363 29454
8 | 32688 28602
9 | 30033 26696
10 | 28170 25353
11 | 26191 23810
12 | 23868 21879
13 | 22945 21180
14 | 21154 19643
15 | 19485 18186
16 | 17856 16740
17 | 16881 15888
18 | 17046 16099
19 | 14725 13950
20 | 14660 13927
21 | 12810 12200
22 | 12562 11991
23 | 11638 11132
24 | 11688 11201
25 | 10525 10104
26 | 10166 9775
27 | 9936 9568
28 | 9912 9558
29 | 7743 7476
30 | 8340 8062
31 | 7750 7500
32 | 7584 7347
33 | 6864 6656
34 | 8058 7821
35 | 8050 7820
36 | 6876 6685
37 | 7178 6984
38 | 6384 6216
39 | 5538 5396
40 | 5880 5733
41 | 5166 5040
42 | 5250 5125
43 | 3612 3528
44 | 5588 5461
45 | 3915 3828
46 | 3956 3870
47 | 4606 4508
48 | 3840 3760
49 | 4410 4320
50 | 4350 4263
51 | 3774 3700
52 | 3640 3570
53 | 3127 3068
54 | 4212 4134
55 | 3850 3780
56 | 2912 2860
57 | 2622 2576
58 | 3074 3021
59 | 3186 3132
60 | 3420 3363
61 | 2989 2940
62 | 2976 2928
63 | 1827 1798
64 | 2624 2583
65 | 2015 1984
66 | 2376 2340
67 | 1876 1848
68 | 2176 2144
69 | 2415 2380
70 | 2030 2001
71 | 2059 2030
72 | 1800 1775
73 | 2190 2160
74 | 2590 2555
75 | 1725 1702
76 | 1520 1500
77 | 2079 2052
78 | 1248 1232
79 | 1501 1482
80 | 1920 1896
81 | 1701 1680
82 | 1804 1782
83 | 498 492
84 | 1764 1743
85 | 2720 2688
86 | 1634 1615
87 | 1305 1290
88 | 1760 1740
89 | 1068 1056
90 | 720 712
91 | 1183 1170
92 | 920 910
93 | 1953 1932
94 | 1034 1023
95 | 1520 1504
96 | 1344 1330
97 | 1552 1536
98 | 1372 1358
99 | 1287 1274
100 | 1400 1386
101 | 1313 1300
102 | 816 808
103 | 515 510
104 | 624 618
105 | 945 936
106 | 954 945
107 | 856 848
108 | 1728 1712
109 | 763 756
110 | 1430 1417
111 | 1110 1100
112 | 448 444
113 | 565 560
114 | 798 791
115 | 805 798
116 | 1044 1035
117 | 702 696
118 | 590 585
119 | 952 944
120 | 480 476
121 | 484 480
122 | 976 968
123 | 738 732
124 | 620 615
125 | 875 868
126 | 378 375
127 | 381 378
128 | 640 635
129 | 387 384
130 | 260 258
131 | 262 260
132 | 396 393
133 | 1197 1188
134 | 1340 1330
135 | 540 536
136 | 544 540
137 | 548 544
138 | 414 411
139 | 278 276
140 | 840 834
141 | 705 700
142 | 1136 1128
143 | 286 284
145 | 725 720
146 | 1022 1015
147 | 588 584
148 | 444 441
149 | 447 444
151 | 1057 1050
152 | 760 755
153 | 153 152
154 | 154 153
155 | 310 308
156 | 312 310
157 | 157 156
158 | 316 314
159 | 636 632
160 | 480 477
161 | 966 960
162 | 486 483
163 | 163 162
165 | 330 328
166 | 498 495
167 | 167 166
168 | 336 334
169 | 507 504
170 | 170 169
171 | 342 340
172 | 688 684
173 | 173 172
174 | 174 173
175 | 525 522
176 | 704 700
177 | 177 176
178 | 356 354
179 | 358 356
180 | 360 358
185 | 370 368
186 | 372 370
189 | 189 188
190 | 380 378
191 | 382 380
192 | 192 191
196 | 588 585
197 | 197 196
199 | 199 198
200 | 200 199
202 | 202 201
206 | 206 205
208 | 624 621
210 | 210 209
212 | 636 633
213 | 213 212
214 | 214 213
216 | 432 430
218 | 218 217
219 | 219 218
223 | 223 222
225 | 225 224
229 | 229 228
231 | 231 230
234 | 234 233
235 | 235 234
238 | 238 237
241 | 241 240
244 | 244 243
245 | 245 244
255 | 510 508
263 | 263 262
266 | 266 265
269 | 269 268
278 | 278 277
280 | 280 279
287 | 287 286
288 | 288 287
292 | 292 291
297 | 297 296
299 | 299 298
306 | 306 305
317 | 317 316
326 | 326 325
343 | 343 342
351 | 351 350
359 | 359 358
400 | 400 399
408 | 408 407
24990 | 24990 24989
--------------------------------------
use Stata’s strtrim
function to remove leading and trailing spaces
[248]:
replace onstreetname = strtrim(onstreetname)
replace crossstreetname = strtrim(crossstreetname)
(1,079,830 real changes made)
(738,472 real changes made)
[249]:
sort onstreetname crossstreetname borough zipcode latitude longitude
If borough, zipcode, on street and cross street are the same, we will use the latitude and longitudes for those observations for observations with missing coordinates. s
[250]:
replace latitude = latitude[_n-1] if ///
mi(latitude) & ///
!mi(latitude[_n-1]) & ///
!mi(borough) & ///
!mi(borough[_n-1]) & ///
!mi(zipcode) & ///
!mi(zipcode[_n-1]) & ///
!mi(onstreetname) & ///
!mi(onstreetname[_n-1]) & ///
!mi(crossstreetname) & ///
!mi(crossstreetname[_n-1]) & ///
(borough == borough[_n-1]) & ///
(zipcode == zipcode[_n-1]) & ///
(onstreetname == onstreetname[_n-1]) & ///
(crossstreetname == crossstreetname[_n-1])
replace longitude = longitude[_n-1] if ///
mi(longitude) & ///
!mi(longitude[_n-1]) & ///
!mi(borough) & ///
!mi(borough[_n-1]) & ///
!mi(zipcode) & ///
!mi(zipcode[_n-1]) & ///
!mi(onstreetname) & ///
!mi(onstreetname[_n-1]) & ///
!mi(crossstreetname) & ///
!mi(crossstreetname[_n-1]) & ///
(borough == borough[_n-1]) & ///
(zipcode == zipcode[_n-1]) & ///
(onstreetname == onstreetname[_n-1]) & ///
(crossstreetname == crossstreetname[_n-1])
(22,579 real changes made)
(22,579 real changes made)
Create indicator for observations which had their coordinates filled
[251]:
gen imput_latlong = (mi_latlong ==1 & !mi(latitude) & !mi(longitude)), after(mi_latlong)
[252]:
tab imput_latlong
imput_latlo |
ng | Freq. Percent Cum.
------------+-----------------------------------
0 | 1,328,636 98.33 98.33
1 | 22,579 1.67 100.00
------------+-----------------------------------
Total | 1,351,215 100.00
Create a second indicator for those observations that still have missing coordinates
[253]:
gen mi_latlong2 = mi(latitude) & mi(longitude), after(imput_latlong)
[254]:
tab mi_latlong2 year, column
+-------------------+
| Key |
|-------------------|
| frequency |
| column percentage |
+-------------------+
mi_latlong | year
2 | 2012 2013 2014 2015 2016 2017 2018 | Total
-----------+-----------------------------------------------------------------------------+----------
0 | 85,452 171,917 172,730 182,959 164,166 215,363 157,656 | 1,150,243
| 84.99 84.39 83.84 84.05 72.05 93.87 94.99 | 85.13
-----------+-----------------------------------------------------------------------------+----------
1 | 15,087 31,806 33,296 34,728 63,671 14,060 8,323 | 200,971
| 15.01 15.61 16.16 15.95 27.95 6.13 5.01 | 14.87
-----------+-----------------------------------------------------------------------------+----------
Total | 100,539 203,723 206,026 217,687 227,837 229,423 165,979 | 1,351,214
| 100.00 100.00 100.00 100.00 100.00 100.00 100.00 | 100.00
Drop observations whose coordinates fall outside NYC’s boundary box
[255]:
drop if (longitude < -74.5 | longitude > -73) & !mi(longitude)
drop if (latitude < 40 | latitude > 41) & !mi(latitude)
(439 observations deleted)
(4 observations deleted)
Drop observations that have missing coordinates
[256]:
drop if mi_latlong2 == 1
(200,972 observations deleted)
Cross tabulation of missing coordinates by year
[257]:
tab mi_latlong2 year, column
+-------------------+
| Key |
|-------------------|
| frequency |
| column percentage |
+-------------------+
mi_latlong | year
2 | 2012 2013 2014 2015 2016 2017 2018 | Total
-----------+-----------------------------------------------------------------------------+----------
0 | 85,452 171,917 172,730 182,959 164,159 215,211 157,372 | 1,149,800
| 100.00 100.00 100.00 100.00 100.00 100.00 100.00 | 100.00
-----------+-----------------------------------------------------------------------------+----------
Total | 85,452 171,917 172,730 182,959 164,159 215,211 157,372 | 1,149,800
| 100.00 100.00 100.00 100.00 100.00 100.00 100.00 | 100.00
[ ]:
//save "..\working_data\NYPD_Motor_Vehicle_Collisions_clean.dta"
4.c. link to signal intersection¶
python
[258]:
fp = r"C:\Users\jerem\Box Sync\Policy Evaluation\working_data\NYPD_Motor_Vehicle_Collisions_clean.dta"
collision_df = pd.read_stata(fp)
[259]:
collision_df.head(3)
[259]:
uniquekey | year | date | time | latenight | borough | zipcode | latitude | longitude | mi_latlong | ... | contributingfactorvehicle5 | vehicletypecode1 | vehicletypecode2 | vehicletypecode3 | vehicletypecode4 | vehicletypecode5 | bicyclerelated | taxirelated | publicrelated | dup | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 3427749.0 | 2016.0 | 2016-04-24 | 1960-01-01 18:02:00 | 0.0 | 40.744896 | -73.770203 | 0.0 | ... | PASSENGER VEHICLE | PASSENGER VEHICLE | 0.0 | 0.0 | 0.0 | 24989.0 | ||||||
1 | 3474445.0 | 2016.0 | 2016-07-05 | 1960-01-01 01:30:00 | 1.0 | 40.650494 | -74.011772 | 0.0 | ... | PASSENGER VEHICLE | PASSENGER VEHICLE | 0.0 | 0.0 | 0.0 | 24989.0 | ||||||
2 | 3467416.0 | 2016.0 | 2016-06-22 | 1960-01-01 12:15:00 | 0.0 | 40.705044 | -73.959030 | 0.0 | ... | PASSENGER VEHICLE | PASSENGER VEHICLE | 0.0 | 0.0 | 0.0 | 24989.0 |
3 rows × 38 columns
[260]:
collision_df["longitude"].dtype
[260]:
dtype('float32')
[261]:
collision_df['Coordinates'] = list(zip(collision_df.longitude, collision_df.latitude))
collision_df['Coordinates'] = collision_df['Coordinates'].apply(Point)
collision_gdf = gpd.GeoDataFrame(collision_df, geometry='Coordinates')
[262]:
collision_gdf.head(3)
[262]:
uniquekey | year | date | time | latenight | borough | zipcode | latitude | longitude | mi_latlong | ... | vehicletypecode1 | vehicletypecode2 | vehicletypecode3 | vehicletypecode4 | vehicletypecode5 | bicyclerelated | taxirelated | publicrelated | dup | Coordinates | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 3427749.0 | 2016.0 | 2016-04-24 | 1960-01-01 18:02:00 | 0.0 | 40.744896 | -73.770203 | 0.0 | ... | PASSENGER VEHICLE | PASSENGER VEHICLE | 0.0 | 0.0 | 0.0 | 24989.0 | POINT (-73.77020263671875 40.74489593505859) | |||||
1 | 3474445.0 | 2016.0 | 2016-07-05 | 1960-01-01 01:30:00 | 1.0 | 40.650494 | -74.011772 | 0.0 | ... | PASSENGER VEHICLE | PASSENGER VEHICLE | 0.0 | 0.0 | 0.0 | 24989.0 | POINT (-74.01177215576172 40.65049362182617) | |||||
2 | 3467416.0 | 2016.0 | 2016-06-22 | 1960-01-01 12:15:00 | 0.0 | 40.705044 | -73.959030 | 0.0 | ... | PASSENGER VEHICLE | PASSENGER VEHICLE | 0.0 | 0.0 | 0.0 | 24989.0 | POINT (-73.95903015136719 40.70504379272461) |
3 rows × 39 columns
[263]:
collision_gdf.plot(figsize=(15, 15));
[263]:
<matplotlib.axes._subplots.AxesSubplot at 0x21f27f0ba90>
[264]:
collision_gdf.crs = {'init' :'epsg:4326'}
[265]:
collision_gdf = collision_gdf.to_crs({'init': 'epsg:2263'})
[266]:
collision_gdf.crs
[266]:
{'init': 'epsg:2263'}
[267]:
collision_gdf.plot(figsize=(15, 15));
[274]:
fp = r"C:\Users\jerem\Box Sync\Policy Evaluation\input_data\DOT_traffic_signals_Oct_2018\signal_controllers_clean.dta"
sig_inters_df2 = pd.read_stata(fp)
sig_inters_df2.head(3)
[274]:
intersection_id | y | x | st1_name | st2_name | st3_name | st4_name | dup | |
---|---|---|---|---|---|---|---|---|
0 | 1.0 | 199793.609300002 | 986336.149 | ALLEN STREET | CANAL STREET | 0 | ||
1 | 2.0 | 202206.161899999 | 982769.331 | AVENUE OF THE AMERICAS | LAIGHT STREET | CANAL STREET | 0 | |
2 | 3.0 | 201790.0942 | 982805.618399993 | AVENUE OF THE AMERICAS | LISPENARD STREET | WEST BROADWAY | 0 |
[275]:
# Convert coordinates into float
sig_inters_df2['y'] = sig_inters_df2['y'].astype(float)
sig_inters_df2['x'] = sig_inters_df2['x'].astype(float)
# Put the latitude and longtitude
sig_inters_df2['Coordinates'] = list(zip(sig_inters_df2.x, sig_inters_df2.y))
sig_inters_df2['Coordinates'] = sig_inters_df2['Coordinates'].apply(Point)
sig_inters_gdf2 = gpd.GeoDataFrame(sig_inters_df2, geometry='Coordinates')
[277]:
sig_inters_gdf2.crs
[279]:
sig_inters_gdf2.crs = {'init' :'epsg:2263'}
sig_inters_gdf2.crs
[279]:
{'init': 'epsg:2263'}
[280]:
collision_gdf[['distance_to_sigInt','nearest_sigInt']] = ckdnearest(collision_gdf, sig_inters_gdf2,'intersection_id')
[281]:
collision_gdf.head(3)
[281]:
uniquekey | year | date | time | latenight | borough | zipcode | latitude | longitude | mi_latlong | ... | vehicletypecode3 | vehicletypecode4 | vehicletypecode5 | bicyclerelated | taxirelated | publicrelated | dup | Coordinates | distance_to_sigInt | nearest_sigInt | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 3427749.0 | 2016.0 | 2016-04-24 | 1960-01-01 18:02:00 | 0.0 | 40.744896 | -73.770203 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 24989.0 | POINT (1047925.420187834 210745.8332306232) | 749 | 11247.0 | |||||
1 | 3474445.0 | 2016.0 | 2016-07-05 | 1960-01-01 01:30:00 | 1.0 | 40.650494 | -74.011772 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 24989.0 | POINT (980983.3829843847 176268.9676320423) | 30 | 5677.0 | |||||
2 | 3467416.0 | 2016.0 | 2016-06-22 | 1960-01-01 12:15:00 | 0.0 | 40.705044 | -73.959030 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 24989.0 | POINT (995609.2936718578 196145.5996373807) | 111 | 7617.0 |
3 rows × 41 columns
[282]:
str_cols = list(collision_gdf.select_dtypes(include=['object']).columns)
for col in str_cols:
collision_gdf[col] = collision_gdf[col].astype(str)
[ ]:
fp = r"C:\Users\jerem\Box Sync\Policy Evaluation\working_data\collision_signal_intersection.dta"
[ ]:
#collision_gdf.to_stata(fp)
5. Calculate collision outcomes¶
stata
[286]:
use "..\working_data\collision_signal_intersection.dta",clear
[287]:
describe
Contains data from ..\working_data\collision_signal_intersection.dta
obs: 1,147,839
vars: 42 15 Nov 2018 16:01
size: 896,462,259
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
storage display value
variable name type format label variable label
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
index long %12.0g
uniquekey double %10.0g
year float %9.0g
date str10 %10s
time str19 %19s
latenight float %9.0g
borough str13 %13s
zipcode str5 %5s
latitude float %9.0g
longitude float %9.0g
mi_latlong float %9.0g
imput_latlong float %9.0g
mi_latlong2 float %9.0g
location str25 %25s
onstreetname str32 %32s
crossstreetname str32 %32s
offstreetname str40 %40s
personsinjured double %10.0g
personskilled double %10.0g
pedestriansin~d double %10.0g
pedestrianski~d double %10.0g
cyclistinjured double %10.0g
cyclistkilled double %10.0g
motoristinjured double %10.0g
motoristkilled double %10.0g
contributingf~1 str53 %53s
contributingf~2 str53 %53s
contributingf~3 str53 %53s
contributingf~4 str53 %53s
contributingf~5 str43 %43s
vehicletypeco~1 str35 %35s
vehicletypeco~2 str35 %35s
vehicletypeco~3 str35 %35s
vehicletypeco~4 str35 %35s
vehicletypeco~5 str35 %35s
bicyclerelated float %9.0g
taxirelated float %9.0g
publicrelated float %9.0g
dup double %10.0g
Coordinates str43 %43s
distance_to_s~t long %12.0g
nearest_sigInt float %9.0g
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Sorted by:
[288]:
gen intersection_id = nearest_sigInt
[289]:
codebook intersection_id
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
intersection_id (unlabeled)
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
type: numeric (float)
range: [1,13214] units: 1
unique values: 13,116 missing .: 0/1,147,839
mean: 6513.4
std. dev: 3979.88
percentiles: 10% 25% 50% 75% 90%
981 3009 6437 10260 11828
[290]:
// Determine if collisions that are close to an intersection
sum distance_to_sigInt, detail
distance_to_sigInt
-------------------------------------------------------------
Percentiles Smallest
1% 2 0
5% 2 0
10% 2 0 Obs 1,147,839
25% 3 0 Sum of Wgt. 1,147,839
50% 10 Mean 208.3536
Largest Std. Dev. 375.8735
75% 279 7498
90% 615 7500 Variance 141280.9
95% 853 7500 Skewness 4.267881
99% 1632 7685 Kurtosis 35.85233
[291]:
sum distance_to_sigInt if distance_to_sigInt < r(p75), detail
distance_to_sigInt
-------------------------------------------------------------
Percentiles Smallest
1% 1 0
5% 2 0
10% 2 0 Obs 859,866
25% 3 0 Sum of Wgt. 859,866
50% 3 Mean 51.85574
Largest Std. Dev. 86.39933
75% 65 278
90% 227 278 Variance 7464.844
95% 259 278 Skewness 1.550851
99% 273 278 Kurtosis 3.837138
[295]:
sum distance_to_sigInt if distance_to_sigInt < r(p75), detail
distance_to_sigInt
-------------------------------------------------------------
Percentiles Smallest
1% 2 0
5% 2 0
10% 2 0 Obs 1,147,839
25% 3 0 Sum of Wgt. 1,147,839
50% 10 Mean 208.3536
Largest Std. Dev. 375.8735
75% 279 7498
90% 615 7500 Variance 141280.9
95% 853 7500 Skewness 4.267881
99% 1632 7685 Kurtosis 35.85233
[298]:
gen intersection = (distance_to_sigInt < 10)
[299]:
tab intersection
intersectio |
n | Freq. Percent Cum.
------------+-----------------------------------
0 | 576,297 50.21 50.21
1 | 571,542 49.79 100.00
------------+-----------------------------------
Total | 1,147,839 100.00
[300]:
codebook date
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
date (unlabeled)
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
type: string (str10)
unique values: 2,277 missing "": 0/1,147,839
examples: "2013-11-04"
"2015-03-06"
"2016-06-01"
"2017-09-03"
[301]:
gen date2 = date(date, "YMD"),after(date)
format date2 %td
drop date
rename date2 date
[302]:
%head
index | uniquekey | year | date | time | latenight | borough | zipcode | latitude | longitude | mi_latlong | imput_latlong | mi_latlong2 | location | onstreetname | crossstreetname | offstreetname | personsinjured | personskilled | pedestriansinjured | pedestrianskilled | cyclistinjured | cyclistkilled | motoristinjured | motoristkilled | contributingfactorvehicle1 | contributingfactorvehicle2 | contributingfactorvehicle3 | contributingfactorvehicle4 | contributingfactorvehicle5 | vehicletypecode1 | vehicletypecode2 | vehicletypecode3 | vehicletypecode4 | vehicletypecode5 | bicyclerelated | taxirelated | publicrelated | dup | Coordinates | distance_to_sigInt | nearest_sigInt | intersection_id | intersection | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
1 | 0 | 3427749 | 2016 | 24apr2016 | 1960-01-01 18:02:00 | 0 | . | . | 40.744896 | -73.770203 | 0 | 0 | 0 | (40.7448949, -73.7702044) | . | . | 2 | 0 | 0 | 0 | 0 | 0 | 2 | 0 | Fatigued/Drowsy | Fatigued/Drowsy | . | . | . | PASSENGER VEHICLE | PASSENGER VEHICLE | . | . | . | 0 | 0 | 0 | 24989 | POINT (1047925.420187834 210745.8332306232) | 749 | 11247 | 11247 | 0 | |
2 | 1 | 3474445 | 2016 | 05jul2016 | 1960-01-01 01:30:00 | 1 | . | . | 40.650494 | -74.011772 | 0 | 0 | 0 | (40.6504919, -74.0117738) | . | . | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | Unspecified | Unspecified | . | . | . | PASSENGER VEHICLE | PASSENGER VEHICLE | . | . | . | 0 | 0 | 0 | 24989 | POINT (980983.3829843847 176268.9676320423) | 30 | 5677 | 5677 | 0 | |
3 | 2 | 3467416 | 2016 | 22jun2016 | 1960-01-01 12:15:00 | 0 | . | . | 40.705044 | -73.95903 | 0 | 0 | 0 | (40.7050435, -73.9590323) | . | . | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | Unspecified | Unspecified | . | . | . | PASSENGER VEHICLE | PASSENGER VEHICLE | . | . | . | 0 | 0 | 0 | 24989 | POINT (995609.2936718578 196145.5996373807) | 111 | 7617 | 7617 | 0 | |
4 | 3 | 2966009 | 2014 | 11nov2014 | 1960-01-01 23:30:00 | 1 | . | . | 40.651398 | -74.010574 | 0 | 0 | 0 | (40.6513986, -74.0105739) | . | . | . | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | Driver Inattention/Distraction | Unspecified | . | . | . | PASSENGER VEHICLE | PASSENGER VEHICLE | . | . | . | 0 | 0 | 0 | . | POINT (981315.8004963149 176598.3076968596) | 114 | 5675 | 5675 | 0 |
5 | 4 | 2984443 | 2014 | 09nov2014 | 1960-01-01 06:02:00 | 0 | . | . | 40.700741 | -73.99498 | 0 | 0 | 0 | (40.7007397, -73.9949795) | . | . | . | 1 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | Fell Asleep | Unspecified | . | . | . | PASSENGER VEHICLE | SPORT UTILITY / STATION WAGON | . | . | . | 0 | 0 | 0 | . | POINT (985641.9735507105 194575.2794881121) | 459 | 4854 | 4854 | 0 |
6 | 5 | 3378303 | 2016 | 26jan2016 | 1960-01-01 16:27:00 | 0 | . | . | 40.586208 | -73.934166 | 0 | 0 | 0 | (40.5862074, -73.934164) | . | . | . | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | Unspecified | Unspecified | . | . | . | SPORT UTILITY / STATION WAGON | PASSENGER VEHICLE | . | . | . | 0 | 0 | 0 | . | POINT (1002535.674372012 152854.8011915441) | 780 | 6820 | 6820 | 0 |
7 | 6 | 3035164 | 2014 | 14aug2014 | 1960-01-01 15:55:00 | 0 | . | . | 40.739086 | -73.816101 | 0 | 0 | 0 | (40.7390852, -73.8160998) | . | . | . | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | Fatigued/Drowsy | Unspecified | Unspecified | . | . | PASSENGER VEHICLE | UNKNOWN | VAN | . | . | 0 | 0 | 0 | . | POINT (1035211.71333712 208599.1167361466) | 283 | 10828 | 10828 | 0 |
8 | 7 | 3121423 | 2012 | 16jul2012 | 1960-01-01 16:40:00 | 0 | . | . | 40.608223 | -74.129395 | 0 | 0 | 0 | (40.6082248, -74.1293966) | . | . | . | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | Turning Improperly | Unspecified | . | . | . | SPORT UTILITY / STATION WAGON | SPORT UTILITY / STATION WAGON | . | . | . | 0 | 0 | 0 | . | POINT (948321.9988617403 160894.9525950026) | 754 | 12676 | 12676 | 0 |
9 | 8 | 3468203 | 2016 | 18jun2016 | 1960-01-01 14:00:00 | 0 | . | . | 40.68877 | -73.999062 | 0 | 0 | 0 | (40.6887701, -73.999062) | . | . | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | Unspecified | Unspecified | . | . | . | PASSENGER VEHICLE | PASSENGER VEHICLE | . | . | . | 0 | 0 | 0 | 24989 | POINT (984510.248463902 190214.0274668361) | 172 | 5314 | 5314 | 0 | |
10 | 9 | 3725939 | 2017 | 28jul2017 | 1960-01-01 14:43:00 | 0 | . | . | 40.83609 | -73.870392 | 0 | 0 | 0 | (40.83609, -73.87039) | . | . | . | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | Backing Unsafely | Unspecified | . | . | . | PASSENGER VEHICLE | . | . | . | . | 0 | 0 | 0 | . | POINT (1020114.481894317 243914.0169084446) | 136 | 3959 | 3959 | 0 |
[303]:
gen month = month(date)
gen quarter = quarter(date)
gen collision_count = 1
[304]:
gen latenight_collision_count = latenight*collision_count
gen latenight_personsinjured = latenight*personsinjured
gen latenight_personskilled = latenight*personskilled
gen latenight_pedestriansinjured = latenight*pedestriansinjured
gen latenight_pedestrianskilled = latenight*pedestrianskilled
gen latenight_cyclistinjured = latenight*cyclistinjured
gen latenight_cyclistkilled = latenight*cyclistkilled
gen latenight_motoristinjured = latenight*motoristinjured
gen latenight_motoristkilled = latenight*motoristkilled
gen day = (latenight == 0)
gen day_collision_count = day*collision_count
gen day_personsinjured = day*personsinjured
gen day_personskilled = day*personskilled
gen day_pedestriansinjured = day*pedestriansinjured
gen day_pedestrianskilled = day*pedestrianskilled
gen day_cyclistinjured = day*cyclistinjured
gen day_cyclistkilled = day*cyclistkilled
gen day_motoristinjured = day*motoristinjured
gen day_motoristkilled = day*motoristkilled
(1 missing value generated)
(1 missing value generated)
(1 missing value generated)
(1 missing value generated)
(1 missing value generated)
(1 missing value generated)
(1 missing value generated)
(1 missing value generated)
(1 missing value generated)
(1 missing value generated)
(1 missing value generated)
(1 missing value generated)
(1 missing value generated)
(1 missing value generated)
(1 missing value generated)
(1 missing value generated)
[305]:
global counts collision_count latenight_collision_count day_collision_count
global all_outcome personsinjured personskilled pedestriansinjured pedestrianskilled cyclistinjured cyclistkilled motoristinjured motoristkilled
global late_outcome latenight_personsinjured latenight_personskilled latenight_pedestriansinjured latenight_pedestrianskilled latenight_cyclistinjured latenight_cyclistkilled latenight_motoristinjured latenight_motoristkilled
global day_outcome day_personsinjured day_personskilled day_pedestriansinjured day_pedestrianskilled day_cyclistinjured day_cyclistkilled day_motoristinjured day_motoristkilled
[306]:
preserve
collapse ///
(sum) "$counts $all_outcome $late_outcome $day_outcome" ///
, by(intersection_id month year )
//save "..\working_data\collision_monthly.dta",replace
restore
[ ]:
preserve
collapse ///
(sum) "$counts $all_outcome $late_outcome $day_outcome" ///
, by(intersection_id quarter year)
//save "..\working_data\collision_quarterly.dta",replace
restore
6. Set up Quarterly Panel data¶
stata
[307]:
clear
[308]:
cd
C:\Users\jerem\Box Sync\Policy Evaluation\dofiles
[309]:
// There are 13213 signal intersections
global intersect = 13213+1
global periods_per_year = 4
global obs = $periods_per_year*7*($intersect)
display $obs
set obs $obs
369992
number of observations (_N) was 0, now 369,992
[310]:
egen intersection_id = seq(), from(1) to($intersect)
sort intersection_id
[311]:
egen quarter = seq(), from(1) to (4) by(intersection_id)
egen year = seq(), from(2012) to (2018) block(4)
[312]:
gen quarterly = yq(year,quarter)
format quarterly %tq
[313]:
//These are outside of the range of our data
drop if quarter <= 2 & year == 2012
drop if quarter >= 4 & year == 2018
(26,428 observations deleted)
(13,214 observations deleted)
6.a. Merge in Intersection Data¶
[314]:
// Intersection level data
mmerge intersection_id using "..\working_data\signal_intersection.dta", ///
type(n:1) ///
unmatched(using) ///
umatch(intersection_id)
-------------------------------------------------------------------------------
merge specs |
matching type | n:1
mv's on match vars | none
unmatched obs from | using
---------------------+---------------------------------------------------------
master file | <data in memory not named>
obs | 330350
vars | 4
match vars | intersection_id (not a key)
-------------------+---------------------------------------------------------
using file | ..\working_data\signal_intersection.dta
obs | 13213
vars | 38
match vars | intersection_id (key)
---------------------+---------------------------------------------------------
variable intersection_id does not uniquely identify observations in the master data
result file | <data in memory not named>
obs | 330325
vars | 43 (including _merge)
------------+---------------------------------------------------------
_merge | 330325 obs both in master and using data (code==3)
-------------------------------------------------------------------------------
[315]:
describe
Contains data
obs: 330,325
vars: 42
size: 153,270,800
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
storage display value
variable name type format label variable label
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
intersection_id int %8.0g
quarter byte %8.0g
year int %8.0g
quarterly int %tq
index int %12.0g
y double %10.0g
x double %10.0g
st1_name str31 %31s
st2_name str32 %32s
st3_name str35 %35s
st4_name str36 %36s
dup byte %8.0g
Coordinates str42 %42s
distance_to_L~S int %12.0g
nearest_LPIS int %12.0g
date_insta str10 %10s
distance_to_S~t int %12.0g
nearest_Street long %12.0g
Number_Tra str4 %4s
Number_Par str4 %4s
Number_Tot str4 %4s
TrafDir str4 %4s
distance_to_Sch int %12.0g
nearest_Sch long %10.0g
distance_to_LTC long %12.0g
nearest_LTC int %12.0g
treatment_ str80 %80s
completion str10 %10s
distance_to_S~o long %12.0g
nearest_StImpro int %12.0g
DateComplt str10 %10s
SIPProjTyp str53 %53s
distance_to_b~e int %12.0g
nearest_biker~e int %10.0g
distance_to_t~e int %12.0g
nearest_truck~e long %12.0g
bname str13 %13s
distance_to_p~s int %12.0g
nearest_prior~s int %12.0g
Name str26 %26s
seniors_ID byte %10.0g
_merge byte %32.0g __MERGE
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Sorted by:
Note: Dataset has changed since last saved.
6.b. Create borough dummies¶
[316]:
tab bname,m
bname | Freq. Percent Cum.
--------------+-----------------------------------
Bronx | 42,600 12.90 12.90
Brooklyn | 117,400 35.54 48.44
Manhattan | 71,775 21.73 70.17
Queens | 82,800 25.07 95.23
Staten Island | 15,650 4.74 99.97
nan | 100 0.03 100.00
--------------+-----------------------------------
Total | 330,325 100.00
[317]:
gen bronx = (bname == "Bronx")
gen brooklyn = (bname == "Brooklyn")
gen manhattan = (bname == "Manhattan")
gen queens = (bname == "Queens")
gen statenisland = (bname == "Staten Island")
6.c. Nearest Bike Route¶
[320]:
destring nearest_bikeroute,replace
nearest_bikeroute already numeric; no replace
[321]:
mmerge nearest_bikeroute using "..\working_data\bike_routes_2263.dta", ///
type(n:1) ///
unmatched(master) ///
umatch(objectid_1) ///
ukeep(date_instd tf_facilit)
-------------------------------------------------------------------------------
merge specs |
matching type | n:1
mv's on match vars | none
unmatched obs from | master
---------------------+---------------------------------------------------------
master file | <data in memory not named>
obs | 330325
vars | 46
match vars | nearest_bikeroute (not a key)
-------------------+---------------------------------------------------------
using file | ..\working_data\bike_routes_2263.dta
obs | 14980
vars | 3 (selection via udrop/ukeep)
match vars | objectid_1 (key)
---------------------+---------------------------------------------------------
variable nearest_bikeroute does not uniquely identify observations in the master data
result file | <data in memory not named>
obs | 330325
vars | 50 (including _merge)
------------+---------------------------------------------------------
_merge | 330325 obs both in master and using data (code==3)
-------------------------------------------------------------------------------
[322]:
gen bike_route_install_dt = date(date_instd, "YMD"), after(date_instd)
format bike_route_install_dt %td
[323]:
sum bike_route_install_dt, format
Variable | Obs Mean Std. Dev. Min Max
-------------+---------------------------------------------------------
bike_route~t | 330,325 24nov2002 7771.114 01jan1900 23dec2016
[324]:
gen bike_route_install_mt = mofd(bike_route_install_dt)
format bike_route_install_mt %tm
[325]:
sum bike_route_install_mt, format
Variable | Obs Mean Std. Dev. Min Max
-------------+---------------------------------------------------------
bike_rout~mt | 330,325 2002m11 255.2326 1900m1 2016m12
[326]:
gen bike_route_install_qt = qofd(bike_route_install_dt)
format bike_route_install_qt %tq
[327]:
sum bike_route_install_qt, format
Variable | Obs Mean Std. Dev. Min Max
-------------+---------------------------------------------------------
bike_rout~qt | 330,325 2002q4 85.00089 1900q1 2016q4
Determine if collision occured on bike route¶
[328]:
sum distance_to_bikeroute, detail
distance_to_bikeroute
-------------------------------------------------------------
Percentiles Smallest
1% 0 0
5% 0 0
10% 1 0 Obs 330,325
25% 3 0 Sum of Wgt. 330,325
50% 537 Mean 1023.112
Largest Std. Dev. 1365.944
75% 1416 8907
90% 2871 8907 Variance 1865802
95% 4039 8907 Skewness 2.00231
99% 6101 8907 Kurtosis 7.320251
[329]:
sum distance_to_bikeroute if distance_to_bikeroute < r(p50) , detail
distance_to_bikeroute
-------------------------------------------------------------
Percentiles Smallest
1% 0 0
5% 0 0
10% 0 0 Obs 165,025
25% 1 0 Sum of Wgt. 165,025
50% 3 Mean 122.9515
Largest Std. Dev. 176.5845
75% 262 536
90% 447 536 Variance 31182.09
95% 507 536 Skewness 1.105402
99% 530 536 Kurtosis 2.69221
[330]:
// Time invariant bike route indicator
gen bike_route_ever = distance_to_bikeroute <= 3
label variable bike_route_ever "Indicates that bike route passes through this intersection (<3 ft)"
[331]:
tab bike_route_ever
Indicates |
that bike |
route |
passes |
through |
this |
intersectio |
n (<3 ft) | Freq. Percent Cum.
------------+-----------------------------------
0 | 245,200 74.23 74.23
1 | 85,125 25.77 100.00
------------+-----------------------------------
Total | 330,325 100.00
[333]:
// Time-variant bike route indicator
gen bike_route_tv = 0
replace bike_route_tv = 1 if quarterly >= bike_route_install_qt & bike_route_ever == 1
label variable bike_route_tv "Indicates that bike route passes through this intersection timevarient(<3 ft)"
(74,140 real changes made)
[334]:
tab bike_route_tv
Indicates |
that bike |
route |
passes |
through |
this |
intersectio |
n |
timevarient |
(<3 ft) | Freq. Percent Cum.
------------+-----------------------------------
0 | 256,185 77.56 77.56
1 | 74,140 22.44 100.00
------------+-----------------------------------
Total | 330,325 100.00
[347]:
tab bike_route_tv year
Indicates |
that bike |
route |
passes |
through |
this |
intersecti |
on |
timevarien | year
t(<3 ft) | 2012 2013 2014 2015 2016 2017 2018 | Total
-----------+-----------------------------------------------------------------------------+----------
0 | 21,681 42,808 41,981 41,054 40,005 39,232 29,424 | 256,185
1 | 4,745 10,044 10,871 11,798 12,847 13,620 10,215 | 74,140
-----------+-----------------------------------------------------------------------------+----------
Total | 26,426 52,852 52,852 52,852 52,852 52,852 39,639 | 330,325
6.d. Left Turn Calming Intervention¶
[335]:
rename treatment_ left_turn_treatment
[336]:
gen left_turn_install_dt = date(completion, "YMD"), after(completion)
format left_turn_install_dt %td
gen int left_turn_install_mt = mofd(left_turn_install_dt), after(left_turn_install_dt)
format left_turn_install_mt %tm
gen int left_turn_install_qt = qofd(left_turn_install_dt), after(left_turn_install_dt)
format left_turn_install_qt %tq
Matching intersection to closest left turn calming intervention¶
[337]:
egen double left_turn_min = min(distance_to_LTC), by(nearest_LTC)
[338]:
gen flag_left_turn_ever = .
replace flag_left_turn_ever = 1 if left_turn_min == distance_to_LTC
replace flag_left_turn_ever = 0 if mi(flag_left_turn_ever)
(330,325 missing values generated)
(5,800 real changes made)
(324,525 real changes made)
[339]:
gen flag_left_turn = flag_left_turn_ever
replace flag_left_turn = 0 if flag_left_turn == 1 & quarterly < left_turn_install_qt
(4,359 real changes made)
[345]:
tab flag_left_turn year
flag_left_ | year
turn | 2012 2013 2014 2015 2016 2017 2018 | Total
-----------+-----------------------------------------------------------------------------+----------
0 | 26,426 52,852 52,852 52,852 52,676 52,266 38,960 | 328,884
1 | 0 0 0 0 176 586 679 | 1,441
-----------+-----------------------------------------------------------------------------+----------
Total | 26,426 52,852 52,852 52,852 52,852 52,852 39,639 | 330,325
6.e. Street Improvement Project intervention¶
[340]:
rename SIPProjTyp street_improv_treatment
[341]:
gen street_improv_install_dt = date(DateComplt, "YMD"), after(DateComplt)
format street_improv_install_dt %td
gen int street_improv_install_mt = mofd(street_improv_install_dt), after(street_improv_install_dt)
format street_improv_install_mt %tm
gen int street_improv_install_qt = qofd(street_improv_install_dt), after(street_improv_install_dt)
format street_improv_install_qt %tq
Matching intersection to closest street improvement project¶
[342]:
egen double street_improv_min = min(distance_to_StImpro), by(nearest_StImpro)
[343]:
gen flag_street_improv_ever = .
replace flag_street_improv_ever = 1 if street_improv_min == distance_to_StImpro
replace flag_street_improv_ever = 0 if mi(flag_street_improv_ever)
(330,325 missing values generated)
(9,125 real changes made)
(321,200 real changes made)
[344]:
gen flag_street_improv = flag_street_improv_ever
replace flag_street_improv = 0 if flag_street_improv == 1 & quarterly < street_improv_install_qt
(4,592 real changes made)
[346]:
tab flag_street_improv year
flag_stree | year
t_improv | 2012 2013 2014 2015 2016 2017 2018 | Total
-----------+-----------------------------------------------------------------------------+----------
0 | 26,297 52,556 52,449 52,234 51,957 51,704 38,595 | 325,792
1 | 129 296 403 618 895 1,148 1,044 | 4,533
-----------+-----------------------------------------------------------------------------+----------
Total | 26,426 52,852 52,852 52,852 52,852 52,852 39,639 | 330,325
6.f. Leading Pedestrian Interval Signal¶
[348]:
gen LPIS_install_dt = date(date_insta, "YMD")
format LPIS_install_dt %td
gen int LPIS_install_mt = mofd(LPIS_install_dt), after(LPIS_install_dt)
format LPIS_install_mt %tm
gen int LPIS_install_qt = qofd(LPIS_install_dt), after(LPIS_install_mt)
format LPIS_install_qt %tq
gen LPIS_install_yr = year(LPIS_install_dt), after(LPIS_install_qt)
Matching intersection to closest leading pedestrian interval signal¶
[349]:
egen double LPIS_min = min(distance_to_LPIS), by(nearest_LPIS)
[350]:
gen flag_LPIS_ever = .
replace flag_LPIS_ever = 1 if LPIS_min == distance_to_LPIS
replace flag_LPIS_ever = 0 if mi(flag_LPIS_ever)
(330,325 missing values generated)
(72,875 real changes made)
(257,450 real changes made)
[351]:
gen flag_LPIS = flag_LPIS_ever
[356]:
// Replace with indicator with 0 if quarter was before LPIS installation
replace flag_LPIS = 0 if flag_LPIS == 1 & quarterly < LPIS_install_qt
(47,593 real changes made)
[368]:
label variable flag_LPIS "Indicates the intersection at the time they became an LPIS intervention"
label variable flag_LPIS_ever "Indicates the intersection if they ever received LPIS intervention"
Problem: Certain LPIs intersections had been install before collision data¶
[357]:
sum quarterly, detail format
quarterly
-------------------------------------------------------------
Percentiles Smallest
1% 2012q3 2012q3
5% 2012q4 2012q3
10% 2013q1 2012q3 Obs 330,325
25% 2014q1 2012q3 Sum of Wgt. 330,325
50% 2015q3 Mean 2015q3
Largest Std. Dev. 7.211113
75% 2017q1 2018q3
90% 2018q1 2018q3 Variance 52.00016
95% 2018q2 2018q3 Skewness 0
99% 2018q3 2018q3 Kurtosis 1.796154
[358]:
sum LPIS_install_qt, detail format
LPIS_install_qt
-------------------------------------------------------------
Percentiles Smallest
1% 1899q4 1899q4
5% 2004q1 1899q4
10% 2013q2 1899q4 Obs 330,325
25% 2015q4 1899q4 Sum of Wgt. 330,325
50% 2016q3 Mean 2014q2
Largest Std. Dev. 51.7974
75% 2017q3 2018q3
90% 2018q2 2018q3 Variance 2682.971
95% 2018q3 2018q3 Skewness -7.818992
99% 2018q3 2018q3 Kurtosis 68.19259
We will remove intersections that cannot be analyzed because LPIs was implemented before 01jul2012
[359]:
tab flag_LPIS if LPIS_install_qt < tq(2013q1)
flag_LPIS | Freq. Percent Cum.
------------+-----------------------------------
0 | 26,477 82.42 82.42
1 | 5,648 17.58 100.00
------------+-----------------------------------
Total | 32,125 100.00
[360]:
tab intersection_id if LPIS_install_qt < tq(2013q1) & flag_LPIS_ever == 1
intersectio |
n_id | Freq. Percent Cum.
------------+-----------------------------------
7 | 25 0.44 0.44
8 | 25 0.44 0.88
17 | 25 0.44 1.33
57 | 25 0.44 1.77
58 | 25 0.44 2.21
61 | 25 0.44 2.65
83 | 25 0.44 3.10
109 | 25 0.44 3.54
149 | 25 0.44 3.98
151 | 25 0.44 4.42
161 | 25 0.44 4.87
230 | 25 0.44 5.31
231 | 25 0.44 5.75
238 | 25 0.44 6.19
239 | 25 0.44 6.64
243 | 25 0.44 7.08
255 | 25 0.44 7.52
256 | 25 0.44 7.96
280 | 25 0.44 8.41
282 | 25 0.44 8.85
286 | 25 0.44 9.29
317 | 25 0.44 9.73
333 | 25 0.44 10.18
449 | 25 0.44 10.62
454 | 25 0.44 11.06
468 | 25 0.44 11.50
511 | 25 0.44 11.95
547 | 25 0.44 12.39
550 | 25 0.44 12.83
557 | 25 0.44 13.27
560 | 25 0.44 13.72
567 | 25 0.44 14.16
582 | 25 0.44 14.60
601 | 25 0.44 15.04
605 | 25 0.44 15.49
606 | 25 0.44 15.93
620 | 25 0.44 16.37
624 | 25 0.44 16.81
628 | 25 0.44 17.26
629 | 25 0.44 17.70
633 | 25 0.44 18.14
642 | 25 0.44 18.58
661 | 25 0.44 19.03
676 | 25 0.44 19.47
692 | 25 0.44 19.91
693 | 25 0.44 20.35
724 | 25 0.44 20.80
800 | 25 0.44 21.24
813 | 25 0.44 21.68
827 | 25 0.44 22.12
858 | 25 0.44 22.57
862 | 25 0.44 23.01
903 | 25 0.44 23.45
914 | 25 0.44 23.89
940 | 25 0.44 24.34
949 | 25 0.44 24.78
964 | 25 0.44 25.22
981 | 25 0.44 25.66
984 | 25 0.44 26.11
986 | 25 0.44 26.55
993 | 25 0.44 26.99
1066 | 25 0.44 27.43
1070 | 25 0.44 27.88
1092 | 25 0.44 28.32
1101 | 25 0.44 28.76
1140 | 25 0.44 29.20
1188 | 25 0.44 29.65
1195 | 25 0.44 30.09
1201 | 25 0.44 30.53
1202 | 25 0.44 30.97
1205 | 25 0.44 31.42
1207 | 25 0.44 31.86
1211 | 25 0.44 32.30
1221 | 25 0.44 32.74
1228 | 25 0.44 33.19
1235 | 25 0.44 33.63
1237 | 25 0.44 34.07
1287 | 25 0.44 34.51
1290 | 25 0.44 34.96
1321 | 25 0.44 35.40
1372 | 25 0.44 35.84
1420 | 25 0.44 36.28
1448 | 25 0.44 36.73
1450 | 25 0.44 37.17
1451 | 25 0.44 37.61
1461 | 25 0.44 38.05
1467 | 25 0.44 38.50
1486 | 25 0.44 38.94
1487 | 25 0.44 39.38
1500 | 25 0.44 39.82
1506 | 25 0.44 40.27
1507 | 25 0.44 40.71
1532 | 25 0.44 41.15
1551 | 25 0.44 41.59
1558 | 25 0.44 42.04
1563 | 25 0.44 42.48
1574 | 25 0.44 42.92
1575 | 25 0.44 43.36
1663 | 25 0.44 43.81
1664 | 25 0.44 44.25
1730 | 25 0.44 44.69
1731 | 25 0.44 45.13
1736 | 25 0.44 45.58
1742 | 25 0.44 46.02
1759 | 25 0.44 46.46
1760 | 25 0.44 46.90
1777 | 25 0.44 47.35
1788 | 25 0.44 47.79
1795 | 25 0.44 48.23
1826 | 25 0.44 48.67
1827 | 25 0.44 49.12
1838 | 25 0.44 49.56
1845 | 25 0.44 50.00
1863 | 25 0.44 50.44
1898 | 25 0.44 50.88
1912 | 25 0.44 51.33
1914 | 25 0.44 51.77
1916 | 25 0.44 52.21
1923 | 25 0.44 52.65
1940 | 25 0.44 53.10
1941 | 25 0.44 53.54
2025 | 25 0.44 53.98
2045 | 25 0.44 54.42
2102 | 25 0.44 54.87
2217 | 25 0.44 55.31
2321 | 25 0.44 55.75
2381 | 25 0.44 56.19
2387 | 25 0.44 56.64
2487 | 25 0.44 57.08
2510 | 25 0.44 57.52
2539 | 25 0.44 57.96
2667 | 25 0.44 58.41
2670 | 25 0.44 58.85
2673 | 25 0.44 59.29
2825 | 25 0.44 59.73
3005 | 25 0.44 60.18
3140 | 25 0.44 60.62
3455 | 25 0.44 61.06
3561 | 25 0.44 61.50
4090 | 25 0.44 61.95
4340 | 25 0.44 62.39
4701 | 25 0.44 62.83
4735 | 25 0.44 63.27
5042 | 25 0.44 63.72
5054 | 25 0.44 64.16
5276 | 25 0.44 64.60
5278 | 25 0.44 65.04
5285 | 25 0.44 65.49
5314 | 25 0.44 65.93
5345 | 25 0.44 66.37
5346 | 25 0.44 66.81
5348 | 25 0.44 67.26
5363 | 25 0.44 67.70
5385 | 25 0.44 68.14
5398 | 25 0.44 68.58
5418 | 25 0.44 69.03
5429 | 25 0.44 69.47
5430 | 25 0.44 69.91
5467 | 25 0.44 70.35
5514 | 25 0.44 70.80
5566 | 25 0.44 71.24
5638 | 25 0.44 71.68
5685 | 25 0.44 72.12
5708 | 25 0.44 72.57
5761 | 25 0.44 73.01
5805 | 25 0.44 73.45
5827 | 25 0.44 73.89
6030 | 25 0.44 74.34
6152 | 25 0.44 74.78
6153 | 25 0.44 75.22
6179 | 25 0.44 75.66
6255 | 25 0.44 76.11
6318 | 25 0.44 76.55
6428 | 25 0.44 76.99
6751 | 25 0.44 77.43
6807 | 25 0.44 77.88
6822 | 25 0.44 78.32
6868 | 25 0.44 78.76
7316 | 25 0.44 79.20
7506 | 25 0.44 79.65
9193 | 25 0.44 80.09
9299 | 25 0.44 80.53
9516 | 25 0.44 80.97
9821 | 25 0.44 81.42
9830 | 25 0.44 81.86
9831 | 25 0.44 82.30
9832 | 25 0.44 82.74
9833 | 25 0.44 83.19
9834 | 25 0.44 83.63
9835 | 25 0.44 84.07
9836 | 25 0.44 84.51
9837 | 25 0.44 84.96
9838 | 25 0.44 85.40
9839 | 25 0.44 85.84
9840 | 25 0.44 86.28
9841 | 25 0.44 86.73
9842 | 25 0.44 87.17
9843 | 25 0.44 87.61
9844 | 25 0.44 88.05
9845 | 25 0.44 88.50
9846 | 25 0.44 88.94
9942 | 25 0.44 89.38
9993 | 25 0.44 89.82
10016 | 25 0.44 90.27
10037 | 25 0.44 90.71
10040 | 25 0.44 91.15
10190 | 25 0.44 91.59
10309 | 25 0.44 92.04
10456 | 25 0.44 92.48
10576 | 25 0.44 92.92
10676 | 25 0.44 93.36
10818 | 25 0.44 93.81
10825 | 25 0.44 94.25
11024 | 25 0.44 94.69
11027 | 25 0.44 95.13
11268 | 25 0.44 95.58
11284 | 25 0.44 96.02
11358 | 25 0.44 96.46
11719 | 25 0.44 96.90
12266 | 25 0.44 97.35
12606 | 25 0.44 97.79
12627 | 25 0.44 98.23
12655 | 25 0.44 98.67
12772 | 25 0.44 99.12
12891 | 25 0.44 99.56
12895 | 25 0.44 100.00
------------+-----------------------------------
Total | 5,650 100.00
[361]:
drop if LPIS_install_qt < tq(2013q1) & flag_LPIS_ever == 1
(5,650 observations deleted)
6.g. Safe Streets for Seniors¶
[369]:
codebook Name
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Name (unlabeled)
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
type: string (str26)
unique values: 42 missing "": 0/324,675
examples: "nan"
"nan"
"nan"
"nan"
warning: variable has embedded blanks
[370]:
tab Name
Name | Freq. Percent Cum.
---------------------------+-----------------------------------
Astoria | 775 0.24 0.24
Bath Beach | 975 0.30 0.54
Bay Ridge | 1,700 0.52 1.06
Bedford Stuyvesant | 4,600 1.42 2.48
Bensonhurst | 2,475 0.76 3.24
Borough Park | 5,000 1.54 4.78
Brighton Beach | 225 0.07 4.85
Brownsville | 5,100 1.57 6.42
Chelsea/9th Avenue | 1,900 0.59 7.01
Chinatown | 475 0.15 7.15
Corona Elmhurst | 7,125 2.19 9.35
East Concourse | 450 0.14 9.49
East Flatbush | 600 0.18 9.67
East Harlem | 2,625 0.81 10.48
East Village | 125 0.04 10.52
Flatbush | 1,225 0.38 10.90
Flushing | 700 0.22 11.11
Fordham/University Heights | 600 0.18 11.30
Forest Hills | 1,200 0.37 11.67
Greenpoint | 450 0.14 11.80
Hamilton Heights | 1,050 0.32 12.13
Highbridge/Lower Concourse | 3,600 1.11 13.24
Jackson Heights | 1,150 0.35 13.59
Jamaica Hills | 125 0.04 13.63
Kings Bay - Gerritsen | 1,875 0.58 14.21
Kingsbridge | 600 0.18 14.39
Lenox Hill - Turtle Bay | 3,225 0.99 15.38
Lower East Side | 800 0.25 15.63
Manhattan Valley | 3,175 0.98 16.61
Middle Village - Glendale | 1,175 0.36 16.97
Midwood | 300 0.09 17.06
Mott Haven | 1,075 0.33 17.39
New Dorp/Hyland Boulevard | 200 0.06 17.46
Pelham Gardens | 75 0.02 17.48
Rego Park | 250 0.08 17.56
Sheepshead Bay | 750 0.23 17.79
South Beach | 100 0.03 17.82
Sunnyside | 625 0.19 18.01
Upper West Side | 1,675 0.52 18.53
Washington Heights | 1,375 0.42 18.95
Yorkville | 2,875 0.89 19.84
nan | 260,275 80.16 100.00
---------------------------+-----------------------------------
Total | 324,675 100.00
[371]:
gen flag_seniors = (strpos(Name, "nan") == 0)
[373]:
tab flag_seniors
flag_senior |
s | Freq. Percent Cum.
------------+-----------------------------------
0 | 260,275 80.16 80.16
1 | 64,400 19.84 100.00
------------+-----------------------------------
Total | 324,675 100.00
[374]:
tab flag_seniors flag_LPIS
| Indicates the
| intersection at the
| time they became an
flag_senio | LPIS intervention
rs | 0 1 | Total
-----------+----------------------+----------
0 | 246,705 13,570 | 260,275
1 | 58,336 6,064 | 64,400
-----------+----------------------+----------
Total | 305,041 19,634 | 324,675
6.h. Priority Intersection¶
[377]:
sum distance_to_p*, detail
distance_to_priorityinters
-------------------------------------------------------------
Percentiles Smallest
1% 3 0
5% 372 0
10% 640 0 Obs 324,675
25% 1110 0 Sum of Wgt. 324,675
50% 1991 Mean 2603.942
Largest Std. Dev. 2534.222
75% 3221 28481
90% 4978 28481 Variance 6422279
95% 6561 28481 Skewness 3.67198
99% 13842 28481 Kurtosis 23.89315
[376]:
gen flag_priorityinters = (distance_to_priorityinters < 10)
[379]:
tab flag_priorityinters
flag_priori |
tyinters | Freq. Percent Cum.
------------+-----------------------------------
0 | 319,200 98.31 98.31
1 | 5,475 1.69 100.00
------------+-----------------------------------
Total | 324,675 100.00
[378]:
tab flag_priorityinters flag_LPIS
| Indicates the
| intersection at the
| time they became an
flag_prior | LPIS intervention
ityinters | 0 1 | Total
-----------+----------------------+----------
0 | 300,649 18,551 | 319,200
1 | 4,392 1,083 | 5,475
-----------+----------------------+----------
Total | 305,041 19,634 | 324,675
6.i. School intersection/crossing¶
[380]:
sum distance_to_Sch,detail
distance_to_Sch
-------------------------------------------------------------
Percentiles Smallest
1% 125 17
5% 197 17
10% 246 17 Obs 324,675
25% 425 17 Sum of Wgt. 324,675
50% 692 Mean 822.7985
Largest Std. Dev. 612.1056
75% 1055 10685
90% 1511 10685 Variance 374673.3
95% 1872 10685 Skewness 3.568631
99% 2871 10685 Kurtosis 35.5307
[381]:
gen flag_school = (distance_to_Sch < 200)
[382]:
tab flag_school
flag_school | Freq. Percent Cum.
------------+-----------------------------------
0 | 307,775 94.79 94.79
1 | 16,900 5.21 100.00
------------+-----------------------------------
Total | 324,675 100.00
[383]:
tab flag_school flag_LPIS
| Indicates the
| intersection at the
| time they became an
flag_schoo | LPIS intervention
l | 0 1 | Total
-----------+----------------------+----------
0 | 289,690 18,085 | 307,775
1 | 15,351 1,549 | 16,900
-----------+----------------------+----------
Total | 305,041 19,634 | 324,675
6.j. Decay effect¶
[362]:
sort intersection_id quarterly
[363]:
bysort intersection_id: gen flag_LPIS_1yr = 1 if quarterly == (LPIS_install_qt+1)
bysort intersection_id: replace flag_LPIS_1yr = 1 if quarterly == (LPIS_install_qt+2)
bysort intersection_id: replace flag_LPIS_1yr = 1 if quarterly == (LPIS_install_qt+3)
bysort intersection_id: replace flag_LPIS_1yr = 1 if quarterly == (LPIS_install_qt+4)
replace flag_LPIS_1yr = 0 if missing(flag_LPIS_1yr)
replace flag_LPIS_1yr = 0 if flag_LPIS_ever == 0
(313,505 missing values generated)
(10,265 real changes made)
(9,357 real changes made)
(8,784 real changes made)
(285,099 real changes made)
(30,934 real changes made)
[364]:
bysort intersection_id: gen flag_LPIS_2yr = 1 if quarterly == (LPIS_install_qt+5)
bysort intersection_id: replace flag_LPIS_2yr = 1 if quarterly == (LPIS_install_qt+6)
bysort intersection_id: replace flag_LPIS_2yr = 1 if quarterly == (LPIS_install_qt+7)
bysort intersection_id: replace flag_LPIS_2yr = 1 if quarterly == (LPIS_install_qt+8)
replace flag_LPIS_2yr = 0 if missing(flag_LPIS_2yr)
replace flag_LPIS_2yr = 0 if flag_LPIS_ever == 0
(316,621 missing values generated)
(7,091 real changes made)
(6,147 real changes made)
(5,611 real changes made)
(297,772 real changes made)
(21,441 real changes made)
[365]:
bysort intersection_id: gen flag_LPIS_2yrup = 1 if quarterly == (LPIS_install_qt+5)
bysort intersection_id: replace flag_LPIS_2yrup = 1 if flag_LPIS_2yrup[_n-1] == 1
replace flag_LPIS_2yrup = 0 if missing(flag_LPIS_2yrup)
replace flag_LPIS_2yrup = 0 if flag_LPIS_ever == 0
(316,621 missing values generated)
(35,482 real changes made)
(281,139 real changes made)
(35,233 real changes made)
[366]:
bysort intersection_id: gen flag_LPIS_3yrup = 1 if quarterly == (LPIS_install_qt+9)
bysort intersection_id: replace flag_LPIS_3yrup = 1 if flag_LPIS_3yrup[_n-1] == 1
replace flag_LPIS_3yrup = 0 if missing(flag_LPIS_3yrup)
replace flag_LPIS_3yrup = 0 if flag_LPIS_ever == 0
(320,275 missing values generated)
(14,683 real changes made)
(305,592 real changes made)
(16,242 real changes made)
6.k. Panel is strongly balanced¶
[367]:
duplicates report intersection_id
Duplicates in terms of intersection_id
--------------------------------------
copies | observations surplus
----------+---------------------------
25 | 324675 311688
--------------------------------------
6.l. Merge outcomes data¶
[384]:
// Outcomes data
mmerge quarter year intersection_id using "..\working_data\collision_quarterly.dta", ///
type(1:1) ///
unmatched(master)
-------------------------------------------------------------------------------
merge specs |
matching type | 1:1
mv's on match vars | none
unmatched obs from | master
---------------------+---------------------------------------------------------
master file | <data in memory not named>
obs | 324675
vars | 79
match vars | quarter year intersection_id (key)
-------------------+---------------------------------------------------------
using file | ..\working_data\collision_quarterly.dta
obs | 195341
vars | 30
match vars | quarter year intersection_id (key)
---------------------+---------------------------------------------------------
result file | <data in memory not named>
obs | 324675
vars | 108 (including _merge)
------------+---------------------------------------------------------
_merge | 133939 obs only in master data (code==1)
| 190736 obs both in master and using data (code==3)
-------------------------------------------------------------------------------
6.m. Fill in missing information¶
[385]:
// Fill in '0' for intersection quarter year that did not have collisions
replace personsinjured = 0 if _merge == 1
replace personskilled = 0 if _merge == 1
replace pedestriansinjured = 0 if _merge == 1
replace pedestrianskilled = 0 if _merge == 1
replace cyclistinjured = 0 if _merge == 1
replace cyclistkilled = 0 if _merge == 1
replace motoristinjured = 0 if _merge == 1
replace motoristkilled = 0 if _merge == 1
replace collision_count = 0 if mi(collision_count)
replace latenight_personsinjured = 0 if _merge == 1
replace latenight_personskilled = 0 if _merge == 1
replace latenight_pedestriansinjured = 0 if _merge == 1
replace latenight_pedestrianskilled = 0 if _merge == 1
replace latenight_cyclistinjured = 0 if _merge == 1
replace latenight_cyclistkilled = 0 if _merge == 1
replace latenight_motoristinjured = 0 if _merge == 1
replace latenight_motoristkilled = 0 if _merge == 1
replace latenight_collision_count = 0 if mi(latenight_collision_count)
replace day_personsinjured = 0 if _merge == 1
replace day_personskilled = 0 if _merge == 1
replace day_pedestriansinjured = 0 if _merge == 1
replace day_pedestrianskilled = 0 if _merge == 1
replace day_cyclistinjured = 0 if _merge == 1
replace day_cyclistkilled = 0 if _merge == 1
replace day_motoristinjured = 0 if _merge == 1
replace day_motoristkilled = 0 if _merge == 1
replace day_collision_count = 0 if mi(day_collision_count)
(133,939 real changes made)
(133,939 real changes made)
(133,939 real changes made)
(133,939 real changes made)
(133,939 real changes made)
(133,939 real changes made)
(133,939 real changes made)
(133,939 real changes made)
(133,939 real changes made)
(133,939 real changes made)
(133,939 real changes made)
(133,939 real changes made)
(133,939 real changes made)
(133,939 real changes made)
(133,939 real changes made)
(133,939 real changes made)
(133,939 real changes made)
(133,939 real changes made)
(133,939 real changes made)
(133,939 real changes made)
(133,939 real changes made)
(133,939 real changes made)
(133,939 real changes made)
(133,939 real changes made)
(133,939 real changes made)
(133,939 real changes made)
(133,939 real changes made)
[387]:
gen flag_collision = collision_count > 0
gen latenight_flag_collision = latenight_collision_count > 0
gen day_flag_collision = day_collision_count > 0
Label outcomes variables
[388]:
label variable personsinjured "No. of persons injured during that month"
label variable personskilled "No. of persons killed during that month"
label variable pedestriansinjured "No. of pedestrians injured during that month"
label variable pedestrianskilled "No. of pedestrians killed during that month"
label variable cyclistinjured "No. of cyclist injured during that month"
label variable cyclistkilled "No. of cyclist killed during that month"
label variable motoristinjured "No. of motorist injured during that month"
label variable motoristkilled "No. of motorist killed during that month"
label variable collision_count "No. of collisions occured during that month"
label variable flag_collision "At least one collision occurred that month"
label variable latenight_personsinjured "No. of persons injured during that month between 11pm - 5am"
label variable latenight_personskilled "No. of persons killed during that month between 11pm - 5am"
label variable latenight_pedestriansinjured "No. of pedestrians injured during that month between 11pm - 5am"
label variable latenight_pedestrianskilled "No. of pedestrians killed during that month between 11pm - 5am"
label variable latenight_cyclistinjured "No. of cyclist injured during that month between 11pm - 5am"
label variable latenight_cyclistkilled "No. of cyclist killed during that month between 11pm - 5am"
label variable latenight_motoristinjured "No. of motorist injured during that month between 11pm - 5am"
label variable latenight_motoristkilled "No. of motorist killed during that month between 11pm - 5am"
label variable latenight_collision_count "No. of collisions occured during that month between 11pm - 5am"
label variable latenight_flag_collision " At least one collision occurred that month between 11pm - 5am"
label variable day_personsinjured "No. of persons injured during that month between 5am - 11pm"
label variable day_personskilled "No. of persons killed during that month between 5am - 11pm"
label variable day_pedestriansinjured "No. of pedestrians injured during that month between 5am - 11pm"
label variable day_pedestrianskilled "No. of pedestrians killed during that month between 5am - 11pm"
label variable day_cyclistinjured "No. of cyclist injured during that month between 5am - 11pm"
label variable day_cyclistkilled "No. of cyclist killed during that month between 5am - 11pm"
label variable day_motoristinjured "No. of motorist injured during that month between 5am - 11pm"
label variable day_motoristkilled "No. of motorist killed during that month between 5am - 11pm"
label variable day_collision_count "No. of collisions occured during that month between 5am - 11pm"
label variable day_flag_collision " At least one collision occurred that month between 5am - 11pm"
[ ]:
//save "..\working_data\analytical_file_panel_qt.dta",replace
7. Convert analytical panel data into shapefile¶
python
[2]:
fp = r"C:\Users\jerem\Box Sync\Policy Evaluation\working_data\analytical_file_panel_qt.dta"
[3]:
analytical_df = pd.read_stata(fp)
[4]:
analytical_df.head(3)
[4]:
intersection_id | quarter | year | quarterly | index | y | x | st1_name | st2_name | st3_name | ... | day_personskilled | day_pedestriansinjured | day_pedestrianskilled | day_cyclistinjured | day_cyclistkilled | day_motoristinjured | day_motoristkilled | flag_collision | latenight_flag_collision | day_flag_collision | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 1 | 1 | 2013 | 2013-01-01 | 0 | 199793.6093 | 986336.1490 | ALLEN STREET | CANAL STREET | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1.0 | 0.0 | 1.0 | |
1 | 2 | 1 | 2013 | 2013-01-01 | 1 | 202206.1619 | 982769.3310 | AVENUE OF THE AMERICAS | LAIGHT STREET | CANAL STREET | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.0 | 0.0 | 0.0 |
2 | 3 | 1 | 2013 | 2013-01-01 | 2 | 201790.0942 | 982805.6184 | AVENUE OF THE AMERICAS | LISPENARD STREET | WEST BROADWAY | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.0 | 0.0 | 0.0 |
3 rows × 163 columns
[5]:
analytical_df.columns
[5]:
Index(['intersection_id', 'quarter', 'year', 'quarterly', 'index', 'y', 'x',
'st1_name', 'st2_name', 'st3_name',
...
'day_personskilled', 'day_pedestriansinjured', 'day_pedestrianskilled',
'day_cyclistinjured', 'day_cyclistkilled', 'day_motoristinjured',
'day_motoristkilled', 'flag_collision', 'latenight_flag_collision',
'day_flag_collision'],
dtype='object', length=163)
[6]:
# Put the latitude and longtitude
analytical_df['geometry'] = list(zip(analytical_df.x, analytical_df.y))
analytical_df['geometry'] = analytical_df['geometry'].apply(Point)
analytical_gdf = gpd.GeoDataFrame(analytical_df, geometry='geometry')
[7]:
analytical_gdf.crs = {'init' :'epsg:2263'}
[8]:
analytical_gdf.plot(figsize=(15, 15));
The shapefile is required to calculate contiguity matrix in Stata
[397]:
save_out = analytical_gdf[['collision_count','personsinjured','pedestriansinjured','cyclistinjured','motoristinjured',
'bronx','brooklyn','manhattan','queens','statenisland','distance_to_LPIS','distance_to_Street',
'distance_to_Sch','distance_to_LTC','distance_to_bikeroute','distance_to_truckroute',
'flag_left_turn_ever','flag_left_turn','flag_LPIS_ever','flag_LPIS','bike_route_tv',
'bike_route_ever', 'flag_seniors','flag_priorityinters','flag_school', 'flag_street_improv',
'quarter','year','intersection_id','y','x','geometry']]
[398]:
fp = r"C:\Users\jerem\Box Sync\Policy Evaluation\working_data\analytical_panel_shapefile\analytical_panel_qt_shapefile.shp"
[ ]:
#save_out.to_file(origin+fp)
8. Thiessen Polygons¶
python
[9]:
analytical_gdf.columns
[9]:
Index(['intersection_id', 'quarter', 'year', 'quarterly', 'index', 'y', 'x',
'st1_name', 'st2_name', 'st3_name',
...
'day_pedestriansinjured', 'day_pedestrianskilled', 'day_cyclistinjured',
'day_cyclistkilled', 'day_motoristinjured', 'day_motoristkilled',
'flag_collision', 'latenight_flag_collision', 'day_flag_collision',
'geometry'],
dtype='object', length=164)
We want to extract only the cross section of the quarterly panel shapefile
[10]:
analytical_gdf.head(3)
[10]:
intersection_id | quarter | year | quarterly | index | y | x | st1_name | st2_name | st3_name | ... | day_pedestriansinjured | day_pedestrianskilled | day_cyclistinjured | day_cyclistkilled | day_motoristinjured | day_motoristkilled | flag_collision | latenight_flag_collision | day_flag_collision | geometry | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 1 | 1 | 2013 | 2013-01-01 | 0 | 199793.6093 | 986336.1490 | ALLEN STREET | CANAL STREET | ... | 0 | 0 | 0 | 0 | 0 | 0 | 1.0 | 0.0 | 1.0 | POINT (986336.149 199793.609300002) | |
1 | 2 | 1 | 2013 | 2013-01-01 | 1 | 202206.1619 | 982769.3310 | AVENUE OF THE AMERICAS | LAIGHT STREET | CANAL STREET | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0.0 | 0.0 | 0.0 | POINT (982769.331 202206.161899999) |
2 | 3 | 1 | 2013 | 2013-01-01 | 2 | 201790.0942 | 982805.6184 | AVENUE OF THE AMERICAS | LISPENARD STREET | WEST BROADWAY | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0.0 | 0.0 | 0.0 | POINT (982805.6183999931 201790.0942) |
3 rows × 164 columns
[11]:
# Extract cross section of quarter==3 & year == 2012
sub_analytical_gdf = gpd.GeoDataFrame(analytical_gdf.loc[(analytical_gdf['quarter'] == 3) & (analytical_gdf['year'] == 2012)])
[12]:
sub_analytical_gdf['intersection_id'].count()
[12]:
12987
[13]:
sub_analytical_gdf.crs = {'init': 'epsg:2263'}
[14]:
points = list(zip(sub_analytical_gdf.x,sub_analytical_gdf.y))
Instead of extracting a cross section, we can also collapse the data to the unique intersections level
[15]:
collapsed_analytical_gdf = gpd.GeoDataFrame(analytical_gdf.groupby('intersection_id').agg({
'collision_count': sum, # find the sum of collision_count in each intersection
'personsinjured': sum, # find the sum of personsinjured in each intersection
'pedestriansinjured': sum, # find the sum of pedestriansinjured in each intersection
'cyclistinjured': sum , # find the sum of cyclistinjured in each intersection
'motoristinjured': sum , # find the sum of motoristinjured in each intersection
'flag_LPIS_ever': "first" ,
'x': "first",
'y': "first",
'geometry': "first"})).reset_index() # find the first geometry
[16]:
collapsed_analytical_gdf.columns
[16]:
Index(['intersection_id', 'collision_count', 'personsinjured',
'pedestriansinjured', 'cyclistinjured', 'motoristinjured',
'flag_LPIS_ever', 'x', 'y', 'geometry'],
dtype='object')
[17]:
collapsed_analytical_gdf['intersection_id'].count()
[17]:
12987
[18]:
collapsed_analytical_gdf.crs = {'init': 'epsg:2263'}
[19]:
points = list(zip(collapsed_analytical_gdf.x,collapsed_analytical_gdf.y))
8.a. Voronoi¶
Creates dataframe regions and vertices
source: https://github.com/pysal/libpysal/blob/master/notebooks/voronoi.ipynb
[20]:
regions, vertices = voronoi_frames(points)
[21]:
regions.columns
[21]:
Index(['geometry'], dtype='object')
[22]:
regions.crs = {'init': 'epsg:2263'}
vertices.crs = {'init': 'epsg:2263'}
[23]:
fig, ax = plt.subplots(figsize=(15, 15))
regions.plot(ax=ax, color='blue',edgecolor='black', alpha=0.3)
vertices.plot(ax=ax, color='red');
Problem here is that the voronoi polygons extends far out from NYC’s boundaries. Therefore, we will use geopandas
’s overlay
to “cut” the borough shape out.
[24]:
fp = r"C:\Users\jerem\Box Sync\Policy Evaluation\input_data\nyc_boroughs_2263\nyc_boroughs_2263.shp"
[25]:
borough_gdf = gpd.read_file(fp)
[26]:
borough_gdf.crs
[26]:
{'proj': 'lcc',
'lat_1': 41.03333333333333,
'lat_2': 40.66666666666666,
'lat_0': 40.16666666666666,
'lon_0': -74,
'x_0': 300000.0000000001,
'y_0': 0,
'datum': 'NAD83',
'units': 'us-ft',
'no_defs': True}
[27]:
borough_gdf = borough_gdf.to_crs({'init': 'epsg:2263'})
[28]:
borough_gdf.count()
[28]:
bcode 5
bname 5
name 5
namelsad 5
geometry 5
dtype: int64
There are multipolygons in each boroughs, but we want each polygon to be on its own. Therefore, we will use geopandas
’s explode
function.
[29]:
borough_explode_gdf = borough_gdf.explode()
[30]:
borough_explode_gdf.count()
[30]:
bcode 85
bname 85
name 85
namelsad 85
geometry 85
dtype: int64
[31]:
# Overlay with borough to extract the outlines
voronoi_sub_analytical_gdf = gpd.overlay(regions, borough_explode_gdf, how='intersection')
[32]:
voronoi_sub_analytical_gdf.plot(figsize=(15, 15));
[33]:
voronoi_sub_analytical_gdf.count()
[33]:
bcode 13315
bname 13315
name 13315
namelsad 13315
geometry 13315
dtype: int64
[34]:
voronoi_sub_analytical_gdf.columns
[34]:
Index(['bcode', 'bname', 'name', 'namelsad', 'geometry'], dtype='object')
[35]:
collapsed_analytical_gdf.columns
[35]:
Index(['intersection_id', 'collision_count', 'personsinjured',
'pedestriansinjured', 'cyclistinjured', 'motoristinjured',
'flag_LPIS_ever', 'x', 'y', 'geometry'],
dtype='object')
Problem: Recall that there were only 12987 counts previously, but now there is 13315. We will use geopandas
’s sjoin
to find polygons that intersect with the signal intersection points.
[38]:
voronoi_intersections_collapsed_analytical_gdf = gpd.sjoin(voronoi_sub_analytical_gdf,collapsed_analytical_gdf, how='inner', op='intersects')
[40]:
voronoi_intersections_collapsed_analytical_gdf.head()
[40]:
bcode | bname | name | namelsad | geometry | index_right | intersection_id | collision_count | personsinjured | pedestriansinjured | cyclistinjured | motoristinjured | flag_LPIS_ever | x | y | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
1 | 36061 | Manhattan | New York | New York County | POLYGON ((986291.1955668803 200052.298871208, ... | 0 | 1 | 97.0 | 21.0 | 7 | 3 | 11.0 | 0 | 986336.1490 | 199793.6093 |
3 | 36061 | Manhattan | New York | New York County | POLYGON ((982721.8764644475 202448.3499543682,... | 1 | 2 | 4.0 | 1.0 | 0 | 0 | 1.0 | 0 | 982769.3310 | 202206.1619 |
5 | 36061 | Manhattan | New York | New York County | POLYGON ((983012.7139141727 201750.4648247989,... | 2 | 3 | 0.0 | 0.0 | 0 | 0 | 0.0 | 0 | 982805.6184 | 201790.0942 |
7 | 36061 | Manhattan | New York | New York County | POLYGON ((982967.8732795889 201664.0831515846,... | 3 | 4 | 50.0 | 5.0 | 3 | 1 | 1.0 | 0 | 982812.2555 | 201529.9618 |
9 | 36061 | Manhattan | New York | New York County | POLYGON ((982619.6644923392 201201.7787439331,... | 4 | 5 | 45.0 | 10.0 | 1 | 2 | 7.0 | 0 | 982816.3956 | 201226.1097 |
[41]:
voronoi_sub_analytical_gdf['bcode'].count()
[41]:
13315
[42]:
print(voronoi_intersections_collapsed_analytical_gdf.columns.values)
['bcode' 'bname' 'name' 'namelsad' 'geometry' 'index_right'
'intersection_id' 'collision_count' 'personsinjured' 'pedestriansinjured'
'cyclistinjured' 'motoristinjured' 'flag_LPIS_ever' 'x' 'y']
It is suppose to be 12,987, but we only have 12983. The 4 intersections were lost because they are outside the boundaries of the boroughs.
[43]:
voronoi_intersections_collapsed_analytical_gdf.plot(figsize=(15, 15));
[84]:
quantiles = ps.Quantiles(voronoi_intersections_collapsed_analytical_gdf['collision_count'], k=4)
f, ax = plt.subplots(1, figsize=(25, 18))
voronoi_intersections_collapsed_analytical_gdf.assign(cl=quantiles.yb).plot(column='cl', categorical=True, \
k=4, cmap='OrRd', linewidth=0.1, ax=ax, \
edgecolor='black', legend=True)
ax.set_axis_off()
plt.title("Total Number of Collisions by Quartiles")
plt.savefig('../manuscripts/NYC_collisions_quartiles.pdf', bbox_inches='tight')
[85]:
quantiles10 = ps.Quantiles(voronoi_intersections_collapsed_analytical_gdf['collision_count'], k=10)
f, ax = plt.subplots(1, figsize=(25, 18))
voronoi_intersections_collapsed_analytical_gdf.assign(cl=quantiles10.yb).plot(column='cl', categorical=True, \
k=10, cmap='OrRd', linewidth=0.1, ax=ax, \
edgecolor='black', legend=True)
ax.set_axis_off()
plt.title("Total Number of Collisions by Deciles")
plt.savefig('../manuscripts/NYC_collisions_deciles.pdf', bbox_inches='tight')
[79]:
quantiles = ps.Quantiles(voronoi_intersections_collapsed_analytical_gdf['flag_LPIS_ever'], k=2)
f, ax = plt.subplots(1, figsize=(25, 18))
voronoi_intersections_collapsed_analytical_gdf.assign(cl=quantiles.yb).plot(column='cl', categorical=True, \
k=4, cmap='BuPu', linewidth=0.1, ax=ax, \
edgecolor='black', legend=True)
ax.set_axis_off()
plt.title("Leading Pedestrian Intervanl Signals")
plt.show();
8.b. Spatial Autocorrelation¶
“The concept of spatial autocorrelation relates to the combination of two types of similarity: spatial similarity and attribute similarity. Although there are many different measures of spatial autocorrelation, they all combine these two types of simmilarity into a summary measure.” source: http://darribas.org/gds_scipy16/ipynb_md/04_esda.html
[47]:
voronoi_intersections_collapsed_analytical_gdf.columns
[47]:
Index(['bcode', 'bname', 'name', 'namelsad', 'geometry', 'index_right',
'intersection_id', 'collision_count', 'personsinjured',
'pedestriansinjured', 'cyclistinjured', 'motoristinjured',
'flag_LPIS_ever', 'x', 'y'],
dtype='object')
[48]:
save_out = voronoi_intersections_collapsed_analytical_gdf[['intersection_id', 'collision_count', 'personsinjured',
'pedestriansinjured', 'cyclistinjured', 'motoristinjured', 'y', 'x',
'geometry']]
[51]:
fp = r"C:\Users\jerem\Box Sync\Policy Evaluation\working_data\analytical_panel_shapefile\collapsed_analytical_panel_qt_shapefile.shp"
save_out.to_file(fp)
[52]:
data = ps.pdio.read_files(fp)
W = ps.queen_from_shapefile(fp)
W.transform = 'r'
[53]:
data.columns
[53]:
Index(['intersecti', 'collision_', 'personsinj', 'pedestrian', 'cyclistinj',
'motoristin', 'y', 'x', 'geometry'],
dtype='object')
[56]:
data.columns = (['intersection_id', 'collision_count', 'personsinjured',
'pedestriansinjured', 'cyclistinjured', 'motoristinjured', 'y', 'x',
'geometry'])
[57]:
data.columns
[57]:
Index(['intersection_id', 'collision_count', 'personsinjured',
'pedestriansinjured', 'cyclistinjured', 'motoristinjured', 'y', 'x',
'geometry'],
dtype='object')
[58]:
collision_count_Lag = ps.lag_spatial(W, data['collision_count'])
[67]:
collision_count_LagQ4 = ps.Quantiles(collision_count_Lag, k=4)
collision_count_LagQ10 = ps.Quantiles(collision_count_Lag, k=10)
[88]:
f, ax = plt.subplots(1, figsize=(25, 18))
voronoi_intersections_collapsed_analytical_gdf.assign(cl=collision_count_LagQ4.yb).plot(column='cl', categorical=True, \
k=4, cmap='OrRd', linewidth=0.1, ax=ax, \
edgecolor='black', legend=True)
ax.set_axis_off()
plt.title("Collisions Spatial Lag Quartiles")
plt.savefig('../manuscripts/NYC_collisions_quartiles_lagged.pdf', bbox_inches='tight')
[91]:
f, ax = plt.subplots(1, figsize=(25, 18))
voronoi_intersections_collapsed_analytical_gdf.assign(cl=collision_count_LagQ10.yb).plot(column='cl', categorical=True, \
k=10, cmap='OrRd', linewidth=0.1, ax=ax, \
edgecolor='black', legend=True)
ax.set_axis_off()
plt.title("Collisions Spatial Lag Deciles")
plt.savefig('../manuscripts/NYC_collisions_deciles_lagged.pdf')