An Evaluation of the Impact of Leading Pedestrian Interval Signals in NYC

By Jeremy J. Sze

Submitted in partial fulfillment of the requirements for the degree of Master of Arts in Economics, Hunter College The City University of New York 2019

Thesis Sponsor: Professor Jonathan Conning
Professor Partha Deb

Link to thesis: https://academicworks.cuny.edu/hc_sas_etds/432/

Abstract

I evaluated the impact of the phased introduction of Leading Pedestrian Interval Signals (LPIs) on collision and injury outcomes at 12,987 signalized traffic intersections in New York City over the course of 25 quarters from 2012 to 2018. An intersection is treated when a LPIs is installed to give pedestrians lead time to cross the street before vehicles are allowed to move. Outcomes from NYPD’s Motor Vehicle Collisions data were matched to signalized intersections. I hypothesize that LPIs would reduce collisions and reduce injuries for pedestrians at intersections. A difference in difference fixed effects panel regression was used to identify the causal effect of introducing LPIs. This approach accounts for the problem that unobserved heterogeneity that might bias results in simpler regression approaches. The analysis suggests that the introduction of LPIs decreased quarterly collision counts by 5.45% and decreased the quarterly number of pedestrians injured by 14.7% over the same intervention period. LPIs appears to be effective in reducing both collisions and injuries.

Acknowledgements

Throughout the analysis and writing of my thesis I had received a great deal of support and assistance. I would like to thank my thesis advisor, Professor Jonathan Conning, who helped me develop my data processing pipeline and analytical plan. Next, I would like to thank Professor Partha Deb and Professor Matthew Baker for their valuable time and input during the model development stage and Professor Randall Filer for his guidance in the bi-weekly thesis workshops. In addition, I would like to thank my sister, Jocelyne Sze and friend Dennis Kim for proofreading my thesis drafts. Finally, I would like to thank my wife, Maria Eugenia Brandão for her wise counsel and support as we discussed ideas for my thesis.

Contents

  1. Signal intersection (stata)

  2. LPIs (python)

  3. Linking Signal intersection to other data (python)

  4. NYPD Motor Vehicle Collision data

    • download (stata)

    • clean (stata)

    • link to signal intersection (python)

  5. Calculate collision outcomes (stata)

  6. Set up panel data

    • monthly analytical panel (stata)

    • quarterly analytical panel (stata)

  7. Convert analytical panel data into shapefile (python)

  8. Thiessen Polygons (python)

  9. Non-spatial Analysis (stata)

  10. Spatial Analysis (stata)

1. Signal intersection

  • Stata

  • Explore duplicates

  • Create intersection IDS

About “signal_controllers.csv”

This was obtained through a request to the DOT Commissioner.

[1]:
cd "..\input_data\DOT_traffic_signals_Oct_2018\"
C:\Users\jerem\Box Sync\Policy Evaluation\input_data\DOT_traffic_signals_Oct_2018
[2]:
import delimited using "signal_controllers.csv",clear stringcols(_all)
(6 vars, 13,278 obs)
[4]:
describe

Contains data
  obs:        13,278
 vars:             7
 size:     2,376,762
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
              storage   display    value
variable name   type    format     label      variable label
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
y               str22   %22s                  Y
x               str22   %22s                  X
st1_name        str31   %31s                  ST1_Name
st2_name        str32   %32s                  ST2_Name
st3_name        str35   %35s                  ST3_Name
st4_name        str36   %36s                  ST4_Name
dup             byte    %12.0g
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Sorted by:
     Note: Dataset has changed since last saved.
[3]:
duplicates tag y x, gen(dup)
tab dup


Duplicates in terms of y x


        dup |      Freq.     Percent        Cum.
------------+-----------------------------------
          0 |     13,156       99.08       99.08
          1 |        112        0.84       99.92
          2 |          3        0.02       99.95
          6 |          7        0.05      100.00
------------+-----------------------------------
      Total |     13,278      100.00
[5]:
duplicates drop y x, force

Duplicates in terms of y x

(64 observations deleted)
[7]:
gen intersection_id = _n, before(y)

Check to remember why we dropped intersection_id 2799

[8]:
// Nonsensical coordinates
drop if intersection_id ==  2799
(1 observation deleted)
[ ]:
save "signal_controllers_clean.dta"

2. Leading Pedestrian Interval Signals (LPIs)

  • Python

  • Explore the LPIs data using python’s Geopandas library

  • Create IDs

  • Save into Stata dta

[1]:
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
import geopandas as gpd
from scipy.spatial import cKDTree
from shapely.geometry import Point, MultiPoint
import pysal as ps
import libpysal
from libpysal.cg.voronoi  import voronoi, voronoi_frames

from pysal.contrib.viz import mapping as map
from pylab import *
from pysal.contrib.viz import folium_mapping as fm
import geojson as gj
import seaborn as sns
import mplleaflet as mpll

%matplotlib inline
import os
os.environ["PROJ_LIB"] = "C:\ProgramData\Anaconda3\Library\share" #window
C:\Users\jerem\.conda\envs\geo2\lib\site-packages\pysal\__init__.py:65: VisibleDeprecationWarning: PySAL's API will be changed on 2018-12-31. The last release made with this API is version 1.14.4. A preview of the next API version is provided in the `pysal` 2.0 prelease candidate. The API changes and a guide on how to change imports is provided at https://migrating.pysal.org
  ), VisibleDeprecationWarning)
[10]:
cd
C:\Users\jerem

Opening the Vision Zero Leading Pedestrian Interval Signals shapefile

[19]:
fp = r"C:\Users\jerem\Box Sync\Policy Evaluation\input_data\VZV_Leading Pedestrian Interval Signals\geo_export_0c63b43f-83c0-4834-aa91-2c564c1bff2c.shp"
[20]:
lpis_df = gpd.read_file(fp)
[21]:
lpis_df.head(3)
[21]:
cross_stre date_insta time_insta lat long main_stree geometry
0 Dreiser loop East 2018-08-01 00:00:00.000 40.878465 -73.828273 Co-op City Blvd POINT (-73.82827338235253 40.87846542795222)
1 West 119 St 2018-08-08 00:00:00.000 40.803958 -73.948271 Lenox Avenue POINT (-73.94827133431397 40.80395846158952)
2 West 120 St 2018-08-08 00:00:00.000 40.804587 -73.947812 Lenox Avenue POINT (-73.94781210658503 40.80458679730486)
[22]:
lpis_df.plot(figsize=(15, 15));
[22]:
<matplotlib.axes._subplots.AxesSubplot at 0x21e3d70c780>
_images/Impact-Evaluation-of-Leading-Pedestrian-Interval-Signals_21_1.png

First, we have to check the CRS (coordinate reference system) of the shapefile. This is important because all the other shapefiles have to be on the same projection in order to do spatial joins and identify nearest neighbors. For more information on projections, you can watch this short video and read this document from Earthdatascience.org.

To see the crs of lpis_df, we use geodataframe.crs

[23]:
lpis_df.crs
[23]:
{'init': 'epsg:4326'}

EPSG 4326 is the coordinate system for the world. It is also known as WGS84. When you have a longitude and latitude from this CRS, you can copy and paste it into Google Maps to find that location.

Since we are working with New York City only, we prefer to use a projection that more accurately protrays the shape of NYC. We will reproject lpis_df to EPSG 2263 (NAD83 / New York Long Island (ftUS)).

[24]:
lpis_df = lpis_df.to_crs({'init': 'epsg:2263'})

We can see that lpis_df now has the right projection.

[25]:
lpis_df.crs
[25]:
{'init': 'epsg:2263'}
[26]:
lpis_df.plot(figsize=(15, 15));
[26]:
<matplotlib.axes._subplots.AxesSubplot at 0x21e3e917128>
_images/Impact-Evaluation-of-Leading-Pedestrian-Interval-Signals_30_1.png

If you look at both plots again, you will notice that the axes labels are different. This is due to the different CRS.

I create IDs in the dataframe, since lpis_df has no ID variables.

[27]:
lpis_df['LPIS_ID'] = lpis_df.index
lpis_df.head(3)
[27]:
cross_stre date_insta time_insta lat long main_stree geometry LPIS_ID
0 Dreiser loop East 2018-08-01 00:00:00.000 40.878465 -73.828273 Co-op City Blvd POINT (1031739.000154228 259373.000004255) 0
1 West 119 St 2018-08-08 00:00:00.000 40.803958 -73.948271 Lenox Avenue POINT (998570.9998893011 232184.9999184268) 1
2 West 120 St 2018-08-08 00:00:00.000 40.804587 -73.947812 Lenox Avenue POINT (998697.9999633889 232414.0001165908) 2

Next, I want to export the dataframe to a Stata dta file. Objects need to be converted to str.

[24]:
# Might be unnecessary
#str_cols = list(lpis_df.select_dtypes(include=['object']).columns)
#for col in str_cols:
#    lpis_df[col] = lpis_df[col].astype(str)
[ ]:
#fp =  r"C:\Users\jerem\Box Sync\Policy Evaluation\working_data\VZV_LPIS_data.dta"
#lpis_df.to_stata(fp)

3. Linking Signal intersection to other data

  • python

  • uses scipy’s cKDTree

  • connects each data to the nearest signal intersection

Define a function to obtain IDs of nearest neighbors and distance measure.

[28]:
# ckdnearest function
# from "https://gist.github.com/jhconning/63a34a51acff83d116adc52308faf240"
def ckdnearest(gdA, gdB, bcol):
    """
    This function takes geodataframes: `gdA` and `gdB` and
    a column name `bcol`. Both dataframes are assumed to have a `geometry` column.
    It finds the nearest neighbor from each location in `gdA` to a
    nearest neighbor in `gdB`.

    It returns a two-column pandas dataframe with a 'distance' (here rounded to nearest foot)
    and the value of the `bcol` in `gdB'  (e.g. 'school_name')
    """

    nA = np.array(list(zip(gdA.geometry.x, gdA.geometry.y)) )
    nB = np.array(list(zip(gdB.geometry.x, gdB.geometry.y)) )
    btree = cKDTree(nB)
    dist, idx = btree.query(nA,k=1)
    df = pd.DataFrame.from_dict({'distance': dist.astype(int),
                             'bcol' : gdB.loc[idx, bcol].values })
    return df

3.a. Signal Intersections

Read the cleaned Signal intersection Stata dta file

[3]:
# Open signal intersection stata dta file
fp = r"C:\Users\jerem\Box Sync\Policy Evaluation\input_data\DOT_traffic_signals_Oct_2018\signal_controllers_clean.dta"
[4]:
sig_inters_df = pd.read_stata(fp)
[5]:
sig_inters_df.head(3)
[5]:
intersection_id y x st1_name st2_name st3_name st4_name dup
0 1.0 199793.609300002 986336.149 ALLEN STREET CANAL STREET 0
1 2.0 202206.161899999 982769.331 AVENUE OF THE AMERICAS LAIGHT STREET CANAL STREET 0
2 3.0 201790.0942 982805.618399993 AVENUE OF THE AMERICAS LISPENARD STREET WEST BROADWAY 0
[8]:
sig_inters_df['y'].dtypes
[8]:
dtype('O')
[9]:
sig_inters_df['x'].dtypes
[9]:
dtype('O')

Convert the y and x coordinates into float

[11]:
# Convert coordinates into float
sig_inters_df['y'] = sig_inters_df['y'].astype(float)
sig_inters_df['x'] = sig_inters_df['x'].astype(float)

Create Coordinates by combining x and y together with zip into a list

[14]:
sig_inters_df['Coordinates'] = list(zip(sig_inters_df.x, sig_inters_df.y))

Turn Coordinates into shapely Points

[16]:
sig_inters_df['Coordinates'] = sig_inters_df['Coordinates'].apply(Point)

Create Geopandas dataframe

[17]:
sig_inters_gdf = gpd.GeoDataFrame(sig_inters_df, geometry='Coordinates')

3.b. Leading Pedestrian Signal Interval

[29]:
lpis_df.head(3)
[29]:
cross_stre date_insta time_insta lat long main_stree geometry LPIS_ID
0 Dreiser loop East 2018-08-01 00:00:00.000 40.878465 -73.828273 Co-op City Blvd POINT (1031739.000154228 259373.000004255) 0
1 West 119 St 2018-08-08 00:00:00.000 40.803958 -73.948271 Lenox Avenue POINT (998570.9998893011 232184.9999184268) 1
2 West 120 St 2018-08-08 00:00:00.000 40.804587 -73.947812 Lenox Avenue POINT (998697.9999633889 232414.0001165908) 2

For each LPIs, we want to find the closest signalized intersection

[30]:
sig_inters_gdf[['distance_to_LPIS','nearest_LPIS']] = ckdnearest(sig_inters_gdf, lpis_df,'LPIS_ID')
[31]:
sig_inters_gdf.head(3)
[31]:
intersection_id y x st1_name st2_name st3_name st4_name dup Coordinates distance_to_LPIS nearest_LPIS
0 1.0 199793.6093 986336.1490 ALLEN STREET CANAL STREET 0 POINT (986336.149 199793.609300002) 276 1915
1 2.0 202206.1619 982769.3310 AVENUE OF THE AMERICAS LAIGHT STREET CANAL STREET 0 POINT (982769.331 202206.161899999) 592 2016
2 3.0 201790.0942 982805.6184 AVENUE OF THE AMERICAS LISPENARD STREET WEST BROADWAY 0 POINT (982805.6183999931 201790.0942) 413 2016

See the number of observations in sig_inters_gdf

[32]:
sig_inters_gdf['intersection_id'].count()
[32]:
13213

We want to merge into sig_inters_gdf additional information from lpis_df

[33]:
sig_inters_gdf = sig_inters_gdf.merge(lpis_df[['date_insta','LPIS_ID']], how='left', left_on='nearest_LPIS', right_on='LPIS_ID', validate ="m:1")
[34]:
sig_inters_gdf.head(3)
[34]:
intersection_id y x st1_name st2_name st3_name st4_name dup Coordinates distance_to_LPIS nearest_LPIS date_insta LPIS_ID
0 1.0 199793.6093 986336.1490 ALLEN STREET CANAL STREET 0 POINT (986336.149 199793.609300002) 276 1915 2015-10-16 1915
1 2.0 202206.1619 982769.3310 AVENUE OF THE AMERICAS LAIGHT STREET CANAL STREET 0 POINT (982769.331 202206.161899999) 592 2016 1997-08-07 2016
2 3.0 201790.0942 982805.6184 AVENUE OF THE AMERICAS LISPENARD STREET WEST BROADWAY 0 POINT (982805.6183999931 201790.0942) 413 2016 1997-08-07 2016

Remove the additional variable LPIS_ID

[35]:
sig_inters_gdf.drop(['LPIS_ID'], axis=1, inplace=True)

3.c. School shapefile

source: https://www.baruch.cuny.edu/confluence/display/geoportal/NYC+Geodatabase

[36]:
fp = r"C:\Users\jerem\Box Sync\Policy Evaluation\input_data\school_private_public_2263\school_private_public_2263.shp"
[37]:
school_gdf = gpd.read_file(fp)
[38]:
school_gdf.head(3)
[38]:
uid idagency facname opname address city zipcode bcode facsubgrp factype capacity util xcoord ycoord geometry
0 192.0 321000145390 Our Lady Of Refuge School Our Lady Of Refuge School 2708 Briggs Avenue Bronx 10458 36005 Non-Public K-12 Schools Elementary School - Non-public NaN 267.0 1.014359e+06 254977.5176 POINT (1014358.6833 254977.5176)
1 294.0 332000226225 Yeshiva Toldos Yesuscher Yeshiva Toldos Yesuscher 1531 63 Street Brooklyn 11219 36047 Non-Public K-12 Schools Elementary School - Non-public NaN 74.0 9.857114e+05 166689.5140 POINT (985711.4182 166689.514)
2 2233.0 331400225670 Ohel Elozer Ohel Elozer 263 Classon Ave-Ste 4b Brooklyn 11205 36047 Non-Public K-12 Schools High School - Non-public NaN 161.0 9.951838e+05 191390.1394 POINT (995183.844 191390.1394)
[39]:
school_gdf.crs
[39]:
{'proj': 'lcc',
 'lat_1': 41.03333333333333,
 'lat_2': 40.66666666666666,
 'lat_0': 40.16666666666666,
 'lon_0': -74,
 'x_0': 300000.0000000001,
 'y_0': 0,
 'datum': 'NAD83',
 'units': 'us-ft',
 'no_defs': True}

For each school, we want to find the closest signalized intersection

[40]:
sig_inters_gdf[['distance_to_Sch','nearest_Sch']] = ckdnearest(sig_inters_gdf, school_gdf,'uid')
[41]:
sig_inters_gdf.head(3)
[41]:
intersection_id y x st1_name st2_name st3_name st4_name dup Coordinates distance_to_LPIS nearest_LPIS date_insta distance_to_Sch nearest_Sch
0 1.0 199793.6093 986336.1490 ALLEN STREET CANAL STREET 0 POINT (986336.149 199793.609300002) 276 1915 2015-10-16 529 3433.0
1 2.0 202206.1619 982769.3310 AVENUE OF THE AMERICAS LAIGHT STREET CANAL STREET 0 POINT (982769.331 202206.161899999) 592 2016 1997-08-07 795 38150.0
2 3.0 201790.0942 982805.6184 AVENUE OF THE AMERICAS LISPENARD STREET WEST BROADWAY 0 POINT (982805.6183999931 201790.0942) 413 2016 1997-08-07 635 38150.0

3.d. Vision Zero Left Turn Calming intervention

[42]:
fp = r"C:\Users\jerem\Box Sync\Policy Evaluation\input_data\left_turn_traffic_calming_shapefile\left_turn_traffic_calming.shp"
[43]:
ltc_gdf = gpd.read_file(fp)
[44]:
ltc_gdf.head(3)
[44]:
treatment_ completion geometry
0 Quick Kurb to X-walk 2017-12-08 POINT (-73.97604825017703 40.75144139179798)
1 Daylighting, Box markings, (1) 6' Rubber speed... 2018-04-12 POINT (-73.82577379991478 40.75635805716421)
2 Quick Kurb to X-walk 2016-06-30 POINT (-73.97719688637932 40.7643236906894)
[46]:
ltc_gdf.crs
[46]:
{'init': 'epsg:4326'}

We have to reproject the geodataframe to the right CRS (EPSG 2263: NAD83 / New York Long Island (ftUS))

[47]:
ltc_gdf = ltc_gdf.to_crs({'init': 'epsg:2263'})

Create ID variable for ltc_gdf

[48]:
ltc_gdf['LTC_ID'] = ltc_gdf.index
[49]:
ltc_gdf.head(3)
[49]:
treatment_ completion geometry LTC_ID
0 Quick Kurb to X-walk 2017-12-08 POINT (990886.2363875993 213047.939427546) 0
1 Daylighting, Box markings, (1) 6' Rubber speed... 2018-04-12 POINT (1032518.714146682 214886.3347008746) 1
2 Quick Kurb to X-walk 2016-06-30 POINT (990566.7655698524 217741.2961825352) 2

For each left turn calming intervention, we want to find the closest signalized intersection

[50]:
sig_inters_gdf[['distance_to_LTC','nearest_LTC']] = ckdnearest(sig_inters_gdf, ltc_gdf,'LTC_ID')
[51]:
sig_inters_gdf.head(3)
[51]:
intersection_id y x st1_name st2_name st3_name st4_name dup Coordinates distance_to_LPIS nearest_LPIS date_insta distance_to_Sch nearest_Sch distance_to_LTC nearest_LTC
0 1.0 199793.6093 986336.1490 ALLEN STREET CANAL STREET 0 POINT (986336.149 199793.609300002) 276 1915 2015-10-16 529 3433.0 1183 191
1 2.0 202206.1619 982769.3310 AVENUE OF THE AMERICAS LAIGHT STREET CANAL STREET 0 POINT (982769.331 202206.161899999) 592 2016 1997-08-07 795 38150.0 411 77
2 3.0 201790.0942 982805.6184 AVENUE OF THE AMERICAS LISPENARD STREET WEST BROADWAY 0 POINT (982805.6183999931 201790.0942) 413 2016 1997-08-07 635 38150.0 790 77

We want to merge into sig_inters_gdf additional information from ltc_gdf

[52]:
sig_inters_gdf = sig_inters_gdf.merge(ltc_gdf[['treatment_', 'completion', 'LTC_ID']], how='left', left_on='nearest_LTC', right_on='LTC_ID', validate ="m:1")
[53]:
sig_inters_gdf.head(3)
[53]:
intersection_id y x st1_name st2_name st3_name st4_name dup Coordinates distance_to_LPIS nearest_LPIS date_insta distance_to_Sch nearest_Sch distance_to_LTC nearest_LTC treatment_ completion LTC_ID
0 1.0 199793.6093 986336.1490 ALLEN STREET CANAL STREET 0 POINT (986336.149 199793.609300002) 276 1915 2015-10-16 529 3433.0 1183 191 Daylighting, Box markings, Pegatracks, Delinea... 2017-12-08 191
1 2.0 202206.1619 982769.3310 AVENUE OF THE AMERICAS LAIGHT STREET CANAL STREET 0 POINT (982769.331 202206.161899999) 592 2016 1997-08-07 795 38150.0 411 77 Quick Kurb to x-walk 2017-12-08 77
2 3.0 201790.0942 982805.6184 AVENUE OF THE AMERICAS LISPENARD STREET WEST BROADWAY 0 POINT (982805.6183999931 201790.0942) 413 2016 1997-08-07 635 38150.0 790 77 Quick Kurb to x-walk 2017-12-08 77
[55]:
sig_inters_gdf.drop(['LTC_ID'], axis=1, inplace=True)

3.e. Vision Zero Street Improvement

[57]:
fp = r"C:\Users\jerem\Box Sync\Policy Evaluation\input_data\street_improvement_proj_intersections_point\street_improvement_proj_intersections_point.shp"
[58]:
sip_gdf = gpd.read_file(fp)
[59]:
sip_gdf.head(3)
[59]:
Pjct_Name SIP_YR DateComplt SIPProjTyp geometry
0 Jackson Avenue at Pulaski Bridge 2009 2009-07-31 - POINT (997734.4838460496 210053.6281589993)
1 12th Ave at W. 135th St 2009 2009-05-09 Traffic Network Chng - Pedestrian Safety POINT (995894.2742718621 238501.8740837589)
2 Flatbush Ave at Church Ave 2009 2009-09-25 Pedestrian Safety - School Safety POINT (995708.027925191 176232.7026909116)
[60]:
sip_gdf.crs
[60]:
{'proj': 'lcc',
 'lat_1': 41.03333333333333,
 'lat_2': 40.66666666666666,
 'lat_0': 40.16666666666666,
 'lon_0': -74,
 'x_0': 300000.0000000001,
 'y_0': 0,
 'datum': 'NAD83',
 'units': 'us-ft',
 'no_defs': True}

We have to reproject the geodataframe to the right CRS (EPSG 2263: NAD83 / New York Long Island (ftUS))

[63]:
sip_gdf = sip_gdf.to_crs({'init': 'epsg:2263'})
[64]:
sip_gdf.crs
[64]:
{'init': 'epsg:2263'}

Create ID variable for sip_gdf

[65]:
sip_gdf['StImpro_ID'] = sip_gdf.index
[66]:
sip_gdf.head(3)
[66]:
Pjct_Name SIP_YR DateComplt SIPProjTyp geometry StImpro_ID
0 Jackson Avenue at Pulaski Bridge 2009 2009-07-31 - POINT (997734.4838460496 210053.6281589342) 0
1 12th Ave at W. 135th St 2009 2009-05-09 Traffic Network Chng - Pedestrian Safety POINT (995894.2742718621 238501.8740837032) 1
2 Flatbush Ave at Church Ave 2009 2009-09-25 Pedestrian Safety - School Safety POINT (995708.027925191 176232.7026908512) 2

For each street improvement project, we want to find the closest signalized intersection

[67]:
sig_inters_gdf[['distance_to_StImpro','nearest_StImpro']] = ckdnearest(sig_inters_gdf, sip_gdf,'StImpro_ID')

We want to merge into sig_inters_gdf additional information from sip_gdf

[68]:
sig_inters_gdf = sig_inters_gdf.merge(sip_gdf[['DateComplt', 'SIPProjTyp', 'StImpro_ID']],
                                      how='left',
                                      left_on='nearest_StImpro',
                                      right_on='StImpro_ID',
                                      validate ="m:1")
[69]:
sig_inters_gdf.head(3)
[69]:
intersection_id y x st1_name st2_name st3_name st4_name dup Coordinates distance_to_LPIS ... nearest_Sch distance_to_LTC nearest_LTC treatment_ completion distance_to_StImpro nearest_StImpro DateComplt SIPProjTyp StImpro_ID
0 1.0 199793.6093 986336.1490 ALLEN STREET CANAL STREET 0 POINT (986336.149 199793.609300002) 276 ... 3433.0 1183 191 Daylighting, Box markings, Pegatracks, Delinea... 2017-12-08 272 4 2009-10-31 - 4
1 2.0 202206.1619 982769.3310 AVENUE OF THE AMERICAS LAIGHT STREET CANAL STREET 0 POINT (982769.331 202206.161899999) 592 ... 38150.0 411 77 Quick Kurb to x-walk 2017-12-08 2101 245 2016-07-23 VZ Priority Geo - Traffic Calming 245
2 3.0 201790.0942 982805.6184 AVENUE OF THE AMERICAS LISPENARD STREET WEST BROADWAY 0 POINT (982805.6183999931 201790.0942) 413 ... 38150.0 790 77 Quick Kurb to x-walk 2017-12-08 2098 245 2016-07-23 VZ Priority Geo - Traffic Calming 245

3 rows × 23 columns

[70]:
sig_inters_gdf.drop(['StImpro_ID'], axis=1, inplace=True)

3.f. Bike Routes

[71]:
fp = r"C:\Users\jerem\Box Sync\Policy Evaluation\input_data\bike_routes_points_2263\bike_routes_points_2263.shp"
[72]:
br_gdf = gpd.read_file(fp)
[73]:
br_gdf.head(3)
[73]:
tf_facilit comments bikedir ft_facilit objectid_1 allclasses date_instd time_instd lanecount segment_id boro street date_modda time_modda tostreet fromstreet onoffst geometry
0 Protected Path None L None 1.0 I None 00:00:00.000 1.0 33547 1.0 9 AV None 00:00:00.000 W 31 ST W 16 ST ON POINT (984139.5802614246 211708.657997195)
1 Protected Path None L None 1.0 I None 00:00:00.000 1.0 33547 1.0 9 AV None 00:00:00.000 W 31 ST W 16 ST ON POINT (984140.0661334711 211709.5320271455)
2 Protected Path None L None 1.0 I None 00:00:00.000 1.0 33547 1.0 9 AV None 00:00:00.000 W 31 ST W 16 ST ON POINT (984140.5520055175 211710.4060570961)
[74]:
br_gdf.crs
[74]:
{'proj': 'lcc',
 'lat_1': 41.03333333333333,
 'lat_2': 40.66666666666666,
 'lat_0': 40.16666666666666,
 'lon_0': -74,
 'x_0': 300000.0000000001,
 'y_0': 0,
 'datum': 'NAD83',
 'units': 'us-ft',
 'no_defs': True}
[75]:
br_gdf = br_gdf.to_crs({'init': 'epsg:2263'})
[76]:
sig_inters_gdf[['distance_to_bikeroute','nearest_bikeroute']] = ckdnearest(sig_inters_gdf, br_gdf,'objectid_1')
[77]:
sig_inters_gdf.head(3)
[77]:
intersection_id y x st1_name st2_name st3_name st4_name dup Coordinates distance_to_LPIS ... distance_to_LTC nearest_LTC treatment_ completion distance_to_StImpro nearest_StImpro DateComplt SIPProjTyp distance_to_bikeroute nearest_bikeroute
0 1.0 199793.6093 986336.1490 ALLEN STREET CANAL STREET 0 POINT (986336.149 199793.609300002) 276 ... 1183 191 Daylighting, Box markings, Pegatracks, Delinea... 2017-12-08 272 4 2009-10-31 - 3 8553.0
1 2.0 202206.1619 982769.3310 AVENUE OF THE AMERICAS LAIGHT STREET CANAL STREET 0 POINT (982769.331 202206.161899999) 592 ... 411 77 Quick Kurb to x-walk 2017-12-08 2101 245 2016-07-23 VZ Priority Geo - Traffic Calming 230 1778.0
2 3.0 201790.0942 982805.6184 AVENUE OF THE AMERICAS LISPENARD STREET WEST BROADWAY 0 POINT (982805.6183999931 201790.0942) 413 ... 790 77 Quick Kurb to x-walk 2017-12-08 2098 245 2016-07-23 VZ Priority Geo - Traffic Calming 1 1769.0

3 rows × 24 columns

3.g. Borough

[78]:
fp = r"C:\Users\jerem\Box Sync\Policy Evaluation\input_data\nyc_boroughs_2263\nyc_boroughs_2263.shp"
[79]:
borough_gdf = gpd.read_file(fp)
[81]:
borough_gdf.head()
[81]:
bcode bname name namelsad geometry
0 36005 Bronx Bronx Bronx County (POLYGON ((1008982.068976385 272752.8735210547...
1 36047 Brooklyn Kings Kings County (POLYGON ((978869.3811487257 186863.7807399245...
2 36061 Manhattan New York New York County (POLYGON ((1007701.483091666 258286.8905491272...
3 36081 Queens Queens Queens County (POLYGON ((1026830.772887008 155435.7100568501...
4 36085 Staten Island Richmond Richmond County (POLYGON ((930721.1281812892 156627.9162643671...
[82]:
borough_gdf.crs = sig_inters_gdf.crs
[83]:
sig_inters_gdf = gpd.sjoin(sig_inters_gdf,borough_gdf[['bname', 'geometry']], how='left', op='intersects')
[84]:
sig_inters_gdf.drop('index_right', axis=1, inplace=True)
[85]:
sig_inters_gdf.head(3)
[85]:
intersection_id y x st1_name st2_name st3_name st4_name dup Coordinates distance_to_LPIS ... nearest_LTC treatment_ completion distance_to_StImpro nearest_StImpro DateComplt SIPProjTyp distance_to_bikeroute nearest_bikeroute bname
0 1.0 199793.6093 986336.1490 ALLEN STREET CANAL STREET 0 POINT (986336.149 199793.609300002) 276 ... 191 Daylighting, Box markings, Pegatracks, Delinea... 2017-12-08 272 4 2009-10-31 - 3 8553.0 Manhattan
1 2.0 202206.1619 982769.3310 AVENUE OF THE AMERICAS LAIGHT STREET CANAL STREET 0 POINT (982769.331 202206.161899999) 592 ... 77 Quick Kurb to x-walk 2017-12-08 2101 245 2016-07-23 VZ Priority Geo - Traffic Calming 230 1778.0 Manhattan
2 3.0 201790.0942 982805.6184 AVENUE OF THE AMERICAS LISPENARD STREET WEST BROADWAY 0 POINT (982805.6183999931 201790.0942) 413 ... 77 Quick Kurb to x-walk 2017-12-08 2098 245 2016-07-23 VZ Priority Geo - Traffic Calming 1 1769.0 Manhattan

3 rows × 25 columns

3.h. Priority Intersections

[86]:
fp = r"C:\Users\jerem\Box Sync\Policy Evaluation\input_data\vz_priority_intersections_shapefile\vz_priority_intersections_shapefile\vz_priority_intersections.shp"
[87]:
pi_gdf = gpd.read_file(fp)
[88]:
pi_gdf.crs
[88]:
{'init': 'epsg:4326'}
[89]:
pi_gdf = pi_gdf.to_crs({'init': 'epsg:2263'})
[90]:
pi_gdf['priority_inters_ID'] = pi_gdf.index
[91]:
sig_inters_gdf[['distance_to_priorityinters','nearest_priorityinters']] = ckdnearest(sig_inters_gdf, pi_gdf,'priority_inters_ID')
[92]:
sig_inters_gdf.head(3)
[92]:
intersection_id y x st1_name st2_name st3_name st4_name dup Coordinates distance_to_LPIS ... completion distance_to_StImpro nearest_StImpro DateComplt SIPProjTyp distance_to_bikeroute nearest_bikeroute bname distance_to_priorityinters nearest_priorityinters
0 1.0 199793.6093 986336.1490 ALLEN STREET CANAL STREET 0 POINT (986336.149 199793.609300002) 276 ... 2017-12-08 272 4 2009-10-31 - 3 8553.0 Manhattan 7 245
1 2.0 202206.1619 982769.3310 AVENUE OF THE AMERICAS LAIGHT STREET CANAL STREET 0 POINT (982769.331 202206.161899999) 592 ... 2017-12-08 2101 245 2016-07-23 VZ Priority Geo - Traffic Calming 230 1778.0 Manhattan 2289 191
2 3.0 201790.0942 982805.6184 AVENUE OF THE AMERICAS LISPENARD STREET WEST BROADWAY 0 POINT (982805.6183999931 201790.0942) 413 ... 2017-12-08 2098 245 2016-07-23 VZ Priority Geo - Traffic Calming 1 1769.0 Manhattan 2331 191

3 rows × 27 columns

3.i. Safe Streets for Seniors

[93]:
fp = r"C:\Users\jerem\Box Sync\Policy Evaluation\input_data\safe_streets_for_seniors_shapefile\safe_streets_for_seniors.shp"
[94]:
seniors_gdf = gpd.read_file(fp)
[95]:
seniors_gdf.crs
[95]:
{'init': 'epsg:4326'}
[96]:
seniors_gdf = seniors_gdf.to_crs({'init': 'epsg:2263'})
[97]:
seniors_gdf['seniors_ID'] = seniors_gdf.index
[98]:
seniors_gdf.crs = sig_inters_gdf.crs

Spatial Join to identify intersections within Senior Zones

[99]:
sig_inters_gdf = gpd.sjoin(sig_inters_gdf, seniors_gdf[['Name', 'seniors_ID','geometry']], how='left', op='within')
[100]:
sig_inters_gdf.drop('index_right', axis=1, inplace=True)
[101]:
sig_inters_gdf.head(3)
[101]:
intersection_id y x st1_name st2_name st3_name st4_name dup Coordinates distance_to_LPIS ... nearest_StImpro DateComplt SIPProjTyp distance_to_bikeroute nearest_bikeroute bname distance_to_priorityinters nearest_priorityinters Name seniors_ID
0 1.0 199793.6093 986336.1490 ALLEN STREET CANAL STREET 0 POINT (986336.149 199793.609300002) 276 ... 4 2009-10-31 - 3 8553.0 Manhattan 7 245 Lower East Side 27.0
1 2.0 202206.1619 982769.3310 AVENUE OF THE AMERICAS LAIGHT STREET CANAL STREET 0 POINT (982769.331 202206.161899999) 592 ... 245 2016-07-23 VZ Priority Geo - Traffic Calming 230 1778.0 Manhattan 2289 191 NaN NaN
2 3.0 201790.0942 982805.6184 AVENUE OF THE AMERICAS LISPENARD STREET WEST BROADWAY 0 POINT (982805.6183999931 201790.0942) 413 ... 245 2016-07-23 VZ Priority Geo - Traffic Calming 1 1769.0 Manhattan 2331 191 NaN NaN

3 rows × 29 columns

[102]:
str_cols = list(sig_inters_gdf.select_dtypes(include=['object']).columns)
for col in str_cols:
    sig_inters_gdf[col] = sig_inters_gdf[col].astype(str)
[103]:
fp =  r"C:\Users\jerem\Box Sync\Policy Evaluation\working_data\signal_intersection.dta"
[ ]:
#sig_inters_gdf.to_stata(fp)

4. NYPD Motor Vehicle Collision data

4.a. download

  • stata

[104]:
clear
[105]:
import delimited "https://data.cityofnewyork.us/api/views/h9gi-nx95/rows.csv?accessType=DOWNLOAD"
(29 vars, 1,482,622 obs)
[106]:
notes: "Downloaded $S_DATE $S_TIME"
notes list



_dta:
  1.  "Downloaded 25 Apr 2019 10:52:22"

save is commented out so I do not replace my original dataset

[ ]:
//save "..\input_data\NYPD_Motor_Vehicle_Collisions.dta"

4.b. clean

  • stata

[233]:
use "..\input_data\NYPD_Motor_Vehicle_Collisions.dta",clear
[234]:
notes list
//Check that notes is "Downloaded 28 Sep 2018 15:48:00"

_dta:
  1.  "Downloaded 28 Sep 2018 15:48:00"

Formatting the data

[235]:
%head
date time borough zipcode latitude longitude location onstreetname crossstreetname offstreetname numberofpersonsinjured numberofpersonskilled numberofpedestriansinjured numberofpedestrianskilled numberofcyclistinjured numberofcyclistkilled numberofmotoristinjured numberofmotoristkilled contributingfactorvehicle1 contributingfactorvehicle2 contributingfactorvehicle3 contributingfactorvehicle4 contributingfactorvehicle5 uniquekey vehicletypecode1 vehicletypecode2 vehicletypecode3 vehicletypecode4 vehicletypecode5
1 09/24/2018 0:00 STATEN ISLAND 10306 40.573219 -74.106995 (40.57322, -74.106995) HYLAN BOULEVARD OTIS AVENUE . 0 0 0 0 0 0 0 0 Driver Inattention/Distraction Unspecified . . . 3987045 Station Wagon/Sport Utility Vehicle Sedan . . .
2 09/24/2018 0:00 . . 40.612614 -74.076958 (40.612614, -74.07696) FLETCHER STREET . . 0 0 0 0 0 0 0 0 Driver Inattention/Distraction Unspecified . . . 3987187 Station Wagon/Sport Utility Vehicle Sedan . . .
3 09/24/2018 0:00 . . . . . . . 161-18 140 street 0 0 0 0 0 0 0 0 Unspecified . . . . 3986792 Sedan . . . .
4 09/24/2018 0:00 . . 40.833981 -73.826347 (40.83398, -73.82635) BRUCKNER EXPRESSWAY . . 1 0 0 0 0 0 1 0 Unsafe Speed Unspecified . . . 3986861 Sedan Sedan . . .
5 09/24/2018 0:05 BROOKLYN 11249 40.720074 -73.959846 (40.720074, -73.95985) WYTHE AVENUE NORTH 8 STREET . 0 0 0 0 0 0 0 0 Failure to Yield Right-of-Way Unspecified . . . 3986532 Sedan Sedan . . .
6 09/24/2018 0:28 BROOKLYN 11210 40.631283 -73.94603 (40.631283, -73.94603) . . 1609 FLATBUSH AVENUE 0 0 0 0 0 0 0 0 Driver Inattention/Distraction Driver Inattention/Distraction . . . 3986656 Sedan Sedan . . .
7 09/24/2018 0:30 QUEENS 11385 40.693848 -73.901184 (40.693848, -73.901184) CODY AVENUE WYCKOFF AVENUE . 0 0 0 0 0 0 0 0 Passing or Lane Usage Improper Unspecified . . . 3988303 Station Wagon/Sport Utility Vehicle . . . .
8 09/24/2018 0:45 . . 40.678162 -73.897484 (40.67816, -73.897484) PENNSYLVANIA AVENUE . . 0 0 0 0 0 0 0 0 Aggressive Driving/Road Rage Unspecified . . . 3986730 Sedan Sedan . . .
9 09/24/2018 10:00 BROOKLYN 11201 40.702908 -73.981133 (40.702908, -73.98113) WATER STREET HUDSON AVENUE . 0 0 0 0 0 0 0 0 Turning Improperly Unspecified . . . 3986916 Station Wagon/Sport Utility Vehicle Sedan . . .
10 09/24/2018 10:00 BROOKLYN 11201 40.694687 -73.981743 (40.694687, -73.98174) . . 38 FLEET WALK 0 0 0 0 0 0 0 0 Backing Unsafely Unspecified . . . 3986791 Box Truck Flat Bed . . .
[236]:
order(uniquekey), before(date)

Create dates that are in the Stata internal form(SIF) that Stata can understand

[237]:
codebook date

---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
date                                                                                                                                                                                                                                                       DATE
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

                  type:  string (str105)

         unique values:  2,278                    missing "":  0/1,351,215

              examples:  "03/22/2016"
                         "06/05/2015"
                         "08/10/2015"
                         "10/17/2014"

               warning:  variable has leading and embedded blanks

The date variable is a string in the format of month day and year. Therefore, we will use Stata’s mask MDY to create the date in SIF.

[238]:
gen date2 = date(date, "MDY"),after(date)
format date2 %td

(1 missing value generated)

There is an observation with nonsensical values in the date str variable

[239]:
drop date
rename date2 date
[240]:
gen time2 = clock(time, "hm"),after(time)
format time2 %tcHH:MM
drop time
rename time2 time

(1 missing value generated)



We can easily extract year from SIF data

[241]:
gen year = year(date), before(date)
(1 missing value generated)

Create latenight indicator variable

[242]:
gen latenight = (time > tc(23:00:00) & time <= tc(23:59:59)) | (time >= tc(00:00:00) & time < tc(05:00:00)), after(time)
[243]:
tab latenight

  latenight |      Freq.     Percent        Cum.
------------+-----------------------------------
          0 |  1,218,191       90.16       90.16
          1 |    133,024        9.84      100.00
------------+-----------------------------------
      Total |  1,351,215      100.00

Rename long variable names

[244]:
rename numberofpersonsinjured personsinjured
rename numberofpersonskilled personskilled
rename numberofpedestriansinjured pedestriansinjured
rename numberofpedestrianskilled pedestrianskilled
rename numberofcyclistinjured cyclistinjured
rename numberofcyclistkilled cyclistkilled
rename numberofmotoristinjured motoristinjured
rename numberofmotoristkilled motoristkilled

Create an indicator for observations with missing Longditude and Latitude points

[245]:
gen mi_latlong = mi(latitude) & mi(longitude), after(longitude)
[246]:
tab mi_latlong year, column

+-------------------+
| Key               |
|-------------------|
|     frequency     |
| column percentage |
+-------------------+

           |                                     year
mi_latlong |      2012       2013       2014       2015       2016       2017       2018 |     Total
-----------+-----------------------------------------------------------------------------+----------
         0 |    85,450    171,914    172,724    182,956    145,733    212,986    155,901 | 1,127,664
           |     84.99      84.39      83.84      84.05      63.96      92.84      93.93 |     83.46
-----------+-----------------------------------------------------------------------------+----------
         1 |    15,089     31,809     33,302     34,731     82,104     16,437     10,078 |   223,550
           |     15.01      15.61      16.16      15.95      36.04       7.16       6.07 |     16.54
-----------+-----------------------------------------------------------------------------+----------
     Total |   100,539    203,723    206,026    217,687    227,837    229,423    165,979 | 1,351,214
           |    100.00     100.00     100.00     100.00     100.00     100.00     100.00 |    100.00

Use Stata’s duplicates command to identify duplicates among onstreetname and crossstreetname

[247]:
duplicates report onstreetname crossstreetname if ///
!missing(onstreetname) & !missing(crossstreetname)

Duplicates in terms of onstreetname crossstreetname

--------------------------------------
   copies | observations       surplus
----------+---------------------------
        1 |        42735             0
        2 |        42570         21285
        3 |        43125         28750
        4 |        43116         32337
        5 |        40200         32160
        6 |        37398         31165
        7 |        34363         29454
        8 |        32688         28602
        9 |        30033         26696
       10 |        28170         25353
       11 |        26191         23810
       12 |        23868         21879
       13 |        22945         21180
       14 |        21154         19643
       15 |        19485         18186
       16 |        17856         16740
       17 |        16881         15888
       18 |        17046         16099
       19 |        14725         13950
       20 |        14660         13927
       21 |        12810         12200
       22 |        12562         11991
       23 |        11638         11132
       24 |        11688         11201
       25 |        10525         10104
       26 |        10166          9775
       27 |         9936          9568
       28 |         9912          9558
       29 |         7743          7476
       30 |         8340          8062
       31 |         7750          7500
       32 |         7584          7347
       33 |         6864          6656
       34 |         8058          7821
       35 |         8050          7820
       36 |         6876          6685
       37 |         7178          6984
       38 |         6384          6216
       39 |         5538          5396
       40 |         5880          5733
       41 |         5166          5040
       42 |         5250          5125
       43 |         3612          3528
       44 |         5588          5461
       45 |         3915          3828
       46 |         3956          3870
       47 |         4606          4508
       48 |         3840          3760
       49 |         4410          4320
       50 |         4350          4263
       51 |         3774          3700
       52 |         3640          3570
       53 |         3127          3068
       54 |         4212          4134
       55 |         3850          3780
       56 |         2912          2860
       57 |         2622          2576
       58 |         3074          3021
       59 |         3186          3132
       60 |         3420          3363
       61 |         2989          2940
       62 |         2976          2928
       63 |         1827          1798
       64 |         2624          2583
       65 |         2015          1984
       66 |         2376          2340
       67 |         1876          1848
       68 |         2176          2144
       69 |         2415          2380
       70 |         2030          2001
       71 |         2059          2030
       72 |         1800          1775
       73 |         2190          2160
       74 |         2590          2555
       75 |         1725          1702
       76 |         1520          1500
       77 |         2079          2052
       78 |         1248          1232
       79 |         1501          1482
       80 |         1920          1896
       81 |         1701          1680
       82 |         1804          1782
       83 |          498           492
       84 |         1764          1743
       85 |         2720          2688
       86 |         1634          1615
       87 |         1305          1290
       88 |         1760          1740
       89 |         1068          1056
       90 |          720           712
       91 |         1183          1170
       92 |          920           910
       93 |         1953          1932
       94 |         1034          1023
       95 |         1520          1504
       96 |         1344          1330
       97 |         1552          1536
       98 |         1372          1358
       99 |         1287          1274
      100 |         1400          1386
      101 |         1313          1300
      102 |          816           808
      103 |          515           510
      104 |          624           618
      105 |          945           936
      106 |          954           945
      107 |          856           848
      108 |         1728          1712
      109 |          763           756
      110 |         1430          1417
      111 |         1110          1100
      112 |          448           444
      113 |          565           560
      114 |          798           791
      115 |          805           798
      116 |         1044          1035
      117 |          702           696
      118 |          590           585
      119 |          952           944
      120 |          480           476
      121 |          484           480
      122 |          976           968
      123 |          738           732
      124 |          620           615
      125 |          875           868
      126 |          378           375
      127 |          381           378
      128 |          640           635
      129 |          387           384
      130 |          260           258
      131 |          262           260
      132 |          396           393
      133 |         1197          1188
      134 |         1340          1330
      135 |          540           536
      136 |          544           540
      137 |          548           544
      138 |          414           411
      139 |          278           276
      140 |          840           834
      141 |          705           700
      142 |         1136          1128
      143 |          286           284
      145 |          725           720
      146 |         1022          1015
      147 |          588           584
      148 |          444           441
      149 |          447           444
      151 |         1057          1050
      152 |          760           755
      153 |          153           152
      154 |          154           153
      155 |          310           308
      156 |          312           310
      157 |          157           156
      158 |          316           314
      159 |          636           632
      160 |          480           477
      161 |          966           960
      162 |          486           483
      163 |          163           162
      165 |          330           328
      166 |          498           495
      167 |          167           166
      168 |          336           334
      169 |          507           504
      170 |          170           169
      171 |          342           340
      172 |          688           684
      173 |          173           172
      174 |          174           173
      175 |          525           522
      176 |          704           700
      177 |          177           176
      178 |          356           354
      179 |          358           356
      180 |          360           358
      185 |          370           368
      186 |          372           370
      189 |          189           188
      190 |          380           378
      191 |          382           380
      192 |          192           191
      196 |          588           585
      197 |          197           196
      199 |          199           198
      200 |          200           199
      202 |          202           201
      206 |          206           205
      208 |          624           621
      210 |          210           209
      212 |          636           633
      213 |          213           212
      214 |          214           213
      216 |          432           430
      218 |          218           217
      219 |          219           218
      223 |          223           222
      225 |          225           224
      229 |          229           228
      231 |          231           230
      234 |          234           233
      235 |          235           234
      238 |          238           237
      241 |          241           240
      244 |          244           243
      245 |          245           244
      255 |          510           508
      263 |          263           262
      266 |          266           265
      269 |          269           268
      278 |          278           277
      280 |          280           279
      287 |          287           286
      288 |          288           287
      292 |          292           291
      297 |          297           296
      299 |          299           298
      306 |          306           305
      317 |          317           316
      326 |          326           325
      343 |          343           342
      351 |          351           350
      359 |          359           358
      400 |          400           399
      408 |          408           407
    24990 |        24990         24989
--------------------------------------

use Stata’s strtrim function to remove leading and trailing spaces

[248]:
replace onstreetname = strtrim(onstreetname)
replace crossstreetname = strtrim(crossstreetname)

(1,079,830 real changes made)

(738,472 real changes made)
[249]:
sort onstreetname crossstreetname borough zipcode latitude longitude

If borough, zipcode, on street and cross street are the same, we will use the latitude and longitudes for those observations for observations with missing coordinates. s

[250]:
replace latitude = latitude[_n-1] if ///
mi(latitude) & ///
!mi(latitude[_n-1]) & ///
!mi(borough) & ///
!mi(borough[_n-1]) & ///
!mi(zipcode) & ///
!mi(zipcode[_n-1]) & ///
!mi(onstreetname) & ///
!mi(onstreetname[_n-1]) & ///
!mi(crossstreetname) & ///
!mi(crossstreetname[_n-1]) & ///
(borough == borough[_n-1]) & ///
(zipcode == zipcode[_n-1]) & ///
(onstreetname == onstreetname[_n-1]) & ///
(crossstreetname ==  crossstreetname[_n-1])

replace longitude = longitude[_n-1] if ///
mi(longitude) & ///
!mi(longitude[_n-1]) & ///
!mi(borough) & ///
!mi(borough[_n-1]) & ///
!mi(zipcode) & ///
!mi(zipcode[_n-1]) & ///
!mi(onstreetname) & ///
!mi(onstreetname[_n-1]) & ///
!mi(crossstreetname) & ///
!mi(crossstreetname[_n-1]) & ///
(borough == borough[_n-1]) & ///
(zipcode == zipcode[_n-1]) & ///
(onstreetname == onstreetname[_n-1]) & ///
(crossstreetname ==  crossstreetname[_n-1])

(22,579 real changes made)

(22,579 real changes made)

Create indicator for observations which had their coordinates filled

[251]:
gen imput_latlong = (mi_latlong ==1 & !mi(latitude) & !mi(longitude)), after(mi_latlong)
[252]:
tab imput_latlong

imput_latlo |
         ng |      Freq.     Percent        Cum.
------------+-----------------------------------
          0 |  1,328,636       98.33       98.33
          1 |     22,579        1.67      100.00
------------+-----------------------------------
      Total |  1,351,215      100.00

Create a second indicator for those observations that still have missing coordinates

[253]:
gen mi_latlong2 = mi(latitude) & mi(longitude), after(imput_latlong)
[254]:
tab mi_latlong2 year, column

+-------------------+
| Key               |
|-------------------|
|     frequency     |
| column percentage |
+-------------------+

mi_latlong |                                     year
         2 |      2012       2013       2014       2015       2016       2017       2018 |     Total
-----------+-----------------------------------------------------------------------------+----------
         0 |    85,452    171,917    172,730    182,959    164,166    215,363    157,656 | 1,150,243
           |     84.99      84.39      83.84      84.05      72.05      93.87      94.99 |     85.13
-----------+-----------------------------------------------------------------------------+----------
         1 |    15,087     31,806     33,296     34,728     63,671     14,060      8,323 |   200,971
           |     15.01      15.61      16.16      15.95      27.95       6.13       5.01 |     14.87
-----------+-----------------------------------------------------------------------------+----------
     Total |   100,539    203,723    206,026    217,687    227,837    229,423    165,979 | 1,351,214
           |    100.00     100.00     100.00     100.00     100.00     100.00     100.00 |    100.00

Drop observations whose coordinates fall outside NYC’s boundary box

[255]:
drop if (longitude < -74.5 | longitude > -73) &  !mi(longitude)
drop if (latitude < 40 |  latitude > 41) & !mi(latitude)

(439 observations deleted)

(4 observations deleted)

Drop observations that have missing coordinates

[256]:
drop if mi_latlong2 == 1
(200,972 observations deleted)

Cross tabulation of missing coordinates by year

[257]:
tab mi_latlong2 year, column

+-------------------+
| Key               |
|-------------------|
|     frequency     |
| column percentage |
+-------------------+

mi_latlong |                                     year
         2 |      2012       2013       2014       2015       2016       2017       2018 |     Total
-----------+-----------------------------------------------------------------------------+----------
         0 |    85,452    171,917    172,730    182,959    164,159    215,211    157,372 | 1,149,800
           |    100.00     100.00     100.00     100.00     100.00     100.00     100.00 |    100.00
-----------+-----------------------------------------------------------------------------+----------
     Total |    85,452    171,917    172,730    182,959    164,159    215,211    157,372 | 1,149,800
           |    100.00     100.00     100.00     100.00     100.00     100.00     100.00 |    100.00
[ ]:
//save "..\working_data\NYPD_Motor_Vehicle_Collisions_clean.dta"

5. Calculate collision outcomes

  • stata

[286]:
use "..\working_data\collision_signal_intersection.dta",clear
[287]:
describe

Contains data from ..\working_data\collision_signal_intersection.dta
  obs:     1,147,839
 vars:            42                          15 Nov 2018 16:01
 size:   896,462,259
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
              storage   display    value
variable name   type    format     label      variable label
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
index           long    %12.0g
uniquekey       double  %10.0g
year            float   %9.0g
date            str10   %10s
time            str19   %19s
latenight       float   %9.0g
borough         str13   %13s
zipcode         str5    %5s
latitude        float   %9.0g
longitude       float   %9.0g
mi_latlong      float   %9.0g
imput_latlong   float   %9.0g
mi_latlong2     float   %9.0g
location        str25   %25s
onstreetname    str32   %32s
crossstreetname str32   %32s
offstreetname   str40   %40s
personsinjured  double  %10.0g
personskilled   double  %10.0g
pedestriansin~d double  %10.0g
pedestrianski~d double  %10.0g
cyclistinjured  double  %10.0g
cyclistkilled   double  %10.0g
motoristinjured double  %10.0g
motoristkilled  double  %10.0g
contributingf~1 str53   %53s
contributingf~2 str53   %53s
contributingf~3 str53   %53s
contributingf~4 str53   %53s
contributingf~5 str43   %43s
vehicletypeco~1 str35   %35s
vehicletypeco~2 str35   %35s
vehicletypeco~3 str35   %35s
vehicletypeco~4 str35   %35s
vehicletypeco~5 str35   %35s
bicyclerelated  float   %9.0g
taxirelated     float   %9.0g
publicrelated   float   %9.0g
dup             double  %10.0g
Coordinates     str43   %43s
distance_to_s~t long    %12.0g
nearest_sigInt  float   %9.0g
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Sorted by:
[288]:
gen intersection_id = nearest_sigInt
[289]:
codebook intersection_id

--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
intersection_id                                                                                                                                                      (unlabeled)
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

                  type:  numeric (float)

                 range:  [1,13214]                    units:  1
         unique values:  13,116                   missing .:  0/1,147,839

                  mean:    6513.4
              std. dev:   3979.88

           percentiles:        10%       25%       50%       75%       90%
                               981      3009      6437     10260     11828
[290]:
// Determine if collisions that are close to an intersection
sum distance_to_sigInt, detail

                     distance_to_sigInt
-------------------------------------------------------------
      Percentiles      Smallest
 1%            2              0
 5%            2              0
10%            2              0       Obs           1,147,839
25%            3              0       Sum of Wgt.   1,147,839

50%           10                      Mean           208.3536
                        Largest       Std. Dev.      375.8735
75%          279           7498
90%          615           7500       Variance       141280.9
95%          853           7500       Skewness       4.267881
99%         1632           7685       Kurtosis       35.85233
[291]:
sum distance_to_sigInt if distance_to_sigInt < r(p75), detail

                     distance_to_sigInt
-------------------------------------------------------------
      Percentiles      Smallest
 1%            1              0
 5%            2              0
10%            2              0       Obs             859,866
25%            3              0       Sum of Wgt.     859,866

50%            3                      Mean           51.85574
                        Largest       Std. Dev.      86.39933
75%           65            278
90%          227            278       Variance       7464.844
95%          259            278       Skewness       1.550851
99%          273            278       Kurtosis       3.837138
[295]:
sum distance_to_sigInt if distance_to_sigInt < r(p75), detail

                     distance_to_sigInt
-------------------------------------------------------------
      Percentiles      Smallest
 1%            2              0
 5%            2              0
10%            2              0       Obs           1,147,839
25%            3              0       Sum of Wgt.   1,147,839

50%           10                      Mean           208.3536
                        Largest       Std. Dev.      375.8735
75%          279           7498
90%          615           7500       Variance       141280.9
95%          853           7500       Skewness       4.267881
99%         1632           7685       Kurtosis       35.85233
[298]:
gen intersection = (distance_to_sigInt < 10)
[299]:
tab intersection

intersectio |
          n |      Freq.     Percent        Cum.
------------+-----------------------------------
          0 |    576,297       50.21       50.21
          1 |    571,542       49.79      100.00
------------+-----------------------------------
      Total |  1,147,839      100.00
[300]:
codebook date

--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
date                                                                                                                                                                 (unlabeled)
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

                  type:  string (str10)

         unique values:  2,277                    missing "":  0/1,147,839

              examples:  "2013-11-04"
                         "2015-03-06"
                         "2016-06-01"
                         "2017-09-03"
[301]:
gen date2 = date(date, "YMD"),after(date)
format date2 %td
drop date
rename date2 date
[302]:
%head
index uniquekey year date time latenight borough zipcode latitude longitude mi_latlong imput_latlong mi_latlong2 location onstreetname crossstreetname offstreetname personsinjured personskilled pedestriansinjured pedestrianskilled cyclistinjured cyclistkilled motoristinjured motoristkilled contributingfactorvehicle1 contributingfactorvehicle2 contributingfactorvehicle3 contributingfactorvehicle4 contributingfactorvehicle5 vehicletypecode1 vehicletypecode2 vehicletypecode3 vehicletypecode4 vehicletypecode5 bicyclerelated taxirelated publicrelated dup Coordinates distance_to_sigInt nearest_sigInt intersection_id intersection
1 0 3427749 2016 24apr2016 1960-01-01 18:02:00 0 . . 40.744896 -73.770203 0 0 0 (40.7448949, -73.7702044) . . 2 0 0 0 0 0 2 0 Fatigued/Drowsy Fatigued/Drowsy . . . PASSENGER VEHICLE PASSENGER VEHICLE . . . 0 0 0 24989 POINT (1047925.420187834 210745.8332306232) 749 11247 11247 0
2 1 3474445 2016 05jul2016 1960-01-01 01:30:00 1 . . 40.650494 -74.011772 0 0 0 (40.6504919, -74.0117738) . . 0 0 0 0 0 0 0 0 Unspecified Unspecified . . . PASSENGER VEHICLE PASSENGER VEHICLE . . . 0 0 0 24989 POINT (980983.3829843847 176268.9676320423) 30 5677 5677 0
3 2 3467416 2016 22jun2016 1960-01-01 12:15:00 0 . . 40.705044 -73.95903 0 0 0 (40.7050435, -73.9590323) . . 0 0 0 0 0 0 0 0 Unspecified Unspecified . . . PASSENGER VEHICLE PASSENGER VEHICLE . . . 0 0 0 24989 POINT (995609.2936718578 196145.5996373807) 111 7617 7617 0
4 3 2966009 2014 11nov2014 1960-01-01 23:30:00 1 . . 40.651398 -74.010574 0 0 0 (40.6513986, -74.0105739) . . . 0 0 0 0 0 0 0 0 Driver Inattention/Distraction Unspecified . . . PASSENGER VEHICLE PASSENGER VEHICLE . . . 0 0 0 . POINT (981315.8004963149 176598.3076968596) 114 5675 5675 0
5 4 2984443 2014 09nov2014 1960-01-01 06:02:00 0 . . 40.700741 -73.99498 0 0 0 (40.7007397, -73.9949795) . . . 1 0 0 0 0 0 1 0 Fell Asleep Unspecified . . . PASSENGER VEHICLE SPORT UTILITY / STATION WAGON . . . 0 0 0 . POINT (985641.9735507105 194575.2794881121) 459 4854 4854 0
6 5 3378303 2016 26jan2016 1960-01-01 16:27:00 0 . . 40.586208 -73.934166 0 0 0 (40.5862074, -73.934164) . . . 0 0 0 0 0 0 0 0 Unspecified Unspecified . . . SPORT UTILITY / STATION WAGON PASSENGER VEHICLE . . . 0 0 0 . POINT (1002535.674372012 152854.8011915441) 780 6820 6820 0
7 6 3035164 2014 14aug2014 1960-01-01 15:55:00 0 . . 40.739086 -73.816101 0 0 0 (40.7390852, -73.8160998) . . . 0 0 0 0 0 0 0 0 Fatigued/Drowsy Unspecified Unspecified . . PASSENGER VEHICLE UNKNOWN VAN . . 0 0 0 . POINT (1035211.71333712 208599.1167361466) 283 10828 10828 0
8 7 3121423 2012 16jul2012 1960-01-01 16:40:00 0 . . 40.608223 -74.129395 0 0 0 (40.6082248, -74.1293966) . . . 0 0 0 0 0 0 0 0 Turning Improperly Unspecified . . . SPORT UTILITY / STATION WAGON SPORT UTILITY / STATION WAGON . . . 0 0 0 . POINT (948321.9988617403 160894.9525950026) 754 12676 12676 0
9 8 3468203 2016 18jun2016 1960-01-01 14:00:00 0 . . 40.68877 -73.999062 0 0 0 (40.6887701, -73.999062) . . 0 0 0 0 0 0 0 0 Unspecified Unspecified . . . PASSENGER VEHICLE PASSENGER VEHICLE . . . 0 0 0 24989 POINT (984510.248463902 190214.0274668361) 172 5314 5314 0
10 9 3725939 2017 28jul2017 1960-01-01 14:43:00 0 . . 40.83609 -73.870392 0 0 0 (40.83609, -73.87039) . . . 0 0 0 0 0 0 0 0 Backing Unsafely Unspecified . . . PASSENGER VEHICLE . . . . 0 0 0 . POINT (1020114.481894317 243914.0169084446) 136 3959 3959 0
[303]:
gen month = month(date)
gen quarter = quarter(date)
gen collision_count = 1
[304]:
gen latenight_collision_count = latenight*collision_count
gen latenight_personsinjured = latenight*personsinjured
gen latenight_personskilled = latenight*personskilled
gen latenight_pedestriansinjured = latenight*pedestriansinjured
gen latenight_pedestrianskilled = latenight*pedestrianskilled
gen latenight_cyclistinjured = latenight*cyclistinjured
gen latenight_cyclistkilled = latenight*cyclistkilled
gen latenight_motoristinjured = latenight*motoristinjured
gen latenight_motoristkilled = latenight*motoristkilled

gen day = (latenight == 0)

gen day_collision_count = day*collision_count
gen day_personsinjured = day*personsinjured
gen day_personskilled = day*personskilled
gen day_pedestriansinjured = day*pedestriansinjured
gen day_pedestrianskilled = day*pedestrianskilled
gen day_cyclistinjured = day*cyclistinjured
gen day_cyclistkilled = day*cyclistkilled
gen day_motoristinjured = day*motoristinjured
gen day_motoristkilled = day*motoristkilled


(1 missing value generated)

(1 missing value generated)

(1 missing value generated)

(1 missing value generated)

(1 missing value generated)

(1 missing value generated)

(1 missing value generated)

(1 missing value generated)



(1 missing value generated)

(1 missing value generated)

(1 missing value generated)

(1 missing value generated)

(1 missing value generated)

(1 missing value generated)

(1 missing value generated)

(1 missing value generated)
[305]:
global counts collision_count latenight_collision_count day_collision_count
global all_outcome personsinjured personskilled pedestriansinjured pedestrianskilled cyclistinjured cyclistkilled motoristinjured motoristkilled
global late_outcome latenight_personsinjured latenight_personskilled latenight_pedestriansinjured latenight_pedestrianskilled latenight_cyclistinjured latenight_cyclistkilled latenight_motoristinjured latenight_motoristkilled
global day_outcome day_personsinjured day_personskilled day_pedestriansinjured day_pedestrianskilled day_cyclistinjured day_cyclistkilled day_motoristinjured day_motoristkilled
[306]:
preserve
collapse ///
(sum) "$counts $all_outcome $late_outcome $day_outcome" ///
 , by(intersection_id month year )

//save "..\working_data\collision_monthly.dta",replace
restore
[ ]:
preserve
collapse ///
(sum) "$counts $all_outcome $late_outcome $day_outcome" ///
 , by(intersection_id quarter year)

//save "..\working_data\collision_quarterly.dta",replace
restore

6. Set up Quarterly Panel data

  • stata

[307]:
clear
[308]:
cd
C:\Users\jerem\Box Sync\Policy Evaluation\dofiles
[309]:
// There are 13213 signal intersections
global intersect = 13213+1
global periods_per_year = 4
global obs = $periods_per_year*7*($intersect)

display $obs
set obs $obs




369992

number of observations (_N) was 0, now 369,992
[310]:
egen intersection_id = seq(), from(1) to($intersect)
sort intersection_id
[311]:
egen quarter = seq(), from(1) to (4) by(intersection_id)
egen year = seq(), from(2012) to (2018) block(4)
[312]:
gen quarterly = yq(year,quarter)
format quarterly %tq
[313]:
//These are outside of the range of our data
drop if quarter <= 2 & year == 2012
drop if quarter >= 4 & year == 2018

(26,428 observations deleted)

(13,214 observations deleted)

6.a. Merge in Intersection Data

[314]:
// Intersection level data
mmerge intersection_id using "..\working_data\signal_intersection.dta", ///
type(n:1) ///
unmatched(using) ///
umatch(intersection_id)


-------------------------------------------------------------------------------
merge specs          |
       matching type | n:1
  mv's on match vars | none
  unmatched obs from | using
---------------------+---------------------------------------------------------
  master        file | <data in memory not named>
                 obs | 330350
                vars |      4
          match vars | intersection_id  (not a key)
  -------------------+---------------------------------------------------------
  using         file | ..\working_data\signal_intersection.dta
                 obs |  13213
                vars |     38
          match vars | intersection_id  (key)
---------------------+---------------------------------------------------------
variable intersection_id does not uniquely identify observations in the master data
result          file | <data in memory not named>
                 obs | 330325
                vars |     43  (including _merge)
         ------------+---------------------------------------------------------
              _merge | 330325  obs both in master and using data      (code==3)
-------------------------------------------------------------------------------
[315]:
describe

Contains data
  obs:       330,325
 vars:            42
 size:   153,270,800
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
              storage   display    value
variable name   type    format     label      variable label
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
intersection_id int     %8.0g
quarter         byte    %8.0g
year            int     %8.0g
quarterly       int     %tq
index           int     %12.0g
y               double  %10.0g
x               double  %10.0g
st1_name        str31   %31s
st2_name        str32   %32s
st3_name        str35   %35s
st4_name        str36   %36s
dup             byte    %8.0g
Coordinates     str42   %42s
distance_to_L~S int     %12.0g
nearest_LPIS    int     %12.0g
date_insta      str10   %10s
distance_to_S~t int     %12.0g
nearest_Street  long    %12.0g
Number_Tra      str4    %4s
Number_Par      str4    %4s
Number_Tot      str4    %4s
TrafDir         str4    %4s
distance_to_Sch int     %12.0g
nearest_Sch     long    %10.0g
distance_to_LTC long    %12.0g
nearest_LTC     int     %12.0g
treatment_      str80   %80s
completion      str10   %10s
distance_to_S~o long    %12.0g
nearest_StImpro int     %12.0g
DateComplt      str10   %10s
SIPProjTyp      str53   %53s
distance_to_b~e int     %12.0g
nearest_biker~e int     %10.0g
distance_to_t~e int     %12.0g
nearest_truck~e long    %12.0g
bname           str13   %13s
distance_to_p~s int     %12.0g
nearest_prior~s int     %12.0g
Name            str26   %26s
seniors_ID      byte    %10.0g
_merge          byte    %32.0g     __MERGE
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Sorted by:
     Note: Dataset has changed since last saved.

6.b. Create borough dummies

[316]:
tab bname,m

        bname |      Freq.     Percent        Cum.
--------------+-----------------------------------
        Bronx |     42,600       12.90       12.90
     Brooklyn |    117,400       35.54       48.44
    Manhattan |     71,775       21.73       70.17
       Queens |     82,800       25.07       95.23
Staten Island |     15,650        4.74       99.97
          nan |        100        0.03      100.00
--------------+-----------------------------------
        Total |    330,325      100.00
[317]:
gen bronx = (bname == "Bronx")
gen brooklyn = (bname == "Brooklyn")
gen manhattan = (bname == "Manhattan")
gen queens = (bname == "Queens")
gen statenisland = (bname == "Staten Island")

6.c. Nearest Bike Route

[320]:
destring nearest_bikeroute,replace
nearest_bikeroute already numeric; no replace
[321]:
mmerge nearest_bikeroute using "..\working_data\bike_routes_2263.dta", ///
type(n:1) ///
unmatched(master) ///
umatch(objectid_1) ///
ukeep(date_instd tf_facilit)


-------------------------------------------------------------------------------
merge specs          |
       matching type | n:1
  mv's on match vars | none
  unmatched obs from | master
---------------------+---------------------------------------------------------
  master        file | <data in memory not named>
                 obs | 330325
                vars |     46
          match vars | nearest_bikeroute  (not a key)
  -------------------+---------------------------------------------------------
  using         file | ..\working_data\bike_routes_2263.dta
                 obs |  14980
                vars |      3  (selection via udrop/ukeep)
          match vars | objectid_1  (key)
---------------------+---------------------------------------------------------
variable nearest_bikeroute does not uniquely identify observations in the master data
result          file | <data in memory not named>
                 obs | 330325
                vars |     50  (including _merge)
         ------------+---------------------------------------------------------
              _merge | 330325  obs both in master and using data      (code==3)
-------------------------------------------------------------------------------
[322]:
gen bike_route_install_dt = date(date_instd, "YMD"), after(date_instd)
format bike_route_install_dt %td
[323]:
sum bike_route_install_dt, format

    Variable |        Obs        Mean    Std. Dev.       Min        Max
-------------+---------------------------------------------------------
bike_route~t |    330,325   24nov2002    7771.114  01jan1900  23dec2016
[324]:
gen bike_route_install_mt = mofd(bike_route_install_dt)
format bike_route_install_mt %tm
[325]:
sum bike_route_install_mt, format

    Variable |        Obs        Mean    Std. Dev.       Min        Max
-------------+---------------------------------------------------------
bike_rout~mt |    330,325   2002m11    255.2326   1900m1  2016m12
[326]:
gen bike_route_install_qt = qofd(bike_route_install_dt)
format bike_route_install_qt %tq
[327]:
sum bike_route_install_qt, format

    Variable |        Obs        Mean    Std. Dev.       Min        Max
-------------+---------------------------------------------------------
bike_rout~qt |    330,325   2002q4    85.00089  1900q1  2016q4

Determine if collision occured on bike route

[328]:
sum distance_to_bikeroute, detail

                    distance_to_bikeroute
-------------------------------------------------------------
      Percentiles      Smallest
 1%            0              0
 5%            0              0
10%            1              0       Obs             330,325
25%            3              0       Sum of Wgt.     330,325

50%          537                      Mean           1023.112
                        Largest       Std. Dev.      1365.944
75%         1416           8907
90%         2871           8907       Variance        1865802
95%         4039           8907       Skewness        2.00231
99%         6101           8907       Kurtosis       7.320251
[329]:
sum distance_to_bikeroute if distance_to_bikeroute < r(p50) , detail

                    distance_to_bikeroute
-------------------------------------------------------------
      Percentiles      Smallest
 1%            0              0
 5%            0              0
10%            0              0       Obs             165,025
25%            1              0       Sum of Wgt.     165,025

50%            3                      Mean           122.9515
                        Largest       Std. Dev.      176.5845
75%          262            536
90%          447            536       Variance       31182.09
95%          507            536       Skewness       1.105402
99%          530            536       Kurtosis        2.69221
[330]:
// Time invariant bike route indicator
gen bike_route_ever = distance_to_bikeroute <= 3
label variable bike_route_ever "Indicates that bike route passes through this intersection (<3 ft)"
[331]:
tab bike_route_ever

  Indicates |
  that bike |
      route |
     passes |
    through |
       this |
intersectio |
  n (<3 ft) |      Freq.     Percent        Cum.
------------+-----------------------------------
          0 |    245,200       74.23       74.23
          1 |     85,125       25.77      100.00
------------+-----------------------------------
      Total |    330,325      100.00
[333]:
// Time-variant bike route indicator
gen bike_route_tv = 0
replace bike_route_tv = 1 if quarterly >= bike_route_install_qt & bike_route_ever == 1
label variable bike_route_tv "Indicates that bike route passes through this intersection timevarient(<3 ft)"



(74,140 real changes made)

[334]:
tab bike_route_tv

  Indicates |
  that bike |
      route |
     passes |
    through |
       this |
intersectio |
          n |
timevarient |
    (<3 ft) |      Freq.     Percent        Cum.
------------+-----------------------------------
          0 |    256,185       77.56       77.56
          1 |     74,140       22.44      100.00
------------+-----------------------------------
      Total |    330,325      100.00
[347]:
tab bike_route_tv year

 Indicates |
 that bike |
     route |
    passes |
   through |
      this |
intersecti |
        on |
timevarien |                                     year
  t(<3 ft) |      2012       2013       2014       2015       2016       2017       2018 |     Total
-----------+-----------------------------------------------------------------------------+----------
         0 |    21,681     42,808     41,981     41,054     40,005     39,232     29,424 |   256,185
         1 |     4,745     10,044     10,871     11,798     12,847     13,620     10,215 |    74,140
-----------+-----------------------------------------------------------------------------+----------
     Total |    26,426     52,852     52,852     52,852     52,852     52,852     39,639 |   330,325

6.d. Left Turn Calming Intervention

[335]:
rename treatment_ left_turn_treatment
[336]:
gen left_turn_install_dt = date(completion, "YMD"), after(completion)
format left_turn_install_dt %td
gen int left_turn_install_mt = mofd(left_turn_install_dt), after(left_turn_install_dt)
format left_turn_install_mt %tm
gen int left_turn_install_qt = qofd(left_turn_install_dt), after(left_turn_install_dt)
format left_turn_install_qt %tq

Matching intersection to closest left turn calming intervention

[337]:
egen double left_turn_min = min(distance_to_LTC), by(nearest_LTC)
[338]:
gen flag_left_turn_ever = .
replace flag_left_turn_ever = 1 if left_turn_min == distance_to_LTC
replace flag_left_turn_ever = 0 if mi(flag_left_turn_ever)

(330,325 missing values generated)

(5,800 real changes made)

(324,525 real changes made)
[339]:
gen flag_left_turn = flag_left_turn_ever
replace flag_left_turn = 0 if flag_left_turn == 1 & quarterly < left_turn_install_qt


(4,359 real changes made)
[345]:
tab flag_left_turn year

flag_left_ |                                     year
      turn |      2012       2013       2014       2015       2016       2017       2018 |     Total
-----------+-----------------------------------------------------------------------------+----------
         0 |    26,426     52,852     52,852     52,852     52,676     52,266     38,960 |   328,884
         1 |         0          0          0          0        176        586        679 |     1,441
-----------+-----------------------------------------------------------------------------+----------
     Total |    26,426     52,852     52,852     52,852     52,852     52,852     39,639 |   330,325

6.e. Street Improvement Project intervention

[340]:
rename SIPProjTyp street_improv_treatment
[341]:
gen street_improv_install_dt = date(DateComplt, "YMD"), after(DateComplt)
format street_improv_install_dt %td
gen int street_improv_install_mt = mofd(street_improv_install_dt), after(street_improv_install_dt)
format street_improv_install_mt %tm
gen int street_improv_install_qt = qofd(street_improv_install_dt), after(street_improv_install_dt)
format street_improv_install_qt %tq

Matching intersection to closest street improvement project

[342]:
egen double street_improv_min = min(distance_to_StImpro), by(nearest_StImpro)
[343]:
gen flag_street_improv_ever = .
replace flag_street_improv_ever = 1 if street_improv_min == distance_to_StImpro
replace flag_street_improv_ever = 0 if mi(flag_street_improv_ever)

(330,325 missing values generated)

(9,125 real changes made)

(321,200 real changes made)
[344]:
gen flag_street_improv = flag_street_improv_ever
replace flag_street_improv = 0 if flag_street_improv == 1 & quarterly < street_improv_install_qt


(4,592 real changes made)
[346]:
tab flag_street_improv year

flag_stree |                                     year
  t_improv |      2012       2013       2014       2015       2016       2017       2018 |     Total
-----------+-----------------------------------------------------------------------------+----------
         0 |    26,297     52,556     52,449     52,234     51,957     51,704     38,595 |   325,792
         1 |       129        296        403        618        895      1,148      1,044 |     4,533
-----------+-----------------------------------------------------------------------------+----------
     Total |    26,426     52,852     52,852     52,852     52,852     52,852     39,639 |   330,325

6.f. Leading Pedestrian Interval Signal

[348]:
gen LPIS_install_dt = date(date_insta, "YMD")
format LPIS_install_dt %td
gen int LPIS_install_mt = mofd(LPIS_install_dt), after(LPIS_install_dt)
format LPIS_install_mt %tm
gen int LPIS_install_qt = qofd(LPIS_install_dt), after(LPIS_install_mt)
format LPIS_install_qt %tq
gen LPIS_install_yr = year(LPIS_install_dt), after(LPIS_install_qt)

Matching intersection to closest leading pedestrian interval signal

[349]:
egen double LPIS_min = min(distance_to_LPIS), by(nearest_LPIS)
[350]:
gen flag_LPIS_ever = .
replace flag_LPIS_ever = 1 if LPIS_min == distance_to_LPIS
replace flag_LPIS_ever = 0 if mi(flag_LPIS_ever)

(330,325 missing values generated)

(72,875 real changes made)

(257,450 real changes made)
[351]:
gen flag_LPIS = flag_LPIS_ever
[356]:
// Replace with indicator with 0 if quarter was before LPIS installation
replace flag_LPIS = 0 if flag_LPIS == 1 & quarterly < LPIS_install_qt
(47,593 real changes made)
[368]:
label variable flag_LPIS "Indicates the intersection at the time they became an LPIS intervention"
label variable flag_LPIS_ever "Indicates the intersection if they ever received LPIS intervention"

Problem: Certain LPIs intersections had been install before collision data

[357]:
sum quarterly, detail format

                          quarterly
-------------------------------------------------------------
      Percentiles      Smallest
 1%    2012q3      2012q3
 5%    2012q4      2012q3
10%    2013q1      2012q3       Obs             330,325
25%    2014q1      2012q3       Sum of Wgt.     330,325

50%    2015q3                      Mean          2015q3
                        Largest       Std. Dev.      7.211113
75%    2017q1      2018q3
90%    2018q1      2018q3       Variance       52.00016
95%    2018q2      2018q3       Skewness              0
99%    2018q3      2018q3       Kurtosis       1.796154
[358]:
sum LPIS_install_qt, detail format

                       LPIS_install_qt
-------------------------------------------------------------
      Percentiles      Smallest
 1%    1899q4      1899q4
 5%    2004q1      1899q4
10%    2013q2      1899q4       Obs             330,325
25%    2015q4      1899q4       Sum of Wgt.     330,325

50%    2016q3                      Mean          2014q2
                        Largest       Std. Dev.       51.7974
75%    2017q3      2018q3
90%    2018q2      2018q3       Variance       2682.971
95%    2018q3      2018q3       Skewness      -7.818992
99%    2018q3      2018q3       Kurtosis       68.19259

We will remove intersections that cannot be analyzed because LPIs was implemented before 01jul2012

[359]:
tab flag_LPIS if LPIS_install_qt < tq(2013q1)

  flag_LPIS |      Freq.     Percent        Cum.
------------+-----------------------------------
          0 |     26,477       82.42       82.42
          1 |      5,648       17.58      100.00
------------+-----------------------------------
      Total |     32,125      100.00
[360]:
tab intersection_id if LPIS_install_qt < tq(2013q1) & flag_LPIS_ever == 1

intersectio |
       n_id |      Freq.     Percent        Cum.
------------+-----------------------------------
          7 |         25        0.44        0.44
          8 |         25        0.44        0.88
         17 |         25        0.44        1.33
         57 |         25        0.44        1.77
         58 |         25        0.44        2.21
         61 |         25        0.44        2.65
         83 |         25        0.44        3.10
        109 |         25        0.44        3.54
        149 |         25        0.44        3.98
        151 |         25        0.44        4.42
        161 |         25        0.44        4.87
        230 |         25        0.44        5.31
        231 |         25        0.44        5.75
        238 |         25        0.44        6.19
        239 |         25        0.44        6.64
        243 |         25        0.44        7.08
        255 |         25        0.44        7.52
        256 |         25        0.44        7.96
        280 |         25        0.44        8.41
        282 |         25        0.44        8.85
        286 |         25        0.44        9.29
        317 |         25        0.44        9.73
        333 |         25        0.44       10.18
        449 |         25        0.44       10.62
        454 |         25        0.44       11.06
        468 |         25        0.44       11.50
        511 |         25        0.44       11.95
        547 |         25        0.44       12.39
        550 |         25        0.44       12.83
        557 |         25        0.44       13.27
        560 |         25        0.44       13.72
        567 |         25        0.44       14.16
        582 |         25        0.44       14.60
        601 |         25        0.44       15.04
        605 |         25        0.44       15.49
        606 |         25        0.44       15.93
        620 |         25        0.44       16.37
        624 |         25        0.44       16.81
        628 |         25        0.44       17.26
        629 |         25        0.44       17.70
        633 |         25        0.44       18.14
        642 |         25        0.44       18.58
        661 |         25        0.44       19.03
        676 |         25        0.44       19.47
        692 |         25        0.44       19.91
        693 |         25        0.44       20.35
        724 |         25        0.44       20.80
        800 |         25        0.44       21.24
        813 |         25        0.44       21.68
        827 |         25        0.44       22.12
        858 |         25        0.44       22.57
        862 |         25        0.44       23.01
        903 |         25        0.44       23.45
        914 |         25        0.44       23.89
        940 |         25        0.44       24.34
        949 |         25        0.44       24.78
        964 |         25        0.44       25.22
        981 |         25        0.44       25.66
        984 |         25        0.44       26.11
        986 |         25        0.44       26.55
        993 |         25        0.44       26.99
       1066 |         25        0.44       27.43
       1070 |         25        0.44       27.88
       1092 |         25        0.44       28.32
       1101 |         25        0.44       28.76
       1140 |         25        0.44       29.20
       1188 |         25        0.44       29.65
       1195 |         25        0.44       30.09
       1201 |         25        0.44       30.53
       1202 |         25        0.44       30.97
       1205 |         25        0.44       31.42
       1207 |         25        0.44       31.86
       1211 |         25        0.44       32.30
       1221 |         25        0.44       32.74
       1228 |         25        0.44       33.19
       1235 |         25        0.44       33.63
       1237 |         25        0.44       34.07
       1287 |         25        0.44       34.51
       1290 |         25        0.44       34.96
       1321 |         25        0.44       35.40
       1372 |         25        0.44       35.84
       1420 |         25        0.44       36.28
       1448 |         25        0.44       36.73
       1450 |         25        0.44       37.17
       1451 |         25        0.44       37.61
       1461 |         25        0.44       38.05
       1467 |         25        0.44       38.50
       1486 |         25        0.44       38.94
       1487 |         25        0.44       39.38
       1500 |         25        0.44       39.82
       1506 |         25        0.44       40.27
       1507 |         25        0.44       40.71
       1532 |         25        0.44       41.15
       1551 |         25        0.44       41.59
       1558 |         25        0.44       42.04
       1563 |         25        0.44       42.48
       1574 |         25        0.44       42.92
       1575 |         25        0.44       43.36
       1663 |         25        0.44       43.81
       1664 |         25        0.44       44.25
       1730 |         25        0.44       44.69
       1731 |         25        0.44       45.13
       1736 |         25        0.44       45.58
       1742 |         25        0.44       46.02
       1759 |         25        0.44       46.46
       1760 |         25        0.44       46.90
       1777 |         25        0.44       47.35
       1788 |         25        0.44       47.79
       1795 |         25        0.44       48.23
       1826 |         25        0.44       48.67
       1827 |         25        0.44       49.12
       1838 |         25        0.44       49.56
       1845 |         25        0.44       50.00
       1863 |         25        0.44       50.44
       1898 |         25        0.44       50.88
       1912 |         25        0.44       51.33
       1914 |         25        0.44       51.77
       1916 |         25        0.44       52.21
       1923 |         25        0.44       52.65
       1940 |         25        0.44       53.10
       1941 |         25        0.44       53.54
       2025 |         25        0.44       53.98
       2045 |         25        0.44       54.42
       2102 |         25        0.44       54.87
       2217 |         25        0.44       55.31
       2321 |         25        0.44       55.75
       2381 |         25        0.44       56.19
       2387 |         25        0.44       56.64
       2487 |         25        0.44       57.08
       2510 |         25        0.44       57.52
       2539 |         25        0.44       57.96
       2667 |         25        0.44       58.41
       2670 |         25        0.44       58.85
       2673 |         25        0.44       59.29
       2825 |         25        0.44       59.73
       3005 |         25        0.44       60.18
       3140 |         25        0.44       60.62
       3455 |         25        0.44       61.06
       3561 |         25        0.44       61.50
       4090 |         25        0.44       61.95
       4340 |         25        0.44       62.39
       4701 |         25        0.44       62.83
       4735 |         25        0.44       63.27
       5042 |         25        0.44       63.72
       5054 |         25        0.44       64.16
       5276 |         25        0.44       64.60
       5278 |         25        0.44       65.04
       5285 |         25        0.44       65.49
       5314 |         25        0.44       65.93
       5345 |         25        0.44       66.37
       5346 |         25        0.44       66.81
       5348 |         25        0.44       67.26
       5363 |         25        0.44       67.70
       5385 |         25        0.44       68.14
       5398 |         25        0.44       68.58
       5418 |         25        0.44       69.03
       5429 |         25        0.44       69.47
       5430 |         25        0.44       69.91
       5467 |         25        0.44       70.35
       5514 |         25        0.44       70.80
       5566 |         25        0.44       71.24
       5638 |         25        0.44       71.68
       5685 |         25        0.44       72.12
       5708 |         25        0.44       72.57
       5761 |         25        0.44       73.01
       5805 |         25        0.44       73.45
       5827 |         25        0.44       73.89
       6030 |         25        0.44       74.34
       6152 |         25        0.44       74.78
       6153 |         25        0.44       75.22
       6179 |         25        0.44       75.66
       6255 |         25        0.44       76.11
       6318 |         25        0.44       76.55
       6428 |         25        0.44       76.99
       6751 |         25        0.44       77.43
       6807 |         25        0.44       77.88
       6822 |         25        0.44       78.32
       6868 |         25        0.44       78.76
       7316 |         25        0.44       79.20
       7506 |         25        0.44       79.65
       9193 |         25        0.44       80.09
       9299 |         25        0.44       80.53
       9516 |         25        0.44       80.97
       9821 |         25        0.44       81.42
       9830 |         25        0.44       81.86
       9831 |         25        0.44       82.30
       9832 |         25        0.44       82.74
       9833 |         25        0.44       83.19
       9834 |         25        0.44       83.63
       9835 |         25        0.44       84.07
       9836 |         25        0.44       84.51
       9837 |         25        0.44       84.96
       9838 |         25        0.44       85.40
       9839 |         25        0.44       85.84
       9840 |         25        0.44       86.28
       9841 |         25        0.44       86.73
       9842 |         25        0.44       87.17
       9843 |         25        0.44       87.61
       9844 |         25        0.44       88.05
       9845 |         25        0.44       88.50
       9846 |         25        0.44       88.94
       9942 |         25        0.44       89.38
       9993 |         25        0.44       89.82
      10016 |         25        0.44       90.27
      10037 |         25        0.44       90.71
      10040 |         25        0.44       91.15
      10190 |         25        0.44       91.59
      10309 |         25        0.44       92.04
      10456 |         25        0.44       92.48
      10576 |         25        0.44       92.92
      10676 |         25        0.44       93.36
      10818 |         25        0.44       93.81
      10825 |         25        0.44       94.25
      11024 |         25        0.44       94.69
      11027 |         25        0.44       95.13
      11268 |         25        0.44       95.58
      11284 |         25        0.44       96.02
      11358 |         25        0.44       96.46
      11719 |         25        0.44       96.90
      12266 |         25        0.44       97.35
      12606 |         25        0.44       97.79
      12627 |         25        0.44       98.23
      12655 |         25        0.44       98.67
      12772 |         25        0.44       99.12
      12891 |         25        0.44       99.56
      12895 |         25        0.44      100.00
------------+-----------------------------------
      Total |      5,650      100.00
[361]:
drop if LPIS_install_qt < tq(2013q1) & flag_LPIS_ever == 1
(5,650 observations deleted)

6.g. Safe Streets for Seniors

[369]:
codebook Name

--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Name                                                                                                                                                                 (unlabeled)
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

                  type:  string (str26)

         unique values:  42                       missing "":  0/324,675

              examples:  "nan"
                         "nan"
                         "nan"
                         "nan"

               warning:  variable has embedded blanks
[370]:
tab Name

                      Name |      Freq.     Percent        Cum.
---------------------------+-----------------------------------
                   Astoria |        775        0.24        0.24
                Bath Beach |        975        0.30        0.54
                 Bay Ridge |      1,700        0.52        1.06
        Bedford Stuyvesant |      4,600        1.42        2.48
               Bensonhurst |      2,475        0.76        3.24
              Borough Park |      5,000        1.54        4.78
            Brighton Beach |        225        0.07        4.85
               Brownsville |      5,100        1.57        6.42
        Chelsea/9th Avenue |      1,900        0.59        7.01
                 Chinatown |        475        0.15        7.15
           Corona Elmhurst |      7,125        2.19        9.35
            East Concourse |        450        0.14        9.49
             East Flatbush |        600        0.18        9.67
               East Harlem |      2,625        0.81       10.48
              East Village |        125        0.04       10.52
                  Flatbush |      1,225        0.38       10.90
                  Flushing |        700        0.22       11.11
Fordham/University Heights |        600        0.18       11.30
              Forest Hills |      1,200        0.37       11.67
                Greenpoint |        450        0.14       11.80
          Hamilton Heights |      1,050        0.32       12.13
Highbridge/Lower Concourse |      3,600        1.11       13.24
           Jackson Heights |      1,150        0.35       13.59
             Jamaica Hills |        125        0.04       13.63
     Kings Bay - Gerritsen |      1,875        0.58       14.21
               Kingsbridge |        600        0.18       14.39
   Lenox Hill - Turtle Bay |      3,225        0.99       15.38
           Lower East Side |        800        0.25       15.63
          Manhattan Valley |      3,175        0.98       16.61
 Middle Village - Glendale |      1,175        0.36       16.97
                   Midwood |        300        0.09       17.06
                Mott Haven |      1,075        0.33       17.39
 New Dorp/Hyland Boulevard |        200        0.06       17.46
            Pelham Gardens |         75        0.02       17.48
                 Rego Park |        250        0.08       17.56
            Sheepshead Bay |        750        0.23       17.79
               South Beach |        100        0.03       17.82
                 Sunnyside |        625        0.19       18.01
           Upper West Side |      1,675        0.52       18.53
        Washington Heights |      1,375        0.42       18.95
                 Yorkville |      2,875        0.89       19.84
                       nan |    260,275       80.16      100.00
---------------------------+-----------------------------------
                     Total |    324,675      100.00
[371]:
gen flag_seniors = (strpos(Name, "nan") == 0)
[373]:
tab flag_seniors

flag_senior |
          s |      Freq.     Percent        Cum.
------------+-----------------------------------
          0 |    260,275       80.16       80.16
          1 |     64,400       19.84      100.00
------------+-----------------------------------
      Total |    324,675      100.00
[374]:
tab flag_seniors flag_LPIS

           |     Indicates the
           |  intersection at the
           |  time they became an
flag_senio |   LPIS intervention
        rs |         0          1 |     Total
-----------+----------------------+----------
         0 |   246,705     13,570 |   260,275
         1 |    58,336      6,064 |    64,400
-----------+----------------------+----------
     Total |   305,041     19,634 |   324,675

6.h. Priority Intersection

[377]:
sum distance_to_p*, detail

                 distance_to_priorityinters
-------------------------------------------------------------
      Percentiles      Smallest
 1%            3              0
 5%          372              0
10%          640              0       Obs             324,675
25%         1110              0       Sum of Wgt.     324,675

50%         1991                      Mean           2603.942
                        Largest       Std. Dev.      2534.222
75%         3221          28481
90%         4978          28481       Variance        6422279
95%         6561          28481       Skewness        3.67198
99%        13842          28481       Kurtosis       23.89315
[376]:
gen flag_priorityinters = (distance_to_priorityinters < 10)
[379]:
tab flag_priorityinters

flag_priori |
   tyinters |      Freq.     Percent        Cum.
------------+-----------------------------------
          0 |    319,200       98.31       98.31
          1 |      5,475        1.69      100.00
------------+-----------------------------------
      Total |    324,675      100.00
[378]:
tab flag_priorityinters flag_LPIS

           |     Indicates the
           |  intersection at the
           |  time they became an
flag_prior |   LPIS intervention
 ityinters |         0          1 |     Total
-----------+----------------------+----------
         0 |   300,649     18,551 |   319,200
         1 |     4,392      1,083 |     5,475
-----------+----------------------+----------
     Total |   305,041     19,634 |   324,675

6.i. School intersection/crossing

[380]:
sum distance_to_Sch,detail

                       distance_to_Sch
-------------------------------------------------------------
      Percentiles      Smallest
 1%          125             17
 5%          197             17
10%          246             17       Obs             324,675
25%          425             17       Sum of Wgt.     324,675

50%          692                      Mean           822.7985
                        Largest       Std. Dev.      612.1056
75%         1055          10685
90%         1511          10685       Variance       374673.3
95%         1872          10685       Skewness       3.568631
99%         2871          10685       Kurtosis        35.5307
[381]:
gen flag_school = (distance_to_Sch < 200)
[382]:
tab flag_school

flag_school |      Freq.     Percent        Cum.
------------+-----------------------------------
          0 |    307,775       94.79       94.79
          1 |     16,900        5.21      100.00
------------+-----------------------------------
      Total |    324,675      100.00
[383]:
tab flag_school flag_LPIS

           |     Indicates the
           |  intersection at the
           |  time they became an
flag_schoo |   LPIS intervention
         l |         0          1 |     Total
-----------+----------------------+----------
         0 |   289,690     18,085 |   307,775
         1 |    15,351      1,549 |    16,900
-----------+----------------------+----------
     Total |   305,041     19,634 |   324,675

6.j. Decay effect

[362]:
sort intersection_id quarterly
[363]:
bysort intersection_id: gen flag_LPIS_1yr = 1 if quarterly == (LPIS_install_qt+1)
bysort intersection_id: replace flag_LPIS_1yr = 1 if quarterly == (LPIS_install_qt+2)
bysort intersection_id: replace flag_LPIS_1yr = 1 if quarterly == (LPIS_install_qt+3)
bysort intersection_id: replace flag_LPIS_1yr = 1 if quarterly == (LPIS_install_qt+4)
replace flag_LPIS_1yr = 0 if missing(flag_LPIS_1yr)
replace flag_LPIS_1yr = 0 if flag_LPIS_ever == 0

(313,505 missing values generated)

(10,265 real changes made)

(9,357 real changes made)

(8,784 real changes made)

(285,099 real changes made)

(30,934 real changes made)
[364]:
bysort intersection_id: gen flag_LPIS_2yr = 1 if quarterly == (LPIS_install_qt+5)
bysort intersection_id: replace flag_LPIS_2yr = 1 if quarterly == (LPIS_install_qt+6)
bysort intersection_id: replace flag_LPIS_2yr = 1 if quarterly == (LPIS_install_qt+7)
bysort intersection_id: replace flag_LPIS_2yr = 1 if quarterly == (LPIS_install_qt+8)
replace flag_LPIS_2yr = 0 if missing(flag_LPIS_2yr)
replace flag_LPIS_2yr = 0 if flag_LPIS_ever == 0

(316,621 missing values generated)

(7,091 real changes made)

(6,147 real changes made)

(5,611 real changes made)

(297,772 real changes made)

(21,441 real changes made)
[365]:
bysort intersection_id: gen flag_LPIS_2yrup = 1 if quarterly == (LPIS_install_qt+5)
bysort intersection_id: replace flag_LPIS_2yrup = 1 if flag_LPIS_2yrup[_n-1] == 1
replace flag_LPIS_2yrup = 0 if missing(flag_LPIS_2yrup)
replace flag_LPIS_2yrup = 0 if flag_LPIS_ever == 0

(316,621 missing values generated)

(35,482 real changes made)

(281,139 real changes made)

(35,233 real changes made)
[366]:
bysort intersection_id: gen flag_LPIS_3yrup = 1 if quarterly == (LPIS_install_qt+9)
bysort intersection_id: replace flag_LPIS_3yrup = 1 if flag_LPIS_3yrup[_n-1] == 1
replace flag_LPIS_3yrup = 0 if missing(flag_LPIS_3yrup)
replace flag_LPIS_3yrup = 0 if flag_LPIS_ever == 0

(320,275 missing values generated)

(14,683 real changes made)

(305,592 real changes made)

(16,242 real changes made)

6.k. Panel is strongly balanced

[367]:
duplicates report intersection_id

Duplicates in terms of intersection_id

--------------------------------------
   copies | observations       surplus
----------+---------------------------
       25 |       324675        311688
--------------------------------------

6.l. Merge outcomes data

[384]:
// Outcomes data
mmerge quarter year intersection_id using "..\working_data\collision_quarterly.dta", ///
type(1:1) ///
unmatched(master)

-------------------------------------------------------------------------------
merge specs          |
       matching type | 1:1
  mv's on match vars | none
  unmatched obs from | master
---------------------+---------------------------------------------------------
  master        file | <data in memory not named>
                 obs | 324675
                vars |     79
          match vars | quarter year intersection_id  (key)
  -------------------+---------------------------------------------------------
  using         file | ..\working_data\collision_quarterly.dta
                 obs | 195341
                vars |     30
          match vars | quarter year intersection_id  (key)
---------------------+---------------------------------------------------------
result          file | <data in memory not named>
                 obs | 324675
                vars |    108  (including _merge)
         ------------+---------------------------------------------------------
              _merge | 133939  obs only in master data                (code==1)
                     | 190736  obs both in master and using data      (code==3)
-------------------------------------------------------------------------------
Reason that we did not match every obs from using is:
1. That we droped intersections in the intersection level data merge process 2. Certain LPIS junctions have been installed before the collisions datasets begin

6.m. Fill in missing information

[385]:
// Fill in '0' for intersection quarter year that did not have collisions
replace personsinjured = 0 if _merge == 1
replace personskilled = 0 if _merge == 1
replace pedestriansinjured = 0 if _merge == 1
replace pedestrianskilled = 0 if _merge == 1
replace cyclistinjured = 0 if _merge == 1
replace cyclistkilled = 0 if _merge == 1
replace motoristinjured = 0 if _merge == 1
replace motoristkilled = 0 if _merge == 1
replace collision_count = 0 if mi(collision_count)

replace latenight_personsinjured = 0 if _merge == 1
replace latenight_personskilled = 0 if _merge == 1
replace latenight_pedestriansinjured = 0 if _merge == 1
replace latenight_pedestrianskilled = 0 if _merge == 1
replace latenight_cyclistinjured = 0 if _merge == 1
replace latenight_cyclistkilled = 0 if _merge == 1
replace latenight_motoristinjured = 0 if _merge == 1
replace latenight_motoristkilled = 0 if _merge == 1
replace latenight_collision_count = 0 if mi(latenight_collision_count)

replace day_personsinjured = 0 if _merge == 1
replace day_personskilled = 0 if _merge == 1
replace day_pedestriansinjured = 0 if _merge == 1
replace day_pedestrianskilled = 0 if _merge == 1
replace day_cyclistinjured = 0 if _merge == 1
replace day_cyclistkilled = 0 if _merge == 1
replace day_motoristinjured = 0 if _merge == 1
replace day_motoristkilled = 0 if _merge == 1
replace day_collision_count = 0 if mi(day_collision_count)

(133,939 real changes made)

(133,939 real changes made)

(133,939 real changes made)

(133,939 real changes made)

(133,939 real changes made)

(133,939 real changes made)

(133,939 real changes made)

(133,939 real changes made)

(133,939 real changes made)

(133,939 real changes made)

(133,939 real changes made)

(133,939 real changes made)

(133,939 real changes made)

(133,939 real changes made)

(133,939 real changes made)

(133,939 real changes made)

(133,939 real changes made)

(133,939 real changes made)

(133,939 real changes made)

(133,939 real changes made)

(133,939 real changes made)

(133,939 real changes made)

(133,939 real changes made)

(133,939 real changes made)

(133,939 real changes made)

(133,939 real changes made)

(133,939 real changes made)
[387]:
gen flag_collision = collision_count > 0
gen latenight_flag_collision = latenight_collision_count > 0
gen day_flag_collision = day_collision_count > 0

Label outcomes variables

[388]:
label variable personsinjured "No. of persons injured during that month"
label variable personskilled "No. of persons killed during that month"
label variable pedestriansinjured "No. of pedestrians injured during that month"
label variable pedestrianskilled "No. of pedestrians killed during that month"
label variable cyclistinjured "No. of cyclist injured during that month"
label variable cyclistkilled "No. of cyclist killed during that month"
label variable motoristinjured "No. of motorist injured during that month"
label variable motoristkilled "No. of motorist killed during that month"
label variable collision_count "No. of collisions occured during that month"
label variable flag_collision "At least one collision occurred that month"

label variable latenight_personsinjured "No. of persons injured during that month between 11pm - 5am"
label variable latenight_personskilled "No. of persons killed during that month between 11pm - 5am"
label variable latenight_pedestriansinjured "No. of pedestrians injured during that month between 11pm - 5am"
label variable latenight_pedestrianskilled "No. of pedestrians killed during that month between 11pm - 5am"
label variable latenight_cyclistinjured "No. of cyclist injured during that month between 11pm - 5am"
label variable latenight_cyclistkilled "No. of cyclist killed during that month between 11pm - 5am"
label variable latenight_motoristinjured "No. of motorist injured during that month between 11pm - 5am"
label variable latenight_motoristkilled "No. of motorist killed during that month between 11pm - 5am"
label variable latenight_collision_count "No. of collisions occured during that month between 11pm - 5am"
label variable latenight_flag_collision " At least one collision occurred that month between 11pm - 5am"

label variable day_personsinjured "No. of persons injured during that month between 5am - 11pm"
label variable day_personskilled "No. of persons killed during that month between 5am - 11pm"
label variable day_pedestriansinjured "No. of pedestrians injured during that month between 5am - 11pm"
label variable day_pedestrianskilled "No. of pedestrians killed during that month between 5am - 11pm"
label variable day_cyclistinjured "No. of cyclist injured during that month between 5am - 11pm"
label variable day_cyclistkilled "No. of cyclist killed during that month between 5am - 11pm"
label variable day_motoristinjured "No. of motorist injured during that month between 5am - 11pm"
label variable day_motoristkilled "No. of motorist killed during that month between 5am - 11pm"
label variable day_collision_count "No. of collisions occured during that month between 5am - 11pm"
label variable day_flag_collision " At least one collision occurred that month between 5am - 11pm"
[ ]:
//save "..\working_data\analytical_file_panel_qt.dta",replace

7. Convert analytical panel data into shapefile

  • python

[2]:
fp = r"C:\Users\jerem\Box Sync\Policy Evaluation\working_data\analytical_file_panel_qt.dta"
[3]:
analytical_df = pd.read_stata(fp)
[4]:
analytical_df.head(3)
[4]:
intersection_id quarter year quarterly index y x st1_name st2_name st3_name ... day_personskilled day_pedestriansinjured day_pedestrianskilled day_cyclistinjured day_cyclistkilled day_motoristinjured day_motoristkilled flag_collision latenight_flag_collision day_flag_collision
0 1 1 2013 2013-01-01 0 199793.6093 986336.1490 ALLEN STREET CANAL STREET ... 0 0 0 0 0 0 0 1.0 0.0 1.0
1 2 1 2013 2013-01-01 1 202206.1619 982769.3310 AVENUE OF THE AMERICAS LAIGHT STREET CANAL STREET ... 0 0 0 0 0 0 0 0.0 0.0 0.0
2 3 1 2013 2013-01-01 2 201790.0942 982805.6184 AVENUE OF THE AMERICAS LISPENARD STREET WEST BROADWAY ... 0 0 0 0 0 0 0 0.0 0.0 0.0

3 rows × 163 columns

[5]:
analytical_df.columns
[5]:
Index(['intersection_id', 'quarter', 'year', 'quarterly', 'index', 'y', 'x',
       'st1_name', 'st2_name', 'st3_name',
       ...
       'day_personskilled', 'day_pedestriansinjured', 'day_pedestrianskilled',
       'day_cyclistinjured', 'day_cyclistkilled', 'day_motoristinjured',
       'day_motoristkilled', 'flag_collision', 'latenight_flag_collision',
       'day_flag_collision'],
      dtype='object', length=163)
[6]:
# Put the latitude and longtitude
analytical_df['geometry'] = list(zip(analytical_df.x, analytical_df.y))
analytical_df['geometry'] = analytical_df['geometry'].apply(Point)
analytical_gdf = gpd.GeoDataFrame(analytical_df, geometry='geometry')
[7]:
analytical_gdf.crs = {'init' :'epsg:2263'}
[8]:
analytical_gdf.plot(figsize=(15, 15));
_images/Impact-Evaluation-of-Leading-Pedestrian-Interval-Signals_336_0.png

The shapefile is required to calculate contiguity matrix in Stata

[397]:
 save_out = analytical_gdf[['collision_count','personsinjured','pedestriansinjured','cyclistinjured','motoristinjured',
                 'bronx','brooklyn','manhattan','queens','statenisland','distance_to_LPIS','distance_to_Street',
                 'distance_to_Sch','distance_to_LTC','distance_to_bikeroute','distance_to_truckroute',
                 'flag_left_turn_ever','flag_left_turn','flag_LPIS_ever','flag_LPIS','bike_route_tv',
                 'bike_route_ever', 'flag_seniors','flag_priorityinters','flag_school', 'flag_street_improv',
                 'quarter','year','intersection_id','y','x','geometry']]
[398]:
fp =  r"C:\Users\jerem\Box Sync\Policy Evaluation\working_data\analytical_panel_shapefile\analytical_panel_qt_shapefile.shp"
[ ]:
#save_out.to_file(origin+fp)

8. Thiessen Polygons

  • python

[9]:
analytical_gdf.columns
[9]:
Index(['intersection_id', 'quarter', 'year', 'quarterly', 'index', 'y', 'x',
       'st1_name', 'st2_name', 'st3_name',
       ...
       'day_pedestriansinjured', 'day_pedestrianskilled', 'day_cyclistinjured',
       'day_cyclistkilled', 'day_motoristinjured', 'day_motoristkilled',
       'flag_collision', 'latenight_flag_collision', 'day_flag_collision',
       'geometry'],
      dtype='object', length=164)

We want to extract only the cross section of the quarterly panel shapefile

[10]:
analytical_gdf.head(3)
[10]:
intersection_id quarter year quarterly index y x st1_name st2_name st3_name ... day_pedestriansinjured day_pedestrianskilled day_cyclistinjured day_cyclistkilled day_motoristinjured day_motoristkilled flag_collision latenight_flag_collision day_flag_collision geometry
0 1 1 2013 2013-01-01 0 199793.6093 986336.1490 ALLEN STREET CANAL STREET ... 0 0 0 0 0 0 1.0 0.0 1.0 POINT (986336.149 199793.609300002)
1 2 1 2013 2013-01-01 1 202206.1619 982769.3310 AVENUE OF THE AMERICAS LAIGHT STREET CANAL STREET ... 0 0 0 0 0 0 0.0 0.0 0.0 POINT (982769.331 202206.161899999)
2 3 1 2013 2013-01-01 2 201790.0942 982805.6184 AVENUE OF THE AMERICAS LISPENARD STREET WEST BROADWAY ... 0 0 0 0 0 0 0.0 0.0 0.0 POINT (982805.6183999931 201790.0942)

3 rows × 164 columns

[11]:
# Extract cross section of quarter==3 & year == 2012
sub_analytical_gdf = gpd.GeoDataFrame(analytical_gdf.loc[(analytical_gdf['quarter'] == 3) & (analytical_gdf['year'] == 2012)])
[12]:
sub_analytical_gdf['intersection_id'].count()
[12]:
12987
[13]:
sub_analytical_gdf.crs = {'init': 'epsg:2263'}
[14]:
points = list(zip(sub_analytical_gdf.x,sub_analytical_gdf.y))

Instead of extracting a cross section, we can also collapse the data to the unique intersections level

[15]:
collapsed_analytical_gdf = gpd.GeoDataFrame(analytical_gdf.groupby('intersection_id').agg({
    'collision_count': sum, # find the sum of collision_count in each intersection
    'personsinjured': sum, # find the sum of personsinjured in each intersection
    'pedestriansinjured': sum, # find the sum of pedestriansinjured in each intersection
    'cyclistinjured': sum , # find the sum of cyclistinjured in each intersection
    'motoristinjured': sum , # find the sum of motoristinjured in each intersection
    'flag_LPIS_ever': "first" ,
    'x': "first",
    'y': "first",
    'geometry': "first"})).reset_index()    # find the first geometry
[16]:
collapsed_analytical_gdf.columns
[16]:
Index(['intersection_id', 'collision_count', 'personsinjured',
       'pedestriansinjured', 'cyclistinjured', 'motoristinjured',
       'flag_LPIS_ever', 'x', 'y', 'geometry'],
      dtype='object')
[17]:
collapsed_analytical_gdf['intersection_id'].count()
[17]:
12987
[18]:
collapsed_analytical_gdf.crs = {'init': 'epsg:2263'}
[19]:
points = list(zip(collapsed_analytical_gdf.x,collapsed_analytical_gdf.y))

8.a. Voronoi

Creates dataframe regions and vertices

source: https://github.com/pysal/libpysal/blob/master/notebooks/voronoi.ipynb

[20]:
regions, vertices = voronoi_frames(points)
[21]:
regions.columns
[21]:
Index(['geometry'], dtype='object')
[22]:
regions.crs = {'init': 'epsg:2263'}
vertices.crs = {'init': 'epsg:2263'}
[23]:
fig, ax = plt.subplots(figsize=(15, 15))
regions.plot(ax=ax, color='blue',edgecolor='black', alpha=0.3)
vertices.plot(ax=ax, color='red');
_images/Impact-Evaluation-of-Leading-Pedestrian-Interval-Signals_360_0.png

Problem here is that the voronoi polygons extends far out from NYC’s boundaries. Therefore, we will use geopandas’s overlay to “cut” the borough shape out.

[24]:
fp = r"C:\Users\jerem\Box Sync\Policy Evaluation\input_data\nyc_boroughs_2263\nyc_boroughs_2263.shp"
[25]:
borough_gdf = gpd.read_file(fp)
[26]:
borough_gdf.crs
[26]:
{'proj': 'lcc',
 'lat_1': 41.03333333333333,
 'lat_2': 40.66666666666666,
 'lat_0': 40.16666666666666,
 'lon_0': -74,
 'x_0': 300000.0000000001,
 'y_0': 0,
 'datum': 'NAD83',
 'units': 'us-ft',
 'no_defs': True}
[27]:
borough_gdf = borough_gdf.to_crs({'init': 'epsg:2263'})
[28]:
borough_gdf.count()
[28]:
bcode       5
bname       5
name        5
namelsad    5
geometry    5
dtype: int64

There are multipolygons in each boroughs, but we want each polygon to be on its own. Therefore, we will use geopandas’s explode function.

[29]:
borough_explode_gdf = borough_gdf.explode()
[30]:
borough_explode_gdf.count()
[30]:
bcode       85
bname       85
name        85
namelsad    85
geometry    85
dtype: int64
[31]:
# Overlay with borough to extract the outlines
voronoi_sub_analytical_gdf = gpd.overlay(regions, borough_explode_gdf, how='intersection')
[32]:
voronoi_sub_analytical_gdf.plot(figsize=(15, 15));
_images/Impact-Evaluation-of-Leading-Pedestrian-Interval-Signals_371_0.png
[33]:
voronoi_sub_analytical_gdf.count()
[33]:
bcode       13315
bname       13315
name        13315
namelsad    13315
geometry    13315
dtype: int64
[34]:
voronoi_sub_analytical_gdf.columns
[34]:
Index(['bcode', 'bname', 'name', 'namelsad', 'geometry'], dtype='object')
[35]:
collapsed_analytical_gdf.columns
[35]:
Index(['intersection_id', 'collision_count', 'personsinjured',
       'pedestriansinjured', 'cyclistinjured', 'motoristinjured',
       'flag_LPIS_ever', 'x', 'y', 'geometry'],
      dtype='object')

Problem: Recall that there were only 12987 counts previously, but now there is 13315. We will use geopandas’s sjoin to find polygons that intersect with the signal intersection points.

[38]:
voronoi_intersections_collapsed_analytical_gdf = gpd.sjoin(voronoi_sub_analytical_gdf,collapsed_analytical_gdf, how='inner', op='intersects')
[40]:
voronoi_intersections_collapsed_analytical_gdf.head()
[40]:
bcode bname name namelsad geometry index_right intersection_id collision_count personsinjured pedestriansinjured cyclistinjured motoristinjured flag_LPIS_ever x y
1 36061 Manhattan New York New York County POLYGON ((986291.1955668803 200052.298871208, ... 0 1 97.0 21.0 7 3 11.0 0 986336.1490 199793.6093
3 36061 Manhattan New York New York County POLYGON ((982721.8764644475 202448.3499543682,... 1 2 4.0 1.0 0 0 1.0 0 982769.3310 202206.1619
5 36061 Manhattan New York New York County POLYGON ((983012.7139141727 201750.4648247989,... 2 3 0.0 0.0 0 0 0.0 0 982805.6184 201790.0942
7 36061 Manhattan New York New York County POLYGON ((982967.8732795889 201664.0831515846,... 3 4 50.0 5.0 3 1 1.0 0 982812.2555 201529.9618
9 36061 Manhattan New York New York County POLYGON ((982619.6644923392 201201.7787439331,... 4 5 45.0 10.0 1 2 7.0 0 982816.3956 201226.1097
[41]:
voronoi_sub_analytical_gdf['bcode'].count()
[41]:
13315
[42]:
print(voronoi_intersections_collapsed_analytical_gdf.columns.values)
['bcode' 'bname' 'name' 'namelsad' 'geometry' 'index_right'
 'intersection_id' 'collision_count' 'personsinjured' 'pedestriansinjured'
 'cyclistinjured' 'motoristinjured' 'flag_LPIS_ever' 'x' 'y']

It is suppose to be 12,987, but we only have 12983. The 4 intersections were lost because they are outside the boundaries of the boroughs.

[43]:
voronoi_intersections_collapsed_analytical_gdf.plot(figsize=(15, 15));
_images/Impact-Evaluation-of-Leading-Pedestrian-Interval-Signals_381_0.png
[84]:
quantiles = ps.Quantiles(voronoi_intersections_collapsed_analytical_gdf['collision_count'], k=4)
f, ax = plt.subplots(1, figsize=(25, 18))
voronoi_intersections_collapsed_analytical_gdf.assign(cl=quantiles.yb).plot(column='cl', categorical=True, \
        k=4, cmap='OrRd', linewidth=0.1, ax=ax, \
        edgecolor='black', legend=True)
ax.set_axis_off()
plt.title("Total Number of Collisions by Quartiles")
plt.savefig('../manuscripts/NYC_collisions_quartiles.pdf', bbox_inches='tight')
_images/Impact-Evaluation-of-Leading-Pedestrian-Interval-Signals_382_0.png
[85]:
quantiles10 = ps.Quantiles(voronoi_intersections_collapsed_analytical_gdf['collision_count'], k=10)
f, ax = plt.subplots(1, figsize=(25, 18))
voronoi_intersections_collapsed_analytical_gdf.assign(cl=quantiles10.yb).plot(column='cl', categorical=True, \
        k=10, cmap='OrRd', linewidth=0.1, ax=ax, \
        edgecolor='black', legend=True)
ax.set_axis_off()
plt.title("Total Number of Collisions by Deciles")
plt.savefig('../manuscripts/NYC_collisions_deciles.pdf', bbox_inches='tight')
_images/Impact-Evaluation-of-Leading-Pedestrian-Interval-Signals_383_0.png
[79]:
quantiles = ps.Quantiles(voronoi_intersections_collapsed_analytical_gdf['flag_LPIS_ever'], k=2)
f, ax = plt.subplots(1, figsize=(25, 18))
voronoi_intersections_collapsed_analytical_gdf.assign(cl=quantiles.yb).plot(column='cl', categorical=True, \
        k=4, cmap='BuPu', linewidth=0.1, ax=ax, \
        edgecolor='black', legend=True)
ax.set_axis_off()
plt.title("Leading Pedestrian Intervanl Signals")
plt.show();
_images/Impact-Evaluation-of-Leading-Pedestrian-Interval-Signals_384_0.png

8.b. Spatial Autocorrelation

“The concept of spatial autocorrelation relates to the combination of two types of similarity: spatial similarity and attribute similarity. Although there are many different measures of spatial autocorrelation, they all combine these two types of simmilarity into a summary measure.” source: http://darribas.org/gds_scipy16/ipynb_md/04_esda.html

[47]:
voronoi_intersections_collapsed_analytical_gdf.columns
[47]:
Index(['bcode', 'bname', 'name', 'namelsad', 'geometry', 'index_right',
       'intersection_id', 'collision_count', 'personsinjured',
       'pedestriansinjured', 'cyclistinjured', 'motoristinjured',
       'flag_LPIS_ever', 'x', 'y'],
      dtype='object')
[48]:
 save_out = voronoi_intersections_collapsed_analytical_gdf[['intersection_id', 'collision_count', 'personsinjured',
       'pedestriansinjured', 'cyclistinjured', 'motoristinjured', 'y', 'x',
       'geometry']]
[51]:
fp =  r"C:\Users\jerem\Box Sync\Policy Evaluation\working_data\analytical_panel_shapefile\collapsed_analytical_panel_qt_shapefile.shp"

save_out.to_file(fp)
[52]:
data = ps.pdio.read_files(fp)
W = ps.queen_from_shapefile(fp)
W.transform = 'r'
[53]:
data.columns
[53]:
Index(['intersecti', 'collision_', 'personsinj', 'pedestrian', 'cyclistinj',
       'motoristin', 'y', 'x', 'geometry'],
      dtype='object')
[56]:
data.columns = (['intersection_id', 'collision_count', 'personsinjured',
       'pedestriansinjured', 'cyclistinjured', 'motoristinjured', 'y', 'x',
       'geometry'])
[57]:
data.columns
[57]:
Index(['intersection_id', 'collision_count', 'personsinjured',
       'pedestriansinjured', 'cyclistinjured', 'motoristinjured', 'y', 'x',
       'geometry'],
      dtype='object')
[58]:
collision_count_Lag = ps.lag_spatial(W, data['collision_count'])
[67]:
collision_count_LagQ4 = ps.Quantiles(collision_count_Lag, k=4)
collision_count_LagQ10 = ps.Quantiles(collision_count_Lag, k=10)
[88]:
f, ax = plt.subplots(1, figsize=(25, 18))
voronoi_intersections_collapsed_analytical_gdf.assign(cl=collision_count_LagQ4.yb).plot(column='cl', categorical=True, \
        k=4, cmap='OrRd', linewidth=0.1, ax=ax, \
        edgecolor='black', legend=True)
ax.set_axis_off()
plt.title("Collisions Spatial Lag Quartiles")
plt.savefig('../manuscripts/NYC_collisions_quartiles_lagged.pdf', bbox_inches='tight')
_images/Impact-Evaluation-of-Leading-Pedestrian-Interval-Signals_396_0.png
[91]:
f, ax = plt.subplots(1, figsize=(25, 18))
voronoi_intersections_collapsed_analytical_gdf.assign(cl=collision_count_LagQ10.yb).plot(column='cl', categorical=True, \
        k=10, cmap='OrRd', linewidth=0.1, ax=ax, \
        edgecolor='black', legend=True)
ax.set_axis_off()
plt.title("Collisions Spatial Lag Deciles")
plt.savefig('../manuscripts/NYC_collisions_deciles_lagged.pdf')
_images/Impact-Evaluation-of-Leading-Pedestrian-Interval-Signals_397_0.png