Using Open Data to Derive Local Amenity Demand Patterns for Walkability Simulations and Amenity Utilization Analysis

Understanding human behavior and preferences is important for urban planning and the design of walkable neighborhoods. However, it remains challenging to study human activity patterns because significant effort is required to collect the relevant data, convert unstructured data into useful knowledge, and account for different urban contexts. In the context of the ongoing discussion about urban walkability and amenities, and given the need for a feasible approach to analyzing human activities, this paper proposes a simple and effective metric, the amenity demand pattern, which describes the spatiotemporal distribution of human activities according to the activeness of urban amenities. Such a metric has the potential to support urban studies of people, mobility, and the built environment, as well as related design thinking. Further, a case study illustrates how the data and the new metric can be used in walkability simulations and amenity utilization analysis, thus informing the design decision-making process.


INTRODUCTION
Understanding human behavior is crucial for studying mobility-related issues in urban planning and design (Huang and Wong 2016). It not only supports decision-making in road network design, real estate development, and architecture programming but also relates to interdisciplinary topics such as healthy and sustainable cities (Nieuwenhuijsen and Khreis 2019). However, it remains challenging to study human activities because significant efforts are required to collect the relevant data, convert unstructured data into useful knowledge, and take into consideration different urban contexts.
Considerable effort has been devoted to supporting urban studies of human activities with data-based methods. Some of these methods describe aggregated commuting patterns and characterize the built environment. For example, Grauwin et al. (2015) clustered mobile network signatures and used them to characterize dynamic human behavior at the city and local scales. Dashdorj and Sobolevsky (2016) classified geographical areas by their amenity distribution, such as working-oriented or shopping-oriented areas, and showed that these categorical activity profiles are similar to the human activity timeline categories estimated from cell phone data records. Frias-Martinez et al. (2012) identified urban land uses and landmarks from geolocated Twitter data. In addition, more specific metrics have been proposed to support planning decisions from different angles. Yoshimura et al. (2016) uncovered customers' spatial distributions by analyzing their consecutive transactions, which can help improve the spatial arrangement of retail shops. Wang et al. (2015) defined linked activity spaces among social groups by analyzing a call dataset, which may relate to the demand for amenities for social life. Although these studies offer new perspectives on the urban environment, none of them directly shows how to implement the proposed methods in the planning or design process, or how the data can influence decision-making in practice.
Meanwhile, current planning paradigms promote high-density, walkable neighborhoods as one solution to many challenges. Studies have shown that walkable neighborhoods can significantly reduce traffic-related pollution and lower the risk of chronic diseases (Frank et al. 2006; Lee and Buchner 2008), promote economic growth and prosperity (Claris and Scopelliti 2016), and foster an increase in social capital and political participation (Leyden 2003). Walkable amenities, as one of the most important factors in a walkable city, have also become a widely discussed topic in urban planning and design. They have been associated with socioeconomic growth (Clark et al. 2002; Zandiatashbar and Hamidi 2018), the urban environment (Carlino and Saiz 2019), and quality of life (Mulligan and Carruthers 2011; Carr et al. 2011).
In the context of the discussion about urban walkability and amenities, and given the need for a feasible approach to analyzing human activities, this paper proposes a simple and effective metric, the amenity demand pattern, which describes the spatiotemporal distribution of human activities according to the activeness of urban amenities. More specifically, we regard the utility of amenities as the indicator of human activity (e.g. going to a restaurant, going to a cinema) because (1) urban amenities refer to essentially all the services, functions, and infrastructure that residents use in their daily lives (Allen 2015), which makes them a viable measure of what people usually do; and (2) an amenity-based metric explicitly specifies people's destinations, so it can be interpreted and employed in mobility-related research with little additional effort.
This paper also introduces a case study on how to use this metric in the design process. Using Urbano (Dogan et al., 2018), an independently researched and developed plugin toolbox for Rhinoceros3D & Grasshopper, the case study shows that this metric is valuable input data that can help promote mobility-aware urban design.

METHODOLOGY
We use Lower Manhattan as our test ground because it is one of the most walkable regions worldwide, with a high density of urban amenities. For other cities, although the details of data processing may vary with the available open data, the foundational information needed to derive the result stays the same. Firstly, the PLUTO dataset from the NYC Department of City Planning is an extensive land use dataset at the tax lot level. It contains fields for the building floor area of different land uses, including commercial, residential, office, retail, garage, storage, factory, and other uses. Secondly, the Overpass API of OpenStreetMap (OSM) provides geographical data on urban fabric and building footprints. It also contains information about urban amenities as nodes, with metadata giving their coordinates, name, amenity type, opening hours, rating, etc. Although OSM data is of reasonably high quality and comparatively easy to obtain, its amenity data is not as comprehensive as Google Places data (Figure 1). Furthermore, Google provides a critical dataset called popular times, which presents the utility of a specific place as a 24/7 matrix in which the popularity of any given hour is expressed relative to the typical peak popularity (100) of the business for the week (Table 3). To infer occupant density in the amenities, we refer to ASHRAE Standard 62.1-2013: Ventilation for Acceptable Indoor Air Quality, which provides standardized occupancy assumptions per architectural use case (ASHRAE 2015).

Data Processing
In general, the necessary information to derive the result consists of amenity type, location, capacity, and temporal utility of each amenity in the studied region. The process is introduced here by steps (Figure 2).
Step 1 is to map out the primary data, including PLUTO data and all the amenity information from OSM and Google Maps. The latter is straightforward, while the former needs additional processing. In PLUTO data, the location of a lot is given by the coordinates of a point near its center, expressed in the New York-Long Island State Plane coordinate system. Also, the commercial area refers to all the area allocated for commercial use in a lot, including but not limited to office and retail use. Hence, we use the commercial area minus the office area as a proxy for the total indoor amenity area in each tax lot and then map the lots onto the lat-long coordinate system.
Step 2 is to derive amenity areas in preparation for calculating amenity capacities, since no official open data directly contains such information. We divide the total amenity area of each tax lot equally between all amenities in that lot. This step introduces inaccuracy because amenities may differ in size. Nonetheless, it is a workable approximation that gives reasonable results when we apply it to a larger region (e.g. the whole Lower Manhattan area) and use descriptive statistics such as the interquartile range (IQR) to filter the outliers (Figure 3). The median value, taken as the normal amenity area of each type, can be assigned to all the outliers as well as to amenities with an unknown area in the same region (Table 2).
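Steps 1 and 2 can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names, the dictionary layouts, and the conventional 1.5 x IQR fence are all hypothetical choices, since the text only says that the IQR is used to filter outliers. The second function is meant to be called once per amenity type.

```python
from collections import defaultdict
from statistics import median, quantiles


def split_lot_area(lots, amenities):
    """Share each lot's amenity area equally among the amenities located in it.

    lots:      {lot_id: (commercial_area, office_area)} in square metres
    amenities: {amenity_id: lot_id}
    Returns {amenity_id: area or None}; None marks an unknown area.
    """
    per_lot = defaultdict(list)
    for amenity_id, lot_id in amenities.items():
        per_lot[lot_id].append(amenity_id)

    areas = {}
    for lot_id, ids in per_lot.items():
        if lot_id not in lots:
            areas.update({a: None for a in ids})  # no lot record: area unknown
            continue
        commercial, office = lots[lot_id]
        amenity_area = max(commercial - office, 0.0)  # commercial minus office
        areas.update({a: amenity_area / len(ids) for a in ids})
    return areas


def fill_outliers_with_median(areas, fence=1.5):
    """Replace IQR outliers and unknown areas of ONE amenity type by the median.

    The fence multiplier (1.5) is an assumed convention, not taken from the paper.
    """
    known = [a for a in areas.values() if a is not None]
    if len(known) < 2:
        return dict(areas)  # too few values to estimate quartiles
    q1, _, q3 = quantiles(known, n=4)
    low, high = q1 - fence * (q3 - q1), q3 + fence * (q3 - q1)
    typical = median(known)
    return {
        key: area if area is not None and low <= area <= high else typical
        for key, area in areas.items()
    }
```

Using the median rather than the mean as the replacement value keeps a single very large lot from shifting the "normal" area of a type.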
Step 3 is to calculate each amenity's capacity based on its type and area. The amenity capacity is obtained by multiplying the amenity area by the standardized occupant density given by ASHRAE (2015). With regard to the choice of study types (Table 1), we first choose the most common amenity types in the city so that there is sufficient data to retrieve from map platforms. Among them, we then select important types of walkable amenities, as reported by the sample data sheet from Walk Score, so that they are relevant to the discussion.
Step 4 is to extract temporal utility data from Google popular times. For each amenity with available popular times data, we calculate the average activeness of each hour across the week, which results in an array of 24 hourly values expressing the specific utility pattern of this place on a typical day. Since urban mobility patterns differ between weekdays and weekends, we treat them separately (Table 3).
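As a rough sketch of Steps 3 and 4, the two calculations could look as follows. The function names and the input layout (seven rows, Monday first, each with 24 hourly values) are assumptions made for illustration, and the density passed in must be looked up from the ASHRAE table for the relevant use case; the figure in the usage note below is a placeholder, not a value taken from the paper.

```python
def capacity(area_sqm, occupants_per_100_sqm):
    """Amenity capacity = floor area x standardized occupant density.

    The density is assumed to be expressed in occupants per 100 square metres.
    """
    return area_sqm * occupants_per_100_sqm / 100.0


def daily_patterns(popular_times):
    """Average a weekly popularity matrix into weekday and weekend day profiles.

    popular_times: 7 rows (assumed Monday first) x 24 hourly values in 0..100.
    Returns (weekday_profile, weekend_profile), each a list of 24 floats.
    """
    if len(popular_times) != 7 or any(len(row) != 24 for row in popular_times):
        raise ValueError("expected a 7 x 24 matrix")

    def hourly_mean(rows):
        # zip(*rows) groups the values of the same hour across the given days
        return [sum(hour_values) / len(rows) for hour_values in zip(*rows)]

    return hourly_mean(popular_times[:5]), hourly_mean(popular_times[5:])
```

For example, `capacity(200.0, 70.0)` treats a 200 sqm venue with an assumed density of 70 occupants per 100 sqm as holding 140 people at full capacity.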
The final step is to combine all the previous results to convert the temporal utility pattern into the user pattern. Under the assumption that the peak activeness of an amenity (100) corresponds to use at full capacity, the temporal amenity utility can be directly transformed into the temporal user population by multiplying it by the amenity's capacity. Ideally, the user population should be calculated for each amenity in the studied region and then summed for each type to derive the result. However, to limit the computational burden, we instead use a sampling method (300 samples for each type), derive average statistics for each type, and finally multiply them by the total number of amenities of that type. The eventual datasheet for amenity demand patterns is presented in Table 4.
Table 3 A sample sheet of the comprehensive information collected for one amenity. An array of 24-hour activeness in percent (%) is calculated independently for weekdays and weekends, expressing its temporal utility pattern.
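The conversion from temporal utility to user population, together with the sampling shortcut, can be sketched as below. The helper names and the tuple layout are hypothetical; the only assumptions carried over from the text are that an activeness of 100 means full capacity and that 300 samples are drawn per type.

```python
import random
from statistics import mean


def user_pattern(amenity_capacity, activeness):
    """Hourly users of one amenity, assuming activeness 100 equals full capacity."""
    return [amenity_capacity * a / 100.0 for a in activeness]


def draw_samples(amenities, k=300, seed=0):
    """Draw up to k amenities of one type without replacement (seed is illustrative)."""
    return random.Random(seed).sample(amenities, min(k, len(amenities)))


def type_demand(samples, total_count):
    """Estimate the hourly demand of a whole amenity type from a sample.

    samples:     list of (capacity, hourly activeness list) for sampled amenities
    total_count: number of amenities of this type in the studied region
    """
    patterns = [user_pattern(cap, act) for cap, act in samples]
    average = [mean(hour_values) for hour_values in zip(*patterns)]
    return [total_count * value for value in average]
```

Scaling the sample average by the total count gives the same result as summing over every amenity only if the sample is representative of the type, which is the implicit assumption behind the shortcut.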

Result
Using Table 4, the amenity demand patterns can be represented by a 24-hour timeline graph (Figure 4). The total height of the graph refers to the overall amount of activity in the studied region, which peaks during the day and dips in the early morning. Each layer in the graph represents the demand pattern of a particular amenity type, some of which are not consistent with the overall pattern because of their particular function. For example, banks and post offices tend to stop service early in the afternoon, while bars and pubs become the dominant activity around midnight.
Intuitively, human activities show diverse patterns in distinct spatial and temporal contexts. Separating the weekday and weekend data and comparing the graphs shows that, during the weekend, both daytime and nightlife are more active, and the peak hour of overall activity shifts several hours earlier. Regarding the spatial difference, we apply our derivation method to Lower Manhattan and Downtown Paris and compare the results using the same type of graph. It turns out that the urban pulse depicted by amenity demand patterns also differs significantly between the two cities.

CASE STUDY
To examine the value of amenity demand data in design practice, this case study aims to use the derived data to inform an urban mobility model and discuss how it can influence the design decisions in urban design and program allocation.

Case Setup and Tool Description
Urbano (Urbano.io) is a tool that allows designers to interactively modify the model according to urban mobility simulation results (Figure 5). It streamlines the workflow of importing data, setting up the model, simulating iteratively, and modifying designs in order to promote a mobility-aware urban design process. Moreover, it provides the ability to customize people's different preferences and demands for urban amenities. Taking advantage of these features of Urbano, we embed into the initial model all the relevant data for driving the mobility simulation, including amenity locations, building-level population, amenity capacities, and weekday amenity demand patterns of Lower Manhattan, along with the automatically downloaded geographical metadata of urban networks. These inputs construct a contextual model with abundant metadata that gives designers a deeper understanding of the urban mobility environment.
We select a rectangular area in Lower Manhattan as the study region to carry out the general simulation based on the 24-hour matrix of the generated amenity demand patterns. Additionally, we choose a block in the center of this region to test how a new project would affect the current condition in terms of pedestrians and amenities (Figure 6).

Result and Analysis
There are 24 simulation results, each of which represents the mobility state of one hour of a weekday (Figure 7). Though Urbano is able to calculate various mobility metrics such as walk score and street utility, we focus only on the simulation of amenity usage, which is called "amenity hits", referring to the pedestrian hits that an amenity receives in a round of simulation. This metric is mainly driven by the shortest-path routing system of Urbano, with the building-level population data deciding the population coming from all origins and the amenity demand patterns deciding the population going to all destinations. It can measure local demand for different amenities and help decide the appropriate program for the context.
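To make the idea of amenity hits concrete, the following is a toy sketch of one simulation round. It does not use or reproduce Urbano's API: the graph layout, the function names, and the rule that each building sends its demand to the nearest amenity of each type are all assumptions made purely for illustration.

```python
import heapq


def shortest_distances(graph, source):
    """Dijkstra over a street graph {node: [(neighbor, length_m), ...]}."""
    dist = {source: 0.0}
    queue = [(0.0, source)]
    while queue:
        d, node = heapq.heappop(queue)
        if d > dist.get(node, float("inf")):
            continue  # stale queue entry
        for neighbor, length in graph.get(node, []):
            candidate = d + length
            if candidate < dist.get(neighbor, float("inf")):
                dist[neighbor] = candidate
                heapq.heappush(queue, (candidate, neighbor))
    return dist


def amenity_hits(graph, buildings, amenities, demand_share):
    """Count pedestrian hits per amenity for one hour (toy assignment rule).

    buildings:    {node: population}            origins
    amenities:    {amenity_id: (node, type)}    destinations
    demand_share: {type: fraction of each building's population seeking it}
    """
    hits = {amenity_id: 0.0 for amenity_id in amenities}
    for origin, population in buildings.items():
        dist = shortest_distances(graph, origin)
        for amenity_type, share in demand_share.items():
            reachable = [
                (dist[node], amenity_id)
                for amenity_id, (node, kind) in amenities.items()
                if kind == amenity_type and node in dist
            ]
            if reachable:
                _, nearest = min(reachable)  # assumed rule: closest one wins
                hits[nearest] += population * share
    return hits
```

Running this once per hour with that hour's demand shares would yield a 24-step series of hits per amenity, analogous in shape to the 24 simulation results described above.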
Furthermore, for the selected block that serves as the test site for the new project, we run the same simulation with different program scenarios to explore how their performance varies, including the project's own interest in receiving pedestrian hits and its regional impact on nearby businesses (Figure 8). Accordingly, we can make decisions in the program selection with custom criteria.
To illustrate more explicitly how the derived data works, we plot each amenity's activity pattern from the simulation result and compare it with the real data from the original Google popular times (Figure 9). Apart from the different heights, which represent the overall activeness of the business and are driven entirely by the routing simulation and population data, the shapes of the lines, which vary between amenity types, are indeed determined by the amenity demand data derived in the previous section. The comparison confirms that the data is representative overall and can be regarded as a workable proxy for measuring human activities in terms of amenity utility, especially for types with less variance in their business pattern, such as restaurants and banks. Nevertheless, this derived data depicts a general pattern over the whole Lower Manhattan region. If one would like to get a more

Scale of the study area
Caution is needed when choosing the scale of the study area, both for deriving the amenity demand patterns and for running the simulation as in the case study. In the first case, there is a risk of misinterpreting the existing situation as implicit needs, which is especially problematic if we only consider a relatively small area. For example (Figure 10), if we only compute the data from the case region, the amenity demand patterns remain similar. But after reducing the computing area to a quarter of the region, the result starts to change drastically, and some amenity types even show zero demand, which is questionable. In the latter case, since the selected region has a cut-off border and the simulation result near the border is less reliable, it is also inappropriate to use a study area that is too small.

Limitation in data sources
The accessibility and contents of open data differ considerably among countries and cities, and there are certainly other possible ways of processing the data. For example, a dataset like PLUTO may not be available outside of NYC, while other cities may provide data that is not available for Manhattan (e.g. the city of Melbourne provides an open dataset of the seating capacity of cafes and restaurants). Besides, some data has not been updated for years and may deviate from the current situation. Moreover, user-generated data (e.g. Google popular times data) mostly relies on GPS on mobile phones, which raises problems of coverage and bias.
While more efforts are being made to enrich urban databases in the era of big data and smart cities, the amenity demand pattern metric and the derivation method proposed in this paper should also evolve as more advanced data becomes available. However, its intellectual merit will remain: it develops a new way of quantifying and measuring people's behavior in the city and links it closely to urban function and the built environment, making it more useful for urban planning and design.

Conclusion
"Cities have the capability of providing something for everybody, only because, and only when, they are created by everybody" (Jacobs, 1992). Urban planners and designers have long pursued a human-centric urban environment. Accelerating technical development and increased access to urban data are providing us with a profound opportunity to better understand human preferences and behavior in cities. This paper proposes a data-driven metric of amenity demand patterns to help measure human activities in the local context. Constructed from local data and user-generated content, such a metric has the potential to support urban studies of people, mobility, and the built environment, as well as related design thinking.
Also, under the bigger technological context of extraordinary advancements in areas like big data and smart city, and emerging paradigms about empowering planners and architects with computational techniques, this paper shows how amenity demand data can be utilized and leveraged in active mobility simulation workflows and how these simulations can be used in the design decision-making process to achieve "better" design outcomes.
Figure 1 OSM has fewer amenity data points compared to Google Maps. The presented region is the same as the following case study region. (a) Cafe in OSM; (b) Cafe in Google Maps; (c) Pharmacy in OSM; (d) Pharmacy in Google Maps.
Figure 2 Diagram of data processing steps
Figure 3 Use the interquartile range (IQR) to analyze the distribution of the derived amenity area (sqm). (a) Hardware stores; (b) Pharmacies; (c) Restaurants; (d) Cafes.
Figure 4 Timeline graph for amenity demand patterns reveals spatiotemporal differences in urban pulse. (a) Weekday pattern in Lower Manhattan; (b) Weekday pattern in Paris; (c) Weekend pattern in Lower Manhattan; (d) Weekend pattern in Paris.
Figure 5 Framework of Urbano.io
Figure 7 Simulation results in a 24-hour matrix for the selected rectangle region in Lower Manhattan
Figure 9 Comparison of amenity demand patterns between simulation and real data. For each type, we randomly select 5 samples. (a) Real data; (b) Simulation result.
Figure 10 Amenity demand patterns of 6 sample types derived from different scales of the study area express significant variance.