Effectively Using Data

March 7, 2014
How transit agencies can effectively use data to improve efficiencies.

Toronto, ON

Dennis Fletcher, Associate

Steer Davies Gleave

There are a number of issues related to making the most effective use of the new wealth of data from automatic vehicle location (AVL) and automatic passenger counter (APC) systems. In my experience, transit planners, especially those who have gained their experience in small and medium-sized systems, have often managed for years with very little data. They have learned to plan and refine their systems with, at best, an annual on/off count and manually collected point checks of schedule adherence. Faced with an onslaught of data they have never used before, they find that one of the big challenges is understanding what can be done with the data and learning to ask new questions for which answers have never been available.

The creative planner with a good head for data can come up with countless good questions that can be answered by AVL or APC data, or a combination of the two. These I think are the best two examples — one for schedulers with AVL data and one for planners with APC data.

For AVL data the key is run-time calibration. Previously, schedulers had insufficient data for robust calculations. Now, the data permit detailed assessment of running times based on many thousands of records, allowing schedulers to clearly identify clusters of common trip times as well as outliers, and to see whether there are really patterns in those outliers. A specific question these data can answer is: for which stops do we show specific dwell time in the public schedules? And for stops where dwell time is notable but is to be folded into the trip time, do we include it in the link before the node or in the link after it? Including dwell time before the node means informing the passenger of arrival time, rather than departure time, thereby improving the chances a passenger catches your bus, and improving the perception of schedule adherence without moving the bus!
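
As a rough illustration of run-time calibration, the sketch below assumes the AVL records have already been reduced to observed running times, in minutes, for one route segment and one time period. The sample values, the 1.5 × IQR outlier rule and the 85th-percentile scheduling target are illustrative assumptions, not a prescribed method.

```python
from statistics import median, quantiles

def calibrate_run_time(observed_minutes, target_percentile=85):
    """Suggest a scheduled run time and flag outliers for one segment and period."""
    # 99 percentile cut points; index 24 is the 25th percentile, 74 the 75th.
    cuts = quantiles(observed_minutes, n=100, method="inclusive")
    q1, q3 = cuts[24], cuts[74]
    iqr = q3 - q1
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr  # assumed outlier fences

    typical = [t for t in observed_minutes if low <= t <= high]
    outliers = [t for t in observed_minutes if t < low or t > high]

    # Schedule to a chosen percentile of the typical trips (assumed policy).
    scheduled = quantiles(typical, n=100, method="inclusive")[target_percentile - 1]
    return {"median": median(typical), "scheduled": scheduled, "outliers": outliers}

# Hypothetical observations for one segment in the morning peak.
trips = [11.5, 12.0, 12.2, 12.4, 12.5, 12.7, 13.0, 13.1, 13.4, 14.0, 21.5]
result = calibrate_run_time(trips)
print(result)
```

With thousands of records rather than eleven, the flagged outliers can be grouped by day, operator or weather to check whether they form a pattern or are isolated events.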

A key area for APC data is developing an understanding of linked boardings and alightings. Using a variety of algorithms, a vector of on-off data can be manipulated into an on-off correspondence matrix showing a probability of links between boarding points and alighting points for passengers. The volume of data in these calculations permits reasonable probability calculations that can be useful in identifying the impacts of, for example, planned construction delay or detours, branch structures, express or limited stops — the list goes on.
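
One commonly used way to turn on-off counts into a correspondence matrix is iterative proportional fitting (IPF); the article does not say which algorithms are meant, so this is only one possibility. The sketch assumes a single direction of one route, balanced boarding and alighting totals, and a seed matrix that treats every downstream stop as equally likely. All counts are hypothetical.

```python
def correspondence_matrix(boardings, alightings, iterations=50):
    """Estimate stop-to-stop flows from on-off counts by iterative proportional fitting."""
    n = len(boardings)
    # Seed: a passenger boarding at stop i can only alight at a later stop j.
    m = [[1.0 if j > i else 0.0 for j in range(n)] for i in range(n)]
    for _ in range(iterations):
        for i in range(n):  # scale each row to match boardings at stop i
            row_sum = sum(m[i])
            if row_sum > 0:
                factor = boardings[i] / row_sum
                m[i] = [v * factor for v in m[i]]
        for j in range(n):  # scale each column to match alightings at stop j
            col_sum = sum(m[i][j] for i in range(n))
            if col_sum > 0:
                factor = alightings[j] / col_sum
                for i in range(n):
                    m[i][j] *= factor
    return m

# Hypothetical APC counts for a four-stop route (totals must balance).
boardings = [40, 15, 5, 0]
alightings = [0, 10, 20, 30]
flows = correspondence_matrix(boardings, alightings)

# Probability that a passenger boarding at stop i alights at stop j.
probabilities = [
    [v / b if b else 0.0 for v in row] for row, b in zip(flows, boardings)
]
```

The result is an estimate consistent with the counts, not an observation of actual trips; a better seed (for example, from a small survey) changes the answer.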

In both these examples, we have always been able to ask these questions, but we often did not, either because we did not think of them or because the limited answers we got from limited data were not particularly useful, so we stopped asking. Now, the new wealth of data expands the depth of our calculations, smoothing over the variations of limited data and ensuring robust, meaningful calculations.

Run time calibration and correspondence matrices — two of the most effective uses we can make of today’s data.

Ann Arbor, Mich.

Don Kline, Integrated Marketing Coordinator

The Ann Arbor Area Transportation Authority (TheRide)

In the last 10 years, ridership on Ann Arbor Area Transportation Authority fixed-route service has grown by about 2.2 million rides, an increase of more than 50 percent. Along with accommodating record ridership, adjustments need to be made to fit evolving traffic patterns and changing schedule needs throughout the community. Throughout the years, TheRide has used a combination of vehicle data, surveys, and peer analysis to help inform decisions on how to best serve current and prospective riders. This is seen more as adjusting than restructuring: developing a reliable service that more people can use in more places, more conveniently.

Residents in the greater Ann Arbor and Ypsilanti areas, as in many places, make housing choices based on transit access. While schedules and bus stop locations do require periodic adjustments, complete restructuring is rare, as major changes can often make living choices more difficult for those who rely on the service to reach jobs, schools and vital destinations. When major changes are needed, TheRide conducts public comment periods where residents can share suggestions or concerns. If that input does not reveal any large, universal issues with the proposed changes, TheRide notifies the community well in advance of implementation. Service is routinely monitored, and as needed, incremental adjustments are made to align service capabilities with changing needs.

TheRide also regularly measures how it performs on a broad range of metrics, gauging current performance and identifying opportunities for improvement through objective peer comparisons with similar mid-sized public transportation agencies. In the latest National Transit Database report, TheRide led 20 peers in a variety of categories:

  • TheRide’s cost per passenger is 16.8 percent lower than the peer median
  • TheRide carries 66.7 percent more passengers per mile than the peer median
  • TheRide operates in a service area 5.2 percent larger than the peer median
  • TheRide reaches a population 17.3 percent smaller than the peer median
  • TheRide’s service area density is 6.3 percent higher than the peer median
  • While TheRide’s cost per service hour is 17.8 percent more than the peer median, it carries 49.6 percent more passengers per hour than the peer median
  • TheRide carries nearly two million more passengers per year than the peer median
  • TheRide provides 20.5 percent more service hours and 14.4 percent more service miles than the peer median

Boston, Mass.

Santosh Mishra, Senior Transportation Planner

TranSystems Corp.

Transit intelligent transportation systems (ITS) components generate a large volume of data, which typically are collected and archived in the individual databases of the systems that generate them. The extent of data collected generally depends on agency operational characteristics, such as route configurations and the length of service hours, and on the ITS configuration, such as data refresh rates for vehicle location and for operational events like route and schedule deviations.

One of the key applications of archived ITS data, specifically from computer-aided dispatch/automatic vehicle location (CAD/AVL), automatic passenger counters (APCs) and electronic fareboxes, is in the area of service analysis and restructuring. Traditionally, agencies have relied upon costly manual data collection for such studies, but the scope of the analysis often is limited by the small sample size. The availability of archived ITS datasets at the desired spatial and temporal granularity allows comprehensive analysis for specific time periods and operational scenarios. Agencies can implement in-house or third-party tools to utilize these datasets through analysis of:

  • Archived AVL and APC data to determine the variability in running times by route and route segments, reliability of scheduled layovers and cycle times, reliability of scheduled headways, and productivity of route segments
  • Operational anomalies using archived CAD/AVL data to determine the root causes such as driver behavior, intersection delays and long dwell times at stops, and actions such as developing driver training tools, relocating stops and implementing off-board fare payment or stop consolidation
  • Ridership and revenue data available from APCs and fareboxes and their correlation with externalities such as weather
  • Passenger demand and vehicle capacity, including crowding and adjustment of services such as using high-capacity/low-floor buses on specific routes
  • Transfers using CAD and farebox data to determine potential improvements in schedules to enhance connection protection at transfer points
  • Impact of operational changes, such as modified headways, on passengers, for example on passenger wait times at stops
  • Improvements in regional and multi-modal coordination through, for example, the identification of unproductive route segments that may be better served by other regional partners
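
Two of the analyses listed above, headway reliability and passenger wait time, can be sketched from archived AVL arrival times at a single stop. The sketch assumes passengers arrive at random rather than timing their arrival to the schedule, in which case the expected wait is (mean headway / 2) × (1 + CV²), where CV is the coefficient of variation of headways. The function name and the arrival times are hypothetical.

```python
from statistics import mean, pstdev

def headway_stats(arrival_minutes):
    """Return (mean headway, headway CV, expected wait) for one stop, in minutes."""
    times = sorted(arrival_minutes)
    headways = [later - earlier for earlier, later in zip(times, times[1:])]
    avg = mean(headways)
    cv = pstdev(headways) / avg  # 0 means perfectly even service
    # Expected wait for randomly arriving passengers (stated assumption).
    expected_wait = avg / 2 * (1 + cv ** 2)
    return avg, cv, expected_wait

# Hypothetical arrivals, minutes after the start of the period.
even = headway_stats([0, 10, 20, 30, 40])     # buses evenly spaced
bunched = headway_stats([0, 2, 20, 22, 40])   # same bus count, but bunched
print(even, bunched)
```

Both cases run four headways averaging 10 minutes, yet the bunched service gives a noticeably longer expected wait, which is why headway reliability matters as much as scheduled frequency.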

Further, AVL/APC data can be combined with other datasets such as video surveillance, weather, accident/incident, and traffic based on time and location to allow additional “what-if” analysis. However, ITS data quality is of the utmost importance. Quality issues can be caused by user negligence or systemic issues requiring post-processing of the data before it becomes suitable for analysis.  

The use of ITS data for planning has gained popularity in recent years as agencies take advantage of ever-evolving data management and data analysis techniques. Further, this positive trend is encouraging more innovative “mash-ups” of ITS data with social media and other datasets.

Information-rich ITS data, with potential annual cost savings on the order of thousands of dollars, offers a compelling alternative to conventional data collection. Sometimes unique agency system environments require specific methodologies to harness the power of the ITS data. However, effective use of this data not only improves restructuring analysis but also maximizes the return on investment in ITS technologies.

Atlanta, Ga. 

Cy Smith, Founder and CEO


When it comes to vehicle travel data, we live in a time of riches: federal, state and local transportation planners have access to more data than ever before.

Planners can use that data to develop better travel strategies for several issues:

  • Inefficiencies in current transit systems
  • Service issues or complaints
  • Planned development or facilities
  • Changing demographics and travel patterns
  • Shifts in employment or tourism
  • Changes caused by natural disasters
  • Planning for special events

Whether working within a large, multi-modal system or a single-mode municipality, there are three critical considerations:

  • Types of data available
  • Existing and/or available analysis tools
  • Identifying the data you need and understanding how to work with it

Newer technology, such as cellular data aggregators, can provide a trove of information, including:

  • Origin-destination
  • Arrivals and departures
  • Trip matrices
  • Activity density within a zone or other area
  • Length of stays at a location or in a zone

That data creates new possibilities for planners, environmentalists, engineers and others.

For example, most planners use software to build models of an area. These models take existing data from surveys or the census and generate projections based on what-if scenarios. Historically, modelers have worked with synthetic data, forecasting travel based on household or employment data. With more comprehensive, near-real-time data available, software models can now generate answers based on current, validated information.

In Dubuque, Iowa, IBM is using cellular data to create the first Smarter City. By aggregating and analyzing data from 15,000 mobile phones, IBM designed new, efficient bus routes linking previously under-served areas of the city. The new transit routes optimize performance indicators — such as average journey time, headways and wait time — which will help transit agencies meet demand and save operating expenses. According to IBM, the project redefines public transit planning and can serve as a template for any U.S. city because it allows for rapid and easy replicability. The effort was recognized with a National Association of Development Organizations’ 2012 NADO Innovation Award.

If IBM’s Smarter City shows what can be done within an existing system, Sacramento’s experience shows how cellular data can help plan for change.

The Sacramento Kings basketball team wanted to replace its suburban location with a new, downtown arena. Planners captured trip origin and destination locations of people attending several games at the current arena, as well as travel patterns and system capacity for the proposed location. Planners then built a regional travel demand model to calculate average travel distance for both locations. The data also helped determine likelihood of an attendee taking light rail, bus or walking/biking versus driving.
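
A much-simplified version of the average-distance comparison can be sketched as an attendee-weighted mean distance from origin zones to each candidate site. This is not the regional travel demand model described above: it uses straight-line distances on a flat grid, and every zone, count and coordinate below is hypothetical.

```python
from math import dist

def average_trip_distance(origins, venue):
    """Attendee-weighted mean straight-line distance from origin zones to a venue."""
    total_people = sum(count for _, count in origins)
    total_distance = sum(dist(xy, venue) * count for xy, count in origins)
    return total_distance / total_people

# Hypothetical origin zones: ((x_km, y_km), attendees observed starting there).
origins = [((0.0, 0.0), 100), ((3.0, 4.0), 300), ((6.0, 8.0), 100)]
suburban = (6.0, 8.0)  # hypothetical current arena location
downtown = (3.0, 4.0)  # hypothetical proposed arena location

print(average_trip_distance(origins, suburban))
print(average_trip_distance(origins, downtown))
```

The point of using observed trip origins rather than home addresses is that the weights reflect where people actually start their trips on game day.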

Previously, the Kings would have used zip code data from season ticket holders, which would have shown where fans lived, but not necessarily where they traveled from for games. Dubuque would have done a cost-benefit analysis and concluded that extending mass transit to under-served areas wasn't economically feasible.

New data opens up new possibilities, enabling planners to ask what-if questions and receive more accurate answers than ever before.

Pittsburgh, Penn.

Edward McDonald, Principal, Signaling

Stantec

The amount of data available from every aspect of our lives grows exponentially every year. This is also true for the rail transit industry. Rail transit data is collected for rail stations, vehicles and track facilities; ridership is measured by automated fare collection systems; and passengers receive next-train time notifications for their ride home.

And yet, all of that only scratches the surface of the data that is available and how it could alter future passenger service. As more trains are equipped with Wi-Fi, the amount of personal data freely available to operators grows. Smartphones with GPS and other Wi-Fi-equipped devices provide a way for Wi-Fi-ready agencies to gather and evaluate data on an individual basis, rather than for just ridership in general.

This type of marketing is new and not fully developed; it is not yet clear how all of this passenger data could be used while also maintaining individual privacy. For example, it would be interesting to know the regular transit-use patterns of riders and their preferences, which could present some data-driven ways to plan for the future. Smartphones are equipped not only with Wi-Fi and GPS capabilities, but also with the capability to identify you (or to track your individual patterns). Knowing where people are going beyond a particular transit station — the market, a restaurant, a play, or even transferring to another mode of transportation — could be tracked and stored for future analysis. 

So just what can be done with all of that data? A transit authority can determine travel patterns and identify the impacts of new and existing businesses or the flow of passengers and the frequency with which passengers use various transit modes and routes. This analysis could then be used to help intelligently plan an extension to a line, or to revise the schedule of connecting modes of transportation to move passengers more quickly. The data could also be used to encourage retailers and businesses to advertise on certain trains or bus routes. 

This same data collection infrastructure can be used in reverse to distribute data to an agency’s patrons. Patrons could possess the tools to perform instant trip planning; they could also be advised that their favorite store on the route is closed, that the bus they were going to connect with is 20 minutes late, or that the bus or train they’re riding is going to be late.

Today’s intelligent systems are a product of the information age and our desire to know more and do more. Can agencies gather and use this data effectively? Will Wi-Fi users knowingly risk losing their anonymity? Maybe, depending on what they get in return. If the transit authority can make a passenger’s experience better by knowing more about that passenger, perhaps so. The return on identity investment will determine the success of this technology’s impact on mass transit.

Wade Rosado, Director of Analytics

Cubic Transportation Systems

Large volumes of vehicle and passenger data are only the starting point. As “big data” tools are applied to transit, planners can now combine existing data with even larger sets of information from other sources. A new generation of analytics and visualization tools will enable them to get insights into what the massive amounts of information are telling them. The upshot is they will actually be able to predict what is likely to happen under different scenarios, making them far more effective at planning and operating their transportation networks than they are today.

Traditional techniques for analyzing vehicle data have focused on what happened in the past, and in some cases what is happening now. Analyzing inputs and outputs — X people boarded at point A and Y people disembarked at point B — from static reports leaves transit planners guessing about how people actually travel on their entire journey.

With better tools and more dynamic views of data, planners will be able to link trips and consider whole journeys. Their planning will be able to incorporate the relationship between origin and destination and across all modes of travel, not just the transit system.

Armed with broader and deeper insights, planners can more effectively balance demand against capacity when restructuring services. For example, knowing the origins and destinations of passengers’ journeys lets planners determine not just how many people will be impacted by a schedule change, but also what alternative routing options exist to determine the real impact on those affected. That insight lets planners deeply assess the real customer travel implications of the changes they make.

Using predictive analytics, planners will also be able to model policy decisions, such as changing the pricing scheme of the transport network, including transit and other city resources such as bridge tolls or parking, to help optimize the allocation of resources and improve throughput.

Data visualization is equally important to achieving these goals as new analytics tools and massive cloud data stores and number-crunching computers produce increasingly rich data sets. Through the application of innovative state-of-the-art visualization techniques, analysts and planners will be better able to derive actionable insights from these massive data sets, model different scenarios and then more effectively communicate the conclusions to stakeholders.

Big data has already begun transforming other sectors like retail and logistics, but it is early days for its use in the transit industry. Nonetheless, the potential gains from it are truly significant.

The transit business is characterized by high fixed costs and challenging logistics. Operators and planning authorities need tools to identify cost drivers and the interrelationship between policies, pricing, demand and their services. At the same time, they have to continually reassess their network resource optimization.

The new tools coming with the movement to big data will put the large volume of information available to better use, to ensure that the best possible travel experience can be delivered at a sustainable cost that is aligned with financial targets. These insights will help shape operational and strategic decision-making as well as capital planning, and help more clearly depict consumer behavior and choices, and generally aid transport operators in achieving an optimal allocation of resources.