All Aboard (Eventually)

by Willem Klumpenhouwer

Sunday, December 01, 2019

There's a lot that can be learned from looking at the data.

Transit systems collect, produce, and publish a huge amount of information on their operations. Real-time information, schedule updates, vehicle locations, and social media posts are part of the way in which transit operators try to provide a sense of certainty about their operations. Passenger railways are no exception.

VIA train waiting for a CP Train

A VIA Train waits at an interlocking for a CP train to pass. Operational challenges like this one cause unreliability across VIA's busiest corridor. [Larry Broadbent]

Many agencies fail to use the data they collect to really understand their system, figure out where things are going wrong, and find ways to fix those issues. Data is used primarily for customer service and short-term operational adjustments, typically in near-real-time.

There's a huge opportunity for public transportation agencies to leverage their data for what I like to call "strategic operations planning". When we look in detail at the fine-grained information that is collected, patterns start to emerge. We can identify reliability bottlenecks, operational challenges, and performance issues.

To do this properly, we need to look at the data in multiple ways. It's not enough to publish a single performance statistic like "87% of trains depart on time"1, which only leads to more questions: What is 'on time'? Is this at each station, or only the first departing station? Statistics like this lead to confusion and potential distrust in the reporting and analysis of the agency.

Instead, the data needs to tell a story. With growing attention on urban and regional transit, there is an opportunity for these agencies to push for resources to improve their service, in particular their reliability. In a complex environment of competing interests, political priorities, and space, agencies like VIA Rail are going to need the data to back them up.

The story that follows is about VIA Rail, about reliability, and about data. But it's also about inspiring all transit agencies to leverage the data they have to tell a story.

Let's get started.

Note: lots of the visuals you're about to see are interactive. Feel free to click/tap/hover over the graphics for additional information. While this article is best viewed on a non-mobile screen, a lot of the features still work on mobile devices. Happy exploring!

Being a passenger railway in North America is tough.

For decades, cities, regions, and countries in North America have prioritized the smooth and efficient movement of the private automobile. We have designed our transportation system to maximize the freedom available to a driver, often at enormous expense. Other forms of transportation have faltered, often unable to compete with this attention. Passenger rail in North America is no exception.

It's not all bad news: As cities and regions begin to wake up to the phenomenon of induced demand, people are turning to trains once again as a space-efficient, sustainable, and safe way to travel. Amtrak is set to make a profit, and VIA has reported steadily increasing ridership.

In Canada, VIA Rail faces a unique challenge. Canada's national railway (and indeed most passenger rail in Canada) runs on tracks primarily owned by someone else. Canada's two big freight railways, Canadian Pacific and Canadian National, often prioritize their trains and their business over their guests'. As a result, VIA trains struggle to stay on time.

A map of VIA Rail's Corridor corridor

VIA's "Corridor" corridor. [Wikipedia]

This is a story about that struggle.

Though VIA is a national railway, the only place where its service provides an arguably feasible alternative to the automobile is in Ontario and Québec, and so we will turn our focus to the oddly (or brilliantly?) named Corridor corridor, which serves Southwestern Ontario, the Greater Toronto Area, Ottawa, Montréal, and Québec City.

For a while I've wanted to better understand how bad VIA's struggle with reliability really is. And so I started gathering data. VIA provides scheduled and actual train arrival times to keep customers informed, and so I have collected these published times over the course of a year, from November 2018 to October 2019.

Here's the present situation.

As an introduction to the data we are working with, here's a real-time snapshot of how VIA is doing right now. Each square represents an active train somewhere in Canada, colour-coded by its delay: less than 15 minutes, 15 to 30 minutes, 30 to 60 minutes, and more than one hour. Grey trains are missing data.
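For readers who want to reproduce the colour-coding, here is a minimal sketch of the binning in Python. The function name, the bin labels, and the handling of the exact boundaries (for example, whether a delay of exactly 60 minutes counts as "30 to 60" or "more than one hour") are my assumptions for illustration; the sample delays are invented, not VIA data.

```python
from typing import Optional


def delay_category(delay_minutes: Optional[float]) -> str:
    """Map an arrival delay in minutes to one of the four delay bins.

    Boundary handling is an assumption: 15 and 30 fall into the later bin,
    and exactly 60 minutes is treated as "30 to 60".
    """
    if delay_minutes is None:
        return "missing"      # grey squares: no delay reported
    if delay_minutes < 15:
        return "under_15"     # includes early and on-time trains
    if delay_minutes < 30:
        return "15_to_30"
    if delay_minutes <= 60:
        return "30_to_60"
    return "over_60"


# Invented sample delays, in minutes (negative means early).
sample_delays = [-2, 0, 14, 15, 29.5, 30, 60, 61, None]
categories = [delay_category(d) for d in sample_delays]
```

Keeping missing data as its own category, rather than silently treating it as on time, matters later: it keeps gaps in the feed from flattering the reliability numbers.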

We need to discover patterns in the past.

Real-time information is great when you're keeping an eye on your train, but it doesn't help you when you're planning your journey, deciding whether to take the train at all, or even choosing where to live and how to commute. What we need is to peer into the past, and see if we can discover some trends about reliability. How dependable are the trains? Which route is the most predictable? How likely am I to be late at my destination if I take the train? As a potential passenger, the answer to these questions can affect whether we take the train at all. As a transit agency, the answers to these questions can help build a case for improvement and change.

Here are all of the train arrivals in the Corridor during the week of March 4, 2019, using the same colour-coding system as above.

It's easy for the brain to pick up on patterns in a grid like this, but it can be misleading: the cluster of black near the top is the same couple of very late trains arriving at subsequent stations. To get a better picture, squint your eyes and decide for yourself: is there a reasonable amount of green?

Let's summarize what each of these week-grids tells us over the course of the year.

Service is inconsistent throughout the year.

Categorizing each of those squares by their colour over the course of the whole year, we can see what percentage of trains arrived late on the Corridor. Early February 2019 and June through September of 2019 were particularly bad months for reliability, with over 40% of trains arriving late during some weeks. The spikes in the most severe delay categories tell us that the delays were caused either by slowdowns along the entire trip (extreme heat and winter storms, for example) or by major operational delays (breakdowns, waiting for oncoming freight trains to clear).
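The weekly roll-up can be sketched in a few lines of Python. This is an illustration rather than the exact code behind the chart: I'm assuming "late" means 15 minutes or more, grouping by ISO week, and skipping arrivals with no reported delay. The function name and the sample records are invented.

```python
from collections import defaultdict
from datetime import date

LATE_THRESHOLD_MIN = 15  # assumption: "late" starts at the second colour bin


def weekly_late_share(arrivals):
    """Return {(iso_year, iso_week): share of arrivals that were late}.

    `arrivals` is an iterable of (service_date, delay_minutes) pairs;
    records whose delay is None (missing data) are left out entirely.
    """
    totals = defaultdict(int)
    late = defaultdict(int)
    for service_date, delay in arrivals:
        if delay is None:
            continue
        iso = service_date.isocalendar()
        week = (iso[0], iso[1])  # (ISO year, ISO week number)
        totals[week] += 1
        if delay >= LATE_THRESHOLD_MIN:
            late[week] += 1
    return {week: late[week] / totals[week] for week in totals}


# Invented sample records spanning two weeks.
sample_arrivals = [
    (date(2019, 3, 4), 5),
    (date(2019, 3, 5), 20),
    (date(2019, 3, 6), 70),
    (date(2019, 3, 7), None),
    (date(2019, 3, 11), 0),
    (date(2019, 3, 12), 10),
]
shares = weekly_late_share(sample_arrivals)
```

Because each station arrival is its own record, one very late train contributes several late arrivals to its week, which is the same effect as the cluster of dark squares in the grid.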

Not all trains are equal.

We have been looking at the system as a whole, but what if these issues are not systemic across the whole Corridor? If we look at the monthly average delays for each train over the year, we can see that some trains stay relatively flat throughout the year, while others fluctuate wildly.
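One rough way to separate the "flat" trains from the volatile ones is to compute each train's monthly average delay and then look at the spread of those averages. The sketch below does this in Python; the function names, the train labels, and the numbers are placeholders I made up, and using the range (maximum minus minimum) as the measure of spread is just one simple choice.

```python
from collections import defaultdict
from statistics import mean


def monthly_mean_delay(records):
    """Return {train_id: {month: mean delay in minutes}}.

    `records` is an iterable of (train_id, month, delay_minutes),
    with month given as a "YYYY-MM" string.
    """
    grouped = defaultdict(lambda: defaultdict(list))
    for train_id, month, delay in records:
        grouped[train_id][month].append(delay)
    return {
        train_id: {month: mean(delays) for month, delays in months.items()}
        for train_id, months in grouped.items()
    }


def monthly_spread(monthly):
    """Range of each train's monthly means: small is flat, large is volatile."""
    return {
        train_id: max(means.values()) - min(means.values())
        for train_id, means in monthly.items()
    }


# Placeholder trains and invented delays.
sample_records = [
    ("train_A", "2019-01", 4), ("train_A", "2019-01", 6),
    ("train_A", "2019-02", 5), ("train_A", "2019-02", 7),
    ("train_B", "2019-01", 2), ("train_B", "2019-01", 4),
    ("train_B", "2019-02", 30), ("train_B", "2019-02", 40),
]
monthly = monthly_mean_delay(sample_records)
spread = monthly_spread(monthly)
```

In this toy data, train_A barely moves between months while train_B swings by more than half an hour, which is the kind of contrast the chart makes visible.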

Let's untangle the mess above. If we look at delays for each station, we can see some intuitive patterns. Terminus stations (Sarnia, Windsor, Toronto, Ottawa, Montréal) have much higher average arrival delays than departure delays. This suggests that delays to trains are typically happening en route, rather than being caused by late departures at the start of the route (which is why the "87% of trains depart on time" statistic could be misleading). Stations along the Toronto-Montréal corridor have the highest occurrences of late trains, with over one quarter of all trains arriving more than 15 minutes late!

This grouping of stations along the Toronto-Montréal and the Kitchener corridors points to a corridor-specific problem. If delays are happening between stations rather than at stations, we need to look at how much a train's delay changes moving from one station to the next. While these delays add up to the problems we're seeing above, if we can find where the biggest delays are happening, these can be the areas to focus on.
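The change in delay from one station to the next is simple to compute once each trip is an ordered list of stops. Here is a minimal Python sketch, assuming each stop carries a single delay value in minutes; the function name, the station labels "A", "B", and "C", and the delays are invented for illustration.

```python
from collections import defaultdict
from statistics import mean


def link_delay_changes(trips):
    """Return {(from_station, to_station): mean change in delay, in minutes}.

    `trips` is an iterable of trips, each a list of (station, delay_minutes)
    in the order the train served them. Links are directional, so
    ("A", "B") and ("B", "A") are tracked separately. A link is skipped
    when either end has no reported delay.
    """
    changes = defaultdict(list)
    for stops in trips:
        for (origin, d_origin), (dest, d_dest) in zip(stops, stops[1:]):
            if d_origin is None or d_dest is None:
                continue
            changes[(origin, dest)].append(d_dest - d_origin)
    return {link: mean(values) for link, values in changes.items()}


# Invented trips: two in one direction, one in the other.
sample_trips = [
    [("A", 5), ("B", 3), ("C", 12)],
    [("A", 2), ("B", 2), ("C", 10)],
    [("C", 0), ("B", 1), ("A", 1)],
]
link_means = link_delay_changes(sample_trips)
```

A negative value means trains tended to make up time on that link, and a positive value means they lost time. Keeping the two directions separate is what lets direction-specific problems show up at all.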

The weakest links are location and direction specific.

This corridor map shows directional links colour-coded by their average change in delay, with darker segments indicating larger average delay increases. Click/tap a station to learn more.

This map can tell detailed stories about the root causes of delays. Here are some examples:

Trains between Guelph and Kitchener tended to run early, while trains between Kitchener and Stratford lost 8-9 minutes, on average. This may be due to track improvements on the first segment due to GO expansion, while track between Kitchener and Stratford has continued to decay, leading to slowdowns.

Trains from London to St. Marys were often delayed significantly (8 minutes on average) while trains in the opposite direction tended to run ahead. This is due to VIA trains often having to wait for oncoming traffic (including other VIA trains). Limited passing options exacerbate the problem.

Eastbound trains on the Toronto-Montréal corridor experience more delay en route than westbound trains. The most problematic stations in the list above are the ones on this corridor, where trains delayed in both directions are arriving chronically late. It's no wonder that VIA is pushing for a dedicated corridor between these two cities.

Take the data further

We've seen what we can learn, quantify, and prove from analysing just a small amount of public data. Transit agencies are sitting on huge amounts of data, with more stories waiting to be told.

Thanks for reading.

References and Inspiration