The U.S. Census Bureau provides an astonishing array of fine-grained statistics on population and jobs along the peninsula rail corridor. When thinking about the future of peninsula rail service, and especially in deciding quantitatively how good a proposed timetable might be, or where stations should be placed, or how HSR should mesh with Caltrain in a 'blended' scenario, the basic consideration should be where people live and work.
Annual ridership counts provide one way of planning your timetable: simply add more service to the stops that get a lot of ridership. This becomes a self-fulfilling prophecy with ridership patterns becoming distorted by the timetable, as observed with the Baby Bullet Effect. Teasing apart the timetable-induced distortion from the underlying (and often untapped) ridership demand is impossible, so it is necessary to go back to the raw population and jobs data to build the full picture. That is where the census really delivers.
Where People Live
The 2010 census provides the most recent snapshot of the population distribution on the peninsula, on a block-by-block basis that includes over 45,000 locations in the three Caltrain counties. By tallying up how many people live within 1/4, 1/2, 1 and 2 miles of each Caltrain station location, you can build Figure 1. This chart reveals where the population is densely concentrated around stations (e.g. San Mateo), or sprawled out (e.g. Sunnyvale).
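The tally behind a chart like Figure 1 can be sketched in a few lines. This is only an illustration under stated assumptions: each census block is reduced to a single point (latitude, longitude, population), distances are straight-line great-circle distances, and the names `haversine_miles` and `population_within` are made up for this example rather than taken from the original analysis.

```python
import math

EARTH_RADIUS_MILES = 3958.8  # mean Earth radius


def haversine_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance in miles between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(a))


def population_within(station, blocks, radii=(0.25, 0.5, 1.0, 2.0)):
    """Cumulative population within each radius (miles) of one station.

    station: (lat, lon) in degrees
    blocks:  iterable of (lat, lon, population), one point per census block
             (assumption: each block is represented by a single point)
    """
    lat_s, lon_s = station
    totals = {r: 0 for r in radii}
    for lat_b, lon_b, pop in blocks:
        d = haversine_miles(lat_s, lon_s, lat_b, lon_b)
        for r in radii:
            if d <= r:
                totals[r] += pop
    return totals
```

Running `population_within` once per station, over all the blocks in the three counties, would give one group of bars per station. The same function works unchanged for jobs if the third field of each block holds a job count instead of a population.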
Observations on the population numbers:
- The new Oakdale station long proposed by San Francisco (with little support from Caltrain) could tap into more residential population than just about any other stop along the peninsula, or even 22nd Street.
- The population density doesn't suddenly drop off at the southern end of the Caltrain-owned right of way in San Jose, where service suddenly drops off. There are large concentrations of under-served population within a mile of the Tamien and Capitol stops, accounting for more than 3 times as many people as live within a mile of the San Jose Diridon station.
- A stop like Broadway (Burlingame) with zero weekday rail service has more people living near it than Millbrae, site of the all-important BART intermodal station. Other stations with poor Caltrain service (San Antonio, Cal Ave, San Bruno, Burlingame, Belmont, Santa Clara) have more people living nearby than stops with the best service, such as Palo Alto.
To assign to each station location a single weighting factor that quantifies that station's accessibility for nearby residents, regardless of distance, one can sum over all nearby residents, counting each person as one divided by the square of how far away they live. This inverse-square relationship is empirical, but captures the fact that people who live far away from a station are less likely to use it; its use in ridership modeling is not unprecedented. A 1/r law would fall off too slowly, with the same number of people using the station from 1/2 mile away as 2 miles away (assuming constant population density, since the number of residents at a given distance grows in proportion to that distance). A 1/r cubed law would fall off too quickly, with only 1/16th as many riders from 2 miles away as from 1/2 mile away. As it turns out, the precise value of the exponent, even if not exactly two, doesn't drive the relative weights very strongly. Only one small tweak has been applied to prevent people who live very close to a station from skewing the results: anyone living closer than 1/4 mile is considered 1/4 mile away. The resulting inverse-square population weights for each station location are shown in Figure 2.
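As a minimal sketch of the weighting just described, assuming the distances from a station to each block have already been computed: the function name `inverse_square_weight` and its parameters are hypothetical, and the exponent is left adjustable only to make it easy to try values other than two.

```python
def inverse_square_weight(counts_and_distances, min_distance=0.25, exponent=2.0):
    """Sum of people (or jobs) divided by distance raised to `exponent`.

    counts_and_distances: iterable of (count, distance_in_miles) pairs
    min_distance: anyone closer than this is treated as being this far away,
                  matching the 1/4 mile tweak described in the text
    """
    return sum(count / max(d, min_distance) ** exponent
               for count, d in counts_and_distances)


# 100 people at 0.1 mile are treated as if 0.25 mile away: 100 / 0.0625 = 1600
#  50 people at 0.5 mile:                                   50 / 0.25   =  200
# 200 people at 2.0 miles:                                 200 / 4      =   50
example = inverse_square_weight([(100, 0.1), (50, 0.5), (200, 2.0)])  # 1850.0
```

Note how the 200 people living two miles out contribute less to the weight than the 50 people living half a mile away.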
Where People Work
The Census Bureau publishes extensive statistics on local employment dynamics, providing block-by-block data on the number and distribution of jobs, pay levels, and industries. The latest data set as of this writing is from 2009 (based on geographical data from the 2000 census covering over 32,000 locations in the three Caltrain counties). The analysis presented here is based on raw data files, but the data can also be analyzed interactively using the Census Bureau's On The Map application. Figure 3 shows how many jobs are located within 1/4, 1/2, 1 and 2 miles of each Caltrain station location. Only the jobs worth more than $40k a year are shown, since lower-income jobs are less likely to require commuting (only about 15% of Caltrain riders earn less than $40k, and the average household income of a weekday peak Caltrain rider is over $100k).
Observations on the jobs numbers:
- Not so surprisingly, there is a concentration of jobs in the vicinity of the future Transbay Transit Center, adjacent to the financial district. What is more surprising is just how massive that concentration is: Transbay has more jobs within a half-mile radius (over 100,000) than all the other Caltrain stations combined, from 4th & King all the way down to Gilroy!
- Job sprawl shows up in Santa Clara and southern Palo Alto (and most of Silicon Valley, really) in the form of few jobs near stations but many jobs within a mile or two. Mountain View, despite its status as a major Baby Bullet stop, and home of Google, is not a particularly large job center.
Again, assigning to each station a weighting factor that quantifies that station's accessibility to nearby jobs, we apply the same inverse square relationship to obtain the job weights for each station location shown in Figure 4. Note that Transbay goes way off the chart.
The Ridership Potential Matrix
Since 86% of riders during the weekday peak are commuters, the distribution of population and jobs can be used to construct a relative weight for the ridership that could potentially be generated between any given origin and destination (O&D) pair. This is the ridership potential matrix. The eventual purpose of this matrix is to help derive a single figure of merit for timetables, on an apples-to-apples basis, for how much of the potential ridership is tapped based on the service metrics for each O&D pair. When considering any given timetable, this weighting scheme ensures that O&D pairs that have a lot of population and jobs at each end (such as 4th & King and Palo Alto) are given more importance compared to O&D pairs with lower population and fewer jobs (such as Atherton and Bayshore).
It is important to note that this ridership potential matrix is completely independent of how each O&D pair is connected by rail service; it holds true for any timetable. It is solely a product of census data and the geographic location of each station. A timetable must then be designed to unlock the maximum potential ridership.
The ridership potential matrix works like this: take for example station 1 and station 2, with respective population and job weights P1, P2, J1 and J2. The weight for morning peak trips from origin 1 to destination 2 is P1*J2 (for people living near station 1 and working near station 2). Conversely, the weight for morning peak trips from origin 2 to destination 1 is P2*J1 (for people living near station 2 and working near station 1). When you multiply all the population weights from Figure 2 by all the job weights from Figure 4, you get a basic ridership potential matrix. But there's a bit more to it than just people and jobs.
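The basic matrix described above is simply every population weight multiplied by every job weight. A minimal sketch, with made-up weights and a hypothetical function name:

```python
def basic_potential_matrix(pop_weights, job_weights):
    """Morning-peak weights: entry [i][j] is P_i * J_j, i.e. people living
    near origin station i and working near destination station j.

    The diagonal (i == j) is computed like any other entry here; nothing in
    this sketch treats same-station pairs specially.
    """
    return [[p * j for j in job_weights] for p in pop_weights]


# Two stations with illustrative (invented) weights
P = [3.0, 1.0]   # population weights P1, P2
J = [2.0, 5.0]   # job weights J1, J2
M = basic_potential_matrix(P, J)
# M[0][1] is P1*J2 = 15.0 (live near station 1, work near station 2)
# M[1][0] is P2*J1 =  2.0 (live near station 2, work near station 1)
```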
Regardless of where people live and work, there are upper and lower limits to how far they will typically commute by rail. Extremely short trips are less likely because of the overhead of access and egress to and from the station, at each end of the journey. Conversely, extremely long trips are less likely because of their sheer duration; regional commute patterns are not just a function of train service considered in isolation, but also of driving times. We will therefore assume that the distance distribution of commutes, generally speaking, is independent of the quality of train service--and that no foreseeable rail service pattern could significantly alter it. Good service might lead to greater market share for rail, but the underlying distance distribution will be assumed not to budge. This allows us to apply a (timetable-independent) distance distribution to the ridership potential matrix.
Caltrain ridership surveys show that the average trip length on the peninsula rail corridor during the weekday peak is about 25 miles. The distance weighting function will be modeled as a Rayleigh distribution with a value of 0 at 0 miles and a peak of 1 at 25 miles--for no particular statistical reason other than it ends up looking about right, as shown in Figure 5.
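One way to write down such a curve is to take the Rayleigh density, x/s^2 * exp(-x^2 / (2 s^2)), which peaks at x = s, and divide it by its peak value so that the maximum is exactly 1. Setting s to 25 miles then puts the peak where the text says it should be. The sketch below assumes that reading; the function name and the `peak_miles` parameter are illustrative.

```python
import math


def distance_weight(miles, peak_miles=25.0):
    """Rayleigh-shaped weight: 0 at zero miles, maximum of 1 at `peak_miles`.

    This is the Rayleigh density with scale parameter `peak_miles`, rescaled
    so that its peak value equals 1 (an assumption about how the curve in the
    text is normalized).
    """
    x = miles / peak_miles
    return x * math.exp(0.5 * (1.0 - x * x))


# Weight at a few trip lengths, in miles
curve = [(d, distance_weight(d)) for d in (0, 5, 10, 25, 40, 50)]
```

With this shape a 50-mile trip still carries a weight of about 0.45, while a 5-mile trip carries about 0.32, so very short trips are penalized more heavily than very long ones.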
Each element of the ridership potential matrix is now the product of three factors: the distance weight based on the distance between origin and destination; the population weight at the origin station; and the job weight at the destination station. This simple formulation yields the morning peak values shown in Figure 6 as a bubble graph (numerical values are available as a tab-delimited text file). The evening peak is described by the transpose of the matrix, i.e. origin and destination switch places. The distance-weighted ridership potential matrix is now ready for use in the quantitative analysis of past, present and future timetables, a topic that will be covered in upcoming posts revisiting the topic of service metrics.
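Putting the three factors together might look like the sketch below. Two things here are assumptions rather than details from the analysis: the distance between two stations is taken as the difference between their mileposts along the line, and the distance weight is a Rayleigh-shaped curve rescaled to peak at 1 at 25 miles. All names are hypothetical.

```python
import math


def distance_weight(miles, peak_miles=25.0):
    """Rayleigh-shaped weight: 0 at zero miles, maximum of 1 at `peak_miles`."""
    x = miles / peak_miles
    return x * math.exp(0.5 * (1.0 - x * x))


def morning_matrix(mileposts, pop_weights, job_weights):
    """Entry [i][j] = distance weight(i, j) * P_i * J_j (origin i, destination j).

    Assumption: distance between stations is the difference in milepost.
    """
    n = len(mileposts)
    return [[distance_weight(abs(mileposts[i] - mileposts[j]))
             * pop_weights[i] * job_weights[j]
             for j in range(n)]
            for i in range(n)]


def transpose(matrix):
    """Swap origins and destinations, giving the evening peak matrix."""
    return [list(row) for row in zip(*matrix)]


# Two invented stations 25 miles apart, so the distance weight between them is 1
morning = morning_matrix([0.0, 25.0], [3.0, 1.0], [2.0, 5.0])
evening = transpose(morning)
```

Because the distance weight is zero at zero miles, the diagonal of the matrix (trips from a station to itself) vanishes automatically.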
In the meantime, we can explore other interesting aspects of the ridership potential matrix. For example, summing the nth row together with the nth column of the matrix allows us to build a single weighting factor for the potential ridership at each stop including both the morning and evening peaks, i.e. a measure of the ridership distribution that could exist if it were tapped with excellent service, shown in Figure 7. These weights can then be compared to the actual Caltrain ridership realized in 2011, yielding the scatter plot in Figure 8. This comparison provides another more fundamental way (much better than historical ridership patterns) to visualize which groupings of Caltrain stops are under-served, and is amazingly accurate considering that it was constructed without ever looking at a timetable.
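The row-plus-column sum for each stop is a one-liner once the matrix exists. A small sketch with an invented 3-station matrix (the function name is hypothetical):

```python
def station_potential(matrix):
    """For each station n, sum the nth row (trips originating there in the
    morning) and the nth column (trips ending there in the morning, which
    become origins in the evening peak).

    A nonzero diagonal entry would be counted twice; with a distance weight
    that is zero at zero miles, the diagonal is zero anyway.
    """
    n = len(matrix)
    return [sum(matrix[k]) + sum(row[k] for row in matrix) for k in range(n)]


# Invented morning-peak matrix for three stations (zero diagonal)
M = [[0, 4, 1],
     [2, 0, 3],
     [5, 6, 0]]
weights = station_potential(M)  # [12, 15, 15]
```

Normalizing these weights so that they sum to the same total as the realized ridership would put the two quantities on a common scale for a scatter plot.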
- Access to Transbay would provide a step-change improvement in Caltrain service, with probable ridership gains of more than 25%. Terminating any weekday peak train at 4th & King, as is inexplicably planned by Caltrain, is a huge mistake. Agency turf battles with BART and the CHSRA regarding whether or how to pay for the downtown extension tunnel, and how to share platforms at Transbay, must be fought and won.
- Underlying ridership demand is not accurately reflected by realized ridership, which suffers from severe timetable distortion. Future service planning, and in particular the timetables assumed for the ongoing 'blended' operations analysis, must be based less on realized ridership and more on fresh census data--even if not using the simplified approach described here.
- For the same reason that every Caltrain should serve Transbay (the huge concentration of jobs in San Francisco), HSR service that does not provide a one-seat ride into Transbay is a non-starter.