When I tell friends, family, other mums at the school gate that I’m doing a PhD, they all ask me what my research is about. I’ve been working on my answer:
I’m looking at ‘big data’ to see what it can tell us about the risks to how, and whether, people travel for work during severe weather events. That’s the one-sentence version.
Sometimes it’s better to frame it as a question: What do commuters do when a big storm disrupts their usual journey to work? Is telecommuting a preferred option? And what does that mean for how we plan for the more frequent extreme weather likely in the future?
That last bit is the aim, the purpose, the endgame of my research, and why I believe a research council has agreed to fund it. I’m looking for evidence that might persuade governments, businesses, and communities to change the way they plan for and invest in resilience to severe weather and its impacts on infrastructure and property. And I’m looking in the new world of ‘big data’ because that evidence needs to be as statistically significant and scalable as possible, not just anecdotal.
Well, I’m still chasing the really ‘big’ data, but I have recently acquired some data on the transport impacts of Storm Doris on Thursday, 23 February 2017 in the Reading urban area. Fallen trees and billboards blocking roads. Trains and buses delayed, diverted and cancelled. My data comes from local news reports, Twitter, and passenger numbers from Reading Buses on the day and on a more ‘average’ Thursday for comparison.
A few quick calculations and the results were suggestive. Passenger numbers were down during Storm Doris. Routes affected by diversions and delays due to fallen trees or other debris saw lower ridership than the ‘average’ day, but then so did other routes without noted storm-related problems. Did people stay home? Travel virtually, or cancel their activity entirely? Were there mappable patterns?
I noticed that some routes gained passengers. Why were more Vodafone employees on their dedicated services? The numbers couldn’t tell me. Why were more people on the long-distance route to Wokingham and Bracknell and on the Park & Ride service in that direction? A likely answer is that, as the trains were even more affected than the buses, some people may have decided to switch. In which case, thinking of my endgame, perhaps Reading Buses should build that likelihood into their emergency planning for that route and run more buses. But was there enough evidence of actual cause and effect, or of a probability of recurrent behaviour, to justify such an operational response?
I’ve been thinking about how much more evidence I might tease out from public data sources, or from a little more data from the bus operator. Are there patterns in the individual bus trips where the loss or gain of passengers was particularly noteworthy, or could be matched to service disruptions? Is it worth looking at the types of tickets, or the stops along the routes, to get an idea of the demographics of who did or didn’t take the bus? Did anyone tweet their intentions to switch modes, or to stay at home?
Yet with every dive deeper into the data, the falling probability of demonstrating statistical significance echoes ever louder. The total passenger count was under 90,000 on the average Thursday, and it fell by over 4% on the day of the storm. The numbers on individual routes, ticket types and times of day quickly descended into the hundreds or tens, even on the popular routes. My recently refreshed, but untested and uncertain, statistical skills are already struggling with how to make a more than anecdotal comparison between one average Thursday and one disrupted Thursday during one storm in one urban area. How do I show that the most basic null hypothesis – that the storm had no impact on passenger numbers – can be rejected, never mind look at any route in more depth to propose emergency service tweaks to the operator?
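To show why this worries me, here is a minimal sketch in Python of the naive version of that test. The figures are illustrative only (roughly 90,000 passengers on the average Thursday, and an assumed 4.4% drop, consistent with the ‘over 4%’ above); the function name is mine, not anything from a real analysis pipeline.

```python
import math

def poisson_two_sample_z(n_storm, n_baseline):
    """Conditional (binomial) test for two Poisson counts.

    Under the null hypothesis of equal daily rates, and conditional on
    the combined total N = n_storm + n_baseline, the storm-day count is
    Binomial(N, 0.5); a normal approximation gives the z-score below.
    """
    total = n_storm + n_baseline
    z = (n_storm - total / 2) / math.sqrt(total / 4)
    # two-sided p-value from the standard normal tail
    p = math.erfc(abs(z) / math.sqrt(2))
    return z, p

# Illustrative figures only, echoing the numbers quoted in the text.
baseline = 90_000
storm = round(baseline * (1 - 0.044))  # assumed 4.4% drop

z, p = poisson_two_sample_z(storm, baseline)
print(f"z = {z:.1f}, p = {p:.1e}")
# The naive test looks overwhelmingly significant, but it treats every
# passenger as an independent event. Daily ridership is overdispersed:
# ordinary Thursdays already vary by a few per cent, so one storm day
# versus one average day cannot separate the storm effect from normal
# day-to-day variation.
```

The tiny p-value is the trap: with tens of thousands of counts, almost any difference between two single days looks ‘significant’ under this model, which is exactly why a one-day comparison stays anecdotal without many more ordinary Thursdays to estimate the normal variation.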
I have to face it. It will always be an anecdote. But it could still be an anecdote with an endgame.