Anything but Average

Back in July, I wrote about transport planning for places, rather than individual modes and ‘networks’. Last month, I wrote about transport planning to accommodate the needs of people, rather than the temptations of technology. Last week, I spoke about both at the South West England regional conference for transport planning. Planning, including transport planning, is by definition about looking towards the future: how we create places better than the ones we have now, places that improve the quality of life of the people in them.

Yet in preparing my presentation for last week, and in listening to some of the other presentations, I realised that transport modelling, forecasting, and thus planning have yet another millstone to cast off before they can ‘help shape a better world’, another challenge besides remembering that the best transport planning invisibly serves people and places. And that weight is the weight of averages.

As a methodology for representing individual behaviour, the average, the ‘usual’, falls woefully short. It ignores the steps people may take to be sustainable or exercise more unless they do so more than half the time being measured. It glosses over the people who do not have the same destinations to access on a daily basis. It downplays the regular but infrequent patterns of linked trips to visit family or participate in other activities that induce diversionary routes once a week or once a month. It gives no thought to how some people may react to increased risk, delay, or disruption due to severe weather, planned events, unplanned incidents, scheduled repair works, or even terrorist threats.
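To make the first of those failures concrete, here is a toy Python sketch (the people, days and modes are all invented, not survey data) of how classifying travellers by their ‘usual’ mode erases anything they do less than half the time:

```python
from collections import Counter

# Invented travel diaries: one commute mode per weekday
week = {
    "Alice": ["car", "car", "bike", "bike", "car"],   # cycles 40% of days
    "Bob":   ["bus", "bus", "bus", "car", "bus"],
    "Cara":  ["walk", "car", "car", "car", "car"],
}

# 'Usual mode' survey view: a single label per person
usual = {person: max(set(days), key=days.count) for person, days in week.items()}
print(usual)  # {'Alice': 'car', 'Bob': 'bus', 'Cara': 'car'} -- cycling vanishes

# Trip-level view: the actual share of each mode across all journeys
trips = Counter(mode for days in week.values() for mode in days)
total = sum(trips.values())
print({mode: round(n / total, 2) for mode, n in trips.items()})
# The bike trips reappear as ~13% of all journeys, a share the
# 'usual mode' classification reported as zero
```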

To plan for local contexts, the average assumptions about how people travel to, from, and within areas of particular land uses can easily miss the diversity of options, variety of economic drivers, and cultural preferences in different places. If most traffic and transport models, whether to assess the impacts of new developments or to inform investment decisions with a cost-benefit ratio, are based upon data collected on ‘average’ days for an average population and average land uses, it is no wonder that transport planners are still living in a ‘predict and provide’ paradigm. Nor is it surprising that those predictions often turn out to be wrong.

Way back in March, I wrote about Visions of the future of transport and society developed through scenario-planning techniques. I’ve read academic articles advocating scenario planning in order to address the uncertainties we face. But the key to scenario planning is not only to think about how people behave and how places might take shape, but also to consider a spectrum of possibilities: a spectrum that encompasses extremes, which allow for hybrid possibilities, but never averages.

This is where big data, new technologies, and ‘smart’ infrastructure can help. Algorithms might still regress data back to averages, but that data, those sensors, the digital trail we all leave in our wake like high-tech breadcrumbs, can also give us a much better understanding of extremes than we’ve ever had before. No longer dependent upon snapshots or cross-sections, planners can take a long view and find the patterns of flexibility that better represent the lives we all lead. Instead of predict and provide, let’s propose and future-proof. Because the future is unlikely to be any more ‘average’ than the present.
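A minimal sketch of that point, with synthetic numbers standing in for a year of continuously collected journey times:

```python
import random

random.seed(1)
# Synthetic journey times in minutes: mostly around 30, with
# occasional disrupted days around 70 (entirely made-up numbers)
times = [random.gauss(30, 3) for _ in range(330)]
times += [random.gauss(70, 15) for _ in range(35)]

mean = sum(times) / len(times)
p95 = sorted(times)[int(0.95 * len(times))]
print(f"mean: {mean:.1f} min, 95th percentile: {p95:.1f} min")
# The mean (~34 min) looks benign; the 95th percentile is roughly
# double that, and it is the extreme a traveller actually plans around
```

A snapshot survey on a ‘neutral’ day would have captured only the benign mean; the extremes only show up when the whole year of data is kept.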

Big Data Busting

You’ve heard the term before. Maybe from me. Big Data. It’s a catchphrase of our time. But have you ever asked what it means? Google’s search engine defines it as a noun referring to “extremely large data sets that may be analysed computationally to reveal patterns, trends, and associations, especially relating to human behaviour and interactions.” And Google should know, right? It’s their day job.

But I had another definition proposed to me at a workshop on the topic last week. Roger Downing of the Hartree Centre in Warrington, part of the Science and Technology Facilities Council, described big data as datasets that were “uncomfortably large to deal with on a single machine”. That’s one of the reasons why the Hartree Centre exists and why I and a group of other PhD students were being treated to a workshop on big data there – they have plenty of machines to deal with the datasets comfortably. But over the course of the week, I began to wonder whether big data was not just about the size of the datasets, but also about data-analysis decisions that may be uncomfortable for individual humans to deal with.

Certainly the volume of data and the speed with which it’s generated are staggering for humans or machines. Even though it has to be translated at some point into a plethora of ones and zeros, the datasets themselves are made up of numbers, measurements, text, images, audio and visual recordings, shapefiles and mixed formats, collected and stored in a wide range of structures and handled through a variety of programming languages. The datasets come from sources around the world and are produced by scientists, machines, transactions, interactions and ordinary people. Therefore, it is no surprise that some of the data is meticulous, some is missing and some is mendacious.

And all of it only has value if it can be analysed in a way that helps people in society make better decisions more efficiently and achieve their goals, whether those goals be health and well-being or the bottom line. So if the analysis is uncomfortable for a single machine, then big data analytics requires tools that enable ‘cluster computing’: processing in parallel, with allowances for ‘fault tolerance’ through duplication of original and subsequent datasets so that information is not corrupted or lost during processing. The performance of such tools is designed and judged on speed, efficiency, ease of use, compatibility and unity; that is, the more data types a tool can handle, the more programming languages it can interact with, and the greater the variety of output it can produce within a unified framework, the better.
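To ground the ‘processing in parallel’ idea, here is a minimal, single-machine stand-in written with Python’s multiprocessing module. Real cluster tools (Spark and Hadoop are the usual examples) do the same split-map-combine dance across many machines and get their fault tolerance by replicating each chunk of data; this sketch shows only the parallel part:

```python
from multiprocessing import Pool

def count_bike_trips(chunk):
    # 'Map' step: each worker summarises only its own slice of the data
    return sum(1 for record in chunk if record["mode"] == "bike")

if __name__ == "__main__":
    # Toy dataset standing in for something uncomfortably large
    records = [{"mode": "bike" if i % 7 == 0 else "car"}
               for i in range(1_000_000)]
    chunks = [records[i::4] for i in range(4)]  # split across 4 workers

    with Pool(processes=4) as pool:
        partials = pool.map(count_bike_trips, chunks)  # run in parallel

    # 'Reduce' step: combine the partial results into one answer
    print(sum(partials))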

Of course, tools must be used by well-trained data scientists, because the analysis of data, and its value, depends upon asking the right questions. Those right questions are most likely to be asked if data scientists not only have statistical and computer-science skills, but also expertise in their area of study and a combination of creativity and curiosity that seeks new paths for research. Which, again, is why we were there: some believe it is easier to train those working and researching within specialist areas in statistics and computer programming than to train statisticians and computer scientists in all the disciplines they may encounter in their work with big data. Furthermore, the patterns and predictions coming out of big data analysis are not helpful if the data has not first been cleaned and checked for accuracy, consistency and completeness, a much easier task with specialist knowledge at your disposal. Machines cannot learn if they are not trained on structured and then validated data. And people cannot trust the output without control over the input and an understanding of how data was transformed into information.
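A hedged sketch of what such a cleaning pass can look like, using pandas; the sensor feed, column names and the 0–130 km/h plausibility range are all hypothetical:

```python
import pandas as pd

# Hypothetical traffic-sensor readings (names and values invented)
df = pd.DataFrame({
    "sensor_id": ["A1", "A1", "B2", "B2", "B2"],
    "speed_kph": [48.0, 48.0, -5.0, 62.0, None],
})

# Quantify the three checks before touching anything
report = {
    "completeness": df["speed_kph"].notna().mean(),          # share not missing
    "consistency":  1 - df.duplicated().mean(),              # share not duplicated
    "accuracy":     df["speed_kph"].between(0, 130).mean(),  # plausible range
}
print(report)

# Then clean: drop duplicates, drop gaps, drop impossible readings
clean = (df.drop_duplicates()
           .dropna(subset=["speed_kph"])
           .query("0 <= speed_kph <= 130"))
print(clean)
```

Notice that deciding what counts as ‘impossible’ (can this road really never see 130 km/h?) is exactly where the specialist knowledge comes in.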

And so there is the issue of comfort again. The technology now exists to economically store big datasets and try to merge them even if there is no certainty that added value will result. Machines analyse big data and offer potential audiences instead of actual ones, probabilities and levels of confidence instead of facts. Machine learning and cognitive computing utilise big data to create machine assistants, enhancing and accelerating human expertise, rather than machine workers, undertaking mundane tasks for humans. Thus we enter a brave new world. But I still can’t say I’m entirely comfortable.

Data x3

Data, Data, Data. Does it have the same cachet as Location, Location, Location? Big data. Open data. Standardised data. Personal data. If it doesn’t yet, it soon will.

I attended the Transport Practitioners’ Meeting 2016 last week and the programme was full of presentations and workshops available to any delegate with an interest in data, including me. With multiple, parallel sessions, I could have filled my personal programme twice over.

Transport planning has always been rich in the production and use of data. The difference now is that data is producing itself, the ability for the transport sector to mine data collected for other purposes is growing, and the datasets themselves are multiplying. Transport planners are challenged to keep up, and to keep to their professional aims of using the data for the good of society.

The scale of this challenge is recognised by Research Councils and is probably why I won a studentship to undertake a PhD project that must use big data to assess environmental risk and resilience. Thus my particular interest in finding all the inspiration I could at the conference.

Talk after talk, including my own presentation on bike share, mentioned the trends in data that will guide transport planning delivery in the future, but more specific sources of data were also discussed.

Some were not so much new as newly accessible. In the UK, every vehicle must be registered to an owner and, once it is three years old, must pass an annual roadworthiness test, the MOT. A group of academics has been analysing this data for the government, in part to determine what benefits its use might bring, and our workshop discussion at the conference agreed that the possibilities are extensive.
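The anonymised MOT results are published as open data, and one obvious derived measure, annual mileage per vehicle, can be sketched in a few lines of pandas. The column names below are illustrative stand-ins, not the real schema:

```python
import pandas as pd

# Illustrative stand-in for anonymised MOT records; the real open
# data uses its own schema, so these column names are hypothetical
tests = pd.DataFrame({
    "vehicle_id": [1, 1, 1, 2, 2],
    "test_date": pd.to_datetime(["2013-05-01", "2014-05-03", "2015-04-28",
                                 "2014-06-10", "2015-06-12"]),
    "odometer_miles": [30500, 38200, 45100, 12000, 15400],
})

# Annualised mileage from the change between successive tests
tests = tests.sort_values(["vehicle_id", "test_date"])
grouped = tests.groupby("vehicle_id")
tests["annual_miles"] = (grouped["odometer_miles"].diff()
                         / grouped["test_date"].diff().dt.days * 365.25)
print(tests.dropna(subset=["annual_miles"]))
```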

Crowd-sourced data, on the other hand, could be called new: collected on social media platforms or by apps like Waze. Local people using local transport networks share views on the quality of operation, report potholes, raise issues, and follow operators’ social media accounts to get their personalised transport news. This data is the technological successor to anecdote: still qualitatively rich, but now quantitatively significant. It helps operators and highways authorities respond to customers more quickly. Can it also help transport professionals plan strategically for the future?

Another new source of data is records of ‘mobile phone events’ – data collected by mobile phone network operators that can be used to determine movement, speed, duration of stay, and so on. There are still substantial flaws in translating this data for transport purposes, particularly the significant under-counting of short trips and the extent of verification required. However, accuracy will improve in time, and apps designed to track travel, such as Strava and Moves, can already be analysed with much greater confidence.
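A toy sketch of how such events might be translated: given timestamped locations (here, hours of the day and invented coordinates for the serving cell), each consecutive pair yields either a speed or a duration of stay. The 100 m ‘stationary’ threshold is an arbitrary assumption:

```python
from math import radians, sin, cos, asin, sqrt

def haversine_km(lat1, lon1, lat2, lon2):
    # Great-circle distance between two lat/lon points, in km
    dlat, dlon = radians(lat2 - lat1), radians(lon2 - lon1)
    a = (sin(dlat / 2) ** 2
         + cos(radians(lat1)) * cos(radians(lat2)) * sin(dlon / 2) ** 2)
    return 2 * 6371 * asin(sqrt(a))

# Invented 'phone events': (hour of day, latitude, longitude)
events = [(8.0, 53.48, -2.24), (8.5, 53.41, -2.16),
          (17.0, 53.41, -2.16), (17.5, 53.48, -2.24)]

for (t1, la1, lo1), (t2, la2, lo2) in zip(events, events[1:]):
    dist = haversine_km(la1, lo1, la2, lo2)
    hours = t2 - t1
    if dist < 0.1:  # under 100 m: treat as a stay, not a move
        print(f"{t1:.1f}-{t2:.1f}h: stayed put for {hours:.1f}h")
    else:
        print(f"{t1:.1f}-{t2:.1f}h: moved {dist:.1f} km at ~{dist / hours:.0f} km/h")
```

Even this crude version shows why short trips get under-counted: any journey that starts and ends near the same cell looks like a stay.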

Even more reliable are the records now produced automatically by ticketing systems on public transport, sensors in roads and traffic signals, cameras, lasers, GPS trackers and more. Transport is not only at the forefront of machine learning, but the ‘Internet of Things’ is becoming embedded in its infrastructure. Will such data eventually replace traditional traffic counts and surveys, informing reliable models, accurate forecasts and appropriate interventions?

It is certainly possible that we will be able to plan for populations with population-scale data sources collected continuously over time, rather than using sample surveys of a few hundred people or snapshots of a short period of ‘neutral’ time.

However…

Despite attempts to stop it (Brexit was impossible to ignore in any field; its shadow hung over the conference proceedings), globalisation is here to stay, and data operates in an international ecosystem. Thus, it cannot be used to its full potential without international regulations on sharing and privacy, and standards on format and availability.

Transport planners also need the passion and the skills to make data work for us. Substantial analysis of new datasets is required to identify their utility and possibilities, requiring not only statistical and modelling training, but also instruction in analytical methods. People with such skills are in limited supply, as are the time and money for both training and the analysis of new datasets.

Therefore, perhaps the most important lesson is this: sharing best practice and successful data projects at conferences like TPM2016 matters more than ever.