The emergence of large stores of transactional data generated by increasing use of digital devices presents a huge opportunity for policymakers to improve their knowledge of the local environment and thus make more informed and better decisions. A research frontier is hence emerging which involves exploring the type of measures that can be drawn from data stores such as mobile phone logs, Internet searches and contributions to social media platforms and the extent to which these measures are accurate reflections of the wider population. This paper contributes to this research frontier, by exploring the extent to which local commuting patterns can be estimated from data drawn from Twitter. It makes three contributions in particular. First, it shows that heuristics applied to geolocated Twitter data offer a good proxy for local commuting patterns; one which outperforms the current best method for estimating these patterns (the radiation model). This finding is of particular significance because we make use of relatively coarse geolocation data (at the city level) and use simple heuristics based on frequency counts. Second, it investigates sources of error in the proxy measure, showing that the model performs better on short trips with higher volumes of commuters; it also looks at demographic biases but finds that, surprisingly, measurements are not significantly affected by the fact that the demographic makeup of Twitter users differs significantly from the population as a whole. Finally, it looks at potential ways of going beyond simple frequency heuristics by incorporating temporal information into models.
Estimating local commuting patterns from geolocated Twitter data
Graham McNeill, Jonathan Bright, and Scott A Hale