Data science appears to be developing fast within government.
I say this after our Government Digital Service (GDS) Show and Tell event, where data scientists from different departments and agencies showcased a number of innovative projects, all using freely available tools.
There were interesting uses of social media. The Food Standards Agency (FSA) spoke about its use of Twitter to gain early warnings of potential Norovirus outbreaks. Signs of virus outbreaks can now be identified up to 4 weeks earlier than when the FSA relied on lab data, giving it more time to act and prevent the virus from spreading.
The FSA said it used the social network Yammer as a crowdsourcing tool amongst its staff to identify keywords relating to the Norovirus. FSA staff identified over 70 keywords, and then whittled this list down according to how well each word's popularity correlated with previous outbreaks of the Norovirus (the use of keywords changes with each new outbreak of the virus). Factor analysis was then used to group the remaining keywords into sets.
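To give a flavour of the whittling-down step, here is a minimal sketch that keeps only the keywords whose weekly counts correlate with past outbreak counts. It is not the FSA's code: the weekly figures are invented, the function names are my own, and both the use of Pearson correlation and the 0.5 cut-off are assumptions, as the talk did not specify either.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant series carries no signal
    return cov / sqrt(var_x * var_y)

def shortlist_keywords(keyword_counts, outbreak_counts, min_corr=0.5):
    """Keep keywords whose weekly counts track past outbreak counts."""
    scores = {kw: pearson(counts, outbreak_counts)
              for kw, counts in keyword_counts.items()}
    return {kw: r for kw, r in scores.items() if r >= min_corr}

# Invented weekly figures, for illustration only.
outbreaks = [2, 3, 8, 15, 9, 4, 2, 1]
keyword_counts = {
    "vomiting": [10, 14, 30, 52, 33, 15, 9, 7],
    "sick bug": [5, 6, 14, 25, 16, 8, 4, 3],
    "football": [40, 12, 35, 10, 38, 11, 36, 12],
}

shortlisted = shortlist_keywords(keyword_counts, outbreaks)
print(sorted(shortlisted))
```

In this toy data the two illness-related keywords rise and fall with the outbreak counts and survive the filter, while the unrelated keyword is dropped.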
A logistic regression model was used to statistically assess whether each set of keywords had the capability to predict Norovirus outbreaks, and if so, how far ahead in the future they could predict.
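The idea of testing how far ahead a keyword set can predict might look something like the sketch below. It fits a one-variable logistic regression for several lead times (lags) and compares how well each fits. Everything here is an assumption for illustration: the weekly scores and outbreak flags are invented, the model is fitted by plain gradient descent rather than a statistics package, and comparing lags by average log-likelihood is a simple stand-in for whatever statistical tests the FSA actually ran.

```python
from math import exp, log

def sigmoid(z):
    """Logistic function, written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + exp(-z))
    ez = exp(z)
    return ez / (1.0 + ez)

def fit_logistic(xs, ys, lr=0.5, steps=5000):
    """Fit p(outbreak) = sigmoid(b0 + b1 * x) by batch gradient descent."""
    b0 = b1 = 0.0
    n = len(xs)
    for _ in range(steps):
        errors = [sigmoid(b0 + b1 * x) - y for x, y in zip(xs, ys)]
        b0 -= lr * sum(errors) / n
        b1 -= lr * sum(e * x for e, x in zip(errors, xs)) / n
    return b0, b1

def mean_log_likelihood(xs, ys, b0, b1):
    """Average log-likelihood per week; closer to zero means a better fit."""
    total = 0.0
    for x, y in zip(xs, ys):
        p = min(max(sigmoid(b0 + b1 * x), 1e-12), 1 - 1e-12)
        total += log(p) if y == 1 else log(1 - p)
    return total / len(xs)

def lagged_pairs(scores, flags, lag):
    """Pair the keyword score in week t with the outbreak flag in week t + lag."""
    return scores[:len(scores) - lag], flags[lag:]

# Invented data: a keyword-set score per week (scaled 0 to 1) and
# whether an outbreak was recorded that week (1) or not (0).
scores = [0.1, 0.2, 0.9, 0.8, 0.1, 0.2, 0.1, 0.9, 0.8, 0.2, 0.1, 0.1]
flags = [0, 0, 0, 0, 1, 1, 0, 0, 0, 1, 1, 0]

results = {}
for lag in range(4):
    xs, ys = lagged_pairs(scores, flags, lag)
    b0, b1 = fit_logistic(xs, ys)
    results[lag] = (b0, b1, mean_log_likelihood(xs, ys, b0, b1))

best_lag = max(results, key=lambda lag: results[lag][2])
print("Best lead time in weeks:", best_lag)
```

The toy data was built so that outbreaks follow a spike in the score by two weeks, so the two-week lag fits best. Real data would be far noisier, and a proper analysis would also check whether each coefficient is statistically significant.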
Then the team used a receiver operating characteristic (ROC) curve, which is a common method for assessing the performance of a model, to help decide how sensitive to make the model. You don’t want your early warning system to keep raising false alarms, but you don’t want it to miss an impending outbreak either. The ROC curve helped the team optimise this trade-off.
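For anyone who hasn't met a ROC curve before, the sketch below shows the basic mechanics: for each possible alarm threshold, count how many real outbreaks the model catches (the true positive rate) against how many false alarms it raises (the false positive rate). The predicted probabilities and outcomes are invented, and choosing the threshold that maximises the gap between the two rates (known as Youden's J) is just one common rule. I'm assuming it here for illustration; the FSA may well have weighed false alarms and missed outbreaks differently.

```python
def roc_points(scores, labels):
    """Return (threshold, false_positive_rate, true_positive_rate) tuples,
    one per distinct score, raising an alarm when score >= threshold."""
    positives = sum(labels)
    negatives = len(labels) - positives
    points = []
    for threshold in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 0)
        points.append((threshold, fp / negatives, tp / positives))
    return points

def best_threshold(points):
    """Pick the threshold maximising Youden's J = TPR - FPR."""
    return max(points, key=lambda p: p[2] - p[1])[0]

# Invented model outputs and what actually happened (1 = outbreak).
probs = [0.95, 0.85, 0.70, 0.60, 0.55, 0.40, 0.30, 0.20, 0.10, 0.05]
labels = [1, 1, 0, 1, 1, 0, 0, 1, 0, 0]

points = roc_points(probs, labels)
threshold = best_threshold(points)
print("Chosen alarm threshold:", threshold)
```

Lowering the threshold catches more outbreaks but raises more false alarms; plotting the false positive rate against the true positive rate for every threshold gives the ROC curve itself.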
Now the FSA is looking at other potential use cases for this type of analysis.
Meanwhile, the Foreign and Commonwealth Office (FCO) is experimenting with some innovative data science methods to understand the networks of British ambassadors. By visualising the Twitter followers of ambassadors, you can see how their networks split into different groups. As detailed in a previous blog post, this practice could be extended to see how tweets pass through ambassador networks, which would help assess the effectiveness of FCO messaging.
GDS also demoed a number of its own projects, which are focused on simplifying the process of scraping data off the web and getting it ready for analysis.
A GDS colleague of mine, John Byrne, discussed how he’s been using Python and XPath to scrape data from the web. I spoke about my use of Kimono Labs, a point-and-click tool that makes it very easy to turn a website into a structured data feed that you can access via an application programming interface (API).
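To show roughly what scraping with Python and XPath involves, here is a small sketch. It is not John's code: the page fragment, its table and the `scrape_alerts` function are all made up for illustration. It uses `xml.etree.ElementTree` from Python's standard library, which supports only a limited subset of XPath and needs well-formed markup. Real web pages are often messier, so in practice a dedicated HTML parser would probably be needed.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed page fragment standing in for a downloaded page.
PAGE = """
<html>
  <body>
    <table id="alerts">
      <tr><th>Date</th><th>Product</th></tr>
      <tr class="alert"><td>Week 1</td><td>Example product A</td></tr>
      <tr class="alert"><td>Week 2</td><td>Example product B</td></tr>
      <tr class="note"><td>n/a</td><td>No alerts this week</td></tr>
    </table>
  </body>
</html>
"""

def scrape_alerts(page_source):
    """Extract the alert rows from the table as a list of dictionaries."""
    root = ET.fromstring(page_source.strip())
    # XPath: any table with id="alerts", then its rows with class="alert".
    rows = root.findall(".//table[@id='alerts']/tr[@class='alert']")
    return [
        {"date": row.findtext("td[1]"), "product": row.findtext("td[2]")}
        for row in rows
    ]

alerts = scrape_alerts(PAGE)
for alert in alerts:
    print(alert["date"], "-", alert["product"])
```

The XPath expression does the real work: it describes where the data sits in the page, so the header row and the note row are skipped without any extra code.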
The Show and Tell event was particularly helpful in sowing the seeds for a community of data science best practice, and the field will no doubt continue to advance in government as we learn from each other.
If you’re interested in coming along to the next Show and Tell, please let us know by writing in the comments below.
If work like this sounds good to you, take a look at working for GDS – we’re usually in search of talented people to come and join the team.