Usually at our Data Science Show and Tell events there are presentations on successful projects. This time we decided to turn that idea around and look at ‘celebrating our failures’.
Why? Because in our day-to-day data science work failure is always a possibility; it’s how we learn and avoid more serious failures in the future. The term ‘failure’ covers a range of circumstances, from a simple coding slip, like a misplaced bracket, to a fundamental misunderstanding of the problem you are trying to tackle. Here are some examples of what we learnt at the show and tell session.
Predicting hospital demand using meteorological forecasts
John Minty from the Department for Work and Pensions (DWP) talked about a project from a previous job that tried to use weather data to predict hospital demand. John learnt the value of asking for the right data: at the Met Office, a longer-term ‘climate’ forecast and a short-term ‘weather’ forecast provided similar-looking data but meant very different things! John also found that the work did not deliver the high-impact insight that was hoped for: the hospital planners wanted very detailed demand forecasts, but only a broad model was statistically robust.
Economic Growth dashboard: the limitations of DC.js
Commas can hurt you - heed your R data import warnings!
Dave Bonfield from BIS (and recently on our Data Science Accelerator programme) explored the dangers of ignoring warnings during a data import. The first few rows of your imported data may look correct, but a stray comma later in the CSV file can ruin the rest of your dataset. His advice for avoiding this problem was simple but important: heed your warnings and check the end of your data, not just the start. We usually inspect the ‘head’ of a dataset on import, but rarely the ‘tail’, which in the case of a malformed CSV file is exactly where the problems show up.
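Dave’s stray-comma problem can be demonstrated in a few lines of Python (a made-up example, not his actual data): the rows near the top parse cleanly, so checking only the head misses the malformed row entirely; validating every row against the header width catches it.

```python
import csv
import io

# Hypothetical CSV: the head looks fine, but row 4 contains a stray comma.
raw = """name,score
alice,10
bob,11
car,ol,12
dave,13
"""

rows = list(csv.reader(io.StringIO(raw)))
header = rows[0]

# Checking only the head would miss the problem entirely:
print(rows[1:3])   # [['alice', '10'], ['bob', '11']] -- looks fine

# Validate every row against the header width instead:
bad = [(i, row) for i, row in enumerate(rows[1:], start=2)
       if len(row) != len(header)]
print(bad)         # [(4, ['car', 'ol', '12'])] -- the stray comma
```

The same idea applies in R: `read.csv` and friends will warn about ragged rows, and a quick `tail()` plus a column-count check is far cheaper than debugging a silently corrupted analysis later.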
Fighting the 2D user experience for data discovery
Adam Beirne from the Ministry of Defence (MOD) gave an insight into his work on developing data discovery tools. Adam’s challenge was to introduce more multi-dimensional visualisations for his users to explore data patterns. He showed some examples of cutting-edge work around the world to create a multidimensional data discovery experience from 3D printing to virtual reality goggles. After some experimentation and user feedback Adam decided that with the current user infrastructure limitations his data visualisations will have to stay in 2D for now.
Enhance your calm with Twitter API rate limits
Weichao Wang from DWP shared his experience of gathering data from Twitter using the Tweepy Python package. He ran into a commonly encountered problem with the Twitter API: rate limiting on data downloads. Although his code was throttling new requests correctly, earlier API calls were still running in the background; these overlapping calls compounded the request rate and triggered repeated HTTP 420 (‘Enhance Your Calm’) errors.
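That failure mode can be illustrated with a minimal sketch (the class name and interface here are invented for illustration, not Weichao’s actual code): a client-side limiter that reserves its time slot *before* making a call, so overlapping callers cannot all fire at once and compound the request rate.

```python
import time


class RateLimiter:
    """Enforce a minimum interval between calls.

    The slot is reserved before the call is made, so calls that are
    still 'in flight' count against the limit -- this is the
    compounding problem described above.
    """

    def __init__(self, min_interval):
        self.min_interval = min_interval
        self.next_allowed = 0.0

    def acquire(self, now=None):
        """Block until a call is permitted; return the effective call time."""
        now = time.monotonic() if now is None else now
        if now < self.next_allowed:
            time.sleep(self.next_allowed - now)
            now = self.next_allowed
        # Reserve the next slot immediately, before the API call runs.
        self.next_allowed = now + self.min_interval
        return now


limiter = RateLimiter(0.01)
a = limiter.acquire(now=0.0)  # first call goes straight through
b = limiter.acquire(now=0.0)  # second call waits for the reserved slot
```

In Tweepy itself, passing `wait_on_rate_limit=True` when constructing the API client gives similar behaviour, sleeping automatically when Twitter reports that the rate limit has been reached.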
We’d like to thank our presenters for sharing their experiences. It takes confidence and nerve to stand up in front of your peers and talk about something that didn’t go as planned.
If you are interested in coming along to the next Show and Tell on August 27th 2015, please let us know by writing in the comments below or contacting us.