Deriving value from Data Science at scale requires execution on a number of fronts. In previous blog posts, we’ve covered techniques, data access, and skills. Our 6th Data Science Show & Tell focused on Data Science Infrastructure. We were very pleased to welcome outside speakers: Dr. Jagdev Panesar of British Telecommunications plc (BT), and data scientists setting up new capability at the Bank of England (BoE), the National Audit Office (NAO) and the Ministry of Defence (MoD).
Big Data architecture principles - GDS
GDS’s very own Data Architect, Andy Delaney, laid the foundation for the abundance of form and variety in the world of Big Data with his enlightening overview of the CAP theorem, which, in essence, posits that a distributed database cannot guarantee consistency, availability and partition tolerance all at once, and so must compromise between them. This implies that different infrastructure will perform differently on different tasks and, conversely, that specific uses will be best suited to particular architectures. Andy mentioned that the popular HBase is unsuitable for certain analytic tasks unless it is accessed through a lower-latency query technology such as Apache Phoenix or Cloudera Impala.
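The trade-off at the heart of the CAP theorem can be illustrated with a toy sketch. This is not from Andy's talk: `ToyStore`, its two replicas and the `partitioned` flag are all hypothetical names invented for illustration, and a real distributed database is far more involved. The sketch only shows the choice a store faces once its replicas can no longer talk to each other: refuse the write (keeping consistency, giving up availability) or accept it on one side (keeping availability, giving up consistency).

```python
class Unavailable(Exception):
    """Raised when the store refuses a request to stay consistent."""


class ToyStore:
    """Hypothetical two-replica key-value store, for illustration only."""

    def __init__(self, mode):
        if mode not in ("CP", "AP"):
            raise ValueError("mode must be 'CP' or 'AP'")
        self.mode = mode
        self.replica_a = {}
        self.replica_b = {}
        self.partitioned = False  # True = replicas cannot reach each other

    def write(self, key, value):
        """A client writes through replica A."""
        if self.partitioned:
            if self.mode == "CP":
                # Consistency first: reject rather than let replicas diverge.
                raise Unavailable("replica B unreachable, write refused")
            # Availability first: accept locally, replica B becomes stale.
            self.replica_a[key] = value
            return
        # No partition: both replicas are updated together.
        self.replica_a[key] = value
        self.replica_b[key] = value

    def read_from_b(self, key):
        """A second client reads through replica B."""
        return self.replica_b.get(key)


ap = ToyStore("AP")
ap.write("x", 1)
ap.partitioned = True
ap.write("x", 2)               # accepted, but only replica A sees it
stale = ap.read_from_b("x")    # replica B still holds the old value

cp = ToyStore("CP")
cp.write("x", 1)
cp.partitioned = True
try:
    cp.write("x", 2)
    cp_refused = False
except Unavailable:
    cp_refused = True          # the write is refused; replicas still agree
```

In the "AP" run the two replicas disagree after the partition, while in the "CP" run they still agree but the client's write was turned away. Which behaviour is acceptable depends on the use case, which is why no single architecture suits every task.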
Data Lab - BoE
David Bradnum and Paul Robinson at BoE then walked us through the evolution of their DataLab, BoE’s data science platform, which emerged from their efforts to improve the efficiency of their data life-cycle. One insight I picked up was how demand on their infrastructure has grown as a result of the increased analytical capability that their investment in the architecture, and their ability to harness it, has made possible.
Digital media analytics - BT
BT’s Dr. Jagdev Panesar followed with a clear demonstration of how a large-scale analytics architecture can empower a business to draw actionable insights from the mountains of customer interactions and machine-generated events produced by devices across the telco’s network.
Value for Money - NAO
The next presentation, by Phil Bradburn and Ben Coleman from the National Audit Office, showed how smart thinking coded on “conventional architectures” of relational databases and .NET technology, appropriately sized and spec’ed, can crack much smaller-scale data problems, such as scraping data from reports already in the public domain to reduce research and procurement time. Value for money indeed from our friends at the NAO.
Transcending IT - MoD
Tom Keen from the Ministry of Defence (MoD) showed us how his team employs the flexibility afforded by virtualisation to build analytic capability that would not have been possible on their (necessarily) locked-down IT.
The world of Data Science being as young and fast-moving as it is, designing and affording the right architecture will foreseeably continue to be a hot topic for organisations looking to grow their analytic ability. The 6th Show and Tell brought us examples where analytics is delivering powerful impact thanks to the architecture or, in some cases, despite it. The more distant future possibly belongs to cloud infrastructures that can be spun up, re-shaped and tuned to the analyses required. We may soon find out.
In the meantime, we extend our thanks to our speakers and audience. If you would like to contribute or otherwise join us at the next Show and Tell (14:00-16:00 on 12th November 2015), do please get in touch.