https://gdsdata.blog.gov.uk/2014/10/15/improving-customer-insights-the-merits-of-using-lasso/

Improving customer insights: the merits of using LASSO

In a world where online surveys are often distributed to understand user needs, the main challenge is no longer gathering data but dealing with the vast quantity of responses. When faced with such a daunting analysis task, the first port of call is statistical modelling, normally using the General Linear Model (GLM).

Statistical modelling tools scan survey responses and then recommend a list of actions to improve customer experience. The problem with the current GLM analysis model is that it needs an analyst to judge what actions should be included in the final list, which opens up the process to human error.

There are alternatives. One of these is the LASSO (Least Absolute Shrinkage and Selection Operator) regression method, which can automatically produce a final list of actions without involving human judgement. This method is admired for its simplicity and has been widely used in areas like genomics, where big data is routine. It's advantageous to any organisation where resources are limited.
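
For readers who want the underlying idea, the standard textbook form of the LASSO objective is sketched below. The symbols are assumptions introduced here for illustration and do not come from the analysis itself: y_i is respondent i's satisfaction score, x_ij is their answer to question j, and lambda is a tuning parameter, commonly chosen by cross-validation.

```latex
\hat{\beta}^{\text{lasso}}
  = \arg\min_{\beta_0,\,\beta} \;
    \frac{1}{2n} \sum_{i=1}^{n} \Bigl( y_i - \beta_0 - \sum_{j=1}^{p} x_{ij}\beta_j \Bigr)^2
    + \lambda \sum_{j=1}^{p} |\beta_j|
```

The absolute-value penalty is what shrinks some coefficients to exactly zero, so the questions left with non-zero coefficients form the list of recommended actions. A larger lambda gives a shorter list.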

To test the use of this new LASSO regression method in government, we created an artificial survey with 40,000 fictitious respondents, each of whom answered 200 questions. The results of both the GLM and the new LASSO regression method are presented below.
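
The kind of experiment described above can be sketched on a much smaller scale. What follows is a toy illustration and not the code or data behind this post: the sample size (400 respondents, 20 questions), the Gaussian answers, the three "true" drivers of satisfaction, the penalty value and all function names are assumptions made for the example. It fits a LASSO by cyclic coordinate descent using only the Python standard library.

```python
import random


def soft_threshold(rho, lam):
    """Shrink rho towards zero by lam; anything within [-lam, lam] becomes 0."""
    if rho > lam:
        return rho - lam
    if rho < -lam:
        return rho + lam
    return 0.0


def lasso_coordinate_descent(X, y, lam, n_iter=50):
    """Minimise (1/2n)*||y - Xb||^2 + lam*||b||_1 (no intercept; data assumed centred)."""
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    residual = list(y)  # equals y - X @ beta while beta is all zeros
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            # Covariance of column j with the residual that ignores j's own contribution
            rho = sum(X[i][j] * (residual[i] + X[i][j] * beta[j]) for i in range(n)) / n
            new_b = soft_threshold(rho, lam) / col_sq[j]
            delta = new_b - beta[j]
            if delta != 0.0:
                for i in range(n):
                    residual[i] -= X[i][j] * delta
                beta[j] = new_b
    return beta


# Toy survey: only the first three questions actually drive satisfaction (assumed).
random.seed(0)
n, p = 400, 20
true_beta = [1.0, -0.8, 0.6] + [0.0] * (p - 3)
X = [[random.gauss(0, 1) for _ in range(p)] for _ in range(n)]
y = [sum(x * b for x, b in zip(row, true_beta)) + random.gauss(0, 0.5) for row in X]

beta = lasso_coordinate_descent(X, y, lam=0.1)
selected = [j for j, b in enumerate(beta) if b != 0.0]
print("questions kept by the LASSO:", selected)
```

The questions with non-zero coefficients play the role of the recommended actions. In practice an established implementation would normally be used, with the penalty chosen by cross-validation rather than fixed by hand.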

[Figure: GLM and LASSO results]

As you can see, the LASSO regression method recommends fewer actions than GLM, but achieves nearly the same degree of user satisfaction. A total of 28 actions will address 93.56% of the user needs, and assuming it takes 10 days to implement 1 action at a cost of £400 per day (the current market rate for an analyst), the solution will cost £112,000.

In contrast, the GLM method identifies 63 recommended actions that will cost £252,000, more than twice as much (the LASSO solution comes to about 44% of the GLM cost), but provide an increment of only 6% to the customer satisfaction level.
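
The cost comparison can be checked with a few lines of arithmetic. The quoted totals of £112,000 and £252,000 are reproduced if each action takes 10 working days at £400 per day; that reading is an assumption inferred from the figures, and the names below are illustrative only.

```python
DAY_RATE_GBP = 400      # quoted analyst day rate
DAYS_PER_ACTION = 10    # assumed effort per action, inferred from the quoted totals


def implementation_cost(n_actions, days_per_action=DAYS_PER_ACTION, day_rate=DAY_RATE_GBP):
    """Total cost in pounds of implementing n_actions recommended actions."""
    return n_actions * days_per_action * day_rate


lasso_cost = implementation_cost(28)
glm_cost = implementation_cost(63)

extra_pct = 100 * (glm_cost - lasso_cost) / lasso_cost  # how much more GLM costs than LASSO
lasso_share_pct = 100 * lasso_cost / glm_cost           # LASSO cost as a share of GLM cost

print(f"LASSO: £{lasso_cost:,}  GLM: £{glm_cost:,}")
print(f"GLM costs {extra_pct:.0f}% more; LASSO is {lasso_share_pct:.1f}% of the GLM cost")
```

On these assumptions the GLM list costs 125% more than the LASSO list, or equivalently the LASSO list costs roughly 44% of the GLM one.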

From an analytical standpoint the new LASSO regression approach provides an efficient way of arriving at the simplest solution for survey data. The use of the model can greatly help government departments with limited digital resources prioritise the improvement of those areas that result in the greatest return on investment for citizens.

If work like this sounds good to you, take a look at Working for GDS – we’re usually in search of talented people to come and join the team.

You can follow Shahzia on Twitter, sign up now for email updates from this blog or subscribe to the feed.

3 comments

  1. Ed Jones

    Can you recommend reading materials for the non-expert and tell us what tools you used? Is this an R script you wrote? A specific software package?

    (And is your code on github somewhere?)

    Thanks!

  2. Marko Stojovic

    I only wish one could do LASSO and ridge easily in logistic regression in SAS 9.2! There's a "ridging" option, but it appears to affect the convergence algorithm and there's no reference to the tuning parameter (how to select, or how it's optimised). I hope someone will tell me I've missed it!
