https://gdsdata.blog.gov.uk/2016/08/12/using-google-sheets-to-audit-your-foreign-language-content/

Using Google Sheets to audit your foreign language content

1

I wanted to analyse when some GOV.UK foreign language content had last had a content update. I regularly use Google sheets for my analysis, and so use the DETECTLANGUAGE function within sheets to audit our pages without having to worry about whether the content has already been tagged by language.

I began by using the GOV.UK Search API  to get a list of unsorted URLs related to the content area I was interested in. I knew the list would include some foreign language pages but the volume and distribution of pages per language was still unknown.

Identifying the language

I pasted the URL list from the Search API into column A of a Google sheet. Then simply added  =DETECTLANGUAGE(URL cell) formula in an adjacent column cell for each URL.

Input the formula in an adjacent cell to the URL
Input the formula in an adjacent cell to the URL

Applying this formula for all of the URLs in column A allowed me to identify the language of each page in the list represented by a two letter language acronym. Some of these acronyms are intuitive, but for the ones I was unsure about I cross-referenced them with this list

Language filter
In column C (third from left) the formula has identified the language of the pages in column A

I wasn't dealing with many languages within this list so it was quick to cross-reference. If you have a more varied language content list, I would advise creating a VLOOKUP to clearly state the language in another column.

Tracking content updates by language

Next, I used the column C filter to just show me the Welsh pages. Then I copied and pasted these pages into another sheet tab where I used the ImportXML function to scrape the last content update information into column D (ImportXML values).

I used the MID function to just return the date information into column E (MID function values) and finally  from each of the foreign pages into column D which allowed to me to sort the pages by language and last updated.

Finally I used the GOOGLETRANSLATE function to translate the dates into English in column F (Translation) which allowed me to see at a glance the recency of content updates.

Last updated
Using ImportXML to scape the last updated data from some identified Welsh content

Over to you

There are lots of potential uses for DETECTLANGUAGE including comparing the foreign language pages with its English equivalents (if any). This can help analysis of how expected user behaviour can vary from English and foreign versions. We’d love to hear about how you’re using  it. Please share your ideas in the comments section.

4 comments

  1. Comment by Ben John Grady posted on

    Cheers! I hadn't thought of utilising the DETECTLANGUAGE alongside GOOGLETRANSLATE for translating the outputs into sheets.
    Ben John Grady

    Reply
    • Replies to Ben John Grady>

      Comment by vpilla posted on

      Thanks Ben, it's a nifty function. Let us know how you get on with it and if you have any tips or new uses.

      Reply
  2. Comment by Neil Shadrach posted on

    Interesting use of Google sheets.
    I would point out though that, from a UK perspective, Welsh is not a foreign language.

    Reply
    • Replies to Neil Shadrach>

      Comment by vpilla posted on

      Very good point Neil, I didn't intend to imply any different. The content area I was interested unfortunately didn't have foreign content as I expected to show as an example. Therefore with only English and Welsh pages to work with, I used our Welsh pages to show the detect function in practice. The function would work just as well with all languages depending on dataset.

      Reply

Leave a comment

We only ask for your email address so we know you're a real person