I wanted to analyse when some GOV.UK foreign language content had last had a content update. I regularly use Google sheets for my analysis, and so use the DETECTLANGUAGE function within sheets to audit our pages without having to worry about whether the content has already been tagged by language.
I began by using the GOV.UK Search API to get a list of unsorted URLs related to the content area I was interested in. I knew the list would include some foreign language pages but the volume and distribution of pages per language was still unknown.
Identifying the language
I pasted the URL list from the Search API into column A of a Google sheet. Then simply added =DETECTLANGUAGE(URL cell) formula in an adjacent column cell for each URL.
Applying this formula for all of the URLs in column A allowed me to identify the language of each page in the list represented by a two letter language acronym. Some of these acronyms are intuitive, but for the ones I was unsure about I cross-referenced them with this list.
I wasn't dealing with many languages within this list so it was quick to cross-reference. If you have a more varied language content list, I would advise creating a VLOOKUP to clearly state the language in another column.
Tracking content updates by language
Next, I used the column C filter to just show me the Welsh pages. Then I copied and pasted these pages into another sheet tab where I used the ImportXML function to scrape the last content update information into column D (ImportXML values).
I used the MID function to just return the date information into column E (MID function values) and finally from each of the foreign pages into column D which allowed to me to sort the pages by language and last updated.
Finally I used the GOOGLETRANSLATE function to translate the dates into English in column F (Translation) which allowed me to see at a glance the recency of content updates.
Over to you
There are lots of potential uses for DETECTLANGUAGE including comparing the foreign language pages with its English equivalents (if any). This can help analysis of how expected user behaviour can vary from English and foreign versions. We’d love to hear about how you’re using it. Please share your ideas in the comments section.