Technical Blog from our Data Science Team

Read the latest technical blogs from our data science team.

Analysis of Gaelic Station Names: An exploration of inter-language similarity measures for place-names and the design of rural scores.

Motivation: Most of modern Scotland was once Gaelic-speaking, and a policy change in 2010 means Gaelic names appear alongside English names on almost all station signs across Scotland’s railway. I live in Glasgow and often travel out into the Highlands, and over time I hypothesised: H1: The Gaelic and English names of a station become more similar


Dealing with many dimensions in historical data: Tracking cooperation & conflict patterns over space and time in R

For this post, I’ve managed to find some extremely interesting historical event data offered by the Cline Center on this page. As you will see, this dataset can be quite challenging because of the sheer number of dimensions you could look at. With so many options, it becomes tricky to create visualisations with the ‘right’ level of granularity:


Data guidelines: A set of recommendations for clean and usable data

The extent to which a dataset follows a set of commonly expected guidelines will often determine how much time you have left to spend thinking about your analysis. Ideally, you might intend to spend 20% of your time cleaning the data for a project, and 80% planning and carrying out your actual analysis. But often,


LA maps of crime: Using R to map criminal activity in LA since 2010

I’ve recently come across data.gov — a huge resource for open data. At the time of writing, there are close to 17,000 freely available datasets stored there, including this one offered by the LAPD. Interestingly, this dataset includes almost 1.6M records of criminal activity occurring in LA since 2010 — all of them described according to a variety of measures (you can


Winning Pub Quizzes with pandas

I was at my local pub quiz recently when, as usual, we were faced with the anagram speed round. The quiz-master, Dr Paul, reads out a collection of words with a clue to a mystery person’s identity, and the first team to correctly shout out the answer wins a spot prize. It’s often a box
