Group visit to Sheffield Madina Masjid

Sheffield Madina Masjid

A couple of weeks ago I had the privilege of visiting the Madina Masjid in Sheffield, taking a group of undergraduate students as part of the social science Achieve More week. Despite having visited a couple of mosques before, I’d never been inside one, so I was genuinely excited to be shown around. Our tour guide, Zahid, showed the group the facilities, starting with the washrooms, where he demonstrated how Muslims perform their ablutions before prayer.

Masjid entrance

The mosque had a small library with a collection of books for children learning Arabic (among others, I’m sure), as well as several copies of the Quran. Zahid showed us an edition with beautiful calligraphy which I didn’t dare handle or take a photograph of, but I did take a photograph of one of the editions in the classroom:

Quran

After the library, Zahid showed us the main prayer rooms. These were laid out so that worshippers face the Kaaba in Mecca, and were ornately decorated, with the upstairs prayer room housing an immense gold chandelier:

Chandelier

The Islamic calendar is based on the moon, and the times of daily activities are based on sunrise and sunset, so the times and dates of key events (such as prayer) do not align with a consistent time and date in the Gregorian calendar. As the Gregorian calendar is dominant in most Western countries (and the sun rises and sets at different times depending on the season!), the prayer room listed all the prayer times for the coming week:

Prayer times

The visit was truly fascinating: I didn’t (and still don’t) know much about Islam, so to glean an insight into the daily practices of local Muslims was eye-opening. I don’t personally practise any religion, and while I have at least some understanding of Christian religious practice through participating in English culture and tradition, the practice of Islam was a relative unknown to me. Seeing rituals like ablution (wudu) performed brought these to life in a way that simply reading about them can’t.

Muslims and Islam are, sadly, often the subject of misunderstanding, suspicion, and even outright hostility in many Western societies, so I thought this was a great experience offered to visitors of the masjid. Of course, in an ideal world, mosques shouldn’t need to raise awareness and help alleviate social tensions, but one step at a time I guess.

Formal informal testing of research code

When writing research code I do test my code and results, but until recently I’d only been doing this informally and not in any systematic way. I decided it was time to change my testing habits when I noticed I had recoded a variable incorrectly despite my informal tests suggesting nothing was wrong. Correcting this error made a small, but noticeable, difference to my model.

I didn’t want to write formal unit tests that required setting up new folders and files, etc., as I would use this as an excuse not to switch tabs and write the tests. My other concern is that I am often testing large dataframes that I remove from memory once checked and merged into something else, so they don’t persist for external testing (like a function). For my own needs I just wanted to write a test inline with the script, which would halt the execution of the script if the test did not pass. I would suggest this approach would work for many researchers writing code as scripts in environments like R or Python.

Writing informal tests formally

For R users a great tool I found is the testthat package by Hadley Wickham. It’s available on CRAN so is as easy to install as any other package. Stick the following near the top of your script:

install.packages("testthat")
library("testthat")

Really the package is written for package authors who want to automate their testing, but by dropping a few bits it can easily be integrated into a script, and the tests are performed when source()ing the file.

The testthat documentation outlines how to write a test, and the syntax to use. As an example, the following tests check that the right number of rows and columns are present, assuming we know our data should have 1,000 respondents and 20 variables, respectively:

df <- readr::read_csv("data.csv")

context("Check df loaded correctly")
test_that("df correct number of rows (respondents) loaded?", {
  expect_that(nrow(df), equals(1000))
})
test_that("df correct number of columns (variables) loaded?", {
  expect_that(ncol(df), equals(20))
}) 

Clearly this is a trivial example and you probably wouldn’t test to this level of detail, but it does show you the outline of the test format. I use a new context for each group of tests focused around an individual object or function; for example, I would write a new context when no longer testing df. Within each context you can have as many test_that() calls as you like, and within each test_that() call as many expect_that() calls as you like. This effectively allows you to give your test groups ‘headings’ to remind you of their purpose and structure. These headings are useful when the tests fail because they form part of the error message returned to you, so you can easily spot what’s wrong.

The advantage of using testthat in this way is that it stops execution of the script if any tests fail. Assuming you write good tests, you can be confident that if the script finishes there are no errors.
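
If you want to convince yourself of this halt-on-failure behaviour, a throwaway test like the following (deliberately wrong, purely for illustration) throws an error at that point, so nothing after it runs when the file is sourced:

test_that("this test deliberately fails", {
  expect_that(1, equals(2))  # fails, so source() stops here with an error
})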

Writing informal tests systematically

Not sure if I wrote good code or bad unit tests

Think about your unit tests

Of course, tests don’t write themselves, so you need to cover this yourself. This takes practice, and I don’t claim to have mastered writing unit tests yet. The bits of advice I have picked up through trial and error and reading others’ work, though, are:

  1. Test systematically. Test everything. Test every time you create a new object (function, data frame, variable, vector, etc).
  2. Of course, some objects are pretty straightforward, so use your judgement when applying suggestion 1! If in doubt, err on the side of testing.
  3. If writing a function, write some tests that compare the output with what you expect the output to be.
  4. If testing a data frame, check you have the right number of respondents (rows) and variables (columns) if necessary. Test that there are no responses with an incorrect value. For example, if your variable should be binary (0/1) only, check there are no other values.
  5. If the class of an object is important, test to see if it is that class.
  6. When you recode a variable, test that it has been recoded correctly (see the sketch after this list).
  7. Prefer tests that are comprehensive over tests that sample, if possible. For example, write tests that check a whole vector rather than a selection of values in that vector.
  8. Test what you can, and you’ll get better with practice. Testing some things is better than testing no things.
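
To illustrate points 4 and 6, here is a minimal sketch of the kind of inline checks I mean. The data frame df and the variables smoker and smoker_recode are made up for this example, so substitute your own objects:

context("Check smoker recode")
test_that("recoded smoker variable is strictly binary", {
  # check the whole vector, not a sample of values
  expect_that(all(df$smoker_recode %in% c(0, 1)), is_true())
})
test_that("recode matches the original variable", {
  # assumes the original coding was 1 = smoker, 2 = non-smoker
  expect_that(df$smoker_recode, equals(ifelse(df$smoker == 1, 1, 0)))
})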

Why bother testing?

Well, clearly it’s up to you if you test. There is an investment upfront in learning to test, but it’s not that great. You can be up and running writing testthat tests within half a day, tops. In my case, that investment has saved me two things:

  1. Time. When I come to re-use a function (or copy+paste a section of code) I have the tests in place so I can run the new code immediately and know if it’s working or not without having to scrutinise every line of code or test a number of outputs from the function. They’re written, and they’ll fail if there’s a problem.
  2. Worry. I am now much more confident in my results than ever before, because I know that I’ve made sensible tests along the way. Crucially I also know that if I ever need to change anything the tests will inform me if there’s a problem, rather than having to check everything carefully or hope for the best.

Downloading ONS geography lookups

I wasn’t able to easily download ONS output area lookups from either their main site or the geoportal. I kept getting errors that the download failed. I suspect on Linux the URLs are considered malformed; they might work on Windows.

Anyway, on Linux use the links from the geoportal. These link directly to the .zip files (and the main site links to the geoportal anyway).

I used cURL on the command line and escaped the brackets in the URL with \:

curl -o ~/Downloads/oa-lookup.zip https://geoportal.statistics.gov.uk/Docs/Lookups/Output_areas_\(2011\)_to_lower_layer_super_output_areas_\(2011\)_to_middle_layer_super_output_areas_\(2011\)_to_local_authority_districts_\(2011\)_E+W_lookup.zip

Which worked a treat.
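
If you’d rather stay inside R, a rough equivalent (a sketch of the same idea, which I haven’t tested on every platform) is to percent-encode the brackets yourself before calling download.file():

# Rebuild the URL from the geoportal link and encode the brackets
url <- paste0("https://geoportal.statistics.gov.uk/Docs/Lookups/",
              "Output_areas_(2011)_to_lower_layer_super_output_areas_(2011)",
              "_to_middle_layer_super_output_areas_(2011)",
              "_to_local_authority_districts_(2011)_E+W_lookup.zip")
url <- gsub("(", "%28", url, fixed = TRUE)
url <- gsub(")", "%29", url, fixed = TRUE)
download.file(url, destfile = "oa-lookup.zip", method = "curl")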

Dissolve polygons in R

Dissolving polygons is another fairly elementary GIS task that I need to perform regularly. With R this can be a bit involved, but once done it is fully reproducible and the code can be re-used. This post is essentially a companion piece to Clipping polygons in R; I wrote both because I often forget how to complete these tasks in R.

Getting started

Let’s gather together everything we need to complete this example. We need a shapefile of small geographies to ‘dissolve’, a lookup table to tell us which polygons dissolve into which, and we need a couple of R spatial packages to run everything. Let’s get started.

# The default behaviour of this script is to create a folder called 'dissolve-example'
# and download and run everything from here.
# You can change this by modifying the next two lines
dir.create("dissolve-example")
setwd("dissolve-example")

# Load a few packages. dplyr makes merges easier
require("rgdal")
require("rgeos")
require("dplyr")

# Set up shapefile to dissolve. I'm using English regions
download.file("http://census.edina.ac.uk/ukborders/easy_download/prebuilt/shape/England_gor_2011.zip",
              destfile = "lad-region-lookup.zip")
unzip("lad-region-lookup.zip", exdir = ".")
region <- readOGR(".", "England_gor_2011")

# Check the shapefile has loaded correctly
plot(region)
Regions

# We're going to dissolve all regions in to one country (England!)
# For this we'll create a lookup table and merge with the spatial data
# Hopefully for your 'real' data you have a lookup table of all polygons and
# their larger geography already!
lu <- region@data
lu$CODE <- as.character(lu$CODE)
lu$NAME <- as.character(lu$NAME)
lu$country <- "England"

Merge

We now need to merge the lookup table into our spatial object’s data frame. We should end up with one row per zone to dissolve, each with a reference to the relevant larger geography. The trick is to make sure the row names match exactly and, if you can, to match the polygon IDs as well with spChFIDs().

# Merge lu (LookUp) into polygons,
region@data$CODE <- as.character(region@data$CODE)
region@data <- full_join(region@data, lu, by = "CODE")

# Tidy merged data
region@data <- select(region@data, -NAME.x)
colnames(region@data) <- c("code", "name", "country")

# Ensure shapefile row.names and polygon IDs are sensible
row.names(region) <- row.names(region@data)
region <- spChFIDs(region, row.names(region))

Dissolve

Now we can do the dissolve. I use gUnaryUnion (and indeed I think unionSpatialPolygons in the maptools package uses this by default).

# Now the dissolve
region <- gUnaryUnion(region, id = region@data$country)

If you just want to plot this using base plot you can stop there. If you want to do anything with the data, or plot using ggplot, you need to recreate the data frame (there’s a ggplot sketch at the end of this post).

# If you want to recreate an object with a data frame
# make sure row names match
row.names(region) <- as.character(1:length(region))

# Extract the data you want (the larger geography)
lu <- unique(lu$country)
lu <- as.data.frame(lu)
colnames(lu) <- "country"  # your data will probably have more than 1 row!

# And add the data back in
region <- SpatialPolygonsDataFrame(region, lu)

# Check it's all worked
plot(region)

Your plot should look like this:

Example dissolved polygons

And your data frame should look like this:

region@data
##   country
## 1 England
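
As promised above, if you want to take the ggplot route rather than base plot, a minimal sketch (assuming the ggplot2 package is installed) converts the object to a plain data frame with fortify() and plots the polygon coordinates:

require("ggplot2")
region_df <- fortify(region)  # long/lat coordinates plus polygon id and group columns
ggplot(region_df, aes(x = long, y = lat, group = group)) +
  geom_polygon(fill = "grey80", colour = "black") +
  coord_equal()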

Clipping polygons in R

Clipping polygons is a basic GIS task. It involves removing unneeded polygons outside of an area of interest. For example, you might want to study all local authority districts (LADs) in the Yorkshire and the Humber region but can only obtain shapefiles that contain all the LADs in the UK. Removing the LADs outside Yorkshire and the Humber can be achieved by ‘clipping’ the shapefile of LADs, using the extent of the larger region as a template. In R this takes a bit of work, but it is quite possible with a few lines of code and has the advantage of being fully reproducible.

Getting started

Let’s start by obtaining administrative boundary shapefiles in England, loading necessary packages, and quickly checking the shapefiles are plotting correctly. We will need both regions and LADs. I’m using the freely-available boundaries from the UK Data Service:

# The default behaviour of this script is to create a folder called 'clip-example'
# and download and run everything from here.
# You can change this by modifying the next two lines
dir.create("clip-example")
setwd("clip-example")

# download and unzip UK administrative boundary shapefiles
download.file("http://census.edina.ac.uk/ukborders/easy_download/prebuilt/shape/England_gor_2011.zip",
              destfile = "gor.zip")  # ~6.4MB
download.file("http://census.edina.ac.uk/ukborders/easy_download/prebuilt/shape/England_lad_2011.zip",
              destfile = "lad.zip")  # ~25MB
unzip("gor.zip", exdir = ".") # '.' just means current directory
unzip("lad.zip", exdir = ".")

# I recommend loading the shapefiles with readOGR() in the rgdal package
# install.packages("rgdal") # uncomment and run if not already installed
# install.packages("rgeos") # as above

# If you run in to trouble installing rgdal and rgeos on Linux (Ubuntu) see:
# https://philmikejones.wordpress.com/2014/07/14/installing-rgdal-in-r-on-linux/

require("rgdal")
require("rgeos")

regions <- readOGR(".", "England_gor_2011")
plot(regions) # worth a quick check everything looks like it should!

lads <- readOGR(".", "England_lad_2011")
plot(lads)

So now we have everything we need to begin: a shapefile of English regions, and a shapefile of English LADs.

Extract Yorkshire and the Humber

You’ve hopefully seen from your test plot that the shapefiles are plotting all boundaries in England, so first let’s remove all but Yorkshire and the Humber from the regions shapefile. We can do this by looking for the names of the regions in the shapefile’s attribute table and using standard subsetting syntax. If you’ve used other GIS software before hopefully this idea will be familiar to you, but if not, don’t worry!

regions <- regions[which(regions@data$NAME == "Yorkshire and The Humber"), ]
# Another plot confirms just the Yorkshire and The Humber region is left
plot(regions)

Clip the LADs

The regions object now just contains Yorkshire and The Humber, so we can use this as our bounds to extract LADs:

yh_lads <- gIntersection(regions, lads, byid = TRUE, drop_lower_td = TRUE)
plot(yh_lads)

Easy, but we’re not finished yet. The clipped polygons no longer contain a data frame because gIntersection() doesn’t (and can’t) know which data frame items to save into the new object. This means we must add them back in manually, but even this is relatively straightforward.

Recreate the data frame

The row names for the clipped polygons are a concatenation of the regions row name and the LAD row names. These are in the form x yyyy where x is the region row name (in our case always 5, because Yorkshire and The Humber was row 5) and yyyy is the LAD row name. This can be confirmed with:

row.names(yh_lads)
## [1] "5 2"   "5 8"   "5 9"   "5 37"  "5 39"  "5 43"  "5 105" "5 124" "5 162" "5 165"
## [11] "5 175" "5 204" "5 207" "5 214" "5 225" "5 245" "5 248" "5 251" "5 254" "5 297"
## [21] "5 320"

So that we can create a data frame based on the LAD data frame, we must first ensure our row names match the LAD row names so they can be merged across. I do this by removing the “5 ” with gsub():

row.names(yh_lads) <- gsub("5 ", "", row.names(yh_lads))
# Confirm it worked
row.names(yh_lads)
##  [1] "2"   "8"   "9"   "37"  "39"  "43"  "105" "124" "162" "165" "175" "204" "207"
## [14] "214" "225" "245" "248" "251" "254" "297" "320"

So far, so good. Now we create a variable - "keep" - that contains the row names that we want to keep from the LADs data frame.

keep <- row.names(yh_lads)

For the sake of simplicity I also change the polygon IDs so they are consistent with their respective row names. This step is probably optional, but I prefer to ensure they match so I have a consistent set of IDs to work with later on. If you choose not to do this I don't think you'll run into any problems but YMMV.

yh_lads <- spChFIDs(yh_lads, keep)

Now we create a copy of the LAD dataframe and filter it so that it only contains the rows we want to keep, using row names to perform the match:

yh_data <- as.data.frame(lads@data[keep, ])

Finally we create a SpatialPolygonsDataFrame object with our clipped polygons and our subsetted data frame:

yh_lads <- SpatialPolygonsDataFrame(yh_lads, yh_data)

All being well, your final plot should look like this:

plot(yh_lads)
LADs in the Yorkshire and The Humber region

And your yh_lads object should contain attribute data, which can be viewed with:

yh_lads@data
##           CODE                        NAME ALTNAME
##  2   E07000164                   Hambleton    
##  8   E07000169                       Selby    
##  9   E08000017                   Doncaster    
##  37  E07000168                 Scarborough    
##  39  E08000032                    Bradford    
##  43  E07000165                   Harrogate    
##  105 E07000167                     Ryedale    
##  124 E07000166               Richmondshire    
##  162 E06000012     North East Lincolnshire    
##  165 E08000035                       Leeds    
##  175 E07000163                      Craven    
##  204 E08000033                  Calderdale    
##  207 E06000013          North Lincolnshire    
##  214 E08000019                   Sheffield    
##  225 E06000010 Kingston upon Hull, City of    
##  245 E06000011    East Riding of Yorkshire    
##  248 E06000014                        York    
##  251 E08000036                   Wakefield    
##  254 E08000034                    Kirklees    
##  297 E08000016                    Barnsley    
##  320 E08000018                   Rotherham    

# If further confirmation is required:
class(yh_lads)
##  [1] "SpatialPolygonsDataFrame"
##  attr(,"package")
##  [1] "sp"

Congratulations, a fully reproducible polygon clip!

Sheffield VPN in Ubuntu

Ubuntu doesn’t come with vpnc installed by default, but you need it to connect to the University of Sheffield VPN. You can install it with:

sudo apt-get install network-manager-vpnc

You will need a Remote Access Password in order to connect. If you don’t have one, you can request a remote access password from CICS. Note this is not your regular account password.

With vpnc installed, set up the VPN as follows:

  1. Open System Settings and click Network.
  2. Click the + to create a new service.
  3. From the Interface drop-down menu select VPN.
  4. A new menu for VPN Connection Type will appear. Select Cisco Compatible VPN (vpnc).
  5. You can rename your VPN connection if you wish by editing the Connection Name.
  6. In the gateway field enter: vpn.shef.ac.uk
  7. In account name enter your University username.
  8. Change Always Ask for password to Saved in the drop-down list.
  9. Enter your Remote Access password in the User password field.
  10. Change Always Ask for Group password to Saved in the drop-down list.
  11. For both Group name and Group password enter Unishef.
  12. Click Save.
  13. Your VPN is now set up. To use it left-click on the network icon in the status bar (two arrows facing up and down), select VPN connections and choose your new VPN.

Pidgin with Google Hangouts

I’m old-fashioned and like a desktop client for online messaging so I use Pidgin on my Ubuntu machines. To install Pidgin:

sudo apt-get install pidgin

Then open the Pidgin client. Add an account and select the XMPP protocol. Your username is the bit before the ‘@’ in your email address; Domain is the bit after the ‘@’. If you’re using a personal Gmail account this will be gmail.com; if you’re using a Google Apps email address (for example, if your company uses Gmail to handle its mail) it will be whatever’s after the ‘@’ in your email address. Resource can be left blank.

If you use two-factor authentication to sign in to your Google account (and you really should!) you will need an app-specific password to log in to Pidgin; don’t use your normal Gmail password. If this is you, go to App passwords and create a new password for Pidgin. Use this to log in. If you don’t use two-factor authentication, use your normal password.

Before you save, go to the Advanced tab. You need to select ‘Connection security: Require encryption’ and Connect port: 5222. If this doesn’t work you can try ‘Connection security: Use old-style SSL’ and Connect port: 5223. Both seem to work, but I assume the old-style SSL is less secure. Finally, in Connect server enter talk.google.com. Press Save and connect.

Thanks to @mpekas for the tip about port configurations.