The Nomisweb API can be used both to discover (list) datasets and to obtain one or more datasets once they have been identified.
The API supports a variety of formats, including HTML, XML and JSON. I prefer JSON because it is less verbose than XML, so the files are typically smaller.
To list all the available datasets I used a program called jq (short for ‘JSON Query’, I assume). At the time of writing (June 2015) I strongly recommend installing the development version rather than the existing binary releases. Oddly, the manual and tutorial refer to filters and commands that are not yet available in the official release. Instructions for installing the development version from git are available; alternatively, you can download a pre-compiled binary from the drop-down box on the homepage.
You will also need curl, which on an Ubuntu machine is as easy to install as:
sudo apt-get install curl
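With curl installed, you can check the size claim for yourself. The sketch below downloads the same dataset listing as XML and as JSON and counts the bytes in each. The URLs are an assumption: the dataset-definition endpoint with `.sdmx.xml` and `.sdmx.json` extensions. The helper names `response_size` and `compare_formats` are hypothetical. Confirm the endpoints in the API documentation before relying on them.

```shell
#!/bin/sh
# Compare the download size of the same listing in XML and in JSON.
# ASSUMPTION: these endpoints; check them in the Nomisweb API documentation.
NOMIS_DEF_BASE="https://www.nomisweb.co.uk/api/v01/dataset/def.sdmx"

# Print the number of bytes curl downloads from the URL in $1.
# -f fails on HTTP errors, -s hides the progress bar, -S still shows errors.
response_size() {
  curl -fsS "$1" | wc -c | tr -d ' '
}

# Hypothetical helper: print one "format: N bytes" line per format.
compare_formats() {
  for fmt in xml json; do
    printf '%s: %s bytes\n' "$fmt" "$(response_size "$NOMIS_DEF_BASE.$fmt")"
  done
}
```

Running `compare_formats` prints one line per format. The exact numbers depend on what the API returns at the time.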
I then used curl and jq to obtain the list of datasets and strip out everything except each dataset's ID, name and description, which makes relevant datasets easy to find. The following script does this and saves the output to nomisweb-datasets.txt:
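A minimal sketch of such a script is below. It is one reading of the description in the list that follows, not the original script. It assumes two things: the JSON listing lives at `https://www.nomisweb.co.uk/api/v01/dataset/def.sdmx.json`, and the datasets sit under `structure.keyfamilies.keyfamily`. The function names `extract_datasets` and `fetch_dataset_list` are hypothetical. Keeping the filter separate from the download lets you try it on a saved copy without calling the API. The pipeline inside `fetch_dataset_list` keeps the same three-step shape: fetch, filter, save.

```shell
#!/bin/sh
# Sketch: list Nomisweb datasets (id, name, description) with curl and jq.
# ASSUMPTION: the URL and the structure.keyfamilies.keyfamily layout.
NOMIS_DEF_URL="https://www.nomisweb.co.uk/api/v01/dataset/def.sdmx.json"

# Read the dataset-definition JSON on stdin and print one
# {id, name, description} object per dataset.
extract_datasets() {
  jq '.[] | .keyfamilies | .keyfamily | .[] | {id, name, description}'
}

# Fetch the listing, filter it and save it to nomisweb-datasets.txt.
fetch_dataset_list() {
  curl -fsS "$NOMIS_DEF_URL" |
    extract_datasets \
    > nomisweb-datasets.txt
}
```

Call `fetch_dataset_list` to create the file. Alternatively, run `extract_datasets < saved.json` on a copy you have already downloaded (`saved.json` is a placeholder name).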
- Line 1 specifies the JSON file that lists all the datasets, as given in the API documentation.
- Line 2 is a series of filters piped together. It first takes everything. It then discards everything except keyfamilies at that level, and then everything except keyfamily (as it happens, there is nothing else to discard at that level). Next it takes all the datasets (‘.’), and finally it extracts the id, name and description from each element.
- Line 3 saves the result to the file instead of printing it to the terminal window, which is the default.
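To see what each stage of the Line 2 filter does, the sketch below runs it one stage at a time on a tiny hand-made sample. Both the sample and the filter are assumptions. The sample only mimics the layout the steps above imply (`structure`, then `keyfamilies`, then a `keyfamily` array). The filter is one plausible reading of those steps, using jq's `.[]` iterator for "obtains everything" and for "obtains all datasets".

```shell
#!/bin/sh
# Run the Line 2 filter one stage at a time on a tiny sample.
# HYPOTHETICAL data: it only mimics the assumed layout
# {"structure":{"keyfamilies":{"keyfamily":[...]}}} of the real file.
set -eu

sample='{"structure":{"keyfamilies":{"keyfamily":[
  {"id":"NM_1_1","name":{"value":"Example A"},"description":{"value":"First"},"version":1},
  {"id":"NM_2_1","name":{"value":"Example B"},"description":{"value":"Second"},"version":1}
]}}}'

final_filter='.[] | .keyfamilies | .keyfamily | .[] | {id, name, description}'

# Each stage adds one more filter to the pipe; -c prints compact JSON.
for f in '.[]' \
         '.[] | .keyfamilies' \
         '.[] | .keyfamilies | .keyfamily' \
         '.[] | .keyfamilies | .keyfamily | .[]' \
         "$final_filter"; do
  printf '== %s\n' "$f"
  printf '%s' "$sample" | jq -c "$f"
done
```

The last stage prints one compact object per dataset, and the sample's extra `version` field has been dropped. Leave out `-c` to get jq's default pretty-printed output.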