Sites of Special Scientific Interest, or SSSIs, are nature conservation areas given legal protection from damage, development, and neglect in order to safeguard the natural features that they contain. As their name implies, they are used to conserve features of particular biological or geological interest. Caves and mines often play host to such features, both because of the unique biological habitats that they provide and because of the geological nature of their formation or excavation.
Earlier this year I received an email from a caver who wanted to explore how well recorded (and therefore protected) these features are. Their idea was to search through all SSSI citations for keywords such as "cave", "ogof", and "uamh", to determine which SSSIs made reference to these features. A dataset of known cave and mine features would then be prepared and those that fall within an SSSI boundary extracted. These two datasets could then be compared to assess how many SSSI citations acknowledge these features.
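As a rough illustration of the keyword search described above, here is a minimal sketch. It assumes the citations have already been extracted to plain text and keyed by site name; the function name, the example citations, and the handling of plurals are all my own assumptions rather than the method used in the research project.

```python
import re

# Keywords quoted in the post; the real search may well have used a longer list.
KEYWORDS = ["cave", "ogof", "uamh"]

# \b word boundaries stop "cave" matching inside words such as "concave";
# the optional trailing "s" catches simple plurals like "caves".
PATTERN = re.compile(
    r"\b(" + "|".join(map(re.escape, KEYWORDS)) + r")s?\b",
    re.IGNORECASE,
)


def citations_mentioning_caves(citations):
    """Return {site_name: sorted list of matched keywords} for matching citations."""
    hits = {}
    for name, text in citations.items():
        found = {m.group(1).lower() for m in PATTERN.finditer(text)}
        if found:
            hits[name] = sorted(found)
    return hits


# Made-up citation text, for illustration only.
example = {
    "Hypothetical SSSI A": "The site includes Ogof Example, a cave system of geological interest.",
    "Hypothetical SSSI B": "Species-rich grassland on a concave limestone slope.",
}
result = citations_mentioning_caves(example)
```

Here only the first hypothetical citation is flagged, since "concave" is not matched as a keyword.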
I was tasked with compiling the database of known cave and mine features and performing the point-in-polygon intersection with the SSSI boundaries.
Whilst there is no national database of cave and mine features in Great Britain, the British Caving Association (BCA) does work with regional caving councils to maintain regional registries. Because the BCA's Cave Registry & Archive working group developed these registries, they all take a similar format, which means that they can be easily scraped. I scraped entries from:
Other regional councils have developed their own databases of cave and mine features, which I was also able to scrape. This was a slightly trickier task due to the differing design of each registry and stronger built-in scraping protection. I sourced data from:
Finally, one of the famous caving guide books has also developed an online dataset of cave and mine features which I was able to scrape for entries that might not appear in other datasets:
Each of the devolved governments maintains its own publicly available SSSI datasets, so it was fairly straightforward to download shapefiles for the SSSIs:
I used Python to scrape each of the data sources. I won't share this code publicly as I had to work around some anti-scraping measures, but I ensured that a suitable delay was used so as not to overload any of the servers hosting the data.
Once I had scraped features from each data source, I compiled the resultant features into a geospatial dataset of point features and performed some fuzzy deduping based on feature name and location, also in Python.
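To give a flavour of what fuzzy deduplication on name and location can look like, here is a minimal sketch using only the standard library. The field names (`name`, `lat`, `lon`), the 100 m distance threshold, and the 0.85 similarity threshold are assumptions chosen for illustration, not the values used for the real dataset.

```python
import math
from difflib import SequenceMatcher


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two WGS84 points."""
    r = 6371000.0  # mean Earth radius in metres
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))


def name_similarity(a, b):
    """Similarity ratio in [0, 1] between two names, ignoring case and outer whitespace."""
    return SequenceMatcher(None, a.lower().strip(), b.lower().strip()).ratio()


def dedupe(features, max_dist_m=100.0, min_similarity=0.85):
    """Keep the first of any features that are both close together and similarly named."""
    kept = []
    for f in features:
        is_dup = any(
            haversine_m(f["lat"], f["lon"], k["lat"], k["lon"]) <= max_dist_m
            and name_similarity(f["name"], k["name"]) >= min_similarity
            for k in kept
        )
        if not is_dup:
            kept.append(f)
    return kept


# Made-up features, for illustration only.
features = [
    {"name": "Example Pot", "lat": 54.1000, "lon": -2.3000},
    {"name": "Example Pot.", "lat": 54.1001, "lon": -2.3001},  # near-duplicate
    {"name": "Example Pot", "lat": 53.0000, "lon": -3.5000},   # same name, far away
]
unique = dedupe(features)
```

Requiring both conditions matters: common names are reused for unrelated caves across the country, and distinct entrances can sit very close together. The pairwise comparison is quadratic, which is fine for a sketch but would want a spatial index on a large dataset.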
I then used QGIS to load up the SSSI shapefiles and performed a "point-in-polygon" intersection with the features dataset in order to label each feature with the details of the SSSI that it falls within.
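I did this step in QGIS rather than in code, but the underlying test is simple enough to sketch. The example below is a plain-Python ray-casting check, not what QGIS does internally: it assumes each site is a single ring of vertices with no holes, and that points and polygons share one coordinate system. Real SSSI boundaries are often multi-part polygons with holes, and the names here (`point_in_ring`, `label_features`, the `x`/`y`/`ring` fields) are hypothetical.

```python
def point_in_ring(x, y, ring):
    """Ray-casting test: True if (x, y) lies inside the ring of (x, y) vertices."""
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]  # wrap around to close the ring
        # Only edges that straddle the horizontal line through the point can be crossed.
        if (y1 > y) != (y2 > y):
            # x coordinate at which the edge crosses that horizontal line
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside  # each crossing to the right flips the state
    return inside


def label_features(features, sites):
    """Attach the name of the first containing site (or None) to each feature."""
    labelled = []
    for f in features:
        match = next(
            (s["name"] for s in sites if point_in_ring(f["x"], f["y"], s["ring"])),
            None,
        )
        labelled.append({**f, "sssi": match})
    return labelled


# Made-up geometry, for illustration only.
sites = [{"name": "Hypothetical SSSI", "ring": [(0, 0), (10, 0), (10, 10), (0, 10)]}]
features = [
    {"name": "Inside Cave", "x": 5, "y": 5},
    {"name": "Outside Mine", "x": 15, "y": 5},
]
labelled = label_features(features, sites)
```

Points lying exactly on a boundary are not handled consistently by this simple test, which is one more reason to lean on a GIS tool for real work.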
The result of this work was a GeoJSON file (1.25 MB, or 120 kB gzipped) of cave/mine features. I then cobbled together a quick visualisation using a Leaflet map to allow the public to explore this dataset:
If you want to learn a little more about this visualisation you can check out the source code yourself:
In science it's important to communicate assumptions that may influence the interpretation of your work. In data science this is even more important. As such, when interpreting this data it's important to note that:
My work here was only a small part of a much bigger research project, and you can download the report for some interesting reading:
To quickly summarise: