SSSI Cave Entrances

Sites of Special Scientific Interest (SSSIs) are nature conservation areas given legal protection from damage, development, and neglect in order to safeguard the natural features they contain. As the name implies, they conserve features of particular biological or geological interest. Caves and mines often host such features, both because of the unique biological habitats they provide and because of the geological processes involved in their formation or excavation.

Earlier this year I received an email from a caver who wanted to explore how well recorded (and therefore protected) these features are. Their idea was to search all SSSI citations for keywords such as "cave", "ogof" (Welsh), and "uamh" (Scottish Gaelic) to determine which SSSIs mention these features. A dataset of known cave and mine features would then be compiled, and those falling within an SSSI boundary extracted. Comparing the two datasets would show how many SSSI citations acknowledge the cave and mine features inside their boundaries.
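The keyword search and comparison described above could look something like the sketch below. It is a minimal illustration, not the project's actual code: the keyword pattern, the `mentions_caves` and `unacknowledged_sssis` helpers, and the example citations are all hypothetical.

```python
import re

# Hypothetical keyword pattern: "cave" (English), "ogof" (Welsh), "uamh" (Scottish Gaelic),
# each optionally followed by a plural "s". Word boundaries stop "concave" from matching,
# but they also skip words like "cavern", so a real keyword list would need to be longer.
CAVE_PATTERN = re.compile(r"\b(?:cave|ogof|uamh)s?\b", re.IGNORECASE)


def mentions_caves(citation_text: str) -> bool:
    """Return True if the citation text contains any of the cave keywords."""
    return CAVE_PATTERN.search(citation_text) is not None


def unacknowledged_sssis(sssis_with_features: set[str], citations: dict[str, str]) -> list[str]:
    """List SSSIs that contain at least one cave/mine feature but whose citation never mentions one.

    A missing citation is treated as empty text, so it counts as unacknowledged.
    """
    return sorted(
        name
        for name in sssis_with_features
        if not mentions_caves(citations.get(name, ""))
    )


# Hypothetical example data, not taken from real SSSI citations.
citations = {
    "Example Limestone SSSI": "The site includes several caves of national importance.",
    "Example Moor SSSI": "An upland moor with a concave valley floor.",
}
```

Here `unacknowledged_sssis({"Example Limestone SSSI", "Example Moor SSSI"}, citations)` would flag only the moor, because "concave" is deliberately not treated as a match.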

I was tasked with compiling the database of known cave and mine features and performing the point-in-polygon intersection with the SSSI boundaries.

Sourcing the data

Whilst there is no national database of cave and mine features in Great Britain, the British Caving Association (BCA) works with regional caving councils to maintain regional registries. Because these registries were developed by the BCA's Cave Registry & Archive working group, they all share a similar format, which made them easy to scrape. I scraped entries from:

Other regional councils have developed their own databases of cave and mine features, which I was also able to scrape. This was a slightly trickier task because each registry is designed differently and some have stronger built-in anti-scraping protection. I sourced data from:

Finally, one of the well-known caving guidebooks has also developed an online dataset of cave and mine features, which I scraped for entries that might not appear in the other datasets:

Each of the devolved governments maintains its own publicly available SSSI dataset, so it was fairly straightforward to download the SSSI shapefiles:


I used Python to scrape each of the data sources. I won't share this code publicly because I had to work around some anti-scraping measures, but I made sure to use a suitable delay between requests so as not to overload any of the servers hosting the data.
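Since the scraping code itself isn't shared, here is only a generic sketch of the "suitable delay" idea: fetch pages one at a time and pause between requests. The function names, the five-second default delay, and the User-Agent string are assumptions for illustration, and nothing here deals with anti-scraping measures.

```python
import time
import urllib.request
from typing import Callable


def fetch_text(url: str, timeout: float = 30.0) -> str:
    """Download one page using only the standard library (hypothetical helper)."""
    # Identify the client honestly; the contact address is a placeholder.
    request = urllib.request.Request(
        url, headers={"User-Agent": "cave-register-survey (contact: you@example.org)"}
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8", errors="replace")


def fetch_all_politely(
    urls: list[str],
    fetch: Callable[[str], str] = fetch_text,
    delay_seconds: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
) -> dict[str, str]:
    """Fetch each URL in turn, sleeping between requests so no server is hammered.

    `fetch` and `sleep` are injectable so the pacing can be tested without network access.
    """
    pages = {}
    for index, url in enumerate(urls):
        if index > 0:
            sleep(delay_seconds)  # pause between requests, but not before the first one
        pages[url] = fetch(url)
    return pages
```

A fixed delay is the simplest option; checking each site's robots.txt with the standard library's `urllib.robotparser` would be a sensible addition.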

Once I had scraped features from each data source, I compiled them into a geospatial dataset of point features and performed some fuzzy deduplication based on feature name and location, also in Python.
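The fuzzy deduplication step might look something like the sketch below. It assumes each feature has British National Grid coordinates in metres, so plain Euclidean distance works. The `Feature` class, the 50 m distance limit, and the 0.85 name-similarity threshold are hypothetical choices, not the values used in this project. Name similarity uses the standard library's `difflib.SequenceMatcher`.

```python
import math
import re
from dataclasses import dataclass
from difflib import SequenceMatcher


@dataclass
class Feature:
    name: str
    easting: float   # metres, British National Grid (assumed)
    northing: float  # metres, British National Grid (assumed)
    source: str      # which registry the record came from


def normalise_name(name: str) -> str:
    """Lower-case, turn punctuation into spaces, and collapse runs of whitespace."""
    return " ".join(re.sub(r"[^\w\s]", " ", name.lower()).split())


def is_duplicate(a: Feature, b: Feature, max_distance_m: float, min_similarity: float) -> bool:
    """Treat two records as the same feature if they are close together and similarly named."""
    if math.hypot(a.easting - b.easting, a.northing - b.northing) > max_distance_m:
        return False  # cheap distance check first
    similarity = SequenceMatcher(None, normalise_name(a.name), normalise_name(b.name)).ratio()
    return similarity >= min_similarity


def dedupe(features: list[Feature], max_distance_m: float = 50.0,
           min_similarity: float = 0.85) -> list[Feature]:
    """Greedily keep each feature unless it duplicates one that has already been kept."""
    kept: list[Feature] = []
    for feature in features:
        if not any(is_duplicate(feature, k, max_distance_m, min_similarity) for k in kept):
            kept.append(feature)
    return kept
```

The greedy pairwise comparison is O(n²). For a few thousand points that is manageable, but bucketing features into grid cells first would avoid comparing points that are kilometres apart.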

I then loaded the SSSI shapefiles into QGIS and performed a point-in-polygon intersection with the features dataset to label each feature with the details of the SSSI it falls within.
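QGIS handles the point-in-polygon join for you, but the underlying test is easy to illustrate. The sketch below uses the standard ray-casting (even-odd) rule: cast a horizontal ray from the point and count how many polygon edges it crosses, where an odd count means the point is inside. Each SSSI is simplified to one outer ring plus optional holes in projected coordinates. `label_features` and these input shapes are hypothetical and do not reflect how QGIS stores SSSI geometries.

```python
def point_in_ring(x: float, y: float, ring: list[tuple[float, float]]) -> bool:
    """Even-odd ray casting: toggle 'inside' each time a rightward ray crosses an edge."""
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]  # wrap around to close the ring
        if (y1 > y) != (y2 > y):  # edge straddles the ray's y (so y1 != y2, no divide by zero)
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def point_in_polygon(x, y, outer, holes=()) -> bool:
    """Inside the outer ring and not inside any hole."""
    return point_in_ring(x, y, outer) and not any(point_in_ring(x, y, h) for h in holes)


def label_features(features: dict, sssis: dict) -> dict:
    """Map each feature name to the list of SSSI names it falls within.

    features: {feature_name: (x, y)}; sssis: {sssi_name: (outer_ring, [hole_rings])}.
    """
    return {
        fname: [
            sname
            for sname, (outer, holes) in sssis.items()
            if point_in_polygon(x, y, outer, holes)
        ]
        for fname, (x, y) in features.items()
    }
```

Points lying exactly on a boundary can go either way with this rule, which is one more reason a margin of error is useful for coarse data.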

The map

The result of this work was a GeoJSON file (1.25 MB, or 120 kB gzipped) of cave and mine features. I then put together a quick Leaflet map so that the public can explore this dataset:
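For reference, a GeoJSON file like this one is just a `FeatureCollection` of `Point` features. The sketch below builds one with the standard library and produces both the compact and gzipped bytes. The property names (`name`, `sssi`) and record schema are hypothetical, and rounding to five decimal places (roughly 1 m) with compact separators is one assumed way to keep the file small for a web map. GeoJSON coordinates are ordered longitude, latitude in WGS 84, so British National Grid coordinates would need reprojecting first.

```python
import gzip
import json


def to_feature_collection(records, precision=5):
    """Build a GeoJSON FeatureCollection from dicts with 'name', 'lon', 'lat' and optional 'sssi'.

    The record schema is hypothetical. Coordinates are rounded to `precision` decimal places
    (5 places is roughly 1 m), because extra digits only inflate the file.
    """
    return {
        "type": "FeatureCollection",
        "features": [
            {
                "type": "Feature",
                "geometry": {
                    "type": "Point",
                    # GeoJSON order is [longitude, latitude].
                    "coordinates": [round(r["lon"], precision), round(r["lat"], precision)],
                },
                "properties": {"name": r["name"], "sssi": r.get("sssi")},
            }
            for r in records
        ],
    }


def encode(collection):
    """Return the compact UTF-8 JSON bytes and their gzip-compressed form."""
    raw = json.dumps(collection, separators=(",", ":"), ensure_ascii=False).encode("utf-8")
    return raw, gzip.compress(raw)
```

Gzip pays off on large files full of repeated keys like `"type":"Feature"`; for a handful of points the compressed form can even be larger than the raw JSON.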

If you want to learn a little more about this visualisation, you can check out the source code yourself:

Get it on GitHub


In science, it's important to communicate any assumptions that may influence the interpretation of your work, and in data science this is even more important. When interpreting this data, note that:

  • This is a snapshot in time: both the cave/mine feature databases and the SSSI datasets change over time, but my web scraping was performed in December 2021 and I do not intend to repeat it regularly.
  • This data is only as reliable as the source data that feeds it. SSSI polygons are fairly coarse, and many of the cave/mine features were located by cavers in the dark and pouring rain, reading a six-figure grid reference off a soggy paper OS map. As such, we can't assume that every feature's location is accurate, or that all features within SSSIs have been captured. I tried to compensate for this a little by adding a margin of error to the point-in-polygon intersection, but this is of limited efficacy.
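The exact way the margin of error was applied isn't described here, but one common approach is to count a feature as inside an SSSI if it lies within the polygon or within some distance of its boundary, which is equivalent to buffering the polygon. For scale, a six-figure grid reference only pins a point to a 100 m square, so the true location can be up to about 141 m (100√2) from the square's south-west corner. The `within_margin` helper and any particular margin value are assumptions for illustration.

```python
import math


def point_in_ring(x, y, ring):
    """Even-odd ray casting test for a closed ring of (x, y) vertices."""
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        if (y1 > y) != (y2 > y):
            if x < x1 + (y - y1) * (x2 - x1) / (y2 - y1):
                inside = not inside
    return inside


def distance_to_segment(px, py, ax, ay, bx, by):
    """Shortest distance from point P to segment AB."""
    dx, dy = bx - ax, by - ay
    if dx == 0 and dy == 0:
        return math.hypot(px - ax, py - ay)  # degenerate segment
    # Project P onto the line through AB, then clamp to the segment's ends.
    t = ((px - ax) * dx + (py - ay) * dy) / (dx * dx + dy * dy)
    t = max(0.0, min(1.0, t))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def within_margin(x, y, ring, margin_m):
    """True if the point is inside the ring or within margin_m of any of its edges (hypothetical helper)."""
    if point_in_ring(x, y, ring):
        return True
    n = len(ring)
    return any(
        distance_to_segment(x, y, *ring[i], *ring[(i + 1) % n]) <= margin_m
        for i in range(n)
    )
```

A wider margin catches more mislocated entrances but also sweeps in features that genuinely sit just outside a boundary, which is why this kind of fix only goes so far.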

The results

My work here was only a small part of a much bigger research project, and you can download the report for some interesting reading:

Read the Report

To quickly summarise:

  • 276 SSSIs contain at least one of the 6657 cave/mine features identified
  • Of these 276 SSSIs, 17 citations make no reference to any cave/mine features, whilst a further 70 mention such features only in relation to their use as bat habitat
