The geography of Pubs in the UK: Visualizing Geographical Information with Plotly
In my previous blog post, I applied some text mining techniques to learn more about the names of pubs in the UK. Since the source data also contained the location of each pub, I am writing this second post to learn something different about the drinking culture of this country.
Plotting a map is an excellent way of making sense of datasets that contain information about locations in the physical world. To understand where the pubs are geographically concentrated, I would like to start with a basic scatter plot using the coordinates (latitude and longitude) provided in the dataset.
To do so, I will use Plotly, a flexible data visualization library. The chart below uses Plotly's Mapbox integration to get the base layer of the map and then plots the dots on top of it. We can see that the largest concentrations of pubs are in the Greater London region, the Liverpool-Manchester-Leeds corridor and the region surrounding Newcastle.
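As a rough illustration of the kind of chart described above, here is a minimal sketch that builds a scattermapbox figure description as a plain dictionary. The function names, the sample pub names and the coordinates are made up for this example, and the token string is a placeholder; this is not necessarily how the original chart was produced.

```python
def build_pub_map_spec(names, lats, lons, mapbox_token):
    """Return a Plotly figure description (plain dict) for a pub scatter map."""
    trace = {
        "type": "scattermapbox",       # scatter trace drawn on a Mapbox base layer
        "lat": list(lats),
        "lon": list(lons),
        "mode": "markers",
        "marker": {"size": 4, "opacity": 0.6},
        "text": list(names),           # shown when hovering over a dot
        "hoverinfo": "text+lat+lon",
    }
    layout = {
        "title": {"text": "Pubs in the UK"},
        "hovermode": "closest",
        "mapbox": {
            "accesstoken": mapbox_token,
            "style": "light",
            "center": {"lat": 54.5, "lon": -3.0},  # roughly the middle of Great Britain
            "zoom": 4.5,
        },
    }
    return {"data": [trace], "layout": layout}


def show_map(spec):
    """Render the figure; Plotly is imported lazily so the spec can be built without it."""
    import plotly.graph_objects as go

    go.Figure(data=spec["data"], layout=spec["layout"]).show()


# Illustrative rows only (hypothetical pubs near London and Manchester).
sample_spec = build_pub_map_spec(
    names=["Example Pub A", "Example Pub B"],
    lats=[51.5074, 53.4808],
    lons=[-0.1278, -2.2426],
    mapbox_token="YOUR_MAPBOX_TOKEN",  # placeholder, replace with your own token
)
```

Calling show_map(sample_spec) with a valid token should open the interactive map; with the real dataset you would pass the name, latitude and longitude columns instead of the sample lists.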
If you wish, you can zoom in to get a more detailed look and hover the mouse over a dot to see the pub's name and coordinates. Click here to see a zoomable version of the map.
Note: For this to work, you will need to create a Plotly account and a Mapbox account, and specify the login credentials and Mapbox access token at the top of your script. The documentation provided will walk you through the required steps.
Another way to further understand the geographical distribution of pubs in the UK is to compute the ratio of pubs to the number of people living in each region. Since the pubs dataset contains the full address, we can use the ‘outward’ section of the postcode (the part before the space) to determine the region in which each pub is located.
To achieve this, we need to wrangle our data frame to group the number of pubs by region/city and append the population estimate for that same postcode, which I got from here. To avoid distorting the results, I also decided to drop the postcodes that were outliers. For example, the code EC3M is located in central London, beside the Tower of London; being part of the business district, it has 10 registered pubs but only 6 registered residents.
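The wrangling step can be sketched in plain Python as follows. This is only an illustration under stated assumptions: the function names are hypothetical, the postcodes and population figures are invented (apart from the 6 residents quoted for EC3M), and the minimum-population cutoff is one possible way to drop outliers, not necessarily the rule I applied.

```python
from collections import Counter


def outward_code(postcode):
    """Return the outward part of a UK postcode, e.g. 'EC3M 1AA' -> 'EC3M'.

    The inward part is always the last three characters, so the outward
    part is whatever comes before it once spaces are removed.
    """
    compact = postcode.replace(" ", "").upper()
    return compact[:-3]


def pubs_per_thousand(pub_postcodes, population_by_outward, min_population=1000):
    """Count pubs per outward code and divide by population (per 1,000 residents).

    Codes with no population estimate, or with fewer residents than
    min_population (an assumed outlier rule), are left out.
    """
    counts = Counter(outward_code(pc) for pc in pub_postcodes)
    ratios = {}
    for code, n_pubs in counts.items():
        residents = population_by_outward.get(code)
        if residents is None or residents < min_population:
            continue
        ratios[code] = 1000 * n_pubs / residents
    return ratios


# Invented sample data for illustration.
sample_postcodes = ["EC3M 1AA", "EC3M 2BB", "M1 1AE", "M1 2AB", "M1 3CD", "NE1 4ST"]
sample_population = {"EC3M": 6, "M1": 3000, "NE1": 2000}
sample_ratios = pubs_per_thousand(sample_postcodes, sample_population)
```

In this toy example EC3M is dropped because of its tiny resident population, while M1 and NE1 keep a ratio expressed as pubs per 1,000 residents.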
Now we can use the same scattermapbox visualization with the new data frame. We make the size and color of each marker proportional to the ratio we calculated, to get an accurate representation of the regions that have the most pubs per capita. Go here to see a zoomable version of the map in finer detail.
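One way to express "size and color proportional to the ratio" is through the marker settings of the scattermapbox trace. The sketch below is an assumption about how this could be set up: the function name, the 30-pixel maximum and the Viridis colorscale are illustrative choices, not the settings of the original chart.

```python
def ratio_markers(ratios, max_size_px=30):
    """Build a scattermapbox marker dict whose size and color follow the ratios."""
    top = max(ratios)
    return {
        # Scale linearly so the largest ratio gets the largest marker.
        "size": [max_size_px * r / top for r in ratios],
        "sizemin": 3,             # keep very small ratios visible
        "color": list(ratios),    # numeric values are mapped onto the colorscale
        "colorscale": "Viridis",
        "showscale": True,        # draw a color bar as a legend for the ratio
    }


# Three illustrative ratios (pubs per 1,000 residents).
sample_markers = ratio_markers([0.5, 1.0, 2.0])
```

The resulting dictionary can be passed as the marker of a scattermapbox trace, alongside the latitude and longitude of each postcode area.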
A useful aid for making sense of the information is the approximate distribution of the ratios. The mean value is nearly 1 pub for every 1,000 residents, and anything above roughly 1.3 is in the top 25% of the dataset.
mean 0.950071
std 0.566517
min 0.030396
25% 0.505923
50% 0.845728
75% 1.294305
max 2.986858
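A summary like the one above can be reproduced with the standard library alone. The sketch below uses a hypothetical helper name and a handful of invented ratios; the "inclusive" quantile method interpolates linearly between data points, which should correspond to what data frame libraries do by default, though that is an assumption about how the original figures were computed.

```python
import statistics


def summarize_ratios(values):
    """Return mean, sample standard deviation, min, quartiles and max."""
    q1, q2, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return {
        "mean": statistics.fmean(values),
        "std": statistics.stdev(values),  # sample standard deviation (n - 1)
        "min": min(values),
        "25%": q1,
        "50%": q2,
        "75%": q3,
        "max": max(values),
    }


# Invented ratios for illustration only.
sample_summary = summarize_ratios([0.2, 0.5, 0.8, 1.3, 2.9])
for label, value in sample_summary.items():
    print(f"{label} {value:.6f}")
```

Any area whose ratio exceeds the "75%" value belongs to the top quarter of the distribution, which is the threshold used when reading the map.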
We can use this information to learn more about the visualization. This is a snapshot zoomed in on central London. Because of their size, we know that all the dots shown here are above the 75th percentile, meaning that central London as a whole has a very high concentration of pubs relative to its population.
There are a lot of cool things to do with this dataset. I think the map view helps with interpreting the numbers and can also lead to interesting discoveries.
If you want to connect and have a conversation about this or any other topic, please feel free to find me on LinkedIn: