Shake it up! Network Analytics of Cocktail Ingredients

Nov 21, 2019

Network theory studies the relations between entities; in other words, it tries to reveal the different ways things are connected to each other. The best-known use case for network analysis is social networks and the many ways people are connected on online social media platforms.

In this blog post, I will show a different application of the same theory, using a Python library called NetworkX to transform data sets into networks and another great library called Netwulf to create stunning visualizations of the network.

My idea for this blog was inspired by a University of Cambridge paper called “Network analysis and data mining in food science: the emergence of computational gastronomy”, where network analytics was applied to food recipes. I thought I could adapt a similar approach for cocktail recipes and thus this blog was born.

I went online and rummaged through several forums until I found a dataset that had been put together from an open-source cocktail directory. The database contained 500+ cocktail recipes, detailing the ingredients required to prepare each drink. I was interested in learning how the individual ingredients are connected to each other in this “universe” of cocktail recipes in order to answer questions like:

⚫ What is the most versatile spirit used for cocktail preparation?
⚫ Will Guinness stout pair well with coffee?
⚫ Which kinds of ingredients should never be used alongside sugar? etc.

So, let me walk you through the entire process while I highlight the most interesting findings.

Data Preprocessing: The most challenging part of the process was transforming the list of recipes into an ingredient adjacency matrix. This meant creating a square pairwise matrix with the n ingredients as rows and the same n ingredients as columns. Each cell was populated with the number of recipes in which that pair of ingredients co-occurred.

For example, in the intersection of “vodka” and “grenadine” you will find the value of 4, which stands for the 4 recipes that contained both ingredients (Addison Special, Arizona Twister, Jitterbug and San Francisco).
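As a minimal sketch of this step, assuming the recipes are available as a mapping from drink name to ingredient list, the pair counts can be built with the standard library alone. The toy recipes and the helper names `cooccurrence_counts` and `to_matrix` are made up for illustration and are not taken from the original script.

```python
from collections import Counter
from itertools import combinations

# Hypothetical toy recipes; the real dataset has 500+ cocktails.
recipes = {
    "Drink A": ["vodka", "grenadine", "orange juice"],
    "Drink B": ["vodka", "grenadine", "ice"],
    "Drink C": ["gin", "lemon juice", "ice"],
}


def cooccurrence_counts(recipes):
    """Count, for every unordered ingredient pair, the recipes containing both."""
    counts = Counter()
    for ingredients in recipes.values():
        # sorted(set(...)) drops duplicates and fixes the order inside each pair
        for a, b in combinations(sorted(set(ingredients)), 2):
            counts[(a, b)] += 1
    return counts


def to_matrix(counts):
    """Expand the pair counts into a symmetric n x n matrix (list of lists)."""
    names = sorted({name for pair in counts for name in pair})
    index = {name: i for i, name in enumerate(names)}
    matrix = [[0] * len(names) for _ in names]
    for (a, b), n in counts.items():
        matrix[index[a]][index[b]] = n
        matrix[index[b]][index[a]] = n
    return names, matrix


counts = cooccurrence_counts(recipes)
names, matrix = to_matrix(counts)
print(counts[("grenadine", "vodka")])  # prints 2 for the toy data
```

In this toy data, vodka and grenadine share two recipes, in the same way that they share four in the real dataset.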

After pre-processing the information, the network is created as a Graph object using the NetworkX library. The resulting network contained 307 nodes, each representing one ingredient. These nodes are connected via 2,513 edges; each edge indicates that the two ingredients appear together in at least one recipe. The summary statistics of the resulting network are the following:

The minimum number of connections per node is 3 (e.g. carrots, kiwi, papaya).
The maximum number of connections per node is 137 (vodka).
The average number of connections per node is 16.37.

In other words, on average, each ingredient is connected to 16 other ingredients. The extremely well-connected ingredients of the network are those in the right-hand tail of the distribution: vodka with 137 connections, sugar with 100 connections, orange juice and ice with 97 connections, gin with 89 connections and lemon juice with 85 connections.
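The statistics above are degree statistics, and a rough sketch of how they could be computed with NetworkX follows. The `pair_counts` dictionary is a hypothetical stand-in for the real co-occurrence matrix, and storing the recipe count as an edge `weight` is an assumption rather than something the original script is known to do.

```python
import networkx as nx

# Hypothetical pair counts standing in for the real co-occurrence matrix.
pair_counts = {
    ("vodka", "grenadine"): 4,
    ("vodka", "orange juice"): 3,
    ("vodka", "ice"): 2,
    ("gin", "ice"): 1,
    ("gin", "lemon juice"): 2,
}

G = nx.Graph()
for (a, b), n in pair_counts.items():
    # One edge per co-occurring pair; the recipe count is kept as the weight.
    G.add_edge(a, b, weight=n)

degrees = dict(G.degree())  # ingredient -> number of distinct partners
min_degree = min(degrees.values())
max_degree = max(degrees.values())
avg_degree = sum(degrees.values()) / G.number_of_nodes()

# Ingredients ordered from most to least connected
ranking = sorted(degrees, key=degrees.get, reverse=True)

print(G.number_of_nodes(), G.number_of_edges())
print(min_degree, max_degree, round(avg_degree, 2))
print(ranking[0])
```

Note that the average degree always equals twice the number of edges divided by the number of nodes, which is consistent with the figures reported above (2 × 2,513 / 307 ≈ 16.37).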

Visualizing the network: Per the project’s documentation, Netwulf is an interactive visualization tool for NetworkX Graph objects that allows you to produce beautiful-looking network visualizations. Adding a line of code to your script will launch an interactive web-based environment where the network visualization can be modified and then saved as an image. Here is a GIF of my network, where you can see Netwulf’s customization capabilities in action.

In order to convey more information, I added two attributes to each node in the network: size and group. The size was determined by the percentage of recipes in which an ingredient appeared (the most prevalent one is vodka, appearing in 16% of all recipes). The group was determined by classifying the ingredients into three categories: alcohols, mixers and garnishes. In the graph, the size attribute was used to determine the diameter of each node and the group attribute was used to color-code the nodes according to their category.
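One possible way to attach these two attributes is `nx.set_node_attributes`, sketched below. The recipe counts, the category map and the toy graph are invented for illustration, and the sketch assumes that Netwulf picks up node attributes named `size` and `group`, as the attribute names in the text suggest. The call that opens the interactive editor is left commented out because it launches a browser tab.

```python
import networkx as nx

G = nx.Graph()
G.add_edges_from([("vodka", "orange juice"), ("vodka", "lemon"), ("gin", "lemon")])

# Hypothetical inputs: recipes per ingredient and a hand-made category map.
total_recipes = 10
recipe_counts = {"vodka": 4, "gin": 2, "orange juice": 3, "lemon": 5}
category = {
    "vodka": "alcohol",
    "gin": "alcohol",
    "orange juice": "mixer",
    "lemon": "garnish",
}

# size = share of recipes that contain the ingredient, in percent
size = {name: 100 * recipe_counts[name] / total_recipes for name in G.nodes}

nx.set_node_attributes(G, size, "size")
nx.set_node_attributes(G, category, "group")

print(G.nodes["vodka"])

# Launching the interactive editor (opens a browser tab), commented out here:
# from netwulf import visualize
# visualize(G)
```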

This is the static final version of my visualization:

Ingredients that are highly connected will tend to appear in the centre of the network and those with few connections will appear in the periphery. Take the example of 'lemon', which seems to be a very central garnish and which goes well with gin or light rum as well as with Coca-Cola or club soda.

The algorithm that plots the network automatically creates some interesting clusters of ingredients. One example is the bottom right corner of the network, where we can see a very specific group of ingredients that are only used in certain types of cocktails: caramel sauce, Snickers bar, whipped cream, vanilla, coffee and milk. We can see that the 2 main alcohols associated with these ingredients are Bailey’s and Amaretto.

We can use the network object for several other things. For example, we can easily retrieve all the neighboring nodes of any given node. This gives us an understanding of the direct connections of this ingredient to other ingredients in the network. Let’s take the case of a somewhat rare ingredient for cocktail making, like ‘corona’ beer.

In:

list(nx.neighbors(H,'corona'))

Out:

['corona', 'bacardi limon', 'light rum']

If we were interested in learning more about the connectedness of ingredients, we could explore the concept of degrees of separation. The nx.neighbors function will give you the direct (or first-degree) connections, but there is a way of obtaining a list of all other ingredients that are within 2 degrees of separation.

To do so, we can loop over the neighbors, calling the same function and thus getting a list of ingredients within 2 degrees of separation of the original node:

In:

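ingredient_analysis = []
# Collect the neighbors of every direct neighbor of 'corona'
for i in nx.neighbors(H, 'corona'):
    ingredient_analysis.extend(nx.neighbors(H, i))
# Remove duplicates before listing the result
list(set(ingredient_analysis))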

Out:

['cointreau',
 'mint',
 'egg white',
 'vanilla ice-cream',
 'pineapple',
 'pineapple juice',
 'powdered sugar',
 'ice',
   ....
 'cherry',
 'sherry',
 'banana',
 'bourbon',
 'brandy']

The resulting list is 71 ingredients long, and if we think of the ingredient network as analogous to a social network, these ingredients can be interpreted as the “friends you might know” of corona beer. I am not a mixology expert and I am certainly not qualified to say that a corona beer might pair well with egg white. However, this analysis might be useful for getting insights into possible innovations when you have limited availability of ingredients.
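The same two-degree lookup can also be sketched with NetworkX's `single_source_shortest_path_length` and a `cutoff` of 2, as an alternative to the nested neighbor calls. The toy graph and the helper name `within_two_degrees` are hypothetical. Unlike the loop, this variant leaves the starting ingredient out of the result.

```python
import networkx as nx

# Hypothetical toy graph; in the post the graph object is called H.
H = nx.Graph()
H.add_edges_from([
    ("corona", "light rum"),
    ("corona", "bacardi limon"),
    ("light rum", "mint"),
    ("light rum", "pineapple juice"),
    ("mint", "bourbon"),
])


def within_two_degrees(graph, source):
    """Return the ingredients 1 or 2 hops from `source`, excluding the source."""
    distances = nx.single_source_shortest_path_length(graph, source, cutoff=2)
    return {node for node, hops in distances.items() if 1 <= hops <= 2}


print(sorted(within_two_degrees(H, "corona")))
# prints ['bacardi limon', 'light rum', 'mint', 'pineapple juice']
```

In the toy graph, bourbon is three hops away from corona, so it is not included.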

If you want to connect and have a conversation about this or any other topic, please feel free to find me on LinkedIn:

https://www.linkedin.com/in/juan-felipe-alvarez-analytics/

Written by Juan Felipe Alvarez Jaramillo