Chemical Structures with Bokeh
Bokeh is a well-documented and versatile visualisation package with a Python API. Check out their documentation here.
I have been looking for a way to show chemical structures as a hover effect on interactive plots, diagrams and figures, and experimented with a few packages before settling on Bokeh.
This post demonstrates a proof of concept for achieving that with Bokeh. I hope you enjoy it, and please let me know if you know of other ways to achieve the same result with different interactive visualisation packages in Python.
First let’s define what we will be visualising. I think something simple and amenable to change will serve us just fine.
As an example, let’s look at the chemical synthesis of paracetamol and how the molecular weight changes at each step of the route.
The code and data can also be found here.
# import modules
import base64
import io

import pandas as pd
from rdkit import Chem
from rdkit.Chem import Draw
import rdkit.Chem.Descriptors as Desc
from bokeh.plotting import ColumnDataSource, figure, output_notebook, show
from bokeh.models import LinearAxis, Range1d

# read in the route data
route_data = pd.read_csv('Paracetamol.csv')
route_data
| | Step | SMILES |
|---|---|---|
| 0 | 0 | C1=CC=CC(=C1)OC |
| 1 | 1 | C1=C(C=CC(=C1)O[H])[N+](=O)[O-] |
| 2 | 2 | C1=C(C=CC(=C1)O[H])N([H])[H] |
| 3 | 3 | C1=C(C=CC(=C1)O[H])N([H])C(C)=O |
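This is a short three-step route to paracetamol, starting from phenol: a nitration, a reduction to the amine, and an acylation. Note that the step 0 SMILES in the table, C1=CC=CC(=C1)OC, actually encodes anisole (methoxybenzene); phenol itself would be C1=CC=CC(=C1)O.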
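Let’s calculate the molecular weight of each compound and store it in a new column. Note that ExactMolWt returns the monoisotopic mass; Desc.MolWt would give the average molecular weight instead.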
mols = [Chem.MolFromSmiles(smi) for smi in route_data['SMILES']]
mws = [Desc.ExactMolWt(mol) for mol in mols]
route_data['MW'] = mws
route_data
| | Step | SMILES | MW |
|---|---|---|---|
| 0 | 0 | C1=CC=CC(=C1)OC | 108.057515 |
| 1 | 1 | C1=C(C=CC(=C1)O[H])[N+](=O)[O-] | 139.026943 |
| 2 | 2 | C1=C(C=CC(=C1)O[H])N([H])[H] | 109.052764 |
| 3 | 3 | C1=C(C=CC(=C1)O[H])N([H])C(C)=O | 151.063329 |
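As a sanity check on the MW column, the same numbers can be reproduced without RDKit by summing monoisotopic atomic masses. This is only a minimal sketch: the element counts in `FORMULAS` were derived by hand from the SMILES above, and the names `MONOISOTOPIC`, `FORMULAS` and `exact_mass` are made up for this illustration.

```python
# Monoisotopic masses (in Da) of the most abundant isotope of each element.
MONOISOTOPIC = {"C": 12.0, "H": 1.00782503, "N": 14.00307400, "O": 15.99491462}

# Element counts per step, derived by hand from the SMILES in the table
# (an assumption of this sketch, not something read from the CSV).
FORMULAS = {
    0: {"C": 7, "H": 8, "O": 1},          # C7H8O
    1: {"C": 6, "H": 5, "N": 1, "O": 3},  # C6H5NO3
    2: {"C": 6, "H": 7, "N": 1, "O": 1},  # C6H7NO
    3: {"C": 8, "H": 9, "N": 1, "O": 2},  # C8H9NO2
}


def exact_mass(formula):
    """Sum the monoisotopic masses of all atoms in a formula."""
    return sum(MONOISOTOPIC[element] * count for element, count in formula.items())


for step, formula in FORMULAS.items():
    print(step, round(exact_mass(formula), 4))
```

The printed values should agree with the MW column to within rounding, which confirms that the column holds monoisotopic masses rather than average molecular weights.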
Great! Now we can visualise the above in a Bokeh plot:
- let’s get the image for each molecule from its mol object that we generated above
imgs = [Draw.MolToImage(mol) for mol in mols]
- build a list of base64 data URLs, one per image, that we can feed to the Bokeh API:
urls = []
for img in imgs:
    buffer = io.BytesIO()  # initialise an in-memory buffer
    img.save(buffer, format='PNG')  # write the image into the buffer as PNG
    byte_im = buffer.getvalue()  # retrieve the raw PNG bytes
    url = 'data:image/png;base64,'  # data URL prefix: MIME type, then ';base64,'
    url += base64.b64encode(byte_im).decode('utf-8')  # append the base64-encoded image
    urls.append(url)  # append to the list
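The exact shape of the data URL matters: the browser expects `data:<mime type>;base64,<encoded bytes>`, with a semicolon before `base64` and a comma before the payload. The standard-library sketch below shows the round trip on a few bytes. The helper names `to_data_url` and `from_data_url` are hypothetical, and the 8-byte PNG signature only stands in for a real image.

```python
import base64


def to_data_url(payload: bytes, mime: str = "image/png") -> str:
    """Wrap raw bytes in a base64 data URL: data:<mime>;base64,<data>."""
    encoded = base64.b64encode(payload).decode("ascii")
    return f"data:{mime};base64,{encoded}"


def from_data_url(url: str) -> bytes:
    """Recover the raw bytes from a base64 data URL."""
    header, encoded = url.split(",", 1)
    if not header.endswith(";base64"):
        raise ValueError("not a base64 data URL")
    return base64.b64decode(encoded)


# The 8-byte PNG file signature stands in for a real image here.
png_signature = b"\x89PNG\r\n\x1a\n"
demo_url = to_data_url(png_signature)
print(demo_url)
print(from_data_url(demo_url) == png_signature)
```

Decoding the URL back to the original bytes is a quick way to check that the prefix and the payload were put together correctly before handing the strings to a plot.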
source = ColumnDataSource(data=dict(x=route_data['Step'], y=route_data['MW'], imgs=urls))
# HTML template for the hover tooltip; @imgs, @x and @y refer to the columns of `source`
TOOLTIPS = """
    <div>
        <div>
            <img
                src="@imgs" height="150"
                style="float: left; margin: 2px 2px 2px 2px;"
                border="2"
            ></img>
        </div>
        <div>
            <span style="font-size: 15px; font-weight: bold; ">Step: </span>
            <span style="font-size: 15px; ">@x </span>
        </div>
        <div>
            <span style="font-size: 15px; font-weight: bold; ">MW: </span>
            <span style="font-size: 15px; ">@y </span>
        </div>
    </div>
"""
# create the figure
p = figure(width=800, height=800,  # plot_width/plot_height were removed in Bokeh 3.0
           x_range=(-0.2, route_data['Step'].max() + 0.2),
           y_range=(route_data['MW'].min() - 10, route_data['MW'].max() + 10),
           tools='hover,pan,wheel_zoom,box_zoom,reset,save',
           tooltips=TOOLTIPS,
           title='Route Intermediate Visualisation')
p.scatter('x', 'y', fill_alpha=0.8, size=10, source=source)  # scatter plot; the default marker is a circle
p.xaxis.axis_label = 'Step number'
p.yaxis.axis_label = 'MW for each compound at each step'
output_notebook()
show(p)
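The link between the tooltip template and the data source is the `@name` syntax: each `@name` is filled in with the value of column `name` for the point under the cursor. Bokeh does this in the browser; the snippet below is only a plain-Python imitation of the idea, not Bokeh's actual implementation. The function `render_tooltip` and the demo values are hypothetical, and leaving unknown names untouched is a choice made for this sketch.

```python
import re


def render_tooltip(template: str, columns: dict, index: int) -> str:
    """Replace each @name with the value of column `name` at row `index`."""
    def lookup(match):
        name = match.group(1)
        # Unknown names are left untouched so that typos are easy to spot.
        return str(columns[name][index]) if name in columns else match.group(0)
    return re.sub(r"@(\w+)", lookup, template)


# Stand-in columns; in the post these would be the Step, MW and data URL lists.
demo_columns = {"x": [0, 1], "y": [108.06, 139.03], "imgs": ["url-0", "url-1"]}
demo_template = '<img src="@imgs"> Step: @x MW: @y'
print(render_tooltip(demo_template, demo_columns, 1))
```

This is why the dictionary keys passed to ColumnDataSource (x, y and imgs) have to match the names used after `@` in the template.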
I find it very useful to be able to hover over a point and have the chemical structure come up. This allows direct interaction with the data and can help you understand it faster.
I hope that you find this useful.