How to combine interesting quantum computing properties with a classic Machine Learning technique
Introduction
Although quantum computers are not yet accessible to everyone, Quantum Machine Learning (QML) is a promising field of study because it uses the intrinsically probabilistic nature of quantum systems to develop models. Right now, data scientists around the world are trying to understand how to leverage the quantum paradigm to produce better, more scalable models. It is not possible to say when this will happen, because it also depends on the evolution of quantum hardware, but progress in this area is accelerating.
In my recent studies I have been designing Variational Quantum Classifiers (VQCs), as you can see in a previous post I wrote. They are an interesting case study if, like me, you are just starting to learn QML.
However, lately I also started to study a quantum approach to the Support Vector Machine (SVM) and I was intrigued by how SVM could be translated into the quantum world.
As I was studying VQCs, I was very biased and I was trying to guess how the SVM could be translated into a parameterizable quantum circuit, but I found that the quantum enhancement here works differently, which was a nice surprise and helped me to open my mind about this subject.
In this post I start with a brief introduction to SVM, then show how to take a Quantum Machine Learning (QML) approach to this technique, and finish with an example of a quantum enhanced SVM (QSVM) using the Titanic dataset.
SVM and kernels
I present here the SVM focused on classification problems, that is, the Support Vector Classifier (SVC). The objective of SVC is to find a hyperplane that separates data from different classes with the best possible margin. This doesn’t seem very helpful at first, right?
But what is this hyperplane that separates the classes? Suppose we are working with data in a two-dimensional vector space and we have two classes, as in Figure 1.
Figure 1— 2D data with a very clear linear separator — Image by the author
In this example we have data points from 2 different classes and we can easily draw a line separating both. Our solid line is the hyperplane that separates our data with the best possible margin, as seen by the dashed lines. Thus, SVM tries to find the best separator.
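To make the idea of a separating line and its margin concrete, here is a minimal sketch using scikit-learn's SVC with a linear kernel. The toy clusters, the random seed and the variable names are assumptions made up for illustration; they are not the data behind Figure 1.

```python
import numpy as np
from sklearn.svm import SVC

# Hypothetical toy data: two well-separated clusters in 2D
rng = np.random.default_rng(0)
class_a = rng.normal(loc=[-2.0, -2.0], scale=0.5, size=(20, 2))
class_b = rng.normal(loc=[2.0, 2.0], scale=0.5, size=(20, 2))
X = np.vstack([class_a, class_b])
y = np.array([0] * 20 + [1] * 20)

# A linear kernel searches for a separating line w . x + b = 0
clf = SVC(kernel="linear", C=1.0).fit(X, y)

w = clf.coef_[0]        # normal vector of the separating line
b = clf.intercept_[0]   # offset of the separating line
# Distance between the two margin lines (the dashed lines in a plot like Figure 1)
margin_width = 2.0 / np.linalg.norm(w)
train_accuracy = clf.score(X, y)
print(w, b, margin_width, train_accuracy)
```

The wider `margin_width` is, the more room the classifier leaves between the two classes.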
You may think that my example was too naive and a line is a very particular case of a hyperplane, which is a valid point. What if our two-dimensional data looks like Figure 2?
Figure 2— 2D data with nonlinear separator — Image by the author
In this case we can’t draw a line that separates our data correctly. Looking at the figure, a circle would be a good separator. However, a circle is neither a line nor a plane, so SVM can’t solve this problem directly. This is where the coolest SVM trick comes in, and where a high-dimensional hyperplane appears!
What if we transform this data into a higher-dimensional vector space? For example:
(equation image by the author)
So we could draw a plane:
(equation image by the author)
Which separates the two classes optimally, as seen in Figure 3:
Figure 3— Plane that separates our data in a higher dimensional space — Image by the author
In our case, the function f is the feature map behind what we call the kernel: it projects data into a higher-dimensional space, where it is easier to find a hyperplane that correctly separates data from different classes. Strictly speaking, the kernel is the inner product between two projected points, which measures how similar they are.
This was a very brief introduction to kernels and SVM. If you are interested in a more detailed explanation of SVM, I recommend these two posts (1 and 2), which are really good introductions and which I used as references for this post.
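The lifting idea can be sketched in a few lines. The feature map below, (x1, x2) -> (x1, x2, x1² + x2²), is an assumption chosen for illustration, since the exact transformation is only shown as an image, and the ring-shaped data is synthetic. A linear SVC fails on the raw 2D points but separates them once the third coordinate is added.

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(1)

def ring(radius, n):
    """Sample n noisy points around a circle of the given radius."""
    angles = rng.uniform(0.0, 2.0 * np.pi, size=n)
    radii = radius + rng.normal(scale=0.1, size=n)
    return np.column_stack([radii * np.cos(angles), radii * np.sin(angles)])

# Hypothetical data: an inner ring (class 0) surrounded by an outer ring (class 1)
X = np.vstack([ring(1.0, 50), ring(3.0, 50)])
y = np.array([0] * 50 + [1] * 50)

def lift(points):
    """Assumed feature map: (x1, x2) -> (x1, x2, x1**2 + x2**2)."""
    squared_radius = np.sum(points ** 2, axis=1, keepdims=True)
    return np.hstack([points, squared_radius])

# Same linear classifier, before and after lifting the data
acc_2d = SVC(kernel="linear").fit(X, y).score(X, y)
acc_3d = SVC(kernel="linear").fit(lift(X), y).score(lift(X), y)
print(acc_2d, acc_3d)
```

In the lifted space the two rings sit at different heights, so a flat plane is enough to split them.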
You might be thinking now that my example was very convenient to explain the kernel concept, but how in real life do we find a suitable kernel that solves our problems? There are some kernels that are very flexible and help to solve a good number of problems, like the Radial Basis Function (RBF), which is the default option of scikit-learn’s SVC. If you are interested in learning more about this kernel, I recommend this post. An important detail about kernels such as the RBF is that the SVM never needs the higher-dimensional coordinates explicitly: all it uses is the similarity matrix between data points computed with the kernel.
However, what if we want to be more creative? If you have read my previous posts you might remember that one of the most interesting properties of quantum computing is the exponential relation between qubits and quantum states. Thus, a quantum system is a very interesting candidate for a good kernel, as it tends to drive our system towards a high-dimensional vector space, depending on the number of qubits we are using.
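The similarity-matrix view can be made concrete with a short sketch. For the RBF kernel, each entry of the matrix is exp(-gamma * ||x_i - x_j||²). The four data points, the labels and the values of `gamma` and `C` below are hypothetical. The matrix is computed by hand, compared with scikit-learn's `rbf_kernel`, and then passed to an SVC through `kernel="precomputed"`.

```python
import numpy as np
from sklearn.metrics.pairwise import rbf_kernel
from sklearn.svm import SVC

# Hypothetical data: two points per class
X = np.array([[0.0, 0.0], [0.1, 0.2], [1.0, 1.0], [1.1, 0.9]])
y = np.array([0, 0, 1, 1])
gamma = 0.5

# Entry (i, j) is exp(-gamma * ||x_i - x_j||^2)
sq_dists = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
gram_manual = np.exp(-gamma * sq_dists)
gram_sklearn = rbf_kernel(X, X, gamma=gamma)

# The SVC only ever sees the similarity matrix, never the lifted coordinates
clf = SVC(kernel="precomputed", C=10.0).fit(gram_manual, y)
predictions = clf.predict(gram_manual)
print(np.round(gram_manual, 3))
print(predictions)
```

This is the same interface a quantum kernel plugs into: anything that can fill in this matrix can be used by the SVC.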
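As a rough illustration of that exponential relation, the sketch below builds the state vector of n qubits in the all-zero state with repeated Kronecker products and records its length. The helper name `zero_state` is made up for this example.

```python
import numpy as np

def zero_state(num_qubits):
    """State vector of |00...0>, built by repeated Kronecker products."""
    single = np.array([1.0, 0.0])  # one qubit in state |0>
    state = np.array([1.0])
    for _ in range(num_qubits):
        state = np.kron(state, single)
    return state

# Number of amplitudes needed to describe n qubits: 2 ** n
dimensions = {n: zero_state(n).size for n in (1, 2, 4, 10)}
print(dimensions)
```

Each extra qubit doubles the size of the vector space the kernel can work in.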
Quantum Kernels
Quantum kernels are usually defined by a similarity matrix based on a quantum circuit, which may or may not be parameterizable. Both PennyLane and Qiskit have built-in functions that create kernels that can be used in scikit-learn’s SVC.
Designing a quantum kernel involves a few steps:
Embedding data into quantum states
(equation image by the author)
Designing a quantum circuit that might be parameterizable or not
(equation image by the author)
At this stage, it is highly recommended to work with some degree of superposition and entanglement between states to obtain the best that quantum computing can provide.
Building the similarity matrix
Here we work with the unitary U(|x>) that we built in the last step and its adjoint to design a similarity matrix.
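One common way to write this step, and the one that matches the example code in this post (which reads out the probability of the all-zero outcome), is the squared overlap between the two embedded states. The notation is an assumption: U(x) stands for the embedding followed by the circuit, and n is the number of qubits.

```latex
k(x_i, x_j) = \left| \langle 0^{\otimes n} | \, U^\dagger(x_j) \, U(x_i) \, | 0^{\otimes n} \rangle \right|^2,
\qquad
K_{ij} = k(x_i, x_j)
```

With this definition k(x, x) = 1 for every x, so the diagonal of the similarity matrix K is always 1.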
Example
Here we are designing a simple quantum kernel with Pennylane to use it with an SVC from scikit-learn for the Titanic Classification dataset, where we want to predict whether a person survived the Titanic tragedy based on variables such as age, gender and boarding class.
In our example we are using the following variables:
is_child: if the age of the person is less than 12 (boolean)
Pclass_1: if the person boarded in the first class (boolean)
Pclass_2: if the person boarded in the second class (boolean)
Sex_female: if the gender of the person is female (boolean)
As you can see, this is a very simple model with four boolean variables. We embed our data into quantum states using Basis Embedding, which maps each boolean to the |0> or |1> state of one qubit, then apply Hadamard gates to put the qubits into superposition and CNOT gates to generate entanglement.
Figure 4— Ansatz for our kernel example — Image by the author
This is a simple and non-parameterizable ansatz, but it generates superposition and entanglement between our variables.
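As a sanity check on what this ansatz produces, here is a small NumPy statevector sketch. It is a stand-in for the PennyLane circuit, not the library's implementation, and every helper name is hypothetical. It prepares the basis state for a feature vector, applies the Hadamard and CNOT gates in a ring, and computes the squared overlap between two inputs. Because the gates after the embedding are identical for every input and distinct basis states are orthogonal, the overlap comes out as 1 for identical feature vectors and 0 otherwise, which is worth keeping in mind when reading the results.

```python
import numpy as np

NUM_QUBITS = 4
H = np.array([[1.0, 1.0], [1.0, -1.0]]) / np.sqrt(2.0)

def embed_state(bits):
    """Basis embedding: the computational basis state |b0 b1 b2 b3>."""
    state = np.zeros((2,) * NUM_QUBITS)
    state[tuple(int(b) for b in bits)] = 1.0
    return state

def apply_single(state, gate, wire):
    """Apply a one-qubit gate to the given wire of the state tensor."""
    state = np.tensordot(gate, state, axes=([1], [wire]))
    return np.moveaxis(state, 0, wire)

def apply_cnot(state, control, target):
    """Flip the target qubit on the part of the state where control is 1."""
    state = state.copy()
    index = [slice(None)] * state.ndim
    index[control] = 1
    # The control axis disappears after indexing, so later axes shift by one
    axis = target - 1 if target > control else target
    state[tuple(index)] = np.flip(state[tuple(index)], axis=axis).copy()
    return state

def ansatz_state(bits):
    """Embedding followed by Hadamard + CNOT on each wire, in a ring."""
    state = embed_state(bits)
    for j in range(NUM_QUBITS):
        state = apply_single(state, H, j)
        state = apply_cnot(state, j, (j + 1) % NUM_QUBITS)
    return state

def overlap_kernel(x1, x2):
    """Squared overlap between the states prepared from x1 and x2."""
    return abs(np.vdot(ansatz_state(x2), ansatz_state(x1))) ** 2

same = overlap_kernel((1, 0, 1, 0), (1, 0, 1, 0))
different = overlap_kernel((1, 0, 1, 0), (1, 0, 0, 0))
print(round(same, 6), round(different, 6))
```

A parameterizable circuit, or an embedding that does not map inputs to orthogonal states, would give similarity values between 0 and 1 instead.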
The code to create the kernel and SVM is here:
import pennylane as qml
from pennylane import numpy as np
from sklearn.model_selection import train_test_split
import pandas as pd
from sklearn.metrics import accuracy_score
from sklearn.metrics import f1_score, precision_score, recall_score
from sklearn.svm import SVC
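num_qubits = 4

def layer(x, wires):
    # Encode the four boolean features as a computational basis state
    qml.BasisEmbedding(x, wires=wires)
    # Put each qubit in superposition and entangle it with the next one (ring)
    for j, wire in enumerate(wires):
        qml.Hadamard(wires=[wire])
        if j != num_qubits - 1:
            qml.CNOT(wires=[j, j + 1])
        else:
            qml.CNOT(wires=[j, 0])

def ansatz(x, wires):
    layer(x, wires)

adjoint_ansatz = qml.adjoint(ansatz)

dev = qml.device("default.qubit", wires=num_qubits, shots=None)
wires = dev.wires.tolist()

@qml.qnode(dev, interface="autograd")
def kernel_circuit(x1, x2):
    # Apply the circuit for x1, then the adjoint of the circuit for x2
    ansatz(x1, wires=wires)
    adjoint_ansatz(x2, wires=wires)
    return qml.probs(wires=wires)

def kernel(x1, x2):
    # Probability of the all-zero outcome, i.e. the overlap between the two states
    return kernel_circuit(x1, x2)[0]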
df_train = pd.read_csv('train.csv')
df_train['Pclass'] = df_train['Pclass'].astype(str)
# One-hot encode the categorical columns as integers (0/1)
df_train = pd.concat([df_train, pd.get_dummies(df_train[['Pclass', 'Sex', 'Embarked']], dtype=int)], axis=1)

X_train, X_test, y_train, y_test = train_test_split(df_train.drop(columns=['Survived']), df_train['Survived'], test_size=0.10, random_state=42, stratify=df_train['Survived'])
X_train, X_test = X_train.copy(), X_test.copy()

# Fill missing ages with the training median, so no test information is used
age_median = X_train['Age'].median()
X_train['Age'] = X_train['Age'].fillna(age_median)
X_test['Age'] = X_test['Age'].fillna(age_median)

X_train['is_child'] = X_train['Age'].map(lambda x: 1 if x < 12 else 0)
X_test['is_child'] = X_test['Age'].map(lambda x: 1 if x < 12 else 0)

cols_model = ['is_child', 'Pclass_1', 'Pclass_2', 'Sex_female']
X_train = X_train[cols_model]
X_test = X_test[cols_model]
X_train = np.array(X_train.values, requires_grad=False)
init_kernel = lambda x1, x2: kernel(x1, x2)
# Similarity matrix of the training set (for inspection; the SVC below builds its own)
K = qml.kernels.square_kernel_matrix(X_train, init_kernel, assume_normalized_kernel=True)

# SVC accepts a callable that returns the kernel matrix between two sets of samples
svm = SVC(kernel=lambda X1, X2: qml.kernels.kernel_matrix(X1, X2, init_kernel)).fit(X_train, y_train)

X_test = np.array(X_test.values, requires_grad=False)
predictions = svm.predict(X_test)
print(accuracy_score(y_test, predictions))
print(precision_score(y_test, predictions))
print(recall_score(y_test, predictions))
print(f1_score(y_test, predictions, average='macro'))

# Classical baseline: SVC with the RBF kernel
svm1 = SVC(gamma='auto', kernel='rbf')
svm1.fit(X_train, y_train)
y_pred = svm1.predict(X_test)
print(accuracy_score(y_test, y_pred))
print(precision_score(y_test, y_pred))
print(recall_score(y_test, y_pred))
print(f1_score(y_test, y_pred, average='macro'))
The results are:
Figure 5— Print of tests results — Image by the author
As you can see, the SVC with the RBF kernel outperformed our SVC with quantum kernel. Our quantum approach had good precision, which means that we managed to avoid false positives at a good rate, but our recall wasn’t so good, implying that we got a significant number of false negatives.
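To see how precision and recall relate to false positives and false negatives, here is a small sketch with made-up labels. These are not the Titanic predictions. It reads the four counts from scikit-learn's `confusion_matrix` and computes both metrics by hand.

```python
from sklearn.metrics import confusion_matrix, precision_score, recall_score

# Hypothetical labels: 1 = survived, 0 = did not survive
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_hat = [1, 1, 0, 0, 0, 0, 0, 0, 0, 1]

# For binary labels, ravel() returns the counts in the order tn, fp, fn, tp
tn, fp, fn, tp = confusion_matrix(y_true, y_hat).ravel()

precision = tp / (tp + fp)  # drops when there are many false positives
recall = tp / (tp + fn)     # drops when there are many false negatives
print(precision, recall)
```

Here one false positive and two false negatives give a precision of 2/3 and a recall of 1/2, the same pattern of good precision and weaker recall described above.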
If you want to read more about SVMs with quantum kernel, these posts are good references: 1, 2 and these texts from Pennylane about the subject: 3 and 4.
Conclusions
Quantum kernels can be a powerful tool to increase SVM performance. However, as we saw in our example, an SVM with a simple quantum kernel isn’t able to outperform an SVM with an RBF kernel. Quantum kernels require careful design in order to be competitive with classical techniques.
I have been deepening my studies to design parameterizable quantum kernels and I hope to have good news on this subject soon.
A simple introduction to Quantum enhanced SVM was originally published in Towards Data Science on Medium.