QML-Essentials: 21 weeks of quantum machine learning, honestly graded


A self-paced curriculum that ended with a working quantum autoencoder, four sub-projects of head-to-head comparison against classical baselines, and a clear-eyed view of what quantum ML can and cannot do at simulator scale today.

Python · PennyLane · Qiskit · PyTorch · Quantum ML · VQE · QAOA

TL;DR. I taught myself quantum machine learning over ~21 weeks, building a working repo of 27 small experiments. The technical headline: a 16-parameter quantum model that compresses hydrogen-molecule states with 95% fidelity. The honest headline: across five head-to-head matchups against classical methods, quantum won two and classical won three. I kept both numbers in.

Where it begins. Two quantum bits start as zeros. After two simple operations, every measurement returns either "both zero" or "both one," never a mismatch. That linkage is entanglement. Running it 2,000 times and counting the outcomes is what turns it from a slogan into something a for loop can verify.

Quantum computers have spent ten years generating headlines while shipping very little software. So when a friend asked me last summer whether quantum machine learning was real or hype, I realized I did not actually know. I have a physics background. I work as a machine learning engineer. The intersection of those two should have been my home turf. It was not.

So I set up an experiment of my own. Five months, 5 to 7 hours a week, two open laptops side by side. PennyLane on one (a Python framework that lets quantum circuits plug into PyTorch the way a regular neural network would), and Qiskit on the other (IBM's framework, the lingua franca of QML papers). Every week of the curriculum had to end with one runnable file that produced one verifiable number. If the number did not match the prediction, the file failed. If the file failed, the week did not count.

The discipline mattered. In week 11 I wrote what I thought was a working quantum chemistry simulation. The optimizer had silently converged to the wrong answer at every data point. The plotted curve looked fine at a glance: smooth, monotonic, the right general shape. The file would have shipped and I would have moved on. The numerical assertion at the bottom of the script (compare to a known reference value to within 1e-3 Hartree) was the only thing that flagged it. The bug was a subtle ordering issue in how I was constructing the Hamiltonian before passing it to the expectation routine. The plot-versus-assertion gap is the lesson. That kind of self-checking pattern is what I would recommend to anyone learning a field that makes pretty pictures easily and correct results rarely.

01. The shape of the project

I broke the work into three tiers, each building on the last. The structure was deliberate: learn to read circuits before trying to train them; train and benchmark them honestly before trying to do anything ambitious; finish with one end-to-end artifact that exercises every piece together.

| Tier | Length | What it covered |
| --- | --- | --- |
| I | 8 weeks | Learn to read a quantum circuit. The fundamentals: single qubits, entangled pairs, measurement, the famous teleportation protocol, and three classic algorithms (Deutsch-Jozsa, Grover's search, the quantum Fourier transform). The week-8 final exam was rebuilding two of these from scratch, no notes. |
| II | 13 weeks | Train quantum models, then benchmark them honestly. Four sub-projects that span the field: simulating a hydrogen molecule, solving a small graph problem, classifying flowers, and using a quantum kernel for support vector machines. Capped with a measurement of the field's biggest open problem (the "barren plateau") and a head-to-head comparison against matched classical baselines on every task. |
| III | 6 weeks | Build one end-to-end thing. A quantum autoencoder: a tiny network that compresses quantum states the way JPEG compresses images, but with the data living natively on a quantum computer. 5 random seeds for reproducibility, a noise sweep that simulates real hardware imperfections, and matched classical autoencoder baselines. |

02. Tier I: What "literacy" actually means

Eight weeks of a single goal: be able to look at a quantum circuit diagram and predict, by hand, what comes out the other end. The most useful exercise was the teleportation protocol in week 3. The procedure looks like science fiction on paper. You take an unknown quantum state, perform a few operations on a shared entangled pair, send your friend two ordinary classical bits, and your friend's qubit becomes a perfect copy of the state you started with. The original is destroyed in the process.

It feels mystical until you write it out. When the script confirms that the recipient's quantum state matches the sender's input across six different test cases, accurate to the last decimal place, the protocol stops feeling magical and starts feeling like an algorithm. That transition from "I don't believe this" to "of course this works" is the entire point of Tier I.

The Tier I final exam was rebuilding Bell-pair preparation and Grover's search algorithm from scratch in PennyLane, with no reference to the earlier weeks. Both ran on the first try. Repetition is the thing.

03. Tier II: Where the comfortable narrative falls apart

The unifying question through all four sub-projects of Tier II: does the quantum model actually beat the obvious classical alternative, or just produce a number? I kept that as a standing rule. Every experiment had to include a classical baseline trained the same way on the same data. If the quantum side lost, the loss got reported.

2A. Simulating a hydrogen molecule

The first project was finding the lowest possible energy of a hydrogen (H₂) molecule. This sounds like an exotic problem until you realize that simulating molecules is one of the few things quantum computers might genuinely do better than classical ones. Drug discovery, materials design, battery chemistry, all bottlenecked by the same kind of calculation.

In plain English. The energy of a molecule depends on where its electrons are. The "ground state energy" is the lowest possible total energy, which determines everything from bond strength to chemical reactivity. Calculating it exactly is easy for very small systems and effectively impossible for big ones. VQE (Variational Quantum Eigensolver) uses a small quantum circuit as a guess, then a classical optimizer tunes the circuit's settings until the energy is as low as possible. The math guarantees you cannot accidentally tune it below the true answer.

The result: with just three trainable knobs, a small quantum circuit reaches the exact ground-state energy to within one part in a million. Accurate enough that chemists would not bother distinguishing it from the right answer. The circuit also tracks the exact answer across the entire bond-stretching curve, where the two hydrogen atoms get progressively pulled apart until they break.

The interesting half of the result is what happens to the standard classical method (called Hartree-Fock). At the natural bond length it is reasonable. As the bond breaks, it drifts upward, ending hundreds of times further off than the quantum answer at full separation. That gap is the difference between a useful chemistry calculation and a useless one.

This is the cleanest result in the entire curriculum, and the one place where I would say without hedging that the quantum approach earns its keep. The catch: hydrogen has two electrons. Real molecules have hundreds. Whether this approach scales is the open question that the rest of Tier II started to answer, mostly negatively.

2B. Cutting graphs

Project 2B was a different kind of problem. Given a graph (dots connected by lines), colour each dot one of two colours so that as many lines as possible end up with two different-coloured endpoints. This sounds abstract, but it shows up everywhere there is something to partition: scheduling, image segmentation, circuit design. It is also one of the canonical problems for which a quantum algorithm called QAOA was specifically designed.

On a small 6-dot graph, my QAOA implementation reached 98% of the optimal score. The best classical guarantee for any algorithm of its kind is 87.8%. So on this specific instance, quantum beats the classical worst-case bound. That feels like a clean win, until you ask the next question: what happens at scale?

The answer turned out to be the most important number in Tier II, and it has a name. It is called the barren plateau.

In plain English. Imagine training a machine learning model is like rolling a ball downhill to find the lowest point in a landscape. The "gradient" is the slope under the ball. It tells the optimizer which way is down. Barren plateaus are the discovery, in 2018, that as quantum circuits grow more qubits, the slopes everywhere become exponentially flatter. Past a certain size, the ball cannot tell which way is down anywhere. The optimizer just sits there. This is not a bug in any one method. It is a fundamental obstacle.

I measured this directly. Random initial parameters, hundreds of samples, and the average strength of the gradient signal at each circuit size: the signal shrank exponentially as more qubits were added.

By 30 or 40 qubits (still small by classical standards), the signal would be undetectable from numerical noise. This is the obstacle that gates almost every claim about quantum machine learning at scale.

2C. Classifying flowers (yes, the famous one)

Project 2C used the most over-used dataset in machine learning: 150 measurements of iris flowers, three species, four features each. The reason it is over-used is that it is small enough to be a sanity check for any new method.

I tried three different ways of "encoding" each flower's measurements into a quantum circuit, then trained the same downstream model on top of each. The encoding is the choice of how to translate ordinary numbers into the quantum machine's input format, and there are several reasonable ways to do it. I expected this choice to matter a little. It mattered a lot.

On the harder pair of species, with everything downstream of the encoding held constant: angle encoding hit 100% accuracy. Amplitude encoding hit 30% (worse than chance, confidently wrong). IQP encoding hit 75%. Same data, same trainable model, three different ways of getting the data in. The choice of encoder made a 70-percentage-point difference.

This is the kind of result that does not show up in conference talks because it does not look like a method, it looks like a footnote. But it is the most important practical lesson of Tier II. The papers spend their pages comparing trainable architectures. The encoding choice that nobody emphasizes ends up dominating the result.

There is a mechanical reason for the gap. Amplitude encoding squeezes a 4-dimensional feature vector into the amplitudes of a 2-qubit state, which forces normalization across features and collapses any feature whose magnitude differs from the others into the noise floor. Angle encoding spreads features across independent rotation gates, so each one survives. The lesson generalizes: in any pipeline that compresses inputs before learning, the compression itself is doing more of the work than the learner sitting downstream. The classical analogue is when a poorly-tuned PCA step in front of a fancy model decides the result before training begins.

Then I ran the fair-fight experiment. I built a tiny classical neural network with the exact same number of parameters as the quantum model: 13. Same training, same data, same loss function. The classical model won three of five sample-size comparisons. Mean accuracy: classical 0.943, quantum 0.923. A two-percentage-point loss for the quantum approach on a problem the field treats as easy.
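For concreteness, here is one way to hit exactly 13 parameters for 4 input features. This is a guess at a comparable architecture, not the repo's actual baseline model:

```python
import torch.nn as nn

# Linear(4 -> 2): 4*2 weights + 2 biases = 10 parameters
# Linear(2 -> 1): 2 weights + 1 bias    =  3 parameters
model = nn.Sequential(nn.Linear(4, 2), nn.Tanh(), nn.Linear(2, 1))
n_params = sum(p.numel() for p in model.parameters())
assert n_params == 13
```

Matching parameter counts this way is what makes "same capacity, different substrate" a defensible framing for the comparison.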

2D. The most decisive negative result

The fourth Tier II project used a quantum computer in a different mode: not to train a model, but to compute a "similarity score" between data points. Then a classical algorithm (a support vector machine) uses those similarity scores to classify. This is called a quantum kernel, and it is one of the most-hyped applications of QML.

I ran a 9-cell sweep: three circuit depths crossed with three training-set sizes, the quantum kernel against a standard classical kernel called RBF on the same data. The classical kernel won all nine cells.

The quantum kernel's accuracy decreases monotonically with depth: more expressive feature maps push pairwise similarity scores toward a constant, and the classifier is left fitting noise. This is the recently discovered "kernel concentration" phenomenon (Thanasilp et al. 2024), seen directly.

The runtime accounting closes the case. Computing all the quantum similarity scores: 125 seconds. Computing all the classical scores: 0.07 seconds. A factor of 1,800 slower, with the quantum side losing every single match. On real quantum hardware, the gap would be many times larger again.

The honest scoreboard

| Sub-project | Quantum result | Classical result | Verdict |
| --- | --- | --- | --- |
| 2A · H₂ molecule | one part in a million | drifts off badly | quantum wins |
| 2B · graph cutting | 98% of optimal | 87.8% guaranteed | quantum wins |
| 2C · flowers vs simple linear | 100% | 95% | quantum +5 |
| 2C · flowers vs matched neural net | 92% | 94% | classical +2 |
| 2D · flower kernel comparison | 74% (mean) | 100% | classical, swept |

Quantum wins on problems with quantum structure (molecules, combinatorial graphs). Classical wins on every plain-tabular-data task. And both quantum wins are against classical methods that are themselves doing something quantum-flavored under the hood.

04. Tier III: The artifact

Tier III was meant to be one quantum-native end-to-end thing, executed honestly. I picked an autoencoder.

In plain English. An autoencoder is a neural network with a deliberate bottleneck. It learns to compress input through the bottleneck and then reconstruct it on the other side. JPEG does something similar for photographs. Keep the important parts, throw the rest away. A quantum autoencoder does the same trick, but on quantum states. Take a state spread across 4 qubits, squeeze it through a 2-qubit bottleneck, expand back to 4 qubits. If you can do that without losing much, you have discovered the structure in your data without being told what to look for.

The data was the hydrogen-molecule states from project 2A: 22 different bond lengths, each producing one quantum state. The model had 16 trainable parameters total. After training:

  • 95% reconstruction fidelity on data it had never seen during training (mean over 5 random training runs, plus or minus 2%).
  • Almost no overfitting: the gap between training and test fidelity was only 2.6 percentage points.
  • The model survived simulated hardware noise gracefully. At noise levels matching today's IBM machines, fidelity dropped to 87% but stayed 61 percentage points above an untrained random baseline.

The most interesting result was not any one of those numbers. It was what the model put inside its compressed bottleneck.

I checked whether the 2-qubit "compressed code" had any human-readable structure. It did. Across all 22 bond lengths, the compression axis lined up with bond length almost perfectly. (Statistical correlation: +0.998, where 1.0 is a perfect monotonic relationship.) The model had figured out, with no hint from me, that the only physically meaningful difference between any two of its inputs was how stretched the bond was. So it used its 2-qubit code as a one-dimensional dial for that single property. It compressed the data and discovered the underlying physics.

I want to flag what made this auditable in the first place. The dataset has exactly one independent physical parameter (bond length), and I knew that going in. So the test for "did the model learn something physical" reduced to plotting the bottleneck activations against bond length and checking for monotonicity. On almost any other dataset, I would not have had a known ground-truth axis to compare against, and "the latent space looks structured" would have been an unfalsifiable claim. The lesson I would carry into a less-controlled domain: if you cannot articulate, in advance, what physical axis the latent space is supposed to recover, you cannot test whether it did.

Then I ran the honest classical comparison: three autoencoders (the quantum model, a parameter-matched classical network, and a larger unconstrained classical network), same data, same training.

The honest takeaway: on this specific dataset, the big classical model wins because the data happens to live exactly where the big classical model is strongest. The quantum autoencoder is competitive in a controlled fair-fight against a similarly-sized classical model. And the quantum model has structural advantages that this simulator-based comparison cannot show. On real quantum hardware, the input and output of a quantum autoencoder are themselves quantum states. The classical model would need exponentially expensive measurements to even read the input, and exponentially expensive preparation to use the output for anything else. None of that overhead shows up here.

05. What I would do differently

Five things, roughly in order of how much they would matter.

1. Move to real hardware. Everything in this curriculum ran on a simulator, even the noise study. Half the practical knowledge of quantum computing is invisible in simulation: shot noise, calibration drift, gate errors that are correlated and time-varying. IBM offers free access to a 7-qubit machine. That is the obvious next step.

2. Pick datasets with quantum structure from the start. The flowers and digits datasets I used were chosen for repo cleanliness, not because quantum methods had any reason to win on them. The honest place to test quantum machine learning is on data generated by a quantum process, or on combinatorial problems with discrete structure. I would pick those next.

3. Watch the gradient signal earlier. The barren-plateau measurement should have been part of the workflow from week 1, not week 14. "Is this model trainable at this size?" is the first question, not the last.

4. Stop fighting kernel concentration. A quantum kernel without trainable parameters versus a classical kernel is, at small scale, a one-sided fight that the literature has been losing for several years. A serious follow-up should commit to learnable quantum kernels or skip kernel methods entirely.

5. Report standard deviations from the start. Tier II reported numbers from too few random seeds. Tier III ran every headline through 5 seeds and reported mean plus or minus standard deviation. The discipline of "no headline number without an error bar" is what kept Tier III's results trustworthy. I would put that rule in place from week 1.

06. One sentence per tier

Tier I: read a quantum circuit fluently, no hand-waving. Tier II: train and honestly evaluate four variational QML projects, including the ones where quantum lost. Tier III: produce one end-to-end quantum-native artifact, with reproducibility, simulated hardware noise, and matched classical baselines.

The unsexy summary of the whole project: on the kinds of problems QML is currently being marketed for, classical methods are harder to beat than the marketing suggests. The interesting corollary, from Tier III specifically: on problems where the input is already a quantum state and the output needs to feed into another quantum operation, classical models are not competing at all. They need exponentially expensive translation to enter the ring. Whether that ever becomes commercially relevant depends on whether real quantum hardware ever crosses a threshold that simulators cannot illuminate. That is the experiment I do not yet know how to run.

What I would not claim: that any of this is publishable. The curriculum was designed as a learning artifact, not a research result. It does exactly what it claims to do, and no more. That is what the assertion gates were for.

07. The repo, and what to read next

Everything is at github.com/TirtheshJani/QML-Essentials. Each weekly script is self-contained and self-checking; the whole curriculum runs end to end in the for-loop in the README. The two long-form review documents (TIER2_REVIEW.md, TIER3_REVIEW.md) are where I worked out the honest verdicts that turned into this post.

If you want to learn the field yourself, the PennyLane Codebook is the best free starting point I found. Schuld and Petruccione's Machine Learning with Quantum Computers is the textbook that informed Tier II. The Romero, Olson, and Aspuru-Guzik paper from 2017 is the quantum-autoencoder original.