We implement the off-policy Q-learning algorithm with the goal of maximizing the hybrid entanglement between the walker and coin degrees of freedom in a one-dimensional quantum walk.
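As a rough illustration of the off-policy Q-learning update, the following is a minimal tabular sketch, not the code of this repository. The state encoding (the tuple of coin choices made so far), the helper names `make_q_table`, `epsilon_greedy`, `q_update`, and `run_episode`, and the hyperparameter values are all assumptions made for this example. The reward is given only at the final step of an episode.

```python
import random
from collections import defaultdict


def make_q_table(n_actions):
    # Hypothetical table: maps a state to a list of action values, all starting at 0.
    return defaultdict(lambda: [0.0] * n_actions)


def epsilon_greedy(Q, state, n_actions, epsilon, rng=random):
    # Behaviour policy: explore with probability epsilon, otherwise act greedily.
    if rng.random() < epsilon:
        return rng.randrange(n_actions)
    values = Q[state]
    return max(range(n_actions), key=values.__getitem__)


def q_update(Q, state, action, reward, next_state, done, alpha=0.1, gamma=1.0):
    # Off-policy target: bootstrap from the greedy value of the next state,
    # regardless of which action the behaviour policy takes there.
    target = reward if done else reward + gamma * max(Q[next_state])
    Q[state][action] += alpha * (target - Q[state][action])


def run_episode(Q, n_steps, n_actions, final_reward, epsilon=0.1):
    # Assumed state encoding: the tuple of actions (coin choices) taken so far.
    state = ()
    for t in range(n_steps):
        action = epsilon_greedy(Q, state, n_actions, epsilon)
        next_state = state + (action,)
        done = t == n_steps - 1
        # The reward is zero until the final step of the episode.
        reward = final_reward(next_state) if done else 0.0
        q_update(Q, state, action, reward, next_state, done)
        state = next_state
    return state
```

Here `final_reward` is a user-supplied function that scores a complete action sequence, for example by simulating the walk and evaluating an entanglement measure on the final state.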
We use the Schmidt norm as the measure of entanglement; it serves as the reward at the final time step of the quantum walk evolution. The Schmidt norm S reached after a 5-step (blue), 7-step (orange), and 15-step (green) quantum walk with the optimal sequence obtained by the Q-learning algorithm (solid line) and with the universal entangling sequence (dash-dotted line) is shown below.
Learning curves for the optimization problems discussed in the main text. The episodic reward (Schmidt norm) is averaged over 300 [(a) and (b)] or 400 [(c) and (d)] independent runs. The light blue area corresponds to the confidence interval, and dashed lines denote the maximum achievable reward of √2. (a)–(c) Learning curves for the quantum walk with n = 5, 7, and 15 steps, where the initial state parameter φ is set to zero and the parameter θ is sampled from a uniform distribution at the beginning of each new episode. (d) Learning curve for the 5-step quantum walk, where both initial state parameters φ and θ are sampled at the beginning of each training episode.
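To make the reward and the initial-state parameters concrete, here is a minimal NumPy sketch of a one-dimensional quantum walk and its Schmidt norm. Several points are assumptions of this example rather than statements about the repository: the Schmidt norm is taken to be the sum of the Schmidt coefficients (which is consistent with the stated maximum of √2 for a two-dimensional coin), the coin starts in cos(θ/2)|0⟩ + e^{iφ} sin(θ/2)|1⟩ at the origin, the available coins are the Hadamard and Fourier operators, and coin state 0 moves the walker right while coin state 1 moves it left. The names `walk`, `schmidt_norm`, `HADAMARD`, and `FOURIER` are hypothetical.

```python
import numpy as np

# Assumed coin set; the actual set of coin operators may differ.
HADAMARD = np.array([[1, 1], [1, -1]], dtype=complex) / np.sqrt(2)
FOURIER = np.array([[1, 1j], [1j, 1]], dtype=complex) / np.sqrt(2)


def walk(coins, theta=0.0, phi=0.0):
    """Evolve the walker-coin state; returns amplitudes psi[position, coin]."""
    n = len(coins)
    psi = np.zeros((2 * n + 1, 2), dtype=complex)
    # Walker starts at the centre site; coin state set by theta and phi (assumed form).
    psi[n, 0] = np.cos(theta / 2)
    psi[n, 1] = np.exp(1j * phi) * np.sin(theta / 2)
    for coin in coins:
        psi = psi @ coin.T            # apply the coin operator at every position
        shifted = np.zeros_like(psi)
        shifted[1:, 0] = psi[:-1, 0]  # coin 0: walker moves one site to the right
        shifted[:-1, 1] = psi[1:, 1]  # coin 1: walker moves one site to the left
        psi = shifted
    return psi


def schmidt_norm(psi):
    """Sum of Schmidt coefficients, i.e. the singular values of psi[position, coin]."""
    return float(np.linalg.svd(psi, compute_uv=False).sum())


# One Hadamard step from theta = phi = 0 yields two equal Schmidt coefficients.
print(schmidt_norm(walk([HADAMARD])))
# A mixed coin sequence with a generic initial state.
print(schmidt_norm(walk([HADAMARD, FOURIER, HADAMARD, FOURIER, HADAMARD], theta=0.7)))
```

Under these assumptions the Schmidt norm lies between 1 (product state) and √2 (maximally entangled), so a function like `schmidt_norm(walk(...))` could serve as the terminal reward for a chosen coin sequence.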
Find the published version of our work here