At a broad level, stochastic optimization is concerned with optimizing a function in the presence of noise. In this context, we consider the optimization of a quadratic objective function whose gradients are only accessible through a stochastic oracle that returns the gradient at any given point plus a zero-mean, finite-variance random error. We present the first algorithm that achieves the optimal prediction error rates for least-squares regression, both in terms of forgetting of initial conditions, at rate O(1/n^2), and in terms of dependence on the noise and the dimension d of the problem, at rate O(d/n), where n is the number of observations. Our new algorithm is based on averaged accelerated regularized gradient descent, and it may also be analyzed under finer assumptions on the initial conditions and the Hessian matrix, leading to dimension-free quantities that may still be small while the "optimal" terms above are large.
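
To make the setting concrete, the following is a minimal sketch (not the paper's exact scheme) of accelerated stochastic gradient descent with regularization and Polyak-Ruppert averaging on a quadratic objective, where the oracle returns the gradient plus zero-mean, finite-variance noise. The constant momentum, step size, regularization parameter, and the function and variable names (`averaged_accelerated_sgd`, `grad_oracle`, `momentum`, `reg`) are illustrative assumptions; the actual algorithm uses a carefully chosen averaging/acceleration schedule to reach the O(1/n^2) and O(d/n) rates.

```python
import numpy as np

def averaged_accelerated_sgd(grad_oracle, d, n_steps, step_size, reg=0.0, momentum=0.9):
    """Illustrative sketch: Nesterov-style accelerated SGD on a regularized
    quadratic, with Polyak-Ruppert averaging of the iterates.
    `grad_oracle(theta)` is assumed to return the true gradient plus
    zero-mean, finite-variance noise. Constant momentum is a simplification;
    accelerated schemes typically use time-varying momentum weights."""
    theta = np.zeros(d)        # current iterate
    theta_prev = np.zeros(d)   # previous iterate (for the momentum term)
    theta_bar = np.zeros(d)    # running average (the returned estimator)
    for t in range(1, n_steps + 1):
        # extrapolation (momentum) step
        y = theta + momentum * (theta - theta_prev)
        # stochastic gradient of the regularized objective at y
        g = grad_oracle(y) + reg * y
        theta_prev = theta
        theta = y - step_size * g
        # online update of the average: theta_bar = mean of theta_1..theta_t
        theta_bar += (theta - theta_bar) / t
    return theta_bar

# Usage on a synthetic least-squares instance (all values illustrative):
rng = np.random.default_rng(0)
d, n = 5, 10_000
A = rng.standard_normal((d, d))
H = A @ A.T / d                        # Hessian of the quadratic objective
theta_star = rng.standard_normal(d)    # minimizer
noise_std = 0.1
oracle = lambda th: H @ (th - theta_star) + noise_std * rng.standard_normal(d)
est = averaged_accelerated_sgd(oracle, d, n, step_size=0.1 / np.linalg.norm(H, 2))
print(np.linalg.norm(est - theta_star))  # distance of averaged iterate to the minimizer
```

Averaging the iterates controls the variance coming from the noisy oracle, while the accelerated (momentum) updates speed up the forgetting of the initial conditions; the regularization term is what allows both effects to be balanced in the analysis.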