Reverse-mode automatic differentiation, especially in the presence of complex language features, is notoriously hard to implement correctly, and most implementations focus on differentiating straight-line imperative first-order code. Generalisations exist, however, that can tackle more advanced features; for example, the algorithm described by Pearlmutter and Siskind in their 2008 paper can differentiate higher-order functions written in a pure functional language. We show that AD algorithms can benefit enormously from being translated into the language of string diagrams in two steps: first, we describe a Pearlmutter-Siskind style AD algorithm as a set of rules for transforming hierarchical graphs; rules which can -and indeed have been- be implemented correctly and efficiently for a non-trivial language. Then, we present a proof of soundness for it by reducing the above rewrite rules to a suitable graphical version of the axioms of Cartesian reverse differential categories, expressed as string diagrams.