I will discuss how Message Passing Neural Networks (MPNNs) model mixing among features in a graph. I will show that MPNNs need as many layers as the commute time between nodes to model strong mixing between those nodes. This makes it possible to derive a measure of over-squashing and to clarify how over-squashing limits the ability of MPNNs to learn functions with long-range interactions. I will then discuss a novel message-passing paradigm that determines “when” messages are propagated based on the geometry of the data and introduces a delay mechanism, greatly enhancing the power of MPNNs to capture long-range interactions and achieving performance superior to Graph Transformers while remaining sub-quadratic.
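
To make the delay mechanism concrete, here is a minimal, illustrative sketch (not the exact architecture presented in the talk; the names `DelayedMPNN`, `hop_masks`, and `max_hops` are hypothetical): at layer t, a node aggregates messages from its k-hop neighbours (k ≤ t), and the delay means a k-hop neighbour contributes the state it had k layers earlier, so distant information arrives directly rather than being squashed through repeated one-hop smoothing.

```python
# Illustrative sketch of delayed, distance-aware message passing.
# Assumptions: dense {0,1} float adjacency matrix, mean aggregation,
# simple linear update. Not the speaker's exact method.
import torch
import torch.nn as nn


def hop_masks(adj: torch.Tensor, max_hops: int) -> list[torch.Tensor]:
    """Boolean masks M_k with M_k[i, j] = True iff dist(i, j) == k."""
    n = adj.shape[0]
    reach = torch.eye(n, dtype=torch.bool, device=adj.device)  # dist < k
    frontier = adj.bool()                                      # walks of length k
    masks = []
    for _ in range(max_hops):
        exactly_k = frontier & ~reach      # first reached at this hop count
        masks.append(exactly_k)
        reach |= exactly_k
        frontier = (frontier.float() @ adj).bool()  # extend walks by one hop
    return masks


class DelayedMPNN(nn.Module):
    def __init__(self, dim: int, num_layers: int):
        super().__init__()
        self.layers = nn.ModuleList(
            nn.Linear(2 * dim, dim) for _ in range(num_layers)
        )

    def forward(self, x: torch.Tensor, adj: torch.Tensor) -> torch.Tensor:
        masks = hop_masks(adj, len(self.layers))
        history = [x]  # states h^(0), h^(1), ... kept for the delay mechanism
        for t, layer in enumerate(self.layers, start=1):
            agg = torch.zeros_like(history[-1])
            for k in range(1, t + 1):  # k-hop neighbours, reachable once k <= t
                m = masks[k - 1].float()
                deg = m.sum(dim=1, keepdim=True).clamp(min=1)
                # Delay: use the state those neighbours had k layers ago.
                agg = agg + (m @ history[t - k]) / deg
            history.append(torch.relu(layer(torch.cat([history[-1], agg], dim=-1))))
        return history[-1]
```

Note how the cost stays sub-quadratic in practice when the hop masks are sparse: each layer only touches node pairs within the current receptive field, rather than all pairs as full attention in a Graph Transformer would.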