Understanding what controls the occurrence of large-magnitude earthquakes remains an open and pressing challenge, given the exceptional hazard these events pose. Several parameters have been proposed to control the generation and propagation of megathrust earthquakes, but untangling their complex interactions across scales is difficult. Here, we use explainable artificial intelligence to unravel the interactions between these parameters and elucidate the underlying mechanisms. We use three types of dataset from a number of convergent margins: a) a catalogue of earthquake hypocentres and ruptures, b) geophysical observations of subduction zone properties (e.g., gravity, bathymetric roughness, sediment thickness), and c) the distribution of stress within the slab due to slab pull, calculated from flexure models. These constitute the three types of node in the input layer of a Fully Connected Network (FCN) trained to classify earthquake magnitude, so that the network embeds the state of the system (b), the driving mechanism (c) and the resulting seismicity (a). We then analyse the trained network using Layer-wise Relevance Propagation (LRP) to determine the relative weights of the input nodes, providing constraints on the mechanisms that dominate the seismicity in a region, their scale and likelihood.
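
As an illustration of the FCN-plus-LRP analysis described above, the following is a minimal sketch and not the network used in this study. It assumes a single hidden layer with ReLU activations, random weights standing in for a trained model, the epsilon rule for LRP (the abstract does not state which rule is used), and a hypothetical split of nine input nodes into the three dataset types. All names (`GROUPS`, `lrp_dense`, `explain`) and all layer sizes are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical input layout: 3 catalogue features (a), 4 geophysical
# features (b) and 2 slab-stress features (c). Sizes are assumptions.
GROUPS = {
    "catalogue": slice(0, 3),
    "geophysical": slice(3, 7),
    "slab_stress": slice(7, 9),
}
N_IN, N_HIDDEN, N_CLASSES = 9, 16, 3

# Random weights stand in for a trained network; biases are set to zero
# so that relevance is conserved between layers.
W1 = rng.normal(scale=0.5, size=(N_IN, N_HIDDEN))
b1 = np.zeros(N_HIDDEN)
W2 = rng.normal(scale=0.5, size=(N_HIDDEN, N_CLASSES))
b2 = np.zeros(N_CLASSES)


def forward(x):
    """Return hidden ReLU activations and output logits for one sample."""
    a1 = np.maximum(0.0, x @ W1 + b1)
    logits = a1 @ W2 + b2
    return a1, logits


def lrp_dense(a, W, b, R_out, eps=1e-6):
    """Propagate relevance R_out back through one dense layer (epsilon rule)."""
    z = a @ W + b
    z = z + eps * np.where(z >= 0, 1.0, -1.0)  # stabilise small denominators
    s = R_out / z
    return a * (W @ s)


def explain(x):
    """Return the predicted class, the logits and the input relevances."""
    a1, logits = forward(x)
    k = int(np.argmax(logits))
    R2 = np.zeros_like(logits)
    R2[k] = logits[k]                  # start from the predicted class score
    R1 = lrp_dense(a1, W2, b2, R2)     # output layer -> hidden layer
    R0 = lrp_dense(x, W1, b1, R1)      # hidden layer -> input nodes
    return k, logits, R0


x = rng.normal(size=N_IN)              # one synthetic input sample
k, logits, R0 = explain(x)

# Sum the relevance of the input nodes within each dataset type.
group_relevance = {name: float(R0[s].sum()) for name, s in GROUPS.items()}
for name, value in group_relevance.items():
    print(f"{name}: {value:+.4f}")
```

With zero biases, the relevances of the input nodes sum (up to the small epsilon term) to the score of the predicted class, so the per-group sums can be read as the share of that score attributed to each dataset type for this sample.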