Purpose: Stimulated echo acquisition mode (STEAM) diffusion MRI can be advantageous over pulsed-gradient spin-echo (PGSE) for diffusion times that are long compared to $\ttwo$. It is therefore important for biomedical diffusion imaging applications at 7T and above, where $\ttwo$ is short. However, imaging gradients in the STEAM sequence contribute much greater diffusion weighting than in PGSE, yet they are often ignored during post-processing. We demonstrate here that ignoring them can severely bias parameter estimates. Method: We present models of the STEAM signal for free and restricted diffusion that account for crusher and slice-select (butterfly) gradients to avoid such bias. The butterfly gradients also disrupt the experiment design, typically by skewing gradient vectors towards the slice direction. We propose a simple compensation to the diffusion gradient vector specified to the scanner that counterbalances the butterfly gradients and preserves the intended experiment design. Results: High-field data from a fixed monkey brain experiment demonstrate the need for both compensation during acquisition and correct modelling during post-processing, for both diffusion tensor imaging and ActiveAx axon-diameter index mapping. Simulations support the results and indicate a similar need in in-vivo human applications. Conclusion: Correct modelling and compensation are important for practical applications of STEAM diffusion MRI.