In this paper, we propose a non-orthogonal multiple access (NOMA)-based communication framework that allows machine type devices (MTDs) to access the network while avoiding congestion. The proposed technique is a 2-step mechanism that first employs fast uplink grant to schedule the devices without sending a request to the base station (BS). Secondly, NOMA pairing is employed in a distributed manner to reduce signaling overhead. Due to the limited capability of information gathering at the BS in massive scenarios, learning techniques are best fit for such problems. Therefore, multi-arm bandit learning is adopted to schedule the fast grant MTDs. Then, constrained random NOMA pairing is proposed that assists in decoupling the two main challenges of fast uplink grant schemes namely, active set prediction and optimal scheduling. Using NOMA, we were able to significantly reduce the resource wastage due to prediction errors. Additionally, the results show that the proposed scheme can easily attain the impractical optimal OMA performance, in terms of the achievable rewards, at an affordable complexity.