We present a first-person method for cooperative basketball intention prediction: we predict with whom the camera wearer will cooperate in the near future from unlabeled first-person images. This is a challenging task that requires inferring the camera wearer's visual attention, and decoding the social cues of other players. Our key observation is that a first-person view provides strong cues to infer the camera wearer's momentary visual attention, and his/her intentions. We exploit this observation by proposing a new cross-model EgoSupervision learning scheme that allows us to predict with whom the camera wearer will cooperate in the near future, without using manually labeled intention labels. Our cross-model EgoSupervision operates by transforming the outputs of a pretrained pose-estimation network, into pseudo ground truth labels, which are then used as a supervisory signal to train a new network for a cooperative intention task. We evaluate our method, and show that it achieves similar or even better accuracy than the fully supervised methods do.