Synchronous presentation of stimuli to the auditory and visual systems can modify the formation of a percept in either modality. For example, perception of auditory speech is improved when the speaker's facial articulatory movements are visible. Neural convergence onto multisensory sites exhibiting supra-additivity has been proposed as the principal mechanism for integration. Recent findings, however, have suggested that putative sensory-specific cortices are responsive to inputs presented through a different modality. Consequently, when and where audiovisual representations emerge remains unsettled. In combined psychophysical and electroencephalography experiments, we show that visual speech speeds up the cortical processing of auditory signals early (within 100 ms of signal onset). The auditory-visual interaction is reflected as an articulator-specific temporal facilitation (as well as a nonspecific amplitude reduction). The latency facilitation depends systematically on the degree to which the visual signal predicts possible auditory targets. The observed auditory-visual data support the view that there exist abstract internal representations that constrain the analysis of subsequent speech inputs. This is evidence for the existence of an “analysis-by-synthesis” mechanism in auditory-visual speech perception.