In this paper, we propose a pixel-wise method named TextCohesion for scene text detection, which splits a text instance into five key components: a Text Skeleton and four Directional Pixel Regions. These components are easier to handle than the entire text instance. A confidence scoring mechanism is designed to filter characters that are similar to text. Our method can integrate text contexts intensively when backgrounds are complex. Experiments on two curved challenging benchmarks demonstrate that TextCohesion outperforms state-of-the-art methods, achieving the F-measure of 84.6% on Total-Text and bfseries86.3% on SCUT-CTW1500.