Many machine learning tasks involve graph-structured data. Among them, scene graph generation is a challenging task that is well suited to graph neural networks. In this paper, we present a definition of a graph neural network (GNN) consisting of node, edge, and global attributes, together with their corresponding update and aggregate functions. Based on this definition, we propose a realization of the GNN called Graph-LSTM and apply it to scene graph generation. The model first extracts the features of items in the image and uses them as the initial states of a node-LSTM, which represents subjects and objects, and an edge-LSTM, which represents predicates. The two LSTMs update their states across LSTM timesteps and aggregate information via message passing; this update-aggregate step is repeated until convergence. Meanwhile, the tag feature, i.e., the generated probability distribution over the image's semantic concepts, is fed to the LSTMs through a semantic compositional network (SCN). The SCN-LSTM is trained in an ensemble style, which allows the tag feature to serve as the global attribute that provides context information to all individual nodes and edges. The final states of the LSTMs are input to inference modules, which generate the (subject, predicate, object) triplets of the scene graph. Experimental results show that Graph-LSTM outperforms the Message Passing and attention Graph Convolutional Network methods, demonstrating the effectiveness of the proposed scheme.
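To make the update-aggregate scheme described above concrete, the following is a minimal sketch in plain Python. It is not the Graph-LSTM implementation: the `update` function is a fixed tanh mixing rule standing in for an LSTM timestep, the aggregation is a simple mean, and all names, weights, and toy feature values are assumptions chosen for illustration. It shows only the control flow: node and edge states exchange messages, a shared global attribute (playing the role of the tag feature) enters every update, and the loop stops once the states no longer change.

```python
import math


def mean(vectors, dim):
    """Element-wise mean of a list of vectors; zeros if the list is empty."""
    if not vectors:
        return [0.0] * dim
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def update(state, message, global_attr):
    """Stand-in update rule (NOT an LSTM cell): bounded mix of the inputs.

    The weights 0.5 / 0.3 / 0.2 are arbitrary illustrative constants.
    """
    return [math.tanh(0.5 * s + 0.3 * m + 0.2 * g)
            for s, m, g in zip(state, message, global_attr)]


def run_update_aggregate(nodes, edges, global_attr, max_steps=200, tol=1e-6):
    """Repeat aggregate-then-update until the largest state change is < tol.

    nodes: {name: vector}, edges: {(subject, object): vector}.
    Returns the final node states, edge states, and number of steps taken.
    """
    dim = len(global_attr)
    for step in range(1, max_steps + 1):
        # Aggregate: each node averages the states of its incident edges.
        node_msgs = {
            n: mean([e for (s, o), e in edges.items() if n in (s, o)], dim)
            for n in nodes
        }
        # Aggregate: each edge averages the states of its two endpoints.
        edge_msgs = {(s, o): mean([nodes[s], nodes[o]], dim) for (s, o) in edges}

        # Update: every state sees its message and the shared global attribute.
        new_nodes = {n: update(nodes[n], node_msgs[n], global_attr) for n in nodes}
        new_edges = {k: update(edges[k], edge_msgs[k], global_attr) for k in edges}

        # Largest absolute change across all node and edge state entries.
        delta = max(
            abs(a - b)
            for old, new in ((nodes, new_nodes), (edges, new_edges))
            for key in old
            for a, b in zip(old[key], new[key])
        )
        nodes, edges = new_nodes, new_edges
        if delta < tol:
            break
    return nodes, edges, step


# Toy graph: two items and one relation between them (hypothetical values).
nodes = {"man": [0.9, 0.1], "horse": [0.2, 0.8]}
edges = {("man", "horse"): [0.5, 0.5]}
tags = [0.7, 0.3]  # stands in for the tag feature used as global attribute

final_nodes, final_edges, steps = run_update_aggregate(nodes, edges, tags)
```

In this sketch the final node and edge states would be the inputs to the inference modules that predict the subject, object, and predicate labels. The stand-in update is a contraction, so the loop is guaranteed to settle; no such guarantee is implied for the actual LSTM-based model.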