We propose a reinforcement learning method for estimating an optimal dynamic treatment regime for survival outcomes with dependent censoring. The estimator allows the failure time to be conditionally independent of censoring and dependent on the treatment decision times, supports a flexible number of treatment arms and treatment stages, and can maximize either the mean survival time or the survival probability at a certain time point. The estimator is constructed using generalized random survival forests and can have polynomial rates of convergence. Simulations and data analysis results suggest that the new estimator brings higher expected outcomes than existing methods in various settings. An R package dtrSurv is available on CRAN.