This paper describes a new concept for implementing the direct simulation Monte Carlo (DSMC) method. It uses a localized data structure based on the computational cell to achieve high performance, especially on workstation processors, which can also be used in parallel. Because the data structure allows any cell to be assigned freely to any processor, a domain decomposition can be found that gives each processor an equal computational load while keeping communication among the nodes minimal. Furthermore, the new implementation strictly separates physical modeling, geometrical issues, and organizational tasks to achieve high maintainability and to simplify future enhancements. Three example flow configurations are calculated with the new implementation to demonstrate its generality and performance: a flow through a diverging channel using an adapted unstructured triangulated grid, a flow around a planetary probe, and an internal flow in a contactor used in plasma physics. The results are validated by comparison either with results from other simulations or with experimental data. High performance on an IBM SP2 system is achieved if the problem size and the number of parallel processors are matched to each other. On 400 nodes, DSMC calculations with more than 100 million particles are possible.
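
The cell-based data structure and the free assignment of cells to processors can be illustrated with a minimal sketch. It is not the implementation described in this paper: the names `Cell`, `assign_cells`, and `count_cut_edges` are hypothetical, the particle count per cell is assumed to be a reasonable proxy for computational load, and a simple greedy heuristic stands in for whatever decomposition strategy is actually used. The greedy step balances load only; the number of cut neighbor links is merely reported as a rough indicator of communication.

```python
from dataclasses import dataclass, field
import heapq


@dataclass
class Cell:
    """Hypothetical cell record: all data needed to process the cell lives here."""
    cell_id: int
    particles: list = field(default_factory=list)  # particle states stored with the cell
    neighbors: list = field(default_factory=list)  # ids of adjacent cells
    owner: int = -1                                # processor rank, -1 means unassigned


def assign_cells(cells, n_procs):
    """Greedy balancing: hand the heaviest remaining cell to the least-loaded rank.

    Returns the resulting particle load per rank. Load is approximated by the
    particle count, which is an assumption of this sketch.
    """
    heap = [(0, rank) for rank in range(n_procs)]  # (current load, rank)
    heapq.heapify(heap)
    for cell in sorted(cells, key=lambda c: len(c.particles), reverse=True):
        load, rank = heapq.heappop(heap)
        cell.owner = rank
        heapq.heappush(heap, (load + len(cell.particles), rank))
    loads = [0] * n_procs
    for cell in cells:
        loads[cell.owner] += len(cell.particles)
    return loads


def count_cut_edges(cells):
    """Count neighbor links whose two cells sit on different ranks."""
    by_id = {c.cell_id: c for c in cells}
    return sum(
        1
        for c in cells
        for n in c.neighbors
        if n > c.cell_id and by_id[n].owner != c.owner  # count each link once
    )
```

Because each cell carries its own particles and neighbor list, any cell can be moved to another rank by changing `owner` alone. A decomposition that also minimizes communication would have to weigh the cut links during assignment, which this sketch deliberately leaves out.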