With the rapid proliferation of smartphones and tablets, various embedded sensors are incorporated into these platforms to enable multimodal human-computer interfaces. Gesture recognition, as an intuitive interaction approach, has been extensively explored in the mobile computing community. However, most gesture recognition implementations by now are all user-dependent and only rely on accelerometer. In order to achieve competitive accuracy, users are required to hold the devices in predefined manner during the operation. In this paper, a high-accuracy human gesture recognition system is proposed based on multiple motion sensor fusion. Furthermore, to reduce the energy overhead resulted from frequent sensor sampling and data processing, a high energy-efficient VLSI architecture implemented on a Xilinx Virtex-5 FPGA board is also proposed. Compared with the pure software implementation, approximately 45 times speed-up is achieved while operating at 20 MHz. The experiments show that the average accuracy for 10 gestures achieves 93.98% for user-independent case and 96.14% for user-dependent case when subjects hold the device randomly during completing the specified gestures. Although a few percent lower than the conventional best result, it still provides competitive accuracy acceptable for practical usage. Most importantly, the proposed system allows users to hold the device randomly during operating the predefined gestures, which substantially enhances the user experience.