Abstract
The $t\bar{t}H(b\bar{b})$ process is an essential channel for probing the properties of the Higgs boson; however, its final state has an irreducible background from the $t\bar{t}b\bar{b}$ process, in which a top quark pair is produced in association with a b quark pair. Understanding the $t\bar{t}b\bar{b}$ process is therefore crucial for improving the sensitivity of searches for the $t\bar{t}H(b\bar{b})$ process. To this end, when measuring the differential cross section of the $t\bar{t}b\bar{b}$ process, we need to distinguish the b-jets originating from top quark decays from the additional b-jets originating from gluon splitting. In this paper, we train deep neural networks that identify the additional b-jets in $t\bar{t}b\bar{b}$ events, supervised by a simulated $t\bar{t}b\bar{b}$ event data set in which the true additional b-jets are labeled. By exploiting the special structure of the $t\bar{t}b\bar{b}$ event data, we propose several loss functions and minimize them to directly increase the matching efficiency, i.e., the accuracy of identifying additional b-jets. In a proof-of-concept experiment using synthetic data, we show that our method can be more effective at improving matching efficiency than the deep learning-based binary classification approach presented in [1]. Using simulated $t\bar{t}b\bar{b}$ event data in the lepton+jets channel from $pp$ collisions at $\sqrt{s} = 13$ TeV, we then verify that our method identifies additional b-jets more accurately: compared with the approach in [1], the matching efficiency improves from 62.1% to 64.5% for the leading-order simulation and from 59.9% to 61.7% for the next-to-leading-order simulation.
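To make the figure of merit concrete, the following is a minimal sketch of how a matching efficiency could be computed from per-jet scores. It rests on assumptions not stated in the abstract: that each event contains exactly two true additional b-jets, that a model assigns one score per jet, and that an event counts as matched only when the two highest-scoring jets are exactly the true pair. The function name `matching_efficiency` and the toy numbers are hypothetical and are not taken from the paper.

```python
def matching_efficiency(scores, true_pairs):
    """Fraction of events whose two top-scoring jets equal the true additional b-jet pair.

    scores:     list of per-event lists, one score per jet (assumed model output).
    true_pairs: list of per-event (i, j) jet indices of the true additional b-jets.
    """
    matched = 0
    for event_scores, true_pair in zip(scores, true_pairs):
        # Rank jet indices by score and keep the two highest (order is irrelevant).
        ranked = sorted(range(len(event_scores)), key=lambda i: event_scores[i])
        predicted = set(ranked[-2:])
        if predicted == set(true_pair):
            matched += 1
    return matched / len(scores)


# Hypothetical toy input: three events with different jet multiplicities.
scores = [
    [0.1, 0.8, 0.7, 0.2],       # top two jets: 1 and 2
    [0.9, 0.3, 0.4, 0.1, 0.2],  # top two jets: 0 and 2
    [0.2, 0.6, 0.1, 0.5],       # top two jets: 1 and 3
]
true_pairs = [(1, 2), (0, 1), (1, 3)]

efficiency = matching_efficiency(scores, true_pairs)
print(f"matching efficiency = {efficiency:.3f}")
```

Under this event-level definition, a per-jet binary classifier and a model trained to rank the correct pair highest can reach different efficiencies even when their per-jet accuracies are similar, which is the distinction the abstract draws between the two approaches.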