Block-SCL: Blocking Matters for Supervised Contrastive Learning in Product Matching
Abstract
Product matching is a fundamental step for the global understanding of consumer behavior in e-commerce. In practice, product matching refers to the task of deciding if two product offers from different data sources (e.g. retailers) represent the same product. Standard pipelines use a previous stage called blocking, where for a given product offer a set of potential matching candidates are retrieved based on similar characteristics (e.g. same brand, category, flavor, etc.). From these similar product candidates, those that are not a match can be considered hard negatives. We present Block-SCL, a strategy that uses the blocking output to make the most of Supervised Contrastive Learning (SCL). Concretely, Block-SCL builds enriched batches using the hard-negatives samples obtained in the blocking stage. These batches provide a strong training signal leading the model to learn more meaningful sentence embeddings for product matching. Experimental results in several public datasets demonstrate that Block-SCL achieves state-of-the-art results despite only using short product titles as input, no data augmentation, and a lighter transformer backbone than competing methods.
- Publication:
-
arXiv e-prints
- Pub Date:
- July 2022
- DOI:
- 10.48550/arXiv.2207.02008
- arXiv:
- arXiv:2207.02008
- Bibcode:
- 2022arXiv220702008A
- Keywords:
-
- Computer Science - Computation and Language
- E-Print:
- 7 pages, 2 figures, e-commerce, conference