Improving Address Matching using Siamese Transformer Networks
Abstract
Matching addresses is a critical task for companies and post offices involved in processing and delivering packages. Delivering a package to the wrong recipient has numerous consequences, ranging from harm to the company's reputation to economic and environmental costs. This research introduces a deep learning-based model designed to improve the efficiency of address matching for Portuguese addresses. The model comprises two parts: (i) a bi-encoder, fine-tuned to create meaningful embeddings of Portuguese postal addresses, which is used to retrieve the 10 most likely matches for an un-normalized target address from a normalized database; and (ii) a cross-encoder, fine-tuned to accurately rerank the 10 addresses retrieved by the bi-encoder. The model has been tested on a real-case scenario of Portuguese addresses and achieves an accuracy exceeding 95% at the door level. When run on a GPU, its inference is about 4.5 times faster than traditional approaches such as BM25. Deploying this system in a real-world scenario would substantially increase the effectiveness of the distribution process; such a deployment is currently under investigation.
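The two-stage retrieve-and-rerank flow described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: the trigram `embed` function and the `difflib`-based `cross_score` are hypothetical stand-ins for the fine-tuned bi-encoder and cross-encoder (no transformer model is loaded), and the toy address list is made up. Only the control flow mirrors the abstract: addresses are encoded independently to shortlist the top `k=10` candidates, and then each (query, candidate) pair is scored jointly to rerank that shortlist.

```python
import math
from collections import Counter
from difflib import SequenceMatcher


def embed(address: str) -> Counter:
    """Hypothetical stand-in for the bi-encoder: encodes ONE address on its own.

    Uses a bag of character trigrams instead of a fine-tuned transformer.
    """
    text = f"  {address.lower()} "
    return Counter(text[i:i + 3] for i in range(len(text) - 2))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse trigram-count vectors."""
    dot = sum(count * b[gram] for gram, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def cross_score(query: str, candidate: str) -> float:
    """Hypothetical stand-in for the cross-encoder: scores the PAIR jointly.

    difflib's similarity ratio looks at both strings together, unlike embed().
    """
    return SequenceMatcher(None, query.lower(), candidate.lower()).ratio()


def match(query: str, database: list[str], k: int = 10) -> list[str]:
    """Retrieve the top-k candidates with the bi-encoder, then rerank them."""
    # Stage 1: independent embeddings. In a real system the database vectors
    # would be precomputed once; they are recomputed here for brevity.
    db_vecs = [embed(addr) for addr in database]
    q_vec = embed(query)
    by_similarity = sorted(
        range(len(database)),
        key=lambda i: cosine(q_vec, db_vecs[i]),
        reverse=True,
    )
    shortlist = [database[i] for i in by_similarity[:k]]
    # Stage 2: the (more expensive) joint scorer only sees the k shortlisted pairs.
    return sorted(shortlist, key=lambda addr: cross_score(query, addr), reverse=True)


# Made-up toy "normalized database" for illustration only.
database = [
    "Rua Augusta 100, Lisboa",
    "Rua Augusta 10, Lisboa",
    "Avenida da Liberdade 100, Lisboa",
    "Rua de Santa Catarina 100, Porto",
]
print(match("r augusta 100 lisboa", database, k=3))
```

The split reflects the usual trade-off behind this design: a bi-encoder lets database embeddings be computed once and compared cheaply, while the more accurate but costlier joint scoring is applied only to the small shortlist.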
- Publication: arXiv e-prints
- Pub Date: July 2023
- DOI: 10.48550/arXiv.2307.02300
- arXiv: arXiv:2307.02300
- Bibcode: 2023arXiv230702300D
- Keywords: Computer Science - Machine Learning; Computer Science - Information Retrieval; I.2
- E-Print: To be published in the 22nd EPIA Conference on Artificial Intelligence, EPIA 2023, Faial Island - Azores, Portugal, 5-8 September 2023, Proceedings