Ethosight: A Reasoning-Guided Iterative Learning System for Nuanced Perception based on Joint-Embedding & Contextual Label Affinity
Abstract
Traditional computer vision models often necessitate extensive data acquisition, annotation, and validation. These models frequently struggle in real-world applications, resulting in high false positive and negative rates, and exhibit poor adaptability to new scenarios, often requiring costly retraining. To address these issues, we present Ethosight, a flexible and adaptable zero-shot video analytics system. Ethosight begins from a clean slate based on user-defined video analytics, specified through natural language or keywords, and leverages joint embedding models and reasoning mechanisms informed by ontologies such as WordNet and ConceptNet. Ethosight operates effectively on low-cost edge devices and supports enhanced runtime adaptation, thereby offering a new approach to continuous learning without catastrophic forgetting. We provide empirical validation of Ethosight's promising effectiveness across diverse and complex use cases, while highlighting areas for further improvement. A significant contribution of this work is the release of all source code and datasets to enable full reproducibility and to foster further innovation in both the research and commercial domains.
- Publication:
-
arXiv e-prints
- Pub Date:
- July 2023
- DOI:
- 10.48550/arXiv.2307.10577
- arXiv:
- arXiv:2307.10577
- Bibcode:
- 2023arXiv230710577L
- Keywords:
-
- Computer Science - Computer Vision and Pattern Recognition;
- Computer Science - Artificial Intelligence