$O_2$ is a multiple context-free grammar: an implementation-, formalisation-friendly proof
Abstract
Classifying formal languages according to the expressiveness of grammars able to generate them is a fundamental problem in computational linguistics and, therefore, in the theory of computation. Furthermore, such kind of analysis can give insight into the classification of abstract algebraic structure such as groups, for example through the correspondence given by the word problem. While many such classification problems remain open, others have been settled. Recently, it was proved that $n$-balanced languages (i.e., whose strings contain the same occurrences of letters $a_i$ and $A_i$ with $1\leq i \leq n$) can be generated by multiple context-free grammars (MCFGs), which are one of the several slight extensions of context free grammars added to the classical Chomsky hierarchy to make the mentioned classification more precise. This paper analyses the existing proofs from the computational and the proof-theoretical point of views, systematically studying whether each proof can lead to a verified (i.e., checked by a proof assistant) algorithm parsing balanced languages via MCFGs. We conclude that none of the existing proofs is realistically suitable against this practical goal, and proceed to provide a radically new, elementary, extremely short proof for the crucial case $n \leq 2$. A comparative analysis with respect to the existing proofs is finally performed to justify why the proposed proof is a substantial step towards concretely obtaining a verified parsing algorithm for $O_2$.
- Publication:
-
arXiv e-prints
- Pub Date:
- May 2024
- DOI:
- 10.48550/arXiv.2405.09396
- arXiv:
- arXiv:2405.09396
- Bibcode:
- 2024arXiv240509396C
- Keywords:
-
- Computer Science - Formal Languages and Automata Theory;
- Computer Science - Artificial Intelligence;
- Computer Science - Logic in Computer Science;
- Mathematics - Logic
- E-Print:
- dlt 2024