Optimally Convergent Autonomous and Decentralized Tasking with Empirical Validation

The space domain awareness (SDA) mission unites a vast array of sensing tools to detect and characterize the local space environment. It is critical that assets operate optimally and in real time with respect to the goals of decision-makers. In practice, this is a challenging problem that combines a combinatorial decision space with extensive decentralized architectures. Additionally, sensing assets traditionally have not been applied in a coordinated manner and are seldom optimized for emerging missions, such as the growing interest in cislunar space. As discussed in a recent memorandum of understanding between NASA and the USSF [1], "current capabilities and architecture are limited by technologies and an architecture designed for a legacy mission." There is thus a clear need for decision-making methodologies that are autonomous, scalable, and near-optimal. This paper presents, to the authors' knowledge, the first methodology developed to solve the fully decentralized SDA sensor tasking problem. A clear path toward decentralization is to characterize the many-agent sensor tasking problem as a sequential decision-making problem. Markov Decision Processes (MDPs) and Markov games are a useful means of describing sensor goals as the maximization of a reward [2]. With this problem definition, the extensive MDP and reinforcement learning literature can be brought to bear on multi-agent reinforcement learning (MARL) applications. In the MARL literature, methodologies are often characterized as cooperative, where agents work toward common or aligned goals; competitive, where agents are in direct or indirect opposition; or mixed, where both collaborative and adversarial goals exist [3]. Many recent methods [4-6] also apply concepts from Monte Carlo Tree Search, a heuristic search methodology demonstrated to be state of the art for general games [7]. These methods are scalable and decentralized, but an additional goal is for such methods to be communication-efficient.
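To make the MDP framing and the tree-search idea concrete, the following is a minimal sketch of UCT, the UCB1-based variant of Monte Carlo Tree Search, applied to a hypothetical single-sensor toy problem. Everything problem-specific here is an illustrative assumption, not the formulation of [4-6] or of this paper: the dynamics (a scalar Kalman variance update with measurement variance `R` and process noise `Q`), the reward (negative total variance), the horizon `HORIZON`, and the exploration constant `c`, which is scaled to the reward magnitude. The exhaustive search is included only as a check that the tree search recovers the optimal first action on this tiny case.

```python
import math
import random

# Hypothetical toy problem (illustrative values, not from the paper):
# one sensor, several targets, each with a scalar position variance.
R = 1.0        # measurement noise variance
Q = 0.1        # process noise added to every target each step
HORIZON = 3    # planning depth in steps


def step(state, action):
    """Observe target `action`; return (next_state, reward)."""
    var = list(state)
    var[action] = var[action] * R / (var[action] + R)  # scalar Kalman update
    var = [v + Q for v in var]                          # uncertainty growth
    return tuple(var), -sum(var)                        # reward: -total variance


class Node:
    def __init__(self):
        self.children = {}  # action -> child Node
        self.n = {}         # action -> times taken from this node
        self.w = {}         # action -> summed return after taking it
        self.visits = 0


def rollout(state, depth, rng):
    """Uniform-random default policy from `state` to the horizon."""
    total = 0.0
    while depth < HORIZON:
        state, r = step(state, rng.randrange(len(state)))
        total += r
        depth += 1
    return total


def simulate(node, state, depth, rng, c):
    """One UCT iteration: UCB1 selection, single expansion, rollout, backup."""
    if depth == HORIZON:
        return 0.0
    untried = [a for a in range(len(state)) if a not in node.children]
    if untried:
        a = rng.choice(untried)
        node.children[a] = Node()
        next_state, r = step(state, a)
        ret = r + rollout(next_state, depth + 1, rng)
    else:
        log_n = math.log(node.visits)
        a = max(node.children,
                key=lambda b: node.w[b] / node.n[b] + c * math.sqrt(log_n / node.n[b]))
        next_state, r = step(state, a)
        ret = r + simulate(node.children[a], next_state, depth + 1, rng, c)
    node.n[a] = node.n.get(a, 0) + 1
    node.w[a] = node.w.get(a, 0.0) + ret
    node.visits += 1
    return ret


def plan(state, iterations, c, seed=0):
    """Run UCT from `state`; recommend the most-visited root action."""
    rng = random.Random(seed)
    root = Node()
    for _ in range(iterations):
        simulate(root, state, 0, rng, c)
    return max(root.n, key=root.n.get)


def best_return(state, depth=0):
    """Exhaustive optimum, feasible only for tiny toy cases (used as a check)."""
    if depth == HORIZON:
        return 0.0
    return max(r + best_return(s, depth + 1)
               for s, r in (step(state, a) for a in range(len(state))))


def best_first_action(state):
    """First action of an exhaustively optimal plan."""
    def value(a):
        s, r = step(state, a)
        return r + best_return(s, 1)
    return max(range(len(state)), key=value)


if __name__ == "__main__":
    s0 = (50.0, 1.0, 1.0)  # target 0 is far less certain than the others
    print(plan(s0, iterations=2000, c=10.0), best_first_action(s0))  # prints: 0 0
```

Statistics are stored on the parent's edges (`n`, `w`) rather than on child nodes, so each UCB1 score already includes the immediate reward of the action it scores.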
This goal is of utmost importance when assets are isolated or at risk; further, methodologies that are robust to denial of communication are also desired. This paper addresses this gap in the literature, presenting an optimally convergent decentralized decision-making algorithm that applies robust communication between agents over a random graph [8]. Using concepts from random regular digraphs, we present requirements that guarantee agents can communicate with arbitrarily high probability, even under denial of lines of communication. This concept is described mathematically using the notions of digraph strong connectivity and strong k-connectivity. By probabilistically inferring the diameter of the resulting graph, we establish upper bounds on the communication time between any two agents. With further assumptions on the impact of discrete communication on locally optimal actions, we perform an asymptotic analysis of the resulting tree search algorithm. Both theoretical and numerical convergence toward optima are demonstrated. In addition to this theoretical contribution, we apply the developed methodology to two test cases. First, the tasking techniques are demonstrated using the Vision, Autonomy and Decision Research (VADeR) observatory at the University of Colorado at Boulder. Autonomous tasking is performed for a 0.7 m f/6.5 observing telescope and an array of four co-aligned 0.2 m f/3 search telescopes over the course of spring and summer 2023. Second, a simulation combines a ground-based observer with GEO and XGEO space-based sensors for cislunar space object tracking. The tasking in this simulation is robust to the loss of the ground-based observer for part of the test case, demonstrating that space-based sensors may operate autonomously in a near-optimal manner using the developed methodology.
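The connectivity argument can be explored numerically. The sketch below builds a digraph in which every agent has exactly `d` outgoing and `d` incoming links, formed as the union of `d` random permutations, each resampled until it adds no self-link or duplicate link. This construction is an assumption chosen for brevity; it is not claimed to sample uniformly from regular digraphs or to match the paper's construction. The sketch then computes the diameter by breadth-first search (returning `None` when the digraph is not strongly connected) and estimates by Monte Carlo how often strong connectivity survives the denial of `k` randomly chosen links. If each link hop costs one communication round, the diameter bounds the number of rounds a flooded message needs to reach every agent. Random denial is weaker than the worst-case guarantee that strong k-connectivity provides, so the survival rate is an empirical indicator, not a certificate. The function names are hypothetical.

```python
import random
from collections import deque


def random_regular_digraph(n, d, rng):
    """Every agent gets exactly d out-links and d in-links, with no self-links
    or duplicates: the union of d random permutations, each resampled until it
    is compatible with the links already placed (not a uniform sampler)."""
    assert 0 < d < n
    succ = [set() for _ in range(n)]
    for _ in range(d):
        while True:
            perm = list(range(n))
            rng.shuffle(perm)
            if all(perm[i] != i and perm[i] not in succ[i] for i in range(n)):
                break
        for i, j in enumerate(perm):
            succ[i].add(j)
    return succ


def hop_distances(succ, src, denied=frozenset()):
    """BFS hop counts from agent src, skipping denied (u, v) links."""
    dist = {src: 0}
    queue = deque([src])
    while queue:
        u = queue.popleft()
        for v in succ[u]:
            if (u, v) not in denied and v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def diameter(succ, denied=frozenset()):
    """Largest hop count over all ordered agent pairs, or None if some agent
    cannot reach another (the digraph is not strongly connected)."""
    n = len(succ)
    worst = 0
    for src in range(n):
        dist = hop_distances(succ, src, denied)
        if len(dist) < n:
            return None
        worst = max(worst, max(dist.values()))
    return worst


def survival_rate(succ, k, trials, rng):
    """Fraction of trials in which the digraph stays strongly connected after
    k links, chosen uniformly at random, are denied."""
    links = [(u, v) for u in range(len(succ)) for v in succ[u]]
    kept = 0
    for _ in range(trials):
        denied = frozenset(rng.sample(links, k))
        kept += diameter(succ, denied) is not None
    return kept / trials


if __name__ == "__main__":
    rng = random.Random(0)
    g = random_regular_digraph(30, 3, rng)
    print("diameter (hops):", diameter(g))
    print("strongly connected after 2 random denials:", survival_rate(g, 2, 200, rng))
```

Raising `d` typically shortens the diameter and raises the survival rate, at the cost of more links maintained per agent.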
The presented results demonstrate the methodology, but the true impact of this research lies in its scalability to many-agent and distributed problems. The presented methodology does not increase in computational complexity during search as more agents are added to the problem. Significant relative gains in communication are made as further agents are introduced, and the methodology is robust to the isolation of sensing assets. As such, there is great potential for applying this research to the existing portfolio of commercial and governmental sensing assets.

References:
[1] Memorandum of Understanding between the National Aeronautics and Space Administration and the United States Space Force, NASA.gov, September 21, 2020, accessed February 27, 2023.
[2] Hu, J., and Wellman, M. P., "Multiagent Reinforcement Learning: Theoretical Framework and an Algorithm," ICML, Vol. 98, 1998, pp. 242-250. doi:10.1007/bf01769133.
[3] Zhang, K., Yang, Z., and Basar, T., "Multi-Agent Reinforcement Learning: A Selective Overview of Theories and Algorithms," Studies in Systems, Decision and Control, Vol. 325, 2021, pp. 321-384. doi:10.1007/978-3-030-60990-0_12.
[4] Amato, C., and Oliehoek, F. A., "Scalable planning and learning for multiagent POMDPs," Proceedings of the National Conference on Artificial Intelligence, Vol. 3, 2015, pp. 1995-2002. doi:10.1609/aaai.v29i1.9439.
[5] Best, G., Cliff, O. M., Patten, T., Mettu, R. R., and Fitch, R., "Dec-MCTS: Decentralized planning for multi-robot active perception," International Journal of Robotics Research, Vol. 38, No. 2-3, 2019, pp. 316-337. doi:10.1177/0278364918755924.
[6] Fedeler, S., Holzinger, M., and Whitacre, W., "Sensor tasking in the cislunar regime using Monte Carlo Tree Search," Advances in Space Research, No. xxxx, 2022, pp. 1-19. doi:10.1016/j.asr.2022.05.003, URL https://doi.org/10.1016/j.asr.2022.05.003.
[7] Silver, D., Hubert, T., Schrittwieser, J., Antonoglou, I., Lai, M., Guez, A., Lanctot, M., Sifre, L., Kumaran, D., Graepel, T., Lillicrap, T., Simonyan, K., and Hassabis, D., "A general reinforcement learning algorithm that masters chess, shogi, and Go through self-play," Science, Vol. 362, No. 6419, 2018, pp. 1140-1144. doi:10.1126/science.aar6404.
[8] Frieze, A., and Karoński, M., Introduction to Random Graphs, Cambridge University Press, 2015. doi:10.1017/cbo9781316339831.

Publication:

Proceedings of the Advanced Maui Optical and Space Surveillance (AMOS) Technologies Conference

Pub Date:

September 2023

Bibcode:

2023amos.conf....4F

Keywords:

optimal sensor tasking;
decentralized decision making;
cislunar SSA
