Assessing the best edit in perturbation-based iterative refinement algorithms to compute the median string
Strings are a natural representation of biological data such as DNA, RNA and protein sequences. The problem of finding a string that summarizes a set of sequences has direct application in relative compression algorithms for genome and proteome analysis, where reference sequences need to be chosen. Median strings have been used as representatives of a set of strings in different domains. However, several formulations of those problems are NP-Complete. Alternatively, heuristic approaches that iteratively refine an initial coarse solution by applying edit operations have been proposed. Recently, we investigated the selection of the optimal edit operations to speed up convergence without spoiling the quality of the approximated median string. We propose a novel algorithm that outperforms state of the art heuristic approximations to the median string in terms of convergence speed by estimating the effect of a perturbation in the minimization of the expressions that define the median strings. We present corpus of comparative experiments to validate these results.