Ns or subdomains [23,24]. Large-scale movements of groups of atoms complicate the correct identification of structural equivalences between related proteins when rigid structural aligners are used. The molecular chaperone GroEL is an interesting example of a protein exhibiting pronounced flexibility between structurally conserved domains. By comparison of crystal structures of different functional states, the GroEL molecule can be divided into three domains (equatorial, intermediate and apical) separated by hinge regions [25]. Due to the large relative motion of the domains between functional states, rigid-body aligners will typically fail to align crystal structures of GroEL that differ in sequence and in conformational state.
In recent years, tools for the flexible alignment of protein structures have been introduced. These tools find an equivalence map between the residues of two molecular structures even when substantial intramolecular movements occur around molecular hinges. The regions between hinge points are commonly treated as rigid bodies, and the alignment is usually optimized to minimize the number of hinges. The group of ‘flexible aligners’ includes FlexProt [26] and FATCAT [18] and their corresponding extensions to multiple alignment, MultiProt [27] and POSA [28]. However, in molecules such as GroEL the polypeptide chain folds back onto itself (Figure 1), creating structural domains in which parts of the chain that are distant in sequence come into stable contact in three-dimensional space (e.g. the equatorial domain of GroEL, see below). In alignments of such molecules, many of these aligners have difficulty recognizing the spatial continuity, as will be illustrated below.
Here we introduce a new algorithm for the flexible structural alignment of proteins called RAPIDO (for Rapid Alignment of Proteins in terms of Domains).
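The effect of a hinge motion on rigid superposition, as discussed above for GroEL, can be illustrated with a small toy calculation. The following is a minimal sketch and not part of RAPIDO: it assumes a hypothetical two-dimensional "molecule" made of two point clouds (`domain_a`, `domain_b`) with one domain rotated by 90 degrees about an arbitrary hinge point, and it uses the closed-form least-squares rotation that exists in two dimensions. All names, coordinates and the rotation angle are illustrative assumptions.

```python
import math
import random


def superposed_rmsd(p, q):
    """RMSD between 2D point lists p and q after optimal rigid superposition.

    Both sets are moved to a common centroid, then p is rotated by the
    least-squares optimal angle (closed form in two dimensions).
    """
    n = len(p)
    pcx = sum(x for x, _ in p) / n
    pcy = sum(y for _, y in p) / n
    qcx = sum(x for x, _ in q) / n
    qcy = sum(y for _, y in q) / n
    pc = [(x - pcx, y - pcy) for x, y in p]
    qc = [(x - qcx, y - qcy) for x, y in q]
    # The angle maximizing sum(q . R p) is atan2(sum of cross, sum of dot).
    dot = sum(px * qx + py * qy for (px, py), (qx, qy) in zip(pc, qc))
    cross = sum(px * qy - py * qx for (px, py), (qx, qy) in zip(pc, qc))
    theta = math.atan2(cross, dot)
    c, s = math.cos(theta), math.sin(theta)
    sq = 0.0
    for (px, py), (qx, qy) in zip(pc, qc):
        rx, ry = c * px - s * py, s * px + c * py
        sq += (rx - qx) ** 2 + (ry - qy) ** 2
    return math.sqrt(sq / n)


def rotate_about(points, pivot, degrees):
    """Rotate points counter-clockwise about pivot by the given angle."""
    a = math.radians(degrees)
    c, s = math.cos(a), math.sin(a)
    px, py = pivot
    return [(px + c * (x - px) - s * (y - py),
             py + s * (x - px) + c * (y - py)) for x, y in points]


# Hypothetical two-domain toy molecule (coordinates are made up).
rng = random.Random(0)
domain_a = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(20)]
domain_b = [(rng.gauss(6, 1), rng.gauss(0, 1)) for _ in range(20)]

# Conformation 2: domain B swings by 90 degrees about an assumed hinge point.
conf1 = domain_a + domain_b
conf2 = domain_a + rotate_about(domain_b, (3.0, 0.0), 90)

rmsd_whole = superposed_rmsd(conf1, conf2)       # single rigid body
rmsd_a = superposed_rmsd(conf1[:20], conf2[:20])  # domain A alone
rmsd_b = superposed_rmsd(conf1[20:], conf2[20:])  # domain B alone

print(f"whole molecule: {rmsd_whole:.3f}")
print(f"domain A only:  {rmsd_a:.3e}")
print(f"domain B only:  {rmsd_b:.3e}")
```

In this sketch each domain superimposes essentially perfectly on its own, whereas treating the whole molecule as one rigid body leaves a large residual. This is the situation in which a description in terms of rigid regions connected by hinges becomes useful.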
RAPIDO is capable of aligning related protein molecules in the presence of large conformational differences, while at the same time correctly identifying groups of equivalent parts of the polypeptide chain that are distant in sequence but nevertheless form spatially continuous domains. As a first step, RAPIDO creates an equivalence map between the two structures, taking flexibility into account, with a procedure similar to the one used by FATCAT [18]. This step is followed by the application of a genetic algorithm [29] to identify structurally conserved regions that can be continuous in space but not in sequence (e.g. the equatorial domain of GroEL). The result of the procedure is a description of a protein in terms of structurally conserved regions connected by localized hinges or by flexible linker regions. We have chosen the standard parameter settings for RAPIDO such that more emphasis is placed on the geometric similarity of the structurally conserved regions (as reflected in low RMSDs) than on their size (as reflected in the length of the alignments). With this choice, the resulting structurally conserved regions have a high level of similarity, which allows them to be used for robust coordinate-based structure superpositions. In the following, we describe the algorithm and the application of RAPIDO to a number of test cases. For all test cases, RAPIDO produces results that are in agreement with previous analyses. Regions identified as structurally conserved furnish subsets of atoms whose relative positions between diff.