PROTEIN SURFACE SIMILARITIES EVALUATION FOR FUNCTIONAL ANNOTATION STUDIES / P.A. Cozzi; school director: Maria Luisa Villa; supervisor: Paola Comi; co-supervisor: Luciano Milanesi. Università degli Studi di Milano, 2010 Dec 09. 23rd cycle, Academic Year 2010. [10.13130/cozzi-paolo-alessandro_phd2010-12-09].

PROTEIN SURFACE SIMILARITIES EVALUATION FOR FUNCTIONAL ANNOTATION STUDIES

P.A. Cozzi
2010

Abstract

One of the main goals of bioinformatics is to assign functions to proteins of unknown function by identifying homologies with proteins of known function. Several approaches are currently available; the best choice depends on the evolutionary distance that separates the protein of interest from its homologs. Recently, attention has focused on molecular surfaces, since they do not depend on the three-dimensional structure and allow the identification of similarities that other methods cannot detect. Furthermore, molecular surfaces are the interface of interaction between molecules, and their geometrical and physical description will lead to an understanding of the molecular recognition process, since the geometrical component plays a fundamental role in the early stage of complex formation. This aspect would have a major impact on drug design and on the understanding of side effects due to interactions between proteins. In this thesis, a protocol for identifying similarities on molecular surfaces was developed and optimized. In this protocol, molecular surfaces are calculated according to the Lee and Richards model and then represented as triangular meshes. Subsequently, the surfaces are transformed into a set of object-oriented images using a computer vision approach. This type of representation has the advantage of being independent of the position of the objects represented, so similar surfaces are described by similar images. The search for similarities is then performed by identifying correspondences between pairs of similar images, filtering the matches according to geometrical criteria, and clustering the correspondences into high-similarity groups. These groups are then used to align the surfaces so that the results can be evaluated both by visual inspection and through appropriate indexes.
This process can be applied to functional annotation, through the identification of similarities between the surfaces of homologous proteins, and to the study of protein-protein interactions, through the identification of complementary areas between interacting proteins. The whole similarity detection process depends on the configuration of 15 parameters that balance computation time against the quality of the results. The parameter estimation problem was addressed with an implementation of a genetic algorithm, which represents different parameter configurations as a population in which individuals that align surfaces satisfactorily are rewarded with a high fitness score. The effectiveness of the algorithm was then improved by introducing a neighbor heuristic, which reduced the computational time required for correspondence clustering on surfaces. Particular attention was paid to displaying the results and to constructing indexes that quantify their quality. For visualization, a display system based on the Visualization Toolkit (VTK) libraries was implemented to represent aligned surfaces as objects in three-dimensional space, allowing the user to interact with the scene by changing the point of view or enlarging details. For the evaluation of results, two indexes played a fundamental role. The first, called the overlap index, measures the percentage of vertices of the two surfaces that are closer than 1 Å after alignment. This index is particularly useful for evaluating surface similarity, since similar aligned surfaces have a large number of vertices closer than this distance. The second index, the RMSD, is the Root Mean Square Deviation of the alpha carbons of two aligned proteins and is used in complementarity searches.
This index measures how far the aligned protein is from its correct position in the crystal complex. Concerning the evaluation of results, we noticed that taking the electrostatic potential into account allows good scores to be assigned in cases of strong geometrical similarity in the context of functional annotation, thus facilitating the identification of homologous surfaces. The method was validated both in the search for similarities and in the search for complementarities. For the similarity search, we analyzed a sample of 13 known proteins with a PROSITE domain in order to identify the presence of such domains on molecular surfaces. To do this, we first reduced the structures in the Protein Data Bank to a group of representative structures. We then calculated the molecular surface of each representative protein and created a dataset of patches corresponding to the PROSITE functional domains. The test was performed by trying to align the surfaces of the 13 known proteins to this dataset of functional domain patches. The results showed that in most cases we were able to properly align a functional domain to a protein surface carrying the same functional domain, and that this was easily identifiable both from the parameters used to evaluate the results and by visual inspection of the alignments. The method was then tested in the complementarity search by trying to reconstruct the protein-protein complexes in a well-known dataset used to validate docking methods. When searching for similarities, it is important to describe surfaces in detail in order to increase accuracy, but high precision is counterproductive when searching for complementarity, since the interaction between proteins is determined not only by geometrical features but also by the formation of favorable electrostatic interactions and rearrangements of side chains.
Thus, molecular surfaces were computed as smoothed surfaces, in which most details are lost but interacting surfaces are detected more easily. The results showed that the algorithm is able to align complexes with scores comparable to those of currently available programs. Considering this experimental design, and that the method does not take the electrostatic potential into account, we consider the results particularly interesting, since the proposed method provides a wider set of conformations than other algorithms, on which the analysis can be extended in order to identify a better prediction. In conclusion, the proposed system is able to identify similarities on molecular surfaces through the analysis of local-description images. The results show that the implemented system is effective in identifying similar surface areas in the context of functional annotation. As for the search for complementarities, the algorithm appears promising, even though the best proposed complex is not always biologically correct. Further analysis is needed to improve the method for protein interaction studies.
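
The two evaluation indexes described in the abstract can be illustrated with a minimal Python sketch. This is not the thesis implementation: the function names `overlap_index` and `rmsd` are hypothetical, and the sketch assumes that a vertex counts as overlapping when at least one vertex of the other surface lies closer than the cutoff, that the percentage is taken over the vertices of both surfaces, and that the alpha carbons are supplied as already paired coordinate lists. A brute-force nearest-vertex search is used for clarity only.

```python
import math


def _has_close_vertex(point, others, cutoff):
    """Return True if any vertex in `others` is closer than `cutoff` to `point`."""
    return any(math.dist(point, q) < cutoff for q in others)


def overlap_index(surface_a, surface_b, cutoff=1.0):
    """Percentage of vertices (over both surfaces) that have a vertex of the
    other surface closer than `cutoff` (in angstroms) after alignment.

    Assumption: this reading of "vertices closer than 1 angstrom" is one
    plausible interpretation, not necessarily the definition used in the thesis.
    """
    if not surface_a or not surface_b:
        return 0.0
    close_a = sum(_has_close_vertex(p, surface_b, cutoff) for p in surface_a)
    close_b = sum(_has_close_vertex(q, surface_a, cutoff) for q in surface_b)
    return 100.0 * (close_a + close_b) / (len(surface_a) + len(surface_b))


def rmsd(coords_a, coords_b):
    """Root mean square deviation between two equally long lists of paired
    3D coordinates (for example, matched alpha carbons)."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    squared = sum(math.dist(p, q) ** 2 for p, q in zip(coords_a, coords_b))
    return math.sqrt(squared / len(coords_a))


# Toy data: one vertex pair lies within 1 angstrom, the other pair does not.
surface_a = [(0.0, 0.0, 0.0), (5.0, 0.0, 0.0)]
surface_b = [(0.5, 0.0, 0.0), (9.0, 0.0, 0.0)]
print(overlap_index(surface_a, surface_b))  # 50.0

# Two alpha-carbon traces shifted by 2 angstroms along z.
predicted = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
reference = [(0.0, 0.0, 2.0), (1.0, 0.0, 2.0)]
print(rmsd(predicted, reference))  # 2.0
```

For real triangular meshes with many thousands of vertices, a spatial index such as a k-d tree would be a more practical choice than the quadratic search shown here.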
9 Dec 2010
Sector MED/46 - Scienze Tecniche di Medicina di Laboratorio (Technical Sciences of Laboratory Medicine)
molecular surface ; molecular visualization ; surface similarities
COMI, PAOLA PIERA MARIA
VILLA, MARIA LUISA
Doctoral Thesis
Files in this item:
phd_unimi_R07687.pdf (open access)
Type: Complete doctoral thesis
Size: 2.48 MB
Format: Adobe PDF
Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/2434/150064