As a genome describes the genetic content of an organism, a proteome defines the protein complement of the genome. Proteomics includes the identification of proteins in biological tissues, the characterisation of their physicochemical properties (complete sequence, post-translational modifications), and the description of their behaviour (function, expression level). After processing and modifications, a single gene may express between one and a few dozen different protein products; by extrapolation, the ~50,000 human genes could produce over 500,000 different proteins.
A combination of technologies is required to characterise a proteome fully. A standard procedure is two-dimensional gel electrophoresis (2-DE) as the separation method, followed by mass spectrometry (MS) analysis of the separated and enzymatically digested proteins. The peptide mass fingerprints typically obtained by MALDI-TOF MS are matched against sequence databases using dedicated bioinformatics tools.
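The database-matching step described above can be illustrated with a minimal sketch. It is not the algorithm of any particular search tool: it assumes trypsin as the digestion enzyme, no missed cleavages, unmodified peptides, singly protonated ions and a fixed mass tolerance, and the function names (`tryptic_digest`, `mh_plus`, `best_match`) are hypothetical. Each candidate sequence is digested in silico, and the candidate whose theoretical peptide masses explain the most observed peaks is returned.

```python
# Monoisotopic residue masses (Da) of the 20 standard amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056   # added once per peptide (free N- and C-termini)
PROTON = 1.00728   # singly charged ion, [M+H]+


def tryptic_digest(sequence):
    """Cleave after K or R unless followed by P (assumption: no missed cleavages)."""
    peptides, start = [], 0
    for i, aa in enumerate(sequence):
        next_aa = sequence[i + 1] if i + 1 < len(sequence) else ""
        if aa in "KR" and next_aa != "P":
            peptides.append(sequence[start:i + 1])
            start = i + 1
    if start < len(sequence):
        peptides.append(sequence[start:])
    return peptides


def mh_plus(peptide):
    """Theoretical monoisotopic [M+H]+ mass of an unmodified peptide."""
    return sum(RESIDUE_MASS[aa] for aa in peptide) + WATER + PROTON


def count_matches(observed_peaks, sequence, tol_da=0.2):
    """Number of observed peaks within tol_da of any theoretical peptide mass."""
    theoretical = [mh_plus(p) for p in tryptic_digest(sequence)]
    return sum(
        any(abs(obs - theo) <= tol_da for theo in theoretical)
        for obs in observed_peaks
    )


def best_match(observed_peaks, database, tol_da=0.2):
    """Name of the database entry that explains the most observed peaks."""
    return max(
        database,
        key=lambda name: count_matches(observed_peaks, database[name], tol_da),
    )
```

Ranking by raw match count is the simplest possible score; it is used here only to keep the sketch short, and it ignores sequence length and peak intensity.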
The whole procedure can be automated and robotised for high-throughput purposes. One aim of the programme will be to promote the development of new methods themselves, as well as application-driven projects that use the new technologies as tools. For example, it is important to develop further high-throughput techniques that separate efficiently and identify quickly the majority of proteins. More specific technologies will have to be used to identify proteins for which the main approach yields no positive hit, and to characterise individual post-translational modifications which, while typically not deducible from the gene sequence alone, carry important functional implications. The huge amount of experimental data generated has to be stored in dedicated databases, which can then be searched for pattern matching, characterisation or functional relevance studies. It is important that common and appropriate database formats and analysis tools be designed to allow easy access to data and straightforward data interpretation.
In the cell, proteins do not act in isolation but usually form transitory or stable complexes in order to participate in pathways and act in networks. Protein-protein interactions thus constitute an essential aspect of the normal workings of the living cell, and unravelling the interactions in which individual proteins are involved is an invaluable way of understanding protein function. Recently, Fromont-Racine et al. (Institut Pasteur, Paris) developed a high-throughput, genome-wide version of the yeast two-hybrid system to create Protein Interaction Maps (PIMs) for whole cells.
The automated generic version of Fromont-Racine’s procedure is rapidly becoming the method of choice for mapping whole proteomes. With a yeast cell mating procedure that increases screening efficiency, Fromont-Racine et al. used their complex yeast genomic library of 5 million clones to test 700 million interactions against 15 proteins. They identified and classified 170 potential interactors, including approximately 70 proteins of previously unknown function. More than 25% of the interactors are probably biologically relevant. The achievements of this group have opened the way to the systematic analysis of the protein interaction networks of the 6,000 open reading frames of the yeast proteome.
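To make the notion of a Protein Interaction Map concrete, the following sketch shows one way two-hybrid results could be assembled into a network. It is an illustration under stated assumptions, not the procedure used by the authors: each screen result is taken to be a (bait, prey) pair, interactions are treated as undirected, and the function names and protein identifiers are hypothetical.

```python
from collections import defaultdict


def build_interaction_map(hits):
    """Build an undirected adjacency map from (bait, prey) pairs."""
    pim = defaultdict(set)
    for bait, prey in hits:
        pim[bait].add(prey)
        pim[prey].add(bait)
    return dict(pim)


def connected_component(pim, start):
    """Return every protein reachable from `start` by following interactions."""
    seen, stack = {start}, [start]
    while stack:
        node = stack.pop()
        for neighbour in pim.get(node, ()):
            if neighbour not in seen:
                seen.add(neighbour)
                stack.append(neighbour)
    return seen


# Hypothetical screen results: identifiers are placeholders, not real data.
hits = [
    ("bait1", "preyA"),
    ("bait1", "preyB"),
    ("bait2", "preyB"),
    ("bait3", "preyC"),
]
pim = build_interaction_map(hits)
network = connected_component(pim, "bait1")
```

In this toy data set, `bait1` and `bait2` end up in the same network because they share the interactor `preyB`, which is how separate screens become linked into a larger map.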
Another European team (Hybrigenics, Paris) has adapted the Fromont-Racine procedure to analyse a bacterial genome and has linked half of the 1600 proteins of the ulcer-causing bacterium Helicobacter pylori into partial PIMs. Hybrigenics has developed the ‘PIM Rider’ tool to score interactions and visualise PIMs; it also identifies the Selected Interacting Domains (SIDs) involved in the various protein-protein interactions listed in the database. Other proteomics technologies will also be included in the programme, such as the application of phage and ribosome display libraries and the purification of complexes.
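The idea of a Selected Interacting Domain can be sketched as follows. The definition used here is an assumption made for illustration and is not taken from the PIM Rider tool itself: the domain is taken to be the region common to all prey fragments of one protein that were found to interact with a given bait, with each fragment given as inclusive (start, end) residue positions. The function name is hypothetical.

```python
def selected_interacting_domain(fragments):
    """Return the (start, end) region shared by all fragments, or None.

    `fragments` is a list of inclusive (start, end) residue positions on the
    same prey protein. Assumption: the domain is the intersection of all
    fragments; if they do not all overlap, no common domain is reported.
    """
    if not fragments:
        return None
    start = max(s for s, _ in fragments)
    end = min(e for _, e in fragments)
    return (start, end) if start <= end else None


# Hypothetical prey fragments interacting with the same bait.
sid = selected_interacting_domain([(10, 120), (40, 150), (30, 110)])
```

For the three hypothetical fragments above, the shared region runs from residue 40 to residue 110.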