|
The GPMDB contains thousands of data sets contributed by researchers around the world.
Every week, we select a data set because of its technical excellence, biological interest
or simply because we think it is of general interest to the proteomics community.
|
Data set of the week: (2010/12/19) An Expanded Oct4 Interaction Network: Implications for Stem Cell Biology, Development, and Disease.
This study contains 7
LC/MS/MS runs from pull-down experiments.
The manuscript describing this work was published by
Pardo M, Lang B, Yu L, Prosser H, Bradley A, Babu MM, and Choudhary J,
Cell Stem Cell. 2010 6:382-95
(PubMed).
This study contains very high-quality pull-down results that represent
rarely observed Mus musculus proteins and peptides. Unfortunately, the original
data was not made publicly available: only spectra that resulted in identifications were
stored in PRIDE. Hopefully the authors will make the original data available at some
point so that a more thorough analysis can be performed.
Nota bene: In looking through these results, some may notice that there was no
observation of a protein named "Oct4". This seemingly odd fact was due to
the confusing nature of protein naming: "Oct4" is not a currently accepted name
for any mouse protein. The current name for that gene product is "Pou5f1" (POU domain, class 5, transcription factor 1),
corresponding to ENSMUSP00000025271.
Inspection of the current observations shows clearly that this protein has been
over-represented in samples coming from mouse embryonic stem cells.
|
Data set of the week: (2010/12/12) Nucleosome-interacting proteins regulated by DNA and histone methylation.
This study contains 160
LC/MS/MS runs, grouped into sets of SDS-PAGE bands.
The manuscript describing this work was published by
Bartke T, Vermeulen M, Xhemalce B, Robson SC, Mann M, and Kouzarides T,
Cell 2010 143:470-84
(PubMed).
This work demonstrates the extent to which SILAC quantitation has become a mainstream
technique in molecular biology. The study addresses a biologically important question,
uses an excellent lab to perform the proteomics instrumental analysis and applies
straightforward, established informatics methods to interpret the proteomics data in the context of
the biological question.
|
Data set of the week: (2010/12/05) Comparative shotgun proteomics using spectral count data and quasi-likelihood modeling.
This study contains 153
LC/MS/MS runs, grouped into sets of MudPit experiments. The analyses of the individual LC/MS/MS runs and
summaries of the MudPit runs were recorded.
The manuscript describing this work was published by
Li M, Gray W, Zhang H, Chung CH, Billheimer D, Yarbrough WG, Liebler DC, Shyr Y, and Slebos RJ,
J Proteome Res. 2010 9:4295-305
(PubMed).
While this set of data was generated for a specific statistical study, it also represents a very good
resource for anyone interested in the study of the bioinformatics and statistics of proteomics
experimental analysis. The tissues selected were of clinical interest (head and neck carcinomas), the
equipment was state-of-the-art and the experimental groups involved were first rate. Many data sets generated
for bioinformatics analysis are not really representative of current best laboratory practices, but this one genuinely
exceeds expectations.
|
Data set of the week: (2010/11/28) Genome-scale proteomics reveals Arabidopsis thaliana gene models and proteome dynamics.
This study contains 28
tissue sample data sets.
The manuscript describing this work was published by
Baerenfaller K, Grossmann J, Grobei MA, Hull R, Hirsch-Hoffmann M, Yalovsky S, Zimmermann P, Grossniklaus U, Gruissem W, and Baginsky S,
Science 2008 320:938-41
(PubMed).
This work is still probably the most comprehensive proteomics study of Arabidopsis thaliana tissues
available. Each of the individual samples corresponds to > 9,000 peptide identifications and > 1,000
non-redundant protein identifications. It can be used as a reliable catalogue of observable
peptides and proteins for the corresponding A. thaliana tissues and cell-culture samples.
|
Data set of the week: (2010/11/21) Prioritization of candidate protein biomarkers from an in vitro model system of breast tumor progression toward clinical verification.
This study contains 5
individual LC/MS/MS runs.
The manuscript describing this work was published by
Lau TY, Power KA, Dijon S, de Gardelle I, McDonnell S, Duffy MJ, Pennington SR, and Gallagher WM,
J Proteome Res. 2010 9:1450-9
(PubMed).
The data is a good example of what can be achieved using a QTOF-style instrument for analyzing
gel bands. The relatively good resolution obtained on the fragment ions makes peptide identifications more
certain (FDR ≈ 0.1%) and generally improves the confidence of the resulting protein identifications.
The approach used in the paper has some merit for determining the suitability of proteins
as biomarkers, although much of the comparative work could have been done using
existing databases of observable plasma and serum proteins.
|
Data set of the week: (2010/11/14) Proteomic Analysis of Human Nail Plate.
This study contains 40
individual LC/MS/MS runs.
The manuscript describing this work was published by
Rice RH, Xia Y, Alvarado RJ, and Phinney BS,
J Proteome Res. 2010 Nov 1
(Epub ahead of print, PubMed).
The data covers the proteins present in two common but sparsely investigated
human tissues: hair and nail plate. These non-cellular tissues are composed mainly of
high-sulphur (hard) keratins and keratin-associated proteins in different proportions. The genes
encoding these proteins are unusually concentrated on chromosome 17, with more than 60 of them clustered between
chromosome coordinates 38,810,917-39,780,829 (see the Human
Proteome Guide for the gene names, positions and frequency of observation).
|
Data set of the week: (2010/11/07) Proteomic screen defines the Polo-box domain interactome and identifies Rock2 as a Plk1 substrate.
This study contains 24
individual result sets derived from SDS-PAGE gel bands.
The manuscript describing this work was published by
Lowery DM, Clauser KR, Hjerrild M, Lim D, Alexander J, Kishi K, Ong SE, Gammeltoft S, Carr SA, and Yaffe MB
in EMBO J. 2007 26:2262-73
(PubMed).
This study demonstrates the power of protein affinity methods for enriching relatively rare, but biologically
important proteins. The result sets contain many of the best identifications observed for
proteins such as GRIPAP1, ROCK2, ANLN, EPB41L3, CLIP2 and the minichromosome maintenance complex. The methodology
used here was relatively simple, but it revealed an interesting, high quality interactome that will
take years of biological research to thoroughly investigate and understand.
|
Data set of the week: (2010/10/31) Genome analysis and genome-wide proteomics of Thermococcus gammatolerans, the most radioresistant organism known amongst the Archaea.
This study contains 7
individual result sets; each set is the union of all spectra collected from a single SDS-PAGE gel.
The manuscript describing this work was published by
Zivanovic Y, Armengaud J, Lagorce A, Leplat C, Guérin P, Dutertre M, Anthouard V, Forterre P, Wincker P, and Confalonieri F.
in Genome Biol. 2009;10(6):R70
(PubMed).
This study was a straightforward analysis of the proteome of a previously unexamined archaeon,
T. gammatolerans. What set this study apart was the level of competence displayed by the
research team in obtaining this data. The methodology used was straightforward, but they were
able to consistently generate spectra good enough that ~50% of them resulted in high quality identifications.
Samples analyzed with this type of strategy generally show high levels of contaminating human keratins 1, 2, 9 and 10,
but not in this case. The data corresponded to
>1000 T. gammatolerans proteins, with the largest of the individual gel sets having >60,000 identified peptides.
|
Data set of the week: (2010/10/24) Feasibility of large scale phosphoproteomics with HCD fragmentation.
This study contains 25
individual samples, contrasting two methods for phosphopeptide detection.
The manuscript describing this work was published by
Nagaraj N, D'Souza RC, Cox J, Olsen JV, and Mann M
in J. Proteome Res. 2010 (Epub ahead of print,
PubMed).
This data set is a major game-changer for any group interested in high-throughput phosphopeptide detection.
The combination of HCD fragmentation with high accuracy parent and fragment ion
mass measurement described in the associated publication results in a level of sequence
and PTM assignment accuracy that simply cannot be matched by the conventional
CID approach using a low accuracy LTQ for fragment ion analysis. It is
also clearly superior to ETD for high throughput phosphoproteomics: the physical
chemistry of ETD makes it much better suited to the detailed characterization of difficult
cases rather than broad surveys of large mixtures.
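The accuracy difference between these approaches is usually expressed as a relative mass error in parts per million (ppm). The sketch below shows that calculation; the fragment m/z values and the 10 ppm tolerance are hypothetical, chosen only to contrast a high-accuracy readout with a low-accuracy one, and are not taken from this data set.

```python
def ppm_error(observed_mz: float, theoretical_mz: float) -> float:
    """Signed mass error of an observed m/z, in parts per million."""
    return (observed_mz - theoretical_mz) / theoretical_mz * 1e6


def within_tolerance(observed_mz: float, theoretical_mz: float, tol_ppm: float) -> bool:
    """True if the observed m/z lies within +/- tol_ppm of the theoretical value."""
    return abs(ppm_error(observed_mz, theoretical_mz)) <= tol_ppm


# Hypothetical fragment ion with a theoretical m/z of 500.0000
theoretical = 500.0000
high_accuracy = 500.0025  # illustrative high-resolution measurement (a few ppm off)
low_accuracy = 500.3000   # illustrative low-resolution measurement (hundreds of ppm off)

for label, mz in (("high accuracy", high_accuracy), ("low accuracy", low_accuracy)):
    err = ppm_error(mz, theoretical)
    print(f"{label}: {err:.1f} ppm, within 10 ppm: {within_tolerance(mz, theoretical, 10.0)}")
```

With a tight tolerance, far fewer alternative fragment assignments fit each peak, which is the practical reason high-accuracy fragment data sharpen sequence and PTM site assignment.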
|
Data set of the week: (2010/10/17) Coupled global and targeted proteomics of human embryonic stem cells during induced differentiation.
This data set contains 18
sample analyses.
The manuscript describing this work was published by
Yocum AK, Gratsch TE, Leff N, Strahler JR, Hunter CL, Walker AK, Michailidis G, Omenn GS, O'Shea KS, and Andrews PC
in Mol Cell Proteomics 2008 7:750-67
(PubMed).
This study utilizes MALDI TOF-TOF technology to provide an excellent survey of proteins
in embryonic stem cells. While MALDI has become a secondary ionization method compared
with electrospray, it is still a robust method for protein identification and it provides
the most reliable source for library spectra of singly charged peptide ions.
|
Data set of the week: (2010/10/10) Glycosylation signatures in Drosophila: fishing with lectins.
This data set contains 1
LC/MS/MS result.
The manuscript describing this work was published by
Vandenborre G, Van Damme EJ, Ghesquière B, Menschaert G, Hamshou M, Rao RN, Gevaert K, and Smagghe G.
in J Proteome Res. 2010 9:3235-42
(PubMed).
A carefully selected set of lectins was used to purify glycoproteins by affinity capture from
Drosophila melanogaster samples. The results show that this method was able to obtain
an unusually high quality set of identifications for proteins of this species, as demonstrated by the
very large fraction of "best ever" identifications for the proteins reported. The peptides
identified also show significantly more chymotryptic cleavage than would be typical for such a study.
|
Data set of the week: (2010/10/03) Global analysis of lysine ubiquitination by ubiquitin remnant immunoaffinity profiling.
This data set contains 1
LC/MS/MS result.
The manuscript describing this work was published by
Xu G, Paige JS, and Jaffrey SR
in Nat Biotechnol. 2010 28:868-73
(PubMed).
This data was obtained from a very interesting study that describes the utility of an
immunoaffinity method for purifying the peptides generated by the trypsin digest of
proteins that have N-lysyl-ubiquitination. Trypsin cleaves away most of the ubiquitin bound
to the lysine sidechain, leaving a Gly-Gly sequence attached. By generating an antibody that
was specific for this type of modified lysine sidechain, they were able to isolate peptides
from ubiquitinated proteins. This purification allowed them to overcome the large concentration ratio between the modified
and unmodified proteins that has made identifying this type of modification difficult in the past.
The availability of this antibody should make many interesting
studies of the ubiquitin-mediated protein degradation pathway possible.
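Because the remnant left on the lysine is two glycine residues, its mass increment follows directly from the glycine residue mass. A minimal sketch, assuming a monoisotopic glycine residue mass of 57.02146 Da and an arbitrary 0.01 Da matching tolerance; the mass shifts tested are hypothetical.

```python
GLY_RESIDUE = 57.02146        # monoisotopic mass of one glycine residue (C2H3NO), Da
GG_REMNANT = 2 * GLY_RESIDUE  # Gly-Gly left on the lysine side chain after trypsin digestion


def matches_gg_remnant(delta_mass: float, tol: float = 0.01) -> bool:
    """True if a mass shift observed on a lysine is consistent with a Gly-Gly remnant."""
    return abs(delta_mass - GG_REMNANT) <= tol


print(f"Gly-Gly remnant: {GG_REMNANT:.4f} Da")
# Hypothetical lysine mass shifts: a Gly-Gly-like shift and an acetylation-like shift
for delta in (114.0429, 42.0106):
    print(delta, matches_gg_remnant(delta))
```

One caveat: the same shift is produced by double carbamidomethylation of a lysine during iodoacetamide alkylation, because a carbamidomethyl group (C2H3NO) has the same composition as a glycine residue, so mass alone does not prove ubiquitination.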
|
Data set of the week: (2010/09/26) The Asia Oceania Human Proteome Organisation Membrane Proteomics Initiative. Preparation and characterisation of the carbonate-washed membrane standard.
This data set contains 2
LC/MS/MS results.
The manuscript describing this work was published by
Peng L, Kapp EA, Fenyö D, Kwon MS, Jiang P, Wu S, Jiang Y, Aguilar MI, Ahmed N, Baker MS, Cai Z, Chen YJ, Van Chi P, Chung MC, He F, Len AC, Liao PC, Nakamura K, Ngai SM, Paik YK, Pan TL, Poon TC, Hosseini Salekdeh G,
Simpson RJ, Sirdeshmukh R, Srisomsap C, Svasti J, Tyan YC, Dreyer FS, McLauchlan D, Rawson P,
and Jordan TW.
in Proteomics. 2010 May 18
(PubMed).
This study, the result of a HUPO-affiliated AOHUPO project, demonstrates the effectiveness of
a standardized, relatively simple protocol for the enrichment of membrane proteins. A quick
inspection of the GO displays for
unwashed and
carbonate washed samples
proves this point very nicely. Many groups still seem to believe that membrane proteins are
difficult to observe using proteomics methods, so a straightforward study such as this one
demonstrating the contrary is a welcome addition to the field and an excellent subject for
a HUPO study.
|
Data set of the week: (2010/09/19) Insulin receptor substrate influences female caste development in honeybees.
This data set contains 23
LC/MS/MS results. The original data was obtained from Peptidome (Study PSE129).
The manuscript describing this work was published by
Wolschin F, Mutti NS, and Amdam GV.
in Biol Lett. 2011 Feb 23;7(1):112-5
(PubMed).
This study explores the insulin/insulin-like signalling (IIS) network in
honeybees. Apis mellifera is an economically important species with a complete genome sequence,
but it has received only limited attention from the proteomics community. Fortunately,
bee proteomics scientists have been very active in contributing their data to public
repositories. Inspection of the
list of all A. mellifera proteins
in GPMDB shows that more than 2450 proteins have been observed and a surprising number of them have been
observed more than 500 times.
|
Data set of the week: (2010/09/12) Identification of pathways associated with invasive behavior by ovarian cancer cells using multidimensional protein identification technology (MudPIT).
This data set contains 252
LC/MS/MS results. The original data was obtained from TRANCHE.
The manuscript describing this work was published by
Sodek KL, Evangelou AI, Ignatchenko A, Agochiya M, Brown TJ, Ringuette MJ, Jurisica I, and Kislinger T.
in Mol Biosyst. 2008 4:762-73
(PubMed).
This study is probably the best available data set for the detailed exploration of proteomics as
a reproducible technology. Six different ovarian cancer cell lines were examined, each of which
was analyzed in six replicates, each replicate containing six SCX fractions. While this study was designed
to explore the differences between these cell lines, it also affords a truly useful collection of
data for anyone interested in proteomics
sample preparation reproducibility, measurement undersampling, search engine effectiveness, peak finding
efficacy or any other aspect of proteomics data generation and handling.
The GPM results are grouped according to cell line replicates, with each replicate having six entries
corresponding to the individual SCX fractions, followed by a summary result generated from those
six analyses. A description containing a statement like "Data directory: SKOV_5" indicates that
the result was obtained from replicate "5" of cell line "SKOV".
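Assuming every replicate follows this layout (six SCX fraction results plus one summary), the grouping accounts exactly for the 252 results in the data set. The parser below is a hypothetical helper for the "Data directory" annotation; its regular expression is an assumption about the description format, based only on the single example quoted above.

```python
import re

CELL_LINES = 6
REPLICATES_PER_CELL_LINE = 6
SCX_FRACTIONS_PER_REPLICATE = 6

# Each replicate contributes one result per SCX fraction plus one summary result
results_per_replicate = SCX_FRACTIONS_PER_REPLICATE + 1
total_results = CELL_LINES * REPLICATES_PER_CELL_LINE * results_per_replicate
print(total_results)

# Hypothetical pattern for the "Data directory: <cell line>_<replicate>" annotation
DATA_DIR = re.compile(r"Data directory:\s*(\S+)_(\d+)")


def parse_data_directory(description: str):
    """Return (cell_line, replicate) from a result description, or None if absent."""
    match = DATA_DIR.search(description)
    if match is None:
        return None
    return match.group(1), int(match.group(2))


print(parse_data_directory("Data directory: SKOV_5"))
```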
|
Data set of the week: (2010/09/05) A quantitative proteomics design for systematic identification of protease cleavage events.
This data set contains three (3)
COFRADIC analyses (COmbined FRActional DIagonal Chromatography).
The original manuscript describing this work was published by
Impens F, Colaert N, Helsens K, Ghesquiere B, Timmerman E, De Bock PJ, Chain BM, Vandekerckhove J, and Gevaert K
in Mol Cell Proteomics. 2010 Jul 13
(PubMed).
The study demonstrates a relatively straightforward method for determining the
cleavage specificity of proteolytic enzymes. The data analysis technique used in the
original paper is somewhat complex, but the more flexible modes of analysis available
in the GPM simplified the process considerably. Simple inspection of the AAA display allows
the assignment of the appropriate cleavage specificities for the enzymes:
- cathepsin D;
- cathepsin E; and
- caspase-3.
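The tally behind a display of this kind can be sketched in a few lines: for every identified peptide, record the residue preceding it and its own C-terminal residue, skipping termini that are simply the ends of the protein. This is a hypothetical sketch, not the GPM's implementation; the protein and peptides are invented, and cleavage after D (as in the caspase-3 DEVD motif) is used only for illustration.

```python
from collections import Counter


def cleavage_p1_counts(protein: str, peptides: list[str]) -> Counter:
    """Tally the P1 residue (N-terminal side of the cleaved bond) at each peptide
    terminus produced by cleavage rather than by the protein's own termini."""
    counts = Counter()
    for peptide in peptides:
        start = protein.find(peptide)  # first occurrence only
        if start < 0:
            continue  # peptide does not map to this protein
        end = start + len(peptide)
        if start > 0:
            counts[protein[start - 1]] += 1  # residue preceding the peptide ("Pre")
        if end < len(protein):
            counts[protein[end - 1]] += 1    # the peptide's own C-terminal residue
    return counts


# Hypothetical protein and peptides, cut after D as a caspase-3-like enzyme would
protein = "MSADEVDGLKTDEVDAPR"
peptides = ["MSADEVD", "GLKTDEVD", "APR"]
print(cleavage_p1_counts(protein, peptides).most_common())
```

A single dominant residue in the tally points to a strict P1 specificity; a broad spread suggests a nonspecific protease.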
|
Data set of the week: (2010/08/29) Human Ccr4-Not complexes contain variable deadenylase subunits.
This data set contains nine (9)
LC/MS/MS analyses.
The original manuscript describing this work was published by
Lau NC, Kolkman A, van Schaik FM, Mulder KW, Pijnappel WW, Heck AJ, and Timmers HT.
in Biochem J. 2009 422:443-53
(PubMed).
The study contained eight (8) pull-down experiments and one (1) control. Each
pull-down is annotated with the bait protein. The experiment uses the combination of Lys-C and
bovine trypsin characteristic of the Heck group, which generates a rather
complete set of tryptic peptides, although there were a significant number of
non-tryptic peptides generated. The sample preparation method used urea, so there
was also a significant number of carbamylated peptides detected. Neither of these artifacts
affects the conclusions of the study.
The study also contains a surprising number of
protein identifications that are the best so far obtained in GPMDB, e.g., TNKS1BP1, RAVER1,
FHL2, RQCD1, RNF219, UBAP2L, BAG3 as well as the bait CNOT proteins. Pull-down experiments,
with their ability to purify an unusual fraction of proteins, seem to be very effective
at obtaining the best observations of rare proteins, compared to large MudPit-style survey
experiments.
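The two artifacts mentioned above can be screened for with simple rules. A minimal sketch, assuming Lys-C plus trypsin can be treated together as "cleave after K or R" (the proline rule is ignored), that '-' marks a protein terminus, and a 0.01 Da tolerance for the +43.0058 Da carbamylation shift caused by urea-derived cyanate; the peptides and flanking residues are hypothetical.

```python
CARBAMYL = 43.005814  # monoisotopic mass added by carbamylation (+HCNO), Da


def tryptic_termini(pre: str, peptide: str, post: str) -> int:
    """Count the peptide termini consistent with cleavage after K or R.

    pre/post are the flanking protein residues; '-' marks a protein terminus.
    The proline rule is ignored in this simplified sketch.
    """
    n_term_ok = pre in ("K", "R", "-")
    c_term_ok = peptide[-1] in ("K", "R") or post == "-"
    return int(n_term_ok) + int(c_term_ok)


def classify(pre: str, peptide: str, post: str) -> str:
    """Label a peptide by how many of its termini look tryptic."""
    return ("non-tryptic", "semi-tryptic", "tryptic")[tryptic_termini(pre, peptide, post)]


def looks_carbamylated(delta_mass: float, tol: float = 0.01) -> bool:
    """True if a mass shift on a peptide N-terminus or lysine matches carbamylation."""
    return abs(delta_mass - CARBAMYL) <= tol


# Hypothetical peptides with their flanking residues
for pre, pep, post in (("R", "AGLSEK", "V"), ("L", "AGLSEK", "V"), ("F", "AGLSEW", "V")):
    print(pep, pre, post, classify(pre, pep, post))
print(looks_carbamylated(43.0058))
```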
|
Data set of the week: (2010/08/22) Low abundance proteome of human red blood cells captured by combinatorial peptide libraries. Behavior of mono- to hexapeptides.
This data set contains 19
LC/MS/MS analyses.
The original manuscript describing this work was published by
Sim C, Bachi A, Cattaneo A, Guerrier L, Fortis F, Boschetti E, Podtelejnikov A, and Righetti PG.
in Anal Chem 2008 80:3547-56
(PubMed).
This study is an excellent example of a very important class of study: attempting to
use novel separation strategies to increase the dynamic range of tissue proteomics. The
particular strategy used in this case appears to work quite well at obtaining distributions
of proteins with limited specificity, while at the same time producing fractions depleted in high abundance
proteins. Technically, the data is also of very high quality and it contains an unusual number of
high confidence identifications of relatively small peptides (< 1000 Da).
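The 1000 Da cutoff can be checked directly from residue masses: the neutral monoisotopic mass of an unmodified peptide is the sum of its monoisotopic residue masses plus one water. A minimal sketch; the example sequences are hypothetical, not peptides reported in this data set.

```python
# Monoisotopic residue masses (Da) for the 20 standard amino acids
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056  # added once for the free N- and C-termini


def monoisotopic_mass(sequence: str) -> float:
    """Neutral monoisotopic mass of an unmodified peptide."""
    return sum(RESIDUE_MASS[aa] for aa in sequence) + WATER


# Hypothetical short peptides: two light ones and a tryptophan-rich hexapeptide
for peptide in ("GAVK", "LLEDFR", "WWWWWW"):
    mass = monoisotopic_mass(peptide)
    print(f"{peptide}: {mass:.4f} Da, below 1000 Da: {mass < 1000}")
```

Most mono- to hexapeptides fall below 1000 Da, but a hexapeptide rich in heavy residues such as W can exceed it.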
|
Data set of the week: (2010/08/15) Quantitative analysis of kinase-proximal signaling in lipopolysaccharide-induced innate immune response.
This data set contains 73
LC/MS/MS data sets obtained from mouse RAW 264.7 cells (macrophage cell line) that have been treated with lipopolysaccharide
to simulate infection with Gram-negative bacteria.
This data was published by
Sharma K, Kumar C, Kéri G, Breitkopf SB, Oppermann FS, and Daub H in J Proteome Res. 2010, 9:2539-49
(PubMed).
The goal of the paper was to follow Toll-like receptor phospho-signaling during this sort of
simulated infection using SILAC: a combination of unlabelled and labelled samples with
two different isotopic tag pairs (K4/R6 and K8/R10) was used to detect differential
protein and phosphopeptide concentrations.
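With three channels, each K or R in a peptide shifts the labelled forms by a fixed amount, so the light, medium and heavy versions of a peptide appear as a predictable cluster. The sketch below computes those offsets. The increments are the commonly tabulated values for these isotope-coded amino acids (assuming 2H4-lysine for K4), not numbers taken from the paper, and the peptide is hypothetical.

```python
# Commonly tabulated monoisotopic mass increments for these SILAC labels, Da
LABELS = {
    "light": {"K": 0.0, "R": 0.0},
    "medium": {"K": 4.025107, "R": 6.020129},  # K4 (2H4-Lys), R6 (13C6-Arg)
    "heavy": {"K": 8.014199, "R": 10.008269},  # K8 (13C6,15N2-Lys), R10 (13C6,15N4-Arg)
}


def label_shift(sequence: str, channel: str) -> float:
    """Total mass added to a peptide by the labelled K and R residues of one channel."""
    increments = LABELS[channel]
    return sum(increments.get(aa, 0.0) for aa in sequence)


def mz_spacing(sequence: str, channel: str, charge: int) -> float:
    """Expected m/z spacing between the light form and a labelled form at a given charge."""
    return label_shift(sequence, channel) / charge


# Hypothetical tryptic peptide observed at charge 2+
peptide, charge = "AGLSEK", 2
for channel in ("medium", "heavy"):
    print(channel, round(mz_spacing(peptide, channel, charge), 4))
```

Peptides with a missed cleavage carry two labelled residues, so their spacing is correspondingly larger.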
In addition to the biological conclusions, this data contains some excellent examples of a common
analytical artifact associated with the use of titanium dioxide phosphopeptide enrichment. Metal oxide
columns work by binding peptides with low pIs (i.e., acidic peptides). While phosphopeptides certainly
fill the bill as being acidic relative to most peptides, normal peptide sequences with multiple acidic sidechains are
also strongly enriched by these columns. This effect can be clearly seen by using the pI vs. RT
and the amino acid analysis
displays. In the example used here, most of the peptides detected have a pI < 5. Aspartic acid (D) and glutamic acid (E) residues in the detected peptides
are enriched to 250% and 215% of their expected composition, based on the composition of the associated proteins.
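One straightforward way to express "percent of expected composition" is a residue's frequency among the detected peptides divided by its frequency in the parent proteins. The GPM's amino acid analysis display may define the baseline differently; this is a sketch under that assumption, with invented sequences.

```python
from collections import Counter


def composition(sequences: list[str]) -> dict[str, float]:
    """Fraction of all residues contributed by each amino acid."""
    counts = Counter()
    for seq in sequences:
        counts.update(seq)
    total = sum(counts.values())
    return {aa: n / total for aa, n in counts.items()}


def percent_of_expected(peptides: list[str], proteins: list[str], residue: str) -> float:
    """Observed residue frequency in detected peptides as a percentage of its
    frequency in the parent proteins (100% means no enrichment)."""
    expected = composition(proteins).get(residue, 0.0)
    if expected == 0.0:
        raise ValueError(f"{residue} does not occur in the reference proteins")
    observed = composition(peptides).get(residue, 0.0)
    return 100.0 * observed / expected


# Hypothetical parent protein and two detected peptides drawn from it
proteins = ["AAADEAAAAA"]
peptides = ["ADEA", "DEAA"]
for residue in ("D", "E", "A"):
    print(residue, round(percent_of_expected(peptides, proteins, residue), 1))
```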
|
Data set of the week: (2010/08/08) Comparative proteome profiling of Mycobacterium tuberculosis: the response of drug-resistant and drug-sensitive strains.
This data set contains 6 (six)
MudPit data sets of two different strains of M. tuberculosis, A12998 (daughter strain, drug-resistant) and A7494 (parent strain, drug-sensitive).
This data was published via upload to Peptidome
as Study PSE133 by
Moo-Jin Suh, Rembert Pieper, and Shih-Ting Huang from the J. Craig Venter Institute.
From Peptidome: The study describes the analysis of proteins from drug-resistant and
-sensitive strains of Mycobacterium tuberculosis. An LC-MS-based proteomics approach was combined with APEX to quantitatively measure relative protein abundance and to compare the cellular protein composition of
Mycobacterium tuberculosis strains A12998 (daughter strain, drug-resistant) and A7494 (parent strain, drug-sensitive).
The results are probably the most thorough analysis of proteins from this important pathogen and
they make up a large fraction of the Annotated Spectrum Libraries available from M. tuberculosis
strains.
An unexpected piece of information made available through this data set is a good initial
measurement of the phosphoproteome of this prokaryote. M. tuberculosis is known to have serine/threonine
kinases, and this data set has a number of very good phosphopeptides generated by them. These kinases
appear to prefer threonine phosphorylation, with an S:T ratio of about 1:3. This ratio is the reverse
of that seen for typical eukaryotic kinases, which seem to prefer serine by about 3:1. The phosphoproteome
generated from this study is available in either
Excel,
html or
tab-separated text formats, as projected on to the proteome of
strain CDC1551. Note: the original analysis in Peptidome did not include phosphorylation, so these
results are only present in the GPMDB re-analysis. It would be very useful to
have an IMAC-type study done on these and other M. tuberculosis strains.
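Tallying an S:T ratio is straightforward once each phosphosite is written out. In the sketch below, the lower-case notation for phosphorylated residues is an assumed convention for illustration, not a GPMDB export format, and the peptides are invented.

```python
from collections import Counter


def phospho_residue_counts(peptides: list[str]) -> Counter:
    """Count phosphorylated S, T and Y, assuming (hypothetically) that phosphorylated
    residues are written in lower case, e.g. "AGtPR"."""
    counts = Counter()
    for peptide in peptides:
        for aa in peptide:
            if aa in "sty":
                counts[aa.upper()] += 1
    return counts


# Hypothetical list of unique phosphopeptides
peptides = ["AGtPR", "LsTEK", "GtVtDK", "QtLER"]
counts = phospho_residue_counts(peptides)
print(f"S:T = {counts['S']}:{counts['T']}")
```

Whether the input lists unique sites or every repeated observation matters: the two can give noticeably different ratios, so the same convention should be used when comparing organisms.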
|
Data set of the week: (2010/08/01) In-depth proteomic analyses of direct expressed prostatic secretions.
This data set contains 9 (nine)
MudPit data sets, each measured from a different prostatic fluid sample from individuals with prostate cancer.
The original raw data was obtained from TRANCHE.
It was published by Drake RR, Elschenbroich S, Lopez-Perez O, Kim Y, Ignatchenko V, Ignatchenko A, Nyalwidhe JO, Basu G, Wilkins CE, Gjurich B, Lance RS, Semmes OJ, Medin JA, and Kislinger T. in
J Proteome Res. 2010, 9:2109-16 (PubMed).
The results show the amount of variability that can be expected when analyzing biological replicates
of clinically sampled material. The identifications were very high quality and are the best quality measurements of
many rather rare proteins, such as KLK3 (prostate-specific antigen) and ACPP (Prostatic acid phosphatase). The data
shows moderate levels of carbamylation from the urea solubilization method used. There were also significant concentrations of
peptides generated by non-tryptic cleavage, probably from the presence of proteases in the sample itself, as the cleavage sites were
not chymotryptic. An examination of the AAA page (e.g., sample #2)
showed that the "Pre" and "C-terminal" columns were broadly populated for most residues,
not just the K and R residues normally expected in a trypsin cleavage experiment.
Interestingly for a sample obtained from prostate secretions, no proteins originating from genes on the Y chromosome
were detected. This fact points out a general feature of proteomics: there does not seem to be any "common sense"
association between tissue-specific protein concentrations and chromosomes.
|
Data set of the week: (2010/07/25) Proteomic analysis of the secretome of human umbilical vein endothelial cells using a combination of free-flow electrophoresis and nanoflow LC-MS/MS.
This data set contains a single
LC/MS/MS data set, using a combination of free-flow electrophoresis and nanoflow HPLC separations.
The original raw data was made available as a Scaffold file from a web site maintained by the authors (www.vascular-proteomics.com).
It was published by Tunica DG, Yin X, Sidibe A, Stegemann C, Nissum M, Zeng L, Brunet M, and Mayr M in
Proteomics. 2009, 9:4991-6 (PubMed).
This study tackles a difficult problem: characterizing the secretome of human umbilical vein endothelial cells against
the background of proteins in a complex growth medium. The
results provide a good basis for the examination of this important cell type, with a very good set of
identifications that provides a broad survey of the proteins that can be readily obtained
from these cells.
|
Data set of the week: (2010/07/18) Proteomics Analysis of the Causative Agent of Typhoid Fever.
This data set contains 313
LC/MS/MS runs using Thermo LTQ mass spectrometers.
The original raw files came from the Resource Center for Biodefense Proteomics Research, which
has been superseded by the Pathogen Portal (raw data).
It was published by Ansong C, Yoon H, Norbeck AD, Gustin JK, McDermott JE, Mottaz HM, Rue J, Adkins JN, Heffron F, and Smith RD in
J Proteome Res. 2008, 7:546-57 (PubMed).
This very thorough data set is the primary large collection of information that has allowed for the
creation of the rather comprehensive annotated spectrum libraries that are now available for
S. enterica related species, including S. typhi and S. typhimurium. The Pacific Northwest
National Laboratory group was an early proponent of making publicly-funded proteomics raw data widely available and
their efforts legitimized the idea for many other groups.
|
Data set of the week: (2010/07/11) Discovery of Anthrax Biomarkers Using Label-Free Quantitative Phosphoproteomics via Mass Spectrometry.
This data set contains 66 individual phosphopeptide enriched
LC/MS/MS runs made using a Thermo Orbitrap hybrid mass spectrometer.
The original raw files were transferred from TRANCHE.
The data was credited to Nathan P. Manes, Li Dong, Weidong Zhou, Xiuxia Du, Nikitha Reghu, Arjan C. Kool,
Dahan Choi, Charles L. Bailey, Emanuel F. Petricoin III, Lance A. Liotta, and Serguei G. Popov.
It was made available prior to publication, although some of the data was presented at the 2010 ASMS conference.
The analyzed results are simply the best, most consistent set of phosphopeptide results that we have ever seen.
The combination of sample preparation, HPLC and mass spectrometry used by the authors has generated
what can only be considered a milestone in the application of phosphoproteomics techniques to
real tissue samples.
|
Data set of the week: (2010/07/04) Quantitative proteomics combined with BAC TransgeneOmics reveals in vivo protein interactions.
This data set contains 61 individual experiments using
both SILAC and label-free quantitation. Proteins were digested with either trypsin or endo-LysC,
depending on the type of protocol being used.
The original raw files were transferred from TRANCHE.
The data was published by Hubner NC, Bird AW, Cox J, Splettstoesser B, Bandilla P, Poser I, Hyman A, Mann M in
J Cell Biol. 2010 189:739-54 (PubMed).
The data was generated to demonstrate the utility of a new technique for protein quantitation
developed by the authors: "quantitative BAC-green fluorescent protein interactomics" (QUBIC). The
technique is meant to be applied to the quantitative study of protein-protein interactions, several of
which are demonstrated here. The technical quality of the MS/MS data is excellent, with many identifications for individual proteins
in the top 10% of all GPMDB observations.
|
Data set of the week: (2010/06/27) mTAL Phosphoproteome Data.
This data set contains metal oxide enriched LC/MS/MS observations
of phosphopeptides from R. rattus medullary Thick Ascending Limb (mTAL) cells.
The raw files were transferred from TRANCHE.
The original analysis was reported by Ruwan Gunaratne, Guozhong Ma, Trairak Pisitkun, and Mark A. Knepper as part of the mTAL-PD
project. It appears to be closely related to the Collecting Duct
Phosphoproteome Database.
The phosphorylated domains obtained are interesting because there is surprisingly little publicly available data from
rat cell lines or tissue samples. The phosphopeptide enrichment here was somewhat less effective
than in some other studies; however, overall it is quite typical of IMAC phosphopeptide enrichment studies. This
study has significantly added to the known phosphorylated domains available for R. rattus through GPMDB's pSYT interface.
Added 2010/09/08: This data has been published in "Quantitative phosphoproteomic analysis reveals cAMP/vasopressin-dependent signaling pathways in native renal thick ascending limb cells."
Proc Natl Acad Sci U S A. 2010 107:15653-8 (PubMed).
|
Data set of the week: (2010/06/20) Proteomic analysis of mouse brain microsomes: identification and bioinformatic characterization of endoplasmic reticulum proteins in the mammalian central nervous system.
This data set contains 1 2DLC MS/MS and 3 1DLC MS/MS runs obtained
from mouse brain microsomal preparations. The original data was transferred from TRANCHE.
The original data analysis was reported by Stevens SM Jr, Duncan RS, Koulen P, Prokai L. in J Proteome Res. 2008 7:1046-54.
(PubMed).
This data set is interesting in a number of ways. It shows the difference in the depth of analysis available using
multi-dimensional chromatographic analysis versus simple, single-separation HPLC. The three repetitions of the
1D LCMS approach give a good indication of the statistical variability to be expected from the
under-sampling inherent in this type of measurement. A Gene Ontology analysis of the data (e.g., GPM33080005862)
shows that real microsomal samples are far more complex than a simple mixture of membrane and membrane-associated proteins.
Comparison with a similar study shows
some significant differences in microsome proteome composition, which are most likely due to variations in the
sample preparation methods.
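The effect of under-sampling across repeated runs can be quantified by how much their identification lists overlap. A minimal sketch using the Jaccard index (a common overlap measure, not necessarily the one used by the authors); the protein identifiers are placeholders.

```python
from itertools import combinations


def jaccard(a: set[str], b: set[str]) -> float:
    """Shared identifications as a fraction of all identifications in either run."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


# Hypothetical protein identifications from three repeated 1D LC-MS/MS runs
runs = {
    "run1": {"P1", "P2", "P3", "P4"},
    "run2": {"P1", "P2", "P3", "P5"},
    "run3": {"P1", "P2", "P6"},
}

for (name_a, ids_a), (name_b, ids_b) in combinations(runs.items(), 2):
    print(name_a, name_b, round(jaccard(ids_a, ids_b), 2))

seen_in_all = set.intersection(*runs.values())
seen_in_any = set.union(*runs.values())
print(f"{len(seen_in_all)} of {len(seen_in_any)} proteins seen in every run")
```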
|
Data set of the week: (2010/06/13) The minor salivary gland proteome in Sjögren's syndrome.
This data set contains 2 LC-MS-MS runs obtained
from human salivary gland tissue. The original data was transferred from PRIDE entries 7962-3.
The data was reported by Hjelmervik TO, Jonsson R, Bolstad AI. in Oral Dis. 2009 15:342-53.
(PubMed).
The two sets of identifications are meant to show the differences in the protein complement of
salivary glands caused by the autoimmune disease, Sjögren's syndrome.
Technically, the data is a good example of the use of a high resolution MS/MS device (ESI-QTOF, Ultima Global) applied to
tissue samples. The high accuracy fragment ion masses significantly improve the quality of the
identifications.
|
Data set of the week: (2010/06/06) Identification of Ricin and Concanavalin A-binding Trypanosoma brucei Glycoproteins.
This data set contains 1 data set obtained
from T. brucei. The original data was transferred from PRIDE 9223.
A portion of the data was reported by Izquierdo L, Schulz BL, Rodrigues JA, Güther ML, Procter JB,
Barton GJ, Aebi M, Ferguson MA in EMBO J. 2009 28:2650-61
(PubMed).
The data was obtained by using the lectins concanavalin A and ricin to pull down glycoproteins from
T. brucei (blood stream form) and then glycosidases were used to remove the N-linked glycosylation, leaving
a deamidated asparagine residue behind. Any deamidated N residue that was associated with the N-{P}-[ST] glycosylation
motif should be considered a potential N-linked glycosylation site.
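The motif filter described above can be expressed as a regular expression. A minimal sketch, assuming deamidated positions are supplied as 0-based indices within the peptide; the peptides are hypothetical.

```python
import re

# N-{P}-[ST]: asparagine, then any residue except proline, then serine or threonine.
# Only the N is consumed; the lookahead lets overlapping motifs (e.g. "NNST") both be reported.
SEQUON = re.compile(r"N(?=[^P][ST])")


def sequon_positions(peptide: str) -> list[int]:
    """0-based positions of N residues that begin an N-{P}-[ST] motif within the peptide."""
    return [m.start() for m in SEQUON.finditer(peptide)]


def candidate_glycosites(peptide: str, deamidated_positions: set[int]) -> list[int]:
    """Deamidated N positions that also sit in a motif: potential N-linked sites."""
    return sorted(set(sequon_positions(peptide)) & deamidated_positions)


# Hypothetical peptides; deamidation observed at position 2 only
print(sequon_positions("LNNSTK"))            # both N residues start a motif
print(candidate_glycosites("LNNSTK", {2}))
print(sequon_positions("ANPSK"))             # N-P-S is excluded by the {P} rule
```

A motif can also extend past a peptide's C-terminus into the protein sequence, so an N near the end of a peptide should be checked against the full protein before it is ruled out. Requiring the motif also helps discard deamidation that occurs spontaneously during sample handling.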
|
Data set of the week: (2010/05/30) Use of fluorescence-activated vesicle sorting for isolation of naked2-associated, basolaterally-targeted exocytic vesicles for proteomic analysis.
This data set contains 6 experiments obtained
from C. familiaris and it is probably the best single data set we have in GPMDB from the domestic dog proteome. This work was transferred from TRANCHE
and it was published by Cao Z, Li C, Higginbotham JN, Franklin JL, Tabb DL, Graves-Deal R, Hill S, Cheek K, Jerome WG, Lapierre LA, Goldenring JR, Ham AJ, Coffey RJ.
in Mol. Cell. Proteomics 2008, 7:1651-67 (PubMed).
The individual experiments show how well fairly straightforward proteomics techniques can perform
on vesicular membrane proteins. They also demonstrate the type of comprehensive results that can be obtained using a proteome
sequence that is almost completely the result of genome annotation.
|
Data set of the week: (2010/05/23) A Global Protein Kinase and Phosphatase Interaction Network in Yeast.
This data set contains 450 pull-down experiments obtained
from S. cerevisiae. This work was transferred from TRANCHE
and it was published by Ashton Breitkreutz, Hyungwon Choi, Jeffrey R. Sharom, Lorrie Boucher, Victor Neduva, Brett Larsen, Zhen-Yuan Lin, Bobby-Joe Breitkreutz, Chris Stark, Guomin Liu, Jessica Ahn, Danielle Dewar-Darch, Teresa Reguly, Xiaojing Tang, Ricardo Almeida, Zhaohui Steve Qin, Tony Pawson,
Anne-Claude Gingras, Alexey I. Nesvizhskii and Mike Tyers in Science 2010 328:1043-6.
Each of the individual results is annotated with the identity of the bait used in the pull-down
experiment. L-A and L-BC virus proteins are present in some of the pull-downs. The group did a remarkably good
job of detecting phosphopeptides for a study that did not perform any specific enrichment for these
peptides.
|
Data set of the week: (2010/05/16) Comprehensive mass-spectrometry-based proteome quantification of haploid versus diploid yeast.
This data set contains 505 LC/MS/MS runs obtained
from S. cerevisiae diploid and haploid populations. This work was transferred from TRANCHE
and it was published by de Godoy LM, Olsen JV, Cox J, Nielsen ML, Hubner NC, Fröhlich F, Walther TC, and Mann M in
Nature 2008 455:1251-4 (PubMed).
The results give a good indication of the relative abundance and observability of yeast proteins in
both haploid and diploid cells using either trypsin or endopeptidase LysC to generate peptides and SILAC labels to provide relative quantitation.
The data also shows very good examples of the major proteins observable
from the double-stranded RNA viruses L-A and L-BC, which are almost ubiquitously present in yeast cell cultures. In some
cases, these proteins are very strongly observed (e.g. protein #3 in GPM77711001229)
and the SILAC labelling can be used to estimate the relative amounts of virus present in the two cell types. To locate the virus and virus-related proteins in any
of the individual runs, type "virus" into the Find box at the top of any model page (click here for an
example).
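The SILAC-based estimate mentioned above can be sketched in Python. This is a minimal example that assumes hypothetical (heavy, light) peptide intensity pairs for one protein. It summarizes them with the median heavy/light ratio, which is a common way to combine peptide ratios but not the only one. The reader must supply which channel corresponds to haploid and which to diploid cells. This is not the quantitation method used by the authors or by GPMDB.

```python
import math
from statistics import median
from typing import Optional


def protein_ratio(pairs: list[tuple[float, float]]) -> Optional[float]:
    """Median heavy/light ratio over peptides quantified in both channels.

    `pairs` is a hypothetical list of (heavy_intensity, light_intensity)
    tuples for the peptides of one protein. Returns None if no peptide
    has a positive intensity in both channels.
    """
    ratios = [h / l for h, l in pairs if h > 0 and l > 0]
    return median(ratios) if ratios else None


def log2_ratio(pairs: list[tuple[float, float]]) -> Optional[float]:
    """log2 of the median ratio; 0 means equal amounts in the two cell types."""
    r = protein_ratio(pairs)
    return math.log2(r) if r is not None else None


if __name__ == "__main__":
    # Hypothetical intensities for three peptides of a viral protein;
    # the third peptide lacks a heavy signal and is skipped.
    peptides = [(2.0e6, 1.0e6), (3.1e6, 1.5e6), (0.0, 8.0e5)]
    print(protein_ratio(peptides), log2_ratio(peptides))
```

The median is used here because it is less sensitive than the mean to a single badly quantified peptide.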
|
Data set of the week: (2010/05/09) Phosphoproteome analysis of Drosophila melanogaster embryo.
This data set contains 24 LC/MS/MS runs obtained
from D. melanogaster embryos. This work was transferred from TRANCHE
and it was published by Zhai B, Villén J, Beausoleil SA, Mintseris J, and Gygi SP in
J Proteome Res. 2008 7:1675-82 (PubMed).
The assignments in this data set give a good overview of phosphorylation in D. melanogaster and they
are good examples of phosphopeptides identified using an Orbitrap-LTQ hybrid instrument with CID. The mapped
phosphorylation sites from this data set were a major contribution to the pSYT annotation now available for
the fruit fly. The predominance of yolk proteins and other larvae-specific proteins in the identified peptides
gives a good view of the phosphorylation patterns on proteins that may be under-represented or absent from
studies that use mature flies or cells from tissue culture.
|
Data set of the week: (2010/05/02) Activated Macrophage Proteomics
This data set contains 9 merged results obtained
from human macrophages under various conditions. This work was transferred from a TRANCHE project
of the same name, created and maintained by Maureen M. Goodenow, Dept. of Pathology, Immunology and Laboratory Medicine,
University of Florida.
The experiments reported by Dr. Goodenow are proteomics survey studies of macrophages, in which the
proteomes of treated cells are separated by SDS-PAGE and the resulting gel is sliced into 15 pieces. The
proteins are then digested, the peptides extracted and run using LC/MS/MS. Each of the entries in GPMDB corresponds
to the merged results of the 15 bands. They are good examples of what can be done using gel-slicing experiments to
obtain proteomics information about a cell type. The data set is also an admirable example of valuable data being made available to the
general community by an individual investigator.
|
Data set of the week: (2010/04/25) Large-scale quantitative LC-MS/MS analysis of detergent-resistant membrane proteins from rat renal collecting duct.
This data set contains 78 LC/MS/MS runs obtained
from membrane-enriched fractions of tissue samples from rat renal collecting ducts. It was originally published by
Yu MJ, Pisitkun T, Wang G, Aranda JF, Gonzales PA, Tchapyjnikov D, Shen RF, Alonso MA, Knepper MA.
in Am J Physiol Cell Physiol. 2008 295:C661-78
(PubMed).
The data was transferred to GPMDB from TRANCHE.
This study demonstrates that it is possible to generate very good results from membrane proteins isolated from tissue, even
those that do not readily dissolve in detergent solutions, such as lipid raft proteins. GO analysis of the
resulting protein identifications shows very significant enrichments in proteins known to be either integral
membrane, membrane associated or part of the extracellular matrix.
|
Data set of the week: (2010/04/18) Targeted tandem affinity purification of PSD-95 recovers core postsynaptic complexes and schizophrenia susceptibility proteins.
This data set contains 70 LC/MS/MS runs obtained
using TAP-tag protein isolation, SDS-PAGE separation followed by tandem mass spectrometry. It was originally published by Fernández E, Collins MO, Uren RT, Kopanitsa MV, Komiyama NH, Croning MD, Zografos L, Armstrong JD, Choudhary JS, Grant SG. Mol Syst Biol. 2009;5:269
(PubMed).
The data corresponds to the PeptideAtlas accession PAe001454 and was transferred to GPMDB.
The results are a good demonstration of the depth and detail of a particular molecular system that can be
obtained by coupling TAP-tagging with protein and subsequent peptide separations. The use of multiple gel
slices allows a depth of proteome coverage that would be difficult to obtain using other techniques.
|
Data set of the week: (2010/04/11) Proteomics of mouse liver microsomes
This data set contains 9 LC/MS/MS runs obtained
using SDS-PAGE separation followed by tandem mass spectrometry. It was originally published by Zgoda VG, Moshkovskii SA, Ponomarenko EA,
Andreewski TV, Kopylov AT, Tikhonova OV, Melnik SA, Lisitsa AV, and Archakov AI in
Proteomics, 2009,9:4102-5 (PubMed).
The data corresponds to the PRIDE accessions 8848-8856 and was transferred to GPMDB.
This data set is an example of the isolation of a specific experimental fraction (mouse liver microsome
from the endoplasmic reticulum) that provides a good representation of proteins not commonly observed, in
this case the cytochrome P450 family of metabolic oxidases. The quality of the isolation can be easily seen
when viewed as either KEGG pathways
or GO cellular components.
|
Data set of the week: (2010/04/04) Use of shotgun proteomics for the identification, confirmation, and correction of C. elegans gene annotations
This data set contains 369 LC/MS/MS runs obtained
using a Thermo Finnigan LTQ instrument. It was originally published by Merrihew GE, Davis C, Ewing B, Williams G, Käll L, Frewen BE, Noble WS, Green P, Thomas JH, MacCoss MJ. in
Genome Res. 2008, 18:1660-9 (PubMed). The data
was obtained directly from the authors' web site
and it is not currently held in any of the other data sites.
The original analysis of this data set in the publication used the C. elegans WS150 proteome sequence, and the results
indicated the presence of additional coding sequences. The analysis in GPMDB was performed using
the WS200 proteome (ENSEMBL v. 55), which takes the original work into account. It serves
as a good example of the proteins that can be seen using conventional proteomics techniques in C. elegans.
|
Data set of the week: (2010/03/28) Global proteomic profiling of Shigella dysenteriae Sd1617
This data corresponds to Peptidome Study PSE140, comprising samples PSM1302, PSM1303 and PSM1304.
The data was obtained by Rembert Pieper, Srilatha Kuntumalla, Shih-Ting Huang at the J. Craig Venter Institute
and it was transferred from Peptidome.
Each of the samples is composed of 3 replicate multidimensional chromatography runs of soluble
proteins obtained from S. dysenteriae. The tandem mass spectra are good quality, obtained using a Thermo LTQ
instrument. The results give a good indication of the type of depth and reproducibility that
can be expected in this type of straightforward analysis of soluble proteins from an
enterobacterial culture.
|
Data set of the week: (2010/03/21) Global Impact of Oncogenic Src on a Phosphotyrosine Proteome
The data is composed of 31 separate runs.
The data was obtained from a study published in J. Proteome Res., 2008, 7 (8), pp 3447–3460, by Weifeng Luo, Robbert J. Slebos, Salisha Hill, Ming Li, Jan Brábek, Ramars Amanchy, Raghothama Chaerkady, Akhilesh Pandey, Amy-Joan L. Ham and Steven K. Hanks
(DOI: 10.1021/pr800187n). This information was
transferred from TRANCHE.
The data investigates the impact of Src transformation of mouse cells by determining the tyrosine phosphorylation
differences between control and transformed cells. The data also demonstrates the utility of using multiple peptidases
to increase the coverage of peptides, compared to trypsin alone. The data is very high quality LTQ data and it is an excellent
reference work for what is to be expected when looking for mouse tyrosine phosphorylation.
|
Data set of the week: (2010/03/14) Quantitative phosphoproteomic analysis reveals vasopressin V2-receptor-dependent signaling pathways in renal collecting duct cells.
The data is composed of 2 separate sets, corresponding to the Peptidome accession numbers PSM1275 and
PSM1276. The data was
obtained from a study published in Proc Natl Acad Sci U S A. 2010 Feb 23;107(8):3882-7, by Rinschen MM, Yu MJ, Wang G, Boja ES, Hoffert JD, Pisitkun T, and Knepper MA
(PubMed). This information was
transferred from TRANCHE. The data is of high quality, containing good identifications of serine and threonine phosphorylation sites
in M. musculus proteins and it is an excellent example of the use of SILAC to monitor the relative quantitation of
protein phosphorylation.
|
Data set of the week: (2010/03/07) Phosphorylation dynamics during early differentiation of human embryonic stem cells.
The data is composed of 12 individual LC/MS/MS runs
obtained from a study published in Cell Stem Cell, Volume 5, Issue 2, 214-226, 7 August 2009 by Van Hoof D, Muñoz J, Braam SR, Pinkse MW, Linding R, Heck AJ, Mummery CL, and Krijgsveld J.
(PubMed). This information was
transferred from TRANCHE. Each of these data sets is large and contains significant
numbers of phosphorylated peptides.
The authors summarize the study as follows: "pluripotent stem cells self-renew indefinitely and possess characteristic
protein-protein networks that remodel during differentiation. How this occurs is poorly understood.
Using quantitative mass spectrometry, the (phospho)proteome of human embryonic stem cells (hESCs) was analyzed
during differentiation induced by bone morphogenetic protein (BMP) and removal of hESC growth factors."
|
Data set of the week: (2010/02/28) A Lectin HPLC Method to Enrich Selectively-glycosylated Peptides from Complex Biological Samples.
The data is composed of 83 individual LC/MS/MS runs
obtained from a study published in J Vis Exp. 2009 Oct 1;(32). pii: 1398 by Johansen E, Schilling B, Lerch M, Niles RK, Liu H, Li B, Allen S, Hall SC, Witkowska HE, Regnier FE, Gibson BW, Fisher SJ, and Drake PM
(PubMed). This information was
transferred from TRANCHE.
Briefly, plasma was depleted of the fourteen most abundant proteins using a multiple affinity removal system.
Depleted plasma was trypsin-digested and separated into flow-through and bound
fractions by SNA or AAL HPLC. The fractions were treated with PNGaseF to remove N-linked glycans,
and analyzed by LC-MS/MS on a QStar Elite. There is an accompanying video
explaining the methods used.
|
Data set of the week: (2010/02/21) Quantitative chemical proteomics reveals mechanisms of action of clinical ABL kinase inhibitors.
The data is composed of 729 individual LC/MS/MS runs
obtained from a study published in Nature Biotechnology by Bantscheff M, Eberhard D, Abraham Y, Bastuck S, Boesche M, Hobson S, Mathieson T, Perrin J, Raida M, Rau C, Reader V, Sweetman G, Bauer A, Bouwmeester T, Hopf C, Kruse U, Neubauer G, Ramsden N, Rick J, Kuster B, and Drewes G.
(DOI: 10.1038/nbt1328). This information was
transferred from PRIDE (PRIDE accession numbers 2445-3178).
iTRAQ labelling was used for quantitative profiling of the effects of the drugs imatinib (Gleevec),
dasatinib (Sprycel) and bosutinib in K562 cells, confirming known targets including ABL and SRC family kinases.
|
Data set of the week: (2010/02/14) Cell-Specific Information Processing in Segregating Populations of Eph Receptor Ephrin-Expressing Cells.
This dataset was transferred to GPMDB from PRIDE.
The data, composed of 2 large LC/MS/MS runs,
is from a study published in Science by Jørgensen C, Sherman A, Chen GI, Pasculescu A, Poliakov A, Hsiung M, Larsen B, Wilkinson DG, Linding R, and Pawson T
(DOI: 10.1126/science.1176615).
The data comes from quantitative mass spectrometric analyses of mixed populations of EphB2- and ephrin-B1–expressing cells labeled with different
isotopes, which revealed cell-specific tyrosine phosphorylation events. The data is of very high quality and it
has a very rich set of tyrosine phosphorylated peptides.
|
Data set of the week: (2010/02/07) The value of using multiple proteases for large-scale mass spectrometry-based proteomics.
This dataset was transferred to GPMDB from TRANCHE.
The data, composed of 15 LC/MS/MS runs,
is from a study published in J. Proteome Research by Danielle L. Swaney, Craig D. Wenger and Joshua J. Coon
(DOI: 10.1021/pr900863u).
The data is from experiments in which an S. cerevisiae whole cell lysate was digested with one of five
enzymes (trypsin, LysC, ArgC, AspN, and GluC), in triplicate. The results clearly show that any of these
proteases can be used very effectively with standard proteomics equipment, giving very similar protein
identifications.
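The benefit of combining proteases can be illustrated with a minimal in silico digestion in Python. The sketch assumes simplified textbook cleavage rules, no missed cleavages, and an arbitrary observable peptide length of 6 to 40 residues. The rule table, function names and example sequence are illustrative assumptions, not the authors' analysis. It reports the fraction of a sequence covered by peptides from one enzyme or from several.

```python
import re

# Simplified cleavage rules as zero-width regexes marking each cut point.
# Real specificity is more nuanced (e.g. GluC can also cut after D in
# phosphate buffer, and missed cleavages are common); these are assumptions.
RULES = {
    "trypsin": r"(?<=[KR])(?!P)",  # after K or R, but not before P
    "LysC": r"(?<=K)",             # after K
    "ArgC": r"(?<=R)",             # after R
    "AspN": r"(?=D)",              # before D
    "GluC": r"(?<=E)",             # after E
}


def digest(sequence: str, enzyme: str,
           min_len: int = 6, max_len: int = 40) -> list[tuple[int, str]]:
    """Fully cleaved peptides (no missed cleavages) as (start, peptide) pairs,
    kept only if their length falls within [min_len, max_len]."""
    cuts = {0, len(sequence)}
    cuts.update(m.start() for m in re.finditer(RULES[enzyme], sequence))
    bounds = sorted(cuts)
    return [
        (a, sequence[a:b])
        for a, b in zip(bounds, bounds[1:])
        if min_len <= b - a <= max_len
    ]


def coverage(sequence: str, enzymes: list[str],
             min_len: int = 6, max_len: int = 40) -> float:
    """Fraction of residues covered by peptides from any of the given enzymes."""
    if not sequence:
        return 0.0
    covered = set()
    for enzyme in enzymes:
        for start, peptide in digest(sequence, enzyme, min_len, max_len):
            covered.update(range(start, start + len(peptide)))
    return len(covered) / len(sequence)


if __name__ == "__main__":
    # Arbitrary illustrative sequence, not taken from the study.
    seq = ("MSKGEELFTGVVPILVELDGDVNGHKFSVSGEGEGDATYGKLTLKFICTTGK"
           "LPVPWPTLVTTFSYGVQCFSRYPDHMKQHDFFKSAMPEGYVQER")
    for enzyme in RULES:
        print(f"{enzyme:8s} {coverage(seq, [enzyme]):.2f}")
    print(f"{'all five':8s} {coverage(seq, list(RULES)):.2f}")
```

Combining enzymes can raise coverage because a region that yields peptides outside the length window with one enzyme may fall inside it with another. Real coverage also depends on ionization, chromatography and search settings, which this sketch ignores.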
|
Data set of the week: (2010/01/31) Identifying blood biomarkers and physiological processes that distinguish humans with superior performance under psychological stress.
This dataset was transferred to GPMDB from PRIDE (PRIDE accessions 10075-10092).
The data (GPM77710000113-GPM77710000130)
is from a study published in PLoS One by Cooksey AM, Momen N, Stocker R, and Burgess SC
(PLoS One. 2009 Dec 18;4(12):e8371 PubMed).
The results show the plasma proteins that change in response to the Modular Egress Training psychological stress test, given
to a group of naval aviation students. The data was obtained using an LCQ DECA XP Plus and analyzed
using X! Hunter (annotated spectrum library searches).
|
Data set of the week: (2010/01/24) High quality catalog of proteotypic peptides from human heart
This dataset was transferred to GPMDB from the authors' web site, corresponding to the manuscript of the same name, Kline KG, et al., J Proteome Res. 2008 Nov;7(11):5055-61.
PubMed. This data is not currently available in other repositories.
The data consists of 96 LC/MS/MS runs
analyzed with a ThermoFinnigan LTQ mass spectrometer. It is a good example of the type of data that can be obtained from cardiac muscle using
multidimensional chromatography directly on tissue lysate.
|
Data set of the week: (2010/01/17) A Mitochondrial Protein Compendium Elucidates Complex I Disease Biology
This dataset was transferred to GPMDB from TRANCHE, corresponding to the manuscript of the same name, Pagliarini DJ, et al., Cell 134:112-123
doi:10.1016/j.cell.2008.06.016.
The data consists of 26 individual
data sets, composed of replicates of mitochondrial proteins obtained from a variety of
mouse tissues (cerebellum, cerebrum, brainstem, spinal cord, kidney, liver, heart, skeletal muscle, testis and placenta). It is a good example of high quality proteomics data, obtained using a
Thermo-Finnigan Orbitrap hybrid mass spectrometer.
|
Data set of the week: (2010/01/10) Comparative analysis of the human and mouse placental transcriptome and proteome
This dataset was transferred to GPMDB from Peptidome, from the Peptidome entries PSM1063 (mouse) and
PSM1064 (human).
The cells in the tissue were separated from extracellular proteins and various subcellular
fractions were analyzed separately. The data was originally published in Cox B, et al., Mol Syst Biol 2009;5:279. PMID: 19536202.
Note: the Peptidome entry misidentifies the mass spectrometry platform as a "TRAP-FTMS", while it is actually a Thermo-Finnigan LTQ (with no additional hybrid component).
|
Data set of the week: (2010/01/03) Large-scale phosphorylation analysis of mouse liver
This dataset was transferred to GPMDB from TRANCHE and it is not currently held in any other repository (see data).
It is credited to Villén J, Beausoleil SA, Gerber SA, and Gygi SP, and it is described in Proc Natl Acad Sci U S A. 2007 Jan 30;104(5):1488-93.
This data set is a good example of the quality of phosphorylation data that can be
obtained using SCX separation of a tissue extract, followed by IMAC phosphopeptide
enrichment of each fraction, when using an LTQ-Orbitrap mass spectrometer. The data view that
is obtained from the link above shows all of the detected phosphopeptides, with a peptide
false positive rate of ~0.14%, i.e., about 10 times more stringent than the analysis
in the original paper.
|
Data set of the week: (2009/12/28) Community proteogenomics reveals insights into the physiology of phyllosphere bacteria
This dataset was transferred to GPMDB from PRIDE (see data).
It is credited to Delmotte N, et al. and it is described in Proc Natl Acad Sci U S A. 2009 Sep 22;106(38):16428-33.
Data-set-of-the-week is a new feature for GPMDB, started with the intent of highlighting
high quality data sets that have been made available via GPMDB and other proteomics repositories. Data
sets will be selected by a panel, but any suggestions (email to dsotw@thegpm.org) of suitable data will be
considered.
|
Copyright © 2010, The Global Proteome Machine Organization
|
|