NAguideR


Step 1: Upload Original Data


1. Expression data:


2. Samples information data:


Step 2: NA Overview








Step 3: Missing value imputation. All methods are classified by algorithm type. Please select the imputation methods you want (by default, the fast methods in each category are selected), then click the 'Calculate' button.


A. Single value approaches

Method 1: Zero

Method 2: Minimum

Method 3: Column median (colmedian)

Method 4: Row median (rowmedian)

Method 5: Deterministic minimal value (mindet)

Method 6: Stochastic minimal value (minprob)

Method 7: Perseus imputation (PI)

B. Global structure approaches

Method 8: Singular value decomposition (svd)

Method 9: Maximum likelihood estimation (mle)

Method 10: Sequential imputation (impseq)

Method 11: Robust sequential imputation (impseqrob)

Method 12: Bayesian principal component analysis (bpca)

C. Local similarity approaches

Method 13: K-nearest neighbor (knn)

Method 14: Sequential knn (seq-knn)

Method 15: Quantile regression (qr)

Method 16: Local least squares (lls)

Method 17: Glmnet Ridge Regression (GRR)

Method 18: Multiple imputation Bayesian linear regression (mice-norm)

Method 19: Truncation knn (trknn)

Method 20: Iterative robust model (irm)

Method 21: Generalized Mass Spectrum (GMS)

Method 22: Multiple imputation classification and regression trees (mice-cart)

Method 23: Random forest model (rf)

Step 4: Results and Assessments

1. Parameters for 'Results'

1.1. Select one method:

2. Parameters for 'Criteria'




1. Comprehensive ranks under classic criteria:

2. Normalized root mean squared error (NRMSE):

3. NRMSE-based sum of ranks (SOR):

4. Procrustes sum of squared errors (PSS):

5. Average correlation coefficient between original values and imputed values (ACC_OI):

6. Figures:


1. Average correlation coefficient between peptides with different charges (ACC_Charge):

2. Histogram of ACC_Charge:

1. Comprehensive ranks under proteomic criteria:

2. Average correlation coefficient between peptides in the same protein (ACC_PepProt):

3. Average correlation coefficient between proteins in the same protein complex (ACC_CORUM):

4. Average correlation coefficient between interacting proteins (ACC_PPI):

5. Histogram of ACC_PepProt:

6. Histogram of ACC_CORUM:

7. Histogram of ACC_PPI:

1. Comprehensive ranks under proteomic criteria:

2. Average correlation coefficient between peptides with different charges (ACC_Charge):

3. Average correlation coefficient between peptides in the same protein (ACC_PepProt):

4. Average correlation coefficient between proteins in the same protein complex (ACC_CORUM):

5. Average correlation coefficient between interacting proteins (ACC_PPI):

6. Figures:

1. Comprehensive ranks under proteomic criteria:

2. Average correlation coefficient between proteins in the same protein complex (ACC_CORUM):

3. Average correlation coefficient between interacting proteins (ACC_PPI):

4. Histogram of ACC_CORUM:

5. Histogram of ACC_PPI:

There are no assessments here.


Please note: This function is optional and is designed for biologists with specific experimental aims. For example, some users may want to check a particular peptide/protein (e.g., spiked-in standard peptides or proteins, or known housekeeping proteins such as beta-actin) before and after imputation.
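For users who want to reproduce this check offline on downloaded tables, a minimal sketch follows. It assumes the original and imputed data have been loaded into nested dictionaries keyed by feature ID and then sample name, with None marking a missing value; the function name `compare_feature` and this data layout are hypothetical and not part of NAguideR.

```python
def compare_feature(feature_id, original, imputed):
    """Return (sample, original value, imputed value, was_imputed) rows.

    original / imputed: hypothetical dicts mapping feature ID ->
    {sample name: value}, where None marks a missing original value.
    """
    rows = []
    for sample, before in original[feature_id].items():
        after = imputed[feature_id][sample]
        # A value counts as imputed only if it was missing before.
        rows.append((sample, before, after, before is None))
    return rows
```

For instance, calling `compare_feature("P60709", original, imputed)` would list every sample for beta-actin (UniProt P60709) and flag which intensities were filled in by the chosen method.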


1.1 Abstract
Mass spectrometry (MS)-based quantitative proteomics experiments frequently generate data with missing values, which may profoundly affect downstream analyses. A wide variety of missing value imputation methods have been established to deal with this issue. To date, however, there is a scarcity of efficient, systematic, and easy-to-use tools tailored for the proteomics community. Herein, we developed a user-friendly and powerful web tool, NAguideR, to enable the implementation and evaluation of twenty popular missing-value imputation algorithms. Imputation results can be evaluated through classic computational criteria and, unprecedentedly, through proteomic empirical criteria such as the quantitative consistency between different charge states of the same peptide, between different peptides belonging to the same protein, and between individual proteins participating in functional protein complexes. We applied NAguideR to three label-free proteomic datasets featuring peptide-level, protein-level, and phosphoproteomic variables, respectively, all generated by data-independent acquisition mass spectrometry (DIA-MS) with substantial biological replicates. The results indicate that NAguideR is able to distinguish the optimal imputation methods for DIA-MS experiments from sub-optimal and low-performance algorithms. The NAguideR web tool further provides downloadable tables and figures supporting flexible data analysis and interpretation. The flowchart below summarizes the process of data analysis in NAguideR.
1.2 What exactly does NAguideR do in each step?
As described above, the data analysis process of NAguideR consists of four main steps: (1) uploading proteomics data; (2) data quality control; (3) missing value imputation; (4) performance evaluation. Many users will want to know what happens in each step in detail. The figure below shows the major steps of the data analysis process in NAguideR, using as an example two groups of samples (five biological replicates per group, labeled A1, A2, A3, A4, A5, B1, B2, B3, B4, B5 in the original intensity data). 'Features' refers to the identified proteins/peptides.
2.1 Input data preparation
NAguideR supports four file formats (.csv, .txt, .xlsx, .xls). Before analysis, users should prepare two required files: (1) proteomics expression data and (2) sample information data. These data can be readily generated from the results of popular tools such as MaxQuant, PEAKS, and Spectronaut. Users can then upload the two files to NAguideR in the correct formats and start the subsequent analysis.
2.1.1 Proteomics expression data
NAguideR supports four types of proteomics expression data ('Peptides+Charges+Proteins', 'Peptides+Charges', 'Peptides+Proteins', 'Proteins'), which differ mainly in their first few columns. Users who want to upload other kinds of omics data (e.g., genomics, metabolomics) can choose the fifth type ('Others'); please note that results based on the proteomic criteria cannot be generated for this type.
2.1.1.1 Expression data with peptide sequences, peptide charge states, and protein IDs
In this situation, peptide sequences, peptide charge states, and protein IDs are provided in the first three columns of the input file, in that order. Peptide sequences in the first column can carry any post-translational modification (PTM, written in any routine format) or be stripped peptides (without PTM). The second column contains peptide charge states. The protein IDs in the third column should be UniProt IDs. From the fourth column onward, the peptide/protein expression intensity or signal abundance in each sample should be listed. The data structure is shown below:
2.1.1.2 Expression data with peptide sequences and peptide charge states
Similar to the above situation, peptide sequences and peptide charge states are provided in the first two columns of the input file, in that order. Peptide sequences in the first column can carry post-translational modifications (PTM) or be stripped peptides (without PTM). The second column contains peptide charge states. From the third column onward, the peptide expression intensity or signal abundance in each sample should be listed. The data structure is shown below:
2.1.1.3 Expression data with peptide sequences and protein IDs
In this case, peptide sequences and protein IDs are provided in the first two columns of the input file, in that order. Peptide sequences in the first column can carry post-translational modifications (PTM) or be stripped peptides (without PTM). The protein IDs in the second column should be UniProt IDs. From the third column onward, the peptide/protein expression intensity or signal abundance in each sample should be listed. The data structure is shown below:
2.1.1.4 Expression data with protein IDs
In this situation, protein IDs are provided in the first column of the input file. The protein IDs here should be UniProt IDs. From the second column onward, the protein expression intensity or signal abundance in each sample should be listed. The data structure is shown below:
2.1.1.5 Other kinds of omics data
If users want to use NAguideR for other omics data (e.g., genomics, metabolomics), gene/metabolite IDs or names should be provided in the first column of the input file. From the second column onward, the gene/metabolite expression intensity or signal abundance in each sample should be listed. An example data structure is shown below:
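The five layouts above differ only in how many leading identifier columns precede the sample intensities. The sketch below captures that rule so a row can be split before upload or checked afterwards. The `ID_COLUMNS` mapping and the `split_row` helper are hypothetical, and treating empty cells, "NA", and "NaN" as missing is an assumption about how missing values are written in the input file.

```python
# Hypothetical mapping from NAguideR's data-type labels to the number of
# leading identifier columns described in Sections 2.1.1.1 to 2.1.1.5.
ID_COLUMNS = {
    "Peptides+Charges+Proteins": 3,  # peptide, charge state, UniProt ID
    "Peptides+Charges": 2,           # peptide, charge state
    "Peptides+Proteins": 2,          # peptide, UniProt ID
    "Proteins": 1,                   # UniProt ID
    "Others": 1,                     # gene/metabolite ID or name
}

# Assumed spellings of a missing value in the input file.
MISSING = {"", "NA", "NaN"}


def split_row(row, data_type):
    """Split one row of cells into identifier fields and intensities.

    Missing cells become None; all other intensity cells become floats.
    """
    n_ids = ID_COLUMNS[data_type]
    ids = row[:n_ids]
    values = []
    for cell in row[n_ids:]:
        cell = cell.strip()
        values.append(None if cell in MISSING else float(cell))
    return ids, values
```

For example, `split_row(["PEPTIDEK", "2", "P60709", "1.5e6", "NA", "2.1e6"], "Peptides+Charges+Proteins")` returns the three identifiers and `[1500000.0, None, 2100000.0]`; the peptide and intensities here are made up for illustration.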
2.1.2 Sample information data
Sample information here refers to the group identity of each sample. This information enables, for example, a separate filtering strategy for each group in the quality control step. Sample names are in the first column and must be listed in the same order as the sample columns in the expression data. Group information is in the second column. The data structure is shown below:
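Because the sample names must follow the same order as the expression columns, a quick pre-upload check can catch mismatches early. The following is a minimal sketch, assuming the expression header and the sample information rows have already been read into Python lists; `check_sample_info` is a hypothetical helper, not a NAguideR function.

```python
def check_sample_info(expression_header, sample_rows, n_id_columns):
    """Verify sample order and group samples by their group label.

    expression_header: column names of the expression file.
    sample_rows: (sample name, group) pairs from the sample information file.
    n_id_columns: number of leading identifier columns in the expression file.
    Returns a dict mapping each group to its sample names, in file order.
    """
    sample_columns = list(expression_header[n_id_columns:])
    names = [name for name, _ in sample_rows]
    if names != sample_columns:
        raise ValueError(
            f"Sample order mismatch: expression columns {sample_columns}, "
            f"sample information {names}"
        )
    groups = {}
    for name, group in sample_rows:
        groups.setdefault(group, []).append(name)
    return groups
```

With the two-group example from Section 1.2, a 'Proteins' file whose header is Protein, A1, ..., B5 and a sample table listing A1 to B5 would yield `{"A": ["A1", ..., "A5"], "B": ["B1", ..., "B5"]}`, while any reordering raises an error.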
2.2 Operating Procedure of NAguideR (Four steps)
Step 1. Uploading proteomics expression data
Once the required data are prepared, users can click 'Import data' and upload their own data in the left panel:
Users who want to try the example data first can choose 'Load example data' and download the example files by clicking the corresponding button:
Step 2. Data quality control
After uploading the data, users can click 'NA Overview'. In this part, users can check the distribution of NAs in their data ('NA distribution' part), and proteins/peptides with an excessively high proportion of NAs and a large coefficient of variation (CV) are removed ('Filter' part). After setting suitable parameters, click the 'Calculate' button.
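The filtering idea can be sketched as follows. This is an illustration under stated assumptions, not NAguideR's exact rule: here a feature is kept only if, in every group, its NA proportion is at most `max_na` and the CV of its observed values is at most `max_cv`. Both thresholds and their default values are hypothetical, and intensities are assumed to be positive so the CV is well defined.

```python
import statistics


def keep_feature(values, groups, max_na=0.5, max_cv=0.5):
    """Return True if one feature passes both filters in every group.

    values: intensities for the feature in sample order (None = missing).
    groups: group label of each sample, in the same order.
    max_na: assumed maximum fraction of missing values per group.
    max_cv: assumed maximum coefficient of variation per group.
    """
    by_group = {}
    for value, group in zip(values, groups):
        by_group.setdefault(group, []).append(value)
    for group_values in by_group.values():
        observed = [v for v in group_values if v is not None]
        na_fraction = 1 - len(observed) / len(group_values)
        if na_fraction > max_na:
            return False
        # CV = sample standard deviation / mean; needs two or more values.
        if len(observed) >= 2:
            cv = statistics.stdev(observed) / statistics.mean(observed)
            if cv > max_cv:
                return False
    return True
```

Filtering group by group, rather than across all samples, keeps features that are well measured in one condition even if they are sparse in another only when both groups pass; a looser rule ("pass in at least one group") is an equally plausible design and may match NAguideR better for some settings.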
Step 3. Missing value imputation
After data quality control, users can click 'Methods'. In this step, users first select the imputation methods. To keep the running time short, the fast methods (left part, 15 methods) are selected by default; selecting any of the slow methods (right part, 5 methods) will increase the running time.
After selecting suitable methods, click the 'Calculate' button. A pop-up window listing the selected methods will appear; click 'OK' to continue:
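To make the "single value approaches" (Methods 1 to 5) concrete, here is a minimal sketch of four of them on a feature-by-sample matrix stored as a list of rows, with None for missing values. Zero, row median, and column median follow directly from the method names. The mindet version assumes the behaviour of the R package imputeLCMD's MinDet (each sample's gaps filled with the q-th quantile of that sample's observed values, q = 0.01 by default); NAguideR's own code may differ in details.

```python
import statistics


def impute_zero(matrix):
    """Method 1 (Zero): replace every missing value with 0."""
    return [[0.0 if v is None else v for v in row] for row in matrix]


def impute_rowmedian(matrix):
    """Method 4 (rowmedian): fill with the median of each feature (row)."""
    out = []
    for row in matrix:
        med = statistics.median(v for v in row if v is not None)
        out.append([med if v is None else v for v in row])
    return out


def impute_colmedian(matrix):
    """Method 3 (colmedian): fill with the median of each sample (column)."""
    n_cols = len(matrix[0])
    meds = [statistics.median(row[j] for row in matrix if row[j] is not None)
            for j in range(n_cols)]
    return [[meds[j] if v is None else v for j, v in enumerate(row)]
            for row in matrix]


def quantile(sorted_vals, q):
    """Linear-interpolation quantile (the rule of R's default, type 7)."""
    pos = (len(sorted_vals) - 1) * q
    lo = int(pos)
    hi = min(lo + 1, len(sorted_vals) - 1)
    return sorted_vals[lo] + (pos - lo) * (sorted_vals[hi] - sorted_vals[lo])


def impute_mindet(matrix, q=0.01):
    """Method 5 (mindet), assumed to follow imputeLCMD's MinDet: fill each
    sample's gaps with the q-th quantile of its observed values."""
    n_cols = len(matrix[0])
    fills = [quantile(sorted(row[j] for row in matrix if row[j] is not None), q)
             for j in range(n_cols)]
    return [[fills[j] if v is None else v for j, v in enumerate(row)]
            for row in matrix]
```

These methods ignore the correlation structure of the data, which is why they run fast; the global structure and local similarity approaches trade extra running time for using that structure.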
Step 4. Performance evaluation
Click 'Results and Assessments'. In this step, the data with NAs are imputed using the methods chosen above and shown in the 'Results' panel; the results are then evaluated under the four classic criteria and the four proteomic criteria, as shown below:
The following tables and figures are provided under the four classic criteria. 1. A table of the comprehensive rank of each imputation method; 2-5. Tables of the scores of each imputation method based on 'Normalized root mean squared error (NRMSE)', 'NRMSE-based sum of ranks (SOR)', 'Procrustes sum of squared errors (PSS)', and 'Average correlation coefficient between original values and imputed values (ACC_OI)', respectively; 6. Figures showing the normalized scores of each imputation method under the four classic criteria. 'Normalized Values' means that each score is divided by the maximum score of the corresponding criterion.
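As an illustration of two of these quantities, the sketch below computes NRMSE using one common definition from the imputation literature, sqrt(mean((imputed - true)^2) / var(true)), and the max-normalization used for the 'Normalized Values' figures. It assumes NRMSE is evaluated on values whose true intensities are known (for instance, observed values that were hidden and then re-imputed); the population variance and this exact normalization are assumptions, and NAguideR's implementation may differ.

```python
import math
import statistics


def nrmse(true_values, imputed_values):
    """NRMSE = sqrt(mean((imputed - true)^2) / var(true)).

    Uses the population variance of the true values (an assumption).
    """
    if len(true_values) != len(imputed_values):
        raise ValueError("inputs must have the same length")
    squared_errors = [(i - t) ** 2 for t, i in zip(true_values, imputed_values)]
    mse = sum(squared_errors) / len(squared_errors)
    return math.sqrt(mse / statistics.pvariance(true_values))


def normalize_by_max(scores):
    """Divide each method's score by the largest score for that criterion."""
    top = max(scores.values())
    return {method: score / top for method, score in scores.items()}
```

For example, `normalize_by_max({"knn": 0.5, "zero": 2.0})` gives `{"knn": 0.25, "zero": 1.0}`, so the worst-scoring method for an error-type criterion such as NRMSE always sits at 1.0.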
The following tables and figures are provided under the four proteomic criteria. 1. A table of the comprehensive rank of each imputation method; 2-5. Tables of the scores of each imputation method based on 'Average correlation coefficient between peptides with different charges (ACC_Charge)', 'Average correlation coefficient between peptides in the same protein (ACC_PepProt)', 'Average correlation coefficient between proteins in the same protein complex (ACC_CORUM)', and 'Average correlation coefficient between interacting proteins (ACC_PPI)', respectively; 6. Figures showing the distribution of correlation coefficients for the original values and for the values imputed by each method under the four proteomic criteria.
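The idea behind ACC_Charge and ACC_PepProt can be sketched as an average of pairwise correlations among features that should behave alike: charge states of the same peptide, or peptides of the same protein. The sketch below assumes Pearson correlation across samples, averaged over every within-group pair; the choice of correlation and the averaging scheme are assumptions, and `average_within_group_correlation` is a hypothetical helper. ACC_CORUM and ACC_PPI would follow the same pattern with protein groups or pairs taken from external complex and interaction databases, which this sketch does not cover.

```python
import itertools
import math


def pearson(x, y):
    """Pearson correlation of two equal-length lists without missing values.

    Raises ZeroDivisionError if either list is constant.
    """
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def average_within_group_correlation(features, key):
    """Average Pearson r over all pairs of features sharing the same key.

    features: list of (annotation dict, intensity list) tuples, using
              imputed (complete) intensities.
    key: annotation used for grouping, e.g. "peptide" for ACC_Charge
         (charge states of one peptide) or "protein" for ACC_PepProt.
    """
    groups = {}
    for annotation, values in features:
        groups.setdefault(annotation[key], []).append(values)
    rs = [pearson(a, b)
          for members in groups.values()
          for a, b in itertools.combinations(members, 2)]
    return sum(rs) / len(rs) if rs else float("nan")
```

A higher average suggests the imputation preserved the quantitative consistency that these features are expected to share; groups with a single member contribute no pairs.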
If you have any questions, comments, or suggestions about NAguideR, please feel free to contact: wsslearning@omicsolution.com. Thank you for using NAguideR; your suggestions are valuable for its future improvement.