Downloads - ADEPTUS

File Name	Description	Size
Adeptus2.zip	All data, including raw expression data	4,468MB
Supervised_data_atleast_10000_genes_max_percet_na_10.RData	The final preprocessed supervised database, which can be loaded directly into an R session. Contains: x, the expression profiles (a matrix of 37782 samples vs. 10081 genes); y, the matrix of all labels (a binary matrix of 37337 samples vs. 216 labels; all samples appear as rows in x); sample2study; sample2terms (y as a list of labels); sample2tissue_slim.	2,291MB
sample2study.RData	A mapping of samples to their studies.	158KB
classification_performance_scores_summary.RData	Results of the leave-study-out SVM cross-validation. Contains a list called classifier_scores_matrices. It has several score matrices. In each one the rows are labels and the first column is the score of the SVM-based classifier. The sublist classifier2selected_diseases contains (as the first entry) the list of 68 well-classified labels (13 tissue controls and 55 disease-related).	16,652B
classification_results_40k_samples.RData	An R object with the actual predictions from the leave-study-out SVM cross-validation.	242MB
gene_dataset_p_matrices.RData	A list with an entry for each label. Each entry has a p-value matrix of genes vs labels. Each p-value is a result of comparing the label’s samples to the other samples in that study.	228MB
gene_pb_roc_scores.RData	PB-ROC scores: a matrix of genes vs labels.	17,980KB
gene_pn_roc_scores.RData	PN-ROC scores: a matrix of genes vs labels.	14,501KB
gene_edge_based_son2rocs.RData	Results of the edge-based analysis: a list with an entry for each disease label. Each entry is a matrix of genes vs the parents of the label (typically a single parent)	14,971KB
selected_genes_adeptus2.RData	A list with the selected genes for each label	20,004KB
gpl_mappings_to_entrez.RData	A list that maps the probes in each microarray platform into Entrez gene ids.	9,383KB
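
The .RData files above can be loaded into a separate environment so that their objects do not overwrite anything in the current workspace. The following is a minimal sketch, not the authors' code: it builds a tiny stand-in file with the same object names (x, y, sample2study) so that it runs without the real download, and it assumes that both matrices carry sample identifiers as row names. The file name demo_supervised.RData and the toy values are hypothetical.

```r
# Stand-in for the real download: a tiny file with the same object names,
# so the sketch runs without the 2,291MB file. All values are made up.
demo_path <- file.path(tempdir(), "demo_supervised.RData")
local({
  x <- matrix(rnorm(12), nrow = 4,
              dimnames = list(paste0("S", 1:4), paste0("gene", 1:3)))
  y <- matrix(c(1, 0, 0, 1, 1, 0), nrow = 3,
              dimnames = list(c("S1", "S2", "S4"), c("labelA", "labelB")))
  sample2study <- c(S1 = "study1", S2 = "study1", S3 = "study2", S4 = "study2")
  save(x, y, sample2study, file = demo_path)
})

# Load into a dedicated environment; load() returns the object names.
db <- new.env()
loaded <- load(demo_path, envir = db)
print(loaded)

# y has fewer rows than x, so align the two by sample name
# (assumption: row names are sample identifiers in both matrices).
stopifnot(all(rownames(db$y) %in% rownames(db$x)))
x_labeled <- db$x[rownames(db$y), , drop = FALSE]
dim(x_labeled)
```

For the real file, replace demo_path with the path to the downloaded .RData file; the alignment step only applies if the row-name assumption holds.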

See our GitHub repository for the code.

The R code requires the following packages:
R dependencies: e1071, PRROC, ROCR, pROC, hash, LiblineaR, gplots, VennDiagram
R dependencies (Bioconductor): CMA, preprocessCore, DO.db, limma, org.Hs.eg.db, IRanges, RBGL, BiocGenerics, gRbase, gRain, S4Vectors, GEOquery
Optional R dependencies (for additional analyses that are implemented): randomForest, ranger, bnlearn
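
One way to check which of the required packages are still missing from a local R library is sketched below. It is an illustration rather than part of the project's code; the variable names are hypothetical, and the install commands are left commented out because they modify the local library. BiocManager is assumed as the installer for the Bioconductor group.

```r
# Package names copied from the dependency list above.
cran_pkgs <- c("e1071", "PRROC", "ROCR", "pROC", "hash",
               "LiblineaR", "gplots", "VennDiagram")
bioc_pkgs <- c("CMA", "preprocessCore", "DO.db", "limma", "org.Hs.eg.db",
               "IRanges", "RBGL", "BiocGenerics", "gRbase", "gRain",
               "S4Vectors", "GEOquery")

# Compare against the packages already present in the local library.
installed <- rownames(installed.packages())
missing_cran <- setdiff(cran_pkgs, installed)
missing_bioc <- setdiff(bioc_pkgs, installed)

cat("Missing CRAN packages:", length(missing_cran), "\n")
cat("Missing Bioconductor packages:", length(missing_bioc), "\n")

# Not run automatically, since installation changes the local library:
# if (length(missing_cran) > 0) install.packages(missing_cran)
# if (length(missing_bioc) > 0) BiocManager::install(missing_bioc)
```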