Statistical Analysis and Data Mining

Papers
(The TQCC of Statistical Analysis and Data Mining is 2. The table below lists the papers that meet or exceed that threshold, based on CrossRef citation counts [max. 250 papers]. It covers papers published in the past four years, i.e., from 2020-05-01 to 2024-05-01.)
Article | Citations
Optimal ratio for data splitting | 187
GRATIS: GeneRAting TIme Series with diverse and controllable characteristics | 62
Generalized mixed‐effects random forest: A flexible approach to predict university student dropout | 24
Unsupervised random forests | 18
Supervised compression of big data | 17
Multiclass machine learning classification of functional brain images for Parkinson's disease stage prediction | 16
Modal linear regression models with multiplicative distortion measurement errors | 13
Imbalanced classification: A paradigm‐based review | 13
Data Twinning | 13
A linear time method for the detection of collective and point anomalies | 11
Two‐stage hybrid learning techniques for bankruptcy prediction* | 11
Fourier neural networks as function approximators and differential equation solvers | 11
Exponential calibration for correlation coefficient with additive distortion measurement errors | 9
An efficient k‐modes algorithm for clustering categorical datasets | 8
Multivariate Hidden Markov Models for disease progression | 8
Weighted k‐nearest neighbor based data complexity metrics for imbalanced datasets | 8
A comparison of Gaussian processes and neural networks for computer model emulation and calibration | 8
Handwriting identification using random forests and score‐based likelihood ratios | 7
The fairness‐accuracy Pareto front | 7
Power grid frequency prediction using spatiotemporal modeling | 7
A clustering method for graphical handwriting components and statistical writership analysis | 7
Weighted pivot coordinates for partial least squares‐based marker discovery in high‐throughput compositional data | 7
Measure inducing classification and regression trees for functional data | 7
Trees, forests, chickens, and eggs: when and why to prune trees in a random forest | 6
Markov chain to analyze web usability of a university website using eye tracking data | 6
MR plot: A big data tool for distinguishing distributions | 6
An analytical toast to wine: Using stacked generalization to predict wine preference | 5
An adaptive nonparametric exponentially weighted moving average control chart with dynamic sampling intervals | 5
Visual diagnostics of an explainer model: Tools for the assessment of LIME explanations | 5
A framework for stability‐based module detection in correlation graphs | 5
Use of data mining in a two‐step process of profiling student preferences in relation to the enhancement of English as a foreign language teaching | 5
A tutorial on generative adversarial networks with application to classification of imbalanced data | 5
Extreme ensemble of extreme learning machines | 4
A fast and efficient Modal EM algorithm for Gaussian mixtures | 4
Specifying composites in structural equation modeling: A refinement of the Henseler–Ogasawara specification | 4
Parallel coordinate order for high‐dimensional data | 4
Tracking clusters and anomalies in evolving data streams | 4
Precision aggregated local models | 4
Feature selection for imbalanced data with deep sparse autoencoders ensemble | 4
The next wave: We will all be data scientists | 4
Learning compact physics‐aware delayed photocurrent models using dynamic mode decomposition | 3
A tree‐based gene–environment interaction analysis with rare features | 3
Factor analysis of mixed data for anomaly detection | 3
Sample selection bias in evaluation of prediction performance of causal models | 3
Coefficient tree regression for generalized linear models | 3
Traditional kriging versus modern Gaussian processes for large‐scale mining data | 3
Model‐based clustering of time‐dependent categorical sequences with application to the analysis of major life event patterns | 3
Ensembled sparse‐input hierarchical networks for high‐dimensional datasets | 3
Intuitively adaptable outlier detector | 3
An approach to characterizing spatial aspects of image system blur | 3
Survival trees based on heterogeneity in time‐to‐event and censoring distributions using parameter instability test | 3
Penalized composite likelihood for colored graphical Gaussian models | 2
Online embedding and clustering of evolving data streams | 2
Frequentist model averaging for zero‐inflated Poisson regression models | 2
SURE estimates for high dimensional classification | 2
Next waves in veridical network embedding* | 2
Cluster analysis via random partition distributions | 2
A general iterative clustering algorithm | 2
Adaptive batching for Gaussian process surrogates with application in noisy level set estimation | 2
The future of precision health is data‐driven decision support | 2
Comparison of machine learning approaches used to identify the drivers of Bakken oil well productivity | 2
Negative binomial graphical model with excess zeros | 2
Estimation of disease progression for ischemic heart disease using latent Markov with covariates | 2
Evaluating causal‐based feature selection for fuel property prediction models | 2
Clover plot: Versatile visualization in nonparametric classification* | 2
Classification of high‐dimensional electroencephalography data with location selection using structured spike‐and‐slab prior | 2
Emulated order identification for models of big time series data | 2
A study of the impact of COVID‐19 on the Chinese stock market based on a new textual multiple ARMA model | 2
Coupled support tensor machine classification for multimodal neuroimaging data | 2
Sketched Stochastic Dictionary Learning for large‐scale data and application to high‐throughput mass spectrometry | 2