A Multivariate Bernoulli-Based Sampling Method for Multi-Label Data with Application to Meta-Research
arXiv CS.LG
arXiv:2512.08371v4
Datasets may contain observations with multiple labels. When the labels are not mutually exclusive and vary greatly in frequency, it is challenging to obtain a sample that includes enough observations with the scarcer labels to support inference about those labels and that deviates from the population frequencies in a known manner. In this paper, we consider a multivariate Bernoulli distribution as the underlying distribution of a multi-label problem. We present a novel sampling algorithm that takes label dependencies into account.
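To make the setting concrete, the sketch below treats each observation's labels as one joint 0/1 pattern, which is the view a multivariate Bernoulli distribution takes, and oversamples rare patterns with known inclusion probabilities. This is not the paper's algorithm, which the abstract does not specify; the toy data, the function names, and the `target / count` rule are all assumptions chosen for illustration.

```python
import random
from collections import Counter

def pattern_pmf(label_rows):
    """Empirical joint pmf over 0/1 label patterns (multivariate Bernoulli view)."""
    counts = Counter(tuple(row) for row in label_rows)
    n = len(label_rows)
    return {pattern: c / n for pattern, c in counts.items()}

def inclusion_probs(label_rows, per_pattern_target):
    """Hypothetical rule: aim for about `per_pattern_target` draws per pattern."""
    counts = Counter(tuple(row) for row in label_rows)
    return {p: min(1.0, per_pattern_target / c) for p, c in counts.items()}

def sample_indices(label_rows, probs, seed=0):
    """Keep each observation independently with its pattern's inclusion probability."""
    rng = random.Random(seed)
    return [i for i, row in enumerate(label_rows)
            if rng.random() < probs[tuple(row)]]

# Toy population with three labels; the third label is scarce (assumed data).
rows = [(1, 0, 0)] * 80 + [(1, 1, 0)] * 15 + [(0, 0, 1)] * 5

pmf = pattern_pmf(rows)
probs = inclusion_probs(rows, per_pattern_target=5)
chosen = sample_indices(rows, probs)

# Inverse-probability weights record exactly how the sample deviates
# from the population frequencies, so estimates can be reweighted.
weights = {i: 1.0 / probs[tuple(rows[i])] for i in chosen}
```

Working with whole patterns rather than one label at a time keeps co-occurring labels together, and storing the inclusion probabilities is what makes the deviation from the population frequencies known rather than accidental.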