Package pulearn
pulearn: Positive-unlabeled learning with Python.
The pulearn Python package provide a collection of scikit-learn wrappers
to several positive-unlabeled learning (PU-learning) methods.
pulearn
pulearn is a Python package providing scikit-learn-compatible wrappers for
several Positive-Unlabeled (PU) learning algorithms.
In PU learning the training set contains a set of labeled positive examples and a (typically much larger) set of unlabeled examples that may contain both positive and negative instances.
Installation
pip install pulearn
API Foundations
Core PU classifiers now share a common base contract via
pulearn.BasePUClassifier:
- Shared PU label normalization utilities with canonical internal form
(
1= labeled positive,0= unlabeled). Inputs in{1, -1},{1, 0}, and{True, False}are normalized immediately. Usepulearn.normalize_pu_labels(…)(ornormalize_pu_y()(…)) to convert labels at API boundaries. - Shared
predict_probaoutput checks for shape and numeric validity. - Optional hooks for score calibration and PU scorer construction.
- Shared validation policy for fit/metric inputs: non-empty arrays, matching sample counts between arrays, and explicit errors for missing labeled positives or missing unlabeled examples.
- Registered learner metadata is discoverable through
pulearn.get_algorithm_registry()andpulearn.get_algorithm_spec("<key>").
Extending pulearn
Contributor-facing scaffolding for new learners now lives in the repository:
- Checklist:
doc/new_algorithm_checklist.md - Docs stub:
doc/templates/new_algorithm_doc_stub.md - Regression test scaffold:
tests/templates/test_new_algorithm_template.py.tmpl - Shared API contract scaffold:
tests/templates/test_api_contract_template.py.tmpl - Benchmark entry scaffold:
benchmarks/templates/benchmark_entry_template.py.tmpl
Use pulearn.get_new_algorithm_checklist() to inspect the required workflow
from Python. pulearn.get_scaffold_templates() resolves absolute template
paths when called from a repository checkout and raises an actionable error
outside that context. Add a registry entry before wiring docs, benchmarks,
or tests for a new learner.
Implemented Classifiers
Elkanoto
Scikit-learn wrappers for the methods described in the paper by
Elkan and Noto (2008).
Unlabeled examples can be indicated by -1, 0, or False; positives by
1 or True.
Classic Elkanoto
from pulearn import ElkanotoPuClassifier
from sklearn.svm import SVC
svc = SVC(C=10, kernel="rbf", gamma=0.4, probability=True)
pu_estimator = ElkanotoPuClassifier(estimator=svc, hold_out_ratio=0.2)
pu_estimator.fit(X, y)
Weighted Elkanoto
from pulearn import WeightedElkanotoPuClassifier
from sklearn.svm import SVC
svc = SVC(C=10, kernel="rbf", gamma=0.4, probability=True)
pu_estimator = WeightedElkanotoPuClassifier(
estimator=svc, labeled=10, unlabeled=20, hold_out_ratio=0.2
)
pu_estimator.fit(X, y)
Bagging PU Classifier
Based on
Mordelet & Vert (2013).
Accepted PU labels follow the same package-wide conventions:
1/True for labeled positives and 0/-1/False for unlabeled.
from pulearn import BaggingPuClassifier
from sklearn.svm import SVC
svc = SVC(C=10, kernel="rbf", gamma=0.4, probability=True)
pu_estimator = BaggingPuClassifier(estimator=svc, n_estimators=15)
pu_estimator.fit(X, y)
Non-Negative PU Classifier (nnPU)
Implements the nnPU algorithm from Kiryo et al. (NeurIPS 2017). Trains a linear model using a non-negative risk estimator that prevents over-fitting to positive examples. Supports both nnPU and uPU modes. Prior probability of the positive class must be provided.
from pulearn import NNPUClassifier
clf = NNPUClassifier(prior=0.3, max_iter=1000, learning_rate=0.01)
clf.fit(X_train, y_pu) # y_pu: 1 = labeled positive, 0/-1 = unlabeled
labels = clf.predict(X_test)
Class-Prior Estimation (pulearn.priors)
pulearn.priors introduces a unified API for estimating the PU class prior
pi under the SCAR assumption:
LabelFrequencyPriorEstimatoris a naive lower-bound baseline equal to the observed labeled-positive fraction.HistogramMatchPriorEstimatorfits a probabilistic scorer and estimates the hidden positive mass in the unlabeled pool by matching score histograms.ScarEMPriorEstimatorrefinespiwith a soft-label EM loop over latent positives in the unlabeled pool.
Each estimator implements fit(X, y) and estimate(X, y) and returns a
PriorEstimateResult with the estimate, observed label rate, sample counts,
and method-specific metadata.
from pulearn import (
HistogramMatchPriorEstimator,
LabelFrequencyPriorEstimator,
ScarEMPriorEstimator,
)
baseline = LabelFrequencyPriorEstimator().estimate(X_train, y_pu)
histogram = HistogramMatchPriorEstimator().estimate(X_train, y_pu)
scar_em = ScarEMPriorEstimator().estimate(X_train, y_pu)
print(baseline.pi, histogram.pi, scar_em.pi)
print(scar_em.metadata["c_estimate"])
Use the baseline as a floor, compare it against the score-matching estimate, and favor the EM estimate when the underlying classifier is stable and the SCAR assumption is plausible.
Bootstrap confidence intervals are available for reproducible uncertainty estimates:
estimator = ScarEMPriorEstimator().fit(X_train, y_pu)
result = estimator.bootstrap(
X_train,
y_pu,
n_resamples=200,
confidence_level=0.95,
random_state=7,
)
print(result.pi)
print(result.confidence_interval.lower, result.confidence_interval.upper)
Diagnostics helpers can summarize estimator stability across a parameter sweep and optionally drive sensitivity plots:
from pulearn import HistogramMatchPriorEstimator, diagnose_prior_estimator
diagnostics = diagnose_prior_estimator(
HistogramMatchPriorEstimator(),
X_train,
y_pu,
parameter_grid={"n_bins": [8, 12, 20], "smoothing": [0.5, 1.0]},
)
print(diagnostics.unstable, diagnostics.warnings)
print(diagnostics.range_pi, diagnostics.std_pi)
# Optional: requires matplotlib
# from pulearn import plot_prior_sensitivity
# plot_prior_sensitivity(diagnostics)
If pi is uncertain, sweep corrected metrics across a plausible prior
range and compare the resulting best/worst-case summaries:
from pulearn import analyze_prior_sensitivity
sensitivity = analyze_prior_sensitivity(
y_pu,
y_pred=y_pred,
y_score=y_score,
metrics=["pu_precision", "pu_roc_auc"],
pi_min=0.2,
pi_max=0.5,
num=7,
)
print(sensitivity.as_rows())
print(sensitivity.summaries["pu_precision"].best_pi)
Propensity Estimation (pulearn.propensity)
pulearn.propensity packages robust estimators for the SCAR labeling
propensity c = P(s=1|y=1):
MeanPositivePropensityEstimatormatches the classic Elkan-Noto mean-on-positives estimate.TrimmedMeanPropensityEstimatortrims extreme labeled-positive scores before averaging.MedianPositivePropensityEstimatorandQuantilePositivePropensityEstimatorprovide conservative alternatives for noisy or skewed positive scores.CrossValidatedPropensityEstimatoruses out-of-fold probabilities from a probabilistic sklearn estimator to reduce optimistic bias.
All score-based estimators implement fit(y_pu, s_proba=...) and
estimate(y_pu, s_proba=...). The cross-validated estimator uses the same
API but accepts X=... and a base estimator.
from sklearn.linear_model import LogisticRegression
from pulearn import (
CrossValidatedPropensityEstimator,
MeanPositivePropensityEstimator,
MedianPositivePropensityEstimator,
QuantilePositivePropensityEstimator,
TrimmedMeanPropensityEstimator,
)
mean_c = MeanPositivePropensityEstimator().estimate(y_pu, s_proba=y_score)
trimmed_c = TrimmedMeanPropensityEstimator(trim_fraction=0.1).estimate(
y_pu,
s_proba=y_score,
)
median_c = MedianPositivePropensityEstimator().estimate(y_pu, s_proba=y_score)
quantile_c = QuantilePositivePropensityEstimator(quantile=0.25).estimate(
y_pu,
s_proba=y_score,
)
cv_c = CrossValidatedPropensityEstimator(
estimator=LogisticRegression(max_iter=1000),
cv=5,
random_state=7,
).estimate(y_pu, X=X_train)
print(mean_c.c, trimmed_c.c, median_c.c, quantile_c.c, cv_c.c)
print(cv_c.metadata["fold_estimates"])
Use the mean estimator for classic Elkan-Noto workflows, the trimmed/median/quantile estimators when a few labeled positives look unreliable, and the cross-validated estimator when you need a less optimistic score estimate from a fitted model.
estimate_label_frequency_c()(…) now delegates to the same
mean estimator and therefore expects probability-like scores in [0, 1].
Bootstrap confidence intervals are available when you need uncertainty
estimates or an explicit instability warning for c:
estimator = TrimmedMeanPropensityEstimator(trim_fraction=0.1).fit(
y_pu,
s_proba=y_score,
)
result = estimator.bootstrap(
y_pu,
s_proba=y_score,
n_resamples=200,
confidence_level=0.95,
random_state=7,
)
print(result.c)
print(result.confidence_interval.lower)
print(result.confidence_interval.upper)
print(result.confidence_interval.warning_flags)
Warning flags highlight repeated bootstrap fit failures, high resample
variance, large coefficient of variation, or inconsistent fold-level
cross-validation estimates. Those are SCAR warning signs worth investigating
before you treat c as stable enough for calibration or corrected metrics.
To check whether SCAR itself looks plausible, compare labeled positives against the highest-scoring unlabeled pool:
from pulearn import scar_sanity_check
scar_check = scar_sanity_check(
y_pu,
s_proba=y_score,
X=X_train,
candidate_quantile=0.9,
random_state=7,
)
print(scar_check.group_membership_auc)
print(scar_check.max_abs_smd)
print(scar_check.warnings)
Warnings such as group_separable, high_mean_shift, or
max_feature_shift indicate that the unlabeled samples most likely to be
positive still look systematically different from the labeled positives.
That is a practical signal to revisit SCAR before relying on c-corrected
calibration or metrics.
Experimental SAR Hooks
pulearn also exposes a minimal experimental SAR interface for users who
already have a selection-propensity model. The current scope is narrow:
plug in a propensity model, score new samples, and compute
inverse-propensity weights. Full SAR learners and SAR-corrected metrics are
still out of scope for this milestone.
from sklearn.linear_model import LogisticRegression
from pulearn import (
ExperimentalSarHook,
compute_inverse_propensity_weights,
predict_sar_propensity,
)
propensity_model = LogisticRegression(max_iter=1000).fit(X_train, s_train)
sar_scores = predict_sar_propensity(propensity_model, X_test)
sar_weights = compute_inverse_propensity_weights(
sar_scores,
clip_min=0.05,
clip_max=1.0,
normalize=True,
)
hook = ExperimentalSarHook(propensity_model)
hook_result = hook.inverse_propensity_weights(X_test, normalize=True)
print(sar_weights.weights[:5])
print(hook_result.metadata["propensity_model"])
These helpers warn on every use because the semantics are still unstable.
Inspect clipped_count, effective_sample_size, and extreme weights before
you rely on them in downstream research code.
Bayesian PU Classifiers
Four Bayesian classifiers for PU learning, ported from the
MIT-licensed reference implementation
by Chengning Zhang.
All four accept labels in either {1, 0} or {1, -1} convention.
Boolean labels follow the same package-wide behavior: True is treated as
labeled positive and False as unlabeled.
Continuous features are automatically discretized into equal-width bins.
Positive Naive Bayes (PNB)
from pulearn import PositiveNaiveBayesClassifier
clf = PositiveNaiveBayesClassifier(alpha=1.0, n_bins=10)
clf.fit(X_train, y_pu)
proba = clf.predict_proba(X_test)
Weighted Naive Bayes (WNB)
from pulearn import WeightedNaiveBayesClassifier
clf = WeightedNaiveBayesClassifier(alpha=1.0, n_bins=10)
clf.fit(X_train, y_pu)
print(clf.feature_weights_) # per-feature MI weight
proba = clf.predict_proba(X_test)
Positive Tree-Augmented Naive Bayes (PTAN)
from pulearn import PositiveTANClassifier
clf = PositiveTANClassifier(alpha=1.0, n_bins=10)
clf.fit(X_train, y_pu)
print(clf.tan_parents_) # learned tree structure
proba = clf.predict_proba(X_test)
Weighted Tree-Augmented Naive Bayes (WTAN)
from pulearn import WeightedTANClassifier
clf = WeightedTANClassifier(alpha=1.0, n_bins=10)
clf.fit(X_train, y_pu)
print(clf.feature_weights_)
print(clf.tan_parents_)
proba = clf.predict_proba(X_test)
Two-Step Reliable-Negative (RN) Classifiers
The two-step RN approach is a standard PU learning baseline under the SCAR assumption. It proceeds in two stages:
- Identification — score unlabeled samples with a step-1 classifier and select a subset as reliable negatives (RN) using one of three strategies.
- Classification — train a final binary classifier on the labeled positives (P) and the identified RN.
Three identification strategies are supported via the rn_strategy parameter:
| Strategy | Description |
|---|---|
"spy" (default) |
Inject a fraction of positives as "spies" into U; use the lowest spy score as the RN threshold (Liu et al., 2002) |
"threshold" |
Select unlabeled samples with positive-class score below a fixed threshold |
"quantile" |
Select the bottom quantile fraction of unlabeled samples by score |
Basic usage
from pulearn import TwoStepRNClassifier
clf = TwoStepRNClassifier(rn_strategy="spy", random_state=0)
clf.fit(X_train, y_pu)
proba = clf.predict_proba(X_test)
predictions = clf.predict(X_test)
Quantile strategy with custom estimators
from sklearn.linear_model import LogisticRegression
from pulearn import TwoStepRNClassifier
clf = TwoStepRNClassifier(
step1_estimator=LogisticRegression(max_iter=500),
step2_estimator=LogisticRegression(max_iter=500),
rn_strategy="quantile",
quantile=0.3,
random_state=0,
)
clf.fit(X_train, y_pu)
Failure modes and warnings
The classifier emits UserWarning in the following situations:
- Too few reliable negatives: fewer than
min_rn_fraction× n_unlabeled samples are selected as RN. Step-2 training may be dominated by the labeled positives. - Nearly all unlabeled selected: ≥ 95% of unlabeled samples are selected as RN. The final classifier may be biased toward the negative class.
- Large spy ratio:
spy_ratiowould consume all (or nearly all) positives as spies, leaving too few for step-2 training.
Prefer "spy" or "quantile" over "threshold" when the positive-class
prior or the step-1 calibration is uncertain, as direct thresholding is
highly sensitive to both.
Probability Calibration (pulearn.calibration)
PU learners often produce poorly calibrated probabilities because they are trained on a mix of labeled positives and unlabeled (mixed positive/negative) samples rather than clean two-class supervision. Poor calibration degrades decision thresholds, corrected PU metrics, and any downstream task that relies on probability magnitudes.
pulearn.calibration provides post-hoc calibration that adjusts raw
classifier scores on a separate held-out calibration set.
When to calibrate
| Situation | Recommendation |
|---|---|
| Default choice | Platt scaling (method='platt') |
| Non-parametric, large calibration set (100+ samples) | Isotonic regression (method='isotonic') |
| Only ranking quality needed (AUC) | No calibration required |
| < 30 held-out samples | Collect more data first |
Platt scaling fits a sigmoid on the positive-class scores via logistic regression. Reliable with as few as 30–50 held-out samples.
Isotonic regression is a non-parametric, monotone calibration method.
More flexible than Platt but prone to overfitting with small sets. At least
50 samples are required (100+ recommended). A ValueError is raised for
smaller sets.
Typical workflow
from sklearn.linear_model import LogisticRegression
from pulearn import PURiskClassifier, pu_train_test_split
from pulearn.calibration import calibrate_pu_classifier
# 1. Hold out a calibration split (separate from training data)
X_tr, X_cal, y_tr, y_cal = pu_train_test_split(X, y_pu, test_size=0.2)
# 2. Train the PU classifier on the training split
clf = PURiskClassifier(LogisticRegression(), prior=0.3).fit(X_tr, y_tr)
# 3. Calibrate using the held-out split.
# y_cal here are PU labels (1=labeled positive, 0=unlabeled).
# If you have true ground-truth labels for the calibration split
# (y_cal_true, where 0 = truly negative), pass those instead for
# sharper calibration.
calibrate_pu_classifier(clf, X_cal, y_cal, method="platt")
# 4. Use calibrated probabilities
proba = clf.predict_calibrated_proba(X_test)
Using PUCalibrator directly
PUCalibrator follows the sklearn estimator interface and is compatible with
BasePUClassifier.fit_calibrator:
from pulearn.calibration import PUCalibrator
cal = PUCalibrator(method="isotonic", min_samples_isotonic=100)
clf.fit_calibrator(cal, X_cal, y_cal)
proba = clf.predict_calibrated_proba(X_test)
Small-sample guard
Use warn_if_small_calibration_set to emit a UserWarning before attempting
calibration when the set may be too small:
from pulearn.calibration import warn_if_small_calibration_set
warn_if_small_calibration_set(n_samples=len(X_cal), method="isotonic")
Evaluation Metrics (pulearn.metrics)
pulearn.metrics provides evaluation utilities designed for the PU setting
under the SCAR (Selected Completely At Random) assumption. Metric
functions use strict PU label validation and normalize accepted conventions
to the canonical internal representation (1 positive, 0 unlabeled).
Calibration
from pulearn.metrics import estimate_label_frequency_c, calibrate_posterior_p_y1
c_hat = estimate_label_frequency_c(y_pu, s_proba)
p_y1 = calibrate_posterior_p_y1(s_proba, c_hat)
Expected-Confusion Metrics
from pulearn.metrics import (
pu_recall_score,
pu_precision_score,
pu_f1_score,
pu_specificity_score,
)
rec = pu_recall_score(y_pu, y_pred)
prec = pu_precision_score(y_pu, y_pred, pi=0.3)
f1 = pu_f1_score(y_pu, y_pred, pi=0.3)
spec = pu_specificity_score(y_pu, y_score)
Ranking Metrics
from pulearn.metrics import pu_roc_auc_score, pu_average_precision_score
auc = pu_roc_auc_score(y_pu, y_score, pi=0.3)
aul = pu_average_precision_score(y_pu, y_score, pi=0.3)
Risk Estimators
from pulearn.metrics import pu_unbiased_risk, pu_non_negative_risk
risk_upu = pu_unbiased_risk(y_pu, y_score, pi=0.3)
risk_nnpu = pu_non_negative_risk(y_pu, y_score, pi=0.3)
Scikit-learn Integration
from sklearn.model_selection import GridSearchCV
from pulearn.metrics import make_pu_scorer
scorer = make_pu_scorer("pu_f1", pi=0.3)
gs = GridSearchCV(estimator, param_grid, scoring=scorer)
gs.fit(X_train, y_pu_train)
Supported metric names: "lee_liu", "pu_recall", "pu_precision",
"pu_f1", "pu_specificity", "pu_roc_auc", "pu_average_precision",
"pu_unbiased_risk", "pu_non_negative_risk".
Model Selection (pulearn.model_selection)
pulearn.model_selection provides PU-aware splitting utilities that ensure
labeled positive samples are preserved across all folds and splits. Under the
SCAR assumption, stratifying by the binary PU label is a valid and practical
proxy for preserving the labeled-positive rate.
PUStratifiedKFold
Wraps scikit-learn's StratifiedKFold and stratifies by the PU label so that
each fold contains roughly the same fraction of labeled positive samples as the
full dataset.
from sklearn.svm import SVC
from pulearn import PUStratifiedKFold
estimator = SVC()
scores = []
cv = PUStratifiedKFold(n_splits=5, shuffle=True, random_state=0)
for train_idx, test_idx in cv.split(X, y_pu):
estimator.fit(X[train_idx], y_pu[train_idx])
scores.append(estimator.score(X[test_idx], y_pu[test_idx]))
PUCrossValidator
A higher-level cross-validator compatible with sklearn.model_selection.cross_validate
and GridSearchCV. It emits an actionable UserWarning when the labeled-positive count
is smaller than n_splits and falls back to plain KFold in that case.
from sklearn.model_selection import cross_validate
from pulearn import PUCrossValidator
cv = PUCrossValidator(n_splits=5, shuffle=True, random_state=0)
results = cross_validate(estimator, X, y_pu, cv=cv, scoring="f1")
pu_train_test_split
Stratified train/test split that preserves the PU label distribution and validates that the resulting training set always contains at least one labeled positive.
from pulearn import pu_train_test_split
X_train, X_test, y_train, y_test = pu_train_test_split(
X, y_pu, test_size=0.2, random_state=42
)
Examples
End-to-end runnable examples can be found in the examples/ directory of
the repository:
BreastCancerElkanotoExample.py— classic Elkan-Noto on the Wisconsin breast cancer dataset.BayesianPULearnersExample.py— comparison of all four Bayesian PU classifiers.PUMetricsEvaluationExample.py— demonstration of PU evaluation metrics on synthetic SCAR data.
Sub-modules
pulearn.bagging-
Bagging meta-estimator for PU learning …
pulearn.base-
Shared PU classifier contracts and utilities.
pulearn.bayesian_pu-
Bayesian PU learning classifiers …
pulearn.benchmarks-
Benchmark utilities for pulearn …
pulearn.calibration-
Post-hoc probability calibration utilities for PU classifiers …
pulearn.elkanoto-
Both PU classification methods from the Elkan & Noto paper.
pulearn.metrics-
Implement metrics that are useful for PU learning …
pulearn.model_selection-
PU-aware cross-validation and dataset-splitting utilities …
pulearn.nnpu-
Non-negative PU learning classifier …
pulearn.priors-
Class-prior estimation utilities for positive-unlabeled learning.
pulearn.propensity-
Propensity-estimation utilities for positive-unlabeled learning.
pulearn.registry-
Registry and contributor scaffolding for PU algorithms.
pulearn.risk-
Risk-objective PU learning wrapper for sklearn estimators …
pulearn.rn-
Reliable-Negative (RN) PU learning classifiers …
pulearn.torch_pu-
Experimental optional PyTorch integration for PU learning …