Module pulearn

The pulearn Python package provides a collection of scikit-learn wrappers for several positive-unlabeled learning (PU-learning) methods.

Features

Installation

Install pulearn with:

    pip install pulearn

Implemented Classifiers

Elkanoto

Scikit-learn wrappers for both methods described in the paper by Elkan and Noto, "Learning classifiers from only positive and unlabeled data" (published in Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, ACM, 2008).

These wrap the Python code from a fork by AdityaAS (which implements both methods) of the original repository by Alexandre Drouin, which implements one of the methods.

Classic Elkanoto

To use the classic (unweighted) method, use the ElkanotoPuClassifier class:

    from pulearn import ElkanotoPuClassifier
    from sklearn.svm import SVC
    svc = SVC(C=10, kernel='rbf', gamma=0.4, probability=True)
    pu_estimator = ElkanotoPuClassifier(estimator=svc, hold_out_ratio=0.2)
    # X: feature matrix; y: targets in which labeled positives are marked 1
    pu_estimator.fit(X, y)

See the documentation of the class for more details.
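For intuition about what the classic method computes, here is a minimal sketch of the estimator described in the Elkan and Noto paper: a probabilistic classifier g(x), approximating p(s=1 | x), is trained to separate labeled from unlabeled examples; the constant c = p(s=1 | y=1) is estimated as the mean of g over held-out labeled positives; and p(y=1 | x) is then estimated as g(x) / c. The function names and scores below are hypothetical, the clipping to 1 is a choice made here, and this is not the ElkanotoPuClassifier source.

```python
from statistics import mean


def estimate_c(holdout_scores):
    """Estimate c = p(s=1 | y=1) as the mean classifier score on held-out
    labeled positives (the first estimator in Elkan & Noto, 2008)."""
    return mean(holdout_scores)


def pu_probability(score, c):
    """Turn a labeled-vs-unlabeled score g(x) = p(s=1 | x) into an estimate
    of p(y=1 | x) = g(x) / c, clipped here so it never exceeds 1."""
    if c <= 0:
        raise ValueError("c must be positive")
    return min(score / c, 1.0)


# Hypothetical scores from a classifier trained on labeled vs. unlabeled data.
holdout_scores = [0.6, 0.5, 0.4]  # scores on held-out labeled positives
c = estimate_c(holdout_scores)    # 0.5
print(c)
print(pu_probability(0.3, c))     # 0.6
print(pu_probability(0.8, c))     # 1.0 after clipping
```

Dividing by c rescales the "is it labeled?" score into a "is it positive?" score, which is why the estimate of c must come from positives the classifier did not see during training.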

Weighted Elkanoto

To use the weighted method, use the WeightedElkanotoPuClassifier class:

    from pulearn import WeightedElkanotoPuClassifier
    from sklearn.svm import SVC
    svc = SVC(C=10, kernel='rbf', gamma=0.4, probability=True)
    pu_estimator = WeightedElkanotoPuClassifier(
        estimator=svc, labeled=10, unlabeled=20, hold_out_ratio=0.2)
    # X: feature matrix; y: targets in which labeled positives are marked 1
    pu_estimator.fit(X, y)

See the original paper for details on how the labeled and unlabeled quantities are used to weigh training examples and affect the learning process.

See the documentation of the class for more details.
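As a rough illustration of the weighting idea in the paper, the sketch below gives each unlabeled example x the weight w(x) = ((1 - c) / c) * g(x) / (1 - g(x)), an estimate of p(y=1 | x, s=0), and then uses it twice: once as a positive with weight w(x) and once as a negative with weight 1 - w(x). Labeled examples are positives with weight 1. The function names are hypothetical, the clipping of w to 1 is a choice made here, and the sketch does not model how WeightedElkanotoPuClassifier uses its labeled and unlabeled arguments.

```python
def unlabeled_weight(score, c):
    """Estimate p(y=1 | x, s=0) for an unlabeled example as
    w(x) = ((1 - c) / c) * g(x) / (1 - g(x)), where g(x) = p(s=1 | x)."""
    if not 0.0 < c <= 1.0:
        raise ValueError("c must be in (0, 1]")
    if score >= 1.0:
        return 1.0
    w = ((1.0 - c) / c) * score / (1.0 - score)
    return min(w, 1.0)  # clip: an estimated probability cannot exceed 1


def weighted_training_set(labeled_X, unlabeled_X, unlabeled_scores, c):
    """Build (x, label, weight) triples: labeled examples are positives with
    weight 1; each unlabeled example appears once as a positive with weight
    w(x) and once as a negative with weight 1 - w(x)."""
    rows = [(x, 1, 1.0) for x in labeled_X]
    for x, g in zip(unlabeled_X, unlabeled_scores):
        w = unlabeled_weight(g, c)
        rows.append((x, 1, w))
        rows.append((x, 0, 1.0 - w))
    return rows


rows = weighted_training_set(["p1"], ["u1"], [0.2], c=0.5)
print(rows)  # [('p1', 1, 1.0), ('u1', 1, 0.25), ('u1', 0, 0.75)]
```

The resulting triples could be passed to any classifier that accepts per-sample weights.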

Bagging-based PU-learning

Based on the paper "A bagging SVM to learn from positive and unlabeled examples" (2013) by Mordelet and Vert. The implementation is by Roy Wright (roywright on GitHub) and can be found in his repository.

    from pulearn import BaggingPuClassifier
    from sklearn.svm import SVC
    svc = SVC(C=10, kernel='rbf', gamma=0.4, probability=True)
    pu_estimator = BaggingPuClassifier(
        base_estimator=svc, n_estimators=15)
    # X: feature matrix; y: targets in which labeled positives are marked 1
    pu_estimator.fit(X, y)

Note: Any scikit-learn classifier can be used as the base estimator.
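The bagging scheme of Mordelet and Vert can be sketched as follows, assuming NumPy: in each round, train a classifier on all labeled positives against a bootstrap sample of the unlabeled set, score the unlabeled points left out of that sample, and average each point's out-of-bag scores across rounds. The functions are hypothetical, a simple nearest-centroid scorer stands in for the scikit-learn base estimator, and the default sample size (the number of positives) is an assumption. n_rounds plays a role similar to n_estimators above, but this is not the BaggingPuClassifier implementation.

```python
import numpy as np


def centroid_scorer(X_pos, X_neg):
    """Hypothetical stand-in for a base estimator: fits class centroids and
    scores points by how much closer they are to the positive centroid."""
    mu_pos = X_pos.mean(axis=0)
    mu_neg = X_neg.mean(axis=0)

    def score(X):
        d_pos = np.linalg.norm(X - mu_pos, axis=1)
        d_neg = np.linalg.norm(X - mu_neg, axis=1)
        return d_neg - d_pos  # higher means "looks more positive"

    return score


def bagging_pu_scores(X_pos, X_unl, n_rounds=15, sample_size=None, seed=0):
    """Average out-of-bag scores for the unlabeled points (NaN for any point
    that was never left out of a bootstrap sample)."""
    rng = np.random.default_rng(seed)
    n_unl = len(X_unl)
    # Assumption: bootstrap samples default to the number of positives.
    k = len(X_pos) if sample_size is None else sample_size
    totals = np.zeros(n_unl)
    counts = np.zeros(n_unl)
    for _ in range(n_rounds):
        in_bag = rng.choice(n_unl, size=k, replace=True)  # bootstrap sample
        oob = np.setdiff1d(np.arange(n_unl), in_bag)      # left-out points
        if oob.size == 0:
            continue
        # Positives (label 1) vs. the sampled unlabeled points (label 0).
        scorer = centroid_scorer(X_pos, X_unl[in_bag])
        totals[oob] += scorer(X_unl[oob])
        counts[oob] += 1
    with np.errstate(invalid="ignore"):  # 0 / 0 -> NaN for never-OOB points
        return totals / counts


# Synthetic data: the unlabeled set mixes hidden positives and negatives.
rng = np.random.default_rng(42)
X_pos = rng.normal(loc=2.0, size=(20, 2))
X_unl = np.vstack([
    rng.normal(loc=2.0, size=(10, 2)),   # hidden positives
    rng.normal(loc=-2.0, size=(30, 2)),  # negatives
])
scores = bagging_pu_scores(X_pos, X_unl, n_rounds=50)
# Hidden positives should score higher on average than the negatives.
print(np.nanmean(scores[:10]) > np.nanmean(scores[10:]))
```

Scoring only out-of-bag points means no unlabeled point is scored by a classifier that was trained with it as a negative.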

Examples

A code example of the classic Elkan-Noto classifier applied to the Wisconsin breast cancer dataset, comparing it to a regular random forest classifier, can be found in the examples directory.

To run it, clone the repository and run the following command from its root, in a Python environment where pulearn is installed:

    python examples/BreastCancerElkanotoExample.py

The script produces a plot, like the one below, comparing the F1 score of the PU learner with that of a naive learner. It shows that PU learning becomes more effective (or worthwhile) the more positive examples are "hidden" from the training set.

[Figure: F1 score of the PU learner versus a naive learner]
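The experiment described above depends on hiding a growing share of the positives before training. A minimal sketch of that preprocessing step follows, assuming NumPy; the function name is hypothetical, and this is not necessarily how examples/BreastCancerElkanotoExample.py does it. The value used for unlabeled examples is a parameter because the expected convention depends on the classifier, so check the class documentation.

```python
import numpy as np


def hide_positives(y, hidden_fraction, unlabeled_value=0, seed=0):
    """Turn fully labeled binary targets (1 = positive, 0 = negative) into
    PU targets: all negatives and a random hidden_fraction of the positives
    become unlabeled; the remaining positives keep label 1."""
    rng = np.random.default_rng(seed)
    y = np.asarray(y)
    positives = np.flatnonzero(y == 1)
    n_hidden = int(round(hidden_fraction * positives.size))
    hidden = rng.choice(positives, size=n_hidden, replace=False)
    y_pu = np.full(y.shape, unlabeled_value)  # start with everything unlabeled
    y_pu[positives] = 1                       # reveal all positives...
    y_pu[hidden] = unlabeled_value            # ...then hide the chosen ones
    return y_pu


y = np.array([1, 1, 1, 1, 0, 0, 0, 0])
y_pu = hide_positives(y, hidden_fraction=0.5, unlabeled_value=-1)
print(y_pu)
print(int((y_pu == 1).sum()))  # 2 positives remain labeled
```

Sweeping hidden_fraction from small to large values and training both a PU learner and a naive learner on each y_pu gives the kind of comparison the plot shows.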

License

This package is released as open-source software under the BSD 3-clause license. See LICENSE_NOTICE.md for the different copyright holders of different parts of the code.

Credits

Implementation code by Alexandre Drouin and AdityaAS (Elkanoto) and by Roy Wright (bagging-based PU learning).

Packaging, testing and documentation by Shay Palachy.

Source code:
"""
The `pulearn` Python package provide a collection of scikit-learn wrappers to
several positive-unlabled learning (PU-learning) methods.

.. include:: ./documentation.md
"""

from .elkanoto import (  # noqa: F401
    ElkanotoPuClassifier,
    WeightedElkanotoPuClassifier,
)
from .bagging import (  # noqa: F401
    BaggingPuClassifier,
)

from ._version import get_versions
__version__ = get_versions()['version']
del get_versions

Sub-modules

pulearn.bagging

Bagging meta-estimator for PU learning …

pulearn.elkanoto

Both PU classification methods from the Elkan & Noto paper.