Module pulearn
The pulearn Python package provides a collection of scikit-learn wrappers to
several positive-unlabeled learning (PU-learning) methods.
Features
- Scikit-learn compliant wrappers to prominent PU-learning methods.
- Fully tested on Linux, macOS and Windows systems.
- Compatible with Python 3.5+.
Installation
Install pulearn with:
pip install pulearn
Implemented Classifiers
Elkanoto
Scikit-learn wrappers for both methods described in the paper by Elkan and Noto, "Learning classifiers from only positive and unlabeled data" (published in Proceedings of the 14th ACM SIGKDD international conference on Knowledge discovery and data mining, ACM, 2008).
These wrap the Python code from a fork by AdityaAS (which implements both methods) of the original repository by Alexandre Drouin, which implements one of the methods.
Classic Elkanoto
To use the classic (unweighted) method, use the ElkanotoPuClassifier class:
from pulearn import ElkanotoPuClassifier
from sklearn.svm import SVC
svc = SVC(C=10, kernel='rbf', gamma=0.4, probability=True)
pu_estimator = ElkanotoPuClassifier(estimator=svc, hold_out_ratio=0.2)
pu_estimator.fit(X, y)
See the documentation of the class for more details.
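As described in the Elkan and Noto paper, the classic method trains an ordinary classifier g(x) to separate labeled from unlabeled examples, estimates the constant c = p(s=1 | y=1) as the mean of g over held-out labeled positives, and then divides g(x) by c. The following is a minimal pure-Python sketch of that correction step only. The function names, the scores and the clipping to 1.0 are illustrative assumptions, not pulearn's internals.

```python
from statistics import mean


def estimate_c(held_out_positive_scores):
    """Estimate c = p(s=1 | y=1) as the average score g(x) that the
    labeled-vs-unlabeled classifier gives to held-out labeled positives."""
    return mean(held_out_positive_scores)


def corrected_probability(score, c):
    """Turn g(x) = p(s=1 | x) into an estimate of p(y=1 | x) = g(x) / c.
    Clipping to 1.0 is an assumption of this sketch, since the ratio can
    exceed 1 when g or c is estimated with noise."""
    return min(score / c, 1.0)


# Hypothetical scores g(x) on held-out labeled positives.
held_out_scores = [0.5, 0.25, 0.75]
c = estimate_c(held_out_scores)

# Hypothetical scores g(x) for three new examples.
new_scores = [0.125, 0.25, 0.5]
posteriors = [corrected_probability(s, c) for s in new_scores]
```

In this sketch the held-out examples play the role that hold_out_ratio suggests in the snippet above: a portion of the labeled positives is set aside and used only to estimate c.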
Weighted Elkanoto
To use the weighted method, use the WeightedElkanotoPuClassifier class:
from pulearn import WeightedElkanotoPuClassifier
from sklearn.svm import SVC
svc = SVC(C=10, kernel='rbf', gamma=0.4, probability=True)
pu_estimator = WeightedElkanotoPuClassifier(
    estimator=svc, labeled=10, unlabeled=20, hold_out_ratio=0.2)
pu_estimator.fit(X, y)
See the original paper for details on how the labeled and unlabeled quantities are used to weight training examples and affect the learning process.
See the documentation of the class for more details.
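For orientation, the weighting scheme in the Elkan and Noto paper treats every unlabeled example as both a positive with weight w(x) = p(y=1 | x, s=0) = ((1 - c) / c) * (g(x) / (1 - g(x))) and a negative with weight 1 - w(x), while labeled positives keep weight 1. The sketch below illustrates that formula in plain Python. The helper names are hypothetical, the clipping is an assumption of the sketch, and it does not show how pulearn itself uses the labeled and unlabeled arguments.

```python
def unlabeled_positive_weight(score, c):
    """Weight w(x) = ((1 - c) / c) * (g(x) / (1 - g(x))) for an unlabeled
    example with score g(x), clipped to [0, 1] (clipping is an assumption)."""
    if score >= 1.0:
        return 1.0  # avoid division by zero for a saturated score
    w = ((1.0 - c) / c) * (score / (1.0 - score))
    return min(max(w, 0.0), 1.0)


def expand_unlabeled(example, score, c):
    """Duplicate one unlabeled example into a weighted positive copy and a
    weighted negative copy, as (example, label, weight) tuples."""
    w = unlabeled_positive_weight(score, c)
    return [(example, 1, w), (example, 0, 1.0 - w)]


# Hypothetical values: c estimated as 0.5, and g(x) = 0.2 for example "x1".
weighted_rows = expand_unlabeled("x1", 0.2, 0.5)
```

The resulting (example, label, weight) rows could then be passed to any learner that accepts per-sample weights.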
Bagging-based PU-learning
Based on the paper "A bagging SVM to learn from positive and unlabeled examples" (2013) by Mordelet and Vert. The implementation is by Roy Wright (roywright on GitHub) and can be found in his repository.
from pulearn import BaggingPuClassifier
from sklearn.svm import SVC
svc = SVC(C=10, kernel='rbf', gamma=0.4, probability=True)
pu_estimator = BaggingPuClassifier(
    base_estimator=svc, n_estimators=15)
pu_estimator.fit(X, y)
Note: Any scikit-learn classifier can be used as the base estimator.
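The general idea of bagging for PU learning, as described by Mordelet and Vert, is to repeatedly draw a random subsample of the unlabeled set, train a classifier to separate the known positives from that subsample, and average the predictions each unlabeled point receives in rounds where it was left out. The following is a rough standard-library sketch of that loop on 1-D toy data. The threshold learner, the function names, the default sample size and the data are all assumptions made for illustration, not the pulearn implementation.

```python
import random
from statistics import mean


def train_threshold_classifier(positives, negatives):
    """Toy 1-D base learner (an illustrative stand-in for a real classifier):
    predict positive for values above the midpoint of the two class means."""
    threshold = (mean(positives) + mean(negatives)) / 2.0
    return lambda x: 1.0 if x > threshold else 0.0


def bagging_pu_scores(positives, unlabeled, n_estimators=15,
                      sample_size=None, seed=0):
    """Average the out-of-bag votes for every unlabeled point.
    Returns None for a point that was in the bag in every round."""
    rng = random.Random(seed)
    # Assumption: draw as many unlabeled points per round as there are positives.
    sample_size = sample_size or len(positives)
    votes = [[] for _ in unlabeled]
    for _ in range(n_estimators):
        # Bootstrap unlabeled points (with replacement); treat them as negatives.
        in_bag = [rng.randrange(len(unlabeled)) for _ in range(sample_size)]
        classifier = train_threshold_classifier(
            positives, [unlabeled[i] for i in in_bag])
        # Score only the unlabeled points that were not used for training.
        in_bag_set = set(in_bag)
        for i, x in enumerate(unlabeled):
            if i not in in_bag_set:
                votes[i].append(classifier(x))
    return [mean(v) if v else None for v in votes]


# Toy data: known positives cluster near 5; the last two unlabeled points
# are hidden positives, the first five are true negatives.
positives = [4.0, 5.0, 6.0]
unlabeled = [0.0, 0.5, 1.0, 1.5, 2.0, 5.8, 6.0]
scores = bagging_pu_scores(positives, unlabeled)
```

On this toy data the hidden positives receive high averaged scores and the low-valued points receive low ones, which is the ranking behaviour the bagging approach aims for.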
Examples
A code example of the classic Elkan-Noto classifier used for classification on the Wisconsin breast cancer dataset, comparing it to a regular random forest classifier, can be found in the examples directory.
To run it, clone the repository and run the following command from the root of the repository, in a Python environment where pulearn is installed:
python examples/BreastCancerElkanotoExample.py
You should see a plot comparing the F1 score of the PU learner with that of a naive learner, demonstrating how PU learning becomes more effective, or worthwhile, the more positive examples are "hidden" from the training set.
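To make the notion of "hiding" positives concrete, the sketch below relabels a chosen fraction of the true positives as unlabeled and computes the F1 score of a hypothetical naive learner that simply reproduces the resulting PU labels. The function names, the toy labels and the use of 0 to mark unlabeled examples are assumptions of this sketch; check the class documentation for the label convention each classifier expects.

```python
import random


def hide_positives(labels, hidden_fraction, seed=0):
    """Return PU labels in which a fraction of the true positives (label 1)
    is relabeled as unlabeled (0 here, by assumption)."""
    rng = random.Random(seed)
    positive_idx = [i for i, label in enumerate(labels) if label == 1]
    n_hidden = int(round(hidden_fraction * len(positive_idx)))
    hidden = set(rng.sample(positive_idx, n_hidden))
    return [1 if (label == 1 and i not in hidden) else 0
            for i, label in enumerate(labels)]


def f1_score(true_labels, predicted_labels):
    """Plain F1 = 2 * precision * recall / (precision + recall)."""
    pairs = list(zip(true_labels, predicted_labels))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Ten true positives and ten true negatives; hide 60% of the positives.
true_labels = [1] * 10 + [0] * 10
pu_labels = hide_positives(true_labels, 0.6)

# A naive learner that trusts the PU labels misses every hidden positive.
naive_f1 = f1_score(true_labels, pu_labels)
```

The more positives are hidden, the lower the recall of such a naive learner, which is the gap a PU learner tries to close.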
License
This package is released as open-source software under the BSD 3-clause license. See LICENSE_NOTICE.md for the different copyright holders of different parts of the code.
Credits
Implementation code by:
- Elkan & Noto - Alexandre Drouin and AdityaAS.
- Bagging PU Classifier - Roy Wright.
Packaging, testing and documentation by Shay Palachy.
"""
The `pulearn` Python package provides a collection of scikit-learn wrappers to
several positive-unlabeled learning (PU-learning) methods.
.. include:: ./documentation.md
"""
from .elkanoto import (  # noqa: F401
    ElkanotoPuClassifier,
    WeightedElkanotoPuClassifier,
)
from .bagging import (  # noqa: F401
    BaggingPuClassifier,
)
from ._version import get_versions
__version__ = get_versions()['version']
del get_versions
Sub-modules
- pulearn.bagging: Bagging meta-estimator for PU learning …
- pulearn.elkanoto: Both PU classification methods from the Elkan & Noto paper.