Module pulearn.bagging

Bagging meta-estimator for PU learning.

Any scikit-learn estimator should work as the base estimator.

This implementation is fully compatible with scikit-learn, and is in fact based on the code of the sklearn.ensemble.BaggingClassifier class with very minor changes.

Classes

class BaggingPuClassifier (estimator=None,
n_estimators=10,
max_samples=1.0,
max_features=1.0,
bootstrap=True,
bootstrap_features=False,
oob_score=True,
warm_start=False,
n_jobs=1,
random_state=None,
verbose=0,
balanced_subsample=False)
class BaggingPuClassifier(BaseBaggingPU, ClassifierMixin):
    """A Bagging PU classifier.

    Adapted from sklearn.ensemble.BaggingClassifier, based on
    A bagging SVM to learn from positive and unlabeled examples (2013)
    by Mordelet and Vert
    http://dx.doi.org/10.1016/j.patrec.2013.06.010
    http://members.cbio.mines-paristech.fr/~jvert/svn/bibli/local/Mordelet2013bagging.pdf

    Parameters
    ----------
    estimator : object or None, optional (default=None)
        The base estimator to fit on random subsets of the dataset.
        If None, then the base estimator is a decision tree.

    n_estimators : int, optional (default=10)
        The number of base estimators in the ensemble.

    max_samples : int or float, optional (default=1.0)
        The number of unlabeled samples to draw to train each base estimator.

        - If int, then draw `max_samples` unlabeled samples.
        - If float, then draw `max_samples * n_unlabeled` unlabeled samples.

        Ignored when ``balanced_subsample=True``.

    max_features : int or float, optional (default=1.0)
        The number of features to draw from X to train each base estimator.

        - If int, then draw `max_features` features.
        - If float, then draw `max_features * X.shape[1]` features.

    bootstrap : boolean, optional (default=True)
        Whether samples are drawn with replacement.

    bootstrap_features : boolean, optional (default=False)
        Whether features are drawn with replacement.

    oob_score : bool, optional (default=True)
        Whether to use out-of-bag samples to estimate
        the generalization error.

    warm_start : bool, optional (default=False)
        When set to True, reuse the solution of the previous call to fit
        and add more estimators to the ensemble, otherwise, just fit
        a whole new ensemble.

    n_jobs : int, optional (default=1)
        The number of jobs to run in parallel for both `fit` and `predict`.
        If -1, then the number of jobs is set to the number of cores.

    random_state : int, RandomState instance or None, optional (default=None)
        If int, random_state is the seed used by the random number generator;
        If RandomState instance, random_state is the random number generator;
        If None, the random number generator is the RandomState instance used
        by `np.random`.

    verbose : int, optional (default=0)
        Controls the verbosity of the building process.

    balanced_subsample : bool, optional (default=False)
        When True, each bag always includes all positive samples and draws
        up to ``n_positives`` unlabeled samples (without replacement). This
        yields a roughly 1:1 positive-to-unlabeled ratio when
        ``n_unlabeled >= n_positives``; otherwise, all unlabeled samples are
        used and the bag contains more positives than unlabeled. When True,
        the ``max_samples`` parameter is ignored.

    Attributes
    ----------
    estimator_ : estimator
        The base estimator from which the ensemble is grown.

    estimators_ : list of estimators
        The collection of fitted base estimators.

    estimators_samples_ : list of arrays
        The subset of drawn samples (i.e., the in-bag samples) for each base
        estimator. Each subset is defined by a boolean mask.

    estimators_features_ : list of arrays
        The subset of drawn features for each base estimator.

    classes_ : array of shape = [n_classes]
        The class labels.

    n_classes_ : int or list
        The number of classes.

    oob_score_ : float
        Score of the training dataset obtained using an out-of-bag estimate.

    oob_decision_function_ : array of shape = [n_samples, n_classes]
        Decision function computed with out-of-bag estimate on the training
        set. Positive data points (which appear in every bag), and perhaps
        some of the unlabeled ones, never occur out-of-bag; for these rows,
        `oob_decision_function_` contains NaN.

    ensemble_diagnostics_ : dict
        Summary statistics computed after ``fit``. Always present.
        Keys:

        - ``n_positives`` (int): number of positive training samples.
        - ``n_unlabeled`` (int): number of unlabeled training samples.
        - ``effective_max_samples`` (int): unlabeled samples drawn per bag.
        - ``bag_size`` (int): total samples per bag
          (``effective_max_samples`` + ``n_positives``).
        - ``positive_ratio_in_bags`` (float): fraction of positives in
          each bag.

        When ``oob_score=True`` the following keys are also present:

        - ``oob_score`` (float): out-of-bag accuracy.
        - ``oob_prediction_variance`` (float): variance of the
          OOB positive-class probability estimates across all OOB
          samples; useful as a proxy for ensemble prediction stability.

    """

    def __init__(
        self,
        estimator=None,
        n_estimators=10,
        max_samples=1.0,
        max_features=1.0,
        bootstrap=True,
        bootstrap_features=False,
        oob_score=True,
        warm_start=False,
        n_jobs=1,
        random_state=None,
        verbose=0,
        balanced_subsample=False,
    ):
        """Initialize the Bagging meta-estimator."""
        super(BaggingPuClassifier, self).__init__(
            estimator,
            n_estimators=n_estimators,
            max_samples=max_samples,
            max_features=max_features,
            bootstrap=bootstrap,
            bootstrap_features=bootstrap_features,
            oob_score=oob_score,
            warm_start=warm_start,
            n_jobs=n_jobs,
            random_state=random_state,
            verbose=verbose,
            balanced_subsample=balanced_subsample,
        )

    def _validate_estimator(self):
        """Check the estimator and set the estimator_ attribute."""
        super(BaggingPuClassifier, self)._validate_estimator(
            default=DecisionTreeClassifier()
        )

    def _set_oob_score(self, X, y):
        n_samples = y.shape[0]
        n_classes_ = self.n_classes_
        # classes_ = self.classes_

        predictions = np.zeros((n_samples, n_classes_))

        for estimator, samples, features in zip(
            self.estimators_,
            self.estimators_samples_,
            self.estimators_features_,
        ):
            # Create mask for OOB samples
            mask = ~samples

            if hasattr(estimator, "predict_proba"):
                predictions[mask, :] += estimator.predict_proba(
                    (X[mask, :])[:, features]
                )

            else:
                p = estimator.predict((X[mask, :])[:, features])
                j = 0

                for i in range(n_samples):
                    if mask[i]:
                        predictions[i, p[j]] += 1
                        j += 1

        # Modified: no warnings about non-OOB points (i.e. positives)
        with np.errstate(invalid="ignore"):
            denominator = predictions.sum(axis=1)[:, np.newaxis]
            oob_decision_function = predictions / denominator
            oob_score = accuracy_score(y, np.argmax(predictions, axis=1))

        self.oob_decision_function_ = oob_decision_function
        self.oob_score_ = oob_score

    def _validate_y(self, y):
        y = column_or_1d(y, warn=True)
        y = normalize_pu_y(
            y,
            require_positive=True,
            require_unlabeled=True,
            strict=True,
        )
        self.classes_ = np.array([0, 1], dtype=int)
        self.n_classes_ = 2
        return y

    def predict(self, X):
        """Predict class for X.

        The predicted class of an input sample is computed as the class with
        the highest mean predicted probability. If base estimators do not
        implement a ``predict_proba`` method, then it resorts to voting.

        Parameters
        ----------
        X : {array-like, sparse matrix} of shape = [n_samples, n_features]
            The training input samples. Sparse matrices are accepted only if
            they are supported by the base estimator.

        Returns
        -------
        y : array of shape = [n_samples]
            The predicted classes.

        """
        predicted_probability = self.predict_proba(X)
        return self.classes_.take(
            (np.argmax(predicted_probability, axis=1)), axis=0
        )

    def predict_proba(self, X):
        """Predict class probabilities for X.

        The predicted class probabilities of an input sample are computed as
        the mean predicted class probabilities of the base estimators in the
        ensemble. If base estimators do not implement a ``predict_proba``
        method, then it resorts to voting and the predicted class
        probabilities of an input sample represent the proportion of
        estimators predicting each class.

        Parameters
        ----------
        X : {array-like, sparse matrix} of shape = [n_samples, n_features]
            The training input samples. Sparse matrices are accepted only if
            they are supported by the base estimator.

        Returns
        -------
        p : array of shape = [n_samples, n_classes]
            The class probabilities of the input samples. The order of the
            classes corresponds to that in the attribute `classes_`.

        """
        check_is_fitted(self, "classes_")
        # Check data
        X = check_array(X, accept_sparse=["csr", "csc"])

        if self.n_features_ != X.shape[1]:
            raise ValueError(
                "Number of features of the model must "
                "match the input. Model n_features is {0} and "
                "input n_features is {1}."
                "".format(self.n_features_, X.shape[1])
            )

        # Parallel loop
        n_jobs, n_estimators, starts = _partition_estimators(
            self.n_estimators, self.n_jobs
        )

        all_proba = Parallel(n_jobs=n_jobs, verbose=self.verbose)(
            delayed(_parallel_predict_proba)(
                self.estimators_[starts[i] : starts[i + 1]],
                self.estimators_features_[starts[i] : starts[i + 1]],
                X,
                self.n_classes_,
            )
            for i in range(n_jobs)
        )

        # Reduce
        proba = sum(all_proba) / self.n_estimators

        return proba

    def predict_log_proba(self, X):
        """Predict class log-probabilities for X.

        The predicted class log-probabilities of an input sample are computed
        as the log of the mean predicted class probabilities of the base
        estimators in the ensemble.

        Parameters
        ----------
        X : {array-like, sparse matrix} of shape = [n_samples, n_features]
            The training input samples. Sparse matrices are accepted only if
            they are supported by the base estimator.

        Returns
        -------
        p : array of shape = [n_samples, n_classes]
            The class log-probabilities of the input samples. The order of the
            classes corresponds to that in the attribute `classes_`.

        """
        check_is_fitted(self, "classes_")
        if hasattr(self.estimator_, "predict_log_proba"):
            # Check data
            X = check_array(X, accept_sparse=["csr", "csc"])

            if self.n_features_ != X.shape[1]:
                raise ValueError(
                    "Number of features of the model must "
                    "match the input. Model n_features is {0} "
                    "and input n_features is {1} "
                    "".format(self.n_features_, X.shape[1])
                )

            # Parallel loop
            n_jobs, n_estimators, starts = _partition_estimators(
                self.n_estimators, self.n_jobs
            )

            all_log_proba = Parallel(n_jobs=n_jobs, verbose=self.verbose)(
                delayed(_parallel_predict_log_proba)(
                    self.estimators_[starts[i] : starts[i + 1]],
                    self.estimators_features_[starts[i] : starts[i + 1]],
                    X,
                    self.n_classes_,
                )
                for i in range(n_jobs)
            )

            # Reduce
            log_proba = all_log_proba[0]

            for j in range(1, len(all_log_proba)):  # pragma: no cover
                log_proba = np.logaddexp(log_proba, all_log_proba[j])

            log_proba -= np.log(self.n_estimators)

            return log_proba
        # else, the base estimator has no predict_log_proba, so...
        return np.log(self.predict_proba(X))

    @available_if(lambda self: hasattr(self.estimator, "decision_function"))
    def decision_function(self, X):
        """Average of the decision functions of the base classifiers.

        Parameters
        ----------
        X : {array-like, sparse matrix} of shape = [n_samples, n_features]
            The training input samples. Sparse matrices are accepted only if
            they are supported by the base estimator.

        Returns
        -------
        score : array, shape = [n_samples, k]
            The decision function of the input samples. The columns correspond
            to the classes in sorted order, as they appear in the attribute
            ``classes_``. Regression and binary classification are special
            cases with ``k == 1``, otherwise ``k==n_classes``.

        """
        check_is_fitted(self, "classes_")

        # Check data
        X = check_array(X, accept_sparse=["csr", "csc"])

        if self.n_features_ != X.shape[1]:
            raise ValueError(
                "Number of features of the model must "
                "match the input. Model n_features is {0} and "
                "input n_features is {1} "
                "".format(self.n_features_, X.shape[1])
            )

        # Parallel loop
        n_jobs, n_estimators, starts = _partition_estimators(
            self.n_estimators, self.n_jobs
        )

        all_decisions = Parallel(n_jobs=n_jobs, verbose=self.verbose)(
            delayed(_parallel_decision_function)(
                self.estimators_[starts[i] : starts[i + 1]],
                self.estimators_features_[starts[i] : starts[i + 1]],
                X,
            )
            for i in range(n_jobs)
        )

        # Reduce
        decisions = sum(all_decisions) / self.n_estimators

        return decisions

A Bagging PU classifier.

Adapted from sklearn.ensemble.BaggingClassifier, based on "A bagging SVM to learn from positive and unlabeled examples" (2013) by Mordelet and Vert: http://dx.doi.org/10.1016/j.patrec.2013.06.010 http://members.cbio.mines-paristech.fr/~jvert/svn/bibli/local/Mordelet2013bagging.pdf

Parameters

estimator : object or None, optional (default=None)
The base estimator to fit on random subsets of the dataset. If None, then the base estimator is a decision tree.
n_estimators : int, optional (default=10)
The number of base estimators in the ensemble.
max_samples : int or float, optional (default=1.0)
The number of unlabeled samples to draw to train each base estimator. If int, draw max_samples unlabeled samples; if float, draw max_samples * n_unlabeled unlabeled samples. Ignored when balanced_subsample=True.
max_features : int or float, optional (default=1.0)

The number of features to draw from X to train each base estimator.

  • If int, then draw max_features features.
  • If float, then draw max_features * X.shape[1] features.
bootstrap : boolean, optional (default=True)
Whether samples are drawn with replacement.
bootstrap_features : boolean, optional (default=False)
Whether features are drawn with replacement.
oob_score : bool, optional (default=True)
Whether to use out-of-bag samples to estimate the generalization error.
warm_start : bool, optional (default=False)
When set to True, reuse the solution of the previous call to fit and add more estimators to the ensemble, otherwise, just fit a whole new ensemble.
n_jobs : int, optional (default=1)
The number of jobs to run in parallel for both fit and predict. If -1, then the number of jobs is set to the number of cores.
random_state : int, RandomState instance or None, optional (default=None)
If int, random_state is the seed used by the random number generator; If RandomState instance, random_state is the random number generator; If None, the random number generator is the RandomState instance used by np.random.
verbose : int, optional (default=0)
Controls the verbosity of the building process.
balanced_subsample : bool, optional (default=False)
When True, each bag always includes all positive samples and draws up to n_positives unlabeled samples (without replacement). This yields a roughly 1:1 positive-to-unlabeled ratio when n_unlabeled >= n_positives; otherwise, all unlabeled samples are used and the bag contains more positives than unlabeled. When True, the max_samples parameter is ignored.

Attributes

estimator_ : estimator
The base estimator from which the ensemble is grown.
estimators_ : list of estimators
The collection of fitted base estimators.
estimators_samples_ : list of arrays
The subset of drawn samples (i.e., the in-bag samples) for each base estimator. Each subset is defined by a boolean mask.
estimators_features_ : list of arrays
The subset of drawn features for each base estimator.
classes_ : array of shape = [n_classes]
The class labels.
n_classes_ : int or list
The number of classes.
oob_score_ : float
Score of the training dataset obtained using an out-of-bag estimate.
oob_decision_function_ : array of shape = [n_samples, n_classes]
Decision function computed with out-of-bag estimate on the training set. Positive data points (which appear in every bag), and perhaps some of the unlabeled ones, never occur out-of-bag; for these rows, oob_decision_function_ contains NaN.
ensemble_diagnostics_ : dict

Summary statistics computed after fit. Always present. Keys:

  • n_positives (int): number of positive training samples.
  • n_unlabeled (int): number of unlabeled training samples.
  • effective_max_samples (int): unlabeled samples drawn per bag.
  • bag_size (int): total samples per bag (effective_max_samples + n_positives).
  • positive_ratio_in_bags (float): fraction of positives in each bag.

When oob_score=True the following keys are also present:

  • oob_score (float): out-of-bag accuracy.
  • oob_prediction_variance (float): variance of the OOB positive-class probability estimates across all OOB samples; useful as a proxy for ensemble prediction stability.

Initialize the Bagging meta-estimator.

Ancestors

  • pulearn.bagging.BaseBaggingPU
  • sklearn.ensemble._base.BaseEnsemble
  • sklearn.base.MetaEstimatorMixin
  • sklearn.base.BaseEstimator
  • sklearn.utils._repr_html.base.ReprHTMLMixin
  • sklearn.utils._repr_html.base._HTMLDocumentationLinkMixin
  • sklearn.utils._metadata_requests._MetadataRequester
  • sklearn.base.ClassifierMixin

Methods

def decision_function(self, X)
@available_if(lambda self: hasattr(self.estimator, "decision_function"))
def decision_function(self, X):
    """Average of the decision functions of the base classifiers.

    Parameters
    ----------
    X : {array-like, sparse matrix} of shape = [n_samples, n_features]
        The training input samples. Sparse matrices are accepted only if
        they are supported by the base estimator.

    Returns
    -------
    score : array, shape = [n_samples, k]
        The decision function of the input samples. The columns correspond
        to the classes in sorted order, as they appear in the attribute
        ``classes_``. Regression and binary classification are special
        cases with ``k == 1``, otherwise ``k==n_classes``.

    """
    check_is_fitted(self, "classes_")

    # Check data
    X = check_array(X, accept_sparse=["csr", "csc"])

    if self.n_features_ != X.shape[1]:
        raise ValueError(
            "Number of features of the model must "
            "match the input. Model n_features is {0} and "
            "input n_features is {1} "
            "".format(self.n_features_, X.shape[1])
        )

    # Parallel loop
    n_jobs, n_estimators, starts = _partition_estimators(
        self.n_estimators, self.n_jobs
    )

    all_decisions = Parallel(n_jobs=n_jobs, verbose=self.verbose)(
        delayed(_parallel_decision_function)(
            self.estimators_[starts[i] : starts[i + 1]],
            self.estimators_features_[starts[i] : starts[i + 1]],
            X,
        )
        for i in range(n_jobs)
    )

    # Reduce
    decisions = sum(all_decisions) / self.n_estimators

    return decisions

Average of the decision functions of the base classifiers.

Parameters

X : {array-like, sparse matrix} of shape = [n_samples, n_features]
The training input samples. Sparse matrices are accepted only if they are supported by the base estimator.

Returns

score : array, shape = [n_samples, k]
The decision function of the input samples. The columns correspond to the classes in sorted order, as they appear in the attribute classes_. Regression and binary classification are special cases with k == 1, otherwise k==n_classes.
def predict(self, X)
def predict(self, X):
    """Predict class for X.

    The predicted class of an input sample is computed as the class with
    the highest mean predicted probability. If base estimators do not
    implement a ``predict_proba`` method, then it resorts to voting.

    Parameters
    ----------
    X : {array-like, sparse matrix} of shape = [n_samples, n_features]
        The training input samples. Sparse matrices are accepted only if
        they are supported by the base estimator.

    Returns
    -------
    y : array of shape = [n_samples]
        The predicted classes.

    """
    predicted_probability = self.predict_proba(X)
    return self.classes_.take(
        (np.argmax(predicted_probability, axis=1)), axis=0
    )

Predict class for X.

The predicted class of an input sample is computed as the class with the highest mean predicted probability. If base estimators do not implement a predict_proba method, then it resorts to voting.

Parameters

X : {array-like, sparse matrix} of shape = [n_samples, n_features]
The training input samples. Sparse matrices are accepted only if they are supported by the base estimator.

Returns

y : array of shape = [n_samples]
The predicted classes.
def predict_log_proba(self, X)
def predict_log_proba(self, X):
    """Predict class log-probabilities for X.

    The predicted class log-probabilities of an input sample are computed
    as the log of the mean predicted class probabilities of the base
    estimators in the ensemble.

    Parameters
    ----------
    X : {array-like, sparse matrix} of shape = [n_samples, n_features]
        The training input samples. Sparse matrices are accepted only if
        they are supported by the base estimator.

    Returns
    -------
    p : array of shape = [n_samples, n_classes]
        The class log-probabilities of the input samples. The order of the
        classes corresponds to that in the attribute `classes_`.

    """
    check_is_fitted(self, "classes_")
    if hasattr(self.estimator_, "predict_log_proba"):
        # Check data
        X = check_array(X, accept_sparse=["csr", "csc"])

        if self.n_features_ != X.shape[1]:
            raise ValueError(
                "Number of features of the model must "
                "match the input. Model n_features is {0} "
                "and input n_features is {1} "
                "".format(self.n_features_, X.shape[1])
            )

        # Parallel loop
        n_jobs, n_estimators, starts = _partition_estimators(
            self.n_estimators, self.n_jobs
        )

        all_log_proba = Parallel(n_jobs=n_jobs, verbose=self.verbose)(
            delayed(_parallel_predict_log_proba)(
                self.estimators_[starts[i] : starts[i + 1]],
                self.estimators_features_[starts[i] : starts[i + 1]],
                X,
                self.n_classes_,
            )
            for i in range(n_jobs)
        )

        # Reduce
        log_proba = all_log_proba[0]

        for j in range(1, len(all_log_proba)):  # pragma: no cover
            log_proba = np.logaddexp(log_proba, all_log_proba[j])

        log_proba -= np.log(self.n_estimators)

        return log_proba
    # else, the base estimator has no predict_log_proba, so...
    return np.log(self.predict_proba(X))

Predict class log-probabilities for X.

The predicted class log-probabilities of an input sample are computed as the log of the mean predicted class probabilities of the base estimators in the ensemble.

Parameters

X : {array-like, sparse matrix} of shape = [n_samples, n_features]
The training input samples. Sparse matrices are accepted only if they are supported by the base estimator.

Returns

p : array of shape = [n_samples, n_classes]
The class log-probabilities of the input samples. The order of the classes corresponds to that in the attribute classes_.
def predict_proba(self, X)
def predict_proba(self, X):
    """Predict class probabilities for X.

    The predicted class probabilities of an input sample are computed as
    the mean predicted class probabilities of the base estimators in the
    ensemble. If base estimators do not implement a ``predict_proba``
    method, then it resorts to voting and the predicted class
    probabilities of an input sample represent the proportion of
    estimators predicting each class.

    Parameters
    ----------
    X : {array-like, sparse matrix} of shape = [n_samples, n_features]
        The training input samples. Sparse matrices are accepted only if
        they are supported by the base estimator.

    Returns
    -------
    p : array of shape = [n_samples, n_classes]
        The class probabilities of the input samples. The order of the
        classes corresponds to that in the attribute `classes_`.

    """
    check_is_fitted(self, "classes_")
    # Check data
    X = check_array(X, accept_sparse=["csr", "csc"])

    if self.n_features_ != X.shape[1]:
        raise ValueError(
            "Number of features of the model must "
            "match the input. Model n_features is {0} and "
            "input n_features is {1}."
            "".format(self.n_features_, X.shape[1])
        )

    # Parallel loop
    n_jobs, n_estimators, starts = _partition_estimators(
        self.n_estimators, self.n_jobs
    )

    all_proba = Parallel(n_jobs=n_jobs, verbose=self.verbose)(
        delayed(_parallel_predict_proba)(
            self.estimators_[starts[i] : starts[i + 1]],
            self.estimators_features_[starts[i] : starts[i + 1]],
            X,
            self.n_classes_,
        )
        for i in range(n_jobs)
    )

    # Reduce
    proba = sum(all_proba) / self.n_estimators

    return proba

Predict class probabilities for X.

The predicted class probabilities of an input sample are computed as the mean predicted class probabilities of the base estimators in the ensemble. If base estimators do not implement a predict_proba method, then it resorts to voting and the predicted class probabilities of an input sample represent the proportion of estimators predicting each class.

Parameters

X : {array-like, sparse matrix} of shape = [n_samples, n_features]
The training input samples. Sparse matrices are accepted only if they are supported by the base estimator.

Returns

p : array of shape = [n_samples, n_classes]
The class probabilities of the input samples. The order of the classes corresponds to that in the attribute classes_.
def set_fit_request(self: BaggingPuClassifier,
*,
sample_weight: bool | str | None = '$UNCHANGED$') -> BaggingPuClassifier
def func(*args, **kw):
    """Updates the `_metadata_request` attribute of the consumer (`instance`)
    for the parameters provided as `**kw`.

    This docstring is overwritten below.
    See REQUESTER_DOC for expected functionality.
    """
    if not _routing_enabled():
        raise RuntimeError(
            "This method is only available when metadata routing is enabled."
            " You can enable it using"
            " sklearn.set_config(enable_metadata_routing=True)."
        )

    if self.validate_keys and (set(kw) - set(self.keys)):
        raise TypeError(
            f"Unexpected args: {set(kw) - set(self.keys)} in {self.name}. "
            f"Accepted arguments are: {set(self.keys)}"
        )

    # This makes it possible to use the decorated method as an unbound method,
    # for instance when monkeypatching.
    # https://github.com/scikit-learn/scikit-learn/issues/28632
    if instance is None:
        _instance = args[0]
        args = args[1:]
    else:
        _instance = instance

    # Replicating python's behavior when positional args are given other than
    # `self`, and `self` is only allowed if this method is unbound.
    if args:
        raise TypeError(
            f"set_{self.name}_request() takes 0 positional argument but"
            f" {len(args)} were given"
        )

    requests = _instance._get_metadata_request()
    method_metadata_request = getattr(requests, self.name)

    for prop, alias in kw.items():
        if alias is not UNCHANGED:
            method_metadata_request.add_request(param=prop, alias=alias)
    _instance._metadata_request = requests

    return _instance

Configure whether metadata should be requested to be passed to the fit method.

Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config). Please check the scikit-learn User Guide on metadata routing for how the routing mechanism works.

The options for each parameter are:

  • True: metadata is requested, and passed to fit if provided. The request is ignored if metadata is not provided.

  • False: metadata is not requested and the meta-estimator will not pass it to fit.

  • None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.

  • str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in version: 1.3

Parameters

sample_weight : str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED
Metadata routing for sample_weight parameter in fit.

Returns

self : object
The updated object.
def set_score_request(self: BaggingPuClassifier,
*,
sample_weight: bool | str | None = '$UNCHANGED$') -> BaggingPuClassifier
def func(*args, **kw):
    """Updates the `_metadata_request` attribute of the consumer (`instance`)
    for the parameters provided as `**kw`.

    This docstring is overwritten below.
    See REQUESTER_DOC for expected functionality.
    """
    if not _routing_enabled():
        raise RuntimeError(
            "This method is only available when metadata routing is enabled."
            " You can enable it using"
            " sklearn.set_config(enable_metadata_routing=True)."
        )

    if self.validate_keys and (set(kw) - set(self.keys)):
        raise TypeError(
            f"Unexpected args: {set(kw) - set(self.keys)} in {self.name}. "
            f"Accepted arguments are: {set(self.keys)}"
        )

    # This makes it possible to use the decorated method as an unbound method,
    # for instance when monkeypatching.
    # https://github.com/scikit-learn/scikit-learn/issues/28632
    if instance is None:
        _instance = args[0]
        args = args[1:]
    else:
        _instance = instance

    # Replicating python's behavior when positional args are given other than
    # `self`, and `self` is only allowed if this method is unbound.
    if args:
        raise TypeError(
            f"set_{self.name}_request() takes 0 positional argument but"
            f" {len(args)} were given"
        )

    requests = _instance._get_metadata_request()
    method_metadata_request = getattr(requests, self.name)

    for prop, alias in kw.items():
        if alias is not UNCHANGED:
            method_metadata_request.add_request(param=prop, alias=alias)
    _instance._metadata_request = requests

    return _instance

Configure whether metadata should be requested to be passed to the score method.

Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config). Please check the scikit-learn User Guide on metadata routing for how the routing mechanism works.

The options for each parameter are:

  • True: metadata is requested, and passed to score if provided. The request is ignored if metadata is not provided.

  • False: metadata is not requested and the meta-estimator will not pass it to score.

  • None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.

  • str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in version: 1.3

Parameters

sample_weight : str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED
Metadata routing for sample_weight parameter in score.

Returns

self : object
The updated object.