Generates a distribution object for gbmt.
Arguments
- name
The name (a string) of the distribution to be initialized and used when fitting a gradient boosted model via gbmt. The currently available distributions can be listed with the function available_distributions. If no distribution is specified, this function constructs a Gaussian distribution by default.
- ...
Extra named parameters required for initializing certain distributions.
If the t-distribution is selected, an additional parameter (df) specifying the number of degrees of freedom can be given. The default is four degrees of freedom.
If quantile is selected, the quantile to estimate may be specified using the named parameter alpha. The default quantile to estimate is 0.25.
If the tweedie distribution is selected, the power specifying the distribution may be set via the named parameter power. This parameter defaults to unity.
If a Cox proportional hazards model is selected, a number of additional parameters apply:
- strata
A vector of integers (or factors) specifying which stratum each data row belongs to. If none is specified, all training data are assumed to be in the same stratum.
- ties
String specifying the method used to handle tied event times. Currently only "breslow" and "efron" are available, with the latter being the default.
- prior_node_coeff_var
A prior on the coefficient of variation associated with the hazard rate assigned to each terminal node when fitting a tree. Increasing its value emphasizes the importance of the training data in the node when assigning a prediction to that node. This defaults to 1000.
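To make the difference between the two tie-handling options concrete, the following is a minimal sketch of the Breslow and Efron approximations to the Cox log partial likelihood for a single stratum. It uses the textbook definitions and is not taken from the package; the function name `cox_log_partial_likelihood` and its arguments are hypothetical, and Python is used purely for illustration.

```python
import math


def cox_log_partial_likelihood(times, events, scores, ties="efron"):
    """Log partial likelihood for risk scores in a single stratum.

    times  : observed time for each row
    events : 1 if the event was observed, 0 if the row is censored
    scores : risk score (log relative hazard) for each row
    ties   : "breslow" or "efron"
    """
    if ties not in ("breslow", "efron"):
        raise ValueError("ties must be 'breslow' or 'efron'")

    loglik = 0.0
    event_times = sorted({t for t, e in zip(times, events) if e == 1})
    for t in event_times:
        # Risk set: every row still under observation at time t.
        risk_sum = sum(math.exp(s) for u, s in zip(times, scores) if u >= t)
        # Tied events: all rows whose event occurs exactly at time t.
        tied = [s for u, e, s in zip(times, events, scores) if u == t and e == 1]
        d = len(tied)
        tied_sum = sum(math.exp(s) for s in tied)

        loglik += sum(tied)
        for l in range(d):
            # Breslow keeps the full risk set for each of the d tied events.
            # Efron removes a fraction l/d of the tied rows' contribution.
            frac = l / d if ties == "efron" else 0.0
            loglik -= math.log(risk_sum - frac * tied_sum)
    return loglik
```

When there are no tied event times, the two options give the same value. With ties they differ: for times (1, 1, 2, 3), events (1, 1, 1, 0) and all scores equal to zero, this sketch returns -log(32) under "breslow" and -log(24) under "efron".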
Finally, if the pairwise distribution is selected, a number of parameters also need to be specified. These parameters are group, metric and max_rank. The first is a character vector with the column names of data that jointly indicate the group an instance belongs to (typically a query in Information Retrieval). For training, only pairs of instances from the same group and with different target labels can be considered. metric is the IR measure to use, one of:
- conc
Fraction of concordant pairs; for binary labels, this is equivalent to the Area under the ROC Curve
- mrr
Mean reciprocal rank of the highest-ranked positive instance
- map
Mean average precision, a generalization of mrr to multiple positive instances
- ndcg
Normalized discounted cumulative gain. The score is the sum (DCG) of the user-supplied target values, each divided by log(rank+1), normalized to the maximum achievable value. This is the default if the user does not specify a metric.
ndcg and conc allow arbitrary target values, while binary targets {0,1} are expected for map and mrr. For ndcg and mrr, a cut-off can be chosen using a positive integer parameter max_rank. If left unspecified, all ranks are taken into account.
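The four measures can be illustrated for a single group with the sketch below. It follows the standard definitions given above and is not the package's implementation. The function names are hypothetical. Two choices are assumptions: score ties count as half-concordant in `conc`, and the logarithm is taken to base 2 in `ndcg` (the base cancels in the normalized score). Python is used purely for illustration.

```python
import math


def _ranked_labels(scores, labels):
    """Return the labels ordered by decreasing score (rank 1 first)."""
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    return [labels[i] for i in order]


def conc(scores, labels):
    """Fraction of pairs with different labels that are ordered correctly."""
    good = total = 0.0
    n = len(scores)
    for i in range(n):
        for j in range(n):
            if labels[i] > labels[j]:
                total += 1
                if scores[i] > scores[j]:
                    good += 1
                elif scores[i] == scores[j]:
                    good += 0.5  # assumption: tied scores count as half
    return good / total if total else 0.0


def mrr(scores, labels, max_rank=None):
    """Reciprocal rank of the highest-ranked positive instance."""
    ranked = _ranked_labels(scores, labels)[:max_rank]
    for rank, y in enumerate(ranked, start=1):
        if y > 0:
            return 1.0 / rank
    return 0.0  # no positive instance within the cut-off


def average_precision(scores, labels):
    """Average precision for one group; its mean over groups gives map."""
    hits, total = 0, 0.0
    for rank, y in enumerate(_ranked_labels(scores, labels), start=1):
        if y > 0:
            hits += 1
            total += hits / rank
    return total / hits if hits else 0.0


def ndcg(scores, labels, max_rank=None):
    """DCG of the predicted ranking divided by the best achievable DCG."""
    def dcg(ys):
        return sum(y / math.log2(rank + 1)
                   for rank, y in enumerate(ys[:max_rank], start=1))

    ideal = dcg(sorted(labels, reverse=True))
    return dcg(_ranked_labels(scores, labels)) / ideal if ideal > 0 else 0.0
```

For scores (0.9, 0.8, 0.7, 0.1) and binary labels (0, 1, 0, 1), this sketch gives conc = 0.25, mrr = 0.5 and average precision = 0.5. With max_rank = 1 the mrr drops to 0, because the top-ranked instance is negative.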