Model Specific Parameters
James Hickey
2026-07-01
Source: vignettes/model-specific-parameters.Rmd
A number of distributions provided by gbm have model
specific parameters associated with them. All distributions have
parameters associated with them such as the mean or variance; however,
certain distributions require additional data to be defined fully. This
additional data is referred to as “model specific parameters”. This
document describes how to correctly specify these parameters on
construction of the associated GBMDist object as well as
their default values.
Distributions with model specific parameters
There are 5 distributions within gbm which have
additional parameters associated with them. These distributions are:
CoxPH, Pairwise, Quantile,
TDist and Tweedie.
Cox proportional hazards model
The Cox proportional hazards model has several model specific parameters associated with it. All of them are optional but play important roles in the boosting process.
- strata: a vector of positive integers indicating which stratum each row of data belongs to. If there are multiple rows per observation then this should be reflected in the strata vector. If not specified, it is assumed that all training data are in one stratum and all test data are in another.
- sorted: a vector specifying how the rows of data are ordered within their strata; the order within each stratum is the reverse order of the censored times or start times of the survival data. This vector is optional and will be calculated by gbmt if it is not supplied.
- ties: a string specifying the method by which ties are broken. Currently the “breslow” and “efron” approximations are implemented, with the latter being the default.
- prior_node_coeff_var: a double used to regularize the model predictions in gbm. It represents the prior on the number of events in the model. The predictions of the GBMFit are given by $\log(\text{number of events} / \text{expected number of events})$. Both the number of events in a dataset and the model’s expected number of events can be zero, leading to non-finite behaviour. The inverse of this parameter is added to both the numerator and the denominator of the log ratio to ensure the predictions are finite. The default corresponds to a base number of events that is present irrespective of the measured or expected number of events.
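The effect of prior_node_coeff_var on the log ratio can be illustrated with a few lines of base R. This is a minimal sketch of the regularization as described above, not the package's internal code; the helper name regularised_log_ratio and the value 100 are hypothetical and chosen only for illustration.

```r
# Hypothetical helper (not part of gbm): adds the inverse of the prior to
# both the numerator and the denominator of the log ratio.
regularised_log_ratio <- function(events, expected, prior_node_coeff_var) {
  offset <- 1 / prior_node_coeff_var
  log((events + offset) / (expected + offset))
}

# A node with no observed events gives a non-finite raw prediction
unregularised <- log(0 / 2.5)

# The same node with the offset applied (100 is an illustrative value only)
regularised <- regularised_log_ratio(0, 2.5, prior_node_coeff_var = 100)

is.finite(unregularised)  # FALSE
is.finite(regularised)    # TRUE
```

Larger values of the parameter shrink the offset, so the regularized prediction moves closer to the raw log ratio.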
Pairwise distribution
The “Pairwise” distribution implements ranking measures following the
LambdaMART algorithm. Observations belong to groups, and all
pairs of items with different labels that belong to the same group are
used for training. The distribution requires a character vector with the
column names of the data that jointly indicate the group an observation
belongs to. This character vector is passed to the group
argument on construction. When training with a Pairwise distribution a
number of information retrieval (IR) metrics are available whose utility
is maximised by the tree growing algorithm. The metric
parameter stores the selection and currently the IR metrics available
are:
- “conc”: Fraction of concordant pairs - for binary labels this is equivalent to the area under the ROC curve.
- “mrr”: mean reciprocal rank of the highest-ranked positive instance
- “map”: mean average precision - generalization of “mrr” to multiple positive instances.
- “ndcg”: normalized discounted cumulative gain.
The default for group is "query" while
metric defaults to "ndcg". If “map”
or “mrr” is selected, the response must be in
$\{0, 1\}$.
A cut-off in the ranking of items in a group can be specified via
max_rank; the default is 0 (all ranks taken into
account), and the cut-off is only applicable for “ndcg” and “mrr”. Finally, the
group_index or label can be specified directly; note that this
is optional and will be calculated by gbmt.
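To make the role of max_rank concrete, the following base R sketch computes “mrr” and “ndcg” for a single group, with and without a cut-off. It uses textbook definitions with a linear gain and a log2 discount, which is an assumption: the package's own implementation may differ in detail. The function names reciprocal_rank and ndcg are hypothetical.

```r
# Reciprocal rank of the highest-ranked positive item in one group.
# max_rank = 0 means that all ranks are taken into account.
reciprocal_rank <- function(labels, scores, max_rank = 0) {
  ranked <- labels[order(scores, decreasing = TRUE)]
  first_hit <- which(ranked == 1)[1]
  if (is.na(first_hit)) return(0)
  if (max_rank > 0 && first_hit > max_rank) return(0)
  1 / first_hit
}

# Normalized discounted cumulative gain for one group (linear gain assumed).
ndcg <- function(labels, scores, max_rank = 0) {
  k <- if (max_rank > 0) min(max_rank, length(labels)) else length(labels)
  discount <- 1 / log2(seq_len(k) + 1)
  ranked <- labels[order(scores, decreasing = TRUE)][seq_len(k)]
  ideal <- sort(labels, decreasing = TRUE)[seq_len(k)]
  ideal_dcg <- sum(ideal * discount)
  if (ideal_dcg == 0) return(0)
  sum(ranked * discount) / ideal_dcg
}

labels <- c(0, 1, 0, 1)          # binary relevance labels
scores <- c(0.9, 0.8, 0.3, 0.1)  # model scores; the top item is not relevant

reciprocal_rank(labels, scores)                # first positive is at rank 2
reciprocal_rank(labels, scores, max_rank = 1)  # positive falls outside the cut-off
ndcg(labels, scores)
ndcg(labels, scores, max_rank = 1)
```

With max_rank = 1 only the top-ranked item counts, so both metrics drop to zero here because that item is not relevant.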
# Create pairwise grouped data
# create query groups, with an average size of 25 items each
N <- 1000
num.queries <- floor(N/25)
query <- sample(1:num.queries, N, replace=TRUE)
# X1 is a variable determined by query group only
query.level <- runif(num.queries)
X1 <- query.level[query]
# X2 varies with each item
X2 <- runif(N)
# X3 is uncorrelated with target
X3 <- runif(N)
# The target
Y <- X1 + X2
# Add some random noise to X2 that is correlated with
# queries, but uncorrelated with items
X2 <- X2 + scale(runif(num.queries))[query]
# Add some random noise to target
SNR <- 5 # signal-to-noise ratio
sigma <- sqrt(var(Y)/SNR)
Y <- Y + runif(N, 0, sigma)
data <- data.frame(Y, query=query, X1, X2, X3)
# Create appropriate Pairwise object
pair_dist <- gbm_dist(name="Pairwise", group="query", max_rank=1, metric="ndcg")
Quantile
To perform quantile regression a QuantileGBMDist object
must be passed to gbmt. The quantile to estimate is stored
in the parameter alpha and this defaults to
0.25.
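As background on what alpha controls, quantile regression is conventionally fitted by minimizing the “pinball” (check) loss, whose minimizer is the alpha-quantile of the response. The base R sketch below illustrates this for a constant prediction; it is an illustration of the standard loss and an assumption about the general technique, not a view into the package's internals. The name pinball_loss is hypothetical.

```r
# Pinball loss of a constant prediction for the alpha-quantile.
pinball_loss <- function(y, prediction, alpha) {
  residual <- y - prediction
  mean(ifelse(residual >= 0, alpha * residual, (alpha - 1) * residual))
}

set.seed(1)
y <- rnorm(1000)

# Search a grid of constant predictions for the one with the lowest loss
candidates <- seq(-3, 3, by = 0.01)
losses <- sapply(candidates, pinball_loss, y = y, alpha = 0.25)
best <- candidates[which.min(losses)]

# The minimizer should sit close to the empirical 25% quantile
c(best = best, empirical = unname(quantile(y, 0.25)))
```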
# Create a QuantileGBMDist object
quant_dist <- gbm_dist(name="Quantile", alpha=0.1)
TDist
The t-distribution requires its degrees of freedom (df)
to be set. The default value for this is four but it can be specified on
construction of the associated GBMDist object.
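As a rough guide to choosing df, smaller values give heavier tails, so large residuals are treated as less surprising than under a normal model. The base R sketch below compares two-sided tail probabilities for a few degrees of freedom; the chosen values of df and the residual size are illustrative only.

```r
# Probability of a standardized residual at least this large in magnitude
residual <- 5
dfs <- c(4, 7, 30)

tail_prob <- sapply(dfs, function(df) 2 * pt(-residual, df = df))
normal_tail <- 2 * pnorm(-residual)

# Tail mass shrinks as df grows and approaches the normal value
data.frame(df = dfs, tail_prob = tail_prob)
normal_tail
```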
# Create a t-distribution object with 7 degrees of freedom
t_dist <- gbm_dist(name="TDist", df=7)
Tweedie
The Tweedie distribution relates the variance of the response to its
expectation via:
$$\mathrm{Var}(Y) = \phi \, \mathrm{E}(Y)^p,$$
where $p$ is the power of the distribution and $\phi$ is a dispersion
parameter. The power is specified through the power named argument on calling
gbm_dist and its default value is 1.5.
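The standard Tweedie mean-variance relation, $\mathrm{Var}(Y) = \phi \, \mathrm{E}(Y)^p$, can be tabulated in base R to show how the power changes the way the variance grows with the mean. This is a small sketch; the function name tweedie_variance and the dispersion value phi = 1 are assumptions made for illustration.

```r
# Variance implied by the Tweedie power relation Var(Y) = phi * E(Y)^p
tweedie_variance <- function(mu, power, phi = 1) {
  phi * mu^power
}

mu <- c(0.5, 1, 2, 4)

# Columns: constant variance (p = 0), proportional to the mean (p = 1),
# the default power (p = 1.5), and proportional to the squared mean (p = 2)
variance_table <- data.frame(
  mu    = mu,
  p_0   = tweedie_variance(mu, power = 0),
  p_1   = tweedie_variance(mu, power = 1),
  p_1.5 = tweedie_variance(mu, power = 1.5),
  p_2   = tweedie_variance(mu, power = 2)
)
variance_table
```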
# Create a TweedieGBMDist object with a compound Poisson-Gamma power
tweedie_dist <- gbm_dist(name="Tweedie", power=1.5)
# Use the Gamma distribution for the Tweedie p = 2 endpoint
gamma_dist <- gbm_dist(name="Gamma")
Tweedie distributions include various more familiar limiting cases,
but this implementation handles the power=0,
power=1, and power=2 endpoints through their
dedicated distribution choices rather than through
gbm_dist(name="Tweedie", ...):
- normal distribution: use gbm_dist(name="Gaussian") instead of power=0.
- Poisson distribution: use gbm_dist(name="Poisson") instead of power=1.
- compound Poisson-Gamma distributions: 1 < power < 2.
- Gamma distribution: use gbm_dist(name="Gamma") instead of power=2.
- positive stable distributions: 2 < power < 3 and power > 3.
- inverse Gaussian distribution: power=3.
Note that no Tweedie models exist for 0 < power < 1.
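The mapping from power to distribution choice can be captured in a small helper that returns the arguments to pass on to gbm_dist. This is a sketch under stated assumptions: choose_dist_args is a hypothetical function, not part of the package, and it only encodes the cases listed above (powers below zero are passed through unchanged because this document does not discuss them).

```r
# Hypothetical helper: pick gbm_dist arguments for a given Tweedie power.
choose_dist_args <- function(power) {
  stopifnot(is.numeric(power), length(power) == 1)
  if (power > 0 && power < 1) {
    stop("No Tweedie model exists for 0 < power < 1")
  }
  # Endpoints are handled by their dedicated distributions
  if (power == 0) return(list(name = "Gaussian"))
  if (power == 1) return(list(name = "Poisson"))
  if (power == 2) return(list(name = "Gamma"))
  list(name = "Tweedie", power = power)
}

dist_args <- choose_dist_args(1.5)
dist_args$name   # "Tweedie"

# With the package loaded, the list could then be passed on, for example:
# tweedie_dist <- do.call(gbm_dist, dist_args)
```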