We start with a basic description of the MT-SGL model. Consider a multi-task learning (MTL) setting with $k$ tasks. Let $p$ be the number of covariates, shared across all the tasks, and $n$ be the number of samples. Let $X \in \mathbb{R}^{n \times p}$ denote the matrix of covariates, $Y \in \mathbb{R}^{n \times k}$ be the matrix of responses with each row corresponding to a sample, and $\Theta \in \mathbb{R}^{p \times k}$ denote the parameter matrix, with column $\theta_{\cdot h}$ corresponding to task $h$, $h = 1, \ldots, k$, and row $\theta_{i \cdot}$ corresponding to feature $i$, $i = 1, \ldots, p$. The MTL problem can be set up as one of estimating the parameters based on a suitable regularized loss function:

$$\min_{\Theta \in \mathbb{R}^{p \times k}} \; L(Y, X, \Theta) + \lambda R(\Theta), \tag{1}$$

where $L(\cdot)$ denotes a convex loss function, $R(\cdot)$ is a convex and possibly nonsmooth regularization function, and $\lambda \ge 0$ is the regularization parameter. In the context of least squares regression, for example, the loss function is defined as follows,

$$L(Y, X, \Theta) = \| Y - X\Theta \|_F^2 = \sum_{i=1}^{n} \| y_i - x_i \Theta \|_2^2,$$

where $y_i \in \mathbb{R}^{1 \times k}$ and $x_i \in \mathbb{R}^{1 \times p}$ are the $i$th rows of $Y$ and $X$, respectively, corresponding to the multi-task response and covariates for the $i$th sample. We note that the MTL framework can be easily extended to other loss functions, especially losses corresponding to generalized linear models (GLMs) (Nelder and Baker, 1972). In particular, based on the response profile for some tasks (see Fig. 1 in Section 4.3), one could consider a loss function based on Poisson regression, given (up to an additive term that does not depend on $\Theta$) by

$$L(Y, X, \Theta) = [[\exp(X\Theta)]] - \operatorname{tr}\!\left(Y^{\top} X \Theta\right),$$

where $[[\exp(\cdot)]]$ denotes the sum over the element-wise exponentiation of the matrix argument.

For the MTL regularization $R(\Theta)$, different choices encourage different structures in the estimated parameters, e.g., unstructured sparsity (Lasso) with $R(\Theta) = \|\Theta\|_1$, feature sparsity with $R(\Theta) = \|\Theta\|_{2,1}$ (Liu et al., 2009), and structured sparsity (Yuan and Lin, 2006). Group regularizers such as the group lasso (Yuan and Lin, 2006), implemented via an $\ell_{2,1}$ regularization, assume that variables covary in groups, and have been studied extensively in multi-task feature learning. The $\ell_{2,1}$-norm, $\|\Theta\|_{2,1} = \sum_{i=1}^{p} \|\theta_{i \cdot}\|_2$, uses the $\ell_2$-norm within a group and the $\ell_1$-norm between groups. The difference between lasso and group lasso is illustrated in Fig. 3.
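The losses and regularizers described above can be written down directly. The following is a minimal NumPy sketch, not the authors' implementation: the function names are hypothetical, `X`, `Y` and `Theta` are assumed to be dense arrays of shapes (n, p), (n, k) and (p, k), and the Poisson loss assumes the usual log link with the parameter-independent term dropped.

```python
import numpy as np

def least_squares_loss(Y, X, Theta):
    """Squared Frobenius norm of the residual matrix Y - X @ Theta."""
    residual = Y - X @ Theta
    return float(np.sum(residual ** 2))

def poisson_loss(Y, X, Theta):
    """Poisson negative log-likelihood (log link), constant term dropped.

    Sums exp(.) over all entries of X @ Theta and subtracts the
    element-wise product of Y with X @ Theta, summed over all entries.
    """
    Z = X @ Theta
    return float(np.sum(np.exp(Z)) - np.sum(Y * Z))

def l1_norm(Theta):
    """Element-wise l1 norm: encourages unstructured sparsity (Lasso)."""
    return float(np.sum(np.abs(Theta)))

def l21_norm(Theta):
    """l2,1 norm: l2 norm of each row (one feature across all tasks),
    then summed over rows, which encourages whole rows to be zero."""
    return float(np.sum(np.linalg.norm(Theta, axis=1)))
```

For a parameter matrix whose only nonzero row is (3, 4), `l1_norm` gives 7 while `l21_norm` gives 5, which reflects that the ℓ2,1-norm charges a feature once for being active across tasks rather than once per task.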
The key assumption behind the group lasso regularizer is that if a few features in a group are important, then most of the features in the same group should also be important. Group lasso regularized multi-task learning (MT-GL) aims to improve generalization performance by exploiting the features shared among tasks (Liu et al., 2009; Gong et al., 2012). It can identify important biomarkers that potentially play key roles in memory and cognition circuitry. The MT-GL algorithm and its extensions have been successfully applied to AD prediction to capture biomarkers that have effects across most or all responses, since multiple cognitive assessment scores are essentially influenced by the same underlying pathology and only a subset of brain regions is relevant to these scores (Guerrero et al., 2017; Zhu et al., 2016; Yan et al., 2015). The MT-GL model via the $\ell_{2,1}$-norm regularization considers

$$\min_{\Theta} \; L(Y, X, \Theta) + \lambda \|\Theta\|_{2,1},$$

and is suitable for simultaneously enforcing sparsity over features for all tasks.

We assume the $p$ covariates to be divided into $q$ disjoint groups $G_\ell$, $\ell = 1, \ldots, q$, with group $G_\ell$ containing $m_\ell$ covariates. In the context of AD, each group corresponds to a region-of-interest (ROI) in the brain, and the covariates in each group correspond to specific features of that region. For AD, the number of features in each group, $m_\ell$, ranges from 1 to 4, and the number of groups $q$ can be in the hundreds. We then introduce a $G_{2,1}$-norm that reflects the relationship between the brain regions (ROIs) and the cognitive tasks and encourages a task-specific subset of ROIs. The $G_{2,1}$-norm is defined as:

$$\|\Theta\|_{G_{2,1}} = \sum_{\ell=1}^{q} \sum_{h=1}^{k} w_\ell \, \|\theta_{G_\ell, h}\|_2,$$

where $w_\ell$ is the weight for each group and $\theta_{G_\ell, h} \in \mathbb{R}^{m_\ell}$ is the coefficient vector for group $G_\ell$ and task $h$. Plugging the $G_{2,1}$-norm and the $\ell_{2,1}$-norm into the formulation in Eq. (1), the objective function of multi-task sparse group lasso (MT-SGL) is given by the following optimization problem:

$$\min_{\Theta} \; L(Y, X, \Theta) + \lambda_1 \|\Theta\|_{2,1} + \lambda_2 \|\Theta\|_{G_{2,1}},$$

where $\lambda_1 \ge 0$ and $\lambda_2 \ge 0$ are the regularization parameters.
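To make the two penalties concrete, the sketch below evaluates the G2,1-norm and the MT-SGL objective for a given parameter matrix. It is an illustration under stated assumptions rather than the authors' code: the function names, the representation of groups as a list of row-index lists, and the use of the squared loss are all choices made here. The group weights are passed in explicitly; the square root of the group size is a common choice in the group lasso literature, but that value is an assumption, not something fixed by the text above.

```python
import numpy as np

def l21_norm(Theta):
    """Sum over features (rows) of the l2 norm taken across tasks."""
    return float(np.sum(np.linalg.norm(Theta, axis=1)))

def g21_norm(Theta, groups, weights):
    """Weighted sum, over every (group, task) pair, of the l2 norm of
    the coefficients of that group for that task.

    groups  : list of lists of row indices, assumed disjoint
    weights : one nonnegative weight per group (hypothetical input)
    """
    total = 0.0
    for idx, w in zip(groups, weights):
        block = Theta[idx, :]                    # shape (m_l, k)
        # axis=0 gives one l2 norm per task (column) of the block
        total += w * np.sum(np.linalg.norm(block, axis=0))
    return float(total)

def mt_sgl_objective(Y, X, Theta, groups, weights, lam1, lam2):
    """Squared loss plus lam1 * l2,1 penalty plus lam2 * G2,1 penalty."""
    loss = np.sum((Y - X @ Theta) ** 2)
    penalty = lam1 * l21_norm(Theta) + lam2 * g21_norm(Theta, groups, weights)
    return float(loss + penalty)
```

The two penalties act on different slices of the parameter matrix: the ℓ2,1 term can zero out a feature for all tasks at once, while the G2,1 term can zero out a whole ROI for a single task, which is what allows a task-specific subset of ROIs.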