In this article we consider the varying coefficient model, which allows the relationship between the predictors and the response to vary across a domain of interest such as time. The model takes the standard form Y = a_1(t)X_1 + · · · + a_p(t)X_p + ε, where the X_j are the predictors, the a_j(·) are the corresponding coefficient functions, Y is the response we are interested in, and ε denotes the random error with mean zero. Our goal is to identify the regions of the domain where predictors have an effect and the regions where they may not. This is related to, although different from, variable selection: selection methods attempt to decide whether a variable is active or not, while our interest focuses on identifying regions.

For variable selection in the traditional linear model, numerous shrinkage methods have been developed. They include the least absolute shrinkage and selection operator (LASSO) (Tibshirani, 1996), the smoothly clipped absolute deviation (SCAD) penalty (Fan and Li, 2001), the adaptive LASSO (Zou, 2006), and many others. Although the LASSO penalty gives sparse solutions, it leads to biased estimates for large coefficients due to the linearity of the L1 penalty. To remedy this bias issue, Fan and Li (2001) proposed the SCAD penalty and showed that the SCAD penalized estimator enjoys the oracle property, in the sense that not only can it select the correct submodel consistently, but the asymptotic covariance matrix of the estimator is also the same as that of the ordinary least squares estimate obtained as if the true subset model were known.

We estimate the coefficient functions by local polynomial regression: in a small neighborhood of any point in the domain, each coefficient function is approximated by a linear expansion, and the observations are weighted by a kernel. The bandwidth h > 0 controls the size of the local neighborhood and thus implicitly controls the model complexity, so it is essential to choose an appropriate smoothing bandwidth in local polynomial regression. We discuss how to select the bandwidth in Section 2.1. The kernel function is a symmetric density integrating to one; there are numerous choices, examples being the Gaussian kernel and the Epanechnikov kernel. Stacking the observations, the local polynomial estimates of the coefficient functions and their first derivatives are obtained by minimizing a kernel-weighted least squares criterion.

To select the bandwidth we use five-fold cross-validation: the data are split into five parts; for k = 1, 2, ..., 5, the k-th part serves as the validation data set and the remaining four parts as the training data set. For a candidate bandwidth and each k, we obtain the estimates from the training data by solving a minimization problem similar to (2), and the cross-validation error is the prediction error accumulated over the corresponding validation sets. The bandwidth minimizing this error is selected.

Before penalization, the pseudo covariates are standardized by their standard deviations so that they have the same variance. This standardization properly adjusts for the effect of the different rates of convergence of the function and derivative estimates, as presented next. For local polynomial regression it is no longer appropriate to use n as the sample size, because not all observations contribute equally to the estimation at any given location; in fact, some will contribute nothing if the kernel has bounded support. Thus motivated, we define an effective sample size based on the kernel weights at the given location. The SCAD penalty involves a regularization parameter λ > 0 and a constant a > 2; in this paper we use a = 3.7 as suggested by Fan and Li (2001). For any point in the domain, we distinguish the unpenalized local polynomial estimates from the penalized local polynomial estimates obtained with regularization parameter λ.

3.1 Algorithms

We discuss how to solve (5) in this subsection. For the SCAD penalization problem, Fan and Li (2001) proposed the local quadratic approximation (LQA) algorithm to optimize the penalized loss function. With LQA, the optimization problem can be solved using a modified Newton-Raphson algorithm.
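As a concrete illustration of LQA, the following is a minimal sketch under simplifying assumptions, not the paper's implementation: it uses a plain least squares loss in place of the kernel-weighted local polynomial criterion, and the function names (scad_deriv, lqa_scad), the n-scaling of the penalty, and the toy data are purely illustrative. Each iteration replaces the SCAD penalty by a quadratic approximation around the current iterate and solves the resulting ridge-type system.

```python
# Minimal sketch (not the paper's implementation) of the local quadratic
# approximation (LQA) of Fan and Li (2001) for a SCAD-penalized least squares
# problem.  The loss 0.5*||y - Z b||^2 + n * sum_j p_lambda(|b_j|) and the
# names below are illustrative assumptions; in the paper the loss is the
# kernel-weighted local polynomial criterion.
import numpy as np

def scad_deriv(theta, lam, a=3.7):
    """Derivative p'_lambda of the SCAD penalty, with a = 3.7 as in Fan and Li (2001)."""
    theta = np.abs(theta)
    return np.where(theta <= lam, lam,
                    np.maximum(a * lam - theta, 0.0) / (a - 1.0))

def lqa_scad(Z, y, lam, n_iter=30, eps=1e-8):
    """LQA: repeatedly solve the ridge-type system obtained by approximating
    the SCAD penalty quadratically around the current iterate."""
    n, p = Z.shape
    b = np.linalg.lstsq(Z, y, rcond=None)[0]        # start from the unpenalized fit
    ZtZ, Zty = Z.T @ Z, Z.T @ y
    for _ in range(n_iter):
        d = scad_deriv(b, lam) / np.maximum(np.abs(b), eps)
        b = np.linalg.solve(ZtZ + n * np.diag(d), Zty)
    b[np.abs(b) < 1e-4] = 0.0                        # LQA needs explicit thresholding
    return b

# Toy example with a sparse truth.
rng = np.random.default_rng(0)
Z = rng.normal(size=(200, 6))
beta = np.array([3.0, 0.0, 0.0, -2.0, 0.0, 0.0])
y = Z @ beta + 0.5 * rng.normal(size=200)
print(np.round(lqa_scad(Z, y, lam=0.3), 3))
```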
The LQA estimator, however, cannot produce sparse solutions directly; a thresholding step has to be applied to shrink small coefficients to zero. To remedy this issue, Zou and Li (2008) proposed a new approximation method based on the local linear approximation (LLA). The advantage of the LLA algorithm is that it inherits the computational efficiency of the LASSO and also produces sparse solutions. With LLA, the penalty, evaluated at the norm ‖·‖ of the coefficients, is replaced by its linear approximation at an initial estimate, which can be chosen as the unpenalized local polynomial estimates. Based on our limited numerical experience, the one-step estimates already perform very competitively and it is not necessary to iterate further; see Zou and Li (2008) for similar discussions. Consequently, the one-step estimate is adopted due to its computational efficiency.
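To illustrate the one-step LLA estimate, here is a similar minimal sketch, again with a plain least squares loss rather than the paper's kernel-weighted criterion and with illustrative names (scad_deriv, weighted_lasso_cd, one_step_lla): the SCAD penalty is linearized at the unpenalized initial estimate, which turns the problem into a weighted LASSO that is solved once by coordinate descent.

```python
# Minimal sketch (not the paper's code) of the one-step LLA idea: linearize the
# SCAD penalty at an initial estimate and solve the resulting weighted LASSO
# once.  Loss, names, and toy data are illustrative assumptions.
import numpy as np

def scad_deriv(theta, lam, a=3.7):
    """Derivative of the SCAD penalty (Fan and Li, 2001), with a = 3.7."""
    theta = np.abs(theta)
    return np.where(theta <= lam, lam,
                    np.maximum(a * lam - theta, 0.0) / (a - 1.0))

def weighted_lasso_cd(Z, y, w, n_iter=200):
    """Coordinate descent for min 0.5*||y - Z b||^2 + sum_j w_j |b_j|."""
    n, p = Z.shape
    b = np.zeros(p)
    col_ss = (Z ** 2).sum(axis=0)
    r = y - Z @ b
    for _ in range(n_iter):
        for j in range(p):
            r += Z[:, j] * b[j]                      # remove j-th contribution
            rho = Z[:, j] @ r
            b[j] = np.sign(rho) * max(abs(rho) - w[j], 0.0) / col_ss[j]
            r -= Z[:, j] * b[j]
    return b

def one_step_lla(Z, y, lam, a=3.7):
    """One-step LLA: LASSO weights from the SCAD derivative at the unpenalized fit."""
    b0 = np.linalg.lstsq(Z, y, rcond=None)[0]        # initial (unpenalized) estimate
    w = len(y) * scad_deriv(b0, lam, a)              # scale penalty to the unaveraged loss
    return weighted_lasso_cd(Z, y, w)

# Toy example: sparse truth; large coefficients are left essentially unshrunk.
rng = np.random.default_rng(2)
Z = rng.normal(size=(200, 6))
beta = np.array([3.0, 0.0, 0.0, -2.0, 0.0, 0.0])
y = Z @ beta + 0.5 * rng.normal(size=200)
print(np.round(one_step_lla(Z, y, lam=0.3), 3))
```

In this sketch, coefficients whose initial estimates exceed aλ receive zero penalty weight and are therefore essentially unshrunk, while coefficients with small initial estimates receive the full LASSO weight and are set exactly to zero, which is precisely the sparsity that LQA could not deliver without thresholding.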