# Robust beta estimation: some empirical evidence.

Beta coefficients estimated from the Capital Asset Pricing Model (CAPM) are important inputs in portfolio management and financial management. In portfolio management, beta serves as a measure of an asset's volatility relative to a diversified market portfolio. In financial management, betas are used to estimate the cost of capital for risky projects. While recent evidence (e.g., Fama and French, 1992) appears to indicate that the explanatory power of beta is supplanted by other firm-specific variables such as firm size, price-to-book ratio and liquidity, it may be premature to dismiss the CAPM altogether. As Shanken and Smith (1996) point out, the statistical tests used by Fama and French (1992) have rather limited power in discriminating between alternative hypotheses in spite of using long sample periods (see also Kothari, Shanken and Sloan, 1995). Furthermore, instead of abandoning the CAPM framework as Haugen (1996) advocates, the CAPM may be regarded as a base case model to which variables other than beta may be added to increase explanatory and predictive power.

While the question of which model best describes asset returns is still being debated, an issue of ongoing concern to practitioners is how to estimate asset betas reliably, given that they have chosen a particular asset pricing model. The choice of estimation method has an important bearing on beta estimation partly because of the distributional features of asset-return innovations. In particular, it is well known that return innovations for many assets tend to be skewed and exhibit excess kurtosis (have fatter tails) relative to the normal distribution. Non-normality renders betas estimated by commonly used methods such as OLS extremely sensitive to unusual observations or outliers. In other words, such betas are non-robust.

This paper explores how betas can be estimated when the underlying distribution of the return innovations is both skewed and leptokurtic. Specifically, a flexible class of distributions known as the Generalized Exponential family is introduced in the context of regression models. Maximum likelihood procedures are described and applied to estimate robust CAPM betas. The methods described in this paper are shown to apply to any pricing model that employs a parametric functional form, including multifactor models such as those described above.

This paper proceeds as follows. The Methods of Estimation section discusses OLS estimation as well as a number of robust estimation techniques that have been used in the econometrics literature. The Data and Preliminary Analysis section describes the data set used in this study, and presents descriptive statistics on the distribution of OLS residuals for individual stock returns. The Generalized Student-t distribution is introduced in the next section. The Estimation Methodology and Results section describes the estimation methodology and reports the main results of the study, including a small out-of-sample experiment to evaluate the predictive accuracy of robust betas. The last section concludes the paper with a summary of the main findings and their implications.

Methods of Estimation

Beta coefficients are typically estimated using Ordinary Least Squares (OLS) under the assumption that the regression disturbances are normally distributed. When the normality assumption holds, OLS estimators attain the Cramer-Rao lower bound and are thus the most efficient among all unbiased estimators. However, since the early studies of Mandelbrot (1963) and Fama (1965), empirical evidence has shown that the residuals of regressions involving stock returns are typically leptokurtic, that is, they have fatter tails than the normal distribution. Leptokurtosis in turn implies that the distribution contains a higher proportion of outliers than a normal distribution would.

It is well known that under non-normality, OLS estimators are no longer the most efficient of all unbiased estimators, even in large samples. Furthermore, non-normality can cause OLS estimates to be extremely sensitive to the influence of outliers. For example, even if a sample contains only a few large observations, this can easily cause the OLS regression line to swing towards those observations despite these observations being outliers.
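The swing described above can be illustrated with a minimal sketch. The return figures below are made up for illustration and are not taken from the study's data; the helper name `ols_slope` is likewise hypothetical.

```python
def ols_slope(x, y):
    """Slope of the least-squares line of y on x (intercept included)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    return sxy / sxx

# Hypothetical monthly excess returns: the asset tracks the market one-for-one.
market = [-0.04, -0.02, -0.01, 0.00, 0.01, 0.02, 0.03, 0.04]
asset = list(market)                      # beta equals 1 by construction

beta_clean = ols_slope(market, asset)

# Replace the last observation with one extreme return in an up-market month.
asset_outlier = asset[:-1] + [0.30]
beta_outlier = ols_slope(market, asset_outlier)

print(f"beta without the outlier:  {beta_clean:.2f}")
print(f"beta with a single outlier: {beta_outlier:.2f}")
```

A single unusual month out of eight is enough to move the fitted slope far from one, which is the sense in which OLS betas are non-robust.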

In recent years, various alternatives to OLS have been proposed to deliver estimators that are more robust to the underlying distribution. These alternatives include robust regression (Huber, 1981) and adaptive estimation (Bickel, 1982; Newey, 1988 and Manski, 1984). Robust regression is non-adaptive in that it does not allow the distributional features of the data to influence the outcome of the estimation process. Instead, non-normality is handled by the use of various weighting schemes to downplay the influence of outliers. In contrast, adaptive estimation seeks to estimate the parameters of the model under weak distributional assumptions, thus permitting the data to "speak for itself". Adaptive estimators are asymptotically more efficient than either OLS or robust estimators since they do not depend on particular distributional assumptions. More specifically, adaptive estimators can be as efficient asymptotically as maximum likelihood estimators obtained with knowledge of the true distribution.

A third approach that has gained popularity in recent years is partially adaptive estimation. In this approach, parameters of the model are estimated using a flexible distributional form which can model some salient aspects of the data (such as skewness, kurtosis or multimodality). While partially adaptive estimators will not be as asymptotically efficient as fully adaptive estimation, there are good practical reasons for using this approach. First, as pointed out by MacDonald and Newey (1988), the efficiency of adaptive estimators holds only in large samples, and therefore need not carry over to small sample situations that are often encountered in practice. For example, based on simulation evidence, MacDonald and White (1993) show that partially adaptive estimators generally outperform robust estimators in having lower standard errors. For most cases, partially adaptive estimators also perform on par with adaptive techniques such as kernel regression (Manski, 1984) and fully iterative Generalized Method of Moments (GMM) estimation (Newey, 1988). Second, partially adaptive estimation is much less demanding computationally than either adaptive estimation or robust regression, especially if flexible but parsimonious distribution specifications are used.

This paper employs partially adaptive estimation to estimate CAPM betas. Return innovations are assumed to follow a Generalized Student-t distribution which allows for skewness and leptokurtosis. In many empirical studies, the standard Student-t distribution has been found to be quite successful in modeling highly leptokurtic distributions (e.g., Blattberg and Gonedes, 1974; Bollerslev, 1987 and Kon, 1994). The Generalized Student-t generalizes the Student-t to the case where skewness is allowed. The need to model both skewness and excess kurtosis seems important given that these two forms of non-normality feature in many asset return innovations.

Data and Preliminary Analysis

The data for this study consists of monthly returns on 22 stocks which are constituents of the Straits Times Industrial Index (STII). The STII is a price-weighted index of 30 leading industrial stocks that trade on the Stock Exchange of Singapore (SES). It is monitored closely by the investment community as its component stocks are generally regarded as proxies for the Singapore economy.

Returns are computed as the natural logarithm of price relatives, i.e. $\ln(P_t/P_{t-1})$, where $P_t$ refers to the price of a stock at the end of month t. The sample period is the six years from January 1990 through December 1995. Only component stocks that were listed continuously over this sample period were included in the sample. Of the 30 stocks in the index, 22 fulfill this criterion. The names of these 22 stocks are listed in the Appendix.
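As a minimal sketch of the return definition above, the prices below are hypothetical month-end values and `log_returns` is an illustrative helper name, not part of the study:

```python
import math

def log_returns(prices):
    """Continuously compounded returns ln(P_t / P_{t-1}) from a price series."""
    return [math.log(p_now / p_prev) for p_prev, p_now in zip(prices, prices[1:])]

month_end_prices = [2.00, 2.10, 2.05, 2.20]   # hypothetical prices
returns = log_returns(month_end_prices)

for month, r in enumerate(returns, start=1):
    print(f"month {month}: {r:+.4f}")
```

A convenient property of this definition is that the monthly log returns add up to the log return over the whole period.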

As a proxy for the market portfolio, we use the Straits Times All-Singapore Index. This is a value-weighted index of all Singapore-incorporated stocks listed on the SES.(1) Finally, as a proxy for the risk-free rate, we use returns on three-month Singapore Government Treasury Bills.

Data on stock returns were obtained from the National University of Singapore Financial Database and data on the risk-free rate were gathered from the Monthly Statistical Bulletin published by the Monetary Authority of Singapore.

To focus on the issue of robust beta estimation, we consider the simple Sharpe-Lintner-Mossin CAPM (Sharpe, 1964; Lintner, 1965; and Mossin, 1966). Empirically, the model can be written as:

$$ r_{it} - r_{ft} = \beta_0 + \beta_i (r_{mt} - r_{ft}) + u_t \tag{1} $$

where $r_i$ is the rate of return on the ith asset, $r_m$ is the rate of return on the market portfolio, $r_f$ is the risk-free rate and $u_t$ is the disturbance term. Many other versions of the CAPM have been used in the literature, including the one factor world-index model (Cumby and Glen, 1990), two factor models with exchange risk factors (Adler and Dumas, 1983) and various conditional multi-beta pricing models (e.g., Shanken, 1990). A recent comparison of these models for emerging markets can be found in Harvey (1995). On the whole, the empirical evidence on the explanatory power of these competing models for emerging markets is found to be mixed. Because of its simplicity, the one factor CAPM is still the most popular version used in practice. Hence, there is merit in using it to study the robustness of beta estimation. Moreover, the simplicity of the one factor CAPM allows us to focus clearly on distributional issues without the interference of other factors. Nonetheless, the methods examined in this paper should be of interest to users of other versions of the CAPM as well.

We begin by estimating equation (1) by OLS. Table 1 presents some descriptive statistics on the distribution and time series properties of OLS residuals for each of the 22 stocks. Table 1 shows that the distribution of OLS residuals for most stocks is skewed (mainly to the right) and has heavy tails. The median skewness coefficient is 0.53, compared to zero for the normal distribution. In 9 cases, the skewness coefficient exceeds 0.70. The median kurtosis is 4.65 compared to 3 for the normal distribution, confirming the well documented fat tail phenomenon found in many empirical studies. Severe leptokurtosis is apparent in 6 cases where the kurtosis is more than twice that of the normal distribution. Based on the Bera-Jarque test of normality (a joint test of skewness and excess kurtosis), we reject the null hypothesis of a normal distribution for 14 stocks, i.e. nearly two-thirds of the sample. To determine the source of rejections, two other tests were performed. The first of these is a third-moment test for skewness alone (see D'Agostino et al., 1990). The test statistic for the skewness test is denoted by $Z(\sqrt{b_1})$, following the notation used by D'Agostino et al. The second test is a fourth-moment test for excess kurtosis. The fourth-moment test statistic is denoted by $Z(b_2)$. Under the null hypothesis that the data are not skewed or leptokurtic, the two test statistics are each asymptotically normal.

[TABULAR DATA FOR TABLE 1 OMITTED]

The results of the tests for skewness and excess kurtosis are consistent with the Bera-Jarque joint test. All 14 stocks identified by the Bera-Jarque test as non-normal were also detected by the individual tests. More interestingly, the individual tests reveal that 10 of the 14 stocks (highlighted in bold) violate normality on account of both skewness and excess kurtosis. The importance of modeling return innovations with distributions that allow for both skewness and fat tails is clear.
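The moment-based diagnostics discussed above can be sketched as follows. This is only an illustration: it computes the sample skewness, the sample kurtosis and the Bera-Jarque joint statistic in its textbook form, $n(S^2/6 + (K-3)^2/24)$, and does not reproduce the separate D'Agostino Z tests. The residuals are made up and the function name `moment_stats` is hypothetical.

```python
def moment_stats(resid):
    """Return (skewness, kurtosis, Bera-Jarque statistic) for a list of residuals."""
    n = len(resid)
    mean = sum(resid) / n
    m2 = sum((r - mean) ** 2 for r in resid) / n   # second central moment
    m3 = sum((r - mean) ** 3 for r in resid) / n   # third central moment
    m4 = sum((r - mean) ** 4 for r in resid) / n   # fourth central moment
    skew = m3 / m2 ** 1.5
    kurt = m4 / m2 ** 2                            # equals 3 for a normal distribution
    jb = n * (skew ** 2 / 6.0 + (kurt - 3.0) ** 2 / 24.0)
    return skew, kurt, jb

# Hypothetical residuals with one large positive value.
residuals = [-0.03, -0.02, -0.02, -0.01, 0.00, 0.00, 0.01, 0.01, 0.02, 0.15]
skew, kurt, jb = moment_stats(residuals)
print(f"skewness={skew:.2f}  kurtosis={kurt:.2f}  Bera-Jarque={jb:.2f}")
```

Under normality the joint statistic is asymptotically Chi-square with two degrees of freedom, so values above 5.99 would reject at the 5% level.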

Fat tails can be a symptom of time-varying volatility such as ARCH or GARCH effects (Engle, 1982; Bollerslev, 1986). To check whether the data exhibit time-varying volatility, we compute the Lagrange Multiplier (LM) test suggested by Engle (1982). This statistic is calculated as n (number of observations) times $R^2$, the coefficient of determination for a least squares regression of the squared OLS residuals on a constant and k lagged squared residuals. The intuition behind this test is as follows. If the data are homoskedastic, then the variance cannot be predicted. Thus, variations in squared residuals will be purely random. On the other hand, if ARCH effects are present, then the squared residuals will be temporally correlated. Following the recommendation of Godfrey (1987) that the number of lags chosen should take into account the periodicity of the data, we compute LM statistics for k ranging from 1 to 12 months (see Table 2). Under the null hypothesis of no ARCH effects, the LM statistic for lag k is asymptotically distributed as a Chi-square variate with k degrees of freedom. For our purposes, sample values of $nR^2$ exceeding the 95th percentile of the Chi-square distribution are taken as evidence that ARCH effects are present.
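The auxiliary regression behind the LM statistic can be sketched as below. This is an illustrative implementation under stated assumptions: n is taken to be the number of observations usable after losing k lags, the least-squares fit solves the normal equations directly, and the residual series is made up to show volatility clustering. The names `solve`, `lm_statistic` and `arch_lm` are hypothetical.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x

def lm_statistic(series, k):
    """n * R^2 from regressing series[t] on a constant and its first k lags."""
    y = series[k:]
    rows = [[1.0] + [series[t - j] for j in range(1, k + 1)]
            for t in range(k, len(series))]
    n, p = len(y), k + 1
    xtx = [[sum(row[i] * row[j] for row in rows) for j in range(p)] for i in range(p)]
    xty = [sum(row[i] * yi for row, yi in zip(rows, y)) for i in range(p)]
    coef = solve(xtx, xty)
    fitted = [sum(c * v for c, v in zip(coef, row)) for row in rows]
    y_bar = sum(y) / n
    ss_total = sum((yi - y_bar) ** 2 for yi in y)
    ss_resid = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    return n * (1.0 - ss_resid / ss_total)

def arch_lm(residuals, k):
    """LM statistic for ARCH effects: apply lm_statistic to squared residuals."""
    return lm_statistic([e * e for e in residuals], k)

# Hypothetical residuals: a calm regime followed by a volatile one.
clustered = [0.01, -0.01] * 12 + [0.05, -0.05] * 12
print(f"LM statistic at lag 1: {arch_lm(clustered, 1):.2f}")   # compare with 3.84
```

Passing the residuals themselves to `lm_statistic`, rather than their squares, gives a statistic of the same form for serial correlation in the residuals.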

Table 2. LM Test of ARCH in OLS Residuals

| Stock | Lag 1 | Lag 3 | Lag 6 | Lag 9 | Lag 12 |
|---|---|---|---|---|---|
| Avimo | 1.15 | 1.81 | 2.93 | 5.18 | 3.60 |
| C&C | 0.55 | 1.24 | 2.44 | 3.52 | 16.65 |
| Cerebos | 9.29* | 8.71* | 10.46 | 7.76 | 7.20 |
| F&N | 0.01 | 4.36 | 5.57 | 5.01 | 13.47 |
| Hawpar | 0.81 | 5.04 | 10.69 | 9.24 | 16.90 |
| Inchcape | 2.61 | 1.79 | 4.22 | 10.28 | 7.68 |
| Intraco | 0.43 | 0.71 | 2.25 | 3.14 | 3.04 |
| Keppel | 1.28 | 2.19 | 2.00 | 3.80 | 11.18 |
| LumChang | 0.37 | 1.64 | 10.12 | 11.23 | 15.03 |
| Metro | 0.00 | 2.68 | 5.20 | 11.72 | 8.41 |
| NOL | 0.06 | 0.52 | 1.36 | 1.76 | 0.60 |
| NatSteel | 1.30 | 3.03 | 4.41 | 6.38 | 16.04 |
| SingBus | 0.79 | 0.97 | 1.94 | 3.97 | 8.29 |
| Sembawang | 0.26 | 2.30 | 5.82 | 9.72 | 15.15 |
| SIA | 0.42 | 2.23 | 3.20 | 3.72 | 6.03 |
| Singtron | 0.02 | 0.78 | 1.55 | 2.40 | 4.61 |
| SPH | 0.03 | 0.24 | 0.57 | 7.25 | 6.24 |
| Times Pub | 0.18 | 1.10 | 4.84 | 12.57 | 14.52 |
| UIC | 3.44 | 3.26 | 2.32 | 7.65 | 5.16 |
| WBL | 0.22 | 1.85 | 13.11* | 6.76 | 20.27* |
| WingTai | 1.25 | 0.79 | 5.44 | 5.84 | 5.63 |
| YHS | 0.08 | 2.18 | 3.30 | 3.42 | 6.60 |

Notes: Lagrange Multiplier (LM) statistics are computed for CAPM (OLS) squared residuals over lags ranging from 1-12 months. The LM statistic is calculated as $nR^2$, where $R^2$ is the coefficient of determination obtained by regressing CAPM squared residuals on a constant plus lagged squared residuals (lags = 1-12 months) and n is the number of observations. Under the null hypothesis of no ARCH effects, the LM statistic follows a Chi-square distribution with degrees of freedom (df) equal to the number of lags in the auxiliary regression. The 95% critical values of the Chi-square distribution for the various lags are as follows:

| Df | 95% Critical Value |
|---|---|
| 1 | 3.84 |
| 3 | 7.81 |
| 6 | 12.59 |
| 9 | 16.92 |
| 12 | 21.03 |

\* denotes statistic is significant at the 5% level.

From Table 2, we find that ARCH effects are evident in only two stocks: Cerebos and WBL. For Cerebos, low order ARCH effects at lags 1 and 2 were found. For WBL, ARCH effects were detected at higher orders. In general, it appears that there is no pervasive evidence of changing conditional volatility in our sample of monthly returns. This finding is not surprising as the absence of strong ARCH effects in low frequency data (e.g. monthly) has been documented in many previous studies (Diebold, 1988; Baillie and Bollerslev, 1989; Drost and Nijman, 1993). We are therefore led to the conclusion that the heavy tails of OLS residuals cannot be attributed mainly to conditional heteroscedasticity.

Table 3. LM Test of Serial Correlation in OLS Residuals

| Stock | Lag 1 | Lag 3 | Lag 6 | Lag 9 | Lag 12 |
|---|---|---|---|---|---|
| Avimo | 0.94 | 4.79 | 13.68* | 15.26 | 15.24 |
| C&C | 0.13 | 4.88 | 7.94 | 12.68 | 15.81 |
| Cerebos | 9.85* | 10.89* | 17.06* | 19.71* | 7.19 |
| F&N | 2.46 | 6.12 | 12.03 | 13.86 | 15.93 |
| Hawpar | 0.01 | 9.00* | 11.67 | 14.39 | 16.22 |
| Inchcape | 1.14 | 4.58 | 5.50 | 7.10 | 8.09 |
| Intraco | 0.34 | 3.99 | 7.91 | 9.11 | 9.75 |
| Keppel | 3.33 | 6.35 | 9.25 | 8.83 | 12.65 |
| LumChang | 0.37 | 1.68 | 2.78 | 7.01 | 8.10 |
| Metro | 0.22 | 1.93 | 3.66 | 8.94 | 10.77 |
| NOL | 1.44 | 1.98 | 8.30 | 9.62 | 9.47 |
| NatSteel | 0.04 | 1.64 | 5.04 | 7.40 | 12.10 |
| SingBus | 0.62 | 0.69 | 1.93 | 5.60 | 11.14 |
| Sembawang | 0.44 | 5.78 | 5.64 | 5.99 | 11.47 |
| SIA | 0.03 | 0.55 | 1.04 | 5.78 | 10.61 |
| Singtron | 0.01 | 1.83 | 15.24* | 17.30* | 13.78 |
| SPH | 0.45 | 2.30 | 5.81 | 7.45 | 11.87 |
| Times Pub | 0.03 | 4.89 | 5.45 | 14.07 | 15.19 |
| UIC | 5.09* | 5.89 | 5.91 | 3.53 | 10.20 |
| WBL | 2.82 | 2.97 | 6.11 | 10.13 | 12.97 |
| WingTai | 0.04 | 1.05 | 6.04 | 9.38 | 13.00 |
| YHS | 0.57 | 3.93 | 8.59 | 11.03 | 13.23 |

Notes: Lagrange Multiplier (LM) statistics are computed for CAPM (OLS) residuals over lags ranging from 1-12 months. The LM statistic is calculated as $nR^2$, where $R^2$ is the coefficient of determination obtained by regressing CAPM residuals on a constant plus lagged CAPM residuals (lags = 1-12 months) and n is the number of observations. Under the null hypothesis of no serial correlation, the LM statistic follows a Chi-square distribution with degrees of freedom (df) equal to the number of lags in the auxiliary regression. The 95% critical values of the Chi-square distribution for the various lags are as follows:

| Df | 95% Critical Value |
|---|---|
| 1 | 3.84 |
| 3 | 7.81 |
| 6 | 12.59 |
| 9 | 16.92 |
| 12 | 21.03 |

\* denotes statistic is significant at the 5% level.

We also carried out an LM test to check for serial correlation in the OLS residuals. The LM statistic for this test is computed in the same manner as above, except that the auxiliary regression is run using OLS residuals instead of squared residuals. The asymptotic distribution for this statistic is also Chi-square with k degrees of freedom. The results of this test are reported in Table 3. The null hypothesis of zero serial correlation can be rejected for 5 stocks, mostly at short lags ($k \le 3$).

In summary, most of the OLS residuals are serially uncorrelated and exhibit little evidence of second-order dependence. The distribution of the residuals, however, is asymmetric and leptokurtic. Thus, betas estimated by OLS can be expected to be highly sensitive to outliers. The next section introduces the Generalized Student-t distribution which can be used to model both asymmetry and leptokurtosis.

The Generalized Student-t Distribution

A number of alternative distributions have been used in place of the normal distribution to model return innovations. Praetz (1972) and Clark (1973) show that leptokurtosis can be modeled by assuming that the innovations come from a mixture of normal distributions. As Clark (1973) shows, the motivation for assuming a mixture of normal distributions is that if information arrival is stochastic, then the unconditional distribution of the innovations will follow a mixture of normal distributions with time-varying or heteroskedastic variance.

One of the most widely used mixtures of normal distributions is the Student-t distribution (see Bollerslev, 1987 and Butler et al., 1990). In an early study, Blattberg and Gonedes (1974) show that the Student-t distribution is a mixture of two normal distributions where the variance of each normal distribution follows an inverted gamma distribution. The Student-t density with $v$ degrees of freedom ($v \geq 2$) and normalized (i.e., unit) variance is:

[Mathematical Expression Omitted] (2)

The normal distribution is a limiting case of the Student-t when $v \to \infty$. Excess kurtosis can be captured by the Student-t distribution with sufficiently few degrees of freedom.
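Expression (2) is omitted in this copy of the text. Assuming it is the usual unit-variance parameterization of the Student-t, the density would read as follows, where $\Gamma(\cdot)$ denotes the gamma function (not the parameter $\gamma$ used later) and the variance exists only for $v > 2$. This is a reconstruction under that assumption, not a quotation of the original expression.

```latex
f(u) \;=\; \frac{\Gamma\!\left(\frac{v+1}{2}\right)}
                {\Gamma\!\left(\frac{v}{2}\right)\sqrt{\pi\,(v-2)}}
           \left( 1 + \frac{u^{2}}{v-2} \right)^{-\frac{v+1}{2}}
```

For $v > 4$ the kurtosis of this density is $3 + 6/(v-4)$, so the smaller $v$ is, the heavier the tails.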

Empirical evidence indicates that the Student-t is able to fit leptokurtic distributions relatively well. Blattberg and Gonedes (1974) compare the empirical fit of the Student-t and the stable Paretian distribution proposed by Mandelbrot (1963) and Fama (1965) and find that the Student-t provides a better description of stock returns than does the stable distribution. Additional evidence that supports the Student-t over the stable distributions is provided by Akgiray and Booth (1988) and Tucker (1992). Kon (1994) finds that the Student-t also compares well with other distributions such as the Poisson jump-diffusion model (Press, 1967; Ball and Torous, 1985) and the generalized discrete mixture-of-normals proposed by Kon (1984) in modeling the standardized (GARCH) residuals of daily stock returns.

The Student-t distribution has also been successfully applied to model the distribution of exchange rates. For example, Baillie and Bollerslev (1989) find that the Student-t distribution describes the data relatively well compared with the Box-Tiao power exponential distribution in capturing excess kurtosis for most currencies (see also Hsieh, 1989).

A drawback of the Student-t distribution is that it is symmetric and therefore cannot be used to model skewness. Typically, the distribution of return innovations is both skewed and leptokurtic. Thus, more general distributions are needed to adequately capture both of these features in the data. In theory, skewness and leptokurtosis can be modeled by a finite mixture of normal distributions, as in Kon (1984), where each return innovation is assumed to be drawn from one of a finite number of normal distributions with some mixing probability. But the estimation of such models is computationally burdensome. Consequently, the number of normal distributions used in practice is usually limited to about four or five. Even so, non-convergence of the estimates is often encountered.

This paper applies a generalization of the Student-t distribution which can handle both skewness and excess kurtosis in a parsimonious way. This distribution is the Generalized Student-t distribution (GET) introduced by Lye and Martin (1993). An important advantage of the GET over a finite mixture of normals is that the GET involves only a few extra parameters over OLS. Thus, estimation with the GET is less computationally intensive than estimation using a mixture of normals, as well as other methods such as robust regression or adaptive methods.

The GET distribution is a subordinate of a very flexible class of distributions known as the Generalized Exponential family. The following is a summary of the Generalized Exponential family and the GET subordinate. The exposition given here is based mainly on Lye and Martin (1993). Further theoretical discussions of the Generalized Exponential family can be found in Barndorff-Nielsen (1978).

The Generalized Exponential family extends the well-known Pearson family of distributions to the case where multimodality is allowed. Consider the standard Pearson family, whose members are solutions of the following differential equation:

$$ \frac{df}{du} = -\frac{g(u)\,f(u)}{h(u)} \tag{3} $$

where g(u) and h(u) are polynomials in the random variable u and f(u) is the density of u. The general solution of (3) is:

[Mathematical Expression Omitted] (4)

where $\eta$ is a normalizing constant given by:

[Mathematical Expression Omitted] (5)
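Expressions (4) and (5) are omitted in this copy of the text. Integrating the differential equation (3) directly, and assuming the sign and normalization conventions that appear in equation (8), they would take the form below. This is a reconstruction under those assumptions rather than a quotation.

```latex
f(u) \;=\; \exp\!\left( -\int^{u} \frac{g(s)}{h(s)}\, ds \;-\; \eta \right),
\qquad
\eta \;=\; \log \int_{-\infty}^{\infty}
           \exp\!\left( -\int^{u} \frac{g(s)}{h(s)}\, ds \right) du
```

As a check, $g(u) = u$ and $h(u) = 1$ give $f(u) \propto \exp(-u^2/2)$, the standard normal density.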

In the standard Pearson family, g(u) and h(u) are polynomials in u of degrees at most one and two respectively. The Generalized Exponential family generalizes this polynomial restriction in two directions. First, degrees of polynomials above one for g(u) and two for h(u) are allowed. More importantly, g(u) and h(u) need not be strictly polynomials but can also assume more general functional forms.

Several important subordinate families are members of the Generalized Exponential family. These include the GET, the Generalized Lognormal and the Generalized Beta distributions. These subordinate families in turn nest a very wide variety of distributions of empirical interest. For example, nested within the GET are the standard Student-t, Normal and Cauchy distributions. The Generalized Lognormal encompasses the Gamma and Generalized (multimodal) Gamma distributions, while the Generalized Beta includes, as special cases, the Beta, Beta of the second kind, and Inverted Gamma distributions.

The focus of this paper is the GET distribution. The GET generalizes the standard Student-t distribution by allowing the distribution of errors to be asymmetric and even multimodal. The GET is obtained by specifying g(u) and h(u) as:

$$ g(u) = \sum_{i=0}^{m-1} \alpha_i u^i \tag{6} $$

$$ h(u) = \gamma^2 + u^2 \tag{7} $$

where m is the number of parameters, and $\gamma^2$ is the degrees of freedom parameter. Important special cases of the GET can be derived by setting m = 6. Lye and Martin (1993) show that for m = 6, the GET can be expressed as:

$$ \mathrm{GET}(u) = \exp\left( \theta_1 \tan^{-1}(u/\gamma) + \theta_2 \log(\gamma^2 + u^2) + \sum_{i=3}^{6} \theta_i u^{i-2} - \eta \right) \tag{8} $$

where the normalizing constant $\eta$ is:

$$ \eta = \log \int_{-\infty}^{\infty} \exp\left( \theta_1 \tan^{-1}(u/\gamma) + \theta_2 \log(\gamma^2 + u^2) + \sum_{i=3}^{6} \theta_i u^{i-2} \right) du \tag{9} $$

The six parameters follow from integrating $-g(u)/h(u)$ term by term with g(u) and h(u) as in (6) and (7), and can be written as:

$$
\begin{aligned}
\theta_1 &= -\alpha_0/\gamma + \gamma\,\alpha_2 - \gamma^3 \alpha_4 \\
\theta_2 &= -\alpha_1/2 + \gamma^2 \alpha_3/2 - \gamma^4 \alpha_5/2 \\
\theta_3 &= -\alpha_2 + \gamma^2 \alpha_4 \\
\theta_4 &= -\alpha_3/2 + \gamma^2 \alpha_5/2 \\
\theta_5 &= -\alpha_4/3 \\
\theta_6 &= -\alpha_5/4
\end{aligned}
\tag{10}
$$

The unimodal normal distribution is a special case of the GET, obtained by setting $\theta_1 = \theta_2 = \theta_5 = \theta_6 = 0$. The unimodal Student-t is derived by setting $\theta_2 = -(1 + \gamma^2)/2$ and $\theta_1 = \theta_3 = \theta_4 = \theta_5 = \theta_6 = 0$. To allow for skewness as well as leptokurtosis, we consider the following specification of the GET:

$$ \mathrm{GET}(u, \sigma, \gamma, \theta_1) = \exp\left( \theta_1 \tan^{-1}(e/\gamma) - \frac{\gamma^2 + 1}{2} \log(\gamma^2 + e^2) - \eta \right) \tag{11} $$

where $e = u/\sigma$ represents standardized regression residuals and the Student-t restriction for $\theta_2$ is imposed to allow for leptokurtosis. In addition, skewness is modeled through the arctan term. The model is parsimonious in that it involves only two more parameters compared with OLS. In the next section, we discuss maximum likelihood estimation of GET parameters and the asymptotic properties of such estimators. The GET is then applied to our dataset to estimate robust betas.
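The normalizing constant in this specification is an integral over the whole real line. As a sketch of why it is straightforward to evaluate numerically (the change of variable below is an illustration and not necessarily the route taken in the study), substituting $e = \gamma \tan\varphi$ maps the integral onto a finite interval. Here $\eta$ is taken with respect to the standardized residual $e$; integrating over $u = \sigma e$ instead adds $\log\sigma$.

```latex
\begin{aligned}
\eta &= \log \int_{-\infty}^{\infty}
        \exp\!\Big( \theta_1 \tan^{-1}(e/\gamma)
                    - \tfrac{\gamma^{2}+1}{2}\,\log(\gamma^{2}+e^{2}) \Big)\, de \\
     &= -\gamma^{2}\log\gamma
        \;+\; \log \int_{-\pi/2}^{\pi/2}
        \exp(\theta_1 \varphi)\,\cos^{\gamma^{2}-1}\!\varphi \; d\varphi
\end{aligned}
```

With $\theta_1 = 0$ the integrand is symmetric in $\varphi$ and the density reduces to the Student-t kernel $(\gamma^2 + e^2)^{-(\gamma^2+1)/2}$, with $\gamma^2$ playing the role of the degrees of freedom. A positive $\theta_1$ shifts weight toward positive $\varphi$, that is, toward positive residuals.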

Estimation Methodology and Results

Parameters of Generalized Exponential densities can be estimated by maximum likelihood. Under appropriate regularity conditions, the maximum likelihood estimator is consistent and asymptotically normal (Berk, 1972).

For our GET specification, the log-likelihood function is:

$$ \log L_n = \sum_{i} \left( \theta_1 \tan^{-1}(e_i/\gamma) - \frac{\gamma^2 + 1}{2} \log(\gamma^2 + e_i^2) - \eta \right) \tag{12} $$

There are five parameters to estimate when this specification is applied to the CAPM. They are: $\beta_0$, $\beta_i$, $\theta_1$, $\gamma$ and $\sigma$ (standard deviation of residuals). The BHHH algorithm of Berndt, Hall, Hall and Hausman (1974) with numerical first derivatives is used in the optimization. The BHHH is a quasi-Newton method that uses the cross-product of the matrix of first derivatives to estimate the Hessian matrix. To estimate the normalizing constant, numerical integrations were carried out using the Gauss-Legendre integration routine. For each stock, 20 different arbitrary starting values were tried. Typically, a single optimum is obtained. All computations were performed using the matrix programming language GAUSS, along with the maximum likelihood module OPTMUM.
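A minimal sketch of how the log-likelihood could be evaluated for a given parameter vector is shown below. It is not the GAUSS implementation used in the study. Several choices are assumptions made for illustration: the normalizing constant is computed with a midpoint rule after a tangent substitution rather than Gauss-Legendre quadrature, it is computed with respect to the standardized residual so that a separate log(sigma) term appears, the data are made up, and the names `log_norm_const` and `get_capm_loglik` are hypothetical. Any numerical optimizer could then be applied to this function from several starting values.

```python
import math

def log_norm_const(theta1, gamma, n_nodes=2000):
    """Normalizing constant eta for the standardized residual e (midpoint rule).

    With e = gamma * tan(phi), the integral over the real line of
    exp(theta1*atan(e/gamma) - (gamma^2 + 1)/2 * log(gamma^2 + e^2))
    equals gamma^(-gamma^2) times the integral over (-pi/2, pi/2) of
    exp(theta1*phi) * cos(phi)^(gamma^2 - 1).
    """
    h = math.pi / n_nodes
    total = 0.0
    for j in range(n_nodes):
        phi = -math.pi / 2.0 + (j + 0.5) * h       # midpoint of sub-interval j
        total += math.exp(theta1 * phi) * math.cos(phi) ** (gamma ** 2 - 1.0)
    return math.log(total * h) - gamma ** 2 * math.log(gamma)

def get_capm_loglik(params, asset_excess, market_excess):
    """Log-likelihood of the CAPM regression with GET-distributed errors."""
    beta0, beta, theta1, gamma, sigma = params
    eta = log_norm_const(theta1, gamma)
    loglik = 0.0
    for y, x in zip(asset_excess, market_excess):
        e = (y - beta0 - beta * x) / sigma          # standardized residual
        loglik += (theta1 * math.atan(e / gamma)
                   - 0.5 * (gamma ** 2 + 1.0) * math.log(gamma ** 2 + e ** 2)
                   - eta
                   - math.log(sigma))               # Jacobian of e = u / sigma
    return loglik

# Hypothetical excess returns, for illustration only.
market_excess = [-0.05, -0.02, 0.00, 0.01, 0.03, 0.04]
asset_excess = [-0.06, -0.01, 0.01, 0.00, 0.05, 0.12]

# One candidate starting point: beta0, beta, theta1, gamma, sigma.
start_values = (0.0, 1.0, 0.0, 2.0, 0.05)
print(get_capm_loglik(start_values, asset_excess, market_excess))
```

With `theta1 = 0` the function reduces to a Student-t log-likelihood with `gamma ** 2` degrees of freedom, which gives a convenient check on the numerical integration.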

The maximum likelihood results are presented in Table 4. The first three columns of the table report betas estimated under the OLS, Student-t and GET distributions. The GET betas are generally smaller than Student-t betas, which are in turn smaller than OLS betas. On average, the difference in magnitude between GET betas and OLS betas is quite large. The median GET beta is 1.0, compared to 1.16 for OLS betas. Cases where the difference between OLS and GET betas is very pronounced include Avimo (OLS: 1.53, GET: 1.12), Cerebos (1.37, 0.71), NOL (1.66, 1.20) and YHS (1.17, 0.81). These stocks generally have above-average levels of (positive) skewness, excess kurtosis or both. Thus, allowing for skewness and excess kurtosis appears to make a big difference in beta estimates. In particular, many stocks apparently have lower systematic risks than is implied by OLS betas.

The next three columns of Table 4 present the log-likelihood values for each of the three distribution specifications. The median log-likelihood for OLS is 76.37. The median log-likelihood under the Student-t and GET increases to 78.39 and 78.46 respectively. Thus, both the Student-t and the GET models seem to fit the data better than OLS. To test whether the differences in likelihood are statistically significant, likelihood ratio tests were performed. The results of the tests are reported in the last three columns of Table 4. The first test compares the Student-t specification with OLS, taking OLS as the null hypothesis. The likelihood ratio statistic for this test is a Chi-square variate with one degree of freedom. Based on the 5% significance level, we see that the Student-t yields a significantly better fit for 12 stocks and did not perform worse than OLS for the remaining 10 stocks in the sample. Interestingly, the distributions of the 12 stocks are characterized by relatively high degrees of asymmetry and kurtosis compared to the whole sample. The median skewness of the 12 stocks is 0.90 (sample: 0.53) and the median kurtosis is 6.23 (sample: 4.65). Clearly, the Student-t's ability to describe the data better than OLS in spite of skewness is due to very heavy tails in the returns distribution for these 12 stocks.

The second likelihood ratio test compares the GET specification with OLS, again taking OLS as the null hypothesis. The likelihood ratio statistic for this test is a Chi-square variate with two degrees of freedom. Results for this test are given in the second last column of Table 4. We find that at the 5% level, the GET describes the data significantly better than OLS in 13 stocks. These include all the 12 stocks mentioned previously where the Student-t distribution provided a better fit than OLS, plus an additional stock (LumChang) which has a highly skewed distribution (skewness: 1.02) and a moderate degree of excess kurtosis (kurtosis: 3.63).
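The likelihood ratio comparisons described so far can be sketched as follows. The log-likelihood values are hypothetical, the function names are illustrative, and the tail probability is coded only for one and two degrees of freedom, where closed forms exist.

```python
import math

def lr_statistic(loglik_null, loglik_alt):
    """Likelihood ratio statistic: twice the gain in maximized log-likelihood."""
    return 2.0 * (loglik_alt - loglik_null)

def chi2_upper_tail(x, df):
    """P(Chi-square(df) > x) for df = 1 or df = 2 only."""
    if x <= 0:
        return 1.0
    if df == 1:
        return math.erfc(math.sqrt(x / 2.0))
    if df == 2:
        return math.exp(-x / 2.0)
    raise ValueError("this sketch handles only df = 1 or df = 2")

# Hypothetical maximized log-likelihoods for one stock.
loglik_ols, loglik_t, loglik_get = 76.0, 79.5, 80.0

lr_t = lr_statistic(loglik_ols, loglik_t)       # one extra parameter
lr_get = lr_statistic(loglik_ols, loglik_get)   # two extra parameters

print(f"Student-t vs OLS: LR={lr_t:.2f}, p={chi2_upper_tail(lr_t, 1):.4f}")
print(f"GET vs OLS:       LR={lr_get:.2f}, p={chi2_upper_tail(lr_get, 2):.4f}")
```

The 5% critical values implied by these tail probabilities are 3.84 for one degree of freedom and 5.99 for two.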

Finally, to determine whether the GET distribution offers a better fit than the Student-t distribution, a third likelihood ratio test was performed with the Student-t as the null hypothesis. The results of this test confirm our a priori expectation that the GET should outperform the Student-t, especially in view of the fact that the distributions of many stocks in the sample were highly non-normal. Specifically, we find that of the 12 stocks where Student-t and GET distributions delivered higher likelihood than OLS, the GET outperformed the Student-t in 7 cases at the 5% level, and 10 cases at the 10% level (although it underperformed the Student-t in 3 cases). For convenience, these 13 stocks are listed in Table 5, along with statistics on the skewness and kurtosis of their distributions.

[TABULAR DATA FOR TABLE 4 OMITTED]

Table 5. Stocks with Significant Difference in Student-t and GET Likelihoods

| Stock | LR Test: GET vs t | Skewness | Kurtosis |
|---|---|---|---|
| Avimo | 7.92* | 0.97 | 4.50 |
| Cerebos | 6.32* | 0.31 | 12.39 |
| LumChang | 14.12* | 1.02 | 3.63 |
| NOL | 3.06+ | 3.43 | 22.0 |
| Singbus | 3.86* | 0.64 | 3.54 |
| SIA | 3.74+ | 1.04 | 4.80 |
| Singtron | 5.26* | 0.42 | 5.43 |
| UIC | 4.36* | 4.15 | 28.05 |
| WBL | 3.22+ | 1.03 | 5.80 |
| YHS | 5.16* | 0.91 | 4.92 |
| Hawpar | -5.44* | -0.30 | 2.91 |
| NatSteel | -3.76+ | 0.18 | 2.89 |
| Sembawang | -3.36+ | 0.11 | 2.65 |

Notes: This table summarises ten stocks for which the GET outperforms the Student-t in terms of the likelihood ratio test with the Student-t distribution as the null hypothesis, and three stocks for which the GET underperforms the Student-t. Skewness and kurtosis are descriptive statistics of the OLS residuals. Entries for this table are extracted from Table 1 and Table 2. \* denotes significance at the 5% level and + denotes significance at the 10% level.

Table 5 shows that the median skewness and kurtosis for the 10 cases where the GET improved over the Student-t are 1.0 and 5.18 respectively. Thus, 50% of these cases have a skewness coefficient above 1.0. Clearly, this is a group of stocks with highly asymmetrical distributions. The GET distribution ought to provide a better fit for this group, and the results show that it did.
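The medians quoted above can be checked directly from the Table 5 entries for the ten stocks where the GET outperformed the Student-t. The short sketch below does this with the Python standard library; the variable names are illustrative, and the figures are copied from Table 5 as printed.

```python
from statistics import median

# (skewness, kurtosis) for the ten stocks in Table 5 where GET beat Student-t.
get_better = {
    "Avimo": (0.97, 4.50),
    "Cerebos": (0.31, 12.39),
    "LumChang": (1.02, 3.63),
    "NOL": (3.43, 22.0),
    "Singbus": (0.64, 3.54),
    "SIA": (1.04, 4.80),
    "Singtron": (0.42, 5.43),
    "UIC": (4.15, 28.05),
    "WBL": (1.03, 5.80),
    "YHS": (0.91, 4.92),
}

skews = [s for s, _ in get_better.values()]
kurts = [k for _, k in get_better.values()]

median_skew = median(skews)   # average of the two middle values, 0.97 and 1.02
median_kurt = median(kurts)   # average of the two middle values, 4.92 and 5.43
share_above_one = sum(s > 1.0 for s in skews) / len(skews)

print(f"median skewness = {median_skew:.3f}")
print(f"median kurtosis = {median_kurt:.3f}")
print(f"share with skewness above 1.0 = {share_above_one:.0%}")
```

The computed medians round to the 1.0 and 5.18 reported in the text, and five of the ten stocks have skewness above 1.0.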

In summary, the GET distribution is found to provide a better description of the data than either the normal distribution or the Student-t distribution. For the sample of stocks used in this study, GET betas were found to be generally smaller than OLS betas, indicating that stocks may have lower systematic risks than is implied by OLS betas. This result has important practical implications in portfolio management as well as financial management. For example, in portfolio management, fund managers may wish to switch from low beta stocks to high beta or "aggressive" stocks in order to time a rising market. Using OLS betas, their portfolios may unwittingly end up with stocks that have rather low market risks, the opposite of what is intended. In financial management, reliance on OLS betas may result in overestimation of required returns, and hence, rejection of many otherwise sound investment projects.

The analysis so far is based on the ability of the three distributions (normal, Student-t and GET) to fit the data in-sample. However, it is also of interest to assess the relative forecasting performance of betas estimated under the three distributions. A major difficulty in using ex-post betas for prediction is that returns may be non-stationary, so that ex-post betas may be "stale" estimates of ex-ante betas. To minimize the stale beta problem, we limit ourselves to one-month-ahead forecasts over a relatively short horizon of one year. Betas estimated over the five-year period 1990-1994 were used to forecast returns for each month in 1995. The risk-free rate as at the end of month t - 1 was used to construct the forecast for month t. A constant market risk premium was used for all the forecasts, taken as the average excess return on the SES All-Singapore Index over 1990-1994. For each stock, we compute the mean square error (MSE) of prediction based on OLS, Student-t and GET betas. Table 6 presents the percentage improvement in mean square errors from using Student-t and GET betas over OLS betas. This is computed as follows:

$$\frac{\mathrm{MSE}_1 - \mathrm{MSE}_2}{\mathrm{MSE}_1} \times 100 \qquad (12)$$

where $\mathrm{MSE}_1$ is the mean square error arising from OLS betas, and $\mathrm{MSE}_2$ is the mean square error arising from either Student-t or GET betas.
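A minimal sketch of the forecasting exercise and of equation 12 is given below. It assumes that each one-month-ahead forecast takes the usual CAPM form (risk-free rate plus beta times a constant market risk premium) with a zero intercept; the paper does not spell out the forecast equation at this point, so that form is an assumption. All numbers and function names are hypothetical and serve only to show the arithmetic.

```python
def capm_forecast(risk_free, beta, market_premium):
    """One-month-ahead return forecast: risk-free rate plus beta times the premium."""
    return risk_free + beta * market_premium

def mean_square_error(actual, predicted):
    """Average squared forecast error over the forecast months."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)

def pct_improvement(mse_ols, mse_alt):
    """Equation 12: percentage reduction in MSE relative to OLS betas."""
    return 100.0 * (mse_ols - mse_alt) / mse_ols

# Hypothetical inputs for one stock over three forecast months.
risk_free = [0.002, 0.002, 0.003]   # rate observed at the end of month t - 1
actual = [0.010, 0.015, 0.008]      # realized returns in month t
market_premium = 0.01               # constant premium from the estimation period
beta_ols, beta_get = 1.2, 1.0       # hypothetical beta estimates

forecast_ols = [capm_forecast(rf, beta_ols, market_premium) for rf in risk_free]
forecast_get = [capm_forecast(rf, beta_get, market_premium) for rf in risk_free]

mse_ols = mean_square_error(actual, forecast_ols)
mse_get = mean_square_error(actual, forecast_get)
improvement = pct_improvement(mse_ols, mse_get)

print(f"MSE (OLS) = {mse_ols:.2e}, MSE (GET) = {mse_get:.2e}")
print(f"percentage improvement = {improvement:.2f}%")
```

A positive value means the alternative beta produced the smaller mean square error, which is how the entries of Table 6 should be read.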

The results show that, with the exception of three or four stocks, the use of Student-t or GET betas generally results in lower mean square errors than OLS betas. For example, using GET betas, improved forecasting accuracy was recorded for 17 stocks. The magnitude of the improvement, however, varies widely, from 0.02% per month to 2.41% per month. Across all 22 stocks, the median percentage improvement in MSE is a modest 0.21% per month for GET betas. The corresponding figure for Student-t betas is 0.069% per month. Nonetheless, these improvements in MSE are statistically significant at the 5% level for both Student-t and GET betas, based on a parametric t test of difference in means. The same conclusion holds when the non-parametric Wilcoxon test of medians is used instead. The out-of-sample results therefore provide further evidence on the utility of estimating betas robustly.
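One way to read the parametric test is as a one-sample t test of whether the mean percentage improvement across the 22 stocks is zero. The exact construction of the test is not stated here, so this reading is an assumption. The sketch below applies it to the Student-t column of Table 6 as printed. Because the table entries are rounded, the resulting statistic is close to, but not identical with, the reported value of 1.77.

```python
from math import sqrt
from statistics import mean, stdev

# Student-t column of Table 6: percentage improvement in MSE over OLS, 22 stocks.
student_t_improvement = [
    0.38, 0.02, -0.61, 0.00, 0.06, 0.12, 0.21, 0.00, 1.41, 0.00, 2.01,
    0.09, 0.28, 0.07, 2.00, 0.02, 0.29, 0.07, -0.27, 0.15, 0.02, -0.55,
]

def one_sample_t(values):
    """t statistic for the null hypothesis that the population mean is zero."""
    n = len(values)
    return mean(values) / (stdev(values) / sqrt(n))

n_stocks = len(student_t_improvement)
avg_improvement = mean(student_t_improvement)
t_stat = one_sample_t(student_t_improvement)

print(f"n = {n_stocks}, mean improvement = {avg_improvement:.2f}%")
print(f"t statistic = {t_stat:.2f} on {n_stocks - 1} degrees of freedom")
```

The mean reproduces the 0.26 shown in the table. The same function could be applied to the GET column.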

Conclusions

The impact of allowing for skewness and excess kurtosis in estimating CAPM betas was examined using the Generalized Student-t (GET) distribution. Betas were estimated for 22 stocks listed on the Singapore Stock Exchange under the alternative assumptions that the regression residuals follow a normal, Student-t or GET distribution.

The distributions of OLS residuals for the 22 stocks were generally found to be skewed and leptokurtic. The GET distribution was able to fit the return innovations significantly better than the Student-t and the normal distributions. Reflecting the concentration of large outlying returns in the sample, OLS betas were generally found to overestimate systematic risk compared to GET betas. Based on a small out-of-sample experiment, GET betas and, to a lesser extent, Student-t betas were found to outperform OLS betas in forecasting ability.

Table 6. Percentage Improvement in Mean Square Errors

| Stock | Percentage Improvement over OLS: Student-t | Percentage Improvement over OLS: GET |
|---|---|---|
| Avimo | 0.38 | 0.36 |
| C&C | 0.02 | 0.02 |
| Cerebos | -0.61 | 0.83 |
| F&N | 0.00 | 0.00 |
| Hawpar | 0.06 | 0.23 |
| Inchcape | 0.12 | 0.20 |
| Intraco | 0.21 | 0.07 |
| Keppel | 0.00 | -0.07 |
| LumChang | 1.41 | 1.62 |
| Metro | 0.00 | 0.04 |
| NOL | 2.01 | 2.41 |
| Natsteel | 0.09 | 0.22 |
| Singbus | 0.28 | 1.73 |
| Sembawang | 0.07 | 0.78 |
| SIA | 2.00 | 1.30 |
| Singtron | 0.02 | 0.03 |
| SPH | 0.29 | 0.35 |
| Times Pub | 0.07 | 0.42 |
| UIC | -0.27 | -0.19 |
| WBL | 0.15 | 0.46 |
| WingTai | 0.02 | 0.04 |
| YHS | -0.55 | -0.68 |
| Mean | 0.26 | 0.39 |
| Median | 0.069 | 0.21 |
| t statistic (prob) | 1.77(*) (0.046) | 2.20(*) (0.0069) |
| Wilcoxon Signed Rank Test: z-statistic (prob) | 2.05(*) (0.020) | 2.89(*) (0.0019) |

Notes: Table reports the forecasting performance of betas estimated under Student-t and GET distributions relative to OLS betas. Estimation period is the 60 months from January 1990 to December 1994. Estimated betas were used to form one-month-ahead return forecasts for each of the 12 months in 1995. Mean square forecast error (MSE) is then computed for each stock by averaging the squared monthly forecast errors. Percentage improvement in MSE is calculated as 100 x (MSE_1 - MSE_2)/MSE_1, where MSE_1 is the mean square error based on OLS betas, and MSE_2 is the mean square error based on the assumption of a Student-t or GET distribution for the regression residuals. Two tests of difference in MSE are reported. The first is a t test of difference in average MSE across all 22 stocks and the second is Wilcoxon's non-parametric test of difference in median MSE across the 22 stocks. Figures in parentheses refer to one-tail probability values. * denotes significance at the 5% level.

Apart from providing a method for estimating more robust betas, the results of this paper also have implications for a number of other issues in finance. For example, event studies typically focus on the behavior of market model or CAPM residuals surrounding an "event." The information content of the event is then judged by performing standard t tests on the standardized residuals. The use of such tests, however, assumes that these residuals are i.i.d. normal. This paper indicates that the normality assumption is likely to be violated. Non-normality in turn renders conventional t or F tests invalid, at least in small samples (see Amemiya, 1985). Thus, for event studies constrained by sample size, non-parametric tests such as the Sign test or Wilcoxon test may yield more reliable inferences than the standard tests.
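To illustrate why the Sign test does not depend on normality, the sketch below computes an exact two-sided p-value from the binomial distribution, using only the signs of the residuals. The abnormal returns are hypothetical and the function name is invented for this example; it is not a procedure taken from the paper.

```python
from math import comb

def sign_test_p_value(values):
    """Exact two-sided Sign test of the null hypothesis of a zero median.

    Zeros are dropped; under the null, the number of positive signs is
    Binomial(n, 0.5), whatever the shape of the underlying distribution.
    """
    nonzero = [v for v in values if v != 0]
    n = len(nonzero)
    positives = sum(v > 0 for v in nonzero)
    tail = min(positives, n - positives)
    p = 2 * sum(comb(n, i) for i in range(tail + 1)) / 2 ** n
    return min(1.0, p)

# Hypothetical event-window abnormal returns for eight firms (illustrative only).
abnormal_returns = [0.021, 0.008, 0.035, -0.004, 0.012, 0.050, 0.003, 0.017]

p_value = sign_test_p_value(abnormal_returns)
print(f"two-sided Sign test p-value = {p_value:.4f}")
```

A single extreme return changes the t statistic materially but leaves the Sign test unaffected, since only its sign enters the calculation.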

Our results also have some bearing on tests of the CAPM. For example, if the CAPM is tested by regressing excess returns as in equation 1, then the intercept term should be zero if the CAPM holds. However, if regression residuals are non-normal, tests of a zero intercept may be severely misleading. Specifically, when return innovations are positively skewed, then, as our results show, OLS betas estimated by assuming normal errors will be larger, while intercept estimates will be smaller, than estimates obtained by using a positively skewed distribution. Conversely, the intercept estimate will be "inflated" if the innovations are negatively skewed. Such problems will affect most tests of the CAPM, including the commonly used two-pass regression technique of Black, Jensen and Scholes (1972) and Fama and MacBeth (1973). The problem, however, can be reduced by using a more flexible distribution such as the Generalized Student-t, which allows for both skewness and leptokurtosis in the data.

Finally, it is instructive to consider the implications of our empirical results for equilibrium asset pricing theories. Two-moment asset pricing models like the CAPM that assume mean-variance optimization are usually justified by appealing to the approximate normality of return innovations. This study indicates that normality appears to be the exception rather than the rule. Nevertheless, as Ingersoll (1987) points out, the mean-variance framework is consistent with a broader class of symmetric elliptical distributions, which includes the Student-t as a special case. This is significant, for it implies that models like the CAPM can accommodate distributions with thicker tails than the normal distribution. Moreover, this result is readily generalizable to multibeta versions of the CAPM, thus accommodating a far richer range of pricing models (see Lee et al., 1996). The problem of skewness, however, is still unresolved. Raw skewness per se should be irrelevant because the idiosyncratic component can be diversified away in a portfolio. As with systematic risk, co-skewness may be priced, provided this is a persistent feature of the data (Singleton and Wingender, 1986). This is clearly an empirical issue. Thus far, evidence for the three-moment CAPM incorporating co-skewness has been inconclusive (see Tan, 1994). Thus, from an asset pricing standpoint, the utility of incorporating asymmetry in the distribution of asset returns remains to be seen.

Acknowledgments: The author is thankful for help received from Vance Martin and Whitney Newey as well as helpful comments from an anonymous referee. All errors or omissions are the author's responsibility.

[TABULAR DATA FOR APPENDIX OMITTED]

Notes

1. The SES All Singapore Index, which has 1975 as its base year, was known as the SES All Share Index prior to the delisting of Malaysian stocks from the SES in January 1990. The SES All Share Index is constructed in a similar way to the SES All Singapore Index except that the former includes both Singapore and Malaysian stocks.

References

Akgiray, V. and G.G. Booth. 1988. Stock Price Processes with Discontinuous Time Paths: An Empirical Examination. Financial Review, 21: 163-184.

Amemiya, T. 1985. Advanced Econometrics. U.K.: Basil Blackwell.

Baillie, R.T. and T. Bollerslev. 1989. The Message in Daily Exchange Rates: A Conditional Variance Tale. Journal of Business and Economic Statistics, 7: 297-305.

Ball, C.A. and W.N. Torous. 1985. On Jumps in Common Stock Prices and their Impacts on Call Option Pricing. Journal of Finance, 40: 155-173.

Barndorff-Nielsen, O.E. 1978. Information and Exponential Families in Statistical Theory. New York: John Wiley.

Berk, R.H. 1972. Consistency and Asymptotic Normality of MLE's for Exponential Models. Annals of Mathematical Statistics, 43: 193-204.

Bickel, P.J. 1982. On Adaptive Estimation. Annals of Statistics, 10: 647-671.

Black, F., M.C. Jensen and M. Scholes. 1972. The Capital Asset Pricing Model: Some Empirical Tests. Studies in the Theory of Capital Markets, edited by M.C. Jensen. New York: Praeger Publishers.

Blattberg, R.C. and N.J. Gonedes. 1974. A Comparison of the Stable and Student Distribution as Statistical Models for Stock Prices. Journal of Business, 47: 244-280.

Bollerslev, T. 1986. Generalized Autoregressive Conditional Heteroskedasticity. Journal of Econometrics, 31: 307-327.

Bollerslev, T. 1987. A Conditionally Heteroskedastic Time Series Model for Speculative Prices and Rates of Returns. Review of Economics and Statistics, 69: 542-547.

Butler, R.J., J.B. McDonald, R.D. Nelson and S.B. White. 1990. Robust and Partially Adaptive Estimation of Regression Models. Review of Economics and Statistics, 72: 321-326.

Clark, P.K. 1973. A Subordinated Stochastic Process Model with Finite Variance for Speculative Prices. Econometrica, 41: 135-156.

Cobb, L. 1978. Stochastic Catastrophe Models and Multimodal Distributions. Behavioral Science, 23: 360-374.

Cumby, R.E. and J.D. Glen. 1990. Evaluating the Performance of International Mutual Funds. Journal of Finance, 45: 497-521.

D'Agostino, R.B., A. Belanger and R.B. D'Agostino, Jr. 1990. A Suggestion for Using Powerful and Informative Tests of Normality. The American Statistician, 44: 316-321.

Diebold, F.X. 1988. Empirical Modeling of Exchange Rates. New York: Springer Verlag.

Drost, F.C. and T.E. Nijman. 1993. Temporal Aggregation of GARCH Processes. Econometrica, 61: 909-927.

Engle, R.F. 1982. Autoregressive Conditional Heteroskedasticity with Estimates of the Variance of U.K. Inflation. Econometrica, 50: 987-1008.

Fama, E. 1965. The Behavior of Stock Market Prices. Journal of Business, 38: 34-105.

Fama, E. and J. MacBeth. 1973. Risk, Return and Equilibrium: Empirical Tests. Journal of Political Economy, 81: 607-636.

Fama, E. and K.R. French. 1992. The Cross-Section of Expected Stock Returns. Journal of Finance, 47: 427-465.

Gallant, A.R., D.A. Hsieh and G. Tauchen. 1991. Semiparametric Estimation of Conditionally Constrained Heterogeneous Processes: Asset Pricing Applications. Econometrica, 57: 1091-1129.

Godfrey, L.G. 1979. Testing the Adequacy of a Time Series Model. Biometrika, 66: 67-72.

Godfrey, L.G. 1987. Discriminating between Autocorrelation and Misspecification in Regression Analysis: an Alternative Test Strategy. Review of Economics and Statistics, 69: 128-134.

Godfrey, L.G. and A.R. Tremayne. 1988. Checks of Model Adequacy for Univariate Time Series Models and Their Application to Econometric Relationships. Econometric Reviews, 7: 1-42.

Harvey, C.R. 1995. Predictable Risk and Returns in Emerging Markets. Review of Financial Studies, 8: 773-816.

Haugen, R.A. 1996. Finance from a New Perspective. Financial Management, 25: 86-97.

Hsieh, D.A. 1989. Modeling Heteroskedasticity in Daily Foreign Exchange Rates. Journal of Business and Economic Statistics, 7: 307-317.

Huber, P.J. 1981. Robust Statistics. New York: Wiley.

Ingersoll, J.E. 1987. The Theory of Financial Decision Making. NJ: Rowman and Littlefield.

Jarrow, R. and E. Rosenfeld. 1984. Jump Risk and the Intertemporal Capital Asset Pricing Model. Journal of Business, 57: 337-351.

Kon, S.J. 1984. Models of Stock Returns: A Comparison. Journal of Finance, 39: 147-165.

Kon, S.J. 1994. Alternative Models for the Conditional Heteroscedasticity of Stock Returns. Journal of Business, 67: 563-599.

Kothari, S.P., J. Shanken and R.G. Sloan. 1995. Another Look at the Cross-Section of Expected Stock Returns. Journal of Finance, 50: 185-224.

Lee, C.F., H. Reisman and Y. Shaman. 1996. A Note on the Generalized Multibeta CAPM. Mathematical Finance, 4: 67-68.

Lintner, J. 1965. The Valuation of Risk Assets and the Selection of Risky Investments in Stock Portfolios and Capital Assets. Review of Economics and Statistics, 47: 13-37.

Luenberger, D.G. 1984. Linear and Nonlinear Programming. MA: Addison Wesley.

Lye, J.N. and V.L. Martin. 1993. Robust Estimation, Nonnormalities and Generalized Exponential Distributions. Journal of the American Statistical Association, 88: 261-267.

Mandelbrot, B. 1963. The Variation of Certain Speculative Prices. Journal of Business, 36: 394-419.

Manski, C. 1984. Adaptive Estimation of Non-linear Regression Models. Econometric Reviews, 3: 145-194.

McDonald, J.B. and W.K. Newey. 1988. Partially Adaptive Estimation of Regression Models via the Generalized t Distribution. Econometric Theory, 4: 428-457.

McDonald, J.B. and S.B. White. 1993. A Comparison of Some Robust, Adaptive and Partially Adaptive Estimators of Regression Models. Econometric Reviews, 12: 103-124.

Mossin, J. 1966. Equilibrium in a Capital Asset Market. Econometrica, 34: 768-783.

Newey, W.K. 1988. Adaptive Estimation of Regression Models via Moment Restrictions. Journal of Econometrics, 38: 301-339.

Praetz, P.D. 1972. The Distribution of Share Price Changes. Journal of Business, 45: 49-55.

Press, S.J. 1967. A Compound Events Model for Security Prices. Journal of Business, 40: 317-335.

Shanken, J. 1990. Intertemporal Asset Pricing: an Empirical Investigation. Journal of Econometrics, 45: 99-120.

Shanken, J. and C.W. Smith. 1996. Implications of Capital Markets Research for Corporate Finance. Financial Management, 25: 98-104.

Sharpe, W.F. 1964. Capital Asset Prices: a Theory of Equilibrium under Conditions of Risk. Journal of Finance, 19: 425-442.

Singleton, J.C. and J. Wingender. 1986. Skewness Persistence in Common Stock Returns. Journal of Financial and Quantitative Analysis, 21: 330-334.

Tan, J.K. 1994. Risk, Return and the Three Moment Capital Asset Pricing Model: Another Look. Journal of Banking and Finance, 15: 449-460.

Tauchen, G. 1983. The Price-Volume Relationship in Speculative Markets. Econometrica, 51: 485-505.

Tucker, A.L. 1992. A Reexamination of Finite and Infinite-Variance Distributions as Models of Daily Stock Returns. Journal of Business and Economic Statistics, 10: 73-81.


Title Annotation: coefficient for rating stocks

Author: Wai Mun Fong

Publication: Review of Financial Economics

Date: Mar 22, 1997
