Investor Sentiment Index and Option Price Volatility Based on MIDAS Model: Evidence from China

This paper uses transaction data from the option market and Internet data from June 1, 2015 to February 2, 2018. Principal component analysis is used to construct investor sentiment indexes. OLS and MIDAS models are then employed to study the influence of investor sentiment on the implied volatility of Shanghai Stock Exchange 50ETF options. The empirical results show that the MIDAS model, which incorporates more high-frequency information, has stronger explanatory power than the same-frequency model. The investor sentiment index based on traditional indicators has a negative effect on option implied volatility, while excessive investor attention on the Internet exerts positive pressure on the option market. These conclusions help explain the mechanism through which investor sentiment affects option implied volatility. It is therefore of practical significance to study the influence of investor sentiment on option price volatility in China.


Introduction
Investor sentiment is the emotion that investors form from their perception of various kinds of information. It embodies a systematic bias in investors' expectations of the future and thereby affects the sustainability of the securities market. In China's financial market, the implied volatility of the actively traded option market is formed by market trading behavior and represents investors' expectations of future price volatility. It plays an important guiding role in option pricing and optimization, the allocation of option assets, and regulatory policy. It is therefore important to study the interaction between investor sentiment and implied volatility, especially for risk prevention and the sustainable development of the securities market.
Bing Han [1] studied the impact of changes in investor sentiment on S&P 500 option prices. Zhang and Lin [2] found a positive nonlinear spillover effect between investor sentiment and the stock market return rate. Xiong et al. [3] found that, between market environments with and without trading restrictions, investor sentiment had opposite effects on the risk management ability of CSI 300 stock index futures and effects of different depth on their price discovery ability. Few studies, however, address the relationship between investor sentiment and implied volatility. Moreover, the empirical models in the studies above share a common feature: the explanatory variable and the explained variable must have the same data frequency, a requirement that often cannot be satisfied in practice. The traditional remedy is either to convert high-frequency data to low-frequency data by averaging, or to convert low-frequency data to high-frequency data by interpolation. Both approaches can cause information loss or data distortion, which may ultimately reduce the credibility of the research.
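The information loss caused by averaging can be illustrated with a small sketch. The two daily series below are made-up numbers, not data from the paper, and the function name is hypothetical. They share the same mean, so a same-frequency model that only sees the averaged value cannot distinguish a calm month from a volatile one.

```python
from statistics import mean, pstdev

# Hypothetical daily sentiment readings for two months (illustrative values only).
calm_month = [0.50, 0.50, 0.50, 0.50, 0.50, 0.50]
volatile_month = [0.10, 0.90, 0.10, 0.90, 0.10, 0.90]


def to_low_frequency(daily_values):
    """Aggregate high-frequency observations into one low-frequency value by averaging."""
    return mean(daily_values)


calm_avg = to_low_frequency(calm_month)
volatile_avg = to_low_frequency(volatile_month)

# The two averages coincide (up to rounding), although the within-month
# dispersion of the daily values is very different.
print(round(calm_avg, 6), round(volatile_avg, 6))
print(pstdev(calm_month), pstdev(volatile_month))
```

A mixed-frequency model keeps the individual daily observations as regressors, so the within-period pattern that averaging discards remains available to the regression.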
To study the impact of investor sentiment on option market volatility more comprehensively, the paper makes three contributions. First, unlike prevailing studies, which construct the investor sentiment index only from transaction data related to the securities market, we also introduce the Baidu Search Index as a novel sentiment proxy. Second, our keyword extraction differs from previous research [4], in which the keywords were related only to several specific asset names. We collected posts and comments from Oriental Fortune, a large investment social media platform in China, using web scraping, and then obtained the keywords through text analysis. This procedure measures investor sentiment more comprehensively. Third, the mixed-frequency data model MIDAS is used to analyze the impact of investor sentiment on the implied volatility of options. The paper constructs daily, weekly and monthly investor sentiment indexes, so that the effects of mixed-frequency sentiment on the implied volatility of options can be studied on a finer time scale.

Option implied volatility
Implied volatility refers to the volatility implied by option prices observed in the market. Fleming et al. [5] studied the volatility index derived from S&P 100 index options, which has been shown to better reflect investors' expectations of the option market. The China iVIX index, issued by the Shanghai Stock Exchange, measures the expected volatility of the Shanghai 50ETF over the next 30 days. It was first published on June 1, 2015 and ceased publication on February 13, 2018. Because of the clear negative correlation between the China iVIX index and the market index, it can be referred to as the investors' "panic index" [6]. The paper therefore uses the closing price of the China iVIX index to describe the implied volatility of the SSE 50ETF option market.

Constructing the investor sentiment index based on traditional indicators
Traditional methods usually construct the investor sentiment index from indirect indicators, namely transaction data related to the stock market. Because these transactions result from investment decisions that investors make according to their psychological characteristics, the data are inherently correlated with investor sentiment and can reflect it indirectly. For example, closed-end fund indicators can be adopted as an investor sentiment index [7,8]. Compared with a single indicator, a composite sentiment index can quantify the sentiment of option investors more comprehensively and from multiple angles, so some researchers have adopted composite indicators. Baker and Wurgler [9] defined sentiment based on the closed-end fund discount, NYSE share turnover, the number of and average first-day returns on IPOs, the equity share in new issues, and the dividend premium. Kim et al. [10] later adopted the same index.
To construct a composite index for the Chinese options market, three representative indicators are selected as the original variables: the option contract volume, the open interest, and the put/call trading volume ratio, denoted $x_1$, $x_2$ and $x_3$, respectively. We adopt the principal component analysis method that Baker and Wurgler [9] used to construct their sentiment index. The data are standardized first and, to eliminate the influence of differing variances, the principal components are extracted from the correlation coefficient matrix. The composite index is the first principal component, which explains 73.69% of the (standardized) sample variance. This yields a daily sentiment index based on traditional indicators, given in Eq. 1.
Fig. 1 illustrates the rationale for the investor sentiment composite index. It shows the time series of all variables over the sample period. The three variables are highly correlated with the China iVIX index, which supports the construction of the traditional investor sentiment index. Fig. 1(d) further shows that the traditional investor sentiment index is negatively correlated with option implied volatility.
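The principal component construction described above can be sketched as follows. This is a minimal illustration, not the paper's code: the function name is hypothetical and the three input columns are synthetic stand-ins for the option contract volume, open interest and put/call volume ratio, so neither the loadings nor the explained share correspond to Eq. 1 or to the reported 73.69%.

```python
import numpy as np


def first_pc_index(X):
    """Return (index, loadings, explained_share) of the first principal component.

    X has shape (n_days, n_indicators). Components are extracted from the
    correlation matrix, i.e. from the standardized indicators.
    """
    X = np.asarray(X, dtype=float)
    # Standardize each indicator to zero mean and unit (sample) variance.
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    # Correlation matrix of the raw indicators.
    R = np.corrcoef(X, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(R)  # eigenvalues in ascending order
    loadings = eigvecs[:, -1]             # eigenvector of the largest eigenvalue
    # The sign of an eigenvector is arbitrary; make the dominant loading positive.
    if loadings[np.argmax(np.abs(loadings))] < 0:
        loadings = -loadings
    explained_share = eigvals[-1] / eigvals.sum()
    return Z @ loadings, loadings, explained_share


# Synthetic data (assumption): three noisy indicators driven by one common factor.
rng = np.random.default_rng(0)
common = rng.normal(size=250)
X = np.column_stack([
    common + 0.3 * rng.normal(size=250),
    common + 0.3 * rng.normal(size=250),
    -common + 0.3 * rng.normal(size=250),
])

index, loadings, explained_share = first_pc_index(X)
print(loadings, round(float(explained_share), 4))
```

Because the eigenvector sign is arbitrary, the direction of the resulting index has to be fixed by convention before its correlation with implied volatility is interpreted.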

Constructing the investor sentiment index based on the Baidu Index
The rapid development of the Internet has provided new support for research on investor psychology and behavior. Investors are increasingly willing to search for information and to express their opinions and attitudes on online social platforms. Social media data can therefore be used to measure investor sentiment and to better explain the impact of investors on the stock market.
Some researchers have used posts from microblogging platforms such as Twitter and Weibo to show the relationship between public sentiment and the stock market. Kim et al. [11] collected text messages from Yahoo! and found that investor sentiment had predictive ability for stock returns. Wang et al. [12] were the first to use the Baidu Search Index in research. Using public opinion and Internet information as a source of sentiment information has thus become a common method, and it has been verified that such information can reflect investors' emotions directly [4].
We scraped the text of the posts and comments in the Shanghai Stock Exchange 50ETF (SSE 50ETF) Option Bar on the Oriental Fortune website, obtaining 2,326 comments from June 1, 2015 to February 2, 2018. From this text we extracted the words most frequently used by investors, which are the topics that investors care about most and that are most closely connected with the option market.
Keywords were selected if their frequency was greater than 100 and they were included in the Baidu Index. The resulting 13 proxy variables (option, 50ETF, call option, market, trading, market trend, Shanghai Stock Exchange, index, buy, account opening, account, threshold and short) are labeled 1 through 13.
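The keyword selection rule can be sketched as a simple frequency filter. The sketch assumes the scraped text has already been segmented into tokens; the function name, the toy corpus and the set of indexed terms are hypothetical, and the toy threshold of 3 stands in for the threshold of 100 used in the paper.

```python
from collections import Counter


def select_keywords(tokens, min_count, indexed_terms):
    """Keep terms that occur more than min_count times and appear in indexed_terms.

    The result is sorted by descending frequency, with ties broken alphabetically.
    """
    counts = Counter(tokens)
    kept = [
        term for term, n in counts.items()
        if n > min_count and term in indexed_terms
    ]
    return sorted(kept, key=lambda term: (-counts[term], term))


# Toy corpus (assumption): tokens already extracted from posts and comments.
tokens = ["option"] * 5 + ["50ETF"] * 4 + ["hello"] * 6 + ["short"] * 2
# Hypothetical set of terms for which a search index series is available.
indexed_terms = {"option", "50ETF", "short"}

selected = select_keywords(tokens, min_count=3, indexed_terms=indexed_terms)
print(selected)
```

In this toy run, "hello" is frequent but not an indexed term and "short" is indexed but too rare, so only "option" and "50ETF" pass both conditions.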

Methodology
Ghysels et al. [13] proposed the MIDAS model based on the distributed lag model. It can effectively capture the effect of high-frequency explanatory variables on low-frequency explained variables while avoiding information loss or data distortion. The univariate MIDAS model is given in Eq. 4, where $m$ is the frequency ratio between the high-frequency data and the low-frequency data. As pointed out in [14], Almon polynomial functions are mostly used in the prediction and analysis of financial market volatility, while exponential Almon polynomial functions and Beta polynomial functions are mostly used in the analysis and prediction of macroeconomic variables. In this paper, the Almon polynomial function is selected as the weight function, as in Eq. 6.
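For orientation, a commonly used textbook form of the univariate MIDAS regression with Almon polynomial weights is sketched below. The notation ($y_t$, $x^{(m)}_t$, $\beta_0$, $\beta_1$, $\theta_p$, $P$, $\varepsilon_t$) is an assumption made here for illustration and may differ from the paper's own equations.

```latex
\begin{align}
  y_t &= \beta_0 + \beta_1\, B\!\left(L^{1/m};\theta\right) x^{(m)}_t + \varepsilon_t, \\
  B\!\left(L^{1/m};\theta\right) &= \sum_{k=0}^{K} w(k;\theta)\, L^{k/m},
  \qquad L^{k/m} x^{(m)}_t = x^{(m)}_{t-k/m}, \\
  w(k;\theta) &= \sum_{p=0}^{P} \theta_p\, k^{p}.
\end{align}
```

Here $K$ is the number of high-frequency lags and $P$ is the order of the Almon polynomial. When the Almon weights are left unnormalized, $\beta_1$ is not separately identified and is usually absorbed into the coefficients $\theta_p$.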
For the parameter vector, we set $p = 2$ following Clements and Galvao [15]. For the selection of the lag order $K$, Gao and Yang [16] compared the adjusted goodness of fit and the significance level of the parameter estimates under different lag orders. In this paper, we select the optimal lag order according to the significance level of the parameter estimates. The model established in the paper is therefore Eq. 7.
where $V_t$ is the closing price of the China iVIX index.
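How a MIDAS regression with Almon polynomial weights can be estimated is sketched below. The sketch assumes unnormalized Almon weights $w_k = \theta_0 + \theta_1 k + \theta_2 k^2$, in which case the regression is linear in the parameters and ordinary least squares applies. All function names and the simulated data are hypothetical, and the specification is not necessarily the paper's Eq. 7.

```python
import numpy as np


def almon_regressors(x_high, m, K, P=2):
    """Build regressors z[t, p] = sum_k k**p * x_high[last_obs_of_period_t - k].

    x_high holds m high-frequency observations per low-frequency period.
    Returns (Z, first_t), where first_t is the first period with K lags available.
    """
    x_high = np.asarray(x_high, dtype=float)
    n_low = len(x_high) // m
    ks = np.arange(K + 1)
    powers = ks[:, None] ** np.arange(P + 1)[None, :]  # shape (K + 1, P + 1)
    first_t = -(-(K + 1) // m) - 1                     # ceil((K + 1) / m) - 1
    rows = []
    for t in range(first_t, n_low):
        end = (t + 1) * m - 1                          # last high-frequency obs in period t
        rows.append(x_high[end - ks] @ powers)         # lags 0..K, most recent first
    return np.array(rows), first_t


def fit_almon_midas(y_low, x_high, m, K, P=2):
    """Estimate intercept, Almon coefficients and implied lag weights by OLS."""
    Z, first_t = almon_regressors(x_high, m, K, P)
    y = np.asarray(y_low, dtype=float)[first_t:first_t + len(Z)]
    A = np.column_stack([np.ones(len(Z)), Z])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    beta0, theta = coef[0], coef[1:]
    ks = np.arange(K + 1)
    weights = (ks[:, None] ** np.arange(P + 1)[None, :]) @ theta
    return beta0, theta, weights


# Simulated example (assumption): m = 5 high-frequency observations per period.
rng = np.random.default_rng(1)
m, K, n_low = 5, 9, 200
x_high = rng.normal(size=m * n_low)
true_theta = np.array([0.5, -0.1, 0.005])

Z, first_t = almon_regressors(x_high, m, K)
y_low = np.zeros(n_low)  # periods before first_t are never used in the fit
y_low[first_t:] = 1.0 + Z @ true_theta + 0.01 * rng.normal(size=len(Z))

beta0, theta, weights = fit_almon_midas(y_low, x_high, m, K)
print(beta0, theta)
```

The polynomial restriction reduces the K + 1 lag coefficients to P + 1 parameters, which is what keeps the model parsimonious when many high-frequency lags enter the regression.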

Empirical analysis
The empirical analysis addresses two questions: (1) whether the investor sentiment index has a significant impact on option implied volatility, and (2) whether the MIDAS model has strong explanatory power. To rule out spurious regression, we test the stationarity of the time series before the empirical analysis. The results show that all variables pass the ADF test. To assess the explanatory power of the MIDAS model, Clements and Galvao [15] compared it with a benchmark model estimated by OLS on same-frequency data. Following this approach, the parameter estimates of the two models are given in Table 1 and Table 2, respectively. In both tables, ***, ** and * denote significance at the 1%, 5% and 10% levels, respectively.
Table 1 reports the estimation results of the benchmark model, and Table 2 reports those of the MIDAS models with different frequencies. For both the MIDAS and OLS models, the estimated coefficient of the TD index (the sentiment index based on traditional indicators) is significantly negative, while that of the BD index (the sentiment index based on the Baidu Index) is significantly positive. These results are consistent with the conclusions drawn from Fig. 1(d) and Fig. 2.
The fit of the models can be judged by comparing their adjusted $R^2$. When the explained variable is the monthly implied volatility of options, the MIDAS model with daily explanatory variables fits better than the same-frequency OLS model. The same result is obtained when weekly data are used. This indicates that the mixed-frequency model can have stronger explanatory power than the same-frequency model. However, the Monthly/Weekly MIDAS model fits worse than the same-frequency monthly OLS model. The explanatory power of the MIDAS model is therefore not uniformly better than that of the traditional regression model, but depends on factors such as the frequency ratio used. Fig. 3 shows the fitted results for the Monthly/Daily and Weekly/Daily data. It illustrates that high-frequency data can better capture the behavior of low-frequency financial data.
The empirical results above indicate that the Baidu Index reflects the level of investor attention to a certain extent. If investors are interested in an option product, they search more for relevant information. Conversely, an increase in search volume also affects investors' expectations and behavior, and thereby asset price fluctuations. Excessive investor attention on the Internet therefore exerts positive pressure on the option market and increases its implied volatility.
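The model comparison in this section relies on the adjusted $R^2$, which penalizes the ordinary $R^2$ for the number of estimated slope coefficients. A minimal sketch of the standard formula follows; the function name and the toy numbers are illustrative and do not reproduce any value in Table 1 or Table 2.

```python
import numpy as np


def adjusted_r2(y, y_hat, n_params):
    """Adjusted R^2 for a model with n_params slope coefficients plus an intercept."""
    y = np.asarray(y, dtype=float)
    y_hat = np.asarray(y_hat, dtype=float)
    n = len(y)
    ss_res = np.sum((y - y_hat) ** 2)     # residual sum of squares
    ss_tot = np.sum((y - y.mean()) ** 2)  # total sum of squares
    r2 = 1.0 - ss_res / ss_tot
    return 1.0 - (1.0 - r2) * (n - 1) / (n - n_params - 1)


# Toy observations and fitted values (illustrative numbers only).
y = [1.0, 2.0, 3.0, 4.0, 5.0]
y_hat = [1.1, 1.9, 3.2, 3.8, 5.0]

one_regressor = adjusted_r2(y, y_hat, n_params=1)
two_regressors = adjusted_r2(y, y_hat, n_params=2)
print(one_regressor, two_regressors)
```

For the same fitted values, the model that uses more parameters receives the lower adjusted $R^2$, which is why the measure is suited to comparing OLS and MIDAS specifications with different numbers of coefficients.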
To further verify the validity and robustness of the MIDAS model, the closing price of the China iVIX index is replaced by the trading volume of the SSE 50ETF. The regression results are consistent with those above, indicating that the findings of the paper are robust.

Conclusions
Using principal component analysis, the paper constructs two investor sentiment indexes from different angles and at different frequencies. MIDAS and OLS models are established to analyze the influence of investor sentiment on the implied volatility of options. The empirical results show that the investment behavior of investors has a significant negative impact on the implied volatility of options, while investors' excessive attention on the Internet increases the implied volatility of the option market. We also find that the mixed-frequency model has better explanatory power than the same-frequency model.
The findings reveal the internal mechanism through which investor sentiment influences option implied volatility and provide a reference for other researchers. Meanwhile, to ensure the stable, healthy and sustainable development of the securities market, it is recommended that the relevant government departments improve the information disclosure system, monitor changes in investor sentiment on official platforms, guide investors to invest rationally, and prevent both the malicious spread of false information on social media and the disruption of order in the securities market.