Sentiment Analysis of Yaya Event of Giant Panda in the United States based on LDA Topic Model

With the rapid rise of the Internet and social media, hot topics often become focal points of online public opinion. To explore netizens' views on one such topic, this paper crawls Weibo comments on the incident of Yaya, a giant panda on loan to the United States, and analyses their sentiment with an LDA topic model. The study shows that netizens suspect that Yaya suffered some abuse at the Memphis Zoo in the U.S. and urgently hope that Yaya will return to China as soon as possible to receive treatment and care in a domestic zoo. The results can help the relevant departments understand netizens' attitudes and tendencies towards hot topics in a timely manner and quickly infer how a topic will develop, so as to guide public opinion accordingly.


Introduction
In recent years, Sina Weibo, a large-scale online social media platform, has been used by a large number of Chinese netizens. As of the first quarter of 2023, Weibo had 593 million monthly active users and 255 million daily active users. Sina Weibo has gradually become a centre of online public opinion, and many netizens express their views on hot events through it. This paper focuses on netizens' Weibo comments on the incident of Yaya, a giant panda living in the United States. Yaya was born on 3 August 2000 in Beijing Zoo. On 7 April 2003, Yaya and the male panda Lele took a special flight to the Memphis Zoo in the U.S. to begin a 20-year stay. On 1 February 2023, the sudden death of Lele, who was about to return home, drew public attention to giant pandas living in the U.S., and in turn to Yaya. Tourists visiting the Memphis Zoo reported that Yaya showed stereotypical behaviour and that Yaya's physical condition was not optimistic, and calls to bring Yaya back to China grew stronger and stronger. On the afternoon of 27 April 2023, after the lease period expired, Yaya returned to China by special cargo plane [1]. Netizens discussed the incident heatedly and expressed their own views on it.

Literature review
Weibo is one of the largest social media platforms based on user relationships in China, and most netizens who use it leave comments on hot topics on the Internet. Sentiment analysis examines a text to determine the sentiment of the opinion holder towards an event. Sentiment analysis of Weibo comment data can provide an in-depth understanding of netizens' views and emotional tendencies towards an event, and offers a valuable reference and analytical basis for the relevant departments. Some researchers have already studied Weibo hot topics. Yue Yang et al. [2] used a sentiment analysis method combining keyword extraction and machine learning to analyse the sentiment of Weibo comments related to the crash of China Eastern Airlines flight MU5735, and statistically analysed Weibo hot-topic data after the crash. Liu Jingwei et al. [3] conducted sentiment analysis on Weibo hot topics by constructing a Bi-LSTM neural network in order to predict how the heat of those topics evolves.
LDA models are commonly used in social media, image processing, text classification and clustering, community methods and other fields [4]. Xu Heng et al. [5] determined the number of topics using perplexity and similarity, identified the research focus from topic strength, and studied topic evolution from the change of topic strength over time. Lai Xianjing [6] used an LDA model to analyse the text of MOOC course reviews, which is beneficial for building online education platforms. Among overseas scholars, Inoue Madoka et al. [7] used LDA models to explain the negative impact of the early COVID-19 pandemic on nursing research; Kozlowski Diego et al. [8] used LDA models for world trade analysis; and Sang-Woon Kim et al. [9] investigated a paper classification system based on TF-IDF and LDA.
This study attempts to understand netizens' views on the hot event of the Giant Panda Yaya through sentiment tendency analysis and LDA topic model analysis. The results give the relevant departments a more intuitive understanding of netizens' emotional tendencies and help them analyse the development trend of online public opinion so as to take corresponding measures.
This study has the following shortcomings: (1) Sentiment analysis is a subjective task, and determining sentiment tendency depends largely on the diversity of sentiment expressions and on contextual connections. Follow-up studies should therefore try to construct a sentiment lexicon and use other sentiment analysis models to improve the accuracy of the sentiment tendencies assigned to the comment data.
(2) LDA is a widely used topic model, but it still has difficulty handling Internet comment texts accurately. This study tries to remove Internet slang that the model cannot recognise through word segmentation, part-of-speech tagging and stop word removal, but subjective uncertainty remains. In subsequent studies, the LDA model can be improved by further optimising the word segmentation and stop word removal scheme.
(3) Owing to limited conditions, the number of samples collected in this experiment is limited and the text clustering effect is not ideal. In future work, the model and the experimental results can be improved by collecting more data samples.

Research framework
In this study, the LDA topic model is used to study Weibo comment data on the incident of the Giant Panda Yaya and to understand netizens' attitudes towards this hot event. The study completes the sentiment analysis through the following steps: crawling Weibo comments related to the Yaya incident, data preprocessing, data modelling, and analysis and conclusion. The research framework is shown in Figure 1.

Data acquisition and pre-processing
In this study, we crawled 2467 Weibo comments about the Yaya incident with a Python crawler to obtain netizens' evaluations of the incident at different stages, and saved the crawled data to CSV files.
Data preprocessing removes insubstantial content from the text and makes the data more accurate. The data obtained in this study contains many meaningless letters and numbers, and frequently occurring words such as "Weibo" and "Giant Panda" are not useful for the experiment, so they are deleted before word segmentation. Because a single word can carry multiple meanings in Chinese, this study applies word segmentation and part-of-speech tagging so that the computer can identify and process the words correctly. To reduce the size of the dataset and improve the accuracy of the model, this study also removes stop words.
After removing the stop words, this study visualises keyword frequency with a word cloud, as shown in Figure 2.
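The cleaning steps described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the word lists `DOMAIN_WORDS` and `STOP_WORDS` are hypothetical (the lists actually used in this study are not given), and the input to `remove_stop_words` is assumed to have already been split into tokens by a Chinese word segmenter.

```python
import re

# Hypothetical word lists for illustration; the study's own lists are not published.
DOMAIN_WORDS = {"微博", "大熊猫"}        # "Weibo", "Giant Panda"
STOP_WORDS = {"的", "了", "是", "在"}    # a few common Chinese function words

def clean_text(comment):
    """Strip ASCII letters and digits, then delete the uninformative domain words."""
    comment = re.sub(r"[A-Za-z0-9]+", "", comment)
    for word in DOMAIN_WORDS:
        comment = comment.replace(word, "")
    return comment

def remove_stop_words(tokens):
    """Drop stop words and whitespace-only tokens from a segmented comment."""
    return [t for t in tokens if t.strip() and t not in STOP_WORDS]
```

Cleaning is applied to the raw string before segmentation, and stop word removal to the token list afterwards, which mirrors the order described in the text.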

Emotional disposition analysis
Sentiment tendency generally refers to the subjective tendency of the opinion holder towards an event and is usually classified as positive, negative or neutral. Sentiment analysis investigates the subjective attitudes revealed in text data in order to judge the sentiment tendency the text carries. Sentiment words influence this judgement to a great extent, as does the connection between sentiment words and their context. Weibo also has a special class of sentiment markers: the emoticons that netizens include in their comments [10], such as "[cry]" and "[angry]". These show netizens' emotional tendencies more intuitively than plain text, so attention should be paid to extracting emoticons in the study.
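As a minimal sketch of emoticon extraction, the bracketed codes can be pulled out with a regular expression. The pattern below assumes that an emoticon is a short bracketed code like the "[cry]" and "[angry]" examples above; the ten-character cap and the function names are illustrative assumptions, not part of the study's pipeline.

```python
import re

# Assumption: an emoticon is a bracketed code of at most ten characters, e.g. "[cry]".
EMOTICON_PATTERN = re.compile(r"\[([^\[\]]{1,10})\]")

def extract_emoticons(comment):
    """Return the emoticon codes found in a comment, in order of appearance."""
    return EMOTICON_PATTERN.findall(comment)

def strip_emoticons(comment):
    """Return the comment text with all emoticon codes removed."""
    return EMOTICON_PATTERN.sub("", comment)
```

Keeping the extracted codes alongside the stripped text lets the emoticons be counted as sentiment signals in their own right rather than being lost during cleaning.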

Matching Sentiment Words
SnowNLP is a natural language processing library widely used in Python. It does not rely on the Natural Language Toolkit (NLTK); instead, it uses the naive Bayes principle to implement sentiment analysis, and it also provides Chinese part-of-speech tagging and other functions, which makes it well suited to processing Chinese text. Therefore, SnowNLP is chosen for the sentiment analysis of the comment data in this study.
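To illustrate the naive Bayes principle mentioned above, the following is a minimal two-class sketch with add-one smoothing. It is not SnowNLP's implementation, and the class name `NaiveBayesSentiment` and the labels "pos" and "neg" are hypothetical. It returns the probability that a tokenised comment is positive, which is comparable in spirit to the score SnowNLP exposes through its `sentiments` attribute.

```python
import math
from collections import Counter

class NaiveBayesSentiment:
    """Two-class multinomial naive Bayes over pre-segmented tokens (illustrative only)."""

    def __init__(self):
        self.word_counts = {"pos": Counter(), "neg": Counter()}
        self.doc_counts = Counter()
        self.vocab = set()

    def fit(self, docs, labels):
        """Count words per class; each label must be "pos" or "neg"."""
        for tokens, label in zip(docs, labels):
            self.doc_counts[label] += 1
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        return self

    def _log_score(self, tokens, label):
        """Log prior plus summed log likelihoods, with add-one (Laplace) smoothing."""
        total_docs = sum(self.doc_counts.values())
        total_words = sum(self.word_counts[label].values())
        score = math.log(self.doc_counts[label] / total_docs)
        for token in tokens:
            count = self.word_counts[label][token]
            score += math.log((count + 1) / (total_words + len(self.vocab)))
        return score

    def positive_probability(self, tokens):
        """Return P(pos | tokens), normalised in log space for numerical stability."""
        pos = self._log_score(tokens, "pos")
        neg = self._log_score(tokens, "neg")
        highest = max(pos, neg)
        pos_weight = math.exp(pos - highest)
        neg_weight = math.exp(neg - highest)
        return pos_weight / (pos_weight + neg_weight)
```

A comment whose score exceeds 0.5 would be treated as positive and one below 0.5 as negative. This threshold is a common convention and an assumption here, since the cut-off used in the study is not stated.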

View Sentiment Analysis Results
In this study, word clouds are used to present the positive and negative comment data and to check the effect of the sentiment analysis. Figure 3 shows that positive sentiment words such as "go home", "flower" and "hug" appear frequently and that no negative sentiment words are mixed into the word cloud, which suggests that the sentiment analysis extracts positive comments well. Figure 4 shows that negative sentiment words such as "abuse", "angry" and "explain" appear frequently and that no positive sentiment words are mixed into the word cloud, which suggests that the sentiment analysis extracts negative comments well.

How the LDA model works
The LDA model is a three-layer Bayesian model consisting of document, topic and word layers. Each word of a document is generated by selecting a topic with a certain probability and then selecting a word from that topic with a certain probability. The model yields two distributions: "document-topic" and "topic-word". The LDA topic model iteratively adjusts these two distributions to obtain topics that can explain the text data.
LDA is an unsupervised learning method: without annotated data, it can extract text topics from the high-frequency words in a large volume of comment data, which helps to uncover latent topics and thus to analyse the core topics of the dataset.
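The generative process described above can be sketched as follows. This is a toy illustration, not the model fitted in this study: the function names, the Dirichlet prior `alpha` and the example topic-word matrix are all assumptions, and only the sampling direction (topic first, then word) is shown, not the inference step.

```python
import random

def sample_dirichlet(alpha, rng):
    """Draw from a Dirichlet distribution by normalising independent gamma variates."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]

def generate_document(topic_word, alpha, length, vocab, rng):
    """Generate one document: draw its topic mixture, then a topic and a word per position."""
    theta = sample_dirichlet(alpha, rng)          # "document-topic" distribution
    words = []
    for _ in range(length):
        topic = rng.choices(range(len(theta)), weights=theta)[0]
        word = rng.choices(vocab, weights=topic_word[topic])[0]   # "topic-word" distribution
        words.append(word)
    return theta, words

# Toy example: two topics over a four-word vocabulary (values are made up).
vocab = ["hug", "home", "abuse", "angry"]
topic_word = [[0.5, 0.5, 0.0, 0.0],
              [0.0, 0.0, 0.5, 0.5]]
theta, words = generate_document(topic_word, [0.5, 0.5], 20, vocab, random.Random(0))
```

Fitting an LDA model runs this process in reverse: given only the observed words, it estimates the two distributions that most plausibly generated them.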

Finding the optimal number of topics
In this study, the average cosine similarity between topics is used as the evaluation index to determine the optimal number of topics in the LDA model. The smaller the average cosine similarity, the better the corresponding number of topics. Figures 5 and 6 show how the average cosine similarity between topics changes as the number of topics increases; the horizontal axis is the number of topics and the vertical axis is the average cosine similarity. For the positive comment dataset, the average cosine similarity between topics is lowest when the number of topics is 3, so the optimal number of topics for the positive comment dataset is set to 3. For the negative comment dataset, the lowest value corresponds to 2 topics. However, because too few topics would lead to large errors, and because the number of topics should be the same for the positive and negative datasets, the number of topics for the negative comment dataset is also set to 3.
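The selection criterion can be sketched as below, assuming that each topic is represented by its topic-word probability vector and that the average is taken over all pairs of topics. The study does not spell out these details, so both are assumptions, as are the function names.

```python
import math
from itertools import combinations

def cosine_similarity(u, v):
    """Cosine of the angle between two non-zero vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def average_topic_similarity(topic_word):
    """Mean cosine similarity over all topic pairs; requires at least two topics."""
    pairs = list(combinations(topic_word, 2))
    return sum(cosine_similarity(u, v) for u, v in pairs) / len(pairs)

def best_topic_count(models):
    """Pick the topic count whose model has the lowest average similarity.

    `models` maps each candidate topic count to that model's topic-word matrix.
    """
    return min(models, key=lambda k: average_topic_similarity(models[k]))
```

A low average similarity means the topics share few high-probability words, which is why the minimum of the curve is taken as the preferred number of topics.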

Constructing the LDA model
In this study, an LDA model is constructed for each of the positive and negative comment datasets, and the relevant model parameters are set. After the LDA topic analysis, the ten most frequent words of each topic are generated together with their probabilities.
Table 1 presents the latent topics in the positive comment data on the incident of the Giant Panda Yaya in the U.S. The most frequent words in Topic 1 are "hug", "go home" and "hope", mainly reflecting netizens' hope that Yaya will be well taken care of and will go home soon after leaving the U.S. The most frequent words in Topic 2 are "thanks", "take care of" and "nanny", mainly reflecting netizens' gratitude to the "daddy" and "nanny", Yaya's keepers, for their patience in taking care of Yaya. The most frequent words in Topic 3 are "video", "love", "eat" and "baby", mainly reflecting the hope that Yaya will eat more and become healthy, and that netizens want to see Yaya on video to learn what Yaya looks like at the moment.
Table 2 presents the latent topics in the negative comment data on the incident. The most frequent words in Topic 1 are "abuse", "too", "national treasure" and "pitiful", mainly reflecting that netizens, seeing the appearance of Yaya, a national treasure, suspect that Yaya has been abused. The most frequent words in Topic 2 are "skin disease", "catch" and "return", mainly reflecting that netizens suspect Yaya has contracted a skin disease in the United States and want to bring Yaya back to China for treatment. The most frequent words in Topic 3 are "send" and "angry", mainly reflecting netizens' anger about Yaya's recent situation in the United States.
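Per-topic word lists of the kind reported in Tables 1 and 2 can be read off a fitted topic-word distribution roughly as follows. The function name and the input layout (one probability row per topic, aligned with a vocabulary list) are assumptions made for illustration.

```python
def top_words(topic_word, vocab, n=10):
    """Return, for each topic, its n most probable (word, probability) pairs."""
    result = []
    for weights in topic_word:
        ranked = sorted(zip(vocab, weights), key=lambda pair: pair[1], reverse=True)
        result.append(ranked[:n])
    return result
```

Interpreting a topic then amounts to reading its highest-probability words together, as is done for each topic above.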

Conclusion
In this study, the sentiment analysis of the incident of the Giant Panda Yaya in the United States is based on the LDA topic model, which can accurately reflect netizens' sentiment tendency towards this hot event. The comments extracted from Weibo show that netizens are concerned about Yaya and suspect that Yaya suffered some abuse at the Memphis Zoo in the U.S. They hope that Yaya will return to China as soon as possible to receive treatment and care at home. Most of the words shown in the word cloud express positive emotions, while the word with the highest frequency is "abuse", which shows that the relevant departments were able to identify netizens' emotional tendency and intervene in time, reversing the direction of netizens' attention to the incident.
As the Internet has entered thousands of households, most Internet users like to learn about social problems online and express their opinions on hot topics. Hot topics on the Internet often spread widely and have great influence. Sentiment analysis of the comments on hot events can therefore help the relevant departments quickly understand netizens' emotional tendencies, release information through multiple channels, guide netizens to express their opinions appropriately, and manage the development trend of public opinion in a timely manner.

Figure 3. Positive Emotional Commentary Word Cloud

Figure 4. Negative Emotional Commentary Word Cloud

Figure 5. Optimization of the number of topics for positive reviews using LDA
Figure 6.
SHS Web of Conferences 178, https://doi.org/10.1051/shsconf/202317801002

Table 1. Positive Comment Words and Probabilities

Table 2. Negative Comment Words and Probabilities