Research on Algorithm Recommended by Online Education for Big Data

“Big data” has become a hot topic on the Internet. The long-tail problem of massive online courses has also become the biggest headache for the operation teams of online education platforms. Presenting to users the courses they most want to see is the key to improving the quality of online education. A personalized recommendation system discovers readers' interest tendencies from existing user data, item data, and interaction data, and thereby provides personalized product recommendations for readers. Building on two kinds of algorithms, content-based recommendation and collaborative filtering recommendation, this article proposes an improved integration scheme that makes good use of existing data to discover knowledge useful for recommending to readers. The method first addresses the sparsity problem of traditional collaborative filtering; it then starts from the global structural relations among courses to analyze the relationship between readers and courses more comprehensively. The algorithm improves recommendation accuracy from multiple angles and provides a feasible method for the precise recommendation of online educational videos.


Introduction
Information retrieval and information recommendation are the main tools for solving the problem of big data [1]. Information retrieval weeds out irrelevant information on the basis of keywords supplied by the user and then digs the relevant content out of a mass of information. This method is better suited to users with a clear purpose. Users with uncertain demands, by contrast, only want the system to recommend information that may interest them, according to their own interests or historical operation records, so that they get a better user experience and higher work efficiency. A personalized recommendation system uses a personalized recommendation algorithm to analyze the content features, scores, historical operation records, and other information about users and items. It discovers the interests of users, screens the item set accordingly, and provides personalized recommendations for specific users, thus solving the problem of information overload [2].
This article analyzes the technical tools currently used for personalized recommendation engines and constructs a universal personalized recommendation system of RC version. We introduce the Hadoop, Storm, RabbitMQ, and Redis technologies and propose a complete design scheme for the recommendation system, in which implementation ideas are put forward for both the off-line processing of big data and real-time computing. In addition, communication between modules is based on a message queue, which ensures the processing capacity and real-time performance of the system.

Personalized data and its analysis
In order to provide an effective and accurate recommendation set to users while guaranteeing the performance of the recommender system and other non-functional requirements, researchers and enterprises have put forward many personalized recommendation algorithms, such as the Item-based collaborative filtering recommendation algorithm [3,4], the User-based collaborative filtering recommendation algorithm [5], the Content-based recommendation algorithm [6,7,8], the Cluster-based collaborative filtering recommendation algorithm [9], the SVD-based collaborative filtering recommendation algorithm [10], and the image-based collaborative filtering recommendation algorithm [11][12]. These algorithms use data mining techniques to conduct in-depth analysis of user data and item data, obtain the interest characteristics and specific behavior patterns of users, and thus provide personalized recommendations for users. A personalized recommendation algorithm based on data mining consists of two stages: learning and use. In the learning stage, the algorithm mines the original data and establishes the corresponding recommendation model. In the use stage, the model data are used to provide personalized recommendations that guide users in real time.
The content-based recommendation algorithm is derived from traditional information search technology. It depends on the system to extract item features, analyze user behavior, and study users' interests and preferences, in order to provide users with a set of items that have similar features. The algorithm does not rely on historical score data between users and items [5].
The personalized recommendation algorithm based on collaborative filtering is at present the most valuable recommendation algorithm in both research and enterprise applications. Personalization is its main goal. Unlike the classic content-based recommendation methods, the algorithm mainly mines the user groups that are highly similar to the target user, or the item sets that are similar to the target item, and then uses those user groups and item sets to provide personalized recommendations for users. According to the kind of association used, collaborative filtering recommendation algorithms can be divided into the User-based collaborative filtering algorithm [5], the Item-based collaborative filtering algorithm [8,4], and the Model-based collaborative filtering algorithm [11][12]. Each algorithm in a personalized recommendation system has its advantages and disadvantages, and the algorithms are to some degree complementary. Current Web recommendation therefore does not adopt one single recommendation mechanism and strategy but integrates multiple methods, an approach known as Hybrid Recommendation, to achieve a better recommendation effect [13,14,15]. There are many possible combinations in hybrid recommendation, and the specific combination principle varies with the data and the scenario. We should therefore choose the combination of methods that suits the data at hand.
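As an illustration of the item-based variant described above, the following minimal sketch ranks courses by the cosine similarity of their rating vectors. The cosine measure, the function names, and the toy data layout (item id mapped to a dictionary of user scores) are assumptions made for this example; the article does not specify which similarity measure its baseline uses.

```python
from math import sqrt


def cosine_similarity(a: dict[str, float], b: dict[str, float]) -> float:
    """Cosine similarity between two sparse rating vectors keyed by user id."""
    common = set(a) & set(b)
    dot = sum(a[u] * b[u] for u in common)
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an item without ratings is similar to nothing
    return dot / (norm_a * norm_b)


def most_similar_items(
    ratings: dict[str, dict[str, float]], target: str, k: int
) -> list[tuple[str, float]]:
    """Return the k items most similar to `target`, best first.

    `ratings` maps item id -> {user id: score} (hypothetical layout).
    """
    scored = [
        (item, cosine_similarity(ratings[target], vector))
        for item, vector in ratings.items()
        if item != target
    ]
    # Sort by descending similarity, then by item id for a stable order.
    return sorted(scored, key=lambda pair: (-pair[1], pair[0]))[:k]
```

Because two items are compared only through the users who rated both, sparse rating data leaves many item pairs with a similarity of zero, which is the sparsity problem the hybrid scheme is meant to ease.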

Integrated personality recommendation algorithm
This article improves the hybrid recommendation approach. Based on a bipartite graph, we first use the users' historical score information and the item category feature information to construct a graph model of users and items. The random walk algorithm is then used to compute the global similarity between items in the graph model. This method has low computational complexity.

Two-layer weighted graph model
Let G = {V, E, W} be a weighted mixed graph. A higher score leads to a higher degree on the corresponding edge. However, in rating-based electronic commerce systems the original scores contain trend information, so they cannot accurately represent a user's preference for an item. Take the reference points of user ratings as an example: the reference points of some users are higher than those of others; for example, one user may treat 3 points as the reference point for Like and 2 points as the reference point for Dislike. Similarly, some movie items tend to receive higher scores than others, which may be affected by the release time of the movies. This kind of trend information is called the Global Effect (hereinafter referred to as GE) [16]. Before a score is taken as a user's degree of preference for an item, we first need to remove the global effects from the original score. In an actual recommendation system there are many sources of global effects, such as holidays and quarters, that affect the scores over the same period, but experiments have shown that the reference-point GEs have the largest effect on the final score. This article therefore considers only two kinds of global effect: the reference point of the user and the reference point of the item. Let the reference points of User u and Item i be $GE_u$ and $GE_i$ respectively; they are given by formula (1).
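The removal of the two reference points can be sketched as follows. This is only an illustration under stated assumptions: the user reference point is taken to be the mean of the user's scores, and the item reference point is taken to be the mean of the residuals left on that item after the user effect has been removed. The article's own formula (1) may define these quantities differently, and the function name and data layout are hypothetical.

```python
from collections import defaultdict


def remove_reference_points(ratings: dict[tuple[str, str], float]):
    """Remove user and item reference points from raw scores.

    `ratings` maps (user, item) -> raw score.
    Returns (adjusted scores, user reference points, item reference points).
    Assumption: each reference point is a plain mean; the effects are
    removed one after the other, user first and item second.
    """
    # Step 1: user reference point = mean of the user's raw scores.
    by_user = defaultdict(list)
    for (user, _item), score in ratings.items():
        by_user[user].append(score)
    ge_user = {u: sum(v) / len(v) for u, v in by_user.items()}
    residual = {(u, i): r - ge_user[u] for (u, i), r in ratings.items()}

    # Step 2: item reference point = mean of the residuals on that item.
    by_item = defaultdict(list)
    for (_user, item), value in residual.items():
        by_item[item].append(value)
    ge_item = {i: sum(v) / len(v) for i, v in by_item.items()}

    adjusted = {(u, i): r - ge_item[i] for (u, i), r in residual.items()}
    return adjusted, ge_user, ge_item
```

In a toy case where one user rates two items 5 and 3 and another rates them 3 and 1, the adjusted scores are all zero: the two users differ only in their reference points, not in their relative preferences.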

Random walk and recommendation algorithm based on weighted two-layer graph
In physics, a random walk is an irregular form of motion in which each step is random and independent of the previous transfer [17]. In the two-layer user-item graph model, the relationship between the user and the item is represented as a random process $\{X_n\}$ whose state space is the set of nodes in the two-layer graph. If the current node is i (either a user node or an item node), the walk from node i to one of its connected nodes is random. Let the next node be j and the walking probability be $P_{i,j}$; in general this probability is related only to the most recent m nodes visited. We call the walking trace $\{X_n\}$ a Markov chain. If m = 1, the next state value depends only on the current state value, and $\{X_n\}$ is a first-order Markov chain, referred to simply as a Markov chain. $P_{i,j}$ represents the one-step transition probability from node i to node j. A Markov chain predicts the next node from the current node and the current one-step transition probability. The random walk model is a classical balanced Markov chain, so the random walk algorithm eventually reaches a balanced state [17][18][19]. This article uses the random walk algorithm, starting from the user node, to obtain the similarity between the user node and all other nodes. Finally, we use the size of the similarity to select the K users or items that are most similar and relevant to the user, namely the Top-K user and item recommendations.
After t steps of random walk, the similarity between any two nodes reaches a stable equilibrium. At this point we can use the probability of a random walk from node i to node j to represent their global similarity, denoted $RD(i,j)$. Here the parameter G balances the weight of the transition probability of each step, so that the multiple-step transition probabilities are combined by weighting with G; t is the number of steps of the random walk, and a higher value of t means that more neighbour information is included.
If node i and node j are both item nodes, RD(i,j) is the global similarity between the two items, and we can use RD(i,j) to conduct Top-K recommendation for users based on item similarity. If the two nodes are a user node u and an item node j, RD(u,j) is the global similarity between the user and the item, and we can use this value directly to conduct Top-K recommendation for the user.
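A minimal sketch of the weighted multi-step walk and the Top-K selection is given below. It assumes that the step-k transition probability is weighted by $G^k$ and that the weighted terms are summed over k = 1, ..., t; the exact combination formula is not legible in the source, so this form, the function names, and the adjacency-dictionary layout are all assumptions made for illustration.

```python
def transition_probabilities(weights: dict[str, dict[str, float]]):
    """One-step transition probabilities: edge weight divided by the row sum."""
    probs = {}
    for node, neighbours in weights.items():
        total = sum(neighbours.values())
        probs[node] = (
            {n: w / total for n, w in neighbours.items()} if total > 0 else {}
        )
    return probs


def random_walk_similarity(
    weights: dict[str, dict[str, float]], start: str, steps: int, g: float
) -> dict[str, float]:
    """Score every node reachable from `start` within `steps` steps.

    Assumed form: score(j) = sum over k = 1..steps of g**k * P_k(start, j),
    where P_k is the k-step transition probability.
    """
    probs = transition_probabilities(weights)
    dist = {start: 1.0}  # probability mass of the walker before any step
    scores: dict[str, float] = {}
    for k in range(1, steps + 1):
        nxt: dict[str, float] = {}
        for node, mass in dist.items():
            for neighbour, p in probs.get(node, {}).items():
                nxt[neighbour] = nxt.get(neighbour, 0.0) + mass * p
        dist = nxt
        for node, mass in dist.items():
            scores[node] = scores.get(node, 0.0) + (g ** k) * mass
    return scores


def top_k(scores: dict[str, float], k: int, exclude=()) -> list[tuple[str, float]]:
    """Return the k best-scoring nodes, skipping any node in `exclude`."""
    candidates = [(n, s) for n, s in scores.items() if n not in exclude]
    return sorted(candidates, key=lambda pair: (-pair[1], pair[0]))[:k]
```

Starting the walk at a user node and excluding the user and the items already rated yields a Top-K item list; starting at an item node and keeping only item nodes yields item-to-item similarities.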

Algorithm implementation and experiment

Algorithm implementation
In its physical structure, the implementation is composed of a Hadoop cluster, a Storm cluster, a Redis cache cluster, message queue middleware, a Web service cluster, and multiple database clusters. In a real environment, the cluster size has a great influence on the bearing capacity and the processing ability of the system. This article uses one or several servers to simulate the work of each cluster. Table 1 lists the servers used in the implementation and the systems that run on them.

Experimental data
This article uses an experimental data set provided by the online education platform. It includes basic information on up to 50000 users, basic information on 3000 films, and the scores given to the films by 10000 users. Each score is an integer in the interval [1,5], and a higher value indicates a higher degree of user preference for the item.
In recommendation systems, the methods for measuring the recommendation quality of an algorithm include decision-support accuracy measures, statistical precision measures, and others. Common metrics include accuracy, novelty, etc. [20]. This article uses the mean absolute error (MAE) to measure accuracy; a lower MAE indicates more accurate predictions.
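For reference, MAE is the average of the absolute differences between predicted and actual scores. The short sketch below shows the computation; the function name and the list-based inputs are choices made for this example.

```python
def mean_absolute_error(predicted: list[float], actual: list[float]) -> float:
    """Mean absolute error between predicted and actual scores."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    if not actual:
        raise ValueError("at least one score is required")
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(actual)
```

With scores restricted to the interval [1,5], the MAE of any predictor that stays inside that interval lies between 0 and 4.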

Calculation of mixed similarity
Item similarity is composed of the potential similarity and the similarity based on item characteristics. By introducing the parameter E, the potential similarity $sim_a(i,j)$ and the feature similarity $sim_c(i,j)$ can be combined to obtain the hybrid similarity between items. This experiment is performed with D = 1900, and the recommendation accuracy is calculated for each value of the parameter in order to select the optimal one. The experimental result is shown in Figure 2. When E = 0, that is, when only the potential similarity is used as the final similarity, MAE ≈ 0.763. As the parameter increases, MAE decreases significantly, which shows that this way of mixing is effective. However, once E exceeds a certain value, MAE increases again. When E = 1, that is, when only the similarity based on item types is used to measure the similarity among items, MAE ≈ 0.807. MAE reaches its minimum value of approximately 0.743 at E = 0.3, which is the optimal mix.
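The mixing step can be sketched as a convex combination of the two similarities. The linear form below is an assumption: it is chosen because it reproduces the two endpoints described in the text (E = 0 keeps only the potential similarity, E = 1 keeps only the feature similarity), but the article's exact formula may differ.

```python
def hybrid_similarity(sim_a: float, sim_c: float, e: float) -> float:
    """Blend potential similarity `sim_a` with feature similarity `sim_c`.

    Assumed form: (1 - e) * sim_a + e * sim_c, with 0 <= e <= 1.
    """
    if not 0.0 <= e <= 1.0:
        raise ValueError("e must lie in [0, 1]")
    return (1.0 - e) * sim_a + e * sim_c
```

Under this form, the reported optimum of E = 0.3 would weight the potential similarity at 0.7 and the feature similarity at 0.3.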

Parameter selection of random walk for each weight
In the random walk algorithm, each step of the walk represents a certain degree of similarity between two items. The similarity generated by each step is weighted by the parameter G, and the weighted values of all steps are combined to obtain the final result. This experiment constructs the two-layer model with the fixed parameters D = 1900 and E = 0.3, and the number of steps is set to t = 5. The experimental result is shown in Figure 3.
As the experimental result shows, when 0 < G < 0.6, MAE decreases as G increases. When G > 0.6, MAE increases as G increases. MAE reaches its minimum value of approximately 0.715 at G = 0.6.
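The parameter selection described above amounts to evaluating MAE on a grid of candidate values and keeping the value with the lowest error. The sketch below shows that loop in a generic form; `evaluate` is a hypothetical callback that would build the model for a given parameter value and return its MAE, and it is not part of the article.

```python
from typing import Callable, Iterable


def select_best_parameter(
    candidates: Iterable[float], evaluate: Callable[[float], float]
) -> tuple[float, float]:
    """Return (best value, its error) over the candidate grid.

    `evaluate` maps a parameter value to an error such as MAE;
    the first candidate with the lowest error wins.
    """
    best_value = None
    best_error = float("inf")
    for value in candidates:
        error = evaluate(value)
        if error < best_error:
            best_value, best_error = value, error
    if best_value is None:
        raise ValueError("candidates must not be empty")
    return best_value, best_error
```

A grid such as `[k / 10 for k in range(1, 10)]` would cover the range of G discussed in the text; the same helper applies to the sweep over E.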

Conclusion
This article presents a new hybrid recommendation model that builds on the advantages and disadvantages of the content-based and collaborative filtering recommendation methods.
The model makes full use of the rating information and feature information of users and items, and considers user-item relevance on the basis of the global structural relations. Compared with the traditional algorithms, the hybrid recommendation model improves recommendation accuracy and recommendation efficiency to a certain extent, but it is still not comprehensive. First of all, owing to the limits of the experimental data, this article considers only the category feature of items, whereas in a real electronic commerce system the relationships between items are complex and cannot be measured well by categories alone. As a next step, we can introduce cognitive computing and other methods that are close to human thinking to consider the relations between items comprehensively.

The next state value $X_{n+1}$ is only related to the current state value $X_n$, as shown in formula (3).
DOI: 10.1051/ © Owned by the authors, published by EDP Sciences, 2015
SHS Web of Conferences 01002-p.2
The mixed weighted graph can represent a two-layer model, in which the upper part is the user layer and the lower part is the item layer, as shown in Figure 1.
$V$ is the vertex set, $V_{user}$ the user vertices, and $V_{item}$ the item vertices; $E_H$ is the set of inner edges of the item layer and $E_{UI}$ the set of edges connecting users and items; and $W$ is the set of edge weights.
Figure 1. Two-layer model of user-item.
The edge set $E_{UI}$ is the connection between the user layer and the item layer, and the set $W_{UI}$ is the set of degrees of that connection. Let $w_{ui}$ be the connection degree between User u and Item i, which represents the user's preference for the item or the size of the score, namely $w_{ui} = r_{ui}$.
Here, $U$ represents the score set, $k(u)$ the number of times User u has scored, and $k(i)$ the number of times Item i has been scored. By removing those two reference-point GEs, we can get the preference degree of User u for Item i.
$w_{ij}$ represents the connection degree between Item i and Item j. We can use the comprehensive similarity $sim_s(i,j)$ of the MovieLens system to represent the degree between them, namely $w_{ij} = sim_s(i,j)$.
the random walk model of the mixed weighted

Table 1. Servers used in the algorithm implementation and the systems that run on them.