AI stumbling on content distribution
Since the Internet was commercialized, every platform, whether a news client, a video site, or an e-commerce platform, has by default cast itself as a good feeder: following its own ideas, it decides what content (the feed) to push (to feed) to users.
These feeders are trained professionals. In industry jargon, the site editors set the agenda for users, picking content that suits the taste of the majority.
Later, when the editors became too busy, they turned to machines for help. The simplest approach was "popular recommendation": the machine sorts content by traffic or other data.
The biggest problem with the feeder model is that it does not know each diner's appetite, which leads to two significant consequences. First, the diners are not satisfied, because users' individual needs cannot be met. Second, the platform wastes its own resources: a large amount of long-tail content never gets exposure, which increases sunk costs.
Then people discovered what machines could offer. A machine can recommend content based on each user's characteristics. Just as a clever cook can prepare meals to suit every diner's taste, a sufficiently smart machine can, to a certain extent, meet every user's individual needs. Is this not C2M (customer-to-manufacturer) for the content industry?
To be precise, this is C2M for content distribution. It takes communication with a single user as its goal and breaks away from the pattern of mass, broadcast-style communication. Isn't that enough to overturn the search engines and portals?
This kind of intelligent content C2M has a profound historical background. Today, standing at the edge of a new era and watching AI technology light the fuse of the IoT, you will find that we are unquestionably entering the next era of information explosion: an explosion of information terminals, of information volume, and of information platforms.
On the information highway, both the car you drive and the road you travel have changed their rules, and the whole feeder-based framework of knowledge you are familiar with is facing subversion.
In this era the feeder model has failed, and clever machines will become the biggest variable.
The first scenario to appear is that humans produce content and machines distribute it.
The next scenario is that machines both produce and distribute content.
Is the content industry facing a C2M revolution?
"Of course not, machines are stupid." If you think so, then unfortunately you are doomed not to see tomorrow's sun.
"If you think so, then congratulations on jumping into the pit."
The reality may well surprise you.
First, the road of content: the essence of C2M is communication aimed at the individual
As an independent research direction, recommender systems can be traced back to the collaborative filtering algorithms of the early 1990s. In the middle period, the representative approaches were traditional machine learning algorithms, such as the latent semantic models popularized by the Netflix Prize; today the field relies on more complex deep learning models.
In recent years, deep learning has advanced by leaps and bounds, bringing machine recommendation into the mainstream across the Internet. Driven by new technologies, personalized communication has become more feasible and is moving ever closer to one-to-one communication with each user.
(A) Collaborative filtering: a halting start
According to the encyclopedia definition, collaborative filtering uses the preferences of a group of users who share your interests or have common experiences to recommend information you are likely to find interesting. Users give feedback on information (for example, by rating it), the site records that feedback for filtering and analysis, and the results help other users filter information.
Of course, a user's information is not necessarily limited to things they are especially interested in; records of things they are especially uninterested in also matter a great deal. Collaborative filtering delivered excellent results and began to dominate the Internet industry.
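To make the idea concrete, here is a minimal sketch of user-based collaborative filtering in Python. It assumes ratings are stored as a dictionary of user → {item: score}, uses cosine similarity over co-rated items, and predicts a rating as the similarity-weighted average of the k most similar users' ratings. The function names and toy data are hypothetical; the source does not specify any particular similarity measure.

```python
import math


def cosine_sim(a: dict, b: dict) -> float:
    """Cosine similarity between two users, computed over the items both have rated."""
    common = set(a) & set(b)
    if not common:
        return 0.0
    dot = sum(a[i] * b[i] for i in common)
    norm_a = math.sqrt(sum(a[i] ** 2 for i in common))
    norm_b = math.sqrt(sum(b[i] ** 2 for i in common))
    return dot / (norm_a * norm_b)


def predict_user_based(ratings: dict, user: str, item: str, k: int = 2):
    """Predict `user`'s rating of `item` from the k most similar users who rated it.

    Returns None when no positively similar neighbor has rated the item.
    """
    neighbors = [
        (cosine_sim(ratings[user], ratings[other]), ratings[other][item])
        for other in ratings
        if other != user and item in ratings[other]
    ]
    neighbors.sort(reverse=True)  # most similar users first
    top = [(sim, r) for sim, r in neighbors[:k] if sim > 0]
    if not top:
        return None
    # Similarity-weighted average of the neighbors' ratings.
    return sum(sim * r for sim, r in top) / sum(sim for sim, _ in top)


# Toy data (hypothetical): alice has not rated m4 yet.
ratings = {
    "alice": {"m1": 5, "m2": 3, "m3": 4},
    "bob": {"m1": 5, "m2": 3, "m4": 5},
    "carol": {"m1": 1, "m2": 5, "m4": 1},
}
print(predict_user_based(ratings, "alice", "m4"))  # about 3.39
```

Because every rating is positive, carol still receives a positive weight even though her tastes differ from alice's; subtracting each user's mean rating first (Pearson correlation) is a common refinement.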
Collaborative filtering was first applied to email filtering.
In 1992, Xerox scientists proposed the Tapestry system, the first system designed around collaborative filtering. It was built mainly to solve the information overload problem at Xerox's Palo Alto Research Center: staff received large volumes of email every day with no screening or classification, so the research center developed this experimental email system to help them.
The idea of collaborative filtering was then applied to content recommendation.
In 1994, the GroupLens project team at the University of Minnesota built a news-filtering system that helped readers filter the news content they were interested in. Readers rated each item they read, and the system recorded the ratings for future use, on the premise that things a reader found interesting before will interest them again in the future. Readers who did not want to reveal their identity could rate anonymously. As one of the oldest research teams in content recommendation, GroupLens created the movie recommendation system MovieLens in 1997. Other systems of a similar nature include the music recommendation system Ringo and the video recommendation system Video Recommender.
Later came another milestone: e-commerce recommendation systems.
In 1998, Amazon's Linden and his colleagues filed a patent for item-to-item collaborative filtering. This was the classic algorithm Amazon used in its early days, and it caused a sensation.
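Item-to-item collaborative filtering flips the comparison: instead of finding similar users, it finds items that tend to be bought by the same customers. The sketch below is a simplified, illustrative version under our own assumptions (purchase histories as sets, cosine similarity on binary customer vectors, scores summed over the items a user already owns); it is not Amazon's production algorithm, whose details the source does not give.

```python
import math
from collections import defaultdict
from itertools import combinations


def item_similarities(purchases: dict) -> dict:
    """purchases: user -> set of items. Returns (item_i, item_j) -> cosine similarity."""
    item_count = defaultdict(int)  # how many users bought each item
    co_count = defaultdict(int)    # how many users bought both items of a pair
    for items in purchases.values():
        for i in items:
            item_count[i] += 1
        for i, j in combinations(sorted(items), 2):
            co_count[(i, j)] += 1
            co_count[(j, i)] += 1
    # Cosine similarity of binary "who bought it" vectors.
    return {
        (i, j): c / math.sqrt(item_count[i] * item_count[j])
        for (i, j), c in co_count.items()
    }


def recommend_items(purchases: dict, user: str, top_n: int = 3) -> list:
    """Score unowned items by summing their similarity to the items the user owns."""
    sims = item_similarities(purchases)
    owned = purchases[user]
    scores = defaultdict(float)
    for (i, j), s in sims.items():
        if i in owned and j not in owned:
            scores[j] += s
    return sorted(scores, key=lambda j: (-scores[j], j))[:top_n]


# Hypothetical purchase histories.
purchases = {
    "u1": {"book", "lamp"},
    "u2": {"book", "lamp", "pen"},
    "u3": {"book", "pen"},
    "u4": {"mug"},
}
print(recommend_items(purchases, "u1"))  # ['pen']
```

In a real system the item-item table would be computed offline and looked up at request time, which is what makes the item-based approach attractive at large scale.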
Does collaborative filtering count as artificial intelligence? Technically, it does belong to the AI category. But it must be pointed out that collaborative filtering algorithms are rather crude: whether user-based or item-based, their recommendations were never fully satisfactory.
How can a methodology guide the continuous optimization of a recommender system? How can complex real-world factors be incorporated into recommendations? Engineers found these questions a huge headache, but a big reward always brings forth brave contenders, and eventually a more flexible line of thinking emerged.
(B) Traditional machine learning picks up speed
In 2006, Netflix announced the Netflix Prize. Netflix, a veteran online video rental company, hosted the contest to tackle the machine learning and data mining problem of predicting movie ratings. The organizers offered a $1 million prize to any individual or team that could improve the accuracy of Netflix's own recommendation system, Cinematch, by 10% (measured as a 10% reduction in root-mean-square error, RMSE).
Netflix has disclosed many figures about its data volume on its own blog, for example:
We have billions of user ratings, and the number grows by several million per day.
Our system generates millions of playback clicks every day, each carrying many features, such as play duration, time of play, and device type.
Our users add millions of videos to their playlists every day.
Clearly, in the face of such massive data, we cannot rely on purely manual work or small systems to establish standardized classification criteria for the preferences of every user on the platform.
A year after the contest began, the KorBell team won the first Progress Prize with an 8.43% improvement. They had put in more than 2,000 hours of work and blended 107 algorithms. The two most effective were matrix factorization (usually called SVD, after singular value decomposition) and the restricted Boltzmann machine (RBM).
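The Netflix Prize measured accuracy with root-mean-square error (RMSE), so an "8.43% improvement" means an RMSE 8.43% lower than Cinematch's. The hedged sketch below shows that arithmetic and why blending many models, as KorBell did, can help: two models whose errors point in opposite directions can cancel out when averaged. The numbers are made up for illustration, and the fixed 50/50 weights are an assumption; in practice blend weights are usually fitted on held-out data.

```python
import math


def rmse(predictions, actuals):
    """Root-mean-square error between predicted and true ratings."""
    return math.sqrt(
        sum((p - a) ** 2 for p, a in zip(predictions, actuals)) / len(actuals)
    )


def improvement_pct(baseline_rmse, new_rmse):
    """Percentage reduction in RMSE relative to a baseline system."""
    return 100.0 * (baseline_rmse - new_rmse) / baseline_rmse


def blend(pred_lists, weights):
    """Weighted linear blend of several models' predictions, item by item."""
    return [sum(w * p for w, p in zip(weights, preds)) for preds in zip(*pred_lists)]


# Hypothetical ratings and two imperfect models.
actual = [4, 3, 5, 2]
model_a = [3.5, 3.5, 4.5, 2.5]
model_b = [4.5, 2.5, 5.5, 1.5]
blended = blend([model_a, model_b], [0.5, 0.5])
print(rmse(model_a, actual), rmse(model_b, actual), rmse(blended, actual))  # 0.5 0.5 0.0
```

Real blends never cancel errors this cleanly; the toy data is chosen only to make the mechanism visible.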
Matrix factorization complements collaborative filtering. Its core idea is to decompose a very sparse user rating matrix R into two matrices, a matrix P of user features and a matrix Q of item features, so that R ≈ P·Qᵀ. These feature vectors are learned from the known ratings and then used to predict the unknown ones. The algorithm improves prediction accuracy and can also incorporate a variety of modeling elements, so that more diverse information can be brought in and large amounts of data used more effectively.
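A minimal sketch of matrix factorization trained with stochastic gradient descent, assuming R is given as a list of known (user, item, rating) triples. It learns P and Q so that the dot product of a user row and an item row approximates each known rating, with L2 regularization. The learning rate, regularization strength, number of latent factors k, and toy matrix are illustrative choices, not values from the source.

```python
import math
import random


def factorize(ratings, n_users, n_items, k=2, lr=0.01, reg=0.02, epochs=2000, seed=0):
    """Learn P (n_users x k) and Q (n_items x k) so that R[u][i] ~ dot(P[u], Q[i]).

    ratings: list of (user_index, item_index, rating) for the KNOWN cells of R only.
    """
    rng = random.Random(seed)
    P = [[rng.uniform(0, 0.5) for _ in range(k)] for _ in range(n_users)]
    Q = [[rng.uniform(0, 0.5) for _ in range(k)] for _ in range(n_items)]
    for _ in range(epochs):
        for u, i, r in ratings:
            err = r - sum(P[u][f] * Q[i][f] for f in range(k))
            for f in range(k):
                pu, qi = P[u][f], Q[i][f]
                # Gradient step on squared error plus L2 penalty.
                P[u][f] += lr * (err * qi - reg * pu)
                Q[i][f] += lr * (err * pu - reg * qi)
    return P, Q


def predict(P, Q, u, i):
    """Predicted rating: dot product of user u's and item i's latent vectors."""
    return sum(p * q for p, q in zip(P[u], Q[i]))


def train_rmse(P, Q, ratings):
    """RMSE of the factorization on the known cells it was trained on."""
    return math.sqrt(sum((r - predict(P, Q, u, i)) ** 2 for u, i, r in ratings) / len(ratings))


# Hypothetical sparse rating matrix; 0 marks an unknown rating.
R = [[5, 3, 0, 1], [4, 0, 0, 1], [1, 1, 0, 5], [1, 0, 0, 4], [0, 1, 5, 4]]
known = [(u, i, r) for u, row in enumerate(R) for i, r in enumerate(row) if r > 0]
P, Q = factorize(known, n_users=len(R), n_items=len(R[0]))
print("training RMSE:", train_rmse(P, Q, known))
print("predicted rating for user 1, item 1:", predict(P, Q, 1, 1))
```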
However, matrix factorization has shortcomings too. Like collaborative filtering, it is a supervised method that is rough and simple and suits small systems. The problem facing the Internet giants was that building a large-scale recommender system with collaborative filtering and matrix factorization would take a long time. What could be done?
As a result, some engineers turned to unsupervised learning. The essence of clustering algorithms in unsupervised learning is to identify groups of users and recommend the same content to users in the same group. With enough data, clustering works best as a first step that narrows the set of candidate neighbors considered by a collaborative filtering algorithm.
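The sketch below illustrates clustering as a pre-filter, assuming each user is represented by a fixed-length rating vector (unrated items filled with 0) and using plain k-means written with the standard library. Only users in the same cluster are then passed on as candidate neighbors to collaborative filtering. The helper names and the choice of k-means are assumptions for illustration; the source does not name a specific clustering algorithm.

```python
import math
import random


def kmeans(vectors, k, iters=20, seed=0):
    """Plain k-means: returns the cluster index assigned to each vector."""
    rng = random.Random(seed)
    centroids = [list(v) for v in rng.sample(vectors, k)]
    labels = [0] * len(vectors)
    for _ in range(iters):
        # Assignment step: each vector joins its nearest centroid.
        labels = [min(range(k), key=lambda c: math.dist(v, centroids[c])) for v in vectors]
        # Update step: each centroid moves to the mean of its members.
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def neighbor_candidates(users, labels, target):
    """Only users in the target's cluster are considered as CF neighbors."""
    cluster = labels[users.index(target)]
    return [u for u, lab in zip(users, labels) if lab == cluster and u != target]


# Hypothetical users rating four items (0 = not rated).
users = ["u0", "u1", "u2", "u3"]
vectors = [[5, 5, 0, 0], [4, 5, 1, 0], [0, 1, 5, 4], [0, 0, 4, 5]]
labels = kmeans(vectors, k=2)
print(neighbor_candidates(users, labels, "u0"))  # ['u1']
```

Clustering is done once, offline; afterwards each similarity search only scans one cluster instead of the whole user base.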
The latent semantic model (also known as the latent factor model) uses clustering analysis, and one of its big advantages is that it can handle rating prediction and text content at the same time, so content can be used to improve recommendations substantially.
Traditional analysis methods are weak at two steps: labeling users accurately, and mapping those labels to results. For example, the age a user fills in is not necessarily true, and not all young people like comics. The core of the latent semantic model is to go beyond these superficial semantic labels and use machine learning to uncover deeper associations in user behavior, which makes recommendations more accurate.
Lured by the million-dollar prize, talent from around the world flocked to the Netflix Prize, which peaked in 2009. As the most iconic event in recommender systems, the contest drew large numbers of professionals into recommender-systems research and helped the technology spread from specialist circles into the commercial world. It sparked heated discussion and gradually caught the eye of mainstream websites, and content-based recommendation, knowledge-based recommendation, hybrid recommendation, and trust-network-based recommendation all entered a period of rapid development.
These recommendation engines differ from collaborative filtering. For example, content-based recommendation relies on the content information of items rather than on users' ratings of them; it uses machine learning to learn a user's interests from the features that describe the content. Content filtering mainly uses natural language processing, artificial intelligence, probability and statistics, and machine learning techniques.
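A minimal content-based recommender, under our own assumptions: each item is described by a short text, texts are turned into TF-IDF vectors using simple whitespace tokenization, a user's profile is the sum of the vectors of items they liked, and unseen items are ranked by cosine similarity to that profile. No other users' ratings are needed, which is the key difference from collaborative filtering. All names and documents here are hypothetical.

```python
import math
from collections import Counter


def tfidf_vectors(docs: dict) -> dict:
    """docs: item_id -> non-empty text. Returns item_id -> {term: tf-idf weight}."""
    tokenized = {d: text.lower().split() for d, text in docs.items()}
    df = Counter(t for toks in tokenized.values() for t in set(toks))  # document frequency
    n = len(docs)
    vecs = {}
    for d, toks in tokenized.items():
        tf = Counter(toks)
        vecs[d] = {t: (c / len(toks)) * math.log(n / df[t]) for t, c in tf.items()}
    return vecs


def cosine(a: dict, b: dict) -> float:
    """Cosine similarity of two sparse term-weight vectors."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def recommend_by_content(docs: dict, liked: list, top_n: int = 2) -> list:
    """Rank items the user has not liked yet by similarity to their content profile."""
    vecs = tfidf_vectors(docs)
    profile = Counter()
    for d in liked:
        profile.update(vecs[d])  # sum the liked items' vectors
    candidates = [d for d in docs if d not in liked]
    return sorted(candidates, key=lambda d: (-cosine(profile, vecs[d]), d))[:top_n]


docs = {
    "a1": "ai deep learning recommendation",
    "a2": "deep learning neural network",
    "a3": "football match score",
    "a4": "basketball match highlights",
}
print(recommend_by_content(docs, ["a1"], top_n=1))  # ['a2']
```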
Was the million dollars worth it? According to Netflix's 2016 user data, it had 65 million registered members who together watched 100 million hours of video a day, and Netflix says its recommendation system saves it $1 billion a year.
(C) Deep learning brings "driverless" recommendation
In recent years, a new user pain point has emerged. With the spread of smartphones, the huge volume of information and the small reading screen have become a contradiction that is hard to resolve. Users no longer read while sitting in front of a computer; their reading has shifted to fragmented moments on mobile devices. Search engines fall short, manual recommendation cannot keep up, and machine recommendation is not yet smart enough. For large content platforms this shift is a life-or-death test: meet the demand and live, fail to meet it and die.
Faced with this problem, YouTube and Facebook proposed a new solution: use deep learning to build smart machines. Over the past decade, deep learning has made huge leaps and is particularly well suited to processing large amounts of data.
If manual content recommendation is like a human driving a car, then deep-learning-based recommendation is like a driverless car. The technology uses user data to "perceive" user preferences. Such a recommender system can be divided into a data layer, a trigger (candidate) layer, a fusion-and-filter layer, and a ranking layer; when the data layer generates data and stores it into the candidate layer, it also triggers the core recommendation task.
Take YouTube as an example. Its recently published recommendation algorithm consists of two neural networks, one for candidate generation and one for ranking. The candidate generation network takes the user's viewing history as input and sharply reduces the number of videos under consideration, selecting a small set of the most relevant videos from a very large library.
The generated candidates are the videos most relevant to the user, and the user's ratings of them are then predicted. The goal of the candidate network is to provide broad personalization through collaborative filtering. The ranking network's task is to analyze the candidates carefully and pick a small number of the best. Concretely, it uses video description data and user behavior information to score each video with a purpose-designed objective function, and the highest-scoring videos are presented to the user.
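The two-stage structure can be sketched without neural networks. In the hypothetical example below, stage one scores every video in the library with a cheap dot product between a user vector and a video vector and keeps only the top candidates; stage two applies a richer scoring function (a hand-weighted mix of relevance, freshness, and expected watch time) to those candidates alone. The feature names, weights, and data are assumptions; YouTube's actual system uses two deep neural networks trained on its own data.

```python
from dataclasses import dataclass


@dataclass
class Video:
    vid: str
    embedding: tuple       # hypothetical learned item vector
    freshness: float       # hypothetical feature in [0, 1]
    expected_watch: float  # hypothetical feature in [0, 1]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def generate_candidates(user_vec, library, n):
    """Stage 1: cheap relevance score over the whole library; keep the top n."""
    return sorted(library, key=lambda v: -dot(user_vec, v.embedding))[:n]


def rank(user_vec, candidates, top_k, weights=(0.6, 0.1, 0.3)):
    """Stage 2: richer (hypothetical) scoring applied only to the candidates."""
    w_rel, w_fresh, w_watch = weights

    def score(v):
        return (w_rel * dot(user_vec, v.embedding)
                + w_fresh * v.freshness
                + w_watch * v.expected_watch)

    return [v.vid for v in sorted(candidates, key=score, reverse=True)[:top_k]]


def recommend(user_vec, library, n_candidates=3, top_k=2):
    return rank(user_vec, generate_candidates(user_vec, library, n_candidates), top_k)


LIBRARY = [
    Video("v1", (1.0, 0.0), 0.1, 0.2),
    Video("v2", (0.9, 0.1), 0.9, 0.9),
    Video("v3", (0.8, 0.2), 0.5, 0.1),
    Video("v4", (0.0, 1.0), 1.0, 1.0),  # fresh and engaging, but irrelevant to this user
]
print(recommend((1.0, 0.0), LIBRARY))  # ['v2', 'v1']
```

Note that v4 never reaches the ranking stage despite its strong features, which is the point of the split: the expensive scorer only sees what the cheap retrieval step lets through.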
In this mode, machines have taken over the platform. Trained continuously through deep learning, they grow ever smarter, while people keep contending with the machines' intelligence and, in a sense, gradually take on the responsibility of a watchdog.
Second, the content industry is about to be upended by C2M
On the 11th, an automated teller machine (ATM) at a bank in Corpus Christi, Texas, USA, spat out a note reading "save me". The news spread quickly across Chinese-language networks and became headline news on many sites.
Do you really need to read exactly the same article on N different sites?
This redundant information consumes your energy and data traffic. It is like switching to any TV channel and seeing the same instant-noodle ads everywhere: amid so much information, it is hard to find what you want quickly.
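One common technique for collapsing such duplicates, offered here as an illustrative assumption rather than something the source describes, is to compare word shingles (overlapping k-word windows) using Jaccard similarity and keep only the first copy of any article that is too similar to one already kept. The threshold of 0.8, k = 3, and the sample articles are arbitrary choices for the sketch.

```python
def shingles(text: str, k: int = 3) -> set:
    """Set of k-word shingles (overlapping word windows) of a text."""
    words = text.lower().split()
    if len(words) < k:
        return {tuple(words)}
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a: set, b: set) -> float:
    """Size of the intersection divided by the size of the union."""
    return len(a & b) / len(a | b) if a | b else 1.0


def dedupe(articles, threshold=0.8):
    """articles: list of (source, text). Keeps the first copy of each near-duplicate group."""
    kept = []  # (source, text, shingle set)
    for source, text in articles:
        sh = shingles(text)
        if all(jaccard(sh, other) < threshold for _, _, other in kept):
            kept.append((source, text, sh))
    return [(source, text) for source, text, _ in kept]


story = "an ATM in Corpus Christi Texas spat out a note saying save me"
articles = [
    ("site_a", story),
    ("site_b", story),                  # verbatim copy
    ("site_c", story + " say police"),  # lightly edited copy
    ("site_d", "netflix prize awards one million dollars"),
]
print([source for source, _ in dedupe(articles)])  # ['site_a', 'site_d']
```

Pairwise comparison is quadratic in the number of articles; at web scale, MinHash signatures with locality-sensitive hashing are the usual way to approximate the same Jaccard test.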
How can the problem of redundant information be solved?
Many technical approaches have failed in the past: personal portals were short-lived, RSS subscriptions never caught on, and cross-site tracking proved unsustainable. Only C2M can lead the future.
The C2M model can be applied across the whole web, as Toutiao ("Today's Headlines") does, or within a single platform, as Facebook does. Its core is to extract and rank massive amounts of information based on users' behavioral habits, characteristics, and needs, and then deliver it to them; this is the secret to overcoming the pain point.
But there are many skeptical voices. Critics argue that recommendation approaches such as collaborative filtering easily trap users in information cocoons, cannot recognize reading contexts, have poor real-time performance, and are time-consuming, and models like Toutiao's are often criticized on these grounds. They also face many other challenges, such as the difficulty of capturing user interests and the privacy and management of user data.
Supporters and critics each hold their ground, so who is right? Although two major opportunities lie ahead, three mountains must be crossed first.
1. The reasons for support are as follows:
① A thousand faces for a thousand people, all adjustable
A personalized content recommendation mechanism can recommend information based on each user's preferences. Using a variety of algorithms, it analyzes the user's historical behavior and compares related users and related items to guess what content the user may like, then generates a candidate set and verifies it, so the user gets more accurate content. Information distribution becomes "a thousand faces for a thousand people", connecting content precisely with users, rather than the traditional "one face for a thousand people" style of delivery.
② Finding a needle in the sea, improving efficiency
Personalized recommendation spares users from extracting and searching through massive amounts of information themselves. Users no longer have to hunt for a needle in a sea of information: to a certain extent, useless information is removed for them, the scope of their information search is narrowed, and their reading efficiency improves.
③ Giving users what they like, increasing stickiness
Continually recommending content suited to each user increases user stickiness. Personalized recommendation algorithms precisely recommend content users are interested in and help them find it quickly; as soon as you finish reading something, related items are recommended to you, which increases stickiness and improves the user experience.
④ Digging into the long tail, breaking polarization
Personalized recommendation uses algorithms to help users discover long-tail content, avoiding the polarization caused by the Matthew effect. When user A prefers relatively popular long-tail content, and user B has the same or similar interests and behavioral habits as user A,