Chinese Sentiment Classification Model based on Pre-Trained BERT

2021-10-21 17:30:03 Views: 183 Source: Internet

Tags: Pre BERT based classification Chinese model sentiment

Abstract

To address the low accuracy, scarce training data, and poor training results of traditional machine learning algorithms on the Chinese sentiment classification task, this paper proposes a Chinese sentiment classification model based on pre-trained BERT.

After BERT is pre-trained on a large-scale unlabeled corpus, it can extract abstract text features for each individual Chinese character based on contextual semantic relationships. We then combine these features with the semantic vector of the whole sentence and feed them to a softmax classifier, forming a Chinese sentiment classification model that is then trained and fine-tuned.
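The abstract does not say how the per-character features and the sentence vector are combined, so the following is only a minimal sketch under stated assumptions: the character features are mean-pooled, concatenated with the sentence vector, and passed through a single linear layer followed by softmax. The function names, the toy dimensions, and the random inputs standing in for BERT outputs are all hypothetical and do not come from the original paper.

```python
import math
import random


def softmax(logits):
    """Turn raw scores into probabilities that sum to 1."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def mean_pool(char_features):
    """Average the per-character vectors into one vector (an assumed pooling choice)."""
    n = len(char_features)
    dim = len(char_features[0])
    return [sum(vec[i] for vec in char_features) / n for i in range(dim)]


def classify(char_features, sentence_vector, weights, bias):
    """Concatenate pooled character features with the sentence vector,
    apply one linear layer, and return class probabilities."""
    combined = mean_pool(char_features) + sentence_vector  # list concatenation
    logits = [
        sum(w * x for w, x in zip(row, combined)) + b
        for row, b in zip(weights, bias)
    ]
    return softmax(logits)


# Toy stand-ins for BERT outputs: 5 characters, hidden size 4, 2 classes.
random.seed(0)
hidden, num_classes = 4, 2
char_features = [[random.uniform(-1, 1) for _ in range(hidden)] for _ in range(5)]
sentence_vector = [random.uniform(-1, 1) for _ in range(hidden)]
weights = [[random.uniform(-0.1, 0.1) for _ in range(2 * hidden)]
           for _ in range(num_classes)]
bias = [0.0] * num_classes

probs = classify(char_features, sentence_vector, weights, bias)
print(probs)  # two probabilities, one per sentiment class
```

In an actual fine-tuning setup the linear layer would be trained together with BERT; here the weights are random and serve only to show the data flow.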

The model performs well on multiple open-source Chinese sentiment classification datasets, which indicates that it handles the Chinese sentiment classification task well.

Conclusions

Because the relevant research team has released the powerful pre-trained BERT, we tried to complete the Chinese sentiment classification task by fine-tuning it, and achieved good results. Compared with previous models, this model builds on the pre-trained BERT model, and it trains quickly and classifies accurately.

Experiments show that the model's evaluation metrics are all above 0.9 on several binary classification tasks, but its precision and F1 score on the multi-class sentiment classification task simplifyweibo_4_moods are both around 0.73.
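For readers unfamiliar with how precision and F1 extend to a multi-class task, here is a small sketch of per-class precision, recall, and F1, followed by macro averaging. The source does not state which averaging scheme was used, so macro averaging is an assumption, and the labels and predictions below are made-up toy data, not results from the paper.

```python
def precision_recall_f1(y_true, y_pred, label):
    """Precision, recall, and F1 for one class, treating it as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == label and p == label)
    fp = sum(1 for t, p in pairs if t != label and p == label)
    fn = sum(1 for t, p in pairs if t == label and p != label)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1


def macro_average(y_true, y_pred):
    """Unweighted mean of the per-class scores (assumed averaging scheme)."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = [precision_recall_f1(y_true, y_pred, lab) for lab in labels]
    n = len(labels)
    return tuple(sum(s[i] for s in scores) / n for i in range(3))


# Toy data with four hypothetical mood labels 0..3.
y_true = [0, 0, 1, 1, 2, 3]
y_pred = [0, 1, 1, 1, 2, 2]

macro_p, macro_r, macro_f1 = macro_average(y_true, y_pred)
print(f"precision={macro_p:.3f} recall={macro_r:.3f} f1={macro_f1:.3f}")
```

With macro averaging, a class the model never predicts correctly (label 3 in the toy data) pulls the overall score down sharply, which is one way a four-class task can score well below a binary one.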

Besides simplifyweibo_4_moods containing too many emoticons, another possible cause is that our model relies too heavily on pre-trained BERT, which limits the downstream-task features it learns. In future work, we can enrich the model by adding a bi-LSTM and other structures after pre-trained BERT, so that it has more parameters that must be trained from scratch.

Source: https://blog.csdn.net/qq_33790600/article/details/120890789
