如果你是一名自然语言处理从业者,那你一定听说过最近大火的 BERT 模型。本文是一份使用简化版的 BERT 模型——DisTillBERT 完成句子情感分类任务的详细教程,是一份不可多得的 BERT 快速入门指南。
sentence | label |
---|---|
a stirring , funny and finally transporting re imagining of beauty and the beast and 1930s horror films | 1 |
apparently reassembled from the cutting room floor of any given daytime soap | 0 |
they presume their audience won't sit still for a sociology lesson | 0 |
this is a visually stunning rumination on love , memory , history and the war between art and commerce | 1 |
jonathan parker 's bartleby should have been the be all end all of the modern office anomie films | 1 |
-
DistilBERT 先对句子进行处理,并将它提取到的信息传给下个模型。DistilBERT 是 BERT 的缩小版,它是由 HuggingFace 的一个团队开发并开源的。它是一种更轻量级并且运行速度更快的模型,同时基本达到了 BERT 原有的性能。
-
另外一个模型,是 scikit learn 中的 Logistic 回归模型,它会接收 DistilBERT 处理后的结果,并将句子分类为积极或消极(0 或 1)。
)
train_test_split
, header=None)
输出如下:
model = model_class.from_pretrained(pretrained_weights)
上面的代码将所有句子转化为了编号的列表。
这一步完成后,会将 DistilBERT 的输出赋给「last_hidden_states」。这是一个维度为(句子数,序列中的最大词数,DistilBERT 模型中的隐藏层数)的元组。在本例中,这个维度就是(2000,66,768),因为我们有 2000 个句子,2000 个句子中最长的序列长度为 66,DistilBERT 模型中有 768 个隐藏层。
现在,「features」是一个二维的 numpy 数组,它包含了我们数据集中所有句子的嵌入。
现在我们已经得到了 BERT 的输出,并且把训练 logistic 回归模型的数据集组装好了。这 768 列是特征,我们还给出了原始数据集中的标签。
train_features, test_features, train_labels, test_labels = train_test_split(features, labels)
现在模型已经训练好了,我们可以用测试集对其进行评估。
)
上面一段代码的输出告诉我们这个模型获得了 81% 的准确率。
作为参照,在这个数据集上所能达到的最高的准确率是 96.8。在这个任务中,我们同样可以训练 DistilBERT 来提高其得分,也就是通常所说的调优过程,该过程可以更新 BERT 的权重参数,并让其在句子分类任务中(通常被称为「下游任务」)获得更好的性能。经过调优后的 DistilBERT 可以达到 90.7 的准确率,而完整的 BERT 可以达到 94.9。
<section style="margin-right: 8px;margin-left: 8px;max-width: 100%;white-space: normal;color: rgb(0, 0, 0);font-family: -apple-system-font, system-ui, "Helvetica Neue", "PingFang SC", "Hiragino Sans GB", "Microsoft YaHei UI", "Microsoft YaHei", Arial, sans-serif;letter-spacing: 0.544px;text-align: center;widows: 1;line-height: 1.75em;box-sizing: border-box !important;overflow-wrap: break-word !important;"><strong style="max-width: 100%;box-sizing: border-box !important;overflow-wrap: break-word !important;"><span style="max-width: 100%;letter-spacing: 0.5px;font-size: 14px;box-sizing: border-box !important;overflow-wrap: break-word !important;"><strong style="max-width: 100%;font-size: 16px;letter-spacing: 0.544px;box-sizing: border-box !important;overflow-wrap: break-word !important;"><span style="max-width: 100%;letter-spacing: 0.5px;box-sizing: border-box !important;overflow-wrap: break-word !important;">—</span></strong>完<strong style="max-width: 100%;font-size: 16px;letter-spacing: 0.544px;box-sizing: border-box !important;overflow-wrap: break-word !important;"><span style="max-width: 100%;letter-spacing: 0.5px;font-size: 14px;box-sizing: border-box !important;overflow-wrap: break-word !important;"><strong style="max-width: 100%;font-size: 16px;letter-spacing: 0.544px;box-sizing: border-box !important;overflow-wrap: break-word !important;"><span style="max-width: 100%;letter-spacing: 0.5px;box-sizing: border-box !important;overflow-wrap: break-word !important;">—</span></strong></span></strong></span></strong></section><section style="max-width: 100%;white-space: normal;font-family: -apple-system-font, system-ui, "Helvetica Neue", "PingFang SC", "Hiragino Sans GB", "Microsoft YaHei UI", "Microsoft YaHei", Arial, sans-serif;letter-spacing: 0.544px;text-align: center;widows: 1;color: rgb(255, 97, 149);box-sizing: border-box !important;overflow-wrap: break-word !important;"><section powered-by="xiumi.us" style="max-width: 100%;box-sizing: border-box !important;overflow-wrap: break-word !important;"><section style="margin-top: 15px;margin-bottom: 25px;max-width: 100%;opacity: 0.8;box-sizing: border-box !important;overflow-wrap: break-word !important;"><section style="max-width: 100%;box-sizing: border-box !important;overflow-wrap: break-word !important;"><section style="max-width: 100%;letter-spacing: 0.544px;box-sizing: border-box !important;overflow-wrap: break-word !important;"><section powered-by="xiumi.us" style="max-width: 100%;box-sizing: border-box !important;overflow-wrap: break-word !important;"><section style="margin-top: 15px;margin-bottom: 25px;max-width: 100%;opacity: 0.8;box-sizing: border-box !important;overflow-wrap: break-word !important;"><section style="max-width: 100%;box-sizing: border-box !important;overflow-wrap: break-word !important;"><section style="margin-right: 8px;margin-bottom: 15px;margin-left: 8px;padding-right: 0em;padding-left: 0em;max-width: 100%;color: rgb(127, 127, 127);font-family: sans-serif;font-size: 12px;line-height: 25.5938px;letter-spacing: 3px;box-sizing: border-box !important;overflow-wrap: break-word !important;"><span style="max-width: 100%;color: rgb(0, 0, 0);box-sizing: border-box !important;overflow-wrap: break-word !important;"><strong style="max-width: 100%;box-sizing: border-box !important;overflow-wrap: break-word !important;"><span style="max-width: 100%;font-size: 16px;font-family: 微软雅黑;caret-color: red;box-sizing: border-box !important;overflow-wrap: break-word !important;">为您推荐</span></strong></span></section><p style="margin-right: 8px;margin-bottom: 5px;margin-left: 8px;padding-right: 0em;padding-left: 0em;max-width: 100%;min-height: 1em;color: rgb(127, 127, 127);font-family: sans-serif;font-size: 12px;line-height: 1.75em;letter-spacing: 0px;box-sizing: border-box !important;overflow-wrap: break-word !important;">猎户座α<span style="max-width: 100%;font-size: 14px;box-sizing: border-box !important;overflow-wrap: break-word !important;">:</span><span style="max-width: 100%;font-size: 14px;box-sizing: border-box !important;overflow-wrap: break-word !important;">AI玩转「吃鸡」游戏</span><br style="max-width: 100%;box-sizing: border-box !important;overflow-wrap: break-word !important;" /></p><p style="margin-right: 8px;margin-bottom: 5px;margin-left: 8px;padding-right: 0em;padding-left: 0em;max-width: 100%;min-height: 1em;color: rgb(127, 127, 127);font-family: sans-serif;font-size: 12px;line-height: 1.75em;letter-spacing: 0px;box-sizing: border-box !important;overflow-wrap: break-word !important;">网传饶毅举报多位学者论文造假?官方回应了<br style="max-width: 100%;box-sizing: border-box !important;overflow-wrap: break-word !important;" /></p><p style="margin-right: 8px;margin-bottom: 5px;margin-left: 8px;padding-right: 0em;padding-left: 0em;max-width: 100%;min-height: 1em;color: rgb(127, 127, 127);font-family: sans-serif;font-size: 12px;line-height: 1.75em;letter-spacing: 0px;box-sizing: border-box !important;overflow-wrap: break-word !important;">阿里如何抗住90秒100亿?看这篇你就明白了!<br style="max-width: 100%;box-sizing: border-box !important;overflow-wrap: break-word !important;" /></p><p style="margin-right: 8px;margin-bottom: 5px;margin-left: 8px;padding-right: 0em;padding-left: 0em;max-width: 100%;min-height: 1em;color: rgb(127, 127, 127);font-family: sans-serif;font-size: 12px;line-height: 1.75em;letter-spacing: 0px;box-sizing: border-box !important;overflow-wrap: break-word !important;">深度学习必懂的13种概率分布<br style="max-width: 100%;box-sizing: border-box !important;overflow-wrap: break-word !important;" /></p><p style="margin-right: 8px;margin-bottom: 5px;margin-left: 8px;padding-right: 0em;padding-left: 0em;max-width: 100%;min-height: 1em;color: rgb(127, 127, 127);font-family: sans-serif;font-size: 12px;line-height: 1.75em;letter-spacing: 0px;box-sizing: border-box !important;overflow-wrap: break-word !important;">担心美国政府限制,Github考虑在华设立子公司<br style="max-width: 100%;box-sizing: border-box !important;overflow-wrap: break-word !important;" /></p></section></section></section></section></section></section></section></section>
本篇文章来源于: 深度学习这件小事
本文为原创文章,版权归知行编程网所有,欢迎分享本文,转载请保留出处!
内容反馈