Author: Data Monster; Translator: Linstancy; Published by AI科技大本营 (ID: rgznai100)
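This article covers the basic steps of text preprocessing, which convert text from human language into a machine-readable format for further processing. It also discusses the tools used at each step.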
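Once you have a text, start with text normalization. Common normalization steps include:
- Converting all letters to lowercase or uppercase
- Converting numbers into words or removing them
- Removing punctuation, accent marks and other diacritics
- Removing white space
- Expanding abbreviations
- Removing stop words, sparse terms and particular words
- Text canonicalization
These steps are described in more detail below.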
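The list above mentions expanding abbreviations, which none of the examples below cover. As a minimal sketch, assuming a small hand-made lookup table (the `ABBREVIATIONS` dictionary and the `expand_abbreviations` helper are hypothetical, not part of any library), whole-token replacement with a regular expression could look like this:

```python
import re
from typing import Mapping

# Hypothetical, deliberately tiny lookup table; real projects use much larger lists.
ABBREVIATIONS = {
    "don't": "do not",
    "can't": "cannot",
    "it's": "it is",
    "u.s.": "united states",
}


def expand_abbreviations(text: str, mapping: Mapping[str, str] = ABBREVIATIONS) -> str:
    """Replace whole-token abbreviations (case-insensitively) with their expansions."""
    # Try longer keys first so that overlapping entries resolve predictably.
    keys = sorted(mapping, key=len, reverse=True)
    # (?<!\w) and (?!\w) stop matches inside longer words; \b would fail after "u.s."
    pattern = re.compile(
        r"(?<!\w)(" + "|".join(re.escape(k) for k in keys) + r")(?!\w)",
        re.IGNORECASE,
    )
    # Keys are stored in lowercase, so look up the lowercased match
    return pattern.sub(lambda m: mapping[m.group(0).lower()], text)


print(expand_abbreviations("I don't think it's far from the U.S. border"))
# -> I do not think it is far from the united states border
```

The expansions are lowercase, so a helper like this fits naturally right after the lowercasing step.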
Converting text to lowercase
Example 1: Converting text to lowercase
Python code:
<p style="font-size: 14px;letter-spacing: 0px;font-family: Consolas, Inconsolata, Courier, monospace;border-radius: 0px;color: rgb(169, 183, 198);background: rgb(40, 43, 46);padding: 0.5em;margin-left: 8px;margin-right: 8px;line-height: 1.75em;display: block !important;overflow-wrap: normal !important;word-break: normal !important;overflow: auto !important;text-align: justify;"><span style="letter-spacing: 1px;">input_str = "The 5 biggest countries by population in 2017 are China, India, United States, Indonesia, and Brazil."<br />input_str = input_str.lower()<br />print(input_str)</span></p>
Output:
<p style="font-size: 14px;letter-spacing: 0px;font-family: Consolas, Inconsolata, Courier, monospace;border-radius: 0px;color: rgb(169, 183, 198);background: rgb(40, 43, 46);padding: 0.5em;margin-left: 8px;margin-right: 8px;line-height: 1.75em;display: block !important;overflow-wrap: normal !important;word-break: normal !important;overflow: auto !important;text-align: justify;"><span style="letter-spacing: 1px;">the <span style="letter-spacing: 1px;font-size: inherit;line-height: inherit;color: rgb(174, 135, 250);overflow-wrap: inherit !important;word-break: inherit !important;">5</span> biggest countries <span style="letter-spacing: 1px;font-size: inherit;line-height: inherit;color: rgb(248, 35, 117);overflow-wrap: inherit !important;word-break: inherit !important;">by</span> population <span style="letter-spacing: 1px;font-size: inherit;line-height: inherit;color: rgb(248, 35, 117);overflow-wrap: inherit !important;word-break: inherit !important;">in</span> <span style="letter-spacing: 1px;font-size: inherit;line-height: inherit;color: rgb(174, 135, 250);overflow-wrap: inherit !important;word-break: inherit !important;">2017</span> are china, india, united states, indonesia, <span style="letter-spacing: 1px;font-size: inherit;line-height: inherit;color: rgb(248, 35, 117);overflow-wrap: inherit !important;word-break: inherit !important;">and</span> brazil.</span></p>
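For caseless matching beyond plain English, Python's built-in `str.casefold()` is more aggressive than `lower()`; for example, it maps the German "ß" to "ss". A small comparison:

```python
words = ["Straße", "STRASSE", "strasse"]

# lower() keeps "ß", so the first word still differs from the other two
lowered = {w.lower() for w in words}
# casefold() applies Unicode case folding, which maps "ß" to "ss"
folded = {w.casefold() for w in words}

print(len(lowered))  # 2 distinct values
print(folded)        # {'strasse'}
```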
Removing numbers
Remove numbers if they are not relevant to your analysis. Regular expressions are usually the simplest way to do this.
Example 2: Removing numbers
Python code:
<p style="padding: 0.5em;font-size: 14px;letter-spacing: 0px;font-family: Consolas, Inconsolata, Courier, monospace;border-radius: 0px;color: rgb(169, 183, 198);background: rgb(40, 43, 46);margin-left: 8px;margin-right: 8px;line-height: 1.75em;display: block !important;overflow-wrap: normal !important;word-break: normal !important;overflow: auto !important;text-align: justify;"><span style="letter-spacing: 1px;">import re<br />input_str = 'Box A contains 3 red and 5 white balls, while Box B contains 4 red and 2 blue balls.'<br /># \d+ matches one or more consecutive digits<br />result = re.sub(r'\d+', '', input_str)<br />print(result)<br /></span></p>
Output:
<p style="padding: 0.5em;font-size: 14px;letter-spacing: 0px;font-family: Consolas, Inconsolata, Courier, monospace;border-radius: 0px;color: rgb(169, 183, 198);background: rgb(40, 43, 46);margin-left: 8px;margin-right: 8px;line-height: 1.75em;display: block !important;overflow-wrap: normal !important;word-break: normal !important;overflow: auto !important;text-align: justify;"><span style="letter-spacing: 1px;">Box A contains red <span style="letter-spacing: 1px;font-size: inherit;line-height: inherit;color: rgb(248, 35, 117);overflow-wrap: inherit !important;word-break: inherit !important;">and</span> white balls, <span style="letter-spacing: 1px;font-size: inherit;line-height: inherit;color: rgb(248, 35, 117);overflow-wrap: inherit !important;word-break: inherit !important;">while</span> Box B contains red <span style="letter-spacing: 1px;font-size: inherit;line-height: inherit;color: rgb(248, 35, 117);overflow-wrap: inherit !important;word-break: inherit !important;">and</span> blue balls.<br /></span></p>
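Deleting only the digit runs leaves the spaces around them behind, so "contains 3 red" becomes "contains  red" with two spaces. A minimal variant, assuming numbers should only be dropped when they stand alone as tokens (the `remove_numbers` helper is hypothetical), also consumes the whitespace in front of each number:

```python
import re


def remove_numbers(text: str) -> str:
    """Drop standalone numbers together with the whitespace in front of them."""
    # \b...\b keeps digits that are part of a word, e.g. "mp3" or "B2B"
    return re.sub(r"\s*\b\d+\b", "", text).strip()


s = "Box A contains 3 red and 5 white balls, while Box B contains 4 red and 2 blue balls."
print(remove_numbers(s))
# -> Box A contains red and white balls, while Box B contains red and blue balls.
```

Decimals such as "3.5" would need a wider pattern than `\d+`.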
Removing punctuation
The following example removes the punctuation characters in string.punctuation, i.e. [!"#$%&'()*+,-./:;<=>?@[\]^_`{|}~].
Example 3: Removing punctuation
Python code:
<p style="padding: 0.5em;font-size: 14px;letter-spacing: 0px;font-family: Consolas, Inconsolata, Courier, monospace;border-radius: 0px;color: rgb(169, 183, 198);background: rgb(40, 43, 46);margin-left: 8px;margin-right: 8px;line-height: 1.75em;display: block !important;overflow-wrap: normal !important;word-break: normal !important;overflow: auto !important;text-align: justify;"><span style="letter-spacing: 1px;">import string<br />input_str = "This &amp;is [an] example? {of} string. with.? punctuation!!!!" # Sample string<br /># str.maketrans('', '', chars) builds a table that deletes every character in chars<br />result = input_str.translate(str.maketrans('', '', string.punctuation))<br />print(result)<br /></span></p>
Output:
<p style="padding: 0.5em;font-size: 14px;letter-spacing: 0px;font-family: Consolas, Inconsolata, Courier, monospace;border-radius: 0px;color: rgb(169, 183, 198);background: rgb(40, 43, 46);margin-left: 8px;margin-right: 8px;line-height: 1.75em;display: block !important;overflow-wrap: normal !important;word-break: normal !important;overflow: auto !important;text-align: justify;"><span style="letter-spacing: 1px;">This <span style="letter-spacing: 1px;font-size: inherit;line-height: inherit;color: rgb(248, 35, 117);overflow-wrap: inherit !important;word-break: inherit !important;">is</span> an example <span style="letter-spacing: 1px;font-size: inherit;line-height: inherit;color: rgb(248, 35, 117);overflow-wrap: inherit !important;word-break: inherit !important;">of</span> <span style="letter-spacing: 1px;font-size: inherit;line-height: inherit;color: rgb(248, 35, 117);overflow-wrap: inherit !important;word-break: inherit !important;">string</span> <span style="letter-spacing: 1px;font-size: inherit;line-height: inherit;color: rgb(248, 35, 117);overflow-wrap: inherit !important;word-break: inherit !important;">with</span> punctuation</span></p>
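Example 3 only deletes ASCII punctuation, while the normalization list at the top also mentions accent marks and other diacritics. One common standard-library approach, sketched here with a hypothetical `strip_accents` helper, is to decompose characters with Unicode NFKD normalization and drop the combining marks:

```python
import unicodedata


def strip_accents(text: str) -> str:
    """Remove diacritics by decomposing characters and dropping combining marks."""
    # NFKD splits "é" into "e" followed by a combining acute accent (U+0301)
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


print(strip_accents("café naïve résumé"))  # -> cafe naive resume
```

NFKD also applies compatibility mappings (the ligature "ﬁ" becomes "fi"), and letters such as "ø" or "ł" have no decomposition, so they pass through unchanged.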
Removing whitespace
The strip() method removes leading and trailing whitespace from a string.
Example 4: Removing whitespace
Python code:
<p style="padding: 0.5em;font-size: 14px;letter-spacing: 0px;font-family: Consolas, Inconsolata, Courier, monospace;border-radius: 0px;color: rgb(169, 183, 198);background: rgb(40, 43, 46);margin-left: 8px;margin-right: 8px;line-height: 1.75em;display: block !important;overflow-wrap: normal !important;word-break: normal !important;overflow: auto !important;text-align: justify;"><span style="letter-spacing: 1px;">input_str = " \t a string example\t "<br />input_str = input_str.strip()<br />print(repr(input_str))<br /></span></p>
Output:
<p style="padding: 0.5em;font-size: 14px;letter-spacing: 0px;font-family: Consolas, Inconsolata, Courier, monospace;border-radius: 0px;color: rgb(169, 183, 198);background: rgb(40, 43, 46);margin-left: 8px;margin-right: 8px;line-height: 1.75em;display: block !important;overflow-wrap: normal !important;word-break: normal !important;overflow: auto !important;text-align: justify;"><span style="letter-spacing: 1px;">'a string example'</span></p>
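`strip()` only touches the two ends of a string. To also collapse runs of spaces, tabs and newlines inside the text into single spaces, one idiomatic option is to split on whitespace and re-join; `normalize_whitespace` is an illustrative name, not a library function:

```python
def normalize_whitespace(text: str) -> str:
    """Trim both ends and collapse internal whitespace runs to one space."""
    # str.split() with no argument splits on any whitespace run and drops empty strings
    return " ".join(text.split())


print(repr(normalize_whitespace(" \t a  string\n example\t ")))  # -> 'a string example'
```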
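Tokenization
Tokenization is the process of splitting a given text into smaller pieces called tokens. Words, numbers, punctuation marks and other symbols can all be tokens. The table below (Tokenization sheet) lists some common tools for tokenization.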
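To show what a tokenizer does without relying on a particular tool, here is a minimal regular-expression tokenizer. It is a simplification (the `simple_tokenize` name is hypothetical): it keeps runs of word characters together and emits every other non-space character as its own token, so it does not handle cases such as "don't" or "U.S." the way NLTK's `word_tokenize` does.

```python
import re
from typing import List

# \w+ : a run of letters, digits or underscores; [^\w\s] : any single other non-space symbol
TOKEN_PATTERN = re.compile(r"\w+|[^\w\s]")


def simple_tokenize(text: str) -> List[str]:
    return TOKEN_PATTERN.findall(text)


print(simple_tokenize("Hello, world! It costs $5."))
# -> ['Hello', ',', 'world', '!', 'It', 'costs', '$', '5', '.']
```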
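Removing stop words
Stop words are the most common words in a language, such as "the", "a", "on", "is" and "all". They usually carry little meaning on their own and can be removed from the text. A common choice for this is the Natural Language Toolkit (NLTK), an open-source suite of libraries for symbolic and statistical natural language processing.
Example 7: Removing stop words
Code: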
<p style="padding: 0.5em;font-size: 14px;letter-spacing: 0px;font-family: Consolas, Inconsolata, Courier, monospace;border-radius: 0px;color: rgb(169, 183, 198);background: rgb(40, 43, 46);margin-left: 8px;margin-right: 8px;line-height: 1.75em;display: block !important;overflow-wrap: normal !important;word-break: normal !important;overflow: auto !important;text-align: justify;"><span style="letter-spacing: 1px;">from nltk.corpus import stopwords<br />from nltk.tokenize import word_tokenize<br /># Needs NLTK data: the "stopwords" corpus and the punkt tokenizer models (nltk.download)<br />input_str = "NLTK is a leading platform for building Python programs to work with human language data."<br />stop_words = set(stopwords.words('english'))<br />tokens = word_tokenize(input_str)<br />result = [i for i in tokens if i not in stop_words]<br />print(result)<br /></span></p>
Output:
<p style="padding: 0.5em;font-size: 14px;letter-spacing: 0px;font-family: Consolas, Inconsolata, Courier, monospace;border-radius: 0px;color: rgb(169, 183, 198);background: rgb(40, 43, 46);margin-left: 8px;margin-right: 8px;line-height: 1.75em;display: block !important;overflow-wrap: normal !important;word-break: normal !important;overflow: auto !important;text-align: justify;"><span style="letter-spacing: 1px;">['NLTK', 'leading', 'platform', 'building', 'Python', 'programs', 'work', 'human', 'language', 'data', '.']</span></p>
scikit-learn also provides a built-in English stop word list:
<p style="padding: 0.5em;font-size: 14px;letter-spacing: 0px;font-family: Consolas, Inconsolata, Courier, monospace;border-radius: 0px;color: rgb(169, 183, 198);background: rgb(40, 43, 46);margin-left: 8px;margin-right: 8px;line-height: 1.75em;display: block !important;overflow-wrap: normal !important;word-break: normal !important;overflow: auto !important;text-align: justify;"><span style="letter-spacing: 1px;">from sklearn.feature_extraction.text import ENGLISH_STOP_WORDS<br /></span></p>
spaCy provides a similar list:
<p style="padding: 0.5em;font-size: 14px;letter-spacing: 0px;font-family: Consolas, Inconsolata, Courier, monospace;border-radius: 0px;color: rgb(169, 183, 198);background: rgb(40, 43, 46);margin-left: 8px;margin-right: 8px;line-height: 1.75em;display: block !important;overflow-wrap: normal !important;word-break: normal !important;overflow: auto !important;text-align: justify;"><span style="letter-spacing: 1px;"><span style="letter-spacing: 1px;font-size: inherit;line-height: inherit;color: rgb(248, 35, 117);overflow-wrap: inherit !important;word-break: inherit !important;">from</span> <span style="letter-spacing: 1px;font-size: inherit;line-height: inherit;color: rgb(248, 35, 117);overflow-wrap: inherit !important;word-break: inherit !important;">spacy</span><span style="letter-spacing: 1px;font-size: inherit;line-height: inherit;color: rgb(165, 218, 45);overflow-wrap: inherit !important;word-break: inherit !important;">.lang.en.stop_words</span> <span style="letter-spacing: 1px;font-size: inherit;line-height: inherit;color: rgb(248, 35, 117);overflow-wrap: inherit !important;word-break: inherit !important;">import</span> <span style="letter-spacing: 1px;font-size: inherit;line-height: inherit;color: rgb(248, 35, 117);overflow-wrap: inherit !important;word-break: inherit !important;">STOP_WORDS</span><br /></span></p>
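Removing sparse terms and particular words
In some cases it is necessary to remove sparse terms or particular words from the text. Because any set of words can serve as a stop word list, the same stop word removal tools can handle this task.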
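As a minimal sketch of that idea, assuming the texts are already tokenized and that "sparse" means "appears fewer than `min_count` times in the whole corpus" (both the threshold and the `remove_sparse_terms` helper are illustrative choices), rare tokens can be collected into a custom stop list and filtered out together with any particular words you name:

```python
from collections import Counter
from typing import Iterable, List


def remove_sparse_terms(
    docs: List[List[str]],
    min_count: int = 2,
    extra_stop_words: Iterable[str] = (),
) -> List[List[str]]:
    """Drop tokens seen fewer than min_count times in the corpus, plus any extra words."""
    counts = Counter(tok for doc in docs for tok in doc)
    # Rare tokens become part of a custom stop word set
    stop = {tok for tok, c in counts.items() if c < min_count}
    stop.update(extra_stop_words)
    return [[tok for tok in doc if tok not in stop] for doc in docs]


docs = [["nlp", "is", "fun"], ["nlp", "is", "hard"], ["nlp", "rocks"]]
print(remove_sparse_terms(docs))                           # [['nlp', 'is'], ['nlp', 'is'], ['nlp']]
print(remove_sparse_terms(docs, extra_stop_words=["is"]))  # [['nlp'], ['nlp'], ['nlp']]
```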
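Stemming
Stemming is the process of reducing words to their word stem, base or root form (for example, books → book, looked → look). The two most common algorithms are the Porter stemming algorithm, which removes common morphological and inflectional endings from words, and the Lancaster stemming algorithm.
Example 8: Stemming with NLTK
Code: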
<p style="padding: 0.5em;font-size: 14px;letter-spacing: 0px;font-family: Consolas, Inconsolata, Courier, monospace;border-radius: 0px;color: rgb(169, 183, 198);background: rgb(40, 43, 46);margin-left: 8px;margin-right: 8px;line-height: 1.75em;display: block !important;overflow-wrap: normal !important;word-break: normal !important;overflow: auto !important;text-align: justify;"><span style="letter-spacing: 1px;">from nltk.stem import PorterStemmer<br />from nltk.tokenize import word_tokenize<br /># Needs NLTK's punkt tokenizer data: nltk.download("punkt") (or "punkt_tab" in newer NLTK)<br />stemmer = PorterStemmer()<br />input_str = "There are several types of stemming algorithms."<br />tokens = word_tokenize(input_str)<br />for word in tokens:<br />&nbsp;&nbsp;&nbsp;&nbsp;print(stemmer.stem(word))<br /></span></p>
Output:
<p style="padding: 0.5em;font-size: 14px;letter-spacing: 0px;font-family: Consolas, Inconsolata, Courier, monospace;border-radius: 0px;color: rgb(169, 183, 198);background: rgb(40, 43, 46);margin-left: 8px;margin-right: 8px;line-height: 1.75em;display: block !important;overflow-wrap: normal !important;word-break: normal !important;overflow: auto !important;text-align: justify;"><span style="letter-spacing: 1px;">There are sever <span style="letter-spacing: 1px;font-size: inherit;line-height: inherit;color: rgb(248, 35, 117);overflow-wrap: inherit !important;word-break: inherit !important;">type</span> <span style="letter-spacing: 1px;font-size: inherit;line-height: inherit;color: rgb(248, 35, 117);overflow-wrap: inherit !important;word-break: inherit !important;">of</span> stem algorithm.<br /></span></p>
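The core idea of suffix-stripping stemmers can be illustrated with a deliberately crude toy. This is not the Porter algorithm, which applies several ordered rule steps with conditions on the remaining stem; the `toy_stem` function and its suffix list are hypothetical and only show the general mechanism of chopping endings without checking whether the result is a real word.

```python
# Hypothetical suffix list, ordered longest first.
SUFFIXES = ("ing", "ed", "es", "s")


def toy_stem(word: str, min_stem: int = 3) -> str:
    """Strip the first matching suffix if at least min_stem characters remain."""
    w = word.lower()
    for suffix in SUFFIXES:
        if w.endswith(suffix) and len(w) - len(suffix) >= min_stem:
            return w[: -len(suffix)]
    return w


print([toy_stem(w) for w in ["books", "looked", "stemming", "boxes", "is"]])
# -> ['book', 'look', 'stemm', 'box', 'is']
```

The output shows the crudeness: "stemming" becomes "stemm", whereas Porter's rules also undo the doubled consonant.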
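Lemmatization
Like stemming, lemmatization aims to reduce the inflected forms of a word to a common base form. Unlike stemming, however, it does not simply chop off or alter word endings; it uses a lexical knowledge base to find the correct base form (the lemma).
Common lemmatization libraries include NLTK (WordNet Lemmatizer), spaCy, TextBlob, Pattern, gensim, Stanford CoreNLP, Memory-Based Shallow Parser (MBSP), Apache OpenNLP, Apache Lucene, General Architecture for Text Engineering (GATE), Illinois Lemmatizer and DKPro Core.
Example 9: Lemmatization with NLTK
Code: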
<p style="padding: 0.5em;font-size: 14px;letter-spacing: 0px;font-family: Consolas, Inconsolata, Courier, monospace;border-radius: 0px;color: rgb(169, 183, 198);background: rgb(40, 43, 46);margin-left: 8px;margin-right: 8px;line-height: 1.75em;display: block !important;overflow-wrap: normal !important;word-break: normal !important;overflow: auto !important;text-align: justify;"><span style="letter-spacing: 1px;">from nltk.stem import WordNetLemmatizer<br />from nltk.tokenize import word_tokenize<br /># Needs NLTK data: the "wordnet" corpus and the punkt tokenizer models (nltk.download)<br />lemmatizer = WordNetLemmatizer()<br />input_str = "been had done languages cities mice"<br />tokens = word_tokenize(input_str)<br /># lemmatize() treats a word as a noun unless pos is given,<br /># so the three verbs need pos="v" to become "be", "have", "do"<br />pos_tags = ["v", "v", "v", "n", "n", "n"]<br />for word, pos in zip(tokens, pos_tags):<br />&nbsp;&nbsp;&nbsp;&nbsp;print(lemmatizer.lemmatize(word, pos=pos))<br /></span></p>
Output:
<p style="padding: 0.5em;font-size: 14px;letter-spacing: 0px;font-family: Consolas, Inconsolata, Courier, monospace;border-radius: 0px;color: rgb(169, 183, 198);background: rgb(40, 43, 46);margin-left: 8px;margin-right: 8px;line-height: 1.75em;display: block !important;overflow-wrap: normal !important;word-break: normal !important;overflow: auto !important;text-align: justify;"><span style="letter-spacing: 1px;">be have <span style="letter-spacing: 1px;font-size: inherit;line-height: inherit;color: rgb(248, 35, 117);overflow-wrap: inherit !important;word-break: inherit !important;">do</span> <span style="letter-spacing: 1px;font-size: inherit;line-height: inherit;color: rgb(248, 35, 117);overflow-wrap: inherit !important;word-break: inherit !important;">language</span> city mouse<br /></span></p>
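The difference from stemming is the lookup step. A minimal sketch, assuming a tiny hand-written exception table in place of a real lexical database such as WordNet (`LEMMA_EXCEPTIONS` and `toy_lemmatize` are hypothetical), first consults the dictionary for irregular forms and only then falls back to a couple of plural rules:

```python
# Hypothetical exception table standing in for a real lexicon such as WordNet
LEMMA_EXCEPTIONS = {"been": "be", "had": "have", "done": "do", "mice": "mouse"}


def toy_lemmatize(word: str) -> str:
    w = word.lower()
    # 1. Irregular forms come from the lexicon
    if w in LEMMA_EXCEPTIONS:
        return LEMMA_EXCEPTIONS[w]
    # 2. Regular plural rules as a fallback
    if w.endswith("ies") and len(w) > 4:
        return w[:-3] + "y"
    if w.endswith("s") and not w.endswith("ss") and len(w) > 3:
        return w[:-1]
    return w


print([toy_lemmatize(w) for w in "been had done languages cities mice".split()])
# -> ['be', 'have', 'do', 'language', 'city', 'mouse']
```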
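Part-of-speech (POS) tagging
POS tagging assigns a part of speech (noun, verb, adjective, and so on) to each word in a text, based on both the word's definition and its context. Many tools include POS taggers, among them NLTK, spaCy, TextBlob, Pattern, Stanford CoreNLP, Memory-Based Shallow Parser (MBSP), Apache OpenNLP, Apache Lucene, General Architecture for Text Engineering (GATE), FreeLing, Illinois Part of Speech Tagger and DKPro Core.
Example 10: POS tagging with TextBlob
Code: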
<p style="padding: 0.5em;font-size: 14px;letter-spacing: 0px;font-family: Consolas, Inconsolata, Courier, monospace;border-radius: 0px;color: rgb(169, 183, 198);background: rgb(40, 43, 46);margin-left: 8px;margin-right: 8px;line-height: 1.75em;display: block !important;overflow-wrap: normal !important;word-break: normal !important;overflow: auto !important;text-align: justify;"><span style="letter-spacing: 1px;">from textblob import TextBlob<br /># Needs TextBlob's corpora: python -m textblob.download_corpora<br />input_str = "Parts of speech examples: an article, to write, interesting, easily, and, of"<br />result = TextBlob(input_str)<br />print(result.tags)<br /></span></p>
Output:
<p style="padding: 0.5em;font-size: 14px;letter-spacing: 0px;font-family: Consolas, Inconsolata, Courier, monospace;border-radius: 0px;color: rgb(169, 183, 198);background: rgb(40, 43, 46);margin-left: 8px;margin-right: 8px;line-height: 1.75em;display: block !important;overflow-wrap: normal !important;word-break: normal !important;overflow: auto !important;text-align: justify;"><span style="letter-spacing: 1px;">[('Parts', 'NNS'), ('of', 'IN'), ('speech', 'NN'), ('examples', 'NNS'), ('an', 'DT'), ('article', 'NN'), ('to', 'TO'), ('write', 'VB'), ('interesting', 'VBG'), ('easily', 'RB'), ('and', 'CC'), ('of', 'IN')]<br /></span></p>
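Taggers such as TextBlob's and NLTK's emit Penn Treebank tags, while NLTK's WordNetLemmatizer expects WordNet's single-letter POS codes ('n', 'v', 'a', 'r') and treats every word as a noun when none is given. A common bridge, sketched below as a hypothetical `penn_to_wordnet` helper, maps each tag by its first letter:

```python
from typing import Optional

# First letter of a Penn Treebank tag -> WordNet POS code
_PENN_PREFIX_TO_WORDNET = {"J": "a", "V": "v", "N": "n", "R": "r"}


def penn_to_wordnet(tag: str) -> Optional[str]:
    """Return 'a', 'v', 'n' or 'r' for adjective/verb/noun/adverb tags, else None."""
    return _PENN_PREFIX_TO_WORDNET.get(tag[:1])


tags = [("write", "VB"), ("interesting", "VBG"), ("easily", "RB"), ("article", "NN"), ("of", "IN")]
print([(w, penn_to_wordnet(t)) for w, t in tags])
# -> [('write', 'v'), ('interesting', 'v'), ('easily', 'r'), ('article', 'n'), ('of', None)]
```

With NLTK installed, a non-None result can be passed as `lemmatizer.lemmatize(word, pos=...)`, keeping the default noun reading when the helper returns None.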
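Chunking (shallow parsing)
Chunking is a natural language process that identifies the constituent parts of a sentence (nouns, verbs, adjectives, etc.) and links them to higher-order units with discrete grammatical meanings (noun groups or phrases, verb groups, etc.). Common chunking tools include NLTK, TreeTagger chunker, Apache OpenNLP, General Architecture for Text Engineering (GATE) and FreeLing.
Example 11: Chunking with NLTK
The first step is to determine the part of speech of each word.
Code: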
<p style="padding: 0.5em;font-size: 14px;letter-spacing: 0px;font-family: Consolas, Inconsolata, Courier, monospace;border-radius: 0px;color: rgb(169, 183, 198);background: rgb(40, 43, 46);margin-left: 8px;margin-right: 8px;line-height: 1.75em;display: block !important;overflow-wrap: normal !important;word-break: normal !important;overflow: auto !important;text-align: justify;"><span style="letter-spacing: 1px;">from textblob import TextBlob<br />input_str = "A black television and a white stove were bought for the new apartment of John."<br />result = TextBlob(input_str)<br />print(result.tags)<br /></span></p>
Output:
<p style="padding: 0.5em;font-size: 14px;letter-spacing: 0px;font-family: Consolas, Inconsolata, Courier, monospace;border-radius: 0px;color: rgb(169, 183, 198);background: rgb(40, 43, 46);margin-left: 8px;margin-right: 8px;line-height: 1.75em;display: block !important;overflow-wrap: normal !important;word-break: normal !important;overflow: auto !important;text-align: justify;"><span style="letter-spacing: 1px;">[('A', 'DT'), ('black', 'JJ'), ('television', 'NN'), ('and', 'CC'), ('a', 'DT'), ('white', 'JJ'), ('stove', 'NN'), ('were', 'VBD'), ('bought', 'VBN'), ('for', 'IN'), ('the', 'DT'), ('new', 'JJ'), ('apartment', 'NN'), ('of', 'IN'), ('John', 'NNP')]<br /></span></p>
The second step is the chunking itself.
Code:
<p style="font-size: 14px;letter-spacing: 0px;font-family: Consolas, Inconsolata, Courier, monospace;border-radius: 0px;color: rgb(169, 183, 198);background: rgb(40, 43, 46);padding: 0.5em;margin-left: 8px;margin-right: 8px;line-height: 1.75em;display: block !important;overflow-wrap: normal !important;word-break: normal !important;overflow: auto !important;text-align: justify;"><span style="letter-spacing: 1px;">import nltk<br /># NP = optional determiner (DT), any number of adjectives (JJ), then a noun (NN)<br />reg_exp = "NP: {&lt;DT&gt;?&lt;JJ&gt;*&lt;NN&gt;}"<br />rp = nltk.RegexpParser(reg_exp)<br />result = rp.parse(result.tags)  # result is the TextBlob from the first step<br />print(result)<br /></span></p>
Output:
<p style="font-size: 14px;letter-spacing: 0px;font-family: Consolas, Inconsolata, Courier, monospace;border-radius: 0px;color: rgb(169, 183, 198);background: rgb(40, 43, 46);padding: 0.5em;margin-left: 8px;margin-right: 8px;line-height: 1.75em;display: block !important;overflow-wrap: normal !important;word-break: normal !important;overflow: auto !important;text-align: justify;"><span style="letter-spacing: 1px;">(S (NP A/DT black/JJ television/NN) <span style="letter-spacing: 1px;font-size: inherit;line-height: inherit;color: rgb(248, 35, 117);overflow-wrap: inherit !important;word-break: inherit !important;">and</span>/CC (NP a/DT white/JJ stove/NN) were/VBD bought/VBN <span style="letter-spacing: 1px;font-size: inherit;line-height: inherit;color: rgb(248, 35, 117);overflow-wrap: inherit !important;word-break: inherit !important;">for</span>/<span style="letter-spacing: 1px;font-size: inherit;line-height: inherit;color: rgb(248, 35, 117);overflow-wrap: inherit !important;word-break: inherit !important;">IN</span> (NP the/DT <span style="letter-spacing: 1px;font-size: inherit;line-height: inherit;color: rgb(248, 35, 117);overflow-wrap: inherit !important;word-break: inherit !important;">new</span>/JJ apartment/NN)<br /><span style="letter-spacing: 1px;font-size: inherit;line-height: inherit;color: rgb(248, 35, 117);overflow-wrap: inherit !important;word-break: inherit !important;">of</span>/<span style="letter-spacing: 1px;font-size: inherit;line-height: inherit;color: rgb(248, 35, 117);overflow-wrap: inherit !important;word-break: inherit !important;">IN</span> John/NNP)<br /></span></p>
You can also draw the sentence's tree structure with result.draw(), as shown in the figure below.
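To make the grammar `NP: {<DT>?<JJ>*<NN>}` concrete, the hand-written scanner below applies the same pattern to the tagged tokens of the example sentence. It is an illustrative re-implementation (`np_chunks` is a hypothetical name), not how `nltk.RegexpParser` works internally: scanning left to right, it greedily matches an optional determiner, any adjectives and one singular noun.

```python
from typing import List, Tuple


def np_chunks(tagged: List[Tuple[str, str]]) -> List[str]:
    """Return the word spans matching DT? JJ* NN, scanning left to right."""
    chunks = []
    i, n = 0, len(tagged)
    while i < n:
        j = i
        if tagged[j][1] == "DT":                # optional determiner
            j += 1
        while j < n and tagged[j][1] == "JJ":   # any number of adjectives
            j += 1
        if j < n and tagged[j][1] == "NN":      # exactly one singular noun closes the chunk
            chunks.append(" ".join(word for word, _ in tagged[i:j + 1]))
            i = j + 1
        else:
            i += 1
    return chunks


tagged = [("A", "DT"), ("black", "JJ"), ("television", "NN"), ("and", "CC"),
          ("a", "DT"), ("white", "JJ"), ("stove", "NN"), ("were", "VBD"),
          ("bought", "VBN"), ("for", "IN"), ("the", "DT"), ("new", "JJ"),
          ("apartment", "NN"), ("of", "IN"), ("John", "NNP")]
print(np_chunks(tagged))
# -> ['A black television', 'a white stove', 'the new apartment']
```

"John/NNP" is left out because the grammar only accepts the tag NN, which matches the NLTK parse tree shown above.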
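Named Entity Recognition
Named entity recognition (NER) aims to find named entities in text and classify them into predefined categories (persons, locations, organizations, times, etc.).
Common NER tools are listed in the table below, including NLTK, spaCy, General Architecture for Text Engineering (GATE) with ANNIE, Apache OpenNLP, Stanford CoreNLP, DKPro Core, MITIE, Watson NLP, TextRazor and FreeLing.
Example 12: Named entity recognition with NLTK
Code: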
<p style="font-size: 14px;letter-spacing: 0px;font-family: Consolas, Inconsolata, Courier, monospace;border-radius: 0px;color: rgb(169, 183, 198);background: rgb(40, 43, 46);padding: 0.5em;margin-left: 8px;margin-right: 8px;line-height: 1.75em;display: block !important;overflow-wrap: normal !important;word-break: normal !important;overflow: auto !important;text-align: justify;"><span style="letter-spacing: 1px;">from nltk import word_tokenize, pos_tag, ne_chunk<br /># Needs NLTK data: punkt, averaged_perceptron_tagger, maxent_ne_chunker and words<br />input_str = "Bill works for Apple so he went to Boston for a conference."<br />print(ne_chunk(pos_tag(word_tokenize(input_str))))<br /></span></p>
输出:
<p style="font-size: 14px;letter-spacing: 0px;font-family: Consolas, Inconsolata, Courier, monospace;border-radius: 0px;color: rgb(169, 183, 198);background: rgb(40, 43, 46);padding: 0.5em;margin-left: 8px;margin-right: 8px;line-height: 1.75em;display: block !important;overflow-wrap: normal !important;word-break: normal !important;overflow: auto !important;text-align: justify;"><span style="letter-spacing: 1px;">(S (PERSON Bill/NNP) works/VBZ <span style="letter-spacing: 1px;font-size: inherit;line-height: inherit;color: rgb(248, 35, 117);overflow-wrap: inherit !important;word-break: inherit !important;">for</span>/<span style="letter-spacing: 1px;font-size: inherit;line-height: inherit;color: rgb(248, 35, 117);overflow-wrap: inherit !important;word-break: inherit !important;">IN</span> Apple/NNP so/<span style="letter-spacing: 1px;font-size: inherit;line-height: inherit;color: rgb(248, 35, 117);overflow-wrap: inherit !important;word-break: inherit !important;">IN</span> he/PRP went/VBD <span style="letter-spacing: 1px;font-size: inherit;line-height: inherit;color: rgb(248, 35, 117);overflow-wrap: inherit !important;word-break: inherit !important;">to</span>/<span style="letter-spacing: 1px;font-size: inherit;line-height: inherit;color: rgb(248, 35, 117);overflow-wrap: inherit !important;word-break: inherit !important;">TO</span> (GPE Boston/NNP) <span style="letter-spacing: 1px;font-size: inherit;line-height: inherit;color: rgb(248, 35, 117);overflow-wrap: inherit !important;word-break: inherit !important;">for</span>/<span style="letter-spacing: 1px;font-size: inherit;line-height: inherit;color: rgb(248, 35, 117);overflow-wrap: inherit !important;word-break: inherit !important;">IN</span> a/DT conference/NN ./.)<br /></span></p>
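The trained chunker behind `ne_chunk` is only one way to do NER. For contrast, the sketch below shows the simplest approach, dictionary (gazetteer) lookup, using only the standard library. The `GAZETTEER` entries and the `gazetteer_ner` helper are hypothetical illustrations, not part of NLTK; the labels simply mirror the ones in the output above.

```python
import re

# Hypothetical hand-built gazetteer: surface string -> entity type.
# Labels mirror NLTK's ne_chunk output (PERSON, ORGANIZATION, GPE).
GAZETTEER = {
    "Bill": "PERSON",
    "Apple": "ORGANIZATION",
    "Boston": "GPE",
    "United States": "GPE",
}


def gazetteer_ner(text, gazetteer):
    """Return (entity, label, start_offset) triples for gazetteer names found in text."""
    # Longer names go first so "United States" wins over any shorter overlapping entry.
    names = sorted(gazetteer, key=len, reverse=True)
    # \b anchors stop "Bill" from matching inside "Billing".
    pattern = re.compile(r"\b(" + "|".join(map(re.escape, names)) + r")\b")
    return [(m.group(1), gazetteer[m.group(1)], m.start()) for m in pattern.finditer(text)]


sentence = "Bill works for Apple so he went to Boston for a conference."
print(gazetteer_ner(sentence, GAZETTEER))
# [('Bill', 'PERSON', 0), ('Apple', 'ORGANIZATION', 15), ('Boston', 'GPE', 35)]
```

A gazetteer is precise for names it already knows, but it cannot recognize names missing from the list or resolve ambiguous ones (Apple the company vs. the fruit), which is why the tools in the table above rely on trained models.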
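Coreference Resolution (Anaphora Resolution)
Pronouns and other referring expressions need to be linked to the correct individuals. Coreference resolution finds the expressions in a text that refer to the same real-world entity. For example, in the sentence "Andrew said he would buy a car", the pronoun "he" refers to the same person as "Andrew". Common coreference resolution tools are listed in the table below, including Stanford CoreNLP, spaCy, Open Calais, and Apache OpenNLP.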
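To make the idea concrete, here is a deliberately naive, rule-based sketch, not how CoreNLP or the other listed tools work: link each pronoun to the most recent preceding name whose gender agrees with it. The `PRONOUN_GENDER` and `NAME_GENDER` tables and the `resolve_pronouns` function are hypothetical, hand-written for this example.

```python
import re

# Hypothetical toy lexicons: pronoun -> gender, known name -> gender.
PRONOUN_GENDER = {"he": "m", "him": "m", "his": "m", "she": "f", "her": "f"}
NAME_GENDER = {"Andrew": "m", "Emily": "f", "Mark": "m"}


def resolve_pronouns(text):
    """Link each pronoun to the closest preceding name with matching gender."""
    links = []
    seen = []  # names encountered so far, in reading order
    for token in re.findall(r"[A-Za-z]+", text):
        if token in NAME_GENDER:
            seen.append(token)
        elif token.lower() in PRONOUN_GENDER:
            gender = PRONOUN_GENDER[token.lower()]
            # Walk backwards: the most recent compatible name is the guess (None if none).
            antecedent = next((n for n in reversed(seen) if NAME_GENDER[n] == gender), None)
            links.append((token, antecedent))
    return links


print(resolve_pronouns("Andrew said he would buy a car."))
# [('he', 'Andrew')]
print(resolve_pronouns("Emily met Mark, and she thanked him."))
# [('she', 'Emily'), ('him', 'Mark')]
```

Recency plus agreement handles simple sentences, but it fails on cases such as "Andrew told Mark he had won", where real resolvers need syntax and wider context.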
Collocation Extraction
Collocations are word combinations that occur together more often than would be expected by chance. Examples include "break the rules", "free time", "draw a conclusion", "keep in mind", and "get ready".
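One common way to find collocations is to score adjacent word pairs by pointwise mutual information (PMI), which measures how much more often two words appear together than independence would predict. The standard-library sketch below assumes simple regex tokenization and a minimum-frequency cutoff; `pmi_bigrams` is a hypothetical helper, and ICE, used in the next example, may score candidates differently.

```python
import math
import re
from collections import Counter


def pmi_bigrams(text, min_count=2):
    """Score adjacent word pairs by PMI = log2(P(x, y) / (P(x) * P(y))).

    Pairs seen fewer than min_count times are dropped, because PMI
    overrates pairs that occur only once.
    """
    words = re.findall(r"[a-z]+", text.lower())
    unigrams = Counter(words)
    bigrams = Counter(zip(words, words[1:]))
    n_words = len(words)
    n_bigrams = n_words - 1
    scores = {}
    for (x, y), count in bigrams.items():
        if count < min_count:
            continue
        p_xy = count / n_bigrams
        p_x = unigrams[x] / n_words
        p_y = unigrams[y] / n_words
        scores[(x, y)] = math.log2(p_xy / (p_x * p_y))
    # Highest-scoring pairs are the strongest collocation candidates.
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


text = ("I had free time today. With free time I read. "
        "The rules say the test is the hard part. Break the rules.")
for (x, y), score in pmi_bigrams(text):
    print(f"{x} {y}: {score:.2f}")
# free time: 3.53
# the rules: 2.53
```

Pairs built on function words such as "the rules" can still score well, so practical pipelines usually also filter by stop words or part-of-speech patterns. NLTK's `BigramCollocationFinder` with `BigramAssocMeasures.pmi` provides the same measure along with `apply_freq_filter` and `apply_word_filter`.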
Example 13: Collocation extraction with ICE
Implementation:
<p style="font-size: 14px;letter-spacing: 0px;font-family: Consolas, Inconsolata, Courier, monospace;border-radius: 0px;color: rgb(169, 183, 198);background: rgb(40, 43, 46);padding: 0.5em;margin-left: 8px;margin-right: 8px;line-height: 1.75em;display: block !important;overflow-wrap: normal !important;word-break: normal !important;overflow: auto !important;text-align: justify;"><span style="letter-spacing: 1px;"><span style="letter-spacing: 1px;font-size: inherit;line-height: inherit;color: rgb(248, 35, 117);overflow-wrap: inherit !important;word-break: inherit !important;">from</span> ICE <span style="letter-spacing: 1px;font-size: inherit;line-height: inherit;color: rgb(248, 35, 117);overflow-wrap: inherit !important;word-break: inherit !important;">import</span> CollocationExtractor<br />input_list = ["he <span style="letter-spacing: 1px;font-size: inherit;line-height: inherit;color: rgb(248, 35, 117);overflow-wrap: inherit !important;word-break: inherit !important;">and</span> Chazz duel <span style="letter-spacing: 1px;font-size: inherit;line-height: inherit;color: rgb(248, 35, 117);overflow-wrap: inherit !important;word-break: inherit !important;">with</span> all keys on the line."]<br /># Build the extraction pipeline (arguments as in the original example)<br />extractor = CollocationExtractor.with_collocation_pipeline("T1", bing_key="Temp", pos_check=<span style="letter-spacing: 1px;font-size: inherit;line-height: inherit;color: rgb(248, 35, 117);overflow-wrap: inherit !important;word-break: inherit !important;">False</span>)<br /># Ask for collocations that are 3 words long<br />print(extractor.get_collocations_of_length(input_list, length=<span style="letter-spacing: 1px;font-size: inherit;line-height: inherit;color: rgb(174, 135, 250);overflow-wrap: inherit !important;word-break: inherit !important;">3</span>))<br /></span></p>
Output:
<p style="font-size: 14px;letter-spacing: 0px;font-family: Consolas, Inconsolata, Courier, monospace;border-radius: 0px;color: rgb(169, 183, 198);background: rgb(40, 43, 46);padding: 0.5em;margin-left: 8px;margin-right: 8px;line-height: 1.75em;display: block !important;overflow-wrap: normal !important;word-break: normal !important;overflow: auto !important;text-align: justify;"><span style="letter-spacing: 1px;">["on the line"]<br /></span></p>
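Relationship Extraction
Relationship extraction obtains structured information from unstructured sources such as raw text. Strictly speaking, it identifies relations (such as spouse or employment) between named entities (such as people, organizations, and locations). For example, from the sentence "Mark and Emily married yesterday", we can extract the information that Mark is Emily's husband.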
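The simplest form of relation extraction matches hand-written surface patterns around pairs of entities. The sketch below assumes entities can be approximated by single capitalized words and uses two hypothetical patterns (`PATTERNS`, `extract_relations`); trained extractors generalize far beyond rules like these.

```python
import re

# Hypothetical surface patterns -> relation name. Each pattern captures two
# entities, approximated here as capitalized words.
PATTERNS = [
    (re.compile(r"\b([A-Z][a-z]+) and ([A-Z][a-z]+) (?:got )?married\b"), "spouse"),
    (re.compile(r"\b([A-Z][a-z]+) works for ([A-Z][a-z]+)\b"), "employed_by"),
]


def extract_relations(sentence):
    """Return (entity1, relation, entity2) triples matched by the patterns."""
    triples = []
    for pattern, relation in PATTERNS:
        for m in pattern.finditer(sentence):
            triples.append((m.group(1), relation, m.group(2)))
    return triples


print(extract_relations("Mark and Emily married yesterday."))
# [('Mark', 'spouse', 'Emily')]
print(extract_relations("Bill works for Apple so he went to Boston for a conference."))
# [('Bill', 'employed_by', 'Apple')]
```

The pattern only yields a symmetric `spouse` relation; concluding that Mark is specifically the husband needs knowledge (such as gender) that the sentence itself does not state.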
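Summary
This article discussed text preprocessing and its main steps: normalization, tokenization, stemming, lemmatization, chunking, part-of-speech tagging, named entity recognition, coreference resolution, collocation extraction, and relationship extraction. It also listed common text preprocessing tools in several tables, along with corresponding examples. Once these preprocessing steps are done, the results can feed more complex NLP tasks such as machine translation and natural language generation.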
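This article comes from: 菜鸟学Python
This is an original article; copyright belongs to 知行编程网. Feel free to share it, but please keep the source attribution when reposting.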