知行编程网  2022-07-04 13:00

Becoming a Pandas Expert: From 0 to 100, This Article Is All You Need!

Source | 51CTO Blog   Author | youerning

   I. Data Objects
pandas has two main data objects: Series and DataFrame.
Note: the code below uses pandas 0.20.1, with pandas imported via import pandas as pd.

1. Series

A Series is a one-dimensional sequence object with an index.
A simple example:
<section style="padding: 15px 16px 16px;font-family: "Operator Mono", Consolas, Monaco, Menlo, monospace;font-size: 12px;display: -webkit-box;overflow-x: auto;background: rgb(39, 40, 34);color: rgb(221, 221, 221);border-radius: 5px;margin-left: 8px;margin-right: 8px;"># Initialize a Series by passing a sequence (for example, a list) to pd.Series<br  />
s1 = pd.Series(list("1234"))<br  />
print(s1)<br  />
0    1<br  />
1    2<br  />
2    3<br  />
3    4<br  />
dtype: object</section>
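Besides the default 0-based index, a Series can carry explicit labels. A minimal sketch; the labels and values here are arbitrary illustrations, not taken from the original article:

```python
import pandas as pd

# Explicit (illustrative) index labels instead of the default 0..n-1
s2 = pd.Series([10, 20, 30], index=["a", "b", "c"])

print(s2["b"])            # access by label -> 20
print(s2.index.tolist())  # ['a', 'b', 'c']

# A dict also works: its keys become the index
s3 = pd.Series({"x": 1.5, "y": 2.5})
print(s3.index.tolist())  # ['x', 'y']
```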

2. DataFrame

A DataFrame is a two-dimensional data object with rows and columns, similar to a database table.

It can be created as follows:

<section style="padding: 15px 16px 16px;font-family: "Operator Mono", Consolas, Monaco, Menlo, monospace;font-size: 12px;display: -webkit-box;overflow-x: auto;background: rgb(39, 40, 34);color: rgb(221, 221, 221);border-radius: 5px;margin-left: 8px;margin-right: 8px;"># Initialize a DataFrame by passing a 2-D NumPy array or a dict to pd.DataFrame<br  />
<br  />
# From a 2-D NumPy array (random values, so your output will differ)<br  />
import numpy as np<br  />
df1 = pd.DataFrame(np.random.randn(6, 4))<br  />
print(df1)<br  />
          0         1         2         3<br  />
0 -0.646340 -1.249943  0.393323 -1.561873<br  />
1  0.371630  0.069426  1.693097  0.907419<br  />
2 -0.328575 -0.256765  0.693798 -0.787343<br  />
3  1.875764 -0.416275 -1.028718  0.158259<br  />
4  1.644791 -1.321506 -0.337420  0.820689<br  />
5  0.006391 -1.447894  0.506203  0.977295<br  />
<br  />
# From a dict: each key becomes a column, and scalar values are broadcast to every row<br  />
df2 = pd.DataFrame({'A': 1.,<br  />
                    'B': pd.Timestamp('20130102'),<br  />
                    'C': pd.Series(1, index=list(range(4)), dtype='float32'),<br  />
                    'D': np.array([3] * 4, dtype='int32'),<br  />
                    'E': pd.Categorical(["test", "train", "test", "train"]),<br  />
                    'F': 'foo'})<br  />
print(df2)<br  />
<br  />
     A          B    C  D      E    F<br  />
0  1.0 2013-01-02  1.0  3   test  foo<br  />
1  1.0 2013-01-02  1.0  3  train  foo<br  />
2  1.0 2013-01-02  1.0  3   test  foo<br  />
3  1.0 2013-01-02  1.0  3  train  foo</section>
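After building a DataFrame, shape, columns and dtypes give a quick sanity check. The sketch below rebuilds df2 from the example above (with np and pd imported explicitly) and inspects it; the dtypes printed for columns B and F may differ between pandas versions:

```python
import numpy as np
import pandas as pd

# Rebuild df2 from the dict example above
df2 = pd.DataFrame({
    "A": 1.0,
    "B": pd.Timestamp("20130102"),
    "C": pd.Series(1, index=list(range(4)), dtype="float32"),
    "D": np.array([3] * 4, dtype="int32"),
    "E": pd.Categorical(["test", "train", "test", "train"]),
    "F": "foo",
})

print(df2.shape)             # (4, 6): 4 rows, 6 columns
print(df2.columns.tolist())  # ['A', 'B', 'C', 'D', 'E', 'F']
print(df2.dtypes)            # one dtype per column, e.g. C is float32, D is int32
```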

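3. Index

Every Series and DataFrame object has an index. In a Series, the index labels each element; in a DataFrame, it labels each row.

Viewing the index: when an object is created without an explicit index, pandas gives it a default integer index that starts at 0 and increases by 1. DataFrame works the same way.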

<section style="padding: 15px 16px 16px;font-family: "Operator Mono", Consolas, Monaco, Menlo, monospace;font-size: 12px;display: -webkit-box;overflow-x: auto;background: rgb(39, 40, 34);color: rgb(221, 221, 221);border-radius: 5px;margin-left: 8px;margin-right: 8px;"># When an object is printed, the first column is the index<br  />
print(s1)<br  />
0    1<br  />
1    2<br  />
2    3<br  />
3    4<br  />
dtype: object<br  />
<br  />
<br  />
# Or view just the index (the same works for a DataFrame)<br  />
print(s1.index)</section>
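The index can also be inspected and replaced directly. A minimal sketch, assuming a recent pandas; the letter labels are illustrative only:

```python
import pandas as pd

s1 = pd.Series(list("1234"))

# No index was passed, so pandas created a RangeIndex covering 0..3
print(type(s1.index).__name__)  # RangeIndex
print(list(s1.index))           # [0, 1, 2, 3]

# Replace the index with (illustrative) letter labels; the values stay the same
s1.index = ["a", "b", "c", "d"]
print(s1["c"])                  # the string '3', printed as 3
```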


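   II. Create, Read, Update, Delete

The create/read/update/delete operations below focus on DataFrame objects. To have enough data to demonstrate with, we use stock data from tushare.
1. Installing tushare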
<section style="padding: 15px 16px 16px;font-family: "Operator Mono", Consolas, Monaco, Menlo, monospace;font-size: 12px;display: -webkit-box;overflow-x: auto;background: rgb(39, 40, 34);color: rgb(221, 221, 221);border-radius: 5px;margin-left: 8px;margin-right: 8px;">pip install tushare</section>
Fetch the K-line data for stock code 000001 into a DataFrame:
<section style="padding: 15px 16px 16px;font-family: "Operator Mono", Consolas, Monaco, Menlo, monospace;font-size: 12px;display: -webkit-box;overflow-x: auto;background: rgb(39, 40, 34);color: rgb(221, 221, 221);border-radius: 5px;margin-left: 8px;margin-right: 8px;"><span style="color: rgb(249, 38, 114);font-weight: bold;line-height: 26px;">import</span> tushare <span style="color: rgb(249, 38, 114);font-weight: bold;line-height: 26px;">as</span> ts<br  />df = ts.get_k_data(<span style="color: rgb(166, 226, 46);line-height: 26px;">"000001"</span>)<br  /></section>
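ts.get_k_data needs network access, and tushare's interfaces may have changed since this article was written. To follow the examples offline, a small hypothetical stand-in with the same columns can be built by hand. The values are copied from the first five rows of the df.head() output shown later in this article; this is not a tushare API:

```python
import pandas as pd

# Hypothetical offline stand-in for ts.get_k_data("000001"):
# same column names, values copied from the article's df.head() output
df = pd.DataFrame({
    "date": ["2015-12-23", "2015-12-24", "2015-12-25", "2015-12-28", "2015-12-29"],
    "open": [9.927, 9.919, 9.855, 9.895, 9.545],
    "close": [9.935, 9.823, 9.879, 9.537, 9.624],
    "high": [10.174, 9.998, 9.927, 9.919, 9.632],
    "low": [9.871, 9.744, 9.815, 9.537, 9.529],
    "volume": [1039018.0, 640229.0, 399845.0, 822408.0, 619802.0],
    "code": ["000001"] * 5,  # stock code kept as a string to preserve leading zeros
})

print(df.shape)  # (5, 7)
```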
DataFrame rows, columns, and axis, illustrated:
[Figure: DataFrame rows, columns, and axis diagrams]
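As a rule of thumb, axis=0 means the operation runs down the rows (one result per column) and axis=1 means it runs across the columns (one result per row). A minimal sketch on a made-up two-column frame:

```python
import pandas as pd

# Small illustrative frame (values arbitrary)
df = pd.DataFrame({"open": [1.0, 2.0, 3.0], "close": [4.0, 5.0, 6.0]})

# axis=0 (the default): aggregate down the rows, one result per column
col_sums = df.sum(axis=0)
# axis=1: aggregate across the columns, one result per row
row_sums = df.sum(axis=1)

print(col_sums.tolist())  # [6.0, 15.0]
print(row_sums.tolist())  # [5.0, 7.0, 9.0]

# drop follows the same convention: axis=1 removes a column
print(df.drop("close", axis=1).columns.tolist())  # ['open']
```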


2. Querying

View the data type of each column:
<section style="padding: 15px 16px 16px;font-family: "Operator Mono", Consolas, Monaco, Menlo, monospace;font-size: 12px;display: -webkit-box;overflow-x: auto;background: rgb(39, 40, 34);color: rgb(221, 221, 221);border-radius: 5px;margin-left: 8px;margin-right: 8px;"># View the dtype of each column of df<br  />
df.dtypes<br  />
date       object<br  />
open      float64<br  />
close     float64<br  />
high      float64<br  />
low       float64<br  />
volume    float64<br  />
code       object<br  />
dtype: object<br  /></section>
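The date column comes back as object (plain strings), not as a datetime type. If you need date comparisons or date parts, pd.to_datetime can convert it. A small sketch on two made-up rows; in practice you would apply the same conversion to the tushare df:

```python
import pandas as pd

# Two made-up rows; the date column holds plain strings, as in the tushare output
df = pd.DataFrame({"date": ["2015-12-23", "2015-12-24"], "open": [9.927, 9.919]})

# Parse the strings into a datetime64 column
df["date"] = pd.to_datetime(df["date"])

print(df.dtypes)
# Datetime columns expose date parts through the .dt accessor
print(df["date"].dt.day.tolist())  # [23, 24]
```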
View a given number of rows: head() returns the first 5 rows by default and tail() the last 5; pass a number to either one to get a different count.
<section style="padding: 15px 16px 16px;font-family: "Operator Mono", Consolas, Monaco, Menlo, monospace;font-size: 12px;display: -webkit-box;overflow-x: auto;background: rgb(39, 40, 34);color: rgb(221, 221, 221);border-radius: 5px;margin-left: 8px;margin-right: 8px;"># View the first 5 rows<br  />
df.head()<br  />
date    open    close   high    low volume  code<br  />
0   2015-12-23  9.927   9.935   10.174  9.871   1039018.0   000001<br  />
1   2015-12-24  9.919   9.823   9.998   9.744   640229.0    000001<br  />
2   2015-12-25  9.855   9.879   9.927   9.815   399845.0    000001<br  />
3   2015-12-28  9.895   9.537   9.919   9.537   822408.0    000001<br  />
4   2015-12-29  9.545   9.624   9.632   9.529   619802.0    000001<br  />
# View the last 5 rows<br  />
df.tail()<br  />
date    open    close   high    low volume  code<br  />
636 2018-08-01  9.42    9.15    9.50    9.11    814081.0    000001<br  />
637 2018-08-02  9.13    8.94    9.15    8.88    931401.0    000001<br  />
638 2018-08-03  8.93    8.91    9.10    8.91    476546.0    000001<br  />
639 2018-08-06  8.94    8.94    9.11    8.89    554010.0    000001<br  />
640 2018-08-07  8.96    9.17    9.17    8.88    690423.0    000001<br  />
# View the first 10 rows<br  />
df.head(10)<br  />
date    open    close   high    low volume  code<br  />
0   2015-12-23  9.927   9.935   10.174  9.871   1039018.0   000001<br  />
1   2015-12-24  9.919   9.823   9.998   9.744   640229.0    000001<br  />
2   2015-12-25  9.855   9.879   9.927   9.815   399845.0    000001<br  />
3   2015-12-28  9.895   9.537   9.919   9.537   822408.0    000001<br  />
4   2015-12-29  9.545   9.624   9.632   9.529   619802.0    000001<br  />
5   2015-12-30  9.624   9.632   9.640   9.513   532667.0    000001<br  />
6   2015-12-31  9.632   9.545   9.656   9.537   491258.0    000001<br  />
7   2016-01-04  9.553   8.995   9.577   8.940   563497.0    000001<br  />
8   2016-01-05  8.972   9.075   9.210   8.876   663269.0    000001<br  />
9   2016-01-06  9.091   9.179   9.202   9.067   515706.0    000001<br  /></section>
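head and tail accept any row count. A sketch on an illustrative 10-row frame; the negative-n form of head shown last follows the behavior documented for recent pandas versions:

```python
import pandas as pd

# Illustrative frame with ten rows numbered 0..9
df = pd.DataFrame({"n": list(range(10))})

print(df.head(3)["n"].tolist())   # [0, 1, 2]
print(df.tail(2)["n"].tolist())   # [8, 9]
# A negative n makes head() return all rows except the last |n|
print(df.head(-7)["n"].tolist())  # [0, 1, 2]
```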
View one or more rows, or one or more columns:
<section style="padding: 15px 16px 16px;font-family: "Operator Mono", Consolas, Monaco, Menlo, monospace;font-size: 12px;display: -webkit-box;overflow-x: auto;background: rgb(39, 40, 34);color: rgb(221, 221, 221);border-radius: 5px;margin-left: 8px;margin-right: 8px;"># View the first row<br  />
df[0:1]<br  />
date    open    close   high    low volume  code<br  />
0   2015-12-23  9.927   9.935   10.174  9.871   1039018.0   000001<br  />
<br  />
# View rows 10 through 20 (the slice end, 21, is excluded)<br  />
df[10:21]<br  />
date    open    close   high    low volume  code<br  />
10  2016-01-07  9.083   8.709   9.083   8.685   174761.0    000001<br  />
11  2016-01-08  8.924   8.852   8.987   8.677   747527.0    000001<br  />
12  2016-01-11  8.757   8.566   8.820   8.502   732013.0    000001<br  />
13  2016-01-12  8.621   8.605   8.685   8.470   561642.0    000001<br  />
14  2016-01-13  8.669   8.526   8.709   8.518   391709.0    000001<br  />
15  2016-01-14  8.430   8.574   8.597   8.343   666314.0    000001<br  />
16  2016-01-15  8.486   8.327   8.597   8.295   448202.0    000001<br  />
17  2016-01-18  8.231   8.287   8.406   8.199   421040.0    000001<br  />
18  2016-01-19  8.319   8.526   8.582   8.287   501109.0    000001<br  />
19  2016-01-20  8.518   8.390   8.597   8.311   603752.0    000001<br  />
20  2016-01-21  8.343   8.215   8.558   8.215   606145.0    000001<br  />
<br  />
# View the first 5 values of the date column<br  />
df["date"].head()  # or df.date.head()<br  />
0    2015-12-23<br  />
1    2015-12-24<br  />
2    2015-12-25<br  />
3    2015-12-28<br  />
4    2015-12-29<br  />
Name: date, dtype: object<br  />
<br  />
# View the first 5 values of the date, code and open columns<br  />
df[["date", "code", "open"]].head()<br  />
date    code    open<br  />
0   2015-12-23  000001  9.927<br  />
1   2015-12-24  000001  9.919<br  />
2   2015-12-25  000001  9.855<br  />
3   2015-12-28  000001  9.895<br  />
4   2015-12-29  000001  9.545<br  /></section>
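The result type depends on what goes inside the brackets: a single column name gives a Series, a list of names gives a DataFrame (even for one name), and an integer slice selects rows by position. A small sketch on made-up rows:

```python
import pandas as pd

# Made-up rows with two of the tushare columns
df = pd.DataFrame({"date": ["2015-12-23", "2015-12-24", "2015-12-25"],
                   "open": [9.927, 9.919, 9.855]})

col = df["open"]    # one column name -> Series
sub = df[["open"]]  # list of names   -> DataFrame
rows = df[0:2]      # integer slice   -> rows at positions 0 and 1

print(type(col).__name__)  # Series
print(type(sub).__name__)  # DataFrame
print(len(rows))           # 2
```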
Combine row and column selection with loc (label-based):
<section style="padding: 15px 16px 16px;font-family: "Operator Mono", Consolas, Monaco, Menlo, monospace;font-size: 12px;display: -webkit-box;overflow-x: auto;background: rgb(39, 40, 34);color: rgb(221, 221, 221);border-radius: 5px;margin-left: 8px;margin-right: 8px;"># The date and code columns of the row labelled 10<br  />
df.loc[10, ["date", "code"]]<br  />
<br  />
date    2016-01-07<br  />
code        000001<br  />
Name: 10, dtype: object<br  />
# The date and code columns of rows labelled 10 through 20 (loc includes the end label)<br  />
df.loc[10:20, ["date", "code"]]<br  />
<br  />
date    code<br  />
10  2016-01-07  000001<br  />
11  2016-01-08  000001<br  />
12  2016-01-11  000001<br  />
13  2016-01-12  000001<br  />
14  2016-01-13  000001<br  />
15  2016-01-14  000001<br  />
16  2016-01-15  000001<br  />
17  2016-01-18  000001<br  />
18  2016-01-19  000001<br  />
19  2016-01-20  000001<br  />
20  2016-01-21  000001<br  />
<br  />
# The open value of the first row<br  />
df.loc[0, "open"]<br  />
9.9269999999999996<br  /></section>
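Unlike Python slicing, a loc slice includes both endpoints, which is why df.loc[10:20] above returns 11 rows. The sketch below, on an illustrative numeric frame, contrasts that with iloc and also shows at, a loc variant for reading a single cell:

```python
import pandas as pd

# Illustrative frame with the default index 0..29
df = pd.DataFrame({"open": [float(i) for i in range(30)]})

# loc slices by label and includes the end label: 10..20 is 11 rows
print(len(df.loc[10:20]))   # 11
# iloc slices by position and excludes the end, like a Python list
print(len(df.iloc[10:20]))  # 10

# at reads a single cell by row label and column name
print(df.at[10, "open"])    # 10.0
```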
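Query by position with iloc. Note that in this DataFrame the index labels used with loc above are identical to the row positions, because the default index simply numbers the rows from 0; iloc, however, always counts positions regardless of the labels.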
<section style="padding: 15px 16px 16px;font-family: "Operator Mono", Consolas, Monaco, Menlo, monospace;font-size: 12px;display: -webkit-box;overflow-x: auto;background: rgb(39, 40, 34);color: rgb(221, 221, 221);border-radius: 5px;margin-left: 8px;margin-right: 8px;"># View the first row (position 0)<br  />
df.iloc[0]<br  />
date      2015-12-24<br  />
open           9.919<br  />
close          9.823<br  />
high           9.998<br  />
low            9.744<br  />
volume        640229<br  />
code          000001<br  />
Name: 0, dtype: object<br  />
# View the last row<br  />
df.iloc[-1]<br  />
date      2018-08-08<br  />
open            9.16<br  />
close           9.12<br  />
high            9.16<br  />
low              9.1<br  />
volume         29985<br  />
code          000001<br  />
Name: 640, dtype: object<br  />
# View the first 5 values of the first column<br  />
df.iloc[:, 0].head()<br  />
0    2015-12-24<br  />
1    2015-12-25<br  />
2    2015-12-28<br  />
3    2015-12-29<br  />
4    2015-12-30<br  />
Name: date, dtype: object<br  />
<br  />
# View the rows at positions 2 and 3 (the end position 4 is excluded), first and third columns<br  />
df.iloc[2:4, [0, 2]]<br  />
<br  />
date    close<br  />
2   2015-12-28  9.537<br  />
3   2015-12-29  9.624<br  /></section>
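Labels and positions only coincide while the index is the default 0, 1, 2, and so on. After filtering (or sorting) they diverge, and loc and iloc pick different rows. A sketch with made-up prices:

```python
import pandas as pd

# Made-up opening prices with the default index 0..3
df = pd.DataFrame({"open": [9.9, 10.2, 9.8, 10.5]})

# After filtering, the surviving rows keep their original labels (1 and 3)
high = df[df["open"] > 10]
print(high.index.tolist())   # [1, 3]

print(high.iloc[0]["open"])  # first row by position -> 10.2
print(high.loc[3, "open"])   # row labelled 3 -> 10.5
# high.loc[0] would raise KeyError: label 0 is no longer in the index

# reset_index(drop=True) renumbers from 0, so labels and positions agree again
print(high.reset_index(drop=True).index.tolist())  # [0, 1]
```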
Filter rows by condition (boolean indexing):
<section style="padding: 15px 16px 16px;font-family: "Operator Mono", Consolas, Monaco, Menlo, monospace;font-size: 12px;display: -webkit-box;overflow-x: auto;background: rgb(39, 40, 34);color: rgb(221, 221, 221);border-radius: 5px;margin-left: 8px;margin-right: 8px;">查看open列大于<span style="line-height: 26px;">10</span>的前<span style="line-height: 26px;">5</span>行<br  />df[df.open > <span style="line-height: 26px;">10</span>].head()<br  /><br  />date    open    close   high    low volume  code<br  /><span style="line-height: 26px;">378</span> <span style="line-height: 26px;">2017-07-14</span>  <span style="line-height: 26px;">10.483</span>  <span style="line-height: 26px;">10.570</span>  <span style="line-height: 26px;">10.609</span>  <span style="line-height: 26px;">10.337</span>  <span style="line-height: 26px;">1722570.0</span>   <span style="line-height: 26px;">000001</span><br  /><span style="line-height: 26px;">379</span> <span style="line-height: 26px;">2017-07-17</span>  <span style="line-height: 26px;">10.619</span>  <span style="line-height: 26px;">10.483</span>  <span style="line-height: 26px;">10.987</span>  <span style="line-height: 26px;">10.396</span>  <span style="line-height: 26px;">3273123.0</span>   <span style="line-height: 26px;">000001</span><br  /><span style="line-height: 26px;">380</span> <span style="line-height: 26px;">2017-07-18</span>  <span style="line-height: 26px;">10.425</span>  <span style="line-height: 26px;">10.716</span>  <span style="line-height: 26px;">10.803</span>  <span style="line-height: 26px;">10.299</span>  <span style="line-height: 26px;">2349431.0</span>   <span style="line-height: 26px;">000001</span><br  /><span style="line-height: 26px;">381</span> <span style="line-height: 26px;">2017-07-19</span>  <span style="line-height: 26px;">10.657</span>  <span style="line-height: 26px;">10.754</span>  <span style="line-height: 26px;">10.851</span>  <span style="line-height: 26px;">10.551</span>  <span style="line-height: 
26px;">1933075.0</span>   <span style="line-height: 26px;">000001</span><br  /><span style="line-height: 26px;">382</span> <span style="line-height: 26px;">2017-07-20</span>  <span style="line-height: 26px;">10.745</span>  <span style="line-height: 26px;">10.638</span>  <span style="line-height: 26px;">10.880</span>  <span style="line-height: 26px;">10.580</span>  <span style="line-height: 26px;">1537338.0</span>   <span style="line-height: 26px;">000001</span><br  /><br  /><span style="color: rgb(117, 113, 94);line-height: 26px;"># Show the first five rows where open is greater than 10 and less than 10.6</span><br  />df[(df.open > <span style="line-height: 26px;">10</span>) & (df.open < <span style="line-height: 26px;">10.6</span>)].head()<br  />date    open    close   high    low volume  code<br  /><span style="line-height: 26px;">378</span> <span style="line-height: 26px;">2017-07-14</span>  <span style="line-height: 26px;">10.483</span>  <span style="line-height: 26px;">10.570</span>  <span style="line-height: 26px;">10.609</span>  <span style="line-height: 26px;">10.337</span>  <span style="line-height: 26px;">1722570.0</span>   <span style="line-height: 26px;">000001</span><br  /><span style="line-height: 26px;">380</span> <span style="line-height: 26px;">2017-07-18</span>  <span style="line-height: 26px;">10.425</span>  <span style="line-height: 26px;">10.716</span>  <span style="line-height: 26px;">10.803</span>  <span style="line-height: 26px;">10.299</span>  <span style="line-height: 26px;">2349431.0</span>   <span style="line-height: 26px;">000001</span><br  /><span style="line-height: 26px;">387</span> <span style="line-height: 26px;">2017-07-27</span>  <span style="line-height: 26px;">10.550</span>  <span style="line-height: 26px;">10.422</span>  <span style="line-height: 26px;">10.599</span>  <span style="line-height: 26px;">10.363</span>  <span style="line-height: 26px;">1194490.0</span>   <span style="line-height: 26px;">000001</span><br  /><span style="line-height: 26px;">388</span> <span 
style="line-height: 26px;">2017-07-28</span>  <span style="line-height: 26px;">10.441</span>  <span style="line-height: 26px;">10.569</span>  <span style="line-height: 26px;">10.638</span>  <span style="line-height: 26px;">10.412</span>  <span style="line-height: 26px;">819195.0</span>    <span style="line-height: 26px;">000001</span><br  /><span style="line-height: 26px;">390</span> <span style="line-height: 26px;">2017-08-01</span>  <span style="line-height: 26px;">10.471</span>  <span style="line-height: 26px;">10.865</span>  <span style="line-height: 26px;">10.904</span>  <span style="line-height: 26px;">10.432</span>  <span style="line-height: 26px;">2035709.0</span>   <span style="line-height: 26px;">000001</span> <br  /><br  /><span style="color: rgb(117, 113, 94);line-height: 26px;"># Show the first five rows where open is greater than 10 or less than 10.6 (every row matches, since any value is above 10 or below 10.6)</span><br  />df[(df.open > <span style="line-height: 26px;">10</span>) | (df.open < <span style="line-height: 26px;">10.6</span>)].head()<br  />date    open    close   high    low volume  code<br  /><span style="line-height: 26px;">0</span>   <span style="line-height: 26px;">2015-12-24</span>  <span style="line-height: 26px;">9.919</span>   <span style="line-height: 26px;">9.823</span>   <span style="line-height: 26px;">9.998</span>   <span style="line-height: 26px;">9.744</span>   <span style="line-height: 26px;">640229.0</span>    <span style="line-height: 26px;">000001</span><br  /><span style="line-height: 26px;">1</span>   <span style="line-height: 26px;">2015-12-25</span>  <span style="line-height: 26px;">9.855</span>   <span style="line-height: 26px;">9.879</span>   <span style="line-height: 26px;">9.927</span>   <span style="line-height: 26px;">9.815</span>   <span style="line-height: 26px;">399845.0</span>    <span style="line-height: 26px;">000001</span><br  /><span style="line-height: 26px;">2</span>   <span style="line-height: 26px;">2015-12-28</span>  <span style="line-height: 26px;">9.895</span>   <span style="line-height: 
26px;">9.537</span>   <span style="line-height: 26px;">9.919</span>   <span style="line-height: 26px;">9.537</span>   <span style="line-height: 26px;">822408.0</span>    <span style="line-height: 26px;">000001</span><br  /><span style="line-height: 26px;">3</span>   <span style="line-height: 26px;">2015-12-29</span>  <span style="line-height: 26px;">9.545</span>   <span style="line-height: 26px;">9.624</span>   <span style="line-height: 26px;">9.632</span>   <span style="line-height: 26px;">9.529</span>   <span style="line-height: 26px;">619802.0</span>    <span style="line-height: 26px;">000001</span><br  /><span style="line-height: 26px;">4</span>   <span style="line-height: 26px;">2015-12-30</span>  <span style="line-height: 26px;">9.624</span>   <span style="line-height: 26px;">9.632</span>   <span style="line-height: 26px;">9.640</span>   <span style="line-height: 26px;">9.513</span>   <span style="line-height: 26px;">532667.0</span>    <span style="line-height: 26px;">000001</span><br  /></section>
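To try the filters above without the article's stock data, here is a minimal sketch on a hypothetical five-row `open` column (the values are made up). It also shows `~` for negating a mask, `Series.between` and `DataFrame.query`, which are standard pandas methods not used in the original.

```python
import pandas as pd

# Hypothetical stand-in for the article's stock data (values are made up)
df = pd.DataFrame({"open": [9.9, 10.2, 10.5, 10.8, 9.6]})

# Each comparison yields a boolean Series; & means "and", | means "or".
# The parentheses matter: & and | bind more tightly than > and <.
between = df[(df.open > 10) & (df.open < 10.6)]

# Always true: every number is either greater than 10 or less than 10.6
either = df[(df.open > 10) | (df.open < 10.6)]

# ~ negates a mask; between() tests an inclusive range
outside = df[~df.open.between(10, 10.6)]

# query() expresses the same "and" filter as a string
via_query = df.query("open > 10 and open < 10.6")

print(between.open.tolist())   # [10.2, 10.5]
print(len(either) == len(df))  # True
print(outside.open.tolist())   # [9.9, 10.8, 9.6]
```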
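3. Creating
Creating Series and DataFrame objects was briefly covered earlier; here are a few other commonly used ways to create data, mainly time series.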
<section style="padding: 15px 16px 16px;font-family: "Operator Mono", Consolas, Monaco, Menlo, monospace;font-size: 12px;display: -webkit-box;overflow-x: auto;background: rgb(39, 40, 34);color: rgb(221, 221, 221);border-radius: 5px;margin-left: 8px;margin-right: 8px;"><span style="color: rgb(117, 113, 94);line-height: 26px;"># Create a daily time series of 7 periods, 2018-08-08 through 2018-08-14 (the default frequency is one day)</span><br  />s2 = pd.date_range(<span style="color: rgb(166, 226, 46);line-height: 26px;">"20180808"</span>, periods=<span style="line-height: 26px;">7</span>)<br  />print(s2)<br  /><br  />DatetimeIndex([<span style="color: rgb(166, 226, 46);line-height: 26px;">'2018-08-08'</span>, <span style="color: rgb(166, 226, 46);line-height: 26px;">'2018-08-09'</span>, <span style="color: rgb(166, 226, 46);line-height: 26px;">'2018-08-10'</span>, <span style="color: rgb(166, 226, 46);line-height: 26px;">'2018-08-11'</span>,<br  /><span style="color: rgb(166, 226, 46);line-height: 26px;">'2018-08-12'</span>, <span style="color: rgb(166, 226, 46);line-height: 26px;">'2018-08-13'</span>, <span style="color: rgb(166, 226, 46);line-height: 26px;">'2018-08-14'</span>],<br  />               dtype=<span style="color: rgb(166, 226, 46);line-height: 26px;">'datetime64[ns]'</span>, freq=<span style="color: rgb(166, 226, 46);line-height: 26px;">'D'</span>)<br  /><span style="color: rgb(117, 113, 94);line-height: 26px;"># From 2018-08-08 00:00 to 2018-08-09 00:00 at hourly intervals</span><br  /><span style="color: rgb(117, 113, 94);line-height: 26px;"># For the values freq accepts, see: http://pandas.pydata.org/pandas-docs/stable/timeseries.html#offset-aliases</span><br  />s3 = pd.date_range(<span style="color: rgb(166, 226, 46);line-height: 26px;">"20180808"</span>, <span style="color: rgb(166, 226, 46);line-height: 26px;">"20180809"</span>, freq=<span style="color: rgb(166, 226, 46);line-height: 26px;">"H"</span>)<br  />print(s3)<br  /><br  />DatetimeIndex([<span style="color: rgb(166, 226, 46);line-height: 
26px;">'2018-08-08 00:00:00'</span>, <span style="color: rgb(166, 226, 46);line-height: 26px;">'2018-08-08 01:00:00'</span>,<br  /><span style="color: rgb(166, 226, 46);line-height: 26px;">'2018-08-08 02:00:00'</span>, <span style="color: rgb(166, 226, 46);line-height: 26px;">'2018-08-08 03:00:00'</span>,<br  /><span style="color: rgb(166, 226, 46);line-height: 26px;">'2018-08-08 04:00:00'</span>, <span style="color: rgb(166, 226, 46);line-height: 26px;">'2018-08-08 05:00:00'</span>,<br  /><span style="color: rgb(166, 226, 46);line-height: 26px;">'2018-08-08 06:00:00'</span>, <span style="color: rgb(166, 226, 46);line-height: 26px;">'2018-08-08 07:00:00'</span>,<br  /><span style="color: rgb(166, 226, 46);line-height: 26px;">'2018-08-08 08:00:00'</span>, <span style="color: rgb(166, 226, 46);line-height: 26px;">'2018-08-08 09:00:00'</span>,<br  /><span style="color: rgb(166, 226, 46);line-height: 26px;">'2018-08-08 10:00:00'</span>, <span style="color: rgb(166, 226, 46);line-height: 26px;">'2018-08-08 11:00:00'</span>,<br  /><span style="color: rgb(166, 226, 46);line-height: 26px;">'2018-08-08 12:00:00'</span>, <span style="color: rgb(166, 226, 46);line-height: 26px;">'2018-08-08 13:00:00'</span>,<br  /><span style="color: rgb(166, 226, 46);line-height: 26px;">'2018-08-08 14:00:00'</span>, <span style="color: rgb(166, 226, 46);line-height: 26px;">'2018-08-08 15:00:00'</span>,<br  /><span style="color: rgb(166, 226, 46);line-height: 26px;">'2018-08-08 16:00:00'</span>, <span style="color: rgb(166, 226, 46);line-height: 26px;">'2018-08-08 17:00:00'</span>,<br  /><span style="color: rgb(166, 226, 46);line-height: 26px;">'2018-08-08 18:00:00'</span>, <span style="color: rgb(166, 226, 46);line-height: 26px;">'2018-08-08 19:00:00'</span>,<br  /><span style="color: rgb(166, 226, 46);line-height: 26px;">'2018-08-08 20:00:00'</span>, <span style="color: rgb(166, 226, 46);line-height: 26px;">'2018-08-08 21:00:00'</span>,<br  /><span style="color: rgb(166, 226, 
46);line-height: 26px;">'2018-08-08 22:00:00'</span>, <span style="color: rgb(166, 226, 46);line-height: 26px;">'2018-08-08 23:00:00'</span>,<br  /><span style="color: rgb(166, 226, 46);line-height: 26px;">'2018-08-09 00:00:00'</span>],<br  />               dtype=<span style="color: rgb(166, 226, 46);line-height: 26px;">'datetime64[ns]'</span>, freq=<span style="color: rgb(166, 226, 46);line-height: 26px;">'H'</span>)<br  /><span style="color: rgb(117, 113, 94);line-height: 26px;"># Create a datetime Series from an existing column</span><br  />s4 = pd.to_datetime(df.date.head())<br  />print(s4)<br  /><br  /><span style="line-height: 26px;">0</span>   <span style="line-height: 26px;">2015-12-24</span><br  /><span style="line-height: 26px;">1</span>   <span style="line-height: 26px;">2015-12-25</span><br  /><span style="line-height: 26px;">2</span>   <span style="line-height: 26px;">2015-12-28</span><br  /><span style="line-height: 26px;">3</span>   <span style="line-height: 26px;">2015-12-29</span><br  /><span style="line-height: 26px;">4</span>   <span style="line-height: 26px;">2015-12-30</span><br  />Name: date, dtype: datetime64[ns]</section>
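A small sketch of the same calls that runs without the article's data. The dates follow the examples above; the start/end/periods form of `pd.date_range` assumes pandas 0.23 or later (the article targets 0.20.1), and it is used here instead of a frequency alias because recent pandas releases (2.2+) warn that `"H"` is deprecated in favour of `"h"`.

```python
import pandas as pd

# periods counts the generated dates: 7 days starting 2018-08-08 end on 2018-08-14
daily = pd.date_range("20180808", periods=7)

# With start and end, both endpoints are included (default frequency: one day)
week = pd.date_range("2018-08-08", "2018-08-15")

# start + end + periods spaces points evenly: 25 points over 24 hours are 1 hour apart
hourly = pd.date_range("2018-08-08", "2018-08-09", periods=25)

# to_datetime parses existing strings; a Series such as df.date works the same way
parsed = pd.to_datetime(pd.Series(["2015-12-24", "2015-12-25"]))

print(daily[-1])              # 2018-08-14 00:00:00
print(len(week))              # 8
print(hourly[1] - hourly[0])  # 0 days 01:00:00
```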

4. Modifying

<section style="padding: 15px 16px 16px;font-family: "Operator Mono", Consolas, Monaco, Menlo, monospace;font-size: 12px;display: -webkit-box;overflow-x: auto;background: rgb(39, 40, 34);color: rgb(221, 221, 221);border-radius: 5px;margin-left: 8px;margin-right: 8px;"><span style="color: rgb(117, 113, 94);line-height: 26px;"># Set the index of df to the date column, converted to datetime</span><br  />df.index = pd.to_datetime(df.date)<br  />df.head()<br  /><br  />    date    open    close   high    low volume  code     date <br  /><span style="line-height: 26px;">2015-12-24</span>  <span style="line-height: 26px;">2015-12-24</span>  <span style="line-height: 26px;">9.919</span>   <span style="line-height: 26px;">9.823</span>   <span style="line-height: 26px;">9.998</span>   <span style="line-height: 26px;">9.744</span>   <span style="line-height: 26px;">640229.0</span>    <span style="line-height: 26px;">000001</span><br  /><span style="line-height: 26px;">2015-12-25</span>  <span style="line-height: 26px;">2015-12-25</span>  <span style="line-height: 26px;">9.855</span>   <span style="line-height: 26px;">9.879</span>   <span style="line-height: 26px;">9.927</span>   <span style="line-height: 26px;">9.815</span>   <span style="line-height: 26px;">399845.0</span>    <span style="line-height: 26px;">000001</span><br  /><span style="line-height: 26px;">2015-12-28</span>  <span style="line-height: 26px;">2015-12-28</span>  <span style="line-height: 26px;">9.895</span>   <span style="line-height: 26px;">9.537</span>   <span style="line-height: 26px;">9.919</span>   <span style="line-height: 26px;">9.537</span>   <span style="line-height: 26px;">822408.0</span>    <span style="line-height: 26px;">000001</span><br  /><span style="line-height: 26px;">2015-12-29</span>  <span style="line-height: 26px;">2015-12-29</span>  <span style="line-height: 26px;">9.545</span>   <span style="line-height: 26px;">9.624</span>   <span style="line-height: 26px;">9.632</span>   <span style="line-height: 
26px;">9.529</span>   <span style="line-height: 26px;">619802.0</span>    <span style="line-height: 26px;">000001</span><br  /><span style="line-height: 26px;">2015-12-30</span>  <span style="line-height: 26px;">2015-12-30</span>  <span style="line-height: 26px;">9.624</span>   <span style="line-height: 26px;">9.632</span>   <span style="line-height: 26px;">9.640</span>   <span style="line-height: 26px;">9.513</span>   <span style="line-height: 26px;">532667.0</span>    <span style="line-height: 26px;">000001</span><br  /><span style="color: rgb(117, 113, 94);line-height: 26px;"># Rename the columns by assigning a new list (it must have one name per column)</span><br  />df.columns = [<span style="color: rgb(166, 226, 46);line-height: 26px;">"Date"</span>, <span style="color: rgb(166, 226, 46);line-height: 26px;">"Open"</span>,<span style="color: rgb(166, 226, 46);line-height: 26px;">"Close"</span>,<span style="color: rgb(166, 226, 46);line-height: 26px;">"High"</span>,<span style="color: rgb(166, 226, 46);line-height: 26px;">"Low"</span>,<span style="color: rgb(166, 226, 46);line-height: 26px;">"Volume"</span>,<span style="color: rgb(166, 226, 46);line-height: 26px;">"Code"</span>]<br  />print(df.head())<br  /><br  /> Date   Open  Close   High    Low    Volume    Code     date<br  /><span style="line-height: 26px;">2015-12-24</span>  <span style="line-height: 26px;">2015-12-24</span>  <span style="line-height: 26px;">9.919</span>  <span style="line-height: 26px;">9.823</span>  <span style="line-height: 26px;">9.998</span>  <span style="line-height: 26px;">9.744</span>   <span style="line-height: 26px;">640229.0</span>  <span style="line-height: 26px;">000001</span><br  /><span style="line-height: 26px;">2015-12-25</span>  <span style="line-height: 26px;">2015-12-25</span>  <span style="line-height: 26px;">9.855</span>  <span style="line-height: 26px;">9.879</span>  <span style="line-height: 26px;">9.927</span>  <span style="line-height: 26px;">9.815</span>   <span style="line-height: 26px;">399845.0</span>  <span style="line-height: 
26px;">000001</span><br  /><span style="line-height: 26px;">2015-12-28</span>  <span style="line-height: 26px;">2015-12-28</span>  <span style="line-height: 26px;">9.895</span>  <span style="line-height: 26px;">9.537</span>  <span style="line-height: 26px;">9.919</span>  <span style="line-height: 26px;">9.537</span>  <span style="line-height: 26px;">822408.0</span>  <span style="line-height: 26px;">000001</span><br  /><span style="line-height: 26px;">2015-12-29</span>  <span style="line-height: 26px;">2015-12-29</span>  <span style="line-height: 26px;">9.545</span>  <span style="line-height: 26px;">9.624</span>  <span style="line-height: 26px;">9.632</span>  <span style="line-height: 26px;">9.529</span>  <span style="line-height: 26px;">619802.0</span>  <span style="line-height: 26px;">000001</span><br  /><span style="line-height: 26px;">2015-12-30</span>  <span style="line-height: 26px;">2015-12-30</span>  <span style="line-height: 26px;">9.624</span>  <span style="line-height: 26px;">9.632</span>  <span style="line-height: 26px;">9.640</span>  <span style="line-height: 26px;">9.513</span>  <span style="line-height: 26px;">532667.0</span>  <span style="line-height: 26px;">000001</span><br  /><span style="color: rgb(117, 113, 94);line-height: 26px;"># Add 1 to every value in Open; apply does not modify the source data, so the result must be assigned back to df</span><br  />df.Open = df.Open.apply(<span style="color: rgb(249, 38, 114);font-weight: bold;line-height: 26px;">lambda</span> x: x+<span style="line-height: 26px;">1</span>)<br  />df.head()<br  /><br  />  Date    Open    Close   High    Low Volume   Code    date<br  /><span style="line-height: 26px;">2015-12-24</span>  <span style="line-height: 26px;">2015-12-24</span>  <span style="line-height: 26px;">10.919</span>  <span style="line-height: 26px;">9.823</span>   <span style="line-height: 26px;">9.998</span>   <span style="line-height: 26px;">9.744</span>   <span style="line-height: 26px;">640229.0</span>    <span style="line-height: 26px;">000001</span><br  /><span 
style="line-height: 26px;">2015-12-25</span>  <span style="line-height: 26px;">2015-12-25</span>  <span style="line-height: 26px;">10.855</span>  <span style="line-height: 26px;">9.879</span>   <span style="line-height: 26px;">9.927</span>   <span style="line-height: 26px;">9.815</span>   <span style="line-height: 26px;">399845.0</span>    <span style="line-height: 26px;">000001</span><br  /><span style="line-height: 26px;">2015-12-28</span>  <span style="line-height: 26px;">2015-12-28</span>  <span style="line-height: 26px;">10.895</span>  <span style="line-height: 26px;">9.537</span>   <span style="line-height: 26px;">9.919</span>   <span style="line-height: 26px;">9.537</span>   <span style="line-height: 26px;">822408.0</span>    <span style="line-height: 26px;">000001</span><br  /><span style="line-height: 26px;">2015-12-29</span>  <span style="line-height: 26px;">2015-12-29</span>  <span style="line-height: 26px;">10.545</span>  <span style="line-height: 26px;">9.624</span>   <span style="line-height: 26px;">9.632</span>   <span style="line-height: 26px;">9.529</span>   <span style="line-height: 26px;">619802.0</span>    <span style="line-height: 26px;">000001</span><br  /><span style="line-height: 26px;">2015-12-30</span>  <span style="line-height: 26px;">2015-12-30</span>  <span style="line-height: 26px;">10.624</span>  <span style="line-height: 26px;">9.632</span>   <span style="line-height: 26px;">9.640</span>   <span style="line-height: 26px;">9.513</span>   <span style="line-height: 26px;">532667.0</span>    <span style="line-height: 26px;">000001</span><br  /><span style="color: rgb(117, 113, 94);line-height: 26px;"># Add 1 to both Open and Close; on a DataFrame, apply passes each whole column (a Series) to the function, so the inner apply then works value by value</span><br  />df[[<span style="color: rgb(166, 226, 46);line-height: 26px;">"Open"</span>, <span style="color: rgb(166, 226, 46);line-height: 26px;">"Close"</span>]].head().apply(<span style="color: rgb(249, 38, 114);font-weight: bold;line-height: 26px;">lambda</span> x: x.apply(<span style="color: 
rgb(249, 38, 114);font-weight: bold;line-height: 26px;">lambda</span> x: x+<span style="line-height: 26px;">1</span>))<br  /><br  />            Open    Close<br  />date        <br  /><span style="line-height: 26px;">2015-12-24</span>  <span style="line-height: 26px;">11.919</span>  <span style="line-height: 26px;">10.823</span><br  /><span style="line-height: 26px;">2015-12-25</span>  <span style="line-height: 26px;">11.855</span>  <span style="line-height: 26px;">10.879</span><br  /><span style="line-height: 26px;">2015-12-28</span>  <span style="line-height: 26px;">11.895</span>  <span style="line-height: 26px;">10.537</span><br  /><span style="line-height: 26px;">2015-12-29</span>  <span style="line-height: 26px;">11.545</span>  <span style="line-height: 26px;">10.624</span><br  /><span style="line-height: 26px;">2015-12-30</span>  <span style="line-height: 26px;">11.624</span>  <span style="line-height: 26px;">10.632</span></section>
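The edits above can also be written without `apply`. Below is a minimal sketch on a hypothetical two-row frame using `set_index`, `rename` and plain vectorized arithmetic; these are standard pandas methods, but the data and lowercase column names are made up for illustration.

```python
import pandas as pd

# Hypothetical two-row stand-in for the article's data
df = pd.DataFrame({
    "date": ["2015-12-24", "2015-12-25"],
    "open": [9.919, 9.855],
    "close": [9.823, 9.879],
})

# Parse the strings, then move the column into the index (set_index removes it from the columns)
df["date"] = pd.to_datetime(df["date"])
df = df.set_index("date")

# rename changes only the labels you list, unlike assigning a full list to df.columns
df = df.rename(columns={"open": "Open", "close": "Close"})

# Arithmetic is vectorized, so adding 1 needs no apply/lambda
df["Open"] = df["Open"] + 1

# The same works on several columns at once; apply hands each column over as a Series
bumped = df[["Open", "Close"]] + 1
via_apply = df[["Open", "Close"]].apply(lambda col: col + 1)

print(bumped.equals(via_apply))  # True
```

Vectorized arithmetic is usually both shorter and faster than a per-element `lambda`, which is why it is worth preferring when the operation is plain arithmetic.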


5. Deleting

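Use the drop method to remove the specified rows or columns.
Note: drop does not modify the source data; to modify the source DataFrame in place, pass inplace=True. As the earlier axis diagram showed, row labels lie along axis=0 and column labels along axis=1.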
<section style="padding: 15px 16px 16px;font-family: "Operator Mono", Consolas, Monaco, Menlo, monospace;font-size: 12px;display: -webkit-box;overflow-x: auto;background: rgb(39, 40, 34);color: rgb(221, 221, 221);border-radius: 5px;margin-left: 8px;margin-right: 8px;"><span style="color: rgb(117, 113, 94);line-height: 26px;"># Drop a specified column: the Open column</span><br  />df.drop(<span style="color: rgb(166, 226, 46);line-height: 26px;">"Open"</span>, axis=<span style="line-height: 26px;">1</span>).head() <span style="color: rgb(117, 113, 94);line-height: 26px;"># or df.drop(df.columns[1], axis=1)</span><br  /><br  />   Date    Close   High      Low Volume     Code       date        <br  /><br  /><span style="line-height: 26px;">2015-12-24</span>  <span style="line-height: 26px;">2015-12-24</span>  <span style="line-height: 26px;">9.823</span>   <span style="line-height: 26px;">9.998</span>   <span style="line-height: 26px;">9.744</span>   <span style="line-height: 26px;">640229.0</span>    <span style="line-height: 26px;">000001</span><br  /><span style="line-height: 26px;">2015-12-25</span>  <span style="line-height: 26px;">2015-12-25</span>  <span style="line-height: 26px;">9.879</span>   <span style="line-height: 26px;">9.927</span>   <span style="line-height: 26px;">9.815</span>   <span style="line-height: 26px;">399845.0</span>    <span style="line-height: 26px;">000001</span><br  /><span style="line-height: 26px;">2015-12-28</span>  <span style="line-height: 26px;">2015-12-28</span>  <span style="line-height: 26px;">9.537</span>   <span style="line-height: 26px;">9.919</span>   <span style="line-height: 26px;">9.537</span>   <span style="line-height: 26px;">822408.0</span>    <span style="line-height: 26px;">000001</span><br  /><span style="line-height: 26px;">2015-12-29</span>  <span style="line-height: 26px;">2015-12-29</span>  <span style="line-height: 26px;">9.624</span>   <span style="line-height: 26px;">9.632</span>   <span style="line-height: 26px;">9.529</span>   <span 
style="line-height: 26px;">619802.0</span>    <span style="line-height: 26px;">000001</span><br  /><span style="line-height: 26px;">2015-12-30</span>  <span style="line-height: 26px;">2015-12-30</span>  <span style="line-height: 26px;">9.632</span>   <span style="line-height: 26px;">9.640</span>   <span style="line-height: 26px;">9.513</span>   <span style="line-height: 26px;">532667.0</span>    <span style="line-height: 26px;">000001</span><br  /><span style="color: rgb(117, 113, 94);line-height: 26px;"># Drop the columns at positions 1 and 3 (counting from 0), i.e. Open and High</span><br  />df.drop(df.columns[[<span style="line-height: 26px;">1</span>,<span style="line-height: 26px;">3</span>]], axis=<span style="line-height: 26px;">1</span>).head() <span style="color: rgb(117, 113, 94);line-height: 26px;"># or df.drop(["Open", "High"], axis=1).head()</span><br  />        Date    Close      Low Volume       Code         date <br  /><span style="line-height: 26px;">2015-12-24</span>  <span style="line-height: 26px;">2015-12-24</span>  <span style="line-height: 26px;">9.823</span>   <span style="line-height: 26px;">9.744</span>   <span style="line-height: 26px;">640229.0</span>    <span style="line-height: 26px;">000001</span> <br  /><span style="line-height: 26px;">2015-12-25</span>  <span style="line-height: 26px;">2015-12-25</span>  <span style="line-height: 26px;">9.879</span>   <span style="line-height: 26px;">9.815</span>   <span style="line-height: 26px;">399845.0</span>    <span style="line-height: 26px;">000001</span> <br  /><span style="line-height: 26px;">2015-12-28</span>  <span style="line-height: 26px;">2015-12-28</span>  <span style="line-height: 26px;">9.537</span>   <span style="line-height: 26px;">9.537</span>   <span style="line-height: 26px;">822408.0</span>    <span style="line-height: 26px;">000001</span> <br  /><span style="line-height: 26px;">2015-12-29</span>  <span style="line-height: 26px;">2015-12-29</span>  <span style="line-height: 26px;">9.624</span>   <span style="line-height: 
26px;">9.529</span>   <span style="line-height: 26px;">619802.0</span>    <span style="line-height: 26px;">000001</span> <br  /><span style="line-height: 26px;">2015-12-30</span>  <span style="line-height: 26px;">2015-12-30</span>  <span style="line-height: 26px;">9.632</span>   <span style="line-height: 26px;">9.513</span>   <span style="line-height: 26px;">532667.0</span>    <span style="line-height: 26px;">000001</span></section>
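A minimal sketch of the `drop` rules described above, on a hypothetical four-column frame. The `columns=` keyword assumes pandas 0.21 or later (the article's 0.20.1 only has `axis=`); the sketch also shows why the positional form needs `axis=1`.

```python
import pandas as pd

# Hypothetical frame with the article's column names
df = pd.DataFrame({
    "Date": ["2015-12-24", "2015-12-25"],
    "Open": [9.919, 9.855],
    "Close": [9.823, 9.879],
    "High": [9.998, 9.927],
})

# axis=1 means "look up these labels among the columns"; columns= says the same thing
a = df.drop("Open", axis=1)
b = df.drop(columns="Open")

# Positions go through df.columns, but axis=1 is still required
c = df.drop(df.columns[[1, 3]], axis=1)

# Without axis=1 the default axis=0 searches row labels, where "Open" does not exist
try:
    df.drop(df.columns[1])
    raised = False
except KeyError:
    raised = True

# inplace=True modifies df itself and returns None
result = df.drop(columns="High", inplace=True)

print(list(c.columns))  # ['Date', 'Close']
print(raised)           # True
```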

 

   III. Common pandas functions


1. Statistics
<section style="padding: 15px 16px 16px;font-family: "Operator Mono", Consolas, Monaco, Menlo, monospace;font-size: 12px;display: -webkit-box;overflow-x: auto;background: rgb(39, 40, 34);color: rgb(221, 221, 221);border-radius: 5px;margin-left: 8px;margin-right: 8px;"><span style="color: rgb(117, 113, 94);line-height: 26px;"># describe computes count, mean, std, min, max and the 25%/50%/75% percentiles for every numeric column</span><br  />df.describe()<br  /><br  />Open    Close   High    Low Volume<br  />count   <span style="line-height: 26px;">641.0000</span>    <span style="line-height: 26px;">641.0000</span>    <span style="line-height: 26px;">641.0000</span>    <span style="line-height: 26px;">641.0000</span>    <span style="line-height: 26px;">641.0000</span><br  />mean    <span style="line-height: 26px;">10.7862</span> <span style="line-height: 26px;">9.7927</span>  <span style="line-height: 26px;">9.8942</span>  <span style="line-height: 26px;">9.6863</span>  <span style="line-height: 26px;">833968.6162</span><br  />std <span style="line-height: 26px;">1.5962</span>  <span style="line-height: 26px;">1.6021</span>  <span style="line-height: 26px;">1.6620</span>  <span style="line-height: 26px;">1.5424</span>  <span style="line-height: 26px;">607731.6934</span><br  />min <span style="line-height: 26px;">8.6580</span>  <span style="line-height: 26px;">7.6100</span>  <span style="line-height: 26px;">7.7770</span>  <span style="line-height: 26px;">7.4990</span>  <span style="line-height: 26px;">153901.0000</span><br  /><span style="line-height: 26px;">25</span>% <span style="line-height: 26px;">9.7080</span>  <span style="line-height: 26px;">8.7180</span>  <span style="line-height: 26px;">8.7760</span>  <span style="line-height: 26px;">8.6500</span>  <span style="line-height: 26px;">418387.0000</span><br  /><span style="line-height: 26px;">50</span>% <span style="line-height: 26px;">10.0770</span> <span style="line-height: 26px;">9.0960</span>  <span style="line-height: 26px;">9.1450</span>  <span 
style="line-height: 26px;">8.9990</span>  <span style="line-height: 26px;">627656.0000</span><br  /><span style="line-height: 26px;">75</span>% <span style="line-height: 26px;">11.8550</span> <span style="line-height: 26px;">10.8350</span> <span style="line-height: 26px;">10.9920</span> <span style="line-height: 26px;">10.7270</span> <span style="line-height: 26px;">1039297.0000</span><br  />max <span style="line-height: 26px;">15.9090</span> <span style="line-height: 26px;">14.8600</span> <span style="line-height: 26px;">14.9980</span> <span style="line-height: 26px;">14.4470</span> <span style="line-height: 26px;">4262825.0000</span><br  /><br  /><span style="color: rgb(117, 113, 94);line-height: 26px;"># Mean of the Open column on its own</span><br  />df.Open.mean()<br  /><span style="line-height: 26px;">10.786248049922001</span><br  /><br  /><span style="color: rgb(117, 113, 94);line-height: 26px;"># The 95th percentile; linear interpolation is used by default</span><br  />df.Open.quantile(<span style="line-height: 26px;">0.95</span>)<br  /><span style="line-height: 26px;">14.187</span><br  /><br  /><span style="color: rgb(117, 113, 94);line-height: 26px;"># Count how many times each value occurs in Open</span><br  />df.Open.value_counts().head()<br  /><br  /><span style="line-height: 26px;">9.8050</span>    <span style="line-height: 26px;">12</span><br  /><span style="line-height: 26px;">9.8630</span>    <span style="line-height: 26px;">10</span><br  /><span style="line-height: 26px;">9.8440</span>    <span style="line-height: 26px;">10</span><br  /><span style="line-height: 26px;">9.8730</span>    <span style="line-height: 26px;">10</span><br  /><span style="line-height: 26px;">9.8830</span>     <span style="line-height: 26px;">8</span><br  />Name: Open, dtype: int64<br  /></section>
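To see what these statistics compute, here is a sketch on a tiny made-up Series where the answers can be checked by hand. With the default linear interpolation, the 0.95 quantile of five sorted values sits at position 0.95 × (5 - 1) = 3.8, i.e. 80% of the way from the 4th value to the 5th; the `interpolation="lower"` option is shown for comparison.

```python
import pandas as pd

# Made-up values chosen so the results can be checked by hand
prices = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0])

summary = prices.describe()   # count, mean, std, min, 25%, 50%, 75%, max
q_linear = prices.quantile(0.95)                         # default: linear interpolation
q_lower = prices.quantile(0.95, interpolation="lower")   # take the value just below instead

# value_counts sorts by frequency, most common value first
counts = pd.Series([9.805, 9.863, 9.805, 9.844, 9.805]).value_counts()

print(summary["mean"])      # 3.0
print(round(q_linear, 6))   # 4.8
print(q_lower)              # 4.0
print(counts.iloc[0])       # 3
```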

2. Handling missing values
Drop or fill in missing values.
<section style="padding: 15px 16px 16px;font-family: "Operator Mono", Consolas, Monaco, Menlo, monospace;font-size: 12px;display: -webkit-box;overflow-x: auto;background: rgb(39, 40, 34);color: rgb(221, 221, 221);border-radius: 5px;margin-left: 8px;margin-right: 8px;"><span style="color: rgb(117, 113, 94);line-height: 26px;"># Drop every row that contains any NaN</span><br  />df.dropna(how=<span style="color: rgb(166, 226, 46);line-height: 26px;">'any'</span>)<br  /><br  /><span style="color: rgb(117, 113, 94);line-height: 26px;"># Drop every column that contains any NaN</span><br  />df.dropna(how=<span style="color: rgb(166, 226, 46);line-height: 26px;">'any'</span>, axis=<span style="line-height: 26px;">1</span>)<br  /><br  /><span style="color: rgb(117, 113, 94);line-height: 26px;"># Replace NaN values with 5</span><br  />df.fillna(value=<span style="line-height: 26px;">5</span>)</section>
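A minimal sketch of the `how` and `axis` options on a hypothetical frame with one all-NaN column. Like `drop`, these methods return new objects and leave `df` unchanged unless `inplace=True` is passed; the per-column dict form of `fillna` is an extra not shown in the original.

```python
import numpy as np
import pandas as pd

# Hypothetical frame: one NaN in "a", none in "b", "c" entirely NaN
df = pd.DataFrame({
    "a": [1.0, np.nan, 3.0],
    "b": [4.0, 5.0, 6.0],
    "c": [np.nan, np.nan, np.nan],
})

rows_any = df.dropna(how="any")          # every row has a NaN in "c", so nothing is left
cols_any = df.dropna(how="any", axis=1)  # keeps only "b"
cols_all = df.dropna(how="all", axis=1)  # drops only the all-NaN column "c"
filled = df.fillna(value=5)              # every NaN becomes 5
per_col = df.fillna(value={"a": 0})      # fills column "a" only

print(rows_any.shape)              # (0, 3)
print(list(cols_all.columns))      # ['a', 'b']
print(int(df.isna().sum().sum()))  # 4: the original df is unchanged
```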


3. Sorting

Sort by row or column labels; by default this does not modify the source data either.
<section style="padding: 15px 16px 16px;font-family: "Operator Mono", Consolas, Monaco, Menlo, monospace;font-size: 12px;display: -webkit-box;overflow-x: auto;background: rgb(39, 40, 34);color: rgb(221, 221, 221);border-radius: 5px;margin-left: 8px;margin-right: 8px;"><span style="color: rgb(117, 113, 94);line-height: 26px;"># Sort by column labels</span><br  />df.sort_index(axis=<span style="line-height: 26px;">1</span>).head()<br  /><br  />Close   Code    Date    High    Low Open    Volume<br  />date<br  /><span style="line-height: 26px;">2015-12-24</span>  <span style="line-height: 26px;">9.8230</span>  <span style="line-height: 26px;">000001</span>  <span style="line-height: 26px;">2015-12-24</span>  <span style="line-height: 26px;">9.9980</span>  <span style="line-height: 26px;">9.7440</span>  <span style="line-height: 26px;">10.9190</span> <span style="line-height: 26px;">640229.0000</span><br  /><span style="line-height: 26px;">2015-12-25</span>  <span style="line-height: 26px;">1.0000</span>  <span style="line-height: 26px;">000001</span>  <span style="line-height: 26px;">2015-12-25</span>  <span style="line-height: 26px;">1.0000</span>  <span style="line-height: 26px;">9.8150</span>  <span style="line-height: 26px;">10.8550</span> <span style="line-height: 26px;">399845.0000</span><br  /><span style="line-height: 26px;">2015-12-28</span>  <span style="line-height: 26px;">1.0000</span>  <span style="line-height: 26px;">000001</span>  <span style="line-height: 26px;">2015-12-28</span>  <span style="line-height: 26px;">1.0000</span>  <span style="line-height: 26px;">9.5370</span>  <span style="line-height: 26px;">10.8950</span> <span style="line-height: 26px;">822408.0000</span><br  /><span style="line-height: 26px;">2015-12-29</span>  <span style="line-height: 26px;">9.6240</span>  <span style="line-height: 26px;">000001</span>  <span style="line-height: 26px;">2015-12-29</span>  <span style="line-height: 26px;">9.6320</span>  <span style="line-height: 26px;">9.5290</span>  
<span style="line-height: 26px;">10.5450</span> <span style="line-height: 26px;">619802.0000</span><br  /><span style="line-height: 26px;">2015-12-30</span>  <span style="line-height: 26px;">9.6320</span>  <span style="line-height: 26px;">000001</span>  <span style="line-height: 26px;">2015-12-30</span>  <span style="line-height: 26px;">9.6400</span>  <span style="line-height: 26px;">9.5130</span>  <span style="line-height: 26px;">10.6240</span> <span style="line-height: 26px;">532667.0000</span><br  /><br  /><span style="color: rgb(117, 113, 94);line-height: 26px;"># Sort by the row index in descending order</span><br  />df.sort_index(ascending=<span style="color: rgb(249, 38, 114);font-weight: bold;line-height: 26px;">False</span>).head()<br  /><br  />Date    Open    Close   High    Low Volume  Code   <br  />date<br  /><span style="line-height: 26px;">2018-08-08</span>  <span style="line-height: 26px;">2018-08-08</span>  <span style="line-height: 26px;">10.1600</span> <span style="line-height: 26px;">9.1100</span>  <span style="line-height: 26px;">9.1600</span>  <span style="line-height: 26px;">9.0900</span>  <span style="line-height: 26px;">153901.0000</span> <span style="line-height: 26px;">000001</span><br  /><span style="line-height: 26px;">2018-08-07</span>  <span style="line-height: 26px;">2018-08-07</span>  <span style="line-height: 26px;">9.9600</span>  <span style="line-height: 26px;">9.1700</span>  <span style="line-height: 26px;">9.1700</span>  <span style="line-height: 26px;">8.8800</span>  <span style="line-height: 26px;">690423.0000</span> <span style="line-height: 26px;">000001</span><br  /><span style="line-height: 26px;">2018-08-06</span>  <span style="line-height: 26px;">2018-08-06</span>  <span style="line-height: 26px;">9.9400</span>  <span style="line-height: 26px;">8.9400</span>  <span style="line-height: 26px;">9.1100</span>  <span style="line-height: 26px;">8.8900</span>  <span style="line-height: 26px;">554010.0000</span> <span style="line-height: 26px;">000001</span><br 
 /><span style="line-height: 26px;">2018-08-03</span>  <span style="line-height: 26px;">2018-08-03</span>  <span style="line-height: 26px;">9.9300</span>  <span style="line-height: 26px;">8.9100</span>  <span style="line-height: 26px;">9.1000</span>  <span style="line-height: 26px;">8.9100</span>  <span style="line-height: 26px;">476546.0000</span> <span style="line-height: 26px;">000001</span><br  /><span style="line-height: 26px;">2018-08-02</span>  <span style="line-height: 26px;">2018-08-02</span>  <span style="line-height: 26px;">10.1300</span> <span style="line-height: 26px;">8.9400</span>  <span style="line-height: 26px;">9.1500</span>  <span style="line-height: 26px;">8.8800</span>  <span style="line-height: 26px;">931401.0000</span> <span style="line-height: 26px;">000001</span><br  /></section>
Sort by the values of a column
<section style="padding: 15px 16px 16px;font-family: "Operator Mono", Consolas, Monaco, Menlo, monospace;font-size: 12px;display: -webkit-box;overflow-x: auto;background: rgb(39, 40, 34);color: rgb(221, 221, 221);border-radius: 5px;margin-left: 8px;margin-right: 8px;"><span style="color: rgb(117, 113, 94);line-height: 26px;"># Sort rows by the values of the Open column, smallest first (head() shows the first 5)</span><br  />df.sort_values(by=<span style="color: rgb(166, 226, 46);line-height: 26px;">"Open"</span>).head()<br  /><br  />        Date    Open    Close   High    Low Volume  Code<br  />date<br  />2016-03-01  2016-03-01  8.6580  7.7220  7.7770  7.6260  377910.0000 000001<br  />2016-02-15  2016-02-15  8.6900  7.7930  7.8410  7.6820  278499.0000 000001<br  />2016-01-29  2016-01-29  8.7540  7.9610  8.0240  7.7140  544435.0000 000001<br  />2016-03-02  2016-03-02  8.7620  8.0400  8.0640  7.7380  676613.0000 000001<br  />2016-02-26  2016-02-26  8.7770  7.7930  7.8250  7.6900  392154.0000 000001</section>
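A minimal sketch of both sorting methods on a small made-up price table (not the article's df). Both return a new, sorted object and leave the original untouched unless inplace=True is passed; sort_values also accepts a list of columns, with a matching list for ascending.

```python
import pandas as pd

# Made-up sample data; the article's df has more columns and about 640 rows
df = pd.DataFrame(
    {"Open": [10.92, 8.66, 9.97], "Close": [9.82, 7.72, 9.08]},
    index=pd.to_datetime(["2015-12-24", "2016-03-01", "2016-01-05"]),
)

# Sort by index label (the date), newest first
by_date_desc = df.sort_index(ascending=False)

# Sort by the values of one column, smallest first
by_open = df.sort_values(by="Open")

# Sort by several columns: Close descending, ties broken by Open ascending
by_close_then_open = df.sort_values(by=["Close", "Open"], ascending=[False, True])

print(by_date_desc.index[0].date())   # 2016-03-01
print(by_open["Open"].tolist())       # [8.66, 9.97, 10.92]
```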


4. Merging

concat: concatenates objects along the row direction (axis=0) or the column direction (axis=1).
<section style="padding: 15px 16px 16px;font-family: "Operator Mono", Consolas, Monaco, Menlo, monospace;font-size: 12px;display: -webkit-box;overflow-x: auto;background: rgb(39, 40, 34);color: rgb(221, 221, 221);border-radius: 5px;margin-left: 8px;margin-right: 8px;"><span style="color: rgb(117, 113, 94);line-height: 26px;"># Take rows 0-1, 2-3 and 4-8 (iloc slices exclude the end position) as a list, then concatenate them along axis=0 (rows); axis defaults to 0 when omitted</span><br  />split_rows = [df.iloc[0:2, :], df.iloc[2:4, :], df.iloc[4:9]]<br  />pd.concat(split_rows)<br  /><br  />    Date    Open    Close   High    Low Volume  Code<br  />date<br  />2015-12-24  2015-12-24  10.9190 9.8230  9.9980  9.7440  640229.0000 000001<br  />2015-12-25  2015-12-25  10.8550 1.0000  1.0000  9.8150  399845.0000 000001<br  />2015-12-28  2015-12-28  10.8950 1.0000  1.0000  9.5370  822408.0000 000001<br  />2015-12-29  2015-12-29  10.5450 9.6240  9.6320  9.5290  619802.0000 000001<br  />2015-12-30  2015-12-30  10.6240 9.6320  9.6400  9.5130  532667.0000 000001<br  />2015-12-31  2015-12-31  10.6320 9.5450  9.6560  9.5370  491258.0000 000001<br  />2016-01-04  2016-01-04  10.5530 8.9950  9.5770  8.9400  563497.0000 000001<br  />2016-01-05  2016-01-05  9.9720  9.0750  9.2100  8.8760  663269.0000 000001<br  />2016-01-06  2016-01-06  10.0910 9.1790  9.2020  9.0670  515706.0000 000001<br  /><br  /><span style="color: rgb(117, 113, 94);line-height: 26px;"># Take column 1, columns 2-3, and columns 4 onward (0-based positions) as a list, then concatenate them along axis=1 (columns)</span><br  />split_columns = [df.iloc[:, 1:2], df.iloc[:, 2:4], df.iloc[:, 4:]]<br  />pd.concat(split_columns, axis=1).head()<br  /><br  />    Open    Close   High    Low Volume  Code<br  />date<br  />2015-12-24  10.9190 9.8230  9.9980  9.7440  640229.0000 000001<br  />2015-12-25  10.8550 1.0000  1.0000  9.8150  399845.0000 000001<br  />2015-12-28  10.8950 1.0000  1.0000  9.5370  822408.0000 000001<br  />2015-12-29  10.5450 9.6240  9.6320  9.5290  619802.0000 000001<br  />2015-12-30  10.6240 9.6320  9.6400  9.5130  532667.0000 000001<br  /></section>
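A small sketch, on made-up data, showing that splitting a DataFrame with iloc and concatenating the pieces gives back the original, and that axis=1 aligns rows by index label (labels missing from one piece become NaN):

```python
import pandas as pd

# Small made-up frame standing in for the article's df
df = pd.DataFrame({"a": range(6), "b": range(10, 16), "c": list("uvwxyz")})

# Row-wise: three slices (iloc end positions are exclusive), stacked back together
row_parts = [df.iloc[0:2], df.iloc[2:4], df.iloc[4:]]
rows_back = pd.concat(row_parts)            # axis=0 is the default

# Column-wise: two column groups, placed side by side again
col_parts = [df.iloc[:, 0:1], df.iloc[:, 1:]]
cols_back = pd.concat(col_parts, axis=1)

print(rows_back.equals(df), cols_back.equals(df))  # True True

# axis=1 aligns rows by index label; a label missing from one part gives NaN
left = pd.DataFrame({"x": [1, 2]}, index=[0, 1])
right = pd.DataFrame({"y": [3, 4]}, index=[1, 2])
aligned = pd.concat([left, right], axis=1)
print(aligned.shape)                        # (3, 2)
```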
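Append a row. There is also a related insert method, which inserts at a specified position (for a DataFrame, DataFrame.insert inserts a column, not a row).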
<section style="padding: 15px 16px 16px;font-family: "Operator Mono", Consolas, Monaco, Menlo, monospace;font-size: 12px;display: -webkit-box;overflow-x: auto;background: rgb(39, 40, 34);color: rgb(221, 221, 221);border-radius: 5px;margin-left: 8px;margin-right: 8px;"><span style="color: rgb(117, 113, 94);line-height: 26px;"># Append the first row after the last row; ignore_index=True renumbers the result 0..n-1</span><br  /><span style="color: rgb(117, 113, 94);line-height: 26px;"># DataFrame.append was removed in pandas 2.0; there, use pd.concat([df, df.iloc[[0]]], ignore_index=True)</span><br  />df.append(df.iloc[0, :], ignore_index=<span style="color: rgb(249, 38, 114);font-weight: bold;line-height: 26px;">True</span>).tail()<br  /><br  /><br  />Date    Open    Close   High    Low Volume  Code<br  />637 2018-08-03  9.9300  8.9100  9.1000  8.9100  476546.0000 000001<br  />638 2018-08-06  9.9400  8.9400  9.1100  8.8900  554010.0000 000001<br  />639 2018-08-07  9.9600  9.1700  9.1700  8.8800  690423.0000 000001<br  />640 2018-08-08  10.1600 9.1100  9.1600  9.0900  153901.0000 000001<br  />641 2015-12-24  10.9190 9.8230  9.9980  9.7440  640229.0000 000001</section>
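DataFrame.append was deprecated in pandas 1.4 and removed in 2.0, so on current pandas the same operation is written with pd.concat. The sketch below (made-up data) does that, and also shows DataFrame.insert, which adds a column at a given position rather than a row:

```python
import pandas as pd

# Made-up prices standing in for the article's df
df = pd.DataFrame(
    {"Open": [10.919, 10.855], "Close": [9.823, 9.945]},
    index=pd.to_datetime(["2015-12-24", "2015-12-25"]),
)

# df.iloc[[0]] (double brackets) keeps the first row as a one-row DataFrame,
# so its columns line up with df's columns during concatenation
appended = pd.concat([df, df.iloc[[0]]], ignore_index=True)
print(appended.index.tolist())   # [0, 1, 2]

# DataFrame.insert adds a *column* at a given position and modifies df in place
df.insert(1, "Spread", df["Open"] - df["Close"])
print(df.columns.tolist())       # ['Open', 'Spread', 'Close']
```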

5. Copying objects

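Assigning a DataFrame to another variable does not copy it: both names refer to the same object, so a change made through one is visible through the other. To duplicate the whole DataFrame, call its copy method explicitly.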
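A minimal sketch of the difference, on a made-up one-column frame: plain assignment only creates a second name for the same object, while copy() (deep by default) duplicates the data.

```python
import pandas as pd

df = pd.DataFrame({"Close": [9.823, 9.624]})

alias = df            # plain assignment: a second name for the same object
snapshot = df.copy()  # copy() is deep by default: the data is duplicated

df.loc[0, "Close"] = 0.0

print(alias is df)                # True
print(alias.loc[0, "Close"])      # 0.0, the change is visible through the alias
print(snapshot.loc[0, "Close"])   # 9.823, the copy is unaffected
```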

   IV. Plotting


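pandas plotting is built on matplotlib. For more detailed, fine-tuned charts you can use matplotlib directly, but for quick, simple plots pandas' own plotting works quite well.
Adding screenshots is tedious, so no figures are included here; see the pandas-blog.ipynb file in the resource files, or type in the code and run it yourself.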
<section style="padding: 15px 16px 16px;font-family: "Operator Mono", Consolas, Monaco, Menlo, monospace;font-size: 12px;display: -webkit-box;overflow-x: auto;background: rgb(39, 40, 34);color: rgb(221, 221, 221);border-radius: 5px;margin-left: 8px;margin-right: 8px;"><span style="color: rgb(117, 113, 94);line-height: 26px;"># In a Jupyter notebook, this magic makes plots appear directly in the cell output</span><br  />%matplotlib inline<br  /><span style="color: rgb(117, 113, 94);line-height: 26px;"># Line chart of Open, Low, High and Close</span><br  />df[[<span style="color: rgb(166, 226, 46);line-height: 26px;">"Open"</span>, <span style="color: rgb(166, 226, 46);line-height: 26px;">"Low"</span>, <span style="color: rgb(166, 226, 46);line-height: 26px;">"High"</span>, <span style="color: rgb(166, 226, 46);line-height: 26px;">"Close"</span>]].plot()<br  /><br  /><span style="color: rgb(117, 113, 94);line-height: 26px;"># Area chart (the areas are stacked by default)</span><br  />df[[<span style="color: rgb(166, 226, 46);line-height: 26px;">"Open"</span>, <span style="color: rgb(166, 226, 46);line-height: 26px;">"Low"</span>, <span style="color: rgb(166, 226, 46);line-height: 26px;">"High"</span>, <span style="color: rgb(166, 226, 46);line-height: 26px;">"Close"</span>]].plot(kind=<span style="color: rgb(166, 226, 46);line-height: 26px;">"area"</span>)</section>
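Outside a notebook there is no %matplotlib inline, so a plain script would typically select a non-interactive backend and save the figure instead. A sketch under that assumption, with made-up prices; it relies on DataFrame.plot returning a matplotlib Axes that ordinary matplotlib calls can refine:

```python
import io

import matplotlib
matplotlib.use("Agg")  # non-interactive backend: render off-screen (scripts, servers)
import pandas as pd

# Made-up prices standing in for the article's df
df = pd.DataFrame(
    {"Open": [10.92, 10.86, 10.90], "Low": [9.74, 9.82, 9.54],
     "High": [10.00, 10.10, 10.20], "Close": [9.82, 9.90, 9.62]},
    index=pd.date_range("2015-12-24", periods=3, freq="D"),
)

# DataFrame.plot returns a matplotlib Axes
ax = df[["Open", "Low", "High", "Close"]].plot(title="Daily prices")
ax.set_ylabel("Price")

# Save the figure as PNG into memory (a file path works the same way)
buf = io.BytesIO()
ax.get_figure().savefig(buf, format="png")
```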


   V. Reading and writing data


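pandas reads and writes common file formats such as CSV, Excel and JSON, and can even read the system clipboard, which is sometimes very handy: select and copy some content with the mouse, then read it straight into a DataFrame.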
df2.head()
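A sketch of a CSV round trip. io.StringIO stands in for a file on disk so the example is self-contained, and the CSV content is made up. The other formats follow the same read_*/to_* pattern (read_excel/to_excel, read_json/to_json), and pd.read_clipboard() passes the copied text to read_csv in the same way.

```python
import io

import pandas as pd

# Made-up CSV text; io.StringIO behaves like an open text file
csv_text = """Date,Open,Close
2015-12-24,10.919,9.823
2015-12-25,10.855,9.945
"""

# Parse the Date column as datetimes and use it as the index
df = pd.read_csv(io.StringIO(csv_text), parse_dates=["Date"], index_col="Date")

# Without a path argument, to_csv returns the CSV text as a string
out = df.to_csv()
print(out.splitlines()[0])   # Date,Open,Close
```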


   VI. A simple example

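This example processes a web server log. It may not be all that practical, since the ELK stack handles this more than adequately, but there is nothing wrong with doing it yourself if you enjoy it.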

1. Analyzing access.log

Log file: https://raw.githubusercontent.com/Apache-Labor/labor/master/labor-04/labor-04-example-access.log

2. Log format and example
<section style="padding: 15px 16px 16px;font-family: "Operator Mono", Consolas, Monaco, Menlo, monospace;font-size: 12px;display: -webkit-box;overflow-x: auto;background: rgb(39, 40, 34);color: rgb(221, 221, 221);border-radius: 5px;margin-left: 8px;margin-right: 8px;"><span style="color: rgb(117, 113, 94);line-height: 26px;"># Log format</span><br  /><span style="color: rgb(117, 113, 94);line-height: 26px;"># Field descriptions: see https://ru.wikipedia.org/wiki/Access.log</span><br  />%h %l %u %t "%r" %>s %b "%{Referer}i" "%{User-Agent}i"<br  /><span style="color: rgb(117, 113, 94);line-height: 26px;"># A concrete example</span><br  />75.249.65.145 US - [2015-09-02 10:42:51.003372] <span style="color: rgb(166, 226, 46);line-height: 26px;">"GET /cms/tina-access-editor-for-download/ HTTP/1.1"</span> 200 7113 <span style="color: rgb(166, 226, 46);line-height: 26px;">"-"</span> <span style="color: rgb(166, 226, 46);line-height: 26px;">"Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"</span> www.example.com 124.165.3.7 443 redirect-handler - + <span style="color: rgb(166, 226, 46);line-height: 26px;">"-"</span> Vea2i8CoAwcAADevXAgAAAAB TLSv1.2 ECDHE-RSA-AES128-GCM-SHA256 701 12118 -% 88871 803 0 0 0 0</section>

3. Reading and parsing the log file

Parse the log file
<section style="padding: 15px 16px 16px;font-family: "Operator Mono", Consolas, Monaco, Menlo, monospace;font-size: 12px;display: -webkit-box;overflow-x: auto;background: rgb(39, 40, 34);color: rgb(221, 221, 221);border-radius: 5px;margin-left: 8px;margin-right: 8px;"><span style="color: rgb(249, 38, 114);font-weight: bold;line-height: 26px;">import</span> re<br  /><br  />HOST = <span style="color: rgb(166, 226, 46);line-height: 26px;">r'^(?P<host>.*?)'</span><br  />SPACE = <span style="color: rgb(166, 226, 46);line-height: 26px;">r'\s'</span><br  />IDENTITY = <span style="color: rgb(166, 226, 46);line-height: 26px;">r'\S+'</span><br  />USER = <span style="color: rgb(166, 226, 46);line-height: 26px;">r"\S+"</span><br  />TIME = <span style="color: rgb(166, 226, 46);line-height: 26px;">r'\[(?P<time>.*?)\]'</span><br  /><span style="color: rgb(117, 113, 94);line-height: 26px;"># REQUEST = r'"(?P<request>.*?)"'</span><br  />REQUEST = <span style="color: rgb(166, 226, 46);line-height: 26px;">r'"(?P<method>.+?)\s(?P<path>.+?)\s(?P<http_protocol>.*?)"'</span><br  />STATUS = <span style="color: rgb(166, 226, 46);line-height: 26px;">r'(?P<status>\d{3})'</span><br  />SIZE = <span style="color: rgb(166, 226, 46);line-height: 26px;">r'(?P<size>\S+)'</span><br  />REFER = <span style="color: rgb(166, 226, 46);line-height: 26px;">r"\S+"</span><br  />USER_AGENT = <span style="color: rgb(166, 226, 46);line-height: 26px;">r'"(?P<user_agent>.*?)"'</span><br  /><br  /><span style="color: rgb(117, 113, 94);line-height: 26px;"># Fields in log order, separated by single whitespace characters</span><br  />REGEX = HOST+SPACE+IDENTITY+SPACE+USER+SPACE+TIME+SPACE+REQUEST+SPACE+STATUS+SPACE+SIZE+SPACE+REFER+SPACE+USER_AGENT+SPACE<br  />line = <span style="color: rgb(166, 226, 46);line-height: 26px;">'79.81.243.171 - - [30/Mar/2009:20:58:31 +0200] "GET /exemples.php HTTP/1.1" 200 11481 "http://www.facades.fr/" "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1; .NET CLR 1.0.3705; .NET CLR 1.1.4322; Media Center PC 4.0; .NET CLR 2.0.50727)" "-"'</span><br  />reg = re.compile(REGEX)<br  /><span style="color: rgb(117, 113, 94);line-height: 26px;"># 8 groups: host, time, method, path, protocol, status, size, user agent</span><br  />reg.match(line).groups()<br  /></section>
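A stricter variant, sketched as an alternative rather than the article's pattern: one regular expression for the Apache combined format, using character classes such as [^"]* so a field cannot run past its closing quote, and named groups so match.groupdict() returns a dict keyed by field name. The two test lines are the sample lines quoted in this section; the pattern assumes all fields are present.

```python
import re

# Apache "combined" format: host ident user [time] "request" status size "referer" "user-agent"
LOG_RE = re.compile(
    r'^(?P<host>\S+)\s\S+\s\S+\s'                                # host, identity, user
    r'\[(?P<time>[^\]]+)\]\s'                                    # [timestamp]
    r'"(?P<method>\S+)\s(?P<path>\S+)\s(?P<protocol>[^"]*)"\s'   # "GET /path HTTP/1.1"
    r'(?P<status>\d{3})\s(?P<size>\S+)\s'                        # status code, bytes
    r'"(?P<referer>[^"]*)"\s"(?P<user_agent>[^"]*)"'             # quoted referer and UA
)

good_line = ('79.81.243.171 - - [30/Mar/2009:20:58:31 +0200] "GET /exemples.php HTTP/1.1" '
             '200 11481 "http://www.facades.fr/" "Mozilla/4.0 (compatible; MSIE 7.0; '
             'Windows NT 5.1; .NET CLR 1.0.3705; .NET CLR 1.1.4322; Media Center PC 4.0; '
             '.NET CLR 2.0.50727)" "-"')
bad_line = '80.32.156.105 - - [27/Mar/2009:13:39:51 +0100] "GET  HTTP/1.1" 400 - "-" "-" "-"'

match = LOG_RE.match(good_line)
record = match.groupdict() if match else None
print(record["method"], record["path"], record["status"])   # GET /exemples.php 200

# The malformed request (no path) does not match, so match() returns None
print(LOG_RE.match(bad_line))                                # None
```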
Load the parsed records into a DataFrame
<section style="padding: 15px 16px 16px;font-family: "Operator Mono", Consolas, Monaco, Menlo, monospace;font-size: 12px;display: -webkit-box;overflow-x: auto;background: rgb(39, 40, 34);color: rgb(221, 221, 221);border-radius: 5px;margin-left: 8px;margin-right: 8px;">COLUMNS = [<span style="color: rgb(166, 226, 46);line-height: 26px;">"Host"</span>, <span style="color: rgb(166, 226, 46);line-height: 26px;">"Time"</span>, <span style="color: rgb(166, 226, 46);line-height: 26px;">"Method"</span>, <span style="color: rgb(166, 226, 46);line-height: 26px;">"Path"</span>, <span style="color: rgb(166, 226, 46);line-height: 26px;">"Protocol"</span>, <span style="color: rgb(166, 226, 46);line-height: 26px;">"status"</span>, <span style="color: rgb(166, 226, 46);line-height: 26px;">"size"</span>, <span style="color: rgb(166, 226, 46);line-height: 26px;">"User_Agent"</span>]<br  /><br  />field_lis = []<br  /><span style="color: rgb(249, 38, 114);font-weight: bold;line-height: 26px;">with</span> open(<span style="color: rgb(166, 226, 46);line-height: 26px;">"access.log"</span>) <span style="color: rgb(249, 38, 114);font-weight: bold;line-height: 26px;">as</span> rf:<br  />    <span style="color: rgb(249, 38, 114);font-weight: bold;line-height: 26px;">for</span> line <span style="color: rgb(249, 38, 114);font-weight: bold;line-height: 26px;">in</span> rf:<br  />        <span style="color: rgb(117, 113, 94);line-height: 26px;"># Some records do not match the pattern, for example:</span><br  />        <span style="color: rgb(117, 113, 94);line-height: 26px;"># 80.32.156.105 - - [27/Mar/2009:13:39:51 +0100] "GET  HTTP/1.1" 400 - "-" "-" "-"</span><br  />        <span style="color: rgb(117, 113, 94);line-height: 26px;"># Writing a perfect regular expression is not the point here, so such lines are skipped</span><br  />        match = reg.match(line)<br  />        <span style="color: rgb(249, 38, 114);font-weight: bold;line-height: 26px;">if</span> match <span style="color: rgb(249, 38, 114);font-weight: bold;line-height: 26px;">is</span> <span style="color: rgb(249, 38, 114);font-weight: bold;line-height: 26px;">None</span>:<br  />            <span style="color: rgb(249, 38, 114);font-weight: bold;line-height: 26px;">continue</span><br  />        field_lis.append(match.groups())<br  /><br  />log_df = pd.DataFrame(field_lis)<br  /><span style="color: rgb(117, 113, 94);line-height: 26px;"># Set the column names</span><br  />log_df.columns = COLUMNS<br  /><br  /><span style="color: rgb(249, 38, 114);font-weight: bold;line-height: 26px;">def</span> <span style="color: rgb(166, 226, 46);font-weight: bold;line-height: 26px;">parse_time</span>(value):<br  />    <span style="color: rgb(249, 38, 114);font-weight: bold;line-height: 26px;">try</span>:<br  />        <span style="color: rgb(249, 38, 114);font-weight: bold;line-height: 26px;">return</span> pd.to_datetime(value)<br  />    <span style="color: rgb(249, 38, 114);font-weight: bold;line-height: 26px;">except</span> Exception <span style="color: rgb(249, 38, 114);font-weight: bold;line-height: 26px;">as</span> e:<br  />        <span style="color: rgb(117, 113, 94);line-height: 26px;"># Report the value that failed; the function then returns None</span><br  />        print(e)<br  />        print(value)<br  /><br  /><span style="color: rgb(117, 113, 94);line-height: 26px;"># Turn "30/Mar/2009:20:58:31 +0200" into "30/Mar/2009 20:58:31 +0200", which pandas can parse</span><br  />log_df.Time = log_df.Time.apply(<span style="color: rgb(249, 38, 114);font-weight: bold;line-height: 26px;">lambda</span> x: x.replace(<span style="color: rgb(166, 226, 46);line-height: 26px;">":"</span>, <span style="color: rgb(166, 226, 46);line-height: 26px;">" "</span>, 1))<br  />log_df.Time = log_df.Time.apply(parse_time)<br  /><br  /><span style="color: rgb(117, 113, 94);line-height: 26px;"># Use Time as the index; the Time column itself is kept, as the output below shows</span><br  /><span style="color: rgb(117, 113, 94);line-height: 26px;"># (to remove the column as well: log_df.drop("Time", axis=1, inplace=True))</span><br  />log_df.index = pd.to_datetime(log_df.Time)<br  />log_df.head()<br  /><br  />    Host    Time    Method  Path    Protocol    status  size    User_Agent<br  />Time<br  />2009-03-22 06:00:32 88.191.254.20   2009-03-22 06:00:32 GET /   HTTP/1.0    200 8674    "-<br  />2009-03-22 06:06:20 66.249.66.231   2009-03-22 06:06:20 GET /popup.php?choix=-89    HTTP/1.1    200 1870    "Mozilla/5.0 (compatible; Googlebot/2.1; +htt...<br  />2009-03-22 06:11:20 66.249.66.231   2009-03-22 06:11:20 GET /specialiste.php    HTTP/1.1    200 10743   "Mozilla/5.0 (compatible; Googlebot/2.1; +htt...<br  />2009-03-22 06:40:06 83.198.250.175  2009-03-22 06:40:06 GET /   HTTP/1.1    200 8714    "Mozilla/4.0 (compatible; MSIE 7.0; Windows N...<br  />2009-03-22 06:40:06 83.198.250.175  2009-03-22 06:40:06 GET /style.css  HTTP/1.1    200 1692    "Mozilla/4.0 (compatible; MSIE 7.0; Windows N...<br  /></section>
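A sketch of another way to parse the timestamps: instead of replacing the first colon and converting value by value, pass the log's layout to pd.to_datetime as an explicit format. %z reads the +hhmm offset and utc=True puts everything on one UTC timeline, which matters here because the two sample lines in this section carry different offsets (+0200 and +0100). The timestamps and hosts come from those sample lines; %z support in format assumes a reasonably recent pandas (0.24 or later), not the 0.20.1 the article targets.

```python
import pandas as pd

# Raw timestamps as captured by the regex (from this section's two sample lines)
raw = pd.Series(["30/Mar/2009:20:58:31 +0200", "27/Mar/2009:13:39:51 +0100"])

# Explicit layout; %z parses the UTC offset, utc=True yields tz-aware UTC values
times = pd.to_datetime(raw, format="%d/%b/%Y:%H:%M:%S %z", utc=True)

log_df = pd.DataFrame({"Host": ["79.81.243.171", "80.32.156.105"]})
log_df.index = pd.DatetimeIndex(times, name="Time")   # time-based index
log_df = log_df.sort_index()                          # chronological order

print(log_df.index[0])   # 2009-03-27 12:39:51+00:00
```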
Check the data types
<section style="padding: 15px 16px 16px;font-family: "Operator Mono", Consolas, Monaco, Menlo, monospace;font-size: 12px;display: -webkit-box;overflow-x: auto;background: rgb(39, 40, 34);color: rgb(221, 221, 221);border-radius: 5px;margin-left: 8px;margin-right: 8px;"><span style="color: rgb(117, 113, 94);line-height: 26px;"># Check the dtype of each column</span><br  />log_df.dtypes <br  /><br  />Host                  object<br  />Time          datetime64[ns]<br  />Method                object<br  />Path                  object<br  />Protocol              object<br  />status                object<br  />size                  object<br  />User_Agent            object<br  />dtype: object<br  /></section>
As shown above, apart from Time (a datetime), every column has dtype object, but size and status should be numeric.
<section style="padding: 15px 16px 16px;font-family: "Operator Mono", Consolas, Monaco, Menlo, monospace;font-size: 12px;display: -webkit-box;overflow-x: auto;background: rgb(39, 40, 34);color: rgb(221, 221, 221);border-radius: 5px;margin-left: 8px;margin-right: 8px;"><span style="color: rgb(249, 38, 114);font-weight: bold;line-height: 26px;">def</span> <span style="color: rgb(166, 226, 46);font-weight: bold;line-height: 26px;">parse_number</span>(value):<br  />    <span style="color: rgb(249, 38, 114);font-weight: bold;line-height: 26px;">try</span>:<br  />        <span style="color: rgb(249, 38, 114);font-weight: bold;line-height: 26px;">return</span> pd.to_numeric(value)<br  />    <span style="color: rgb(249, 38, 114);font-weight: bold;line-height: 26px;">except</span> (ValueError, TypeError):<br  />        <span style="color: rgb(117, 113, 94);line-height: 26px;"># Non-numeric values such as "-" (no body sent) become 0</span><br  />        <span style="color: rgb(249, 38, 114);font-weight: bold;line-height: 26px;">return</span> 0<br  /><br  /><span style="color: rgb(117, 113, 94);line-height: 26px;"># The columns were created as "status" and "size"; capitalize them to match the code below</span><br  />log_df = log_df.rename(columns={<span style="color: rgb(166, 226, 46);line-height: 26px;">"status"</span>: <span style="color: rgb(166, 226, 46);line-height: 26px;">"Status"</span>, <span style="color: rgb(166, 226, 46);line-height: 26px;">"size"</span>: <span style="color: rgb(166, 226, 46);line-height: 26px;">"Size"</span>})<br  /><span style="color: rgb(117, 113, 94);line-height: 26px;"># Convert the Status and Size values to numeric types</span><br  />log_df[[<span style="color: rgb(166, 226, 46);line-height: 26px;">"Status"</span>, <span style="color: rgb(166, 226, 46);line-height: 26px;">"Size"</span>]] = log_df[[<span style="color: rgb(166, 226, 46);line-height: 26px;">"Status"</span>, <span style="color: rgb(166, 226, 46);line-height: 26px;">"Size"</span>]].apply(<span style="color: rgb(249, 38, 114);font-weight: bold;line-height: 26px;">lambda</span> x: x.apply(parse_number))<br  />log_df.dtypes<br  />Host                  object<br  />Time          datetime64[ns]<br  />Method                object<br  />Path                  object<br  />Protocol              object<br  />Status                 int64<br  />Size                   int64<br  />User_Agent            object<br  />dtype: object<br  /></section>
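A vectorized sketch of the same conversion, as an alternative to calling parse_number on every cell: pd.to_numeric with errors="coerce" turns unparseable values such as "-" into NaN, which are then filled with 0 before casting to integers. The small frame is made up.

```python
import pandas as pd

# Made-up raw values as strings, the way the regex captures them
log_df = pd.DataFrame({"Status": ["200", "304", "400"], "Size": ["11481", "-", "-"]})

for col in ["Status", "Size"]:
    # "-" -> NaN -> 0, then cast the whole column to int64
    log_df[col] = pd.to_numeric(log_df[col], errors="coerce").fillna(0).astype("int64")

print(log_df["Size"].tolist())   # [11481, 0, 0]
```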
Count the Status values
<section style="padding: 15px 16px 16px;font-family: "Operator Mono", Consolas, Monaco, Menlo, monospace;font-size: 12px;display: -webkit-box;overflow-x: auto;background: rgb(39, 40, 34);color: rgb(221, 221, 221);border-radius: 5px;margin-left: 8px;margin-right: 8px;"><span style="color: rgb(117, 113, 94);line-height: 26px;"># Count how many times each Status value occurs</span><br  />log_df.Status.value_counts()<br  /><br  />200    5737<br  />304    1540<br  />404    1186<br  />400     251<br  />302      37<br  />403       3<br  />206       2<br  />Name: Status, dtype: int64<br  /></section>
Draw a pie chart
<section style="padding: 15px 16px 16px;font-family: "Operator Mono", Consolas, Monaco, Menlo, monospace;font-size: 12px;display: -webkit-box;overflow-x: auto;background: rgb(39, 40, 34);color: rgb(221, 221, 221);border-radius: 5px;margin-left: 8px;margin-right: 8px;">log_df.Status.value_counts().plot(kind=<span style="color: rgb(166, 226, 46);line-height: 26px;">"pie"</span>, figsize=(<span style="line-height: 26px;">10</span>,<span style="line-height: 26px;">8</span>))</section>



Check the time span covered by the log file
<section style="padding: 15px 16px 16px;font-family: "Operator Mono", Consolas, Monaco, Menlo, monospace;font-size: 12px;display: -webkit-box;overflow-x: auto;background: rgb(39, 40, 34);color: rgb(221, 221, 221);border-radius: 5px;margin-left: 8px;margin-right: 8px;">log_df.index.max() - log_df.index.min()<br  />Timedelta(<span style="color: rgb(166, 226, 46);line-height: 26px;">'15 days 11:12:03'</span>)<br  /></section>
Print the latest and the earliest timestamps separately
<section style="padding: 15px 16px 16px;font-family: "Operator Mono", Consolas, Monaco, Menlo, monospace;font-size: 12px;display: -webkit-box;overflow-x: auto;background: rgb(39, 40, 34);color: rgb(221, 221, 221);border-radius: 5px;margin-left: 8px;margin-right: 8px;">print(log_df.index.max())<br  />print(log_df.index.min())<br  /><br  /><span style="line-height: 26px;">2009-04-06</span> <span style="line-height: 26px;">17</span>:<span style="line-height: 26px;">12</span>:<span style="line-height: 26px;">35</span><br  /><span style="line-height: 26px;">2009-03-22</span> <span style="line-height: 26px;">06</span>:<span style="line-height: 26px;">00</span>:<span style="line-height: 26px;">32</span><br  /></section>
The same approach also works for the Method and User_Agent fields, although User_Agent needs some extra cleaning first.
Top 10 IP addresses by request count
<section style="padding: 15px 16px 16px;font-family: "Operator Mono", Consolas, Monaco, Menlo, monospace;font-size: 12px;display: -webkit-box;overflow-x: auto;background: rgb(39, 40, 34);color: rgb(221, 221, 221);border-radius: 5px;margin-left: 8px;margin-right: 8px;">log_df.Host.value_counts().head(10)<br  /><br  />91.121.31.184     745<br  />88.191.254.20     441<br  />41.224.252.122    420<br  />194.2.62.185      255<br  />86.75.35.144      184<br  />208.89.192.106    170<br  />79.82.3.8         161<br  />90.3.72.207       157<br  />62.147.243.132    150<br  />81.249.221.143    141<br  />Name: Host, dtype: int64<br  /></section>
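value_counts returns counts sorted in descending order, so head(n) yields the top n values, and normalize=True gives each value's share instead of a raw count. A small sketch with made-up host values:

```python
import pandas as pd

# Made-up request hosts, one entry per request
hosts = pd.Series(["91.121.31.184", "88.191.254.20", "91.121.31.184",
                   "41.224.252.122", "91.121.31.184", "88.191.254.20"], name="Host")

counts = hosts.value_counts()                # most frequent first
top2 = counts.head(2)                        # head(10) would give the top 10
share = hosts.value_counts(normalize=True)   # fraction of all requests

print(top2)
print(share.iloc[0])                         # 0.5
```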
Plot the request trend
<section style="padding: 15px 16px 16px;font-family: "Operator Mono", Consolas, Monaco, Menlo, monospace;font-size: 12px;display: -webkit-box;overflow-x: auto;background: rgb(39, 40, 34);color: rgb(221, 221, 221);border-radius: 5px;margin-left: 8px;margin-right: 8px;"><span style="color: rgb(117, 113, 94);line-height: 26px;"># Work on a copy so that log_df itself is left unchanged</span><br  />log_df2 = log_df.copy()<br  /><span style="color: rgb(117, 113, 94);line-height: 26px;"># Add a Request column with the value 1 to every row</span><br  />log_df2[<span style="color: rgb(166, 226, 46);line-height: 26px;">"Request"</span>] = <span style="line-height: 26px;">1</span><br  /><span style="color: rgb(117, 113, 94);line-height: 26px;"># Count requests per hour, replace NaN with 0, then draw a 16x10 line chart</span><br  />log_df2.Request.resample(<span style="color: rgb(166, 226, 46);line-height: 26px;">"H"</span>).sum().fillna(<span style="line-height: 26px;">0</span>).plot(kind=<span style="color: rgb(166, 226, 46);line-height: 26px;">"line"</span>,figsize=(<span style="line-height: 26px;">16</span>,<span style="line-height: 26px;">10</span>))</section>
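The core of the trend plot is the hourly `resample(...).sum()`. The sketch below runs it on three made-up timestamps. It leaves out the `.plot(...)` call so that it runs without matplotlib.

```python
import pandas as pd

# Hypothetical timestamps standing in for log_df's DatetimeIndex.
idx = pd.to_datetime([
    "2009-03-22 06:00:32",
    "2009-03-22 06:15:00",
    "2009-03-22 08:05:10",  # no requests at all in the 07:00 hour
])
log_df2 = pd.DataFrame({"Request": 1}, index=idx)

# Count requests per hour. "h" is the hourly alias in pandas >= 2.2;
# the article's "H" is the older spelling, deprecated there.
hourly = log_df2.Request.resample("h").sum()
print(hourly)
```

The empty 07:00 bin comes back as 0. Since pandas 0.22, `sum()` over an empty bin returns 0 rather than NaN. As a result, the `fillna(0)` above mainly matters on older versions such as the 0.20.1 this article targets.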

Plot each status code separately
<section style="padding: 15px 16px 16px;font-family: "Operator Mono", Consolas, Monaco, Menlo, monospace;font-size: 12px;display: -webkit-box;overflow-x: auto;background: rgb(39, 40, 34);color: rgb(221, 221, 221);border-radius: 5px;margin-left: 8px;margin-right: 8px;"><span style="color: rgb(117, 113, 94);line-height: 26px;"># Resample the 200, 304 and 404 status codes separately (hourly) and collect the results in a list</span><br  />req_df_lis = [<br  />log_df2[log_df2.Status == <span style="line-height: 26px;">200</span>].Request.resample(<span style="color: rgb(166, 226, 46);line-height: 26px;">"H"</span>).sum().fillna(<span style="line-height: 26px;">0</span>), <br  />log_df2[log_df2.Status == <span style="line-height: 26px;">304</span>].Request.resample(<span style="color: rgb(166, 226, 46);line-height: 26px;">"H"</span>).sum().fillna(<span style="line-height: 26px;">0</span>), <br  />log_df2[log_df2.Status == <span style="line-height: 26px;">404</span>].Request.resample(<span style="color: rgb(166, 226, 46);line-height: 26px;">"H"</span>).sum().fillna(<span style="line-height: 26px;">0</span>) <br  />]<br  /><br  /><span style="color: rgb(117, 113, 94);line-height: 26px;"># Combine the three Series column-wise into one DataFrame; hours that a status code does not cover become NaN after alignment, so fill them with 0</span><br  />req_df = pd.concat(req_df_lis,axis=<span style="line-height: 26px;">1</span>).fillna(<span style="line-height: 26px;">0</span>)<br  />req_df.columns = [<span style="color: rgb(166, 226, 46);line-height: 26px;">"200"</span>, <span style="color: rgb(166, 226, 46);line-height: 26px;">"304"</span>, <span style="color: rgb(166, 226, 46);line-height: 26px;">"404"</span>]<br  /><span style="color: rgb(117, 113, 94);line-height: 26px;"># Plot</span><br  />req_df.plot(figsize=(<span style="line-height: 26px;">16</span>,<span style="line-height: 26px;">10</span>))</section>
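Each status code is resampled on its own, so each series only spans that code's own first-to-last hour. After `pd.concat` aligns them, the frame can have NaN at the edges. It can also skip hours in which none of the listed codes occurs. The sketch below builds the same columns from a dict, whose keys become the column names. It then uses `asfreq("h")` to restore a continuous hourly index and fills the gaps with 0. The sample rows are made up. `Status` is assumed to hold integers, as the comparison with `200` in the code above implies.

```python
import pandas as pd

# Hypothetical log rows: a DatetimeIndex plus an integer Status column.
idx = pd.to_datetime([
    "2009-03-22 06:00:32", "2009-03-22 06:20:00",
    "2009-03-22 07:10:00", "2009-03-22 09:45:00",
])
log_df2 = pd.DataFrame({"Status": [200, 404, 200, 304], "Request": 1}, index=idx)

codes = [200, 304, 404]
# One hourly series per status code; the dict keys become the column names.
# sort=True makes sure the combined DatetimeIndex comes out in time order.
req_df = pd.concat(
    {str(code): log_df2[log_df2.Status == code].Request.resample("h").sum()
     for code in codes},
    axis=1,
    sort=True,
)
# asfreq("h") inserts any hour missing from every series (08:00 here);
# fillna(0) then turns all alignment gaps into zero counts.
req_df = req_df.asfreq("h").fillna(0)
print(req_df)
```

Calling `req_df.plot(figsize=(16, 10))` on the result should then draw one line per status code over an unbroken hourly axis.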


<pre style="padding-right: 0em;padding-left: 0em;max-width: 100%;letter-spacing: 0.544px;color: rgb(62, 62, 62);font-size: 16px;widows: 1;word-spacing: 2px;caret-color: rgb(255, 0, 0);background-color: rgb(255, 255, 255);text-align: center;box-sizing: border-box !important;overflow-wrap: break-word !important;"><pre style="max-width: 100%;box-sizing: border-box !important;overflow-wrap: break-word !important;">


This article originally appeared in: 深度学习这件小事

This is an original article; all rights reserved. You are welcome to share it, but please credit the source when reposting.
