Statistics
import pandas
import numpy
# Create a DataFrame from a NumPy array, with a datetime index and column labels:
dates = pandas.date_range("20180509", periods=6)
df = pandas.DataFrame(numpy.random.randn(6, 4), index=dates, columns=list('ABCD'))
print("DataFrame built from a datetime index and column labels:", df, sep="\n")
# Descriptive statistics: the mean of each column
print("Mean of each column:", df.mean(), sep="\n")
# The same operation along the other axis
print("Mean of each row:", df.mean(axis=1), sep="\n")
# When operating on objects of different dimensionality that need alignment, pandas automatically broadcasts along the specified dimension
s = pandas.Series([1, 3, 5, numpy.nan, 6, 8], index=dates).shift(2)  # shift(2) moves the values down two positions; the index stays in place
print("Index unchanged, values shifted:", s, sep="\n")
print("df-s", df.sub(s, axis='index'), sep="\n")  # align s with df on the row index, broadcast it across every column, then subtract (df - s)
"E:\Python 3.6.2\python.exe" F:/PycharmProjects/test.py
DataFrame built from a datetime index and column labels:
                   A         B         C         D
2018-05-09  0.689544  0.875232  0.452993  1.875628
2018-05-10 -0.216719  0.298931 -1.159366  0.188906
2018-05-11  0.268589  1.206928 -0.119726 -0.148764
2018-05-12 -1.035244  1.092390  1.006421 -0.226186
2018-05-13  0.670916  0.738597 -0.184312 -1.280867
2018-05-14 -0.359534  1.109787  0.650537 -0.030985
Mean of each column:
A    0.002925
B    0.886978
C    0.107758
D    0.062955
dtype: float64
Mean of each row:
2018-05-09    0.973349
2018-05-10   -0.222062
2018-05-11    0.301757
2018-05-12    0.209345
2018-05-13   -0.013917
2018-05-14    0.342451
Freq: D, dtype: float64
Index unchanged, values shifted:
2018-05-09    NaN
2018-05-10    NaN
2018-05-11    1.0
2018-05-12    3.0
2018-05-13    5.0
2018-05-14    NaN
Freq: D, dtype: float64
df-s
                   A         B         C         D
2018-05-09       NaN       NaN       NaN       NaN
2018-05-10       NaN       NaN       NaN       NaN
2018-05-11 -0.731411  0.206928 -1.119726 -1.148764
2018-05-12 -4.035244 -1.907610 -1.993579 -3.226186
2018-05-13 -4.329084 -4.261403 -5.184312 -6.280867
2018-05-14       NaN       NaN       NaN       NaN
Process finished with exit code 0
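Because the run above uses random numbers, the subtraction is hard to check by eye. The following is a minimal sketch of the same shift-then-subtract pattern on small fixed values; the frame, its two columns, and the variable names (diff, col_means, row_means) are made up for illustration and are not part of the original script.

```python
import numpy as np
import pandas as pd

# Small fixed frame so the arithmetic is easy to follow (illustrative values).
dates = pd.date_range("20180509", periods=4)
df = pd.DataFrame(
    {"A": [1.0, 2.0, 3.0, 4.0], "B": [10.0, 20.0, 30.0, 40.0]},
    index=dates,
)

# shift(1) moves every value down one row; the first slot becomes NaN.
s = pd.Series([1.0, 2.0, 3.0, 4.0], index=dates).shift(1)

# axis="index" matches s against the row labels and subtracts it from each column.
diff = df.sub(s, axis="index")
print(diff)

# mean() skips NaN by default: no axis argument gives one value per column,
# axis=1 gives one value per row (a row that is entirely NaN stays NaN).
col_means = diff.mean()
row_means = diff.mean(axis=1)
print(col_means)
print(row_means)
```

Any row where s is NaN comes out as NaN in every column of the result, which is the same effect seen in the first two rows and the last row of the df-s output.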
The apply() function
import pandas
import numpy
# Create a DataFrame from a NumPy array, with a datetime index and column labels:
dates = pandas.date_range("20180509", periods=6)
df = pandas.DataFrame(numpy.random.randn(6, 4), index=dates, columns=list('ABCD'))
print("DataFrame built from a datetime index and column labels:", df, sep="\n")
# Apply functions to the data
print("Cumulative sum down each column:", df.apply(numpy.cumsum), sep="\n")  # each row holds the running total of itself and all rows above it
print("Max minus min of each column:", df.apply(lambda x: x.max() - x.min()), sep="\n")
"E:\Python 3.6.2\python.exe" F:/PycharmProjects/test.py
DataFrame built from a datetime index and column labels:
                   A         B         C         D
2018-05-09  0.628765 -1.453298 -0.169228 -0.185065
2018-05-10  0.444467  0.159900 -1.581807  0.852065
2018-05-11  1.537534 -1.718371 -1.378338 -0.183929
2018-05-12 -2.131473 -2.586691 -0.241944 -0.842446
2018-05-13 -0.898688  0.394125  1.413996 -1.897569
2018-05-14 -0.891981  0.913925  0.686605 -0.842980
Cumulative sum down each column:
                   A         B         C         D
2018-05-09  0.628765 -1.453298 -0.169228 -0.185065
2018-05-10  1.073232 -1.293399 -1.751035  0.667000
2018-05-11  2.610767 -3.011770 -3.129372  0.483071
2018-05-12  0.479293 -5.598461 -3.371316 -0.359374
2018-05-13 -0.419395 -5.204337 -1.957321 -2.256944
2018-05-14 -1.311376 -4.290412 -1.270715 -3.099924
Max minus min of each column:
A    3.669008
B    3.500616
C    2.995802
D    2.749634
dtype: float64
Process finished with exit code 0
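The script only applies functions column by column, which is the default for apply(). As a small sketch on made-up integer data, the example below repeats both calls and adds an axis=1 variant that works row by row; the frame and the names running_total, col_range and row_range are illustrative assumptions.

```python
import numpy as np
import pandas as pd

# Tiny integer frame (illustrative values) so results can be checked by hand.
df = pd.DataFrame({"A": [1, 2, 3], "B": [10, 0, 5]})

# Default axis=0: the function receives each column as a Series.
running_total = df.apply(np.cumsum)
col_range = df.apply(lambda col: col.max() - col.min())

# axis=1: the function receives each row as a Series instead.
row_range = df.apply(lambda row: row.max() - row.min(), axis=1)

print(running_total)
print(col_range)
print(row_range)
```

For a running total specifically, df.cumsum() gives the same values without going through apply(); apply() is mainly useful when no built-in method covers the operation.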
Histogramming
import pandas
import numpy
# Create a DataFrame from a NumPy array, with a datetime index and column labels:
dates = pandas.date_range("20180509", periods=6)
df = pandas.DataFrame(numpy.random.randn(6, 4), index=dates, columns=list('ABCD'))
print("DataFrame built from a datetime index and column labels:", df, sep="\n")
s = pandas.Series(numpy.random.randint(0, 7, size=10))
print("Series of ten random integers:", s, sep="\n")
print("Number of occurrences of each value:", s.value_counts(), sep="\n")
"E:\Python 3.6.2\python.exe" F:/PycharmProjects/test.py
DataFrame built from a datetime index and column labels:
                   A         B         C         D
2018-05-09 -1.447060  0.998378 -0.272173 -0.240873
2018-05-10  2.019563  0.397001  1.469093 -0.313272
2018-05-11  0.932445  0.973830 -1.914278 -1.374748
2018-05-12 -0.980636  1.336340 -0.232319  1.176833
2018-05-13 -1.850315 -0.738035 -1.085791  1.378875
2018-05-14  1.162965  1.892369  0.499482  0.647424
Series of ten random integers:
0    5
1    2
2    1
3    4
4    1
5    5
6    0
7    1
8    0
9    3
dtype: int32
Process finished with exit code 0
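The captured output stops after the Series itself, so the value_counts() result is not shown. As a sketch, the ten values from that output can be entered by hand and counted; the variable names counts and by_value are illustrative.

```python
import pandas as pd

# The ten values that appear in the captured Series output.
s = pd.Series([5, 2, 1, 4, 1, 5, 0, 1, 0, 3])

# value_counts() returns the distinct values as the index and their
# frequencies as the data, most frequent first.
counts = s.value_counts()
print(counts)

# sort_index() reorders the result by value instead of by frequency,
# which reads more like a histogram.
by_value = counts.sort_index()
print(by_value)
```

The value 1 appears three times and therefore comes first in counts; how values with equal counts are ordered relative to each other can vary, which is why the sorted view is often easier to compare across runs.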
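String methods
A Series provides a set of string-processing methods through its str attribute, which makes it easy to apply them to every element of the array.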
import pandas
import numpy
s = pandas.Series(['A', 'B', 'C', 'Aaba', 'Baca', numpy.nan, 'CABA', 'dog', 'cat'])
print("Series values converted to lowercase:", s.str.lower(), sep="\n")
"E:\Python 3.6.2\python.exe" F:/PycharmProjects/test.py
Series values converted to lowercase:
0       a
1       b
2       c
3    aaba
4    baca
5     NaN
6    caba
7     dog
8     cat
dtype: object
Process finished with exit code 0
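lower() is only one of the methods on the str accessor. The sketch below tries a few others on a shortened, made-up version of the Series, mainly to show how the missing entry is handled; the variable names upper, lengths and has_a are illustrative.

```python
import numpy as np
import pandas as pd

# Shortened sample with one missing entry (illustrative values).
s = pd.Series(["A", "Aaba", np.nan, "CABA", "dog"])

# Element-wise methods leave the missing entry missing.
upper = s.str.upper()
lengths = s.str.len()

# contains() returns a boolean mask; case=False ignores letter case and
# na=False counts the missing entry as "no match".
has_a = s.str.contains("a", case=False, na=False)

print(upper)
print(lengths)
print(s[has_a])
```

Passing na=False matters when the mask is used for filtering, as in s[has_a]: without it the missing entry is not a plain True or False, so it cannot be used reliably as a filter condition.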