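Machine Learning: Notes on the pandas Series Data Structure

August 23, 2019 · Source: 杨_过

Most of my understanding of Series comes from working with it in code, so this post simply collects a few snippets to reinforce understanding and memory.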

import pandas as pd
import numpy as np
s = pd.Series(np.random.randn(5), index=['a', 'b', 'c', 'd', 'e'])  # index= sets the row labels
s
# Output
a    1.752127
b    0.127374
c    0.581114
d    0.466064
e   -1.493042
dtype: float64

s.index
# Output: Index(['a', 'b', 'c', 'd', 'e'], dtype='object')

s = pd.Series(np.random.randn(5))  # without index=, a default integer index 0..n-1 is added
s
# Output
0    0.209798
1    0.791759
2   -1.352022
3    0.164453
4    0.647989
dtype: float64

d = {'a': 0., 'b': 1., 'd': 3}  # a Series can be built from a dict; the keys become the index labels
s = pd.Series(d, index=list('abcd'))  # labels that are missing from the dict get NaN
s
# Output
a    0.0
b    1.0
c    NaN
d    3.0
dtype: float64

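To make the dict case a little more concrete, here is a minimal sketch (the names `data`, `by_keys`, and `reindexed` are my own, not from the original post). It suggests that `index=` decides which labels end up in the result: labels missing from the dict become NaN, and dict keys that are not listed are left out.

```python
import numpy as np
import pandas as pd

data = {'a': 0.0, 'b': 1.0, 'd': 3}

# Without index=, the dict keys become the labels, in insertion order.
by_keys = pd.Series(data)

# With index=, the result follows the labels given: 'c' has no entry in
# the dict, so it becomes NaN; 'd' is not listed, so it is dropped.
reindexed = pd.Series(data, index=['a', 'b', 'c'])

print(list(by_keys.index))        # ['a', 'b', 'd']
print(reindexed.isna().tolist())  # [False, False, True]
print('d' in reindexed.index)     # False
```

Because NaN is a floating-point value, the result here has dtype float64.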
s = pd.Series(5, index=list('abcd'))  # a scalar is repeated for every label
s
# Output
a    5
b    5
c    5
d    5
dtype: int64

s = pd.Series(np.random.randn(5))  # built from random numbers, default integer index
s
# Output
0   -0.014250
1    0.990860
2    1.785053
3   -2.155324
4   -0.815233
dtype: float64

s[0]  # read a single value by its index label
# Output: -0.014250144041201129

s[:3]  # read a range of values with a slice
# Output
0   -0.014250
1    0.990860
2    1.785053
dtype: float64

s[[1, 3, 4]]  # to pick several arbitrary labels, pass a list (hence the double brackets)
# Output
1    0.990860
3   -2.155324
4   -0.815233
dtype: float64

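One thing the snippets above do not show: with an integer index, plain `s[...]` looks values up by label, which only coincides with position while the labels are 0..n-1. A small sketch with made-up values, using the explicit `.loc` (label) and `.iloc` (position) accessors:

```python
import pandas as pd

# Integer labels that deliberately differ from the positions 0, 1, 2.
s = pd.Series([10.0, 20.0, 30.0], index=[2, 0, 1])

by_label = s.loc[0]        # the value whose label is 0
by_position = s.iloc[0]    # the first value in the Series

first_two = s.iloc[:2]     # positional slice, end excluded
picked = s.loc[[1, 2]]     # list of labels, returned in the order given

print(by_label, by_position)   # 20.0 10.0
print(picked.tolist())         # [30.0, 10.0]
```

Using `.loc` and `.iloc` makes the intent explicit, so the code keeps working when the index changes.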
np.sin(s)  # NumPy functions can be applied element-wise to the data in a Series
# Output
0   -0.014250
1    0.836498
2    0.977135
3   -0.833973
4   -0.727885
dtype: float64

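As a small illustration of the element-wise behaviour (values and names chosen by me for easy checking), the sketch below applies a NumPy function and ordinary arithmetic to a Series; in both cases the result is again a Series with the same index.

```python
import numpy as np
import pandas as pd

s = pd.Series([0.0, 1.0, 4.0], index=['x', 'y', 'z'])

roots = np.sqrt(s)     # NumPy ufunc applied element-wise
scaled = s * 2 + 1     # plain arithmetic is element-wise too

print(type(roots).__name__)   # Series
print(list(roots.index))      # ['x', 'y', 'z']
print(roots.tolist())         # [0.0, 1.0, 2.0]
print(scaled.tolist())        # [1.0, 3.0, 9.0]
```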
s = pd.Series(np.random.randn(5), index=['a', 'b', 'c', 'd', 'e'])
s
# Output
a    1.127395
b    0.229895
c    0.161001
d    0.362886
e    0.203692
dtype: float64

s['a']  # index labels can also be strings
# Output: 1.1273946030373316

s['b'] = 3  # assigning through a label changes the stored value
s
# Output
a    1.127395
b    3.000000
c    0.161001
d    0.362886
e    0.203692
dtype: float64

s['g'] = 100  # assigning to a label that does not exist appends it to the end of the Series
s
# Output
a      1.127395
b      3.000000
c      0.161001
d      0.362886
e      0.203692
g    100.000000
dtype: float64

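The two kinds of assignment can be checked side by side in a short sketch (values are made up so the result is easy to verify):

```python
import pandas as pd

s = pd.Series([1.0, 2.0], index=['a', 'b'])

s['b'] = 3      # existing label: the value is overwritten
s['g'] = 100    # new label: the Series grows by one entry

print(list(s.index))   # ['a', 'b', 'g']
print(s.tolist())      # [1.0, 3.0, 100.0]
```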
print(s.get('f'))  # get() also reads the value for a label; if the label is missing it returns None by default
# Output: None

print(s.get('f', 0))  # a custom default can be passed as the second argument
# Output: 0

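For contrast, a bracket lookup on a missing label raises `KeyError`, while `get` falls back to a default. A minimal sketch (the variable names are mine):

```python
import pandas as pd

s = pd.Series([1.0, 2.0], index=['a', 'b'])

try:
    s['f']                  # bracket lookup raises for a missing label
    raised = False
except KeyError:
    raised = True

missing = s.get('f')        # no default given, so None
fallback = s.get('f', 0)    # explicit default
present = s.get('a', 0)     # label exists, default is ignored

print(raised, missing, fallback, present)   # True None 0 1.0
```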
s1 = pd.Series(np.random.randn(3), index=['a', 'c', 'e'])
s2 = pd.Series(np.random.randn(3), index=['a', 'd', 'e'])
print(f'{s1}\n\n{s2}')
# Output
a   -0.036147
c   -1.466236
e   -0.649153
dtype: float64

a    1.460091
d   -0.788388
e    0.175337
dtype: float64

# Two Series are added by matching index labels; a label that appears
# in only one of them gets NaN in the result
s1 + s2
# Output
a    1.423945
c         NaN
d         NaN
e   -0.473816
dtype: float64
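
If the NaN entries are not wanted, two common options are `Series.add` with `fill_value` and `dropna`. The sketch below uses fixed values instead of random ones so the results can be checked by hand; the names `aligned`, `filled`, and `common` are my own.

```python
import pandas as pd

s1 = pd.Series([1.0, 2.0, 3.0], index=['a', 'c', 'e'])
s2 = pd.Series([10.0, 20.0, 30.0], index=['a', 'd', 'e'])

aligned = s1 + s2                   # NaN where only one side has the label
filled = s1.add(s2, fill_value=0)   # treat the missing side as 0
common = (s1 + s2).dropna()         # keep only labels present in both

print(aligned.isna().tolist())  # [False, True, True, False]
print(filled.tolist())          # [11.0, 2.0, 20.0, 33.0]
print(common.tolist())          # [11.0, 33.0]
```

Which option fits depends on whether a missing label should count as zero or be excluded altogether.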

    Original author: 杨_过
    Original article: https://www.cnblogs.com/yang901112/p/11397213.html