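I'm trying to find a way to compute a cumulative total that accounts for ties in pandas.
Let's take hypothetical data from a track meet, where I have people, races, heats, and times.
Each person's place is determined as follows:
For a given race/heat combination:
> The person with the lowest time places first
> The person with the second-lowest time places second
and so on...
This would be fairly simple to code, except for one thing.
If two people have the same time, they both get the same place, and the next time greater than theirs gets that place plus one.
In the table below, for the 100 Yard Dash, Heat 1, RUNNER1 finishes first, RUNNER2/RUNNER3 tie for second, and RUNNER4 takes third (the next time after RUNNER2/RUNNER3).
So basically, the logic is as follows:
If race <> race.shift() or heat <> heat.shift() then place = 1
If race = race.shift() and heat = heat.shift() and time > time.shift() then place = place.shift() + 1
If race = race.shift() and heat = heat.shift() and time = time.shift() then place = place.shift()
The part that confuses me is how to handle the ties. Otherwise I could do something like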
df['Place'] = np.where(
    (df['Race'] == df['Race'].shift())
    &
    (df['Heat'] == df['Heat'].shift()),
    df['Place'].shift() + 1,
    1)
Thanks!
Sample data follows:
Person,Race,Heat,Time
RUNNER1,100 Yard Dash,1,9.87
RUNNER2,100 Yard Dash,1,9.92
RUNNER3,100 Yard Dash,1,9.92
RUNNER4,100 Yard Dash,1,9.96
RUNNER5,100 Yard Dash,1,9.97
RUNNER6,100 Yard Dash,1,10.01
RUNNER7,100 Yard Dash,2,9.88
RUNNER8,100 Yard Dash,2,9.93
RUNNER9,100 Yard Dash,2,9.93
RUNNER10,100 Yard Dash,2,10.03
RUNNER11,100 Yard Dash,2,10.26
RUNNER7,200 Yard Dash,1,19.63
RUNNER8,200 Yard Dash,1,19.67
RUNNER9,200 Yard Dash,1,19.72
RUNNER10,200 Yard Dash,1,19.72
RUNNER11,200 Yard Dash,1,19.86
RUNNER12,200 Yard Dash,1,19.92
What I ultimately want is:
Person,Race,Heat,Time,Place
RUNNER1,100 Yard Dash,1,9.87,1
RUNNER2,100 Yard Dash,1,9.92,2
RUNNER3,100 Yard Dash,1,9.92,2
RUNNER4,100 Yard Dash,1,9.96,3
RUNNER5,100 Yard Dash,1,9.97,4
RUNNER6,100 Yard Dash,1,10.01,5
RUNNER7,100 Yard Dash,2,9.88,1
RUNNER8,100 Yard Dash,2,9.93,2
RUNNER9,100 Yard Dash,2,9.93,2
RUNNER10,100 Yard Dash,2,10.03,3
RUNNER11,100 Yard Dash,2,10.26,4
RUNNER7,200 Yard Dash,1,19.63,1
RUNNER8,200 Yard Dash,1,19.67,2
RUNNER9,200 Yard Dash,1,19.72,3
RUNNER10,200 Yard Dash,1,19.72,3
RUNNER11,200 Yard Dash,1,19.86,4
RUNNER12,200 Yard Dash,1,19.92,5
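[Edit] Now, going one step further.
Let's assume that once I leave a set of unique values, the next time that set comes up, the values reset to 1.
So, for example, note that the data goes from "Heat 1" to "Heat 2" and back to "Heat 1". I don't want the places to continue from the previous "Heat 1"; instead, I want them to reset.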
Person,Race,Heat,Time,Place
RUNNER1,100 Yard Dash,1,9.87,1
RUNNER2,100 Yard Dash,1,9.92,2
RUNNER3,100 Yard Dash,1,9.92,2
RUNNER4,100 Yard Dash,2,9.96,1
RUNNER5,100 Yard Dash,2,9.97,2
RUNNER6,100 Yard Dash,2,10.01,3
RUNNER7,100 Yard Dash,1,9.88,1
RUNNER8,100 Yard Dash,1,9.93,2
RUNNER9,100 Yard Dash,1,9.93,2
Best answer: you could use:
grouped = df.groupby(['Race','Heat'])
df['Place'] = grouped['Time'].transform(lambda x: pd.factorize(x, sort=True)[0]+1)
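To see why this works, it may help to look at what pd.factorize does to a single group's times. The following is a minimal sketch, using the Heat 1 times from the sample data; the variable names are only illustrative. With sort=True, each code is the position of the value among the sorted unique values, so tied times share a code and the codes have no gaps.

```python
import pandas as pd

# Times from Heat 1 of the 100 Yard Dash in the sample data.
times = pd.Series([9.87, 9.92, 9.92, 9.96, 9.97, 10.01])

# codes[i] is the position of times[i] among the sorted unique values.
codes, uniques = pd.factorize(times, sort=True)

# Codes start at 0, places start at 1.
places = codes + 1
print(places.tolist())   # [1, 2, 2, 3, 4, 5]
print(uniques.tolist())  # [9.87, 9.92, 9.96, 9.97, 10.01]

# The rows do not need to be pre-sorted by time.
unsorted_codes, _ = pd.factorize(pd.Series([9.92, 9.87, 9.92]), sort=True)
print((unsorted_codes + 1).tolist())  # [2, 1, 2]
```

Because the codes depend only on the values, not on row order, the same lambda can be handed to transform for every Race/Heat group.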
For example:
import pandas as pd
df = pd.DataFrame({
    'Heat': [1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 1, 1, 1, 1, 1, 1],
    'Person': ['RUNNER1', 'RUNNER2', 'RUNNER3', 'RUNNER4', 'RUNNER5', 'RUNNER6', 'RUNNER7', 'RUNNER8', 'RUNNER9', 'RUNNER10', 'RUNNER11', 'RUNNER7', 'RUNNER8', 'RUNNER9', 'RUNNER10', 'RUNNER11', 'RUNNER12'],
    'Race': ['100 Yard Dash'] * 11 + ['200 Yard Dash'] * 6,
    'Time': [9.87, 9.92, 9.92, 9.96, 9.97, 10.01, 9.88, 9.93, 9.93, 10.03, 10.26, 19.63, 19.67, 19.72, 19.72, 19.86, 19.92]})
# Place numbers the distinct times in each group; Rank is shown for comparison.
grouped = df.groupby(['Race', 'Heat'])
df['Place'] = grouped['Time'].transform(lambda x: pd.factorize(x, sort=True)[0] + 1)
df['Rank'] = grouped['Time'].rank(method='min')
print(df)
yields
    Heat    Person           Race   Time  Place  Rank
0      1   RUNNER1  100 Yard Dash   9.87    1.0   1.0
1      1   RUNNER2  100 Yard Dash   9.92    2.0   2.0
2      1   RUNNER3  100 Yard Dash   9.92    2.0   2.0
3      1   RUNNER4  100 Yard Dash   9.96    3.0   4.0
4      1   RUNNER5  100 Yard Dash   9.97    4.0   5.0
5      1   RUNNER6  100 Yard Dash  10.01    5.0   6.0
6      2   RUNNER7  100 Yard Dash   9.88    1.0   1.0
7      2   RUNNER8  100 Yard Dash   9.93    2.0   2.0
8      2   RUNNER9  100 Yard Dash   9.93    2.0   2.0
9      2  RUNNER10  100 Yard Dash  10.03    3.0   4.0
10     2  RUNNER11  100 Yard Dash  10.26    4.0   5.0
11     1   RUNNER7  200 Yard Dash  19.63    1.0   1.0
12     1   RUNNER8  200 Yard Dash  19.67    2.0   2.0
13     1   RUNNER9  200 Yard Dash  19.72    3.0   3.0
14     1  RUNNER10  200 Yard Dash  19.72    3.0   3.0
15     1  RUNNER11  200 Yard Dash  19.86    4.0   5.0
16     1  RUNNER12  200 Yard Dash  19.92    5.0   6.0
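The shift-based logic from the question can also be written directly as a cumulative sum, which is closer to the "cumulative total" the question asks for. This is only a sketch under one assumption: the rows are sorted by time within each Race/Heat group (the sort_values call enforces that). The helper name running_place is hypothetical.

```python
import pandas as pd

df = pd.DataFrame({
    'Race': ['100 Yard Dash'] * 6,
    'Heat': [1, 1, 1, 2, 2, 2],
    'Time': [9.87, 9.92, 9.92, 9.88, 9.93, 9.93],
})

# Assumption: times must increase within each Race/Heat group.
df = df.sort_values(['Race', 'Heat', 'Time'])


def running_place(times):
    # True wherever the time differs from the previous row of the group.
    # The first row is compared with NaN, so it always counts as a change.
    changed = times != times.shift()
    # Each change bumps the place by one; ties leave it unchanged.
    return changed.cumsum()


df['Place'] = df.groupby(['Race', 'Heat'])['Time'].transform(running_place)
print(df)
```

Unlike the factorize version, this one gives wrong places if a group is not sorted by time, so the factorize approach is the safer default.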
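Note that pandas has a GroupBy.rank method that can compute many common forms of ranking. The method='min' variant used above is not the one you describe: for example, in row 3, the Rank of the runner who follows the two tied for second is 4, whereas Place is 3.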
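For what it's worth, the pandas versions I am aware of also accept method='dense' in rank, which numbers tied groups consecutively without gaps. A minimal sketch comparing the two methods, assuming a pandas version that supports 'dense'; the column names RankMin and RankDense are only illustrative:

```python
import pandas as pd

df = pd.DataFrame({
    'Race': ['100 Yard Dash'] * 6,
    'Heat': [1] * 6,
    'Time': [9.87, 9.92, 9.92, 9.96, 9.97, 10.01],
})

grouped = df.groupby(['Race', 'Heat'])

# 'min' leaves a gap after a tie: 1, 2, 2, 4, 5, 6
df['RankMin'] = grouped['Time'].rank(method='min')

# 'dense' does not leave a gap: 1, 2, 2, 3, 4, 5
df['RankDense'] = grouped['Time'].rank(method='dense')

# rank returns floats; cast if integer places are wanted.
df['Place'] = df['RankDense'].astype(int)
print(df)
```

If 'dense' is available, it should give the same places as the factorize approach.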
Regarding the edit: use
(df['Heat'] != df['Heat'].shift()).cumsum()
to tell apart separate runs of the same heat:
import pandas as pd
df = pd.DataFrame({
    'Heat': [1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 1, 1, 1, 1, 1, 1],
    'Person': ['RUNNER1', 'RUNNER2', 'RUNNER3', 'RUNNER4', 'RUNNER5', 'RUNNER6', 'RUNNER7', 'RUNNER8', 'RUNNER9', 'RUNNER10', 'RUNNER11', 'RUNNER7', 'RUNNER8', 'RUNNER9', 'RUNNER10', 'RUNNER11', 'RUNNER12'],
    'Race': ['100 Yard Dash'] * 17,
    'Time': [9.87, 9.92, 9.92, 9.96, 9.97, 10.01, 9.88, 9.93, 9.93, 10.03, 10.26, 19.63, 19.67, 19.72, 19.72, 19.86, 19.92]})
# HeatGroup increases by one every time Heat differs from the previous row.
df['HeatGroup'] = (df['Heat'] != df['Heat'].shift()).cumsum()
grouped = df.groupby(['Race', 'HeatGroup'])
df['Place'] = grouped['Time'].transform(lambda x: pd.factorize(x, sort=True)[0] + 1)
df['Rank'] = grouped['Time'].rank(method='min')
print(df)
yields
    Heat    Person           Race   Time  HeatGroup  Place  Rank
0      1   RUNNER1  100 Yard Dash   9.87          1    1.0   1.0
1      1   RUNNER2  100 Yard Dash   9.92          1    2.0   2.0
2      1   RUNNER3  100 Yard Dash   9.92          1    2.0   2.0
3      1   RUNNER4  100 Yard Dash   9.96          1    3.0   4.0
4      1   RUNNER5  100 Yard Dash   9.97          1    4.0   5.0
5      1   RUNNER6  100 Yard Dash  10.01          1    5.0   6.0
6      2   RUNNER7  100 Yard Dash   9.88          2    1.0   1.0
7      2   RUNNER8  100 Yard Dash   9.93          2    2.0   2.0
8      2   RUNNER9  100 Yard Dash   9.93          2    2.0   2.0
9      2  RUNNER10  100 Yard Dash  10.03          2    3.0   4.0
10     2  RUNNER11  100 Yard Dash  10.26          2    4.0   5.0
11     1   RUNNER7  100 Yard Dash  19.63          3    1.0   1.0
12     1   RUNNER8  100 Yard Dash  19.67          3    2.0   2.0
13     1   RUNNER9  100 Yard Dash  19.72          3    3.0   3.0
14     1  RUNNER10  100 Yard Dash  19.72          3    3.0   3.0
15     1  RUNNER11  100 Yard Dash  19.86          3    4.0   5.0
16     1  RUNNER12  100 Yard Dash  19.92          3    5.0   6.0
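If the race can change between runs as well, one possible variation is to start a new block whenever either Race or Heat differs from the previous row. The sketch below applies that idea to the sample from the question's edit. The Block column name is hypothetical, and comparing both columns is my assumption about what "a set of unique values" means there.

```python
import pandas as pd

# Sample from the question's edit: Heat goes 1 -> 2 -> 1.
df = pd.DataFrame({
    'Person': ['RUNNER%d' % i for i in range(1, 10)],
    'Race': ['100 Yard Dash'] * 9,
    'Heat': [1, 1, 1, 2, 2, 2, 1, 1, 1],
    'Time': [9.87, 9.92, 9.92, 9.96, 9.97, 10.01, 9.88, 9.93, 9.93],
})

keys = df[['Race', 'Heat']]

# A new block starts wherever Race or Heat differs from the previous row.
# The first row is compared with NaN, so it always starts a block.
new_block = (keys != keys.shift()).any(axis=1)

# Hypothetical helper column: 1, 1, 1, 2, 2, 2, 3, 3, 3
df['Block'] = new_block.cumsum()

# Number the distinct times within each block, as before.
df['Place'] = df.groupby('Block')['Time'].transform(
    lambda x: pd.factorize(x, sort=True)[0] + 1)
print(df)
```

On this sample the places come out as 1, 2, 2, 1, 2, 3, 1, 2, 2, which matches the output requested in the edit.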