python – Pandas DataFrame indexing using loc

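I have a kind of FIZZ-BUZZ problem. I have a calendar of weekdays. In the next column, I put 'FIZZ' on certain rows. If there is a one-row gap between two 'FIZZ' entries, I put 'BUZZ' in between, unless the weekday is 'SUN'. See the code below (I am using pandas version 0.15.2):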

import datetime
import pandas as pd

dict_weekday = {1: 'MON', 2: 'TUE', 3: 'WED', 4: 'THU', 5: 'FRI', 6: 'SAT', 7: 'SUN'}
df = pd.DataFrame(pd.date_range(datetime.date(2014, 1, 1), datetime.date(2014, 1, 10), freq='D'), columns=['Date'])
df['Weekday'] = df['Date'].apply(lambda x: dict_weekday[x.isoweekday()])
df['A'] = df['Weekday']
idx_lst = [0, 2, 3, 5, 9]
df.loc[idx_lst, 'A'] = 'FIZZ'
previous_idx = idx_lst[0]

for idx in idx_lst:
    print idx
    try:
        print df.loc[idx - 1, 'Weekday'], df.loc[idx, 'Weekday']
        if idx - previous_idx == 2 and df.loc[idx - 1, 'Weekday'] != 'SUN':
            df.loc[idx-1, 'A'] = 'BUZZ'
    except KeyError:
        continue

    previous_idx = idx

print df

The output is:

0
2
2014-12-18 00:00:00 FRI
3
FRI SAT
5
2014-12-21 00:00:00 MON
9
2014-12-18 00:00:00 FRI
        Date Weekday     A
0 2014-01-01     WED  FIZZ
1 2014-01-02     THU  BUZZ
2 2014-01-03     FRI  FIZZ
3 2014-01-04     SAT  FIZZ
4 2014-01-05     SUN  BUZZ
5 2014-01-06     MON  FIZZ
6 2014-01-07     TUE   TUE
7 2014-01-08     WED   WED
8 2014-01-09     THU   THU
9 2014-01-10     FRI  FIZZ

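Note row 4: column A should contain SUN rather than BUZZ. Also note that .loc[idx - 1, 'Weekday'] returns a timestamp whenever idx - 1 is not in idx_lst. If I use .ix instead of .loc, I get the correct answer: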

0
2
THU FRI
3
FRI SAT
5
SUN MON
9
THU FRI
        Date Weekday     A
0 2014-01-01     WED  FIZZ
1 2014-01-02     THU  BUZZ
2 2014-01-03     FRI  FIZZ
3 2014-01-04     SAT  FIZZ
4 2014-01-05     SUN   SUN
5 2014-01-06     MON  FIZZ
6 2014-01-07     TUE   TUE
7 2014-01-08     WED   WED
8 2014-01-09     THU   THU
9 2014-01-10     FRI  FIZZ

Is there an explanation for this? Thanks in advance.

Best answer: The surprising behavior is caused by pd.Series trying to coerce datetime-like values to pd.Timestamps.

df.loc[1] returns pd.Series([pd.Timestamp('2014-01-02'), 'THU', 'THU']),
which unfortunately gets coerced to Timestamps because all three values are datetime-like ('THU' is parsed as a date):

In [154]: pd.Series([pd.Timestamp('2014-01-02'), 'THU', 'THU'])
Out[154]: 
0   2014-01-02
1   2014-12-18
2   2014-12-18
dtype: datetime64[ns]

In contrast, df.loc[2] does not coerce its values to Timestamps, because 'FIZZ' is not datetime-like:

In [155]: pd.Series([pd.Timestamp('2014-01-03'), 'FRI', 'FIZZ'])
Out[155]: 
0    2014-01-03 00:00:00
1                    FRI
2                   FIZZ
dtype: object
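To check whether a row lookup has been coerced on a given pandas installation, one option is to inspect the dtype of the row Series and the Python type of each element. The following is a minimal sketch; the helper name `describe_row` and the three-row sample frame are made up for illustration and are not part of the original question. On a version with the coercion described above, the row dtype would be expected to come out as datetime64[ns]; where no coercion occurs, the string entries should remain plain `str`.

```python
import pandas as pd


def describe_row(frame, label):
    """Return the dtype of the row Series and the type name of each value.

    Hypothetical helper for illustration only.
    """
    row = frame.loc[label]
    return row.dtype, [type(value).__name__ for value in row]


# Small sample frame shaped like the one in the question (assumed data).
df = pd.DataFrame({
    'Date': pd.date_range('2014-01-01', periods=3, freq='D'),
    'Weekday': ['WED', 'THU', 'FRI'],
    'A': ['FIZZ', 'THU', 'FIZZ'],
})

row_dtype, value_types = describe_row(df, 1)
print(row_dtype, value_types)
```

If any entry that should be a string shows up as a Timestamp, the row has been coerced and scalar lookups through that row cannot be trusted.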

Forming the Series df['Weekday'] first, before using .loc, avoids the problem:

In [158]: df['Weekday'].loc[1]
Out[158]: 'THU'

This works because df['Weekday'].dtype remains dtype('O'); no conversion to Timestamps takes place.
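As a small illustration of the column-first idea, the sketch below reads one scalar in two ways that never build a mixed-type row Series: selecting the column and then the label, and using the scalar accessor `df.at`. The sample frame is assumed data, not taken from the question. Note that `.ix`, which the question uses as a workaround, has since been removed from pandas, so it is not an option on current versions.

```python
import pandas as pd

# Assumed sample data shaped like the frame in the question.
df = pd.DataFrame({
    'Date': pd.date_range('2014-01-01', periods=3, freq='D'),
    'Weekday': ['WED', 'THU', 'FRI'],
    'A': ['FIZZ', 'THU', 'FIZZ'],
})

# Select the column first, then the row label: only 'Weekday' values are involved.
via_column = df['Weekday'].loc[1]

# Scalar accessor: looks up a single cell by row label and column name.
via_at = df.at[1, 'Weekday']

print(via_column, via_at)
```

Both lookups touch only the 'Weekday' column, so the 'Date' column cannot influence the result.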

for idx in idx_lst:
    try:
        # print(idx-1, df.ix[idx - 1, 'Weekday'], df.loc[idx - 1, 'Weekday'])
        if idx - previous_idx == 2 and df['Weekday'].loc[idx - 1] != 'SUN':
            df.loc[idx-1, 'A'] = 'BUZZ'
    except KeyError:
        continue

    previous_idx = idx

yields

        Date Weekday     A
0 2014-01-01     WED  FIZZ
1 2014-01-02     THU  BUZZ
2 2014-01-03     FRI  FIZZ
3 2014-01-04     SAT  FIZZ
4 2014-01-05     SUN   SUN
5 2014-01-06     MON  FIZZ
6 2014-01-07     TUE   TUE
7 2014-01-08     WED   WED
8 2014-01-09     THU   THU
9 2014-01-10     FRI  FIZZ
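For comparison, the same rule can be expressed without any per-row lookup. The sketch below is an alternative technique, not the approach from the answer above: it uses boolean masks and `Series.shift` to mark rows that are not 'FIZZ' but whose immediate neighbours both are. It assumes a default integer index in row order and a pandas version where `shift` accepts `fill_value`.

```python
import pandas as pd

dict_weekday = {1: 'MON', 2: 'TUE', 3: 'WED', 4: 'THU',
                5: 'FRI', 6: 'SAT', 7: 'SUN'}

df = pd.DataFrame({'Date': pd.date_range('2014-01-01', '2014-01-10', freq='D')})
df['Weekday'] = df['Date'].apply(lambda x: dict_weekday[x.isoweekday()])
df['A'] = df['Weekday']
df.loc[[0, 2, 3, 5, 9], 'A'] = 'FIZZ'

# True where the row is marked FIZZ.
is_fizz = df['A'].eq('FIZZ')

# A one-row gap: the row itself is not FIZZ, but the rows directly
# before and after it are. Missing neighbours at the edges count as False.
in_gap = (~is_fizz
          & is_fizz.shift(1, fill_value=False)
          & is_fizz.shift(-1, fill_value=False))

# Fill the gap with BUZZ unless the weekday is SUN.
df.loc[in_gap & df['Weekday'].ne('SUN'), 'A'] = 'BUZZ'

print(df)
```

Because every comparison runs on a single column at a time, no mixed-type row Series is ever created, so the coercion issue cannot arise here.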