Pandas: shortest path length between a large number of node pairs

December 1, 2023

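I have a DataFrame containing Orgin_nodes and Destination_nodes columns, as shown below:

I need to compute short_path_length between these nodes with the networkx library by applying the following function: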

import networkx as nx
import numpy as np

def short_path_length(node1, node2):
    return nx.shortest_path_length(G, node1, node2, weight='length')

df['short_path_length'] = np.vectorize(short_path_length)(df['Orgin_nodes'], df['Destination_nodes'])

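where G is a network graph obtained from the osmnx library:
I applied this code to a sample of the DataFrame, with the following result:

When I apply it to the original DataFrame, which has about 3,000,000 rows, it takes much longer.

Is there a way to make it run faster?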

UPDATE1:

Following @gboeing's answer, I converted the networkx graph to igraph as follows (https://github.com/gboeing/osmnx-examples/blob/master/notebooks/18-osmnx-to-igraph.ipynb):

import igraph as ig
import networkx as nx
import osmnx as ox

ox.config(use_cache=True, log_console=True)
weight = 'length'
# relabel nodes as consecutive integers, since igraph indexes vertices from 0
G_nx = nx.relabel.convert_node_labels_to_integers(G)
# convert networkx graph to igraph
G_ig = ig.Graph(directed=True)
G_ig.add_vertices(list(G_nx.nodes()))
G_ig.add_edges(list(G_nx.edges()))
# carry over the original OSM ids and the edge weights
G_ig.vs['osmid'] = list(nx.get_node_attributes(G_nx, 'osmid').values())
G_ig.es[weight] = list(nx.get_edge_attributes(G_nx, weight).values())



def short_path_length(node1,node2):
        return G_ig.shortest_paths(source=node1,target=node2, weights=weight)[0][0]


df['short_path_length'] = df.apply(short_path_length(df['Orgin_nodes'],df['Destination_nodes']), axis=1)

I got this error:

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<timed exec> in <module>()

<timed exec> in short_path_length(node1, node2)

ValueError: vertex IDs must be positive, got: -1

The error occurs because the node IDs in df['Orgin_nodes'] and df['Destination_nodes'] do not match the G_ig vertex names.
How can I fix it?

UPDATE2

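I solved the problem above by building DataFrames that pair each node in G_nx.nodes with its corresponding osmid value, and then replacing Orgin_nodes and Destination_nodes with the matching G_nx.nodes labels, as follows: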

df_indices_osmid_Orgin=pd.DataFrame.from_dict({'Orgin_nodes':list(nx.get_node_attributes(G_nx, 'osmid').values()),'Indecise_Nodes_Orgin':list(G_nx.nodes())})
df=pd.merge(df,df_indices_osmid_Orgin,how='inner',on='Orgin_nodes')
df_indices_osmid_Dest=pd.DataFrame.from_dict({'Destination_nodes':list(nx.get_node_attributes(G_nx, 'osmid').values()),'Indecise_Nodes_Dest':list(G_nx.nodes())})
df=pd.merge(df,df_indices_osmid_Dest,how='inner',on='Destination_nodes')
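The same relabeling could also be sketched with a single lookup dictionary and `Series.map`, which avoids building two helper DataFrames. This is only an illustration on made-up data: the name `osmid_to_index` and the toy values are assumptions, and it presumes each osmid appears only once in the graph. Unlike the inner merges, `map` leaves unmatched rows as NaN instead of dropping them, so they are dropped explicitly here.

```python
import pandas as pd

# Hypothetical stand-in for an {osmid: integer node label} lookup. With the
# real graph it could be built from nx.get_node_attributes(G_nx, 'osmid'),
# which returns {node label: osmid}, by swapping keys and values.
osmid_to_index = {1001: 0, 1002: 1, 1003: 2}

# Made-up rows; 9999 is an osmid that does not exist in the lookup.
df = pd.DataFrame({
    'Orgin_nodes': [1001, 1002, 9999],
    'Destination_nodes': [1003, 1001, 1002],
})

# Look up the integer label for every origin and destination osmid.
df['Indecise_Nodes_Orgin'] = df['Orgin_nodes'].map(osmid_to_index)
df['Indecise_Nodes_Dest'] = df['Destination_nodes'].map(osmid_to_index)

# Unmatched osmids became NaN (which also turned the columns into floats),
# so drop those rows and convert the labels back to integers.
index_cols = ['Indecise_Nodes_Orgin', 'Indecise_Nodes_Dest']
df = df.dropna(subset=index_cols).astype({col: int for col in index_cols})
```

Only the first two rows survive, because the third origin has no entry in the lookup.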

I then applied the following function to a sample of df to measure the shortest distance:

sampl_df=df.head()
def short_path_length(row):
    return G_ig.shortest_paths(source=row['Indecise_Nodes_Orgin'], target=row['Indecise_Nodes_Dest'], weights=weight)[0][0]
sampl_df['short_path_length_1'] = sampl_df.apply(short_path_length, axis=1)

Although it runs without errors, it takes longer than the previous attempts:

sampl_df = df.head()

%%time
def short_path_length(row):
    return G_ig.shortest_paths(source=row['Indecise_Nodes_Orgin'], target=row['Indecise_Nodes_Dest'], weights=weight)[0][0]
sampl_df['short_path_length_1'] = sampl_df.apply(short_path_length, axis=1)

Wall time: 2.89 s

2.88 s ± 66.3 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

%%time
def short_path_length(row):
    return nx.shortest_path_length(G, row['Orgin_nodes'], row['Destination_nodes'], weight='length')
sampl_df['short_path_length_2'] = sampl_df.apply(short_path_length, axis=1)

Wall time: 1.24 s

1.2 s ± 15.7 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

%%time
def short_path_length(node1, node2):
    return nx.shortest_path_length(G, node1, node2, weight='length')

sampl_df['short_path_length_intr3'] = np.vectorize(short_path_length)(sampl_df['Orgin_nodes'], sampl_df['Destination_nodes'])

Wall time: 1.2 s

1.21 s ± 12 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

So the third one appears to be the fastest, unless this is not a valid way to tell which of them runs faster.

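Best answer:

This is an inherently non-vectorizable problem, because you pass in node labels and compute the shortest path between them algorithmically using a graph object. You may get a minor speedup by simplifying your code: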

def short_path_length(row):
    return nx.shortest_path_length(G, row['Orgin_nodes'], row['Destination_nodes'], weight='length')
df['short_path_length'] = df.apply(short_path_length, axis=1)

For more speed, export the OSMnx graph to igraph so that the shortest paths are computed quickly in C, as demonstrated in notebook 18 of the OSMnx examples.
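One further idea, not part of the answer above, is to group the rows by origin so that one single-source Dijkstra run serves every destination sharing that origin, instead of running one search per row. The sketch below uses a tiny hand-built graph in place of the OSMnx graph G, and the helper name `lengths_grouped_by_origin` is hypothetical. It assumes many rows share an origin; if nearly every origin is unique, or the graph is very large compared with the number of destinations per origin, it may well be slower than the row-wise version.

```python
import networkx as nx
import pandas as pd

# Toy directed graph standing in for the OSMnx graph G (an assumption).
G = nx.DiGraph()
G.add_weighted_edges_from(
    [(1, 2, 5.0), (2, 3, 2.0), (1, 3, 10.0), (3, 4, 1.0)],
    weight='length',
)

# Made-up origin/destination pairs; three rows share origin 1.
df = pd.DataFrame({
    'Orgin_nodes': [1, 1, 2, 1],
    'Destination_nodes': [3, 4, 4, 2],
})

def lengths_grouped_by_origin(df, G, weight='length'):
    """Run Dijkstra once per unique origin and look up each destination."""
    result = pd.Series(float('nan'), index=df.index, dtype=float)
    for origin, group in df.groupby('Orgin_nodes'):
        # Distances from this origin to every node reachable from it.
        dist = nx.single_source_dijkstra_path_length(G, origin, weight=weight)
        # Destinations that cannot be reached are absent from dist and stay NaN.
        result.loc[group.index] = group['Destination_nodes'].map(dist)
    return result

df['short_path_length'] = lengths_grouped_by_origin(df, G)
```

Here Dijkstra runs twice (for origins 1 and 2) rather than four times. Whether this pays off on a real road network depends on how often origins repeat in the data, so it is worth timing on a representative sample first.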