Introduction to Gremlin: loading data and the graph traversal language (create, read, update, delete)

August 28, 2023. Source: 数据结构之图

Gremlin language, TinkerGraph

(I)

1. Loading and saving a TinkerGraph

graph = TinkerGraph.open()

g = graph.traversal()

① Save the graph in GraphSON or GraphML format

graph.io(graphml()).writeGraph('my-graph.graphml')

graph.io(graphson()).writeGraph("my-graph.json") // unwrapped (the default); better suited to large graphs

fos = new FileOutputStream("my-graph.json")

GraphSONWriter.build().wrapAdjacencyList(true).create().writeGraph(fos,graph) // wrapped

fos.close()

② Load a graph stored as GraphSON or GraphML

graph.io(IoCore.graphml()).readGraph('my-graph.graphml')

graph.io(IoCore.graphson()).readGraph('my-graph.json')

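③ Serialize a query result as JSON

First build a mapper; the GraphSON version can be specified

json_mapper = GraphSONMapper.
              build().
              version(GraphSONVersion.V1_0).
              create().
              createMapper()

Run the query

lax = g.V().has('code','LAX').next()

Serialize the result to a JSON string

json_mapper.writeValueAsString(lax)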

④ Load from a Groovy script

:load graph-master/sample-data/load-air-routes-graph.groovy

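2. Variables, functions, and loops

In the Gremlin Console you can define variables, define functions, and use loops.

Variables

austin=g.V().has('code','AUS').next()

g.V(austin).out()

To keep the console from printing the result immediately, append ;[] to the statement

austin=g.V().has('code','AUS').next();[]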

Functions

Return the distance between two airports

def dist(g,from,to) {
  d=g.V().has('code',from).outE().as('a').inV().has('code',to).
      select('a').values('dist').next()
  return d }

dist(g,'AUS','MEX')

Loops

for (a in g.V().hasLabel('airport').limit(10).toList()) {println(a.values('code').next()+" "+a.values('icao').next())}

3. Create: adding vertices, edges, and properties

① Adding vertices and edges: addV(), addE()

xyz = graph.addVertex(label,'airport',
                      'code','XYZ',
                      'icao','KXYZ',
                      'desc','This is not a real airport')

dfw = g.V().has('code','DFW').next()

xyz.addEdge('route',dfw)

Adding elements through the graph object is not recommended; use the traversal source g instead

xyz = g.addV('airport').property('code','XYZ').
        property('icao','KXYZ').
        property('desc','This is not a real airport').next()

g.V().has('code','DFW').addE('route').to(xyz)

② Adding properties: property()

Add a property named places to AUS that holds the codes of every airport AUS has an outgoing edge to

g.V().has('code','AUS').property('places',out().values('code').fold())

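Multi-valued properties (list cardinality)

g.addV().property(list,'code','AUS').property(list,'code','KAUS')

The code property now holds both values. The list cardinality is given explicitly because a graph's default cardinality may be single, as it is for TinkerGraph, in which case the second call would overwrite the first value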

Adding a property with set cardinality

g.V(3).property(set,'hw',"hello").property(set,'hw','world')

Adding a property to a property (a meta-property)

g.V().has('code','AUS').properties('code').property('date','6/6/2017')

③ If the graph database supports it, you can supply your own ID when adding a vertex

g.inject(99997L,99998L).addV().property(id,identity())

g.addV().property(id,99999L)

④ Vertices can also be added in a loop
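The loop approach can be sketched as follows. This is a minimal example against an empty in-memory TinkerGraph, assuming TinkerPop 3.4 or later; the airport codes are hypothetical placeholders, and the import line is only needed when running it as a standalone Groovy script rather than in the Gremlin Console.

```groovy
import org.apache.tinkerpop.gremlin.tinkergraph.structure.TinkerGraph

graph = TinkerGraph.open()
g = graph.traversal()

// Hypothetical airport codes, used only for illustration.
codes = ['XYA', 'XYB', 'XYC']

// One addV() traversal per code; iterate() executes it and discards the result.
for (code in codes) {
    g.addV('airport').property('code', code).iterate()
}

airportCount = g.V().hasLabel('airport').count().next()
println airportCount
```

Calling iterate() matters here: a traversal that is never iterated does nothing, and inside a loop the console does not iterate it automatically.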

⑤ Use coalesce() to add a vertex only if it does not already exist
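One common way to write this "get or create" pattern is fold().coalesce(unfold(), addV(...)). The sketch below assumes TinkerPop 3.4 or later on TinkerGraph; the helper name getOrCreate and the code XYZ are illustrative choices, not part of any API.

```groovy
import org.apache.tinkerpop.gremlin.tinkergraph.structure.TinkerGraph
import static org.apache.tinkerpop.gremlin.process.traversal.dsl.graph.__.*

graph = TinkerGraph.open()
g = graph.traversal()

// Hypothetical helper: return the airport with this code, creating it if absent.
def getOrCreate = { code ->
    g.V().has('airport', 'code', code).
      fold().                                       // always yields one list, possibly empty
      coalesce(unfold(),                            // found: hand back the existing vertex
               addV('airport').property('code', code)). // not found: create it
      next()
}

first  = getOrCreate('XYZ')   // creates the vertex
second = getOrCreate('XYZ')   // finds the same vertex again
total  = g.V().has('airport', 'code', 'XYZ').count().next()
```

The fold() step is what makes this work: without it, a has() that matches nothing would end the traversal before coalesce() ever ran.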

4. Delete: drop()

① Vertices

g.V().has('code','XYZ').drop()

② Edges

g.V().has('code','LHR').outE().as('e').inV().has('code','AUS').select('e').drop()

③ Properties

g.V().has('code','SFO').properties('desc').drop()

Properties have IDs too, just like vertices and edges. Updating a property changes its ID, because the update in effect creates a new property

④ Delete everything

g.E().drop()

g.V().drop()

5. Subgraphs

Create a subgraph; the result is held in memory as a graph of its own

gremlin> subg=g.V(1..46).outE().
......1>      filter(inV().hasId(within(1L..46L))).
......2>      subgraph('a').cap('a').next()
==>tinkergraph[vertices:46 edges:1326]

sgt = subg.traversal()

Starting from AUS, collect everything within two hops

subg = g.V().has('code','AUS').
         repeat(bothE().subgraph('subGraph').outV()).times(2).
         cap('subGraph').next()

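6. Analyzing query performance

The following utilities measure how long a query takes and where the time goes

① clock and clockWithResult

clock(a) {...}: a is the number of runs, and the braces hold the statement to execute. The value returned is the average time per run in milliseconds

gremlin> clock(1) {g.V().has('airport','country','CN').next()}
==>4.6895679999999995

clock returns only the elapsed time; clockWithResult returns both the time and the result

gremlin> clockWithResult(1) {g.V().has('country','CN').count().next()}
==>60.400453
==>209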

② profile(): where the time is spent

g.V().has('region','US-TX').out().has('region','US-CA').
      out().has('country','DE').profile()

Simply append profile() to the end of the traversal

③ Indexes (TinkerGraph)

Creating an index on code can reduce query time

graph.createIndex('code',Vertex)

List the indexed keys

gremlin> graph.getIndexedKeys(Vertex)
==>code

Drop the index

graph.dropIndex('code',Vertex)

(II) Queries

1. toString(), ;[] and getClass()

toString() shows how many vertices and edges the graph currently holds

gremlin> graph.toString()
==>tinkergraph[vertices:3619 edges:50148]

Appending ;[] to a command keeps long results from flooding the screen

a=g.V().has('code','AUS').out().toList();[]

Appending getClass() reveals the Java class behind an object

gremlin> g.V().hasLabel('airport').has('code','DFW').getClass()
==>class org.apache.tinkerpop.gremlin.groovy.loaders.SugarLoader$GraphTraversalCategory

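2. Starting a traversal

① g.V() and g.E() select the vertices or edges the traversal starts from

out *	outgoing adjacent vertices
in *	incoming adjacent vertices
both *	adjacent vertices in either direction
outE *	outgoing incident edges
inE *	incoming incident edges
bothE *	incident edges in either direction
outV	the vertex an edge leaves (applies to edges)
inV	the vertex an edge enters (applies to edges)
otherV	the vertex at the far end of an edge, that is, the one the traverser did not come from

Steps marked with * optionally take one or more edge labels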

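② path() returns the vertices and edges the traverser has visited

path, simplePath, cyclicPath

path() reports every path, simplePath() keeps only paths with no repeated element, and cyclicPath() keeps only paths that revisit an element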
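The difference between the two path filters shows up on a graph with a cycle. The sketch below builds a hypothetical three-airport graph (A -> B, B -> A, B -> C) on TinkerGraph, assuming TinkerPop 3.4 or later, and walks two hops out from A.

```groovy
import org.apache.tinkerpop.gremlin.tinkergraph.structure.TinkerGraph

graph = TinkerGraph.open()
g = graph.traversal()

// Hypothetical data: A -> B, B -> A, B -> C.
a = g.addV('airport').property('code', 'A').next()
b = g.addV('airport').property('code', 'B').next()
c = g.addV('airport').property('code', 'C').next()
g.V(a).addE('route').to(b).iterate()
g.V(b).addE('route').to(a).iterate()
g.V(b).addE('route').to(c).iterate()

// Two hops from A reach C (via B) and A itself (via B).
// simplePath() keeps only the walk that never revisits a vertex.
simple = g.V(a).out().out().simplePath().path().by('code').toList().collect { it.objects() }

// cyclicPath() keeps only the walk that returns to a vertex already seen.
cyclic = g.V(a).out().out().cyclicPath().path().by('code').toList().collect { it.objects() }

println simple   // [[A, B, C]]
println cyclic   // [[A, B, A]]
```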

3. Filtering

① has(), hasLabel(), hasNot(), not(), hasId()

has() can take three arguments: the first is the label, and the other two are the property key and value

② where()
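As an illustration of where(), the sketch below filters vertices by the result of a nested traversal rather than by a property value. It assumes TinkerPop 3.4 or later on TinkerGraph; the three airports and their routes are made-up sample data.

```groovy
import org.apache.tinkerpop.gremlin.tinkergraph.structure.TinkerGraph
import static org.apache.tinkerpop.gremlin.process.traversal.dsl.graph.__.*
import static org.apache.tinkerpop.gremlin.process.traversal.P.*

graph = TinkerGraph.open()
g = graph.traversal()

// Hypothetical data: AUS -> DFW, AUS -> SFO, DFW -> SFO.
aus = g.addV('airport').property('code', 'AUS').next()
dfw = g.addV('airport').property('code', 'DFW').next()
sfo = g.addV('airport').property('code', 'SFO').next()
g.V(aus).addE('route').to(dfw).iterate()
g.V(aus).addE('route').to(sfo).iterate()
g.V(dfw).addE('route').to(sfo).iterate()

// Keep only airports with more than one outgoing route.
// The traversal inside where() is a test; the airport vertex itself is what passes through.
busy = g.V().hasLabel('airport').
         where(out('route').count().is(gt(1))).
         values('code').toList()

println busy   // [AUS]
```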

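4. Limiting the number of results

① limit: return the first 20 results

g.V().hasLabel('airport').values('code').limit(20)

② tail: return the last 20 results

g.V().hasLabel('airport').values('code').tail(20)

③ range(a,b) returns the results at zero-based positions [a,b), including a and excluding b; range(local,a,b) applies the same bounds inside each incoming collection

g.V().hasLabel('airport').range(0,20).values('code')

Return everything from position 5 onward

g.V().has('region','US-TX').range(5,-1).fold()

④ skip(a) is equivalent to range(a,-1)

It skips the first a results and returns the rest

⑤ dedup removes duplicate results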
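The bounds of these steps are easy to check on literal values. The sketch below uses inject() to feed numbers into the stream, so it needs no graph data; it assumes TinkerPop 3.4 or later (skip() is not available in the oldest 3.x releases).

```groovy
import org.apache.tinkerpop.gremlin.tinkergraph.structure.TinkerGraph

graph = TinkerGraph.open()
g = graph.traversal()

firstThree = g.inject(0, 1, 2, 3, 4, 5).limit(3).toList()     // [0, 1, 2]
lastTwo    = g.inject(0, 1, 2, 3, 4, 5).tail(2).toList()      // [4, 5]
middle     = g.inject(0, 1, 2, 3, 4, 5).range(1, 4).toList()  // [1, 2, 3]: position 1 included, 4 excluded
fromFour   = g.inject(0, 1, 2, 3, 4, 5).range(4, -1).toList() // [4, 5]: -1 means "no upper bound"
skipped    = g.inject(0, 1, 2, 3, 4, 5).skip(4).toList()      // [4, 5]: same as range(4, -1)
unique     = g.inject(1, 1, 2, 3, 3).dedup().toList()         // [1, 2, 3]
```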

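5. Predicates and logical operators

① Comparison predicates

eq	equal to
neq	not equal to
gt	greater than
gte	greater than or equal to
lt	less than
lte	less than or equal to
inside	strictly between a and b, both bounds excluded: (a,b)
outside	less than a or greater than b
between	from a up to but not including b: [a,b)
within	equal to at least one of the listed values
without	equal to none of the listed values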
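The boundary behaviour of the range and membership predicates can be checked on literal values. A small sketch, assuming TinkerPop 3.4 or later; the numbers are arbitrary.

```groovy
import org.apache.tinkerpop.gremlin.tinkergraph.structure.TinkerGraph
import static org.apache.tinkerpop.gremlin.process.traversal.P.*

graph = TinkerGraph.open()
g = graph.traversal()

// The same five values are tested against each predicate with bounds 1 and 10.
insideVals  = g.inject(0, 1, 5, 10, 11).is(inside(1, 10)).toList()   // [5]: both bounds excluded
outsideVals = g.inject(0, 1, 5, 10, 11).is(outside(1, 10)).toList()  // [0, 11]
betweenVals = g.inject(0, 1, 5, 10, 11).is(between(1, 10)).toList()  // [1, 5]: lower bound included
withinVals  = g.inject(0, 1, 5, 10, 11).is(within(1, 10)).toList()   // [1, 10]: membership, not a range
withoutVals = g.inject(0, 1, 5, 10, 11).is(without(1, 10)).toList()  // [0, 5, 11]
```

Note that within and without treat their arguments as a list of values to match, not as the ends of an interval.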

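② and, or, not: two ways of writing them, either as a step that takes the conditions as arguments or as an infix step between two conditions

g.V().hasLabel('airport').and(has('region','US-TX'),has('longest',gte(12000))).values('code')

g.V().has('code','AUS').and().has('icao','KAUS')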

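6. Basic statistics

① Aggregates

count	number of items
sum	total
max	maximum
min	minimum
mean	average

② Randomness and ordering: coin, sample, Math.random, order

coin(p): p is a probability; each item is kept with probability p

sample(n): n is a count; n items are drawn at random

Math.random()

order(): sorts the results; used together with by(). Older TinkerPop releases spell the descending order decr; it was later deprecated in favor of desc

g.V().hasLabel('airport').order().by('longest',desc).valueMap().
      select('code','longest').limit(10)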

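7. Conditionals

① coalesce: take the first traversal that yields a result

coalesce() evaluates the traversals it is given in order and returns the result of the first one that produces anything. It accepts more than two traversals, but only the first one that yields a result is used.

Combined with constant(), it can supply a default value when nothing is found.

gremlin> g.V(1).coalesce(has('region','US-TX').values('desc'),constant("Not in Texas"))
==>Not in Texas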

② choose: if-then-else, written choose(condition, if-true, if-false)

g.V().has('region','US-TX').choose(values('longest').is(gt(12000)),
                                   values('code'),
                                   values('desc'))

Combined with option(), it behaves like a switch-case

g.V().hasLabel('airport').choose(values('code')).
                          option('DFW',values('desc')).
                          option('AUS',values('region')).
                          option('LAX',values('runways'))

③ optional: if the inner traversal yields results, they are returned; otherwise the incoming element is passed through unchanged
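A short sketch of optional() on hypothetical data: airport A has one outgoing route, airport C has none. It assumes TinkerPop 3.4 or later on TinkerGraph.

```groovy
import org.apache.tinkerpop.gremlin.tinkergraph.structure.TinkerGraph
import static org.apache.tinkerpop.gremlin.process.traversal.dsl.graph.__.*

graph = TinkerGraph.open()
g = graph.traversal()

// Hypothetical data: A -> B, and C with no routes at all.
a = g.addV('airport').property('code', 'A').next()
b = g.addV('airport').property('code', 'B').next()
c = g.addV('airport').property('code', 'C').next()
g.V(a).addE('route').to(b).iterate()

// A has a destination, so optional() moves on to it.
fromA = g.V().has('code', 'A').optional(out('route')).values('code').toList()

// C has none, so optional() lets C itself through instead of ending the traversal.
fromC = g.V().has('code', 'C').optional(out('route')).values('code').toList()

println fromA   // [B]
println fromC   // [C]
```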

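8. Grouping: group(), groupCount()

Both are used together with by(): the first by() sets the grouping key, and for group() a second by() sets what is collected for each key

g.V().hasLabel('airport').groupCount().by('country')

g.V().hasLabel('continent').group().by('code').by(out().count())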

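9. local: run the traversal in parentheses once for each incoming item

Without local(), count() produces one total for the whole stream, so mean() has only that single number to average

gremlin> g.V().hasLabel('airport').out('route').count().mean()
==>43400.0

With local(), count() runs per airport, and mean() averages the per-airport counts

gremlin> g.V().hasLabel('airport').local(out('route').count()).mean()
==>12.863070539419088

gremlin> g.V().hasLabel('airport').out('route').count()
==>43400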

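10. Numbering results

withIndex(a) and indexed(a) pair each result with an index, where the optional a is the starting number. Both are Groovy iterator methods rather than Gremlin steps: withIndex puts the index second, indexed puts it first

gremlin> g.V().has('region','US-OK').values('code').withIndex()
==>[OKC,0]
==>[TUL,1]
==>[LAW,2]
==>[SWO,3]

gremlin> g.V().has('region','US-OK').values('code').indexed(1)
==>[1,OKC]
==>[2,TUL]
==>[3,LAW]
==>[4,SWO]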

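11. sack: a value each traverser carries with it for the duration of the traversal

The sack must be initialized before it is used

g.V().has('code','SAF').out().values('runways').fold()
==>[7,4,3,6]

Initialize the sack to 1, then add each runways value to it

g.withSack(1).V().has('code','SAF').out().values('runways').sack(sum).sack().fold()
==>[8,5,4,7]

sack can also impose a floor: here any dist below 400 is reported as 400, because the sack starts at 400 and keeps the larger of itself and dist. by() names the value the sack operation is applied to

g.V().sack(assign).by(constant(400)).has('code','SAF').
      outE().sack(max).by('dist').sack().fold()
==>[549,708,400,400]

sack can be combined with operators such as min, max, and sum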
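The two sack operations above can be reproduced without the air-routes data by injecting literal numbers. This is a sketch assuming TinkerPop 3.4 or later; the runway counts mirror the example output above, while the distances 120 and 370 are made-up values chosen to fall below the floor.

```groovy
import org.apache.tinkerpop.gremlin.tinkergraph.structure.TinkerGraph
import static org.apache.tinkerpop.gremlin.process.traversal.Operator.*

graph = TinkerGraph.open()
g = graph.traversal()

// Every traverser starts with a sack of 1; sack(sum) adds the current value to it.
plusOne = g.withSack(1).inject(7, 4, 3, 6).sack(sum).sack().toList()

// Every traverser starts with a sack of 400; sack(max) keeps the larger of sack and value,
// so anything under 400 comes out as 400.
floored = g.withSack(400).inject(549, 708, 120, 370).sack(max).sack().toList()

println plusOne   // [8, 5, 4, 7]
println floored   // [549, 708, 400, 400]
```

With no by() modulator, the sack operator is applied to the traverser's current value, which is why the injected numbers can be used directly.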

12. Scalars

① constant() emits a fixed value

② is() tests a scalar value
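The two steps are often used together, with is() as the test and constant() as the value to return. A sketch on two hypothetical airports, assuming TinkerPop 3.4 or later; the runway counts and the labels 'large' and 'small' are illustrative.

```groovy
import org.apache.tinkerpop.gremlin.tinkergraph.structure.TinkerGraph
import static org.apache.tinkerpop.gremlin.process.traversal.dsl.graph.__.*
import static org.apache.tinkerpop.gremlin.process.traversal.P.*

graph = TinkerGraph.open()
g = graph.traversal()

// Hypothetical data.
g.addV('airport').property('code', 'AUS').property('runways', 2).iterate()
g.addV('airport').property('code', 'DFW').property('runways', 7).iterate()

// is() as a filter on a scalar: keep only runway counts of 4 or more.
big = g.V().hasLabel('airport').values('runways').is(gte(4)).toList()

// is() as a condition and constant() as the result, ordered by code for a stable output.
sizes = g.V().hasLabel('airport').order().by('code').
          choose(values('runways').is(gte(4)),
                 constant('large'),
                 constant('small')).toList()

println big     // [7]
println sizes   // [small, large]
```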

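13. aggregate: temporary storage; store can also be used for this

aggregate() collects everything that reaches it before the traversal continues, whereas store() fills its collection lazily as traversers pass through

g.V().has('code','AUS').out().aggregate('nonstop').
      out().where(without('nonstop')).dedup().count()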

14. inject: insert values into the traversal stream

g.V().has('code','AUS').values().inject('ABIA')

15. union: merge the results of several traversals
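A minimal sketch of union(), assuming TinkerPop 3.4 or later on TinkerGraph: starting from a hypothetical airport A with routes to B and C, it merges A's own code with the codes of its destinations into one stream.

```groovy
import org.apache.tinkerpop.gremlin.tinkergraph.structure.TinkerGraph
import static org.apache.tinkerpop.gremlin.process.traversal.dsl.graph.__.*

graph = TinkerGraph.open()
g = graph.traversal()

// Hypothetical data: A -> B, A -> C.
a = g.addV('airport').property('code', 'A').next()
b = g.addV('airport').property('code', 'B').next()
c = g.addV('airport').property('code', 'C').next()
g.V(a).addE('route').to(b).iterate()
g.V(a).addE('route').to(c).iterate()

// Each branch runs from the same starting vertex; the results are concatenated.
// The list is sorted afterwards because edge order is not guaranteed.
codes = g.V().has('code', 'A').
          union(values('code'),
                out('route').values('code')).
          toList().sort()

println codes   // [A, B, C]
```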

16. sideEffect: run a traversal for its side effects without changing what flows through the main traversal

g.V(3).sideEffect(out().count().store('a')).
       out().out().count().as('b').select('a','b')

17. repeat

① repeat...until and until...repeat correspond to do...while and while...do

g.V().has('code','AUS').
      repeat(out().simplePath()).
      until(has('code','AGR')).
      path().by('code').limit(10)

② Multi-hop queries

repeat().times(n) returns only the paths that are exactly n hops long; adding emit() also returns the shorter paths

g.V(3).repeat(out()).times(3).has('code','MIA').
       limit(5).path().by('code')

g.V().has('code','AUS').
      repeat(out('route')).until(cyclicPath()).
      limit(10).path().by('code')
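The effect of emit() is easiest to see on a simple chain. The sketch below builds a hypothetical route chain A -> B -> C -> D on TinkerGraph (TinkerPop 3.4 or later assumed) and compares times(3) with and without emit().

```groovy
import org.apache.tinkerpop.gremlin.tinkergraph.structure.TinkerGraph
import static org.apache.tinkerpop.gremlin.process.traversal.dsl.graph.__.*

graph = TinkerGraph.open()
g = graph.traversal()

// Hypothetical data: a chain A -> B -> C -> D.
prev = null
for (code in ['A', 'B', 'C', 'D']) {
    v = g.addV('airport').property('code', code).next()
    if (prev != null) {
        g.V(prev).addE('route').to(v).iterate()
    }
    prev = v
}

// Without emit(): only what is reached after exactly three hops.
exactlyThree = g.V().has('code', 'A').
                 repeat(out('route')).times(3).
                 values('code').toList()

// With emit(): everything reached after one, two, or three hops.
upToThree = g.V().has('code', 'A').
              repeat(out('route')).emit().times(3).
              values('code').toList().sort()

println exactlyThree   // [D]
println upToThree      // [B, C, D]
```

Placing emit() before repeat() instead would also emit the starting vertex A.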

18. math

Arithmetic operators are available, but they must be written inside math()

g.V().limit(1).math('100/2')
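Inside the expression, the underscore stands for the current value in the stream, which is usually more useful than a constant expression. A small sketch, assuming a TinkerPop release that includes the math() step (3.3.1 or later); the input numbers are arbitrary.

```groovy
import org.apache.tinkerpop.gremlin.tinkergraph.structure.TinkerGraph

graph = TinkerGraph.open()
g = graph.traversal()

// '_' refers to the incoming value; math() always produces a Double.
half = g.inject(100).math('_ / 2').next()

// The expression is evaluated once per incoming value.
scaled = g.inject(1, 2, 3).math('_ * 2 + 1').toList()

println half     // 50.0
println scaled   // [3.0, 5.0, 7.0]
```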

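19. Text-matching predicates

textContains	true if a word in the text matches the string; case-insensitive
textContainsPrefix	true if a word in the text starts with the string; case-sensitive
textPrefix	true only if the text as a whole starts with the string; case-sensitive
textContainsRegex	true if a word in the text matches the regular expression
textContainsFuzzy	true if a word in the text approximately matches the string
textRegex	true if the text as a whole matches the regular expression
textFuzzy	true if the text as a whole approximately matches the string

These predicates are provided by JanusGraph rather than by core TinkerPop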

20. Geospatial API

Supports location-based queries, such as finding elements that lie within a given area or intersect a given shape

    Original author: 数据结构之图
    Original URL: https://blog.csdn.net/Lindsey_Tai/article/details/81628883