Neo4j fast bulk-import performance test

信息安全不简单鸭 2024-04-13 07:02:52

I tested LOAD CSV performance myself. The results obtained with time.perf_counter() are more trustworthy than those from time.clock(); the experimental data are as follows:

Nodes   | Edges   | time.clock() | time.perf_counter()
2878    | 1800    | 5 ms         | 473 ms
26945   | 18086   | 5 ms         | 2445 ms
151191  | 189396  | --           | 23419 ms
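The gap between the two timing columns is expected: time.clock() (deprecated and removed in Python 3.8) typically measured the CPU time of the Python process, and almost all of the LOAD CSV work happens inside the Neo4j server while Python just waits, so that reading stays at a few milliseconds. time.perf_counter() measures wall-clock time and therefore reflects the real import duration. The article does not show its timing harness; the following is only a minimal sketch of how such numbers could be collected, reusing the connection settings and CSV file from the script below and using time.process_time() as the modern stand-in for time.clock():

import time
from py2neo import Graph

graph = Graph("http://localhost:7474", auth=("neo4j", "h3ll0"))

cpu_start = time.process_time()    # CPU time of this Python process
wall_start = time.perf_counter()   # wall-clock time

graph.run("""
    LOAD CSV WITH HEADERS FROM 'file:///requestIPNode_1.csv' AS row
    MERGE (p:RequestIPNode {id_src_ip: toInteger(row.id_src_ip)})
    RETURN count(p);
    """)

print("cpu time :", round((time.process_time() - cpu_start) * 1000), "ms")   # stays tiny: the work runs in Neo4j
print("wall time:", round((time.perf_counter() - wall_start) * 1000), "ms")  # the figure worth reporting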


My summary of these experiments:

Experiment 1 (1k): an estimated 6084 nodes and 3805 edges processed per second
Experiment 2 (10k): an estimated 11020 nodes and 7397 edges per second
Experiment 3 (100k): an estimated 6455 nodes and 8087 edges per second
(These rates are simply the counts from the table divided by the perf_counter() time; a quick check follows below.)
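A two-line check that reproduces the per-second figures from the table above:

# Throughput = counts from the table divided by the perf_counter() time in seconds
for nodes, edges, seconds in [(2878, 1800, 0.473), (26945, 18086, 2.445), (151191, 189396, 23.419)]:
    print(int(nodes / seconds), "nodes/s,", int(edges / seconds), "edges/s")
# prints 6084/3805, 11020/7397 and 6455/8087 -- the figures quoted above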

This time I also learned a uuid trick: it can turn a string into a (practically) unique integer, a bit like a hash function. It was a big help when converting keys into ids.
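The article does not show the exact call, but one way to get such a key-to-id mapping is uuid5, which hashes a string into a deterministic UUID whose .int attribute is an integer. The 63-bit truncation below is my own assumption, added only so the value fits a 64-bit Neo4j integer property:

import uuid

def key_to_id(key: str) -> int:
    # uuid5 is deterministic: the same string always produces the same UUID,
    # so the same src_ip / web_domain always maps to the same integer id.
    # Keeping only the top 63 bits is an assumption, so the id fits a Neo4j integer.
    return uuid.uuid5(uuid.NAMESPACE_DNS, key).int >> 65

print(key_to_id("example.com"))    # stable across runs
print(key_to_id("192.168.1.10"))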

from py2neo import Graph

def login():
    graph = Graph("http://localhost:7474", auth=("neo4j", "h3ll0"))
    # graph.delete_all()
    # result = graph.run("""MATCH (n) DETACH DELETE n;""")

    # Run these constraints only once
    # result = graph.run("CREATE CONSTRAINT UniqueRequestIPNode ON (p:RequestIPNode) ASSERT p.id_src_ip IS UNIQUE;")
    # result = graph.run("CREATE CONSTRAINT UniqueDomainNode ON (o:DomainNode) ASSERT o.id_web_domain IS UNIQUE;")
    # result = graph.run("CREATE CONSTRAINT UniqueResponseNode ON (q:ResponseNode) ASSERT q.id_response_content IS UNIQUE;")

    # Add request-IP nodes
    result = graph.run("""
        LOAD CSV WITH HEADERS FROM 'file:///requestIPNode_1.csv' AS row
        WITH toInteger(row.id_src_ip) AS id_src_ip, row.src_ip AS src_ip, row.a1 AS a1, row.a2 AS a2
        MERGE (p:RequestIPNode {id_src_ip: id_src_ip})
        SET p.a1 = a1, p.a2 = a2, p.src_ip = src_ip
        RETURN count(p);
        """)

    # Add domain nodes
    result = graph.run("""
        LOAD CSV WITH HEADERS FROM 'file:///domainNode_1.csv' AS row
        WITH toInteger(row.id_web_domain) AS id_web_domain, row.web_domain AS web_domain, row.b1 AS b1, row.b2 AS b2
        MERGE (o:DomainNode {id_web_domain: id_web_domain})
        SET o.b1 = b1, o.b2 = b2, o.web_domain = web_domain
        RETURN count(o);
        """)

    # Add response nodes
    result = graph.run("""
        LOAD CSV WITH HEADERS FROM 'file:///responseNode_1.csv' AS row
        WITH toInteger(row.id_response_content) AS id_response_content, row.response_content AS response_content, row.c1 AS c1, row.c2 AS c2
        MERGE (q:ResponseNode {id_response_content: id_response_content})
        SET q.c1 = c1, q.c2 = c2, q.response_content = response_content
        RETURN count(q);
        """)

    # Add request IP -> domain edges
    # result = graph.run(":auto USING PERIODIC COMMIT 500")  # could not be executed this way
    result = graph.run("""
        LOAD CSV WITH HEADERS FROM 'file:///requestIPDomainEdge_1.csv' AS row
        WITH toInteger(row.id_src_ip) AS id_src_ip, toInteger(row.id_web_domain) AS id_web_domain,
             row.src_ip AS src_ip, row.web_domain AS web_domain, row.d1 AS d1, row.d2 AS d2
        MATCH (p:RequestIPNode {id_src_ip: id_src_ip})
        MATCH (o:DomainNode {id_web_domain: id_web_domain})
        MERGE (p)-[rel:RequestIPDomainEdge {src_ip: src_ip, web_domain: web_domain, d1: d1, d2: d2}]->(o)
        RETURN count(rel);
        """)

    # Add domain -> response edges
    result = graph.run("""
        LOAD CSV WITH HEADERS FROM 'file:///domainResponseEdge_1.csv' AS row
        WITH toInteger(row.id_web_domain) AS id_web_domain, toInteger(row.id_response_content) AS id_response_content,
             row.web_domain AS web_domain, row.response_content AS response_content, row.e1 AS e1, row.e2 AS e2
        MATCH (o:DomainNode {id_web_domain: id_web_domain})
        MATCH (q:ResponseNode {id_response_content: id_response_content})
        MERGE (o)-[edg:DomainResponseEdge {web_domain: web_domain, response_content: response_content, e1: e1, e2: e2}]->(q)
        RETURN count(edg);
        """)

    print(result)

if __name__ == "__main__":
    login()
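Not part of the original script, but a small helper like this (same connection settings assumed) is handy for sanity-checking that the node and edge totals in Neo4j match the row counts of the CSV files after each run:

from py2neo import Graph

def check_counts():
    graph = Graph("http://localhost:7474", auth=("neo4j", "h3ll0"))
    # Compare these totals with the row counts of the node / edge CSV files
    print(graph.run("MATCH (n) RETURN count(n) AS nodes").data())
    print(graph.run("MATCH ()-[r]->() RETURN count(r) AS edges").data())

if __name__ == "__main__":
    check_counts()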
