polish
stefano-ottolenghi committed Jun 4, 2024
1 parent 13909dd commit 3eb7515
Showing 1 changed file with 59 additions and 77 deletions.
136 changes: 59 additions & 77 deletions python-manual/modules/ROOT/pages/performance.adoc
@@ -71,22 +71,22 @@ To lazy-load a result, you have to use xref:transactions.adoc#managed-transactio

.Comparison between eager and lazy loading
====
Consider a query that results in 4500 result records and that the driver's link:https://neo4j.com/docs/api/python-driver/current/api.html#fetch-size-ref[batch size] is set to 1000 (default).
Consider a query that returns 4500 records, with the driver's link:https://neo4j.com/docs/api/python-driver/current/api.html#fetch-size-ref[batch size] set to 1000 (the default).
[cols="1a,1a", options="header"]
|===
|Eager loading
|Lazy loading
|
- The server has to read all 4500 records from the storage before it can send even the first one to the driver (i.e. it takes more time for the client to receive the first record).
- Before the driver can pass any record to the application, it has to receive all 4500 records.
- Before any record is available to the application, the driver has to receive all 4500 records.
- The client has to hold all 4500 records in memory.
|
- The server reads the first 1000 records and sends them to the driver.
- The application can process records as soon as the first batch of 1000 is transferred.
- When the first batch has been processed, the server reads another batch and delivers it to the driver.
- Waiting and resource consumption (both client- and server-side) for the remaining 2500 records is deferred to when the application will request more records, which are delivered in 3 more batches.
- Waiting time and resource consumption (both client- and server-side) for the remaining 2500 records is deferred to when the application requests more records, which are delivered in 3 more batches.
- Resource consumption is bounded by at most 1000 records.
|===
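
As a quick orientation before the full benchmark that follows, here is a minimal sketch of the two consumption patterns. It is illustrative only: the `UNWIND` query, the `neo4j` database name, and the per-record `print` stand in for real workloads, and the connection placeholders mirror the `URI`/`AUTH` placeholders used in the example code.

[source, python]
----
import neo4j

URI = "<URI for Neo4j database>"
AUTH = ("<Username>", "<Password>")

def lazy_consume(tx):
    # The result is an iterator; records arrive from the server in
    # batches of at most `fetch_size` as the loop consumes them.
    result = tx.run("UNWIND range(1, 4500) AS n RETURN n")
    for record in result:
        print(record["n"])  # placeholder for per-record processing

with neo4j.GraphDatabase.driver(URI, auth=AUTH) as driver:
    # Eager: the call returns only after ALL records have been
    # retrieved and buffered client-side.
    records, summary, keys = driver.execute_query(
        "UNWIND range(1, 4500) AS n RETURN n", database_="neo4j",
    )

    # Lazy: records are streamed on demand inside a managed transaction.
    with driver.session(database="neo4j", fetch_size=1000) as session:
        session.execute_read(lazy_consume)
----

The eager call is the simplest option and is fine for small result sets; the lazy pattern pays off when the result set is large or each record takes time to process, which is what the benchmark below measures.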
@@ -97,39 +97,44 @@ Consider a query that results in 4500 result records and that the driver's link:
import neo4j
from time import sleep, time
import tracemalloc
import sys
from neo4j.debug import watch
#watch("neo4j", out=sys.stdout)
URI = "<URI for Neo4j database>"
AUTH = ("<Username>", "<Password>")
# Returns 20 records, each with properties
# - `output` (an expensive computation, to slow down retrieval)
# - `dummyData` (a list of 10000 ints, about 8 KB).
slow_query = '''
UNWIND range(1, 4500) AS s
UNWIND range(1, 20) AS s
RETURN reduce(s=s, x in range(1,1000000) | s + sin(toFloat(x))+cos(toFloat(x))) AS output,
range(1, 10000) AS dummyData
'''
sleep_time = 0
# Delay for each processed record, proxy for some expensive processing.
sleep_time = 0.5
def main():
driver = neo4j.GraphDatabase.driver('neo4j://localhost', auth=('neo4j', 'verysecret'))
driver.verify_connectivity()
start_time = time()
log('BATCHING (execute_read)')
tracemalloc.start()
batching(driver)
log(f'Peak memory usage: {tracemalloc.get_traced_memory()[1]} bytes')
tracemalloc.stop()
log("--- %s seconds ---" % (time() - start_time))
start_time = time()
log('NO BATCHING (execute_query)')
tracemalloc.start()
nobatching(driver)
log(f'Peak memory usage: {tracemalloc.get_traced_memory()[1]} bytes')
tracemalloc.stop()
log("--- %s seconds ---" % (time() - start_time))
with neo4j.GraphDatabase.driver(URI, auth=AUTH) as driver:
driver.verify_connectivity()
start_time = time()
log('LAZY LOADING (execute_read)')
tracemalloc.start()
batching(driver)
log(f'Peak memory usage: {tracemalloc.get_traced_memory()[1]} bytes')
tracemalloc.stop()
log("--- %s seconds ---" % (time() - start_time))
start_time = time()
log('EAGER LOADING (execute_query)')
tracemalloc.start()
nobatching(driver)
log(f'Peak memory usage: {tracemalloc.get_traced_memory()[1]} bytes')
tracemalloc.stop()
log("--- %s seconds ---" % (time() - start_time))
def batching(driver):
@@ -142,7 +147,7 @@ def batching(driver):
log(f'Processing record {record.get("output")}')
sleep(sleep_time) # proxy for some expensive operation
with driver.session(database='neo4j', fetch_size=1000) as session:
with driver.session(database='neo4j') as session:
processed_result = session.execute_read(process_records)
@@ -161,62 +166,39 @@ def log(msg):
if __name__ == '__main__':
main()
----
.Output
[source, output, role=nocollapse]
----
[1717057433.14] LAZY LOADING (execute_read)
[1717057433.14] Submit query
[1717057433.24] Processing record 0.5309371354666308 // <1>
[1717057433.74] Processing record 1.5309371354662915
[1717057434.25] Processing record 2.5309371354663197
...
[1717057442.88] Processing record 19.530937135463947
[1717057443.38] Peak memory usage: 768642 bytes
[1717057443.38] --- 10.248241662979126 seconds ---
'''
[1714382024.91] BATCHING (execute_read)
[1714382024.91] Submit query
[1714382026.81] Processing record 0.5309371354666308
[1714382026.81] Processing record 1.5309371354662915
[1714382026.81] Processing record 2.5309371354663197
[1714382026.81] Processing record 3.530937135466613
[1714382026.81] Processing record 4.530937135466787
[1714382026.81] Processing record 5.5309371354666315
[1714382026.81] Processing record 6.530937135466685
[1714382026.81] Processing record 7.530937135466191
[1714382026.81] Processing record 8.53093713546581
[1714382026.81] Processing record 9.530937135465868
[1714382026.81] Processing record 10.530937135465868
[1714382026.81] Processing record 11.530937135465868
[1714382026.81] Processing record 12.530937135465868
[1714382026.81] Processing record 13.530937135465557
[1714382026.81] Processing record 14.53093713546418
[1714382026.81] Processing record 15.530937135463606
[1714382026.81] Processing record 16.53093713546345
[1714382026.81] Processing record 17.530937135463947
[1714382026.81] Processing record 18.530937135463947
[1714382026.81] Processing record 19.530937135463947
[1714382026.82] Peak memory usage: 37357 bytes
[1714382026.82] --- 1.9052681922912598 seconds ---
[1714382026.82] NO BATCHING (execute_query)
[1714382026.82] Submit query
[1714382028.72] Processing record 0.5309371354666308
[1714382028.72] Processing record 1.5309371354662915
[1714382028.72] Processing record 2.5309371354663197
[1714382028.72] Processing record 3.530937135466613
[1714382028.72] Processing record 4.530937135466787
[1714382028.72] Processing record 5.5309371354666315
[1714382028.72] Processing record 6.530937135466685
[1714382028.72] Processing record 7.530937135466191
[1714382028.72] Processing record 8.53093713546581
[1714382028.72] Processing record 9.530937135465868
[1714382028.72] Processing record 10.530937135465868
[1714382028.72] Processing record 11.530937135465868
[1714382028.72] Processing record 12.530937135465868
[1714382028.72] Processing record 13.530937135465557
[1714382028.72] Processing record 14.53093713546418
[1714382028.72] Processing record 15.530937135463606
[1714382028.72] Processing record 16.53093713546345
[1714382028.72] Processing record 17.530937135463947
[1714382028.72] Processing record 18.530937135463947
[1714382028.72] Processing record 19.530937135463947
[1714382028.72] Peak memory usage: 22269 bytes
[1714382028.72] --- 1.904068946838379 seconds ---
'''
[1717057443.38] EAGER LOADING (execute_query)
[1717057443.38] Submit query
[1717057445.31] Processing record 0.5309371354666308 // <2>
[1717057445.81] Processing record 1.5309371354662915
[1717057446.31] Processing record 2.5309371354663197
...
[1717057454.82] Processing record 19.530937135463947
[1717057455.34] Peak memory usage: 7081123 bytes // <3>
[1717057455.34] --- 11.960006713867188 seconds ---
----
<1> In lazy loading, the first record is processed with a negligible delay after the query is submitted: the driver only has to wait until the server starts producing records, not until the whole result set has been retrieved.
<2> In eager loading, no record is available to the application until the server has retrieved the whole result set and the driver has received it, so the first record is processed only about two seconds after the query is submitted.
<3> Peak memory usage is about an order of magnitude higher with eager loading, as the driver holds the entire result set in memory at once.
====


== Route read queries to cluster readers

In a cluster, *route read queries to link:{neo4j-docs-base-uri}/operations-manual/current/clustering/introduction/#clustering-secondary-mode[secondary nodes]*. You do this by:
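The concrete options are collapsed in this diff view. As a rough illustration only (assuming a 5.x Python driver; the `Person` query, database name, and connection placeholders are made up for the sketch), read routing is typically requested like this:

[source, python]
----
import neo4j

URI = "<URI for Neo4j database>"
AUTH = ("<Username>", "<Password>")

with neo4j.GraphDatabase.driver(URI, auth=AUTH) as driver:
    # With execute_query, request read routing explicitly.
    records, _, _ = driver.execute_query(
        "MATCH (p:Person) RETURN p.name AS name",
        routing_=neo4j.RoutingControl.READ,
        database_="neo4j",
    )

    # With sessions, run the work through execute_read so the
    # transaction is routed to a reader.
    with driver.session(database="neo4j") as session:
        names = session.execute_read(
            lambda tx: [r["name"] for r in tx.run(
                "MATCH (p:Person) RETURN p.name AS name")]
        )
----

Both calls declare the work as read-only, so in a cluster the driver can route it to a secondary.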
