bulkwriter sample read csv #1673
Conversation
Signed-off-by: yhmo <[email protected]>
fields = [
    FieldSchema(name="id", dtype=DataType.INT64, is_primary=True, auto_id=True),
    FieldSchema(name="path", dtype=DataType.VARCHAR, max_length=512),
Can this field be a primary key?
with LocalBulkWriter(
    schema=schema,
    local_path="/tmp/bulk_writer",
    segment_size=4*1024*1024,
Any reason to hardcode 4 MB here?
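One way to make the hardcoded value self-documenting (a sketch; `MB` and `DEFAULT_SEGMENT_SIZE` are hypothetical names, not part of the sample):

```python
# Hypothetical refactor: name the unit so the 4 MB segment size reads as
# a deliberate choice instead of a bare 4*1024*1024 literal.
MB = 1024 * 1024
DEFAULT_SEGMENT_SIZE = 4 * MB  # kept small for the sample; tune for real workloads

print(DEFAULT_SEGMENT_SIZE)  # 4194304
```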
    local_path="/tmp/bulk_writer",
    segment_size=4*1024*1024,
) as local_writer:
    read_sample_data("./data/train_embeddings.csv", local_writer)
Is it possible that the CSV is too big (e.g. 100 GB) to be loaded into memory for processing?
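A CSV far larger than memory can be streamed in fixed-size batches instead of loaded whole. A minimal stdlib sketch, independent of the sample's `read_sample_data` helper (the function and batch size here are illustrative assumptions):

```python
import csv
import os
import tempfile

def iter_csv_batches(path, batch_size=1000):
    """Yield lists of up to batch_size rows without reading the whole file."""
    with open(path, newline="") as f:
        reader = csv.DictReader(f)
        batch = []
        for row in reader:
            batch.append(row)
            if len(batch) == batch_size:
                yield batch
                batch = []
        if batch:
            yield batch  # final partial batch

# Demo on a small temporary file; a 100 GB file streams the same way.
tmp = tempfile.NamedTemporaryFile("w", suffix=".csv", delete=False, newline="")
tmp.write("id,path\n" + "".join(f"{i},img{i}.jpg\n" for i in range(25)))
tmp.close()
sizes = [len(b) for b in iter_csv_batches(tmp.name, batch_size=10)]
print(sizes)  # [10, 10, 5]
os.unlink(tmp.name)
```

Each batch could then be appended to the writer before the next one is read, keeping memory usage bounded by the batch size.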
threads = []
thread_count = 100
-rows_per_thread = 1000
+rows_per_thread = 100
Is there any limit on the size per row here?
/lgtm
/approve
[APPROVALNOTIFIER] This PR is APPROVED. This pull request has been approved by: xiaofan-luan, XuanYang-cn, yhmo. The full list of commands accepted by this bot can be found here. The pull request process is described here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing /approve in a comment.
@@ -161,11 +307,11 @@ def test_cloud_bulkinsert():
    access_key=object_url_access_key,
    secret_key=object_url_secret_key,
    cluster_id=cluster_id,
-   collection_name=COLLECTION_NAME,
+   collection_name=CSV_COLLECTION_NAME,
bulk_import is missing the api_key parameter.
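For illustration, one way the missing api_key could be threaded through to the HTTP call (a sketch only; the actual `bulk_import` signature, header name, and payload field names in pymilvus may differ and are assumptions here):

```python
def build_bulk_import_request(url, api_key, cluster_id, collection_name,
                              object_url, access_key, secret_key):
    """Assemble request pieces; the api_key rides in the Authorization header."""
    headers = {"Authorization": f"Bearer {api_key}"}  # assumed bearer-token scheme
    payload = {
        "clusterId": cluster_id,
        "collectionName": collection_name,
        "objectUrl": object_url,
        "accessKey": access_key,
        "secretKey": secret_key,
    }
    return url, headers, payload

# Hypothetical values for demonstration only.
_, headers, payload = build_bulk_import_request(
    "https://controller.api.example.com/v1/vector/collections/import",
    api_key="my-api-key", cluster_id="cluster-1", collection_name="demo",
    object_url="s3://bucket/data/", access_key="ak", secret_key="sk")
print("Authorization" in headers)  # True
```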
)
print(resp)

-print(f"===================== get import job progress ====================")
+print(f"\n===================== get import job progress ====================")
job_id = resp['data']['jobId']
json.loads(resp.text)['data']['jobId']
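The suggestion above reflects that the cloud endpoint returns a `requests`-style response object, so the JSON body must be decoded before indexing into it. A small sketch with a stand-in response object:

```python
import json

class FakeResponse:
    """Stand-in for a requests.Response: .text holds the raw JSON body."""
    def __init__(self, text):
        self.text = text

resp = FakeResponse('{"code": 200, "data": {"jobId": "job-123"}}')

# resp['data']['jobId'] would fail (the response is not a dict);
# decode the body first, as the reviewer suggests:
job_id = json.loads(resp.text)['data']['jobId']
print(job_id)  # job-123
```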
)
print(resp)

-print(f"===================== get import job progress ====================")
+print(f"\n===================== get import job progress ====================")
job_id = resp['data']['jobId']
resp = get_import_progress(
Missing the api_key parameter.
@@ -174,7 +320,7 @@ def test_cloud_bulkinsert():
)
print(resp)

-print(f"===================== list import jobs ====================")
+print(f"\n===================== list import jobs ====================")
resp = list_import_jobs(
Missing the api_key parameter.
No description provided.