Fix gateway server bug and interrupt handling for chunk dispatching #868

Merged: 221 commits, merged Jun 15, 2023
e66f065
Add Broadcast client API interface (#675)
lynnliu030 Nov 29, 2022
8b842c1
add bucket replication script for experiments
sarahwooders Nov 29, 2022
d6a5a9a
fix region logic
sarahwooders Dec 1, 2022
e15d8fb
merge
sarahwooders Dec 1, 2022
dfecc9c
add log copying
sarahwooders Dec 2, 2022
5881d45
update transfer cost grid
lynnliu030 Dec 4, 2022
9f9c4dd
switch batch to recursive
sarahwooders Dec 5, 2022
771f4dd
Merge branch 'broadcast' of github.com:skyplane-project/skyplane into…
sarahwooders Dec 5, 2022
43a9514
change the upload ids
lynnliu030 Dec 6, 2022
0133f32
put
lynnliu030 Dec 7, 2022
0da6d43
changes for broadcast multipart
lynnliu030 Dec 8, 2022
59b4176
Mux_and fix (#718)
sarahwooders Dec 8, 2022
99f584d
couple gateway fixes
sarahwooders Dec 8, 2022
d785084
just return 0
sarahwooders Dec 8, 2022
d6f7065
fix bc verification
lynnliu030 Dec 8, 2022
1093d5f
print per-dst remaining bytes
lynnliu030 Dec 8, 2022
8fb9a72
fix obj store wait time
sarahwooders Dec 8, 2022
689386b
Merge branch 'multipart' of github.com:skyplane-project/skyplane into…
sarahwooders Dec 8, 2022
9c56692
assert completed = file deleted
sarahwooders Dec 8, 2022
b5350d0
minor fix
sarahwooders Dec 8, 2022
b4eb68d
accidentally deleted line
sarahwooders Dec 9, 2022
28ca507
fix issue with terminal operators
sarahwooders Dec 9, 2022
c7e56da
update tput profile & ILP
lynnliu030 Dec 9, 2022
651fb97
update run config
lynnliu030 Dec 9, 2022
3590de7
reduce client parallelism
sarahwooders Dec 9, 2022
aa83a89
count num processes
sarahwooders Dec 10, 2022
fb460b6
add gw programs
lynnliu030 Dec 10, 2022
bbbfd30
add back print
sarahwooders Dec 10, 2022
207b8f4
modify gateway program processing on gateway side
sarahwooders Dec 10, 2022
5a798f7
working 5 dest transfer
sarahwooders Dec 11, 2022
c176230
broadcast random
lynnliu030 Dec 11, 2022
b5e3d18
add more regions
sarahwooders Dec 11, 2022
04b84f0
increase queue size
sarahwooders Dec 11, 2022
9befa07
merge
lynnliu030 Dec 12, 2022
eb6772e
change regions
sarahwooders Dec 12, 2022
02882c6
fix process counting
sarahwooders Dec 12, 2022
5f22237
reduce # of connections
lynnliu030 Dec 12, 2022
5a6cfe6
lower # of connections
lynnliu030 Dec 12, 2022
4db2767
fix error
sarahwooders Dec 12, 2022
de19866
Merge branch 'multipart' of github.com:skyplane-project/skyplane into…
sarahwooders Dec 12, 2022
7c43ab2
add banned nodes
sarahwooders Dec 12, 2022
3104803
Filter out specific regions and fix ILP (#723)
sarahwooders Dec 13, 2022
a47c341
merge
lynnliu030 Dec 13, 2022
319756b
remove multiplication
lynnliu030 Dec 13, 2022
b0c6ac8
remove multiplication
lynnliu030 Dec 13, 2022
4a5cdc4
Check gbyte_to_transfer
parasj Dec 13, 2022
2682608
script with aws/gcp/azure
lynnliu030 Dec 13, 2022
cb7b514
update num_vms for iterative ILP
lynnliu030 Dec 13, 2022
3aa336a
Merge remote-tracking branch 'origin/main' into multipart
parasj Dec 13, 2022
ec3d9c5
update aws script
lynnliu030 Dec 13, 2022
4e7554e
reset queue sizes
sarahwooders Dec 13, 2022
0f686c2
merge
sarahwooders Dec 13, 2022
ca4115f
Increase retry pool by default
parasj Dec 13, 2022
d1c956f
update script
lynnliu030 Dec 13, 2022
76f68d8
update script
lynnliu030 Dec 13, 2022
e44d5b3
modify connection num
sarahwooders Dec 23, 2022
34f2fb3
Merge branch 'multipart' of github.com:skyplane-project/skyplane into…
sarahwooders Dec 23, 2022
b7f2956
fixed ips
lynnliu030 Dec 24, 2022
5f68afb
Merge branch 'multipart' of https://github.com/skyplane-project/skypl…
lynnliu030 Dec 24, 2022
ac725ee
merge
sarahwooders Dec 24, 2022
7091c90
add topology plotting during runtime
sarahwooders Dec 29, 2022
7ffb026
update visualize gateway program
sarahwooders Dec 29, 2022
3b4c573
fix partitions
lynnliu030 Dec 29, 2022
6491a99
partially implemented support for reading existing gw program
sarahwooders Dec 30, 2022
a0827ec
Merge branch 'multipart' of github.com:skyplane-project/skyplane into…
sarahwooders Dec 30, 2022
a70b712
change instance types
sarahwooders Dec 31, 2022
1cdf740
add log directory for deprovision
sarahwooders Jan 5, 2023
d4c065a
reduce gateway cp parallelism and fix receive bug
sarahwooders Jan 14, 2023
b46e8dc
fix tracker timer
lynnliu030 Jan 31, 2023
862d36b
tracker output
lynnliu030 Feb 5, 2023
1e920e4
update p2p algorithms
lynnliu030 Feb 6, 2023
f5966b8
fix region issue
sarahwooders Feb 13, 2023
d131b75
merge
sarahwooders Feb 13, 2023
4694265
test broadcast object store in gcp
sarahwooders Feb 25, 2023
8f5c707
merge in changes from main
sarahwooders Feb 25, 2023
7baa9ba
stash
sarahwooders Feb 28, 2023
65f9df8
stash
sarahwooders Feb 28, 2023
c358bcb
map subregions
sarahwooders Mar 2, 2023
a56aafc
add basic obj store interfacing to client, write tests, and fix bucke…
sarahwooders Mar 8, 2023
6484f35
reformat
sarahwooders Mar 8, 2023
63ac676
fix formatting
sarahwooders Mar 8, 2023
369de0d
temporarily give up on azure
sarahwooders Mar 8, 2023
ab97b84
move client test to integration test
sarahwooders Mar 8, 2023
d296033
add new files and use generator
sarahwooders Mar 13, 2023
00c5088
reformat
sarahwooders Mar 13, 2023
16ab757
reformat
sarahwooders Mar 13, 2023
4081a36
fix imports
sarahwooders Mar 13, 2023
c0eb132
fix formatting
sarahwooders Mar 13, 2023
dffee82
add basic obj store interfacing to client, write tests, and fix bucke…
sarahwooders Mar 8, 2023
bcda24b
reformat
sarahwooders Mar 8, 2023
6b583f8
fix formatting
sarahwooders Mar 8, 2023
acd101e
temporarily give up on azure
sarahwooders Mar 8, 2023
10f6cd3
move client test to integration test
sarahwooders Mar 8, 2023
69f5af6
add new files and use generator
sarahwooders Mar 13, 2023
e957d4f
reformat
sarahwooders Mar 13, 2023
afa0ec8
fix imports
sarahwooders Mar 13, 2023
5206e0d
fix formatting
sarahwooders Mar 13, 2023
6b4fb9a
add cost function
sarahwooders Mar 14, 2023
03c100d
reformat and remove variables not needed
lynnliu030 Mar 14, 2023
d161e3d
add cost estimation to client dataplane
sarahwooders Mar 14, 2023
f481516
Merge branch 'main' of github.com:sarahwooders/skyplane
sarahwooders Mar 14, 2023
00d0667
add transfer pairs
sarahwooders Mar 14, 2023
8040dd3
merge
sarahwooders Mar 15, 2023
400214a
add logging for error
sarahwooders Mar 15, 2023
92f5d41
error prints
sarahwooders Mar 15, 2023
4fb9a0f
file size fix?
sarahwooders Mar 15, 2023
5805021
fix error? idk
sarahwooders Mar 15, 2023
b4014d3
Merge branch 'skyplane-project:main' into main
sarahwooders Apr 8, 2023
67e5a39
dataplane
sarahwooders Apr 8, 2023
ad1a491
initial implementation
sarahwooders Apr 10, 2023
e9bc643
add pipeline file
sarahwooders Apr 10, 2023
1a4c225
reformat
sarahwooders Apr 10, 2023
8b80f25
add upload id pipelining for multipart
sarahwooders Apr 11, 2023
d46b2d2
initial TransferJob rework (multicast)
abiswal2001 Apr 11, 2023
e313be2
fix transfer generation, but gateway wont start
sarahwooders Apr 12, 2023
c34277d
add deprovisioning and copy error logs
sarahwooders Apr 12, 2023
643a203
half way through removing chunk req
sarahwooders Apr 12, 2023
9d58956
direct transfer works
sarahwooders Apr 14, 2023
c8f99b6
remove docker script for old gateway
sarahwooders Apr 14, 2023
1c80d39
reformat
sarahwooders Apr 14, 2023
ec19129
fix broadcast important
sarahwooders Apr 16, 2023
3f68bb5
working multicast but broken transfer tracking
sarahwooders Apr 21, 2023
0858e87
add multi dest tracker
sarahwooders Apr 21, 2023
460d78c
reformat/cleanup
sarahwooders Apr 21, 2023
9c12137
scaffold more planners
sarahwooders Apr 21, 2023
f081063
implement verification
sarahwooders Apr 24, 2023
123e799
fix different prefix
sarahwooders Apr 24, 2023
4020e14
cleanup
sarahwooders Apr 24, 2023
bee41b2
try to fix docs
sarahwooders Apr 24, 2023
3934177
remove old imports
sarahwooders Apr 24, 2023
cd7876b
remove pandas
sarahwooders Apr 24, 2023
7cfe700
update poetry
sarahwooders Apr 24, 2023
1bd7110
remove experiment import
sarahwooders Apr 25, 2023
0a18559
fix most tests
sarahwooders Apr 25, 2023
8de0414
reformat
sarahwooders Apr 25, 2023
4c85d3b
merge
sarahwooders Apr 25, 2023
68b3867
Merge branch 'sarahwooders-gateway-program-refactor'
sarahwooders Apr 25, 2023
47d0cda
cleanup
sarahwooders Apr 25, 2023
8b8b718
fixed after merge thank god
sarahwooders Apr 25, 2023
14b5580
reformat
sarahwooders Apr 25, 2023
9256710
reformat and add cost estimate fixes
sarahwooders Apr 25, 2023
1bde576
add back throughput
sarahwooders Apr 26, 2023
dfd104b
more cleanup
sarahwooders Apr 26, 2023
36cc6a7
cleanup
sarahwooders Apr 26, 2023
096fc05
remove dockerfile
sarahwooders Apr 26, 2023
5c99bf6
fix ibm imports
sarahwooders Apr 26, 2023
7418ee4
fix imports
sarahwooders Apr 26, 2023
8b4b25f
more cleanup
sarahwooders Apr 26, 2023
35c90b0
fix ibm imports and pbar
sarahwooders Apr 26, 2023
d2a0aed
add bar for multipart completion
sarahwooders Apr 26, 2023
cc51688
cleanup and remove ibm test
sarahwooders Apr 26, 2023
4599973
forgot to add operator files
sarahwooders Apr 26, 2023
c793ea8
support CLI
sarahwooders Apr 27, 2023
e66fdba
comment out on-prem
sarahwooders Apr 27, 2023
00caafe
ignore solver for linting
sarahwooders Apr 27, 2023
2dc42de
reformat
sarahwooders Apr 27, 2023
e737d3d
format
sarahwooders Apr 27, 2023
2952746
fix
sarahwooders Apr 27, 2023
82030b6
fix errors
sarahwooders Apr 27, 2023
d0017e1
fix pytype issues
sarahwooders Apr 27, 2023
3238983
fix transfer list bug
sarahwooders Apr 29, 2023
5883e51
Merge branch 'skyplane-project:main' into main
sarahwooders Apr 30, 2023
acd952f
add private ips
sarahwooders Apr 30, 2023
08059aa
merge
sarahwooders Apr 30, 2023
2649fe2
add back region tag check
sarahwooders Apr 30, 2023
bff0155
Merge branch 'main' of github.com:sarahwooders/skyplane
sarahwooders Apr 30, 2023
618dbb4
cleanup
sarahwooders Apr 30, 2023
8a5ef62
remove pop
sarahwooders Apr 30, 2023
c2a2242
fix queue
sarahwooders Apr 30, 2023
07f579d
fix pytype
sarahwooders Apr 30, 2023
167f894
fix errors
sarahwooders Apr 30, 2023
cacdf03
disable ibmcloud for skyplane init
sarahwooders May 2, 2023
8980a21
Merge branch 'skyplane-project:main' into main
sarahwooders May 2, 2023
3076b9a
update poetry
sarahwooders May 2, 2023
f3d448b
Merge branch 'main' of github.com:sarahwooders/skyplane
sarahwooders May 2, 2023
e093a42
reformat
sarahwooders May 2, 2023
df517dd
add integration for pull req
sarahwooders May 2, 2023
772cff5
Add gateway start exception handling
sarahwooders May 2, 2023
d1b3d91
fix planning for single region transfers (same source/destination reg…
sarahwooders May 2, 2023
ea882ec
Merge remote-tracking branch 'upstream/main'
sarahwooders May 2, 2023
82f6266
rm print
sarahwooders May 2, 2023
aefcd10
fix azure private ip
sarahwooders May 2, 2023
6c63718
Merge remote-tracking branch 'upstream/main'
sarahwooders May 2, 2023
36c8fcf
reformat
sarahwooders May 2, 2023
4eaa864
refactor and cleanup client code
sarahwooders May 2, 2023
1ba24cd
Merge remote-tracking branch 'upstream/main'
sarahwooders May 2, 2023
01e9158
reformat
sarahwooders May 3, 2023
a2edb08
merge
sarahwooders May 3, 2023
d41427c
add local tests
sarahwooders May 3, 2023
f29bb76
fix tests
sarahwooders May 3, 2023
1486b85
add s3 interface
sarahwooders May 3, 2023
f3521dc
Refactor CLI transfer code and support local fallback (#829)
sarahwooders May 3, 2023
68eb8d5
Update integration-test-local.yml
sarahwooders May 3, 2023
6e5cfd1
Update integration-test-multiple-sizes.yml
sarahwooders May 3, 2023
bf0cb99
commit
sarahwooders May 3, 2023
5ea262b
add waiting and retry for multipart completion
sarahwooders May 5, 2023
2d4a4cc
reformat
sarahwooders May 5, 2023
269a9cb
Merge remote-tracking branch 'upstream/main'
sarahwooders May 5, 2023
18069fd
Merge branch 'integration-tests' into main
sarahwooders May 5, 2023
9e9bbd1
set multipart flag
sarahwooders May 5, 2023
4643983
Merge branch 'main' of github.com:sarahwooders/skyplane
sarahwooders May 5, 2023
f19fc34
merge
sarahwooders May 9, 2023
0258def
broken
sarahwooders May 10, 2023
11e3eef
worked for 1TB
sarahwooders May 11, 2023
8336835
reformat
sarahwooders May 11, 2023
f0370eb
cleanup
sarahwooders May 11, 2023
e277f4d
reformat
sarahwooders May 11, 2023
38a2de8
poetry
sarahwooders May 11, 2023
ed20f35
cleanup
sarahwooders May 15, 2023
427db15
Merge branch 'main' of github.com:sarahwooders/skyplane
sarahwooders May 31, 2023
894420c
add r2 initial implementation
sarahwooders May 31, 2023
d912000
merge
sarahwooders Jun 12, 2023
46f13c9
merge
sarahwooders Jun 12, 2023
86f3044
add pytest integration tests
sarahwooders Jun 13, 2023
68d8dd7
fix bug
sarahwooders Jun 14, 2023
79180c0
remove temporarily
sarahwooders Jun 14, 2023
0a0fde7
ctrl-c working
sarahwooders Jun 14, 2023
dd6d2c2
cleanup
sarahwooders Jun 14, 2023
8190ba7
remove dispatch_error
sarahwooders Jun 14, 2023
ab86868
fix pytype
sarahwooders Jun 15, 2023
43a512a
move transfer config to TransferJob init
sarahwooders Jun 15, 2023
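The last commit moves the transfer configuration out of `dispatch()` and into the `TransferJob` constructor, which is why `tracker.py` now calls `job.dispatch(self.dataplane)` with no `transfer_config` argument. A minimal sketch of that shape, assuming simplified, hypothetical `TransferConfig` and `TransferJob` classes (skyplane's real classes have more fields and a different `dispatch()` body):

```python
from dataclasses import dataclass, field
from typing import Iterator, List


@dataclass
class TransferConfig:
    # Hypothetical single setting; the real config class has its own fields.
    multipart_enabled: bool = True


@dataclass
class TransferJob:
    src_path: str
    dst_paths: List[str]
    # The config is bound once, when the job is created...
    transfer_config: TransferConfig = field(default_factory=TransferConfig)

    def dispatch(self, dataplane: object) -> Iterator[str]:
        # ...so dispatch() only takes the dataplane, like job.dispatch(self.dataplane)
        # in tracker.py. Here it just yields a description of each planned copy.
        mode = "multipart" if self.transfer_config.multipart_enabled else "single"
        for dst in self.dst_paths:
            yield f"{mode}:{self.src_path}->{dst}"


if __name__ == "__main__":
    job = TransferJob("s3://bucket-a/key", ["gs://bucket-b/key"], TransferConfig(multipart_enabled=False))
    print(list(job.dispatch(dataplane=None)))  # ['single:s3://bucket-a/key->gs://bucket-b/key']
```

With the config fixed at construction, every caller that receives the job sees the same settings without having to thread the config through each `dispatch()` call.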
20 changes: 10 additions & 10 deletions poetry.lock

1 change: 1 addition & 0 deletions pyproject.toml
@@ -62,6 +62,7 @@ pynacl = { version = "^1.5.0", optional = true }
 pyopenssl = { version = "^22.0.0", optional = true }
 werkzeug = { version = "^2.1.2", optional = true }
 pyarrow = "^10.0.1"
+pytest = "^7.3.2"
 
 [tool.poetry.extras]
 aws = ["boto3"]
2 changes: 1 addition & 1 deletion scripts/gen_data/gen_many_small.py
@@ -26,4 +26,4 @@ def make_file(data, fname):
 
 files = [f"{outdir}/{i:08d}.bin" for i in range(args.nfiles)]
 data = np.arange(args.size // 4, dtype=np.int32).tobytes()
-do_parallel(partial(make_file, data), files, desc="Generating files", spinner=True, spinner_persist=True)
+do_parallel(partial(make_file, data), files, desc="Generating files", spinner=True, spinner_persist=True, n=16)
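The script change passes `n=16` to skyplane's `do_parallel` helper. Assuming `n` sets the size of the underlying thread pool (and that `n=-1`, as used in `dataplane.py`, means one worker per item), this caps file generation at 16 concurrent writers instead of one thread per file. The same idea with only the standard library; `make_files_parallel` is a hypothetical stand-in, not skyplane's implementation:

```python
import os
import tempfile
from concurrent.futures import ThreadPoolExecutor
from functools import partial


def make_file(data: bytes, fname: str) -> int:
    # Write one file and report how many bytes landed on disk.
    with open(fname, "wb") as f:
        f.write(data)
    return os.path.getsize(fname)


def make_files_parallel(data: bytes, fnames: list, n: int = 16) -> list:
    # Hypothetical stand-in for do_parallel(..., n=16): at most n threads run
    # make_file at once; n=-1 mimics "one worker per file".
    workers = len(fnames) if n == -1 else n
    with ThreadPoolExecutor(max_workers=max(1, workers)) as pool:
        # pool.map returns results in the same order as fnames.
        return list(pool.map(partial(make_file, data), fnames))


if __name__ == "__main__":
    outdir = tempfile.mkdtemp()
    files = [f"{outdir}/{i:08d}.bin" for i in range(100)]
    sizes = make_files_parallel(bytes(4096), files, n=16)
    print(len(sizes), set(sizes))  # 100 {4096}
```

A fixed pool size bounds open file handles and disk contention when generating many small files, at the cost of less parallelism on very fast storage.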
4 changes: 3 additions & 1 deletion skyplane/api/dataplane.py
@@ -246,7 +246,7 @@ def copy_gateway_logs(self):
         # copy logs from all gateways in parallel
         do_parallel(self.copy_gateway_log, self.bound_nodes.values(), n=-1)
 
-    def deprovision(self, max_jobs: int = 64, spinner: bool = False):
+    def deprovision(self, max_jobs: int = 64, spinner: bool = True):
         """
         Deprovision the remote gateways
 
@@ -267,6 +267,8 @@ def deprovision(self, max_jobs: int = 64, spinner: bool = False):
                 for task in self.pending_transfers:
                     logger.fs.warning(f"Before deprovisioning, waiting for jobs to finish: {list(task.jobs.keys())}")
                     task.join()
+                for thread in threading.enumerate():
+                    assert "_run_multipart_chunk_thread" not in thread.name, f"thread {thread.name} is still running"
             except KeyboardInterrupt:
                 logger.warning("Interrupted while waiting for transfers to finish, deprovisioning anyway.")
                 raise
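The new check in `deprovision` asserts that, once every pending tracker has been joined, no live thread has `_run_multipart_chunk_thread` in its name. Since Python 3.10 the default `Thread` name includes the target's name (`Thread-N (target)`), so the check presumably relies on that or on explicit names; that is an assumption. A self-contained sketch of the naming-and-audit pattern, with hypothetical `start_workers` and `lingering_threads` helpers and explicit names:

```python
import threading
import time

WORKER_TAG = "_run_multipart_chunk_thread"  # assumed to appear in worker thread names


def worker(stop: threading.Event) -> None:
    # Simulated multipart worker: idles until asked to stop.
    while not stop.is_set():
        time.sleep(0.01)


def start_workers(n: int, stop: threading.Event) -> list:
    # Name each thread with the tag so it can be found later.
    threads = [
        threading.Thread(target=worker, args=(stop,), name=f"{WORKER_TAG}-{i}", daemon=True)
        for i in range(n)
    ]
    for t in threads:
        t.start()
    return threads


def lingering_threads(tag: str = WORKER_TAG) -> list:
    # threading.enumerate() only lists threads that are still alive.
    return [t.name for t in threading.enumerate() if tag in t.name]


if __name__ == "__main__":
    stop = threading.Event()
    threads = start_workers(4, stop)
    print(len(lingering_threads()))  # 4
    stop.set()
    for t in threads:
        t.join()
    leftover = lingering_threads()
    if leftover:  # explicit check instead of assert, see note below
        raise RuntimeError(f"threads still running: {leftover}")
    print("no lingering workers")
```

Note that `assert` statements are skipped when Python runs with `-O`, so an explicit check like the one above keeps the invariant enforced in optimized runs.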
22 changes: 18 additions & 4 deletions skyplane/api/tracker.py
@@ -1,10 +1,12 @@
 import functools
+import signal
+
 from pprint import pprint
 import json
 import time
 from abc import ABC
 from datetime import datetime
-from threading import Thread
+from threading import Thread, Event
 
 import urllib3
 from typing import TYPE_CHECKING, Dict, List, Optional, Set
@@ -97,6 +99,14 @@ def __init__(self, dataplane, jobs: List["TransferJob"], transfer_config: Transf
         self.jobs = {job.uuid: job for job in jobs}
         self.transfer_config = transfer_config
 
+        # exit handling
+        self.exit_flag = Event()
+
+        def signal_handler(signal, frame):
+            self.exit_flag.set()
+
+        signal.signal(signal.SIGINT, signal_handler)
+
         if hooks is None:
             self.hooks = EmptyTransferHook()
         else:
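The handler only sets a `threading.Event`; the actual shutdown happens wherever the dispatch loop polls that flag. Python allows `signal.signal` to be called only from the main thread (otherwise it raises `ValueError`), so registering the handler in `__init__`, which runs on the thread that constructs the tracker, works as long as that is the main thread, whereas doing it in `run()` on a worker thread would fail. The sketch below shows the pattern with a hypothetical `InterruptFlag` class; restoring the previous handler is an addition of this sketch, since the diff shown here does not restore it:

```python
import signal
import threading


class InterruptFlag:
    """Hypothetical helper: turn SIGINT (Ctrl-C) into a pollable threading.Event."""

    def __init__(self) -> None:
        self.exit_flag = threading.Event()
        # signal.signal() raises ValueError unless called from the main thread,
        # and returns the handler that was installed before.
        self._previous = signal.signal(signal.SIGINT, self._handler)

    def _handler(self, signum, frame) -> None:
        # Runs in the main thread; only records that an interrupt arrived.
        self.exit_flag.set()

    def restore(self) -> None:
        # Assumption, not in the PR: put the previous SIGINT handler back.
        if self._previous is not None:
            signal.signal(signal.SIGINT, self._previous)


def register_from_worker_thread() -> str:
    """Try to install a SIGINT handler from a non-main thread."""
    outcome = {}

    def target() -> None:
        try:
            signal.signal(signal.SIGINT, signal.SIG_DFL)
            outcome["result"] = "registered"
        except ValueError:
            outcome["result"] = "ValueError"

    worker = threading.Thread(target=target)
    worker.start()
    worker.join()
    return outcome["result"]


if __name__ == "__main__":
    flag = InterruptFlag()
    signal.raise_signal(signal.SIGINT)  # simulate Ctrl-C (Python 3.8+)
    print(flag.exit_flag.wait(timeout=1.0))
    flag.restore()
    print(register_from_worker_thread())  # ValueError
```

Keeping the handler to a single `Event.set()` avoids doing real work inside a signal handler, which can run between any two bytecodes of the main thread.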
@@ -138,16 +148,20 @@ def run(self):
         session_start_timestamp_ms = int(time.time() * 1000)
         try:
             # pre-dispatch chunks to begin pre-buffering chunks
-            chunk_streams = {
-                job_uuid: job.dispatch(self.dataplane, transfer_config=self.transfer_config) for job_uuid, job in self.jobs.items()
-            }
+            chunk_streams = {job_uuid: job.dispatch(self.dataplane) for job_uuid, job in self.jobs.items()}
             for job_uuid, job in self.jobs.items():
                 logger.fs.debug(f"[TransferProgressTracker] Dispatching job {job.uuid}")
                 self.job_chunk_requests[job_uuid] = {}
                 self.job_pending_chunk_ids[job_uuid] = {region: set() for region in self.dataplane.topology.dest_region_tags}
                 self.job_complete_chunk_ids[job_uuid] = {region: set() for region in self.dataplane.topology.dest_region_tags}
 
                 for chunk in chunk_streams[job_uuid]:
+                    if self.exit_flag.is_set():
+                        logger.fs.debug(f"[TransferProgressTracker] Exiting due to signal")
+                        self.hooks.on_dispatch_end()
+                        self.hooks.on_transfer_end()
+                        job.stop()  # stop threads in chunk stream
+                        return
                     chunks_dispatched = [chunk]
                     self.job_chunk_requests[job_uuid][chunk.chunk_id] = chunk
                     self.hooks.on_chunk_dispatched(chunks_dispatched)
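Inside the dispatch loop the flag is checked once per chunk, so Ctrl-C takes effect between chunks, and `job.stop()` is expected to shut down the threads that feed the chunk generator. A self-contained sketch of that shape; `ChunkStream`, `dispatch_all`, and the producer thread are hypothetical stand-ins for `TransferJob.dispatch()` and `stop()`, whose internals are not part of this diff:

```python
import queue
import threading
from typing import Iterator, List, Optional


class ChunkStream:
    """Hypothetical chunk source fed by a background producer thread."""

    def __init__(self, n_chunks: int) -> None:
        self._queue: "queue.Queue[Optional[int]]" = queue.Queue(maxsize=8)
        self._stop = threading.Event()
        self._thread = threading.Thread(target=self._produce, args=(n_chunks,), daemon=True)
        self._thread.start()

    def _produce(self, n_chunks: int) -> None:
        items: List[Optional[int]] = list(range(n_chunks)) + [None]  # None ends the stream
        for item in items:
            # Put with a timeout so a full queue cannot block shutdown forever.
            while True:
                if self._stop.is_set():
                    return
                try:
                    self._queue.put(item, timeout=0.05)
                    break
                except queue.Full:
                    pass

    def __iter__(self) -> Iterator[int]:
        while True:
            item = self._queue.get()
            if item is None:
                return
            yield item

    def stop(self) -> None:
        # Ask the producer thread to exit and wait for it (the role job.stop() plays).
        self._stop.set()
        self._thread.join()


def dispatch_all(stream: ChunkStream, exit_flag: threading.Event, interrupt_after: int = -1) -> List[int]:
    """Dispatch chunks until the stream ends or exit_flag is set."""
    dispatched: List[int] = []
    for chunk in stream:
        if exit_flag.is_set():
            stream.stop()  # stop producer threads before returning early
            return dispatched
        dispatched.append(chunk)  # stand-in for sending a chunk request
        if len(dispatched) == interrupt_after:
            exit_flag.set()  # simulate the SIGINT handler firing mid-transfer
    stream.stop()
    return dispatched


if __name__ == "__main__":
    print(dispatch_all(ChunkStream(20), threading.Event(), interrupt_after=5))  # [0, 1, 2, 3, 4]
    print(len(dispatch_all(ChunkStream(20), threading.Event())))  # 20
```

Because the flag is only consulted after the generator yields, an interrupt is noticed when the next chunk arrives; a stalled chunk stream would therefore delay shutdown until it produces another item.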