Update profiling records without reseting catalog #519

outtanames · 2024-01-15T18:26:20Z

A variety of updates and fixes to the profiler to support generating profiling catalogs across local and cloud runs:

- Profiling runs now load and update the existing catalog rather than regenerating it from scratch, so we can do incremental profiling runs for a specific subset of models.
- Various fixes and checks when reading from and writing to the profiling catalog; edge cases were breaking fields such as GPU utilization.
- Refactored the profiler so that the different uses of prof are less ambiguous: the Profiler itself, context managers, actual profiling results, etc.
- Added a --catalog-path flag so we can write profiling results to a file mount in SkyPilot rather than tracking down the catalog inside the nos cache each time.
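
For readers who want a concrete picture of the incremental update described in the first item, here is a minimal sketch. It is not the code in this PR: it assumes the catalog is a JSON list of flat records, and the key fields (`model_id`, `method`) and the helper name `update_catalog` are hypothetical.

```python
import json
from pathlib import Path
from typing import Any, Dict, Iterable, List, Tuple

Record = Dict[str, Any]

# Hypothetical key: which fields identify a profiling record is an assumption.
KEY_FIELDS: Tuple[str, ...] = ("model_id", "method")


def update_catalog(catalog_path: Path, new_records: Iterable[Record]) -> List[Record]:
    """Merge new_records into the JSON catalog at catalog_path and write it back."""
    existing: List[Record] = []
    if catalog_path.exists():
        existing = json.loads(catalog_path.read_text())

    # Index existing records by key so models that were not re-profiled are preserved.
    merged = {tuple(r[k] for k in KEY_FIELDS): r for r in existing}
    for record in new_records:
        key = tuple(record[k] for k in KEY_FIELDS)
        # Update fields when the key already exists, otherwise add a new entry.
        merged[key] = {**merged.get(key, {}), **record}

    records = list(merged.values())
    catalog_path.parent.mkdir(parents=True, exist_ok=True)
    catalog_path.write_text(json.dumps(records, indent=2))
    return records
```

Because the merge is keyed, re-profiling a single model only touches that model's record; everything else in the catalog is written back unchanged.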

Summary

Related issues

Checks

make lint: I've run make lint to lint the changes in this PR.
make test: I've made sure the tests (make test-cpu or make test) are passing.
Additional tests:
- Benchmark tests (when contributing new models)
- GPU/HW tests

spillai · 2024-01-15T18:34:20Z

Can you add tests for these?

spillai · 2024-01-15T18:44:47Z

nos/common/profiler.py

@@ -228,6 +228,22 @@ def as_df(self) -> pd.DataFrame:
        )
        return df

+    def from_df(self, df: pd.DataFrame) -> None:


This needs to be a classmethod where you return cls(records)

    @classmethod
    def from_df(cls, df: pd.DataFrame) -> "Profiler":
        ...
        return cls(records)

spillai · 2024-01-15T18:45:27Z

nos/common/profiler.py

+                record.update(key, value)
+            self.records.append(record)
+
+    def from_json_path(self, filename: Union[Path, str]) -> None:


Same comment as above: this needs to be a classmethod that returns cls.from_df(df).
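
To make the two suggestions above concrete, here is a small sketch of chained alternative constructors. It is an illustration only, not the real `Profiler`: the class is a stand-in, its `__init__` signature is assumed, and `from_rows` (a list of dicts) is a hypothetical substitute for `from_df` so the example needs nothing beyond the standard library.

```python
import json
from pathlib import Path
from typing import Any, Dict, List, Optional, Union


class Profiler:
    """Minimal stand-in for the real Profiler; only the constructors are sketched."""

    def __init__(self, records: Optional[List[Dict[str, Any]]] = None) -> None:
        self.records: List[Dict[str, Any]] = list(records or [])

    @classmethod
    def from_rows(cls, rows: List[Dict[str, Any]]) -> "Profiler":
        # Stand-in for from_df: build the records first, then return a new instance.
        records = [dict(row) for row in rows]
        return cls(records)

    @classmethod
    def from_json_path(cls, filename: Union[Path, str]) -> "Profiler":
        # Delegate to the other constructor instead of mutating an existing instance.
        rows = json.loads(Path(filename).read_text())
        return cls.from_rows(rows)
```

Returning `cls(...)` rather than mutating `self` means callers get a fully built object in one call, and subclasses get instances of their own type.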

spillai · 2024-01-15T18:47:15Z

nos/common/profiler.py

+            self.prof = Profiler()
+            self.prof.from_json_path(NOS_PROFILE_CATALOG_PATH)
+        else:
+            logger.debug("No prof catalog found")


Use f"Profile catalog not found (filename={NOS_PROFILE_CATALOG_PATH})."

spillai · 2024-01-15T18:47:30Z

nos/common/profiler.py

-        with Profiler() as self.prof, torch.inference_mode():
+        from nos.constants import NOS_PROFILE_CATALOG_PATH
+
+        self.prof = Profiler()


self.prof = Profiler.from_json_path(NOS_PROFILE_CATALOG_PATH)
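
Putting this suggestion together with the log message requested above, the loading path could look roughly like the sketch below. The `Profiler` here is a tiny stand-in, and `load_profiler` is a hypothetical helper name; the real code uses NOS_PROFILE_CATALOG_PATH from nos.constants, which is passed in as a plain argument here.

```python
import json
import logging
from pathlib import Path

logger = logging.getLogger(__name__)


class Profiler:
    """Tiny stand-in with just enough surface for this sketch."""

    def __init__(self, records=None):
        self.records = list(records or [])

    @classmethod
    def from_json_path(cls, filename):
        return cls(json.loads(Path(filename).read_text()))


def load_profiler(catalog_path: Path) -> Profiler:
    """Return a Profiler seeded from the catalog, or an empty one if it is missing."""
    if catalog_path.exists():
        return Profiler.from_json_path(catalog_path)
    # Include the path in the message so a missing catalog is easy to track down.
    logger.debug(f"Profile catalog not found (filename={catalog_path}).")
    return Profiler()
```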

outtanames · 2024-01-26T00:04:59Z

> Can you add tests for these?

See the additions: we now check for GPU utilization in the catalog for clip-vit-patch32 (along with a successful profiling run overall).
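
As a rough illustration of the kind of check described here (not the actual test added in this PR), a catalog assertion might look like the sketch below. The field names `model_id` and `gpu_utilization`, the helper names, and the sample value are all made up for the example.

```python
import math
from typing import Any, Dict, List, Optional


def find_record(catalog: List[Dict[str, Any]], model_id: str) -> Optional[Dict[str, Any]]:
    """Return the first catalog record for model_id, or None if there is none."""
    return next((r for r in catalog if r.get("model_id") == model_id), None)


def has_gpu_utilization(record: Dict[str, Any], field: str = "gpu_utilization") -> bool:
    """True when the record carries a real (non-NaN) numeric utilization value."""
    value = record.get(field)
    if isinstance(value, bool) or not isinstance(value, (int, float)):
        return False
    return not math.isnan(value)


def test_catalog_has_gpu_utilization() -> None:
    # Illustrative catalog with a made-up value; a real test would load the catalog file.
    catalog = [{"model_id": "clip-vit-patch32", "gpu_utilization": 41.5}]
    record = find_record(catalog, "clip-vit-patch32")
    assert record is not None
    assert has_gpu_utilization(record)
```

Rejecting NaN and missing values matters here because an unpopulated utilization column is exactly the edge case this PR set out to fix.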

spillai reviewed Jan 15, 2024

outtanames force-pushed the sloftin/inline-profiling-records branch from ae09f21 to 49b158e on January 16, 2024 08:32

spillai changed the title ~~update profiling records without reseting catalog~~ Update profiling records without reseting catalog Jan 16, 2024

outtanames added 15 commits January 22, 2024 15:42

Fixes to profiling flow, table populates from json catalog now

eeeea94

metadata.metadata -> metadata.profile

acb0270

allow missing columns for now

4e037e5

update profiling records without deleting the whole catalog

971de8c

allow mismatched kwargs for now

4fd4459

lint

80fce11

export profiling results to a catalog path

c9ab150

catalog path is a path, not a file

d35ff25

add utilizations to required columns when loading spec

b1ad9e8

use existing profiling record when in recomputing profiling stats for a request

21ea8cc

Try to fix overloading of "prof" and disambiguate between the model profiler, profiling data and profiling context managers

6563eb7

Copy catalog to default location for now until we get it loading from the hub based on profile path, gpu utilization populates now

c76cf58

update existing profile

c70c5ba

cleanup, lint, revert version bump

25f2061

Set CUDA_VISIBLE_DEVICES to 0 if not set, in practice reasonable default for profiling

c2b6b9f

outtanames force-pushed the sloftin/inline-profiling-records branch from 0c0c126 to c2b6b9f on January 22, 2024 23:54

outtanames added 7 commits January 22, 2024 17:26

Add skypilot gpus to device validator

6d4c89b

Need to work out the right strings for each device one by one for now

badca7a

Add more test coverage for catalog paths

0b2811d

add a100 as a valid device

d59e676

Update column names on nos list

dabf6fa

Return profiler from classmethod when loading from json

ca774b4

fix path for util test

0bec5a1

outtanames added 9 commits January 23, 2024 19:08

Empty the nos catalog for now to avoid conflicts with new key scheme

5520086

Remove the current model profile catalog, need to repopulate with different key strucure. Init an empty profiler by default and require rebuild

ff76c21

cleanup, add util to profile models by method type

1da2c5d

Would really, really prefer to just use nested json rather than flattening

7ed552c

Restore model profile catalog with one entry required for tests

9fb9d08

Hide json catalog from linter for now

4cadd7f

need some debug info on catalog loading to fix the test

018b743

Enable info logging on test-cpu

728abba

Add H100, K80 to NOS devices

20cf1aa

outtanames added 3 commits January 25, 2024 16:16

update test for new format

ef9485a

move catalog to the right place

fe3ed71

fix linter

7c854ff

outtanames merged commit 09b0acd into autonomi-ai:main Jan 26, 2024
2 checks passed
