API tests #154

broskoTT · 2024-10-14T14:35:25Z

The plan is to have a set of API tests:

- To showcase UMD usage to clients
- Which are arch agnostic
- Which showcase (like this simplest one) how cumbersome the current minimal UMD setup is.

A lot of material should be taken from existing tests.
At this point it seems harder to rewrite the existing tests than to start new ones.

The tests should also somewhat follow code examples shown in this diagram: https://docs.google.com/drawings/d/1-m1azdsBqMA0A6ATYRMfkhyeuOJuGCEI62N5a96LXj0/edit

A plan is to have a set of API tests:

- To showcase UMD usage to clients
- Which is arch agnostic
- Which showcases (like this simplest one) how cumbersome is the current minimal UMD setup.

A first PR for #154

A continuation of the work on #154.

The first test showcases the basic usage of the cluster descriptor. The second test checks whether ClusterDescriptor can manage multiple clusters of cards, where a cluster means chips connected through ethernet. It shows explicitly that this is not currently supported, so the test fails on a system with multiple unconnected clusters of cards (a simple 2xN150 setup would suffice). We don't have a machine like that in the UMD pool, so the test passes on CI.
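
To make "multiple unconnected clusters" concrete, here is a minimal sketch that counts groups of ethernet-connected chips with union-find. It is not UMD code: `count_clusters`, `find_root`, and the link list format are hypothetical names introduced only for illustration, and `chip_id_t` is assumed to be a plain `int`.

```cpp
#include <cassert>
#include <numeric>
#include <utility>
#include <vector>

// Assumption: chip ids are small consecutive integers starting at 0.
using chip_id_t = int;

// Hypothetical helper: follow parent links to the representative of x's group.
int find_root(std::vector<int>& parent, int x) {
    while (parent[x] != x) {
        parent[x] = parent[parent[x]];  // path halving keeps the trees shallow
        x = parent[x];
    }
    return x;
}

// Hypothetical helper: number of groups of chips connected through ethernet.
// A system with more than one group is the case the test above exercises.
int count_clusters(int num_chips,
                   const std::vector<std::pair<chip_id_t, chip_id_t>>& ethernet_links) {
    std::vector<int> parent(num_chips);
    std::iota(parent.begin(), parent.end(), 0);  // every chip starts alone
    int clusters = num_chips;
    for (const auto& link : ethernet_links) {
        assert(link.first >= 0 && link.first < num_chips);
        assert(link.second >= 0 && link.second < num_chips);
        int a = find_root(parent, link.first);
        int b = find_root(parent, link.second);
        if (a != b) {
            parent[a] = b;  // merge the two groups
            --clusters;
        }
    }
    return clusters;
}
```

Under these assumptions, two cards with no ethernet link between them (such as the 2xN150 setup mentioned above) give `count_clusters(2, {}) == 2`, which is the situation the test reports as unsupported.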

This change adds the minimal setup needed to read/write, including to remote chips. I've tested this on a setup with 3 N300 cards.

The motivation was that I wanted to change wait_for_non_mmio_flush to include chip_id. Then I wanted to write a test for it, but writing that test made no sense while basic IO was not set up. This test adds to the collection of API tests which we should work on reducing. Related to #154. Related to #157.

Minor changes:

- get_soc_descriptor had two different definitions; consolidated those.
- Added is_chip_remote for convenience.
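
As a rough illustration of what a convenience check like is_chip_remote answers, the sketch below treats a chip as remote when it is known to the cluster but not reachable over PCIe (MMIO). `ClusterSketch` and its two sets are hypothetical and do not reflect how UMD stores this information; only the function name comes from the description above.

```cpp
#include <set>
#include <stdexcept>

// Assumption: chip ids are plain integers.
using chip_id_t = int;

// Hypothetical stand-in for a cluster object; not the UMD class.
struct ClusterSketch {
    std::set<chip_id_t> all_chips;   // every chip in the cluster
    std::set<chip_id_t> mmio_chips;  // chips reachable directly over PCIe

    // A chip is remote if it belongs to the cluster but has no MMIO access,
    // so reads and writes have to travel over ethernet through another chip.
    bool is_chip_remote(chip_id_t chip) const {
        if (all_chips.count(chip) == 0) {
            throw std::out_of_range("unknown chip id");
        }
        return mmio_chips.count(chip) == 0;
    }
};
```

For a 3 x N300 setup like the one used for testing, and assuming each card exposes one PCIe-attached chip and one ethernet-attached chip, this could be modeled as `ClusterSketch{{0, 1, 2, 3, 4, 5}, {0, 1, 2}}`, where chips 3 to 5 are remote. The id assignment is an assumption made for the example.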

wait_for_non_mmio_flush should be per chip_id. It is used like this in tt_metal, but the API doesn't offer it per chip. This is related to #157 and to preparing UMD for changes in tt_metal. It is also related to #154, since I've added a test which shows how this flush is used.

- Does not break existing workflows
- tt_metal corresponding PR: tenstorrent/tt-metal#13949
- tt_debuda change: Not used
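
The difference between a cluster-wide flush and a per-chip flush can be sketched with a toy model that only counts outstanding remote writes. `FlushSketch`, `queue_remote_write`, and `pending` are hypothetical; the real wait_for_non_mmio_flush waits on hardware rather than resetting a counter, and its exact signature should be checked against the UMD headers.

```cpp
#include <map>

// Assumption: chip ids are plain integers.
using chip_id_t = int;

// Toy model, not the UMD driver: tracks unflushed remote writes per chip.
class FlushSketch {
public:
    // Pretend a write to a remote (non-MMIO) chip has been issued.
    void queue_remote_write(chip_id_t chip) { ++pending_[chip]; }

    // Per-chip variant: only the named chip's outstanding writes are drained.
    void wait_for_non_mmio_flush(chip_id_t chip) { pending_[chip] = 0; }

    // Cluster-wide variant: drains every chip, so existing callers keep working.
    void wait_for_non_mmio_flush() {
        for (auto& entry : pending_) {
            entry.second = 0;
        }
    }

    // Number of writes to this chip that have not been flushed yet.
    int pending(chip_id_t chip) const {
        auto it = pending_.find(chip);
        return it == pending_.end() ? 0 : it->second;
    }

private:
    std::map<chip_id_t, int> pending_;
};
```

Keeping the no-argument overload next to the per-chip one is one way a change like this can avoid breaking existing workflows; whether UMD does exactly that is not stated here.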

Currently tt_SiliconDevice accepts one map for everything, but that goes against our effort to separate cluster vs chip responsibilities, related to #157.

This change includes:

- Adding chip_id to setup_core_to_tlb_map. There should be a map per mmio chip.
- Started a chip api tests file.
- Added an api example/test on how TLB setup functions at the moment. This also does some minor configuration testing, whether it throws or not.
- Minor cosmetic changes to other API tests.

Contributes to #154 since it adds more api tests. This change will require tt_metal changes.

- Breaks existing usages of setup_core_to_tlb_map
- tt_metal corresponding PR: tenstorrent/tt-metal#13949
- tt_debuda change: Not used
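
The idea of one TLB map per MMIO chip, instead of one map for everything, can be sketched as follows. This is an illustration under assumptions, not the UMD implementation: `TlbSetupSketch`, `core_xy`, `tlb_mapping_fn`, and `tlb_index` are hypothetical, and the thrown exception types are chosen only for the example.

```cpp
#include <cstdint>
#include <functional>
#include <map>
#include <set>
#include <stdexcept>
#include <utility>

// Assumptions: chip ids are integers and a core is an (x, y) pair.
using chip_id_t = int;
using core_xy = std::pair<int, int>;
using tlb_mapping_fn = std::function<std::int32_t(core_xy)>;

// Hypothetical stand-in for the device class; stores one mapping per MMIO chip.
class TlbSetupSketch {
public:
    explicit TlbSetupSketch(std::set<chip_id_t> mmio_chips)
        : mmio_chips_(std::move(mmio_chips)) {}

    // Register the core-to-TLB mapping for a single chip.
    // Only MMIO chips own TLBs, so any other id is rejected.
    void setup_core_to_tlb_map(chip_id_t chip, tlb_mapping_fn mapping) {
        if (mmio_chips_.count(chip) == 0) {
            throw std::invalid_argument("chip is not an MMIO chip");
        }
        maps_[chip] = std::move(mapping);
    }

    // Look up the TLB index for a core on a chip that was configured earlier.
    std::int32_t tlb_index(chip_id_t chip, core_xy core) const {
        auto it = maps_.find(chip);
        if (it == maps_.end()) {
            throw std::runtime_error("no TLB map configured for this chip");
        }
        return it->second(core);
    }

private:
    std::set<chip_id_t> mmio_chips_;
    std::map<chip_id_t, tlb_mapping_fn> maps_;
};
```

A test in the spirit of the "whether it throws or not" checks mentioned above would configure one chip, query it, and then confirm that an unconfigured or non-MMIO chip raises an error.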

This attribute is tied to a chip. It could be argued that we currently only support a cluster of chips of the same architecture, so the value would be the same for all chips in a single cluster. Still, this makes sense and further unblocks supporting multiple architectures in a single cluster/driver. It will also allow the attribute to be lowered to the TTDevice class, which is arch specific (currently architecture_implementation). Related to #157, contributes to #154.

- Breaks existing usages of get_pcie_base_addr_from_device
- tt_metal corresponding PR: tenstorrent/tt-metal#13949
- tt_debuda change: Not used

broskoTT added the "api redesign" label (Related to the ongoing push for redesigning UMDs api) Oct 14, 2024

broskoTT linked a pull request Oct 14, 2024 that will close this issue

First API test #153

Merged

broskoTT removed a link to a pull request Oct 14, 2024

First API test #153

Merged

broskoTT mentioned this issue Oct 14, 2024

First API test #153

Merged

This was referenced Oct 15, 2024

API test - simple IO #159

Merged

ClusterDescriptor API test - not supporting multiple clusters #165

Merged

This was referenced Oct 17, 2024

wait_for_non_mmio_flush per chip #166

Merged

TLB map setup per chip #179

Merged

get_pcie_base_addr_from_device per chip #183

Merged
