-
Notifications
You must be signed in to change notification settings - Fork 66
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Change tt_cluster to use single tt_SiliconDevice #13949
base: main
Are you sure you want to change the base?
Conversation
tt_metal/third_party/umd
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is currently pointing to brosko/tt_metal_singledriver_test, which holds several PRs merged.
Will keep this comment open until I switch to a commit on main in this PR.
Is there a set of tests running on any multichip environment (ideally multi mmio chips), which I can run to increase confidence in this change? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
overall looks good to me and okay to merge once the tests list we discussed offline have parity with main
if (is_tg_cluster_) { | ||
num_host_mem_ch_per_mmio_device = HOST_MEM_CHANNELS; | ||
} | ||
uint32_t num_host_mem_ch_per_mmio_device = HOST_MEM_CHANNELS; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Lets go for min(HOST_MEM_CHANNELS, number_of_chips)
for now otherwise on some machines we might see the warning from umd that prints that X channels requested but only Y available
Currently tt_SiliconDevice accepts one map for everything. But that goes against our effort to separate cluster vs chip responsibilities, related to #157 This change includes: - Adding chip_id to setup_core_to_tlb_map. There should be a map per mmio chip - Started a chip api tests file - Added an api example/test on how TLB setup functions at the moment. This also does some minor configuration testing, whether it throws or not. - Minor cosmetic changes to other API tests Contributes to #154 since it adds more api tests. This change will require tt_metal changes. - Breaks existing usages of setup_core_to_tlb_map - tt_metal corresponding PR: tenstorrent/tt-metal#13949 - tt_debuda change: Not used
This attribute is tied to a chip. Although it could be argued that currently we only support a cluster of chips of same architecture, therefore this would be the same for all chips in a single cluster. Still, this makes sense and further unblocks supporting multiple architectures in a single cluster/driver. It will also allow it to be lowered to TTDevice class which will is arch specific (currently architecture_implementation). Related to #157 , contributes to #154 . - Breaks existing usages of get_pcie_base_addr_from_device - tt_metal corresponding PR: tenstorrent/tt-metal#13949 - tt_debuda change: Not used
Ticket
First step towards #13948
What's changed
Changing tt_cluster so that it uses a single tt_SiliconDevice (to be renamed to tt::umd::Cluster) for all connected chips.
In UMD, the following changes were made prior to this change:
Related UMD PRs
tenstorrent/tt-umd#166
tenstorrent/tt-umd#179
tenstorrent/tt-umd#165
tenstorrent/tt-umd#183
tenstorrent/tt-umd#131
Note that each of these UMD PRs are well scoped (as it, they have merit to be checked in by themselves), and they each slightly change the API, but are contributing to the overall effort of using a single tt_SiliconDevice, as is done in this PR.
Checklist
I've ran T3K and TGG workloads to make sure everything still works for multi chip environment:
https://github.com/tenstorrent/tt-metal/actions/runs/11398968066
https://github.com/tenstorrent/tt-metal/actions/runs/11398965728
T3K unit tests: https://github.com/tenstorrent/tt-metal/actions/runs/11441393489
T3K demo tests: https://github.com/tenstorrent/tt-metal/actions/runs/11441405195
TG unit tests: https://github.com/tenstorrent/tt-metal/actions/runs/11441415308
TG demos: https://github.com/tenstorrent/tt-metal/actions/runs/11441422195
TGG unit tests: https://github.com/tenstorrent/tt-metal/actions/runs/11441430017
TGG demos: https://github.com/tenstorrent/tt-metal/actions/runs/11441437612