Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Rethink tt_SiliconDevice construction #118

Open
joelsmithTT opened this issue Oct 4, 2024 · 5 comments · Fixed by #131
Open

Rethink tt_SiliconDevice construction #118

joelsmithTT opened this issue Oct 4, 2024 · 5 comments · Fixed by #131
Labels
api redesign Related to the ongoing push for redesigning UMDs api

Comments

@joelsmithTT
Copy link
Contributor

tt_SiliconDevice construction is clumsy:

  • TLB windows should be an implementation detail to the greatest extent possible. Do not require the caller to specify the relationship between TLB index and strings used to name them.
  • Hugepages, number of host memory channels should be an implementation detail. What UMD users really care about is memory that is device-accessible.
  • Flags controlling whether to do allocations and cleanup at construction indicate a broken design (see Global mutex issues #72)
  • tt_SiliconDevice isn't necessarily usable after it is constructed: depending on what the caller is trying to do, there are other methods that need to be invoked.

Part of the problem is that tt_SiliconDevice is not a device abstraction, but has grown into a god object. At the device level, a SOC descriptor makes sense, but a topology descriptor does not. So some of these issues will go away as the design is stratified to be more hierarchical.

@broskoTT
Copy link
Contributor

broskoTT commented Oct 9, 2024

During the experimental cleanups I did in the code, I found that it looks much easier to do tt_SiliconDevice -> tt::umd::Cluster thansition, and then lower the stuff which is chip specific to Chip. So "At the device level, a SOC descriptor makes sense, but a topology descriptor does not" -> topology descriptor should be left in this class, and soc will be moved to new class. This is just a technical comment.

A comment on the constructor: The constructor of tt_SiliconDevice should have 0 mandatory arguments. The constructor of future tt::umd::Chip should have one argument, device_id.
Soc and topology descriptors should be fetched from umd, not given to.

@broskoTT broskoTT linked a pull request Oct 9, 2024 that will close this issue
@broskoTT
Copy link
Contributor

broskoTT commented Oct 9, 2024

  • tt_SiliconDevice isn't necessarily usable after it is constructed: depending on what the caller is trying to do, there are other methods that need to be invoked.

It is okay that something else has to be done as well. For example start/stop the device, and maybe some custom configuration prior to starting the device (maybe some custom tlb maps, this is up for discussion anyways, or setting some address parameters), which could even be modified in the same object after stopping/starting the device again. But I do agree that if there is a set of functions that are expected to be called each time after constructor, that should be part of constructor.

@broskoTT broskoTT added the api redesign Related to the ongoing push for redesigning UMDs api label Oct 9, 2024
@joelsmithTT
Copy link
Contributor Author

I found that it looks much easier to do tt_SiliconDevice -> tt::umd::Cluster thansition, and then lower the stuff which is chip specific to Chip

This is a good insight.

But I do agree that if there is a set of functions that are expected to be called each time after constructor, that should be part of constructor.

My view is that after the object is constructed, the user should be able to call its methods and expect them to work. If the object needs special treatment in between construction and being able to use it normally, then there is opportunity to change the design... a goal is to make objects difficult to use incorrectly.

@broskoTT
Copy link
Contributor

broskoTT commented Oct 9, 2024

If the object needs special treatment in between construction and being able to use it normally

I thought non-mandatory configuration, like something out of standard config. User can use the object normally and everything will work, but if they need something custom (like custom tlb maps), they should set them after construction.

@pjanevskiTT
Copy link
Contributor

PR #131 closed it, that wasn't the intention, reopening it to keep track of all things that are not solved yet

@pjanevskiTT pjanevskiTT reopened this Oct 17, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
api redesign Related to the ongoing push for redesigning UMDs api
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants