Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Issue]: Recent changes to 6.0.1-112 make it ABI incompatible to 6.0.0 #152

Open
bertwesarg opened this issue Jan 20, 2024 · 5 comments
Open

Comments

@bertwesarg
Copy link

bertwesarg commented Jan 20, 2024

Problem Description

6.0.1-112 includes this change to the rocm_smi.h header:

@@ -1091,6 +1099,19 @@
   uint16_t current_vclk0s[RSMI_MAX_NUM_CLKS];
   uint16_t current_dclk0s[RSMI_MAX_NUM_CLKS];
 
+  /*
+   * v1.5 additions
+   */
+  // JPEG activity percent (encode/decode)
+  uint16_t jpeg_activity[RSMI_MAX_NUM_JPEG_ENGS];
+
+  // PCIE NAK sent accumulated count
+  uint32_t pcie_nak_sent_count_acc;
+
+  // PCIE NAK received accumulated count
+  uint32_t pcie_nak_rcvd_count_acc;
+
+
   /// \endcond
 } rsmi_gpu_metrics_t;
 

but this change of the size of type rsmi_gpu_metrics_t breaks the ABI to 6.0.0. And currently both librocm_smi.so libs have the same interface version 6.

@oliveiradan
Copy link
Contributor

Problem Description

6.0.1-112 includes this change to the rocm_smi.h header:

@@ -1091,6 +1099,19 @@
   uint16_t current_vclk0s[RSMI_MAX_NUM_CLKS];
   uint16_t current_dclk0s[RSMI_MAX_NUM_CLKS];
 
+  /*
+   * v1.5 additions
+   */
+  // JPEG activity percent (encode/decode)
+  uint16_t jpeg_activity[RSMI_MAX_NUM_JPEG_ENGS];
+
+  // PCIE NAK sent accumulated count
+  uint32_t pcie_nak_sent_count_acc;
+
+  // PCIE NAK received accumulated count
+  uint32_t pcie_nak_rcvd_count_acc;
+
+
   /// \endcond
 } rsmi_gpu_metrics_t;
 

but this change of the size of type rsmi_gpu_metrics_t breaks the ABI to 6.0.0. And currently both librocm_smi.so libs have the same interface version 6.

Are you referring specifically to the fact the rsmi_gpu_metrics_t size changed? Or that the ABI versioning still needs to be updated to reflect the change? Like 6.0.1, for example?

@bertwesarg
Copy link
Author

Are you referring specifically to the fact the rsmi_gpu_metrics_t size changed?

Correct.

Like 6.0.1, for example?

This is a particular bad example in this case, as it does not tell the linker/runtime loader, that librocm_smi64.so.6.0.60001 is not compatible to librocm_smi64.so.6.0.60000. They both still have the same SONAME which is librocm_smi64.so.6, and binaries/libraries only record the SONAME in there dependencies (DT_NEEDED)

@oliveiradan
Copy link
Contributor

oliveiradan commented Jan 26, 2024

Are you referring specifically to the fact the rsmi_gpu_metrics_t size changed?

Correct.

Like 6.0.1, for example?

This is a particular bad example in this case, as it does not tell the linker/runtime loader, that librocm_smi64.so.6.0.60001 is not compatible to librocm_smi64.so.6.0.60000. They both still have the same SONAME which is librocm_smi64.so.6, and binaries/libraries only record the SONAME in there dependencies (DT_NEEDED)

That's a great point. We expect the rsmi_gpu_metric_t type size to change when new elements are added (which is happening more often due to requirements), but we need to change the SONAME to reflect the change and not break the existing ABI. As it follows the ROCm versioning, let me check on that so we can address it accordingly.

Question: Did it break your application? or are you just saying it should have a different version to identify the change?
Thank you for bringing that up!

@bertwesarg
Copy link
Author

I'm just reading the code. There are at least two possible ways I know to prevent changing the soname every time a struct changes its size:

  1. Use symbol versions for functions taking the struct as argument
  2. Let the caller pass the size of the struct to the functions

I think its too late for option 2., but 1. should still be able to add, but it wont avoid the increment in this bugfix

@ppanchad-amd
Copy link

@bertwesarg @oliveiradan Internal ticket has been created to fix this issue. Thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants