Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Mellanox]Fix issue #2720 Not able to read out values of voltage/temp/power on some cables #2957

Merged
merged 3 commits into from
May 31, 2019
Merged

[Mellanox]Fix issue #2720 Not able to read out values of voltage/temp/power on some cables #2957

merged 3 commits into from
May 31, 2019

Conversation

stephenxs
Copy link
Collaborator

- What I did
Fix issue #2720 Not able to read out values of voltage/temp/power on some cables, which is due to dom reading error on Mellanox switches. Fix it by reading dom via ethtool rather than sysfs and limit all the modifications within Mellanox-specific code.

- How I did it
A new class based on SfpUtilBase and a new method _read_eeprom_specific_bytes_via_ethtool have been introduced in order to change the way the eeprom DOM data is read. Typically the best practice to do this kind of thing is to limit the modification within the function which executes reading operations only and keep other stuff (especially the interface) untouched. However, this can hardly be achieved since the original reading function takes the file object as an input parameter to represent the port. It is done by having the file object pointed to /var/run/hwmanagement files, which will not be maintained in the future.
As a result, the following reading interface has to be introduced with an explicit port number/name as an input parameter in order to get rid of the dependency on those files:
_read_eeprom_specific_bytes_via_ethtool
Since the interface changed, all methods that call the interface should also be overwritten in order to call the new interface, including:
_read_eeprom_devid
get_transceiver_info_dict
get_transceiver_dom_info_dict

Only interface used to read eeprom DOM has been replaced and the main logic has not been changed except the following mentioned:

  1. reading DOM data for sfp port, which is implementioned in get_transceiver_dom_info_dict. In this case a "calibration" should be firstly read from eeprom before other values like temperature, voltage, rx/tx power, can be parsed. However, this has been ignored in the original code, resulting in that the data can't be parsed.
  2. In the original implementation, the data area containing the data is read from DOM separately in order to avoid read unnecessary data and achieve better performance. Having used ethtool to read DOM data, the performance gap between reading all the area and reading the spot data separately has been narrowed to almost zero. To make the code neat and readable, we change the way of reading this data to reading the DOM data in a batch.
  3. In function get_transceiver_dom_info_dict, a default dict is initialized with all data set to 'N/A' and is returned when ethtool fails to read any dom data. This is because the ethtool returns None for ports without dom support, which results in None being returned by get_transceiver_dom_info_dict and thus failing xcvrd to add the TRANSCEIVER_DOM_SENSOR table entry of those ports to STATE_DB.

- How to verify it
1.execute the following command on both SFP and QSFP ports
sudo sfputil show eeprom -d -p Ethernetxxx
sudo sfputil show eeprom -p Ethernetxxx
2.make sure that TRANSCEIVER_INFO and TRANSCEIVER_DOM_SENSOR table in STADE_DB exist for all ports and contain correct data.

- Description for the changelog
Fix issue #2720 Not able to read out values of voltage/temp/power on some cables, which is due to dom reading error.

- A picture of a cute animal (not mandatory but encouraged)

Stephen Sun added 3 commits May 28, 2019 14:06
purpose and restrictions
1. reading eeprom via ethtool.
2. avoid changing common codes shared by all the manufacture (sonic-platform-common), contrain all the modifications with Mellanox-specific code.
current implementation
A new class based on SfpUtilBase and a new method _read_eeprom_specific_bytes_via_ethtool have been introduced in order to change the way the eprom DOM data is read. Typically the best practice to do this kind of thing is to contrain the modification within the function which execute reading operations only and keep other stuffs (especially the interface) untouched. However, this can hardly be achieved since the original reading function takes the file object as input parameter to represent the port. It is done by having the file object to point to /var/run/hwmanagement files, which will not be maintained in the future. As a result, a new interface has to be introduced with a port number/name as input parameter in order to get rid of the dependency on the those files:
_read_eeprom_specific_bytes_via_ethtool
Since the interface changed, all methods that call the interface should also be overwritten in order to call the new interface, including:
_read_eeprom_devid
get_transceiver_info_dict
get_transceiver_dom_info_dict
Only interface used to read eeprom DOM has been replaced and the main logic has not been changed except the following mentioned.
1. reading DOM data for sfp port, which is implementioned in get_transceiver_dom_info_dict. In this case a "calibration" should be firstly read from eeprom before other values like temperature, voltage, rx/tx power, can be parsed. However, this has been ignored in the original code, resulting in that the data cann't be parsed.
2. In the original implemention the data area containing the data are read from DOM separatedly in order to avoid read uncessary data and achieve a better performance. Having used ethtool to read DOM data, the performance gap between reading all the area and reading the spot data separatedly has been narrowed to almost zero. To make the code neat and readable, we change the way to read this data.
… support

Currently, the way in which dom data is read has been changed from using sysfs to using ethtool.
The ethtool returns None for ports without dom support, resulting in None being returned. However, this fails xcvrd to add the TRANSCEIVER_DOM_SENSOR table entry of associated port to CONFIG_DB and then causes SNMP fail.
To address this issue a default dict is initialized with all data set to 'N/A' and is returned is the above case.
BTW, in the original implementation which sysfs is used to read dom data, even though non-None data is returned for ports without dom support, it does not contain valid data. This can result in wrong data in TRANSCEIVER_DOM_SENSOR table.
removing unnecessary empty lines
removing redundent code
replacing hardcoding strings/numbers with predefined const variables
@lguohan lguohan merged commit cfa14ce into sonic-net:master May 31, 2019
@stephenxs stephenxs deleted the ddm-read-error-workaround branch June 1, 2019 02:29
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants