ZwaveJS Renames Devices on HA Restart #80398

Closed
dcmeglio opened this issue Oct 15, 2022 · 166 comments · Fixed by #98145

Comments

@dcmeglio
Contributor

The problem

When I restart HA, I have multiple devices that rename themselves back to default names and remove the Area. For example my "Master Bedroom Ceiling Lights" renames to "700 Series Toggle Dimmer"

What version of Home Assistant Core has the issue?

2022.10.4

What was the last working version of Home Assistant Core?

No response

What type of installation are you running?

Home Assistant OS

Integration causing the issue

zwave_js

Link to integration documentation on our website

https://www.home-assistant.io/integrations/zwave_js/

Diagnostics information

zwave_js-dfc70561bf763591e895062eb2bfda66-700 Toggle Dimmer-94311538a60068cd86453504b2c1913b.json.txt

Example YAML snippet

No response

Anything in the logs that might be useful for us?

Nothing seems interesting in the zwave js ui logs. Here are all references to the node since restarting HA:
C:\Users\dmegl\Downloads\z-ui_2022-10-15 (1).log
  1866,38: 2022-10-15 18:17:57.187 INFO Z-WAVE: Node 120: value updated: 114-0-manufacturerId 634 => 634
  1867,38: 2022-10-15 18:17:57.190 INFO Z-WAVE: Node 120: value updated: 114-0-productType 28672 => 28672
  1868,38: 2022-10-15 18:17:57.193 INFO Z-WAVE: Node 120: value updated: 114-0-productId 40964 => 40964
  1876,38: 2022-10-15 18:17:57.622 INFO Z-WAVE: Node 120: value updated: 134-0-libraryType 3 => 3
  1877,38: 2022-10-15 18:17:57.625 INFO Z-WAVE: Node 120: value updated: 134-0-protocolVersion 7.13 => 7.13
  1878,38: 2022-10-15 18:17:57.629 INFO Z-WAVE: Node 120: value updated: 134-0-firmwareVersions 10.0 => 10.0
  1879,38: 2022-10-15 18:17:57.632 INFO Z-WAVE: Node 120: value updated: 134-0-hardwareVersion 1 => 1
  2309,38: 2022-10-15 18:18:44.782 INFO Z-WAVE: Node 120: value updated: 114-0-manufacturerId 634 => 634
  2310,38: 2022-10-15 18:18:44.784 INFO Z-WAVE: Node 120: value updated: 114-0-productType 28672 => 28672
  2311,38: 2022-10-15 18:18:44.787 INFO Z-WAVE: Node 120: value updated: 114-0-productId 40964 => 40964
  2320,38: 2022-10-15 18:18:45.449 INFO Z-WAVE: Node 120: value updated: 134-0-libraryType 3 => 3
  2321,38: 2022-10-15 18:18:45.453 INFO Z-WAVE: Node 120: value updated: 134-0-protocolVersion 7.13 => 7.13
  2322,38: 2022-10-15 18:18:45.457 INFO Z-WAVE: Node 120: value updated: 134-0-firmwareVersions 10.0 => 10.0
  2323,38: 2022-10-15 18:18:45.460 INFO Z-WAVE: Node 120: value updated: 134-0-hardwareVersion 1 => 1

Additional information

Running zwave js ui

@home-assistant

Hey there @home-assistant/z-wave, mind taking a look at this issue as it has been labeled with an integration (zwave_js) you are listed as a code owner for? Thanks!
(message by CodeOwnersMention)


zwave_js documentation
zwave_js source
(message by IssueLinks)

@dcmeglio
Contributor Author

Ok it has something to do with Zwave JS UI overwriting values in HA. I went through and set names for every device in Zwave JS UI via the nodes.json and now only the area gets wiped, not the name. So for certain, just a few nodes, Zwave JS UI is overwriting whatever is set in HA with the default name.

@Chef-de-IT

I have the same issue as well. Opened related:

#80447 (comment)

@dcmeglio
Contributor Author

That does sound very similar. Are you seeing something in the logs that suggests a device is going out of range/reconnecting? I didn't spot anything. For reference I also have a fairly large zwave mesh, 109 devices. So that's another similarity

@Chef-de-IT

@dcmeglio I just assumed, based on the distances involved plus the fact that bluetooth and wifi range varies substantially over time in the interference-dense area

@dcmeglio
Contributor Author

@dcmeglio I just assumed, based on the distances involved plus the fact that bluetooth and wifi range varies substantially over time in the interference-dense area

Got it, it does seem reasonable and it's my suspicion too, I was just hoping you had some way to prove it.

@Chef-de-IT

Understood. I was just hoping that HomeAssistant would be smart enough to keep the device's hardware addr, name, and area in a database and if it went out of range and back in, find it in that there database - not be like "peekaboo, this right there is a brand new device now"

@dcmeglio
Contributor Author

Totally agree. I use Zwave JS UI. If I set the names in the addon it seems to remember those names and apply them in HA. That causes my entity names to at least not get screwed up when it does this.

@Chef-de-IT

Totally agree. I use Zwave JS UI. If I set the names in the addon it seems to remember those names and apply them in HA. That causes my entity names to at least not get screwed up when it does this.

picard_annoyed

How are those names not symmetrically synced in both directions?

@arigit
Contributor

arigit commented Oct 21, 2022

+1 seeing this issue. Currently on 2022.10.4 but I noticed it on 2022.9.x as well

@deadsquid

+1 seeing this in 2022.10.2.

@Chef-de-IT

Since I've started using Helpers to be able to control some of the 23 Z-Wave lights as groups in unison, I've also noticed that when a Z-Wave device loses its name, it also gets its entities renamed AND it sometimes gets booted from its groups. If groups include groups (e.g. a "Dimmers A" (side lighting) group includes FloorLamps and WallSconces groups along with some individual members) this makes managing scenes and controlling lights very clean and convenient. When it works.

It's pretty unfortunate a device can just up and lose its identity and get booted from the group. In one case it actually didn't get booted from its group but had its generic name in there, but when I edited its name in UI, the old entities hung around. This feels discouraging in terms of counting on HA for any 3rd party clients, but I'm hopeful the developers will see this & look into it. HA is a fantastic tool - I just wish it weren't moving quite so fast forward at the cost of such bugs. The entire Z-Wave mesh network control experience is unreliable and I wish HA were coded more defensively to deal with that inherent issue of many meshes.

Screenshot 2022-11-06 at 17-07-40 Group Settings – Home Assistant

Screenshot 2022-11-06 Group Members – Home Assistant

@Chef-de-IT

Here's an example of this having happened again yesterday. There are 4 identical floor lamps in the 4 corners of the restaurant dining room. They're grouped together to manage their brightness in unison, by means of a group (UI Settings - Devices - Helpers - Group - Light group).

I noticed that one of the lamps wasn't responding to the group's brightness changes:

Chef de IT Z-Wave Zooz client install pic1

Lo and behold, perhaps due to momentary RF interference in the area or another reason for that lamp's Z-Wave mesh node to disappear and reappear, Z-Wave added a misnamed entity, one that had no real device, instead of MDR.Dim.Floorlamp.4.

Screenshot_20221108-Z-Wave Device Rename Impacts Groups

Note that at that point the correct entity for the 4th lamp, MDR.Dim.Floorlamp.4 was also available on the Entity pull-down menu, and when swapped in, the entire group started working as it should. Until the next glitch, I surmise.

@deadsquid

Update for me, I completely pooched my JS device configs, and ended up restoring from a full backup and haven't experienced an issue since.

+1 seeing this in 2022.10.2.

@snarlingllama

I am having the same issue except I do not use the ZWAVE JS UI add-on; only the "regular" ZWAVE JS add-on (Current version: 0.1.74).

For the ZWAVE JS integration, I have:
Driver Version:
10.3.0
Server Version:
1.24.0

Also:
Home Assistant 2022.11.4
Supervisor 2022.10.2
Operating System 9.3
Frontend 20221108.0 - latest

I have 100 ZWAVE devices, so this is extremely distressing to experience. Basically, every time I restart HA, it randomly renames entities, adds the wrong entities to devices and marks them unavailable (for example, it added kWh to an Ecolink door sensor). It also resets all naming conventions for multisensors, smart switches, and door sensors without any consistency. It's rarely ever the same device that gets impacted.

Can anyone confirm if the solution posted by @deadsquid worked for them? Does "pooched my JS device configs" mean uninstalling the integration or something else?

Would greatly appreciate help with this issue as it's the only unstable aspect of HA for me.

@Chef-de-IT

Can anyone confirm if the solution posted by @deadsquid worked for them? Does "pooched my JS device configs" mean uninstalling the integration or something else?

@snarlingllama, unfortunately it didn't work for me. I've only a quarter of your Z-wave devices on the larger of my pilot installs (just 24) but already had an uncomfortably tense "conversation" with a client. We both prefer to keep decorum & not raise one's voice - but having added an RGB light ("Zooz Z-Wave Plus S2 12/24 V DC RGBW Dimmer ZEN31 for LED Strips and DC Lighting, Work as a Network Repeater") it had a ~1 sec delay responding to customer input for color change, and customer touching the HA color wheel control may have resulted in multiple "inputs" - which quickly queued up & for a good 30 sec the system had a mind of its own with light changes & frantic customer "corrective input" only being added to the back of the queue. The ordeal "resolved itself" when the phone was thrown across the room and the queue eventually cleared.

ALL of my Z-Wave devices are 700 series chip, Z-Wave Plus S2. Latest firmware, network healed.
Needless to say this makes it hard to be comfortable with this tech for anything beyond a handful of switches in a couple of adjacent rooms. Which is fine if all this is capable of - but I was going off a premise of 100+ device capability and 1-mile line of sight range (so say 1/4mile would be non-line-of-sight). Ugh. Sorry for the vent but it's context for how milliseconds & signal repeats add up to such real-life situations.

Screenshot_20221116_165848_Firefox
Screenshot_20221116_170156_Firefox

@snarlingllama

@Chef-de-IT Thank you for the response. Looking at your screenshots, I am having the exact same issues.

What is really strange: one of my Ecolink door sensors (which is installed about 6 feet from my Aeotec Z-Wave stick) suddenly changed its entity ID without any reboots, etc. Just out of the blue it changed the entity ID, and it was working fine moments before this happened (we have an automation that plays an audio file and sends a push notification to our phones when this door is opened).

Does anyone have any ideas, etc. of what could be happening here? I would imagine more than just 5+ people are experiencing this. There has to be some commonality here. Back in the 2022.8 days, none of this ever happened with my Z-Wave devices. It appears to be related to the recent releases.

What am I missing here?

@deadsquid

deadsquid commented Nov 24, 2022

Follow-up: I was overly optimistic and continue to see the metadata issue on add-on restart. It has happened with multiple devices from Jasco (wall outlet), Inovelli (switch, dimmer, and fan), and Zooz (scene controller), and the chronology of when the devices were added doesn't seem to matter. Metadata loss (the device remains in the controller) can occur on zwave restart, stop/start, or a HA reboot (all proper shutdowns). I'm happy to instrument the logs as needed, but haven't seen anything that stands out as to when a problem occurs. It's amazingly frustrating, and continues with the latest release.

@snarlingllama - pooched means I tried uninstalling and re-installing ZwaveJS, migrating over to ZwaveJS UI, and a couple other things, and made things considerably worse in the process so simply restored a known-good full backup (which is now a default process on any change or right before I want to restart things for any reason).

Current setup is HA OS on a Generic x86-64 (AcePC AK1 - Celeron), and I use a ZooZ S2 ZST 10 700 with the latest firmware (7.17.2) for the controller

Home Assistant 2022.11.4
Supervisor 2022.10.2
Operating System 9.3
Frontend 20221108.0 - latest

Happy to provide more info, and it's a pretty straight-forward setup. Diagnostic info from the integration attached.

config_entry-zwave_js-fd9de23798136f36e5543a532a561534.json.zip

@Chef-de-IT

Chef-de-IT commented Nov 24, 2022

Does anyone have any ideas, etc. of what could be happening here?

@snarlingllama if I had to guess, it could be a naively coded new device discovery mechanism. A Z-Wave mesh node disappears and then reappears, mesh says, here's a new node, its hardware address isn't first checked against the database of already configured device metadata keyed off their hardware addresses but an "add new device" routine is called, that overwrites any "already taken hardware address" with "new device blank defaults". I doubt it's exactly that in terms of the actual moving pieces, as I didn't look at the code, but it's perhaps something amounting to this.

I would imagine more than just 5+ people are experiencing this.

It's possible for developers to conflate 5 people (only) commenting with just 5 people having an issue, although for every one of us there are perhaps 200 who don't know what github is, 200 who know but can't be bothered, 200 who have the issue and don't realize it, 100 who just give up and 10,000 who have all their two z-wave switches (in plastic electrical boxes not metal) and a fish feeder running perfectly fine all situated within 36 inches of the HA computer in their basement in rural Iowa, 76 lightyears away from the nearest source of radio interference.

@blhoward2

blhoward2 commented Nov 25, 2022

@snarlingllama if I had to guess, it could be a naively coded new device discovery mechanism. A Z-Wave mesh node disappears and then reappears, mesh says, here's a new node, its hardware address isn't first checked against the database of already configured device metadata keyed off their hardware addresses but an "add new device" routine is called, that overwrites any "already taken hardware address" with "new device blank defaults". I doubt it's exactly that in terms of the actual moving pieces, as I didn't look at the code, but it's perhaps something amounting to this.

This isn’t how device discovery works. The same device should be detected every time, even if you nuke the cache. That’s how switching between the addons works without causing a rename.

@blhoward2

Ok it has something to do with Zwave JS UI overwriting values in HA. I went through and set names for every device in Zwave JS UI via the nodes.json and now only the area gets wiped, not the name. So for certain, just a few nodes, Zwave JS UI is overwriting whatever is set in HA with the default name.

To be clear, this isn’t possible. The zwavejs integration may have a bug causing this but Z-Wave JS UI has no ability to change anything in HA. It just pipes events through the websocket where they’re consumed on the HA side.

@blhoward2

blhoward2 commented Nov 25, 2022

All: I recognize that this is an intermittent issue but no one can begin to solve it without logs. Turn on log to file in zwavejs-ui, at the debug level or higher, and be vigilant for the change. (Note that logging is set in two places. You want under Z-Wave.) It’s going to be difficult to pin down where exactly in the log this is happening but that will have to happen. Node exports from before and after will also be crucial. Describing the symptoms over and over again doesn’t help diagnose the issue.

@Chef-de-IT

All: I recognize that this is an intermittent issue but no one can begin to solve it without logs. Turn on log to file in zwavejs-ui, at the debug level or higher, and be vigilant for the change. (Note that logging is set in two places. You want under Z-Wave.) It’s going to be difficult to pin down where exactly in the log this is happening but that will have to happen. Node exports from before and after will also be crucial. Describing the symptoms over and over again doesn’t help diagnose the issue.

I'm happy to do my part - please provide HomeAssistant screenshots on which logs you want enabled and what settings in HA UI I should pick. The issue is happening every other day, so there should be no problem getting the logs for it. I've a HomeAssistant with the Z-Wave JS (not JS-UI) integration. Please provide the precise steps for the logs you would like.

@blhoward2

blhoward2 commented Nov 25, 2022

You cannot log to file in the vanilla addon (to my knowledge) so you’re unable to help here. I don’t run addons so I could be wrong.

@deadsquid

I’ll see if I can pull logs.

@Chef-de-IT

In HA UI > Settings > Add-ons > Z-Wave JS > Settings, I have the log level set to Debug. In HA UI > Settings > Add-ons > Z-Wave JS > Log, I have the following:
Screenshot Home Assistant Z-Wave Logs

There's a fair bit of:

  • Dropping message with invalid payload
  • error: Duplicate command
  • [Security2CCMessageEncapsulation] [INVALID]

@dcmeglio
Contributor Author

I posted logs with the original report and also confirmed that if you set the name and area in the nodes.json it works around the issue. Nothing in the logs seems to provide any insight to me and no one followed up with any questions on the logs I provided.

@Chef-de-IT

2022-11-24T23:47:15.230Z SERIAL « 0x012700a814013....more-hex-string...d8dbe (41 bytes)
6ff645b4d1400d37e
2022-11-24T23:47:15.231Z SERIAL » [ACK] (0x06)
2022-11-24T23:47:15.233Z CNTRLR [Node 053] [] [Meter] value[65537]: 0.18 => 0.18 [Endpoint 1]
2022-11-24T23:47:15.233Z DRIVER « [Node 053] [REQ] [BridgeApplicationCommand]
│ type: broadcast
│ target node: 255
│ RSSI: -45 dBm
└─[Security2CCMessageEncapsulation]
│ sequence number: 34
└─[MultiChannelCCCommandEncapsulation]
│ source: 1
│ destination: 0
└─[MeterCCReport]
type: Electric
scale: kWh
rate type: Consumed
value: 0.18
time delta: 0 seconds
2022-11-24T23:47:16.098Z SERIAL « 0x012700a814....more-hex-string...a1d8dbe (41 bytes)
6ff645b4d1400d37e
2022-11-24T23:47:16.099Z DRIVER Dropping message with invalid payload
2022-11-24T23:47:16.099Z DRIVER « [Node 053] [REQ] [BridgeApplicationCommand]
│ type: broadcast
│ target node: 255
│ RSSI: -45 dBm
└─[Security2CCMessageEncapsulation] [INVALID]
error: Duplicate command
2022-11-24T23:47:16.100Z SERIAL » [ACK] (0x06)
Starting logging event forwarder at debug level
Stopping logging event forwarder
2022-11-25T00:27:46.788Z SERIAL « 0x012300a80....more-hex-string...499383d56505 (37 bytes)
f9700a71e
2022-11-25T00:27:46.790Z SERIAL » [ACK] (0x06)
2022-11-25T00:27:46.792Z CNTRLR [Node 053] [] [Meter] value[66049]: 2.8 => 2.6 [Endpoint 1]
2022-11-25T00:27:46.793Z DRIVER « [Node 053] [REQ] [BridgeApplicationCommand]
│ RSSI: -89 dBm
└─[Security2CCMessageEncapsulation]
│ sequence number: 35
└─[MultiChannelCCCommandEncapsulation]
│ source: 1
│ destination: 0
└─[MeterCCReport]
type: Electric
scale: W
rate type: Consumed
value: 2.6
time delta: 0 seconds
2022-11-25T00:27:46.882Z SERIAL « 0x012300a800013....more-hex-string...8499383d56505 (37 bytes)
f9700a41d
2022-11-25T00:27:46.883Z DRIVER Dropping message with invalid payload
2022-11-25T00:27:46.884Z DRIVER « [Node 053] [REQ] [BridgeApplicationCommand]
│ RSSI: -92 dBm
└─[Security2CCMessageEncapsulation] [INVALID]
error: Duplicate command
2022-11-25T00:27:46.884Z SERIAL » [ACK] (0x06)
2022-11-25T00:27:47.731Z SERIAL « 0x012300a800....more-hex-string...99383d56505 (37 bytes)
f9700a61f
2022-11-25T00:27:47.732Z DRIVER Dropping message with invalid payload
2022-11-25T00:27:47.732Z DRIVER « [Node 053] [REQ] [BridgeApplicationCommand]
│ RSSI: -90 dBm
└─[Security2CCMessageEncapsulation] [INVALID]
error: Duplicate command
2022-11-25T00:27:47.733Z SERIAL » [ACK] (0x06)
2022-11-25T00:27:47.821Z SERIAL « 0x012300a8000....more-hex-string...9383d56505 (37 bytes)
f9700a61f
2022-11-25T00:27:47.822Z DRIVER Dropping message with invalid payload
2022-11-25T00:27:47.822Z DRIVER « [Node 053] [REQ] [BridgeApplicationCommand]
│ RSSI: -90 dBm
└─[Security2CCMessageEncapsulation] [INVALID]
error: Duplicate command
2022-11-25T00:27:47.822Z SERIAL » [ACK] (0x06)
2022-11-25T00:27:48.048Z SERIAL « 0x012300a814....more-hex-string...499383d56505 (37 bytes)
f9700d37e
2022-11-25T00:27:48.049Z DRIVER Dropping message with invalid payload
2022-11-25T00:27:48.050Z DRIVER « [Node 053] [REQ] [BridgeApplicationCommand]
│ type: broadcast
│ target node: 255
│ RSSI: -45 dBm
└─[Security2CCMessageEncapsulation] [INVALID]
error: Duplicate command
2022-11-25T00:27:48.050Z SERIAL » [ACK] (0x06)
2022-11-25T00:27:48.598Z SERIAL « 0x012300a8140....more-hex-string...83d56505 (37 bytes)
f9700e24f
2022-11-25T00:27:48.599Z DRIVER Dropping message with invalid payload
2022-11-25T00:27:48.599Z DRIVER « [Node 053] [REQ] [BridgeApplicationCommand]
│ type: broadcast
│ target node: 255
│ RSSI: -30 dBm
└─[Security2CCMessageEncapsulation] [INVALID]
error: Duplicate command
2022-11-25T00:27:48.600Z SERIAL » [ACK] (0x06)

@blhoward2

In HA UI > Settings > Add-ons > Z-Wave JS > Settings, I have the log level set to Debug. In HA UI > Settings > Add-ons > Z-Wave JS > Log, I have the following: Screenshot Home Assistant Z-Wave Logs

There's a fair bit of:

  • Dropping message with invalid payload
  • error: Duplicate command
  • [Security2CCMessageEncapsulation] [INVALID]

This is unrelated. It suggests a poor mesh or possibly a misbehaving device.

@Chef-de-IT

I quit custom naming the entities and took the default name at time of node inclusion, not ideal but no longer a problem.

Same here - which is pretty terrible for maintenance but I group them in light groups. It partly side-steps the problem.

@Chef-de-IT

Chef-de-IT commented Jul 31, 2023

For the last time, DSK isn't taken into account. Changes in manufacturer ID, product type, product ID trigger a new device in HA. In the log file just posted, the manufacturer ID is reported by the device as having changed. That shouldn't happen.

Uh, your reply just gave me an idea re what might be happening. When devices are added to Z-wave, occasionally I witnessed something going "sideways" in the device interview "handshake" and it either doesn't establish the S2 connection (unlike five of its identical peers just added), or more rarely doesn't get other device properties. Specifically including the Manufacturer! And my using a re-interview feature in the UI sometimes fixes this; other times, deleting and re-adding a device does. So, if an old device comes back & is re-detected by Z-Wave but the same thing happens where it doesn't get all the properties, we may have a resulting HA device ID that differs from the old one - and hence renaming.

If so it'd be a "design bug" in my mind to ever compute a device ID from device properties that can ever be affected by such a variable "handshake" instead of something inherently durable and immutable, like DSK.
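To make the quoted point concrete, here is a minimal sketch of why an identifier computed from the manufacturer ID, product type, and product ID changes whenever any one of those values is reported differently. The `NodeInfo` class, the `device_key` function, the key format, and the `home_id` value are all hypothetical and exist only for illustration; this is not the actual zwave_js integration code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NodeInfo:
    """Hypothetical snapshot of what a node reports about itself."""

    home_id: int
    node_id: int
    manufacturer_id: int
    product_type: int
    product_id: int


def device_key(node: NodeInfo) -> str:
    """Build an illustrative identifier that includes the manufacturer data.

    Assumption: the format is invented for this example. The only property
    that matters is that all three manufacturer values feed into the key.
    """
    return (
        f"{node.home_id}-{node.node_id}-"
        f"{node.manufacturer_id}:{node.product_type}:{node.product_id}"
    )


# Manufacturer values taken from the node 120 log in the original report.
good = NodeInfo(home_id=1, node_id=120, manufacturer_id=634,
                product_type=28672, product_id=40964)
# Same physical node, but one value came back wrong (hypothetical corruption).
bad = NodeInfo(home_id=1, node_id=120, manufacturer_id=0,
               product_type=28672, product_id=40964)

print(device_key(good))
print(device_key(bad))
print("treated as same device:", device_key(good) == device_key(bad))
```

Under this assumption, the node ID never changes, yet the two keys differ, so anything keyed on the full string would see a "new" device and lose the custom name and area attached to the old one.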

@blhoward2

For the last time, DSK cannot be used. Not all devices have DSKs and we have to be backwards compatible. You don't understand what is going on. S2 prevents corruption over the air (or rather allows it to be detected), so that doesn't appear to be what is going on. It would help if everyone started listing what devices they see renamed (manufacturer, model, and fw version), and whether they are included with S2.

@Chef-de-IT

Chef-de-IT commented Jul 31, 2023

You don't understand what is going on. S2 prevents corruption over the air (or rather allows it to be detected), so that doesn't appear to be what is going on. It would help if everyone started listing what devices they see renamed (manufacturer, model, and fw version), and whether they are included with S2.

It's not about S2 per se - it's just one of the kinds of variations I've witnessed when a new device was being added - where one or a couple of identical S2-capable devices out of many would inexplicably be added as non-S2.

Another variation on new device add which I witnessed on several installs (including the small ones) is where HA wouldn't get many of the device properties including the manufacturer name (and I assume the manufacturer ID to go with it). So it's not hard to fathom that if a device is instead re-discovered and the same happens, it'd trip up the HA logic you mentioned: "it thought the node was replaced".

For the last time, DSK cannot be used. Not all devices have DSKs and we have to be backwards compatible.

I understand that not all devices of all kinds have unique device hardware IDs. AFAIK all Z-Wave devices, by design, have such unique device hardware IDs, being their DSKs. All ZigBee devices have a Zigbee ID - a 16 hexadecimal digit (8 byte) string that is unique only to that ZigBee device, assigned by its manufacturer. All Insteon devices have a code (6 hexadecimal digits). All TCP/IP, UDP (Ethernet and Wi-Fi) devices have a MAC address. I'd be hard-pressed to find an example of a mainstream device type that lacks a unique & durable device address or hardware ID. Perhaps some exist - but IMHO only for those devices the approach you described ...

Any change to the manufacturer ID, product type, or product ID causes a change as they're all included in the calculation of the ID

... makes good sense - whereas to NOT use a durable, immutable, manufacturer-assigned unique device ID/address/DSK by default for the majority of the devices where it's available - and instead compute a synthetic HA device ID from scratch from properties that a bad mesh handshake might not furnish - that just sounds like a less than robust architectural decision to me - a lower-than-necessary common denominator given that an "if" statement is a thing.

@Chef-de-IT

It would help if everyone started listing what devices they see renamed (manufacturer, model, and fw version), and whether they are included with S2.

ABSOLUTELY! Here you go:

  • Minoston MP21ZD, Firmware: 2.0.1. These have had the issue quite a bit.
  • Zooz ZEN77, Firmware: 10.30.2. These have had the issue very occasionally.
  • Leviton ZW4SF, Firmware: 1.8.1. This had the issue once.

All these are 700 series chipset (as is the rest of the network) and are S2-capable and are showing S2 Authenticated as of this writing. Only those of the above mentioned devices that were quite remote from the USB stick and on a large-ish mesh, ever experienced being spontaneously renamed as per this issue. The same mesh contained multiple other devices of that same exact type / firmware / everything, that were closer by to the USB controller stick on that same network and those never ever had the renaming issue. In case you need for context, the controller is Zooz 700 Series, Firmware: 7.17.2.

@raman325
Contributor

raman325 commented Aug 1, 2023

Preface

I am going to attempt to jump in here because I can appreciate the annoyance of this issue and I'd like to help get it resolved, but to be frank, the majority of the conversation in this thread is unhelpful, and these long back and forths are making it hard to track the conversation about the issue itself. To the participants in this thread who are facing this issue, respectfully, please stop trying to come up with a solution to a problem you clearly don't understand. I realize that you may just be trying to be helpful; while I can appreciate the thought, it is having the opposite effect.

@Chef-de-IT @blhoward2 has said it multiple times in this thread so I will only say it one final time: we CANNOT use DSK as an identifier because (a) the DSK is not available to be retrieved by applications integrating with zwave-js, and even if it was (b) not all zwave devices have one since it is an S2 concept and many devices still only have a max of S0 or no security support.

Conclusions/decisions

Right now the identifiers we use (manufacturer ID, product ID, etc.) are the best identifiers we can find that are available for all Z-Wave devices. Any conversations that discuss alternate approaches for device IDs can be had in the Discord devs channel but it's just noise in this thread.

How to Help

We need a combination of debug driver logs and debug HA integration logs to troubleshoot this issue and to identify possible solutions. Instructions on how to set logging for both sides are well documented (in the Z-Wave JS documentation as well as the HA documentation), and multiple people have shared those instructions in this thread. If you need help configuring things, instead of commenting in this issue, please visit the Discord #zwave or #integrations channels to get support.

Things we are trying to find in the logs:

  • When the manufacturer data goes from good to corrupt (e.g. known manufacturer ID to unknown one)
  • When the manufacturer data goes from corrupt to good (e.g. unknown manufacturer ID to known one)
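For anyone sifting through large log files for these two transitions, the following is a minimal sketch that flags lines where a Manufacturer Specific value (command class 114) actually changes. It assumes log lines shaped like the Z-Wave JS UI excerpt in the original report (`Node 120: value updated: 114-0-manufacturerId 634 => 634`); driver logs use a different layout, so the regular expression would need adapting. The function name and the second sample line are invented for illustration.

```python
import re
from typing import Iterable, Iterator, NamedTuple

# Assumption: lines look like the Z-Wave JS UI excerpt in the original report.
# Command class 114 is Manufacturer Specific.
PATTERN = re.compile(
    r"Node (?P<node>\d+): value updated: 114-0-"
    r"(?P<prop>manufacturerId|productType|productId) "
    r"(?P<old>\S+) => (?P<new>\S+)"
)


class Change(NamedTuple):
    node: int
    prop: str
    old: str
    new: str
    line: str


def find_mfg_changes(lines: Iterable[str]) -> Iterator[Change]:
    """Yield only the updates where the old and new values differ."""
    for line in lines:
        match = PATTERN.search(line)
        if match and match["old"] != match["new"]:
            yield Change(
                int(match["node"]),
                match["prop"],
                match["old"],
                match["new"],
                line.rstrip(),
            )


sample = [
    # Real line from the original report: value unchanged, so it is ignored.
    "2022-10-15 18:17:57.187 INFO Z-WAVE: Node 120: value updated: "
    "114-0-manufacturerId 634 => 634",
    # Hypothetical corrupted report, invented for this example.
    "2022-10-15 18:18:44.782 INFO Z-WAVE: Node 120: value updated: "
    "114-0-manufacturerId 634 => 0",
]

changes = list(find_mfg_changes(sample))
for change in changes:
    print(f"node {change.node}: {change.prop} {change.old} -> {change.new}")
```

To scan a real file, pass an open file object instead of `sample`, for example `find_mfg_changes(open("z-ui.log", encoding="utf-8"))`, where the file name is a placeholder.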

Housekeeping

Going forward for any new comments, if they are not providing the requested information, they will be hidden and marked as off topic to avoid polluting this thread even further.

@kpine
Contributor

kpine commented Aug 1, 2023

If you are looking for a workaround for now, try disabling the update entities. Those are the entities that trigger queries for manufacturer data on a scheduled interval. Disabling the entities will prevent the queries for the mfg data. Since the corrupted mfg data is likely the problem (as these new logs allude to), then stopping the queries avoids the problem. The only other actions that would trigger mfg data queries are: re-interviews, clicking on "OTA Updates" in Z-Wave JS UI, or manually refreshing the CC data with driver code.

If you are interested in helping solve the issue, then attempting to reproduce it with all the logs asked for would be welcome. The update entities on their own don't update that frequently; you can try to accelerate the issue by:

  1. Going to the "OTA Updates" device tab in ZUI and repeatedly clicking "CHECK UPDATES", if you use ZUI.
  2. Or manually refreshing the mfg data with a service call in HA (be sure to select a target):
     service: zwave_js.invoke_cc_api
     data:
       command_class: "114"
       method_name: get
       parameters: []
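If repeating that service call by hand gets tedious, it can also be issued from a script through Home Assistant's REST API (`POST /api/services/<domain>/<service>` with a long-lived access token). The sketch below only builds the request and does not send it. The host name, token, and entity ID are placeholders, the helper name is made up, and passing the target as a top-level `entity_id` is an assumption to check against your own setup.

```python
import json
import urllib.request


def build_invoke_cc_api_request(
    base_url: str, token: str, entity_id: str
) -> urllib.request.Request:
    """Build (but do not send) a zwave_js.invoke_cc_api service call."""
    payload = {
        # Assumption: the target is given as a plain entity_id field.
        "entity_id": entity_id,
        # Same data as the YAML service call above (114 = Manufacturer Specific).
        "command_class": "114",
        "method_name": "get",
        "parameters": [],
    }
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/api/services/zwave_js/invoke_cc_api",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


# Placeholder values; replace with your own instance, token, and entity.
request = build_invoke_cc_api_request(
    "http://homeassistant.local:8123",
    "YOUR_LONG_LIVED_TOKEN",
    "light.example_dimmer",
)
print(request.get_method(), request.full_url)
print(request.data.decode("utf-8"))
# To actually send it: urllib.request.urlopen(request)
```

Keeping the send step separate makes it easy to inspect the payload first and to wrap the call in a loop with a delay while watching the driver debug logs.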

If you see one of the mfg data properties change value in the driver logs (of course, make sure driver Debug logs are enabled), then restarting HA or reloading the integration should result in the device removal. This can be demonstrated by enabling the integration debug logs prior to restarting HA or reloading the integration.

We want to see driver debug logs showing the mfg data going from good to bad, as that's what would trigger the device removal. So far the only information is going from bad to good. Having this data might help understand what the underlying issue is.

@arigit
Contributor

arigit commented Aug 1, 2023

For what it's worth, I was affected by this problem randomly once or twice a month for a long time (I have some 40-50 devices, including Zooz light switches and flood sensors and relays, Aeotec presence sensors etc, mostly 700-series). I had noticed that when the device renaming happened, it was causing the HA device names and entity IDs of the impacted devices to be "reset" to the default node names that ZwaveJS gave the devices (in the ZwaveJS UI, I had always left whatever ZwaveJS node names were assigned by default without changing them, and I only changed the Device / EntityID names inside HA).

So what I tried 7-8 months ago was to use the ZwaveJS UI to set the node names of all of my devices to make them match whatever name I was using in HA for them.

After I did that, I never again ran into renaming issues. In the interim I added a few new devices (following the same principle, setting the same device name in ZwaveJS UI as used inside HA) and even migrated from a 400-series controller to a new 700-series controller. Touch wood, it's been some 8 months since I last had this issue.

@TarheelGrad1998

Maybe it would help if someone could clarify exactly what logs you're looking for, instead of asking us to figure it out for ourselves? I provided all the logs I know about except for HA logs. I've searched "driver debug logs" or "debug driver logs" and that's not in the documentation that I can find.

What is needed from the HA side? This?

logger:
  default: warn
  logs:
    homeassistant.components.zwave_js: debug

@kpine
Contributor

kpine commented Aug 1, 2023

Maybe it would help if someone could clarify exactly what logs you're looking for, instead of asking us to figure it out for ourselves?

We did that already, but the comments have been hidden because of all the nonsense.

Integration debug logs and diagnostics: #80398 (comment)

logger:
  default: info
  logs:
    zwave_js_server: debug
    homeassistant.components.zwave_js: debug

If you're taking the time to capture the debug logs, you might as well include the server ones too so the whole picture is available. However, note that if you have devices like door locks, the user PINs might be revealed in the logs. If that's the case, either disable the server logs or ask to send the logs privately. With the server logs disabled I don't think that information is included, but check anyway.

Driver debug logs from ZUI: #80398 (comment)

https://zwave-js.github.io/zwave-js-ui/#/troubleshooting/generating-logs?id=driver-logs

@TarheelGrad1998

Not everybody has been on this thread the whole time. Thanks.

I have that all enabled now, and you guys saw the mfg update from my log yesterday. I just rebooted and....nothing. No change at all. Proverbial watched pot...

I'll keep it running like this for now and if it happens again soon will post. If it happens again in 2 months...who knows. My device diagnostics and driver debug from yesterday are above.

@kpine
Contributor

kpine commented Aug 1, 2023

I have that all enabled now, and you guys saw the mfg update from my log yesterday. I just rebooted and....nothing. No change at all. Proverbial watched pot...

Your log only shows the manufacturer ID going from a corrupted value to a good value.

2023-07-31T19:30:23.470Z CNTRLR   [Node 043] [~] [Manufacturer Specific] manufacturerId: 1549 => 63 [Endpoint 0]
                                  4

We don't see the matching change from 634 to 1549, which is the event of most interest. The ID 1549 is not a real ID. 634 is Zooz.

Also, if you did not restart HA in-between any of these transitions (good to corrupt, corrupt to good), you won't see any device changes. HA only acts when it sees a difference on startup. Are you sure your recent startup was from the state where the ID was 1549? Do you have the debug logs for that?

You can try the steps I posted above to try and reproduce it more quickly. As yours is a battery device, it's also limited by the wake up interval, so you will only see a difference depending on how often it wakes up.

EDIT: thinking some more, I don't think the device diagnostic is always an accurate representation of the HA device state. The device identifiers are more accurate. Those can be extracted from the Dev Tools Template editor.

Template:

{{ device_attr(device_id("light.dining_room"), "identifiers") }}

Produces:

{('zwave_js', '3611217407-2-99:18756:12344'), ('zwave_js', '3611217407-2')}

Or you can find them in /config/.storage/core.device_registry.

        "identifiers": [
          [
            "zwave_js",
            "3611217407-2-99:18756:12344"
          ],
          [
            "zwave_js",
            "3611217407-2"
          ]
        ],
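To compare identifiers across restarts, it can help to split them into their parts. The following is a minimal sketch, assuming the layout `<home id>-<node id>[-<manufacturer id>:<product type>:<product id>]`. That layout is inferred from the example strings in this thread, not taken from the integration source, and `parse_identifier` is a hypothetical helper name.

```python
import json

# Excerpt shaped like the "identifiers" entry shown above.
registry_excerpt = json.loads("""
{
  "identifiers": [
    ["zwave_js", "3611217407-2-99:18756:12344"],
    ["zwave_js", "3611217407-2"]
  ]
}
""")


def parse_identifier(identifier):
    """Split an identifier string into its numeric parts (layout is an assumption)."""
    parts = identifier.split("-")
    parsed = {"home_id": int(parts[0]), "node_id": int(parts[1])}
    if len(parts) == 3:
        # Third segment carries manufacturer id, product type and product id.
        manufacturer_id, product_type, product_id = (int(p) for p in parts[2].split(":"))
        parsed.update(
            manufacturer_id=manufacturer_id,
            product_type=product_type,
            product_id=product_id,
        )
    return parsed


for domain, identifier in registry_excerpt["identifiers"]:
    if domain == "zwave_js":
        print(parse_identifier(identifier))
```

The longer identifier is the one that embeds the manufacturer data, so it is the one to watch when checking whether those values have changed.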

Please confirm what you currently see.

@TarheelGrad1998

I have no idea what states the IDs were in. How would I know that? What I know is everything was fine Sunday night, and I rebooted Monday morning around 09:00 and the device was renamed. So I sent the logs from Monday. That particular entry was from 19:30 so was well after the renaming reboot, as I said before. I can't remember the time previously that I rebooted, probably last week sometime.

I did try your service call from before this reboot, but like I said, I don't think anything untoward happened. I'm hedging on that because I just noticed I now have 3 binary_sensor entities for this one device, the original name (becca_door) as well as becca_bed_door_sensor_window_door_is_open and becca_bed_door_sensor_window_door_is_open_2. The original name doesn't show on the device page, so I'm not sure it is working correctly anymore. :(

Here are all the relevant logs from this restart. fwiw.
device_diag_before.txt
zwavejs_2023-08-01.log
device_diag_after.txt
home-assistant.log

The device identifiers are:

        "identifiers": [
          [
            "zwave_js",
            "3571274682-43"
          ],
          [
            "zwave_js",
            "3571274682-43-634:28672:57345"
          ]
        ],

@TarheelGrad1998

So I think the "old name" sensor I was seeing was from my cache, as it is indeed now gone with a refresh and I'm left with the other two. So I believe I may have captured an event in the above logs.

@TarheelGrad1998

It happened to me again this morning as I was restarting in order to update from 2023.7 to 2023.8.2, on the same device, around 13:15. Here are the requested details.

    "identifiers": [
      [
        "zwave_js",
        "3571274682-43-516:3072:57345"
      ],
      [
        "zwave_js",
        "3571274682-43"
      ]
    ],

device_diagnistics_after.txt
home-assistant.log
zwavejs_2023-08-18.log

@AlCalzone
Contributor

@TarheelGrad1998 can you share the zwavejs_... logs from the previous days too? Looks like the change happened before the one you posted.

@AlCalzone
Contributor

Ok, found it in the log from the 17th.

The device wakes up:

2023-08-18T03:11:53.101Z DRIVER « [Node 043] [REQ] [ApplicationCommand]
                                  └─[WakeUpCCWakeUpNotification]

Z-Wave JS requests the IDs, to which the device first responds quickly:

2023-08-18T03:11:53.122Z DRIVER » [Node 043] [REQ] [SendData]
                                  │ transmit options: 0x25
                                  │ callback id:      57
                                  └─[ManufacturerSpecificCCGet]
...
2023-08-18T03:11:53.740Z DRIVER « [Node 043] [REQ] [ApplicationCommand]
                                  └─[ManufacturerSpecificCCReport]
                                      manufacturer id: 0x027a
                                      product type:    0x7000
                                      product id:      0xe001

Then Z-Wave JS requests the fw version - this request is not acknowledged after 12 seconds:

2023-08-18T03:11:53.751Z DRIVER » [Node 043] [REQ] [SendData]
                                  │ transmit options: 0x25
                                  │ callback id:      58
                                  └─[VersionCCGet]
...
2023-08-18T03:12:05.223Z DRIVER « [REQ] [SendData]
                                    callback id:     58
                                    transmit status: NoAck

The request is retried a few times, unsuccessfully:

2023-08-18T03:12:24.858Z DRIVER « [REQ] [SendData]
                                    callback id:     58
                                    transmit status: NoAck
2023-08-18T03:12:24.892Z CNTRLR   [Node 043] The node did not respond after 3 attempts. It is probably asleep, m
                                  oving its messages to the wakeup queue.

but then 2 seconds after this the first report is suddenly received again, this time corrupted:

2023-08-18T03:12:26.693Z DRIVER « [Node 043] [REQ] [ApplicationCommand]
                                  └─[ManufacturerSpecificCCReport]
                                      manufacturer id: 0x0204
                                      product type:    0x0c00
                                      product id:      0xe001

immediately followed by the firmware version response:

2023-08-18T03:12:26.771Z DRIVER « [Node 043] [REQ] [ApplicationCommand]
                                  └─[VersionCCReport]
                                      library type:      Enhanced Slave
                                      protocol version:  7.13
                                      firmware versions: 1.30
                                      hardware version:  1

There's nothing obvious in the logs that explains this corruption.
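The hex values in the two reports line up with the decimal identifiers posted earlier in the thread. A short worked example, assuming the identifier layout `<home id>-<node id>-<manufacturer id>:<product type>:<product id>` (inferred from those posted strings, not from the integration source); `identifier_from_report` is a hypothetical helper name:

```python
HOME_ID = 3571274682  # home id as it appears in the identifiers posted in this thread
NODE_ID = 43          # Node 043 in the driver log


def identifier_from_report(manufacturer_id, product_type, product_id):
    """Build the identifier string from a Manufacturer Specific report (assumed layout)."""
    return f"{HOME_ID}-{NODE_ID}-{manufacturer_id}:{product_type}:{product_id}"


# First report at 03:11:53.740Z (good values).
good = identifier_from_report(0x027A, 0x7000, 0xE001)
# Repeated report at 03:12:26.693Z (corrupted values).
corrupted = identifier_from_report(0x0204, 0x0C00, 0xE001)

print(good)       # 3571274682-43-634:28672:57345
print(corrupted)  # 3571274682-43-516:3072:57345
```

The first string matches the identifier posted while the device was healthy, and the second matches the one posted after the rename on the 18th, which is consistent with the corrupted report being what changed the device identity.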

Lots of meter reports without changes, which I recommend optimizing if possible, but those aren't causing that much traffic at the moment it goes wrong.
...unless there's a lot more traffic behind the scenes that never reaches Z-Wave JS because it is too corrupted. Unfortunately, only a Zniffer will see this.

I think the best way forward is twofold:

  1. To reduce the likelihood of this happening, the driver should validate the IDs just before applying the update, not when checking.
  2. The integration shouldn't remove/replace the device entries as eagerly. AFAIK this is being worked on.

@TarheelGrad1998

So do you have all you need now? I did have the rename happen again on Saturday, and what was unique that time was that it came NOT from an HA reboot but from controller exclusion/inclusions. But maybe that triggers the same process in HA?

Anyway, if you need those logs I can pull them. Or if you have what you need, I'll turn down my logging and save a few cycles. :)

@AlCalzone
Contributor

If you have those, an additional data point can't hurt

@TarheelGrad1998

Here are the current device diagnostics and identifiers, plus the logs from Saturday (8/19). I know it happened Saturday but unfortunately I do not know when. I was doing Exclusion/Inclusion to try to fix another node and didn't notice the rename until later (and did not expect this to trigger it....but it did).

    "identifiers": [
      [
        "zwave_js",
        "3571274682-43-634:28672:57345"
      ],
      [
        "zwave_js",
        "3571274682-43"
      ]
    ],

zwavejs_2023-08-19.log.gz
device diagnostics.txt
home-assistant.zip

@kpine
Contributor

kpine commented Aug 21, 2023

So do you have all you need now?

I think we have all that we need. Thanks for the assistance in getting these logs, it has certainly helped!

I did have the rename happen again on Saturday, and what was unique that time was that it came NOT from an HA reboot but from controller exclusion/inclusions. But maybe that triggers the same process in HA?

The removal of a device triggered an exception in the integration, two times in fact. An unhandled exception will cause the integration to reload. Reloading the integration has the same effect as restarting HA, the device and entities are reconciled with the current driver state. Presumably a restart prior to this had the corrupted IDs and these restarts included the correct IDs, so the device replacement occurred.

I have seen this problem on my test systems before, but never looked at it in detail. So this is a separate bug.

2023-08-19 18:12:38.668 DEBUG (MainThread) [zwave_js_server] Received message:
WSMessage(type=<WSMsgType.TEXT: 1>, data='{"type":"event","event":{"source":"node","event":"value removed","nodeId":53,"args":{"commandClassName":"Node Naming and Location","commandClass":119,"endpoint":0,"property":"name","prevValue":"Kitchen Range Extender","propertyName":"name"}}}', extra='')

2023-08-19 18:12:38.669 DEBUG (MainThread) [zwave_js_server] Listen completed. Cleaning up
2023-08-19 18:12:38.780 ERROR (MainThread) [homeassistant.components.zwave_js] Unexpected exception: '53-119-0-name'
Traceback (most recent call last):
  File "/usr/src/homeassistant/homeassistant/components/zwave_js/__init__.py", line 788, in client_listen
    await client.listen(driver_ready)
  File "/usr/local/lib/python3.11/site-packages/zwave_js_server/client.py", line 261, in listen
    await self.receive_until_closed()
  File "/usr/local/lib/python3.11/site-packages/zwave_js_server/client.py", line 329, in receive_until_closed
    self._handle_incoming_message(data)
  File "/usr/local/lib/python3.11/site-packages/zwave_js_server/client.py", line 412, in _handle_incoming_message
    self.driver.receive_event(event)  # type: ignore
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/zwave_js_server/model/driver.py", line 83, in receive_event
    self.controller.receive_event(event)
  File "/usr/local/lib/python3.11/site-packages/zwave_js_server/model/controller/__init__.py", line 813, in receive_event
    node.receive_event(event)
  File "/usr/local/lib/python3.11/site-packages/zwave_js_server/model/node/__init__.py", line 421, in receive_event
    self._handle_event_protocol(event)
  File "/usr/local/lib/python3.11/site-packages/zwave_js_server/event.py", line 71, in _handle_event_protocol
    handler(event)
  File "/usr/local/lib/python3.11/site-packages/zwave_js_server/model/node/__init__.py", line 919, in handle_value_removed
    event.data["value"] = self.values.pop(value_id)
                          ^^^^^^^^^^^^^^^^^^^^^^^^^
KeyError: '53-119-0-name'
2023-08-19 18:12:38.979 INFO (MainThread) [homeassistant.components.zwave_js] Disconnected from server. Reloading integration
2023-08-19 18:12:39.173 DEBUG (MainThread) [zwave_js_server] Trying to connect
2023-08-19 18:12:39.199 WARNING (MainThread) [homeassistant.helpers.service] Referenced entities switch.basement_stair_light are missing or not currently available
2023-08-19 18:12:39.306 DEBUG (MainThread) [zwave_js_server] Received message:
WSMessage(type=<WSMsgType.TEXT: 1>, data='{"type":"version","driverVersion":"11.0.0","serverVersion":"1.29.0","homeId":3571274682,"minSchemaVersion":0,"maxSchemaVersion":29}', extra='')

2023-08-19 18:12:39.306 INFO (MainThread) [zwave_js_server] Connected to Home 3571274682 (Server 1.29.0, Driver 11.0.0, Using Schema 28)
2023-08-19 18:12:39.306 INFO (MainThread) [homeassistant.components.zwave_js] Connected to Zwave JS Server
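The failure mode in the traceback is a plain dictionary `pop` on a key that is not present. The following sketch only illustrates that behavior with an ordinary `dict`; it is not the library's code or its eventual fix, and `remove_strict`, `remove_tolerant`, and the cache contents are hypothetical.

```python
# Hypothetical value cache that has no "53-119-0-name" entry.
values = {"53-119-0-location": "Kitchen"}


def remove_strict(cache, value_id):
    """Mirror the call in the traceback: raises KeyError if the value is unknown."""
    return cache.pop(value_id)


def remove_tolerant(cache, value_id):
    """Return None instead of raising when the value is unknown."""
    return cache.pop(value_id, None)


try:
    remove_strict(values, "53-119-0-name")
except KeyError as err:
    print("strict removal failed:", err)

print(remove_tolerant(values, "53-119-0-name"))  # None
```

Whether a tolerant removal is the right behavior for the client library is a separate design question. The point here is only that a "value removed" event for a value the client never cached is enough to raise the exception seen in the log.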

@TarheelGrad1998

Yes, something happened with that one. I tried the firmware update from HA (hadn't done that before). I think the update took but left the device unavailable/disconnected. Upon inclusion ZwaveJS was giving various errors so I kept retrying. It looks normal now but I'm not confident it is working as nothing shows downstream in the graph (previously it had 3 devices, which are all still connected).

Anyway, OT for this thread but maybe related to the separate bug you mentioned.

@issue-triage-workflows

There hasn't been any activity on this issue recently. Due to the high number of incoming GitHub notifications, we have to clean some of the old issues, as many of them have already been resolved with the latest updates.
Please make sure to update to the latest Home Assistant version and check if that solves the issue. Let us know if that works for you by adding a comment 👍
This issue has now been marked as stale and will be closed if no further activity occurs. Thank you for your contributions.

@issue-triage-workflows issue-triage-workflows bot closed this as not planned (stale) Nov 26, 2023
@github-actions github-actions bot locked and limited conversation to collaborators Dec 26, 2023