no more 802.11n connection since 2.6.0 #7965

Open
6 tasks done
stef-ladefense opened this issue Apr 6, 2021 · 101 comments

Comments

@stef-ladefense

stef-ladefense commented Apr 6, 2021

Basic Infos

  • This issue complies with the issue POLICY doc.
  • I have read the documentation at readthedocs and the issue is not addressed there.
  • I have tested that the issue is present in current master branch (aka latest git).
  • I have searched the issue tracker for a similar issue.
  • If there is a stack dump, I have decoded it.
  • I have filled out all fields below.

Platform

  • Hardware: ESP-12
  • Core Version: 2.5.1, 2.6.0, 2.7.4
  • Development Env: Arduino IDE 1.8.13
  • Operating System: Windows

Settings in IDE

  • Module: Wemos D1 mini r2

Problem Description

The sketch is "Udp NTP Client", included with the core release.

I recently noticed that I had both 802.11n and 802.11g devices on my network.
The ones connected in 802.11n are old modules whose code I have not changed and which were compiled with a core version lower than 2.5.0 (2.4.2 for most of them).

I therefore flashed the supplied "Udp NTP Client" sketch and tested it with different core versions, starting from 2.4.2.

Up to and including version 2.5.1, the module appears as 802.11n on the wifi router of my box.
From version 2.6.0 up to the latest 2.7.4, it appears as 802.11g.

Do you have any idea why?

image

@fsommer1968

fsommer1968 commented Apr 8, 2021

Is this really a firmware problem? My 8266 modules connect well with 2.7.4:
image

image
NB: 72 Mbit/s is only possible with N, not with G.

@stef-ladefense
Author

That's why I'm asking for advice!
I haven't changed anything: it's the same Lolin module (original V3.10),
the same code (delivered with the core, I didn't write anything more),
and there is no change of config on my router.
When I install the 2.4.2 core, I am recognized as 802.11n on the router; ditto with 2.5.1.
As soon as I install the 2.6.0 core, I am recognized as 802.11g; ditto up to the current version 2.7.4.

For me, something changed with the move to 2.6.0.

@d-a-v
Collaborator

d-a-v commented Apr 8, 2021

You can try other/older firmware versions from the IDE menu:
You need to select the board "generic esp8266" and match its configuration with your board.
A new menu entry will then offer several versions of the closed nonosdk firmware.

And if possible do it with the current source code. There are three ways for doing so:
latest git master / PlatformIO staging version / arduino board manager snapshot release 0.0.1.

@stef-ladefense
Author

stef-ladefense commented Apr 8, 2021

so:
core 2.7.4
board configured as generic esp8266, with flash DIO mode.

With the nonos-sdk versions "2.2.1 + 100", "111", "113" and "119", I am connected to the router in 802.11g.
With the versions "nonos-sdk 2.2.1 (legacy)" and "nonos-sdk pre-3 (180626 known issues)", I am connected in 802.11n!
As soon as I configure the board as "LOLIN (WEMOS) D1 R2 & mini", I go back to 802.11g.

I could not find where to configure the nonos-sdk version for the Lolin.

@d-a-v
Collaborator

d-a-v commented Apr 8, 2021

Well that's interesting !

This menu is disabled for non generic boards.
There are three possibilities:

  • Use the generic board and add the content of variants/d1_mini/pins_arduino.h in your sketch
    (selecting a board wemos-d1 in the menu is just a matter of selecting default values in the generic esp8266 board menu entries and including the right variant file for pin naming)
  • Enable this FW menu for all boards
  • Allow users to add this menu entry by using the tools/boards.txt.py script

I think 1) is the easiest.
The newer versions fixed some bugs, and we settled on the current menu layout a while ago after some discussion.

@stef-ladefense
Author

Yes, I will do that, thanks.

On the other hand, how can I tell which version of the nonos-sdk is used when a non-generic board is selected?

@d-a-v
Collaborator

d-a-v commented Apr 8, 2021

It's the same one as the default value (= the first) for the generic board.

You can also read its (default when not overwritten by menu selection) value in platform.txt:

build.sdk=NONOSDK22x_190703

(which is nonos-sdk 2.2.1+100 (190703) you can see that in boards.txt).

@fsommer1968

fsommer1968 commented Apr 8, 2021

Your experience with the different SDK releases is indeed interesting, because I'm using V 119 (including on some Wemos D1 Mini boards set up as generic 8266 so that I can change the SDK version). Have you tried erasing the WiFi settings during flash?
image

@stef-ladefense
Author

Your experience with the different SDK releases is indeed interesting, because I'm using V 119 (including on some Wemos D1 Mini boards set up as generic 8266 so that I can change the SDK version). Have you tried erasing the WiFi settings during flash?

I tested this with generic, v119, and "erase sketch + wifi settings": 802.11g too.

@vortigont

I can confirm it. 2.6 and later can't use n-mode at all.
If I enable "option require_mode 'n'" in OpenWrt (so-called "Greenfield" mode, where only n-capable clients can connect), an ESP with core 2.5.1 can connect to the AP; more recent ones can't connect at all.

"Core":"2_7_4","SDK":"2.2.2-dev(38a443e)" - G-mode, no MCS value

iwinfo wlan0 assoclist
3C:71:BF:29:60:CF  -59 dBm / unknown (SNR -59)  250 ms ago
        RX: 48.0 MBit/s                                   77 Pkts.
        TX: 5.5 MBit/s                                    64 Pkts.
        expected throughput: unknown

"Core":"2_5_1","SDK":"2.2.1(cfd48f3)" - N-mode indicated by "MCS 6"

iwinfo wlan0 assoclist
3C:71:BF:29:60:CF  -58 dBm / unknown (SNR -58)  40 ms ago
        RX: 54.0 MBit/s                                  201 Pkts.
        TX: 58.5 MBit/s, MCS 6, 20MHz                     73 Pkts.
        expected throughput: 4.5 MBit/s

@TD-er
Contributor

TD-er commented Apr 24, 2021

Could it be that OpenWRT only advertises a limited set of HT MCS modes and the ESP only supports a different subset?
In MikroTik APs you can see the advertised modes and I guess you can also enable/disable them although the checkboxes in my MikroTik units are greyed out.

@vortigont

I've tested 802.11n with the default configuration: single channel bandwidth, no limitations or restrictions for specific MCS, etc. The same board with the same code (just a simple WiFi connect) is able to connect to the same WiFi AP with an old core and unable to with a recent one.

@Jason2866
Contributor

Jason2866 commented May 28, 2021

Can not reproduce. Core 2.7.4

00:00:03.970 WIF: Connecting to AP1 Jason_Home_WLAN Channel 12 BSSId 00:A0:57:2A:BD:19 in mode 11n as sonoff-71C254-0596...
00:00:06.268 WIF: Connected

@stef-ladefense
Author

stef-ladefense commented May 28, 2021

I have tested with the new 3.0.0 arduino esp8266 core.
I only get an 802.11n connection with nonos sdk 2.2.1 legacy, not with the others.

@Jason2866
Contributor

Jason2866 commented May 28, 2021

Tried with core 3.0.0 connects in mode 11n

00:00:00.065 Project tasmota Tasmota Version 9.4.0.4(lite)-STAGE(2021-05-28T14:25:30)
00:00:00.520 WIF: Connecting to AP1 Jason_Home_WLAN Channel 12 BSSId 00:A0:57:2A:BD:19 in mode 11n as tasmota_D4407C-0124...
00:00:02.751 WIF: Connected

Core 2.7.4 connected to an OpenWRT device

00:00:00 I2C: BME280 found at 0x76
00:00:04 WIF: Connecting to AP1 Jason_Home_WLAN Channel 4 BSSId 88:C3:97:B1:1D:56 in mode 11N as sonoff-17DBAE-7086...
00:00:06 QPC: Reset
00:00:06 WIF: Connected

@vortigont

Tried with core 3.0.0 connects in mode 11n

Those logs are from the ESP, and they lie. It reports mode n but actually connects in mode g.
You should check on the access point itself which mode is used by its clients. In OpenWrt you can check with 'iwinfo wlan0 assoclist': if the entry for the ESP client's MAC has no MCS value, then it is in mode g.

And yes, I confirm, with Core 3.0.0 it's the same issue: no mode N. My guess is that it is related to SDK 2.2.2, not the Arduino core.
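The check described above (no MCS value on the client's TX line means mode g) can be sketched as a small host-side helper. This is only an illustration: `txLineHasMcs` is a hypothetical name, and the line format is assumed to match the two `iwinfo wlan0 assoclist` dumps posted earlier in this thread.

```cpp
#include <string>

// Hypothetical helper: returns true when a TX line from
// `iwinfo wlan0 assoclist` reports an MCS index, which the comment above
// treats as the sign of an 802.11n (HT) link.
// Assumption: the line looks like the dumps earlier in this thread,
// e.g. "TX: 58.5 MBit/s, MCS 6, 20MHz   73 Pkts.".
bool txLineHasMcs(const std::string& line) {
    const auto tx = line.find("TX:");
    if (tx == std::string::npos) {
        return false;  // not a TX line (for example the RX line)
    }
    return line.find("MCS", tx) != std::string::npos;
}
```

Fed the two TX lines from the dumps above, the helper flags only the core 2.5.1 association as carrying an MCS index.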

@TD-er
Contributor

TD-er commented May 29, 2021

So you're telling me that if I configure the access point to only allow 802.11n clients, it is impossible to connect to an access point using current builds based on SDK 2.2.2?

My MAC address of an ESP: FC:F5:C4:8B:71:60
SDK: ESP82xx Core 2843a5a, NONOS SDK 2.2.2-dev(38a443e), LWIP: 2.1.2 PUYA support
Stated wifi connection: 802.11g (RSSI -57 dBm)

Mikrotik AP it is connected to:
image

image

I will now set it to 'n' mode to see....

Set to connect via "n" mode:

WiFi Connection: 802.11n (RSSI -55 dBm)
image

TX rate is > 54 Mbps, so this can't be "g".

As a test, the WiFi mode is set to "2GHz-only-N" (cleared the password field for the screenshot)
image

As you can see, the same node is connecting just fine.
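The rate argument used above (a TX rate above 54 Mbit/s cannot be "g") can be written down as a one-way test. A minimal sketch, assuming only that 54 Mbit/s is the highest 802.11g rate; `PhyGuess` and `guessFromTxRate` are hypothetical names. A rate at or below 54 Mbit/s proves nothing either way, which is why the MCS index is the better indicator.

```cpp
// Hypothetical classification of a link from its reported TX rate alone.
enum class PhyGuess {
    NotG,          // rate exceeds anything 802.11g can produce
    Inconclusive   // rate is reachable in both g and n
};

PhyGuess guessFromTxRate(double mbitPerSec) {
    constexpr double kMaxRateG = 54.0;  // highest 802.11g rate in Mbit/s
    return mbitPerSec > kMaxRateG ? PhyGuess::NotG : PhyGuess::Inconclusive;
}
```

For example, the 58.5 Mbit/s TX rate from the OpenWrt dump earlier in the thread classifies as `NotG`, while the 48.0 Mbit/s RX rate stays `Inconclusive`.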

@stef-ladefense
Author

Where do you find the nonos sdk 2.2.2?
With the installation of the arduino 3.0.0 esp8266 core
I have version 2.2.1
image
image

@d-a-v
Collaborator

d-a-v commented May 30, 2021

2.2.2 is not and will probably never be out.
2.2.1+n (git versioning) is noted 2.2.2-dev by espressif's system_get_sdk_version() in their 2.2.x git branch.

@vortigont

So you're telling me that if I configure the access point to only allow 802.11n clients, it is impossible to connect to an access point using current builds based on SDK 2.2.2?

yes, that is exactly what I see. Need to make sure that b/g modes are completely disabled.
Are you able to get the details of the MCS used by the ESP client? TX bandwidth can be misleading; only the MCS index indicates which N-mode modulation is in use.
I do not have RouterOS devices at hand to test, but I've checked the docs and it seems that "2GHz-only-N" actually means all 2.4 GHz b/g/n modes.
Screenshot from 2021-05-30 22-59-24

Please test it carefully if possible. This issue seems very tricky.

@TD-er
Contributor

TD-er commented May 30, 2021

I don't know how to see what MCS's are used.
But if I set my ESP to use 'G' only, and the MikroTik to "2GHz-only-N", the ESP cannot connect to the AP, unless it switches back to "n" mode.
I have it programmed to go to "n" mode as fallback if it cannot connect in "g" mode after 10 attempts (or xx seconds)

So to me it looks like it is working in "n" mode.

The table you posted seems odd, as it does share the "2GHz-only-N" option along with the "b/g/n" options.
image

Here the settings of the Wlan adapter:
image
image
image

@vortigont

vortigont commented May 30, 2021

I have it programmed to go to "n" mode as fallback if it cannot connect in "g" mode after 10 attempts (or xx seconds)

So by default it always connects in G mode for you, right? And you have to switch it to N with
WiFi.setPhyMode(WIFI_PHY_MODE_11N) to connect with N, right?

The table you posted seems odd, as it does share the "2GHz-only-N" option along with the "b/g/n" options.

It's from the official doc, I guess that GUI and CLI might have some syntax differences. I used CLI with Mikrotik's quite some time ago, do not have it now unfortunately.

@TD-er
Contributor

TD-er commented May 30, 2021

Yep, I made it configurable to what mode should be used to start with and as a fallback it switches to "n" mode after set number/time of failed attempts.
image

https://github.com/letscontrolit/ESPEasy/blob/f3ce88eaef3f88d7a525eb017ac9dec718c5578f/src/src/ESPEasyCore/ESPEasyWifi.cpp#L1062-L1070
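The fallback described above (start in a configured mode, switch to "n" after a set number of failed attempts) reduces to a small decision function. The sketch below is not the ESPEasy implementation linked above; `PhyMode`, `PhyFallback` and `modeFor` are hypothetical names, and the hardware call is left out so the logic can be checked on a host. On the device, the chosen mode would be applied with a call such as `WiFi.setPhyMode(WIFI_PHY_MODE_11N)`, as quoted earlier in this thread.

```cpp
// Hypothetical model of the "fall back to n" policy described above.
enum class PhyMode { B, G, N };

struct PhyFallback {
    PhyMode preferred;      // mode to start with, e.g. PhyMode::G
    int maxFailedAttempts;  // e.g. 10, as mentioned in the thread

    // Mode to use for the next connection attempt, given the number of
    // consecutive failed attempts so far.
    PhyMode modeFor(int failedAttempts) const {
        return failedAttempts >= maxFailedAttempts ? PhyMode::N : preferred;
    }
};
```

With `PhyFallback{PhyMode::G, 10}`, attempts 0 through 9 use "g" and the attempt after the tenth failure switches to "n". The time-based variant ("or xx seconds") is left out of this sketch.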

@vortigont

I've played around with wifi_set_phy_mode(PHY_MODE_11N); not that it changed anything. With SDK 2.2.1 it works as expected; with 2.2.2-git (arduino core >2.5.1) it makes no difference: MCS and WMM are not available and it does not connect at all in GreenField mode. Maybe it is AP hardware specific, but the fact is that the same ESP board with the same user code behaves differently depending on the build env.

@TD-er BTW, I've noticed that your settings screenshots show 2GHz-only-N and WMM disabled at the same time. This is completely wrong: for HT rates in N mode WMM is mandatory, if I remember the WiFi 4 specs correctly. Not sure if it should even work at all in greenfield mode without WMM.

This is an ESP client built with SDK 2.2.1 and the AP in greenfield N mode

iw wlan0 station dump
Station 5c:cf:7f:02:50:f9 (on wlan0)
        inactive time:  20 ms
        rx bytes:       38408
        rx packets:     1506
        tx bytes:       2060
        tx packets:     15
        tx retries:     5
        tx failed:      0
        rx drop misc:   0
        signal:         -56 [-57, -56] dBm
        signal avg:     -57 [-57, -59] dBm
        tx bitrate:     6.5 MBit/s MCS 0
        rx bitrate:     6.0 MBit/s
        rx duration:    150093 us
        expected throughput:    4.394Mbps
        authorized:     yes
        authenticated:  yes
        associated:     yes
        preamble:       short
        WMM/WME:        yes
        MFP:            no
        TDLS peer:      no
        DTIM period:    2
        beacon interval:100
        short preamble: yes
        connected time: 680 seconds

@Jason2866
Contributor

It does connect in mode "n". Because we had issues with routers only supporting mode "n".
When STA and AP mode are both active, the ESP8266 can only use 11b and 11g.
When only STA mode is used, the device connects without an issue to a router supporting only mode "n".
The whole discussion, with the solution: arendst/Tasmota#12512 (comment)

@dalbert2
Contributor

dalbert2 commented Jul 10, 2021

I have also been having significant WiFi issues in 2021.
With a Ubiquiti 802.11n access point, it connects and disconnects successfully a few times,
but at some point, it stops being able to connect and stay connected; it goes from connected
to connection_lost almost immediately and once that happens, it will not be able to
connect again until it is reset.
This is highly reproducible with 2.7.4. Interestingly, with 3.0.x it also springs a huge memory leak,
when this happens, losing 2-2.5K with each scan/connection attempt.
Here's a trace below showing the connect attempt failure in case it is of any use:

scandone
state: 0 -> 2 (b0) *** IDLE->SCAN_COMPLETED ***
state: 2 -> 3 (0) *** CONNECTED ***
state: 3 -> 5 (10) *** CONNECTION_LOST ***
add 0
aid 6
cnt *** CONNECTED ***
<<< Everything above is normal; when things work properly, the next line will be
connected with mySSID, channel 1
However once things start failing, it looks like what's below >>>>
state: 5 -> 2 (2c0) *** CONNECTION_LOST -> SCAN_COMPLETED ***
rm 0
wifi evt: 1
STA disconnect: 2
state: 2 -> 0 (0)
del if0
usl
mode : null

@sblantipodi

This is a major problem I am having too.
#8292
At some point the ESP stops responding to WiFi until the router is rebooted or the ESP resets itself.

@sblantipodi

The funny thing is that the latest Asus firmware upgrade prevents my ESP8266 devices from connecting on 802.11G;
802.11N works.

@TD-er
Contributor

TD-er commented Sep 1, 2021

Maybe the latest firmware now has some default setting which makes the AP effectively "n-only" ?
Could be that it is hard to recognize because of its label. For example (just made up now, no idea what Asus is using) calling it "enhanced stability", "gaming mode" or something similar.
A side effect of an AP allowing both 'b/g' and 'n' devices is that it may decrease responsiveness. So it would make sense to refer to it as some kind of "gaming mode".

@TD-er
Contributor

TD-er commented Feb 21, 2023

Please try with another unit too, preferably also from a different vendor.
Drawing conclusions like this, based on a single unit, is a bit 'statistically unfair' :)

Given channel 6 is right "in the middle", I wonder what results you may see on channel 1.

@dalbert2
Contributor

@TD-er I tried the DLink on channels 1, 6, 11 (band edges and center); it worked perfectly on channels 1 and 6 but very poorly on 11. At some point I'll try working my way up the channels to see if it degrades gradually as the frequency increases. I have tested with two access point brands: DLink and Ubiquiti with the same results. I have a few other old APs around and will try them too.

I'm interested in whether the problem is environmental which is why I was curious about others' experiences. If anyone else is experiencing poor connectivity, I'd be very interested in knowing what WiFi channel they are connecting on and if changing the AP to a lower channel fixes the issue.

@TD-er
Contributor

TD-er commented Feb 21, 2023

OK, so the antenna seems to be de-tuned to a lower frequency.
You can verify this by moving something with "water" closer to the antenna.
Typically when you place an antenna on your body, the resonance frequency will be lower.
But this is hard to reproduce, so maybe fill a plastic cup with water and move it closer to the ESP.

If this theory is correct, you will gradually see channel 6 become worse, and with the cup closer to the antenna even channel 1 will lose connectivity.

Now the 'fun' part...
I think that if you try using some binary based on core 2.4.2 or older, it will work perfectly fine. Also on channel 11.

I know the boards I have here within reach, all work perfectly fine on channels 1, 6 and 11, as those were the channels I've been using for a long time.

But maybe you can make these tests a lot simpler by either using some AP which allows to show you the RSSI of the connected client, or make some kind of setup using an ESP32 and an ESP8266.
Set one as AP and let the other connect to it.
Let the STA unit log the RSSI of its connection.
The AP unit can get the RSSI of the other one via an AP probed event.

Doing this on each channel can give you some info on whether the signal is indeed less per channel and maybe also test to see if sending at a lower TX power makes a (surprising) difference.

@dalbert2
Contributor

Thanks for confirming that you are not seeing problems on channel 11 @TD-er; that's both good to know, and a bummer; my failure scenario is 100% reproducible on the two ESP8266 units with two different access points: simply shifting to channel 11 causes massive comms problems; I'd hoped it was a good clue.

It is extremely unlikely that a poorly matched/tuned antenna is my issue; the fact that the problem occurs even when the ESP12F module is 1m or less from the AP, with extremely strong RSSI (around -41dBm), suggests otherwise; a poorly matched antenna would reduce signal strength. At some point I'll sacrifice an ESP12F module and do some antenna tests properly with the VNA, but the antenna is still pretty low on my list of suspects.

A more likely scenario is a strong nearby interferer. Are there receiver specs for the ESP8266 (things like ACR or IP3?) I guess it's time to grab a spectrum analyzer and see what things look like around 2462MHz...I'll report back after doing that.
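The 2462MHz figure above is the centre frequency of channel 11. For channels 1 to 13 of the 2.4 GHz band the centre frequency follows 2407 + 5 × channel (in MHz), which a short helper makes explicit. `channelCenterMhz` is a hypothetical name, and channel 14 (which does not follow this rule) is deliberately rejected.

```cpp
// Centre frequency in MHz of a 2.4 GHz Wi-Fi channel.
// Valid for channels 1..13 only; returns 0 for anything else
// (channel 14 does not follow the 5 MHz spacing and is not handled here).
int channelCenterMhz(int channel) {
    if (channel < 1 || channel > 13) {
        return 0;
    }
    return 2407 + 5 * channel;
}
```

This gives 2412 MHz for channel 1, 2437 MHz for channel 6 and 2462 MHz for channel 11, the three channels compared in this part of the thread.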

@TD-er
Contributor

TD-er commented Feb 21, 2023

You shouldn't be so close to the AP.
If you must be this close, then lower the TX power.
The WiFi radio really does act strange with strong RF signals leaking into the radio circuitry.

One other thing you might want to try...
Please erase the entire flash and then start all over with testing.
Make sure to have a proper power supply and good cables, to be sure the RF calibration will not be off due to some small voltage drop on the 3V3.

@dalbert2
Contributor

I have two ESP8266 test units; one is far and one is very close; their behavior is identical and solely dependent on the AP channel; I don't think it's receiver overload; if it were, the behavior wouldn't happen only on channel 11.

I grabbed a field analyzer (and also took the opportunity to test a new toy: a Tiny SA Ultra just to see how closely they'd agree and am reasonably pleased with the results!). The displays are on max hold (obviously); marker 1 is on WiFi channel 1, marker 2 is on WiFi channel 11, and marker 3 is at 2.48GHz. The spectrum is somewhat noisy, and I do occasionally see signals near channel 11, but nothing steady, so I think my local interference theory doesn't hold up.

Next up is trying software changes: wipe flash, roll back to earlier core.
2_4Gspectrum

@someburner

FWIW- router firmware is often a mess too.

Our customers generally have issues with ASUS routers and mesh routers/APs. When we've gotten enough reports of a router not working, we've gone out and bought that specific router and connected our device without issue. Rate selection is router-dependent and can be pretty complex (just googling about it). Here's a page from cisco about it at a high level. Seems likely that the SDK doesn't implement N correctly or is missing some feature.

And in both the case of ESP8266 and the router/ap, it's also possible that reported or configured settings don't always translate to what's going on underneath.

I've never been able to reproduce the "not connecting" issue with any router / ESP / configuration, but hopefully if someone can, we can find a golden SDK version that works in the most cases.

@TD-er
Contributor

TD-er commented Feb 22, 2023

Seems likely that the SDK doesn't implement N correctly or is missing some feature.

That isn't just "likely", it has already been proven.
PIO_FRAMEWORK_ARDUINO_ESPRESSIF_SDK22x_190313 should work fine on 802.11n.
Any later version of the SDK22x does not.

I'm still using PIO_FRAMEWORK_ARDUINO_ESPRESSIF_SDK22x_190703 for ESPEasy (as is Tasmota) as this "feels" like the one with the least amount of issues.

@mlichvar

If anyone else is experiencing poor connectivity, I'd be very interested in knowing what WiFi channel they are connecting on and if changing the AP to a lower channel fixes the issue.

Changing the channel doesn't seem to fix the issue for me. I tried several from the full range of 1 to 13 which is available here and all are unreliable for the ESP8266, while ESP32 works fine. Distance to the AP or TX power doesn't seem to matter.

I suspect the signal from the ESP is out-of-spec in some way, which makes it difficult to receive for some wifi chips. Maybe it could be analysed with an SDR.

@TD-er
Contributor

TD-er commented Feb 24, 2023

One thing I did at some point suspect is the stability of the used crystal.
Typically this (26 MHz) crystal should be within 10 ppm.

So I did try to estimate this stability by keeping track of the time wander I need to correct when updating via NTP.
To make sure the network latency doesn't play a role here, you need to have a minimal sync interval of at least 6h.
The time wander with most chips I tested was within this 10 ppm, but that's just an average.
Perhaps it is possible to let some unit run only a loop without WiFi being active and then toggle some pin and record it with some measurement tool which can give a good estimate on the crystal stability.

Maybe the crystal frequency is also affected by fluctuations in the (3V3) voltage?
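The NTP-based estimate described above boils down to dividing the clock correction applied at a sync by the time since the previous sync. A minimal sketch with a hypothetical `driftPpm` helper follows; it assumes the correction and the interval are both given in seconds and ignores everything else that can move the clock.

```cpp
// Hypothetical helper: average clock drift in parts per million, computed
// from the correction applied at an NTP sync and the time elapsed since
// the previous sync. Returns 0 for a non-positive interval.
double driftPpm(double correctionSeconds, double intervalSeconds) {
    if (intervalSeconds <= 0.0) {
        return 0.0;
    }
    return correctionSeconds / intervalSeconds * 1e6;
}
```

Over a 6 h interval (21600 s), a 10 ppm crystal error accumulates to about 0.216 s, while a network latency error of 20 ms corresponds to only about 0.9 ppm. That illustrates why a long sync interval is needed for the latency not to dominate the estimate.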

@mlichvar

FWIW, I also tried different power supplies, increasing the voltage up to the maximum of 3.6V, even adding an extra ceramic cap on the board (and confirmed with a scope that the drop on wifi activity decreased), but it didn't seem to make a difference. One thing I noticed, but not really sure if real, is that it gets worse over time as the ESP is powered on. It seems to be more likely to succeed at connecting on the first attempt when cold.

@Jason2866
Contributor

Buy a few more esp8266. Troubleshooting with just 2 samples is not enough. As TD-er already mentioned there is a high chance that you have "just" bad hardware

@TD-er
Contributor

TD-er commented Feb 24, 2023

What I meant is that the quality of the crystal used in the ESP module (or, on some boards, next to the ESP chip) may be sub-par.
In other words, this crystal may be deviating more than 10 ppm from the nominal frequency.
The crystal stability may affect everything regarding WiFi.

@mlichvar

I was testing with 7 ESP8266, bought at least at three different times from different suppliers (although probably all from ebay). There is a large variance in the observed RX and TX signal between them, but they all seem to be impacted by this incompatibility with some of my (Mediatek-based) APs. Maybe it's the APs' fault, I don't know, but everything else I have ever used with them worked fine. I wanted to stop running extra APs just for the ESPs, that's why I was looking for a fix. There doesn't seem to be one, so I'll probably switch to ESP32.

@dalbert2
Contributor

WRT power: in my scenario, the ESP8266 power source is USB through an LDO with 200uF of output buffering and both a 22uF and 100nF ceramic at the power input pin to the module. Power is never going to get much cleaner than that.

WRT crystal stability: the spec for 802.11b is 25ppm and 20ppm for g/n. I see the same connectivity problems in b/g/n modes. Crystals have three main sources of error: manufacturing tolerances, temperature, and aging. Many manufacturers of RF gear measure the crystal error at manufacture and store (and compensate in software) for that error to offset manufacturing tolerances. All of my testing for this thread has been done indoors where temperature varies very little; aging effects are usually small. I have good (DOCXO) HP counters (and a rubidium standard I can slave them to if needed) so I can measure the crystal accuracy (via a buffered hw timer output) if we have reason to suspect it, but we'd still need to know whether and how manufacturing tolerances are being compensated. The AIThinker module documentation has a bit more in the RF specs including ACR. They spec a 10ppm xtal: https://docs.ai-thinker.com/_media/esp8266/docs/esp-12f_product_specification_en.pdf

At this point, I'm most interested in learning more about the RF calibration; is anyone familiar with that? Exactly what is being calibrated (and against what standard?) For reference:
https://docs.espressif.com/projects/espressif-esp-faq/en/latest/development-environment/debugging.html?highlight=calibration#how-can-i-modify-the-default-method-of-rf-calibration-in-esp8266
The calibration discussion @TD-er started here: #8163

As I've continued testing, connectivity remains perfect on channel 6 but awful at the band edges.

@dalbert2
Contributor

@Jason2866 I have more than 200 samples to work with; they all behave the same. Moreover, this same complaint is littered throughout virtually every ESP8266 discussion group and forum; there is no reason to think that I have "just" bad hardware and need to try more (and if I do, then there is a much larger quality problem).

@mlichvar can you tell us more about the variance you are seeing between RX and TX?

@mlichvar

I was monitoring the RX signal level reported by the AP and the ESP. I tried to put them in the same position and orientation. For the five ESP-01s modules I have, the ESP and AP levels (in dBm) were: -67 -77, -80 -79, -79 -80, -84 -91, -78 -82. More than 20dB between the best and worst, but all had the issue with the AP not receiving a large fraction of the transmissions.

@Jason2866
Contributor

@dalbert2 My intention was to shed light on the hardware, which is a major aspect here.

It is a complicated issue. For example, I have zero problems with any of my esp8266 devices in my wlan (6 APs from different vendors, most of them commercial business APs)
using channels 1, 9, 11.

Mode N works too, since my APs accept connections without WMM active

@dalbert2
Contributor

@Jason2866 , it would be very helpful to know which ESP8266 devices you are using and which APs. Having some information about what works reliably would be great. Thanks!

@TD-er
Contributor

TD-er commented Feb 24, 2023

In my experience, boards using the Espressif castellated modules have the least problems.
At least when they are properly placed on a larger PCB, without copper near their antennas.

Typically the ESP-12F modules, but also the DoIot-ESP12S modules.

@dalbert2
Contributor

Thanks @TD-er , I am only using ESP12F modules (for ESP8266)

@Jason2866
Contributor

@Jason2866 , it would be very helpful to know which ESP8266 devices you are using and which APs. Having some information about what works reliably would be great. Thanks!

Most of them are NodeMCU boards with ESP-12F modules, some with a bare esp8266 on them, and Wemos Mini clones with the 3.3V LDO replaced (with a 500mA one).

Lancom and BintecElmeg APs ( Bintec APs controlled via a WLAN Controller) and two Xiaomi 4A Gigabit flashed with OpenWRT

@someburner

WRT crystal stability: the spec for 802.11b is 25ppm and 20ppm for g/n. I see the same connectivity problems in b/g/n modes. Crystals have three main sources of error: manufacturing tolerances, temperature, and aging. Many manufacturers of RF gear measure the crystal error at manufacture and store (and compensate in software) for that error to offset manufacturing tolerances. All of my testing for this thread has been done indoors where temperature varies very little; aging effects are usually small.

Temperature effects can vary widely: https://datasheet.datasheetarchive.com/originals/library/Datasheet-081/DASF0020692.pdf

I don't know much about Wifi but in our case even 1ppm drift was almost too much for a 915MHz radio, but we have tight requirements for that.

In the beta of our 1st gen product we used to have to manually calibrate crystals for a radio module since they used cheap crystals that not only varied widely within a single batch, but would also have "dead zones" where drift PPM would spike in a certain/small temperature range and couldn't be modeled easily to calibrate, so those had to be tossed. AIThinker and so-on don't specify what crystal exactly they use or where they source it, so it's going to be all over the place. 20 ppm tolerance is pretty high, but cheaply sourced crystals can also be pretty bad.

> At this point, I'm most interested in learning more about the RF calibration; is anyone familiar with that? Exactly what is being calibrated (and against what standard?) For reference: https://docs.espressif.com/projects/espressif-esp-faq/en/latest/development-environment/debugging.html?highlight=calibration#how-can-i-modify-the-default-method-of-rf-calibration-in-esp8266 The calibration discussion @TD-er started here: #8163
>
> As I've continued testing, connectivity remains perfect on channel 6 but awful at the band edges.

If you can get repeated results with multiple modules, multiple APs, and multiple environments, and other devices are fine on those channels, that would be quite interesting.

@dalbert2
Contributor

dalbert2 commented Feb 25, 2023

@someburner thanks; I understand crystal temperature/mfgr/aging tolerances and how to compensate for them. I have been designing outdoor, battery-powered wireless mesh water and gas utility meters that operate in the 902-928MHz band for nearly 20 years, with almost a million devices in the field, so I understand the challenges of frequency stability (especially over temperature). Unfortunately, I have no visibility into what is being done by Espressif in this regard and am interested if anyone else knows.

The specific crystal tolerance requirements for WiFi were stated in a prior comment (20ppm and 25ppm depending on WiFi flavor). Also mentioned earlier, the AI Thinker ESP12F modules I'm using indicate that they contain 10ppm crystals, which are well within tolerance. Usually such crystals are 10ppm/10ppm (mfgr/temp tolerance). Crystal tolerance requirements depend on the bandwidth and modulation scheme: WiFi g/n OFDM subcarriers are spaced 312.5kHz apart (802.11b uses DSSS/CCK rather than OFDM) and the PHY is designed to accommodate automatic frequency correction, so WiFi should be more tolerant of temperature-induced crystal drift than most narrow-band 915MHz FSK systems of the sort you are probably using. A good overview of the 802.11 OFDM PHY is here: https://www.wirelesstrainingsolutions.com/understanding-ofdm-part-2-refresh/

As you surely know (but adding here for others who might be interested), increasing receiver sensitivity with low data-rate FSK requires increasingly narrow bandwidth (to keep increasing SNR). The narrower the bandwidth, the tighter the crystal tolerance required and at some point it becomes impractical. Newer modulation schemes (notably LoRa) avoid this and achieve very high sensitivity at low data rate using wide (125kHz and 500kHz) channels, eliminating the need for very tight crystal tolerances and achieving huge link budgets in many low data rate applications.

However, as I also mentioned, the tests I've been doing recently are being conducted entirely indoors so there is very little temperature variation. Moreover, whether manufacturing, temperature, or aging, crystal tolerance issues would likely affect all channels equally; so crystal drift seems unlikely to explain the frequency (channel) specific problems and intermittent behaviors I am seeing with the ESP8266 (and that I am not seeing with other devices). I've monitored the wifi spectrum for a while too and it is pretty quiet so the issue doesn't appear to be local interferers (which is not to say that out of band interferers couldn't still be causing problems). However, as of now, this still feels like a software problem to me and the title of this thread suggests software as well.
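
To put rough numbers on the crystal-tolerance argument, here is a minimal Python sketch. It is only an illustration under stated assumptions: the function names are made up for this example, channel centers are taken as 2407 + 5 × channel MHz for 2.4GHz channels 1-13, and the AP's own frequency error and any frequency correction in the PHY are ignored.

```python
# Rough carrier-offset arithmetic for a crystal with a given ppm error.
# Assumptions (not from the thread): 2.4 GHz channel centers follow
# 2407 + 5 * channel MHz for channels 1-13, and 312.5 kHz is the
# 802.11g/n OFDM subcarrier spacing. Helper names are hypothetical.

SUBCARRIER_SPACING_HZ = 312_500.0


def channel_center_hz(channel: int) -> float:
    """Center frequency in Hz of a 2.4 GHz WiFi channel (1-13 only)."""
    if not 1 <= channel <= 13:
        raise ValueError("only channels 1-13 are handled in this sketch")
    return (2407 + 5 * channel) * 1e6


def carrier_offset_hz(channel: int, ppm: float) -> float:
    """Carrier frequency error in Hz caused by a crystal error of `ppm`."""
    return channel_center_hz(channel) * ppm * 1e-6


if __name__ == "__main__":
    for ppm in (10, 20):
        for ch in (1, 6, 13):
            off = carrier_offset_hz(ch, ppm)
            share = 100 * off / SUBCARRIER_SPACING_HZ
            print(f"{ppm} ppm, ch {ch}: {off / 1e3:.2f} kHz ({share:.1f}% of spacing)")
```

Under these assumptions a 10-20ppm error works out to roughly 24-49kHz of carrier offset, under a fifth of the subcarrier spacing, and the offset differs by less than 3% between channel 1 and channel 13. That is consistent with the point above that crystal error alone would not be expected to single out the band-edge channels.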

@someburner

Thanks for the link! Was looking for something like that. I wasn't suggesting it was a hardware problem (in our experience, it is extremely rare to be a hardware problem unless subjected to extreme environmental stresses or physical damage), but rather that in some cases I could see there being a big enough PPM error that certain ESP/router combos don't work very well. I also found a paper about activity dips, which is what I was referring to. Like you say, the crystals should be within tolerance, but without knowing the source of these crystals it could be worse than specified.

I agree this sounds like a software problem. I haven't witnessed channel-specific problems myself but I've also never really audited that across our fleet.

@CRCinAU

CRCinAU commented Jun 21, 2023

> That isn't just "likely" it has already been proven.
> PIO_FRAMEWORK_ARDUINO_ESPRESSIF_SDK22x_190313 should work fine on 802.11n
> Any later version of the SDK22x is not.

Well, shit.

This has been my needle in the haystack.

I noticed today that none of my ESP8266 devices were able to connect using 802.11n and everything was falling back to 802.11g.

As soon as I added -D PIO_FRAMEWORK_ARDUINO_ESPRESSIF_SDK22x_190313 to my build flags, it works perfectly again - and connects using 802.11n.
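
For anyone wanting to try the same thing, a PlatformIO configuration along these lines should select that SDK build. This is only a sketch: the environment name and board are assumptions for illustration, and only the `-D` flag itself comes from this thread.

```ini
; platformio.ini (sketch; env name and board are assumptions)
[env:d1_mini]
platform = espressif8266
board = d1_mini
framework = arduino
; Select the NONOS SDK 2.2.x build (190313) reported above to restore 802.11n
build_flags = -D PIO_FRAMEWORK_ARDUINO_ESPRESSIF_SDK22x_190313
```

After rebuilding and flashing, the router's client list should show whether the device associates in 802.11n or falls back to 802.11g.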
