Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

device handle hotfixes for 2.7 #38834

Merged

Conversation

mbolivar-nordic
Copy link
Contributor

@mbolivar-nordic mbolivar-nordic commented Sep 24, 2021

Fixes: #38181

When CONFIG_USERSPACE is enabled, the ELF file from linker pass 1 is
used to create a hash table that identifies kernel objects by address.
We therefore can't allow the size of any object in the pass 2 ELF to
change in a way that would change those addresses, or we would create
a garbage hash table.

Simultaneously (and regardless of CONFIG_USERSPACE's value),
gen_handles.py must transform arrays of handles from their pass 1
values to their pass 2 values; see the file's docstring for more
details on that transformation.

However, it is possible that a device has more direct dependencies in
the pass 2 handles array than its corresponding devicetree node had in
the pass 1 array. When this happens and gen_handles.py has no room to fill
the end of the array with a final DEVICE_HANDLE_ENDS,
the build fails and leaves users with no recourse to fix it. That's a
formerly theoretical showstopper, which became a real life showstopper when
the "supported" devices patchset landed in #37836.

To work around this for now:

  • revert the 'supports' patches; which seem to have other problems anyway (see item 4. in device: insert padding to ordinal array #38487 (review))
  • although the reverts make the problem go away for current tests, add a new config option, CONFIG_DEVICE_HANDLE_PADDING, that users can use to hack around this limitation in case they run into the problem elsewhere

We'll chase a more invasive but better fix after 2.7 is released and try to backport it if we can.

Copy link
Collaborator

@tejlmand tejlmand left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The fact that users are informed during build (linking stage) on what to do / workaround the issue when this error occurs is very good.

Accepted as a short term solution.

Copy link
Member

@dcpleung dcpleung left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM as a workaround.

@mbolivar-nordic
Copy link
Contributor Author

Build errors seem to be related to an inability to "nest" UTIL_LISTIFY calls within each other, which happens in some wrappers that allocate devices using UTIL_LISTIFY. Still investigating.

@mbolivar-nordic
Copy link
Contributor Author

inability to "nest" UTIL_LISTIFY calls within each other

Yep, that was it. I pushed a new version that avoids UTIL_LISTIFY.

We have code in the tree that's doing a UTIL_LISTIFY loop over DEVICE_DEFINE. We therefore can't use UTIL_LISTIFY within DEVICE_DEFINE as I did in my initial attempt. Harbison & Steele explains why:

Macros appearing in their own expansion---either immediately or through some intermediate sequence of nested macro expansions---are not reexpanded in standard C [...] Older C preprocessors do not detect this recursion, and will attempt to continue the expansion until they are stopped by some system error.

E.g. this:

#define BAR(i, j, ...) int bar_##i_##j;
#define FOO(j, ...) UTIL_LISTIFY(2, BAR, j)
UTIL_LISTIFY(3, FOO)

expands to this:

UTIL_LISTIFY(2, BAR, 0) UTIL_LISTIFY(2, BAR, 1) UTIL_LISTIFY(2, BAR, 2)

and stops there.

@ceolin
Copy link
Member

ceolin commented Sep 27, 2021

@mbolivar-nordic Sorry for the late review, isn't e7f2b73 enough to live with the "supported" devices API ? What are the other problems in this API ? I saw problems with the fix proposed, but if adding paddings is a viable workaround why not keep that API ?

@cfriedt cfriedt added the backport v2.7-branch Request backport to the v2.7-branch label Sep 27, 2021
@cfriedt
Copy link
Member

cfriedt commented Sep 27, 2021

Is the undefined Kconfig a false positive?

@mbolivar-nordic mbolivar-nordic added the bug The issue is a bug, or the PR is fixing a bug label Sep 28, 2021
@mbolivar-nordic
Copy link
Contributor Author

Is the undefined Kconfig a false positive?

Yes

@mbolivar-nordic
Copy link
Contributor Author

isn't e7f2b73 enough to live with the "supported" devices API ?

No because of point 4. in #38487 (review)

@carlescufi carlescufi added the dev-review To be discussed in dev-review meeting label Sep 29, 2021
@cfriedt
Copy link
Member

cfriedt commented Sep 30, 2021

@mbolivar-nordic - please rebase and fix conflicts

@mbolivar-nordic mbolivar-nordic removed the dev-review To be discussed in dev-review meeting label Sep 30, 2021
This reverts commit 4c32e21 with some
manual conflict resolution.

It's not clear that the supported devices are being properly computed,
so let's revert this for v2.7.0 until we've had more time to think
it through.

Signed-off-by: Martí Bolívar <[email protected]>
This reverts commit b01e41c.

It's not clear that the supported devices are being properly computed,
so let's revert this for v2.7.0 until we've had more time to think
it through.

Signed-off-by: Martí Bolívar <[email protected]>
This reverts commit 0c6588f.

It's not clear that the supported devices are being properly computed,
so let's revert this for v2.7.0 until we've had more time to think
it through.

Signed-off-by: Martí Bolívar <[email protected]>
This reverts commit ec331c6.

Although it's a valid simplification under the assumption that we're
going to be padding the array out anyway, it would use extra ROM if we
fix the build system issues that are currently forcing gen_handles.py
to introduce extra padding in the handles arrays for linker pass 2.

On the (perhaps optimistic) assumption that we're going to fix the
build system, let's get rid of a commit that would get in the way. The
extra "complexity" in device_required_handles_get() is trivial.

This gets rid of a comment describing the linker passes, but the
structure of the comment is a bit misleading (and it contains
incorrect information for the results of pass 2: the terminator at the
end is DEVICE_HANDLE_ENDS, not DEVICE_HANDLE_NULL).

Signed-off-by: Martí Bolívar <[email protected]>
When CONFIG_USERSPACE is enabled, the ELF file from linker pass 1 is
used to create a hash table that identifies kernel objects by address.
We therefore can't allow the size of any object in the pass 2 ELF to
change in a way that would change those addresses, or we would create
a garbage hash table.

Simultaneously (and regardless of CONFIG_USERSPACE's value),
gen_handles.py must transform arrays of handles from their pass 1
values to their pass 2 values; see the file's docstring for more
details on that transformation.

The way this works is that gen_handles.py just pads out each pass 2
array so its length is the same as its pass 1 value. The padding value
is a repeated run of DEVICE_HANDLE_ENDS values. This value is the
terminator which we look for at runtime in places like
device_required_handles_get(), so there must be at least one, and we
error out in gen_handles.py if there's no room in the pass 2 array for
at least one such value. (If there is extra room, we just keep
inserting extra DEVICE_HANDLE_ENDS values to pad the array to its
original length.)

However, it is possible that a device has more direct dependencies in
the pass 2 handles array than its corresponding devicetree node had in
the pass 1 array. When this happens, users have no recourse, so that's
a potential showstopper.

To work around this possibility for now, add a new config option,
CONFIG_DEVICE_HANDLE_PADDING, whose value defaults to 0.

When nonzero, it is a count of padding handles that are inserted into
each device handles array. When gen_handles.py errors out due to lack
of room, its error message now tells the user how much to increase
CONFIG_DEVICE_HANDLE_PADDING by to work around the problem.

It looks like a real fix for this is to allocate kernel objects whose
addresses are required for hash tables in CONFIG_USERSPACE=y
configurations *before* the handle arrays. The handle arrays could
then be resized as needed in pass 2, which saves ROM by avoiding
unnecessary padding, and would avoid the need for
CONFIG_DEVICE_HANDLE_PADDING altogether.

However, this 'real fix' is not available and we are facing a deadline
to get a temporary solution in for Zephyr v2.7.0, so this is a good
enough workaround for now.

Signed-off-by: Martí Bolívar <[email protected]>
@mbolivar-nordic
Copy link
Contributor Author

Rebased. I removed the comment causing the kconfig false positive also, just to pacify the linter.

@cfriedt cfriedt merged commit a627666 into zephyrproject-rtos:main Oct 1, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area: API Changes to public APIs area: Device Model area: Devicetree area: Kernel area: Tests Issues related to a particular existing or missing test backport v2.7-branch Request backport to the v2.7-branch bug The issue is a bug, or the PR is fixing a bug
Projects
None yet
Development

Successfully merging this pull request may close these issues.

tests/drivers/uart/uart_basic_api/drivers.uart.cdc_acm fails to build
9 participants