-
Notifications
You must be signed in to change notification settings - Fork 342
Initial support for framebuffer backend #2410
Conversation
20dd7f6
to
28e2291
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have not looked at the fbdev code, just the session handling triggered by the changes to logind. There is no good way to combine logind with non-drm gfx devices as attempted here. I'd recommend sticking to the direct backend instead.
Whether an fbdev backend makes sense also needs to be discussed. See #2126. fbdev is an ancient, deprecated interface mostly emulated in the kernel, which speaks against adding support with the associated maintenance overhead. Expect significant pushback. The completely trashed CPU in the video also doesn't seem to suggest fbdev as a practical solution.
If I am not mistaken, Google Nexus 4's Adreno 320 is supported by freedreno, while the Samsung Galaxy S2's Mali400 should be supported by Lima. Thus, they should both be usable through DRM. Presenting a reason for why these cannot be used through DRM would make sense, and if so, why fbdev should be supported over fixing DRM for these devices. I do realize that many more devices are targeted than the two test examples, but it helps to have devices to talk about.
I'll leave the rest of this discussion up to someone else.
// if we have a drm fd we don't depend on this | ||
if (session->has_drm) { | ||
// if we have a drm/framebuffer fd we don't depend on this | ||
if (session->has_drm || session->has_fb) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is wrong. The logic here can only be skipped on has_drm due to it signalling that a drm device is opened through logind, and that listening for a PauseDevice signal on that that device is sufficient.
As you can't open the framebuffer device through logind, has_fb is not useful and cannot be used to bail out here.
// directly. | ||
return open(path, O_RDWR); | ||
} | ||
#endif |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd say that using the logind backend to open fbdev is probably wrong, and that if such device is to be opened, it should probably be done with the direct backend instead.
ret = wl_event_loop_dispatch(event_loop, -1); | ||
if (ret < 0) { | ||
wlr_log(WLR_ERROR, "Polling error waiting for 'CanGraphical' on seat %s", | ||
char *no_wait = getenv("WLR_SESSION_LOGIND_NO_WAIT_CAN_GRAPHICAL"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is not nice. Another nail in the coffin for using logind.
size_t ret_len, int ret[static ret_len], const char *str) { | ||
#if WLR_HAS_FBDEV_BACKEND | ||
static int open_if_fbdev(struct wlr_session *restrict session, | ||
const char *restrict path) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is not necessary. Any checks can be done in the backend on create, and attempt_fbdev_backend can skip things as necessary.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My intention was to make this consistent with open_if_kms
and attempt_drm_backend
. Why would it be better to do this in attempt_fbdev_backend
instead?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, yes, that's true - I forgot the open_if_kms drmGetVersion check. That's what happens when you try to remember things. :)
I believe it would be quite simple to have the checks for both drm and fbdev in their respective attempt_backend
implementations, in which case we can just calll wlr_session_open_file
directly and remove open_if_kms
altogether.
(To people watching, please don't attach
Agreed, the patches become much nicer if it doesn't use logind. Can you give me a hint on how to start wlroots with the direct session properly? I'm always getting this error, even after my user joined the
I understand your concerns and appreciate that you've taken the time to review parts of this PR. The postmarketOS use case is different from "my hypervisor can't do DRM" as in #2126: there are over a billion legacy Android phones in the world. Oftentimes, the hardware is still working as well as on the day they were bought, it's just the outdated software that makes them practically electronic waste. postmarketOS boots on ~200 different device models, which have the screen working with framebuffer (see this table). If wlroots worked on them with reasonable performance, people could run sway + casa, phoc (Phosh) and cage on any of those legacy phones to gain something like a raspberry pi, but with a screen already attached and working. I can think of tons of projects to do with them, from podcast player to displaying data on a small screen over wifi in your car (a friend asked me about just that) to whole school classes being introduced to programming and Linux.
Good point. As seen in the video, all four CPUs of the phone are at ~80%, even if sway is just idling and nothing but the 768x42px toolbar gets redrawn every second. I'm just running weston on the same device for comparison (also with fbdev backend), and the CPU idles around 0 if the screen is just idling. When I open the GTK+3 demos and keep scrolling up and down in a list (so it constantly redraws), one CPU is at ~80% until I stop scrolling, then it goes back to ~0. (If there is demand, I could also make a video of that.) So there appears to be lots of room for optimization. I'll look into improving it (and possibly rebase it on the swapchain branch even before it's merged, that should give a good boost).
Correct, the two phones I've used as example should work with freedreno and lima. But only with a close to mainline kernel, meaining all drivers from the downstream kernel need to be forward ported / rewritten. Not just the display, because a phone where the display works with DRM, but which cannot charge, is not that useful. With the downstream kernel on the other hand, usually only a few things need to be patches to make it build and run postmarketOS. Our community is involved in mainlining and prefers (close to) mainline kernels wherever possible. But unfortunately, this is an huge task, and not feasible to be done for most of the enormous amount of Android device models out there.
No need to answer to the lengthy explanation above then. But please don't overlook the question above that. Thanks! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you give me a hint on how to start wlroots with the direct session properly?
I'm always getting this error, even after my user joined the
tty
group:
[ERROR] [backend/session/direct.c:171] Cannot open /dev/tty: No such device or address
We should be able to get past this by explicitly specifying a TTY, rather than relying on the process-local /dev/tty
:
# Assuming one wants the first TTY - replace compositor as needed.
WLR_DIRECT_TTY=/dev/tty1 WLR_SESSION=direct sway
Depending on how this all turns out, I could also consider adding support for fbdev to seatd. It wouldn't be much work, but as fbdev access can't be revoked, it won't be as nice as drm with applications possibly perpetually trashing the screen, and might need to be hidden behind a compile-flag.
The postmarketOS use case is different from "my hypervisor can't do DRM" as in #2126: there are over a billion legacy Android phones in the world.
I do believe that @ascent12's reply still applies in this case with respect to the ongoing renderer redesign. Such issues would likely be the primary technical concern.
The completely trashed CPU in the video also doesn't seem to suggest fbdev as a practical solution.
I'm just running weston on the same device for comparison (also with fbdev backend), and the CPU idles around 0 if the screen is just idling.
Ah, I see. This makes sense when looking at the fbdev code. I added a review comment that might help address this.
For the rest, I'll summon @ascent12 and @emersion (currently on vacation, so expect a bit of delay).
struct wlr_fbdev_output *output = data; | ||
wlr_output_send_frame(&output->wlr_output); | ||
wl_event_source_timer_update(output->frame_timer, output->frame_delay); | ||
return 0; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sending frame events indiscriminately here is likely the cause of your CPU burn, by triggering unnecessary output commits.
What you want to do is to only send frame events if the output has been committed, which is what the DRM and Wayland backends do, using page flip handlers and frame callbacks, respectively.
Try something like this:
struct wlr_fbdev_output {
...
bool timer_running;
bool committed;
}
static bool output_commit(struct wlr_output *wlr_output) {
...
output->committed = true;
if (!output->timer_running) {
wl_event_source_timer_update(output->frame_timer, output->frame_delay);
output->timer_running = true;
}
...
}
static int signal_frame(void *data) {
struct wlr_fbdev_output *output = data;
if (output->committed) {
output->committed = false;
wl_event_source_timer_update(output->frame_timer, output->frame_delay);
wlr_output_send_frame(&output->wlr_output);
} else {
output->timer_running = false;
}
return 0;
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Regarding this block:
output->committed = true;
if (!output->timer_running) {
wl_event_source_timer_update(output->frame_timer, output->frame_delay);
output->timer_running = true;
}
First I put it in the if (box_width)
condition, because the outer condition if (wlr_output->pending.committed & WLR_OUTPUT_STATE_BUFFER) {
gets called often with a request to draw an empty damaged area (box_width = 0). However, when I put the snippet above inside the if (box_width)
condition, wlroots will only draw one frame and then stop. Do you happen to know why that's the case?
If I put it inside the outer if (wlr_output->pending.committed & WLR_OUTPUT_STATE_BUFFER) {
condition, wlroots continues to draw frames, but is having a high CPU load again even if nothing changes on the screen.
Thanks a lot for all the help, @kennylevinsen!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay, so I dug a bit further into it, and I think I managed to work around the issue:
static bool output_attach_render(struct wlr_output *wlr_output,
int *buffer_age) {
...
if (buffer_age != NULL) {
*buffer_age = 1; // HACK HACK HACK
}
return true;
}
static bool output_commit(struct wlr_output *wlr_output) {
struct wlr_fbdev_output *output =
fbdev_output_from_output(wlr_output);
if (!output_test(wlr_output)) {
return false;
}
if (!output->backend->session->active) {
return false;
}
...
if (wlr_output->pending.committed & WLR_OUTPUT_STATE_BUFFER) {
...
wl_event_source_timer_update(output->frame_timer, output->frame_delay);
}
return true;
}
static int signal_frame(void *data) {
struct wlr_fbdev_output *output = data;
wlr_output_send_frame(&output->wlr_output);
return 0;
}
For testing, I did a quicker implementation of the timer change: Set the timer in output_commit and leave it at that. If commit was called in quick succession, this might mess with the timing, but oh well.
The main issue, however, was the buffer_age: See here in wlr_output_damage_attach_render
:
wlroots/types/wlr_output_damage.c
Line 158 in 87836dc
if (buffer_age <= 0 || buffer_age - 1 > WLR_OUTPUT_DAMAGE_PREVIOUS_LEN) { |
buffer_age == 0
is handled the same was as too old a buffer, by requesting re-render and thus another commit. This leads to commits ad nauseam. I haven't looked into what the correct fix is here (return buffer_age = 0 for only first render?), but that's at least the cause.
Seems to render correctly with the above workaround in some simple tests, and does so with lower CPU utilization.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll definitively give this a try over the next days. Thanks again, @kennylevinsen!
EDIT: I had to take care of other things, and next week I'm on vacation. I'll continue with the PR when I'm back.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This workaround works great, the CPU idles at 0% now when doing almost nothing (e.g. rewriting the clock every second in sway) now, instead of ~80%!
if (wlr_output->pending.committed & WLR_OUTPUT_STATE_MODE) { | ||
assert(wlr_output->pending.mode_type == WLR_OUTPUT_STATE_MODE_CUSTOM); | ||
} | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You also need to fail if the session is inactive. For DRM, this is done here:
Line 546 in 87836dc
if (!drm->session->active) { |
That's an interesting use-case.
Right, using the DRM interface would be better. See also: https://dri.freedesktop.org/docs/drm/gpu/todo.html#convert-fbdev-drivers-to-drm
Well... Expect performance to not be good, since we are downloading frames from the GPU after rendering and then uploading them again to the GPU for scanout. Also the timer-based vsync is not great. But these are fbdev limitations, not issues with your code. I'm also wondering if it would be easier to just re-use the headless backend code instead of copy-pasting it (ie. add a headless output field in the fbdev output struct). Not sure it would necessarily make sense. Thanks a lot of working on this. On the whole I'm not sure I want to merge this, because fbdev is legacy. Weston supports fbdev but has been meaning to drop it for a while.
|
Using this backend as an extra plugable module would be great! |
Thanks!
Okay, I understand that it's not desired to add support for legacy code in wlroots. Making fbdev a third-party backend sounds like a good middle ground, I'll look into that then. Can you point me at an example for third-party wlroots backends, or how the logic to load them in the compositor would look like? |
Move all logic from wlr_session_find_gpus() to a new find_devs() function. In addition to finding DRM devices, this will be used to find framebuffer devices when introducing the fbdev backend.
Allow running modern wlroots based compositors on downstream kernels of old Android phones/tablets. In theory, this makes over a billion devices with hopelessly outdated Android stacks [1] useful again: after swapping out the userspace with postmarketOS + wlroots, they can at least be used for raspberry-pi-with-touchscreen like projects. It would be best to run mainline Linux on those devices, then run wlroots on DRM instead of framebuffer. We are following this approach in postmarketOS too, but it's too time consuming to be done for more than a small percentage out of the countless Android devices in existence. Run the backend by setting WLR_BACKENDS="fbdev,libinput" for now. I have tried to add it to wlr_backend_autocreate to run it automatically when no DRM devices are found, but this needs further refactoring: after failing to create the DRM backend, the session/wlr_backend can't be reused. [1]: https://www.which.co.uk/news/2020/03/more-than-one-billion-android-devices-at-risk-of-malware-threats/
Allow to skip waiting for logind to mark the session with CanGraphical. When using fbdev backend, it never happens.
Enable the fbdev backend in Alpine and Arch Linux. While at it, put each -D arg into its own line to improve readability.
The new logic in the compositor would be to conditionally use your backend instead of struct wlr_backend *backend = wlr_backend_autocreate(display);
if (backend == NULL) {
backend = fbdev_backend_create(display);
} |
Thanks, @emersion! I'll close this PR and follow up with the current status of the third-party backend here: |
Allow running modern wlroots based compositors on downstream kernels of old Android phones/tablets. In theory, this makes over a billion devices with hopelessly outdated Android stacks useful again: after swapping out the userspace with postmarketOS + wlroots, they can at least be used for raspberry-pi-with-touchscreen like projects.
It would be best to run mainline Linux on those devices, then run wlroots on DRM instead of framebuffer. We are following this approach in postmarketOS too, but it's too time consuming to be done for more than a small percentage out of the countless Android devices in existence.
Since modern setups won't need this, I've made it opt-in (
-Dfbdev-backend=enabled
). If somebody is worried about who would maintain this upstream in wlroots: the postmarketOS community and I would be happy to make sure that this keeps on working and gets improved.Demo
I've tested this on both the Samsung Galaxy SII and Google Nexus 4 and made a video of the latter running Sway and Phosh, with gnome-chess and the kgx terminal app:
https://postmarketos.org/static/video/2020-09/wlroots-fbdev-lg-mako.webm
If somebody wants to try it on their phone, see these notes:
https://wiki.postmarketos.org/wiki/User:Ollieparanoid/Run_wlroots_with_fbdev
Beyond this PR
Improvements for future patches, that seem to be out of scope for this initial version:
wlr_backend_autocreate()
: if no DRM devices were found, fall back to fbdev (I've tried this, but after failing to create the DRM backend, the session/wlr_backend can't be reused. So this probably needs further refactoring?)Questions
/dev/fb0
etc. With the current version,logind_take_device
just callsopen
instead of asking logind, if it is a framebuffer device.