-
Notifications
You must be signed in to change notification settings - Fork 46.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Validate server markup against the DOM rather than string checksum #6618
Validate server markup against the DOM rather than string checksum #6618
Conversation
…g a string checksum to walking the server-generated DOM and comparing to the client component tree. Reworked the way that child crawling works to get all unit tests to pass. Added a bunch more unit tests and fixed code so that those worked too. Still a lot more tests to write, esp around interactivity. Events were broken when reconnecting markup. Fixed that. Inline styles caused a markup mismatch. Fixed that. Small refactor of server unit tests to make them more independent. Reorganized markup reconnection tests for better readability. Added a big unit for reconnecting ES6 class components, createClass components, pure components, and literal components. Added support for boolean attribute values. Also added unit tests that verify that user interaction with inputs before client-side reconnect does not cause a markup mismatch. Changed the test descriptions to fit jasmine's syntax. Added a few simple unit tests for the other namespaces (svg and mathml) Added a bunch of unit tests for wrapped components, including various kinds of uncontrolled form inputs. Fixed the bugs those tests revealed. Added two unit tests for self-closing tags. Added unit tests for newline-eating tags. Added unit tests for uncontrolled multiple selects. Added some unit tests for rendering controlled input fields. Added more tests for controlled and uncontrolled fields. Renaming and other polish Added unit tests for refs Polished up the markup mismatch error messages and API doc. Added a few component hierarchy unit tests. Updated documentation on the nodeToReuse argument to mountComponent. Changed its name to nodesToReuse. Added a path to the node in question for markup mismatch errors.
* @param {String} text to truncate | ||
* @param {Number?} maximum length | ||
*/ | ||
function abridgeContent(text, maxLength = 30) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wasn't sure if there was a utility function that already did this; please let me know if there is.
Thank you for your pull request and welcome to our community. We require contributors to sign our Contributor License Agreement, and we don't seem to have you on file. In order for us to review and merge your code, please sign up at https://code.facebook.com/cla - and if you have received this in error or have any questions, please drop us a line at [email protected]. Thanks! |
context, | ||
shouldReuseMarkup ? container.firstChild : undefined | ||
); | ||
} catch (e) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This error handling had to move over from _mountImageIntoNode
because now a markup mismatch results in a thrown Error from mountComponent
.
The build isn't passing, but the error it's giving is pretty strange. I'm going to close and reopen to force a second build just to make sure the error is reproducible. |
Opening to run another build. |
Thank you for signing our Contributor License Agreement. We can now accept your code for this (and any) Facebook open source project. Thanks! |
this.node = node; | ||
this.serverVersion = serverVersion; | ||
this.clientVersion = clientVersion; | ||
this.stack = (new Error()).stack; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will this reliably grab stack information across all browsers?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
From the linked document on developer.mozilla.org in the comments:
Note that the thrown MyError will report incorrect lineNumber and fileName at least in Firefox.
I can do some research into other browsers.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reading further on this at this MDN article and this MSDN article, it seems to me that this will grab a stack for all browsers where a stack is available. Stack is completely non-standard, so I can't guarantee that it will work absolutely everywhere.
It should be fine on platforms where there is no stack property.
Note also that every place in the current code where MismatchError is thrown, it is caught without looking at the stack.
Given all that, I'm going to leave the code as is. Let me know if you disagree, and thanks for the comment!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You might need to throw it to guarantee that the stack trace is serialized: unexpectedjs/unexpected@5dcd441
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the feedback, @Munter!
I'll look at this again and patch it up if we decide to move forward with this PR.
@aickin what about when there's multiple text nodes, that could collapse to form the same text content? like |
@ljharb thanks for replying!
This implementation throws a mismatch error if you have two text components in the server markup and only one text component in the client (and vice versa). The reason is that each non-empty text component generate 3 DOM nodes (two comments and one text), so the markup is not the same between two text components and one. In theory, you could make the client side code more lenient and accept two adjacent text components as the same as one single text component, but that would involve modifying the DOM to remove some of those extra comment nodes, and that seems like a bad idea to me. Thoughts?
As far as I know (and I may be wrong!), other than the |
I feel like as long as the textContent is the same, why would the client care? I don't even know why React renders (I asked about HTML comments because I've suggested that in dev mode, React server rendering produces an HTML comment next to each component's HTML that closely mimics its creator JSX, to help debugging) |
@ljharb it renders it as 2 text nodes because as they are separate nodes as part of the diff algorithm. It makes sense... as if they are different brackets then the chances of them changing separately is highly likely, ie:
Each bracket ends up as an indexed argument... (ie, the children array). In practice this is a highly reliable signal as to which elements / children should be separate from others. Arguably an implementation detail, but keep in mind that until 15, this would have rendered 2 entirely different spans anyway. |
Agree with @lelandrichardson above. Also, remember that while JSX could in theory do static analysis in your example and merge those strings, it's not possible in a general case to know whether the value in curly braces will be a string or a component. For example:
So merging text nodes at the ReactElement level is not possible in the general case. However, I could in theory imagine a renderer that, while walking the tree, attempted to merge sibling text nodes at runtime. It would add some complexity to walking the tree, though, and I think to tree diffs as well (although I don't know as much about that). In any case, I think it's orthogonal to this PR, as I'm trying to keep current behavior, but it would certainly make an interesting PR of its own! |
I'm more asking why the client code would need to care if the client code generated, say, two text nodes, when the server had generated one. Why would I expect to see a console warning that my markup differed, just because the number of text nodes was different, when the effective text was the same? |
I think theoretically it doesn't need to, but it would make the algorithm to bootstrap onto existing markup more complicated, and it would likely have to mutate the DOM on first pass (to separate the one text node into two, and add the comment nodes) which seems to be biting off more than this PR needs to be? The goal of this PR would be to let a separate SSR (potentially) diverge from the React codebase while still making it possible to have client-side react core attach itself to the server rendered markup. We can still expect that the same application code (ie, the actual react components) is being run to render the string server-side, so I think the strings being treated as two separate nodes isn't going to cause any problems in practice? The main issue we were trying to solve was the fact that the ordering of attributes could potentially diverge as the code bases diverge, which with the previous method of using the checksum, would have presented a code synchronization problem. Also, potentially related to your question, there is a separate proposal from this PR that react could decide to only blow away the sub-tree that was mismatched, rather than the entire tree, which could make something like you're proposing sort of possible by default. Although it would still be mutating the DOM for all of those text nodes on initial pass, which seems like a nice failsafe, but not a great default. |
@aickin btw, I read through this PR and it all looks pretty good to me, but I don't have a lot of context for this code base. I'm going to do a second pass with performance in mind now that I have a better understanding of what you've changed, and see if I have anything to add from that angle. The tests seem fairly thorough, although i'm not very familiar with the different strange corner cases of HTML. Has this code been run on any large react code bases to get some "coverage" that way? Perhaps we could pull down the branch and run it on some large components? Great work! 👍 |
@aickin Amazing work!
I think we would like to do something like this eventually, I think there is an open issue for this somewhere.
Merge... or split.
Yes, totally agree. This is a nice beefy PR. I just skimmed it, but it looks good at first glance. I'll need to go through it in more detail later today, but I'm excited. cc @facebook/react-core @sebmarkbage @tomocchino for situational awareness (also, relevant issue is #6420), cc @spicyj and @gaearon to help find the edge cases. |
@lelandrichardson Thanks in advance for your help!
@jimfb Me, too! Thanks so much for your guidance already on this strategy, and I look forward to your feedback. |
@aickin updated the pull request. |
I see a lot of value in getting rid of the HTML generation code on the client. I know @spicyj doesn't agree with me here but I'll try to make a case for it. As for this diff I see a major issue that it rerenders upon a mismatch. That's a behavioral difference because the side-effects that happen in componentWillMount will happen twice without a clean up phase if I read this correctly. We'll need to be a bit more aggressive about being resilient to this because of error boundaries and incremental reconciler. However, it seems like a pretty significant issue that may cause memory leaks and weird artifacts if not handled correctly. I see a plausible way forward to at least use the error boundaries path. However, that is also less than ideal because we've already rendered most of it. We might as well just patch it up. I think a better way forward would be to patch up the DOM using the same logic as for creating it and only logging errors. Not throwing. At that point the checksum vs. reading the DOM becomes a lot more interesting. Because to do "fix up" the current logic have to generate compatible HTML. To get fix up to work without HTML generation you would do most of the same thing in both solutions. Therefore my preference would be something that walks the DOM and sets all the properties as if it was a new tree only it reuses it. Optionally comparing them along the way. |
This is something I hadn't considered at all. This definitely makes the case for the "patching as you go" approach to be a necessary part of this PR. |
@sebmarkbage Thanks for looking over this PR!
That's a great point, and one that I hadn't considered. I thought the current code in master re-rendered on a markup mismatch, but I can see now that it doesn't. This is indeed tricky.
I don't know what this means. Can you explain (or point me in the direction of an explanation) of what "error boundaries path" means?
I'm probably missing something, but doesn't it have to compare to know which attributes in the DOM to remove? (And again, thanks for joining in!) |
@sebmarkbage Is that really something we want to support so badly, people generating mismatching markup on server/client and trying to patch it? It seems excessively expensive at first glance, comparing all properties on all nodes and elements for something that is an error. Also, not all elements are stateless and would respond appropriately to changing them, like object/embed and surely some web components, maybe audio/video in edge-cases? Also, how far would you want to take it? Should it try to patch the DOM hierarchy as well? That doesn't seem like a good idea without nodes being tagged somehow. If not, is there really a point? |
Thinking about this more, I got a bit confused about this point. It seems to me that if At the same time, I recognize that there could be other, non-memory leak side effects in |
@aickin Whether explicitly defined or not, I believe a common pattern is to do some setup/initialization in
You can also imagine the listener doing something such as logging, record keeping, etc. that would have some hard to debug effects if it all of a sudden started silently getting called twice. ("Silently" is maybe a bit generous here since this would only happen in situations where the markup differed and an error was logged to the console. I think the question is: Does react support behavior in the "mismatched markup" case, or is that considered undefined behavior at that point anyway? |
Okay, we talked some more about this today. Server rendering is clearly important to all of you and we'd love to figure out a way to make it better, especially if we can make a significant performance improvement. Overall, I'm still a little shaky on the claim that this PR is necessary in order to build a standalone server renderer so we'd be interested to see if we can facilitate building a fast server rendering without requiring this as a first step. It sounds (based on comments from @jimfb) like you already have a working implementation of a server renderer that stacks on top of this work – would you mind posting it as a separate PR so we can take a look and get a better idea of how it works and discuss the best avenues to getting it landed? It seems like this PR has the potential to make a handful of things better in various ways. As I mentioned above, it seems like there are other individual solutions that would collectively give the same benefits, but it's clear this pull request has a lot of potential so we're interested in it if we can solve a few things together and not regress existing behavior. Here's a summary of what we'd like to see in this pull request to feel comfortable with it: As @sebmarkbage said above, we shouldn't execute user code (componentWillMount, render) twice in the case of a markup mismatch. There are two possibilities for how we can solve this. First, we can look at making this PR patch markup trees instead of throwing them away and starting from scratch. This is what was suggested above and also appeals to me personally the most because it showcases the unique value of this PR and does something that our existing design can't. However, patching existing markup comes with additional complications that we need to solve. Even if there is an opt-in way to accept markup differences, we'd still want to warn in most cases for a difference because differing content on the server and client would still usually point to a bug. We also need to solve how user-mutable state is treated if we're patching things up. In particular, it would be bad if you see a post that says "Ben says hi." (from the server) and start typing a comment below it, then as you're clicking Submit it changes to "Jim says bye." (from the client), because you would be misled. React currently solves this by blowing away the tree when a mismatch happens (@lelandrichardson yes, this is supported), but one option here might be to find a way to completely reset all user state, either by somehow setting the properties back or by recreating the DOM nodes and swapping them out. Second, upon noticing a mismatch we could completely recreate the DOM tree by reusing the React elements that were returned from In addition to solving the render-once concern, it would be good to get a better handle on the performance impact here -- what parts are slow and why. Finally, I'm uneasy with the idea that this would break when extensions add extra nodes. Unless someone (presumably @jimfb) did a lot of real-world logging on Facebook and verified with confidence it won't cause any problems, I'd like to stick with having some markers on the React nodes so we can identify them later. Do those requirements sound reasonable? Those are in addition to any code-level changes we'd end up with after reading through the patch in detail and reviewing it for both style/structure and correctness. I hope you can forgive us for wanting to tread carefully with this code. Even though Facebook doesn't use server-side rendering extensively, this code really is near the heart of React DOM and may affect what we can do in the future so we want to make sure to get it right and not rush instead of merging something now that we'll have to rip out a few months down the road. |
Awesome to hear!
To be clear, I don't really believe that fast server rendering requires this PR as a first step. I'm just trying to get to a faster renderer that supports streaming in whatever way I can, and this looked like the easiest path. I am wrong fairly frequently; it may well be that other paths are easier! (FWIW, while I don't think this PR is necessary for a faster streaming server renderer, I do still believe that this PR or something like it is probably necessary for a standalone server renderer. In addition to the issues I mentioned above, I found some more tricksy issues with object property ordering. In particular, it seems like the current checksum requires that all server renderers guarantee the property ordering of not only
Sure! I think it'll take me a little while (more than a day but less than a week?) to get it presentable enough for others to read. I will work on that. I also have a version in my head of a fast/streaming server renderer that doesn't depend on this PR; would that be interesting for the discussion as well?
Interesting to me that render shouldn't be run twice; my understanding was that render is supposed to be side-effect-free, but I'm probably missing some subtlety. Perhaps it's a case of how the API is documented to be used vs. how it is actually used in the current real world?
They do, but I'm not 100% clear which of them are requirements. Here's what I heard:
Is that a fair restatement?
Of course! It's your project, and you will have to live with the results of any PR, so it's absolutely fair to be deliberate about big changes. As for my next steps, unless y'all tell me to do something different, I think I'll work next on polishing up the server renderer I have in a separate PR and seeing if I can make it work without this PR for comparison. Given your feedback, I don't think it's a good use of my time to address the issues in this PR until y'all have a chance to look at that code and decide if it's a path you'd rather pursue. Does that make sense? Thanks! |
Side note about compression: #6618 (comment) I would like to see similar tests run on http/2 servers, and with brotli compression. https://www.cloudflare.com/http2/ |
As an alternative we can create a document/spec stating what a server-renderer should look like along with a suite of JS tests that a standalone renderer must pass in order to be compliant with This should be part of the main React repo so that new updates to React don't add any unexpected breakages. |
^ This is the idea that I like best, I think. This way there is nothing stopping other people from creating faster/better renderers that may differ. As an example we've already discussed, one renderer could be streaming, and another synchronous, but live in different code bases. |
I like that idea too @goatslacker. |
OK, so it took me a bit longer than I expected, but I just opened a WIP PR of a server renderer that supports streaming. That PR actually works on top of the current client validation in master and has around 600 unit tests to try to make sure it's up to spec. I originally was going to open two different PRs, one with a streaming renderer on top of this branch and one with a streaming renderer on top of master, but the renderers ended up being pretty similar, so it wasn't worth it to me to do both. A few thoughts on the streaming server renderer, in no particular order:
So, some questions I'd like to ask folks on this thread:
(And I said this in the renderer PR, but if it's all right with folks, I'd like to keep the discussion in this PR, except for direct comments on lines of code. I think it'd be helpful to have the discussion of which direction, if any, to go forward in one place. Thanks!) |
I just learned about the benchmark script in
So, loading up React gets about 6% worse with this PR, but cold VM (Note if you want to repro: I changed line 145 of |
Ping for updates~ |
it might improve performance of ssr a lot. mark for updates |
Very excited for this. Can't stream on React 15 just yet, hope this'll make it in soon! |
@aickin @goatslacker @spicyj @sebmarkbage do you have an update on this? It is a feature we are very interested on, and we would like to move it forward. Is there anything we can do to help with the PR? |
I believe this PR is now obsolete, given all the great work @sebmarkbage has done in React 16 to switch to a DOM walking rehydration strategy. I'm closing this PR and heartily thanking @sebmarkbage for his work and everyone else on this thread for their contributions! |
@aickin didn't see their release notes on SSR, where to find it? |
@jiyinyiyong There are no release notes yet; all the work I'm talking about is in master and will presumably be released in React 16. Cheers! |
IIRC,they don't use SSR too much internal,so that's why they do the works which relate to SSR at the end of Fiber.If you want to know more details about SSR,you can just search SSR in the pull request. |
Goal: this PR intends to implement step 1 of the proposal to pull apart server DOM rendering from client DOM rendering and enable server streaming. Instead of a string checksum, this PR walks the server-generated DOM to validate that the markup matches. This makes it somewhat more lenient than the current algorithm, which requires a character-for-character match. This in turn should eventually enable server renderers that are decoupled from the client code and as a result more performant.
Performance: at @sebmarkbage's suggestion, this PR only focuses on correctness, not performance. I have a fairly simple benchmark project with Selenium scripts, though, and I intend to run that tests in different browsers in the next few days. Any suggestions on areas of the PR that look suspect from a perf perspective would be welcome. I'm not super experienced in JS perf optimization.
Tests: All existing unit tests pass, and I added about 90 or so more for the new code path. I realize that might seem like overkill, but I found there were a surprising number of special cases and fiddly bits.
Notes: The basic implementation strategy is to add an optional argument to
mountComponent
/mountChildren
callednodesToReuse
nativeNodeToReuse
, whichareis the nodesin the DOM that the component in question is supposed to attach to.This argument can either be a single DOM node in the document (forNote that this DOM node is the first node corresponding to the component. In the case ofReactDOMComponent
orReactDOMEmptyComponent
) or an array of DOM nodes in the case ofReactDOMTextComponent
.ReactDOMComponent
orReactDOMEmptyComponent
which only produce one DOM node, it is the first and only DOM node, whereas withReactDOMTextComponent
, it is the first node of two or three. This optional argument made it pretty easy to isolate the PR's changes from the non-SSR client code path. Also, therenderToString
code path should be completely unchanged.I'll go through the individual files in the PR and try to add comments where I think they make sense. Please feel free to ask any questions. Also: apologies if I broke any conventions or rules; this is my first PR to React. And if there's a more constructive way to present these changes, please don't hesitate to let me know. Thanks!
Tagging @jimfb, @goatslacker, @lelandrichardson, and @lencioni, all of whom have expressed interest in this PR.