Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

clientRouterFilter causes random links to fail due to duplicate basePath #47486

Closed
2 tasks done
pago opened this issue Mar 24, 2023 · 25 comments · Fixed by #60542
Closed
2 tasks done

clientRouterFilter causes random links to fail due to duplicate basePath #47486

pago opened this issue Mar 24, 2023 · 25 comments · Fixed by #60542
Labels
area: app App directory (appDir: true) bug Issue was opened via the bug report template. locked Navigation Related to Next.js linking (e.g., <Link>) and navigation.

Comments

@pago
Copy link

pago commented Mar 24, 2023

Verify canary release

  • I verified that the issue exists in the latest Next.js canary release
  • I verified that the issue exists in the Next.js 13.3 release

Provide environment information

Operating System:
      Platform: darwin
      Arch: arm64
      Version: Darwin Kernel Version 22.3.0: Mon Jan 30 20:39:35 PST 2023; root:xnu-8792.81.3~2/RELEASE_ARM64_T8103
    Binaries:
      Node: 18.15.0
      npm: 9.5.0
      Yarn: N/A
      pnpm: N/A
    Relevant packages:
      next: 13.2.5-canary.16
      eslint-config-next: 13.2.4
      react: 18.2.0
      react-dom: 18.2.0

Which area(s) of Next.js are affected? (leave empty if unsure)

App directory (appDir: true), Routing (next/router, next/navigation, next/link)

Link to the code that reproduces this issue

None

To Reproduce

I have not found a reliable way to reproduce the issue. It seems to affect random links. It is reproducible in our actual production app. I am happy to test potential fixes, etc.

  • activate appDir in an existing application
  • have a basePath configured for that application
  • observe random links to fail loading

Describe the Bug

We have an existing application which was build with the pages router. Once we added appDir: true to the configuration we noticed that a single link in our application (/plan/configuration/activities) where we have the basePath configured to /plan/configuration) stopped working. Instead of navigating to the correct page, it would instead do a hard reload to the page /plan/configuration/plan/configuration/activities.

Through debugging, we were able to trace the issue to the clientRouterFilter flag. Setting that flag to false fixes the issue. Were noticed that https://github.com/vercel/next.js/blob/canary/packages/next/src/shared/lib/router/router.ts#L1098 was causing a hard navigation, due to matchesBflStatic being true. This part of the code also added the unnecessary additional basePath. It seems like the internal link is incorrectly classified as requiring hard navigation.

We tested using the Link component as well as using the router.push method directly. In both cases, the given URL was /activities.

Note that the link to /skills works fine. Only /activities is broken.

Expected Behavior

Activating appDir: true in an existing application does not cause issues with existing links.

Which browser are you using? (if relevant)

No response

How are you deploying your application? (if relevant)

next dev

@pago pago added the bug Issue was opened via the bug report template. label Mar 24, 2023
@github-actions github-actions bot added area: app App directory (appDir: true) Navigation Related to Next.js linking (e.g., <Link>) and navigation. labels Mar 24, 2023
@powerfulyang
Copy link

powerfulyang commented Apr 7, 2023

One month has passed, but this BUG still persists.
A simple way to reproduce:
goto https://nextjs.org/docs/advanced-features/measuring-performance and click Multi Zones.

scrnli_2023_4_7.15-38-39.mp4

@powerfulyang
Copy link

This BUG first appear in v13.2.4-canary.2

@JunlinPan-Crypto
Copy link

Same issue with duplicate basePath and hard navigation for a few routes.

It seems like the matchesBflStatic would be true in certain cases, which causes the above problem.

https://github.com/vercel/next.js/blob/canary/packages/next/src/shared/lib/router/router.ts#L1083

@JunlinPan-Crypto
Copy link

Disable clientRouterFilter in next config can work around this problem.

 experimental: {
    clientRouterFilter: false,
  },

@esponges
Copy link

In my case, neither downgrading to a version before v13.2.4-canary.2 nor setting the clientRouterFilter: false did the work.

@Marcosld
Copy link

Marcosld commented Jul 5, 2023

We are having the same problem...

@anirudh-eka
Copy link

anirudh-eka commented Aug 5, 2023

Okay, so I've reproduced the issue in a completely new project successfully. In this project, the base path is /docs. I have deployed it with vercel, you can take a look here:

  1. If you go to /docs/rendered-in-pages you will see a link "Go to Main (App directory)". If you inspect the html:
<a href="/docs">Go to Main (App directory)</a>
  1. But if you click on that link it will take you to, /docs/docs, which is a broken page.

As suggested on this thread, I turned off clientRouterFilter:

// next.config.js
experimental: {
    clientRouterFilter: false,
},

I deployed the fix to a different vercel deployment:

  1. If you go to /docs/rendered-in-pages you will see a link "Go to Main (App directory)". If you inspect the html:
<a href="/docs">Go to Main (App directory)</a>
  1. This time if you click on that link it will take you correctly to /docs.

Next version: 13.4.12
Repo: https://github.com/anirudh-eka/test-next-13

Perhaps this feature should be opt-in until stable?

@myxoh
Copy link

myxoh commented Aug 21, 2023

This configuration is also tied to this issue (causing full page navigations rather than client navigations): #54231

It seems to me the BloomFilter is leading to collisions - maybe the hash has too little entropy? (Just throwing some ideas but I'm not too familiar with this branch of the code)

@NicolasGiaco
Copy link

NicolasGiaco commented Sep 13, 2023

We're experiencing the same issue in our application.

We've set basePath: /app in our frontend application. However, occasionally the app redirects to localhost:3003/app/app when using the router.push function.

I've delved into the Next.js source code and arrived at the same conclusion as @myxoh.

In our application, the bloom filter is set as:

numItems: 1, errorRate: 0.01, numBits: 10, numHashes: 7, bitArray: [0, 0, 1, 0, 0, 1, 1, 1, 1, 1],

Due to these settings, this condition evaluates to true, leading to a handleHardNavigation with the variable asNoSlashLocale, which has a duplicated /app in the URL.

I attempted to reproduce the behavior in a new application and couldn't replicate the issue. I believe this is because the bloom filter settings in the new app are different:

numItems: 1, errorRate: 0.01, numBits: 20, numHashes: 7, bitArray: [0, 0, 0, 1, 1, 0, 1, 1, 0, 0, 1, 1, 0, 0, 0, 1, 1, 0, 1, 1],

How are the bloom filter parameters determined? While setting clientRouterFilter: false does work, what potential drawbacks does this approach have?

You can reproduce the issue using different bloom filter configurations at this link: https://codesandbox.io/s/javascript-forked-f4zl66?file=/index.js

@rflament
Copy link

We have the same issue with a dynamic url, it's very hard to debug: for a specific dynamic route pattern, one page for which the bllomfilter "contains" function return true and for another one it returns false. it creates a random experience for users, where some links will trigger a hard navigation to an unexisting url, and some will not. The only solution for now seems to not use next/link at all.

@ifeltsweet
Copy link

We also encountered this issue with one of the links. Setting clientRouterFilter: false resolved it. Seems like a pretty serious bug and it's not known if the current workaround can cause any other issues or not.

@Katona
Copy link

Katona commented Nov 22, 2023

Isn't the BloomFilter used in a conceptually wrong way?

  matchesBflStatic =
      matchesBflStatic ||
      !!this._bfl_s?.contains(asNoSlash) ||
      !!this._bfl_s?.contains(asNoSlashLocale)

I have never implemented or used it, so I might be wrong, but according to wikipedia:

A Bloom filter is a space-efficient probabilistic data structure, conceived by Burton Howard Bloom in 1970, that is used to test whether an element is a member of a set. False positive matches are possible, but false negatives are not – in other words, a query returns either "possibly in set" or "definitely not in set".

So contains can return true for elements not in the filter.

@ijjk
Copy link
Member

ijjk commented Dec 3, 2023

Hi, can anyone facing this issue with clientRouterFilter not set to false please confirm this is still an issue in the latest version of Next.js?

@crisvergara
Copy link
Contributor

@ijjk We just hit this issue on 14.0.1. We can upgrade and give it a shot. What version would have a fix, latest stable or canary?

@ijjk
Copy link
Member

ijjk commented Dec 4, 2023

Latest stable is sufficient, if you are still seeing it there a repo with reproduction would help to investigate further!

@Katona
Copy link

Katona commented Dec 5, 2023

@ijjk I still see it with 14.0.3. I believe it is causing by how the BloomFilter is used, see my comment above.

@crisvergara
Copy link
Contributor

@ijjk I trust @Katona when they say they still see it, because as far as I can tell the implementation of the client-side filter had not changed between 14.0.1 and 14.0.3. Their comment is correct: Bloom filters are probabilistic structures. There will be false positives in the filter that will create a full hard refresh for some number of pages for a large enough population of users.

I can't hand you my repo. It's a private company repo for an ecommerce site with a few thousand product pages. Some of these product pages are now causing a hard refresh thanks to this bloom filter. The fix is to disable the filter (which everyone in this thread has done via experiment.clientRouterFilter = false).

As a side note: it's not at all uncommon to store application state in Context. This state is being dropped whenever a page is hard reset. We're going to work around this by storing application state in session storage and reloading into the context, but this increases the amount of defensive boilerplate we keep in our application code to work around this bloom filter.

@crisvergara
Copy link
Contributor

crisvergara commented Dec 5, 2023

Digging a bit into the commit history, there had been a Bloom Filter in place for a long time, but the implementation of the filter had changed with this change: https://github.com/vercel/next.js/pull/49741/files Major change being which hash function had been implemented.

EDIT: Nevermind re: ^. The issue on our end is not that any implementation had changed, but that this is the filter used to determine whether or not a page is in app router. Anyone experiencing these issues are using both page router and app router. The branch we're having issues in is the first branch where we implemented two pages in app router, and now it's causing issues on various pages in page router. Given the disproportionate number of app router pages vs. page router pages, I suspect there's an issue in the implementation of the filter creation or hashing. I'll dig further and update with what I find.

@crisvergara
Copy link
Contributor

crisvergara commented Dec 5, 2023

I added a test that shows our specific case: Draft PR here. It fails on latest canary.

Ironically we have so few links in app router, it would be more efficient to not use a bloom filter at all. Disabling the client side filter entirely makes it so our app fails to hard navigate on a transition from pageRouter to appRouter, which is undesirable as well.

Edit: I'm going to set clientRouterFilterAllowedRate to 0.000001. After experimenting a bit, it resolves my case. It seems like the default error rate of 0.01 is not acceptable for cases where there are very few app router pages compared to the number of page router pages. I'll leave my draft test up in case the maintainers want to take a hand at adjusting the defaults or altering the hashing mechanism.

I can't find any documentation on the Next site that outlines the client side filter, how it works, or how to configure it. IMHO a Bloom filter works well for cases where someone has many thousands of pages in app router. Every other case, it's overkill, and it seems in cases where there are very few app router pages, it's actively harmful. In my case, the only item in the static filter was _not-found, which caused most of my replicable collisions.

@franky47
Copy link
Contributor

franky47 commented Dec 10, 2023

I got bitten by the bloom filter too. I have an end-to-end test bench for next-usequerystate that uses both the app and pages routers, with a mirror in directory names and pages to test against both routers with Cypress.

When testing with a basePath, it got applied twice in the pages router, on non-shallow router.replace, but only in a particular directory (/{base}/pages/useQueryState). No issues in other directories, except some others I tried, with a seemingly random pattern (useSomething would pass, but useSomethingElse would fail).

Before I found this issue, I was starting to question my own sanity. Friends don't let friends play with non-deterministic data structures.

Edit: I ended up disabling the client filter entirely for this project to solve this issue:

// next.config.js
/** @type {import('next').NextConfig } */
const config = {
  basePath,
  experimental: {
    clientRouterFilter: false
  }
}

@Max101
Copy link

Max101 commented Jan 10, 2024

Hit the same issue. 5 completely unrelated links 2 of them cause a page reload, 3 dont. No way to explain it. Turning this setting off solves the issue.

On a side note though:
This post made me realize how very deeply complex the next routing implementation is. Does it really have to be this complex?
Can anyone explain WTH is going on when someone tries to change the route and how really can it explain that one link works and another doesnt? What logic is applied here that makes this work seemengly completely randomly

@ijjk
Copy link
Member

ijjk commented Jan 12, 2024

What logic is applied here that makes this work seemengly completely randomly

This probabilistic data structure and we originally had a lower filter size which increased the likely hood of false positives, after further investigation we can increase the filter size which reduces the likelihood of false positives quite a bit (< 0.01%) as the filter compresses very well.

x-ref: #60542

Thanks @crisvergara for providing that regression test!

ijjk added a commit that referenced this issue Jan 12, 2024
This updates our default error rate to be much more precise and reduce
false positives by increasing the default size of the client filter we
generate. We can afford to increase the default size as it compresses
extremely well and gives us more accurate navigations. This carries over
the failing test case from #59293
which showed one case of false positive in a smaller filter.

Closes: #47486

Closes NEXT-2070

---------

Co-authored-by: Cris Vergara <[email protected]>
@crisvergara
Copy link
Contributor

@ijjk Thanks for looking into this and merging my test in!

I want to repeat my other feedback though: this should be documented so others know that they can adjust the error rate in the config or turn the client filter on/off, as well as the consequences for doing so. At a minimum, it may be good to communicate that hard refreshes could be expected, at least to signal to devs that any information that must survive a refresh must be stashed in session storage. In my case, my functional issue is that ephemeral data stored in memory in Context was being be dropped between pages where it was expected to persist.

I self-host NextJS on a site that gets high peak traffic, so I understand the performance benefit that a client-side route filter would have and why it would be desirable, hence why I chose to reduce the error rate instead of turning off the filter outright. However, this thread reflects that many other implementors may not have those sort of scaling issues and may not understand what the benefit is. In aggregate, having a default client-side filter is beneficial for Vercel since they're hosting so many Next projects, but for an individual implementor the benefit may not be obvious.

@ijjk
Copy link
Member

ijjk commented Jan 12, 2024

Hey yeah we can look at getting this behavior/flags documented so it's more obvious how this should work and how it can be customized more.

ijjk added a commit that referenced this issue Jan 12, 2024
Follow-up to #60542 this adds
regression tests for `basePath` with the client router filter to ensure
we hard navigate to the correct URL when going from pages -> app.


x-ref: #47486
Closes NEXT-2086
ijjk added a commit that referenced this issue Jan 12, 2024
This adds documentation around the client router filter we leverage in
`pages` to allow incremental migration from `pages` to `app`. Also adds
mention of two experimental configs that can be useful for managing the
client router filter.

x-ref:
#47486 (comment)

Closes NEXT-2090

---------

Co-authored-by: Sam Ko <[email protected]>
Copy link
Contributor

This closed issue has been automatically locked because it had no new activity for 2 weeks. If you are running into a similar issue, please create a new issue with the steps to reproduce. Thank you.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Jan 27, 2024
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
area: app App directory (appDir: true) bug Issue was opened via the bug report template. locked Navigation Related to Next.js linking (e.g., <Link>) and navigation.
Projects
None yet
Development

Successfully merging a pull request may close this issue.