PACER Dockets get Intermixed (probably due to messed up RECAP uploads) #2211

albertisfu · 2022-07-26T20:40:09Z

I looked into this issue, where a Docket Alert shows the wrong metadata and slug for a case, as in the one reported by a user:

https://www.courtlistener.com/docket/63240678/briana-nakole-anderson/?order_by=desc

This case name is: Atlanta Light Bulbs, Inc. (22-52950)
And the docket entries that it shows seem to belong to this case.

After some tests, I think this is what happened:

At some point, the docket was updated with the wrong metadata (maybe an error from PACER or an error triggered by the RECAP extension).
When that upload was processed, find_docket_object chose the wrong case and update_docket_metadata overwrote its metadata, including the case name and slug.
The Docket Alert was then triggered with the wrong metadata.
Some time later, another source updated the case properly, which restored the correct metadata.

I think it's possible to check the HTML we received on the date the wrong Docket Alert was triggered, so we can figure out what was wrong with the metadata and whether we can prevent this from happening.

So as a next step I'll submit a PR to add the PacerHtmlFiles to the admin.

In addition, as we discussed, this issue might be related to the RECAP extension when users open it in different windows, so that's something we can also look at to avoid this problem.


mlissner · 2022-07-26T20:51:38Z

I had a new theory that this might be caused by tabIds not being unique across browser windows, but the documentation for Firefox says:

The tab's ID. Tab IDs are unique within a browser session.

Chrome's documentation says the same. Darn. Still not sure what causes this issue.

albertisfu · 2022-07-26T21:49:34Z

Interesting. Once we've checked the HTML, I hope we'll have more details about what might be happening.

albertisfu · 2022-07-28T16:56:44Z

Well, we were able to identify and review the HTML for the processing queue item that updated the docket with the wrong metadata.

Now I'm able to explain what the problem is:

The processing queue item that triggered the docket update was 6646469. There, the pacer_case_id received in the JSON is 1217988, which belongs to Atlanta Light Bulbs, Inc. (22-52950).

However, the HTML file received for this processing queue item doesn't belong to Atlanta Light Bulbs, Inc. (22-52950) but to Briana Nakole Anderson (22-50574).

So the main problem is that the docket HTML was sent with the wrong pacer_case_id, probably due to a bug in the RECAP extension.

On the CL side, the problem is in find_docket_object.

It tries a list of lookups to find the docket, e.g.:

lookups = [
        {
            "pacer_case_id": pacer_case_id,
            "docket_number_core": docket_number_core,
        },
        {"pacer_case_id": pacer_case_id},
    ]

In this case, the first lookup failed because no docket matched both pacer_case_id: 1217988 and docket_number_core: 22052950.

So the problem was the second lookup, {"pacer_case_id": 1217988}, which returned the wrong case.
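To make the fall-through concrete, here is a minimal sketch that uses an in-memory list in place of the database. The Docket dataclass, the find_docket helper, and the docket_number_core strings are hypothetical stand-ins for illustration, not the real find_docket_object code or real values:

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Docket:
    # Hypothetical stand-in for the real Docket model.
    pacer_case_id: str
    docket_number_core: str
    case_name: str


def find_docket(
    dockets: List[Docket], pacer_case_id: str, docket_number_core: str
) -> Optional[Docket]:
    """Try each lookup in order and return the first unique match."""
    lookups = [
        {
            "pacer_case_id": pacer_case_id,
            "docket_number_core": docket_number_core,
        },
        {"pacer_case_id": pacer_case_id},
    ]
    for kwargs in lookups:
        matches = [
            d
            for d in dockets
            if all(getattr(d, key) == value for key, value in kwargs.items())
        ]
        if len(matches) == 1:
            return matches[0]
    return None


# Placeholder core values; only the pacer_case_id comes from this issue.
dockets = [Docket("1217988", "core-atlanta", "Atlanta Light Bulbs, Inc.")]

# The upload carries HTML for a different case (so a different core) but is
# sent with the Atlanta pacer_case_id. The first lookup misses and the second
# one matches on pacer_case_id alone.
found = find_docket(dockets, "1217988", "core-from-html")
```

In this sketch, found ends up being the Atlanta docket even though the parsed core disagrees with it, which is the situation that lets update_docket_metadata overwrite the wrong case.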

A couple of proposals for this issue on the CL side are:

First, prioritize a lookup {"docket_number_core": docket_number_core} over {"pacer_case_id": pacer_case_id}, because the docket_number_core is generated from the HTML content.

So we could do something like this:

    if docket_number_core:
        lookups.append(
            {"docket_number_core": docket_number_core},
        )
    lookups.append({"pacer_case_id": pacer_case_id})

However, I'm not sure whether this would cause problems when more than one docket has the same docket_number_core and court.

The second proposal is to keep the current lookups, but before returning the selected docket, check whether its docket_number_core differs from the one returned by make_docket_number_core. If it does, that hints at an inconsistency between the HTML sent and the pacer_case_id sent, so we could abort the upload and mark it as INVALID_CONTENT.
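A rough sketch of that check, assuming both cores are available as strings. UploadStatus and check_docket_number_core are hypothetical names used only for illustration; the real status constants and where the check would live in CL may differ:

```python
from enum import Enum


class UploadStatus(Enum):
    # Hypothetical stand-in for the processing queue statuses.
    OK = "ok"
    INVALID_CONTENT = "invalid_content"


def check_docket_number_core(existing_core: str, parsed_core: str) -> UploadStatus:
    """Flag the upload when the matched docket and the HTML disagree.

    existing_core: docket_number_core stored on the docket the lookup returned.
    parsed_core: docket_number_core computed from the uploaded HTML.
    An empty value on either side is treated as "unknown" and not flagged.
    """
    if existing_core and parsed_core and existing_core != parsed_core:
        return UploadStatus.INVALID_CONTENT
    return UploadStatus.OK


# Placeholder values, not real docket numbers.
status = check_docket_number_core("core-atlanta", "core-from-html")
```

Treating an empty core as unknown is a design choice in this sketch: it avoids rejecting uploads for dockets that don't have a docket_number_core yet, at the cost of not catching mismatches for those dockets.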

We could also try to find the bug in the RECAP extension, which seems to be the root problem.

@mlissner please let me know what you think.

mlissner · 2022-07-28T18:00:24Z

Good to have that all confirmed. I'd love to fix the root problem, but it makes sense to put in some defensive measures server-side as well. We get the pacer_case_id from the HTML as well as from the POST, right? Can we compare those and reject ones that differ?

albertisfu · 2022-07-28T19:24:16Z

Well, I checked the HTML, and it seems there isn't a reliable way to retrieve the pacer_case_id from it.

These are the fields that we get from Juriscraper after parsing the HTML:
"court_id"
"docket_number"
"case_name"
"date_filed"
"date_terminated"
"date_converted"
"date_discharged"
"assigned_to_str"
"referred_to_str"
"cause"
"nature_of_suit"
"jury_demand"
"demand"
"jurisdiction"
"mdl_status"

Looking at the HTML, this link is generated by the RECAP extension and contains the pacer_case_id:

The bad news is that the intermixing issue also affects this link, because its pacer_case_id doesn't match the case.

The pacer_case_id is also present in this link:

However, this link is not always present. For example, the HTML docket we got for this case doesn't have it.

The pacer_case_id can sometimes be found in the docket entries:

However, I don't think that's reliable, because the pacer_case_id might not always be present in a docket entry.

That's why I think we can't simply compare it with the pacer_case_id received in the request.

Do you think one of the proposals above using the docket_number_core might work as a defensive server-side measure?

Either way, if it's ok I'll open an issue on /recap/ to start debugging the issue from the Recap extension side.

mlissner · 2022-07-28T20:52:14Z

> Either way, if it's ok I'll open an issue on /recap/ to start debugging the issue from the Recap extension side.

Yes, please do.

> Do you think one of the proposals above using the docket_number_core might work as a defensive server-side measure?

Let's see what you can find in the extension, and maybe we do the second option above (punting when the docket number doesn't match up), if we still feel like we need to.

albertisfu self-assigned this Jul 26, 2022

mlissner added this to @albertisfu's backlog Jul 26, 2022

mlissner moved this to 🏗 In progress in @albertisfu's backlog Jul 26, 2022

mlissner changed the title ~~Docket alerts are triggered with the wrong metadata~~ PACER Dockets get Intermixed (probably due to messed up RECAP uploads) Jul 26, 2022

albertisfu mentioned this issue Jul 27, 2022

Added PacerHtmlFiles admin #2215

Merged

albertisfu mentioned this issue Jul 28, 2022

PACER Dockets get Intermixed due to a RECAP extension bug freelawproject/recap#305

Closed

mlissner mentioned this issue Nov 2, 2022

Messed up RECAP dockets #688

Closed

mlissner linked a pull request Nov 17, 2022 that will close this issue

fix(docket report): Update logic to detect docket report freelawproject/recap-chrome#273

Merged

mlissner removed this from @albertisfu's backlog Nov 17, 2022

mlissner added this to @erosendo's backlog Nov 17, 2022

mlissner moved this to In Review in @erosendo's backlog Nov 17, 2022

mlissner closed this as completed in freelawproject/recap-chrome#273 Nov 18, 2022

Repository owner moved this from In Review to Done in @erosendo's backlog Nov 18, 2022

ERosendo mentioned this issue Mar 5, 2024

Docket display- two dockets combined, not displaying all available PDFs, #3833

Closed
