PACER Dockets get Intermixed (probably due to messed up RECAP uploads) #2211

Closed
albertisfu opened this issue Jul 26, 2022 · 6 comments · Fixed by freelawproject/recap-chrome#273

@albertisfu
Contributor

I looked into this Docket Alerts issue, where an alert shows metadata and a slug that belong to a different case, like the one a user reported:

https://www.courtlistener.com/docket/63240678/briana-nakole-anderson/?order_by=desc

This case name is: Atlanta Light Bulbs, Inc. (22-52950)
And the docket entries that it shows seem to belong to this case.

After some testing, I think this is what happened:

  • At some point, the docket was updated with the wrong metadata (maybe an error from PACER or an error triggered by the RECAP extension).

  • When the docket was processed, find_docket_object chose the wrong case and update_docket_metadata overwrote its metadata, including the case name and slug, with the wrong values.

  • The Docket Alert was then triggered with the wrong metadata.

  • Some time later, the case was updated correctly by another source, which restored the correct metadata.

I think it's possible to check the HTML we received on the date the wrong Docket Alert was triggered, so we can figure out what was wrong with the metadata and whether we can prevent this from happening.

So as a next step I'll submit a PR to add the PacerHtmlFiles to the admin.

In addition, as we discussed, this might be related to the RECAP extension when users have it open in different windows, so that's something else we can look at to avoid this problem.

@albertisfu albertisfu self-assigned this Jul 26, 2022
@mlissner mlissner moved this to 🏗 In progress in @albertisfu's backlog Jul 26, 2022
@mlissner mlissner changed the title from "Docket alerts are triggered with the wrong metadata" to "PACER Dockets get Intermixed (probably due to messed up RECAP uploads)" Jul 26, 2022
@mlissner
Member

I had a new theory that this might be caused by tabIds not being unique across browser windows, but the documentation for Firefox says:

The tab's ID. Tab IDs are unique within a browser session.

Chrome's documentation says the same. Darn. Still not sure what causes this issue.

@albertisfu
Contributor Author

Interesting. Once we check the HTML, I hope we'll have more details about what might be happening.

@albertisfu
Contributor Author

Well, we were able to identify and review the HTML for the processing queue item that updated the docket with the wrong metadata.

Now I'm able to explain what the problem is:

The processing queue item that triggered the docket update was 6646469. It shows that the pacer_case_id received in the JSON was 1217988. This pacer_case_id belongs to Atlanta Light Bulbs, Inc. (22-52950).

However, the HTML file received for this processing queue doesn't belong to Atlanta Light Bulbs, Inc. (22-52950) but to Briana Nakole Anderson (22-50574)

Screen Shot 2022-07-28 at 10 53 32

So, the main problem is that the HTML docket was sent with the wrong pacer_case_id, probably due to a bug in the RECAP extension.

On the CL side, the problem is in find_docket_object.

It tries a series of lookups, in order, to find the docket, e.g.:

    lookups = [
        {
            "pacer_case_id": pacer_case_id,
            "docket_number_core": docket_number_core,
        },
        {"pacer_case_id": pacer_case_id},
    ]

In this case, the first lookup failed because no docket matched both pacer_case_id: 1217988 and docket_number_core: 22052950.

So the problem was the second lookup, {"pacer_case_id": 1217988}, which returned the wrong case.
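To make the failure mode concrete, here is a minimal sketch of a lookup cascade like the one above, run against an in-memory list instead of the database. The `Docket` dataclass, the `find_docket` helper, and the placeholder cores "core-A" and "core-B" are hypothetical stand-ins, not CourtListener's actual code or data.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Docket:
    case_name: str
    pacer_case_id: str
    docket_number_core: str


def find_docket(
    dockets: List[Docket], pacer_case_id: str, docket_number_core: str
) -> Optional[Docket]:
    """Try each lookup in order and return the first docket that matches."""
    lookups = [
        {"pacer_case_id": pacer_case_id, "docket_number_core": docket_number_core},
        {"pacer_case_id": pacer_case_id},
    ]
    for lookup in lookups:
        for docket in dockets:
            if all(getattr(docket, k) == v for k, v in lookup.items()):
                return docket
    return None


# Placeholder values: "core-A" is the stored docket's core, "core-B" is the
# core parsed from HTML that actually belongs to a different case.
stored = [Docket("Atlanta Light Bulbs, Inc.", "1217988", "core-A")]

# The upload pairs Atlanta's pacer_case_id with another case's HTML.
match = find_docket(stored, pacer_case_id="1217988", docket_number_core="core-B")
```

The first lookup finds nothing because the cores differ, and the fallback on pacer_case_id alone then selects the Atlanta docket even though the HTML describes another case.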

A couple of proposals for this issue on the CL side are:

  • Prioritize a {"docket_number_core": docket_number_core} lookup over {"pacer_case_id": pacer_case_id}, because docket_number_core is generated from the HTML content.

So we could do something like this:

    if docket_number_core:
        lookups.append(
            {"docket_number_core": docket_number_core},
        )
    lookups.append({"pacer_case_id": pacer_case_id})

However, I'm not sure whether this would cause problems when more than one docket has the same docket_number_core and court.
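One possible way to handle that concern, sketched here as an assumption rather than a decided design, is to accept a docket_number_core match only when exactly one docket in the court matches. The `unique_match` helper, the dict-based dockets, and the placeholder values are all hypothetical.

```python
def unique_match(dockets, **criteria):
    """Return the single docket matching all criteria, or None if zero or several match."""
    hits = [d for d in dockets if all(d.get(k) == v for k, v in criteria.items())]
    return hits[0] if len(hits) == 1 else None


# Placeholder data: two dockets in the same court share "core-A".
dockets = [
    {"id": 1, "court_id": "court-x", "docket_number_core": "core-A"},
    {"id": 2, "court_id": "court-x", "docket_number_core": "core-A"},
    {"id": 3, "court_id": "court-x", "docket_number_core": "core-B"},
]

ambiguous = unique_match(dockets, court_id="court-x", docket_number_core="core-A")
unique = unique_match(dockets, court_id="court-x", docket_number_core="core-B")
```

Here `ambiguous` is None because two dockets share the core, so the caller would fall through to the next lookup, while `unique` resolves to the docket with id 3.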

  • The second proposal is to keep the current lookups, but before returning the selected docket, check whether its docket_number_core differs from the one returned by make_docket_number_core. If it does, that hints at an inconsistency between the HTML and the pacer_case_id that were sent, so we could abort the upload and mark it as INVALID_CONTENT.

We could also try to find the issue in the RECAP extension, which seems to be the root problem.
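A minimal sketch of the second proposal's check, assuming both cores are available as plain strings. `validate_selection` and the `OK` status are hypothetical names, and the string `"INVALID_CONTENT"` only stands in for whatever status value the processing queue really uses.

```python
INVALID_CONTENT = "INVALID_CONTENT"  # stand-in for the real processing status
OK = "OK"  # hypothetical "keep going" result


def validate_selection(selected_core: str, html_core: str) -> str:
    """Flag the upload when the selected docket's core disagrees with the HTML's core."""
    # If either side has no core, there is nothing to compare, so do not reject.
    if selected_core and html_core and selected_core != html_core:
        return INVALID_CONTENT
    return OK


# Placeholder cores: the selected docket has "core-A", the HTML yields "core-B".
status = validate_selection("core-A", "core-B")
```

Skipping the check when a core is missing is a design choice in this sketch: it avoids rejecting uploads for dockets that never had a core, at the cost of letting some mismatches through.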

@mlissner please let me know what you think.

@mlissner
Member

Good to have that all confirmed. I'd love to fix the root problem, but it makes sense to put in some defensive measures server-side as well. We get the pacer_case_id from the HTML as well as from the POST, right? Can we compare those and reject ones that differ?

@albertisfu
Contributor Author

Well, I checked the HTML and it seems there isn't a reliable way to retrieve the pacer_case_id from it.

These are the fields that we get from Juriscraper after parsing the HTML:
"court_id"
"docket_number"
"case_name"
"date_filed"
"date_terminated"
"date_converted"
"date_discharged"
"assigned_to_str"
"referred_to_str"
"cause"
"nature_of_suit"
"jury_demand"
"demand"
"jurisdiction"
"mdl_status"

  • Looking at the HTML, this link is generated by the RECAP extension and contains the pacer_case_id:
    Screen Shot 2022-07-28 at 13 56 59

The bad news is that the intermixing issue also affects this link, because its pacer_case_id doesn't match the case.

  • The pacer_case_id is also present in this link:
    Screen Shot 2022-07-28 at 14 01 33

However, this link is not always present. For example, the HTML docket we got for this case doesn't have it.

Screen Shot 2022-07-28 at 14 03 33

  • The pacer_case_id can sometimes be found in the docket entries:
    Screen Shot 2022-07-28 at 14 05 20

However, I think that's not reliable because the pacer_case_id might not always be present in a docket entry.

That's why I think we can't simply compare it with the pacer_case_id received in the request.
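For illustration only, a best-effort comparison could look like the sketch below. It assumes that, when links are present, they carry the ID as a `caseid=` query parameter; that parameter name, the sample markup, and the `html_disagrees` helper are assumptions, not something confirmed in this thread.

```python
import re
from typing import Optional

# Assumed link format: the case ID appears as a "caseid=<digits>" query parameter.
CASEID_RE = re.compile(r"[?&]caseid=(\d+)")


def html_disagrees(html: str, posted_case_id: str) -> Optional[bool]:
    """Compare the posted ID with any IDs found in the HTML.

    Returns None when the HTML contains no case IDs at all, so the caller
    can tell "no evidence" apart from "evidence of a mismatch".
    """
    found = set(CASEID_RE.findall(html))
    if not found:
        return None
    return posted_case_id not in found


# Hypothetical markup for a docket entry link.
sample = '<a href="/doc1/123?caseid=111&de_seq_num=5">1</a>'
result = html_disagrees(sample, "222")
```

The three-way result reflects the limits described above: many dockets would return None, and a link injected by the extension could carry the same wrong ID as the request, so this could only ever supplement the docket_number_core check.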

Do you think one of the proposals above using the docket_number_core might work as a defensive server-side measure?

Either way, if it's ok I'll open an issue on /recap/ to start debugging the issue from the Recap extension side.

@mlissner
Member

Either way, if it's ok I'll open an issue on /recap/ to start debugging the issue from the Recap extension side.

Yes, please do.

Do you think one of the proposals above using the docket_number_core might work as a defensive server-side measure?

Let's see what you can find in the extension, and maybe we do the second option above (punting when the docket number doesn't match up), if we still feel like we need to.
