RECAP extension should send iquery.pl docket summary pages #251

Closed
johnhawkinson opened this issue Jun 4, 2018 · 11 comments · Fixed by freelawproject/recap-chrome#277

@johnhawkinson
Collaborator

(which are free)
And juriscraper should parse them.

And I argue that it should not have a special UPLOAD_TYPE for them, because there are zillions of pages that should probably be sent and it's silly to have the extension have to figure out what they are (pre-parse them), even if it has a ruleset based on URLs. Just send them to the server which is going to do the parsing anyhow, regardless.

Although it's probably wise to send the URL, possibly with some filtering.

@johnhawkinson
Collaborator Author

These are pages that include case title, judge, and last update date. Generally with judge initials in district courts (mad):

<center>
 <b><font size="+1">1:18-cv-10225-MLW</font></b> Calderon Jimenez v. Cronen et al<br>
 Mark L. Wolf, presiding<br>
 <b>Date filed:</b> 02/05/2018<br>
 <b>Date of last filing:</b> 06/04/2018<br>
</center>

Sometimes (mab) lacking that in BK land, but with other goodies:

<center>
 <b><font size="+1">18-10943</font></b><b></b>Lynnel M. Cox                                     <br>
 
 <b>Case type:</b> bk
 <b>Chapter:</b> 13
 <b>Asset:</b> Yes
 <b>Vol: </b> v
 <b>Judge:</b> Joan N. Feeney <br>
 
 <b>Date filed:</b> 03/19/2018
 <b>Date of last filing:</b> 06/04/2018
 <br>
</center>

But sometimes with initials for BK too (e.g. nysb):

<center>
  <b><font size="+1">18-10943-smb</font></b>
  <b></b>Martha L. Osorio                                  <br>
  
  <b>Case type:</b> bk
  <b>Chapter:</b> 7
  <b>Asset:</b> No
  <b>Vol: </b> v
  <b>Judge:</b> Stuart M. Bernstein <br>
  
  <b>Date filed:</b> 04/06/2018
  <b>Date of last filing:</b> 05/23/2018
  <br>
</center>
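
As a rough illustration of what server-side parsing of these summary blocks could involve, here is a minimal Python sketch. It is not juriscraper's parser; the function name `parse_iquery_summary` and the output keys are assumptions made up for this example, and it only handles the layouts pasted above.

```python
import re
from html import unescape


def parse_iquery_summary(html: str) -> dict:
    """Pull a few fields out of an iquery.pl <center> summary block.

    Hypothetical helper for illustration only; not juriscraper's API.
    """
    fields = {}

    # The docket number sits inside <font size="+1">...</font>.
    docket = re.search(r'<font size="\+1">([^<]+)</font>', html)
    if docket:
        fields["docket_number"] = docket.group(1).strip()

    # Labeled values look like "<b>Label:</b> value".
    for label, value in re.findall(r"<b>([^<:]+):\s*</b>[ \t]*([^<\n]*)", html):
        key = label.strip().lower().replace(" ", "_")
        fields[key] = unescape(value).strip()

    # District-court pages give the judge as "Name, presiding" instead.
    if "judge" not in fields:
        presiding = re.search(
            r"^[ \t]*([^<>\n]+?), presiding", html, re.MULTILINE
        )
        if presiding:
            fields["judge"] = presiding.group(1).strip()

    return fields


MAD_SAMPLE = """<center>
 <b><font size="+1">1:18-cv-10225-MLW</font></b> Calderon Jimenez v. Cronen et al<br>
 Mark L. Wolf, presiding<br>
 <b>Date filed:</b> 02/05/2018<br>
 <b>Date of last filing:</b> 06/04/2018<br>
</center>"""

if __name__ == "__main__":
    print(parse_iquery_summary(MAD_SAMPLE))
```

Regexes are used here only to keep the sketch dependency-free; a real parser would more likely walk the HTML tree and would need to cope with courts that vary the layout further.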

@mlissner
Member

mlissner commented Jun 4, 2018

Thanks for this. Per discussion on Slack, filtering would be to cut out random nonces that PACER includes in URLs. I'm on the fence. OTOH, private info shouldn't be in GET params and they run the risk of those links being clicked. OTOH, who knows what abuse could be caused by sharing a PACER link?

@johnhawkinson
Collaborator Author

Well, not necessarily only the apparent nonces (as in https://ecf.nysb.uscourts.gov/cgi-bin/iquery.pl?657620796569868-L_1_0-1). But also things like magic numbers: https://ecf.mad.uscourts.gov/doc1/09518715161?caseid=196119&de_seq_num=183&magic_num=47282941

(Now, the magic number is no longer valid after use, so it's not the worst thing in the world, but there's still no excuse to retain it. Although arguably we wouldn't be sending doc1 URLs anyhow).
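
For concreteness, the kind of URL filtering discussed here could be sketched as follows in Python. This is only an assumption about how filtering might work, not what the extension or server does: the helper name `scrub_pacer_url` and the `SENSITIVE_PARAMS` set are hypothetical, and `magic_num` is the only parameter named in this thread.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical deny-list; only magic_num is mentioned in this thread.
SENSITIVE_PARAMS = {"magic_num"}


def scrub_pacer_url(url: str) -> str:
    """Drop nonce-style query strings and known one-time parameters."""
    parts = urlsplit(url)
    # parse_qsl skips tokens that have no "=" (when strict_parsing is off),
    # so a bare nonce such as "657620796569868-L_1_0-1" is discarded here.
    pairs = parse_qsl(parts.query)
    kept = [(key, value) for key, value in pairs if key not in SENSITIVE_PARAMS]
    return urlunsplit(parts._replace(query=urlencode(kept), fragment=""))


if __name__ == "__main__":
    print(scrub_pacer_url(
        "https://ecf.nysb.uscourts.gov/cgi-bin/iquery.pl?657620796569868-L_1_0-1"
    ))
    print(scrub_pacer_url(
        "https://ecf.mad.uscourts.gov/doc1/09518715161"
        "?caseid=196119&de_seq_num=183&magic_num=47282941"
    ))
```

Note that this approach throws away every bare query token, not just nonces, so it would need refinement if any PACER page carries meaningful data that way.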

> OTOH, private info shouldn't be in GET params

I'm not sure "shouldn't be" has anything to do with reality or with what we care about.

@mlissner
Member

mlissner commented Nov 1, 2022

This seems worth doing as @ERosendo is working on new RECAP features. Onto the heap it goes, but only to do the iquery.pl pages, not the rest. The rest is worth doing, ideally, but we're not interested in that kind of overhaul.

@albertisfu, you'll have to do some backend work for this too, in two stages: first, just accept these pages from the extension and store them; later, actually parse them. You could do it in one step too, if that's not too much trouble, but we want to get at least the first step working so @ERosendo can get the extension part of this done.

@mlissner
Member

Just to do a bit of timing planning on this, @albertisfu, this is the next issue for @ERosendo on this backlog and he'll need your help. He can jump ahead to the next thing if you're busy, but please find time this week to work on this together however is best for you guys.

@albertisfu

Sure! I've already met with @ERosendo and we talked about this issue. We agree that the API request for iquery.pl pages might look like the ones for Dockets. But we might need to add a new upload_type in order to differentiate them.

Maybe it could be called: IQUERY_PAGE = 12

@mlissner does this new upload_type seem good to you?

So we will accept this new upload type and store it.
Then in the next step, I will add support to Juriscraper to parse them and update Dockets using iquery page data.
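
The two-stage plan above (accept and store first, parse later) might be sketched like this in Python. Everything here is illustrative rather than CourtListener's actual code: `UploadType`, `StoredUpload`, and `accept_upload` are hypothetical names, and the `DOCKET = 1` value is an assumption; only `IQUERY_PAGE = 12` comes from this thread.

```python
from dataclasses import dataclass
from enum import IntEnum


class UploadType(IntEnum):
    DOCKET = 1        # assumed value, shown only for contrast
    IQUERY_PAGE = 12  # value proposed in this thread


@dataclass
class StoredUpload:
    upload_type: UploadType
    court: str
    html: str
    parsed: bool = False  # flipped by a later parsing stage


def accept_upload(upload_type: int, court: str, html: str) -> StoredUpload:
    """Stage 1: validate the upload type and keep the raw page unparsed.

    UploadType(...) raises ValueError for an unknown type.
    """
    return StoredUpload(UploadType(upload_type), court, html)


if __name__ == "__main__":
    upload = accept_upload(12, "nysb", "<center>...</center>")
    print(upload.upload_type.name, upload.parsed)
```

Keeping the raw HTML with a `parsed` flag means the second stage can be added later and run over pages that were already collected.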

@mlissner
Member

Sounds great. There's already a parser and merging code in juriscraper and courtlistener too.

@albertisfu

Perfect! I'll check them. Thanks!

@johnhawkinson
Collaborator Author

> We agree that the API request for iquery.pl pages might look like the ones for Dockets. But we might need to add a new upload_type in order to differentiate them.

I have suggested in the past that we should move away from this.
There are a lot of pages that the extension and the server don't parse, and requiring upload types for them gets in the way of incremental progress.

I think we should just upload all the pages the extension is going to upload and let the server sort them out, without requiring new upload types, going forward.
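
One way to picture "let the server sort them out" is a server-side classifier keyed on the uploaded URL, sketched below in Python. This is purely hypothetical: the function `classify_page` and its labels are invented for illustration, and nothing in this thread says the server works this way.

```python
from urllib.parse import urlsplit


def classify_page(url: str) -> str:
    """Guess a page kind from its URL path (hypothetical labels)."""
    path = urlsplit(url).path
    script = path.rsplit("/", 1)[-1]
    if script == "iquery.pl":
        return "iquery_summary"
    if "/doc1/" in path:
        return "doc1"
    # Unrecognized pages could still be stored and parsed once support exists.
    return "unknown"


if __name__ == "__main__":
    print(classify_page("https://ecf.nysb.uscourts.gov/cgi-bin/iquery.pl"))
```

With something like this, the extension would not need a new upload type for each kind of page; unknown pages would simply wait until the server learns to parse them.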

@mlissner
Member

Someday, maybe. I don't think it really affects progress much, though.

@mlissner
Member

Nice teamwork on this one, everybody. Now onwards to the main event.
