Spider: Chicago Northwest Home Equity Assurance Program #672
Comments
I would like to take this one!
@GeorgeDubuque sorry, I missed this initially. Right now our policy is for people to take on one at a time, so feel free to start this or the O'Hare scraper and move on to the other once you're done
Hi. For a class, my partner and I are looking for an issue to contribute to. I was wondering if this issue is still up for grabs? If it isn't available, is there another open issue we could look at?
@mingchan96 this is open, all yours if you're interested!
I'd like to claim this one, please.
@Ekand you can have it. My partner and I are currently busy with other projects.
@Ekand all yours!
@pjsier Thanks!
@pjsier Well, I'm sorry to do this again, but I'm going to bow out and release this task. I just got a job (yay!) and I'm going to prioritize that for now.
@Ekand no problem, and congrats on the job!
Hey Pj, so I am working on this one (because the Illinois Department of Corrections seems not to have been posting info about public meetings like they're supposed to for the last couple of years), and I have a question.

It looks like, in general, the response variable used in the test .py file comes from a method called file_response, which pulls a saved offline copy of the webpage that was created (I think?) when the spider was generated on the command line, leaving no way of pulling additional pages that might be needed to completely parse all meetings. For the example on this issue (chi_northwest_home_equity, I think), the meetings are listed in pages of 10, with each additional page at /page/2/, /page/3/, and so on.

Normally when scraping a site like this, I would use requests to fetch each page, check its status code for a 4xx, and stop the scraper once I hit one. However, because the parser seems to pull from offline files that were saved when the spider was generated, I'm not sure what to do. I figure that when I create the spider on the command line I could probably pass in a list of URLs, but on the command line I can't (or at least don't know how to) check a URL's response code to know how many /page/#/'s to include in that list.

There are a few methods in the CityScrapersSpider class that sound promising, like .make_requests_from_url(), but from what little documentation I can find, that one is deprecated. Besides, I imagine there must be a general best practice for how this should be accomplished. I've looked at the contribution guidelines page and couldn't find it, though if I missed it, I apologize in advance.
Hi @SubtleHyperbole, I commented on the other issue, but we are still interested in agencies that aren't updating as often as they should be. If you'd like to do this one instead, let me know. For your question on pagination: for this spider, the simplest way to handle it is to scrape the "Older posts" link each time it appears on the page rather than listing all of the pages up front. Because the first page already goes well back into 2019, though, it may be fine to just pull the first page of results.
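A minimal sketch of that "follow the Older posts link" approach. The `fetch` callable here is a stand-in for illustration only; in the real spider this loop would instead be Scrapy yielding a `response.follow` on the "Older posts" link from each `parse` call, and all names below are hypothetical:

```python
def crawl_all_pages(start_url, fetch):
    """Collect items across paginated listing pages.

    `fetch` returns (status_code, items, next_url_or_None). Crawling
    stops on a 4xx/5xx status (e.g. a 404 past the last /page/N/) or
    when no "Older posts" link is present (next_url is None).
    """
    url, items = start_url, []
    while url is not None:
        status, page_items, next_url = fetch(url)
        if status >= 400:
            break
        items.extend(page_items)
        url = next_url
    return items

# Fake three-page site for demonstration
PAGES = {
    "/": (200, ["m1", "m2"], "/page/2/"),
    "/page/2/": (200, ["m3"], "/page/3/"),
    "/page/3/": (200, ["m4"], None),
}

def fake_fetch(url):
    # Unknown pages return a 404, mirroring the real site's behavior
    return PAGES.get(url, (404, [], None))

print(crawl_all_pages("/", fake_fetch))  # -> ['m1', 'm2', 'm3', 'm4']
```

The advantage of following the link rather than enumerating `/page/N/` URLs is that the crawl naturally ends when the link disappears, so no status-code probing is needed.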
Great, thank you!
Actually, are you sure it's ssa_42? I'm looking on their website at both https://ssa42.org/ssa-42-meeting-dates/ and https://ssa42.org/minutes-of-meetings/ but I don't see any additional pages of meeting info.
That scraper is just one example of including an additional page. |
Okay, so I think I have the spider for this finished and (at least from what I can see) the test file finished as well. Unfortunately, because I bounced around on a couple of other issues before finally landing on this one, there are files in my directory that aren't correct (they have default spiders and test pages for il_corrections, chi_housing, and cook_human_rights), so I don't want to submit a pull request because I'm pretty sure it will try to submit those as well. Should I just start a whole new clone of the project (fork? not sure of the nomenclature) and a new branch for this issue, then copy the spider and test file over and submit the pull?

Er... why isn't it called a push? It seems like I'm requesting that the changes I've made locally on my laptop get PUSHED to the main project directory. Why is this called a pull request?
@SubtleHyperbole glad to hear it! You should be able to stage only the files that are relevant and then commit those. So it could be something like this:

git add city_scrapers/spiders/chi_northwest_home_equity.py
git add tests/test_chi_northwest_home_equity.py
git commit -m "Add chi_northwest_home_equity"

And "pull request" is a GitHub-specific term (GitLab uses "merge request"), but my understanding is that it's because you're requesting that the project maintainer "pull" in your changes.
oh, duh. lol that makes sense. I have a tendency to only think about things from my own perspective sometimes hah! |
Crap. I just submitted the request and realized that I never ran those code cleaners the FAQ says to run beforehand. Lint, I think?
@SubtleHyperbole No problem! I'm not seeing the request, but it's fine to make commits to a branch after you've opened up a pull request, and that's usually the case when we review them. You can run the style checks with these commands in the docs |
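The exact commands live in the project docs. As an illustration only, assuming the standard Python style toolchain of isort, black, and flake8 (the project's docs are the authority on which tools and arguments it actually uses), a typical run from the project root inside the pipenv shell might look like:

```shell
# Re-order imports, format code, then report remaining style issues
isort city_scrapers/ tests/
black city_scrapers/ tests/
flake8 city_scrapers/ tests/
```

Since isort and black rewrite files in place, any changes they make need to be committed and pushed again for the pull request to pick them up.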
Hmmm, I ran those three commands you listed in the last post in my terminal, inside the pipenv shell, from the main city-scrapers directory (so that the relative file paths in the commands would resolve).
|
Gotcha, that was to create a commit, but you'll need to push that and submit a pull request separately. It's usually called the "GitHub Flow" and there's more information on it here |
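A sketch of those remaining GitHub Flow steps; the branch name below is illustrative, and the pull request itself is opened either in the GitHub web UI (from the banner shown after the push) or with the optional GitHub CLI:

```shell
# Publish the local branch (with its commit) to your fork on GitHub
git push -u origin chi-northwest-home-equity

# Optionally open the pull request from the terminal with the GitHub CLI
gh pr create --title "Add chi_northwest_home_equity spider"
```

The commit only updates local history; the push is what makes it visible on GitHub, and the pull request is the separate step that asks maintainers to review and merge it.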
Just as an update: I literally had the spider completed, but in an effort at completeness I emailed the admin of the site to ask about what seemed like a small discrepancy between the lists of events (yes, the page seems to have multiple sources of meeting list data), and to my chagrin I got a reply that they've decided to revamp how the site provides info on the meetings. In other words, my spider is now entirely broken, LMAO. Right now I'm waiting for their new system to work out a last kink before I get back to reworking the spider. Just wanted to note that I haven't given up on this or anything. Oh, also, the main events page (nwheap.com/events/) is now a 404; it might come back, though, which is what I'm waiting to find out.
Thanks for the update! I think it's fine to submit as is for now if it's still working |
Hey, I would like to tackle this issue. |
@KevivJaknap Hello! Thanks so much for checking out our project. Go for it. |
@haileyhoyat Just wanted to let you know that I've submitted a pull request
URL: https://nwheap.com/category/meet-minutes-and-agendas/
Spider Name: chi_northwest_home_equity
Agency Name: Chicago Northwest Home Equity Assurance Program
See the contribution guide for information on how to get started