🏗 Build Spider: Fort Worth Boards & Commissions #9
Conversation
Walkthrough: The changes introduce a new spider class, `FortxFortWorthBoardsSpider`.
Actionable comments posted: 4
🧹 Outside diff range and nitpick comments (6)
tests/test_fortx_Fort_Worth_Boards.py (2)
Lines 39-71: LGTM with suggestion: Comprehensive attribute testing

These tests thoroughly verify various attributes of the parsed items, which is crucial for ensuring the spider's accuracy. The multiple-item checks for description and start time are particularly good for catching inconsistencies.

Consider making the `test_id` function more robust by checking the format of the id rather than an exact string match. This would make the test less brittle to minor changes in the id generation logic.
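A minimal sketch of what such a format check could look like. The exact pattern is an assumption based on the usual city-scrapers id convention of `spider_name/start_time/id/title_slug`; adjust the regex to the project's real id format:

```python
import re

# Hypothetical id pattern: spider slug, 12-digit start timestamp,
# an id segment, and a slugified title, separated by slashes.
ID_PATTERN = re.compile(
    r"^fortx_Fort_Worth_Boards/\d{12}/[a-z0-9x]+/[a-z0-9_]+$"
)


def test_id_format():
    # In the real test, `item` would come from parsed_items.
    item = {
        "id": "fortx_Fort_Worth_Boards/202410011200/x/"
        "notice_of_public_comment_for_brownsfields_program"
    }
    assert ID_PATTERN.match(item["id"]), f"Unexpected id format: {item['id']}"
```

This keeps the test green if the slug or timestamp changes, while still catching structural regressions in `_get_id`.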
Lines 106-108: LGTM with suggestion: Efficient all-day attribute testing

This test efficiently verifies the all_day attribute for all parsed items using pytest's parametrize decorator. This approach ensures consistency across all items.

Consider adding an assertion message to provide more context if the test fails, e.g., `assert item["all_day"] is False, f"Item {item['title']} should not be an all-day event"`.

tests/files/fortx_Fort_Worth_Boards.json (2)
Lines 1-1: Consider the trade-offs of multiple date representations.

The JSON includes multiple representations of dates (e.g., "Date", "DD", "MM", "YYYY", "Day", "Month"). While this provides flexibility, it also introduces redundancy. Consider whether all these fields are necessary for the scraper's functionality and downstream applications. If some fields are not required, you might want to remove them to reduce data size and potential inconsistencies. However, if all fields are actively used, maintaining them is acceptable.
Lines 1-1: LGTM: Event structure is comprehensive. Consider ISO 8601 for DateTime.

The event structure provides necessary details with unique identifiers. However, for improved standardization and easier parsing, consider using ISO 8601 format for the "DateTime" field. Example of ISO 8601 format: `"DateTime": "2024-10-01T00:00:00"`.

city_scrapers/spiders/fortx_Fort_Worth_Boards.py (2)
Lines 33-33: Set a meaningful `User-Agent` header

An empty `User-Agent` header may lead to request denials or violate the server's usage policies. Providing a specific `User-Agent` helps identify your scraper and promotes responsible web scraping practices. Update the headers as follows:

- headers = {"User-Agent": ""}
+ headers = {"User-Agent": "city-scrapers-fortx/1.0 (+https://github.com/yourusername/city-scrapers-fortx)"}
Lines 63-63: Avoid shadowing built-in function `zip`

Using `zip` as a variable name shadows the built-in `zip()` function, which can lead to unexpected behavior or confusion. Rename the variable to `postal_code`:

- zip = obj["PostCode"]
+ postal_code = obj["PostCode"]

Also, update its usage in line 64:

- values = [street, suburb, zip, "TX"]
+ values = [street, suburb, postal_code, "TX"]
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
📒 Files selected for processing (3)
- city_scrapers/spiders/fortx_Fort_Worth_Boards.py (1 hunks)
- tests/files/fortx_Fort_Worth_Boards.json (1 hunks)
- tests/test_fortx_Fort_Worth_Boards.py (1 hunks)
🧰 Additional context used
🔇 Additional comments (8)
tests/test_fortx_Fort_Worth_Boards.py (6)
Lines 1-23: LGTM: Well-structured test setup

The imports and test setup are well-organized. Freezing time and using a mock response from a JSON file are good practices for ensuring consistent and reproducible test results.
Lines 25-26: LGTM: Concise count verification

This test effectively verifies that the spider parses the expected number of items from the mock response.
Lines 29-36: LGTM: Thorough title verification

This test effectively checks the titles of multiple parsed items, which is a good practice for ensuring consistent title extraction across different entries.
Lines 74-83: LGTM: Comprehensive location testing

This test effectively verifies location parsing for multiple items with different formats. It's particularly good that it covers cases both with and without a location name, ensuring robust location extraction.
Lines 86-103: LGTM: Thorough metadata verification

These tests effectively verify important metadata of the parsed items, including source URL, links, and classification. The links test is particularly thorough, checking both the title and href of the link.
Lines 1-108: Overall: Well-structured and comprehensive test suite

This test file provides excellent coverage of the `FortxFortWorthBoardsSpider`'s functionality. It includes tests for all relevant attributes of the parsed items and uses appropriate testing techniques such as time freezing, mock responses, and parameterized tests.

The structure is clear, and the tests are well-organized. The use of multiple test cases for some attributes helps ensure consistency across different types of parsed items.
While there are a few minor suggestions for improvements (such as making the id test more robust and adding assertion messages), the overall quality of the test suite is high. It should effectively catch any regressions or issues in the spider's parsing logic.
Great job on creating a thorough and well-structured test suite!
tests/files/fortx_Fort_Worth_Boards.json (1)
Lines 1-1: LGTM: Overall JSON structure is well-organized.

The JSON structure with "success" and "data" keys is appropriate for representing the collection of events. The "success" boolean is useful for error handling in the scraper.
city_scrapers/spiders/fortx_Fort_Worth_Boards.py (1)
Lines 40-40: Verify if `dayfirst=True` is appropriate in date parsing

Using `dayfirst=True` assumes dates are in DD/MM/YYYY format. Since the timezone is America/Chicago, the dates from the API might be in MM/DD/YYYY format. Please confirm the date format returned by the API to ensure accurate date parsing. If the dates are in MM/DD/YYYY format, you should remove `dayfirst=True` or set it to `False`.
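As a quick standalone illustration of what the flag changes (using `dateutil`, which the spider already imports for parsing). Note that in the committed fixture, `"Date":"01/10/2024"` pairs with `"DD":1,"MM":10`, which suggests the API is in fact day-first:

```python
from dateutil.parser import parse

# "1/10/2024" is ambiguous: day-first reads it as 1 October,
# while the default (month-first) reads it as January 10.
day_first = parse("1/10/2024 12:30:00 PM", dayfirst=True)
month_first = parse("1/10/2024 12:30:00 PM")  # dayfirst defaults to False

print(day_first)    # 2024-10-01 12:30:00
print(month_first)  # 2024-01-10 12:30:00
```

Pinning down which interpretation the API intends avoids silently shifting every meeting by months.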
{"success":true,"data":[{"Date":"01/10/2024","DD":1,"MM":10,"YYYY":2024,"Items":[{"CalendarId":"788ffb59-05d1-457d-b9dd-423d4b95a06e","Id":"e3182d81-2385-4796-809f-8a330d1c7ec9","MainContentId":"e3182d81-2385-4796-809f-8a330d1c7ec9","Name":"Notice of Public Comment for Brownsfields Program","DateTime":"1/10/2024 12:00:00 AM"},{"CalendarId":"788ffb59-05d1-457d-b9dd-423d4b95a06e","Id":"f812c8f3-336c-44d8-83e0-d6ffa3574d11","MainContentId":"f812c8f3-336c-44d8-83e0-d6ffa3574d11","Name":"DFW International Airport Board Operations Committee","DateTime":"1/10/2024 12:30:00 PM"},{"CalendarId":"788ffb59-05d1-457d-b9dd-423d4b95a06e","Id":"c03a2079-94c1-4ae9-b688-88751656ac90","MainContentId":"c03a2079-94c1-4ae9-b688-88751656ac90","Name":"DFW International Airport Board Finance, Audit, and IT Committee","DateTime":"1/10/2024 12:35:00 PM"},{"CalendarId":"788ffb59-05d1-457d-b9dd-423d4b95a06e","Id":"dbc79239-74ab-4406-9ec2-2e32784d83d8","MainContentId":"dbc79239-74ab-4406-9ec2-2e32784d83d8","Name":"DFW International Airport Board Concessions/Commercial Development","DateTime":"1/10/2024 12:45:00 PM"}],"Day":"Tuesday","Month":"October"},{"Date":"02/10/2024","DD":2,"MM":10,"YYYY":2024,"Items":[{"CalendarId":"788ffb59-05d1-457d-b9dd-423d4b95a06e","Id":"e3182d81-2385-4796-809f-8a330d1c7ec9","MainContentId":"e3182d81-2385-4796-809f-8a330d1c7ec9","Name":"Notice of Public Comment for Brownsfields Program","DateTime":"2/10/2024 12:00:00 AM"},{"CalendarId":"788ffb59-05d1-457d-b9dd-423d4b95a06e","Id":"b37500ac-d047-41db-8c9d-c7cd4cd3da00","MainContentId":"b37500ac-d047-41db-8c9d-c7cd4cd3da00","Name":"Tax Increment Reinvestment Zone No. 
8 (Lancaster Corridor TIF)","DateTime":"2/10/2024 2:00:00 PM"}],"Day":"Wednesday","Month":"October"},{"Date":"03/10/2024","DD":3,"MM":10,"YYYY":2024,"Items":[{"CalendarId":"788ffb59-05d1-457d-b9dd-423d4b95a06e","Id":"e3182d81-2385-4796-809f-8a330d1c7ec9","MainContentId":"e3182d81-2385-4796-809f-8a330d1c7ec9","Name":"Notice of Public Comment for Brownsfields Program","DateTime":"3/10/2024 12:00:00 AM"},{"CalendarId":"788ffb59-05d1-457d-b9dd-423d4b95a06e","Id":"7e5b64cf-901b-4a5b-bd25-89c6df16ccc2","MainContentId":"7e5b64cf-901b-4a5b-bd25-89c6df16ccc2","Name":"DFW International Airport Board and Committee Meetings","DateTime":"3/10/2024 8:30:00 AM"},{"CalendarId":"788ffb59-05d1-457d-b9dd-423d4b95a06e","Id":"3fcb8be2-8129-42d9-9975-deac97c3abae","MainContentId":"3fcb8be2-8129-42d9-9975-deac97c3abae","Name":"Downtown Design Review Board (DDRB)","DateTime":"3/10/2024 3:00:00 PM"},{"CalendarId":"788ffb59-05d1-457d-b9dd-423d4b95a06e","Id":"664cfb35-e16e-4974-bdb0-de6a9b7bfc53","MainContentId":"664cfb35-e16e-4974-bdb0-de6a9b7bfc53","Name":"Library Advisory Board Meeting (LAB)","DateTime":"3/10/2024 6:30:00 PM"}],"Day":"Thursday","Month":"October"},{"Date":"04/10/2024","DD":4,"MM":10,"YYYY":2024,"Items":[{"CalendarId":"788ffb59-05d1-457d-b9dd-423d4b95a06e","Id":"e3182d81-2385-4796-809f-8a330d1c7ec9","MainContentId":"e3182d81-2385-4796-809f-8a330d1c7ec9","Name":"Notice of Public Comment for Brownsfields Program","DateTime":"4/10/2024 12:00:00 AM"}],"Day":"Friday","Month":"October"},{"Date":"07/10/2024","DD":7,"MM":10,"YYYY":2024,"Items":[{"CalendarId":"788ffb59-05d1-457d-b9dd-423d4b95a06e","Id":"822ad0c4-25c8-4150-ae67-b62d4a0cc706","MainContentId":"822ad0c4-25c8-4150-ae67-b62d4a0cc706","Name":"Business Equity Advisory Board","DateTime":"7/10/2024 11:30:00 
AM"}],"Day":"Monday","Month":"October"},{"Date":"09/10/2024","DD":9,"MM":10,"YYYY":2024,"Items":[{"CalendarId":"788ffb59-05d1-457d-b9dd-423d4b95a06e","Id":"88c422cf-e46c-4b77-adb3-dda32d722852","MainContentId":"88c422cf-e46c-4b77-adb3-dda32d722852","Name":"Zoning Commission","DateTime":"9/10/2024 12:00:00 PM"},{"CalendarId":"788ffb59-05d1-457d-b9dd-423d4b95a06e","Id":"f0f5f8da-12e6-4c2a-8e4a-a70b13da7fa2","MainContentId":"f0f5f8da-12e6-4c2a-8e4a-a70b13da7fa2","Name":"Collective Bargaining","DateTime":"9/10/2024 2:00:00 PM"},{"CalendarId":"788ffb59-05d1-457d-b9dd-423d4b95a06e","Id":"1f32c7ae-4860-4c61-9518-7db1882360dc","MainContentId":"1f32c7ae-4860-4c61-9518-7db1882360dc","Name":"Tax Increment Reinvestment Zone No. 3 (Downtown TIF)","DateTime":"9/10/2024 3:00:00 PM"},{"CalendarId":"788ffb59-05d1-457d-b9dd-423d4b95a06e","Id":"a272314f-e865-44ac-ad4a-906d6f1ee3cc","MainContentId":"a272314f-e865-44ac-ad4a-906d6f1ee3cc","Name":"Community Development Council","DateTime":"9/10/2024 6:30:00 PM"}],"Day":"Wednesday","Month":"October"},{"Date":"10/10/2024","DD":10,"MM":10,"YYYY":2024,"Items":[{"CalendarId":"788ffb59-05d1-457d-b9dd-423d4b95a06e","Id":"08ce922d-9def-4564-a3ba-72dd3cb62d4a","MainContentId":"08ce922d-9def-4564-a3ba-72dd3cb62d4a","Name":"Mayor\u0027s Committee on Persons with Disabilities (MCPD)","DateTime":"10/10/2024 10:00:00 AM"},{"CalendarId":"788ffb59-05d1-457d-b9dd-423d4b95a06e","Id":"13359613-3931-4d7c-8f2b-d0049cc37e32","MainContentId":"13359613-3931-4d7c-8f2b-d0049cc37e32","Name":"Fort Worth After School Coordinating Board Meeting","DateTime":"10/10/2024 11:30:00 AM"}],"Day":"Thursday","Month":"October"}]} |
💡 Codebase verification
Address Data Completeness and Event Duplication Issues
- Missing Dates:
  - Issue: The dataset for October 1-10, 2024, lacks entries for October 5, 6, and 8.
  - Action: Verify if events are expected on these dates. If events should be present, investigate the data scraping process to ensure it captures all necessary dates.
- Repeated Events:
  - Issue: The "Notice of Public Comment for Brownsfields Program" is listed on October 1st through 4th with the same ID.
  - Action: Confirm whether this event is intended to span multiple days. If it is a multi-day event, consider adding a duration or recurrence field to the data model to prevent duplication.
🔗 Analysis chain
Verify data completeness and repeated events.
- Data Completeness: The JSON covers October 1-10, 2024, but some dates are missing (e.g., October 5, 6, 8). Verify if this is expected behavior or if the scraper should retrieve data for all dates in the range.
- Repeated Events: The "Notice of Public Comment for Brownsfields Program" event appears on multiple dates with the same ID. If this is a multi-day event, consider adding a field to indicate its duration or recurring nature.
To check for missing dates and repeated events, you can run the following script:
This script will help you visualize the date coverage and the occurrences of the repeated event.
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Description: Verify data completeness and identify repeated events
# Check for missing dates
echo "Checking for missing dates:"
jq '.data[].Date' tests/files/fortx_Fort_Worth_Boards.json | sort | uniq
# Identify repeated events
echo -e "\nChecking for repeated events:"
jq '.data[].Items[] | select(.Name == "Notice of Public Comment for Brownsfields Program") | {Name, Id, Date: .DateTime[:10]}' tests/files/fortx_Fort_Worth_Boards.json
This is data retrieved from the API. We do not have control over the dates or IDs. Missing dates represent no meetings on those dates. Duplicate IDs represent the same meeting, just on different dates.
@cruznunez, thank you for the clarification. I've noted this information for future reviews.
(_/)
(•_•)
/>💭
✏️ Learnings added
Learnt from: cruznunez
PR: City-Bureau/city-scrapers-fortx#9
File: tests/files/fortx_Fort_Worth_Boards.json:1-1
Timestamp: 2024-10-08T19:11:25.402Z
Learning: The data retrieved from the API may contain missing dates, which represent days with no meetings.
Learnt from: cruznunez
PR: City-Bureau/city-scrapers-fortx#9
File: tests/files/fortx_Fort_Worth_Boards.json:1-1
Timestamp: 2024-10-08T19:11:25.402Z
Learning: Duplicate IDs in the API data represent the same meeting occurring on different dates.
Note: Learnings are effective only in the context of similar code segments. To apply general coding guidelines, please configure review instructions. You can manage existing learnings in the app.
headers = {"User-Agent": ""}
info = requests.get(info_url, headers=headers).json()["data"]
Use `scrapy.Request` instead of `requests.get`

In line 34, replace `requests.get` with `scrapy.Request` to leverage Scrapy's asynchronous processing and built-in features.
Refactor the code as follows:
- headers = {"User-Agent": ""}
- info = requests.get(info_url, headers=headers).json()["data"]
+ yield scrapy.Request(
+ url=info_url,
+ headers={"User-Agent": "city-scrapers-fortx/1.0 (+https://github.com/yourusername/city-scrapers-fortx)"},
+ callback=self.parse_meeting_detail,
+ meta={'item': item}
+ )
Then, implement the `parse_meeting_detail` method to handle the response:
def parse_meeting_detail(self, response):
    item = response.meta['item']
    info = response.json()["data"]
    meeting = Meeting(
        title=item["Name"],
        description=info["Description"],
        classification=COMMISSION,
        start=parse(item["DateTime"]),
        end=None,
        all_day=False,
        time_notes="",
        location=self._parse_location(info),
        links=self._parse_links(info),
        source=self._parse_source(response),
    )
    meeting["status"] = self._get_status(meeting)
    meeting["id"] = self._get_id(meeting)
    yield meeting
def parse(self, response):
    """
    Parse meetings from multiple API responses.
    """
    data = response.json()

    for calendar_day in data["data"]:
        for item in calendar_day["Items"]:
            # make another API request to get more info about the meeting
            info_url = f"https://www.fortworthtexas.gov/ocapi/get/contentinfo?calendarId={item['CalendarId']}&contentId={item['Id']}&language=en-US&mainContentId={item['Id']}"  # noqa
            headers = {"User-Agent": ""}
            info = requests.get(info_url, headers=headers).json()["data"]
🛠️ Refactor suggestion
Enhance testability by refactoring network requests
To facilitate testing without relying on network availability, consider refactoring your code to use Scrapy's request mechanisms, which can be more easily mocked in tests. This aligns with Scrapy's design and improves the reliability of your tests.
I can help refactor the code to better support mocking of network requests in your tests. Would you like me to provide an example of how to stub out these requests using Scrapy's facilities?
yes
@cruznunez, certainly! I can help refactor your spider to utilize Scrapy's request mechanisms, which will make it easier to mock network requests during testing.

Currently, you're making external requests using the `requests` library within the `parse` method. In Scrapy, it's recommended to use `scrapy.Request` and let the framework handle the requests. This approach leverages Scrapy's asynchronous request handling and makes it straightforward to mock requests in your tests.
Here's how you can refactor your spider:
import scrapy
from city_scrapers_core.constants import COMMISSION
from city_scrapers_core.items import Meeting
from city_scrapers_core.spiders import CityScrapersSpider
from dateutil.parser import parse


class FortxFortWorthBoardsSpider(CityScrapersSpider):
    name = "fortx_Fort_Worth_Boards"
    agency = "Fort Worth Boards and Commissions"
    timezone = "America/Chicago"
    start_urls = [
        "https://www.fortworthtexas.gov/ocapi/calendars/getcalendaritems?Ids=788ffb59-05d1-457d-b9dd-423d4b95a06e&LanguageCode=en-US"
    ]

    def parse(self, response):
        data = response.json()
        for calendar_day in data["data"]:
            for item in calendar_day["Items"]:
                # Build the URL for the meeting details
                info_url = (
                    f"https://www.fortworthtexas.gov/ocapi/get/contentinfo?"
                    f"calendarId={item['CalendarId']}&"
                    f"contentId={item['Id']}&"
                    f"language=en-US&"
                    f"mainContentId={item['Id']}"
                )
                # Yield a new request for each meeting
                yield scrapy.Request(
                    url=info_url,
                    callback=self.parse_meeting,
                    cb_kwargs={'item': item},
                )

    def parse_meeting(self, response, item):
        info = response.json()["data"]
        meeting = Meeting(
            title=info.get("Name", ""),
            description=info.get("Description", ""),
            classification=COMMISSION,
            start=parse(info.get("DateTime")),
            end=None,
            all_day=False,
            time_notes="",
            location=self._parse_location(info),
            links=self._parse_links(info),
            source=self._parse_source(response),
        )
        meeting["status"] = self._get_status(meeting)
        meeting["id"] = self._get_id(meeting)
        yield meeting

    def _parse_location(self, info):
        return {
            "address": info.get("Location", "").strip(),
            "name": "",
        }

    def _parse_links(self, info):
        links = []
        for file in info.get("Files", []):
            links.append({
                "href": file.get("Url", ""),
                "title": file.get("Title", ""),
            })
        return links

    def _parse_source(self, response):
        return response.url
Explanation:
- Use of `scrapy.Request`: Instead of using `requests.get`, we use `scrapy.Request` to schedule new requests for the meeting details.
- Callback functions: We define a new method `parse_meeting` to handle the response from each meeting's detail page.
- Passing data using `cb_kwargs`: We pass the original `item` data to the `parse_meeting` method using `cb_kwargs`.
- Response handling: The `response` in `parse_meeting` is a Scrapy `Response` object, and we can call `response.json()` to get the JSON data.
Benefits:
- Testability: By using Scrapy's request and response objects, you can easily mock them in your tests without relying on actual network calls.
- Efficiency: Scrapy handles request scheduling and concurrency efficiently, which can improve the performance of your spider.
Updating Your Tests:
You can now mock the network requests in your tests using Scrapy's facilities. Here's an example of how you might adjust your tests:
import pytest
from unittest.mock import patch

from scrapy.http import TextResponse

from city_scrapers_core.items import Meeting

from city_scrapers.spiders.fortx_Fort_Worth_Boards import FortxFortWorthBoardsSpider


@pytest.fixture
def spider():
    return FortxFortWorthBoardsSpider()


def test_parse(spider):
    with open('tests/files/fortx_Fort_Worth_Boards.json', 'r') as f:
        json_data = f.read()

    response = TextResponse(
        url=spider.start_urls[0],
        body=json_data,
        encoding='utf-8',
    )

    # Mock scrapy.Request to invoke the callback with a predefined response
    def mock_request(url, callback, cb_kwargs):
        with open('tests/files/meeting_detail.json', 'r') as f:
            meeting_data = f.read()
        meeting_response = TextResponse(
            url=url,
            body=meeting_data,
            encoding='utf-8',
        )
        # Consume the callback's generator and return its yielded items
        return list(callback(meeting_response, **cb_kwargs))

    with patch('scrapy.Request', side_effect=mock_request):
        # Each mocked "request" is now a list of items from parse_meeting
        results = [item for batch in spider.parse(response) for item in batch]

    assert len(results) > 0
    # Add assertions for your meetings
    # Example assertion
    meeting = results[0]
    assert isinstance(meeting, Meeting)
    assert meeting['title'] != ''
Notes:
- Mocking `scrapy.Request`: We patch `scrapy.Request` to return a predefined response from a local file (`meeting_detail.json`), simulating the network response.
- Mocking `parse_meeting`: Alternatively, you can patch `parse_meeting` if you want to control or inspect how it's called.
- Assertions: You can now write assertions based on the mocked data to ensure your spider extracts the correct information.
By refactoring your spider and tests this way, you eliminate the dependency on network availability during testing, making your tests more reliable and faster.
Let me know if you have any questions or need further assistance!
import requests
Avoid using the `requests` library within Scrapy spiders

Using the `requests` library inside Scrapy spiders bypasses Scrapy's asynchronous request handling, middleware, and caching mechanisms, which can lead to blocking behavior and inefficient resource utilization.
Consider removing the import of `requests`:

- import requests
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
import requests
What's this PR do?
This PR adds a scraper for Fort Worth Boards & Commissions. The scraper parses an API endpoint for upcoming meetings.
Why are we doing this?
Scraper requested from spreadsheet.
Steps to manually test

Run the spider and review `test_output.csv` to ensure the data looks valid.

Are there any smells or added technical debt to note?
This scraper uses multiple API endpoints. After the page initially loads, a request is made to the index endpoint. When we click on a calendar item, another request is made to a meeting endpoint.
The test uses a file to act as the response for the first API request, but all other requests are actually fetched over the network. I believe we should be stubbing all requests, but I'm not sure how to do that with our current setup. I tried downloading the other responses and adding similar code from the scrapy generator, but it didn't seem to do much. Tests would still fail if I turned my wifi off.