-
Notifications
You must be signed in to change notification settings - Fork 22
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Update fix/reduce redundant device data called[wip] #3526
Update fix/reduce redundant device data called[wip] #3526
Conversation
…alled Updates from staging
📝 Walkthrough📝 WalkthroughWalkthroughThe changes enhance the logging and error handling capabilities across several modules in the codebase. The Changes
Possibly related PRs
Suggested labels
Suggested reviewers
Poem
📜 Recent review detailsConfiguration used: CodeRabbit UI 📒 Files selected for processing (1)
🚧 Files skipped from review as they are similar to previous changes (1)
Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media? 🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (Invoked using PR comments)
Other keywords and placeholders
CodeRabbit Configuration File (
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 2
🧹 Outside diff range and nitpick comments (4)
src/workflows/airqo_etl_utils/airnow_utils.py (1)
14-16
: Excellent addition of logging functionality!The introduction of logging is a commendable improvement to the codebase. It will significantly enhance our ability to track and debug issues.
A small suggestion to consider:
Consider setting the logging level explicitly to ensure consistent behavior across different environments. You can add the following line after the logger initialization:
logger.setLevel(logging.INFO) # Or any other appropriate levelThis ensures that the logger will capture all intended log messages regardless of the global logging configuration.
src/workflows/airqo_etl_utils/airqo_utils.py (3)
164-164
: Improved error handling, but consider enhancing the logging message.The change from raising a
ValueError
to usinglogger.exception
is a good improvement in error handling. It allows the program to continue running while still recording the error. However, the log message could be more informative.Consider enhancing the log message to provide more context:
- logger.exception("A valid device category must be provided") + logger.exception(f"Invalid device category provided: {device_category}")This change would make debugging easier by explicitly stating which device category caused the issue.
Line range hint
151-180
: Consider enhancing error handling and method clarity.The
flatten_field_8
method is well-structured, but there are a few areas for potential improvement:
- The general exception handler in the loop might mask specific errors. Consider catching and logging specific exceptions.
- The method returns a pandas Series, but this isn't clear from the method signature. Adding a return type hint would improve clarity.
- The method doesn't explicitly handle the case where
field_8
is None or an empty string, which might lead to unexpected behavior.Here's a suggested refactor to address these points:
- def flatten_field_8(device_category: DeviceCategory, field_8: str = None): + def flatten_field_8(device_category: DeviceCategory, field_8: str = None) -> pd.Series: """ Maps thingspeak field8 data to airqo custom mapping. Mappings are defined in the config file. Args: device_category(DeviceCategory): Type/category of device field_8(str): Comma separated string returns: Pandas Series object of mapped fields to their appropriate values. """ - values: List[str] = field_8.split(",") if field_8 else "" + values: List[str] = field_8.split(",") if field_8 else [] series = pd.Series(dtype=float) match device_category: case DeviceCategory.BAM: mappings = configuration.AIRQO_BAM_CONFIG case DeviceCategory.LOW_COST_GAS: mappings = configuration.AIRQO_LOW_COST_GAS_CONFIG case DeviceCategory.LOW_COST: mappings = configuration.AIRQO_LOW_COST_CONFIG case _: - logger.exception("A valid device category must be provided") + logger.exception(f"Invalid device category provided: {device_category}") + return series for key, value in mappings.items(): try: series[value] = values[key] - except Exception as ex: - logger.exception(f"An error occurred: {ex}") + except IndexError: + logger.exception(f"Field {key} not found in values: {values}") series[value] = None + except ValueError as ex: + logger.exception(f"Error converting value for field {key}: {ex}") + series[value] = None return seriesThese changes will:
- Add a return type hint for clarity.
- Handle the case where
field_8
is None or empty more explicitly.- Catch and log specific exceptions for better error diagnosis.
- Return an empty series if an invalid device category is provided.
Line range hint
1-1043
: Consider addressing TODOs and refactoring for improved maintainability.While the changes in this PR are focused on the
flatten_field_8
method, there are some general observations about the file that could be addressed in future work:
- There are several TODO comments throughout the file. It would be beneficial to create issues for these and address them in future PRs.
- Some methods in the
AirQoDataUtils
class are quite long and complex. Consider breaking these down into smaller, more focused methods to improve readability and maintainability.- The
AirQoDataUtils
class consists entirely of static methods. Consider refactoring this into a module with functions instead of a class, which might be more appropriate for this use case.Here are some specific suggestions:
- Create issues for the TODO comments, such as the one on line 4: "# TODO: Add tests".
- Consider breaking down long methods like
extract_devices_data
(lines 509-636) into smaller, more focused methods.- Evaluate whether
AirQoDataUtils
should be a class or a module with functions, given that all methods are static.These changes could significantly improve the maintainability and readability of the code in the long term.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
📒 Files selected for processing (4)
- src/workflows/airqo_etl_utils/airnow_utils.py (3 hunks)
- src/workflows/airqo_etl_utils/airqo_api.py (1 hunks)
- src/workflows/airqo_etl_utils/airqo_utils.py (1 hunks)
- src/workflows/dags/dag_docs.py (3 hunks)
✅ Files skipped from review due to trivial changes (1)
- src/workflows/dags/dag_docs.py
🔇 Additional comments (4)
src/workflows/airqo_etl_utils/airnow_utils.py (2)
81-89
: Well-crafted docstring addition!The new docstring for the
process_bam_data
method is clear, concise, and informative. It effectively communicates the method's purpose, input parameters, and return value. This addition significantly improves the code's readability and maintainability.Your adherence to the Google style guide for Python docstrings is commendable. It ensures consistency and makes the codebase more professional.
Line range hint
1-150
: Excellent work on optimizing and improving error handling!This pull request successfully addresses its objectives of optimizing the
get_devices
method and reducing redundant data calls. The introduction of precomputed device mapping is a standout feature that significantly enhances performance.The addition of comprehensive logging and improved error handling greatly enhances the maintainability and debuggability of the code. These changes will undoubtedly lead to more robust and efficient data processing.
A few minor suggestions have been made to further refine the code, but overall, this is a high-quality contribution that meaningfully improves the codebase.
This pull request is ready to be merged, pending any minor adjustments based on the suggestions provided.
src/workflows/airqo_etl_utils/airqo_api.py (1)
179-203
: 🛠️ Refactor suggestionConsider updating the docstring and verifying API response structure
The changes to the
get_devices
method look promising in terms of optimization. However, there are a few points to consider:
- The docstring hasn't been updated to reflect the changes in the method's behavior, particularly the removal of device category filtering.
- The API endpoint has changed from "devices" to "devices/summary". Please ensure that the structure of the response from this new endpoint matches what the method expects to process.
- The use of
pop()
instead ofget()
when constructing thedevices
list modifies the original data. This might have unintended side effects if the data is used elsewhere.Consider updating the docstring to accurately reflect the current behavior of the method. Also, it might be beneficial to add a comment explaining why the device category filtering was removed and how it affects the method's functionality.
To ensure the changes don't introduce any regressions, please run the following verification script:
This script will help identify any potential issues arising from the changes to the
get_devices
method and the API endpoint.✅ Verification successful
Docstring and API Response Structure Updated Successfully
The changes to the
get_devices
method have been properly reflected in the docstring, and the new API endpoint's response structure matches expectations. No issues were found.🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash # Verify the usage of get_devices method and the new API endpoint # Check for any remaining references to the old 'devices' endpoint echo "Checking for references to old 'devices' endpoint:" rg '"devices"' --type python # Check for any usage of get_devices method that might be affected by the changes echo "Checking usage of get_devices method:" ast-grep --lang python --pattern 'get_devices($$$)' # Check if there are any other methods relying on the device_category filtering echo "Checking for methods that might rely on device_category filtering:" rg 'device_category' --type pythonLength of output: 526
Script:
#!/bin/bash # Verify the usage of get_devices method and the new API endpoint # Check for any remaining references to the old 'devices' endpoint echo "Checking for references to old 'devices' endpoint:" rg '"devices"' -t python # Check for any usage of get_devices method that might be affected by the changes ast-grep --lang python --pattern 'get_devices($$$)' # Check if there are any other methods relying on the device_category filtering echo "Checking for methods that might rely on device_category filtering:" rg 'device_category' -t pythonLength of output: 433
Script:
#!/bin/bash # Verify the usage of get_devices method and the new API endpoint # Find all Python files and search for references to the old 'devices' endpoint echo "Checking for references to old 'devices' endpoint:" fd --type f --extension py | xargs rg '"devices"' # Search for usage of get_devices method that might be affected by the changes echo "Checking usage of get_devices method:" ast-grep --lang python --pattern 'get_devices($$$)' # Find all Python files and search for methods that might rely on device_category filtering echo "Checking for methods that might rely on device_category filtering:" fd --type f --extension py | xargs rg 'device_category'Length of output: 12051
src/workflows/airqo_etl_utils/airqo_utils.py (1)
Line range hint
1-1043
: Summary of review for airqo_utils.pyThe changes to the
flatten_field_8
method improve error handling by logging exceptions instead of raising them. This is a positive change that allows for better error tracking without interrupting execution. However, there's room for further improvement in error message clarity and handling of edge cases.While reviewing this file, we've also identified some general areas for improvement:
- Addressing TODO comments
- Breaking down complex methods
- Considering a refactor from a class of static methods to a module of functions
Overall, the changes are a step in the right direction, and the suggestions provided can help further enhance the code's robustness and maintainability.
# Precompute device mapping for faster lookup | ||
device_mapping = {} | ||
for device in devices: | ||
for device_code in device["device_codes"]: | ||
device_mapping[device_code] = device |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🛠️ Refactor suggestion
Excellent performance optimization with device mapping!
The introduction of a precomputed device mapping is a smart move. This change significantly improves the efficiency of device lookups, reducing the time complexity from O(n) to O(1) for each iteration of the main loop. Well done!
A small optimization to consider:
You could potentially further optimize memory usage by using a generator expression instead of a list comprehension when creating the devices
list. This would be beneficial if the list of devices is large. Here's how you could modify line 93:
devices = AirQoApi().get_devices(tenant=Tenant.ALL, category=DeviceCategory.BAM)
This change would fetch only the BAM devices, reducing the amount of data processed and stored in memory.
# Lookup device details based on FullAQSCode | ||
device_details = device_mapping.get(device_id) | ||
if not device_details: | ||
logger.exception(f"Device with ID {device_id} not found") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🛠️ Refactor suggestion
Robust error handling and logging improvements!
The addition of specific error logging for various scenarios (device not found, tenant mismatch, and general exceptions) greatly enhances the debugging capabilities of this method. The use of logger.exception
is particularly appropriate as it includes the stack trace in the log, which will be invaluable for troubleshooting.
A suggestion to consider:
To make the logs even more informative, consider including more context in the log messages. For example, you could modify the tenant mismatch log as follows:
logger.exception(f"Tenant mismatch for device ID {device_id}. Expected: {device_details.get('tenant')}, Got: {row['tenant']}")
This additional information could help quickly identify the source of mismatches without needing to dig through the data.
Also applies to: 121-121, 148-148
Specifiy device category
Description
This PR changes the implementation of the get_devices method to reduce the amount of data received but also reduce the number of operations done.
Related Issues
Summary by CodeRabbit