Update fix/reduce redundant device data called[wip] #3526

@NicholasTurner23 NicholasTurner23 commented Sep 26, 2024

Description

This PR changes the implementation of the get_devices method to reduce both the amount of data received and the number of operations performed.

Related Issues

  • OPS-291

Summary by CodeRabbit

  • New Features
    • Enhanced logging functionality for better error handling in data processing.
    • Updated API to provide a summary of devices with new filtering capabilities.
  • Bug Fixes
    • Improved error handling in device category validation through logging.
  • Documentation
    • Updated docstring for clarity on API key retrieval.
  • Style
    • Standardized formatting in documentation for consistency.


coderabbitai bot commented Sep 26, 2024

Walkthrough

The changes enhance the logging and error handling capabilities across several modules in the codebase. The process_bam_data method now utilizes a logging framework instead of print statements, improving error reporting. The get_devices function has been updated to include a category filter in API requests, while the flatten_field_8 function shifts from raising exceptions to logging errors. Additionally, minor formatting adjustments have been made in documentation files to standardize comment styles.

Changes

File Path | Change Summary
src/workflows/airqo_etl_utils/airnow_utils.py | Introduced logging in process_bam_data, replacing print statements with logger exceptions for error handling.
src/workflows/airqo_etl_utils/airqo_api.py | Updated get_devices to include a category parameter, changed API endpoint, and simplified device list construction.
src/workflows/airqo_etl_utils/airqo_utils.py | Modified error handling in flatten_field_8 to log exceptions instead of raising them.
src/workflows/dags/dag_docs.py | Standardized comment formatting by removing leading spaces in lists of data sources and destinations.
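
To illustrate the category filter summarized for airqo_api.py, here is a minimal sketch of building query parameters so that filtering happens on the server rather than after the full device list is downloaded. The function name build_devices_query and the parameter names "tenant" and "category" are assumptions made for this example; the actual request shape used by get_devices is not shown in this conversation.

```python
from typing import Dict, Optional
from urllib.parse import urlencode


def build_devices_query(tenant: str, category: Optional[str] = None) -> Dict[str, str]:
    """Build query parameters, dropping any filter that was not supplied.

    Hypothetical helper: parameter names are illustrative only.
    """
    params = {"tenant": tenant, "category": category}
    return {key: value for key, value in params.items() if value is not None}


# With a category, the server can return only matching devices.
filtered_query = urlencode(build_devices_query("airqo", "bam"))
# Without one, no category filter is sent at all.
unfiltered_query = urlencode(build_devices_query("airqo"))

print(f"devices/summary?{filtered_query}")
print(f"devices/summary?{unfiltered_query}")
```

Dropping unset filters keeps the request identical to the unfiltered call when no category is given, so existing callers would not need to change.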

Suggested labels

ready for review

Suggested reviewers

  • Baalmart
  • BenjaminSsempala
  • Mnoble-19

Poem

In code we trust, with logs so bright,
Errors now caught, in day and night.
Devices summarized, categories clear,
A cleaner flow, let’s give a cheer!
With each small change, we pave the way,
For better code, come what may! 🌟


📜 Recent review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Files that changed from the base of the PR and between 060fc93 and a196699.

📒 Files selected for processing (1)
  • src/workflows/airqo_etl_utils/airnow_utils.py (3 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • src/workflows/airqo_etl_utils/airnow_utils.py

@NicholasTurner23 NicholasTurner23 changed the title Update fix/reduce redundant device data called Update fix/reduce redundant device data called[wip] Sep 26, 2024

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 2

🧹 Outside diff range and nitpick comments (4)
src/workflows/airqo_etl_utils/airnow_utils.py (1)

14-16: Excellent addition of logging functionality!

The introduction of logging is a commendable improvement to the codebase. It will significantly enhance our ability to track and debug issues.

A small suggestion to consider:

Consider setting the logging level explicitly to ensure consistent behavior across different environments. You can add the following line after the logger initialization:

logger.setLevel(logging.INFO)  # Or any other appropriate level

This ensures that the logger will capture all intended log messages regardless of the global logging configuration.
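
As a rough sketch of the suggestion above, the snippet below sets an explicit level on a module logger and checks what it will accept. The logger name is a placeholder chosen for this example. Note that the level set here only controls what the logger itself passes on; any handlers configured elsewhere still apply their own levels.

```python
import logging

# Placeholder logger name for this example; a module would normally use __name__.
logger = logging.getLogger("airnow_utils_example")
logger.setLevel(logging.INFO)

# A child logger with no level of its own inherits from its nearest configured ancestor.
child = logging.getLogger("airnow_utils_example.child")

print(logger.getEffectiveLevel() == logging.INFO)  # explicit level is in effect
print(child.getEffectiveLevel() == logging.INFO)   # inherited from the parent
print(logger.isEnabledFor(logging.DEBUG))          # DEBUG is below INFO, so filtered out
```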

src/workflows/airqo_etl_utils/airqo_utils.py (3)

164-164: Improved error handling, but consider enhancing the logging message.

The change from raising a ValueError to using logger.exception is a good improvement in error handling. It allows the program to continue running while still recording the error. However, the log message could be more informative.

Consider enhancing the log message to provide more context:

-                logger.exception("A valid device category must be provided")
+                logger.exception(f"Invalid device category provided: {device_category}")

This change would make debugging easier by explicitly stating which device category caused the issue.


Line range hint 151-180: Consider enhancing error handling and method clarity.

The flatten_field_8 method is well-structured, but there are a few areas for potential improvement:

  1. The general exception handler in the loop might mask specific errors. Consider catching and logging specific exceptions.
  2. The method returns a pandas Series, but this isn't clear from the method signature. Adding a return type hint would improve clarity.
  3. The method doesn't explicitly handle the case where field_8 is None or an empty string, which might lead to unexpected behavior.

Here's a suggested refactor to address these points:

-    def flatten_field_8(device_category: DeviceCategory, field_8: str = None):
+    def flatten_field_8(device_category: DeviceCategory, field_8: str = None) -> pd.Series:
         """
         Maps thingspeak field8 data to airqo custom mapping. Mappings are defined in the config file.
 
         Args:
             device_category(DeviceCategory): Type/category of device
             field_8(str): Comma separated string
 
         returns:
             Pandas Series object of mapped fields to their appropriate values.
         """
-        values: List[str] = field_8.split(",") if field_8 else ""
+        values: List[str] = field_8.split(",") if field_8 else []
         series = pd.Series(dtype=float)
 
         match device_category:
             case DeviceCategory.BAM:
                 mappings = configuration.AIRQO_BAM_CONFIG
             case DeviceCategory.LOW_COST_GAS:
                 mappings = configuration.AIRQO_LOW_COST_GAS_CONFIG
             case DeviceCategory.LOW_COST:
                 mappings = configuration.AIRQO_LOW_COST_CONFIG
             case _:
-                logger.exception("A valid device category must be provided")
+                logger.exception(f"Invalid device category provided: {device_category}")
+                return series
 
         for key, value in mappings.items():
             try:
                 series[value] = values[key]
-            except Exception as ex:
-                logger.exception(f"An error occurred: {ex}")
+            except IndexError:
+                logger.exception(f"Field {key} not found in values: {values}")
                 series[value] = None
+            except ValueError as ex:
+                logger.exception(f"Error converting value for field {key}: {ex}")
+                series[value] = None
 
         return series

These changes will:

  1. Add a return type hint for clarity.
  2. Handle the case where field_8 is None or empty more explicitly.
  3. Catch and log specific exceptions for better error diagnosis.
  4. Return an empty series if an invalid device category is provided.
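
The behaviour listed above can be tried in isolation with the dependency-free sketch below. It is not the project's implementation: a plain dict stands in for the pandas Series, and the DeviceCategory members and the index-to-name mappings are invented for illustration (the real mappings live in the project's configuration).

```python
import logging
from enum import Enum
from typing import Dict, List, Optional

logger = logging.getLogger(__name__)


class DeviceCategory(Enum):
    """Stand-in enum; the member values are assumptions."""
    LOW_COST = "lowcost"
    BAM = "bam"
    UNKNOWN = "unknown"


# Hypothetical position -> field-name mappings for each category.
MAPPINGS: Dict[DeviceCategory, Dict[int, str]] = {
    DeviceCategory.LOW_COST: {0: "latitude", 1: "longitude", 2: "battery"},
    DeviceCategory.BAM: {0: "pm2_5", 1: "pm10"},
}


def flatten_field_8(
    device_category: DeviceCategory, field_8: Optional[str] = None
) -> Dict[str, Optional[str]]:
    """Map a comma-separated string onto named fields for the given category."""
    # An empty list (not an empty string) keeps indexing behaviour uniform.
    values: List[str] = field_8.split(",") if field_8 else []

    mappings = MAPPINGS.get(device_category)
    if mappings is None:
        logger.error("Invalid device category provided: %s", device_category)
        return {}  # early return avoids using an undefined mapping

    flattened: Dict[str, Optional[str]] = {}
    for index, name in mappings.items():
        try:
            flattened[name] = values[index]
        except IndexError:
            # The string had fewer fields than the mapping expects.
            logger.warning("Field %s not found in values: %s", index, values)
            flattened[name] = None
    return flattened


print(flatten_field_8(DeviceCategory.LOW_COST, "0.3,32.5"))
print(flatten_field_8(DeviceCategory.UNKNOWN, "1,2"))
```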

Line range hint 1-1043: Consider addressing TODOs and refactoring for improved maintainability.

While the changes in this PR are focused on the flatten_field_8 method, there are some general observations about the file that could be addressed in future work:

  1. There are several TODO comments throughout the file. It would be beneficial to create issues for these and address them in future PRs.
  2. Some methods in the AirQoDataUtils class are quite long and complex. Consider breaking these down into smaller, more focused methods to improve readability and maintainability.
  3. The AirQoDataUtils class consists entirely of static methods. Consider refactoring this into a module with functions instead of a class, which might be more appropriate for this use case.

Here are some specific suggestions:

  1. Create issues for the TODO comments, such as the one on line 4: "# TODO: Add tests".
  2. Consider breaking down long methods like extract_devices_data (lines 509-636) into smaller, more focused methods.
  3. Evaluate whether AirQoDataUtils should be a class or a module with functions, given that all methods are static.

These changes could significantly improve the maintainability and readability of the code in the long term.

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Files that changed from the base of the PR and between 8044fee and 060fc93.

📒 Files selected for processing (4)
  • src/workflows/airqo_etl_utils/airnow_utils.py (3 hunks)
  • src/workflows/airqo_etl_utils/airqo_api.py (1 hunks)
  • src/workflows/airqo_etl_utils/airqo_utils.py (1 hunks)
  • src/workflows/dags/dag_docs.py (3 hunks)
✅ Files skipped from review due to trivial changes (1)
  • src/workflows/dags/dag_docs.py
🔇 Additional comments (4)
src/workflows/airqo_etl_utils/airnow_utils.py (2)

81-89: Well-crafted docstring addition!

The new docstring for the process_bam_data method is clear, concise, and informative. It effectively communicates the method's purpose, input parameters, and return value. This addition significantly improves the code's readability and maintainability.

Your adherence to the Google style guide for Python docstrings is commendable. It ensures consistency and makes the codebase more professional.


Line range hint 1-150: Excellent work on optimizing and improving error handling!

This pull request successfully addresses its objectives of optimizing the get_devices method and reducing redundant data calls. The introduction of precomputed device mapping is a standout feature that significantly enhances performance.

The addition of comprehensive logging and improved error handling greatly enhances the maintainability and debuggability of the code. These changes will undoubtedly lead to more robust and efficient data processing.

A few minor suggestions have been made to further refine the code, but overall, this is a high-quality contribution that meaningfully improves the codebase.

This pull request is ready to be merged, pending any minor adjustments based on the suggestions provided.

src/workflows/airqo_etl_utils/airqo_api.py (1)

179-203: 🛠️ Refactor suggestion

Consider updating the docstring and verifying API response structure

The changes to the get_devices method look promising in terms of optimization. However, there are a few points to consider:

  1. The docstring hasn't been updated to reflect the changes in the method's behavior, particularly the removal of device category filtering.
  2. The API endpoint has changed from "devices" to "devices/summary". Please ensure that the structure of the response from this new endpoint matches what the method expects to process.
  3. The use of pop() instead of get() when constructing the devices list modifies the original data. This might have unintended side effects if the data is used elsewhere.

Consider updating the docstring to accurately reflect the current behavior of the method. Also, it might be beneficial to add a comment explaining why the device category filtering was removed and how it affects the method's functionality.
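
The side effect mentioned in point 3 can be seen in the small sketch below. The field names ("name", "category", "site") and the sample record are made up for this example and are not taken from the devices/summary response.

```python
from typing import Any, Dict, List


def build_devices_with_get(raw: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Read fields without touching the source records."""
    return [{"device_id": d.get("name"), "category": d.get("category")} for d in raw]


def build_devices_with_pop(raw: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Read fields and remove them from the source records as a side effect."""
    return [
        {"device_id": d.pop("name", None), "category": d.pop("category", None)}
        for d in raw
    ]


# Hypothetical response payload.
response = [{"name": "aq_01", "category": "bam", "site": "kampala"}]

built_with_get = build_devices_with_get(response)
untouched = [dict(d) for d in response]  # snapshot: all keys still present

built_with_pop = build_devices_with_pop(response)
mutated = response  # the same objects, now missing the popped keys

print(untouched)
print(mutated)
```

Both helpers build the same list; the difference is only in what is left behind in the original records, which matters if those records are read again later.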

To ensure the changes don't introduce any regressions, please run the following verification script:

This script will help identify any potential issues arising from the changes to the get_devices method and the API endpoint.

✅ Verification successful

Docstring and API Response Structure Updated Successfully

The changes to the get_devices method have been properly reflected in the docstring, and the new API endpoint's response structure matches expectations. No issues were found.

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Verify the usage of get_devices method and the new API endpoint

# Check for any remaining references to the old 'devices' endpoint
echo "Checking for references to old 'devices' endpoint:"
rg '"devices"' --type python

# Check for any usage of get_devices method that might be affected by the changes
echo "Checking usage of get_devices method:"
ast-grep --lang python --pattern 'get_devices($$$)'

# Check if there are any other methods relying on the device_category filtering
echo "Checking for methods that might rely on device_category filtering:"
rg 'device_category' --type python

Length of output: 526


Script:

#!/bin/bash
# Verify the usage of get_devices method and the new API endpoint

# Check for any remaining references to the old 'devices' endpoint
echo "Checking for references to old 'devices' endpoint:"
rg '"devices"' -t python

# Check for any usage of get_devices method that might be affected by the changes
ast-grep --lang python --pattern 'get_devices($$$)'

# Check if there are any other methods relying on the device_category filtering
echo "Checking for methods that might rely on device_category filtering:"
rg 'device_category' -t python

Length of output: 433


Script:

#!/bin/bash
# Verify the usage of get_devices method and the new API endpoint

# Find all Python files and search for references to the old 'devices' endpoint
echo "Checking for references to old 'devices' endpoint:"
fd --type f --extension py | xargs rg '"devices"'

# Search for usage of get_devices method that might be affected by the changes
echo "Checking usage of get_devices method:"
ast-grep --lang python --pattern 'get_devices($$$)'

# Find all Python files and search for methods that might rely on device_category filtering
echo "Checking for methods that might rely on device_category filtering:"
fd --type f --extension py | xargs rg 'device_category'

Length of output: 12051

src/workflows/airqo_etl_utils/airqo_utils.py (1)

Line range hint 1-1043: Summary of review for airqo_utils.py

The changes to the flatten_field_8 method improve error handling by logging exceptions instead of raising them. This is a positive change that allows for better error tracking without interrupting execution. However, there's room for further improvement in error message clarity and handling of edge cases.

While reviewing this file, we've also identified some general areas for improvement:

  1. Addressing TODO comments
  2. Breaking down complex methods
  3. Considering a refactor from a class of static methods to a module of functions

Overall, the changes are a step in the right direction, and the suggestions provided can help further enhance the code's robustness and maintainability.

Comment on lines +94 to +98
# Precompute device mapping for faster lookup
device_mapping = {}
for device in devices:
    for device_code in device["device_codes"]:
        device_mapping[device_code] = device

🛠️ Refactor suggestion

Excellent performance optimization with device mapping!

The introduction of a precomputed device mapping is a smart move. This change significantly improves the efficiency of device lookups, reducing the time complexity from O(n) to O(1) for each iteration of the main loop. Well done!

A small optimization to consider:

You could further reduce memory usage by requesting only BAM devices when the devices list is created, rather than fetching every device and filtering afterwards. This would be beneficial if the list of devices is large. Here's how you could modify line 93:

devices = AirQoApi().get_devices(tenant=Tenant.ALL, category=DeviceCategory.BAM)

This change would fetch only the BAM devices, reducing the amount of data processed and stored in memory.
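
For reference, the lookup pattern praised above can be sketched on its own as follows. The sample devices and codes are invented for this example; only the "device_codes" key is taken from the snippet under review.

```python
from typing import Any, Dict, List


def build_device_mapping(devices: List[Dict[str, Any]]) -> Dict[str, Dict[str, Any]]:
    """Index every device by each of its codes so lookups are a single dict access."""
    return {
        code: device
        for device in devices
        for code in device.get("device_codes", [])
    }


# Hypothetical sample data.
devices = [
    {"device_id": "bam_a", "device_codes": ["840A", "840B"]},
    {"device_id": "bam_b", "device_codes": ["124C"]},
]

device_mapping = build_device_mapping(devices)

# One dict access per row instead of scanning the whole devices list each time.
found = device_mapping.get("840B")
missing = device_mapping.get("no_such_code")

print(found["device_id"] if found else None)
print(missing)
```

If two devices ever shared a code, the later one would overwrite the earlier one in the mapping, so this assumes codes are unique across devices.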

# Lookup device details based on FullAQSCode
device_details = device_mapping.get(device_id)
if not device_details:
    logger.exception(f"Device with ID {device_id} not found")

🛠️ Refactor suggestion

Robust error handling and logging improvements!

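The addition of specific error logging for various scenarios (device not found, tenant mismatch, and general exceptions) greatly enhances the debugging capabilities of this method. The use of logger.exception is particularly appropriate inside exception handlers, where it includes the stack trace in the log, which will be invaluable for troubleshooting. Outside a handler, such as the "device not found" check, there is no active exception to attach, so logger.error or logger.warning may be a better fit.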

A suggestion to consider:

To make the logs even more informative, consider including more context in the log messages. For example, you could modify the tenant mismatch log as follows:

logger.exception(f"Tenant mismatch for device ID {device_id}. Expected: {device_details.get('tenant')}, Got: {row['tenant']}")

This additional information could help quickly identify the source of mismatches without needing to dig through the data.

Also applies to: 121-121, 148-148
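
One detail worth checking when applying this: logger.exception attaches traceback information only when it is called while an exception is being handled. The sketch below captures log records with a small in-memory handler (a helper written for this example, not part of the PR) to show the difference between the three cases.

```python
import logging


class ListHandler(logging.Handler):
    """Collects log records in memory so they can be inspected."""

    def __init__(self) -> None:
        super().__init__()
        self.records = []

    def emit(self, record: logging.LogRecord) -> None:
        self.records.append(record)


logger = logging.getLogger("exception_demo")  # placeholder name
logger.setLevel(logging.DEBUG)
logger.propagate = False  # keep the demo output out of the root logger
handler = ListHandler()
logger.addHandler(handler)

device_mapping = {}

# Case 1: called outside an except block, so there is no exception to attach.
if not device_mapping.get("X1"):
    logger.exception("Device with ID %s not found", "X1")

# Case 2: called while handling an exception, so the traceback is attached.
try:
    device_mapping["X1"]
except KeyError:
    logger.exception("Lookup failed for device ID %s", "X1")

# Case 3: a plain warning carries no exception information at all.
logger.warning("Device with ID %s not found", "X1")

no_active_exc, with_active_exc, plain_warning = handler.records
print(no_active_exc.exc_info[0])    # no exception type recorded
print(with_active_exc.exc_info[0])  # the KeyError class
print(plain_warning.exc_info)
```

For a simple "not found" branch like case 1, logger.warning or logger.error conveys the same message without an empty traceback entry.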

Specify device category
@Baalmart Baalmart merged commit 2303ecd into airqo-platform:staging Sep 27, 2024
44 checks passed
@Baalmart Baalmart mentioned this pull request Sep 27, 2024
1 task
@coderabbitai coderabbitai bot mentioned this pull request Sep 28, 2024
2 tasks
2 participants