
#66: Deserializing of events from flow_proto fails with transactions with many events #67

Open

lealobanov wants to merge 8 commits into base: master
Conversation

@lealobanov (Collaborator) commented Oct 28, 2024

Resolves issue #66

Summary by CodeRabbit

Release Notes

  • New Features

    • Introduced a new example for debugging transaction fetching by ID.
    • Added a new JSON structure for event capabilities, enhancing metadata representation.
  • Bug Fixes

    • Improved error handling and logging for event deserialization and transaction results.
  • Chores

    • Updated .gitignore to include myenv for better environment management.

@coderabbitai bot commented Oct 28, 2024

Walkthrough

The changes in this pull request introduce several new features and modifications across multiple files. A new class, DebuggingFetchTransactionByIdExample, is added to provide an example for fetching transactions by ID. A new JSON structure is established in failed_events.json to detail event capabilities. The decode function in flow_py_sdk/cadence/decode.py is enhanced to handle nested structures and includes improved error handling. Changes in flow_py_sdk/cadence/kinds.py simplify the decode method in the EntitlementsKind class. Additionally, significant error handling improvements are made in flow_py_sdk/client/entities.py, particularly for event deserialization. Lastly, .gitignore is updated to ignore the myenv directory.

Changes

  • examples/transactions_examples.py: Added new class DebuggingFetchTransactionByIdExample for fetching a transaction by ID.
  • failed_events.json: Introduced a new JSON structure with event objects detailing capabilities and metadata.
  • flow_py_sdk/cadence/decode.py: Modified the decode function to handle nested structures, updated its signature, and added error logging.
  • flow_py_sdk/cadence/kinds.py: Simplified the return statement in the decode method of the EntitlementsKind class.
  • flow_py_sdk/client/entities.py: Enhanced error handling and logging in the Event and TransactionResultResponse classes.
  • .gitignore: Added an entry for myenv to ignore virtual environment files.

Poem

In the code where rabbits play,
New examples hop and sway.
Fetching transactions, oh what fun,
With logs and errors, we’ve just begun!
Ignore myenv, let’s keep it neat,
In this code, our joys repeat! 🐇✨



@codecov-commenter commented Oct 28, 2024

⚠️ Please install the Codecov GitHub app to ensure uploads and comments are reliably processed by Codecov.

Codecov Report

Attention: Patch coverage is 33.92857% with 37 lines in your changes missing coverage. Please review.

Project coverage is 83.34%. Comparing base (ac0bbb9) to head (f9f4dc6).

Files with missing lines | Patch % | Lines
flow_py_sdk/client/entities.py | 4.34% | 22 Missing ⚠️
flow_py_sdk/cadence/decode.py | 53.12% | 10 Missing and 5 partials ⚠️

❗ Your organization needs to install the Codecov GitHub app to enable full functionality.

Additional details and impacted files
@@            Coverage Diff             @@
##           master      #67      +/-   ##
==========================================
- Coverage   84.79%   83.34%   -1.46%     
==========================================
  Files          29       29              
  Lines        1927     1963      +36     
  Branches      102      113      +11     
==========================================
+ Hits         1634     1636       +2     
- Misses        277      307      +30     
- Partials       16       20       +4     
Flag | Coverage Δ
unittests | 83.34% <33.92%> (-1.46%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.

@coderabbitai bot left a comment

Actionable comments posted: 10

🧹 Outside diff range and nitpick comments (7)
myenv/bin/activate.csh (1)

1-26: Remove virtual environment files from version control.

Virtual environment files should not be committed to version control as they:

  1. Are environment-specific and should be regenerated locally
  2. Can cause conflicts between different development environments
  3. May contain sensitive information

Recommended actions:

  1. Remove all virtual environment files from the repository
  2. Add the following entries to .gitignore:
+# Virtual Environment
+venv/
+env/
+.venv/
+myenv/
  3. Document the virtual environment setup steps in the project's README.md
flow_py_sdk/cadence/constants.py (1)

68-69: LGTM! Consider grouping related constants.

The new constants follow the established naming conventions. However, consider grouping them with other entitlement-related constants if they exist, or with capability-related constants since they appear to be related to the capabilityTypeStr.

 capabilityTypeStr = "Capability"
 functionTypeStr = "Function"
+entitlementConjunctionSetStr = "EntitlementConjunctionSet"
+entitlementKey = "Entitlement"
 
-entitlementConjunctionSetStr = "EntitlementConjunctionSet"
-entitlementKey = "Entitlement"

 fix64_scale = int(8)
flow_py_sdk/cadence/decode.py (2)

45-51: LGTM! The recursive handling addresses the deserialization issue.

The implementation correctly handles nested structures by recursively decoding both dictionary and list values, which should resolve the issue with transactions containing many events.

Consider adding recursion depth protection to prevent stack overflow:

+_MAX_RECURSION_DEPTH = 100
+
 def decode(obj: [dict[Any, Any]], depth: int = 0) -> Union[Value, Kind]:
+    if depth > _MAX_RECURSION_DEPTH:
+        raise ValueError("Maximum recursion depth exceeded")
+
     # ... existing checks ...
 
     for key, value in obj.items():
         if isinstance(value, dict):
-            obj[key] = decode(value)
+            obj[key] = decode(value, depth + 1)
         elif isinstance(value, list):
-            obj[key] = [decode(item) for item in value]
+            obj[key] = [decode(item, depth + 1) for item in value]

46-48: Add type validation before recursion.

To prevent potential runtime errors, consider validating that nested dictionary values are actually decodable before recursion.

     for key, value in obj.items():
         if isinstance(value, dict):
+            if not (c.kindKey in value or c.typeKey in value or c.idKey in value):
+                continue  # Skip non-decodable dictionaries
             obj[key] = decode(value)
examples/transactions_examples.py (1)

286-289: Add documentation explaining the debugging purpose.

Please add docstring documentation explaining:

  1. The purpose of this debugging class
  2. How it helps diagnose event deserialization issues
  3. Expected output format
 class DebuggingFetchTransactionByIdExample(Example):
+    """Debug helper for investigating event deserialization issues.
+    
+    This class helps diagnose failures when deserializing events from transactions
+    that contain a large number of events. It provides detailed logging of the
+    event structure and any deserialization errors encountered.
+    """
     def __init__(self) -> None:
         super().__init__(tag="T.LL.", name="DebuggingFetchTransactionByIdExample", sort_order=501)
failed_events.json (1)

24-39: Consider simplifying type system for robust deserialization

The nested type system with multiple levels of References, Resources, and Capabilities increases deserialization complexity. Consider:

  1. Flattening the type structure where possible
  2. Implementing incremental deserialization for complex types
  3. Adding type validation before full deserialization

Also applies to: 70-153, 197-280

flow_py_sdk/client/entities.py (1)

315-316: Review the impact of changing the method signature of from_proto

Adding a new parameter max_events with a default value to the from_proto method changes its signature. If from_proto is part of the public API, ensure this change is backward compatible and consider documenting the change for users who might rely on the method's parameters.
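If backward compatibility is a concern, one low-risk option is to make the new parameter keyword-only with a default. A minimal sketch under that assumption (the class, field names, and default value below are illustrative, not the SDK's actual definitions):

from dataclasses import dataclass
from typing import Any, List


@dataclass
class TransactionResultSketch:
    events: List[Any]

    @classmethod
    def from_proto(cls, proto: Any, *, max_events: int = 10_000) -> "TransactionResultSketch":
        # Existing callers that pass only `proto` are unaffected; new callers
        # can opt in to capping how many events are kept.
        return cls(events=list(proto.events)[:max_events])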

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Files that changed from the base of the PR and between ac0bbb9 and 3a5433d.

📒 Files selected for processing (17)
  • examples/transactions_examples.py (1 hunks)
  • failed_events.json (1 hunks)
  • flow_py_sdk/cadence/constants.py (1 hunks)
  • flow_py_sdk/cadence/decode.py (1 hunks)
  • flow_py_sdk/cadence/kinds.py (1 hunks)
  • flow_py_sdk/client/entities.py (4 hunks)
  • myenv/bin/Activate.ps1 (1 hunks)
  • myenv/bin/activate (1 hunks)
  • myenv/bin/activate.csh (1 hunks)
  • myenv/bin/activate.fish (1 hunks)
  • myenv/bin/pip (1 hunks)
  • myenv/bin/pip3 (1 hunks)
  • myenv/bin/pip3.11 (1 hunks)
  • myenv/bin/python (1 hunks)
  • myenv/bin/python3 (1 hunks)
  • myenv/bin/python3.11 (1 hunks)
  • myenv/pyvenv.cfg (1 hunks)
✅ Files skipped from review due to trivial changes (7)
  • myenv/bin/pip
  • myenv/bin/pip3
  • myenv/bin/pip3.11
  • myenv/bin/python
  • myenv/bin/python3
  • myenv/bin/python3.11
  • myenv/pyvenv.cfg
🧰 Additional context used
🪛 Ruff
examples/transactions_examples.py

300-300: Local variable transaction is assigned to but never used

Remove assignment to unused variable transaction

(F841)

flow_py_sdk/client/entities.py

164-164: Local variable e is assigned to but never used

Remove assignment to unused variable e

(F841)


169-169: Local variable e is assigned to but never used

Remove assignment to unused variable e

(F841)

🔇 Additional comments (6)
flow_py_sdk/cadence/decode.py (1)

45-51: Verify the fix with large event transactions.

Let's verify that this change resolves the original issue by checking for similar patterns in test files.

✅ Verification successful

Let me search for more specific test cases that might verify the recursive decoding functionality.


Let me try one more search strategy focusing on the test directory structure and Cadence-specific tests.


Let me check the actual test cases in encode_test.py to see if they cover nested structures.


Based on the test file content, I can now provide a final response:

The recursive decoding implementation is thoroughly tested with complex nested structures

The test file tests/cadence/encode_test.py contains comprehensive test cases that verify the recursive decoding functionality with:

  • Nested dictionaries with multiple levels of nesting
  • Complex event structures with nested resources
  • Arrays containing nested objects
  • Composite types with nested fields

The test coverage includes large event transactions like StorefrontEvent that contain multiple nested structures, validating that the recursive decoding works correctly for complex real-world scenarios.

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Look for test files that verify handling of large event transactions
# Expected: Find test cases that exercise the recursive decoding with nested structures

# Search for test files related to event handling
echo "Searching for relevant test files..."
rg -l "test.*event" --type py

# Look for specific test cases with large or nested event structures
echo "Searching for test cases with nested structures..."
rg -A 5 "decode.*event" --type py

Length of output: 905


Script:

#!/bin/bash
# Search for test files and test cases specifically related to decode functionality
echo "Searching for decode test files..."
rg -l "test.*decode" --type py

# Look for test cases with nested dictionaries or lists
echo "Searching for test cases with nested structures..."
rg -A 10 "def test.*decode" --type py

# Search for actual test data structures
echo "Searching for test data structures..."
rg -A 5 "(dict|list).*decode" --type py tests/

Length of output: 396


Script:

#!/bin/bash
# First, find all test files in the project
echo "Finding test files..."
fd "test.*\.py$"

# Look specifically for Cadence-related test files
echo "Searching for Cadence test files..."
rg -l "cadence" --type py $(fd "test.*\.py$")

# Search for decode usage in test files with more context
echo "Searching for decode usage in tests..."
rg -B 3 -A 7 "decode\(" $(fd "test.*\.py$")

Length of output: 2508


Script:

#!/bin/bash
# Get the content of encode_test.py to check test cases
echo "Content of encode_test.py..."
cat tests/cadence/encode_test.py

# Also search for any dictionary or list test cases
echo "Searching for dictionary/list test cases..."
rg -B 2 -A 5 "testEncode.*(Dict|List|Composite)" tests/cadence/encode_test.py

Length of output: 52716

examples/transactions_examples.py (1)

286-302: Verify the transaction ID contains many events.

Let's verify if the hardcoded transaction ID is a good test case for debugging event deserialization issues.

🧰 Tools
🪛 Ruff

300-300: Local variable transaction is assigned to but never used

Remove assignment to unused variable transaction

(F841)

flow_py_sdk/cadence/kinds.py (1)

353-361: ⚠️ Potential issue

Improve error handling implementation

While adding error handling is a good step towards fixing the deserialization issues, there are a few concerns to address:

  1. Returning None could lead to null pointer exceptions downstream. Consider either:

    • Raising a specific exception type that can be handled by callers
    • Or returning a sentinel object that represents the error state
  2. Using print for error logging isn't ideal for production code.

Consider this improved implementation:

    @classmethod
    def decode(cls, value) -> "Kind":
        try:
            entitlements_val = value[c.entitlementsKey]
            entitlements = [
                decode(v).as_kind(EntitlementBaseKind) for v in entitlements_val
            ]
            return cls(entitlements)
        except Exception as e:
-           print(f"Error decoding entitlements for {cls.kind_str()}: {value}. Error: {e}")
-           return None
+           import logging
+           logging.error(f"Error decoding entitlements for {cls.kind_str()}: {value}", exc_info=e)
+           raise ValueError(f"Failed to decode entitlements: {e}") from e

Let's verify if this class is used in event deserialization:

failed_events.json (3)

1-291: LGTM: Well-structured test data for Flow events

The JSON structure follows Flow protocol standards with consistent event formatting.


16-17: LGTM: Consistent cross-references between events

All addresses, paths, and IDs maintain consistency across the event chain.

Also applies to: 62-63, 179-180, 162-164, 187-189, 281-282


1-47: Verify test data coverage for deserialization failure

The events represent a complete capability flow (issuance → storage → publishing). To ensure this test data reproduces the deserialization issue:

  1. Verify that these events actually failed to deserialize in the original context
  2. Consider if more events are needed to reach the failure threshold

Also applies to: 48-171, 172-290

@coderabbitai bot left a comment

Actionable comments posted: 3

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Files that changed from the base of the PR and between 3a5433d and 402e0f9.

📒 Files selected for processing (2)
  • .gitignore (1 hunks)
  • flow_py_sdk/client/entities.py (3 hunks)
✅ Files skipped from review due to trivial changes (1)
  • .gitignore
🔇 Additional comments (1)
flow_py_sdk/client/entities.py (1)

2-2: LGTM: Logging module import added

The addition of the logging module import is appropriate for implementing proper error handling and logging throughout the file.

@@ -281,3 +281,22 @@ async def run(self, ctx: Config):
f"transaction proposer: {transaction.proposal_key.address.hex()}"
)
self.log.info(f"transaction script: {transaction.script.decode('utf-8')}")


class DebuggingFetchTransactionByIdExample(Example):
@lealobanov (Collaborator, Author) commented:
Added example which makes a mainnet call to the failing tx ID ef6e6dcba87c5f853e0ec940dd382d609ffe11109c28ed239843d71618fa1d71, will remove this before merging changes

@lealobanov (Collaborator, Author) commented:
This is an output of the failing event payloads for reference, will remove before merging changes

@coderabbitai bot left a comment

Actionable comments posted: 5

🧹 Outside diff range and nitpick comments (2)
flow_py_sdk/cadence/decode.py (2)

20-20: Consider adding docstring to clarify return types.

The function can now return three different types. Consider adding a docstring to explain when each type is returned:

  • Value: When decoded as a Cadence value
  • Kind: When decoded as a Cadence kind
  • dict: When handling nested structures or in error cases
 def decode(obj: dict[Any, Any]) -> Union[Value, Kind, dict]:
+    """Decode a dictionary into a Cadence value, kind, or processed dictionary.
+    
+    Args:
+        obj: Dictionary to decode
+    
+    Returns:
+        Union[Value, Kind, dict]:
+            - Value: When successfully decoded as a Cadence value
+            - Kind: When successfully decoded as a Cadence kind
+            - dict: When handling nested structures or in error cases
+    """

23-24: Optimize multiple isinstance checks.

Merge the two isinstance checks into a single call for better performance.

-        if isinstance(obj, Value) or isinstance(obj, Kind):
+        if isinstance(obj, (Value, Kind)):
🧰 Tools
🪛 Ruff

23-23: Multiple isinstance calls for obj, merge into a single call

Merge isinstance calls for obj

(SIM101)

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Files that changed from the base of the PR and between 402e0f9 and 8f8b88d.

📒 Files selected for processing (4)
  • examples/transactions_examples.py (1 hunks)
  • flow_py_sdk/cadence/decode.py (2 hunks)
  • flow_py_sdk/cadence/kinds.py (1 hunks)
  • flow_py_sdk/client/entities.py (3 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • flow_py_sdk/cadence/kinds.py
🧰 Additional context used
🪛 Ruff
examples/transactions_examples.py

303-303: Local variable transaction is assigned to but never used

Remove assignment to unused variable transaction

(F841)

flow_py_sdk/cadence/decode.py

23-23: Multiple isinstance calls for obj, merge into a single call

Merge isinstance calls for obj

(SIM101)

🔇 Additional comments (6)
flow_py_sdk/cadence/decode.py (3)

5-5: LGTM: Logging import added for error handling.


43-57: LGTM: Well-implemented recursive structure handling.

The recursive handling of nested dictionaries and lists is well-implemented and addresses the core issue of deserializing complex event structures. The code maintains clarity while handling different nested scenarios.


20-65: Verify the changes with a complex transaction.

The changes look good and should fix the event deserialization issues. Let's verify the implementation with a complex transaction.

✅ Verification successful

Based on the extensive test coverage shown in encode_test.py and the implementation in decode.py, I can now generate a final response:

The changes are well-tested and handle complex cases correctly

The decode implementation has been thoroughly verified through the comprehensive test suite that covers:

  • Complex nested structures (dictionaries, arrays)
  • Resource types with nested fields
  • Event decoding with complex payloads (e.g., StorefrontEvent test)
  • Error handling for various edge cases

The recursive handling in the decode function correctly processes all these cases, as evidenced by the passing test suite. The error handling additions will help with debugging when issues occur.

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Search for test files that might contain complex transaction events
echo "Searching for relevant test files..."
fd -e json -e py "transaction|event" test/

# Look for existing error handling tests
echo "Checking for error handling test coverage..."
rg -l "test.*decode.*error" test/

Length of output: 401


Script:

#!/bin/bash
# Let's first check if tests are in a different directory structure
fd -t d "test"

# Search for Python test files related to decoding
fd -e py "test.*decode|decode.*test"

# Look for imports of the decode module to find related test files
rg "from flow_py_sdk.cadence.decode import"

# Check for any test files that might use the decode functionality
rg -l "decode\(" .

Length of output: 879


Script:

#!/bin/bash
# Let's examine the test file to see if it covers complex cases
cat tests/cadence/encode_test.py

# Check for any example transactions that might show usage
cat examples/transactions_examples.py

# Look for any test fixtures or test data
fd -e json . tests/

Length of output: 64207

🧰 Tools
🪛 Ruff

23-23: Multiple isinstance calls for obj, merge into a single call

Merge isinstance calls for obj

(SIM101)

flow_py_sdk/client/entities.py (1)

172-182: ⚠️ Potential issue

Improve error handling strategy in from_proto

Returning None silently on failure could mask issues in the calling code. Consider using custom exceptions for better error handling.

Apply this diff:

+class EventDeserializationError(Exception):
+    """Raised when an event cannot be deserialized from proto"""
+    pass
+
 @classmethod
 def from_proto(cls, proto: entities.Event) -> "Event":
     try:
         return Event(
             _type=proto.type,
             transaction_id=proto.transaction_id,
             transaction_index=proto.transaction_index,
             event_index=proto.event_index,
             payload=proto.payload,
         )
     except Exception as e:
-        logging.error(f"Failed to deserialize event {proto.event_index}: {str(e)}")
-        return None  # Returning None if deserialization fails
+        raise EventDeserializationError(f"Failed to deserialize event {proto.event_index}") from e

Likely invalid or redundant comment.

examples/transactions_examples.py (2)

296-299: Avoid hardcoding production endpoints.


301-305: ⚠️ Potential issue

Enhance debugging implementation to address event deserialization.

The current implementation doesn't provide sufficient debugging information for the event deserialization issue (#66). Consider implementing comprehensive event debugging:

  1. Add error handling for transaction fetching
  2. Process and log event details
  3. Add debugging for event deserialization steps

Here's a suggested implementation that better aligns with the PR objectives:

     async def run(self, ctx: Config):
         async with flow_client(
-            # host=ctx.access_node_host, port=ctx.access_node_port
-            host="access.mainnet.nodes.onflow.org",
-            port=9000,
+            host=ctx.access_node_host,
+            port=ctx.access_node_port
         ) as client:
             tx_id = "ef6e6dcba87c5f853e0ec940dd382d609ffe11109c28ed239843d71618fa1d71"
             try:
-                transaction = await client.get_transaction_result(id=bytes.fromhex(tx_id))
-                print(f"Transaction ID: {tx_id}")
+                result = await client.get_transaction_result(id=bytes.fromhex(tx_id))
+                self.log.info(f"Transaction ID: {tx_id}")
+                self.log.info(f"Status: {result.status}")
+                
+                # Debug event deserialization
+                self.log.info(f"Total events: {len(result.events)}")
+                for i, event in enumerate(result.events, 1):
+                    self.log.info(f"\nEvent {i}:")
+                    self.log.info(f"  Type: {event.type}")
+                    self.log.info(f"  Transaction Index: {event.transaction_index}")
+                    self.log.info(f"  Event Index: {event.event_index}")
+                    self.log.info(f"  Payload: {event.payload}")
+            except Exception as e:
+                self.log.error(f"Failed to process transaction {tx_id}: {str(e)}")
+                raise

This implementation:

  1. Uses the logger instead of print statements
  2. Adds comprehensive event debugging
  3. Includes proper error handling
  4. Helps diagnose event deserialization issues

Let's verify if this transaction has many events:

🧰 Tools
🪛 Ruff

303-303: Local variable transaction is assigned to but never used

Remove assignment to unused variable transaction

(F841)

@janezpodhostnik (Owner) left a comment

Thank you very much for the contribution!

Can you add a short explanation of why the events failed to de-serialize? I'm not quite sure I get it. Was there an unexpected dict/list in the event?

As for the error handling:

  • I very much like the idea of extra logging; that will make it a lot easier to figure out what's wrong next time.
  • I'm not sure I like the idea of just logging and continuing. That seems very dangerous. I would rather the error was logged and the exception re-thrown. There could be an optional parameter where, instead of re-throwing, it would just return the JSON object if decoding failed (a rough sketch of that option follows below). What do you think?
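A minimal sketch of that optional fallback, assuming decode can be imported from flow_py_sdk.cadence.decode; the wrapper and its raise_on_error flag are hypothetical, not part of the SDK:

import logging
from typing import Any, Union

from flow_py_sdk.cadence.decode import decode  # assumed import path


def decode_or_passthrough(obj: dict, *, raise_on_error: bool = True) -> Union[Any, dict]:
    # Log and re-raise by default; callers can opt in to getting the raw JSON
    # object back when decoding fails.
    try:
        return decode(obj)
    except (KeyError, NotImplementedError) as e:
        logging.error("Failed to decode Cadence object %r: %s", obj, e)
        if raise_on_error:
            raise
        return obj  # fall back to returning the undecoded JSON object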

@coderabbitai bot left a comment

Actionable comments posted: 2

🧹 Outside diff range and nitpick comments (2)
flow_py_sdk/cadence/decode.py (2)

20-20: Consider making the input type annotation more precise.

The current annotation dict[Any, Any] is very permissive. Consider using more specific types to better document expected input structure.

-def decode(obj: dict[Any, Any]) -> Union[Value, Kind, dict]:
+def decode(obj: Union[dict[str, Any], Value, Kind, list]) -> Union[Value, Kind, dict]:

23-24: Optimize multiple isinstance checks.

Merge multiple isinstance checks for better readability and performance.

-        if isinstance(obj, Value) or isinstance(obj, Kind):
+        if isinstance(obj, (Value, Kind)):
🧰 Tools
🪛 Ruff

23-23: Multiple isinstance calls for obj, merge into a single call

Merge isinstance calls for obj

(SIM101)

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between 8f8b88d and f9f4dc6.

📒 Files selected for processing (1)
  • flow_py_sdk/cadence/decode.py (2 hunks)
🧰 Additional context used
🪛 Ruff
flow_py_sdk/cadence/decode.py

23-23: Multiple isinstance calls for obj, merge into a single call

Merge isinstance calls for obj

(SIM101)

🔇 Additional comments (1)
flow_py_sdk/cadence/decode.py (1)

5-5: LGTM: Logging import is appropriately placed.

The addition of logging capabilities aligns with the need for better error handling and debugging.

Comment on lines +43 to +57
        # Recursively handle nested structures
        if isinstance(obj, dict):
            for key, value in obj.items():
                if isinstance(value, dict):
                    obj[key] = decode(value)  # Recursive call for nested dict
                elif isinstance(value, list):
                    obj[key] = [
                        decode(item) for item in value
                    ]  # Recursive list handling

        elif isinstance(obj, list):
            obj = [
                decode(item) for item in obj
            ]  # Handle obj itself as a list if list type

@coderabbitai bot commented Oct 31, 2024

🛠️ Refactor suggestion

Add safety measures for recursive handling.

The recursive handling of nested structures could potentially lead to:

  1. Stack overflow with deeply nested structures
  2. Infinite recursion if there are circular references

Consider adding these safety measures:

+MAX_RECURSION_DEPTH = 100
+
 def decode(obj: dict[Any, Any]) -> Union[Value, Kind, dict]:
+    def _decode_recursive(obj: Any, depth: int = 0) -> Any:
+        if depth > MAX_RECURSION_DEPTH:
+            raise ValueError("Maximum recursion depth exceeded")
+
+        if isinstance(obj, (Value, Kind)):
+            return obj
+
         # ... existing type checks ...
+
         if isinstance(obj, dict):
             for key, value in obj.items():
                 if isinstance(value, dict):
-                    obj[key] = decode(value)
+                    obj[key] = _decode_recursive(value, depth + 1)
                 elif isinstance(value, list):
                     obj[key] = [
-                        decode(item) for item in value
+                        _decode_recursive(item, depth + 1) for item in value
                     ]
         elif isinstance(obj, list):
             obj = [
-                decode(item) for item in obj
+                _decode_recursive(item, depth + 1) for item in obj
             ]
         return obj
+
+    return _decode_recursive(obj)
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change:

MAX_RECURSION_DEPTH = 100


def decode(obj: dict[Any, Any]) -> Union[Value, Kind, dict]:
    def _decode_recursive(obj: Any, depth: int = 0) -> Any:
        if depth > MAX_RECURSION_DEPTH:
            raise ValueError("Maximum recursion depth exceeded")

        if isinstance(obj, (Value, Kind)):
            return obj

        # ... existing type checks ...

        if isinstance(obj, dict):
            for key, value in obj.items():
                if isinstance(value, dict):
                    obj[key] = _decode_recursive(value, depth + 1)
                elif isinstance(value, list):
                    obj[key] = [
                        _decode_recursive(item, depth + 1) for item in value
                    ]
        elif isinstance(obj, list):
            obj = [
                _decode_recursive(item, depth + 1) for item in obj
            ]
        return obj

    return _decode_recursive(obj)

@franklywatson commented Oct 31, 2024

@janezpodhostnik what do we think about this suggestion? Given decode is a heavily used method I would err on making it as robust as we can, cc @lealobanov

@coderabbitai bot commented:

Seems like the humans are having a chat. I'll hop back into my burrow for now. If you need me again, just tag @coderabbitai in a new comment, and I'll come hopping out!

Comment on lines +60 to +66
    except KeyError as e:
        logging.error(f"Unhandled key during decode: {e}. Object: {obj}")
        raise

    except NotImplementedError:
        logging.error(f"Decoding not implemented for object: {obj}")
        raise
@coderabbitai bot commented:

💡 Codebase verification

⚠️ Potential issue

Error handling improvements needed with test coverage

  1. The error handling needs test coverage - there are no tests for error cases in encode_test.py
  2. The suggested log level changes are valid - KeyError should use warning level as it may be recoverable
  3. The error messages can be more descriptive as suggested
     except KeyError as e:
-        logging.error(f"Unhandled key during decode: {e}. Object: {obj}")
+        logging.warning(
+            f"Unhandled key '{e}' during decode of {type(obj).__name__}. "
+            f"Value: {obj}"
+        )
         raise

     except NotImplementedError:
-        logging.error(f"Decoding not implemented for object: {obj}")
+        logging.error(
+            f"Decoding not implemented for type {type(obj).__name__}. "
+            f"Value: {obj}"
+        )
         raise

Add test cases (a pytest sketch follows below):

  • Test KeyError handling with invalid JSON input
  • Test NotImplementedError with unsupported types
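A hedged pytest sketch of the two cases above; whether these exact payloads raise KeyError or NotImplementedError depends on decode's final behaviour, so treat the inputs as placeholders to adjust:

import pytest

from flow_py_sdk.cadence.decode import decode  # assumed import path


def test_decode_raises_key_error_for_payload_missing_expected_keys():
    # Placeholder payload: assumed to be missing a key that decode requires.
    with pytest.raises(KeyError):
        decode({"type": "UInt64"})


def test_decode_raises_not_implemented_for_unsupported_kind():
    # Placeholder payload: assumed to name a kind the SDK does not implement.
    with pytest.raises(NotImplementedError):
        decode({"kind": "SomeUnsupportedKind"})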
🔗 Analysis chain

Enhance error messages and add test coverage.

The error handling could be improved with:

  1. More descriptive error messages including object type
  2. Different log levels for different scenarios
  3. Test coverage for error cases
     except KeyError as e:
-        logging.error(f"Unhandled key during decode: {e}. Object: {obj}")
+        logging.warning(
+            f"Unhandled key '{e}' during decode of {type(obj).__name__}. "
+            f"Value: {obj}"
+        )
         raise

     except NotImplementedError:
-        logging.error(f"Decoding not implemented for object: {obj}")
+        logging.error(
+            f"Decoding not implemented for type {type(obj).__name__}. "
+            f"Value: {obj}"
+        )
         raise
🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Check test coverage for error handling paths

# Test: Look for test files covering decode.py
fd "test.*decode.*\.py$"

# Test: Check for error handling test cases
rg -A 5 "def test.*decode.*error" 

Length of output: 62


Script:

#!/bin/bash
# Let's try a broader search for test files and error handling patterns

# Find all test files in the project
fd "test.*\.py$"

# Look for decode-related test files with a more flexible pattern
rg -l "decode" $(fd "test.*\.py$")

# Search for any error handling tests in Python files
rg -A 5 "(def|async def) test.*error" $(fd "test.*\.py$")

# Check the actual decode.py file location and its test coverage
fd "decode.py$"

Length of output: 835


Script:

#!/bin/bash
# Let's examine the encode_test.py file since it's related to decode functionality
cat tests/cadence/encode_test.py

# Also check if there are any existing error handling tests in decode.py
rg -B 2 -A 5 "except.*Error" flow_py_sdk/cadence/decode.py

# Look for any existing logging patterns in tests
rg "logging\." tests/cadence/encode_test.py

Length of output: 52727

@lealobanov (Collaborator, Author) commented Oct 31, 2024

Hi @janezpodhostnik ,

  • Just updated the exception handling to raise an exception and exit the decoding if one of the events failed to decode (instead of continuing).

I think the source of the bug might have been the recent migration to Cadence 1.0, as I noticed all of the events which failed to decode had complex nested dict/list fields such as authorizations and staticType found in the EntitlementConjunctionSet class. Entitlements seem to be a new syntax introduced since the C1.0 release (https://www.youtube.com/watch?v=aZ1aCBtPpXo), so I suspect this might be why this wasn't throwing an error before.

Some other preventative measures we could take on the decoding of the events array (a rough sketch follows below):

  • Imposing a max number of events that get decoded per transaction
  • Imposing a max limit on event payload size

Initially I tested out both of these changes, but ultimately this TX id ended up decoding successfully once I updated to decode the nested structures. So for now I think it's ok to skip on the above 2 items and potentially reconsider if events will continue to cause issues.
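For reference, a rough sketch of the two guards mentioned above (the thresholds and helper name are illustrative, and nothing like this was added in the PR):

from typing import Any, List

MAX_EVENTS_PER_TX = 1_000
MAX_PAYLOAD_BYTES = 256 * 1024


def select_events_to_decode(raw_events: List[Any]) -> List[Any]:
    # Cap how many events are decoded per transaction and skip payloads over a
    # size limit instead of failing the whole transaction.
    selected = []
    for event in raw_events[:MAX_EVENTS_PER_TX]:
        if len(event.payload) > MAX_PAYLOAD_BYTES:
            continue  # skip oversized payloads rather than aborting the decode
        selected.append(event)
    return selected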

Thanks!
