add option for string node representation update #16100

Merged: 5 commits merged into run-llama:main on Sep 23, 2024


prrao87
Contributor

@prrao87 prrao87 commented Sep 19, 2024

Description

Related to the retriever returned by the Property Graph Index's as_retriever method, which lets users print the string representation of a property graph node or relationship.

I think it makes sense to revert an unintended consequence of the string node representation update in #14707, where the properties dict is coerced to a string and appended to the node name before the string representation is printed. As a result, the node's string representation is a garbled mix of name + properties that is hard to read when running the following code with the property graph index retriever:

retriever = index.as_retriever(
    include_text=False,  # include source text in returned nodes, default True
)

nodes = retriever.retrieve("What happened at Interleaf and Viaweb?")

for node in nodes:
    print(node.text)

Reverting the change made by @logan-markewich returns the expected results, pretty-printed as follows:

Paul Graham -> WORKED_ON -> Viaweb
Viaweb -> PART_OF -> Yahoo
Paul Graham -> WORKED_ON -> Technology companies
Paul Graham -> WORKED_ON -> World Wide Web
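
To make the difference concrete, here is a minimal sketch of the two string forms. It does not use the library's own classes: `Entity`, `name_only`, `name_with_properties`, and the `source_id` property are hypothetical stand-ins, and the `name (properties)` layout is only an assumption about what the noisier form looks like.

```python
from dataclasses import dataclass, field
from typing import Any, Dict


@dataclass
class Entity:
    """Hypothetical stand-in for a property graph node (not the library's class)."""

    name: str
    properties: Dict[str, Any] = field(default_factory=dict)


def name_only(entity: Entity) -> str:
    # The form this PR wants printed: just the node name.
    return entity.name


def name_with_properties(entity: Entity) -> str:
    # Assumed shape of the noisier form: the properties dict is coerced
    # to a string and appended to the name.
    if not entity.properties:
        return entity.name
    return f"{entity.name} ({entity.properties})"


# "source_id" is a made-up property used only for illustration.
pg = Entity("Paul Graham", {"source_id": "abc123"})
viaweb = Entity("Viaweb", {"source_id": "abc123"})

clean = f"{name_only(pg)} -> WORKED_ON -> {name_only(viaweb)}"
noisy = f"{name_with_properties(pg)} -> WORKED_ON -> {name_with_properties(viaweb)}"

print(clean)
print(noisy)
```

With even one property per node, the second line is already much longer than the first, which is the readability problem described above.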

New Package?

Did I fill in the tool.llamahub section in the pyproject.toml and provide a detailed README.md for my new integration or package?

  • Yes
  • No

Version Bump?

Did I bump the version in the pyproject.toml file of the package I am updating? (Except for the llama-index-core package)

  • Yes
  • No

Type of Change

Please delete options that are not relevant.

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • This change requires a documentation update

How Has This Been Tested?

Tested on Kùzu, Neo4j and FalkorDB by rerunning the example notebooks end-to-end, and nothing breaks.

  • Added new unit/integration tests
  • Added new notebook (that tests end-to-end)
  • I stared at the code and made sure it makes sense

Suggested Checklist:

  • I have performed a self-review of my own code
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • I have added Google Colab support for the newly added notebooks.
  • My changes generate no new warnings
  • I have added tests that prove my fix is effective or that my feature works
  • New and existing unit tests pass locally with my changes
  • I ran make format; make lint to appease the lint gods

@dosubot dosubot bot added the size:XS label (This PR changes 0-9 lines, ignoring generated files.) Sep 19, 2024
@logan-markewich
Collaborator

@prrao87 I'm pretty sure i made this change because people wanted the node properties in the string representation lol

Instead of removing, there should be a toggle (likely on the retriever)

@prrao87
Contributor Author

prrao87 commented Sep 19, 2024

Interesting.. the call to {triplet[n]!s} per this line is hardcoded tho, so I'm wondering what a good toggle could be - would it be in the subretriever, or at this upper level in types.py?

@logan-markewich
Collaborator

@prrao87 maybe something roughly like

.as_retriever(include_properties=True)

if self.include_properties:
    node_text = f"{node} ({node.properties}) -> ..."
else:
    node_text = f"{node} -> ..."

@prrao87
Contributor Author

prrao87 commented Sep 19, 2024

> @prrao87 maybe something roughly like
>
> .as_retriever(include_properties=True)
>
> if self.include_properties:
>     node_text = f"{node} ({node.properties}) -> ..."
> else:
>     node_text = f"{node} -> ..."

Was exactly trying this but am getting stuck in inheritance hell lol. It applies to the subretriever but I think I need to have the same kwarg in all the retrievers that are being subclassed for the PropertyGraphIndex 😅.
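
One way to avoid repeating the kwarg's handling in every subclass is to accept it once on the base class and have each subclass forward `**kwargs` to `super().__init__`. The sketch below is an illustration under that assumption, not the code in this PR: `BaseSubRetriever`, `KeywordSubRetriever`, `VectorSubRetriever`, `format_node`, and the `kind` property are all hypothetical names.

```python
from typing import Any, Dict, List


class BaseSubRetriever:
    """Hypothetical base class; it owns the include_properties flag."""

    def __init__(self, include_properties: bool = False, **kwargs: Any) -> None:
        self.include_properties = include_properties

    def format_node(self, name: str, properties: Dict[str, Any]) -> str:
        # Append properties only when the caller opted in.
        if self.include_properties and properties:
            return f"{name} ({properties})"
        return name


class KeywordSubRetriever(BaseSubRetriever):
    """Hypothetical subclass with its own extra argument."""

    def __init__(self, keywords: List[str], **kwargs: Any) -> None:
        # Forward everything else, including include_properties, to the base.
        super().__init__(**kwargs)
        self.keywords = keywords


class VectorSubRetriever(BaseSubRetriever):
    """Second hypothetical subclass, following the same pattern."""

    def __init__(self, top_k: int = 2, **kwargs: Any) -> None:
        super().__init__(**kwargs)
        self.top_k = top_k


kw = KeywordSubRetriever(["Viaweb"], include_properties=True)
vec = VectorSubRetriever(top_k=3)

print(kw.format_node("Viaweb", {"kind": "company"}))
print(vec.format_node("Viaweb", {"kind": "company"}))
```

Each subclass only names the arguments it adds, so the flag's default lives in a single place.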

@logan-markewich logan-markewich self-assigned this Sep 19, 2024
@logan-markewich
Collaborator

@prrao87 yea, there's the base class and 5 other retrievers, not too bad :)

@prrao87
Contributor Author

prrao87 commented Sep 19, 2024

> yea, there's the base class and 5 other retrievers, not too bad :)

Now that you confirmed this, yes, not too bad at all 😅. Fixing now.

@dosubot dosubot bot added the size:M label (This PR changes 30-99 lines, ignoring generated files.) and removed the size:XS label (This PR changes 0-9 lines, ignoring generated files.) Sep 19, 2024
@prrao87
Contributor Author

prrao87 commented Sep 19, 2024

Done - I left the default value as False because this way, none of the existing example notebooks that call as_retriever would need to be rerun. If users want, they can always call the retriever this way:

retriever = index.as_retriever(include_properties=True)
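
The backward-compatibility argument can be checked with a small stand-alone sketch. `SketchIndex` and `SketchRetriever` are hypothetical stand-ins (not the library's classes) that only model how a `False` default leaves existing `as_retriever` calls unchanged while still allowing an opt-in.

```python
from typing import Any


class SketchRetriever:
    """Hypothetical retriever; only the flag handling matters here."""

    def __init__(
        self, include_text: bool = True, include_properties: bool = False
    ) -> None:
        self.include_text = include_text
        self.include_properties = include_properties


class SketchIndex:
    """Hypothetical index whose as_retriever forwards kwargs unchanged."""

    def as_retriever(self, **kwargs: Any) -> SketchRetriever:
        return SketchRetriever(**kwargs)


index = SketchIndex()

# A call written before the option existed: the new flag stays off.
existing_call = index.as_retriever(include_text=False)

# A call that opts in explicitly.
opted_in = index.as_retriever(include_text=False, include_properties=True)

print(existing_call.include_properties)
print(opted_in.include_properties)
```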

@dosubot dosubot bot added the lgtm label (This PR has been approved by a maintainer) Sep 23, 2024
@logan-markewich logan-markewich enabled auto-merge (squash) September 23, 2024 22:50
@logan-markewich logan-markewich merged commit 7160c0c into run-llama:main Sep 23, 2024
10 checks passed
@logan-markewich logan-markewich changed the title from "Revert unintended consequence of string node representation update" to "add option for string node representation update" Sep 24, 2024
Labels: lgtm (This PR has been approved by a maintainer), size:M (This PR changes 30-99 lines, ignoring generated files.)