LLM-Enhanced Text-Attributed Graph (TAG) Representation Learning #9428

Open · wants to merge 10 commits into base: master
Conversation

@devanshamin (Contributor) commented Jun 17, 2024

PR for #9361.

I have skipped the Design Choices section from the README; you can find it here. I can include it if necessary.

Please let me know if any changes are required.


Pre-commit Issue

Due to the pre-commit configuration of the torch_geometric package, the isort hook removes the blank line between the external library imports and the tape package imports. For example, in train.py:

import copy
from dataclasses import is_dataclass
from typing import Optional

import numpy as np
import pandas as pd
import torch
from jsonargparse import ActionConfigFile, ArgumentParser
from tape.config import DatasetName, FeatureType
from tape.dataset.dataset import GraphDataset
from tape.dataset.llm.engine import LlmOfflineEngineArgs, LlmOnlineEngineArgs
from tape.dataset.lm_encoder import LmEncoderArgs
from tape.gnn_model import NodeClassifierArgs
from tape.trainer.gnn_trainer import GnnTrainer, GnnTrainerArgs
from tape.utils import profile_execution

It also disrupts the order of imports. For example, in gnn_trainer.py:

from dataclasses import dataclass
from typing import Literal, Optional

import torch
from tape.dataset.dataset import GraphDataset
from tape.gnn_model import NodeClassifier, NodeClassifierArgs

from torch_geometric.data import Data
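
A possible workaround (my assumption; the PR does not propose this fix) is to declare tape as a first-party package in the isort configuration, so the hook keeps it grouped and separated from third-party imports rather than interleaving it with them. A minimal sketch for a pyproject.toml-style isort config:

```toml
[tool.isort]
# Treat the local `tape` package as first-party so it is grouped
# after external imports, with a blank line between the sections.
known_first_party = ["tape"]
```

The same option can be set in setup.cfg or .isort.cfg; whether it is appropriate here depends on how torch_geometric's shared pre-commit config is meant to treat example subpackages.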

from vllm import LLM, SamplingParams


class LlmOfflineEngine(LlmEngine):

Contributor:
I think there's some overlap with Rishi's current work on adding G-Retriever: #9462

It would be nice to keep the LLM models in a specific directory, as it was done by Rishi in the above PR.

+@puririshi98 for further insights

Contributor:

Yes, I definitely agree here. As it stands, this PR is basically the equivalent of making your own research repository on GitHub that uses PyG; there is no integration into the actual framework. There are 32 changed files, which is far too many to review effectively at once. For adding G-Retriever, I have broken my PR down into smaller PRs first. Before embarking on that, however, I recommend you work out how your work could actually be integrated into torch_geometric itself, rather than remaining what is essentially a standalone example repository.

Contributor Author:

Thank you, @Kh4L and @puririshi98, for your feedback.

@puririshi98, I agree with you. There is no integration into the actual framework; hence, it lives under the examples directory. The proposed TAPE method prompts an LLM with each node's title/abstract text to obtain a prediction and an explanation, converts these responses into node features, trains a separate GNN on each feature type, and combines the results of the GNN models. I have created a repository extending this method: tag-llm.
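
To make the ensembling step concrete, here is a minimal, dependency-free sketch of the idea described above: one GNN is trained per feature type (original text, LLM prediction, LLM explanation) and their per-class probabilities are averaged. All names and numbers are illustrative, not the PR's actual API.

```python
# Hypothetical sketch of TAPE-style ensembling: average the class
# probabilities produced by GNNs trained on different node-feature types.

def average_predictions(*prob_lists):
    """Average per-node, per-class probabilities from several models."""
    n_models = len(prob_lists)
    return [
        [sum(p[i] for p in probs_per_node) / n_models
         for i in range(len(probs_per_node[0]))]
        for probs_per_node in zip(*prob_lists)
    ]

# Made-up class probabilities for two nodes from three GNNs, each trained
# on a different feature type.
probs_text = [[0.9, 0.1], [0.2, 0.8]]  # features from original title/abstract
probs_pred = [[0.7, 0.3], [0.4, 0.6]]  # features from LLM prediction
probs_expl = [[0.8, 0.2], [0.3, 0.7]]  # features from LLM explanation

ensemble = average_predictions(probs_text, probs_pred, probs_expl)
# → approximately [[0.8, 0.2], [0.3, 0.7]]
```

The TAPE paper combines the per-model outputs in a learned way; a plain average is used here only to keep the sketch self-contained.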

@rusty1s, how would you like to move forward with this since you proposed it?

@puririshi98 puririshi98 self-requested a review July 11, 2024 15:27