Add FiGraph dataset #9630

Open
wants to merge 10 commits into
base: master
from

XiaoguangWang23

Overview

This pull request (PR) adds FiGraph, a real-world dataset, to the PyTorch Geometric (PyG) library. FiGraph is a dynamic heterogeneous graph dataset that captures how relationships within financial networks evolve over nine years (2014 to 2022). It is suited to node classification tasks in which both the temporal dynamics and the heterogeneous structure of the graph matter.

Dataset Details

Dynamic Heterogeneous Graph

FiGraph is structured as a dynamic heterogeneous graph, meaning it not only evolves over time but also contains multiple types of nodes and edges. Each year from 2014 to 2022 is represented as a distinct graph snapshot within the dataset.

  • Time Span: 2014 to 2022

  • Graph Snapshots: 9 snapshots, one for each year

  • Node Types: 5 distinct types of nodes, labeled as:

    • L: Listed companies
    • U: Unlisted companies
    • H: Holding companies
    • A: Auditors
    • R: Regulatory bodies
  • Edge Types: 4 types of edges, representing different types of relationships:

    • Related-party transaction
    • Investment
    • Audit
    • Supply chain
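
The schema listed above can be written down as a small plain-Python sketch. This only illustrates the structure described in this PR and is not the code in figraph.py: the names `YEARS`, `NODE_TYPES`, `EDGE_TYPES`, `Snapshot`, and `empty_snapshots` are hypothetical, and because the description does not say which node types each edge type connects, edges are stored as generic (source, target) pairs.

```python
from dataclasses import dataclass, field

# Schema as described in this PR; all identifiers below are hypothetical.
YEARS = list(range(2014, 2023))  # 2014..2022 inclusive, one snapshot per year

NODE_TYPES = {
    "L": "Listed companies",
    "U": "Unlisted companies",
    "H": "Holding companies",
    "A": "Auditors",
    "R": "Regulatory bodies",
}

EDGE_TYPES = ["related_party_transaction", "investment", "audit", "supply_chain"]


@dataclass
class Snapshot:
    """One yearly graph: node ids per node type, edge lists per edge type."""
    year: int
    nodes: dict = field(default_factory=lambda: {t: set() for t in NODE_TYPES})
    edges: dict = field(default_factory=lambda: {e: [] for e in EDGE_TYPES})

    def add_edge(self, edge_type, src, dst):
        # src and dst are (node_type, node_id) pairs.
        if edge_type not in self.edges:
            raise KeyError(f"unknown edge type: {edge_type}")
        for node_type, node_id in (src, dst):
            self.nodes[node_type].add(node_id)  # KeyError on unknown node type
        self.edges[edge_type].append((src, dst))


def empty_snapshots():
    """Return one empty Snapshot per year, keyed by year."""
    return {year: Snapshot(year) for year in YEARS}
```

Keying the snapshots by year keeps each yearly graph independent, which matches the description of nodes and edges that may appear or disappear between years.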

Yearly Snapshots

Each year's data is stored as a separate snapshot that captures the state of the financial network in that year. Node features, node labels, and the graph structure can all change from year to year, which makes the dataset suitable for studying temporal dynamics in graph-based learning tasks.

  • Node Features: Only nodes of type L (Listed companies) have features, which include financial attributes such as profit and liabilities. These features can vary annually.
  • Node Labels: Similarly, only L type nodes have labels, which indicate whether a company's financial report for that year is fraudulent (Label = 1) or normal (Label = 0).
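
To make the feature and label convention concrete, here is a minimal sketch, assuming each snapshot can be viewed as a list of per-node records with a node type, an optional feature mapping, and an optional label. The record layout, the function names `labeled_listed_nodes` and `fraud_rate`, and the toy values are illustrative placeholders, not data or code from FiGraph.

```python
def labeled_listed_nodes(records):
    """Return (features, label) pairs for type-L nodes that carry a label.

    `records` is an iterable of dicts with keys "type", "features", "label"
    (a hypothetical layout). Nodes of other types are skipped because, per
    the description above, only L nodes have features and labels.
    """
    pairs = []
    for rec in records:
        if rec["type"] != "L" or rec.get("label") is None:
            continue
        pairs.append((rec["features"], rec["label"]))
    return pairs


def fraud_rate(pairs):
    """Fraction of labeled samples with Label = 1 (fraudulent)."""
    if not pairs:
        return 0.0
    return sum(label for _, label in pairs) / len(pairs)


# Toy records with made-up values, for illustration only.
toy_snapshot = [
    {"type": "L", "features": {"profit": 1.0, "liabilities": 2.0}, "label": 0},
    {"type": "L", "features": {"profit": -0.5, "liabilities": 4.0}, "label": 1},
    {"type": "A", "features": None, "label": None},
]
pairs = labeled_listed_nodes(toy_snapshot)
```

Because labels are attached per year, the same listed company can contribute a different (features, label) pair in each snapshot.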

Code Structure

  • Dataset Code: Implemented in torch_geometric/datasets/figraph.py.
  • Data Files: The corresponding yearly CSV files are located in torch_geometric/datasets/figraph/data/.

Example Usage

Researchers can load the FiGraph dataset as follows:

from torch_geometric.datasets import FiGraphDataset

dataset = FiGraphDataset(root='path_to_dataset')

@rusty1s
Member

rusty1s commented Sep 3, 2024

It's not possible for me to merge 50 files and 9000 LOC. Do we need the model implementations for this PR?

@XiaoguangWang23
Author

XiaoguangWang23 commented Sep 4, 2024 via email
