Data Integration and Knowledge Integration
- GeoFlux: Hands-Off Data Integration Leveraging Join Key Knowledge (SIGMOD 2018) [PDF, demo] 🌟
- Non-binary evaluation measures for big data integration (VLDBJ 2018) 🌟
- Meta-Mappings for Schema Mapping Reuse [PDF] (VLDB 2019) 🌟
- Representing Temporal Attributes for Schema Matching (KDD 2020) 🌟
- Bayesian Networks for Data Integration in the Absence of Foreign Keys (TKDE 2020) 🌟
- Creating Embeddings of Heterogeneous Relational Datasets for Data Integration Tasks [Paper] (SIGMOD 2020) 🌟
- JenTab: Bridging Tabular Data and Knowledge Graphs – A Detailed System Overview (Semantic Web 2024) [Paper]
KG and Blockchains
- Blockchain + KG [Link]
Data Extraction or Knowledge Extraction from The Web
- When Open Information Extraction Meets the Semi-Structured Web (OpenCERES, NAACL 2019)
- CERES: Distantly Supervised Relation Extraction from the Semi-Structured Web (CERES, VLDB 2018) 🌟
- Progressive Deep Web Crawling Through Keyword Queries For Data Enrichment [PDF] (SIGMOD 2019) 🌟
- RED: Redundancy-Driven Data Extraction from Result Pages (WWW 2019)
- TCN: Table Convolutional Network for Web Table Interpretation (WWW 2021) [Paper]
AIOps (Artificial Intelligence for IT Operations)
Multi-hop Reading
- Cognitive Graph for Multi-Hop Reading Comprehension at Scale (ACL 2019) [Paper]
- BERT + GNN
- Is Graph Structure Necessary for Multi-Hop Reading? (EMNLP 2020) [Paper] [Notes]
- Dynamically Fused Graph Network for Multi-hop Reasoning (ACL 2019)
- AdaLoGN: Adaptive Logic Graph Network for Reasoning-Based Machine Reading Comprehension (ACL 2022) [Paper] [[Github](https://github.com/nju-websoft/AdaLoGN)]
Graph Functional Dependencies
Note: For this topic, see Wenfei Fan's homepage for more related publications.
- Discovering Graph Functional Dependencies (SIGMOD 2018) [Paper]
- Functional Dependencies for Graphs (SIGMOD 2016) [Paper]
- Rule-Based Graph Repairing: Semantic and Efficient Repairing Methods (ICDE 2018) [Paper]
Subgraph Isomorphism
- An In-depth Comparison of Subgraph Isomorphism Algorithms in Graph Databases (VLDB 2013)
- On the equivalence between graph isomorphism testing and function approximation with GNNs (NeurIPS 2019) [Paper]
Fraud Detection
- https://info.tigergraph.com/graph-ai-world-fintell
- https://www.youtube.com/watch?v=Mf8PuOElGpg
- https://neo4j.com/use-cases/fraud-detection/
K-Core in Graphs
- Hierarchical Core Maintenance on Large Dynamic Graphs (VLDB 2021) [Paper]
- Efficient Progressive Minimum k-Core Search (VLDB 2020) [Paper]
Transformers!
- The Illustrated Transformer [GitHub]
BERT+KG
- ERNIE (Tsinghua) [References]
- ERNIE (Baidu) [References]
XAI and Explainable GNN
- On Explainability of Graph Neural Networks via Subgraph Explorations [Paper] 🌟
- Shapley value --> taxi sharing
- ReduNet: A White-box Deep Network from the Principle of Maximizing Rate Reduction [Paper] [Some discussion]
- Evaluating XAI: A comparison of rule-based and example-based explanations [Paper]
HGNN
- Heterogeneous Graph Structure Learning for Graph Neural Networks (AAAI 2021) [Paper]
Others
- Neural Subgraph Isomorphism Counting (KDD 2020) [Paper] [Code] 🌟
- A Fresh Look on Knowledge Bases: Distilling Named Events from News (CIKM 2014) 🌟
- Event KB. Each news article is regarded as an event. Build the semantic similarity relations and the temporal relations between events.
- A Generic Ontology Framework for Indexing Keyword Search on Massive Graphs
- Extending Graph Patterns with Conditions
- LUSTRE: An Interactive System for Entity Structured Representation and Variant Generation (ICDE 2018) [PDF, demo] 🌟
- TableView: A Visual Interface for Generating Preview Tables of Entity Graphs (ICDE 2018) [PDF, demo] 🌟
- Mining Summaries for Knowledge Graph Search (TKDE 2018) [PDF, ICDM2016 version] 🌟
- Embedded Functional Dependencies and Data-completeness Tailored Database Design [PDF] (VLDB 2019) 🌟
- Tutorial: Combating Fake News: A Data Management and Mining Perspective [Link] (VLDB 2019) 🌟
- Tutorial: Data Lake Management: Challenges and Opportunities [Link] (VLDB 2019) 🌟
- Spade: A Modular Framework for Analytical Exploration of RDF Graphs [PDF, demo] (VLDB 2019) 🌟
- PivotE: Revealing and Visualizing the Underlying Entity Structures for Exploration [PDF, demo] (VLDB 2019) 🌟
- Estimating Node Importance in Knowledge Graphs Using Graph Neural Networks [Paper, Presentation] (KDD 2019) 🌟
- Next work: MultiImport: Inferring node importance in a knowledge graph from multiple input signals [Paper] (KDD 2020) 🌟
- Embedding-based Retrieval in Facebook Search (KDD 2020) [Paper]
- Automatically Generating Interesting Facts from Wikipedia Tables [PDF, industrial track] (SIGMOD 2019) 🌟
- Knowledge Graphs and Enterprise AI: The Promise of an Enabling Technology [PDF, keynote] (ICDE 2019) 🌟
- Collective Keyword Query on a Spatial Knowledge Base (TKDE 2019) 🌟
- Distribution-Aware Crowdsourced Entity Collection (TKDE 2019) 🌟
- Effective and Efficient Relational Community Detection and Search in Large Dynamic Heterogeneous Information Networks (VLDB 2020) 🌟
- Obi-Wan: Ontology-Based RDF Integration of Heterogeneous Data (VLDB 2020) 🌟
- RDFFrames: Knowledge Graph Access for Machine Learning Tools (demo, VLDB 2020) 🌟
- SPHINX: A System for Metapath-based Entity Exploration in Heterogeneous Information Networks (demo, VLDB 2020) 🌟
- Dataset Discovery in Data Lakes [Video][Slides][Paper] (ICDE 2020) 🌟
- SLIM: Scalable Linkage of Mobility Data [Paper] (SIGMOD 2020) 🌟
- A survey of community search over big graphs (VLDBJ 2020) 🌟
- An analytical study of large SPARQL query logs (VLDBJ 2020) 🌟
- Generalizing Tensor Decomposition for N-ary Relational Knowledge Bases (WWW 2020)
- Adaptive Low-level Storage of Very Large Knowledge Graphs (WWW 2020)
- Be Concise and Precise: Synthesizing Open-Domain Entity Descriptions from Facts (WWW 2019)
- Knowledge-Enhanced Ensemble Learning for Word Embeddings (WWW 2019)
- Effective and Scalable Clustering on Massive Attributed Graphs (WWW 2021)
- k-attributed graph clustering (k-AGC) groups nodes in G into k disjoint clusters, such that nodes within the same cluster share similar topological and attribute characteristics, while those in different clusters are dissimilar.
- Trav-SHACL: Efficiently Validating Networks of SHACL Constraints (WWW 2021)
- Sampling from Large Graphs (KDD 2006) [Paper]
- A Survey on Data Collection for Machine Learning: A Big Data - AI Integration Perspective [Paper] (TKDE 2021) 🌟
- Learning Dynamic User Interest Sequence in Knowledge Graphs for Click-Through Rate Prediction [Paper] (TKDE 2021) 🌟
- Rotom: A Meta-Learned Data Augmentation Framework for Entity Matching, Data Cleaning, Text Classification, and Beyond (SIGMOD 2021) 🌟
- Alibaba - GraphScope (VLDB 2021/22 industrial) 🌟 Three independent engines: GAIA (NSDI 2021), GRAPE (SIGMOD 2017), AliGraph
- Vertex-centric GNN (SIGMOD 2021), James Cheng, CUHK
- KungFu: Making Training in Distributed Machine Learning Adaptive (OSDI)
- ArangoML Pipeline [GitHub]
- FlexGen: High-Throughput Generative Inference of Large Language Models with a Single GPU (ICML 2023) [Paper]
LLM: From Beginner to Researcher(?)
- A Survey of Large Language Models (arXiv, 2023) [Paper] [A good summary and notes; Fig. 3 and Fig. 5 are quite useful]
Note: There are a few valuable survey collections on data processing/management/collection for AI/LLM, including