Goblin Dependency Miner

This project allows you to generate and update a Maven Central dependency graph in a Neo4j database.

A weaver used to enrich queries is available here: https://github.com/Goblin-Ecosystem/goblinWeaver.
A Zenodo archive that contains the generated dataset dump and the Weaver jar is available here: https://zenodo.org/records/10291589.

If you use the dataset dump from Zenodo, please use Neo4j version 4.x.

The run times reported in this document were measured on a machine with the following characteristics:

OS: Red Hat Enterprise Linux
OS version: 8.7
16 CPUs: Intel(R) Xeon(R) CPU E7-8880 v4 @ 2.20GHz
Memory: 64 GB

Requirements

Java 17
Maven, with MAVEN_HOME defined
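
A quick way to confirm these requirements before launching a long run is sketched below. The function name `check_prerequisites` is purely illustrative and not part of the project. It only reports what is visible in the current shell and leaves it to you to confirm that the reported Java version is 17.

```shell
#!/usr/bin/env bash
# Illustrative helper (not part of the project): report whether java, mvn,
# and MAVEN_HOME are visible in the current shell.

check_prerequisites() {
  local missing=0

  # Java must be on the PATH; print its version line so it can be compared with 17.
  if command -v java >/dev/null 2>&1; then
    java -version 2>&1 | head -n 1
  else
    echo "java: not found on PATH"
    missing=$((missing + 1))
  fi

  # Maven must be on the PATH as well.
  if command -v mvn >/dev/null 2>&1; then
    mvn -version | head -n 1
  else
    echo "mvn: not found on PATH"
    missing=$((missing + 1))
  fi

  # MAVEN_HOME must be defined and non-empty.
  if [ -n "${MAVEN_HOME:-}" ]; then
    echo "MAVEN_HOME=${MAVEN_HOME}"
  else
    echo "MAVEN_HOME: not set"
    missing=$((missing + 1))
  fi

  # The return status is the number of missing prerequisites (0 means all found).
  return "$missing"
}

check_prerequisites || echo "Some prerequisites are missing."
```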

Maven Central Index

To collect data on all Maven releases, we use the Central index archive available here: https://repo.maven.apache.org/maven2/.index/nexus-maven-repository-index.gz
At startup, the program downloads the most recent archive and unpacks it with the Maven Indexer CLI jar located in the lib folder.
During execution, this creates a "central-lucene-index" folder at the root of the project; the folder is deleted when the program finishes.

Doc: https://maven.apache.org/repository/central-index.html

Size on disk:

central-lucene-index: 21G
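
The program performs the download and unpack steps itself, so nothing here needs to be run by hand. For readers who want to inspect the index separately, the equivalent manual steps could look roughly like the sketch below. The indexer flags are the ones given in the Maven Central index documentation linked above. The jar path in `INDEXER_JAR` is a placeholder, since the actual file name in the lib folder is not stated here, and the `run` and `fetch_and_unpack_index` helpers are illustrative names. The script only prints the commands by default; set `DRY_RUN=0` to execute them, keeping in mind that the unpacked index takes about 21G on disk.

```shell
#!/usr/bin/env bash
# Sketch of the manual equivalent of the download-and-unpack step.

INDEX_URL="https://repo.maven.apache.org/maven2/.index/nexus-maven-repository-index.gz"
# Placeholder path: replace with the Maven Indexer CLI jar found in the lib folder.
INDEXER_JAR="${INDEXER_JAR:-lib/indexer-cli.jar}"
# Default to a dry run so that nothing is downloaded by accident.
DRY_RUN="${DRY_RUN:-1}"

# Run a command, or only print it when DRY_RUN=1.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

fetch_and_unpack_index() {
  # Download the most recent Central index archive.
  run curl -fL -o nexus-maven-repository-index.gz "$INDEX_URL"
  # Unpack it into a Lucene index folder with the Maven Indexer CLI.
  run java -jar "$INDEXER_JAR" --unpack nexus-maven-repository-index.gz \
    --destination central-lucene-index --type full
}

fetch_and_unpack_index
```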

Configuration

Configuration file

Before running the application, edit the configuration file at src/main/resources/configuration.yml.

dataBaseExport: The database to export data to (Postgres, neo4J, or both).
update: Set to true to update an existing Neo4j graph, or to false to generate a graph from scratch.
thread: The number of threads allocated to the program.
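
As a rough illustration, the three options above might appear as follows in configuration.yml. The key names come from the list above, but the flat layout and the exact spelling of the values are assumptions; check the file shipped in the repository for the real structure.

```yaml
# Sketch only: layout and value spellings are assumed, not taken from the project.
dataBaseExport: neo4J   # assumed spelling; the alternatives are Postgres or both
update: false           # false: generate the graph from scratch
thread: 12              # the from-scratch timing in this document used 12 threads
```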

Database configuration

Postgres

To configure your Postgres database, enter your database information in the src/main/resources/META-INF/persistence.xml file.

Neo4J

To configure your Neo4J database, enter your database information in the src/main/resources/configuration.yml file.

Run

_JAVA_OPTIONS="-Xmx30G" mvn clean compile exec:java

Time to run the project from scratch with 12 threads on October 05, 2023: 4.1 days.
Time to update our dataset from October 05, 2023, to October 14, 2023: 1h06m.
Time to update our dataset from April 14, 2023, to October 14, 2023 (six months): 6h23m.
Time to update our dataset from October 14, 2022, to October 14, 2023 (one year): 11h27m.
