Skip to content

pepkit/pepdbagent

Repository files navigation

pepdbagent

PEP compatible Run pytests pypi-badge pypi-version Coverage Github badge


Documentation: https://pep.databio.org

Source Code: https://github.com/pepkit/pepdbagent


pepdbagent is a Python library and toolkit that gives a user-friendly interface to connect, upload, update and retrieve information from the pep database. This library is designed to work with PEPhub, but it can be used for any other purpose.

pepdbagent creates a connection to the database and creates table schemas for the PEPhub database if necessary. Core database is postgres database, but it can be easily extended to other relational databases. To use pepdbagent, you need to have a database instance running with its credentials. If the version of the database schema is not compatible with the version of pepdbagent, it will throw an exception.

Installation

To install pepdbagent use this command:

pip install pepdbagent

or install the latest version from the GitHub repository:

pip install git+https://github.com/pepkit/pepdbagent.git

Overview:

The pepdbagent provides a core class called PEPDatabaseAgent. This class has 4 modules, divided to increase readability, maintainability, and user experience of pepdbagent, which are:

The pepdbagent consists of 6 main modules:

  • Namespace: Includes methods for searching namespaces, retrieving statistics, and fetching information.
  • Project: Provides functionality for retrieving, uploading, updating, and managing projects.
  • Annotation: Offers features for searching projects in the database and namespaces, retrieving annotations, and other related information.
  • Sample: Handles the creation, modification, and deletion of samples, without modification of the entire project.
  • View: Manages the creation, modification, and deletion of views for specific projects.
  • User: Contains user-related information such as favorites and other user-related data.

Example:

Instiantiate a PEPDatabaseAgent object and connect to database:

import pepdbagent
# 1) By providing credentials and connection information:
agent = pepdbagent.PEPDatabaseAgent(user="postgres", password="docker", )
# 2) or By providing connection string:
agent = pepdbagent.PEPDatabaseAgent(dsn="postgresql://postgres:docker@localhost:5432/pep-db")

Example of usage of the pepdbagent modules:

import peppy

prj_obj = peppy.Project("sample_pep/basic/project_config.yaml")

# create a project
namespace = "demo"
name = "basic_project"
tag = None
agent.project.create(prj_obj, namespace, name, tag)

update_dict = {"is_private" = True}
# after creation of the dict, update record by providing update_dict and namespace, name and tag:
agent.project.update(update_dict, namespace, name, tag)

Annotation example:

The .annotation module provides an interface to PEP annotations. PEP annotations refers to the information about the PEPs (or, the PEP metadata). Retrieved information contains: [number of samples, submission date, last update date, is private, PEP description, digest, namespace, name, tag]

```python
# Get annotation of one project:
agent.annotation.get(namespace, name, tag)

# Get annotations of all projects from db:
agent.annotation.get()

# Get annotations of all projects within a given namespace:
agent.annotation.get(namespace='namespace')

# Search for a project with partial string matching, either within namespace or entire database
# This returns a list of projects
agent.annotation.get(query='query')
agent.annotation.get(query='query', namespace='namespace')

# Get annotation of multiple projects given a list of registry paths
agent.annotation.get_by_rp(["namespace1/project1:tag1", "namespace2/project2:tag2"])

# By default get function will retrun annotations for public projects,
# To get annotation including private projects admin list should be provided.
# admin list means list of namespaces where user has admin rights
# For example:
agent.annotation.get(query='search_pattern', admin=['databio', 'ncbi'])

Namespace

The .namespace module contains search namespace functionality that helps to find namespaces in database and retrieve information: number of samples, number of projects.

Example:

# Get info about namespace by providing query argument. Then pepdbagent will
# search for a specified pattern of namespace in database.
agent.namespace.get(query='Namespace')

# By default all get functions will return namespace information for public projects,
# To get information with private projects, admin list should be provided.
# admin list means list of namespaces where user has admin rights
# For example:
agent.namespace.get(query='search_pattern', admin=['databio', 'geo', 'ncbi'])

For more information, developers should use pepdbagent pytest as documentation due to its natural language syntax and the ability to write tests that serve as executable examples. This approach not only provides detailed explanations but also ensures that code examples are kept up-to-date with the latest changes in the codebase.