Skip to content

Identifiers

Jouni Siren edited this page Jun 9, 2018 · 7 revisions

Introduction

GBWT assumes that node identifiers and path identifiers are arbitrary integers without any structure. In many applications, we want to store the orientation of a node or a path, and then quickly find the identifier with the opposite orientation.

For this purpose, GBWT uses a simple encoding, where the orientation is stored in the low-order bit and the actual identifier in the high-order bits of the identifier value. The encodings for node identifiers and path identifiers are semantically separate but functionally identical. They are defined in support.h.

The endmarker is always assumed to be in forward orientation. (Node::encode(ENDMARKER, false) == ENDMARKER, while Node::encode(ENDMARKER, true) should never be used.)

See Data Model for further discussion of the data model and Query Interface for the definitions of data types.

Node

struct Node
{
  static size_type id(node_type node);
  static bool is_reverse(node_type node);
  static node_type encode(size_type node_id, bool reversed);
  static node_type reverse(node_type node);
};
  • static size_type id(node_type node): The identifier encoded in node.
  • static bool is_reverse(node_type node): The orientation encoded in node.
  • static node_type encode(size_type node_id, bool reversed): The encoding of identifier node_id, orientation reversed.
  • static node_type reverse(node_type node): The same node in the opposite orientation.

Path

struct Path
{
  static size_type id(size_type path);
  static bool is_reverse(size_type path);
  static size_type encode(size_type path_id, bool reversed);
  static size_type reverse(size_type path);
};
  • static size_type id(size_type path): The identifier encoded in path.
  • static bool is_reverse(size_type path): The orientation encoded in path.
  • static size_type encode(size_type path_id, bool reversed): The encoding of identifier path_id, orientation reversed.
  • static size_type reverse(size_type path): The same path in the opposite orientation.

In practice

Some construction methods support inserting the sequences in both orientations (see Construction Interface). The i-th inserted sequence gets path identifier Path::encode(i, false) in forward orientation and Path::encode(i, true) in reverse orientation. (When using other construction methods, the i-th sequence gets identifier i.)

The reverse path traverses the reverse orientations of the nodes in reverse order. The following functions reverse a path:

void reversePath(vector_type& path);
void reversePath(const vector_type& path, vector_type& output);
void reversePath(const vector_type& path, text_type& output, size_type& tail);
  • The first form reverses path in-place.
  • The second form appends the reverse of path to output.
  • The third form writes the reverse of path to output, starting at offset tail, and updates tail = tail + path.size().
Clone this wiki locally