
RADOS Storage Plugin

Peter Burow edited this page Dec 7, 2017 · 69 revisions

The Hybrid Storage Model

Mails are saved directly as RADOS objects. All other data is stored in the file system as before; in particular, this applies to the data of Dovecot's lib-index. We assume the file system is shared storage based on CephFS.

Based on the code of the Dovecot storage format Cydir, we developed a hybrid storage as a Dovecot plugin. The hybrid storage uses librados directly to store mails in Ceph objects. Mail objects are immutable, and each mail is stored in a single RADOS object. Immutable metadata is stored in omap KV and XATTR. The index data is managed entirely by Dovecot’s lib-index and ends up in CephFS volumes.

Because of the way MUAs access mails, it may be necessary to provide a local cache for mail objects. The cache can be located in the main memory or on local (SSD) storage. However, this optimization is optional and will be implemented only if necessary.

The mail objects and CephFS should be placed in different RADOS pools. The mail objects are immutable and require a lot of storage, so they benefit greatly from erasure-coded pools. The index data is write-intensive and is placed on an SSD-based CephFS pool.

RADOS Mail Object Format

A mail is immutable with respect to its RFC5322 content and some attributes known at the time of saving. The RFC5322 content is written as RADOS object data without any modifications. The immutable attributes used by Dovecot are stored as RADOS XATTRs, with their values in string representation. Currently, the following attributes are stored with the objects:

Key  Attribute              Stored as
I    rbox format version    string, currently "0.1"
G    Mail GUID              UUID as hex string
R    Received date          Unix time, long stored as string
S    Save date              Unix time, long stored as string
P    POP3 UIDL              string
O    POP3 order             unsigned int stored as string
M    Mailbox GUID           hex string
B    Original mailbox name  string
Z    Physical size          mail's physical size in bytes, uint64_t stored as string
V    Virtual size           mail's virtual size in bytes, uint64_t stored as string
U    Mail UID               uint32_t stored as string
A    From envelope          string
K    Mail keywords          string with prefix 'k_'+<keyword_idx>
F    Mail flags             hex string with leading 0x

All writable attributes like flags or keywords are stored in Dovecot index files only.
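As an illustration of the encoding in the table above, the following C++ sketch turns a mail's immutable metadata into the string-valued XATTR map that would be written alongside the object. It is not taken from librmb: `MailMetadata`, `flags_to_hex` and `build_xattrs` are hypothetical names, only a subset of the keys is covered, and the actual RADOS write is omitted. F is included only to show the "0x" hex encoding.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <sstream>
#include <string>

// Hypothetical container for the immutable attributes known at save time.
struct MailMetadata {
  std::string mail_guid_hex;     // G: mail GUID as hex string
  std::string mailbox_guid_hex;  // M: mailbox GUID as hex string
  long received_date;            // R: Unix time
  long save_date;                // S: Unix time
  std::uint64_t physical_size;   // Z: bytes
  std::uint64_t virtual_size;    // V: bytes
  std::uint32_t uid;             // U: mail UID
  unsigned int flags;            // F: Dovecot flag bits
};

// F is written as hex with a leading "0x", e.g. 0x19 -> "0x19".
std::string flags_to_hex(unsigned int flags) {
  std::ostringstream out;
  out << "0x" << std::hex << flags;
  return out.str();
}

// Every value is converted to its string representation and keyed by the
// single-letter attribute name from the table above.
std::map<char, std::string> build_xattrs(const MailMetadata& m) {
  assert(!m.mail_guid_hex.empty() && "every mail needs a GUID");
  return {
      {'I', "0.1"},  // rbox format version
      {'G', m.mail_guid_hex},
      {'M', m.mailbox_guid_hex},
      {'R', std::to_string(m.received_date)},
      {'S', std::to_string(m.save_date)},
      {'Z', std::to_string(m.physical_size)},
      {'V', std::to_string(m.virtual_size)},
      {'U', std::to_string(m.uid)},
      {'F', flags_to_hex(m.flags)},
  };
}
```

Storing every value as a string keeps the objects readable with generic tools, at the cost of parsing on read.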

Each mail object is addressed by a UUID in hex notation, which is used as its OID. This OID is stored in the Dovecot index data as an obox extension record. The extension record is compatible with the obox format and can be inspected using doveadm dump <path-to-mailbox>:

RECORD: seq=1, uid=568, flags=0x19 (Seen Answered Draft)
 - ext 0 keywords  :            (1000)
 - ext 1 modseq    :       4056 (d80f000000000000)
 - ext 4 cache     :        692 (b4020000)
 - ext 5 obox      :            (8eed840764b05359f12718004d2485ee8ded840764b05359f12718004d2485ee)
                   : guid = 8eed840764b05359f12718004d2485ee
                   : oid  = 8ded840764b05359f12718004d2485ee
 - ext 6 vsize     :       2214 (a6080000)
                   : vsize         = 2765958940838
                   : highest_uid   = 121556224
                   : message_count = 1404118538
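The obox extension record in the dump above is 32 bytes long. Judging from the sample output, which prints the first half as guid and the second half as oid, the record can be decoded with a helper like the following sketch. `split_obox_record` is a hypothetical name, and the layout is an assumption read off the sample, not a documented format.

```cpp
#include <cassert>
#include <cctype>
#include <string>
#include <utility>

// Hypothetical helper: splits the hex payload of the obox extension record,
// as printed by `doveadm dump`, into the mail GUID and the RADOS OID.
// Assumption (read off the sample output): two 16-byte values back to back,
// GUID first, then OID.
std::pair<std::string, std::string> split_obox_record(const std::string& hex) {
  assert(hex.size() == 64 && "expected 32 bytes as 64 hex digits");
  for (char c : hex) {
    assert(std::isxdigit(static_cast<unsigned char>(c)) && "not a hex digit");
  }
  return {hex.substr(0, 32), hex.substr(32, 32)};
}
```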
⚠️
If the index data for a mailbox gets lost, there are currently only limited ways to reconstruct the mailbox from the RADOS objects. We use a RADOS namespace per user to make object iteration for a recovery possible.

Restore Index

To restore an rbox index, use Dovecot’s tool doveadm force-resync -u <user> <MAILBOX_NAME>. The tool first makes a backup of the current index file, if one exists. It then tries to read the mailbox GUID from the index file to determine the mailbox's unique identifier. If the GUID cannot be found, the mailbox name given on the doveadm command line is used instead.

The restore process works in the following way.

  1. The tool searches the user's namespace for all RADOS objects whose XATTR M matches <Mailbox GUID> (if no GUID is available, it matches B == <Mailbox name> instead).

  2. Found mail objects are added to the newly created index file.

  3. The tool tries to find a matching mail record (seq, uid) in the cache or the backup index file to complete the newly created index entry.

  4. The resulting index will only include mails which have a corresponding RADOS mail object.
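The restore steps above can be sketched in C++ against in-memory stand-ins instead of librados. `MailObject`, `IndexEntry`, `select_objects` and `rebuild_index` are hypothetical; as a simplification, backup records are matched by UID only and only flags are taken over from them, whereas the real tool also consults the cache and the sequence number.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <optional>
#include <string>
#include <vector>

// Hypothetical in-memory view of one mail object in the user's namespace.
struct MailObject {
  std::string oid;
  std::map<char, std::string> xattrs;  // e.g. 'M' -> mailbox GUID, 'U' -> UID
};

// Hypothetical entry of the rebuilt index.
struct IndexEntry {
  std::string oid;
  std::uint32_t uid;
  unsigned int flags;  // only recoverable from the backup index
};

// Step 1: keep objects whose M XATTR equals the mailbox GUID, or whose
// B XATTR equals the mailbox name when no GUID is known.
std::vector<MailObject> select_objects(const std::vector<MailObject>& ns_objects,
                                       const std::optional<std::string>& mailbox_guid,
                                       const std::string& mailbox_name) {
  const char key = mailbox_guid ? 'M' : 'B';
  const std::string& wanted = mailbox_guid ? *mailbox_guid : mailbox_name;
  assert(!wanted.empty());
  std::vector<MailObject> found;
  for (const auto& obj : ns_objects) {
    auto it = obj.xattrs.find(key);
    if (it != obj.xattrs.end() && it->second == wanted) found.push_back(obj);
  }
  return found;
}

// Steps 2-4: one index entry per found object, completed with flags from the
// backup index when a record with the same UID exists. Mails without a RADOS
// object never appear, since only found objects are iterated.
std::vector<IndexEntry> rebuild_index(
    const std::vector<MailObject>& found,
    const std::map<std::uint32_t, unsigned int>& backup_flags_by_uid) {
  std::vector<IndexEntry> index;
  for (const auto& obj : found) {
    auto u = obj.xattrs.find('U');
    const std::uint32_t uid =
        u != obj.xattrs.end() ? static_cast<std::uint32_t>(std::stoul(u->second)) : 0;
    auto b = backup_flags_by_uid.find(uid);
    const unsigned int flags = b != backup_flags_by_uid.end() ? b->second : 0;
    index.push_back({obj.oid, uid, flags});
  }
  return index;
}
```

Mails found only in RADOS come back without flags, which is why the index files need the same care as the mail data.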

File System Layout

The file system layout of rbox is quite similar to dbox. Without any special configuration, the layout is as follows:

<mail location root>/mailboxes/INBOX/rbox-Mails/dovecot.index*    Index files for INBOX
<mail location root>/mailboxes/foo/rbox-Mails/dovecot.index*      Index files for mailbox "foo"
<mail location root>/mailboxes/foo/bar/rbox-Mails/dovecot.index*  Index files for mailbox "foo/bar"
<mail location root>/dovecot.mailbox.log*                         Mailbox changelog
<mail location root>/subscriptions                                Subscribed mailboxes list
<mail location root>/dovecot-uidvalidity*                         IMAP UID validity

Note that with rbox, the index files contain significant data that is held nowhere else: message flags and keywords. This data cannot be recreated automatically, so it is important to treat index files with the same care as message data files.

Index files can be stored in a different location by using the INDEX parameter in the mail location specification. If INDEX is specified, Dovecot looks for the index files as follows:

<INDEX location>/mailboxes/INBOX/rbox-Mails/dovecot.index*    Index files for INBOX
<INDEX location>/mailboxes/foo/rbox-Mails/dovecot.index*      Index files for mailbox "foo"
<INDEX location>/mailboxes/foo/bar/rbox-Mails/dovecot.index*  Index files for mailbox "foo/bar"

The mail messages themselves are stored in RADOS objects.
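For LAYOUT=fs, the index directory of a mailbox follows directly from the tables above. A minimal C++ sketch of that mapping, assuming the default DIRNAME and '/' as the hierarchy separator; the function `index_dir` is hypothetical.

```cpp
#include <cassert>
#include <optional>
#include <string>

// Hypothetical helper: directory holding dovecot.index* for a mailbox with
// LAYOUT=fs. An INDEX location, if configured, replaces the mail location
// root; DIRNAME defaults to "rbox-Mails".
std::string index_dir(const std::string& mail_root,
                      const std::optional<std::string>& index_location,
                      const std::string& mailbox_name,
                      const std::string& dirname = "rbox-Mails") {
  assert(!mailbox_name.empty());
  const std::string& base = index_location ? *index_location : mail_root;
  return base + "/mailboxes/" + mailbox_name + "/" + dirname;
}
```

For example, with an INDEX location of /ssd/index/u1, the index files of "foo/bar" end up in /ssd/index/u1/mailboxes/foo/bar/rbox-Mails.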

Configuration

Dovecot - 10-mail.conf

To load the plugin, add storage_rbox to the list of mail plugins to be loaded. There are several ways to do this; for example, add the plugin to mail_plugins in 10-mail.conf:

mail_plugins = $mail_plugins storage_rbox

To enable or disable the plugin per user, you can make your userdb return mail_plugins as an extra field. See UserDatabase/ExtraFields for examples.

Configure mail_location in 10-mail.conf to use the plugin's mailbox format, which is named rbox. See Mail location for details. Because rbox relies on Dovecot's index management, the description of the path and many of the optional parameters apply to rbox, too.

The optional parameters of the mailbox location specification that differ for rbox are:

LAYOUT

specifies the directory layout to use:

  • fs: The default used by rbox

  • index: Uses mailbox GUIDs as the directory names. The mapping between mailbox names and GUIDs exists in dovecot.list.index* files.

DIRNAME

specifies the directory name used in mailbox directories to store the mailbox files. With rbox, the default is "rbox-Mails/". Note that this directory is used only for the mail directory, not for index/control directories (but see below).

ALT

specifies the Alternate storage path for dbox formats. Not yet supported for rbox.

All Dovecot variables supported in mail_location can be used. For example, add to 10-mail.conf:

mail_location = rbox:/home/user/dovecot/var/mail/rados/%u
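The mail_location value consists of the format prefix, the path and optional KEY=value parameters separated by colons. The following simplified C++ parser illustrates how the rbox defaults for LAYOUT and DIRNAME apply when a parameter is omitted. It is a hypothetical sketch (`MailLocation`, `parse_location`); Dovecot's own parser handles more cases, such as escaping.

```cpp
#include <cassert>
#include <map>
#include <sstream>
#include <string>

// Hypothetical parsed form of a location such as
// "rbox:~/mail:LAYOUT=index:INDEX=/ssd/%u". Defaults follow the rbox values.
struct MailLocation {
  std::string format;
  std::string path;
  std::map<std::string, std::string> params{{"LAYOUT", "fs"},
                                            {"DIRNAME", "rbox-Mails"}};
};

MailLocation parse_location(const std::string& spec) {
  MailLocation loc;
  std::istringstream in(spec);
  std::getline(in, loc.format, ':');  // e.g. "rbox"
  std::getline(in, loc.path, ':');    // variables like %u stay unexpanded
  std::string part;
  while (std::getline(in, part, ':')) {
    auto eq = part.find('=');
    if (eq != std::string::npos) loc.params[part.substr(0, eq)] = part.substr(eq + 1);
  }
  assert(!loc.format.empty() && "missing mailbox format prefix");
  return loc;
}
```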

Dovecot - 90-plugin.conf

When running Ceph with default settings, the cluster name is ceph, which means the Ceph configuration file is saved as ceph.conf in the default directory /etc/ceph/. If you are running several clusters, you may need to change this value.

rbox_cluster_name = ceph

If you do not specify a RADOS user name, the plugin will use client.admin as the default user name. You can change the username with the rbox_user_name option.

rbox_user_name = client.admin

The following setting defines the RADOS pool used to store the e-mails. The default is mail_storage. If the pool is missing, it will be created.

rbox_pool_name = mail_storage

You can specify which configuration object the dovecot-ceph-plugin should use. The plugin tries to read its Ceph configuration from the Ceph object with this OID. If that fails, it creates a default configuration.

rbox_config_obj_name = rmb_cfg

All this can be configured in the plugin section in 90-plugin.conf:

plugin {
  rbox_cluster_name = ceph
  rbox_user_name = client.admin
  rbox_pool_name = mail_storage
  rbox_config_obj_name = rmb_cfg
}
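The four settings and their documented defaults can be summarized in code. This C++ sketch resolves each rbox_* key from a plugin section, represented here as a plain map, and falls back to the default when the key is missing or empty. `RboxSettings` and `load_settings` are hypothetical and do not reflect how Dovecot actually passes plugin settings to the plugin.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical view of the rbox_* plugin settings with the documented defaults.
struct RboxSettings {
  std::string cluster_name = "ceph";
  std::string user_name = "client.admin";
  std::string pool_name = "mail_storage";
  std::string config_obj_name = "rmb_cfg";
};

// Overrides a default only if the key is present with a non-empty value.
RboxSettings load_settings(const std::map<std::string, std::string>& plugin) {
  RboxSettings s;
  auto set = [&plugin](const char* key, std::string& field) {
    auto it = plugin.find(key);
    if (it != plugin.end() && !it->second.empty()) field = it->second;
  };
  set("rbox_cluster_name", s.cluster_name);
  set("rbox_user_name", s.user_name);
  set("rbox_pool_name", s.pool_name);
  set("rbox_config_obj_name", s.config_obj_name);
  assert(!s.pool_name.empty());  // holds by construction
  return s;
}
```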

Ceph

The plugin uses the default Ceph configuration mechanism described in Step 2: Configuring a Cluster Handle:

  1. rados_conf_parse_env(): Evaluates the CEPH_ARGS environment variable.

  2. rados_conf_read_file(): Searches the default locations and uses the first file found. The locations are:

    • $CEPH_CONF (environment variable)

    • /etc/ceph/ceph.conf

    • ~/.ceph/config

    • ceph.conf (in the current working directory)
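The lookup order of rados_conf_read_file() listed above can be sketched as a first-match search. This is not Ceph's implementation: `find_ceph_conf` is a hypothetical helper, the existence check is injected so the logic can be tested without a file system (std::filesystem::exists would be a natural choice in practice), and the candidate list simply mirrors the locations given here.

```cpp
#include <cassert>
#include <cstdlib>
#include <functional>
#include <optional>
#include <string>
#include <vector>

// Returns the first candidate for which `exists` is true, in the order
// $CEPH_CONF, /etc/ceph/ceph.conf, ~/.ceph/config, ./ceph.conf.
std::optional<std::string> find_ceph_conf(
    const std::function<bool(const std::string&)>& exists,
    const char* ceph_conf_env = std::getenv("CEPH_CONF"),
    const char* home = std::getenv("HOME")) {
  assert(exists && "an existence check is required");
  std::vector<std::string> candidates;
  if (ceph_conf_env && *ceph_conf_env) candidates.push_back(ceph_conf_env);
  candidates.push_back("/etc/ceph/ceph.conf");
  if (home && *home) candidates.push_back(std::string(home) + "/.ceph/config");
  candidates.push_back("ceph.conf");  // relative to the working directory
  for (const auto& path : candidates) {
    if (exists(path)) return path;
  }
  return std::nullopt;
}
```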

librmb

The configuration for librmb is stored as a JSON document in a well-known RADOS object named rmb_cfg. The name can be overridden with the Dovecot plugin setting rbox_config_obj_name, and the object can be created and modified with the rmb CLI.

The following shows the default librmb configuration. This configuration will be created if the configuration object in a RADOS pool is missing.

 {
   "user_mapping": "false",
   "user_ns": "users",
   "user_suffix": "_u",
   "rbox_public_namespace": "public",
   "rbox_mail_attributes": "MGPORZVBUIK",
   "rbox_updateable_attributes": "B",
   "rbox_update_attributes": "false"
 }

Namespace Usage

By default, user e-mails are saved using the username as the RADOS namespace. For public e-mails, which do not have an owner, the namespace public is used. This causes a problem if usernames can be renamed: all mails have to be copied from the namespace oldusername to the namespace newusername. To avoid this, a username indirection can be used.

With the indirection, a GUID is generated for each user and used as the RADOS namespace. The username is mapped to this GUID by a mapping object named after the username, which holds the GUID as its object data. If a username changes, only the object ID of the mapping object has to be changed.

🔥
This setting can only be changed before you start storing e-mails. If e-mails are already stored with user_mapping=false and you change this value to true, the old e-mails are no longer accessible, and vice versa.
user_mapping

Enable username mapping. The default is false.

user_ns

You can specify the namespace where the username mapping objects will be saved. The default is users.

user_suffix

If user_mapping is active, a Ceph object holding the namespace information is created for each user. The name of this object is <username><user_suffix>. This avoids collisions between system namespaces and username-based namespaces. The default is _u.

rbox_public_namespace

For mails which are stored in a public folder, the configured public namespace will be used. No suffix will be added to this namespace to avoid collisions with username based namespaces. The default is public.
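The namespace lookup described in this section can be sketched in C++ with an in-memory map standing in for RADOS reads. `ObjectStore`, `NamespaceConfig` and `resolve_namespace` are hypothetical. Treating the public folder's mapping object as an object named public without suffix is our reading of the rbox_public_namespace description and of the layout tree below.

```cpp
#include <cassert>
#include <map>
#include <optional>
#include <string>
#include <utility>

// Hypothetical stand-in for RADOS reads: (namespace, oid) -> object data.
using ObjectStore = std::map<std::pair<std::string, std::string>, std::string>;

// Namespace-related librmb settings with the defaults listed above.
struct NamespaceConfig {
  bool user_mapping = false;
  std::string user_ns = "users";
  std::string user_suffix = "_u";
  std::string public_namespace = "public";
};

// Returns the RADOS namespace holding the mail objects of `username` (or of
// the public folder), or nullopt if a required mapping object is missing.
std::optional<std::string> resolve_namespace(const NamespaceConfig& cfg,
                                             const ObjectStore& store,
                                             const std::string& username,
                                             bool public_folder) {
  assert(public_folder || !username.empty());
  const std::string& name = public_folder ? cfg.public_namespace : username;
  if (!cfg.user_mapping) return name;  // the name itself is the namespace
  // Mapping objects are named <username><user_suffix>; no suffix for public.
  const std::string oid = public_folder ? name : name + cfg.user_suffix;
  auto it = store.find({cfg.user_ns, oid});
  if (it == store.end()) return std::nullopt;
  return it->second;  // the object data is the generated namespace GUID
}
```

Renaming a user then only means renaming one mapping object; the mail objects stay in their GUID namespace.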

Metadata Usage

Besides the mail itself, a mail object stores some immutable or mutable metadata, as described above. The configuration defines which attributes are saved and when.

rbox_update_attributes

It may make sense to update attributes after the initial save to keep track of changes. The default is false.

rbox_mail_attributes

You can define which metadata attributes should be saved as Ceph XATTR. See RADOS Mail Object Format for metadata details. The default is MGPORZVBUI.

rbox_updateable_attributes

Currently, the only immutable metadata attribute that can be updated is the original mailbox attribute 'B', which keeps track of the mailbox name of a mail. This information may be helpful when rebuilding a lost index. The default is B.

The possible values for rbox_mail_attributes and rbox_updateable_attributes can be found under RADOS Mail Object Format. For rbox_updateable_attributes currently only RBOX_METADATA_ORIG_MAILBOX (B) is supported.
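The following C++ sketch spells out how the three metadata settings interact as described above. The function names are hypothetical, and the settings are passed as the strings found in the JSON configuration, where booleans are stored as "true" or "false".

```cpp
#include <cassert>
#include <string>

// True if attribute `key` is written as an XATTR when the mail is saved.
bool saved_on_create(const std::string& mail_attributes, char key) {
  return mail_attributes.find(key) != std::string::npos;
}

// True if attribute `key` may be rewritten after the initial save: updates
// must be enabled, the key must be listed as updateable, and currently only
// the original mailbox attribute 'B' is supported.
bool may_update(const std::string& update_attributes,
                const std::string& updateable_attributes, char key) {
  assert(update_attributes == "true" || update_attributes == "false");
  return update_attributes == "true" &&
         updateable_attributes.find(key) != std::string::npos && key == 'B';
}
```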

Ceph Object and Namespace Layout

The RADOS objects that hold the mails are stored in a per-user namespace. Based on the default configuration, the tree looks like this:

root
├── public                                (1)
│   └── 78DA8E6FA63746FAB89271CA2AF72BA4
├── rmb_cfg                               (2)
└── <username>_u                          (3)
    └── 4463b919b3b9275a5f3100009c60b9f7
  1. This namespace contains the public mail objects.

  2. This object holds the JSON configuration.

  3. This object contains the GUID of the actual namespace for username. In this example 4440BDE7DC844DBC88BBCD3185A038B5.

If indirect namespaces are configured, to be independent of username changes, the namespace rbox_ns_cfg holds the mapping objects that map from username to the generated namespace GUID.

root
├── 4440BDE7DC844DBC88BBCD3185A038B5      (1)
│   └── 4463b919b3b9275a5f3100009c60b9f7
├── 9ACDA05123BC4C5D96D3E322AF241CFE      (2)
│   └── 78DA8E6FA63746FAB89271CA2AF72BA4
├── rmb_cfg                               (3)
└── user                                  (4)
    ├── public                            (5)
    └── <username>_u                      (6)
  1. This namespace contains the mail objects of username.

  2. This namespace contains the mail objects of public.

  3. This object holds the JSON configuration.

  4. This namespace contains the indirection objects.

  5. This object contains the GUID of the actual namespace for public. In this example 9ACDA05123BC4C5D96D3E322AF241CFE.

  6. This object contains the GUID of the actual namespace for username. In this example 4440BDE7DC844DBC88BBCD3185A038B5.

Shared and Public Folders

To configure shared folder access, the ACL plugin needs to be activated in the Dovecot configuration as usual. In the namespace configuration, use rbox as the mailbox format. The configuration follows the mdbox configuration, so rbox:%%h as location is sufficient.

dovecot-ceph-plugin uses the username as the RADOS namespace. For the public folder, the namespace public is used.

Testing

We use ImapTest to test the plugin. The Ceph cluster used for the first tests runs locally and was created with vstart.sh (see ceph/README.md). We test the IMAP and POP3 protocols. Before you can start the tests, you have to adapt the environment.

For librmb we use the googletest C++ framework. The googletest library is included as a git submodule, which you can clone with: git submodule update --init --recursive

The configuration assumes a Ceph cluster running locally without cephx, created for example with vstart.sh as described in Developer Guide (quick) or ceph/README.md.

../src/vstart.sh -X -n -l

Common

Create 100 users: names t1 .. t100, password t

etc/passwd:

t1:{PLAIN}t::::::
t2:{PLAIN}t::::::
t3:{PLAIN}t::::::
...
t100:{PLAIN}t::::::

Script to create the users:

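#!/bin/bash
# Write one passwd entry per test user t1..t100, all with the password "t".
# Redirecting the whole loop with ">" recreates the file on each run
# instead of appending duplicate entries.
for i in {1..100}; do
    echo "t$i:{PLAIN}t::::::"
done > passwd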

IMAP

It is not necessary to add or modify test profiles. ImapTest can be started with the following command:

imaptest user=t%d pass=t port=10143

POP3

If POP3 is used for the ImapTest, it is necessary to add or modify some configuration entries.

System

ulimit -n 3072
ulimit -s unlimited

Dovecot / LMTP

Enable POP3 and LMTP via etc/dovecot/dovecot.conf:

protocols = imap pop3 lmtp

Add or change the following entry of etc/dovecot/conf.d/10-master.conf:

default_process_limit = 500
default_client_limit = 3000

service lmtp {
  unix_listener lmtp {
    #mode = 0666
  }

  inet_listener lmtp {
    address = 127.0.0.1 ::1
    port = 10024
  }
}

ImapTest

To run ImapTest with POP3 you have to use a profile file which sets POP3 as the client protocol.

POP3 profile example
lmtp_port = 10024
#lmtp_max_parallel_count = 500
total_user_count = 100
rampup_time = 30s

user aggressive {
  #username_prefix = test
  username_format = t%n
  count = 80%

  mail_inbox_delivery_interval = 10s
  mail_spam_delivery_interval = 5s
  mail_action_delay = 2s
  mail_action_repeat_delay = 1s
  mail_session_length = 3 min

  mail_send_interval = 10s
  mail_write_duration = 5s

  mail_inbox_reply_percentage = 50
  mail_inbox_delete_percentage = 5
  mail_inbox_move_percentage = 5
  mail_inbox_move_filter_percentage = 10
}

user normal {
  username_format = t%n
  count = 20%

  mail_inbox_delivery_interval = 5 min
  mail_spam_delivery_interval = 3 min
  mail_action_delay = 3 min
  mail_action_repeat_delay = 10s
  mail_session_length = 20 min

  mail_send_interval = 10 min
  mail_write_duration = 2 min

  mail_inbox_reply_percentage = 50
  mail_inbox_delete_percentage = 5
  mail_inbox_move_percentage = 5
  mail_inbox_move_filter_percentage = 10
}

client pop3 {
  count = 90%
  connection_max_count = 1
  protocol = pop3
  pop3_keep_mails = no
  login_interval = 1min
}

client pop3 {
  count = 10%
  connection_max_count = 1
  protocol = pop3
  pop3_keep_mails = yes
  login_interval = 1min
}