Revision info:
Last modified: 12/02/2011
Version: 1.0.3
Notes:
Document not ready for public distribution.
Depending upon the criticality of your production environment, and in most cases as a requirement when running clustering software on Linux such as Heartbeat or RedHat Cluster, it may be necessary to ensure highly available access to allocated iSCSI LUNs. There are a number of ways to accomplish this, but in this document we focus on an approach where we define two targets, each bound to a Target Portal Group, which in turn is tied to a network interface. Each interface is on its own subnet. The end result is a configuration in which we always have two distinct paths to reach our storage. If one path fails completely, we should still be able to operate at reduced capacity rather than sustain a complete outage.
The objective of this document is to present one of several possible designs for iSCSI multipathing between the NexentaStor SAN and a Linux client, which in a real-world scenario may be part of a large HA cluster. The same steps apply whether this is a single stand-alone system or a cluster. We expect that our Linux client(s) will be configured so that four paths exist to each block device mapped from the SAN: two paths per iSCSI interface on the client(s), because each iSCSI interface is bound to a dedicated physical network interface and logs in to both targets. It is best to establish a meaningful naming convention in advance of this setup, and to follow the convention as much as possible. For clarity, we include the names of the network interfaces in the Target Portal Group names, to easily identify which TPG ties to which physical network interface.
As a result of completing the steps in this document, we should observe a configuration similar to the following:
Login to [iface: sw-iscsi-1, target: iqn.2011-09.dom.homer:01:13611547-b4e6-e4b9-d97b-d1f4df275f58, portal: 10.10.8.90,3260] successful.
Login to [iface: sw-iscsi-0, target: iqn.2011-09.dom.homer:01:13611547-b4e6-e4b9-d97b-d1f4df275f58, portal: 10.10.8.90,3260] successful.
Login to [iface: sw-iscsi-1, target: iqn.2011-09.dom.homer:02:0e4f921a-407e-eb9f-9f53-aded8d99a723, portal: 10.10.9.90,3260] successful.
Login to [iface: sw-iscsi-0, target: iqn.2011-09.dom.homer:02:0e4f921a-407e-eb9f-9f53-aded8d99a723, portal: 10.10.9.90,3260] successful.
We are making a number of assumptions about the environment, and they are as follows:
- Client(s) are CentOS 6.
- Client(s) and SAN will both have two physical interfaces, each configured on a unique subnet (10.10.8 and 10.10.9).
- We are not using a single physical interface with multiple aliases on either client(s) or SAN.
- SAN and client(s) are either directly attached, or are switched via a network that does not exhibit single points of failure, such as a single switch, a single path between switches, etc.
- SAN is configured with two targets, each bound to a Target Portal Group.
- Each Target Portal Group is tied to a single IP address assigned to a physical interface (not an aliased interface), using the default iSCSI port 3260.
- Each network interface assigned to each Target Portal Group has a completely isolated physical path to the client.
- Both targets belong to a single SCSI Target Group.
- iscsid and dm packages are correctly installed on the client, and the dm-multipath service is set to start automatically.
- For the purposes of this guide we are assuming that Remote Host and Target groups already exist. Please refer to the User's Guide for further details about creation of Host and Target groups.
- iSCSI (COMSTAR) Target Names:
  iqn.2011-09.dom.homer:01:13611547-b4e6-e4b9-d97b-d1f4df275f58
  iqn.2011-09.dom.homer:02:0e4f921a-407e-eb9f-9f53-aded8d99a723
- iSCSI (COMSTAR) Target Portal Group name(s) and associated IP(s) (TPG / ip:port):
  tpg_lab9_e1000g1 / 10.10.8.90:3260
  tpg_lab9_e1000g2 / 10.10.9.90:3260
- SCSI (COMSTAR) Target Group name(s):
  tg_hydra_databases
- SCSI (COMSTAR) Initiator Group(s):
  hg_hydra_databases
- Initiator(s) from the CentOS client:
  iqn.2011-10.dom.homer:1:server-lab6-583bf04eae36
Perform the following steps via the Management GUI (NMV).
- We are going to create Target Portal Groups first, to which we will bind our iSCSI targets.
- Navigate to Data Management > SCSI Target/Target Plus > iSCSI > Target Portal Groups. Click Create under Manage iSCSI TPGs.
- Enter the name, in our case tpg_lab9_e1000g1, in the Name field, and the IP Address:Port, in our case 10.10.8.90:3260, in the Addresses field, then click Create. We chose a descriptive name for the Target Portal Groups that includes the name of the network interface to which each group is bound.
- Repeat the above step for the second Target Portal Group. Each group is bound to a single IP address. If more than two interfaces are being used, repeat for each interface/IP.
- Next, we add iSCSI Targets, Target Groups, and Initiator Groups.
- Navigate to Data Management > SCSI Target/Target Plus > iSCSI > Targets. Click Create under Manage iSCSI Targets.
- Enter the name, in our case iqn.2011-09.dom.homer:01:13611547-b4e6-e4b9-d97b-d1f4df275f58, in the Name field. You may choose to leave this field empty to auto-generate a name.
- We are not using aliases, as they are optional, but you may choose to set up an alias for each target.
- Under iSCSI Target Portal Groups, select one of the Target Portal Groups created earlier. For simplicity of management we used :01: and :02: in the target names to designate the first and second targets; we relate the first target to tpg_lab9_e1000g1 and the second target to tpg_lab9_e1000g2. Then click Create.
- Navigate to Data Management > SCSI Target/Target Plus > SCSI Target > Target Groups. Click Create under Manage Target Groups.
- Enter the name, in our case tg_hydra_databases, in the Name field, and select all targets; in our case we select the two targets created earlier. Then click Create.
- Navigate to Data Management > SCSI Target/Target Plus > SCSI Target > Initiator Groups.
- Enter the name, in our case hg_hydra_databases, in the Name field. If initiator names are already known, you may enter them manually in the Additional Initiators field; otherwise we will update this group later, after logging in from our client(s). Then click Create.
- At this point we can create mappings as necessary. ZVOLs must be created first, prior to setting up mappings. Please refer to the User's Guide for further details on the creation of ZVOLs and mappings. For the purposes of this document we are using four ZVOLs, with LUN IDs 10 through 13, hg_hydra_databases as the Host Group, and tg_hydra_databases as the Target Group.
The remaining steps are performed on the Linux client(s).

- A quick validation of the iscsid service is necessary to make sure that it is set up correctly. The command chkconfig --list iscsid should return the state of the iscsid service. We expect it to be enabled in runlevels 3, 4, and 5. If it is not enabled, run chkconfig iscsid on, which assumes the defaults and enables the service in runlevels 3, 4, and 5. Keep in mind that different Linux distributions may have different names for services and different tools/methods for enabling or disabling automatic startup of services.

prague# chkconfig --list iscsid
iscsid          0:off   1:off   2:off   3:on    4:on    5:on    6:off
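If the service is enabled but not yet running, it can be started by hand; a minimal sketch using the SysV-init style of CentOS 6 (service names may differ on other distributions):

prague# chkconfig iscsid on
prague# service iscsid start
prague# service iscsid status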
- Next, we validate that the multipathd service is working correctly. The mpathconf utility returns information about the state of the multipath configuration.

prague# mpathconf
multipath is enabled
find_multipaths is disabled
user_friendly_names is enabled
dm_multipath module is loaded
multipathd is chkconfiged on
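If mpathconf reports that multipath is not enabled, it can usually create a baseline configuration and start the daemon in one step; a sketch assuming the RHEL/CentOS 6 device-mapper-multipath package:

prague# mpathconf --enable --with_multipathd y
prague# service multipathd status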
- Now we will configure our iSCSI initiator settings. Depending upon your client, the configuration files may or may not be in the same locations. We will modify /etc/iscsi/initiatorname.iscsi with a custom initiator name. By default, the file will already have an InitiatorName entry. We replace it with a custom entry, but this is strictly optional and should be part of your naming convention decisions.

InitiatorName=iqn.2011-10.dom.homer:1:server-lab6-583bf04eae36
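If you would rather generate a name than compose one by hand, the iscsi-iname utility from iscsi-initiator-utils prints a unique IQN for a given prefix. The sketch below is optional; restarting iscsid so the new name takes effect is an assumption about your init setup:

prague# iscsi-iname -p iqn.2011-10.dom.homer     # prints a generated IQN with this prefix
prague# vi /etc/iscsi/initiatorname.iscsi        # set InitiatorName= to the chosen IQN
prague# service iscsid restart                   # pick up the new initiator name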
- We are going to create a virtual iSCSI interface for each physical network interface, and bind the physical interfaces to the virtual iSCSI interfaces. The end result will be two new iSCSI interfaces, sw-iscsi-0 and sw-iscsi-1, bound to physical network interfaces eth1 and eth2. Avoid naming logical interfaces with the same names as the physical NICs.
- First, we create the logical interfaces, for which corresponding configuration files named sw-iscsi-0 and sw-iscsi-1 will be generated under /var/lib/iscsi/ifaces.

prague# iscsiadm --mode iface --op=new --interface sw-iscsi-0
prague# iscsiadm --mode iface --op=new --interface sw-iscsi-1
- Following the two iscsiadm commands, we modify the two newly created config files with additional parameters. Below is an example of the configuration files modified with details about the physical interfaces. Note that we are defining parameters specific to each interface, and your configuration will certainly vary. To quickly collect information about each interface you can use the ip command: ip addr show <interface-name>|egrep 'inet|link'. This configuration explicitly binds our virtual interfaces to the physical interfaces. Each interface is on its own network, in our case 10.10.8 and 10.10.9. We can always validate our configuration with: for i in 0 1; do iscsiadm -m iface -I sw-iscsi-$i; done, replacing 0 and 1 with the numbers of the iSCSI interface(s) in use.

prague# cat /var/lib/iscsi/ifaces/sw-iscsi-0
# BEGIN RECORD 2.0-872
iface.iscsi_ifacename = sw-iscsi-0
iface.net_ifacename = eth1
iface.hwaddress = 00:0c:29:94:5b:83
iface.ipaddress = 10.10.8.60
iface.transport_name = tcp
# END RECORD

prague# cat /var/lib/iscsi/ifaces/sw-iscsi-1
# BEGIN RECORD 2.0-872
iface.iscsi_ifacename = sw-iscsi-1
iface.net_ifacename = eth2
iface.hwaddress = 00:0c:29:94:5b:8d
iface.ipaddress = 10.10.9.60
iface.transport_name = tcp
# END RECORD
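The same bindings can also be written through iscsiadm instead of editing the files by hand; a sketch for sw-iscsi-0 (repeat with the eth2/10.10.9.60 values for sw-iscsi-1):

prague# iscsiadm -m iface -I sw-iscsi-0 --op=update -n iface.net_ifacename -v eth1
prague# iscsiadm -m iface -I sw-iscsi-0 --op=update -n iface.hwaddress -v 00:0c:29:94:5b:83
prague# iscsiadm -m iface -I sw-iscsi-0 --op=update -n iface.ipaddress -v 10.10.8.60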
- Depending upon your client, the TCP/IP kernel parameter rp_filter may need to be tuned in order to allow correct multipathing with iSCSI. For the purposes of this guide we set it to 0. For each physical interface we add an entry to /etc/sysctl.conf. In our example, we are modifying this tunable for eth1 and eth2. If this kernel tuning is being applied, either set the parameters via the sysctl command (see the sketch after this step) or reboot the system prior to the next steps.

prague# grep eth[0-9].rp_filter /etc/sysctl.conf
net.ipv4.conf.eth1.rp_filter = 0
net.ipv4.conf.eth2.rp_filter = 0
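A sketch of applying the tunables to the running kernel without a reboot, assuming the entries above are already present in /etc/sysctl.conf:

prague# sysctl -w net.ipv4.conf.eth1.rp_filter=0
prague# sysctl -w net.ipv4.conf.eth2.rp_filter=0
prague# sysctl -p          # re-read /etc/sysctl.conf so the settings also persist across reboots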
- Next, we will perform a discovery of the targets via the portals that we exposed previously. We can do a discovery against one or both portals, and the result should be identical.
- We discover targets for each configured logical iSCSI interface. If you choose to have more than two logical interfaces, and therefore more than two paths to the SAN, perform the following step for each logical interface.

prague# iscsiadm -m discovery -t sendtargets --portal=10.10.8.90 -I sw-iscsi-0 --discover
prague# iscsiadm -m discovery -t sendtargets --portal=10.10.9.90 -I sw-iscsi-1 --discover
- We should validate the nodes created as a result of the discovery. We expect to see two nodes for each portal on the SAN.

prague# iscsiadm -m node
10.10.8.90:3260,2 iqn.2011-09.dom.homer:01:13611547-b4e6-e4b9-d97b-d1f4df275f58
10.10.8.90:3260,2 iqn.2011-09.dom.homer:01:13611547-b4e6-e4b9-d97b-d1f4df275f58
10.10.9.90:3260,2 iqn.2011-09.dom.homer:02:0e4f921a-407e-eb9f-9f53-aded8d99a723
10.10.9.90:3260,2 iqn.2011-09.dom.homer:02:0e4f921a-407e-eb9f-9f53-aded8d99a723
- At this point each logical iSCSI interface is configured to log in to all known portals on the SAN, and therefore into all known targets. Thus, we have a choice to make: we can leave this configuration in place, or we can isolate each logical interface to a single portal on the SAN. Each node has a directory under /var/lib/iscsi/nodes with a name identical to the target name on the SAN, and a sub-directory for each portal. Here we can see that the leaf objects of this tree structure are files named after our logical iSCSI interfaces. In fact, these are config files generated upon successful target discovery for each interface.

prague# ls -l /var/lib/iscsi/nodes/iqn.2011-09.dom.homer:02:0e4f921a-407e-eb9f-9f53-aded8d99a723/10.10.9.90,3260,2
total 8
-rw-------. 1 root root 1784 Oct 16 12:42 sw-iscsi-0
-rw-------. 1 root root 1784 Oct 16 12:42 sw-iscsi-1
prague# ls -l /var/lib/iscsi/nodes/iqn.2011-09.dom.homer:01:13611547-b4e6-e4b9-d97b-d1f4df275f58/10.10.8.90,3260,2
total 8
-rw-------. 1 root root 1784 Oct 16 12:42 sw-iscsi-0
-rw-------. 1 root root 1784 Oct 16 12:42 sw-iscsi-1
Deleting one of the interface configuration files under a node effectively restricts that interface from logging in to that target. At any time, you can choose which interfaces log in to which portals; a sketch of doing this with iscsiadm follows below. For the purposes of this document we assume it is acceptable for each logical iSCSI interface to log in to both portals.
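If you do choose to isolate interfaces, the node record can also be removed through iscsiadm rather than by deleting files directly. A sketch that would prevent sw-iscsi-1 from logging in to the 10.10.8.90 portal, using the target and portal from this example:

prague# iscsiadm -m node -T iqn.2011-09.dom.homer:01:13611547-b4e6-e4b9-d97b-d1f4df275f58 \
        -p 10.10.8.90:3260 -I sw-iscsi-1 --op=delete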
- Next, we validate the ability to log in to both portals. We expect to see a successful login for each logical iSCSI interface into each configured portal.

prague# iscsiadm -m node -l
Logging in to [iface: sw-iscsi-1, target: iqn.2011-09.dom.homer:01:13611547-b4e6-e4b9-d97b-d1f4df275f58, portal: 10.10.8.90,3260]
...trimmed...
Login to [iface: sw-iscsi-1, target: iqn.2011-09.dom.homer:01:13611547-b4e6-e4b9-d97b-d1f4df275f58, portal: 10.10.8.90,3260] successful.
...trimmed...
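To have these sessions re-established automatically at boot, the node records can be marked for automatic startup; a sketch, assuming the stock iscsi init script on CentOS 6:

prague# iscsiadm -m node --op=update -n node.startup -v automatic
prague# chkconfig iscsi on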
- We validate that we are able to see the new iSCSI LUNs after logging in to the portals by listing the contents of the /dev/disk/by-* directories. We expect to see four device files for each LUN, assuming we did not remove any interface configuration files as described earlier.

prague# ls /dev/disk/by-path/|egrep "10.10.*.90"|egrep lun-1[0-3]
ip-10.10.8.90:3260-iscsi-iqn.2011-09.dom.homer:01:13611547-b4e6-e4b9-d97b-d1f4df275f58-lun-10
ip-10.10.8.90:3260-iscsi-iqn.2011-09.dom.homer:01:13611547-b4e6-e4b9-d97b-d1f4df275f58-lun-11
ip-10.10.8.90:3260-iscsi-iqn.2011-09.dom.homer:01:13611547-b4e6-e4b9-d97b-d1f4df275f58-lun-12
ip-10.10.8.90:3260-iscsi-iqn.2011-09.dom.homer:01:13611547-b4e6-e4b9-d97b-d1f4df275f58-lun-13
...trimmed...
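It can also help to confirm that the expected number of iSCSI sessions is established; iscsiadm lists active sessions directly (a quick optional check, not required by the procedure):

prague# iscsiadm -m session      # one line per iface/target pair; four sessions are expected in this setup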
- Because we have one additional layer between the OS and iSCSI (the DM-MP layer), we want to tune the iSCSI parameter that controls how long it takes to hand off failed commands to the DM-MP layer. By default, it takes 120 seconds for the iSCSI service to give up on a command when there are problems completing it. We want this period to be much shorter, allowing DM-MP to switch to another path instead of potentially retrying down the same path for two minutes. We modify /etc/iscsi/iscsid.conf, comment out the default entry with value 120 (seconds), and add a new entry with value 10.

prague# grep node.session.timeo.replacement_timeout /etc/iscsi/iscsid.conf
#node.session.timeo.replacement_timeout = 120
node.session.timeo.replacement_timeout = 10
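Note that iscsid.conf only seeds node records created by later discoveries; a sketch of pushing the new timeout into the existing node records and logging back in (this assumes nothing on these LUNs is mounted yet):

prague# iscsiadm -m node --op=update -n node.session.timeo.replacement_timeout -v 10
prague# iscsiadm -m node -u     # log out of all sessions
prague# iscsiadm -m node -l     # log back in with the updated timeout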
- There are a number of parameters that we have to configure in order for the device mapper to correctly multipath these LUNs. First we configure our /etc/multipath.conf file with the basic settings necessary to properly manage multipath behavior and path failure. This is not an end-all-be-all configuration, but rather a very good starting point for most NexentaStor deployments. Please be aware that this configuration may fail on systems with older versions of multipath. If you are running Debian or older RedHat-based distributions, the parameters that will most likely need to be adjusted are checker_timer, getuid_callout, and path_selector. Be certain to review your distribution's multipath documentation, or a commented sample multipath.conf file.

defaults {
    checker_timer           120
    getuid_callout          "/lib/udev/scsi_id --whitelisted --device=/dev/%n"
    no_path_retry           12
    path_checker            directio
    path_grouping_policy    group_by_serial
    prio                    const
    polling_interval        10
    queue_without_daemon    yes
    rr_min_io               1000
    rr_weight               uniform
    selector                "round-robin 0"
    udev_dir                /dev
    user_friendly_names     yes
}

blacklist {
    devnode "^(ram|raw|loop|fd|md|dm-|sr|scd|st)[0-9]*"
    devnode "^hd[a-z]"
    devnode "^sda"
    devnode "^sda[0-9]"
    device {
        vendor  DELL
        product "PERC|Universal|Virtual"
    }
}

devices {
    device {
        ## This section is applicable to ALL NEXENTA/COMSTAR provisioned LUNs
        ## and will set sane defaults, necessary to multipath correctly.
        ## Specific parameters and deviations from the defaults should be
        ## configured in the multipaths section on a per-LUN basis.
        ##
        vendor                  NEXENTA
        product                 COMSTAR
        path_checker            directio
        path_grouping_policy    group_by_node_name
        failback                immediate
        getuid_callout          "/lib/udev/scsi_id --whitelisted --device=/dev/%n"
        rr_min_io               2000
        ##
        ## Adjust `node.session.timeo.replacement_timeout` in /etc/iscsi/iscsid.conf
        ## in order to rapidly fail commands down to the multipath layer
        ## and allow DM-MP to manage path selection after failure:
        ## set node.session.timeo.replacement_timeout = 10
        ##
        ## The features parameter works with the replacement_timeout adjustment
        features                "1 queue_if_no_path"
    }
}
- Next, we describe each LUN that we are going to access via multiple paths. Any parameters that we already set in the device and defaults sections can be skipped, assuming we are accepting those global settings. Notice that we explicitly define the WWID for each LUN. We also supply an alias, which makes it much easier to identify specific LUNs; this is strictly optional. The vendor and product parameters will always be NEXENTA and COMSTAR respectively, unless explicitly changed on the SAN, which is out of scope for this configuration.

multipaths {
    ## Define specifics about each LUN in this section, including any
    ## parameters that will be different from the defaults and device sections.
    ##
    multipath {
        alias           hydra_apps-a01
        wwid            3600144f0f484400000004e8685e70001
        vendor          NEXENTA
        product         COMSTAR
        path_selector   "service-time 0"
        failback        immediate
    }
    multipath {
        alias           hydra_db-a01
        wwid            3600144f0f484400000004e8685f80002
        vendor          NEXENTA
        product         COMSTAR
        path_selector   "service-time 0"
        failback        immediate
    }
    multipath {
        alias           hydra_db-a02
        wwid            3600144f0f484400000004e86860a0003
        vendor          NEXENTA
        product         COMSTAR
        path_selector   "service-time 0"
        failback        immediate
    }
    multipath {
        alias           hydra_db-a03
        wwid            3600144f0f484400000004e8686210004
        vendor          NEXENTA
        product         COMSTAR
        path_selector   "service-time 0"
        failback        immediate
    }
}
- After we save the file as /etc/multipath.conf, we are going to flush and reload the device-mapper maps. At this point we are assuming that the multipathd service is running on the system. Running multipath -v2 gives verbose enough details to make sure that maps are being created correctly.

prague# multipath -F
prague# multipath -v2
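Once the maps are built, multipath -ll prints the full topology for review, and the daemon can re-read /etc/multipath.conf after later edits; a sketch using the CentOS 6 service name:

prague# multipath -ll                  # show all multipath maps and their paths
prague# service multipathd reload      # re-read /etc/multipath.conf after changes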
A typical multipath configuration for any single LUN will look very similar to the following:
hydra_db-a02 (3600144f0f484400000004e86860a0003) dm-4 NEXENTA,COMSTAR
size=1.0G features='1 queue_if_no_path' hwhandler='0' wp=rw
`-+- policy='service-time 0' prio=-1 status=active
  |- 10:0:0:12 sdf 8:80 active undef running
  `- 11:0:0:12 sdg 8:96 active undef running
- Because we assigned an alias to each LUN, by default we will see symbolic links to the devices representing the LUNs created under /dev/mapper. At this point we are ready to format these LUNs and apply a filesystem over them. Aliases are very convenient and allow you to access the multipath device via a more meaningful name.

prague# ls -l /dev/mapper
total 0
crw-rw----. 1 root root 10, 58 Oct 22 15:25 control
lrwxrwxrwx. 1 root root      7 Oct 23 07:42 hydra_apps-a01 -> ../dm-6
lrwxrwxrwx. 1 root root      7 Oct 23 07:42 hydra_db-a01 -> ../dm-2
lrwxrwxrwx. 1 root root      7 Oct 23 07:42 hydra_db-a02 -> ../dm-4
lrwxrwxrwx. 1 root root      7 Oct 23 07:42 hydra_db-a03 -> ../dm-3
...trimmed...
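Purely as an illustration, and not part of the configuration above, a filesystem could be placed on one of the aliased devices as follows; mkfs.ext4 and the /mnt/hydra_db-a01 mount point are arbitrary example choices, and a clustered setup may require a cluster-aware filesystem instead:

prague# mkfs.ext4 /dev/mapper/hydra_db-a01               # example filesystem choice
prague# mkdir -p /mnt/hydra_db-a01                       # example mount point
prague# mount /dev/mapper/hydra_db-a01 /mnt/hydra_db-a01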