Wednesday, October 5, 2016

Walk-thru: Deploying Lustre 2.8 + ZFS on CentOS 7.2



Summary

From the Lustre wiki: when using the Lustre ldiskfs OSD only, it is necessary to patch the kernel before building Lustre. The Lustre ZFS OSD and the Lustre client code do not require kernel patches.

server side:

1. download the Lustre 2.8 pre-built RPMs
2. install the Lustre kernel RPMs and reboot into the new kernel
3. install ZFS at the matching version; the Lustre 2.8 pre-built RPMs work with ZFS 0.6.4.2
4. install the Lustre server RPMs and configure the lustre modprobe options
5. prepare the Lustre disks and /etc/ldev.conf
6. start the lustre service


client side:

1. install the matching kernel and the Lustre client RPMs




Here I have two VMs, test1 and test2: test1 is the client and test2 is the server.

/etc/hosts

10.0.15.11 test1

10.0.15.12 test2

lustre rpms
https://downloads.hpdd.intel.com/public/lustre/latest-maintenance-release/el7/

download the Lustre 2.8 RPMs to a local folder:

/././rpms
-/server
-/client


Server side:
----------------------------------------------------------------
remove the old kernel
#yum remove kernel*

cd into the server rpm folder
#yum install kernel*.rpm

reboot into the new kernel

ZFS:
#yum install --nogpgcheck http://archive.zfsonlinux.org/epel/zfs-release.el7.noarch.rpm
#yum install zfs-0.6.4.2
#modprobe zfs
#lsmod | grep zfs
#echo zfs > /etc/modules-load.d/zfs.conf

[root@test2 x86_64]# lsmod | grep zfs
zfs 2713912 0
zunicode 331170 1 zfs
zavl 15236 1 zfs
zcommon 55411 1 zfs
znvpair 93227 2 zfs,zcommon
spl 92223 3 zfs,zcommon,znvpair
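
Since the pre-built Lustre 2.8 server RPMs are tied to ZFS 0.6.4.2, it may help to keep yum from upgrading ZFS underneath Lustre later. A minimal sketch, assuming the yum-plugin-versionlock package is available (the wildcard package names are my guess at what the ZoL repo installs):

#yum install yum-plugin-versionlock
#yum versionlock add 'zfs*' 'spl*' 'libzfs*' 'libzpool*' 'libnvpair*' 'libuutil*'
#yum versionlock list    (confirm the locked versions)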

yum install the Lustre server rpms (from the same server rpm folder as above)

enp0s8 is my NIC connecting servers and clients
#echo 'options lnet networks=tcp0(enp0s8)' >> /etc/modprobe.d/lustre.conf

#modprobe lustre

[root@test2 x86_64]# lsmod | grep lustre

lustre 841108 0

lmv 232654 1 lustre

mdc 177582 1 lustre

lov 305123 1 lustre

ptlrpc 2065168 6 fid,fld,lmv,mdc,lov,lustre

obdclass 2064697 13 fid,fld,lmv,mdc,lov,lustre,ptlrpc

lnet 448955 4 lustre,obdclass,ptlrpc,ksocklnd

libcfs 399776 10 fid,fld,lmv,mdc,lov,lnet,lustre,obdclass,ptlrpc,ksocklnd


If the NIC is not configured correctly the lustre service will fail, so configure the NIC before running modprobe lustre!


[root@test2 vagrant]# lctl list_nids
10.0.15.12@tcp
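
If lnet came up on the wrong interface, one way to recover is roughly this (lustre_rmmod is the module-unload helper shipped with the Lustre RPMs); treat it as a sketch, not a tested recipe:

#ip -4 addr show enp0s8    (confirm the interface named in lustre.conf is up with the expected IP)
#lustre_rmmod    (unloads the lustre/lnet modules, only works while nothing is mounted)
#vi /etc/modprobe.d/lustre.conf    (fix the networks= line)
#modprobe lustre
#lctl list_nids    (should now show the IP of the right NIC)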

/etc/ldev.conf : lustre device configuration file

format:
local  foreign/-  label  [md|zfs:]device-path  [journal-path]/-  [raidtab]

cat >> /etc/ldev.conf <<EOF

test2 - mgs zfs:lustre-mgt0/mgt0
test2 - mdt zfs:lustre-mdt0/mdt0
test2 - ost0 zfs:lustre-ost0/ost0
test2 - ost1 zfs:lustre-ost1/ost1

EOF

make some Lustre disk files on the VM since I don't have real disks here

#dd if=/dev/zero of=/var/tmp/lustre-mgt-disk0 bs=1M count=1 seek=256
#dd if=/dev/zero of=/var/tmp/lustre-mdt-disk0 bs=1M count=1 seek=256
#dd if=/dev/zero of=/var/tmp/lustre-ost-disk0 bs=1M count=1 seek=4095
#dd if=/dev/zero of=/var/tmp/lustre-ost-disk1 bs=1M count=1 seek=4095
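
These dd commands just create sparse files (about 257 MB and 4 GB apparent size, with almost nothing actually allocated), which is why this fits on a small VM. A quick way to confirm that:

#ls -lsh /var/tmp/lustre-*disk*    (first column is allocated blocks, the size field is the apparent size)
#du -h --apparent-size /var/tmp/lustre-ost-disk0
#du -h /var/tmp/lustre-ost-disk0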

#MyIP=10.0.15.12
#mkfs.lustre --mgs --backfstype=zfs lustre-mgt0/mgt0 /var/tmp/lustre-mgt-disk0
#mkfs.lustre --mdt --backfstype=zfs --index=0 --mgsnode=${MyIP}@tcp --fsname lustrefs lustre-mdt0/mdt0 /var/tmp/lustre-mdt-disk0
#mkfs.lustre --ost --backfstype=zfs --index=0 --mgsnode=${MyIP}@tcp --fsname lustrefs lustre-ost0/ost0 /var/tmp/lustre-ost-disk0
#mkfs.lustre --ost --backfstype=zfs --index=1 --mgsnode=${MyIP}@tcp --fsname lustrefs lustre-ost1/ost1 /var/tmp/lustre-ost-disk1

[root@test2 vagrant]# zpool list
NAME SIZE ALLOC FREE EXPANDSZ FRAG CAP DEDUP HEALTH ALTROOT
lustre-mdt0 240M 142K 240M - 1% 0% 1.00x ONLINE -
lustre-mgt0 240M 134K 240M - 1% 0% 1.00x ONLINE -
lustre-ost0 3.97G 137K 3.97G - 0% 0% 1.00x ONLINE -
lustre-ost1 3.97G 132K 3.97G - 0% 0% 1.00x ONLINE -

# systemctl start lustre
[root@test2 vagrant]# systemctl status lustre
● lustre.service - SYSV: Part of the lustre file system.
Loaded: loaded (/etc/rc.d/init.d/lustre)
Active: active (exited) since Wed 2016-10-05 16:27:33 UTC; 15s ago
Docs: man:systemd-sysv-generator(8)
Process: 3796 ExecStop=/etc/rc.d/init.d/lustre stop (code=exited, status=0/SUCCESS)
Process: 4170 ExecStart=/etc/rc.d/init.d/lustre start (code=exited, status=0/SUCCESS)
Oct 05 16:27:25 test2 systemd[1]: Starting SYSV: Part of the lustre file system....
Oct 05 16:27:25 test2 lustre[4170]: Mounting lustre-mgt0/mgt0 on /mnt/lustre/local/mgs
Oct 05 16:27:27 test2 lustre[4170]: Mounting lustre-mdt0/mdt0 on /mnt/lustre/local/mdt
Oct 05 16:27:29 test2 lustre[4170]: Mounting lustre-ost0/ost0 on /mnt/lustre/local/ost0
Oct 05 16:27:31 test2 lustre[4170]: Mounting lustre-ost1/ost1 on /mnt/lustre/local/ost1
Oct 05 16:27:33 test2 systemd[1]: Started SYSV: Part of the lustre file system..


mount:
lustre-mgt0/mgt0 on /mnt/lustre/local/mgs type lustre (ro)
lustre-mdt0/mdt0 on /mnt/lustre/local/mdt type lustre (ro)
lustre-ost0/ost0 on /mnt/lustre/local/ost0 type lustre (ro)
lustre-ost1/ost1 on /mnt/lustre/local/ost1 type lustre (ro)

#mkdir /lustre

[root@test2 vagrant]# mount -t lustre $MyIP@tcp:/lustrefs /lustre
[root@test2 vagrant]# mount
lustre-mgt0/mgt0 on /mnt/lustre/local/mgs type lustre (ro)
lustre-mdt0/mdt0 on /mnt/lustre/local/mdt type lustre (ro)
lustre-ost0/ost0 on /mnt/lustre/local/ost0 type lustre (ro)
lustre-ost1/ost1 on /mnt/lustre/local/ost1 type lustre (ro)
10.0.15.12@tcp:/lustrefs on /lustre type lustre (rw)
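
With the client mount in place, a quick sanity check that the MDT and both OSTs are up and visible (both commands come with the Lustre RPMs):

#lfs df -h    (should list the lustrefs MDT and both OSTs with their capacity)
#lctl dl    (lists the local Lustre devices and their state)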


Client side:
--------------------------------------------------

#yum remove kernel*
#yum install kernel-3.10.0-327.3.1.el7
#yum install *.rpm    (run this in the client rpm folder)
reboot into the new kernel

[root@test1 vagrant]# mount -t lustre test2@tcp:/lustrefs /lustre
[root@test1 vagrant]# mount
10.0.15.12@tcp:/lustrefs on /lustre type lustre (rw)
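
To make the client mount persist across reboots, one option is an fstab entry along these lines; treat it as a sketch, the _netdev option keeps it from mounting before the network is up:

#echo '10.0.15.12@tcp:/lustrefs /lustre lustre defaults,_netdev 0 0' >> /etc/fstab
#mount -a -t lustre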






Wednesday, September 23, 2015

Deploying a Ceph storage cluster using Warewulf on Centos-7 in easy steps

Setup overview:
- Virtualbox 5 with extension pack
- Centos-7
- warewulf 3.6.99 built from svn source code
- one manage node and three storage nodes (or client nodes)
- stateful provision

Manage node setup:
- 1 OS disk
- two network interfaces, first as NAT, second as internal network

Storage node setup:
- each has 1 OS disk, 1 journal disk, and 3 OSD disks; all disks are 8 GB in size
- the OS disk uses ext4, the others use XFS (just letting you know, no config needed)
- one network interface on the internal network, named eth1 (do not use eth0!)
- enable network boot and set as first choice

If you want to add more NICs to your client nodes, such as a NAT interface to reach the outside, my experience is to make the internal network the first interface (named eth1) and NAT the second; otherwise the client nodes can't find the manage node via DHCP or tftp. You will also need to put an /etc/sysconfig/network-scripts/ifcfg-eth1 (or enp0s8, or similar) file in your WW vnfs image.
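
For example, a minimal ifcfg-eth1 to drop into the vnfs image might look like the sketch below; the IPADDR is a made-up placeholder on the 172.16.2.x internal network, and each node needs its own address (or let Warewulf's network device settings fill it in):

cat > /etc/sysconfig/network-scripts/ifcfg-eth1 <<EOF
DEVICE=eth1
ONBOOT=yes
BOOTPROTO=static
IPADDR=172.16.2.1
NETMASK=255.255.255.0
EOF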





Here is how to do it:
1. install centos-7 on the manage node, everything default
2. edit /etc/sysconfig/network-scripts/ifcfg-enp0s? to enable both network interfaces; set the first to DHCP and the second to static with IP=172.16.2.250
3. systemctl restart network
4. yum install git
5. git clone https://github.com/ruilinchu/warewulf_ceph

You will now have the warewulf_ceph folder, inside which there are bash scripts numbered in sequence. All you need to do is run them one by one.

Ok, let's get started:

6. run script 1; this prepares the system, installs ceph on the manage node, and gets ready to build warewulf; when it is done the system will reboot
7. run script 2; this takes a while to build and install warewulf
8. run script 3; this sets up warewulf and builds the vnfs image; at the end it will prompt you to boot up the client nodes
9. power on the client nodes; warewulf will record them in its object store; after all nodes are recorded, ctrl-c to stop script 3
10. check that every node booted ok; this is a stateless provision at this point, so you should be able to ssh to the client nodes but not log in on them directly
11. run scripts 4, 5 and 6 one by one; these install pdsh on the manage node, enable password-free ssh login, and install ceph into the client node image
12. run script 7; this installs a kernel and grub into the client node image, and at the end reboots all client nodes and statefully provisions them with centos-7; this takes a while, be patient and wait



13. check that every client node booted and installed ok (a quick pdsh check is sketched after this list)
14. run script 8; this sets the client nodes to boot from their local disk and reboots them
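
A quick way to check all the nodes at once, assuming pdsh from script 4 is working; the node names n0000-n0002 are just placeholders, use whatever names Warewulf assigned:

#pdsh -w n00[00-02] 'uptime; df -h /'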



Now the OS is installed on all nodes and we are ready to deploy Ceph:

15. run script 9; this sets up the Ceph cluster, installing 3 ceph monitors (1 on each client node) and 9 OSDs (3 on each); it takes a while.
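
Once script 9 finishes, the cluster state can be checked from the manage node with the standard ceph commands:

#ceph -s    (overall health, monitor quorum, OSD count)
#ceph osd tree    (should show 9 OSDs, 3 per storage node, all up/in)
#ceph health detail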





The Ceph cluster is ready! If you see a clock skew warning, check that the ntp and ntpdate services are enabled and running, restart the ceph monitors, and you will be fine.
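
One way to clear the skew, reusing the placeholder node names from the pdsh sketch above and assuming ntp/ntpdate are already installed in the image:

#pdsh -w n00[00-02] 'systemctl enable ntpd ntpdate && systemctl restart ntpd'

then restart the ceph monitors as described above and re-check with ceph -s.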

Having some fun with the block device; I'm also using the manage node as a Ceph client:
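
A minimal sketch of the kind of block-device test meant here (the image name is made up): create an RBD image, map it on the manage node, then format and mount it:

#rbd create testimg --size 1024    (1 GB image in the default rbd pool)
#rbd map testimg    (shows up as /dev/rbd0)
#mkfs.xfs /dev/rbd0
#mkdir -p /mnt/rbdtest
#mount /dev/rbd0 /mnt/rbdtest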




Friday, July 31, 2015

Having fun with Warewulf linux cluster provisioning in Virtualbox

OS: Linux mint 17.1 Xfce x64
Virtualbox 5.0 with extension pack
VM: centos-6.6 x64
I wrote a set of scripts for this stateless provision process:
https://github.com/ruilinchu/warewulf_stateless
Running them in order will give you a well-functioning cluster (a quick sanity check is sketched after this list):
  • centos-6.6 x86_64 minimum
  • warewulf 3.6, stateless provision
  • password-free ssh for all users across all nodes
  • /home, /opt, /usr/local are NFS mounted from master node
  • Lmod module environment
  • Slurm job scheduler
  • OpenMPI
  • the local disk /dev/sda on compute nodes is mounted as /scratch
  • warewulf-monitor
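
A quick sanity check once everything is up, assuming the default Warewulf and Slurm setups from the scripts; the node names are placeholders, use whatever you registered with wwsh:

#wwsh node list    (all compute nodes should be listed)
#pdsh -w n00[00-01] uptime
#srun -N2 hostname    (Slurm should run a trivial job across 2 nodes)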