Container corruption easy repair using fsck

Broken Virtuozzo container

Virtuozzo Storage Platform

Sometimes a Virtuozzo container cannot start because its file system is corrupted, as in the example below:

root@pcs4 [~]# vzctl start 9068405
Mount image: /vz/private/9068405/root.hdd

/dev/ploop16823p1: UNEXPECTED INCONSISTENCY; RUN fsck MANUALLY.
(i.e., without -a or -p options)
Failed to mount image /vz/private/9068405/root.hdd: Error in e2fsck (fsutils.c:282): e2fsck failed (exit code 4)

Failed to mount image: Error in e2fsck (fsutils.c:282): e2fsck failed (exit code 4)
[152]
Unmount image: /vz/private/9068405/root.hdd

**Update** Virtuozzo 7 uses UUID-style container IDs. Start the container in verbose mode to find the ploop device.


[root@pcs-backup ~]# vzctl --verbose start 19ddb349-4c36-4607-9be2-4eca20141fd9
Lock /var/vz/19ddb349-4c36-4607-9be2-4eca20141fd9.conf.lck fd=5
Unlock conf fd=5
Lock /vz/private/19ddb349-4c36-4607-9be2-4eca20141fd9/.lck fd=5
Lock /var/vz/19ddb349-4c36-4607-9be2-4eca20141fd9.conf.lck fd=7
Unlock conf fd=7
Starting Container ...
running: /usr/sbin/cpufeatures --quiet sync
Unmount image: /vz/private/19ddb349-4c36-4607-9be2-4eca20141fd9/root.hdd
[ 0.006408] Unmounting device /dev/ploop23041
[ 0.007739] Opening delta /vz/private/19ddb349-4c36-4607-9be2-4eca20141fd9/root.hdd/root.hds
[ 0.008937] Store CBT uuid=1191d70d-f82a-4cee-8a60-f53020e90441 L1Size=1 bytes=204800 blocksize=1048576 offset=33257684992
Container is unmounted
vcmmd: unregister
Mount image: /vz/private/19ddb349-4c36-4607-9be2-4eca20141fd9/root.hdd
[ 0.019107] Opening delta /vz/private/19ddb349-4c36-4607-9be2-4eca20141fd9/root.hdd/root.hds
[ 0.019219] Opening delta /vz/private/19ddb349-4c36-4607-9be2-4eca20141fd9/root.hdd/root.hds
[ 0.801863] Opening delta /vz/private/19ddb349-4c36-4607-9be2-4eca20141fd9/root.hdd/root.hds
[ 0.806028] Adding delta dev=/dev/ploop23041 img=/vz/private/19ddb349-4c36-4607-9be2-4eca20141fd9/root.hdd/root.hds (rw)
[ 0.816486] Opening delta /vz/private/19ddb349-4c36-4607-9be2-4eca20141fd9/root.hdd/root.hds
[ 0.817204] Start CBT uuid=1191d70d-f82a-4cee-8a60-f53020e90441
[ 0.822267] Running: fsck.ext4 -p /dev/ploop23041p1

/dev/ploop23041p1: UNEXPECTED INCONSISTENCY; RUN fsck MANUALLY.
(i.e., without -a or -p options)
[ 0.901372] Error in e2fsck (fsutils.c:471): e2fsck failed (exit code 4)

Failed to mount image /vz/private/19ddb349-4c36-4607-9be2-4eca20141fd9/root.hdd: Error in e2fsck (fsutils.c:471): e2fsck failed (exit code 4)
[41]
Unlock fd=5

**End of update**

Data corruption can occur for a variety of reasons. It can usually be fixed by running a file system check on the disk image.
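The failed starts above both report "e2fsck failed (exit code 4)". As documented in the e2fsck man page, the exit status is a bitmask, and 4 means errors were found but left uncorrected, which is why a manual run is needed. A minimal sketch of a decoder follows; the function name `describe_e2fsck_status` is hypothetical and not part of any Virtuozzo tool.

```shell
# Decode an e2fsck exit status into readable text.
# The status is a bitmask, so more than one condition can be set at once.
# describe_e2fsck_status is a hypothetical helper name used for illustration.
describe_e2fsck_status() {
    status=$1
    if [ "$status" -eq 0 ]; then
        echo "no errors"
        return 0
    fi
    if [ $((status & 1)) -ne 0 ]; then echo "errors corrected"; fi
    if [ $((status & 2)) -ne 0 ]; then echo "errors corrected, reboot recommended"; fi
    if [ $((status & 4)) -ne 0 ]; then echo "errors left uncorrected"; fi
    if [ $((status & 8)) -ne 0 ]; then echo "operational error"; fi
    if [ $((status & 16)) -ne 0 ]; then echo "usage or syntax error"; fi
    if [ $((status & 32)) -ne 0 ]; then echo "canceled by user request"; fi
    if [ $((status & 128)) -ne 0 ]; then echo "shared library error"; fi
    return 0
}

# Example (not executed here):
#   fsck.ext4 -p /dev/ploop23041p1; describe_e2fsck_status $?
```

A status of 5, for instance, would print two lines, because both bit 1 and bit 4 are set.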

First we need to find the ploop device. Mounting the ploop image prints it.

root@pcs4 [~]# ploop mount /vz/private/9068405/root.hdd/DiskDescriptor.xml
Opening delta /pstorage/pcs1-cluster/private/9068405/root.hdd/root.hds
Opening delta /pstorage/pcs1-cluster/private/9068405/root.hdd/root.hds
Adding delta dev=/dev/ploop16823 img=/pstorage/pcs1-cluster/private/9068405/root.hdd/root.hds (rw)
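If this step is scripted, the device name can be pulled out of that output instead of being copied by hand. The sketch below assumes the "Adding delta dev=<device> img=<image>" line format shown above; the function name `ploop_dev_from_output` is hypothetical, and other ploop versions may format the line differently or write it to stderr.

```shell
# Extract the ploop device path from text printed by `ploop mount`
# (or by `vzctl --verbose start`, which prints the same line with a timestamp).
# Assumes a line of the form: Adding delta dev=<device> img=<image> (rw)
# ploop_dev_from_output is a hypothetical helper name used for illustration.
ploop_dev_from_output() {
    sed -n 's/^.*Adding delta dev=\([^ ]*\) .*$/\1/p' | head -n 1
}

# Example (not executed here):
#   dev=$(ploop mount /vz/private/9068405/root.hdd/DiskDescriptor.xml 2>&1 | ploop_dev_from_output)
#   echo "$dev"
```

The function only reads standard input, so it can also be tried on a saved log before pointing it at a real mount.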

Once we know the ploop device, we need to find its partition device.

root@pcs4 [~]# fdisk -l /dev/ploop16823

WARNING: GPT (GUID Partition Table) detected on '/dev/ploop16823'! The util fdisk doesn't support GPT. Use GNU Parted.

Disk /dev/ploop16823: 107.4 GB, 107374182400 bytes
255 heads, 63 sectors/track, 13054 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0x00000000

Device Boot Start End Blocks Id System
/dev/ploop16823p1 1 13055 104857599+ ee GPT

The fix is simple: run fsck on the partition to repair the container's file system. The -y option answers yes to every repair prompt and -C shows progress.

root@pcs4 [~]# fsck /dev/ploop16823p1 -Cy
fsck from util-linux-ng 2.17.2
e2fsck 1.41.12 (17-May-2010)
/dev/ploop16823p1: clean, 807261/6553600 files, 16353823/26213888 blocks

Unmount the ploop image.

root@pcs4 [~]# ploop umount -d /dev/ploop16823p1
Unmounting device /dev/ploop16823
Opening delta /pstorage/pcs1-cluster/private/9068405/root.hdd/root.hds

Start the container again.

root@pcs4 [~]# vzctl --verbose start 9068405
Mount image: /vz/private/9068405/root.hdd
Container is mounted
Starting the Container ...
running: /usr/sbin/cpufeatures --quiet sync
running: /usr/sbin/vzpkg info -q centos-6-x86_64 osrelease
Warning: VSwap_slm compatiblity mode all
RAM: 2097152 Swap: 0 ovr: 1.00
UB_SWAPPAGES, {0, 0}
UB_LOCKEDPAGES, {2097152, 2097152}
UB_PRIVVMPAGES, {9223372036854775807, 9223372036854775807}
UB_NUMPROC, {500, 500}
UB_PHYSPAGES, {2097152, 2097152}
UB_VMGUARPAGES, {2097152, 2097152}
UB_OOMGUARPAGES, {2097152, 2097152}
CPU limit: 100%
Running the command: /etc/sysconfig/vz-scripts/vz-start
Setting permissions 20002 dev 0x7d00
Adding offline management to Container(1): 4643
Adding IP addresses: 192.168.0.93/255.255.255.0 192.168.0.94/255.255.255.0
Running the command: /etc/sysconfig/vz-scripts/vz-net_add
Run the script /etc/sysconfig/vz-scripts/dists/scripts//redhat-add_ip.sh
Hostname of the Container set: main.test.com.my
Run the script /etc/sysconfig/vz-scripts/dists/scripts//redhat-set_hostname.sh
Run the script /etc/sysconfig/vz-scripts/dists/scripts//set_dns.sh
File resolv.conf was modified
Run the script /etc/sysconfig/vz-scripts/dists/scripts//set_console.sh
Starting the Container ...

Sunday ~ November 11, 2019 by admin Posted in Virtualization | No Comments

Fix broken Virtuozzo container

In this guide, we will learn how to fix a broken Virtuozzo container. The servers here form a cloud storage cluster that behaves almost like a high-availability setup: when a hardware node goes down, the VPS containers on that node automatically migrate to another active hardware node. This setup has 5 hardware nodes, which we also refer to as Parallels Cloud Storage (PCS).

Sometimes, during a hardware failure or server outage, some containers end up broken because there was not enough time to recover from the downtime.

A broken container is tagged ‘B’ in the output of ‘shaman stat’, for example:

8826061 on 101.30.110.101 0
8833527 on 101.30.110.103 0
B 8916305 on Unknown 0
8959776 on 101.30.110.103 0

In this case, container ID ‘8916305’ is broken.
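On a cluster with many containers, the broken ones can be filtered out of the listing rather than spotted by eye. This is a minimal sketch that assumes the layout shown in the excerpt above, where a broken entry starts with a separate ‘B’ field followed by the container ID; the function name `broken_ct_ids` is hypothetical, and the real ‘shaman stat’ output may contain header lines or extra columns not shown here.

```shell
# Print the IDs of containers flagged 'B' (broken) in `shaman stat` output.
# Assumes the flag is the first whitespace-separated field and the ID the second,
# as in the excerpt above. broken_ct_ids is a hypothetical helper name.
broken_ct_ids() {
    awk '$1 == "B" { print $2 }'
}

# Example (not executed here):
#   shaman stat | broken_ct_ids
```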

We need to look in ‘/pstorage/pcs1-cluster/.shaman/broken’ to see which PCS server actually holds the container.

root@pcs3 [/pstorage/pcs1-cluster/.shaman/broken]# cat ct-8916305
RELOCATE_PRIO=0
PATH=private/8916305
NODE=101.30.110.104
TIME=1572058900
INFO=Resource relocation failed
root@pcs3 [/pstorage/pcs1-cluster/.shaman/broken]#
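When reading this file from a script, it is safer to extract individual fields than to source the file, because it contains a PATH= line that would overwrite the shell's own PATH variable. The sketch below assumes the simple KEY=value layout shown above; the function name `shaman_field` is hypothetical.

```shell
# Read one KEY=value field from a shaman "broken" record without sourcing it.
# Sourcing would replace the shell's PATH with "private/<ID>".
# shaman_field is a hypothetical helper name used for illustration.
#   $1 = key name (for example NODE), $2 = path to the record file
shaman_field() {
    sed -n "s/^$1=//p" "$2" | head -n 1
}

# Example (not executed here):
#   shaman_field NODE /pstorage/pcs1-cluster/.shaman/broken/ct-8916305
```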

From the NODE value we now know the container is actually located on PCS4, so we can delete the file ‘ct-8916305’.

root@pcs3 [/pstorage/pcs1-cluster/.shaman/broken]# rm ct-8916305

rm: remove regular file `ct-8916305'? y

After that, log in to the PCS node where the container is located and check its configuration file. Make sure all settings are correct.

root@pcs4 [/pstorage/pcs1-cluster/.shaman/broken]# cat /vz/private/8916305/ve.conf
DESCRIPTION="Test=20User=20Test=20User"
HOSTNAME="svr1.test.com.my"
IP_ADDRESS="192.168.26.8/255.255.255.0 192.168.26.9/255.255.255.0"
NAMESERVER="8.8.8.8 8.8.4.4"
NETFILTER="full"
NETIF=""
OFFLINE_MANAGEMENT="yes"
OFFLINE_SERVICE="vzpp"
ONBOOT="yes"
ORIGIN_SAMPLE="4vCPU-Supreme-Plan"
OSTEMPLATE=".centos-7-x86_64"
SEARCHDOMAIN="google.com"
SLMMODE="all"
TEMPLATES=""
UUID="0820b9ee-af22-4f0a-bb67-18578b1ce678"
CPULIMIT_MHZ="8192"
DISKSPACE="104857600:104857600"
SLMMEMORYLIMIT="8589934592:8589934592"
VE_ROOT="/vz/root/$VEID"
VE_PRIVATE="/vz/private/$VEID"
DISTRIBUTION="centos"
TECHNOLOGIES="x86_64 nptl"
VEID="8916305"
CONFIG_CUSTOMIZED="yes"
RATEBOUND="yes"
NUMPROC="500:500"
NAME="Test User Test User"

Register the container on the hardware node:

root@pcs4 [/pstorage/pcs1-cluster/.shaman/broken]# vzctl register /vz/private/8916305 8916305
Container was successfully registered
Adding IP addresses to the pool: 192.168.26.8 192.168.26.9
Deleting offline management
Adding offline management to Container(1): 4643
root@pcs4 [/pstorage/pcs1-cluster/.shaman/broken]#

Start the container:

root@pcs4 [/pstorage/pcs1-cluster/.shaman/broken]# vzctl start 8916305
Mount image: /vz/private/8916305/root.hdd
Container is mounted
Starting the Container ...
Os release: 3.10.0-042stab139.1
CPU limit: 100%
Setting permissions 20002 dev 0x7d00
Adding offline management to Container(1): 4643
Adding IP addresses: 192.168.26.8/255.255.255.0 192.168.26.9/255.255.255.0
Hostname of the Container set: svr1.test.com.my
File resolv.conf was modified
Starting the Container ...

Tuesday ~ October 10, 2019 by admin Posted in Virtualization | 1 Comment