{"id":57,"date":"2015-11-13T23:15:52","date_gmt":"2015-11-13T12:15:52","guid":{"rendered":"https:\/\/icicimov.com\/blog\/?p=57"},"modified":"2016-11-09T23:20:23","modified_gmt":"2016-11-09T12:20:23","slug":"glusterfs-internals","status":"publish","type":"post","link":"https:\/\/icicimov.com\/blog\/?p=57","title":{"rendered":"GlusterFS internals"},"content":{"rendered":"<p>GlusterFS stores its metadata in extended attributes, which are supported and enabled by default in the XFS file system we use for the bricks. This is a different approach from some other distributed storage systems, such as Ceph, that run a separate metadata service instead.<\/p>\n<p>Each <cite>trusted.afr<\/cite> extended attribute has a value of 24 hexadecimal digits. The first 8 digits represent the changelog of data, the second 8 digits the changelog of metadata, and the last 8 digits the changelog of directory entries.<\/p>\n<pre>\r\n0x 000003d7 00000001 00000000\r\n        |      |       |\r\n        |      |        \\_ changelog of directory entries\r\n        |       \\_ changelog of metadata\r\n         \\_ changelog of data\r\n<\/pre>\n<p>The metadata and entry changelogs are valid for directories. For regular files, the data and metadata changelogs are valid. For special files such as device files, the metadata changelog is valid. When a file split-brain happens, it can be a data split-brain, a metadata split-brain, or both.<\/p>\n<p>Version 3.3 introduced a new structure to the bricks, the <cite>.glusterfs<\/cite> directory. The <cite>GFID<\/cite> is used to build the structure of the <cite>.glusterfs<\/cite> directory in the brick. Each file is hardlinked to a path built from its GFID: the first two hex digits name a directory, the next two digits name a subdirectory, and the complete <cite>uuid<\/cite> is the file name. For example, a file with these attributes:<\/p>\n<pre>\r\n[root@server ~]# getfattr -m . 
-d -e hex \/data\/activemq-data\/db-1755.log\r\ngetfattr: Removing leading '\/' from absolute path names\r\n# file: data\/activemq-data\/db-1755.log\r\ntrusted.afr.gfs-volume-prod-client-0=0x000000610000000000000000\r\ntrusted.afr.gfs-volume-prod-client-1=0x000000000000000000000000\r\ntrusted.gfid=0x8ee3e44467464c4f96429eca42ffc629\r\n<\/pre>\n<p>should, in our case, be hardlinked to:<\/p>\n<pre>\r\n\/data\/.glusterfs\/8e\/e3\/8ee3e444-6746-4c4f-9642-9eca42ffc629\r\n<\/pre>\n<p>as we can confirm on the file system:<\/p>\n<pre>\r\n[root@server ~]# ls -l \/data\/.glusterfs\/8e\/e3\/8ee3e444-6746-4c4f-9642-9eca42ffc629\r\n-rw-r--r-- 2 root root 41675807 Nov 12 11:15 \/data\/.glusterfs\/8e\/e3\/8ee3e444-6746-4c4f-9642-9eca42ffc629\r\n<\/pre>\n<p>Directories are handled differently: each directory gets a symlink, named after its own gfid, that points to the directory&#8217;s name under the gfid path of its parent. For example:<\/p>\n<pre>\r\n[root@server ~]# getfattr -m . -d -e hex \/data\/documents\/2015-11-12\/\r\ngetfattr: Removing leading '\/' from absolute path names\r\n# file: data\/documents\/2015-11-12\r\ntrusted.afr.gfs-volume-prod-client-0=0x000000000000000000000093\r\ntrusted.afr.gfs-volume-prod-client-1=0x000000000000000000000001\r\ntrusted.gfid=0xdbd7932e6fda49e1abfb19799a5f50e6\r\n<\/pre>\n<p>which creates the symlink:<\/p>\n<pre>\r\n[root@server ~]# ls -l \/data\/.glusterfs\/db\/d7\/dbd7932e-6fda-49e1-abfb-19799a5f50e6\r\nlrwxrwxrwx 1 root root 59 Nov 12 09:36 \/data\/.glusterfs\/db\/d7\/dbd7932e-6fda-49e1-abfb-19799a5f50e6 -> ..\/..\/4b\/47\/4b470aab-5124-4aa8-9b78-4592afd2c4dd\/2015-11-12\r\n<\/pre>\n<p>The consequence of all this is that if you delete a file from a brick without deleting its gfid hardlink, the filename will be restored as part of the self-heal process and linked back to its gfid file. 
If that gfid file is broken, the restored file will be broken as well.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>GlusterFS stores its metadata in extended attributes, which are supported and enabled by default in the XFS file system we use for the bricks. This is a different approach from some other distributed storage systems, such as Ceph, that&#8230;<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[9],"tags":[],"class_list":["post-57","post","type-post","status-publish","format-standard","hentry","category-high-availability"],"_links":{"self":[{"href":"https:\/\/icicimov.com\/blog\/index.php?rest_route=\/wp\/v2\/posts\/57","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/icicimov.com\/blog\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/icicimov.com\/blog\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/icicimov.com\/blog\/index.php?rest_route=\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/icicimov.com\/blog\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=57"}],"version-history":[{"count":5,"href":"https:\/\/icicimov.com\/blog\/index.php?rest_route=\/wp\/v2\/posts\/57\/revisions"}],"predecessor-version":[{"id":62,"href":"https:\/\/icicimov.com\/blog\/index.php?rest_route=\/wp\/v2\/posts\/57\/revisions\/62"}],"wp:attachment":[{"href":"https:\/\/icicimov.com\/blog\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=57"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/icicimov.com\/blog\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=57"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/icicimov.com\/blog\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=57"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}