Apache Ozone is a distributed object store built on top of Hadoop Distributed Data Store service. It can manage billions of small and large files that are difficult to handle by other distributed file systems. As an important part of achieving better scalability, Ozone separates the metadata management among different services:
Ozone Manager (OM) service manages the metadata of the namespace such as volume, bucket and keys.
Storage Container Manager(SCM) service manages the metadata of the cluster nodes and all the containers and pipelines.
Datanode service manages the metadata of blocks, containers and pipelines running on the datanode..
Ozone metadata is persisted into a pluggable metadata store (RockDB by default) and replicated consistently across all the instances using Apache Ratis when High Availability is configured. Apache Ozone heavily uses Apache Ratis for metadata and data replication. Apache Ratis is a RAFT based library for high performance replication. The RAFT log is one of the most important Ozone/Ratis metadata where transactions are appended/applied to the leader first and then to the followers.
In this blog, we will look into the Apache Ozone metadata and the related Apache Ratis metadata in detail and give best practices for different scenarios.
In Ozone, HDDS (Hadoop Distributed Data Storage) layer including SCM and Datanodes provides a generic replication of containers/blocks without namespace metadata. The ozone.metadata.dir serves as a default location for all Ozone services such SCM, OM and Datanodes to persist their metadata when dedicated service metadata directories are not defined (Details in Section 2).
This makes it easier to spin up a secure ozone cluster for dev-test environments with minimal number of configuration keys. For example, many of the docker-compose samples in Ozone release builds and some of the acceptance tests take this approach.
When security is enabled, ozone.metadata.dir also serves as the default location for security related metadata such as public/private keys and certificates. For details of Ozone Security, please refer to our early blog[1]. Note: A Secure Ozone cluster can’t be initialized if you don’t have ozone.metadata.dir defined.
In secure deployment, SCM acts as Certificate Authority and issues certificates to other services like Ozone Manager, Datanodes. Each individual service will have their own public/private keys and signed certificates persisted under this location. Below are examples of security metadata from a CDP cluster when Ozone security enabled:
Service | Location(ozone.metadata.dir ) | Purpose |
SCM | /var/lib/hadoop-ozone/scm/ozone-metadata/scm/(key|certs) | SCM public key, private key and certificates |
Ozone Manager | /var/lib/hadoop-ozone/om/ozone-metadata/om/(key/certs) | Ozone manager public key, private key and certificates |
Datanode | /var/lib/hadoop-ozone/datanode/ozone-metadata/datanode/(key/certs) | Datanode public key, private key and certificates |
Ozone also allows individual components to have their own metadata stored at a dedicated location outside ozone.metadata.dirs and preferably on SSD for best performance. The table below summarizes the configuration keys for dedicated DB locations of OM, SCM and Datanode, respectively. We will illustrate the contents of these metadata directories in more detail.
Service | Configuration Key | Purpose |
Ozone Manager | ozone.om.db.dirs | Ozone Manager RocksDB |
SCM | ozone.scm.db.dirs | Storage Container Manager RocksDB |
Datanode | dfs.container.ratis.datanode.storage.dir | Datanode Ratis Log Directory |
Recon | ozone.recon.db.dir | Where recon manages in-house RocksDB + SQL DB. |
ozone.recon.om.db.dirs | Where recon keeps OM snapshot DB. | |
ozone.recon.scm.db.dirs | Where Recon keeps SCM DB. |
Let’s first look at the dedicated SCM metadata db directory configured in a security enabled CDP cluster.
ozone.scm.db.dirs= /var/lib/hadoop-ozone/scm/data
Inside it, we can find subdirectory for db checkpoints, scm version file and scm db directories, respectively.
The scm.db subdirectory contains the Rocks DB that stores SCM metadata such as pipelines and certificates. The db.checkpoints directory is a snapshot of SCM metadata DB for management tools like Recon to pick up SCM metadata to update its Ozone cluster states.
/var/lib/hadoop-ozone/scm/data/db.checkpoints
/var/lib/hadoop-ozone/scm/data/scm(version file)
/var/lib/hadoop-ozone/scm/data/scm.db
Below is a complete tree view of the scm db directory.
# tree /var/lib/hadoop-ozone/scm . ├── data │ ├── db.checkpoints │ ├── scm │ │ └── current │ │ └── VERSION │ └── scm.db │ ├── 000037.sst │ ├── 000038.sst │ ├── 000046.log │ ├── CURRENT │ ├── IDENTITY │ ├── LOCK │ ├── LOG │ ├── LOG.old.1603303095810006 │ ├── LOG.old.1603303104931067 │ ├── LOG.old.1603303114345020 │ ├── LOG.old.1603305861771784 │ ├── LOG.old.1603305872066885 │ ├── LOG.old.1603305882309121 │ ├── LOG.old.1603305893958267 │ ├── LOG.old.1603306105528850 │ ├── LOG.old.1603307879113091 │ ├── LOG.old.1603313802457451 │ ├── LOG.old.1603314448059783 │ ├── MANIFEST-000045 │ ├── OPTIONS-000045 │ └── OPTIONS-000048 └── ozone-metadata └── scm └── ca ├── certs │ └── certificate.crt └── keys ├── private.pem └── public.pem
SCM VERSION file is a test file that includes nodeType, clusterID, scmID, creation time and layout version information of the cluster. It is created when the cluster is initialized with “scm –init” commands. The cluster ID is used by SCM to cross check datanodes and ozone managers when they attempt to get a certificate request approved by SCM.
#Wed Oct 21 10:58:02 PDT 2020 nodeType=SCM scmUuid=fa1376f0-23ad-4cda-93d6-0c9fd79c7ae3 clusterID=CID-2a3f36e8-506f-40eb-986e-bb79d188fd55 cTime=1603303082557 layoutVersion=0
The ozone-metadata/scm/ca directory contains key and certs sub-directory, which are used to persist the certificate and the public/private key pairs of SCM. The SCM private key is used to sign the issued certificates and tokens. In the context of SCM HA, the SCM can be either Root CA that issues certificates for SCM instances (a.k.a Sub SCM) that issues certificates to Ozone Manager and Datanodes. We will cover metadata directories specific to SCM HA in section 3.
Ozone manager metadata can be defined in the dedicated directory specified by ozone.om.db.dirs. OM metadata configuration below is from a default CDP deployment.
ozone.om.db.dirs = /var/lib/hadoop-ozone/om/data
ozone.metadata.dirs = /var/lib/hadoop-ozone/om/ozone-metadata
The tree view of the OM metadata configuration below from CDP are shown as follows. The metadata directory looks similar to SCM. You may notice there is a ratis subdirectory defined by
ozone.om.ratis.storage.dir=/var/lib/hadoop-ozone/om/ratis.
It is used by Ozone managers to save the Ratis(Raft) log to support Ozone Manager High Availability.
tree /var/lib/hadoop-ozone/om . ├── data │ ├── db.checkpoints │ ├── om │ │ └── current │ │ └── VERSION │ ├── om.db │ │ ├── 000026.sst │ │ ├── 000027.sst │ │ ├── 000030.sst │ │ ├── 000032.log │ │ ├── CURRENT │ │ ├── IDENTITY │ │ ├── LOCK │ │ ├── LOG │ │ ├── LOG.old.1603313822135015 │ │ ├── LOG.old.1603314464834813 │ │ ├── MANIFEST-000031 │ │ ├── OPTIONS-000031 │ │ └── OPTIONS-000034 │ └── omMetrics ├── data1 ├── ozone-metadata │ └── om │ ├── certs │ │ ├── 7980125250388.crt │ │ └── CA-1.crt │ └── keys │ ├── private.pem │ └── public.pem └── ratis ├── d39ebeec-41e8-35f1-a92b-9f198c4c3682 │ ├── current │ │ ├── log_0-0 │ │ ├── log_11-12 │ │ ├── log_1-2 │ │ ├── log_3-4 │ │ ├── log_5-10 │ │ ├── log_inprogress_13 │ │ ├── raft-meta │ │ └── raft-meta.conf │ ├── in_use.lock │ └── sm └── snapshot
A VERSION file of Ozone Manager looks like below with its certificate serial id persisted as part of the version file.
#Thu Apr 15 22:17:41 UTC 2021 omCertSerialId=712848865968982 cTime=1618525042083 clusterID=CID-e2f29b31-b104-4558-b6ab-d6b7d7a883e7 omUuid=be9314e4-a0d7-4f69-b88d-40c7e13f56cb nodeType=OM layoutVersion=0
Ozone datanode runs Ratis(RAFT) based container replication with 1 or more pipelines. The Ratis log directory is used by replication pipelines on datanodes are located under
dfs.container.ratis.datanode.storage.dir=/var/lib/hadoop-ozone/datanode/ratis
As we can see here, this datanode supports multi-RAFT with two Ratis pipelines configured.
As discussed in section 1, ozone.metadata.dirs contains all the security related metadata:
ozone.metadata.dirs = /var/lib/hadoop-ozone/om/ozone-metadata
# tree /var/lib/hadoop-ozone/datanode /var/lib/hadoop-ozone/datanode ├── datanode.id ├── ozone-metadata │ └── dn │ ├── certs │ │ ├── 29933096459761329.crt │ │ └── CA-1.crt │ └── keys │ ├── private.pem │ └── public.pem └── ratis └── data ├── 2b59bade-9e16-49b3-bca1-c8128a0d73f8 │ ├── current │ │ ├── log_0-0 │ │ ├── log_inprogress_1 │ │ ├── raft-meta │ │ └── raft-meta.conf │ ├── in_use.lock │ └── sm └── b82fc0d4-4f44-4c00-82ce-61c7cb5d4b9d ├── current │ ├── log_inprogress_0 │ ├── raft-meta │ └── raft-meta.conf ├── in_use.lock └── sm
ozone.scm.datanode.id.dir is for persisting the datanode id file.
ozone.scm.datanode.id.dir = /var/lib/hadoop-ozone/datanode
Unlike the VERSION file on SCM or Ozone Manager, datanode maintains a YAML based datanode.id file with uuid, the certificate serial id, hostname, ip address, etc. persisted.
!!org.apache.hadoop.ozone.container.common.helpers.DatanodeIdYaml$DatanodeDetailsYaml { certSerialId: '29933096459761329', hostName:, ipAddress: , portDetails: { }, uuid: 7be8b370-bc0b-41e1-bb86-97b7e4d40c1e }
The actual data blocks are persisted at location specified by hdds.datanode.dir. hdds.datanode.dir=/hadoop-ozone/datanode/data
As we can see below, there is one container (id=1) with its chuck files and metadata in rocks db saved in chunks and metadata directories, respectively.
#tree /hadoop-ozone/datanode/data /hadoop-ozone/datanode/data └── hdds ├── e3357e22-4aa4-4262-84d4-9c178658bb8a │ └── current │ └── containerDir0 │ └── 1 │ ├── chunks │ │ └── 105936417289469952.block │ └── metadata │ ├── 1.container │ ├── 1-dn-container.db │ │ ├── 000003.log │ │ ├── CURRENT │ │ ├── IDENTITY │ │ ├── LOCK │ │ ├── LOG │ │ ├── MANIFEST-000001 │ │ └── OPTIONS-000005 │ └── db.checkpoints └── VERSION
tree /var/lib/hadoop-ozone/recon/ /var/lib/hadoop-ozone/recon/ ├── data │ ├── db.checkpoints │ ├── ozone_recon_derby.db (Recon’s SQL DB instance. By default it is derby) │ │ ├── dbex.lck …. … |──recon-container-key.db_1621843831171 (Recon’s own RocksDB instance) │ ├── 000003.log │ ├── CURRENT │ ├── IDENTITY │ ├── LOCK │ ├── LOG │ ├── MANIFEST-000004 │ ├── OPTIONS-000010 │ └── OPTIONS-000012 ├── om (Recon’s OM DB root) │ └── data │ ├── db.checkpoints │ └── om.snapshot.db_1621843830221 │ ├── 000030.sst │ ├── 000031.sst │ ├── 000033.log │ ├── C ├── ozone-metadata (Ozone Metadata is currently empty when individual dirs are configured) └── scm (Recon’s SCM DB root) └── data ├── db.checkpoints └── recon-scm.db ├── 000003.log ├── CURRENT ├── IDENTITY ├── LOCK ├── LOG ├── MANIFEST-000004 ├── OPTIONS-000028 └── OPTIONS-000030
SCM HA was merged recently to avoid a single point of failure for Ozone. With SCM HA, we have multiple SCM instances forming a RAFT ring using Apache Ratis library. The issuing of certificates will be handled by SCM Ratis leader and the persistence of certificates into RocksDB will be replicated to SCM follower instances consistently.
Among the SCM CA instances, there will be one designated as Primary SCM which acts as Root CA during SCM init. All the other SCM instances will be running bootstrap to get the Primary SCM issued SCM instance certificate.
The SCM metadata samples below use the all-in-one
ozone.metadata.dir = /var/lib/hadoop-ozone/scm/data/ozone-metadata/
without metadata DB on different drives. As a result of that, the Primary SCM will have its metadata saved under /var/lib/hadoop-ozone/scm/data/ozone-metadata/scm/ca. Below is an example of fully initiated and bootstrapped Primary SCM security metadata layout.
# tree /var/lib/hadoop-ozone/scm/data/ozone-metadata/ |-- db.checkpoints |-- ratis |-- scm | |-- ca ## Primary(or Root) SCM CA key and cert | | |-- certs | | | `-- certificate.crt ### Primary CA cert | | `-- keys | | |-- private.pem | | `-- public.pem | |-- current | | `-- VERSION | `-- sub-ca ## Sub SCM CA key and certs | |-- certs | | |-- 6151220701168.crt | | |-- CA-1.crt ### Primary CA cert | | `-- certificate.crt ### Sub SCM CA cert | `-- keys | |-- private.pem | `-- public.pem |-- scm-ha ## SCM HA RATIS LOG | `-- 12113362-371a-4252-8f00-33152e1043a7 | |-- current | | |-- log_0-0 | | |-- log_1-34 | | |-- log_35-270 | | |-- log_inprogress_271 | | |-- raft-meta | | `-- raft-meta.conf | |-- in_use.lock | `-- sm |-- scm.db ## SCM metadata DB | |-- 000003.log | |-- CURRENT | |-- IDENTITY | |-- LOCK | |-- LOG | |-- MANIFEST-000004 | |-- OPTIONS-000024 | `-- OPTIONS-000026 `-- snapshot
All the Sub SCM instances’ (including the one running on the primary SCM) security metadata are stored at /var/lib/hadoop-ozone/scm/data/ozone-metadata/scm/sub-ca, with keys and certs under it, respectively. Here is an example of Sub SCM security metadata layout.
#tree /var/lib/hadoop-ozone/scm/data/ozone-metadata/ |-- db.checkpoints |-- ratis |-- scm | |-- current | | `-- VERSION | `-- sub-ca ## Sub SCM CA key and certs | |-- certs | | |-- 6164923274359.crt | | |-- CA-1.crt | | `-- certificate.crt | `-- keys | |-- private.pem | `-- public.pem |-- scm-ha ## SCM HA RATIS LOG | `-- 12113362-371a-4252-8f00-33152e1043a7 | |-- current | | |-- log_0-0 | | |-- log_1-34 | | |-- log_35-270 | | |-- log_inprogress_271 | | |-- raft-meta | | `-- raft-meta.conf | |-- in_use.lock | `-- sm |-- scm.db ## SCM metadata DB | |-- 000003.log | |-- CURRENT | |-- IDENTITY | |-- LOCK | |-- LOG | |-- MANIFEST-000004 | |-- OPTIONS-000024 | `-- OPTIONS-000026 `-- snapshot
As mentioned earlier, Ozone security meatada are located under ozone.metadata.dirs = /var/lib/hadoop-ozone. Below is an example of a datanode key persisted in PEM format.
#ls /var/lib/hadoop-ozone/datanode/ozone-metadata/dn/keys/ private.pem public.pem
Below shows we have a datanode certificate with serial id 29933096459761329 issued by SCM and a CA certificate from SCM.
# ls /var/lib/hadoop-ozone/datanode/ozone-metadata/dn/certs/ 29933096459761329.crt CA-1.crt
The certificates are persisted in PEM format under the ozone.metadata.dirs. We can use openssl or keytool from JDK to check certificates issued and used by ozone services. For privacy reasons, we anonymize the original hostname and ip address in the output with SCM.FQDN, DN.FQDN, etc.
[dn]# openssl x509 -in /var/lib/hadoop-ozone/datanode/ozone-metadata/dn/certs/29933096459761329.crt -text Certificate: Data: Version: 3 (0x2) Serial Number: 29933096459761329 (0x6a57fe1d82f6b1) Signature Algorithm: sha256WithRSAEncryption Issuer: CN=scm@, OU=e3357e22-4aa4-4262-84d4-9c178658bb8a, O=CID-abc007fc-286a-4f84-a9c7-dd69250ed5d7 Validity Not Before: Mar 22 00:00:00 2021 GMT Not After : Mar 22 00:00:00 2022 GMT Subject: CN=dn@ , OU=e3357e22-4aa4-4262-84d4-9c178658bb8a, O=CID-abc007fc-286a-4f84-a9c7-dd69250ed5d7 Subject Public Key Info: Public Key Algorithm: rsaEncryption Public-Key: (2048 bit) Modulus: 00:aa:4b:05:f5:1c:c8:44:d0:e7:c2:12:aa:4d:87: 0d:dc:d9:1f:04:ec:fa:ab:8e:46:e8:43:9a:07:67: 4e:50:79:5a:02:11:d1:03:bf:c7:81:91:8f:1a:f0: 14:16:74:84:e0:ae:a3:4f:3b:f6:a7:d4:9d:17:74: 3c:06:38:e7:58:e0:7b:34:7c:1a:5f:03:e6:05:83: 13:39:7b:52:ea:67:1d:4d:33:1e:49:61:b2:5e:35: 7e:42:9f:12:3d:97:88:ea:0a:cb:25:68:5c:9a:60: 2e:0e:93:d1:ef:f8:af:f1:ae:7e:bb:6b:17:37:25: 8b:0f:d3:8d:10:39:64:6f:d3:7a:fd:4b:c5:26:20: a1:9d:fe:73:53:b1:30:e7:d4:6e:41:0c:ad:7c:d1: 85:fe:0f:b1:c6:9a:69:98:34:33:1b:d6:32:96:39: 52:a3:ba:90:6d:f4:1b:7d:74:0b:9f:a2:f1:b3:41: 40:a2:ca:36:45:08:3f:24:37:42:3c:c3:86:b3:1e: aa:bd:be:27:11:99:45:ac:67:a5:ec:b5:fc:be:60: 04:3b:5b:fb:55:64:2e:22:eb:2f:7f:f3:3e:3a:b1: 4a:df:a2:ec:15:70:62:c3:0d:02:c6:75:c2:20:eb: 83:71:b2:4b:1f:97:91:bf:c5:2c:e3:54:3c:ac:a2: ee:09 Exponent: 65537 (0x10001) X509v3 extensions: X509v3 Subject Alternative Name: IP Address: , DNS: X509v3 Key Usage: critical Digital Signature, Key Encipherment, Data Encipherment, Key Agreement Signature Algorithm: sha256WithRSAEncryption 77:55:9e:3f:3d:17:19:58:2a:1f:ed:fa:b2:70:d9:85:d0:01: ee:4b:4d:67:1e:b3:61:83:61:1e:82:10:fc:f7:49:aa:51:67: 96:c7:ed:4f:22:83:83:39:75:c2:22:1f:11:cf:31:bf:f5:83: bf:57:ab:e0:66:de:68:ff:8f:8a:94:5a:3a:2b:0c:b9:52:77: 0a:76:09:a2:84:97:88:87:e0:39:8d:87:20:0d:1d:21:51:b2: 7f:d1:28:7f:d5:b5:8d:f2:01:cc:1e:68:b1:0b:f0:64:22:3c: 05:30:60:b1:d0:36:46:45:35:bc:73:06:2d:0f:9b:5c:10:f2: 4a:60:f3:15:c4:a5:d9:ee:54:eb:15:24:5c:11:24:69:2f:39: 64:f6:05:98:a9:90:f9:ba:11:c8:4d:c6:17:d3:1f:f4:18:6d: 3c:c4:e2:84:bc:90:55:4b:92:e5:1a:33:51:5a:6c:b9:52:e8: d6:c7:75:c1:f3:5f:0a:cb:c4:85:6f:6f:04:8e:bf:9a:46:d4: 42:4e:95:3e:d8:ff:43:53:09:75:e0:64:b1:e4:a3:90:e3:dd: ef:e9:d2:67:95:35:7e:00:1b:94:ab:ae:65:66:1b:48:fb:cc: ea:12:61:5a:8a:3f:f4:3b:5d:86:7c:32:44:d6:fb:e1:04:49: d3:5f:c5:94 -----BEGIN CERTIFICATE----- MIID8jCCAtqgAwIBAgIHalf+HYL2sTANBgkqhkiG9w0BAQsFADCBlDEwMC4GA1UE Awwnc2NtQG5pZ2h0bHk3eC0xLm5pZ2h0bHk3eC5yb290Lmh3eC5zaXRlMS0wKwYD VQQLDCRlMzM1N2UyMi00YWE0LTQyNjItODRkNC05YzE3ODY1OGJiOGExMTAvBgNV BAoMKENJRC1hYmMwMDdmYy0yODZhLTRmODQtYTljNy1kZDY5MjUwZWQ1ZDcwHhcN MjEwMzIyMDAwMDAwWhcNMjIwMzIyMDAwMDAwWjCBkzEvMC0GA1UEAwwmZG5Abmln aHRseTd4LTEubmlnaHRseTd4LnJvb3QuaHd4LnNpdGUxLTArBgNVBAsMJGUzMzU3 ZTIyLTRhYTQtNDI2Mi04NGQ0LTljMTc4NjU4YmI4YTExMC8GA1UECgwoQ0lELWFi YzAwN2ZjLTI4NmEtNGY4NC1hOWM3LWRkNjkyNTBlZDVkNzCCASIwDQYJKoZIhvcN AQEBBQADggEPADCCAQoCggEBAKpLBfUcyETQ58ISqk2HDdzZHwTs+quORuhDmgdn TlB5WgIR0QO/x4GRjxrwFBZ0hOCuo0879qfUnRd0PAY451jgezR8Gl8D5gWDEzl7 UupnHU0zHklhsl41fkKfEj2XiOoKyyVoXJpgLg6T0e/4r/GufrtrFzcliw/TjRA5 ZG/Tev1LxSYgoZ3+c1OxMOfUbkEMrXzRhf4PscaaaZg0MxvWMpY5UqO6kG30G310 C5+i8bNBQKLKNkUIPyQ3QjzDhrMeqr2+JxGZRaxnpey1/L5gBDtb+1VkLiLrL3/z PjqxSt+i7BVwYsMNAsZ1wiDrg3GySx+Xkb/FLONUPKyi7gkCAwEAAaNIMEYwNAYD VR0RBC0wK4cErBuoxIIjbmlnaHRseTd4LTEubmlnaHRseTd4LnJvb3QuaHd4LnNp dGUwDgYDVR0PAQH/BAQDAgO4MA0GCSqGSIb3DQEBCwUAA4IBAQB3VZ4/PRcZWCof 7fqycNmF0AHuS01nHrNhg2EeghD890mqUWeWx+1PIoODOXXCIh8RzzG/9YO/V6vg Zt5o/4+KlFo6Kwy5UncKdgmihJeIh+A5jYcgDR0hUbJ/0Sh/1bWN8gHMHmixC/Bk IjwFMGCx0DZGRTW8cwYtD5tcEPJKYPMVxKXZ7lTrFSRcESRpLzlk9gWYqZD5uhHI TcYX0x/0GG08xOKEvJBVS5LlGjNRWmy5UujWx3XB818Ky8SFb28Ejr+aRtRCTpU+ 2P9DUwl14GSx5KOQ493v6dJnlTV+ABuUq65lZhtI+8zqEmFaij/0O12GfDJE1vvh BEnTX8WU -----END CERTIFICATE-----
In summary, for development/test environments, we recommend users to configure all the metadata directories with a single location a.k.a All-In-One location for simplicity. However, this is not a recommended configuration for production environments for performance and reliability considerations.
For production environments, we recommend individual services such as Ozone Manager, Storage Container Manager and Datanode may choose to use faster SSD disks for Ozone metadata in Rocks DB or RAFT replication logs for the best performance.
In this blog, we explain ozone metadata directories for the dev/test and production environment in both standalone and high availability deployment. Feel free to reach out to me or the Ozone team via slack/email if you have any questions.
Reference
[1] Apache Hadoop Ozone Security https://cloudera.com/blog/technical/apache-hadoop-ozone-security-authentication/
This may have been caused by one of the following: