RGW metadata Search with Elasticsearch

Last Updated On: October 20, 2020 | By: Pei Feng Lin

(This blog is written by Xidian Chen)

1. Understand data organization and storage

An object is the basic unit of data organization and storage in an object storage system. An object consists of the data itself, the metadata that describes the data, and optional user-defined metadata.

Data: The actual content maintained by the user, such as the content of a text file or a video file.
Metadata: The basic, required meta-information of the data entity, such as the bucket it belongs to, type, size, checksum, and last modification time. It covers everything that must be saved apart from the data itself and is generally stored as key-value (KV) pairs.
User-defined metadata: Some businesses require more meta-information. For a video file, for example, in addition to the type, size, checksum, and last modification time, the business may also want to record attributes such as video style or lead actor.
Key: The object name, a character sequence whose UTF-8 encoded length is greater than 0 and no more than 1024. Each object in a bucket must have a unique key.
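To make the four parts concrete, here is a minimal Python sketch of an object. The class name, field names, and sample values are illustrative assumptions, not an RGW data structure; the key check treats the 1024 limit as a limit on the UTF-8 encoded length.

```python
from dataclasses import dataclass, field

# Upper bound on the UTF-8 encoded key length, as stated above.
MAX_KEY_BYTES = 1024


@dataclass
class StoredObject:
    """Illustrative model of one object; not an RGW data structure."""
    key: str      # object name, unique within its bucket
    data: bytes   # the payload the user actually stores
    system_meta: dict = field(default_factory=dict)  # bucket, type, size, ...
    user_meta: dict = field(default_factory=dict)    # business-specific KV pairs


def is_valid_key(key: str) -> bool:
    """Return True if the UTF-8 encoded key is 1 to 1024 bytes long."""
    return 0 < len(key.encode("utf-8")) <= MAX_KEY_BYTES


# Hypothetical sample object: a video with two user-defined attributes.
video = StoredObject(
    key="movies/demo.mp4",
    data=b"\x00\x01\x02",
    system_meta={"bucket": "media", "content_type": "video/mp4", "size": 3},
    user_meta={"style": "documentary", "lead_actor": "Jane Doe"},
)
print(is_valid_key(video.key))
```

The user-defined part is what metadata search is mainly about: it is the only part whose fields the business chooses.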

2. Metadata Search

Plan 1

The architecture of this scheme is straightforward. The front-end application uploads the object to Ceph RGW and sends the object's custom metadata to the ElasticSearch cluster. To retrieve an object, the user sends a search request to ElasticSearch, which can match on the object name, the object ID, or user-defined metadata. Once ElasticSearch returns the object's address, the front-end application uses that address to fetch the object directly from Ceph RGW.

This scheme is easy to implement, but data consistency between RGW and ElasticSearch depends entirely on the front-end application. Objects can also reach RGW by other routes: besides the SDKs, S3 has the s3cmd command-line tool, and RGW accepts uploads through plain HTTP requests. If a user uploads an object directly in the background rather than through the front-end application, ElasticSearch never receives the corresponding metadata.
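The consistency gap in Plan 1 can be shown with a small simulation. Both dictionaries and all function names below are hypothetical in-memory stand-ins; nothing here talks to a real RGW or ElasticSearch instance.

```python
# In-memory stand-ins for the two systems; both names are hypothetical.
rgw_store = {}   # key -> object bytes (plays the role of Ceph RGW)
es_index = {}    # key -> user metadata (plays the role of ElasticSearch)


def frontend_upload(key, data, user_meta):
    """Plan 1 path: the front end writes to both systems itself."""
    rgw_store[key] = data
    es_index[key] = user_meta


def direct_upload(key, data):
    """Bypass path (s3cmd, raw HTTP): only RGW is written."""
    rgw_store[key] = data


def unindexed_keys():
    """Objects that exist in RGW but are invisible to metadata search."""
    return sorted(set(rgw_store) - set(es_index))


frontend_upload("a.mp4", b"...", {"style": "drama"})
direct_upload("b.mp4", b"...")
print(unindexed_keys())  # ['b.mp4']
```

Any upload that skips the front end ends up in the unindexed set, which is the weakness that server-side synchronization is meant to remove.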

Plan 2

Ceph added support for ElasticSearch after the Jewel release, so RGW metadata can be synchronized into ElasticSearch automatically by defining a new zone type and using a synchronization plug-in. This keeps RGW and ElasticSearch consistent and greatly reduces the coupling between the front end and the back end.

As the architecture diagram shows, the only difference between Plan 1 and Plan 2 is that the front end no longer needs to upload metadata to ElasticSearch when it uploads objects. The sync plug-in built into Ceph RGW synchronizes the metadata to ElasticSearch automatically.

Final Plan

To synchronize RGW metadata to ElasticSearch, we configured a zonegroup (CN) with two zones: US-1 (master) and US-2 (secondary). One RGW instance is started in each zone. rgw.us-1 accepts read and write requests from the front end, and rgw.us-2 synchronizes metadata to ElasticSearch.

3. Demo

Env

Architecture

ES Cluster

Host Name	Public Network	Admin Network
es-node001	192.168.2.101	172.200.50.101
es-node002	192.168.2.102	172.200.50.102
es-node003	192.168.2.103	172.200.50.103

Ceph Cluster

Host Name	Public Network	Admin Network	Cluster Network
admin	192.168.2.39	172.200.50.39	192.168.3.39
node001	192.168.2.40	172.200.50.40	192.168.3.40
node002	192.168.2.41	172.200.50.41	192.168.3.41

Deploy ES Cluster

Install the JDK and ES packages on all nodes

# zypper -n in java-1_8_0-openjdk
# zypper -n in java-1_8_0-openjdk-devel
# zypper --no-gpg-checks -n in elasticsearch-5.6.0.rpm

Configure ES
es-node001

# cat /etc/elasticsearch/elasticsearch.yml | grep -v ^#
cluster.name: my-application
node.name: es-node001
network.host: 0.0.0.0
http.port: 9200
discovery.zen.ping.unicast.hosts: ["es-node001", "es-node002", "es-node003"]

es-node002

# cat /etc/elasticsearch/elasticsearch.yml | grep -v ^# | grep -v ^$
cluster.name: my-application
node.name: es-node002
network.host: 0.0.0.0
http.port: 9200
discovery.zen.ping.unicast.hosts: ["es-node001", "es-node002", "es-node003"]

es-node003

# cat /etc/elasticsearch/elasticsearch.yml | grep -v ^# | grep -v ^$
cluster.name: my-application
node.name: es-node003
network.host: 0.0.0.0
http.port: 9200
discovery.zen.ping.unicast.hosts: ["es-node001", "es-node002", "es-node003"]
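The three files differ only in node.name, so they can be rendered from one template. The sketch below is a convenience helper written for this post (the function name is an assumption); it only builds the text and does not write to /etc/elasticsearch.

```python
# Host names of the three ES nodes used in this demo.
NODES = ["es-node001", "es-node002", "es-node003"]


def render_es_config(node_name, cluster_name="my-application", nodes=NODES):
    """Return the elasticsearch.yml body for one node of the cluster."""
    hosts = ", ".join(f'"{n}"' for n in nodes)
    return "\n".join([
        f"cluster.name: {cluster_name}",
        f"node.name: {node_name}",
        "network.host: 0.0.0.0",
        "http.port: 9200",
        f"discovery.zen.ping.unicast.hosts: [{hosts}]",
    ]) + "\n"


for name in NODES:
    print(f"--- {name} ---")
    print(render_es_config(name), end="")
```

Rendering from a single template avoids the usual copy-and-paste mistake of leaving the wrong node.name in one of the files.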

Enable Service

# systemctl daemon-reload
# systemctl enable elasticsearch.service
# systemctl start elasticsearch.service
# systemctl status elasticsearch.service

Check port and network

# netstat -ntulp
Active Internet connections (only servers)
Proto Recv-Q Send-Q Local Address Foreign Address State PID/Program name
tcp 0 0 0.0.0.0:22 0.0.0.0:* LISTEN 1670/sshd
tcp6 0 0 :::9200 :::* LISTEN 14082/java
tcp6 0 0 :::9300 :::* LISTEN 14082/java
tcp6 0 0 :::22 :::* LISTEN 1670/sshd

Check ES Cluster
ES Cluster Version

# curl 192.168.2.101:9200
{
  "name" : "5JyoL9w",
  "cluster_name" : "elasticsearch",
  "cluster_uuid" : "vCFofUJBR46zUmOKp_bDWA",
  "version" : {
    "number" : "5.6.0",
    "build_hash" : "781a835",
    "build_date" : "2017-09-07T03:09:58.087Z",
    "build_snapshot" : false,
    "lucene_version" : "6.6.0"
  },
  "tagline" : "You Know, for Search"
}

Cluster nodes info

# curl -XGET '172.200.50.101:9200/_cat/nodes?v'
ip heap.percent ram.percent cpu load_1m load_5m load_15m node.role master name
172.200.50.53 16 71 0 0.01 0.01 0.00 mdi - node-3
172.200.50.52 17 70 0 0.08 0.07 0.02 mdi - node-2
172.200.50.51 13 65 0 0.02 0.01 0.00 mdi * node-1

Status of Cluster

# curl -XGET '172.200.50.101:9200/_cluster/health?pretty'
{
  "cluster_name" : "my-application",
  "status" : "green",
  "timed_out" : false,
  "number_of_nodes" : 3,
  "number_of_data_nodes" : 3,
  "active_primary_shards" : 0,
  "active_shards" : 0,
  "relocating_shards" : 0,
  "initializing_shards" : 0,
  "unassigned_shards" : 0,
  "delayed_unassigned_shards" : 0,
  "number_of_pending_tasks" : 0,
  "number_of_in_flight_fetch" : 0,
  "task_max_waiting_in_queue_millis" : 0,
  "active_shards_percent_as_number" : 100.0
}
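Before moving on to Ceph, it is worth checking the health output programmatically rather than by eye. The following sketch parses a saved copy of the response; the function name and the trimmed sample string are assumptions, and fetching the JSON from the cluster is left out on purpose.

```python
import json


def cluster_is_ready(health_json, expected_nodes):
    """True when the cluster is green and every expected node has joined."""
    health = json.loads(health_json)
    return (
        health["status"] == "green"
        and not health["timed_out"]
        and health["number_of_nodes"] == expected_nodes
    )


# Trimmed copy of the _cluster/health response shown above.
sample = (
    '{"cluster_name": "my-application", "status": "green", '
    '"timed_out": false, "number_of_nodes": 3}'
)
print(cluster_is_ready(sample, expected_nodes=3))  # True
```

Checking number_of_nodes as well as status matters here, because a single node that failed to join its peers can still report green on its own.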

Deploy Ceph

realm: gold
zonegroup: us
data zone: us-east-1
metadata search zone: us-east-2

1. Create Master Zone

(1) Create Pool (node001)

# ceph osd pool create .rgw.root 8 8
# ceph osd pool create us-east-1.rgw.control 8 8
# ceph osd pool create us-east-1.rgw.meta 16 16
# ceph osd pool create us-east-1.rgw.log 8 8
# ceph osd pool create us-east-1.rgw.buckets.index 8 8
# ceph osd pool create us-east-1.rgw.buckets.data 64 64

# ceph osd pool application enable .rgw.root rgw
# ceph osd pool application enable us-east-1.rgw.control rgw
# ceph osd pool application enable us-east-1.rgw.meta rgw
# ceph osd pool application enable us-east-1.rgw.log rgw
# ceph osd pool application enable us-east-1.rgw.buckets.index rgw
# ceph osd pool application enable us-east-1.rgw.buckets.data rgw

# ceph osd pool create .rgw.root 8 8
# ceph osd pool create us-east-2.rgw.control 8 8
# ceph osd pool create us-east-2.rgw.meta 16 16
# ceph osd pool create us-east-2.rgw.log 8 8
# ceph osd pool create us-east-2.rgw.buckets.index 8 8
# ceph osd pool create us-east-2.rgw.buckets.data 64 64

# ceph osd pool application enable .rgw.root rgw
# ceph osd pool application enable us-east-2.rgw.control rgw
# ceph osd pool application enable us-east-2.rgw.meta rgw
# ceph osd pool application enable us-east-2.rgw.log rgw
# ceph osd pool application enable us-east-2.rgw.buckets.index rgw
# ceph osd pool application enable us-east-2.rgw.buckets.data rgw
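The per-zone pools follow one naming pattern, `<zone>.<suffix>`, with a fixed placement-group count per suffix. The sketch below regenerates the commands for a given zone; the helper name is an assumption, and the shared .rgw.root pool is left out because it is created once rather than per zone.

```python
# Pool suffix -> placement-group count, copied from the commands above.
ZONE_POOLS = {
    "rgw.control": 8,
    "rgw.meta": 16,
    "rgw.log": 8,
    "rgw.buckets.index": 8,
    "rgw.buckets.data": 64,
}


def pool_commands(zone):
    """Return the create and application-enable commands for one zone."""
    cmds = []
    for suffix, pgs in ZONE_POOLS.items():
        pool = f"{zone}.{suffix}"
        cmds.append(f"ceph osd pool create {pool} {pgs} {pgs}")
        cmds.append(f"ceph osd pool application enable {pool} rgw")
    return cmds


for zone in ("us-east-1", "us-east-2"):
    print("\n".join(pool_commands(zone)))
```

The script only prints the commands, so they can be reviewed before being run on a node with admin access.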

(2) Delete Default Zone Group and Zone (Optional)

A default zone group named default is created when the object gateway is installed with the default settings. Since we no longer need the default zone group, we delete it.

# radosgw-admin zonegroup list
{
  "default_info": "",
  "zonegroups": [
    "default"
  ]
}

# radosgw-admin zonegroup remove --rgw-zonegroup=default --rgw-zone=default
# radosgw-admin period update --commit
# radosgw-admin zone delete --rgw-zone=default
# radosgw-admin period update --commit
# radosgw-admin zonegroup delete --rgw-zonegroup=default
# radosgw-admin period update --commit

(3) Create realm (admin)

# radosgw-admin realm create --rgw-realm=gold --default
# radosgw-admin realm list
{
  "default_info": "ded6e77f-afe6-475c-8fdb-e09f684acf18",
  "realms": [
    "gold"
  ]
}

(4) Create Master Zonegroup (us) (admin)

# radosgw-admin zonegroup create --rgw-zonegroup=us \
  --endpoints=http://192.168.2.41:80 --master --default

# radosgw-admin zonegroup list
{
  "default_info": "6ac5588a-a0ae-44e7-9a91-6cc285e9d521",
  "zonegroups": [
    "us"
  ]
}

(5) Create Master Zone (us-east-1)

Generate a random access key and secret key, then reuse them in the commands that follow.

# SYSTEM_ACCESS_KEY=MebOITA7uiemM3UeASMn
# SYSTEM_SECRET_KEY=PIZYauzILJlMG0MylUkBwnR73hA0FQ1qb0qvOxER

# radosgw-admin zone create --rgw-zonegroup=us --rgw-zone=us-east-1 \
  --endpoints=http://192.168.2.41:80 --access-key=$SYSTEM_ACCESS_KEY \
  --secret=$SYSTEM_SECRET_KEY

# radosgw-admin zone list
{
  "default_info": "b7467d31-fb6b-46f5-aff2-8b6418356109",
  "zones": [
    "us-east-1"
  ]
}
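The post does not say how the system keys in step (5) were generated. One possible approach, sketched here as an assumption rather than the author's method, is to draw alphanumeric characters from Python's secrets module, using the same lengths as the sample keys (20 and 40 characters).

```python
import secrets
import string

# Letters and digits only, matching the character set of the sample keys.
ALPHABET = string.ascii_letters + string.digits


def random_key(length):
    """Return a random alphanumeric string of the given length."""
    return "".join(secrets.choice(ALPHABET) for _ in range(length))


# Lengths match the sample keys shown above (20 and 40 characters).
access_key = random_key(20)
secret_key = random_key(40)
print(f"SYSTEM_ACCESS_KEY={access_key}")
print(f"SYSTEM_SECRET_KEY={secret_key}")
```

The two printed lines can be pasted into the shell so that the later zone and user commands pick up the same values.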

(6) Delete default zone (Optional)

# radosgw-admin zone delete --rgw-zone=default

(7) Create User

# radosgw-admin user create --uid=zone.user \
  --display-name="Zone User" --access-key=$SYSTEM_ACCESS_KEY \
  --secret=$SYSTEM_SECRET_KEY --system
{
  "user_id": "zone.user",
  "display_name": "Zone User",
  "email": "",
  "suspended": 0,
  "max_buckets": 1000,
  "subusers": [],
  "keys": [
    {
      "user": "zone.user",
      "access_key": "MebOITA7uiemM3UeASMn",
      "secret_key": "PIZYauzILJlMG0MylUkBwnR73hA0FQ1qb0qvOxER"
    }

# radosgw-admin user list
# radosgw-admin user info --uid=zone.user

(8) Update and Commit Period (Admin)

# radosgw-admin period update --commit
# radosgw-admin period get
{
  "id": "3f07279f-1182-47e3-9388-fc9999b3317c",
  "epoch": 1,
  "predecessor_uuid": "b62f7c97-fa71-4a5e-9859-b4faa242ddef",
  "sync_status": [],
  "period_map": {
    "id": "3f07279f-1182-47e3-9388-fc9999b3317c",
    "zonegroups": [
      {
        "id": "1d3b5143-f575-4f9f-91d2-9fdc62e82992",
        "name": "us",
        "api_name": "us",
        "is_master": "true",
        "endpoints": [
          "http://192.168.2.41:80"
        ],

(9) Create node002 GW key (Admin)

# ceph auth add client.rgw.us-east-1 mon 'allow rwx' osd 'allow rwx' mgr 'allow r'
# ceph auth get client.rgw.us-east-1 > /etc/ceph/ceph.client.us-east-1.keyring
# scp /etc/ceph/ceph.client.us-east-1.keyring node002:/etc/ceph/

(10) Start RADOS gateway (node002)

# zypper ref && sudo zypper in ceph-radosgw

# vim /etc/ceph/ceph.conf
[client.rgw.us-east-1]
rgw_frontends = "beast port=80"
rgw_zone = us-east-1
keyring = /etc/ceph/ceph.client.us-east-1.keyring
log file = /var/log/radosgw/rgw.us-east-1.radosgw.log

# mkdir /var/log/radosgw/

# systemctl restart ceph-radosgw@rgw.us-east-1
# systemctl enable ceph-radosgw@rgw.us-east-1
# systemctl status ceph-radosgw@rgw.us-east-1

2. Create Secondary Zone

(1) Create secondary zone: us-east-2 (Admin)

# radosgw-admin zone create --rgw-zonegroup=us --endpoints=http://192.168.2.42:80 \
  --rgw-zone=us-east-2 --access-key=$SYSTEM_ACCESS_KEY \
  --secret=$SYSTEM_SECRET_KEY

# radosgw-admin zone list
{
  "default_info": "57fd7201-3789-4fbd-adfa-b473614df315",
  "zones": [
    "us-east-1",
    "us-east-2"
  ]
}

(2) Update and Commit Period (Admin)

# radosgw-admin period update --commit

(3) Create rgw key (Admin)

# ceph auth add client.rgw.us-east-2 mon 'allow rwx' osd 'allow rwx' mgr 'allow r'
# ceph auth get client.rgw.us-east-2 > /etc/ceph/ceph.client.us-east-2.keyring
# scp /etc/ceph/ceph.client.us-east-2.keyring node003:/etc/ceph/

(4) Start RADOS gateway (node003)

# zypper ref && sudo zypper in ceph-radosgw

# vim /etc/ceph/ceph.conf
[client.rgw.us-east-2]
rgw_frontends = "beast port=80"
rgw_zone = us-east-2
keyring = /etc/ceph/ceph.client.us-east-2.keyring
log file = /var/log/radosgw/rgw.us-east-2.radosgw.log

# mkdir /var/log/radosgw/

# systemctl restart ceph-radosgw@rgw.us-east-2
# systemctl enable ceph-radosgw@rgw.us-east-2
# systemctl status ceph-radosgw@rgw.us-east-2

(5) Check sync status (Admin)

# radosgw-admin sync status
realm c859877c-22aa-41ed-bcb4-23d36d8c212f (gold)
zonegroup 1d3b5143-f575-4f9f-91d2-9fdc62e82992 (us)
zone 57fd7201-3789-4fbd-adfa-b473614df315 (us-east-1)
metadata sync no sync (zone is master)
data sync source: a8ef6d51-d8de-40a2-98cc-c92ac62fb84f (us-east-2)
syncing
full sync: 0/128 shards
incremental sync: 128/128 shards
data is caught up with source
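When this check is repeated in a script, the text output can be scanned for the two signals that matter: the shard counters and the "caught up" line. The sketch below works on a saved copy of the output; both function names are assumptions, and the parsing relies only on the wording shown above.

```python
import re


def data_is_caught_up(sync_status_text):
    """True if the output reports that data sync has caught up with its source."""
    lines = [line.strip() for line in sync_status_text.splitlines()]
    return any(line.startswith("data is caught up with") for line in lines)


def shard_progress(sync_status_text, phase):
    """Return (done, total) for a line such as 'full sync: 0/128 shards'."""
    match = re.search(rf"{re.escape(phase)}: (\d+)/(\d+) shards", sync_status_text)
    return (int(match.group(1)), int(match.group(2))) if match else None


# Data-sync part of the radosgw-admin sync status output shown above.
sample = """\
data sync source: a8ef6d51-d8de-40a2-98cc-c92ac62fb84f (us-east-2)
syncing
full sync: 0/128 shards
incremental sync: 128/128 shards
data is caught up with source
"""
print(data_is_caught_up(sample))                  # True
print(shard_progress(sample, "incremental sync"))  # (128, 128)
```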

(6) Check disk capacity

# ceph df
RAW STORAGE:
CLASS SIZE AVAIL USED RAW USED %RAW USED
hdd 66 GiB 48 GiB 12 GiB 18 GiB 27.44
TOTAL 66 GiB 48 GiB 12 GiB 18 GiB 27.44

(7) Modify the us-east-2 zone: set tier-type and tier-config so that they point to the Elasticsearch endpoint.

ElasticSearch tier type configuration parameters:

endpoint: Specifies the ElasticSearch server endpoint to access.
num_shards: (integer) The number of shards that ElasticSearch will be configured with on data synchronization initialization.
num_replicas: (integer) The number of replicas that ElasticSearch will be configured with on data synchronization initialization.
explicit_custom_meta: Specifies whether all user custom metadata will be indexed, or whether the user needs to configure (at the bucket level) which custom metadata entries should be indexed. Defaults to false.

Note: The endpoint address is the IP of the ES master node.

# radosgw-admin zone modify --rgw-zone=us-east-2 --tier-type=elasticsearch \
  --tier-config=endpoint=http://192.168.2.101:9200,num_shards=5,num_replicas=1
{
  "id": "7b2733a8-cbd6-4564-a509-b9abbb86f02a",
  "name": "us-east-2",
  "domain_root": "us-east-2.rgw.meta:root",
  "control_pool": "us-east-2.rgw.control",
  "gc_pool": "us-east-2.rgw.log:gc",
  "lc_pool": "us-east-2.rgw.log:lc",
  "log_pool": "us-east-2.rgw.log",
  "intent_log_pool": "us-east-2.rgw.log:intent",
  "usage_log_pool": "us-east-2.rgw.log:usage",
  "reshard_pool": "us-east-2.rgw.log:reshard",
  "user_keys_pool": "us-east-2.rgw.meta:users.keys",
  "user_email_pool": "us-east-2.rgw.meta:users.email",
  "user_swift_pool": "us-east-2.rgw.meta:users.swift",
  "user_uid_pool": "us-east-2.rgw.meta:users.uid",
  "otp_pool": "us-east-2.rgw.otp",
  "system_key": {
    "access_key": "MebOITA7uiemM3UeASMn",
    "secret_key": "PIZYauzILJlMG0MylUkBwnR73hA0FQ1qb0qvOxER"
  },
  "placement_pools": [
    {
      "key": "default-placement",
      "val": {
        "index_pool": "us-east-2.rgw.buckets.index",
        "storage_classes": {
          "STANDARD": {
            "data_pool": "us-east-2.rgw.buckets.data"
          }
        },
        "data_extra_pool": "us-east-2.rgw.buckets.non-ec",
        "index_type": 0
      }
    }
  ],
  "metadata_heap": "",
  "tier_config": {
    "endpoint": "http://192.168.2.101:9200",
    "num_replicas": 1,
    "num_shards": 5
  },
  "realm_id": "30114dc2-6e8d-41fa-9284-35e9fe8673eb"
}
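The --tier-config value is a comma-separated list of key=value pairs, which is easy to mistype by hand. The sketch below assembles it from a dictionary and prints the full command; the helper name is an assumption, and the script only prints the command rather than running it.

```python
def build_tier_config(options):
    """Join key/value pairs into the comma-separated --tier-config value."""
    return ",".join(f"{key}={value}" for key, value in options.items())


# Same values as in the zone modify command above.
tier_config = build_tier_config({
    "endpoint": "http://192.168.2.101:9200",
    "num_shards": 5,
    "num_replicas": 1,
})
command = [
    "radosgw-admin", "zone", "modify",
    "--rgw-zone=us-east-2",
    "--tier-type=elasticsearch",
    f"--tier-config={tier_config}",
]
print(" ".join(command))
```

Keeping the options in a dictionary also makes it simple to add explicit_custom_meta later without editing a long command line.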

Verify with Postman
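As a starting point for the request entered in Postman, the sketch below builds a metadata search URL. It assumes the query form described in the Ceph Elasticsearch sync module documentation (a GET on the bucket with a query parameter holding an expression such as name==value, plus an optional max-keys) and assumes the request goes to the us-east-2 gateway. The bucket name "media" and the object name are hypothetical, and the request still has to be signed with S3 credentials, which this sketch does not do.

```python
from urllib.parse import urlencode


def metadata_search_url(endpoint, bucket, expression, max_keys=None):
    """Build a metadata search URL of the form <endpoint>/<bucket>?query=<expr>."""
    params = {"query": expression}
    if max_keys is not None:
        params["max-keys"] = max_keys
    return f"{endpoint.rstrip('/')}/{bucket}?{urlencode(params)}"


# Endpoint of the us-east-2 gateway from the zone create step;
# the bucket and object names are made up for illustration.
url = metadata_search_url(
    "http://192.168.2.42:80", "media", "name==demo.mp4", max_keys=10
)
print(url)
```

In Postman, the same URL can be used with the AWS Signature authorization type and the access and secret keys of an RGW user.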


