How to install Hive Metastore in Kubernetes?

Introduction

The Hive Metastore is the service that stores table definitions, partitions, and schema metadata for engines such as Hive, Spark, and Trino. Running it in Kubernetes is common because, once its metadata lives in an external database, the service itself is stateless: pods can be restarted, rescheduled, or scaled without losing metadata.

Understand the Pieces Before You Deploy

A production-style metastore deployment has three parts:

  • A relational database such as PostgreSQL or MySQL for metadata
  • A Hive Metastore container that serves Thrift requests on port 9083
  • Kubernetes objects for configuration, secrets, deployment, and service discovery

The actual warehouse data does not live in the metastore pod. It stays in object storage or HDFS, while the metastore only stores definitions and locations.

Create Secrets and Hive Configuration

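The metastore needs JDBC credentials and a hive-site.xml file. The example below assumes PostgreSQL is already reachable from the cluster at postgres.default.svc.cluster.local and that an empty database named metastore, accessible to the hive user, already exists. schematool creates the tables, not the database itself.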

```yaml
apiVersion: v1
kind: Secret
metadata:
  name: hive-metastore-db
type: Opaque
stringData:
  username: hive
  password: hive-password
---
apiVersion: v1
kind: ConfigMap
metadata:
  name: hive-metastore-config
data:
  hive-site.xml: |
    <configuration>
      <property>
        <name>javax.jdo.option.ConnectionURL</name>
        <value>jdbc:postgresql://postgres.default.svc.cluster.local:5432/metastore</value>
      </property>
      <property>
        <name>javax.jdo.option.ConnectionDriverName</name>
        <value>org.postgresql.Driver</value>
      </property>
      <property>
        <name>javax.jdo.option.ConnectionUserName</name>
        <value>hive</value>
      </property>
      <property>
        <name>javax.jdo.option.ConnectionPassword</name>
        <value>hive-password</value>
      </property>
      <property>
        <name>hive.metastore.uris</name>
        <value>thrift://hive-metastore:9083</value>
      </property>
    </configuration>
```

Apply it with:

```bash
kubectl apply -f hive-metastore-config.yaml
```

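As written, the ConfigMap repeats the database password in plain text and nothing actually reads the Secret. Beyond a test cluster, inject the credentials from the Secret instead, for example through environment variables or a templated hive-site.xml if your image or chart supports it. The explicit example simply makes the required settings easy to see.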
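One way to consume the Secret is sketched below, under an assumption worth testing against your image: that your Hive release reads hive-site.xml through Hadoop 3's Configuration class, which expands ${env.NAME} references. The variable names HIVE_DB_USER and HIVE_DB_PASSWORD are hypothetical; only the changed fragments are shown.

```yaml
# Add to the container spec in both the schema Job and the Deployment.
# HIVE_DB_USER and HIVE_DB_PASSWORD are hypothetical variable names.
env:
  - name: HIVE_DB_USER
    valueFrom:
      secretKeyRef:
        name: hive-metastore-db   # the Secret defined above
        key: username
  - name: HIVE_DB_PASSWORD
    valueFrom:
      secretKeyRef:
        name: hive-metastore-db
        key: password
# Then reference the variables in hive-site.xml inside the ConfigMap
# (assumes Hadoop 3 ${env.NAME} expansion):
#   <name>javax.jdo.option.ConnectionUserName</name>
#   <value>${env.HIVE_DB_USER}</value>
#   <name>javax.jdo.option.ConnectionPassword</name>
#   <value>${env.HIVE_DB_PASSWORD}</value>
```

Because the expansion happens inside Hadoop rather than Kubernetes, a typo in the variable name surfaces as a database login failure, so check the schema Job logs first if authentication fails.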

Initialize the Metastore Schema

Before starting the service, initialize the schema in the database with schematool, the utility Hive ships for this step. Running it as a Kubernetes Job keeps initialization declarative and lets you check its outcome with kubectl.

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: hive-metastore-schema
spec:
  template:
    spec:
      restartPolicy: Never
      containers:
        - name: schema-init
          image: apache/hive:4.0.0
          command:
            - /bin/sh
            - -lc
            - |
              cp /config/hive-site.xml /opt/hive/conf/hive-site.xml
              /opt/hive/bin/schematool -dbType postgres -initSchema
          volumeMounts:
            - name: hive-config
              mountPath: /config
      volumes:
        - name: hive-config
          configMap:
            name: hive-metastore-config
```

Wait for the Job to complete, for example with kubectl wait --for=condition=complete job/hive-metastore-schema --timeout=300s, before deploying the metastore itself. If the schema is missing, the metastore pod will usually start and then crash when it queries tables that do not exist.
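schematool -initSchema fails when the tables already exist, so the Job above is effectively a one-shot. A sketch of a rerunnable variant follows, assuming Hive 4's schematool, whose -initOrUpgradeSchema option initializes an empty database or upgrades an older schema (the upstream apache/hive Docker entrypoint uses it for Hive 4). The backoffLimit and ttlSecondsAfterFinished values are illustrative.

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: hive-metastore-schema
spec:
  backoffLimit: 2                 # give up after two failed pods instead of the default six
  ttlSecondsAfterFinished: 3600   # let Kubernetes delete the finished Job after an hour
  template:
    spec:
      restartPolicy: Never
      containers:
        - name: schema-init
          image: apache/hive:4.0.0
          command:
            - /bin/sh
            - -lc
            - |
              cp /config/hive-site.xml /opt/hive/conf/hive-site.xml
              # Initializes an empty database or upgrades an older schema (Hive 4 schematool).
              /opt/hive/bin/schematool -dbType postgres -initOrUpgradeSchema
          volumeMounts:
            - name: hive-config
              mountPath: /config
      volumes:
        - name: hive-config
          configMap:
            name: hive-metastore-config
```

A Job's pod template cannot be changed after creation, so delete the earlier Job (or let ttlSecondsAfterFinished remove it) before applying this version.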

Deploy the Hive Metastore Service

Once the database schema exists, deploy the metastore container and expose it with a ClusterIP service.

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: hive-metastore
spec:
  replicas: 1
  selector:
    matchLabels:
      app: hive-metastore
  template:
    metadata:
      labels:
        app: hive-metastore
    spec:
      containers:
        - name: metastore
          image: apache/hive:4.0.0
          command:
            - /bin/sh
            - -lc
            - |
              cp /config/hive-site.xml /opt/hive/conf/hive-site.xml
              /opt/hive/bin/hive --service metastore
          ports:
            - containerPort: 9083
          volumeMounts:
            - name: hive-config
              mountPath: /config
      volumes:
        - name: hive-config
          configMap:
            name: hive-metastore-config
---
apiVersion: v1
kind: Service
metadata:
  name: hive-metastore
spec:
  selector:
    app: hive-metastore
  ports:
    - port: 9083
      targetPort: 9083
```

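After applying the manifest, check the rollout with kubectl get pods, kubectl logs deployment/hive-metastore, and kubectl get svc hive-metastore. Spark or Trino clients in the same namespace can then use thrift://hive-metastore:9083; clients in other namespaces need the fully qualified name, such as thrift://hive-metastore.<namespace>.svc.cluster.local:9083.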
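The client side of that connection is sketched below as a ConfigMap holding a Trino catalog file and Spark defaults. It assumes Trino's Hive connector (which reads hive.metastore.uri) and Spark built with Hive support (which reads spark.hadoop.hive.metastore.uris); the ConfigMap name hive-clients is hypothetical, and how you mount each file depends on your Trino and Spark deployments.

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: hive-clients   # hypothetical name
data:
  # Trino catalog file; mount it into Trino's etc/catalog/ directory.
  hive.properties: |
    connector.name=hive
    hive.metastore.uri=thrift://hive-metastore:9083
  # Spark defaults for jobs that should use the shared Hive catalog.
  spark-defaults.conf: |
    spark.sql.catalogImplementation hive
    spark.hadoop.hive.metastore.uris thrift://hive-metastore:9083
```

With the catalog mounted, running SHOW SCHEMAS FROM hive; in Trino should list the metastore's default schema, which confirms the engine, not just the pod, can reach the service.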

Operational Notes

In production, use a managed database or a highly available in-cluster database, add readiness probes, and pin an image version you have tested. If multiple compute engines share the metastore, plan schema upgrades like any database migration: back up the metastore database, run schematool -upgradeSchema, and confirm first that every client engine supports the new schema version.
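A minimal sketch of probes for the metastore container, assuming a plain TCP check on the Thrift port is an acceptable signal. It only confirms that the port accepts connections, not that the database behind it is reachable. The timing values are illustrative and should be tuned to how long your metastore takes to start.

```yaml
# Add under the metastore container in the Deployment.
readinessProbe:
  tcpSocket:
    port: 9083             # Thrift port served by the metastore
  initialDelaySeconds: 15
  periodSeconds: 10
livenessProbe:
  tcpSocket:
    port: 9083
  initialDelaySeconds: 60  # give the JVM time to start before restarts are possible
  periodSeconds: 20
  failureThreshold: 3
```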

Common Pitfalls

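  • Skipping schematool -initSchema is a common cause of metastore startup failures.
  • Using an in-memory or ephemeral database means metadata disappears when the pod restarts.
  • A missing JDBC driver (the PostgreSQL jar must be on the metastore classpath, typically under /opt/hive/lib) or a dbType value that does not match the database will break the schema job.
  • Keep the Service internal (ClusterIP). Unless authentication such as Kerberos is configured, the Thrift endpoint does not authenticate clients, so do not expose it publicly without a strong reason and proper network controls.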
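The upstream apache/hive image documentation mounts the PostgreSQL JDBC jar into /opt/hive/lib separately, so a missing driver is worth ruling out first. A sketch that fetches the jar in an init container and mounts it as a single file is shown below. It assumes the pod can reach Maven Central; the driver version and the curlimages/curl image tag are illustrative, and baking the jar into a custom image is the sturdier choice for production.

```yaml
# Pod spec fragment, usable in both the schema Job and the Deployment.
initContainers:
  - name: fetch-jdbc-driver
    image: curlimages/curl:8.7.1
    command:
      - curl
      - -fsSL                      # fail on HTTP errors, stay quiet, follow redirects
      - -o
      - /drivers/postgresql.jar
      - https://repo1.maven.org/maven2/org/postgresql/postgresql/42.7.3/postgresql-42.7.3.jar
    volumeMounts:
      - name: jdbc-drivers
        mountPath: /drivers
containers:
  - name: metastore
    # image, command, and ports unchanged from the Deployment above
    volumeMounts:
      - name: hive-config
        mountPath: /config
      - name: jdbc-drivers
        mountPath: /opt/hive/lib/postgresql.jar   # mount one file so the rest of lib/ stays visible
        subPath: postgresql.jar
volumes:
  - name: hive-config
    configMap:
      name: hive-metastore-config
  - name: jdbc-drivers
    emptyDir: {}
```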

Summary

  • Hive Metastore runs well in Kubernetes because the service is stateless and the metadata lives in a database.
  • You need secrets, a valid hive-site.xml, schema initialization, and a deployment that serves port 9083.
  • Initialize the schema before starting the metastore service.
  • Verify connectivity from the engines that will consume the metastore, not just from the pod itself.
