HBase ORM is a light-weight, production-grade, thread-safe and performant library that enables object-oriented access of HBase rows (Data Access Object) with minimal code and good testability.
It can also be used as an ORM for Bigtable; see the Bigtable section further down for details.
Let’s say you have an HBase table citizens with row-key format of country_code#UID. Now, let’s say this table is created with three column families main, optional and tracked, which may have columns (qualifiers) uid, name, salary etc.
This library enables you to represent your HBase table as a bean-like class, as below:
@HBTable(namespace = "govt", name = "citizens",
    families = {
        @Family(name = "main"),
        @Family(name = "optional", versions = 3),
        @Family(name = "tracked", versions = 10)
    }
)
public class Citizen implements HBRecord<String> {
    // Row-key components (not mapped to columns)
    private String countryCode;
    private Integer uid;

    @HBColumn(family = "main", column = "name")
    private String name;
    @HBColumn(family = "optional", column = "age")
    private Short age;
    @HBColumn(family = "optional", column = "income")
    private Integer annualIncome;
    @HBColumn(family = "optional", column = "registration_date")
    private LocalDateTime registrationDate;
    @HBColumn(family = "optional", column = "counter")
    private Long counter;
    @HBColumn(family = "optional", column = "custom_details")
    private Map<String, Integer> customDetails;
    @HBColumn(family = "optional", column = "dependents")
    private Dependents dependents;
    @HBColumnMultiVersion(family = "tracked", column = "phone_number")
    private NavigableMap<Long, Integer> phoneNumber;
    @HBColumn(family = "optional", column = "pincode", codecFlags = {
        @Flag(name = BestSuitCodec.SERIALIZE_AS_STRING, value = "true")
    })
    private Integer pincode;

    @Override
    public String composeRowKey() {
        return String.format("%s#%d", countryCode, uid);
    }

    @Override
    public void parseRowKey(String rowKey) {
        String[] pieces = rowKey.split("#");
        this.countryCode = pieces[0];
        this.uid = Integer.parseInt(pieces[1]);
    }

    // Constructors, getters and setters
}
That is,
The class Citizen represents the HBase table citizens in namespace govt, using the @HBTable annotation.
Conversion of an HBase row key to the fields of Citizen objects and vice-versa are implemented using the parseRowKey and composeRowKey methods respectively.
The data type of the row key is the type parameter of the HBRecord generic interface (in the above case, String).
Note that String is both ‘Comparable with itself’ and Serializable.
Columns and their column families are mapped to fields using the @HBColumn or @HBColumnMultiVersion annotations.
Fields may be of simple data types (e.g. String, Integer), generic data types (e.g. Map, List), custom class (e.g. Dependents) or even generics of custom class (e.g. List<Dependent>).
The @HBColumnMultiVersion annotation allows you to map multiple versions of a column in a NavigableMap<Long, ?>. In the above example, field phoneNumber is mapped to column phone_number within the column family tracked (which is configured for multiple versions).
Alternatively, you can model your class as below:
class CitizenKey implements Serializable, Comparable<CitizenKey> {
    String countryCode;
    Integer uid;

    @Override
    public int compareTo(CitizenKey key) {
        // your custom logic involving countryCode and uid
    }
}

public class Citizen implements HBRecord<CitizenKey> {
    private CitizenKey rowKey;

    @Override
    public CitizenKey composeRowKey() {
        return rowKey;
    }

    @Override
    public void parseRowKey(CitizenKey rowKey) {
        this.rowKey = rowKey;
    }
}
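The compareTo body in the CitizenKey example is left open. One possible way to fill it in is sketched below, assuming keys should order by countryCode first and then by uid. That ordering, the constructor, toString and the CitizenKeyDemo class are illustrative assumptions, not something the library prescribes.

```java
import java.io.Serializable;
import java.util.Comparator;
import java.util.TreeSet;

// Hypothetical row-key type; field names follow the CitizenKey example above.
class CitizenKey implements Serializable, Comparable<CitizenKey> {
    private static final long serialVersionUID = 1L;

    // Assumed ordering: countryCode first, then uid (numerically).
    private static final Comparator<CitizenKey> ORDER =
            Comparator.comparing((CitizenKey k) -> k.countryCode)
                      .thenComparing(k -> k.uid);

    final String countryCode;
    final Integer uid;

    CitizenKey(String countryCode, Integer uid) {
        this.countryCode = countryCode;
        this.uid = uid;
    }

    @Override
    public int compareTo(CitizenKey other) {
        return ORDER.compare(this, other);
    }

    @Override
    public String toString() {
        return countryCode + "#" + uid;
    }
}

class CitizenKeyDemo {
    public static void main(String[] args) {
        TreeSet<CitizenKey> keys = new TreeSet<>();
        keys.add(new CitizenKey("IND", 10));
        keys.add(new CitizenKey("USA", 1));
        keys.add(new CitizenKey("IND", 2));
        System.out.println(keys); // [IND#2, IND#10, USA#1]
    }
}
```

A production key class would normally also override equals and hashCode so that they agree with compareTo.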
See source files Citizen.java and Employee.java for detailed examples. Specifically, Employee.java demonstrates the “column inheritance” feature of this library, which is useful if you have many HBase tables with a common set of columns.
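A field annotated with @HBColumnMultiVersion, such as phoneNumber in the Citizen example, is an ordinary NavigableMap once populated, so standard JDK calls apply to it. The sketch below assumes the Long keys are HBase cell timestamps (version identifiers) and uses made-up values; the class and method names are hypothetical.

```java
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

class PhoneNumberVersions {
    // Returns the value with the highest key, or null for an empty map.
    static Integer latest(NavigableMap<Long, Integer> versions) {
        Map.Entry<Long, Integer> last = versions.lastEntry();
        return last == null ? null : last.getValue();
    }

    // Returns the value that was current at the given timestamp, or null if none existed yet.
    static Integer asOf(NavigableMap<Long, Integer> versions, long timestamp) {
        Map.Entry<Long, Integer> floor = versions.floorEntry(timestamp);
        return floor == null ? null : floor.getValue();
    }

    public static void main(String[] args) {
        NavigableMap<Long, Integer> phoneNumber = new TreeMap<>();
        phoneNumber.put(1000L, 5550001); // illustrative timestamps and numbers
        phoneNumber.put(2000L, 5550002);
        phoneNumber.put(3000L, 5550003);

        System.out.println(latest(phoneNumber));      // 5550003
        System.out.println(asOf(phoneNumber, 2500L)); // 5550002
        System.out.println(asOf(phoneNumber, 500L));  // null
    }
}
```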
The data types Boolean, Short, Integer, Long, Float, Double, String and BigDecimal are serialized using HBase’s native methods (see: Bytes).
null is serialized as null.
The codecFlags parameter (supported by both @HBColumn and @HBColumnMultiVersion annotations) can be used to pass custom flags to the underlying codec (e.g. you may want your codec to serialize field Integer id in the Citizen class differently from field Integer id in the Employee class).
The default codec, BestSuitCodec, takes a flag BestSuitCodec.SERIALIZE_AS_STRING, whose value is “serializeAsString” (as in the above Citizen class example). When this flag is set to true on a field, the default codec serializes that field (even numerical fields) as strings.
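To make the effect of serializing a number as a string concrete, here is a byte-level sketch using only the JDK. It is not the library's codec code: asBinary produces the 4-byte big-endian layout that HBase's Bytes.toBytes(int) uses, and asString assumes the text form is the decimal digits in UTF-8. The class and method names are hypothetical.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

class PincodeEncodings {
    // 4-byte big-endian form, the same layout HBase's Bytes.toBytes(int) produces.
    static byte[] asBinary(int value) {
        return ByteBuffer.allocate(Integer.BYTES).putInt(value).array();
    }

    // Decimal digits encoded as UTF-8 text (assumed shape of the string form).
    static byte[] asString(int value) {
        return String.valueOf(value).getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        int pincode = 560034;
        System.out.println(asBinary(pincode).length); // 4
        System.out.println(asString(pincode).length); // 6
        System.out.println(new String(asString(pincode), StandardCharsets.UTF_8)); // 560034
    }
}
```

The string form is human-readable in tools such as the HBase shell, while the binary form has a fixed width.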
This library provides an abstract class to define your own data access object. For example, you can create one for the Citizen class in the above example as follows:
import org.apache.hadoop.hbase.client.Connection;
import com.flipkart.hbaseobjectmapper.AbstractHBDAO;
import java.io.IOException;

public class CitizenDAO extends AbstractHBDAO<String, Citizen> {
    // in above, String is the 'row type' of Citizen

    public CitizenDAO(Connection connection) throws IOException {
        super(connection); // if you need to customize your codec, you may use super(connection, codec)
        // alternatively, you can construct CitizenDAO by passing instance of 'org.apache.hadoop.conf.Configuration'
    }
}
(see CitizenDAO.java)
Additionally, you can create asynchronous DAO objects by extending ReactiveHBDAO, as illustrated in the snippet below. Reactive DAOs are based on Java’s CompletableFuture abstraction and require an AsyncConnection object to be supplied rather than a Connection.
import com.flipkart.hbaseobjectmapper.ReactiveHBDAO;
import com.flipkart.hbaseobjectmapper.testcases.entities.Citizen;
import org.apache.hadoop.hbase.client.AsyncConnection;

public class CitizenDAO extends ReactiveHBDAO<String, Citizen> {
    // in above, String is the 'row type' of Citizen

    public CitizenDAO(AsyncConnection connection) {
        super(connection); // if you need to customize your codec, you may use super(connection, codec)
        // alternatively, you can construct CitizenDAO by passing instance of 'org.apache.hadoop.conf.Configuration'
    }
}
(see reactive/CitizenDAO.java)
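Because reactive DAOs hand back CompletableFuture values, callers typically chain further steps instead of blocking. The sketch below shows that style with the JDK only. The fetchName method is a hypothetical stand-in for an asynchronous read and is NOT part of this library's API; the data is made up.

```java
import java.util.Map;
import java.util.concurrent.CompletableFuture;

class ReactiveUsageSketch {
    // Hypothetical stand-in for an asynchronous DAO read; NOT the library's API.
    static CompletableFuture<String> fetchName(String rowKey) {
        Map<String, String> fakeTable = Map.of("IND#1", "Abdul", "IND#2", "Kalam");
        return CompletableFuture.supplyAsync(() -> fakeTable.get(rowKey));
    }

    // Chains a transformation without blocking the calling thread.
    static CompletableFuture<Integer> nameLength(String rowKey) {
        return fetchName(rowKey)
                .thenApply(name -> name == null ? 0 : name.length());
    }

    public static void main(String[] args) {
        // Combine two independent lookups, then block once at the edge of the program.
        String both = fetchName("IND#1")
                .thenCombine(fetchName("IND#2"), (a, b) -> a + " " + b)
                .join();
        System.out.println(both);                       // Abdul Kalam
        System.out.println(nameLength("IND#9").join()); // 0
    }
}
```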
Once defined, you can instantiate your data access object as below:
CitizenDAO citizenDao = new CitizenDAO(connection);
Side note: As you’d know, HBase’s Connection creation is a heavy-weight operation (Details: Connection). So, it is recommended that you create a Connection instance once and use it for the entire life cycle of your program across all the DAO classes that you create (such as above).
Now, you can access, manipulate and persist records of the citizens table as shown in the examples below. Note that the signatures of the reactive DAO may differ; all of its operations return CompletableFutures.
Create new record:
String rowKey = citizenDao.persist(new Citizen("IND", 1, /* more params */));
// In above, output of 'persist' is a String, because Citizen class implements HBRecord<String>
Fetch a single record by its row key:
// Fetch row from "citizens" HBase table whose row key is "IND#1":
Citizen pe = citizenDao.get("IND#1");
Fetch multiple records by their row keys:
Citizen[] ape = citizenDao.get(new String[] {"IND#1", "IND#2"}); //bulk get
Fetch records by range of row keys (start row key, end row key):
List<Citizen> lpe1 = citizenDao.get("IND#1", "IND#5");
// above uses default behavior: start key inclusive, end key exclusive, 1 version
List<Citizen> lpe2 = citizenDao.get("IND#1", true, "IND#9", true, 5, 10000);
// above fetches with: start key inclusive, end key inclusive, 5 versions, caching set to 10,000 rows
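HBase orders rows by comparing row-key bytes lexicographically, so a range over textual keys follows string order rather than numeric order. The sketch below imitates that with a TreeMap of String keys, assuming String row keys are stored as their text bytes. The RowKeyRangeSketch class and its table helper are hypothetical, and no HBase call is involved.

```java
import java.util.Locale;
import java.util.NavigableMap;
import java.util.TreeMap;

class RowKeyRangeSketch {
    // Builds a sorted stand-in for a table whose keys are rendered with the given format.
    static NavigableMap<String, String> table(String format, int... uids) {
        NavigableMap<String, String> rows = new TreeMap<>();
        for (int uid : uids) {
            rows.put(String.format(Locale.ROOT, format, uid), "citizen-" + uid);
        }
        return rows;
    }

    public static void main(String[] args) {
        // Unpadded keys: "IND#10" and "IND#12" sort between "IND#1" and "IND#2".
        NavigableMap<String, String> unpadded = table("IND#%d", 1, 2, 5, 10, 12);
        System.out.println(unpadded.subMap("IND#1", true, "IND#5", false).keySet());
        // [IND#1, IND#10, IND#12, IND#2]

        // Zero-padded keys sort in numeric order.
        NavigableMap<String, String> padded = table("IND#%09d", 1, 2, 5, 10, 12);
        System.out.println(padded.subMap("IND#000000001", true, "IND#000000005", false).keySet());
        // [IND#000000001, IND#000000002]
    }
}
```

If numeric ordering within a range matters, fixed-width (zero-padded) identifiers in the row key are one way to get it.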
Iterate over large number of records by range of row keys:
// Read records from 'IND#000000001' (inclusive) to 'IND#100000000' (exclusive):
try (Records<Citizen> citizens = citizenDao.records("IND#000000001", "IND#100000000")) {
    for (Citizen citizen : citizens) {
        // your code
    }
}

// Read records from 'IND#000000001' (inclusive) to 'IND#100000000' (inclusive) with
// 5 versions, caching set to 1,000 rows:
try (Records<Citizen> citizens = citizenDao.records("IND#000000001", true, "IND#100000000", true, 5, 1000)) {
    for (Citizen citizen : citizens) {
        // your code
    }
}
Note: All the .records(...) methods efficiently use iterators internally and do not load records upfront into memory. Hence, it is safe to fetch millions of records using them.
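The general idea of an iterator-backed, closeable result set can be sketched with the JDK alone. LazyRows below is a hypothetical stand-in and NOT the library's Records class: it creates each row only when the loop asks for it and records whether it was closed.

```java
import java.util.Iterator;
import java.util.Locale;
import java.util.NoSuchElementException;

// Hypothetical stand-in for a scanner-backed result set; NOT the library's Records class.
class LazyRows implements Iterable<String>, AutoCloseable {
    private final int total;
    int produced = 0;       // how many rows have been materialized so far
    boolean closed = false;

    LazyRows(int total) {
        this.total = total;
    }

    @Override
    public Iterator<String> iterator() {
        return new Iterator<String>() {
            @Override
            public boolean hasNext() {
                return !closed && produced < total;
            }

            @Override
            public String next() {
                if (!hasNext()) {
                    throw new NoSuchElementException();
                }
                produced++; // a row is created only when the caller asks for it
                return String.format(Locale.ROOT, "IND#%09d", produced);
            }
        };
    }

    @Override
    public void close() {
        closed = true; // a real implementation would release the underlying scanner here
    }
}

class LazyRowsDemo {
    public static void main(String[] args) {
        LazyRows rows = new LazyRows(100_000_000);
        try (rows) {
            for (String rowKey : rows) {
                if (rowKey.endsWith("3")) {
                    break; // stop early; the remaining rows are never created
                }
            }
        }
        System.out.println(rows.produced); // 3
        System.out.println(rows.closed);   // true
    }
}
```

The try-with-resources form matters for the same reason as in the examples above: the resource is released even when the loop exits early or throws.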
Fetch records by row key prefix:
List<Citizen> lpe3 = citizenDao.getByPrefix(citizenDao.toBytes("IND#"));
Iterate over large number of records by row key prefix:
try (Records<Citizen> citizens = citizenDao.recordsByPrefix(citizenDao.toBytes("IND#"))) {
    for (Citizen citizen : citizens) {
        // do something
    }
}
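A prefix lookup can be thought of as a range scan whose start row is the prefix and whose stop row is the smallest byte string greater than every key with that prefix. The sketch below shows one common way to compute such a stop row. It is an illustration of the idea, not necessarily how getByPrefix or recordsByPrefix work internally, and the PrefixRange class is hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class PrefixRange {
    // Smallest byte string that is greater than every key starting with prefix,
    // or an empty array when no such bound exists (prefix empty or all 0xFF bytes).
    static byte[] stopRowForPrefix(byte[] prefix) {
        int end = prefix.length;
        while (end > 0 && prefix[end - 1] == (byte) 0xFF) {
            end--; // trailing 0xFF bytes cannot be incremented, so drop them
        }
        if (end == 0) {
            return new byte[0];
        }
        byte[] stop = Arrays.copyOf(prefix, end);
        stop[end - 1]++; // increment the last remaining byte
        return stop;
    }

    public static void main(String[] args) {
        byte[] prefix = "IND#".getBytes(StandardCharsets.UTF_8);
        byte[] stop = stopRowForPrefix(prefix);
        System.out.println(new String(stop, StandardCharsets.UTF_8)); // IND$
    }
}
```

In HBase an empty stop row conventionally means "scan to the end of the table", which is why the no-bound case returns an empty array here.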
Fetch records by HBase’s native Scan object (for very advanced access patterns):
Scan scan = new Scan().setAttribute(...)
        .setReadType(...)
        .setACL(...)
        .withStartRow(...)
        .withStopRow(...)
        .readAllVersions(); // takes no arguments
try (Records<Citizen> citizens = citizenDao.records(scan)) {
    for (Citizen citizen : citizens) {
        // do something
    }
}
Fetch specific field(s) for given row key(s):
// for row keys in range ["IND#1", "IND#5"), fetch 3 versions of field 'phoneNumber':
NavigableMap<String, NavigableMap<Long, Object>> phoneNumberHistory
= citizenDao.fetchFieldValues("IND#1", "IND#5", "phoneNumber", 3);
// bulk variants of above range method are also available
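The nested map returned for field values can be processed with ordinary collection code. The sketch below assumes the outer key is the row key and the inner Long key identifies the version (typically the cell timestamp). It works on hand-built sample data, and the FieldHistorySketch class and its method are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

class FieldHistorySketch {
    // Returns the row keys whose field took more than one distinct value across versions.
    static List<String> rowsWithChanges(NavigableMap<String, NavigableMap<Long, Object>> history) {
        List<String> changed = new ArrayList<>();
        for (Map.Entry<String, NavigableMap<Long, Object>> row : history.entrySet()) {
            if (new HashSet<>(row.getValue().values()).size() > 1) {
                changed.add(row.getKey());
            }
        }
        return changed;
    }

    public static void main(String[] args) {
        // Hand-built sample in the same shape as the fetchFieldValues result; values are made up.
        NavigableMap<String, NavigableMap<Long, Object>> history = new TreeMap<>();

        NavigableMap<Long, Object> first = new TreeMap<>();
        first.put(10L, 5550001);
        first.put(20L, 5550002);
        history.put("IND#1", first);

        NavigableMap<Long, Object> second = new TreeMap<>();
        second.put(10L, 5550009);
        history.put("IND#2", second);

        System.out.println(rowsWithChanges(history)); // [IND#1]
    }
}
```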
Read data from HBase using HBase’s native Get:
Get get1 = citizenDao.getGet("IND#2");
// above returns object of HBase's Get corresponding to row key "IND#2", to enable advanced read patterns
Citizen citizen1 = citizenDao.getOnGet(get1);
Get get2 = citizenDao.getGet("IND#2").setTimeRange(1, 5).setMaxVersions(2); // Advanced HBase row fetch
Citizen citizen2 = citizenDao.getOnGet(get2);
Manipulate and persist an object back to HBase:
// change a field:
pe.setPincode(560034);
// Save the record back to HBase:
citizenDao.persist(pe);
Delete records in various ways:
// Delete a row by its object reference:
citizenDao.delete(pe);
// Delete multiple rows by list of object references:
citizenDao.delete(Arrays.asList(pe1, pe2));
// Delete a row by its row key:
citizenDao.delete("IND#2");
// Delete a bunch of rows by their row keys:
citizenDao.delete(new String[] {"IND#3", "IND#4"});
Increment a column in HBase:
// Increment value of counter by 3:
citizenDao.increment("IND#2", "counter", 3L);
Append to a column:
citizenDao.append("IND#2", "name", " Kalam");
// there are 'bulk methods' available
Other operations:
Table citizenTable = citizenDao.getHBaseTable();
// in case you want to directly call HBase's native methods
(see TestsAbstractHBDAO.java for more detailed examples)
Please note: Since we’re dealing with HBase (and not a classical RDBMS), fitting a Hibernate-like ORM may not make sense. So, this library does not intend to evolve as a full-fledged ORM. However, if that’s your intent, I suggest you use Apache Phoenix.
The provided HBAdmin class helps you programmatically create/delete tables.
You may instantiate the class using a Connection or AsyncConnection object:
import org.apache.hadoop.hbase.client.Connection;
import com.flipkart.hbaseobjectmapper.HBAdmin;
// some code
HBAdmin hbAdmin = HBAdmin.create(connection);
Once instantiated, you may do the following DDL operations:
hbAdmin.createTable(Citizen.class);
// Above statement creates table with name and column families specification as per
// the @HBTable annotation on the Citizen class
hbAdmin.tableExists(Citizen.class); // returns true/false
hbAdmin.disableTable(Citizen.class);
hbAdmin.deleteTable(Citizen.class);
Note that DDL operations on HBase are typically heavy and time-consuming.
The HBObjectMapper class in this library provides useful methods such as the ones below:
Result writeValueAsResult(T record)
T readValue(Result result, Class<T> clazz)
where T is your bean-like class that implements this library’s HBRecord interface (e.g. the Citizen class above).
Using these, you can convert your object to HBase’s Result and vice versa.
Read article.
Being an object mapper, this library works for predefined columns only. For example, this library doesn’t provide ways to fetch:
If you are using Maven, add the below entry within the dependencies section of your pom.xml:
<dependency>
    <groupId>com.flipkart</groupId>
    <artifactId>hbase-object-mapper</artifactId>
    <version>1.19</version>
</dependency>
See artifact details: com.flipkart:hbase-object-mapper on Maven Central.
If you’re using Gradle or Ivy or SBT, see how to include this library in your build: com.flipkart:hbase-object-mapper:1.19.
To build this project, follow the simple steps below:
Do a git clone of this repository
Run git checkout v1.19
Run mvn clean install from shell
Note that mvn build times can sometimes be longer, depending on your machine configuration.
The start-up timeout of the in-memory cluster used during the build can be adjusted by setting the environment variable INMEMORY_CLUSTER_START_TIMEOUT (seconds). For example, on Linux you may run the command export INMEMORY_CLUSTER_START_TIMEOUT=8 on terminal, before running the aforementioned mvn command.
To run against a real HBase setup instead, set the USE_REAL_HBASE environment variable to true. HBase settings are then expected to come from hbase-site.xml.
The change log can be found in the releases section.
If you intend to request a feature or report a bug, you may use GitHub Issues for hbase-orm.
Google’s Cloud Bigtable provides first-class support for accessing Bigtable using HBase client.
This library can be used as a Bigtable ORM in 3 simple steps:
Add the Cloud Bigtable HBase client library (which provides the BigtableConfiguration class used below) to your dependencies.
Instantiate HBase’s Connection class as below:
import com.google.cloud.bigtable.hbase.BigtableConfiguration;
import org.apache.hadoop.hbase.client.Connection;
// some code
Connection connection = BigtableConfiguration.connect(projectId, instanceId);
// some code
Use this Connection instance as mentioned earlier in this README, to create your DAO class.
That’s it! Now you’re all set to access Bigtable.
Copyright 2023 Flipkart Internet Pvt Ltd.
Licensed under the Apache License, version 2.0 (the “License”). You may not use this product or its source code except in compliance with the License.