Apache Accumulo

Secure Data Delivery for Government Applications

Apache Accumulo is an ideal solution for government agencies looking for a secure, distributed NoSQL data store to serve their most performance-intensive Big Data applications.

Accumulo is an open source project, integrated with CDH, that stores data in massive tables (billions of rows, millions of columns) for fast, random access. The National Security Agency (NSA) created Accumulo and contributed it to the Apache Software Foundation, and it has quickly gained adoption as a Hadoop-based key/value store for applications that require access to sensitive data sets.

Primary Accumulo Use Cases

Securely serve data to many users

Initiatives like Open Government and efforts in various sectors (such as Intelligence) designed to make better use of Big Data require applications that can scale to unprecedented levels of capacity, serve larger populations of users, and maintain the most stringent levels of security.

Accumulo forms the foundation for these applications by providing a distributed data platform (built on Hadoop) where storage, memory, and CPU resources can be scaled horizontally as load and performance demands increase. In addition to linear scalability, Accumulo delivers the granular, cell-level access controls required to serve data to diverse sets of users with varying levels of permissions and security clearance.
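To make cell-level access control concrete, the following is a minimal sketch of the idea in plain Python, not Accumulo's client API. It assumes each cell carries a visibility label built from authorization tokens combined with & (and), | (or), and parentheses, and that a reader sees a cell only if the tokens they hold satisfy that label. The function names, the sample tokens, and the rule that & and | cannot be mixed without parentheses are assumptions made for this illustration.

```python
import re

# One token is an authorization name or one of the operators & | ( )
TOKEN = re.compile(r"\s*([A-Za-z0-9_]+|[&|()])")


def tokenize(expr):
    """Split a visibility label into tokens, rejecting unknown characters."""
    expr = expr.strip()
    tokens, pos = [], 0
    while pos < len(expr):
        match = TOKEN.match(expr, pos)
        if not match:
            raise ValueError(f"bad character at offset {pos} in {expr!r}")
        tokens.append(match.group(1))
        pos = match.end()
    return tokens


def is_visible(expr, auths):
    """Return True if a reader holding `auths` may see a cell labeled `expr`."""
    tokens = tokenize(expr)
    if not tokens:
        return True  # assumption: an unlabeled cell is readable by everyone
    pos = 0

    def parse_expr():
        nonlocal pos
        result = parse_term()
        first_op = None
        while pos < len(tokens) and tokens[pos] in ("&", "|"):
            op = tokens[pos]
            if first_op is None:
                first_op = op
            elif op != first_op:
                raise ValueError("mixing & and | requires parentheses")
            pos += 1
            rhs = parse_term()
            result = (result and rhs) if op == "&" else (result or rhs)
        return result

    def parse_term():
        nonlocal pos
        if pos >= len(tokens):
            raise ValueError("unexpected end of expression")
        tok = tokens[pos]
        pos += 1
        if tok == "(":
            result = parse_expr()
            if pos >= len(tokens) or tokens[pos] != ")":
                raise ValueError("missing closing parenthesis")
            pos += 1
            return result
        if tok in ("&", "|", ")"):
            raise ValueError(f"unexpected token {tok!r}")
        return tok in auths  # a bare name is satisfied if the reader holds it

    result = parse_expr()
    if pos != len(tokens):
        raise ValueError(f"unexpected token {tokens[pos]!r}")
    return result


def visible_cells(cells, auths):
    """Filter (row, column, label, value) cells down to what `auths` may read."""
    return [(row, col, value)
            for row, col, label, value in cells
            if is_visible(label, auths)]
```

With this sketch, a reader holding only the hypothetical token "analyst" would see cells labeled "analyst" or "analyst|auditor" but not cells labeled "analyst&secret", so two users querying the same table can receive different results.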

Providing fast, random read/write access for Hadoop

HDFS is a massively scalable file system tuned for data processing and analytic workloads. It's optimized for scan performance and doesn't provide record mutability. Accumulo augments HDFS by providing record-based storage that allows users and applications to perform fast, random reads and writes to data. Changes are cataloged in memory and eventually pushed down to HDFS for persistence. This enables the Hadoop system to serve random reads and writes to users and applications across big tables in real time.
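The write path described above can be pictured with a small, self-contained sketch. This is a toy model in plain Python rather than Accumulo code: writes land in an in-memory map, the map is periodically flushed to an immutable sorted run (standing in for a file on HDFS), and reads check memory first and then the runs from newest to oldest. The class name, the flush threshold, and the in-process list used in place of HDFS are all assumptions made for illustration.

```python
import bisect


class TinyTable:
    """Toy record store: mutable in-memory buffer plus immutable sorted runs."""

    def __init__(self, flush_threshold=3):
        self.memtable = {}            # recent changes, held in memory
        self.flushed = []             # immutable sorted runs, oldest first
        self.flush_threshold = flush_threshold

    def put(self, key, value):
        """Record a write in memory; flush once the buffer is large enough."""
        self.memtable[key] = value
        if len(self.memtable) >= self.flush_threshold:
            self.flush()

    def flush(self):
        """Persist the buffer as one sorted, never-modified run."""
        if self.memtable:
            self.flushed.append(tuple(sorted(self.memtable.items())))
            self.memtable = {}

    def get(self, key):
        """Random read: newest data wins, so check memory, then newest run first."""
        if key in self.memtable:
            return self.memtable[key]
        for run in reversed(self.flushed):
            # (key,) sorts just before (key, value), so this finds the key's slot
            i = bisect.bisect_left(run, (key,))
            if i < len(run) and run[i][0] == key:
                return run[i][1]
        return None
```

Because flushed runs are never edited in place, an update is simply a newer entry that shadows the older one. That is how a record-oriented store can sit on top of a file system that does not support modifying records.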

Key Features of Apache Accumulo

Scale-out architecture – reduce data movement and avoid duplicating storage in specialized systems by performing interactive analysis directly on full-fidelity data

Security – table-level and cell-level access controls

Full consistency – guard against node failures or simultaneous writes to the same record

High availability – multiple master nodes ensure continuous access to data

Automatic sharding – transparently and efficiently scale out your data across machines in the cluster

Server-side programming – perform additional processing on key/value pairs locally through iterators to minimize data movement
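As an illustration of the server-side programming feature, the sketch below models in plain Python why processing key/value pairs where they are stored reduces data movement. It is not Accumulo's iterator API. Each inner list stands in for the data held by one server, and summing_iterator and scan are hypothetical names chosen for this example.

```python
from itertools import groupby
from operator import itemgetter


def summing_iterator(sorted_pairs):
    """Collapse consecutive entries that share a key into one summed entry."""
    for key, group in groupby(sorted_pairs, key=itemgetter(0)):
        yield key, sum(value for _, value in group)


def scan(shards, iterator=None):
    """Return every entry the client receives from all shards.

    When an iterator is supplied, it runs against each shard's local data
    before anything is returned, so only its output crosses the network.
    """
    received = []
    for shard in shards:
        local = sorted(shard)                      # each shard keeps keys sorted
        stream = iterator(local) if iterator else local
        received.extend(stream)
    return received


shards = [
    [("clicks", 1), ("clicks", 4), ("views", 2)],  # data on the first server
    [("views", 5), ("views", 1)],                  # data on the second server
]

raw = scan(shards)                                 # every stored entry is shipped
summed = scan(shards, summing_iterator)            # one entry per key per shard
```

Here the plain scan returns five entries while the scan with the summing step returns three, and the client only has to combine the per-shard partial sums. The gap widens as the number of entries per key grows.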

Get Support for Accumulo with Cloudera Enterprise

Cloudera Enterprise is the best way to leverage the power of Apache Accumulo in production environments. When you deploy Accumulo with Cloudera Enterprise Flex Edition or Data Hub Edition as part of an enterprise data hub, you can rely on our market-leading technical support for Accumulo and actively influence the future of the project.

Learn More About Cloudera Enterprise