SAP HANA - An Introduction for Beginners
SAP HANA (High-Performance Analytic Appliance) is an in-memory database from SAP. It stores data and analyzes large volumes of non-aggregated transactional data in real time, with performance that makes it well suited to decision support and predictive analysis.
The In-Memory Computing Engine uses cache-conscious data structures and algorithms that build on both hardware innovations and SAP software technology. It is designed for real-time OLTP and OLAP in one appliance, i.e. an end-to-end solution from transaction processing to high-performance analytics. SAP HANA can also be used as a secondary database to accelerate analytics on existing applications.
Hardware Innovations Leading to SAP HANA
In the real world there is a wide variety of data sources, e.g. unstructured data, operational data stores, data marts, data warehouses and online analytical stores. Doing analytics or information mining on this Big Data in real time runs into hurdles such as latency, high cost and complexity.
In the past, disk I/O was the performance bottleneck, and in-memory computing was always much faster. However, the cost of in-memory computing used to be prohibitive for any large-scale implementation. Now, with multi-core CPUs and high-capacity RAM, the entire database can be hosted in memory. As a result, the CPU now waits for data to be loaded from main memory into the CPU cache, and that is the performance bottleneck today.
This is a total paradigm shift: tape is dead, disk is tape, main memory is disk and CPU cache is main memory. HANA is optimized to exploit the parallel processing capabilities of modern multi-core CPU architectures. With this architecture, SAP applications can benefit from current hardware technologies.
Memory Overview - Where we stand
Let us take a quick look at multi-core CPU caches, main memory (RAM) and the traditional hard disk with respect to response time. The figures are approximate access times and typical sizes.
- L1 cache | ~1 ns | 64 KB | Primary cache, within each core (SRAM, fastest)
- L2 cache | ~5 ns | 256 KB | Intermediate cache, within each core (SRAM, slower than L1)
- L3 cache | ~20 ns | 8 MB | Shared across all cores (SRAM, slowest of the caches)
- Main memory | ~100 ns | TBs | DRAM
- Hard disk | > 1,000,000 ns | TBs
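To make the gaps between these levels concrete, the following minimal sketch divides the approximate latencies from the list above by a chosen baseline. The figures are the rough values quoted in this article, not measurements, and the function name slowdown_vs is purely illustrative.

```python
# Approximate access latencies in nanoseconds, copied from the list above.
# The hard disk figure is a lower bound (the article gives "> 1,000,000 ns").
LATENCY_NS = {
    "L1 cache": 1,
    "L2 cache": 5,
    "L3 cache": 20,
    "Main memory": 100,
    "Hard disk": 1_000_000,
}


def slowdown_vs(baseline):
    """Return each level's latency as a multiple of the baseline level's latency."""
    base = LATENCY_NS[baseline]
    return {level: ns / base for level, ns in LATENCY_NS.items()}


if __name__ == "__main__":
    for level, factor in slowdown_vs("L1 cache").items():
        print(f"{level:<12} {factor:>12,.0f} x the L1 latency")
```

With these figures, main memory is about 100 times slower than the L1 cache, and the hard disk is at least 10,000 times slower than main memory, which is why moving the database from disk into RAM shifts the bottleneck to the path between RAM and the CPU caches.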
SAP HANA Hardware Requirement
SAP HANA can be installed on certified hardware from SAP partners, including Hewlett Packard, IBM, Fujitsu, Cisco Systems and Dell.
At the time of writing, SUSE Linux Enterprise Server (SLES) 11 SP1 for x86-64 is the operating system supported by SAP HANA.
A typical configuration is 4 Intel E7-4870 CPUs (40 cores in total) with 512 GB of RAM. SAP recommends a dedicated 10 Gbit/s network connection between the SAP HANA landscape and the source system for efficient data replication.
SAP HANA Database Features
Important database features of HANA include OLTP and OLAP capabilities, extreme performance, in-memory storage, massively parallel processing, a hybrid database, column store, row store, complex event processing, a calculation engine, compression, virtual views, partitioning and no aggregates. The HANA in-memory architecture includes the In-Memory Computing Engine and the In-Memory Computing Studio for modeling and administration. Each of these features needs a more detailed explanation, followed by a look at the SAP HANA architecture.
Basic Concepts behind SAP HANA Database
Extreme Hardware Innovations:
Main memory is no longer a limited resource: modern servers can have 2 TB of system memory, which allows complete databases to be held in RAM. Processors currently have up to 64 cores, and 128 cores will soon be available. With the increasing number of cores, CPUs are able to process more data per time interval. This shifts the performance bottleneck from disk I/O to the data transfer between main memory and CPU cache.
SAP HANA fully leverages hardware innovations such as multi-core CPUs and high-capacity RAM. The basic concept is to hold the entire database in fast main memory close to the CPU, for faster execution and to avoid disk I/O. Disk storage is still required for persistence, since main memory is volatile. SAP HANA therefore holds the bulk of its data in memory for maximum performance but still uses persistent storage as a fallback in case of failure. Data and log are automatically saved to disk at regular savepoints, and the log is also saved to disk after each COMMIT of a database transaction. Savepoint writes happen asynchronously as a background task. On system start-up, HANA generally loads the tables into memory.
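The interplay of in-memory data, a commit log and savepoints can be illustrated with a toy key-value store. This is a minimal sketch of the general idea only and is not how SAP HANA implements persistence: the class name TinyStore, the file names and the JSON format are all assumptions made for illustration, and the savepoint here runs synchronously rather than as a background task.

```python
import json
import os
import tempfile


class TinyStore:
    """Toy in-memory key-value store with a commit log and savepoints."""

    def __init__(self, directory):
        self.data = {}  # the working copy lives in main memory
        self.snapshot_path = os.path.join(directory, "savepoint.json")
        self.log_path = os.path.join(directory, "commit.log")
        self._recover()

    def _recover(self):
        # Start-up: load the last savepoint, then replay the logged commits.
        if os.path.exists(self.snapshot_path):
            with open(self.snapshot_path) as f:
                self.data = json.load(f)
        if os.path.exists(self.log_path):
            with open(self.log_path) as f:
                for line in f:
                    key, value = json.loads(line)
                    self.data[key] = value

    def commit(self, key, value):
        # Force the log entry to disk first, then apply the change in memory.
        with open(self.log_path, "a") as f:
            f.write(json.dumps([key, value]) + "\n")
            f.flush()
            os.fsync(f.fileno())
        self.data[key] = value

    def savepoint(self):
        # Persist the full in-memory state, then start a fresh log.
        tmp_path = self.snapshot_path + ".tmp"
        with open(tmp_path, "w") as f:
            json.dump(self.data, f)
            f.flush()
            os.fsync(f.fileno())
        os.replace(tmp_path, self.snapshot_path)
        open(self.log_path, "w").close()


def demo():
    """Commit, take a savepoint, commit again, then simulate a restart."""
    with tempfile.TemporaryDirectory() as directory:
        store = TinyStore(directory)
        store.commit("a", 1)
        store.savepoint()
        store.commit("b", 2)  # present only in the log, not in the savepoint
        restarted = TinyStore(directory)  # a new instance stands in for a restart
        return restarted.data
```

After the simulated restart, both entries are back in memory: "a" comes from the savepoint and "b" from replaying the log, which is why the log must reach disk at every commit while savepoints can be taken less often.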
Massively Parallel Processing:
With the availability of multi-core CPUs, higher execution speeds can be achieved. Multiple cores call for new parallel algorithms in databases in order to fully utilize the available computing resources. HANA's column-based storage makes it easy to execute operations in parallel using multiple processor cores. In a column store, data is already vertically partitioned, which means that operations on different columns can easily be processed in parallel. If multiple columns need to be searched or aggregated, each of these operations can be assigned to a different processor core. In addition, operations on one column can be parallelized by partitioning the column into multiple sections that are processed by different processor cores. With the SAP HANA database, queries can therefore be executed rapidly and in parallel.
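Both forms of parallelism described above, one task per column and one task per section of a single column, can be sketched with Python's standard thread pool. This is an illustration of the task structure only: the function names are hypothetical, and because of CPython's global interpreter lock the threads do not give a real speed-up for pure Python sums, whereas a database engine would run such tasks on separate cores.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_sections(column, sections):
    """Split one column into contiguous sections of roughly equal size."""
    size = -(-len(column) // sections)  # ceiling division
    return [column[i:i + size] for i in range(0, len(column), size)]


def parallel_sum(column, workers=4):
    """Sum one column by aggregating its sections as separate tasks."""
    if not column:
        return 0
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partial_sums = pool.map(sum, split_into_sections(column, workers))
        return sum(partial_sums)


def aggregate_columns(table, workers=4):
    """Sum every column of a column-store table, one task per column."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = {name: pool.submit(sum, values) for name, values in table.items()}
        return {name: future.result() for name, future in futures.items()}


if __name__ == "__main__":
    # Made-up example data: a table stored as one list per column.
    table = {"quantity": [1, 2, 3], "price": [10, 20, 30]}
    print(aggregate_columns(table))
    print(parallel_sum(list(range(1, 101))))
```

The partial results of the sections are combined in a final step, which works for aggregates such as sums, counts, minima and maxima.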
Hybrid Data Store:
Conceptually, a database table is a two-dimensional data structure with cells organized in rows and columns. Computer memory, however, is organized as a linear structure. Common databases store tabular data row-wise, i.e. all data for a record is stored adjacent to each other in memory, and row store tables are linked lists of memory pages. To store a table in linear memory, two options exist:
- Row-oriented storage stores a table as a sequence of records, each of which contains the fields of one row.
- Column-oriented storage stores all the values of a column in contiguous memory locations.
Using a column store avoids scanning unnecessary columns when searching or aggregating the values of a single column, because those values are stored in contiguous memory locations. Such an operation has high spatial locality and can be executed efficiently in the CPU cache. With row-oriented storage, the same operation would be much slower because the data of one column is distributed across memory and the CPU is slowed down by cache misses. The column store is optimized for high-performance read operations and efficient data compression. SAP HANA offers both row and column storage. This combination of classical and innovative technologies for data storage and access allows developers to choose the best technology for their application and, where necessary, use both in parallel.
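The two ways of mapping a table onto linear memory can be shown with a small example. The sketch below uses a made-up three-row table (ID, city, amount) and hypothetical helper names. It flattens the table both ways and lists the positions at which the values of one column end up.

```python
# A small made-up table: (id, city, amount)
ROWS = [
    (1, "Berlin", 250),
    (2, "Paris", 120),
    (3, "Berlin", 310),
]


def to_row_layout(rows):
    """Flatten the table record by record (row-oriented storage)."""
    return [field for row in rows for field in row]


def to_column_layout(rows):
    """Flatten the table column by column (column-oriented storage)."""
    return [field for column in zip(*rows) for field in column]


def column_positions(layout, n_rows, n_cols, col):
    """Indices in the linear layout that hold the values of column `col`."""
    if layout == "row":
        return [r * n_cols + col for r in range(n_rows)]
    return [col * n_rows + r for r in range(n_rows)]


if __name__ == "__main__":
    print(to_row_layout(ROWS))
    print(to_column_layout(ROWS))
    print(column_positions("row", 3, 3, 2))     # positions of "amount", row layout
    print(column_positions("column", 3, 3, 2))  # positions of "amount", column layout
```

In the row layout the amounts sit at positions 2, 5 and 8, so summing them means stepping over the other fields. In the column layout they sit at positions 6, 7 and 8, next to each other, which is the spatial locality that makes column scans cache-friendly.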
OLTP and OLAP Database:
SAP HANA is a hybrid database: it has both a read-optimized column store, ideally suited for OLAP, and a write-optimized row store, best suited for OLTP. Both stores are in-memory relational engines. Using column stores in OLTP applications requires a balanced approach to insertion and indexing of column data to minimize cache misses. The SAP HANA database allows the developer to specify whether a table is to be stored column-wise or row-wise. It is also possible to alter an existing table from columnar to row-based and vice versa.
Higher Data Compression:
The goal of keeping all relevant data in main memory can be achieved at lower cost if data compression is used. Columnar data storage allows highly efficient compression. If a column is sorted, there will normally be runs of identical values placed adjacent to each other in memory. In this case compression methods such as run-length encoding, cluster coding or dictionary coding can be used. In column stores a compression factor of 10 can typically be achieved compared to traditional row-oriented storage systems.
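Two of the methods named above, dictionary coding and run-length encoding, are simple enough to sketch in a few lines. The code below is a generic textbook version with hypothetical function names and made-up sample data. It is not SAP HANA's internal format.

```python
def dictionary_encode(column):
    """Replace each value with a small integer ID into a sorted dictionary."""
    dictionary = sorted(set(column))
    ids = {value: i for i, value in enumerate(dictionary)}
    return dictionary, [ids[value] for value in column]


def run_length_encode(values):
    """Collapse runs of identical adjacent values into [value, count] pairs."""
    runs = []
    for value in values:
        if runs and runs[-1][0] == value:
            runs[-1][1] += 1
        else:
            runs.append([value, 1])
    return runs


def run_length_decode(runs):
    """Expand [value, count] pairs back into the original sequence."""
    return [value for value, count in runs for _ in range(count)]


if __name__ == "__main__":
    # A sorted column of made-up country codes.
    country = ["DE", "DE", "DE", "FR", "FR", "US"]
    dictionary, ids = dictionary_encode(country)
    runs = run_length_encode(ids)
    print(dictionary)  # distinct values, stored once
    print(ids)         # the column as integer IDs
    print(runs)        # the IDs as [value, count] runs
```

For the sorted sample column, the six strings shrink to a three-entry dictionary plus three runs. On an unsorted column the dictionary step still helps, but run-length encoding gains little because the runs are short.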