If you have any basic idea on any database, we all know that for most of operations , data is being accessed from Disks since our Databases are provided with limited RAM and limited cores.
If requested information is not available on RAM / CPU , the request will reach Disks to fetch the data. So if Information is already available in RAM / CPU buffers … the DB can immediately respond to the request other way if data need to access from disks … always disk I/O’s come into picture which takes its time. Request Processing time will be less if the Data is
In Processor buffers < In RAM ( In-memory ) < on Disks
In addition to this basic idea, the technology evolution in Multi core Processors and Bearable cost on RAM made HANA invention as database.
HANA is an In-memory database where its entire data can reside in In-Memory ( RAM ) and process all the client requests. If we have sufficient In-memory to hold entire Database data, we are avoiding data requests on to Disks, So performance is Improved —- Convinced !!!
Additionally, Here are other Improvements in Software & Hardware
Software Improvements :
- Row & Column Store
- Data Compression
- Data Partitioning
- No aggregate tables required
- Row to Column & Column to Row conversion flexibility
- Push down the App logic to DB
Hardware Improvements :
- Multi-core architecture ( 8 CPU * 10 Cores per blade)
- Massive Parallel Scaling with many blades
- 64 Bit address space – 2TB in-memory available on current servers
- Decline in Price and Improvement in Performance of Hardware
- Scale up / Scale out configuration support for increasing demands
How column store help?
As we all know that the table in any DB is two dimensional and the memory is one dimensional which means
- In row store, all the row elements are stored in consecutive memory locations
- In column store , all the column elements are stored in consecutive memory locations.
Ex: Customer Table
| Cust | Name | Location |
| 1 | A | E |
| 2 | B | F |
| 3 | C | G |
| 4 | D | H |
In memory when loaded,
For Row store,
| 1 | A | E | 2 | B | F | 3 | C | G | 4 | D | H |
For Column store,
| 1 | 2 | 3 | 4 | A | B | C | D | E | F | G | H |
Now imagine if any read SQL is initiated,
Select name from Cust_table;
For this, in case of Row store, the pointer needed to identify the location Primary key and then the requested field ( Assume the primary key is Cust ID which is unique ).
In case of column store, the pointer identify the location of Name and read the consecutive memory locations and get the data for operation.
So , read operation is quicker with Column store table.
Please note the row store / column store are referring to the way the table data loaded into the Memory.
How Compression help?
Compression referring the data management in In-memory. Since the HANA license is based on the In memory usage, if the utilization is minimum , the license cost will be minimum. Since HANA maintain the unique value identifier tables and replace the actual data with identifier in In memory area.
If the duplicate values are more in the columns, the compression will be better and the in memory requirement will be minimum.
How no aggregates will help?
Usually any customer is not very much interested on what data is available in the Table / tables alone. For analysis purpose, they request developer to manage so many operations ( Joins , Sums , calculations ..etc ). Once the data is loaded these operations will take certain time ( let us say the BG Job internal table processing is taking 12 hours ) and by that time the actual data may have additional updates. So while end user ( who is usually a management ) may not be seeing the original data and the decision which he/ she is going to take based on the report could be a wrong one … Chances !!.
If these data processing is happening on real time ( let us say within 5mins ) , then the chances are that the management is having full insight into the business upto date and taking decisions.
The effort for preparation of aggregates and the system time is saved with HANA and provided the report which is more accurate until last minutes.
How Data partitioning help ?
Let us say , our database is having 50 years of data but actual insight is needed for only for last years. so instead of maintaining the Big HANA database, we can partition the data and keep the data in different Database machines ( of course with same SID ).
Now the users who need the insight on a particular data, process the data from that DB machine and get the insight. This helps for parallel machine processing.
This is just an example.
Why the Inserts are slow in Column Store ?
Table 1
| A | 2 |
| B | 3 |
Now want to include another row of the table : Table 1.1
| A | 2 |
| B | 3 |
| C | 4 |
In case of row store in Memory
| A | 2 | B | 3 |
| A | 2 | B | 3 | C | 4 |
In the memory the data structure which is being saving the data need to update the pointer only at 1 place and there is no need to re-org the 1 dimensional memory structure.
But in case of Column store
| A | B | 2 | 3 |
| A | B | C | 2 | 3 | 4 |
The writer need to switch two memory locations and update the pointers ( Entire consecutive data structure is re-organized ). If the no of columns are more the restructuring the 1 dimensional data structure is complicated and time consuming. But in row store just need to add one pointer and add consecutive memory locations for entire Row.
Hope it is clear that the write operations are slow in column store and read operations are fast. This is the reason for OLAP column store was preferred and OLTP row store is preferred.
To overcome the write operations bottlenecks in column store , SAP introduced Delta Store and delta merge concept which is discussed in further chapters.
Please note that the data partitioning is supported for only Column tables.