A Brief Introduce of Database Index(索引简介)

这是我写的一篇研究生申请的writing sample,关于数据库索引的,对关系型数据库的索引从数据结构到用途再到作用对象简单分析了一下,因为我发现在实际工作中,index是个好东西,但是很多DBA并不能找到合适的index使用,这样会使查询效率提高得不大,甚至影响查询效率。所以这篇文章希望帮助大家理解一下index。我没有在算法上展开太多,因为很多index算法相似,如果展开容易喧宾夺主。等过段时间我会加入一些实验数据和代码进这篇文章,最近比较懒不想安装数据库.DB2安装实在太烦了。

ABSTRACT

Index is an extremely important item for a database system. The purpose of this article is to explore the concept and function of the index, how the index is able to improve the speed of retrieve data significantly, and help database administrator distinct and choose the correct index in real life.

1. 1. How DBMS stores data on disk?

The relational database store data in the disk by pages, which is the minimum unit of storage. The page only stores three things: the data in the tables, indexes, and execution plans. No matter when the database receiving a query request, RDBMS must load the page to memory first. Once the page complete working, it won’t discard immediately. DBMS will move the page to buffer pool, which is cache table and index data from disk for next time.
As we know, data is stored on disk-based storage devices, and it is stored as blocks of data. The structure of disk block as much as linked list, both of them contain a data section and a pointer (or link) for the next block. The fact is multiple records can only be stored one field, if we use the linear search to retrieve a record will require N/2 block accesses on average. If the field doesn’t contain unique entries, then the entire time cost is Nm which means you have to search the entire disk [1].

2. What is index?

As we mentioned early, DBMS stores index in the page. Index is a data structure which could significantly improve the speed of data retrieval operations. Why the index improve the retrieve speed effectively? We have to talk about the index algorithm and data structure first [2].
Different DBMS provide different types of index, but majority database index designs exhibit logarithmic (O (log (N))) retrieval performance. Generally speaking, there are some different indexes based on the different data structures. Indexes could be implemented by different kinds of data structure such as balance trees, B+ trees hash table, R+ trees.

3. Index architecture

We can distinguish the index architecture by the rows’ order
1) Non-clustered index
In a non-clustered index, the physical order of the rows is not the same as the index order, which means the data located in the disk has the arbitrary order, but logical ordering is specified by the index order.
2) Clustered index
Clustered index will change the data block into a specific index order to match the index. As the result, the row data will be stored in the index order. Therefore, we can create only one clustered index in a specific database table.

4. Index Implementations

1) Bitmap Index

Bitmap index is designed by a B-tree data structure, a self-balancing binary search tree, to retrieve data from database. So many facts indicate bitmap index working perfect for low-cardinality columns, which include a meaningful quantity of distinct values, no matter absolutely or relative to the number of records which contain the data. [3] Bitmap index uses bit array, which segment only has two value, 0 and 1 (False or True). Due to the values of column are distinct, each value could be presented by 1 or 0, standing for the value included or not. Therefore, bit map index has a significant space and performance over other index structures for retrieving such data [4].

2) Dense index and sparse index

Dense index and sparse index are very similar. Dense index contains the search key value and pointer for each record in the data file, thus dense index record could be very large. While sparse index contains the search key value and pointer only for each data block in the data file, due to the data is sorted or ordered, so sparse index only need to point to the head of each data block. Relatively speaking, the sparse index record is smaller than dense index record [5].

3) Reverse index

Reverse index uses B-tree structure, which reverses the key value before inserting in the index. Reverse index is very effective for indexing sequence numbers data, because each key value is highly greater than the prior value. For instance, 29, 30, 31, all these three keys will be in the same block possible. If we use reverse index to query them as 92, 03, 13, they may be distributed at the different blocks. Due to the B+ tree structure, so we can query them faster than the sequence keys [6].
In addition, there are some other index methods such as R+ tree, which is used by Google map, to retrieve the new type data as times go by. Finally, it is good to research deeply about index to improve database performance.

Works Cited:

[1] Garcia-Molina, Hector, Jeffrey D. Ullman, and Jennifer Widom. Database Systems: The Complete Book. Upper Saddle River, NJ: Prentice Hall, 2002. Print.

[2] Fritchey, Grant. “Statistics, Data Distribution, and Cardinality.” SQL Server Query Performance Tuning (2014): 193-235. Web.

[3] “Bitmap Index vs. B-tree Index: Which and When?” Bitmap Index vs. B-tree Index: Which and When? N.p., n.d. Web. 26 Nov. 2016.

[4] Fujioka, Kengo, Yukio Uematsu, and Makoto Onizuka. “Application of Bitmap Index to Information Retrieval.” Proceeding of the 17th International Conference on World Wide Web – WWW ’08 (2008): n. pag. Web.

[5] “Dense and Sparse Indices.” Dense and Sparse Indices. http://www.cs.sfu.ca/CourseCe… Web. 30 Nov. 2016.

[6] “Introduction To Reverse Key Indexes: Part I.” Richard Foote’s Oracle Blog. N.p., 2014. Web. 26 Nov. 2016.

    原文作者:dykin
    原文地址: https://segmentfault.com/a/1190000007656798
    本文转自网络文章,转载此文章仅为分享知识,如有侵权,请联系博主进行删除。
点赞