Projects

CheetahDB (2018-2020)

CheetahDB is a database management system which initially designs for using GPUs as the main computing unit. The intensive operations, such as join, group by, and searching operation, are accelerated by GPUs, which result in magnitude outperform traditional database management system. The system supports the standard SQL language and returns tuples result. The system uses the client/server model by TCP/IP and JSON format for communication.

Publication: GTC2020

Parallel Index Search Operation for DBMS via GPUs (2018-2020)

Concurrently handle a large number of queries is a crucial characteristic of today's in-memory database system. By supporting such characteristics, index structure becomes a vital role in a database system. In recent years, Graphics Processing Units (GPUs) have become the leading hardware for parallel computing. The unique architecture of high-performance computation, however, provides abundant opportunities for optimizing the algorithm towards better performance and achieving high utilization of GPU resources. This study presents our recent work in designing and optimizing parallel algorithms for index-search on GPUs. We introduce a dynamic index structure on GPUs and techniques to optimize the searching operation. Since the dynamic data structure has a high overhead on GPUs, we introduce our memory allocator, which only costs tiny overhead. In terms of searching operation, we present techniques to optimize the search operation on both exact and range searches by using a novel grouping technique that can maximize the utilization of an on-chip GPU cache system.

Publication: GTC2020

Support Measures for Counting Frequent Patterns in single-graph databases (2017-2020)

Frequent pattern mining has been an attractive subject in data mining research for decades. Furthermore, as today, rapid growing network data is creating a great potential for knowledges discovery from graph data, especially single-graph databases as efficient structures representing these complex relationships are becoming more and more valuable and significant. There are two problems in frequent pattern mining in a single-graph setting, support measure, and search scheme. In this project, we address the challenge of implement a high-performance polynomial and non-polynomial calculation of new support measures by mapping to the optimization problem. The outcome of the project is a software that can calculate a support measure in graph frequent pattern analysis with the linear time but still preserve the number of counting close to the actual value.

Publication: SSDBM2019, DMKD2020

Algorithms for Computing 2-body Statistics on GPUs (2015-2018)

Various types of two-body statistics (2-BS) are regarded as essential components of low-level data analysis in scientific database systems. In relational algebraic terms, a 2-BS is essentially a Cartesian product between two datasets (or two instances of the same dataset) followed by a user-defined aggregate. The quadratic complexity of these computations hinders timely processing of data. Use of modern parallel hardware has thus become an obvious solution to meet such challenges. This paper presents our recent work on designing and optimizing parallel algorithms for 2-BS computation on Graphics Processing Units (GPUs). Although a typical 2-BS problem can be summarized into a straightforward parallel computing pattern, traditional knowledge from (general) parallel computing often falls short in delivering the best possible performance. Therefore, we present a suite of techniques to decompose 2-BS problems and methods for effective use of computing resources on GPUs. We also develop analytical models that guide us towards finding the best parameters of our GPU programs. As a result, we achieve the design of highly-optimized 2-BS algorithms that significantly outperform the best known GPU and CPU implementations. Although 2-BS problems share the same core computations, each 2-BS problem however carries its own characteristics that calls for different strategies in code optimization. For that, we develop a software framework that automatically generates high-performance GPU code based on a few parameters and short primer code input. We further present two case studies to demonstrate that code generated by this framework reaches a very high level of efficiency.

Publications: ICPP2016, GTC2017, DAPB2018, GTC2019

Study the access pattern of tuples/pages in a database system (2014)

The database management system (DBMS) manages a massive amount of data, and to process the data, which significantly larger than the main memory, the system required to separate the data into pages unit. There are only a small number of pages load from disk into the memory at a time due to the space limit. The buffer manager is responsible primarily for managing this operation. In this project, we modified PostgreSql server to track the frequency of each page and tuple that is accessed by the DBMS. Monitoring the rate of each page and tuple accessing helps to manage the buffer more efficiently. For example, if we can place the data items of similar popularities together, the system can reduce the number of access to the disk.

Parallel Keyword Searching utilizing a GPUs for Network Forensics (2012)

Nowadays, there is an increasing number of cybercrime on the Internet. Governments, businesses, and individuals are in great risk. In order to trace back to criminals, lawful interception may be required. However, it is a complex and computation-intensive task. The advent of Graphics Processing Units (GPUs) can offer highly parallel computing power with great performance at a low cost to such a task. This paper; therefore, focuses on using a GPU for keyword searching in network traffic. The KMP skip searching algorithm was selected as the experimental string searching algorithm. Various parameters of the GPU were also investigated to obtain the optimal value for this application.

Publication: ICSEC2012