Database Internel Architecture: SQLite

Introduction

A database is an essential part of building a software system which used to store and read data efficiently. Here, We are going to discuss some architectural details of database implementation by using an early version of SQLite.

SQLite is a small database application which used in millions of software and devices. SQLite invented by D.Richard Hipp in August 2000. SQLite is a high performance, lightweight relational database. If you are willing to learn internal of a database in coding level, then SQLite is the best open source database available out there with highly readable source code with lots of documentation. Reading later versions of SQLite become a little harder since it contains lots of new features. In order to understand the basic implementation of database internals, You should have good knowledge about data structures, some knowledge about Theory of Computing and how an operating system works.

Here we are looking into the SQLite 2.5.0 version. Here below you can find out a simple implementation of SQLite backend at Github. https://github.com/madushadhanushka/simple-sqlite

Why database?

Keeping data in a flat file is not that so efficient to access and keep data. Database organize the data in proper order such that data reading and writing make much more faster. Data can be structured, semi-structured or unstructured. Databases are mainly to store structured and semi-structured data. Databases can be dived as follows based on the type of data structure which used to implement the software.

Relational database: Commonly used database type which has a table structure. Tables can have relations with other tables. SQL language used to manipulate data on this type of database.
Key-value database: data stored along with a key associated with it. Data can be retrieved back with the given key. In-memory databases are commonly found as this type of database.
Object database: Data structure is more like an object rather than a table.
Graph database: Graph database is a collection of node and edges which mostly used in data mining and social media applications.

Database architecture

SQLite database architecture split into two different sections named as core and backend. Core section contains Interface, Tokenizer, Parser, Code generator and the virtual machine which create execution order for database transactions. Backend contains B-tree, Pager and OS interface to access the file system. Tokenizer, Parser and code generator altogether named as the compiler which generates a set of opcodes that runs on a virtual machine.

Where do I start?

To understand the architecture of a database you need to have following prerequisite.

Good understanding of data structures and algorithm. Especially data structures such as B-tree, Linked list, Hashmaps etc.
Some understanding of computer architecture. How to read-write into a disk, how paging and caching works.
Theoretical computers such as finite automata and some regular expression knowledge.

SQLite Architecture

VFS (Virtual File System)

File access on Unix and Windows are different from each other. VFS provides common API to access files without considering the type of the operating system it runs on. This API includes functions to open, read, write and close a file. Here follows are some of the API used in VFS to read, write data into a file.

/* 
Create a connection to file to read write 
zFilename : file name 
id : file pointer 
pReadonly : read or write 
*/
int sqliteOsOpenReadWrite(const char *zFilename, OsFile *id,int *pReadonly);

/* 
Acqure the lock to read file. Return 0 if success 5 if failed
id : file pointer 
*/
int sqliteOsReadLock(OsFile *id);

/* 
Get the write lock to write into a file. Return 0 if success 5 if failed
id : file pointer
*/
int sqliteOsWriteLock(OsFile *id);

/* 
Move to the given number of offest to read or write into the file
*/
int sqliteOsSeek(OsFile *id, int offset);

/* 
Read the amt bytes from the file with offset pointed by sqliteOsSeek
*/
int sqliteOsRead(OsFile *id, void *pBuf, int amt);

/* 
Write amt bytes from the pBuf buffer into the file
*/
int sqliteOsWrite(OsFile *id, const void *pBuf, int amt);

Pager

Pages are the smallest unit of transaction on the file system. When database needs to read data from a file, it request it as a page. Once the page loaded into the database engine it can store page if it access frequently on its cache. Pages are numbered and start from one. The first page is called the root page. Size of a page is constant.

/*
Open pager with the given file name and nPage maximum cache limit
*/
int sqlitepager_open(Pager **ppPager,const char *zFilename,int nPage,int nEx);

/*
Get page specified by the page number
*/
int sqlitepager_get(Pager *pPager, Pgno pgno, void **ppPage);

/*
Start to write data into a page specified in pData
*/
int sqlitepager_write(void *pData);

/*
Commit page changes into the file
*/
int sqlitepager_commit(Pager*);

/*
Close the connection to the file
*/
int sqlitepager_close(Pager *pPager);

Btree

Btree is a data structure that used to store data as a tree based on its value. The simplest form of the BTree is the Binary tree. Databases use Btree data structure to store indexes to improve the performance of the database. A cursor is a special pointer which used to point a record wich given with page id and offset idx.

/*
Open file connection to a page file name specified by zFileName with nCache size cache
*/
int sqliteBtreeOpen(const char *zFilename, int mode, int nCache, Btree **ppBtree)

/*
Start transaction. This function should called before any btree modification operations
*/
int sqliteBtreeBeginTrans(Btree *pBt)

/* Insert key pKey with nKey byte and value pData with nData byte put into the Btree
*/
int sqliteBtreeInsert(BtCursor *pCur, const void *pKey, int nKey, const void *pData, int nData)

/*
Write data into the file
*/
int sqliteBtreeCommit(Btree *pBt)

/*
Move cursor to the matching pKey with nKey bytes
*/
int sqliteBtreeMoveto(BtCursor *pCur, const void *pKey, int nKey, int *pRes)

VDBE(Virtual Database Engine)

VDBE is a virtual machine that runs a set of operations which was generated by Code Generator. All SQL commands including insert, delete, update, select converted into a set of opcodes and then it runs on this virtual machine. Each opcode contains three input named as p1, p2, and p3. You can think this input as input for a function.

The most interesting thing in SQLite for me is I can see the set of VBDE opcode instruction for a give SQL code by just appending explain keyword at the beginning of SQL query. Here below the sample execution opcodes stack for following SQL select statement. Here, the first column is operation id, Second column is opcode, third, fourth and fifth column is arguments provided for the opcode.

explain select * from foo where value = "value";

0 |ColumnCount   |1 |0 |
1 |ColumnName    |0 |0 |value
2 |Open          |0 |3 |foo     // Open cursor and point it to third page which is root page for foo table(p3 is not important )

3 |VerifyCookie  |46|0 |       // Make sure schema not changed
4 |Rewind        |0 |11|       // Go to first entry
5 |Column        |0 |0 |       // read data and push it on stack
6 |Column        |0 |0 |
7 |Ne            |1 |10|       //Pop the top two elements from the stack.  If they are not equal, then jump to instruction P2.  Otherwise, continue to the next instruction.

8 |Column        |0 |0 |
9 |Callback      |1 |0 |       // Pop P1 values off the stack and form them into an array

10|Next          |0 |5 |       // Move cursor to next record, if data exit goto P2 else go to next line

11|Close         |0 |0 |       // Close the cursor

Compiler

Tokenizer, Parser and Code Generator together known as Compiler which generates sets of opcode which runs on VBDE. Tokenizer generates a set of token by scanning SQL code. Then it validates the syntax and generates the parse tree. Code Generator converts this parse tree into a mini program, written in SQLite opcodes.

Conclusion

SQLite is a simple lightweight, high-performance, relational database which is widely used in software designs. Early version of SQLite was written with simple architecture and highly readable code. Pager provides an abstraction layer to read-write data into the file system as fixed size blocks. While Btree provides a way to store data in the memory efficient way to access data faster. When SQL comes into the SQLite, it converts SQL into the SQLite machine code and runs it on VBDE. The result sends back to the user through the API.

Comments

MaariahSeptember 20, 2019 at 9:23 PM
Having the scored in India discharges that fabricate from captains likewise arranges it in the crisiss of lecturers who drive altogether inquiry American idioms or cultural materials. marketing solutions UAE
ReplyDelete
Replies
Bhanu SreeDecember 10, 2019 at 1:38 AM
This comment has been removed by the author.
ReplyDelete
Replies
Bhanu SreeDecember 10, 2019 at 9:31 PM
Very useful blog!!!!
SQL Azure Online Training
Azure SQL Training
SQL Azure Training
ReplyDelete
Replies
keerthanaJuly 21, 2020 at 3:39 AM
It's interesting that many of the bloggers to helped clarify a few things for me as well as giving.Most of ideas can be nice content.The people to give them a good shake to get your point and across the command.
https://www.acte.in/php-training-in-chennai
https://www.acte.in/machine-learning-training-in-chennai
https://www.acte.in/iot-training-in-chennai
https://www.acte.in/blockchain-training-in-chennai
https://www.acte.in/openstack-training-in-chennai

ReplyDelete
Replies
Data Science October 1, 2020 at 10:12 AM
Excellent blog with very impressive writing and unique content, information shared was very valuable and useful thanks for sharing.
Data Science Course in Hyderabad
ReplyDelete
Replies
Data Science TrainingOctober 19, 2020 at 10:59 PM
With so many books and articles appearing to user in the field of making money online and further confusing the reader on the real way to make money. PMP Training in Hyderabad
ReplyDelete
Replies
rajmohan1140January 19, 2021 at 8:53 AM
After reading your article I was amazed. I know that you explain it very well. And I hope that other readers will also experience how I feel after reading your article.

Java Training in Chennai

Java Course in Chennai
ReplyDelete
Replies
Indus Design WorksFebruary 7, 2021 at 9:23 PM
Congratulation to you. I am happy after reading your post that you have posted in this blog. Thanks for this wonderful post and hoping to post more of this.
https://indusdesignworks.com/structural-engineering.php
Outsource structural engineering
ReplyDelete
Replies
Indus Design WorksFebruary 7, 2021 at 9:23 PM
Congratulation to you. I am happy after reading your post that you have posted in this blog. Thanks for this wonderful post and hoping to post more of this.
https://indusdesignworks.com/structural-engineering.php
Outsource structural engineering
ReplyDelete
Replies
Micheal AlexanderFebruary 14, 2021 at 12:40 AM
Congratulation for the great post. Those who come to read your Information will find lots of helpful and informative tips. Covid Antigen Test UK
ReplyDelete
Replies
360digiTMG TrainingFebruary 18, 2021 at 10:43 PM

This is a really very nice post you shared, i like the post, thanks for sharing..
Digital Marketing Training Institutes in Hyderabad
ReplyDelete
Replies
AnonymousApril 25, 2021 at 9:45 PM
Good Information,
Learn Digital Marketing Course with Internship at Digital Brolly with 100% placement assistance.
ReplyDelete
Replies
ManeeshaMay 27, 2021 at 11:41 PM
This is also a primarily fantastic distribute which I really specialized confirming out
data scientist training in hyderabad
ReplyDelete
Replies
UnknownMay 30, 2021 at 11:20 PM
I just got to this amazing site not long ago. I was actually captured with the piece of resources you have got here. Big thumbs up for making such wonderful blog page!
<"https://starpmo.com/pmp-certification-training-in-hyderabad/">Pmp training
ReplyDelete
Replies
Priya RathodJuly 20, 2021 at 11:24 PM
This is a great post I saw thanks to sharing. I really want to hope that you will continue to share great posts in the future.
Data Science Training in Hyderabad
Data Science Course in Hyderabad
ReplyDelete
Replies
traininginstituteOctober 25, 2021 at 11:50 PM
You have done a amazing job with you website
data scientist training in malaysia

ReplyDelete
Replies
Service on StreetNovember 16, 2021 at 11:30 PM
It’s actually a great and useful piece of information. I’m happy that you just shared this useful info with us.
Best Refrigerator Repair Service in Hyderabad
ReplyDelete
Replies
manasaNovember 25, 2021 at 2:58 AM
I'm anticipating additional intriguing points from you. Also, this was great substance and certainly it will be valuable for some individuals…

Machine Learning Training in Hyderabad
ReplyDelete
Replies
Data Analytics Course in DehradunJanuary 10, 2022 at 7:09 PM
Impressive. Your story always bring hope and new energy. Keep up the good work. Data Science Course in Chennai
ReplyDelete
Replies
Ramesh SampangiFebruary 2, 2022 at 4:13 AM
Nice blog and informative content to users. Thanks for sharing this information with us.
Data Science Course in Hyderabad
ReplyDelete
Replies
traininginstituteFebruary 14, 2022 at 9:11 PM
What a really awesome post this is. Truly, one of the best posts I've ever witnessed to see in my whole life. Wow, just keep it up.
data science course

ReplyDelete
Replies
Version ITFebruary 13, 2025 at 12:21 AM
Congratulations! I truly enjoyed reading your post, and I appreciate the valuable insights you’ve shared. Thank you for this fantastic article—I look forward to seeing more great content from you soon!

SAP SD Training in Hyderabad

ReplyDelete
Replies
revathiMarch 10, 2025 at 11:16 PM
Start with an overview of the graphics project design or piece. Mention the designer (if known), the purpose of the design, and the context (e.g., branding, marketing, digital art).
ReplyDelete
Replies
revathiMarch 10, 2025 at 11:18 PM
Does the color grading match the mode and tone of the video
ReplyDelete
Replies
revathiMarch 10, 2025 at 11:19 PM
Digital Marketing continues to evolve rapidly, with AI-driven strategies and data analytics taking the forefront. Brands need to focus on personalized content, omnichannel presence, and leveraging social media algorithms to stay ahead. Investing in SEO and PPC remains crucial, but creativity and customer connection are the real game-changers."
ReplyDelete
Replies
John-Skyappz-AcademyMay 7, 2025 at 6:42 AM
I have read your blog; it is very attractive and impressive. I like it. Want to learn more about Latest AI technology course in Coimbatore, please visit
ReplyDelete
Replies
SkyappzJuly 16, 2025 at 9:36 PM
I was looking for something like this for a while. Great information driven with in detailed. Data visualization and reporting services in Dubai
ReplyDelete
Replies