Data Partitioning

Blue Flower

Details: Written by C M Raja; Category: Technology; Published: 05 December 2019; Hits: 12967

It is the process where very large tables are divided either vertically (columns) or horizontally(rows) into multiple smaller parts

Why one want to partition their data?
1) want good performance for SQL CRUD operations
2) need an ease maintenance
3) data security

When one want to partition their data?

You can go if your answer to one of the following question is "YES"
1) your data keep growing exponentially and it takes your time often on data house keeping ?
2) do you want to segregate HEAVY data columns in a table (images,pdf,documents) into a low cost platform?
3) do you want to restrict sensitive data of an entity?

Can you show me a vertical partitioning with example in MS SQL server?

Vertical partition used to increase SQL Server performance especially in cases where a query retrieves all columns from a table that
contains a more columns with heady data type (wide text or BLOB)

Let we take a report invoice report table with very wide report description column

CREATE TABLE InvoiceReports
(
InvoiceID int IDENTITY (1,1) NOT NULL,
InvoiceDate datetime,
InvoiceAmoumt float ,
ReportDescription varchar (max)
CONSTRAINT InvoiceReport_PK PRIMARY KEY CLUSTERED (InvoiceID)
)
go

DECLARE @i int
SET @i = 1
 
BEGIN TRAN
WHILE @i<100000 
BEGIN
 INSERT INTO InvoiceReports
 (
 InvoiceDate,
 InvoiceAmount,
 ReportDescription
 )
 VALUES
 (
 getdate(),
 cast(rand() * 100 as decimal(4,2)),
 REPLICATE ('InvReport', 1000)
 )
 SET @i=@i+1
END

COMMIT TRAN
GO

Note:

select cast(rand() * 100 as decimal(4,2)) => select randomly between 1 to 100 with 2 decimal precision

select REPLICATE ('InvReport', 1000)  ==> repeat the word 'InvReport' 1000 times



SET STATISTICS IO ON
SET STATISTICS TIME ON
SELECT ir.InvoiceID, ir.InvoiceDate , ir.InvoiceAmount FROM dbo.InvoiceReports ir WHERE ir.InvoiceAmount > 50
SET STATISTICS IO OFF
SET STATISTICS TIME OFF





--vertically split the above table by moving wide length column into new table "InvoiceDesc"
CREATE TABLE InvoiceReports
(
InvoiceID int IDENTITY (1,1) NOT NULL,
InvoiceDate datetime,
InvoiceAmoumt float ,
CONSTRAINT InvoiceReport_PK PRIMARY KEY CLUSTERED (InvoiceID)
)
go

--brought the wide column from "InvoiceReports"
CREATE TABLE InvoiceDesc 
( 
 InvoiceID int FOREIGN KEY REFERENCES InvoiceReports (InvoiceID),
 ReportDescription varchar(max)
 CONSTRAINT PK_ReportDesc PRIMARY KEY CLUSTERED (InvoiceID)
 )
 go

--insert into "InvoiceReports table
DECLARE @i int
SET @i = 1
 
BEGIN TRAN
WHILE @i<100000 
BEGIN
 INSERT INTO InvoiceReports
 (
 InvoiceDate,
 InvoiceAmoumt
 )
 VALUES
 (
 getdate(),
 cast(rand() * 100 as decimal(4,2))
 --REPLICATE ('InvReport', 1000)
 )
 SET @i=@i+1
END

COMMIT TRAN
GO

--insert into "InvoiceDesc" table
DECLARE @i int
SET @i = 1
 
BEGIN TRAN
WHILE @i<100000 
BEGIN
 INSERT INTO InvoiceDesc
 (
 InvoiceID,
 ReportDescription
 )
 VALUES
 (
 @i,
 REPLICATE ('InvReport', 1000)
 )
 SET @i=@i+1
END

COMMIT TRAN
GO

--retrieve and see the performance statistics
SET STATISTICS IO ON
SET STATISTICS TIME ON
SELECT ir.InvoiceID, ir.InvoiceDate , ir.InvoiceAmoumt FROM dbo.InvoiceReports ir WHERE ir.InvoiceAmoumt > 50
SET STATISTICS IO OFF
SET STATISTICS TIME OFF

Can you show me a horizontal partitioning with example in MS SQL server?

Horizontal partitioning divides a table into multiple tables which contains same number of columns and data type, but fewer rows.

I show below how records can be organized into different tables (filegroups) by merchant ID

All merchant 1 related data goes to "file group1" and similarly all other merchant details


--for all tables
--------------
--1.creating partition function 
create partition function merchant_part_func (int)
as range for values(1,2,3,4,5)

--2.creating partition schema 
    -- Step 1: Before execute below query create filegroup using below step
               -- DB => right click => properties => Filegroups => "AddFileGroup". It will create .ndf file for each filegroup

                        
                         
    -- Step 2: Attach the created file group 
               -- DB=>right click => properties => Files => give logical name & choose the filegroup

                       

    -- Step 3: execute below script
               create partition scheme merchant_part_scheme  as partition merchant_part_func to (fg1,fg2,fg3,fg4,fg5,fg6)


--create table using partition scheme created above
create table merchant 
(merchantid int identity(1,1) not null, 
 merchantname varchar(100)
)
on merchant_part_scheme (merchantid)

--insert data into a table created with partition
declare @i int
set @i=0
while (@i < 150000)
begin
 insert into merchant values(' merchant' + convert(varchar,@i))
 set @i = @i +1
end

--select all records along with bucket no
select $partition.merchant_part_func(merchantid) as 'part_no',* from merchant


--create a table without partition (for performance comparision)
create table merchant1(merchantid int identity(1,1)not null,empname varchar(100))

--insert data into a table without partition
declare @i int
set @i=0
while (@i < 150000)
begin
 insert into merchant1 values(' merchant' + convert(varchar,@i))
 set @i = @i +1
end


--compare the select performance between partition VS without partition

--without partition
select * from merchant where merchantid > 100
--with partition
select * from merchant1 where merchantid > 100

--cleanup
--DROP TABLE TABLE1
--DROP PARTITION SCHEME merchant_part_scheme
--DROP PARTITION FUNCTION merchant_part_func

While update due its high performance row level lock doesn't seen and voids the waiting time of concurrent update

Its best candidate for micro service architecture

Comments

#1 vivoslot 2022-03-07 12:19

Wow! Finally I got a blog from where I know how to in fact get helpful information regarding my study and knowledge.

Refresh comments list

You have no rights to post comments

JComments

Data Partitioning

Comments

Latest Articles

Login Form