Wednesday, 4 September 2013

SQL PowerShell - Add Windows User as SysADM

OK, so I guess it's finally time to start teaching myself PowerShell commands to enable me to administer SQL Server on Windows Core.

I will share my findings along the way.


The first thing I need to know how to do is add a Windows Domain User as a SQL Server sysadmin.


So here goes.


I borrowed the following code off the Web somewhere, but could not get the AddToRole statement to work. Originally it was placed after the Create() statement; I found that by moving the Create() to the end of the statement, everything worked as it should.



[System.Reflection.Assembly]::LoadWithPartialName('Microsoft.SqlServer.SMO') | out-null
$SqlServer = New-Object ('Microsoft.SqlServer.Management.Smo.Server') 'SERVERNAME\INSTANCENAME'
$SqlUser = New-Object -TypeName Microsoft.SqlServer.Management.Smo.Login -ArgumentList $SqlServer, 'DOMAIN\USERNAME'
$SqlUser.LoginType = 'WindowsUser'
$SqlUser.AddToRole('sysadmin')
$SqlUser.Create()


OK, so two days in and I'm starting to get the hang of this now. I have realised that what I have done above (although it does work) is basically wrong.


We have to do this in stages:-

#LOAD SQL Server Objects
Load the SQL Server SMO libraries into PowerShell
#CONSTRUCTORS
Define the type of object we want to work with (Database, Login, etc.)
#PROPERTIES
Set the values for the properties (LoginType, for example)
#CREATE
Run the Create() method to create the object
#METHODS
Now the object has been created, we can run methods to amend it, such as AddToRole
#EVENTS
Check events such as HasDataChanged

So the Add Windows User as SQL sysadmin should look like this:-



#LOAD SQL Server Objects
[System.Reflection.Assembly]::LoadWithPartialName('Microsoft.SqlServer.SMO') | out-null
#CONSTRUCTORS
$Server = New-Object ('Microsoft.SqlServer.Management.Smo.Server') 'ServerName\InstanceName,Port'
$Login = New-Object Microsoft.SqlServer.Management.Smo.Login($Server, "Domain\User")
#PROPERTIES
$Login.LoginType = 'WindowsUser'
#CREATE
$Login.Create()
#METHODS
$Login.AddToRole('sysadmin')
#EVENTS


If you refer to the Microsoft documentation on Microsoft.SqlServer.Management.Smo, we can use the above as the basis to do pretty much anything we desire in SQL Server with PowerShell.

So I don't think I am there yet; there will be plenty more to learn on the way. However, I think that I have passed a major milestone in understanding just how these SQL PowerShell commands are constructed.

Monday, 29 April 2013

Disk Partition Alignment For SQL Server 2008


Overview

Disk partition alignment is a powerful tool for improving SQL Server performance, a Best Practice that is essential, yet often overlooked.

Windows Server 2008 attempts to align new partitions out-of-the-box, yet disk partition alignment remains a relevant technology for partitions created on prior versions of Windows or on volumes that have not been subject to a Windows 2008 Format such as volumes in a SAN environment.

This document is an extract of the Microsoft white paper on the subject.

It documents performance for aligned and nonaligned storage and why nonaligned partitions can negatively impact I/O performance; it explains disk partition alignment for storage configured on Windows Server 2003, including analysis, diagnosis, and remediation; and it describes how Windows Server 2008 attempts to remedy challenges related to partition alignment for new partitions yet does not correct the configuration of pre-existing partitions.

I have included the most relevant portions in this document to try and make it a concise guide; however, please refer to the Microsoft document should you require any further background or explanation as to why disk partition alignment is required.

Disk Partition Alignment.

Introduction

Noncompliance with storage configuration Best Practices for Microsoft SQL Server is a common root cause of support calls to Microsoft for poorly performing database systems.

The reason is often shown to be misalignment between Windows, Storage, Disk Controllers, and Cache segment lines.

Failure to perform partition alignment may result in significant performance degradation.

Disk partition alignment is a requirement for partitions from which high performance is demanded and that were not created on RAID disk devices on Windows Server 2008.

Unless performed at the time of partition creation, the default alignment offset will result in unaligned partitions.

Earlier versions of Windows and some Manufacturers’ default SAN Settings create disk partitions by default to boundaries based on the Cylinder/Head/Sector (CHS) addressing scheme used by previous generation disk controllers.

Pre-existing partitions attached to Windows Server 2008 maintain the original, flawed alignment under which they were created.

 Scope

The information contained in this document applies to Windows basic and dynamic disks with master boot record (MBR) partitions.

However, disk partition alignment is a Best Practice, and it is required for optimal performance for each of these hard drive configurations:
  • MBR basic
  • MBR dynamic
  • GPT basic
  • GPT dynamic

Description 

The following is a simplified characterization of partition misalignment:

Disk array hardware reports 63 reserved (hidden) sectors, and Windows obediently implements this information, reserving these sectors at the beginning of the first partition of a disk.

The master boot record (MBR) resides within these hidden sectors.

The compliance by Windows with 63 hidden sectors reported by disk hardware results in misalignment with stripe units, disk controllers, and cache segment lines.

In all versions of Windows earlier than and including Windows Server 2003, these reserved sectors do not coincide with fundamental physical boundaries.

The result is that single clusters of user data are written across multiple stripe units of the RAID.

Every nth operation is affected (n depends on file allocation unit (cluster) size and stripe unit size).

Fundamental physical boundaries in disk controllers and other hardware are also violated.

Across a striped array, a single I/O coming into the array controller turns into multiple I/Os if the request crosses one or more stripe unit boundaries.

The cumulative effect can contribute to substantial performance degradation. 

In all cases, similar principles are at work:

Due to misalignment, clusters of data are written across physical boundaries, requiring unintended, unnecessary I/Os that result in performance degradation.

Partition Alignment in Windows Operating Systems 

The way partition alignment works depends on the version of Windows being used and the version under which the partition was created.

Windows Server 2008 - New Partitions.

 In Windows Server 2008, partition alignment is usually performed by default.

The default for disks larger than 4 GB is 1 MB.

The setting is configurable and is found in the registry at the following location:
HKLM\SYSTEM\CurrentControlSet\Services\VDS\Alignment
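
To inspect the current defaults, the values under this key can be listed with the built-in reg utility (a read-only check):

reg query HKLM\SYSTEM\CurrentControlSet\Services\VDS\Alignment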

However, if OEM setups are delivered (for example, with recovery partitions), even fresh installations of Windows Server 2008 can have partitions with undesirable starting offsets.

Whatever the operating system, confirm that new partitions are properly aligned.


Windows Server 2008 – Pre-existing Partitions. 

New partitions on Windows Server 2008 are likely to be aligned.

Yet partitions created on earlier versions of Windows that become associated with Windows Server 2008 maintain the properties under which they were created.

That is, in the absence of partition alignment being explicitly performed, these partitions are not aligned.

Windows Server 2003. 

Partitions created on Windows Server 2003 by default are not aligned.

Partition alignment must be explicitly performed.




System Drives.

System drives in versions of Windows prior to Windows Server 2008 cannot be aligned.

Fortunately, workloads associated with system partitions of dedicated SQL Server computers are typically not as sensitive to partition misalignment as disks dedicated to I/O intensive uses, for example, SQL Server database files from which high-performance is demanded.

System drives on fresh installations of Windows Server 2008 should be aligned by default.

Virtual Drives.

 Virtual drives and the host drives on which they reside must be aligned for optimal performance.  



Starting Partition Offsets. 

The performance benefit of disk partition alignment requires configuration of valid starting partition offsets.

Several tools report the starting partition offset.

The results are reliable only in specific contexts.

Valid starting partition offsets and tools used to report them are discussed in this section.

Valid Starting Partition Offsets. 

Because versions of Windows earlier than and including Windows Server 2003 comply with the 63 hidden sectors reported by disk hardware, and because the most common sector size is 512 bytes, the default (and suboptimal) starting partition offset is 32,256 bytes, exactly 31.5 KB.

Explicitly moving the starting offset from 31.5 KB to exactly 32 KB might seem like a legitimate approach.

In fact, as mentioned earlier, 64 KB is typically the minimum (and a common) valid starting partition offset for SQL Server because of the correlations described later.

When choosing a valid partition starting offset, refer first to the storage vendor best practices.

Make certain their recommendations correlate with the stripe unit size and file allocation unit size configured for SQL Server.

In the absence of definitive vendor information, choose the Windows Server 2008 default.
Windows Server 2008 partition alignment defaults to 1024 KB (that is, 1,048,576 bytes). 


This value provides a durable solution.

It correlates well (as described later) with common stripe unit sizes such as 64 KB, 128 KB, and 256 KB as well as the less frequently used values of 512 KB and 1024 KB.

Also, the value virtually guarantees that the offset clears any hidden structures allocated by the storage.


Dynamic Disk Partition Offsets: dmdiag.exe –v.

The command-line utility Dmdiag.exe is used to determine the partition offsets of Windows dynamic volumes.

Important: Neither the output of the wmic command nor any other tool designed only for basic disks reliably reports starting partition offsets of Windows dynamic disks.

The tool is available in the support tools folder of Windows Server 2003.

In Windows Server 2008, the tool has been renamed diskdiag.exe.

To determine the starting partition offset of dynamic disks, execute the following command.

dmdiag -v
The output has several sections; the sections that are relevant to analyzing the starting partition offset of existing volumes are generated only if the -v switch is used.

Those sections (and relevant columns) are:
Dynamic Disk Information (Rel Sec)
LDM Volume Information (Rel Sectors)

Focus on these sections and columns.

Subsequent sections of the output that report offsets can be misleading, and they are unlikely to be reliable for interpreting starting partition offsets of dynamic volumes.

Note that Microsoft tools, including even dmdiag, may be unreliable for reporting starting partition offsets of dynamic volumes created by third-party vendors.

For example, dmdiag does not report correct starting partition offsets of dynamic volumes created by Veritas Enterprise Administrator.

In these cases, consult your vendor for the tools and techniques required for proper analysis.

 Command-Line Partition Alignment Tools. 

Disk partition alignment is not available from the Disk Management snap-in (diskmgmt.msc).
Windows provides two tools to implement disk partition alignment: diskpart.exe and diskpar.exe.

Note the presence or absence of a “t” in their names.

Both utilities are powerful and should be exercised with caution.
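
As an illustration only (the disk number below is hypothetical, and creating a partition destroys any data on the selected disk), DiskPart can create a partition with an explicit 1024-KB alignment from its interactive prompt:

diskpart
DISKPART> select disk 1
DISKPART> create partition primary align=1024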

Diskpar.exe reliably reports partition alignment in terms of bytes.

However, results are valid only for MBR basic disks, and this tool is no longer supported by Microsoft.

Diskpart.exe reports alignment for basic disks in terms of kilobytes.

As noted, the Windows Server 2003 (and earlier) default alignment is 32,256 bytes, exactly 31.5 KB; unfortunately DiskPart rounds this up to 32 KB.

Though DiskPart is the tool of choice to implement partition alignment, the value it reports for partition offset is not sufficiently granular.

Therefore, use the wmic command to report partition offsets of basic disks; use dmdiag –v for Windows dynamic disks.
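
For example, the following command lists the starting offset, in bytes, of every partition on the system's basic disks:

wmic partition get BlockSize, StartingOffset, Name, Index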

Implementation 

This section provides information about using specific measurements and tools to implement the recommendations presented earlier in this white paper.

It also addresses issues to consider regarding vendors.

Essential Correlations: Partition Offset, File Allocation Unit Size, and Stripe Unit Size

Use the information in this section to confirm the integrity of disk partition alignment configuration for existing partitions and new implementations.

There are two correlations which, when satisfied, are a fundamental precondition for optimal disk I/O performance.

The results of the following calculations must result in an integer value:

Partition_Offset ÷ Stripe_Unit_Size

Stripe_Unit_Size ÷ File_Allocation_Unit_Size

Of the two, the first is by far the most important for optimal performance.

The following demonstrates a common misalignment scenario:

Given a starting partition offset of 32,256 bytes (31.5 KB) and a stripe unit size of 65,536 bytes (64 KB), the result is 0.4921875.

This is not an integer; therefore the offset and stripe unit size are not correlated.

This is consistent with misalignment.

However, a starting partition offset of 1,048,576 bytes (1 MB) and a stripe unit size of 65,536 bytes produces a result of exactly 16, an integer, which is consistent with alignment.

Note that file allocation unit (cluster) size commonly correlates with common stripe unit sizes.

The performance question here is usually not one of correlation per the formula, but whether the cluster size is the NTFS default of 4,096 bytes or has been explicitly defined at 64 KB, which is a best practice for SQL Server.

 Stripe Unit Size. 

Windows does not have a reliable way to determine stripe unit sizes.

These values are obtained from vendor disk management software or from your SAN administrator.

File Allocation Unit Size. 

Run this command for each drive to see the file allocation unit size reported in bytes per cluster.

fsutil fsinfo ntfsinfo c:
fsutil fsinfo ntfsinfo e:
etc...

An appropriate value for most installations should be 65,536 bytes (that is, 64 KB) for partitions on which SQL Server data or log files reside.
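
If a volume reports the 4,096-byte default and must be reformatted (a destructive operation; the drive letter below is hypothetical), the 64-KB file allocation unit size can be specified at format time:

format E: /FS:NTFS /A:64K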

 Vendor Considerations. 

Consult the storage vendor for their recommendation.

In the absence of definitive information, consider implementing disk partition offset with the Windows Server 2008 default of 1024 KB. 




Conclusion. 

Many factors contribute to optimal disk I/O performance.

For disk partitions created with Windows Server 2003, partition alignment properly correlated with stripe unit size and file allocation unit size is a best practice, and it provides a fundamental foundation for optimal performance.

Windows Server 2008 aligns partitions by default.

When servers are upgraded to Windows Server 2008, preexisting partitions are not automatically aligned and must be rebuilt for optimal performance.

Thus, until existing misaligned partitions created using Windows Server 2003 are rebuilt properly, disk partition alignment will remain a relevant technology.

Installing SCCM with a Clustered SQL Server


Overview.

Traditionally there has never been a High Availability solution for SMS; the only option available was to back up the databases.

Thankfully System Center Configuration Manager 2007 (SCCM 2007) now has the ability to support a Clustered SQL Server Instance.

Unfortunately, the documentation for how to install SCCM 2007 with clustered SQL Server is poor and can lead to problems with the installation and application.

Requirements pre SCCM Installation. 

Ensure Kerberos is enabled on SQL Cluster.

Ensure Kerberos is enabled on each of the SQL Cluster Nodes.

Although Kerberos must be enabled on the cluster for normal SCCM operation, the SCCM pre-requisite checker apparently does not check for this.

As a result it allows the install to go ahead and ends up installing directly to the individual nodes instead of to the SQL cluster.

This is because SCCM cannot see the cluster, since Kerberos is not enabled.

When using a clustered SQL Server instance to host the site database, TCP/IP network protocol communication is required for each SQL Server cluster node network connection to support Kerberos authentication.

The named pipes communication protocol is not required, but can be used to troubleshoot Kerberos authentication issues.

Create Dummy Files on Each Disk.

Create a dummy file named NO_SMS_ON_DRIVE.sms on all disks in the cluster, except %systemdrive%.

This restricts the SCCM installer to installing components on the SQL cluster instance's %systemdrive% only.

Ensure SCCM Machine Has Rights in SQL.

Ensure primary site server has admin rights on each of the SQL Cluster Nodes.

If the local admins group has been removed from the sysadmin SQL Server role (SQL Server best practice), then we need to add the site server computer account to that group as well as the local admins group.

Site component manager needs admin rights on each node to install the SCCM backup and recovery components on the SQL Server nodes.

Note. This must also be performed for every new node added to the SQL cluster in the future.

Register SQL Server Cluster Virtual Name in Active Directory.

By default, Windows does not register SQL Server Cluster Virtual Name in Active Directory.

To facilitate a successful installation of SCCM 2007 on Clustered SQL Server, it must be registered.

Run the setspn.exe utility (it can be downloaded from the Microsoft Download Center) as follows:-


Execute setspn -A MSSQLSvc/SQLSERVER1:1433 DOMAIN\SQLServiceAccount

Execute setspn -A MSSQLSvc/SQLSERVER1.mydomain.com:1433 DOMAIN\SQLServiceAccount

Where:-

SQLSERVER1 is the virtual name
1433 is the SQL port
SQLSERVER1.mydomain.com is the fully qualified domain name for the SQL Server cluster
DOMAIN\SQLServiceAccount is the account under which the clustered SQL Server service runs


Support for SQL Server 2008. 

SQL Database Engine. 

Microsoft added support for SQL Server 2008 to SCCM; along with this update, they also made changes to the admin console OSD driver management node (under Operating System Deployment -> Drivers).

This results in an error in the Drivers node when running SCCM 2007 R2 SP2 with SQL Server 2008.

To overcome this, there is a hotfix available:-


SQL Reporting Services. 

The Microsoft SCCM 2007 R2 SQL Reporting Services feature provides a set of tools and resources that help you use the advanced reporting capabilities of SQL Reporting services from the Configuration Manager console.

Prerequisites for SQL Reporting Services.

Before we can use SQL Reporting Services in Configuration Manager 2007, we must configure a reporting services point.

The reporting services point is a site system role that must be configured on a server running Microsoft SQL Server with the Reporting Services component installed.

For more information about configuring Reporting Services in SQL Server, refer to the SQL Server documentation.

When you install a SQL Reporting Services Point, Configuration Manager 2007 queries Windows Management Instrumentation (WMI) for installed instances of SQL Reporting Services.

The report folder is installed on the first instance of SQL Reporting Services found by the WMI query.

If you have installed multiple instances of SQL Reporting Services and are unsure about whether the instance on which you want to install the report folder is the first stored in WMI, it is recommended that you install the Reporting Services Point on a computer running only one instance of SQL Reporting Services.


 To Create a Reporting Services Point.


In the Configuration Manager console, navigate to System Center Configuration Manager / Site Database / Site Management / <site code>-<site name> / Site Settings / Site Systems.

Determine whether to create a new site system or to add the reporting services point site role to an existing site system, and then follow the associated step.

To create a new site system and add the reporting services point role:

Right-click Site Systems, click New, and then click Server.

On the General page of the New Site Role Wizard, specify the name of the server that will be used to host the reporting services point.


To add the reporting services point role to an existing site system:

Right-click the site system name, then click New Roles.

On the General page of the New Site Role Wizard, configure the general settings for this site system, and then click Next. For information about the options you can specify on this page, see How to Add New Site System Roles.

On the System Role Selection page of the wizard, select Reporting Services point, and then click Next.

On the Reporting Services Point page, specify the folder that will be created on the report server to contain the SQL Reporting Services reports used in Configuration Manager and then click Next.

Note: The default name for this folder is Reports.

Review the information shown on the Summary page, and then click Next.

Click Close to exit the wizard.


How to Install Configuration Manager Using a Clustered SQL Server Instance.


The Configuration Manager 2007 site database can be installed using a clustered virtual Microsoft SQL Server instance during setup by using the following procedures.

Note: During Configuration Manager 2007 Setup, the Volume Shadow Copy Service (VSS) writer will be installed on each physical computer node of the Microsoft Windows Server cluster to support the Backup Site Server maintenance task.

To install Configuration Manager using a clustered SQL Server instance to host the site database:

  1. Create the virtual SQL Server cluster to host the site database on an existing Windows Server cluster environment.
  2. Add the machine account of the primary site server machine to the Local Administrators group of each Windows Server cluster node computer. This is required to allow the site server to install and configure settings later.
  3. In the virtual SQL Server instance, assign the sysadmin SQL Server role to the user account running Configuration Manager Setup.
  4. Start Configuration Manager Setup, and select to install Configuration Manager using custom settings.
  5. On the Configuration Manager Database Server page of the Configuration Manager Setup Wizard, enter the name of the clustered virtual SQL Server instance to host the site database and the name for the site database to be created on that instance by setup.
Important
During setup, you must enter the name of the virtual SQL Server cluster instance, and not the virtual Windows Server name created by the Windows Server cluster. Installing the site database using the Windows Server cluster virtual instance name will result in the site database being installed on the local hard drive of the active Windows Server cluster node, and it will prevent successful failover if that node fails.

 


To verify that the site database was installed successfully:


  1. Verify that Configuration Manager 2007 Setup completed successfully by reviewing the ConfigMgrSetup.log file located at the root of the system drive on the primary site server computer.
  2. In the SQL Server Management Studio console, verify that the site database was created successfully.
  3. In the SQL Server Management Studio console, verify that the following roles were created for the site database:
    • smsdbrole_MP
    • smsdbrole_SLP
    • smsdbrole_RP
    • smsdbrole_PSP
    • smsdbrole_DMP
    • smsdbrole_siteprovider
    • smsdbrole_siteserver
  4. Verify that the following SQL Server database roles for the site database have been assigned with the appropriate machine accounts:
    • smsdbrole_MP (if management point installation was selected)
    • smsdbrole_siteprovider (the site server or remote provider that was specified)
    • smsdbrole_siteserver (the primary site server computer)





SQL Server 2008 Top 20 Enterprise-Only Features


Overview.

You can find many articles, white papers, and books demonstrating the myriad features found in SQL Server 2008.

However it is sometimes easy to forget that there’s a big disparity in the features that the different editions of SQL Server 2008 support.

Chances are that the exciting new features you're reading about are supported only by SQL Server 2008 Enterprise Edition or SQL Server 2008 Developer Edition.

(Although Enterprise Edition and Developer Edition share the same feature set, Developer Edition is licensed solely for development work and can’t be used for production work.)

The following is a list of the 20 most important features.


1       Hot-add CPU. 

Recognizes newly added CPUs without a restart.

2       Hot-add RAM 

Recognizes additional RAM without a restart.

3       More instances. 

Up to 50 named instances (other editions support only 16).

4       Data Compression. 

Automatically compresses database data.

5       Transparent Database Encryption. 

Encrypts databases without making application changes.

6       Resource Governor. 

Allocates system resources per workload.

7       Partitioning.

Divides large tables and indexes into multiple file groups for better performance.

8       Partition Table Parallelism.

Uses separate threads for queries over multiple partitions.

9       Asynchronous Mirroring Mode.

SQL Server 2008 Standard Edition supports only Synchronous database mirroring.

10    More Fail-over Clustering Nodes.

Up to 16 nodes (Standard Edition supports two nodes).

11    Database Snapshots.

For capturing point-in-time database copies.

12    Fast Recovery.

System availability at the end of the transaction-log roll-forward phase.

13    On-line Indexing.

Rebuilds indexes while the base table is in use.

14    On-line Restore.

Restores file groups while a database is active.

15    Distributed Partitioned Views.

Creates scale-out clusters by dividing tables between multiple SQL Server systems.

16    Filtered Indexes.

Lets you selectively index column values.

17    Oracle Replication Publishing.

Lets Oracle act as replication publisher.

18    Peer-to-Peer (P2P) Transactional Replication.

Replicates data changes to all nodes on the network.

19    Advanced Transformations.

Adds SQL Server Integration Services transformations such as Fuzzy Lookup and Data Mining.

20    Change Data Capture.

Ability to track changes on a table and capture them to a mirrored table.

SQL Server Partitioned Tables and Indexes


 What are partitions and why use them?

The simple answer is:

To improve the scalability and manageability of large tables and tables that have varying access patterns.

Typically, we create tables to store information about an entity, such as customers or sales, and each table has attributes that describe only that entity.

While a single table for each entity is the easiest to design and understand, these tables are not necessarily optimized for performance, scalability, and manageability, particularly as the table grows larger.

What constitutes a large table?

While the size of a very large database (VLDB) is measured in hundreds of gigabytes, or even terabytes, the term does not necessarily indicate the size of individual tables within the database.

A large database is one that does not perform as desired or one in which the operational or maintenance costs have gone beyond predefined maintenance or budgetary requirements.

These requirements also apply to tables; a table can be considered large if the activities of other users or maintenance operations have a limiting effect on availability.

For example, the sales table is considered large if performance is severely degraded or if the table is inaccessible during maintenance for two hours each day, each week, or even each month.

In some cases, periodic downtime is acceptable, yet it can often be avoided or minimized by better design and partitioning implementations.

While the term VLDB applies only to a database, for partitioning, it is more important to look at table size.

In addition to size, a table with varying access patterns might be a concern for performance and availability when different sets of rows within the table have different usage patterns.

Although usage patterns may not always vary (and this is not a requirement for partitioning), when usage patterns do vary, partitioning can result in additional gains in management, performance, and availability. 

Again, to use the example of a sales table, the current month's data might be read-write, while the previous month's data (and often the larger part of the table) is read-only. In a case like this, where data usage varies, or in cases where the maintenance overhead is overwhelming as data moves in and out of the table, the table's ability to respond to user requests might be impacted. This, in turn, limits both the availability and the scalability of the server.

Additionally, when large sets of data are being used in different ways, frequent maintenance operations are performed on static data.

This can have costly effects, such as performance problems, blocking problems, backups (space, time, and operational costs) as well as a negative impact on the overall scalability of the server.

How can partitioning help?

When tables and indexes become very large, partitioning can help by partitioning the data into smaller, more manageable sections.

This document focuses on horizontal partitioning, in which large groups of rows will be stored in multiple separate partitions.

The definition of the partitioned set is customized, defined, and managed according to our needs.

SQL Server allows us to partition our tables based on specific data usage patterns using defined ranges or lists.

SQL Server also offers numerous options for the long-term management of partitioned tables and indexes by the addition of features designed around the new table and index structure.

Furthermore, if a large table exists on a system with multiple CPUs, partitioning the table can lead to better performance through parallel operations.

The performance of large-scale operations across extremely large data sets (for instance many million rows) can benefit by performing multiple operations against individual subsets in parallel.

An example of performance gains over partitions can be seen in previous releases with aggregations.

For example, instead of aggregating a single large table, SQL Server can work on partitions independently, and then aggregate the aggregates.

In SQL Server, queries joining large datasets can benefit directly from partitioning;

SQL Server can join the data that resides on the same partitions first and then combine the results.

This allows SQL Server to more effectively use multiple-CPU computers. 

Partitioned Tables.

While the improvements in earlier versions of SQL Server significantly enhanced performance when using partitioned views, they did not simplify the administration, design, or development of a partitioned data set.

When using partitioned views, all of the base tables (on which the view is defined) must be created and managed individually.

Application design is easier and users benefit by not needing to know which base table to directly access, but administration is complex as there are numerous tables to manage and data integrity constraints must be managed for each table.

Because of the management issues, partitioned views were often used to separate tables only when data needed to be archived or loaded.

When data was moved into the read-only table or when data was deleted from the read-only table, the operations were expensive—taking time, log space, and often creating blocking.

Additionally, because partitioning strategies in previous versions required the developer to create individual tables and indexes and then UNION them through views, the optimizer was required to validate and determine plans for each partition (as indexes could have varied).

Therefore, query optimization time in SQL Server 2000 often goes up linearly with the number of processed partitions.

In SQL Server 2005, each partition has the same indexes by definition.

For example, consider a scenario in which the current month of On-line Transaction Processing (OLTP) data needs to be moved into an analysis table at the end of each month.

The analysis table (to be used for read-only queries) is a single table with one clustered index and two non clustered indexes; the bulk load of 1 gigabyte (GB) (into the already indexed and active single table) creates blocking with current users as the table and/or indexes become fragmented and/or locked.

Additionally, the loading process will take a significant amount of time, as the table and indexes must be maintained as each row comes in.

There are ways to speed up the bulk load; however, these can directly affect all other users, and sacrifice concurrency for speed.

If this data were isolated into a newly created (empty) and unindexed (heap) table, the load could occur first and then the indexes could be created after loading the data.

Often you will realize gains in performance of ten times or better by using this scheme.

In fact, by loading into an unindexed table you can take advantage of multiple CPUs by loading multiple data files in parallel or by loading multiple chunks from the same file (defined by starting and ending row position).

Since both operations can benefit from parallelism, this can yield even further gains in performance.

In any release of SQL Server, partitioning allows you this more granular control, and does not require that you have all data in one location; however, there are many objects to create and manage.

A functional partitioning strategy could be achieved in previous releases by dynamically creating and dropping tables and modifying the UNION view.

However, in SQL Server 2005 the solution is more elegant: you can simply switch in the newly filled partition(s) as an extra partition of the existing partition scheme and switch out any old partition(s).

From end to end, the process takes only a short period of time and can be made more efficient by using parallel bulk loading and parallel index creation.

More importantly, the partition is manipulated outside of the scope of the table so there is no effect on the queried table until the partition is added.

The result is that, typically, adding a partition takes only a few seconds.

The performance improvement is also significant when data needs to be removed.

If one database needs a sliding-window set of data in which new data is migrated in (for example, the current month), and the oldest data is removed (maybe the parallel month from the previous year), the performance of this data migration can be improved by orders of magnitude through the use of partitioning.

While this may seem extreme, consider the difference without partitioning; when all of the data is in a single table, deleting 1 GB of old data requires row-by-row manipulation of the table, as well as its associated indexes.

The process of deleting data creates a significant amount of log activity, does not allow log truncation for the duration of the delete (note that the delete is a single auto-commit transaction; however, you can control the size of the transaction by performing multiple deletes where possible) and therefore requires a potentially much larger log.

Using partitioning, however, removing that same data requires removing the specific partition from a partitioned table (which is a metadata operation) and then dropping or truncating the standalone table.

Moreover, without knowing how to best design partitions, one might not be aware that the use of file groups in conjunction with partitions is ideal for implementing partitioning.
File groups allow us to place individual tables on different physical disks.

If a single table spans multiple files (using file groups) then the physical location of data cannot be predicted.

For systems where parallelism is not expected, SQL Server improves performance by using all disks more evenly through file groups, making specific placement of data less critical.

SQL Server continues to balance allocations among all of the objects within that file group.
While SQL Server runs more effectively when using more disks for a given operation, using more disks is not as optimal from a management or maintenance perspective, particularly when usage patterns are very predictable (and isolated).
Since the data does not have a specific location on disk, you don't have the ability to isolate the data for maintenance such as backup operations.
With partitioned tables in SQL Server, a table can be designed (using a function and a scheme) such that all rows that have the same partitioning key are placed directly on (and will always go to) a specific location.
The function defines the partition boundaries, as well as the partition in which the first value should be placed.
In the case of a LEFT partition function, the first value will act as an upper boundary in the first partition.
In the case of a RIGHT partition function, the first value will act as a lower boundary in the second partition (partition functions will be covered in more detail later in this paper).
Once the function is defined, a partition scheme can be created to define the physical mapping of the partitions to their location within the database—based on a partition function.
When multiple tables use the same function (but not necessarily the same scheme), rows that have the same partitioning key will be grouped similarly.
This concept is called alignment.
By aligning rows with the same partition key from multiple tables on the same or different physical disks, SQL Server can, if the optimizer chooses, work with only the necessary groups of data from each of the tables.
To achieve alignment, two partitioned tables or indexes must have some correspondence between their respective partitions.
They must use equivalent partition functions with respect to the partitioning columns. Two partition functions can be used to align data when:

  • Both partition functions use the same number of arguments and partitions.
  • The partitioning key used in each function is of equal type (includes length, precision and scale if applicable, and collation if applicable).
  • The boundary values are equivalent (including the LEFT/RIGHT boundary criteria).

Note: Even when two partition functions are designed to align data, we could end up with an unaligned index if it is not partitioned on the same column as the partitioned table.

Collocation is a stronger form of alignment, where two aligned objects are joined with an equi-join predicate where the equi-join is on the partitioning column.
This becomes important in the context of a query, sub-query, or other similar construct where equi-join predicates may occur.
Collocation is valuable because queries that join tables on the partition columns are generally much faster.
Consider Orders and OrderDetails tables.
Instead of filling the files proportionally, you can create a partition scheme that maps to three file groups.
When defining the Orders and OrderDetails tables, you define them to use the same scheme.
Related data that has the same value for the partition key will be placed within the same file, isolating the necessary data for the join.
When related rows from multiple tables are partitioned in the same manner, SQL Server can join the partitions without having to search through an entire table or multiple partitions (if the table were using a different partitioning function) for matching rows.
In this case, the objects are not only aligned because they use the same key, but they are storage-aligned because the same data resides within the same files. 

Storage-aligned tables.

SQL Server allows partitioning based on ranges, and tables and indexes can be designed to use the same scheme for better alignment.

Good design significantly improves overall performance, but what if the usage of the data changes over time?
What if an additional partition is needed?
Administrative simplicity in adding partitions, removing partitions, and managing partitions outside of the partitioned table were major design goals for SQL Server 2005.
SQL Server 2005 introduced simplified partitioning with administration, development, and usage in mind.
Some of the performance and manageability benefits are:
  • Simplify the design and implementation of large tables that need to be partitioned for performance or manageability purposes.
  • Load data into a new partition of an existing partitioned table with minimal disruption in data access in the remaining partitions.
  • Load data into a new partition of an existing partitioned table with performance equal to loading the same data into a new, empty table.
  • Archive and/or remove a portion of a partitioned table while minimally impacting access to the remainder of the table.
  • Allow partitions to be maintained by switching partitions in and out of the partitioned table.
  • Allow better scaling and parallelism for extremely large operations over multiple related tables.
  • Improve performance over all partitions.
  • Improve query optimization time because each partition does not need to be optimized separately. 

Definitions and Terminology.

To implement partitions in SQL Server 2005 and above, you must be familiar with a few new concepts, terms, and syntax.

To understand these new concepts, let's first review a table's structure with regard to creation and placement.

In previous releases, a table was always both a physical and a logical concept, yet with SQL Server 2005 partitioned tables and indexes you have multiple choices for how and where you store a table.

In SQL Server 2005, tables and indexes can be created with the same syntax as previous releases—as a single tabular structure that is placed on the DEFAULT file group or a user-defined file group.

Additionally, in SQL Server 2005, tables and indexes can be created on a partitioning scheme.

The partitioning scheme maps the object to one or more filegroups.
To determine which data goes to the appropriate physical location(s), the partitioning scheme uses a partitioning function.

The partitioning function defines the algorithm to be used to direct the rows and the scheme associates the partitions with their appropriate physical location (i.e., a file group).

In other words, the table is still a logical concept but its physical placement on disk can be radically different from earlier releases; the table can have a scheme. 


Range Partitions.

 Range partitions are table partitions that are defined by specific and customizable ranges of data.

The range partition boundaries are chosen by the developer, and can be changed as data usage patterns change.

Typically, these ranges are date-based or based on ordered groupings of data.

The primary use of range partitions is for data archiving, decision support (when often only specific ranges of data are necessary, such as a given month or quarter), and for combined OLTP and Decision Support Systems (DSS) where data usage varies over the life cycle of a row.

The biggest benefit to a SQL Server 2005 partitioned table and index is the ability to manipulate very specific ranges of data, especially related to archiving and maintenance.

With range partitions, old data can be archived and replaced very quickly.
Range partitions are best suited when data access is typically for decision support over large ranges of data.

In this case, it matters where the data is specifically located so that only the appropriate partitions are accessed, when necessary.

Additionally, as transactional data becomes available you can add the data easily and quickly.

Range partitions are initially more complex to define, as you will need to define the boundary conditions for each of the partitions.

Additionally, you will create a scheme to map each partition to one or more filegroups.

However, they often follow a consistent pattern so once defined, they will likely be easy to maintain programmatically.

Defining the Partitioning Key.

The first step in partitioning tables and indexes is to define the data on which the partition is keyed.

The partition key must exist as a single column in the table and must meet certain criteria.

The partition function defines the data type on which the key (also known as the logical separation of data) is based.

The function defines this key but not the physical placement of the data on disk. The placement of data is determined by the partition scheme.

In other words, the scheme maps the data to one or more file groups that map the data to specific file(s) and therefore disks.

The scheme always uses a function to do this: if the function defines five partitions, then the scheme must use five file groups.

The file groups do not need to be different; however, you will get better performance when you have multiple disks and, preferably, multiple CPUs.

When the scheme is used with a table, you will define the column that is used as an argument for the partition function.
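
As a minimal sketch (all names and boundary values here are hypothetical), a function with four boundary points defines five partitions, so the scheme that uses it must supply five file groups:

-- Four boundary points => five partitions.
CREATE PARTITION FUNCTION QuarterPFN (datetime)
AS RANGE LEFT FOR VALUES ('20031231', '20040331', '20040630', '20040930')

-- Five partitions => five file groups (they need not be distinct).
CREATE PARTITION SCHEME QuarterScheme
AS PARTITION QuarterPFN
TO ([FG1], [FG2], [FG3], [FG4], [FG5])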

For range partitions, the dataset is divided by a logical and data-driven boundary.

In fact, the data partitions may not be truly balanced.

Data usage dictates a range partition when the table is used in a pattern that defines specific boundaries of analysis (also known as ranges).

The partition key for a range function can consist of only one column, and the partitioning function will include the entire domain, even when that data may not exist within the table (due to data integrity/constraints).

In other words, boundaries are defined for each partition but the first partition and the last partition will, potentially, include rows for the extreme left (values less than the lowest boundary condition) and for the extreme right (values greater than the highest boundary condition).

Therefore, to restrict the domain of values to a specific dataset, partitions must be combined with CHECK constraints.

Using check constraints to enforce your business rules and data integrity constraints allows you to restrict the dataset to a finite range rather than infinite range.

Range partitions are ideal when maintenance and administration requires archiving large ranges of data on a regular basis and when queries access a large amount of data that is within a subset of the ranges.

Index Partitioning.

In addition to partitioning a table's dataset, you can partition indexes.

Partitioning both the table and its indexes using the same function often optimizes performance.

When the indexes and the table use the same partitioning function and columns in the same order, the table and index are aligned.

If an index is created on an already partitioned table, SQL Server automatically aligns the new index with the table's partitioning scheme unless the index is explicitly partitioned differently.

When a table and its indexes are aligned, then SQL Server can move partitions in and out of partitioned tables more effectively, because all related data and indexes are divided with the same algorithm.

When the tables and indexes are defined with not only the same partition function but also the same partition scheme, they are considered to be storage-aligned.

One benefit to storage alignment is that all data within the same boundary is located on the same physical disk(s).

In this case, backups can be isolated to a certain time-frame and your strategies can vary, in terms of frequency and backup type, based on the volatility of the data.

Additional gains can be seen when tables and indexes on the same file or file group are joined or aggregated.

SQL Server benefits from parallelizing an operation across partitions.

In the case of storage alignment and multiple CPUs, each processor can work directly on a specific file or file group with no conflicts in data access because all required data is on the same disk.

This allows more processes to run in parallel without interruption.

Special Conditions for Partitions: Split, Merge, and Switch.

To aid in the usage of partitioned tables, several new features and concepts related to partition management have been introduced.

Because partitions are used for large tables that scale, the number of partitions chosen when the partition function was created may need to change over time.

You can use the ALTER PARTITION FUNCTION statement with the new split option to add another partition to the table.

When a partition is split, data can be moved to the new partition; but to preserve performance, rows should not move.

This scenario is described in the case study later in this document.

Conversely, to remove a partition, perform a switch-out for the data and then merge the boundary point.

In the case of range partitions, a merge request is made by stating which boundary point should be removed.

Where only a specific period of data is needed, and data archiving is occurring on a regular basis (for example, monthly), you might want to archive one partition of data (the earliest month) when the data for the current month becomes available.

For example, you might choose to have one year of data available, and at the end each month you switch in the current month and then switch out the earliest month, differentiating between the current month's read/write OLTP versus the previous month's read-only data.

There is a specific flow of actions that makes the process most efficient, as illustrated by the following scenario.

You are keeping a year's worth of read-only data available.

Currently, the table holds data from September 2003 through August 2004.

The current month of September 2004 is in another database, optimized for OLTP performance.

In the read-only version of the table, there are 13 partitions: twelve partitions which contain data (September 2003 through August 2004) and one final partition that is empty.

This last partition is empty because a range partition always includes the entire domain—both the extreme left as well as the extreme right.

And if you plan to manage data in a sliding-window scenario, you'll always want to have an empty partition to split into which new data will be placed. 



In a partition function defined with LEFT boundary points, the empty partition will logically exist to the far RIGHT.

Leaving a partition empty at the end will allow you to split the empty partition (for the new data coming in) and not need to move rows from the last partition (because none exist) to the new file group that's being added (when the partition is split to include another chunk of data).

This is a fairly complex concept that will be discussed in greater detail in the case study later in this document, but the idea is that all data additions or deletions should be metadata-only operations.

To ensure that metadata-only operations occur, you will want to strategically manage the changing part of the table.

To ensure that this partition is empty, you will use a check constraint to restrict this data in the base table.

In this case, the OrderDate should be on or after September 1, 2003 and before September 1, 2004.

If the last defined boundary point is August 31 at 11:59:59.997 (more on why 997 is coming up), then the combination of the partition function and this constraint will keep the last partition empty.
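
A minimal sketch of such a constraint, assuming a hypothetical OrdersRange table with an OrderDate column, might look like this:

-- Restrict OrderDate to the twelve months the data partitions hold,
-- which keeps the final, rightmost partition empty.
ALTER TABLE OrdersRange
ADD CONSTRAINT OrdersRangeYearCK
CHECK (OrderDate >= '20030901'
   AND OrderDate <= '20040831 23:59:59.997')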

While these are only concepts, it is important to know that split and merge are handled through ALTER PARTITION FUNCTION and switch is handled through ALTER TABLE. 

Range partition boundaries before data load/archive.

When October begins (in the OLTP database), September's data should move to the partitioned table, which is used for analysis.

The process of switching the tables in and out is a very fast process, and the preparation work can be performed outside of the partitioned table.

This scenario is explained in depth in the case study later in this document, but the idea is that you will be using "staging tables" that will eventually become partitions within the partitioned table.

In this process, you will switch out a partition of a table to a non partitioned table within the same file group.

Because the non partitioned table already exists within the same file group (and this is critical for success), SQL Server can make this switch as a metadata change.

As a metadata-only change, this can occur within seconds as opposed to running a delete that might take hours and create blocking in large tables.

Once this partition is switched out, you will still have 13 partitions; the first (oldest) partition is now empty and the last (most recent, also empty) partition needs to be split.

Switching a partition out. 

To remove the oldest partition (September 2003), use the new merge option with ALTER PARTITION FUNCTION.

Merging a boundary point effectively removes a boundary point, and therefore a partition.

This reduces the number of partitions into which data loads to n-1 (in this case, 12).

Merging a partition should be a very fast operation when no rows have to be moved (because the boundary point being merged has no data rows).

In this case, because the first partition is empty, none of the rows need to move from the first to the second partition.

If the first partition is not empty and the boundary point is merged, then rows have to move from the first to the second partition, which can be a very expensive operation.

However, this is avoided in the most common sliding-window scenario where an empty partition is merged with an active partition, and no rows move.

 Merging a partition.

Finally, the new table has to be switched in to the partitioned table.

In order for this to be performed as a metadata change, loading and building indexes must happen in a new table, outside of the bounds of the partitioned table.

To switch in the partition, first split the last, most recent, and empty range into two partitions.

Additionally, you need to update the table's constraint to allow the new range.

Once again, the partitioned table will have 13 partitions.

In the sliding window scenario, the last partition with a LEFT partition function will always remain empty.

Splitting a partition.

Finally, the newly loaded data is ready to be switched in to the twelfth partition, September 2004. 



Range partition boundaries after data load/archive.
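
Pulling the scenario together, the full monthly cycle might be sketched as follows (the table, function, scheme, and file group names are all hypothetical; the archive and staging tables must exist on the same file groups as the partitions they exchange with):

-- 1. Switch the oldest month (partition 1) out to an empty archive table
--    on the same file group (a metadata-only operation).
ALTER TABLE SalesRange SWITCH PARTITION 1 TO SalesSept2003Archive

-- 2. Merge the now-empty boundary point for September 2003.
ALTER PARTITION FUNCTION SalesPFN() MERGE RANGE ('20030930 23:59:59.997')

-- 3. Mark the file group the new partition will use, then split the
--    empty rightmost partition to create a range for September 2004.
ALTER PARTITION SCHEME SalesScheme NEXT USED [FGSept2004]
ALTER PARTITION FUNCTION SalesPFN() SPLIT RANGE ('20040930 23:59:59.997')

-- 4. Switch the loaded, indexed staging table in as the new partition.
ALTER TABLE SalesSept2004Staging SWITCH TO SalesRange PARTITION 12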

Because only one partition can be added or removed at a time, tables that need to have more than one partition added or removed should be recreated.

To change to this new partitioning structure, first create the new partitioned table and then load the data into the newly created table.

This is a more optimal approach than rebalancing the whole table for each split.
This process is accomplished by using a new partition function, a new partition scheme, and then by moving the data to the newly partitioned table.

To move the data, first copy the data using INSERT newtable SELECT columnlist FROM oldtable and then drop the original tables.

Prevent user modifications while this process is running to help ensure against data loss.


Steps for Creating Partitioned Tables.

Now that you have an understanding of the value of partitioned tables, the next section details the process of implementing a partitioned table, and the features that contribute to this process. 

Determine If Object Should Be Partitioned.

While partitioning can offer great benefits, it adds administrative overhead and complexity to the implementation of your objects, which can be more of a burden than a gain.

Specifically, you might not want to partition a small table, or a table that currently meets performance and maintenance requirements.

The sales scenario mentioned earlier uses partitioning to relieve the burden of moving rows and data—you should consider whether your scenario has this sort of burden when deciding whether to implement partitioning.

Determine Partitioning Key and Number of Partitions.

If you are trying to improve performance and manageability for large subsets of data and there are defined access patterns, range partitioning can alleviate contention, and it can reduce maintenance because partitions holding read-only data no longer require it.

To determine the number of partitions, you should evaluate whether or not logical groupings and patterns exist within your data.

If you often work with only a few of these defined subsets at a time, then define your ranges so the queries are isolated to work with only the appropriate data (i.e. only the specific partition). 


Determine If Multiple Filegroups Should Be Used.

To help optimize performance and maintenance, you should use filegroups to separate your data.

The number of filegroups is partially dictated by hardware resources: it is generally best to have the same number of filegroups as partitions, and these filegroups often reside on different disks.

However, spreading filegroups across disks primarily pertains to systems where analysis tends to be performed over the entire dataset.

When you have multiple CPUs, SQL Server can process multiple partitions in parallel and therefore significantly reduce the overall processing time of large complex reports and analysis.

In this case, you can have the benefit of parallel processing as well as switching partitions in and out of the partitioned table.


Create Filegroups.

If you want to place a partitioned table on multiple files for better I/O balancing, you will need to create at least one filegroup.

Filegroups can consist of one or more files, and each partition must map to a filegroup.

A single filegroup can be used for multiple partitions, but for better data management, such as more granular backup control, you should design your partitioned tables so that only related or logically grouped data resides on the same filegroup.

Using ALTER DATABASE, you can add a logical filegroup name, and then add files.

To create a filegroup named 2003Q3 for the AdventureWorks database, use ALTER DATABASE in the following way:

ALTER DATABASE AdventureWorks ADD FILEGROUP [2003Q3]

Once a filegroup exists, you use ALTER DATABASE to add files to the filegroup.

ALTER DATABASE AdventureWorks
ADD FILE 
  (NAME = N'2003Q3',
  FILENAME = N'C:\AdventureWorks\2003Q3.ndf',
  SIZE = 5MB,
  MAXSIZE = 100MB,
  FILEGROWTH = 5MB)
TO FILEGROUP [2003Q3]

A table can be created on a filegroup, and therefore on its file or files, by specifying the filegroup in the ON clause of CREATE TABLE.

However, a table cannot be created on multiple filegroups unless it is partitioned.

To create a table on a single filegroup, use the ON clause of CREATE TABLE.
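For example, a (hypothetical, abbreviated) table placed entirely on the 2003Q3 filegroup created above:

CREATE TABLE [dbo].[OrdersQ3]
(
   [PurchaseOrderID] [int] NOT NULL,
   [OrderDate] [datetime] NOT NULL
)
ON [2003Q3]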

To create a partitioned table, you must first have a functional mechanism for the partitioning.

The criteria on which you partition are logically separated from the table in the form of a partition function.

This partition function exists as a definition separate from the table, and this separation helps because multiple objects can use the same partition function.

Therefore, the first step in partitioning a table is to create the partition function.

Create the Partition Function for a Range Partition.

Range partitions must be defined with boundary conditions.

Moreover, no values can be eliminated from either end of the range, even if the table is restricted through a CHECK constraint; the leftmost and rightmost partitions always extend to the minimum and maximum values of the data type.

To allow for periodic switching of data into the table, you'll need a final, empty partition.

In a range partition, first define the boundary points: for five partitions, define four boundary point values and specify whether each value is the upper boundary of the first partition (LEFT) or the lower boundary of the second (RIGHT).

Based on the LEFT or RIGHT designation, one end partition will always lack an explicitly defined boundary point; in the sliding-window design, that extra partition is the one kept empty.

Specifically, if the first value (or boundary condition) of a partition function is '20001001', then the values within the bordering partitions will be:

For LEFT:
   first partition is all data <= '20001001'
   second partition is all data > '20001001'

For RIGHT:
   first partition is all data < '20001001'
   second partition is all data >= '20001001'

Since range partitions are likely to be defined on datetime data, you must be aware of the implications: a datetime value always has both a date and a time, and a date with no defined time implies a "0" time of 12:00 A.M. (00:00:00.000).

If LEFT is used with this type of data, then the data with a date of Oct 1, 12:00 A.M. will end up in the first partition and the rest of October's data in the second partition.

Logically, it is best to use beginning values with RIGHT and ending values with LEFT. 

These three clauses create logically identical partitioning structures:
RANGE LEFT FOR VALUES ('20000930 23:59:59.997',
               '20001231 23:59:59.997', 
               '20010331 23:59:59.997',
               '20010630 23:59:59.997')

OR
RANGE RIGHT FOR VALUES ('20001001 00:00:00.000', 
                  '20010101 00:00:00.000', 
                  '20010401 00:00:00.000',
                  '20010701 00:00:00.000')

OR
RANGE RIGHT FOR VALUES ('20001001', '20010101', '20010401', '20010701')

Note   Using the datetime data type does add a bit of complexity here, but you need to make sure you set up the correct boundary cases.

Notice the simplicity with RIGHT because the default time is 12:00:00.000 A.M.

For LEFT, the added complexity is due to the precision of the datetime data type.

The reason that 23:59:59.997 MUST be chosen is that datetime data does not guarantee precision to the millisecond.

Instead, datetime data is precise within 3.33 milliseconds.

In the case of 23:59:59.999, this exact time tick is not available; instead, the value is rounded to the nearest time tick, which is 12:00:00.000 A.M. of the following day. With this rounding, the boundaries will not be defined properly.

For datetime data, you must use caution with specifically supplied millisecond values.

Note   Partitioning functions also allow functions as part of the partition function definition.

You may use DATEADD(ms,-3,'20010101') instead of explicitly defining the time using '20001231 23:59:59.997'. 
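As a sketch, the LEFT boundary list shown earlier could be expressed with DATEADD instead of hand-typed millisecond values:

RANGE LEFT FOR VALUES (DATEADD(ms, -3, '20001001'),
               DATEADD(ms, -3, '20010101'),
               DATEADD(ms, -3, '20010401'),
               DATEADD(ms, -3, '20010701'))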




To store one-fourth of the Orders data in the four active partitions, each representing one calendar quarter, and create a fifth partition for later use (again, as a placeholder for sliding data in and out of the partitioned table), use a LEFT partition function with four boundary conditions:

CREATE PARTITION FUNCTION OrderDateRangePFN(datetime)
AS 
RANGE LEFT FOR VALUES ('20000930 23:59:59.997',
            '20001231 23:59:59.997',
            '20010331 23:59:59.997',
            '20010630 23:59:59.997')

Remember, four defined boundary points create five partitions. Review the data sets created by this partition function as follows:

Boundary point '20000930 23:59:59.997' as LEFT (sets the pattern): the leftmost partition will include all values <= '20000930 23:59:59.997'.

Boundary point '20001231 23:59:59.997': the second partition will include all values > '20000930 23:59:59.997' but <= '20001231 23:59:59.997'.

Boundary point '20010331 23:59:59.997': the third partition will include all values > '20001231 23:59:59.997' but <= '20010331 23:59:59.997'.

Boundary point '20010630 23:59:59.997': the fourth partition will include all values > '20010331 23:59:59.997' but <= '20010630 23:59:59.997'.

Finally, the fifth partition will include all values > '20010630 23:59:59.997'.
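To sanity-check which partition a given value falls into, you can query the $PARTITION function with the partition function's name; for example:

-- Returns 2: '20001015' falls into the second partition.
SELECT $PARTITION.OrderDateRangePFN('20001015')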


 Create the Partition Scheme. 

Once you have created a partition function, you must associate it with a partition scheme to direct the partitions to specific filegroups.

When you define a partition scheme, you must make sure to name a filegroup for every partition, even if multiple partitions will reside on the same filegroup.

For the range partition created previously (OrderDateRangePFN), there are five partitions; the last, and empty, partition will be created in the PRIMARY filegroup.

There is no need for a special location for this partition because it will never contain data.

CREATE PARTITION SCHEME OrderDatePScheme 
AS 
PARTITION OrderDateRangePFN 
TO ([2000Q3], [2000Q4], [2001Q1], [2001Q2], [PRIMARY])

Note   If all partitions will reside in the same filegroup, then a simpler syntax can be used as follows:

CREATE PARTITION SCHEME OrderDatePScheme 
AS 
PARTITION OrderDateRangePFN 
ALL TO ([PRIMARY])


Create the Partitioned Table.

With the partition function (the logical structure) and the partition scheme (the physical structure) defined, the table can be created to take advantage of them.

The table defines which scheme should be used, and the scheme defines the function.

To tie all three together, you must specify the column to which the partitioning function should apply.

Range partitions always map to exactly one column of the table, and that column's data type must match that of the boundary conditions defined within the partition function.

Additionally, if the table should specifically limit the data set (rather than allowing values from negative infinity to positive infinity), then a CHECK constraint should be added as well.

CREATE TABLE [dbo].[OrdersRange]
(
   [PurchaseOrderID] [int] NOT NULL,
   [EmployeeID] [int] NULL,
   [VendorID] [int] NULL,
   [TaxAmt] [money] NULL,
   [Freight] [money] NULL,
   [SubTotal] [money] NULL,
   [Status] [tinyint] NOT NULL,
   [RevisionNumber] [tinyint] NULL,
   [ModifiedDate] [datetime] NULL,
   [ShipMethodID] [tinyint] NULL,
   [ShipDate] [datetime] NOT NULL,
   [OrderDate] [datetime] NOT NULL
      CONSTRAINT OrdersRangeYear
         CHECK ([OrderDate] >= '20030701'
                AND [OrderDate] <= '20040630 23:59:59.997'),
   [TotalDue] [money] NULL
)
ON OrderDatePScheme (OrderDate)
GO


Create Indexes: Partitioned or Not?

By default, indexes created on a partitioned table will also use the same partitioning scheme and partitioning column.

When this is true, the index is aligned with the table.

Although not required, aligning a table and its indexes allows for easier management and administration, particularly with the sliding-window scenario.

For example, to create a unique index, the partitioning column must be one of the key columns; this ensures that SQL Server can check the appropriate partition to guarantee uniqueness.

Therefore, if you need to partition a table on one column and create a unique index on a different column, they cannot be aligned.

In this case, the index might be partitioned on the unique column (if it is a multi-column unique key, it could be partitioned on any of the key columns), or it might not be partitioned at all.

Be aware that such a nonaligned index has to be dropped and re-created when switching data in and out of the partitioned table.

Note   If you plan to load a table with existing data and add indexes to it immediately, you can often get better performance by loading into a nonpartitioned, unindexed table and then creating the indexes to partition the data after the load.

By defining a clustered index on a partition scheme, you will effectively partition the table after the load.

This is also a great way of partitioning an existing table.

To create the same table as a nonpartitioned table and create the clustered index as a partitioned clustered index, replace the ON clause of CREATE TABLE with a single filegroup destination.

Then, create the clustered index on the partition scheme after the data is loaded.
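As a sketch, after loading the (hypothetical) dbo.OrdersRange table created ON [PRIMARY], a clustered index built on the partition scheme partitions the data in place:

-- Creating the clustered index on the partition scheme moves the
-- heap's rows into the scheme's partitions and filegroups.
CREATE CLUSTERED INDEX IX_OrdersRange_OrderDate
ON [dbo].[OrdersRange] ([OrderDate])
ON OrderDatePScheme ([OrderDate])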