Recently, I had to put together a tool to verify data (in Microsoft SQL server database tables) in no time. Even though the time was very short, I had few requirements to follow while creating the tool as guidance. Those requirements were:
- Flexibility - so that the tool is usable to more than one projects with little or no modifications.
- Lightweight - no use of third party scripting framework like Ant/Nant and no use of programming languages like Java , c# etc.
Since the tool was intended specifically for Microsoft SQL Server on Windows environments, I decided to use batch scripting, SQLCMD (http://technet.microsoft.com/en-us/library/ms162773.aspx), and SQL scripting.
In order to make it flexible and not to hard code anything (like Servername, database name, table name etc), so that same SQL script could be used from one project to another for similar type of verification with little or no modification, I decided to externalize these attributes in a
Now, I had to put together a batch script file that could read each line as a row from
I've used
In this example, it skips the first line of
One catch here is that the output file. Even though, the
The final part was to create the re-usable (kind of template)
I guess, the idea is clear, below you can find more details of my .csv file, batch script file and
Below are the complete file, which you may be able to use:
;col-cond1;col-cond-2;mycolumn1;mycolumn2
mysqlserver1;mydatabase3.mytable1;10;
;col-cond1;col-cond-2;mycolumn1;mycolumn2
...
In order to make it flexible and not to hard code anything (like Servername, database name, table name etc), so that same SQL script could be used from one project to another for similar type of verification with little or no modification, I decided to externalize these attributes in a
';'
delimitted .csv
file. Each line of my .csv
file followed the following format:
<db-connection-info>;<db-name>.<table-name>;[<verification-criteria-expected-row-count>];[<other-criteria-1>];[<other-criteria-2>];[<column-name-1>];[<column-name-2>];[…]
|
Now, I had to put together a batch script file that could read each line as a row from
.csv
file, split it into columns and create and assign those values to corresponding variables into SQLCMD
in order to pass to the SQL
script during execution. I've used
'for'
loop in the batch script to realize it. 'For'
loop allows you to choose delimiter character, skipping any line(s) on the top of the file and what columns to read. Here is a fragment of batch script file:
for /F "tokens=1-7 skip=1 delims=;" %%c in (%dataFile%) do (
|
In this example, it skips the first line of
.csv
file (passed as variable %dataFile%
), reads column (token) 1 to 7 from each line delimited by ';'
and assigns to variable starting from letter 'c'
. And also it executes SQLCMD
command with the given options and provided SQL file '%iSqlFile%'.
One catch here is that the output file. Even though, the
SQLCMD
has -o <output-file>
option, I did not use it here because this option does not allow to append into the existing file. If you have multiple sqlcmd command executing in a loop they either create new file each time deleting the previous one or create multiple files given the output file are unique for each execution. The easy out way out is just to redirect using stream redirection symbol '>>'
.The final part was to create the re-usable (kind of template)
SQL
script file. Fragment below shows how variable passed from SQLCMD
command can be used in SQL
script and without hard-coding the database name, table name, column name etc.
SELECT @CurrRowCount=count (*) FROM $(tableName) where $(tableName).$(col1Name) NOT LIKE '$(col1-cond1)' and $(tableName).$(col2Name) NOT LIKE '$(col2-cond1)';
|
I guess, the idea is clear, below you can find more details of my .csv file, batch script file and
SQL
script files.Below are the complete file, which you may be able to use:
- .csv file: testcases.csv
#<db-connection-info>;<db-name>.<table-name>;[<verification-criteria-expected-row-count>];[<other-criteria-1>];[<other-criteria-2>];[<column-name-1>];[<column-name-2>];[…] |
;col-cond1;col-cond-2;mycolumn1;mycolumn2
mysqlserver1;mydatabase3.mytable1;10;
;col-cond1;col-cond-2;mycolumn1;mycolumn2
...
- Batch script file: DataVerification.cmd
SETLOCAL |
- SQL file:
RowCnt.sql
PRINT N'Generated as of: '+RTRIM(CAST(GETDATE() AS nvarchar(30))); |
Finally, when all test cases defined in
testcases.csv
are executed, you'll get either "Passed" (if all test cases pass) or "Failed" (even if a single test case fails). You can see the details in the thus generated log file, similar to following ...
==============================================
============= Verification Details ===========
==============================================
==============================================
Table: mydatabase1.mytable1
Generated as of: <timestamp>
Total row count: xx.
verification: passed[failed].
==============================================
==============================================
Table: mydatabase2.mytable1
Generated as of: <timestamp>
Total row count: yy.
verification: passed[failed].
==============================================
Obviously, this is just an idea, you can develop your won SQL
file(s) as per your data processing requirement, modify .csv
file as your input requirement and change the 'for' loop as change happens in .csv file.
No comments:
Post a Comment