Deduplication of Data in SQL Server (2005+) Using PARTITION BY
The SQL PARTITION BY clause allows you to partition your result set based on pretty much anything you want. Rather than expounding too much, I’m going to dive right into a simple toy example.
Let’s say that you have a table that holds your shopping list:
CREATE TABLE [#DuplicatedData] ( [ID] INT, [MyData] NVARCHAR(255), [Date] DATETIME )
You and your girlfriend both have an application on your phones that lets you enter data into this shopping list. However, you don’t coordinate every single thing you enter, so you’re bound to have some duplication sometimes as each of you realizes that, hey, you’re out of potatoes. So, on January 1, you both enter that you need potatoes, and you further enter that you need celery:
INSERT INTO [#DuplicatedData] VALUES (1, 'Potatoes', '1/1/2010 5:35:04 PM') INSERT INTO [#DuplicatedData] VALUES (2, 'Celery', '1/1/2010 5:36:12 PM') INSERT INTO [#DuplicatedData] VALUES (3, 'Potatoes', '1/1/2010 6:02:38 PM')
A few days later, you’ve used up your potatoes, so you need more:
INSERT INTO [#DuplicatedData] VALUES (4, 'Potatoes', '1/4/2010 1:20:09 PM')
When you went to print out your list on January 2, you didn’t want to see Potatoes twice.. and when you look back on your shopping habits, you also don’t want to see Potatoes twice for January 1. So you decide that, while maybe it’s possible once in a while, it’s probably pretty rate that you will need potatoes more than once in a day, so you’re going to remove any entries that are for the same item on the same day. You’ll only take the latest entry in a day for any particular item. But how?
That’s where the beauty of PARTITION BY comes into play. Let’s take a look at this query:
SELECT ROW_NUMBER() OVER(PARTITION BY [MyData], CONVERT(VARCHAR, [Date], 101) ORDER BY [ID] DESC) AS [RowNum], * FROM [#DuplicatedData]
We are breaking up the data by item and then by date (but not time) and ordering to get the latest entries showing up first, then we are getting a row number to identify the order in each partition of the items. The results from this query look like this:
1 2 Celery 2010-01-01 17:36:12.000 1 3 Potatoes 2010-01-01 18:02:38.000 2 1 Potatoes 2010-01-01 17:35:04.000 1 4 Potatoes 2010-01-04 13:20:09.000
The Potatoes entries on January 1 have been grouped together and numbered 1 and 2, whereas the others are each in their own partitions and are numbered 1. Once we have this, it’s trivial to get the data that we want:
SELECT [MyData], [Date] FROM ( SELECT ROW_NUMBER() OVER(PARTITION BY [MyData], CONVERT(VARCHAR, [Date], 101) ORDER BY [ID] DESC) AS [RowNum], * FROM [#DuplicatedData] ) AS [Sub] WHERE [RowNum] = 1 ORDER BY [Date] ASC
Results:
Celery 2010-01-01 17:36:12.000 Potatoes 2010-01-01 18:02:38.000 Potatoes 2010-01-04 13:20:09.000
Thanks to Sam Grose for teaching me pretty much everything in this post.
New web application.
I’ve created a new blog to track my efforts in creating a brand-new web application for personal use. It can be found here:
http://hiddensanctum.wordpress.com/
Now that I’ve figured out my password for this blog, I will continue to use it to post interesting .NET/SQL finds that do not relate to the above, but I am no longer working in SharePoint, so I won’t be posting any more SharePoint entries. (Not that I’ve posted a lot of entries at all so far.)
Time is slim, but I’m hoping to be able to make a post now and again that may help someone else out there.
Ensuring repeating fields are not blank in InfoPath 2007
A bit of exciting news before I get to the real post: Since I’ve last posted, I’ve studied for and passed the CAPM exam.
Anyway,…
The Situation
I am creating a form to track the staffing process at my company. I have an InfoPath 2007 form that has a repeating table. This table has two fields in it: a Candidate Name textbox and a Notes attachment. I want to know when, say, the Notes field has been filled in for every candidate. So, upon submit, I want to check if all of the Notes fields in the repeating table have been filled in/have attachments in them, and, if they have, I want to populate a value in another field, Date. Read the rest of this entry »
Workflow stops responding after List item is updated programmatically
I ran into an interesting situation today, and I could not find a solution online, so I figured I’d post this up in case someone else runs into the same issue. I don’t guarantee that this is the best solution, so if anyone has any better ways to go about this, please comment.
I am creating a workflow that needs escalation capabilities. Because of the issues that SharePoint has with Delaying, I decided to make a separate console application that can be run as a timer job to check the running workflows on my list items and nudge them along if necessary.
That’s a bit of a broad, general overview. Here are the specific details of the issue I ran into:
The workflow is designed to wait until the item on which it is running has a certain field updated to a value other than the default. This is taken care of by placing an OnWorkflowItemChanged activity within a While activity:
If I edit the item myself, through the UI, it works without a hitch – once the correct field is edited, the workflow moves on. The interesting part is the behavior when the list item is edited programmatically in my console application:
workflow.ParentItem["FieldName"] = "NewValue";
workflow.ParentItem.Update();
The field will update, but the OnWorkflowItemChanged (ItemChaanged2) event won’t fire. In fact, after running the code above, the event won’t even fire from the UI anymore.
The solution?
Web services:
// Create the web service reference
Lists.Lists listsService = new Lists.Lists();
// Set credentials
listsService.Credentials = CredentialCache.DefaultCredentials;
// Set the URL
listsService.Url = string.Format(
"{0}/_vti_bin/lists.asmx",
siteURL
);
// Build the CAML statement
string updateCAML = "<Method ID='1' Cmd='Update'>";
updateCAML += "<Field Name='ID'>" + itemID.ToString() + "</Field>"; // ID of the item (index of the item in the list)
updateCAML += "<Field Name='FieldName'>NewValue</Field>"; // Update information
updateCAML += "</Method>";
// Build the XmlElement object that contains the update request
XmlDocument document = new XmlDocument();
XmlElement batch = document.CreateElement("Batch");
batch.SetAttribute("OnError", "Continue");
batch.SetAttribute("ListVersion", listVersion.ToString());
batch.InnerXml = updateCAML;
// Do the update
XmlNode returnNode = listsService.UpdateListItems(listID.ToString(), batch);
Like I said, I’m not sure if this is the best approach, but the workflow responded just fine when the update was made through a web service call instead of through the object model.
If you are trying to modify promoted fields from an InfoPath form, do not use the above method. Read the rest of this entry »
First Post!
So, I passed my first MCTS exam yesterday (70-541 Microsoft Windows SharePoint Services 3.0 – Application Development), and I thought this would be a good time to finally start a blog. Expect important things to be posted here.
Eventually.
