Kevin Akselrod’s Blog

Just another WordPress.com weblog

Deduplication of Data in SQL Server (2005+) Using PARTITION BY

with 2 comments

The SQL PARTITION BY clause allows you to partition your result set based on pretty much anything you want.  Rather than expounding too much, I’m going to dive right into a simple toy example.

Let’s say that you have a table that holds your shopping list:

  CREATE TABLE [#DuplicatedData]
  (
	  [ID]		INT,
	  [MyData]	NVARCHAR(255),
	  [Date]	DATETIME
  )

You and your girlfriend both have an application on your phones that lets you enter data into this shopping list.  However, you don’t coordinate every single thing you enter, so you’re bound to have some duplication sometimes as each of you realizes that, hey, you’re out of potatoes.  So, on January 1, you both enter that you need potatoes, and you further enter that you need celery:

  INSERT INTO [#DuplicatedData] VALUES (1, 'Potatoes', '1/1/2010 5:35:04 PM')
  INSERT INTO [#DuplicatedData] VALUES (2, 'Celery', '1/1/2010 5:36:12 PM')
  INSERT INTO [#DuplicatedData] VALUES (3, 'Potatoes', '1/1/2010 6:02:38 PM')

A few days later, you’ve used up your potatoes, so you need more:

  INSERT INTO [#DuplicatedData] VALUES (4, 'Potatoes', '1/4/2010 1:20:09 PM')

When you went to print out your list on January 2, you didn’t want to see Potatoes twice.. and when you look back on your shopping habits, you also don’t want to see Potatoes twice for January 1.  So you decide that, while maybe it’s possible once in a while, it’s probably pretty rate that you will need potatoes more than once in a day, so you’re going to remove any entries that are for the same item on the same day.  You’ll only take the latest entry in a day for any particular item.  But how?

That’s where the beauty of PARTITION BY comes into play.  Let’s take a look at this query:

  SELECT
	  ROW_NUMBER() OVER(PARTITION BY [MyData], CONVERT(VARCHAR, [Date], 101) ORDER BY [ID] DESC) AS [RowNum],
	  *
  FROM
	  [#DuplicatedData]

We are breaking up the data by item and then by date (but not time) and ordering to get the latest entries showing up first, then we are getting a row number to identify the order in each partition of the items.  The results from this query look like this:

  1	  2	  Celery	2010-01-01 17:36:12.000
  1	  3	  Potatoes	2010-01-01 18:02:38.000
  2	  1	  Potatoes	2010-01-01 17:35:04.000
  1	  4	  Potatoes	2010-01-04 13:20:09.000

The Potatoes entries on January 1 have been grouped together and numbered 1 and 2, whereas the others are each in their own partitions and are numbered 1.  Once we have this, it’s trivial to get the data that we want:

  SELECT [MyData], [Date] FROM
	  (
		  SELECT
			  ROW_NUMBER() OVER(PARTITION BY [MyData], CONVERT(VARCHAR, [Date], 101) ORDER BY [ID] DESC) AS [RowNum],
			  *
		  FROM
			  [#DuplicatedData]
	  ) AS [Sub]
  WHERE
	  [RowNum] = 1
  ORDER BY
	  [Date] ASC

Results:

  Celery	2010-01-01 17:36:12.000
  Potatoes	2010-01-01 18:02:38.000
  Potatoes	2010-01-04 13:20:09.000

Thanks to Sam Grose for teaching me pretty much everything in this post.

Written by Kevin Akselrod

May 19, 2010 at 12:14 am

Posted in SQL

Tagged with , ,

New web application.

with 4 comments

I’ve created a new blog to track my efforts in creating a brand-new web application for personal use.  It can be found here:

http://hiddensanctum.wordpress.com/

Now that I’ve figured out my password for this blog, I will continue to use it to post interesting .NET/SQL finds that do not relate to the above, but I am no longer working in SharePoint, so I won’t be posting any more SharePoint entries.  (Not that I’ve posted a lot of entries at all so far.)

Time is slim, but I’m hoping to be able to make a post now and again that may help someone else out there.

Written by Kevin Akselrod

April 13, 2010 at 2:11 am

Posted in Uncategorized

Ensuring repeating fields are not blank in InfoPath 2007

leave a comment »

A bit of exciting news before I get to the real post:  Since I’ve last posted, I’ve studied for and passed the CAPM exam.

Anyway,…

The Situation

I am creating a form to track the staffing process at my company.  I have an InfoPath 2007 form that has a repeating table.  This table has two fields in it:  a Candidate Name textbox and a Notes attachment.  I want to know when, say, the Notes field has been filled in for every candidate.  So, upon submit, I want to check if all of the Notes fields in the repeating table have been filled in/have attachments in them, and, if they have, I want to populate a value in another field, Date. Read the rest of this entry »

Written by Kevin Akselrod

September 10, 2009 at 8:48 pm

Workflow stops responding after List item is updated programmatically

with 2 comments

I ran into an interesting situation today, and I could not find a solution online, so I figured I’d post this up in case someone else runs into the same issue.  I don’t guarantee that this is the best solution, so if anyone has any better ways to go about this, please comment.

I am creating a workflow that needs escalation capabilities.  Because of the issues that SharePoint has with Delaying, I decided to make a separate console application that can be run as a timer job to check the running workflows on my list items and nudge them along if necessary.

That’s a bit of a broad, general overview.  Here are the specific details of the issue I ran into:

The workflow is designed to wait until the item on which it is running has a certain field updated to a value other than the default.  This is taken care of by placing an OnWorkflowItemChanged activity within a While activity:

WorkflowSnippet

If I edit the item myself, through the UI, it works without a hitch – once the correct field is edited, the workflow moves on.  The interesting part is the behavior when the list item is edited programmatically in my console application:


    workflow.ParentItem["FieldName"] = "NewValue";
    workflow.ParentItem.Update();

The field will update, but the OnWorkflowItemChanged (ItemChaanged2) event won’t fire.  In fact, after running the code above, the event won’t even fire from the UI anymore.

The solution?

Web services:


    // Create the web service reference
    Lists.Lists listsService = new Lists.Lists();

    // Set credentials
    listsService.Credentials = CredentialCache.DefaultCredentials;

    // Set the URL
    listsService.Url = string.Format(
        "{0}/_vti_bin/lists.asmx",
        siteURL
      );

    // Build the CAML statement
    string updateCAML = "<Method ID='1' Cmd='Update'>";
    updateCAML += "<Field Name='ID'>" + itemID.ToString() + "</Field>"; // ID of the item (index of the item in the list)
    updateCAML += "<Field Name='FieldName'>NewValue</Field>"; // Update information
    updateCAML += "</Method>";

    // Build the XmlElement object that contains the update request
    XmlDocument document = new XmlDocument();
    XmlElement batch = document.CreateElement("Batch");
    batch.SetAttribute("OnError", "Continue");
    batch.SetAttribute("ListVersion", listVersion.ToString());
    batch.InnerXml = updateCAML;

    // Do the update
    XmlNode returnNode = listsService.UpdateListItems(listID.ToString(), batch);

Like I said, I’m not sure if this is the best approach, but the workflow responded just fine when the update was made through a web service call instead of through the object model.

If you are trying to modify promoted fields from an InfoPath form, do not use the above method. Read the rest of this entry »

Written by Kevin Akselrod

June 24, 2009 at 5:07 pm

First Post!

with 2 comments

So, I passed my first MCTS exam yesterday (70-541 Microsoft Windows SharePoint Services 3.0 – Application Development), and I thought this would be a good time to finally start a blog.  Expect important things to be posted here.

Eventually.

Written by Kevin Akselrod

May 7, 2009 at 7:18 pm

Posted in Blog Setup, WSS 3.0

Follow

Get every new post delivered to your Inbox.