Home > Uncategorized > Matching Strings that don’t exactly match

Matching Strings that don’t exactly match


Today I was helping a fellow in our office tackle what appeared on the surface to be a really simple problem. He needs to confirm that records in one system match those in another. However, some of this data is maintained manually or at different times. Therefore, they might be the same thing, but the strings don’t match perfectly.

At the time were were trying to think of various comparison and sub-string functions.  Like usual though, after walking away for a few minutes, I remembers some common functions like soundex.  Soundex is Phonetic Algorithm

SQL Difference() and Soundex() Function

One implementation using soundex is built into Transact-SQL of MS SQL and perhaps Sybase.  Difference() returns a ranking of 0 to 4 with 4 being highest match.  Have to try this when I’m back in the office.

Tame the Beast

In his Tame the Beast article Simon White shows us a number of techniques for finding near matches to strings in an effort to improve usability in GUI applications.  Really interesting. Simon discusses several techniques for almost matching including:

  • Equivalence Methods
  • Synonyms and Regular Expressions
  • The Soundex Algorithm
  • Similarity Ranking Methods
  • Editing and Hamming Distances

You never know what sort of gold you’re going to find while looking for something else 🙂

Probabilistic Matching

This definitely ranks up there with terms like orthogonal and non-deterministic as conversation enhancers.  Really makes the boring topic of trying to see if two things are the same sound exciting.

In his blog post on Probabilistic Matching Steve Sarsfield talks about whether this highly marketable feature in many data quality tools is really a good idea to use in production.

Conclusion

I think tomorrow back in the office, we’ll give the difference() and soundex() calls a try and see if that doesn’t reduce the number of manual comparisons we have to do.

Advertisements
  1. No comments yet.
  1. No trackbacks yet.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

%d bloggers like this: