Matching Strings that don’t exactly match

September 16, 2009

Today I was helping a fellow in our office tackle what appeared on the surface to be a really simple problem. He needs to confirm that records in one system match those in another. However, some of this data is maintained manually or at different times, so two records might refer to the same thing even though their strings don’t match perfectly.

At the time we were trying to think of various comparison and sub-string functions.  As usual though, after walking away for a few minutes, I remembered some common functions like soundex.  Soundex is a phonetic algorithm: it encodes a word by how it sounds rather than how it is spelled.

SQL Difference() and Soundex() Function

One implementation using soundex is built into the Transact-SQL of MS SQL Server and perhaps Sybase.  Difference() compares the soundex values of two strings and returns a ranking from 0 to 4, with 4 being the closest match.  I’ll have to try this when I’m back in the office.
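To get a feel for what these two functions do before trying them on real data, here is a minimal Python sketch of the classic American Soundex coding plus a simplified similarity score. It is only an illustration: the function names are made up for this example, and difference() below simply counts the positions where the two codes agree, which is an assumption and not necessarily how T-SQL computes DIFFERENCE().

```python
# Digit assigned to each consonant group in classic American Soundex.
_CODES = {}
for _letters, _digit in (("BFPV", "1"), ("CGJKQSXZ", "2"), ("DT", "3"),
                         ("L", "4"), ("MN", "5"), ("R", "6")):
    for _ch in _letters:
        _CODES[_ch] = _digit


def soundex(name: str) -> str:
    """Return a four-character Soundex code, or "" if there are no letters."""
    letters = [c for c in name.upper() if c.isalpha()]
    if not letters:
        return ""
    result = letters[0]
    prev = _CODES.get(letters[0], "")
    for ch in letters[1:]:
        if ch in "HW":
            continue  # H and W do not separate letters with the same code
        digit = _CODES.get(ch, "")
        if digit and digit != prev:
            result += digit
        prev = digit  # vowels reset prev, so a repeated code is kept after one
    return (result + "000")[:4]  # pad or truncate to letter + three digits


def difference(a: str, b: str) -> int:
    """Simplified 0-4 score: number of positions where the codes agree.

    Assumption: this is an approximation for illustration, not the
    T-SQL implementation.
    """
    return sum(x == y for x, y in zip(soundex(a), soundex(b)))


if __name__ == "__main__":
    for left, right in (("Robert", "Rupert"), ("Smith", "Smythe")):
        print(left, soundex(left), right, soundex(right), difference(left, right))
```

Names that sound alike, such as Robert and Rupert, get the same code even though the spellings differ, which is exactly the property we want for catching manually keyed records.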

Tame the Beast

In his Tame the Beast article, Simon White shows us a number of techniques for finding near matches to strings in an effort to improve usability in GUI applications.  Really interesting. Simon discusses several techniques for near matching, including:

  • Equivalence Methods
  • Synonyms and Regular Expressions
  • The Soundex Algorithm
  • Similarity Ranking Methods
  • Editing and Hamming Distances
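As a rough illustration of the last two items, here is a small Python sketch of Hamming distance, edit (Levenshtein) distance, and a ranking helper built on top of them. This is not code from Simon's article; the function names and the sample company names are hypothetical, and lower-casing before comparison is just one possible equivalence rule.

```python
def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length strings differ."""
    if len(a) != len(b):
        raise ValueError("Hamming distance needs strings of equal length")
    return sum(x != y for x, y in zip(a, b))


def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character inserts, deletes and substitutions."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, 1):
        cur = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # delete ca
                           cur[j - 1] + 1,              # insert cb
                           prev[j - 1] + (ca != cb)))   # substitute or match
        prev = cur
    return prev[-1]


def rank_candidates(target: str, candidates: list[str]) -> list[str]:
    """Sort candidates from closest to furthest by case-insensitive edit distance."""
    return sorted(candidates, key=lambda c: levenshtein(target.lower(), c.lower()))


if __name__ == "__main__":
    print(hamming("karolin", "kathrin"))
    print(levenshtein("kitten", "sitting"))
    # Hypothetical record names from two systems.
    print(rank_candidates("Acme Corp", ["Apex Corp", "ACME Corp.", "Zenith Ltd"]))
```

Hamming distance only works when both strings have the same length, so for free-text fields the edit distance is usually the more practical of the two.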

You never know what sort of gold you’re going to find while looking for something else 🙂

Probabilistic Matching

This definitely ranks up there with terms like orthogonal and non-deterministic as conversation enhancers.  Really makes the boring topic of trying to see if two things are the same sound exciting.

In his blog post on Probabilistic Matching, Steve Sarsfield talks about whether this highly marketable feature in many data quality tools is really a good idea to use in production.

Conclusion

I think tomorrow back in the office, we’ll give the difference() and soundex() calls a try and see if that doesn’t reduce the number of manual comparisons we have to do.

Maybe it’s time to apply Agile principles to Data Warehouse

September 11, 2009

My colleagues and I have recently been searching for the perfect Reference Data Model for Financial Markets. The reason for the search is to establish the common data model for our enterprise so a Data Warehouse can be built that will not have to change.

Like every other bank, we already have a Data Warehouse – or two… But like most existing Data Warehouses, it does not meet all requirements and it is too difficult to change.  Thus the natural inclination is to build another.

Thinking a bit differently, the existing Data Warehouse at one point probably did meet business requirements.  But, perhaps the real issue is not that the model was wrong from the start.   Rather, the need for rapid painless change was not engineered into the solution.

Enter Agile

Much of the motivation behind Agile software development methods is to accommodate change.  Change is a constant.  If we accept there will always be change and we adopt methods that mitigate the risk and cost of change in our systems, we can reduce the time we spend on Analysis and Design.  Today we spend inordinate amounts of time on Analysis and Design because we have all been taught that Change is Costly so we better get it right up front.  This places tremendous stress on Business Analysts and Architects to produce extensive documents.

In the Data Warehouse domain, change to the data model of the Data Warehouse looks extremely painful because it affects so many things.  So, naturally we aim to get the data model right up front.  In the current project I’m looking at, the immediate requirements are not so complex.  However, because we want to get the data model right for all future requirements, we have made the analysis task massive.

But yesterday I woke up and it (literally) dawned on me that we need to apply Agile principles to this Data Warehouse project.  But, I’ve only applied Agile to Software Development.  I really was not sure what this meant.

Agile Data Warehouse

If you’ve got a little time it doesn’t take long to find what you’re looking for.  I’d come across Scott Ambler’s writing on Agile before, and good old Google reminded me by bringing up Agile Modeling.  The Agile Best Practices for Data Warehousing essay is just what I was looking for.

Personal Branding and E2.0

September 2, 2009

While I was reading through the FriendFeed feed on Enterprise 2.0, I came across this really interesting blog on Personal Branding.  This ties in interestingly with my post on resource management e2.0 style.

Resource Management e2.0 Style

August 31, 2009

Where I work, we are currently launching an initiative to create central pools of resources such as Business Analysts, Testers, Architects, etc.  One of the key motivations is to preserve the skills and knowledge developed working on a project in our firm after the project is finished.

Prior to centralisation, people who were recruited for a given project would leave the company upon completion, taking their learnings with them.  The goal of the central team is to return these people to the pool from which they can be assigned to new projects rather than just leave.

E2.0 it instead of Centralise it

Thinking a bit differently, I ask, why don’t we E2.0 resource management  rather than centralise it?

What do you mean by Enterprise 2.0?

Enterprise 2.0 is basically applying Web 2.0 technology, primarily social networking, to the enterprise.   One way to learn more about E2.0 is to read a few blogs like this one.   Or look at this interesting application of e2.0 that aggregates content into a subject based feed on Enterprise 2.0.

How do I e2.0 resource management

Here is a site with 8 Tips for Successful Social Intranet Pilot to give some ideas on how to start up social networks on your intranet.

But perhaps it’s best to look at each of the aspects of resource management and review how we did it the old way and then recommend how you would do it the new way.

Social Software way

Here’s an interesting post on locating expertise in your enterprise which advocates using social media.  It pretty well supports my premise.

{Market}ecture vs Central Planning

August 29, 2009

The term marketecture I got from a clever fellow at Queensland Treasury Corp who responded to one of my posts about applying free market principles to enterprise architecture.

A short time back I wrote about the merits of applying free market principles to Enterprise Architecture rather than the central planning model we generally see.  The premise of the post was, if central planning didn’t work for broader economies, why do we think it’s going to work for our large enterprises?

While I was working at Citigroup in the Global Architecture Team, one of our objectives was to identify duplicate capabilities and eliminate them.  While performing this task, there was a tendency to frown upon the development teams who had built this duplicate capability.  On the surface, this makes sense.  But as you dig a little deeper, you’d often find the duplicate capability had done the incumbent one better (at least).

My worry about punishing the duplication is that we were actually stifling innovation.  Innovation often comes when an ambitious person or team is determined to make things better.  There is almost always an existing capability that will be duplicated.

In the centrally planned process, only sanctioned or approved innovation can take place.  Those same ambitious innovators above are expected to present their case or idea to some planning committee who will prevent the innovation until they are satisfied it is totally risk free and will result in ROI for the company.

The primary problem with this is, those ambitious innovative types generally detest process and bureaucracy.  Next problem is, we cannot be certain that something innovative will succeed – until it’s been done at least once.

How can we have Marketecture

If we agree that innovation requires freedom and suffers in a central planning environment, how do we do marketecture in an enterprise?

I think that’s grist for the next post.  If you have any thoughts or comments on this, please post a comment.

Granting a Monopoly to your IT Suppliers

August 28, 2009

In Enterprise Architecture we are often fixated on standardization of technologies across our enterprise.  We publish standard technologies for services such as DBMS, Storage, Network as well as application services like workflow, reporting and reconciliation.

The rationale for standardization is pretty straightforward.  Diversity is costly.  Each technology product carries with it operational risks that must be managed.  There are legal and commercial relationships to be managed and operational and support costs for the platforms.

The down side of standardization

In this zeal for standardization, though, we might be blind to a few of its downsides.  Here are a couple:

  • Granting a Monopoly – standardizing on a single vendor removes the competitive pressure on that vendor.
  • Concentration Risk – this is a term from financial markets that means too much of your risk exposure is concentrated in a single security or customer.  What happens when your standard supplier goes bust, which seems to be happening more frequently these days?

Bad things happen when you grant a monopoly

Your company is no different from the broader economy: when you grant a monopoly in a given sector, you eliminate the competitive pressure for a supplier to deliver quality and innovation.  You also put all your eggs in one basket.

What’s your view

Is this just too hypothetical? I’d like to hear others’ views on the topic.  So, please comment or direct me to your blog.

e2.0 it

July 28, 2009

Yuk- I feel like I’ve just slipped into the swamp of latest coolest Web 2.0 est slime… But wait!   It’s actually cool.  Problem is, I think it’s changing so fast that whatever this  Link points at might not be there now.. wow.

But this is what I found… This fellow had a link to FriendFeed.  FriendFeed is not a dating service!  It’s actually like a feed subscriber or aggregator that scans things like Twitter and Del.icio.us for your topic and displays the results.  From there you can go anywhere.  I found myself looking at the Enterprise 2.0 feed he created and following links to Tweets, etc. until I realised it was getting very late and I should be getting off to bed.  So I Tweeted and updated Facebook and now I’m posting to my blog. I’d better go. Good night.

Target State Architecture without a Project

July 28, 2009

Today I found myself explaining why the Department or Domain I have been responsible for for nearly 6 months has no Target State Architecture.  Hmm.. you say.

Seriously – As an IT Architect responsible for a particular business domain in my company, one of the things I should create and maintain is the Target State Architecture.  If you’re an enterprise architect, this is quite obvious.  The problem I faced in developing the target state, though, was that in a Global Financial Crisis, when a department head has decided they are not going to spend any money in the foreseeable future, the target state looks pretty much like the current one.  It’s not the Dream State we’re talking about.  So, getting motivated to draw a diagram or write a description about something that isn’t likely to be is not fun.

Home Single Signon for Macs and PCs

July 12, 2009

Why?

I have 4 kids and a wife and these days everyone wants to use a computer. I’m not going to buy 6 – so there’s going to have to be some sharing. Also, I can’t leave these things unsecured because I use OSX Parental Controls to manage how much time anyone can use a computer in a given day. I know it sounds really draconian. But it’s what you have to do with a big crowd.

What are my needs?

When anyone logs in to any computer in the house, I’d like them to get a consistent experience: the same user/password, access to their Documents, and their preferences set. Big ask you say? Well I hope not.

Home Server doesn’t have Single Signon

Paranoid Penguin is an article on setting up a single signon server in Linux. Reading through it I suddenly realize why this isn’t an out-of-the-box feature on the Linux Home Servers like Amahi which has many nice features and looks quite easy to setup.
Maybe I’ll see if I can’t talk the folks at Excito to add this to their Bubba Server.

Another Bubba 2 Fan

I found this blog on setting up RAID on Bubba 2.  Something interesting.  I would like it if the Bubba 2 just came with 2 drive bays in it so you could have RAID.

Another friend of mine

Why no Apple Home Server?

July 11, 2009

Recently I’ve been wanting a decent home server. I have primarily Macs in the house, so I thought it would be great to have an OSX based box that could serve as a Home Server.

I’d like to have the usual file sharing with some kind of mirroring. I would also like network profiles so that we can all log in to any Mac in the house and get our files and preferences.
