If you're a BI pro, do yourself a favor and take a few mins to watch it.
Updated with new link to YouTube version.
Speaking the language of business intelligence with an Australian accent
Posted by Nick Barclay at 4:04 PM
Full disclosure: The authors of this book provided me with a free review copy.
Time to dust off this blog and post something. Wow, has it been that long?
Microsoft SQL Server 2008 R2 Master Data Services is just what I look for when I want to get up and running with a new product. Jeremy, Tim and Martyn have written a book for getting up to speed on just about every aspect of MDS. Experienced users can use the book to ensure existing knowledge gaps are filled and experiment with more advanced functionality.
I’m a big fan of technical books like this one: Explain the concept, take the reader through step-by-step instructions, build on what has been created in previous examples. By the time the reader has finished the book they have created a set of interrelated artifacts and performed tasks that touch almost all the major functional areas of the product.
The authors didn’t stop at the basic functionality of MDS; a significant portion of the book is dedicated to the more advanced aspects of the product. T-SQL / SSIS based data loads, integration with SharePoint workflows, BizTalk and the MDS API are all covered. Lots of useful sample code and reference material earn the book a place on the within-easy-reach shelf.
Kudos to the authors for not making the book too heavy on the process rigor of master data management. They are careful to keep focus on setup, usage and extensibility of the product on which the book is based. Extra credit must also be given in that they managed to make the UI look simple and intuitive; the MDS web UI must be one of the clunkiest and most difficult-to-use that I have had the displeasure to work with. The book makes it look easy. Hopefully MS is addressing this large shortfall in what is otherwise a pretty solid V1 product.
As when reviewing Alberto, Chris & Marco’s "Expert Cube Development" book, my primary criticism remains directed at the book’s publisher, not its authors. Packt’s layout formula does not recognize the importance of reference numbers and caption text beneath screenshots, tables and figures. In this day and age publishers are focusing more on delivering content digitally. The layout of digital publications is often performed dynamically on devices such as the Kindle or iPad. As a result, what is considered “a page” is not necessarily the same from one reader to the next, because each reader’s personal settings may differ. Text referring to “the screenshot above” or “the previous page” may not necessarily be accurate or helpful. Which page? Which screenshot? Annoying. What is so difficult about “refer to Figure 1.2”, or “as listed in Table 3.4”?
All in all this is a book well worth getting hold of if you want to get stuck into all aspects of MDS from installation to advanced usage.
While I was creating the recent series of walkthrough posts on MDS I put together a diagram of the major objects that make up an MDS model. I figured it was worth sharing.
The diagram below shows a single MDS instance containing a single model: Product. The aim is to show, at a high level, the relationships and some of the functionality found within an individual model. I’ve provided a brief sentence or two on my understanding of the objects contained in the diagram as a basic primer. Wherever possible I have linked to the online documentation for that particular feature.
MDS (Instance) is the container of containers, the Master Data Services application itself.
Models are the primary container for specific groupings of master data. The example architecture diagram shows an MDS instance containing a single model: Product.
Entities are containers created within a model. Entities provide a home for members, and are in many ways analogous to database tables. Product, Color, SubCategory and Category entities exist in the sample diagram.
Members are analogous to the records in a database table (Entity). Members are contained within entities. Each member is made up of two or more attributes.
Attributes are analogous to the columns within a table (Entity). Attributes exist within entities and help describe members (the records within the table). Name and Code attributes are created by default for each entity and serve to describe and uniquely identify leaf members. Attributes can be related to members of other entities as seen in the diagram. For example the Color attribute of the Product entity is linked to the members contained in the Color entity; the SubCategory and Category entities are related in the same way. These relationships are analogous to foreign key constraints.
Attribute Groups are explicitly defined collections of particular attributes. Say you have an entity that comprises 50 different attributes; that is too much information for many of your users. Attribute groups enable the creation of custom sets of hand-picked attributes that are relevant for specific audiences.
Collections are customized subsets of members contained within hierarchies or other collections. Any entity that has a hierarchy associated with it supports the creation of collections. Shaun Ryan has put together a useful post on creating collections here.
Business Rules can be created and applied against model data to ensure that custom business logic is adhered to. In order to be committed into the system, data must pass all business rule validations applied to it. In its current CTP version the business rules UI takes a bit of getting used to; nonetheless there is lots of good functionality when it comes to information running the gauntlet before it is allowed in. Jeremy Kashel has a good introductory post on business rules here.
Subscription Views are views that can be created by appropriately privileged MDS admins in order to provide an appropriately named view for external systems to subscribe to. It should be noted MDS automatically creates views based on objects created within a model. Subscription views are separate from these and give admins control over the names and content. Shaun Ryan has written a post on the creation of subscription views here.
Versions provide system owners / administrators with the ability to Open, Lock or Commit a particular version of a model and the data contained within it at a particular point in time. As the content within a model varies / grows / shrinks over time, versions provide a way of managing metadata so that subscribing systems can access the correct content.
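To make the containment described above a little more concrete, here is a minimal Python sketch of a model holding entities, entities holding members, and an attribute group trimming members down to a hand-picked set of attributes. This is a toy illustration only; the class names, method names and sample data (Road Bike, Weight and so on) are my own assumptions and have nothing to do with how MDS actually stores or exposes these objects.

```python
from __future__ import annotations
from dataclasses import dataclass, field


@dataclass
class Entity:
    """Toy stand-in for an entity: a named container of members."""
    name: str
    # Attribute names beyond the default Name and Code.
    attributes: list[str] = field(default_factory=list)
    # Members keyed by Code; each member is a dict of attribute values.
    members: dict[str, dict[str, str]] = field(default_factory=dict)

    def add_member(self, code: str, name: str, **values: str) -> None:
        # Reject values for attributes that were never defined on the entity.
        unknown = set(values) - set(self.attributes)
        if unknown:
            raise ValueError(f"undefined attributes: {sorted(unknown)}")
        self.members[code] = {"Code": code, "Name": name, **values}

    def attribute_group(self, names: list[str]) -> list[dict[str, str]]:
        """Return every member restricted to a hand-picked set of attributes."""
        return [{n: m[n] for n in names if n in m} for m in self.members.values()]


@dataclass
class Model:
    """Toy stand-in for a model: a named container of entities."""
    name: str
    entities: dict[str, Entity] = field(default_factory=dict)

    def add_entity(self, entity: Entity) -> Entity:
        self.entities[entity.name] = entity
        return entity


# Hypothetical sample data, loosely following the Product diagram.
product_model = Model("Product")
color = product_model.add_entity(Entity("Color"))
product = product_model.add_entity(Entity("Product", attributes=["Color", "Weight"]))

color.add_member("RD", "Red")
product.add_member("BK-1", "Road Bike", Color="RD", Weight="9.5")

# A narrow view for an audience that only cares about names and colors.
print(product.attribute_group(["Name", "Color"]))
```

The Color value stored on the Product member is the Code of a member in the Color entity, which is the same idea as the foreign-key-like relationships in the diagram.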
In homage to the Thanksgiving celebration about to take place in the USA I thought I’d place a bet on what I think will be the most overused (and least useful) feature of PPS 2010 analytic reports.
Multiple pie charts!
Those who have used ProClarity will recognize this multi-pie functionality. See how easily you can determine which of the clothing, bikes and components categories sold the most in CY 2008?
I believe that pie charts were included in PPS 2010 as a “required feature” by the sales team. If you listen carefully to some of the PPS team members as they present the latest features you can hear a slight tinge of cynicism in their voices as they say “oh yeah, we support pie charts now too…”
So I ask you, what’s better than a single pie chart?
OK, so we’ve created the objects and loaded data into them. Now we can have a closer look at what has happened to the MDS database. What has been built? Where is the data stored?
The aim of this final post is to get you started towards locating your data stored in the MDS repository database. There are plenty of ways to get at the data but we’re going to just take a quick peek at accessing the data via TSQL. Remember that TSQL isn’t the only way to get at this data. I just haven’t had much of a chance to have a detailed look at the MDS web service and API yet.
So where’s our Geography model data? Let’s start by finding the identifier of the model itself.
Note that there are a number of different metadata UDFs that can return scalar and tabular data for a variety of things such as the model ID. I’m just going to do it the manual way for the purposes of demonstration.
Armed with the model ID we can take a look at the Entities defined within that model.
The MDS engine builds tables to store data for the objects that are created within models. Here is a list of all the tables created as a result of our efforts with the Geography (ID = 15) model.
Have a look inside the table that contains the records for the City entity, remembering that your own IDs (both for the model and the entity) will vary from mine.
Notice the uda_CAAPFLF reference in the TableColumn column for StateProvince. This provides a reference back to the column that links the StateProvince attribute to the City entity.
On top of the system-generated, model-centric tables that MDS builds there are also system views that already do much of the heavy lifting for you when it comes to getting at the data. Here are the views that MDS created for the Geography model objects.
Based on what we have built the most useful views are the …CHILDATTRIBUTES ones. These will return the records within a particular entity including all the attributes that have been defined on it.
All the human readable data is located at the far end of the table, so remember to scroll all the way to the right.
Note the friendly column names that have been created as part of the view definition.
If you want to look at the parent/child metadata that was defined as a result of the derived hierarchy we created look at the contents of the …PARENTCHILD_DERIVED views.
Hopefully this whirlwind tour of the MDS repository DB has been enough to pique your interest. Take time to explore the inner workings of the database and find all the good stuff that is baked into the product and how you can leverage it.
This post also marks the end of this series of walkthroughs, hope they were useful.
As the name suggests, derived hierarchies are derived from the relationships between entities within a model. In our Geography model we have used attributes to define a relationship between the City and the StateProvince entities and another one between StateProvince and CountryRegion. These relationships will enable the easy creation of a derived hierarchy. CountryRegion > StateProvince > City.
Once more browse to the Master Data Manager and select System Administration
In the Model Explorer page select Manage > Derived Hierarchies
In the Derived Hierarchy Maintenance page ensure Geography is selected in the Model dropdown and click the + sign to add a new derived hierarchy.
Type Cities in the Derived hierarchy name textbox. Click Save.
All that is left to do in the Edit Derived Hierarchy: Cities page is to drag and drop each of the desired entities from the Available Entities and Hierarchies area into the Current Levels area.
Start with the lowest (in this case the leaf) entity first. Drag and drop the City entity onto the Current levels: Cities area. Note that the preview area comes to life now too.
Now drag and drop the StateProvince entity from the Available Entities and Hierarchies onto the City item in the Current Levels area. Finally drag and drop the CountryRegion entity onto the StateProvince entity.
Once you’ve used up all three entities, you’ll be able to preview your complete derived hierarchy.
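For anyone who prefers to see the idea in code, here is a small Python sketch of what "derived" means: the CountryRegion > StateProvince > City tree is not stored anywhere, it falls out of the parent references each member already carries. The dictionaries, member codes and the derive_hierarchy function are all made up for illustration and are not how MDS builds the hierarchy internally.

```python
# Toy data: each member points at its parent through a domain-based attribute.
country_regions = {
    "US": {"Name": "United States"},
    "AU": {"Name": "Australia"},
}
state_provinces = {
    "WA": {"Name": "Washington", "CountryRegion": "US"},
    "NSW": {"Name": "New South Wales", "CountryRegion": "AU"},
}
cities = {
    "SEA": {"Name": "Seattle", "StateProvince": "WA"},
    "SYD": {"Name": "Sydney", "StateProvince": "NSW"},
    "NEW": {"Name": "Newcastle", "StateProvince": "NSW"},
}


def derive_hierarchy():
    """Build a nested country -> state -> [cities] tree from the references."""
    tree = {}
    for city in cities.values():
        # Start at the leaf level and walk up through each parent reference.
        state = state_provinces[city["StateProvince"]]
        country = country_regions[state["CountryRegion"]]
        tree.setdefault(country["Name"], {}).setdefault(state["Name"], []).append(city["Name"])
    return tree


hierarchy = derive_hierarchy()
print(hierarchy)
```

Note that the sketch starts from the leaf (City) and works upward, the same order in which the levels were dragged into the Current Levels area.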
Now that we’ve defined some basic structures we can add some data. Through MDS' web based management interface we can manually add individual members or configure attributes on an entity one-by-one, or we can load them en masse. I'll leave the one-by-one method to the reader to figure out. What most will want to know is how to get a lot of data into the system in one hit.
For those familiar with the product formerly known as PerformancePoint Server 2007 Planning, the process of batch loading records is much the same. You insert the data to be loaded into system-defined staging tables, ensuring the appropriate metadata is defined on each record. MDS internal stored procedures are run over the staged data to check the validity of the records in accordance with the entity & attribute structures that have been set up. Each record is marked with a flag and an error code to show whether it has passed or failed validation and, if it failed, why. Once validated and error-free the data can then be loaded into the appropriate area within MDS. Action can also be taken on the bad records in order to get them loaded too.
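The stage, validate, flag and load pattern described above can be sketched in a few lines of Python. To be clear, this is a toy model of the pattern and not the MDS stored procedures: the process_batch function, the validation checks and the error strings are all invented for illustration. The only detail borrowed from the product is that a Status_ID of 1 indicates success; the other status values are assumptions.

```python
# Status flags. 1 = success matches what the staging table shows after a load;
# READY = 0 and FAILED = 2 are assumed values for this sketch.
READY, SUCCEEDED, FAILED = 0, 1, 2


def process_batch(staged, known_entities, existing_codes):
    """Validate staged records, flag each one, and return those that loaded."""
    loaded = []
    for record in staged:
        if record["Status_ID"] != READY:
            continue  # already processed in an earlier batch
        entity, code = record["EntityName"], record["MemberCode"]
        if entity not in known_entities:
            record["Status_ID"], record["ErrorCode"] = FAILED, "UNKNOWN_ENTITY"
        elif not code:
            record["Status_ID"], record["ErrorCode"] = FAILED, "MISSING_CODE"
        elif (entity, code) in existing_codes:
            record["Status_ID"], record["ErrorCode"] = FAILED, "DUPLICATE_CODE"
        else:
            existing_codes.add((entity, code))
            record["Status_ID"], record["ErrorCode"] = SUCCEEDED, "OK"
            loaded.append(record)
    return loaded


# Hypothetical staged rows: one good, one duplicate, one for a missing entity.
staged = [
    {"EntityName": "City", "MemberCode": "SEA", "MemberName": "Seattle",
     "Status_ID": READY, "ErrorCode": ""},
    {"EntityName": "City", "MemberCode": "SEA", "MemberName": "Seattle again",
     "Status_ID": READY, "ErrorCode": ""},
    {"EntityName": "Suburb", "MemberCode": "X1", "MemberName": "Nowhere",
     "Status_ID": READY, "ErrorCode": ""},
]

loaded = process_batch(staged, known_entities={"City"}, existing_codes=set())
for record in staged:
    print(record["MemberName"], record["Status_ID"], record["ErrorCode"])
```

The bad records stay in the staging list with their flags set, which is what makes it possible to fix them and push them through again later.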
In this walkthrough we're going to:
Open up SSMS and connect to your MDS repository database (whatever you've called it). Mine's called "MDS".
Run the following TSQL to insert data into the mdm.tblStgMember table (I am assuming you've got the AdventureWorksDW2008R2 database installed on the same SQL instance)
If you wish you can have a look at the inserted records by running the following
Now that we know the data is in the staging table we can kick off the batch load process. Browse to the Master Data Manager web page and select the Integration Management option ensuring that you select Geography in the Model dropdown and VERSION_1 in the Version dropdown.
On the Import page (with the appropriate Model and Version selected in the corresponding dropdowns) note that there are 655 total member records that are flagged in the Unbatched Staging Records section. Click the Process button located above the Model dropdown.
Now the Staging Batches area at the top of the page comes to life showing that a new staging batch instance has been spun up for our 655 records. In the background the loading process is already running to validate our 655 records.
You can check on the status of the batch by clicking on the batch line item and then clicking the View details for selected batch button. The end-to-end process should take a few seconds.
Note that if you check the loaded records in SSMS (the mdm.tblStgMember table) the Status_ID field should be populated with a 1 indicating success. The ErrorCode field will be populated with “ERR210000” which I guess must mean “success” (doesn’t seem to be any doco on these codes at the time of this writing).
UPDATE (Nov 26th): MDS PM Kirk Haselden has listed all the staging table error codes here.
Once the batch has loaded we can check on the new members. Browse back to the Master Data Manager page by clicking on the MDS logo in the top left of the screen and select the Explorer option.
Select Entities > City to view the members that have been successfully loaded.
Note the yellow question marks next to each record. This means that business rule validation has not yet been run against these members. We won’t worry about business rules for the purposes of this walkthrough, so you can ignore the yellow question marks for now.
If the question marks are really annoying you, press the Apply Business Rules button (the one with the green check mark) to change them into green checks. We haven’t created any business rules to be applied so the change here is really only cosmetic in the context of this walkthrough.
Now we need to load members into the StateProvince and CountryRegion entities using the same process, just different TSQL. Once you have run the code to stage the members, go back to the Integration Management screen and kick off the batch process to load the members into their corresponding entities.
Insert values for the StateProvince attribute in the City entity.
Return to the Master Data Manager and ensure Geography and VERSION_1 are selected in the Model and Version dropdowns. Click the Explorer option and examine the members and attributes of both the City and the StateProvince entities. The attributes within each will now have been populated with the corresponding member of the appropriate entity.
City entity, StateProvince attribute
StateProvince entity, CountryRegion attribute
All that is left to do now is populate the freeform FrenchCountryRegionName and SpanishCountryRegionName attributes of the CountryRegion entity. Same method, different TSQL. Here is the code:
Attributes are defined within entities. An attribute contains values that help to describe the member it is related to. For example our ProductName leaf entity within a Product model could have a freeform attribute defined to hold each item's Standard Cost or Weight. Attributes can also reference members of other entities defined within the same model. By referencing members in other entities we can maintain a master list of, say, Colors (in the Color entity) and then relate members of the Product entity to the Color entity, very much like a foreign key relationship (in fact, it is a foreign key relationship).
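If the foreign key comparison is unfamiliar, the following Python sketch shows the behavior using an in-memory SQLite database. It is only an analogy: the Color and Product tables, their columns and the try_insert helper are hypothetical and are not the tables MDS generates in its repository database.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves enforcement off by default

# The master list of colors, playing the role of the Color entity.
conn.execute("CREATE TABLE Color (Code TEXT PRIMARY KEY, Name TEXT NOT NULL)")

# Product.Color plays the role of a domain-based attribute: it may only
# hold the Code of a member that exists in the Color table.
conn.execute(
    """CREATE TABLE Product (
           Code  TEXT PRIMARY KEY,
           Name  TEXT NOT NULL,
           Color TEXT REFERENCES Color(Code)
       )"""
)

conn.execute("INSERT INTO Color VALUES ('RD', 'Red')")


def try_insert(code, name, color):
    """Insert a product; return False if the color reference is rejected."""
    try:
        conn.execute("INSERT INTO Product VALUES (?, ?, ?)", (code, name, color))
        return True
    except sqlite3.IntegrityError:
        return False


red_bike_ok = try_insert("BK-1", "Road Bike", "RD")         # 'RD' exists in Color
purple_bike_ok = try_insert("BK-2", "Touring Bike", "PRP")  # 'PRP' does not
print(red_bike_ok, purple_bike_ok)
```

A freeform attribute, by contrast, would be a plain column with no REFERENCES clause, so any value of the right type would be accepted.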
In this walkthrough we're going to create attributes on the City, StateProvince and CountryRegion entities within the Geography model.
Attributes are defined and maintained within entities, so on the Master Data Manager page select System Administration to administer the entities we created in the previous post.
In the Model Explorer page select Manage > Entities.
In the Entity Maintenance page ensure Geography is selected in the Model dropdown. Click the City entity to select it and note the toolbar buttons become visible. Click the pencil icon to edit the properties of the City entity.
In the Edit Entity: City screen, in the Leaf Attributes section, click the + sign to add a new attribute underneath the default Name and Code attributes.
In the Entity: City Add Attribute screen, select the Domain-based radio button, type StateProvince in the Name textbox and select StateProvince in the Entity dropdown. In the MDS repository DB this will create a physical foreign key constraint between the City and StateProvince entities. Click the save button when done. Click save again to save and exit the City entity maintenance screen.
Using the same steps as above create a domain-based attribute on the StateProvince entity with the name CountryRegion referring to the CountryRegion entity. Click save and then save again to exit the StateProvince entity maintenance screen.
Now we'll add two attributes to the CountryRegion entity using the Free-form option, one for FrenchCountryRegionName and one for SpanishCountryRegionName. Use the same steps as before to create these two attributes.
Name: FrenchCountryRegionName, DataType: Text, Length: 100
Name: SpanishCountryRegionName, DataType: Text, Length: 100
After adding the FrenchCountryRegionName and SpanishCountryRegionName leaf attributes your CountryRegion entity should now look like the shot below.
Let's have a look at what we've got so far. Click on the Explorer link in the top left of the screen and click the Geography model to display the model and its entities on the right-hand side of the screen.
One or more Entities can be defined within a model. Entities are the foundational objects within an individual model and serve as the containers for Members, the data records themselves. For example a product model could contain entities such as ProductName, Category, SubCategory and Color to describe and classify the model contents. The Color entity would contain members for Blue, Red, Yellow etc. The ProductName entity would contain the names of the products themselves and so on.
In this walkthrough we're going to create entities for City, StateProvince and CountryRegion
In the Master Data Manager select System Administration which is where we will manage the structures that make up the Geography model.
In the Model Explorer page select Manage > Entities
When we created the Geography model we chose to automatically create an entity with the same name as the model. We’re going to change the name of that auto-created entity from Geography to City. In the Entity Maintenance screen select Geography in the Model dropdown. This will display any entities defined within the model.
Click the line for the Geography entity (the only one there). This will display the tools available to us for working with the selected entity. Click the pencil icon to edit the entity metadata.
In the Edit Entity: City section change the value in the Entity name textbox from Geography to City. Click save when done.
Now we're going to add two new entities. In the Entity Maintenance screen hit the + sign to add a new entity.
In the Add Entity screen enter StateProvince in the Entity name textbox and choose No in the Enable Explicit hierarchies and collections dropdown. Click the Save button.
Create another entity with the same settings but call this one CountryRegion.
Your list of entities should look like the shot below.
On to Creating Attributes
Models are the highest level container within an instance of MDS. Models are created to manage groups of similar data. In BI-speak it’s not much of a stretch to equate a model with a dimension; they’re not exactly the same, but thinking about it in this way helps in understanding the concept. The two classic master data models that you’ll see in most examples are Product and Customer. Once a model is created we can define objects within it including entities, attributes and hierarchies, among others.
In this walkthrough we're going to create a Geography model to manage our geographical master data. Subsequent walkthroughs will then build other objects inside our Geography model.
Browse to the Master Data Manager page, the primary management web page for MDS found (if default settings are used) at http://localhost/MDS. Click System Administration.
In the Model Explorer page select Manage > Models
In the Model Maintenance screen you will see a list of all the existing models. If you’ve just done a fresh install the only model you’ll see will be Metadata. Click the + button to create a new model.
Name the model Geography and click Save.
The Geography model has now been created.
On to Creating Entities
Like many other geeks out there I learn by doing. One of the things on my todo list has been to get familiar with MDS. During my experimentation with the recently released CTP I figured I'd take some notes on what I learned. These notes have evolved into a series of posts that will walk through some of the basics in putting MDS to work.
At this point we're only in the first public CTP, but everyone's curious to kick the MDS tires a bit. We all know there's quite a lot of functionality baked into the product in terms of workflow, versioning, web services and APIs, but how about just the basics? These posts act as a quick start to see MDS in action. Once you’ve put some data into the system you can pull back the covers and have a look at how it happened and where the data is. We all learn something that way.
The walkthroughs will go through the creation of a very simple Geography MDS model based on the data contained in the DimGeography table in the SQL 2008R2 release of the AdventureWorks DW database. In the posts to follow we will walk through the following:
All posts assume that you have already installed MDS and have the AdventureWorksDW2008R2 DB set up on the same server.
On to Creating a Model
Many of you may know Don Dodge - he’s a start up and technology evangelist who, up until a day ago, worked for Microsoft. Apparently Don was part of the most recent round of layoffs. He was immediately snapped up by Google. Good for them.
The funny thing here is the contents of Don’s Thanks Microsoft, Hello Google post. While he’s completely entitled to his opinions, I am amazed at how quickly they changed. It really made me wonder just how much of an evangelist’s passion is determined by who signs their paycheck. As usual, Fake Steve Jobs provides analysis as only he can.
Posted by Nick Barclay at 8:13 PM
‘member what happened when Microsoft said “Hey, let’s bundle a reporting engine into the SQL Server license”, “Hey let’s bundle an OLAP engine into the SQL Server license”, “Hey let’s bundle an ETL engine into the SQL Server license”? Well, they’re doing it again.
The other day I downloaded and installed the latest CTP of SQL Server 2008 R2. Although there are plenty of good things to talk about in this release the one that really interests me (and many others) is Master Data Services or MDS, originally codenamed “Bulldog”. Once again Microsoft is being disruptive by bundling yet another <InsertNameHere> Services product into the SQL Server stack.
In its magic quadrants, Gartner splits analysis of Master Data vendors into Customer and Product master data categories. Their analysis of MDM players contains vendors that are very much enterprise focused and don’t sell huge volumes of licenses. Many of these vendors reference Fortune 500 companies as their customers. This reinforced my belief that MDM is very much an enterprise-only playground. The license and maintenance revenue from small volumes of customers is enough to sustain these vendors’ business models. Translation: big license fees & big maintenance fees. I’m sure the products are worth every penny, but not every business can justify spending big money on buying and implementing MDM.
Companies that deal with hundreds of thousands, or even millions, of different SKUs or unique customers need a way to manage that one version of the truth for their incredibly large and complex global businesses. This is fine for those that can justify spending the amount of money needed to accomplish this, but what about the company with just 500 SKUs and 10,000 customers? They may still have tons of money, hell they may even be Fortune 500, but they may not have mountains of master data records to manage. Even the most cashed up companies would think twice about spending vast sums of money on ways to manage small volumes of critical master data. IMO the enterprise vendors are not interested in these companies and these companies are not interested in enterprise vendors.
Enter MDS. Cost? Included in SQL Server license.
I’m sure the established players in the MDM space are snickering behind their hands at Microsoft’s audacity in trying to muscle in on the MDM market. They’re already hard at work compiling comprehensive lists of “but does it have…?”, “can it do…?”, “it can’t…” and the ever-popular “C’mon, it’s Microsoft! Wait ‘till SP1 comes out.”
To be sure, there are plenty of good reasons the incumbents have as to why MDS may pale in comparison to their own technology stack. There is no question that MDS will be playing catch up here. Most of the others have been in business a long time and have excellent, very mature products. No argument there. Keep in mind, though, that MDS is also based on a pretty mature MDM product, Stratature, that was acquired by Microsoft in 2007. Nonetheless I’m sure there will not be as many features baked into MDS v1 when compared with the other market players.
The incumbents are focused on the big enterprise fish who have nasty, hairy, complicated master data problems that need to be solved. Of course, that’s their target market. These are the customers who can (and want to) pay for what the incumbents have to offer. No doubt it’s good stuff, but what about the business who just wants a central place to manage the names and hierarchies of their 100-ish sales territories and their exclusive list of 2,000 customers? Do they need all the enterprise MDM bells, whistles and cost? Probably not. They’ve been making do with Excel. Until now.
Raise your hand if you’ve ever created a lookup table that had to be maintained or watched over by someone who is umm, not so technical. If you suggested forking out a large pile of cash to purchase MDM software to assist this non-technical person maintain proper control of small volumes of simple data you were probably laughed at.
“Do it in Excel”, “Create a table in a DB and build a UI for maintenance”, “Use Access…”. These are the thin-end-of-the-wedge scenarios that will allow MDS to gain footholds in places that the other vendors would not even get out of bed for. As with Analysis Services and Reporting Services, the barriers to entry for the IT geeks to start playing around with and eventually deploying MDM into production will be drastically lowered.
Don’t get me wrong, there’s still going to be plenty of big, complex enterprise MDM scenarios that MDS will tackle as well. *But* (I think) there is going to be a whole new breed of non-enterprise MDM customer that will start making themselves known very soon.
When you start looking into the world of MDM one of the first things you quickly realize is that the software, while critical to the process, is not even close to the complete solution. Anyone who has been involved with an MDM project will tell you that while good software definitely helps, the real success of a master data initiative is inextricably linked to all of the 4 P’s of MDM:
Notice that product only makes up 25% of this. Figuring out the technicalities of how to use MDS is not going to be much of a chore for most BI / IT pros. The real challenge is getting the other 75% of the 4 P’s in place.
MDS, as part of the SQL stack, frees up funds to spend on getting all four P’s right. There is a reasonable amount of consulting hours that can be purchased with the money saved when you don’t see any increase in SW license costs.
As I get stuck into the internals of the product I will blog more. From what I’ve toyed around with so far the product looks interesting, is simple to set up, and should be pretty easy for both geeks (getting stuck into the DB, web services and API) and the non-geeks (who will use the web-based UI to manage things) to get a handle on. On digging into some of the more complex looking objects within the repository DB and web-based UI you can see that there is a lot of good stuff to explore and experiment with.
Jamie Thompson blogged about his delight in discovering that MDS implements some very cool Regex and Fuzzy lookup functionality. I’m sure Jamie’s next question was whether there are any MDS-flavored custom SSIS transforms or tasks included in the initial release. I asked the same question. Answer: There aren’t any. Yet. The group PM for MDS is Kirk Haselden; you may remember Kirk from such products as SSIS, where he was the dev manager and one of the product’s primary designers. With Kirk’s involvement you can be pretty damn sure there will be some SSIS goodness that will make its way into MDS at some point in the foreseeable future. For now, though, you can interact with data via the MDS web service, API or just plain ‘ol TSQL. Plenty of options there for SSIS to hook into. More on this as I play around with the product.
MDS will go head-to-head with the established enterprise MDM players, no question. In the short term the product will probably not make much headway in that market, though. No surprises there. However, think of what the potential is for businesses that the big vendors don’t care about right now. Those who own SQL Server licenses and have even the smallest requirement for managed master data are all fair game.