Dynamic Data Flow using COZYROC component Part II

In the previous post, I explained the problem and the envisaged solution using COZYROC Data flow plus component. This post is the continuation of the required setup for using dynamic data source without the need of knowing its metadata ahead of time.

In the foreach loop container that which loop over the data definitions there is an Execute SQL Script task. This serves two purposes:

  1. To populate an object variable used into the derived columns setup of the data flow plus
  2. To serve as starting point to be able to do branching depending on the query language

The execute SQL task executes the following query and saves the result set in a variable : User::DerivedColumn

SELECT ‘Alert Name’ as ResultColumn,
‘@[User::AlertName]’ as Expression,
‘DT_WSTR’ as DataType,
7 as Length    ,
0 as Precision,
0 as Scale,
0 as CodePage

Once the variable is set I am using an expression and constraint to decide whether I am dealing with SQL or MDX. i am doing that only because the data connection component for MDX is an MSOLAP providere where as the SQL is the SQL Native Client. This is not interchangable at runtime, therefore the branching there.

Apart from the data connection type that the content of the data flow plus component is excatly the same.

Setting up the Data flow plus component

The data flow plus is a peculiar component for performing high-performance data extraction, transformation and loading. The think that stroke me the first time I used it is that when you double-click on the designer surface you don’t get to the data flow designer but instead in the data flow plus editor where you can adjust the properties of all the components inside the data flow plus component. Does it sound complicated ? Well it is not once you are used to it. Unfortunately the documentation on the COZYROC website was not very complete about how to do things in the component. There are a few instructions video that are for simple and that work in a perfect world. Ask any ETL developer about how perfect the data extraction and transformation world is and he/she will that is far more complicated!!

To design the content of the data flow plus component you’ll need to do it in the data flow tab of the designer, and remember to pick the right data flow in the drop down !

image

Figure 1 – the data flow design surface

As you can see the content of the data flow look like any other data flow. But don’t be fooled. Once you open the OLE DB source, remember to always use the advanced editor for metadata reasons.

Never, never, never use the regular editor or you’ll lose all the setup of the ole db source!!

image

Figure 2 – The advanced editor for the OLE DB Source

1 – ValidateExternalMetadata should be false

2 – The User::Query variable is used for the query.

Now move to the Input and Output Properties and make sure that you only have one column for tehe Source Output External Columns and Output columns and in the Error Output columns.

The name of the column should be THUNK_COLUMN and nothing else. and the data type doesn’t really matter.

image

Figure 3 – The Input and Output properties

Once you have this column in place –and removed all other columns you can click on the Column Mappings tab and make sure the THUNK_COLUMN is mapped.

image

Figure 4 – Checking the column mapping

What is achieved here is telling the data flow plus component that the column in this regular OLE DB Source component will be dynamically assigned by other means.

I then added my derived columns, partly based on the User::DerivedColumn variable.

image

Figure 5 – Derived Column Transformation

And then, the destination. In my case a SharePoint Destination. Which is setup in the same way than the SharePoint List source from the previous blog post. Except that the mapping will be done somewhere else than on the mapping page since we only have a column named THUNK_COLUMN and the derived columns as a source. The derived columns are by the way completely ignored by the SharePoint Destination and have to be set in the same way than the THUNK_COLUMN. I’ll get back to that.

image

Figure 6 – SharePoint Destination mapping page

I didn’t mention it but in my case the Connections are set to have a dynamic connection string which comes from the variable User::ConnectionString as described in part I of the serie.

image

Figure 7 – Connection properties

Configuring the dynamic part of the Data Flow Plus

Now for the exciting part, which is by far standard SSIS, and the part that makes this component a great asset but also a tough debugging candidate : The configuration

When you double-click the component it shows and Editor tab which is comports 3 tabs : Setup, Dynamic and Advanced.

Advanced is for scripting in .Net to control the component’s behaviour and I haven’t tested that part at all. I saw some of the documentation on the COZYROC website and it seemed pretty straightforward. But again I didn’t try it…

Setup is empty except for two buttons for linking the editor to a data flow or exporting the editor to – I guess – an other data flow plus component for easy reusability. Clever !

Dynamic is the one I used, and inside you can see all the components that are used inside the data flow grouped into 3 categories (Source, Transformation, Destination). In our case:

  • Source = OLE DB Source
  • Transformation = Derived Column
  • Destination = SharePoint Destination

image

Figure 8 – The Cozyroc Data Flow Task Editor

A feature I miss from the Open Live Writer is the ability for doing Search and Replace now that I found out that COZYROC spells with capital letters. May be I write a plugin for doing just that at a later time.

Call out for the editor

  1. Enabling the OLE DB Source means telling the component whether or not is dynamic
  2. Enabling the SharePoint Destination is telling the component that the destination is dynamic
    A – The mapping SharePoint destination can only be done by mean of text file containing a list of column mappings in that format :
    image
    The left side being the input column and the right side the column name in SharePoint. In my case the Alert Name maps to the Title of the element in SharePoint and so on.
  3. Enabling the Derived Columns tells the component to use my variable for the values of the column. This has to be done by means of a variable, not a text file or anything else.

This one not trivial to figure out from the documentation on the website and I am grateful that Ivsan Peev was willing to help me. I sent it some drafts of my package with some of my ideas and problems and he was able to explain what needed to be change in order to accomodate the components. Without his assistance I would have struggled a bit longer on making this thing work !

After 1½ of design and work, I was able to present a prototype of the solution to the customer which thought it was solved in an elegant manner even though the interface for defining alert definitions (MDX or SQL query in a SharePoint list item) was too complicated for the business guys.

Conclusion

COZYROC Data Flow plus component was spot on for what I needed to do. I will without a doubt reuse it again in other circumstances. My only pain point was the lack of covering documentation (In my opinion videos on YouTube don’t cut it as documentation). But it was largely compensated by the helpfulness of the guys from COZYROC. Aleksandar Biks (L) and Ivan Peev (L) were both really quick to reply to my requests for help. Hats off for that !!

Happy SSIS’ing !

Advertisements

One comment

  1. Excellent step by step guide, many thanks

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

%d bloggers like this: