When you create a new application, you sometimes have a choice between building it with an orchestration and doing the whole thing with send and receive ports and no orchestrations. The latter is called “content based routing,” where messages are routed based on the Filters in the Send Ports.
I’m more or less assuming you are not totally new to BizTalk. If you need the basics of routing, see this blog, the Pro BizTalk 2009 book, or this content based routing tutorial.
SendPort Filters
Filters are essentially subscriptions to messages. BizTalk is known as a “Pub-Sub” architecture. A receive port/location publishes a message to the message box, and then there can be multiple subscribers (i.e. multiple send ports that pick up and process that message).
Below is a screen shot of the Filters options on a SendPort. You can select a Property from the drop-down list of properties. These are also known as context properties: additional fields about your message that are passed around with your message, but are not part of the actual XML of the message.
NOTE: When you promote a field in a schema, and then deploy that application/project, the field from the property schema will also appear in this list.
So for example, say you have a field called PO_TYPE which has multiple values. You could then have a SendPort which subscribes to one or two (or more) of those values. (To subscribe to two values, select the property again on the second row, then select “OR” in the “Group by” column.)
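To make the subscription idea concrete, here is a minimal sketch in Python of how a filter with OR-ed groups picks its messages. This is a toy model, not BizTalk's actual engine: the SendPort class, the port names, and the PO_TYPE values are all made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class SendPort:
    name: str
    # Each inner list is one group of (property, value) pairs that must ALL match.
    # The port subscribes when ANY group matches (groups are OR-ed together).
    filter_groups: list


def matches(port, context):
    """Return True if the message context satisfies at least one filter group."""
    return any(
        all(context.get(prop) == value for prop, value in group)
        for group in port.filter_groups
    )


def subscribers(ports, context):
    """Names of every send port whose filter matches this message context."""
    return [port.name for port in ports if matches(port, context)]


# Hypothetical ports: one subscribes to two PO_TYPE values via OR.
ports = [
    SendPort("SendPO_Domestic", [[("PO_TYPE", "STD")], [("PO_TYPE", "RUSH")]]),
    SendPort("SendPO_Export", [[("PO_TYPE", "EXPORT")]]),
    SendPort("Archive_All", [[("BTS.ReceivePortName", "ReceivePO")]]),
]

# Context properties travel with the message, outside the XML body.
context = {"BTS.ReceivePortName": "ReceivePO", "PO_TYPE": "RUSH"}
print(subscribers(ports, context))  # ['SendPO_Domestic', 'Archive_All']
```

Note that one published message can be picked up by several send ports at once, which is exactly what makes the archive ports later in this post possible.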
The most common filters are:
1) BTS.ReceivePortName – where you want to subscribe to all messages (often files) from a Receive Port.
Note: There is no filter available for ReceiveLocation, only ReceivePort.
2) BTS.MessageType – where you want to subscribe to all messages of a particular schema.
For example, with a recent EDI application, I had the following filter:
BTS.MessageType = http://schemas.microsoft.com/BizTalk/EDI/X12/2006#X12_00503_850
The message type consists of two parts: 1) The target namespace of the schema and 2) the root node name (separated by the pound # sign).
3) Custom Properties – your own promoted fields from a property schema. (See note above the image). Common Examples: PO_Type, State, Region_Code, Country_Code, CustomerID, ApplicationID, etc…
I would say that about 95% of the filters I’ve used fit in the above three and combinations of them.
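Since the BTS.MessageType value is just the target namespace and the root node name joined by a pound sign, it is easy to build or take apart in a script. Here is a small Python sketch; the function names and the example.com namespace are hypothetical, used only for illustration.

```python
def build_message_type(namespace, root_node):
    """Join a schema's target namespace and root node name with '#'."""
    return f"{namespace}#{root_node}"


def split_message_type(message_type):
    """Split a message type into (target namespace, root node name)."""
    # Split on the LAST '#', since the root node name cannot contain one.
    namespace, separator, root_node = message_type.rpartition("#")
    if not separator:
        raise ValueError(f"No '#' separator in message type: {message_type!r}")
    return namespace, root_node


# Hypothetical custom schema, not one from a real project.
message_type = build_message_type("http://example.com/schemas/po", "PurchaseOrder")
print(message_type)                      # http://example.com/schemas/po#PurchaseOrder
print(split_message_type(message_type))  # ('http://example.com/schemas/po', 'PurchaseOrder')
```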
The Problems with Content Based Routing
Content based routing can be like spaghetti. Years ago, before structured programming, we used to use the term “spaghetti code” to describe code that branched all over the place. Well, content based routing can be the same.
When I go to a new company, one of the first things I have to do is create a Visio diagram of the routing. It can often take 2 to 16 hours to create it, depending on the complexity. There is no standard way to draw these types of diagrams. (There are BizTalk Stencils for Visio (by Sandro Pereira) but I’ve never really tried them because they don’t seem to include everything I want to include on my diagrams.) With a free-style diagram, you can also include disk folders, databases, FTP and web server names if needed. Sorry, but I cannot share my diagrams at this time, and I’m not sure how useful it would be to mock-up a dummy one; I would almost need a full case study to do that properly.
Why does it take so long to untangle and draw the diagram? Well, the names of the receive/send ports help, but you really need to open each of them, and understand the interaction of the following:
1) Maps
2) Pipelines
3) Filters
I can’t find it now, but I think I posted some SQL to help xref SendPort filters to receive port names. You should be able to run a SQL query against the BizTalkMgmtDb that finds all SendPorts that have a specific filter. This is particularly useful if your data is not contained in one application; for example, a SendPort in Application-X subscribes to data from Application-Y.
Just this week, I created a series of ports for processing an EDI 850 without an orchestration. I have 11 send ports and 7 receive locations. I could have done it with fewer ports, but then I would lose the ability to trace and archive.
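This is not the SQL mentioned above, but the same cross-reference can be sketched in Python once you have each send port's name and filter in hand. I'm assuming the filter arrives as the Filter/Group/Statement XML you see in an exported binding file, and that you have already collected it into a dictionary yourself; the port names and sample data below are hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict


def receive_port_xref(send_port_filters):
    """Map each receive port name to the send ports that filter on it.

    send_port_filters: dict of send port name -> filter XML string.
    Assumes statements look like:
      <Statement Property="BTS.ReceivePortName" Operator="0" Value="..." />
    """
    xref = defaultdict(list)
    for send_port, filter_xml in send_port_filters.items():
        if not filter_xml or not filter_xml.strip():
            continue  # send port has no filter
        root = ET.fromstring(filter_xml)
        for statement in root.iter("Statement"):
            if statement.get("Property") == "BTS.ReceivePortName":
                xref[statement.get("Value")].append(send_port)
    return dict(xref)


# Hypothetical sample data standing in for what you pulled from your environment.
filters = {
    "SendPO_ToDisk": '<Filter><Group><Statement Property="BTS.ReceivePortName" '
                     'Operator="0" Value="ReceivePO" /></Group></Filter>',
    "Archive_PO": '<Filter><Group><Statement Property="BTS.ReceivePortName" '
                  'Operator="0" Value="ReceivePO" /></Group></Filter>',
    "SendInvoice": '<Filter><Group><Statement Property="BTS.MessageType" '
                   'Operator="0" Value="http://example.com/schemas/invoice#Invoice" />'
                   '</Group></Filter>',
    "Unfiltered": "",
}

print(receive_port_xref(filters))  # {'ReceivePO': ['SendPO_ToDisk', 'Archive_PO']}
```

A listing like this is a decent starting point for the Visio diagram, because it works across applications: it does not care which application a send port lives in.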
What do I mean by “Trace and Archive?” It’s a very common requirement to archive all incoming files. (One way to do it would be to create an archive pipeline component and store the data as a blob in a SQL database column, but many clients are not keen on that idea.) So to do it with pure BizTalk, here’s how it goes.
1) Receive the file with a pass-thru pipeline and have two send ports
a) SendPort 1 – is the archive to disk of the original untouched and unprocessed file (i.e. still in the raw EDI format).
(Note: I always add a %MessageID% on the filename to guarantee uniqueness, in case I have an issue and have to fix and redrop the same file multiple times.)
b) SendPort 2 – writes the file to disk so it can be read in again with the EDI Pipeline component
Why archive? Suppose the file blows up when you are storing it into the database. How would you reprocess it? By keeping an archive, you can open the file with an editor, modify it, and redrop it to be processed again.
The archive is also a “Trace” (or history) of what happened. Suppose you need to go to Visual Studio and check why a map has a bug, you need the original file to test the map.
If things are running well, the Send Port that goes to the archive can of course be disabled, and everything should still work (be sure to test this scenario in your test environment though).
Adding archives to an existing system can be tricky; you sometimes have to insert extra receives and sends just for the archive. If the Receive Port has a map or a pipeline, then by the time the message gets to the SendPort it has already changed.
Thus, this is what is frustrating about content based routing.
1) If you want archive/trace, it requires extra sends/receives, and thus extra I/O and extra network traffic.
2) You can end up with spaghetti ports. And as a good developer, you need to create some diagram to show the flow of the data. Then it’s up to you to make sure that the Visio diagram is saved in a safe place (such as SharePoint) or maybe even checked into your source code control system along with the project. If the next developer cannot find it or doesn’t know it exists, it cannot help him or her.
Some people do archive/tracing by using BizTalk’s built-in message tracking. That has its own set of advantages and disadvantages. It’s built-in and fast. But, as a developer, I often need access to this data. At maybe half of the clients I’ve worked for, I don’t have access to BTAdmin in production (or at least not every developer has access). By writing the files to the file system or database, they can easily be shared and accessed. Even non-BizTalk people and users can access the file system; it’s a common denominator everyone can easily understand and use.
When using disk archives, you will need to have a pruning/delete or clean-up process in place. Specify some retention period, then for example delete all files over 30 days old. If you process 1000s of files a day, then opening a disk directory with 30,000 files can be very slow. I recently posted a PowerShell script to Archive to Dated Subdirectories.
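The retention idea is simple enough to sketch in Python. This is only a rough illustration under my own assumptions (file age is judged by last-modified time, only the top level of the folder is scanned, and the function name is made up); it is not the PowerShell script mentioned above.

```python
import time
from pathlib import Path

SECONDS_PER_DAY = 86400


def prune_archive(folder, retention_days=30, now=None, dry_run=True):
    """Delete files in `folder` whose last-modified time is older than the retention period.

    Returns the sorted names of the files that were (or, in a dry run, would be) deleted.
    """
    now = time.time() if now is None else now
    cutoff = now - retention_days * SECONDS_PER_DAY
    expired = []
    for path in Path(folder).iterdir():
        # Only plain files at the top level are considered; subfolders are left alone.
        if path.is_file() and path.stat().st_mtime < cutoff:
            if not dry_run:
                path.unlink()
            expired.append(path.name)
    return sorted(expired)
```

I would run it with dry_run=True first (the default) to see the list of candidates, and only pass dry_run=False from a scheduled task once the list looks right.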
Another advantage of files on disks is that you can search them. I love a tool called “Total Commander”. I can search for all files with a mask of “edi*.txt” that contain the string “REF^BM^11003~” for example. Or all *.xml files that contain “<PO_TYPE>ABC</PO_TYPE>”. It also allows me to search using regular expressions for more difficult searches. How in the world could you ever do this with BizTalk’s built-in Message Tracking?
If you store the data in a SQL/XML column then you can use the power of XQuery.
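If you don't have a tool like that on the server, the same kind of search is a few lines of Python. This is a minimal sketch, assuming the archive files are text and small enough to read into memory; the function name and the folder in the usage note are hypothetical.

```python
import re
from pathlib import Path


def find_files_containing(folder, mask, needle, regex=False, encoding="utf-8"):
    """Return files under `folder` matching `mask` whose text contains `needle`.

    With regex=False the needle is treated as a literal string;
    with regex=True it is used as a regular expression.
    """
    pattern = re.compile(needle if regex else re.escape(needle))
    hits = []
    # rglob also searches subfolders, e.g. dated archive subdirectories.
    for path in sorted(Path(folder).rglob(mask)):
        if path.is_file():
            text = path.read_text(encoding=encoding, errors="replace")
            if pattern.search(text):
                hits.append(path)
    return hits
```

For example, find_files_containing(r"D:\Archive", "edi*.txt", "REF^BM^11003~") does the literal search, and passing regex=True with a pattern such as r"REF\^BM\^\d{5}~" covers the harder cases.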
Make Sure you Get an Exact Archive
Today, I was testing sending out an 855.
I somewhat naively did the following as my first setup:
1) Receive 855 in XML format from our system. That receive maps to X12 855 format.
2) Send Port sends to Disk which is tied to FTP directory. This send port uses the EDI Pipeline.
3) To create archive, I just added a second Send Port, which sends message to my Archive disk directory. It uses the exact same EDI Pipeline.
Both SendPorts 2) and 3) above were identical except for the name of the disk file they were tied to.
What is the problem with the above scenario? The EDI pipeline actually uses the BizTalk EDI database to assign the interchange ID. Because the pipeline ran once per Send Port, my archive file had a different interchange ID than the file I actually sent out (and I unnecessarily bumped up the interchange ID).
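Here is a toy Python model of why that happens. It is not BizTalk code: the ControlNumberStore class simply stands in for the shared sequence the EDI pipeline draws from, and all the names are made up for illustration.

```python
import itertools


class ControlNumberStore:
    """Stand-in for the shared sequence that hands out interchange IDs."""

    def __init__(self, start=1):
        self._sequence = itertools.count(start)

    def next_number(self):
        return next(self._sequence)


def edi_send_pipeline(payload, store):
    """Pretend to serialize a message, stamping it with a fresh interchange ID."""
    return {"interchange_id": store.next_number(), "payload": payload}


def naive_send_and_archive(payload, store):
    """Two send ports, each running the EDI pipeline on the same message."""
    sent = edi_send_pipeline(payload, store)
    archived = edi_send_pipeline(payload, store)  # second run draws a new ID
    return sent, archived


def serialize_once_then_copy(payload, store):
    """Run the EDI pipeline once, then copy the finished output twice."""
    serialized = edi_send_pipeline(payload, store)
    return dict(serialized), dict(serialized)


sent, archived = naive_send_and_archive("855", ControlNumberStore())
print(sent["interchange_id"], archived["interchange_id"])  # 1 2

sent, archived = serialize_once_then_copy("855", ControlNumberStore())
print(sent["interchange_id"], archived["interchange_id"])  # 1 1
```

The second function is the shape of the fix: serialize once, then fan out copies of the already-serialized file.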
To correct the above, I would drop the Send Port of step 3 and route the SendPort in step 2 to disk.
Then I would have to create another 3 ports:
4) A new receive port to re-read what was written to disk.
5) A new send port to write the file to the disk which is tied to the FTP directory.
6) A new send port to write the file to the archive disk.
So above is a small example of how your receive and send ports can mushroom quickly with content based routing in BizTalk.
It would be nice if you could write to the message box directly and not to the disk to accomplish the above. Check out the BizTalk LoopBack Adapter and here also; I haven’t used it yet, and it sounds like it is more for orchestrations (since it is a two-way send port). (Note: The loopback adapter is user contributed code, not part of the base BizTalk install.)
Orchestrations
Orchestrations generally have more overhead than content based routing, but I’ve honestly never benchmarked it. Ideally, if you really wanted to know, you would try processing something like 50,000 messages with content-based routing and then with an orchestration and see what the time difference is.
This BizTalk 360 Tip Series says “Avoid Orchestrations where Possible“, and alludes to the performance hit.
You can write to your archive/trace in the orchestration. In this case, your orchestration can sort of take the place of the Visio diagram I described above. It’s a little more self-documenting than trying to read/open all the send/receive ports to try to figure them out. I’m not saying you should switch, but it is probably easier to read and understand the orchestration than to plow through 10 to 20 send/receive ports, trying to match up the filters.
In one case, I was able to take an existing orchestration and rebind the orchestration’s internal Send from a SendPort to a SendPortGroup. That allowed me to write the message to an archive, as well as to the place it was going before.
Comments Appreciated
I’d love to hear what you think. Please comment below. Which do you prefer and why? How do you do trace/archive in your BizTalk applications? Did I say anything you disagree with? Then tell me!