This C# code fragment illustrates how to use the RegEx Captures collection within RegEx Groups.

I needed the ability to read all files in a directory, and build a CSV to cross reference the PO number to the EDI file it was in.

            //Pattern we want, but then escape all the ^ signs  "REF^BM^(\d*)~";
            string strRegExPattern856 = @"REF^BM^(\d*)~".Replace("^", @"\^");

public static void parseFile(string strDirectoryName, string filemask, string prefix, 
                             string filetype, string strRegExPattern, 
                             StreamWriter objFileSW, int maxDaysBack)
{
		string[] filenameArray = 
                  Directory.GetFiles(strDirectoryName, filemask, SearchOption.TopDirectoryOnly);

		foreach (string filename in filenameArray)
		{
			Console.WriteLine("Processing file=" + filename);
			string fileContents = File.ReadAllText(filename);

			foreach (Match m in Regex.Matches(fileContents, strRegExPattern))
			{
				//Console.WriteLine("'{0}' found at index {1}.",m.Value, m.Index);
				string showFilename = Path.GetFileName(filename);
				FileInfo fileInfo = new FileInfo(filename);
				TimeSpan daysBack = DateTime.Now.Subtract(fileInfo.LastWriteTime);
				
				if (daysBack.Days < maxDaysBack)
				{

					//Worthless Capture collection! 
					/*
					foreach (Capture capture in m.Captures)
					{
					   Console.WriteLine("Index={0}, Value={1}", capture.Index, capture.Value);
					}
					*/

					int loopCounter = 0;
					foreach (Group grp in m.Groups)
					{

						loopCounter++;
						if (loopCounter == 2)  // trying to avoid subscript error 
						{
							string PONum = grp.Captures[0].Value;
							Console.WriteLine(loopCounter + " " + showFilename + " " + PONum);
							string outline = prefix + "," +
                                                                filetype + "," + 
                                                                showFilename + "," + 
                                                                fileInfo.LastWriteTime +
                                                                "," + PONum;
							objFileSW.WriteLine(outline);
						}
					}
				}
				else
				{
					// Report the same timestamp that the age check above is based on
					Console.WriteLine("Skipping file " + showFilename + " LastWrite=" + 
                                                          fileInfo.LastWriteTime + 
                                                          " Over " + maxDaysBack + " Days Ago"); 
				}

			}  // end foreach 

	}
} 
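
A note on why the Match-level Captures collection looked useless above: Match derives from Group, so m.Captures holds just the capture for the whole match, not the parenthesized part. Since only the first parenthesized group is needed here, one alternative to counting loop iterations is to index m.Groups directly. The following is a minimal sketch of that idea, not the code I actually ran; the sample EDI text and the ExtractPoNumber helper are made up for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

// Same pattern as above: escape each ^ so it matches a literal caret.
string pattern = @"REF^BM^(\d*)~".Replace("^", @"\^");

// Hypothetical sample EDI text, made up for this illustration.
string sample = "BSN^00^123~REF^BM^11003~REF^BM^11004~";

foreach (Match m in Regex.Matches(sample, pattern))
{
    Console.WriteLine(ExtractPoNumber(m));
}

// Hypothetical helper: Groups[0] is the whole match, Groups[1] is the
// first parenthesized group. Success is false if the group did not match.
static string ExtractPoNumber(Match m)
{
    Group g = m.Groups[1];
    return g.Success ? g.Value : string.Empty;
}
```

Indexing a group number that does not exist in the pattern does not throw; it returns a group whose Success property is false, which is why the helper checks it.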


I was getting this error:

xlang/s engine event log entry: Uncaught exception (see the 'inner exception' below) has suspended an instance of service 'MyProjectName.MyOrchName(8f69ac91-51ae-67be-b342-b3ee431859ca)'.
The service instance will remain suspended until administratively resumed or terminated.
If resumed the instance will continue from its last persisted state and may re-throw the same unexpected exception.
InstanceId: a38b30ab-455a-479e-8015-ac6d6960b5e9
Shape name: MyMessageAssignmentShape1
ShapeId: 426c4c2f-d3b1-4903-b4f3-648b275f899a
Exception thrown from: segment 2, progress 9
Inner exception: Object reference not set to an instance of an object.

Exception type: NullReferenceException
Source: MyOrchName
Target Site: Microsoft.XLANGs.Core.StopConditions segment2(Microsoft.XLANGs.Core.StopConditions)
The following is a stack trace that identifies the location where the exception occured

at MyProjectName.MyOrchName.segment2(StopConditions stopOn)
at Microsoft.XLANGs.Core.SegmentScheduler.RunASegment(Segment s, StopConditions stopCond, Exception& exp)

Solution

I added a lot of trace/diagnostic statements and found it was blowing up on this line of code, which was in a message assignment shape:

vXmlDocMsgToProcess = msgCanonical;

I simply added a “new” statement on XmlDocument to recreate the object/reference. I’m not 100% sure why it was required, but it made BizTalk happy. I seem to remember from years back, when I was doing a lot of loops, that it’s often a good idea to reset your XmlDocument inside a loop.

vXmlDocMsgToProcess = new System.Xml.XmlDocument();
vXmlDocMsgToProcess = msgCanonical;

 

This worked fine, but only for a while. When I removed some trace statements later, the exact same error came back in the exact same place.

2. The last thing I did was to create a new variable, vXmlDocMsgToProcess2, and use it in place of the variable above.
So far, it is working.

I think this is a Microsoft bug, but I’m not sure whether my machine has all the patches and cumulative updates on it.

3. A co-worker recommended doing the “new System.Xml.XmlDocument()” one time before the loop. He said it solved a similar issue for him.

How do you use the XPath statement on the left side of an assignment statement (i.e., the left side of the equal sign)?  I’m referring to a message assignment shape, in an orchestration, where you want to set the value of an element in the message being constructed.

This is the MSDN page for the BizTalk XPath statement.  There are two problems with that page.

1) XPath subscripts are 1-based and not 0-based. See http://stackoverflow.com/questions/3319341/why-do-indexes-in-xpath-start-with-1-and-not-0.
It seems that page might have a mistake; it says “select the fourth book element”, but shows an xpath of “/catalog/book[3]”.
2) It doesn’t show a good example of how to set an XML element value.
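
To see the 1-based indexing outside of BizTalk, here is a small sketch using the standard System.Xml classes. The catalog document and the SelectText helper are made up for illustration; they are not from the MSDN page.

```csharp
using System;
using System.Xml;

// Hypothetical catalog document, made up for this illustration.
var doc = new XmlDocument();
doc.LoadXml("<catalog><book>A</book><book>B</book><book>C</book><book>D</book></catalog>");

// XPath positions start at 1, so book[3] is the third book, not the fourth.
Console.WriteLine(SelectText(doc, "/catalog/book[3]"));
// book[0] never matches anything, because no node has position 0.
Console.WriteLine(SelectText(doc, "/catalog/book[0]"));

// Hypothetical helper: returns the node's text, or a marker if nothing matched.
static string SelectText(XmlDocument d, string xpath)
{
    var node = d.SelectSingleNode(xpath);
    return node == null ? "(no node)" : node.InnerText;
}
```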

The “String” function is used only on the right side of the assignment statement. It is used to convert a “node” to a string value.
You can copy/paste the XPATH from the properties box on a schema or map (after clicking on the element or attribute).
I like to put the xpath in a string variable. I then had to add the subscript, which is needed if you have an element that has “Max Occurs=UNBOUNDED”.
In this case, I only had one MESSAGE_HEADER child, but I still had to specify the subscript of [1].

//NOTE: Subscript [1] after MESSAGE_HEADER
strXPATH = "/*[local-name()='MyRoot' and namespace-uri()='']/*[local-name()='MESSAGE' and namespace-uri()='']/*[local-name()='MESSAGE_HEADER' and namespace-uri()=''][1]/*[local-name()='FILE_NAME' and namespace-uri()='']/text()";
xpath(msgCanonical, strXPATH) = strFileRcvdName;

Note the following:

1) If you leave off the /text() statement, you might get this error:
“selected a node which is not valid for property or distinguished field retrieval, or it selected no node at all. Only text-only elements or attributes may be selected”.

Other errors related to bad Xpath:

“Inner exception: Expression must evaluate to a node-set” – I think this happens if you put the String statement on the left side of the assignment statement.

NOTE: A few months after writing this post, I copied the above into a different message assignment shape, and got this error:
Inner exception: Illegal attempt to update the value of part ‘part’ in XLANG/s message ‘msgCanonical’ after the message construction was complete.
The issue was that in that orchestration, the message name should have been msgCanonicalCombined, as that was the message being constructed. It turns out I also had a msgCanonical in the same orchestration. Thus, the error was subtly telling me that msgCanonical cannot be changed; I need to build a new message. Well, that is what I was doing; I had just made a copy/paste typo. Changing to “xpath(msgCanonicalCombined, strXPATH) = strFileRcvdName;” fixed the error.

When you create a new application, sometimes you have a choice between doing it in an orchestration vs doing the whole thing with send and receive ports and no orchestrations.  The latter is called “content based routing,” where messages are routed based on the Filters in the Send Ports.

I’m more or less assuming you are not totally new to BizTalk.  If you need the basics of routing, see this blog, Pro BizTalk 2009 book, or this content based routing tutorial.

SendPort Filters

Filters are essentially subscriptions to messages.  BizTalk is known as a “Pub-Sub” architecture.  A receive port/location publishes a message to the message box, and then there can be multiple subscribers (i.e. multiple send ports that pick up and process that message).

Below is a screen shot of the Filters options on a SendPort.  You can select a Property from the drop-down list of properties.  These are also known as context properties; additional fields about your message that are passed around with your message, but not part of the actual XML of the message.

NOTE: When you promote a field in a schema, and then deploy that application/project, the field from the property schema will also appear in this list.
So for example, you have a field called PO_TYPE which has multiple values. You could then have a SendPort which subscribes to one or two (or more) of those values.  (To subscribe to two values, select the property again on the second row, then select “OR” in the “Group by” column.)

BizTalk_SendPort_Common_Filters

The most common filters are:

1) BTS.ReceivePortName – where you want to subscribe to all messages (often files) from a Receive Port.
Note: There is no filter available for ReceiveLocation, only ReceivePort.

2) BTS.MessageType –
For example, with a recent EDI application, I had the following filter:

BTS.MessageType = http://schemas.microsoft.com/BizTalk/EDI/X12/2006#X12_00503_850
The message type consists of two parts: 1) the target namespace of the schema and 2) the root node name, separated by the pound sign (#).

3) Custom Properties – your own promoted fields from a property schema.  (See note above the image).  Common Examples: PO_Type, State, Region_Code, Country_Code, CustomerID, ApplicationID, etc…

I would say that about 95% of the filters I’ve used fit in the above three and combinations of them.
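
As a quick illustration of the two-part message type from item 2 above, here is a small sketch that builds the value from its parts. BuildMessageType is a hypothetical helper written for this post, not a BizTalk API; the namespace and root node name are the ones from my EDI filter.

```csharp
using System;

// Namespace and root node name taken from the EDI filter shown above.
string messageType = BuildMessageType(
    "http://schemas.microsoft.com/BizTalk/EDI/X12/2006", "X12_00503_850");
Console.WriteLine(messageType);

// Hypothetical helper: target namespace + "#" + root node name.
static string BuildMessageType(string targetNamespace, string rootNodeName)
{
    return targetNamespace + "#" + rootNodeName;
}
```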

The Problems with Content Based Routing

Content based routing can be like spaghetti.  Years ago, before structured programming, we used to use the term “spaghetti code” to describe code that branched all over the place.  Well, content based routing can be the same.

When I go to a new company, one of the first things I have to do is create a Visio diagram of the routing.  It can often take 2 to 16 hours to create it, depending on the complexity. There is no standard way to draw these types of diagrams.  (There are BizTalk Stencils for Visio (by Sandro Pereira) but I’ve never really tried them because they don’t seem to include everything I want to include on my diagrams.)  With a free-style diagram, you can also include disk folders, databases, FTP and web server names if needed.   Sorry, but I cannot share my diagrams at this time, and I’m not sure how useful it would be to mock-up a dummy one; I would almost need a full case study to do that properly.

Why does it take so long to untangle and draw the diagram?  Well, the names of the receive/send ports help, but you really need to open each of them, and understand the interaction of the following:

1) Maps
2) Pipelines
3) Filters

I can’t find it now, but I think I posted some SQL to help xref SendPort filters to receive port names.  You should be able to run a SQL query against the BizTalkMgmtDb that finds all SendPorts that have a specific filter.  This is particularly useful if your data is not contained in one application; for example, a SendPort in Application-X subscribes to data from Application-Y.

Just this week, I created a series of ports for processing an EDI 850 without an orchestration.  I have 11 send ports and 7 receive locations. I could have done it with fewer ports, but then I would lose the ability to trace and archive.

What do I mean by “Trace and Archive?”  It’s a very common requirement to archive all incoming files.  (One way to do it would be to create an archive pipeline component and store the data as a blob in a SQL database column, but many clients are not keen on that idea.)   So to do it with pure BizTalk, here’s how it goes.

1) Receive the file with a pass-thru pipeline and have two send ports
a) SendPort 1 – is the archive to disk of the original untouched and unprocessed file (i.e. still in the raw EDI format).
(Note: I always add a %MessageID% on the filename to guarantee uniqueness, in case I have an issue and have to fix and redrop the same file multiple times.)
b) SendPort 2 – is write to disk so it can be read in again with the EDI Pipeline component

Why archive? Suppose the file blows up when you are storing it into the database; how would you reprocess it?  By keeping an archive, you can open the file with an editor, modify it, and redrop it to be processed again.

The archive is also a “Trace” (or history) of what happened. Suppose you need to go to Visual Studio and check why a map has a bug; you need the original file to test the map.

If things are running well, the Send Port that goes to the archive can of course be disabled, and everything should still work (be sure to test this scenario in your test environment though).

Adding archives to an existing system can be tricky; you sometimes have to insert extra receives and sends just for the archive.  If the Receive Port has a map or a pipeline, then by the time the message gets to the SendPort it has already changed.

Thus, this is what is frustrating about content based routing.
1) If you want archive/trace, it requires extra sends/receives, and thus extra I/O and extra network traffic.
2) You can end up with spaghetti ports.  And as a good developer, you need to create some diagram to show the flow of the data. Then it’s up to you to make sure that the Visio file is saved in a safe place (such as SharePoint) or maybe even checked into your source code control system along with the project.  If the next developer cannot find it or doesn’t know it exists, it cannot help him or her.

Some people do archive/tracing by using BizTalk’s built-in message tracking.  That has its own set of advantages and disadvantages.  It’s built-in and fast.  But, as a developer, I often need access to this data.  In maybe 1/2 of the clients I’ve worked for, I don’t have access to BTAdmin in production (or at least not every developer has access).  By writing the files to the file system or database, they can easily be shared and accessed.  Even non-BizTalk people and users can access the file system; it’s a common denominator everyone can easily understand and use.

When using disk archives, you will need to have a pruning/delete or clean-up process in place.  Specify some retention period, then for example delete all files over 30 days old.  If you process 1000s of files a day, then opening a disk directory with 30,000 files can be very, very slow.  I recently posted a PowerShell script to Archive to Dated Subdirectories.
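
The PowerShell post covers the dated subdirectories. For the plain delete-after-N-days idea, here is a minimal C# sketch. DeleteOldFiles is a hypothetical helper, the archive path in the commented usage line is made up, and it uses the last write time as the file age (the same choice as the parseFile code earlier). Try it against a test directory first, since it really deletes files.

```csharp
using System;
using System.IO;

// Hypothetical helper: deletes files matching fileMask in one directory
// (no subdirectories) whose last write time is older than retentionDays.
// Returns the number of files deleted.
static int DeleteOldFiles(string directory, string fileMask, int retentionDays, DateTime now)
{
    int deleted = 0;
    DateTime cutoff = now.AddDays(-retentionDays);

    foreach (string path in Directory.GetFiles(directory, fileMask, SearchOption.TopDirectoryOnly))
    {
        if (File.GetLastWriteTime(path) < cutoff)
        {
            File.Delete(path);
            deleted++;
        }
    }
    return deleted;
}

// Example usage (made-up path, left commented out so nothing is deleted by accident):
// int count = DeleteOldFiles(@"C:\Archive\EDI", "*.txt", 30, DateTime.Now);
// Console.WriteLine("Deleted " + count + " files");
```

Passing the current time in as a parameter is just a convenience for testing; calling DateTime.Now inside the helper would work as well.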

Another advantage of files on disk is that you can search them.  I love a tool called “Total Commander”.  I can search for all files with a mask of “edi*.txt” that contain the string “REF^BM^11003~” for example.  Or all *.xml files that contain “<PO_TYPE>ABC</PO_TYPE>”.  It also allows me to search using regular expressions for more difficult searches.  How in the world could you ever do this with BizTalk’s built-in Message Tracking?

If you store the data in a SQL/XML column, then you can use the power of XQuery.

Make Sure you Get an Exact Archive

Today, I was testing sending out an 855.

I somewhat naively did the following as my first setup:

1) Receive 855 in XML format from our system.  That receive maps to X12 855 format.
2) Send Port sends to Disk which is tied to FTP directory.  This send port uses the EDI Pipeline.
3) To create archive, I just added a second Send Port, which sends message to my Archive disk directory.  It uses the exact same EDI Pipeline.

Both SendPorts 2) and 3) above were identical except for the name of the disk file they were tied to.

What is the problem with the above scenario?  The EDI pipeline actually uses the BizTalk EDI database to assign the interchange ID.  So my archive file had a different interchange ID than the file I actually sent out (and unnecessarily bumped up the interchange ID).

To correct the above, I would drop the Send Port of Step 3, and route the SendPort in step 2 to a disk directory instead of the FTP directory.
Then I would have to create another 3 ports:

4) A new receive port to re-read what was written to disk.
5) A new send port to write the file to the disk which is tied to the FTP directory.
6) A new send port to write the file to the archive disk.

So above is a small example of how your receive and send ports can mushroom quickly with content based routing in BizTalk.
It would be nice if you could write to the message box directly and not to the disk to accomplish the above.  Check out the BizTalk LoopBack Adapter and here also; I haven’t used it yet, and it sounds like it’s more for orchestrations (since it is a two-way send port). (Note: The loopback adapter is user contributed code, not part of the base BizTalk install.)

Orchestrations

Orchestrations generally have more overhead than content based routing; but I’ve never honestly benchmarked it.  Ideally, if you really wanted to know, you would try processing something like 50,000 messages with content-based routing and then with orchestration and see what the time difference is.

This BizTalk 360 Tip Series says “Avoid Orchestrations where Possible“, and alludes to the performance hit.

You can write to your archive/trace in the orchestration.  In this case, your orchestration can sort of take the place of the Visio diagram I described above.  It’s a little more self-documenting than trying to read/open all the send/receive ports to try to figure them out.  I’m not saying you should switch, but it is probably easier to read and understand the orchestration than plowing through 10 to 20 send/receive ports, trying to match up the filters.

In one case, I was able to take an existing orchestration and rebind the orchestration’s internal Send from a SendPort to a SendPortGroup.  That allowed me to write the message to an archive, as well as to the place it was going before.

Comments Appreciated

I’d love to hear what you think.  Please comment below. Which do you prefer and why?  How do you do trace/archive in your BizTalk applications?  Did I say anything you disagree with? Then tell me!