I recently wrote a pipeline that needed to read the message data so it could use that data to set one of the context properties (I’ll explain why in a future blog post). In the past, I might have used XmlDocument.Load and XPath to process the document, but these days I’m more savvy about using XmlTextReader.
I want to summarize an issue I recently ran into and add some comments about it (I reported it on the MSDN forum but ended up solving it myself).
Why should we use XmlTextReader in a pipeline instead of System.Xml.XmlDocument? As this StackOverflow article explains, it comes down to memory usage. XmlDocument loads the entire message into memory at once. If you know your BizTalk messages are small (to me, that probably means under a megabyte), you could probably use XmlDocument without seeing much of an issue, unless you are processing an extremely high volume of messages. But if a message is huge (1 GB, for example), each running instance of your pipeline would need at least 1 GB of memory (the in-memory document tree is typically larger than the raw XML), and that could cause performance problems. What if your trading partners sent you thirty 1 GB messages within the same few minutes?
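To make the difference concrete, here is a minimal sketch of a forward-only lookup. It uses `XmlReader.Create` rather than `new XmlTextReader(...)` (the factory method is what Microsoft’s documentation recommends; both read forward-only), and the class, method, and `TargetUrl` element names are hypothetical. Where `XmlDocument.Load` followed by an XPath query builds the whole tree before you can query it, this only holds the reader’s current node and its internal buffer:

```csharp
using System;
using System.IO;
using System.Xml;

public static class StreamingLookup
{
    // Scans the XML forward-only, one node at a time, and returns the text of the
    // first element whose local name matches, or null if there is no such element.
    // Only the reader's internal buffer is held in memory, not the whole document.
    public static string FindElementValue(Stream xml, string localName)
    {
        var settings = new XmlReaderSettings
        {
            IgnoreWhitespace = true,
            CloseInput = false // leave the stream open for whoever reads it next
        };

        using (XmlReader reader = XmlReader.Create(xml, settings))
        {
            while (reader.Read())
            {
                if (reader.NodeType == XmlNodeType.Element && reader.LocalName == localName)
                {
                    // Reads the element's text content and moves past its end tag.
                    return reader.ReadElementContentAsString();
                }
            }
        }
        return null;
    }
}
```

Called with a `MemoryStream` over `<Order><Header><TargetUrl>http://example.com/svc</TargetUrl></Header></Order>` and `"TargetUrl"`, it returns `http://example.com/svc`. Note that it stops reading as soon as it finds the element and does not rewind the stream.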
By using a Stream instead, the data can be processed in small chunks. With the BizTalk VirtualStream class, you can even set the buffer size. For more info, check out this blog for great background on .NET streams.
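As a generic .NET illustration of what “small chunks” means (plain `System.IO`, not BizTalk’s `VirtualStream`; the class name and the 4096-byte default are my own choices), this loop never holds more than one buffer of data in memory, however large the stream is:

```csharp
using System;
using System.IO;

public static class ChunkedReader
{
    // Reads a stream in fixed-size chunks and returns the total number of bytes read.
    // At most bufferSize bytes are held in memory at any time.
    public static long CountBytes(Stream input, int bufferSize = 4096)
    {
        if (bufferSize <= 0) throw new ArgumentOutOfRangeException(nameof(bufferSize));

        byte[] buffer = new byte[bufferSize];
        long total = 0;
        int read;
        // Read returns 0 only at the end of the stream; it may return fewer bytes than requested.
        while ((read = input.Read(buffer, 0, buffer.Length)) > 0)
        {
            total += read; // real processing of buffer[0..read) would go here
        }
        return total;
    }
}
```

One side effect worth noticing: when the loop finishes, the stream’s `Position` is at its end, so anything that reads the same stream afterwards sees no data unless the stream is rewound.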
So here’s my tale of woe. I created a pipeline, processed the stream using XmlTextReader, found the desired data, and used it to set OutboundTransportLocation (the URL of a WCF service that I was trying to make dynamic while still using a static port). My code worked fine when I hardcoded a value, but once I added the XmlTextReader, it started blowing up with this error:
<pre>
System.Xml.XmlException: Root element is missing.
   at System.Xml.XmlTextReaderImpl.Throw(Exception e)
   at System.Xml.XmlTextReaderImpl.ParseDocumentContent()
   at System.Xml.XmlTextReaderImpl.Read()
   at System.Xml.XmlReader.MoveToContent()
   at Microsoft.BizTalk.Adapter.Wcf.Runtime.BizTalkBodyWriter.ValidateStreamContainsXml(Stream stream)
   at Microsoft.BizTalk.Adapter.Wcf.Runtime.WcfMarshaller.CreateWcfMessage(CreateWcfMessageSettings settings)
   at Microsoft.BizTalk.Adapter.Wcf.Runtime.WcfClient`2.SendRequestMessage(IBaseMessage bizTalkMessage, IRequestChannel channel)
   at Microsoft.BizTalk.Adapter.Wcf.Runtime.WcfClient`2.SendMessage(IBaseMessage bizTalkMessage)
</pre>
I searched the web for solutions and didn’t find anything obvious, which is why I’m writing this blog post. From somewhere deep in the dark recesses of my memory, I recalled something about doing a Seek in pipelines, so I googled it and found this article: Handling Incoming Data Streams in Pipeline Components.
That article made perfect sense. It says:
If you do not do this and the stream is read to the end in the current component, the next component receives what appears to be an empty stream because the data stream pointer was not set to the start of the stream. This can cause unexpected parsing and validation errors in subsequent pipeline components.
In other words, if my component leaves the stream at (or near) its end and passes it on, the next step starts reading from the current position, which could be the middle or the end of the stream, depending on how I coded my XmlReader loop. I actually stopped as soon as I found the value I needed, so I most likely left the stream position somewhere in the middle. The next XmlTextReader (in this case the one inside the WCF adapter’s BizTalkBodyWriter, according to the stack trace) starts from there and finds no root element, or basically invalid XML. If you take the second half of an XML file, it is not well-formed: it is missing its root start tag, and the closing root tag at the end has nothing to match.
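Here is a small sketch that reproduces the problem outside BizTalk, with a `MemoryStream` standing in for the message body stream. The class, method, and element names are hypothetical, and I use `XmlReader.Create` with `CloseInput = false` so that disposing a reader leaves the stream open. It also shows a subtlety: an XmlReader reads the underlying stream ahead in buffered blocks, so the stream’s `Position` can be well past the node where you stopped. For a small message it is likely already at the end, and an XmlReader over an empty remainder is exactly what reports “Root element is missing.”

```csharp
using System;
using System.IO;
using System.Xml;

public static class RewindDemo
{
    static readonly XmlReaderSettings Settings = new XmlReaderSettings { CloseInput = false };

    // Simulates "my" component: read just far enough to find the first element
    // named localName, then stop WITHOUT rewinding the stream.
    public static void ReadUntil(Stream s, string localName)
    {
        using (XmlReader r = XmlReader.Create(s, Settings))
        {
            while (r.Read())
            {
                if (r.NodeType == XmlNodeType.Element && r.LocalName == localName) break;
            }
        }
    }

    // Simulates the next consumer (e.g. the WCF adapter): parse from wherever the
    // stream currently is to the end. Returns false if the XML is not well-formed.
    public static bool IsReadableXml(Stream s)
    {
        try
        {
            using (XmlReader r = XmlReader.Create(s, Settings))
            {
                while (r.Read()) { }
            }
            return true;
        }
        catch (XmlException)
        {
            return false; // e.g. "Root element is missing."
        }
    }
}
```

Running `ReadUntil` and then `IsReadableXml` on the same stream fails; setting `Position = 0` in between makes the second parse succeed.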
I can’t include my entire pipeline here, but here is a short sample of code using VirtualStream in a BizTalk pipeline. Instead of calling the Seek method, that code sets the Position property, which accomplishes the same thing:
vStream.Position = 0; // rewind to the start; equivalent to vStream.Seek(0, SeekOrigin.Begin)
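Putting the fix together, here is a hedged sketch of a helper that peeks at one value and then rewinds. It is not my actual pipeline code: the class and method names are hypothetical, and `EnsureSeekable` copies a non-seekable stream into a `MemoryStream` purely to keep the example self-contained (which gives back the memory savings for big messages). Inside BizTalk you would more likely wrap the stream in `VirtualStream` or `ReadOnlySeekableStream` from `Microsoft.BizTalk.Streaming`, which, as I understand it, can overflow to a temporary file instead of holding a large message in memory. Setting `Position` or calling `Seek` only works when `CanSeek` is true, which is why that check comes first:

```csharp
using System;
using System.IO;
using System.Xml;

public static class PipelineStreamHelper
{
    // Returns the stream itself if it can seek; otherwise copies it into a
    // seekable MemoryStream (a simplification; see the note above).
    public static Stream EnsureSeekable(Stream input)
    {
        if (input.CanSeek) return input;
        var copy = new MemoryStream();
        input.CopyTo(copy);
        copy.Position = 0;
        return copy;
    }

    // Reads the first element named localName, then rewinds the stream to the
    // start (same as the vStream.Position = 0 line) so the next component sees
    // the whole message. The finally block rewinds even if parsing throws.
    public static string PeekElementValue(Stream seekable, string localName)
    {
        try
        {
            var settings = new XmlReaderSettings { CloseInput = false };
            using (XmlReader reader = XmlReader.Create(seekable, settings))
            {
                while (reader.Read())
                {
                    if (reader.NodeType == XmlNodeType.Element && reader.LocalName == localName)
                        return reader.ReadElementContentAsString();
                }
            }
            return null;
        }
        finally
        {
            seekable.Position = 0;
        }
    }
}
```

In a pipeline component, the idea would be to call `EnsureSeekable` on the body part’s stream, assign the result back to the body part’s `Data` property if it is a new stream (so later components read the seekable copy), and then write the value you found into the message context.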
For further info, see the MSDN article: Optimizing Pipeline Performance