How to easily remove an XML end tag that has no start tag in Workflow?

b-ssb · November 26, 2021, 6:53pm

Hello,

(1) Could you please tell me how to easily remove an XML end tag that has no start tag in Workflow?

My situation is that I have about 25 sample XMLs that need to be reformatted using Workflow so that they all work with the same DataMapper. 24 of the 25 work just fine with my Workflow.

Within Workflow I am using the Search and Replace Action and I don’t see a way to remove the XML end tag in question without messing up a handful of the other 24 sample XMLs that work fine now.

(2) Also, is there a faster way than using the Search and Replace Action in Workflow to reformat XMLs? The Search and Replace Action I’m using has 10 steps in it, which I assume means that it reads the entire file 10 times. The entire workflow takes a bit over 2 seconds, but if I could decrease that then I’d like to.

Thank you very much,

Brian

jouberto · November 26, 2021, 7:13pm

Hi,

If you deal with valid XML, the optimal way to edit it is with XSLT, but that might be out of reach for most users. Another way is to script your changes, reading the file line by line in VBScript or JavaScript. If you don’t have the scripting expertise, I’m afraid the add/remove text plugin and the search and replace plugin are your only options.
If your end tag is always by itself on a line and its contents can vary, the add/remove text plugin could blindly get rid of the last line or last n characters of your job to easily discard that extra tag. If you need more evaluation of the contents or conditions, I would recommend scripting your way through the modifications.

Thanks.

Phil · November 28, 2021, 8:21am

The following JavaScript code removes an extra closing tag named </RECORD> from a file, but only if the XML file is invalid (i.e. when it has an extra closing tag). You can change the name of the extra tag by setting the EXTRA_END_TAG variable at the top of the script:

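// Closing tag to remove when the job file fails to parse.
var EXTRA_END_TAG = "</RECORD>";
var myXML = new ActiveXObject("Msxml2.DOMDocument");
myXML.async = false;   // load synchronously so parseError is populated right away
myXML.load(Watch.GetJobFileName());

if (myXML.parseError.errorCode != 0) {
   // Parsing failed: use the reported position to locate the extra end tag.
   var myErr = myXML.parseError;
   Watch.Log("Error at line " + myErr.line + ", column " + myErr.linepos + ": " + myErr.reason,2);
   var fso = new ActiveXObject("Scripting.FileSystemObject");

   var inFile = fso.OpenTextFile(Watch.GetJobFileName(),1);   // 1 = ForReading
   var outFile = fso.CreateTextFile(Watch.GetJobFileName()+".tmp");
   var lineIndex=1;
   var myLine="";
   // Copy the file line by line, cutting the tag out of the offending line only.
   while(!inFile.AtEndOfStream) {
      myLine = inFile.ReadLine();
      if(lineIndex==myErr.line){
         // The -3 offset assumes linepos (1-based) points at the first character
         // of the tag name, so the "<" sits at 0-based index linepos-3.
         myLine = myLine.slice(0,myErr.linepos-3)+myLine.slice(myErr.linepos-3+EXTRA_END_TAG.length);
         Watch.Log("Removing " + EXTRA_END_TAG + " from line " + myErr.line,2);
      }
      outFile.WriteLine(myLine);
      lineIndex++;
   }
   outFile.Close();
   inFile.Close();
   // Overwrite the job file with the cleaned copy, then discard the temporary file.
   fso.CopyFile(Watch.GetJobFileName()+".tmp",Watch.GetJobFileName(),true);
   fso.DeleteFile(Watch.GetJobFileName()+".tmp");
} else {
   // The file parsed cleanly: leave it untouched.
   Script.returnValue=true;
}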

The script first attempts to read the XML file using an XML parser. If the operation succeeds, there is no extra end tag and the script doesn’t do anything to the file. If there is a parsing error, the information contained in the DOMParseError object is used to determine where the extra end tag is located, and the script proceeds to remove it.

Note that if the extra end tag name is variable (i.e. it changes from file to file), then you’ll have to parse the error message reported in myErr.reason to determine the name of the extra end tag and store it in the EXTRA_END_TAG variable.
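As a rough sketch of that idea, the helper below pulls a tag name out of an error message and rebuilds the end tag from it. The function name endTagFromReason is hypothetical, and the sketch assumes the parser words its message along the lines of "End tag 'RECORD' does not match the start tag 'ROOT'." with the stray tag quoted first. Check the text actually logged from myErr.reason on your system before relying on it.

```javascript
// Hypothetical helper (not part of the original script).
// Assumption: the error message quotes the offending end tag name first, e.g.
// "End tag 'RECORD' does not match the start tag 'ROOT'."
function endTagFromReason(reason, fallbackTag) {
   var match = /End tag '([^']+)'/i.exec(reason || "");
   // Fall back to the hard-coded tag when the wording is not recognized.
   return match ? "</" + match[1] + ">" : fallbackTag;
}

// Intended use inside the error branch of the script above:
// EXTRA_END_TAG = endTagFromReason(myErr.reason, EXTRA_END_TAG);
```

Keeping the hard-coded value as a fallback means the script behaves as before whenever the message does not match the assumed wording.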

For a small XML file - a few hundred lines - this script runs in mere milliseconds. You can make it faster by removing all logging.
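On the second question (replacing a multi-step Search and Replace Action), one possible approach is to read the job file into a string once, apply every replacement in memory, and write the result back once. The sketch below shows only the in-memory part. The rule list, the tag names OldName and NewName, and the function name applyRules are all made up for illustration; substitute your own patterns.

```javascript
// Hypothetical rule list: each entry pairs a pattern with its replacement.
// The tag names below are placeholders, not taken from any real sample file.
var rules = [
   { find: /\r\n/g, replaceWith: "\n" },
   { find: /<OldName>/g, replaceWith: "<NewName>" },
   { find: /<\/OldName>/g, replaceWith: "</NewName>" }
];

// Apply every rule, in order, to a string that is already in memory.
function applyRules(text, ruleList) {
   for (var i = 0; i < ruleList.length; i++) {
      text = text.replace(ruleList[i].find, ruleList[i].replaceWith);
   }
   return text;
}

// Intended use in a Workflow script (not run here): read the job file once
// with OpenTextFile(...).ReadAll(), call applyRules, then write the result
// back once with CreateTextFile(...).Write(...).
```

The string is still scanned once per rule, but the file itself is read and written only once, which is where a multi-step action would otherwise be expected to spend extra time.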

b-ssb · November 29, 2021, 4:39pm

That works great! And is indeed very fast.

Thank you very much,

Brian

b-ssb · December 1, 2021, 3:25pm

After seeing how fast that JavaScript was, I rewrote the rest of my Workflow in JavaScript and shaved 2 seconds off the processing time of each XML file – awesome!

Thanks!