Split a long text string (EDI Tradacom9 format)

I’m currently trying to break what is effectively one extremely long row of data into individual fields that can then be extracted. A (suitably anonymised) chunk of the data would look like:

STX=ANA:1+0000000000001:SOME COMPANY+0000000000002:OTHER COMPANY+190101:000806+1+PASSWORD+INVFIL’MHD=1+INVFIL:9’TYP=0700+INVOICES’SDT=0000000000001+SOME COMPANY+COMPANY ADDRESS 1:COMPANY ADDRESS 2:AA1 1AA UNITED KINGDOM+222222222’

The ’ symbol represents a line break, the three characters followed by = defines the row type, the + is used to separate elements and the : is used to separate sub-elements.

There are no headers (except for the three characters (e.g. STX)) and a given field always appears in the same place based on the element dividers. What I need is a script that will look for a given header and then extract the data, based on the number of element breaks.

I’ve written a script in Powershell that can do this but I’m unable to translate it to JavaScript having absolutely no prior experience of it. If someone could get me started I should be able to replicate it for the rest of the document :slight_smile:

Here’s a bit of code that may help you get started:

var myLine = "STX=ANA:1+0000000000001: .... " // store your entire line in the variable
myLine = myLine.replace(/\+|:/g,";"); // replace + and : with semi-colons

var myMHDArray = getFields(myLine,"MHD"); // call the function that returns an array of fields for MHD
var myTYPArray = getFields(myLine,"TYP");   // same thing for TYP

// Results:
// myMHDArray contains [1,INVFIL,9]
// myTYPArray contains [0700,INVOICES]


// Here's the function that finds the header type, reads until the next ’ character and then splits the
// substring into an array of fields, based on the position of the semi-colon
function getFields(line, headerName){  
	return line.match("("+headerName+"=)([^’]*)")[2].split(";");
}
1 Like

Thanks for the code. I’ve spent the last couple of days having a play and I’m going to have to admit defeat :pensive:

In Connect Designer I defined a field called “Raw” and used the location (i.e. row 1, 6000 characters) to capture all the data. I then used your code (slightly modified) to extract based on header:


var delimited = record.fields.Raw.replace(/\+|:/g,";") ;

results = getFields(delimited,"MHD");

function getFields(line, headerName){ 
	return line.match("("+headerName+"=)([^']*)")[2].split(";").toString();
}

This works in that it allows me to extract the data into a field but (obviously) what I get is “1,INVFIL,9” in a single field

Is there some way to use that code to extract it to a table instead? I’ve been having a play with the preprocessor to break all data into individual rows but there are optional fields so depending on the source the STX component may have anywhere between 8 and 16 elements (for example).

The various TRADACOMS formats, just like most EDI formats, are composed of a number of fixed and repeating elements, each with a variable number of values and sub elements. As such, they are much closer to an XML file than they are to a Table. If you are to convert your files, I would look at XML first.

From what I can see, the specific format you are dealing with is Invoices, which itself may contain references to Orders and Delivery Notes. Converting all of this to a generic XML format is a significant endeavour, that I don’t believe we can cover in a Forum thread like this one.

You stated in your original post that you were able to write a PowerShell Script to presumably convert the data stream; can you make changes to that script so that it outputs XML instead?

.

The Powershell is only designed to make something human readable so it’s really just a long IF statement with a sendmail at the end. I guess I could write a converter, I was just hoping to skip that step as the ultimate output will be getting keyed back in by a human :slight_smile:

If this is going to turn out to be beyond complicated I will take that approach. Thanks for your assistance regardless.

Long shot, but may work for you depending on what do you need to do with the data.

How about splitting your data to separate rows with regex (Advanced Search and Replace plugin in workflow or sed)?
Basically, replace the ’ with new line. This will turn your data into something like this:
STX=ANA:1+0000000000001:SOME COMPANY+0000000000002:OTHER COMPANY+190101:000806+1+PASSWORD+INVFIL
MHD=1+INVFIL:9
TYP=0700+INVOICES
SDT=0000000000001+SOME COMPANY+COMPANY ADDRESS 1:COMPANY ADDRESS 2:AA1 1AA UNITED KINGDOM+222222222

Then you can have a text based datamapper, with the record limit based on first three characters of each line (field type). In the datamapper, loop on all lines, then first extract three characters (field type). From there it depends on what you want from the data. For example have a multiple condition step where you check the field type and have extraction step specific to the field you are dealing with. Extraction may populate detail table

Hi John,

I think I see what you’re doing there - I’ve had success with the pre-processor splitting the document into lines so that loop should work. I’ll give that a go, thanks.

Hi John,

Found a script for modifying an XML using the pre-processor and extracted part of it and modified it to replace the ’ with a line break:


var inFile = openTextReader(data.filename);
var outFile = openTextWriter(data.filename+'.tmp');

var edidata = "";

while ((sLine=inFile.readLine())!=null) {
  edidata+=sLine.replace(/'/g,"\n");
}

outFile.write(edidata);
inFile.close();
outFile.close();
copyFile(data.filename+'.tmp',data.filename);

I then found an older OL Learn post that deals with breaking a text string down into its elements here and combined that with the multiple conditions step you suggested.

So for the STX line (STX=ANA:1+0000000000001:SOME COMPANY+0000000000002:OTHER COMPANY+) I had to break it up into elements (using the +) and further split it by sub element (:). The first element I was interested in was Supplier GLN:Supplier Name (0000000000001:SOME COMPANY) so that required a JavaScript defined field that split up the entire row by the ‘+’ symbol, split the element I was interested in by the ‘:’ symbol, and returned the first part:


var STXArray = data.extract(1,4000,0,1,"
").split("+"); var supgln = STXArray[1].trim(); var supglnArray = supgln.split(":"); supglnArray[0].trim();

Then to get the second part I just used the element split variable (supglnArray) in a new field:

EDIT: I had to add an IF statement to check if the sub element exists as not all EDI files use it (and you obviously can’t trim on a NULL). I used the IF rather than just removing the trim as in some sections there can be multiple sub elements (or none).


if (supglnArray[1] == null) {
	results = "";
} else {
	results = supglnArray[1].trim();
}

After that, it was just rinse and repeat for the next element, Recipient GLN:Recipient Name (although I took advantage of the variable that I created in the first field to split the row - ‘STXArray’):


var repgln = STXArray[2].trim();
var repglnArray = repgln.split(":");
repglnArray[0].trim();

So on and so forth.

It’s going to take me an age to completely pull all 50 rows of data to pieces like this but once it’s done I should be able to reuse it on other similar formats. I appreciate my JavaScript is probably poorly written, but as I mentioned above I only starting using it a week ago :slight_smile:

My method is long winded and no doubt there is a more efficient way to do this out there, but hopefully this post will help someone else trying to do something similar.

Thank you both for your assistance.