How to efficiently set up a datamapper?

Hello,

We have started using PressConnect for data mapping in hot folders. The way we do it: we recognize the company name in the filename, and a program then searches for the headers that belong to that company and extracts an output as follows:

ADRES;PCWP;LANDCODE;LAND;ABONEENUMMER;NAW1;NAW2;NAW3;NAW4;NAW5;NAW6;NAW7;NAW8;AFDELING;AANHEF;VERSIE;EDITIE;GESLACHT;VOORNAAM;VOORLETTERS;TUSSENVOEGSEL;ACHTERNAAM;BETALINGSKENMERK;NAAMGOEDDOEL;INCASSANTID;IBAN;AFSCHRIJVINGSDATUM;MAILINGONDERWERP;MATCHCODE;REFERENTIE;RETOURBARCODE;BRIEFDATUM;AANTAL;EXTRA1;EXTRA2;EXTRA3;EXTRA4;EXTRA5;EXTRA6;EXTRA7;EXTRA8;EXTRA9;EXTRA10;BEDRIJF;RETOURADRES;RETOURNAAM;RETOURSTRAAT;RETOURHUISNR;RETOURHUISNRT;RETOURPC;RETOURWP;Bestandsnaam;Records;Opdrachtgever

That is quite a lot of columns. Our input is CSV and our output is CSV in ANSI encoding.
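To give an idea of the setup, the selection logic is roughly like the sketch below. This is a simplified, hypothetical illustration: the company names, header lists and function names are made up and are not our actual code.

```javascript
// Hypothetical sketch of the filename-based company detection.
// Company names and header lists are invented for illustration only.
var companyHeaders = {
  "ACME":   ["Straat", "Huisnummer", "Toevoeging", "Postcode", "Plaats"],
  "GLOBEX": ["Adresregel1", "Adresregel2", "PC", "Woonplaats"]
};

function detectCompany(filename) {
  // Return the first company whose name appears in the filename.
  for (var name in companyHeaders) {
    if (filename.toUpperCase().indexOf(name) > -1) {
      return name;
    }
  }
  return null; // unknown sender: the file goes to an error folder
}

var company = detectCompany("ACME_mailing_2024.csv");
var headers = company ? companyHeaders[company] : [];
```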

For some reason it takes really long for files to be processed, so we may be making some mistakes in our approach.

50,000 records take about an hour. A colleague replicated this in Python and processed 250,000 records in about 10 seconds. I can understand some difference in speed, but I feel we are doing something wrong in PressConnect, since it is just text conversion. Here is a sample of the script my colleague made for one customer:

```javascript
if (PrintOpdracht.Opdrachtgever() == "Customername") {

    // Build the address and postcode/city fields from the customer's columns.
    try { record.fields.ADRES = data.extract('Straat',0) + " " + data.extract('Huisnummer',0) + " " + data.extract('Toevoeging',0); } catch (e) {}
    try { record.fields.PCWP = data.extract('Postcode',0) + " " + data.extract('Plaats',0); } catch (e) {}
    record.fields.PCWP = record.fields.PCWP.toUpperCase();

    // Country name/code: blank them out for domestic (Dutch) addresses.
    try { record.fields.LAND = data.extract('Landnaam',0).toUpperCase(); if (record.fields.LAND == "NETHERLANDS" || record.fields.LAND == "NEDERLAND" || record.fields.LAND == "NL" || record.fields.LAND == "NLD") { record.fields.LAND = ""; } } catch (e) {}
    try { record.fields.LANDCODE = data.extract('Landcode',0).toUpperCase(); if (record.fields.LANDCODE == "NETHERLANDS" || record.fields.LANDCODE == "NEDERLAND" || record.fields.LANDCODE == "NL" || record.fields.LANDCODE == "NLD") { record.fields.LANDCODE = ""; } } catch (e) {}

    try { record.fields.ABONNEENUMMER = "Abonr.: " + data.extract('Abonr',0); } catch (e) {}

    // NAW (name/address/city) block.
    try { record.fields.NAW1 = data.extract('Bedrijf',0); } catch (e) {}
    try { record.fields.NAW2 = ""; } catch (e) {}
    try { record.fields.NAW3 = data.extract('Voorletters',0) + data.extract('Voorvoegsels',0) + " " + data.extract('Achternaam',0); } catch (e) {}
    record.fields.NAW4 = record.fields.ADRES;
    record.fields.NAW5 = record.fields.PCWP;
    try { record.fields.NAW6 = ""; } catch (e) {}
    try { record.fields.NAW7 = ""; } catch (e) {}
    record.fields.NAW8 = record.fields.LAND || record.fields.LANDCODE;
    if (record.fields.LAND == "NEDERLAND" || record.fields.LANDCODE == "NL" || record.fields.LANDCODE == "NLD") { record.fields.NAW8 = ""; }

    try { record.fields.BEDRIJF = record.fields.NAW1; } catch (e) {}
    try { record.fields.AFDELING = record.fields.NAW2; } catch (e) {}
    try { record.fields.AANHEF = data.extract('Aanhef',0); } catch (e) {}
    try { record.fields.VERSIE = data.extract('[ABMNT_TTL]',0); } catch (e) {}
    try { record.fields.EDITIE = data.extract('Editie',0).replace("23", ""); } catch (e) {}
    try { record.fields.GESLACHT = data.extract('Geslacht',0); } catch (e) {}
    try { record.fields.VOORLETTERS = data.extract('Voorletters',0); } catch (e) {}
    try { record.fields.VOORNAAM = data.extract('Roepnaam',0); } catch (e) {}
    try { record.fields.TUSSENVOEGSEL = data.extract('Voorvoegsels',0); } catch (e) {}
    try { record.fields.ACHTERNAAM = data.extract('Achternaam',0); } catch (e) {}
    try { record.fields.BETALINGSKENMERK = data.extract('BETALINGSKENMERK',0); } catch (e) {}

    // Fields that are always empty for this customer.
    try { record.fields.NAAMGOEDDOEL = ""; } catch (e) {}
    try { record.fields.INCASSANTID = ""; } catch (e) {}
    try { record.fields.IBAN = ""; } catch (e) {}
    try { record.fields.AFSCHRIJVINGSDATUM = ""; } catch (e) {}
    try { record.fields.MAILINGONDERWERP = ""; } catch (e) {}
    try { record.fields.MATCHCODE = ""; } catch (e) {}
    try { record.fields.REFERENTIE = ""; } catch (e) {}
    try { record.fields.RETOURBARCODE = ""; } catch (e) {}
    try { record.fields.BRIEFDATUM = ""; } catch (e) {}
    try { record.fields.EXTRA1 = ""; } catch (e) {}
    try { record.fields.EXTRA2 = ""; } catch (e) {}
    try { record.fields.EXTRA3 = ""; } catch (e) {}
    try { record.fields.EXTRA4 = ""; } catch (e) {}
    try { record.fields.EXTRA5 = ""; } catch (e) {}
    try { record.fields.EXTRA6 = ""; } catch (e) {}
    try { record.fields.EXTRA7 = ""; } catch (e) {}
    try { record.fields.EXTRA8 = ""; } catch (e) {}
    try { record.fields.EXTRA9 = ""; } catch (e) {}
    try { record.fields.EXTRA10 = ""; } catch (e) {}

    try { record.fields.AANTAL = data.extract('Exempl',0); } catch (e) {}

    // Fixed return address for this customer.
    record.fields.RETOURNAAM = "";
    record.fields.RETOURSTRAAT = "Postbus";
    record.fields.RETOURHUISNR = "23620";
    record.fields.RETOURHUISNRT = "";
    record.fields.RETOURPC = "1100 EC";
    record.fields.RETOURWP = "AMSTERDAM-ZUIDOOST";
    record.fields.RETOURADRES = "Postbus 23620, 1100 EC AMSTERDAM-ZUIDOOST";
}
```

I hope you have some tips or recognize where we are creating the bottlenecks that ruin the speed.

P.S. The reason we have all the try/catch blocks is that a lot of our customers make mistakes in their headers: extra spaces, typos, and so on.
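We also wondered whether a single helper would be cleaner (and perhaps faster) than a separate try/catch per field. Something like the untested sketch below, which only relies on the same data.extract() call we already use:

```javascript
// Untested sketch: one helper instead of a try/catch per field.
// Returns "" when the header is missing or misspelled, so the
// assignments below never throw.
function safeExtract(name) {
  try {
    return data.extract(name, 0);
  } catch (e) {
    return "";
  }
}

record.fields.ADRES = safeExtract('Straat') + " " + safeExtract('Huisnummer') + " " + safeExtract('Toevoeging');
record.fields.PCWP = (safeExtract('Postcode') + " " + safeExtract('Plaats')).toUpperCase();
record.fields.ACHTERNAAM = safeExtract('Achternaam');
```

But we are not sure whether the try/catch blocks themselves are even the bottleneck.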

That kind of data would be much easier to process if it were in XML or JSON format. Converting a CSV to JSON would require a fairly simple script in Workflow, and then the DataMapper could be used without any scripting (in JSON and XML, a non-existent or misspelled field doesn’t cause an error).
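To illustrate the conversion itself, here is a rough sketch. This is plain Node.js rather than a Workflow-specific API, and it assumes a simple semicolon-delimited file in UTF-8 without quoted fields:

```javascript
// Rough sketch of CSV -> JSON, assuming a semicolon-delimited file
// without quoted/escaped fields. Plain Node.js, not a Workflow API.
var fs = require('fs');

var lines = fs.readFileSync('input.csv', 'utf8')
  .split(/\r?\n/)
  .filter(function (l) { return l.trim() !== ''; });

// Trimming the headers here absorbs the stray spaces mentioned above.
var headers = lines[0].split(';').map(function (h) { return h.trim(); });

var records = lines.slice(1).map(function (line) {
  var values = line.split(';');
  var obj = {};
  headers.forEach(function (h, i) {
    obj[h] = values[i] !== undefined ? values[i] : '';
  });
  return obj;
});

fs.writeFileSync('output.json', JSON.stringify(records, null, 2));
```

Once every record is a JSON object, a missing or misspelled column simply yields an empty value in the DataMapper instead of an error.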