How to extract data from the first page only of a PDF in the DataMapper

Hi - I have a DataMapper where I need to extract one piece of data from the first page only. Each PDF will have at least 6 pages. I am using this only to create a CSV; no other output is needed. Basically, each PDF has a 6-digit number on the first page, and I need to extract that number and create a CSV containing just that number. I currently have it working, but it extracts the data at the same location on all six pages. I need to extract this one number from several different PDFs and have all the results in one CSV, so the example below comes from 4 different PDFs. I see that I can extract based on a script. Is there a sample script that tells it to extract the location on page 1 only?

field1
123456
789456
369852
741258

The script I am using is below.

//Define the CSV file (backslashes must be doubled inside a JavaScript string)
var fileOut = openTextWriter("c:\\out\\BPAextract.csv");
var fieldCounter = 0;

//If you want the first line of the CSV file to have the field names as headers
for (var field in data.records[0].fields)
{
    if (fieldCounter > 0)
    {
        fileOut.write(',"' + field + '"');
    }
    else
    {
        fileOut.write('"' + field + '"');
    }
    fieldCounter++;
}
fileOut.newLine();

//To add all field values to the CSV file
for (var i = 0; i < data.records.length; i++)
{
    fieldCounter = 0;
    for (var field in data.records[i].fields)
    {
        if (fieldCounter > 0)
        {
            fileOut.write(',"' + data.records[i].fields[field] + '"');
        }
        else
        {
            fileOut.write('"' + data.records[i].fields[field] + '"');
        }
        fieldCounter++;
    }
    fileOut.newLine();
}
//Close the CSV file
fileOut.close();
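The script above wraps each value in double quotes but never escapes a double quote that appears inside a value, which would break the CSV. A minimal sketch of a helper that does the escaping (doubling embedded quotes, as RFC 4180 describes). `csvLine` is a hypothetical name; it uses no DataMapper API and is written in plain ES5 (var/function only) on the assumption that you might paste it into a DataMapper script, which I have not tested.

```javascript
// Hypothetical helper: builds one CSV line from an array of values.
// Every field is wrapped in double quotes, and any double quote inside
// a value is doubled ("" ) so the CSV stays valid.
function csvLine(values) {
  var quoted = [];
  for (var i = 0; i < values.length; i++) {
    // Treat null/undefined as an empty field rather than the text "null".
    var text = values[i] == null ? "" : String(values[i]);
    quoted.push('"' + text.replace(/"/g, '""') + '"');
  }
  return quoted.join(",");
}

// Possible use inside the record loop of the script above:
//   var values = [];
//   for (var field in data.records[i].fields) values.push(data.records[i].fields[field]);
//   fileOut.write(csvLine(values));
//   fileOut.newLine();
```

With a single 6-digit field this simply produces `"123456"`, so the output for your case doesn't change; it only matters if a field can ever contain a quote.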

Thank You

You’ll need to adjust your boundary trigger in the DataMapper. By default, for PDF data mappings, one page equals one record. You’ll need to set it up so that your whole variable-length document becomes a single record. Perhaps use the On Text trigger and start a new record when a value is present or changes.

Then you just do the same extract on each record and you’re done.
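Once each PDF is a single record, the extraction only has to look at page 1 and ignore the rest. A minimal plain-JavaScript sketch of that idea, assuming the record's pages are available as an array of text strings. `pageTexts` and `firstPageNumber` are hypothetical; in the DataMapper itself you would normally extract by location rather than through a function like this.

```javascript
// Hypothetical: pageTexts[0] is the text of page 1, pageTexts[1] of page 2, etc.
// Returns the first standalone 6-digit number found on page 1, or null.
// Later pages are never examined, even if they contain 6-digit numbers too.
function firstPageNumber(pageTexts) {
  if (!pageTexts || pageTexts.length === 0) {
    return null;
  }
  // \b on both sides rejects runs of 7 or more digits.
  var match = /\b\d{6}\b/.exec(pageTexts[0]);
  return match ? match[0] : null;
}
```

For example, `firstPageNumber(["Account 123456 summary", "Ref 999999"])` returns `"123456"`; the number on page 2 is ignored.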

OK - I actually changed the trigger to On all pages, and my output is what I need now. However, if I put multiple files through, the CSV contains only the last file that went through, so it is getting overwritten. Is there a way I can run multiple PDFs through and have one CSV with all the numbers?

Thank You

Yeah, openTextWriter overwrites the file by default. You’ll need to open it in append mode instead; see the PlanetPress Connect 2018.1 User Guide.

I think this will do it, but I haven’t tested it out.

var fileOut = openTextWriter("c:\\out\\BPAextract.csv", "UTF-8", true);
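One side effect of append mode: the header loop in the original script also runs on every pass, so `field1` would be written once per PDF. A common fix is to write the header only when the file is new or empty. The sketch below shows that logic in plain Node.js (`fs`), not the DataMapper API; `appendNumber` is a hypothetical name, and CRLF line endings are an assumption based on the usual CSV convention.

```javascript
// Plain Node.js illustration (not the DataMapper API): append one value per run
// to a CSV file, writing the header only when the file is missing or empty.
const fs = require("fs");

function appendNumber(csvPath, headerName, value) {
  // A missing or zero-length file means nothing has been written yet.
  const isNew = !fs.existsSync(csvPath) || fs.statSync(csvPath).size === 0;
  let chunk = "";
  if (isNew) {
    chunk += '"' + headerName + '"\r\n';
  }
  // Escape embedded quotes by doubling them, then end the row with CRLF.
  chunk += '"' + String(value).replace(/"/g, '""') + '"\r\n';
  // appendFileSync creates the file if needed and never truncates it.
  fs.appendFileSync(csvPath, chunk, "utf8");
}
```

Calling it once per PDF with `"field1"` and each number yields a single header row followed by one row per file, matching the sample output at the top of the thread.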

If you just need to extract this number, why not do that in a Workflow script?

E.g. a VBScript in Workflow could do this, like the following:

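Dim inputPDF, number
' x1, y1, x2, y2 are placeholders for the coordinates of the region holding the number
Set inputPDF = Watch.GetPDFEditObject
inputPDF.Open Watch.GetJobFileName, true
' Pages(0) is the first page, so only page 1 is read
number = Trim(inputPDF.Pages(0).ExtractText2(x1,y1,x2,y2))
inputPDF.Close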

And the same script could write to the CSV as well.

Thanks. I am trying to get this to work. Hopefully this will eventually work for me.

MartinS - Thank you for the suggestion; however, I wouldn’t know how to do it that way. I am so close with the other approach that I am going to keep trying it.

Hi - is there a way to stop the .dat files from piling up in the folder? I am ending up with around 5,000 .dat files, because that is how many files I am processing per CSV.

Thank You