Extract text position from a PDF

HakInitial · September 20, 2019, 2:05pm

Hi all,
I have a fixed template.odt that the user will send to Workflow by printing it. The goal is to automatically attach some dynamic images and text on the line after the last line written by the user.

Is it possible to know the position of a certain word and replace it with a <div> that contains the images and text?
If yes, could I write a word, e.g. END, on the template and run the replacement at its position?

How can I achieve this?

Thank you,
Hak

jchamel · September 20, 2019, 2:36pm

We kind of need more information in order to grasp what you are trying to achieve. From your title…is your input data file a PDF?

What is a Template.odt?

HakInitial · September 20, 2019, 3:37pm

Well, the template is designed with LibreOffice Writer. The users have specific areas to write in.

So basically, what I am trying to do is this: once a user has filled in the template, they can send it to Workflow (print it).

From here I tried to create a PDF and then send it to the DataMapper.
In the DataMapper I tried to set up a Preprocessor where I can read the text of the PDF and get the position (line number) of END, so that I can replace this word or this line with my name and signature.

When I used this function, var fileIn = openTextReader(filename String, encoding String), I could not decode the file correctly: I get gibberish text.
I was using it to try to access each line of the PDF.


If I could read the lines and find END, is it possible to swap it with HTML tags?

Thank you.

jchamel · September 20, 2019, 4:22pm

Ok…the input file is a PDF. So you can’t read it as a text file. You need to read it as a PDF.

You can extract text from the PDF, but you can’t read it as a text file. You can add a signature and other values, but think of it as placing them over an image: the existing content will not word-wrap or move down to make room for the text you insert, if that is what you are trying to do.

You can add boxes filled with white, to cover a region you do not want to see and cover that with what you want but it will not act as a text file where you can simply add other elements which will shift down what’s already there.

You would be better off having your users fill in an HTML form that looks like what you have in LibreOffice Writer, and then sending that to Workflow. Workflow would then receive an XML file containing whatever you decide to put in it from your Web Template, and you could build your desired output from it.

Hope it is clearer this way.

TDGreer · September 20, 2019, 4:24pm

Look in the Help System for “data.find()”. That’s the JavaScript function you can use in an Extract Step of the Data Map to get the coordinates of a text string.

TDGreer · September 20, 2019, 4:56pm

So I had some time, and worked this out.

In your Data Mapper, use the data.find() method to return an “object” that has the four coordinates of your found text.

Create two more JavaScript fields that parse the TOP and LEFT coordinates out of that object.
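A minimal sketch of that parsing step, not a confirmed recipe. It assumes that the value returned by data.find(), once converted to a string, looks like "Left=26,76, Top=149,77, Right=40,700001, Bottom=154,840302" (with either decimal commas or decimal points); check the actual value in your own Data Mapper first. The helper name parseFindResult is made up for illustration.

```javascript
// Sketch: pull the numeric coordinates out of the text form of a data.find() result.
// ASSUMPTION: the string form looks like
//   "Left=26,76, Top=149,77, Right=40,700001, Bottom=154,840302"
// parseFindResult is a hypothetical helper, not part of the DataMapper API.
function parseFindResult(found) {
  var text = String(found);
  // Each coordinate is a name, "=", and a number with an optional decimal part.
  var pattern = /(Left|Top|Right|Bottom)\s*=\s*(-?\d+(?:[.,]\d+)?)/gi;
  var coords = {};
  var match;
  while ((match = pattern.exec(text)) !== null) {
    // Normalise a decimal comma to a decimal point before converting.
    coords[match[1].toLowerCase()] = parseFloat(match[2].replace(",", "."));
  }
  // Empty object when nothing was found or the format is different.
  return coords;
}
```

In a JavaScript field this could be used as parseFindResult(data.find("END", 0, 210)).top, where the search constraints 0 and 210 are placeholder values. Going through the string form avoids depending on the property names of the returned object.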

Then in your Template, have a DIV already on the page, with an ID, and change the position of that DIV with a Script.

With your background set to the PDF from the Data Mapper, this will position the DIV over the “end tag” text.
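A rough sketch of the positioning step, under stated assumptions: the coordinates are taken to be in millimetres measured from the top-left corner of the page, and the field names endLeft and endTop, the DIV ID signature, and the helper toCssPosition are all hypothetical names chosen for illustration.

```javascript
// Sketch: turn the Left/Top values extracted in the Data Mapper into CSS properties.
// ASSUMPTIONS: values are in millimetres from the page's top-left corner;
// toCssPosition is a hypothetical helper, not part of the Designer API.
function toCssPosition(left, top, unit) {
  unit = unit || "mm";
  var x = parseFloat(left);
  var y = parseFloat(top);
  if (isNaN(x) || isNaN(y)) {
    return null; // marker text not found: leave the DIV where it is
  }
  return { position: "absolute", left: x + unit, top: y + unit };
}

// Hypothetical use in a template script whose selector is "#signature":
//   var css = toCssPosition(record.fields.endLeft, record.fields.endTop);
//   if (css) {
//     for (var name in css) { results.css(name, css[name]); }
//   }
```

Returning null when a coordinate is missing keeps the DIV at its default position for records where the marker text was not found.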

NOTE to anyone who knows: is there a better way to “parse” the object returned by data.find()? I tried various forms of myTarget[“Left”], etc. and kept getting “undefined”.