Working with a PDF in Datamapper

dan_hamacher · January 18, 2024, 4:52pm

I’m brand new to Connect, and the first thing I’m trying to do is create a data model from a PDF. I watched a video tutorial where the instructor started creating boundaries from a .TXT file, and I thought I’d be able to do the same thing with my PDF, but I cannot highlight or select the data on screen. If I convert the PDF to a .TXT file, the data looks all jumbled up.

Can somebody point me in the right direction?

Phil · January 18, 2024, 5:25pm

This can happen if your PDF contains a scanned image, for instance. In that case, the DataMapper can’t extract text because there is no text per se in the file. You can test that with Acrobat Reader: load the PDF and see if you can select text and copy/paste it into Notepad or something similar.
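If you'd rather check from a script than from Acrobat, the same idea can be roughly sketched in Python using only the standard library. This is a heuristic under stated assumptions, not how the DataMapper itself reads a PDF: it inflates Flate-compressed streams and looks for text-showing operators (`Tj` / `TJ`) inside a `BT ... ET` text object. The function names are made up for illustration. It will miss uncompressed or encrypted content, and a scan that has been run through OCR will still count as having text.

```python
import re
import zlib

# Raw bytes between the "stream" and "endstream" keywords of each PDF object.
STREAM_RE = re.compile(rb"stream\r?\n(.*?)endstream", re.DOTALL)
# A text object opens with BT, closes with ET, and shows text with Tj or TJ.
TEXT_RE = re.compile(rb"BT\s.*?(?:Tj|TJ)\s.*?ET", re.DOTALL)


def mostly_printable(data: bytes, sample: int = 512) -> bool:
    """Rough check that a stream holds drawing instructions, not pixel data."""
    head = data[:sample]
    if not head:
        return False
    printable = sum(1 for b in head if 32 <= b < 127 or b in (9, 10, 13))
    return printable / len(head) > 0.9


def has_text_operators(pdf_bytes: bytes) -> bool:
    """Heuristic: True if any Flate-compressed stream appears to draw text."""
    for match in STREAM_RE.finditer(pdf_bytes):
        try:
            # decompressobj tolerates the line break left before "endstream".
            content = zlib.decompressobj().decompress(match.group(1))
        except zlib.error:
            continue  # not Flate data (for example a JPEG image), so skip it
        if mostly_printable(content) and TEXT_RE.search(content):
            return True
    return False
```

To try it on a file, read the bytes first, for example `has_text_operators(open("sample.pdf", "rb").read())`, where `sample.pdf` is a placeholder name. A `False` result suggests an image-only PDF; a `True` result only means text operators were found, not that every page has selectable text.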

dan_hamacher · January 18, 2024, 5:46pm

That’s the thing… I can select the text in Acrobat, so I assumed it would be fine. I pasted two images here. One is me selecting the text in Acrobat, and the second shows what happens when I try to select in Connect. It just draws a blue box over the data.

dan_hamacher · January 18, 2024, 5:47pm

Phil · January 18, 2024, 5:56pm

Once you’ve selected data to extract, you need to tell the DataMapper to extract that data. You can do that by dragging and dropping the selection into the Data Model pane (using the arrow widget in the upper-right corner of the selection) or by right-clicking on the selection and choosing the Add Extraction option (also available via the F6 shortcut or on the main toolbar).

dan_hamacher · January 18, 2024, 6:01pm

Ah ha! Thanks again, Phil. I incorrectly assumed it was not reading the data because it wasn’t making a selection in the way I expected. Much appreciated.