In PP7 Design I could select a block and check the skip blank line option. I found this very useful as the data wasn’t always in exactly the same place from PDF to PDF.
I can’t find the same option in the Connect DataMapper, so I was going to write a little JavaScript to replicate that behaviour. When I did that, however, I lost the ability to position the box where I wanted, because the selection was replaced with a data.extract call with hardcoded measurements. Again, I find it very useful to be able to adjust the size of the box by dragging and dropping.
Any suggestions on how I might replicate the skip blank line behaviour?
You should love the new feature we implemented in Connect 1.3, which allows you to specify a JavaScript-based post-extract function that has no impact on the visual selection on screen.
For instance, say you extract an address block as an HTMLString whose lines are delimited with <br /> elements; the post-extract function could look like this:
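A minimal sketch of such a function, assuming the extracted value arrives as a plain string. The name `collapseBlankLines` and the exact patterns are illustrative assumptions, not the code from the original post, and how the expression is attached to the field in the DataMapper is not shown. It is written in ES5 style and uses `console.log` only so it can be tried outside Connect:

```javascript
// Hypothetical helper: the name and the patterns are assumptions for illustration.
function collapseBlankLines(html) {
  return html
    // Strip whitespace around every <br /> so that lines holding only spaces become empty.
    .replace(/\s*<br\s*\/?>\s*/gi, "<br />")
    // Collapse any run of consecutive <br /> elements into a single one.
    .replace(/(?:<br \/>){2,}/g, "<br />");
}

var sample = "John Doe<br />   <br />123 Main St.<br /><br />SmallVille, NY";
console.log(collapseBlankLines(sample));
// John Doe<br />123 Main St.<br />SmallVille, NY
```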
The first replace strips any whitespace from all lines while the second one makes sure that two <br /> elements found in succession are always replaced with a single one.
You can play around with the code in that function and still retain (and adjust) the visual selection you made on screen.
With a bit of tinkering I was able to achieve my objectives on a block of data but not with an address. Ideally I need to use the split lines option for subsequent processing, but with blank lines suppressed.
Well, you can’t do that with the Split Lines option because then your data model would be different from one record to the next!
If you absolutely need each line in its own field, I would suggest you extract all of them, including the blanks, but simply not display them in the template when they are empty.
I’m not sure about your comments on the split line options. I can’t see the difference between having all of the data in the first few subfields instead of spread across all of them? It’s not just a case of not displaying them but I do need to do some processing on them as well.
On the right-hand side of a letter are a number of fields: date, our ref, your ref, contact person or phone number. They may not exist, and they can be in any sequence and separated by any number of blank lines. All I know for certain is that they’ll be in the block.
I’ve also realised that my replace routine wasn’t working either. It removed duplicates but I couldn’t find a way to remove a blank from the first line. I’ve tried using the caret symbol as a starting anchor but it didn’t match.
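One way to sidestep the anchor problem, sketched here under the assumption that the block is an HTMLString delimited with <br /> elements, is to split the string into lines, drop the empty ones, and join the rest again. A blank first or last line then disappears along with the duplicates, and no caret is needed. The name `dropBlankLines` is hypothetical, and `console.log` is used only for trying it outside Connect:

```javascript
// Hypothetical helper: removes every blank line, including a leading or trailing one.
function dropBlankLines(html) {
  return html
    .split(/<br\s*\/?>/i)                 // one array entry per line
    .map(function (line) {
      return line.trim();                 // a line of spaces becomes ""
    })
    .filter(function (line) {
      return line.length > 0;             // keep only lines that contain text
    })
    .join("<br />");
}

var sample = " <br />Date: 1 May<br /><br />Our ref: A1<br />";
console.log(dropBlankLines(sample));
// Date: 1 May<br />Our ref: A1
```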
I might have to try some lateral thinking on this!
I should have explained a little bit better: in PDF mode, your data model will vary because each sub-field is being created using an algorithm that detects the next occurrence of text within the specified rectangular region. So say you have, on the first record:
John Doe,
123 Main St.
App 42
SmallVille, NY
And then in the second record you have:
Jane Doe,
456 1st Avenue.

LargeVille, CA
Then on the first record, the city will be stored in SubField4 while on the second record, it will be stored in SubField3 (because the empty line isn’t ever extracted and the city is the 3rd occurrence of any kind of text within the region). That’s going to make it difficult for you to determine where each piece of data is stored.
In TXT mode, this won’t happen because empty lines are always extracted since they have a fixed height, so you know both cities would be found in SubField4.
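The difference can be simulated with plain arrays. This is only an illustration of the effect described above, not Connect's actual extraction algorithm, and the function names are made up for the example:

```javascript
// Each record is the list of lines inside the selected region, top to bottom.
var record1 = ["John Doe,", "123 Main St.", "App 42", "SmallVille, NY"];
var record2 = ["Jane Doe,", "456 1st Avenue.", "", "LargeVille, CA"];

// PDF-like behaviour: only occurrences of text produce a sub-field.
function pdfStyleSubFields(lines) {
  return lines.filter(function (line) {
    return line.trim() !== "";
  });
}

// TXT-like behaviour: every line, blank or not, keeps its own sub-field.
function txtStyleSubFields(lines) {
  return lines.slice();
}

// 1-based sub-field number holding the given value.
function subFieldNumber(subFields, value) {
  return subFields.indexOf(value) + 1;
}

console.log(subFieldNumber(pdfStyleSubFields(record1), "SmallVille, NY")); // 4
console.log(subFieldNumber(pdfStyleSubFields(record2), "LargeVille, CA")); // 3
console.log(subFieldNumber(txtStyleSubFields(record2), "LargeVille, CA")); // 4
```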
However, your very last comment seems to indicate the data is unstable to start with. I think you will have no choice but to create conditions that inspect each value for specific markers that will help you determine in which field each of them should be stored.
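As a rough sketch of that idea, the lines of the block could be matched against label patterns and stored under fixed field names, so every record ends up with the same set of fields whatever the order or spacing in the source. The labels ("Date", "Our ref", "Your ref", "Contact", "Phone"/"Tel"), the field names, and the function name are all assumptions; the real markers depend on what the letters actually contain:

```javascript
// Hypothetical markers: adjust the patterns to the labels used in the real data.
var MARKERS = [
  { field: "ourRef",  pattern: /^our ref\b[\s:.]*/i },
  { field: "yourRef", pattern: /^your ref\b[\s:.]*/i },
  { field: "date",    pattern: /^date\b[\s:.]*/i },
  { field: "contact", pattern: /^contact\b[\s:.]*/i },
  { field: "phone",   pattern: /^(?:phone|tel)\b[\s:.]*/i }
];

// Returns an object that always has the same keys; missing values stay "".
function classifyLines(lines) {
  var result = { date: "", ourRef: "", yourRef: "", contact: "", phone: "" };
  lines.forEach(function (raw) {
    var line = raw.trim();
    if (line === "") {
      return; // blank lines are skipped
    }
    for (var i = 0; i < MARKERS.length; i++) {
      if (MARKERS[i].pattern.test(line)) {
        // Store the value without its label.
        result[MARKERS[i].field] = line.replace(MARKERS[i].pattern, "");
        return;
      }
    }
  });
  return result;
}

var block = ["", "Your ref: XY/12", "Date: 1 May 2015", "", "Tel: 555 0100"];
console.log(classifyLines(block));
// yourRef is "XY/12", date is "1 May 2015", phone is "555 0100"; ourRef and contact stay empty
```

Lines that match no marker are ignored in this sketch; whether they should be kept in a catch-all field is a design decision for the actual data.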