PDF Splitter by a word

mariusz.fik · January 25, 2019, 11:01am

Hi folks,

I’m using the PDF Splitter to split a PDF into smaller ones. My first-page indicator is the page number, like 1, 2, 3, etc.
I want to split before page 1. But if there is a page numbered 11, the splitter also splits the PDF there.

It looks like the comparison method is “split if the region contains the value”. Is there a way to change the comparison method from “contains” to “equal to”?

jchamel · January 25, 2019, 3:36pm

Let’s assume that you have text that says “Page 1 of 6”. Then you could split on “Page 1 of”. This way, page 11 doesn’t trigger the split.

Please add more specific info so we can suggest other ways to achieve what you want.
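To illustrate why the longer marker avoids the false match, here is a minimal Python sketch of a "contains" test. It is not PlanetPress code; the function name `contains_marker` and the sample strings are made up for illustration.

```python
def contains_marker(region_text: str, marker: str = "Page 1 of") -> bool:
    """Return True when the marker appears anywhere in the region text."""
    return marker in region_text


# "Page 11 of 20" does not contain "Page 1 of", because the character
# after "Page 1" is another "1" rather than a space.
for text in ["Page 1 of 6", "Page 11 of 20", "Page 2 of 6"]:
    print(repr(text), contains_marker(text))
```

The trailing " of" acts as a delimiter, which is what a bare "1" lacks.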

mariusz.fik · January 25, 2019, 3:38pm

I don’t have anything like “Page X of Y”. That would be easy.

I have just plain number, “1”, “2”, “3”, … “10”, “11” etc.

jchamel · January 25, 2019, 3:57pm

Then I suggest you go the Metadata way.

The idea is to store the page number in a metadata field. Then, still using the PDF Splitter, you split the PDF file based on metadata, at the Datapage level, following a rule that compares your page number metadata field to the number 1.

In terms of Workflow steps:

Create Metadata
Metadata Fields Management (here you create your pageNumber field and assign your page number to it at the Page level)
Use the PDF Splitter, as explained before.

AlbertsN · January 25, 2019, 4:01pm

The boundaries operators list does contain a “Is Equal To” operator.

Now, the trouble with PDFs is that you may not always get exactly “1” in your region. Depending on how the PDF is encoded, you may get "1 " or something else.

So, to be sure you’re looking for the right string, select the region that will contain the entire page number (not just the position of the first number, but all possible numbers) and then use the TT button to copy the exact value that is present on the first page.

Then, even if page 1 is really "page 1 " in the PDF, your Is Equal To condition will match the 1 and the extra characters. Of course, even if there aren’t extra characters, the Is Equal To condition is not going to match on “11”.

In more complicated cases, you may have to do a scripted boundary, but I think that might not be necessary here.
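The difference between the two comparison modes can be sketched in a few lines of Python. This is only an illustration of the logic, not Connect or Workflow script; the function names are hypothetical, and trimming whitespace before comparing is an assumption offered as an alternative to copying the exact captured value.

```python
def contains_match(region_text: str, target: str = "1") -> bool:
    """'Contains' comparison: true for "1", but also for "11", "21", "10"."""
    return target in region_text


def equals_exact(region_text: str, captured: str) -> bool:
    """'Is Equal To' against the exact value captured from the first page,
    including any trailing characters such as the space in "1 "."""
    return region_text == captured


def equals_trimmed(region_text: str, target: str = "1") -> bool:
    """Assumed variant: ignore surrounding whitespace, then compare."""
    return region_text.strip() == target


for text in ["1", "1 ", "11", "10"]:
    print(repr(text), contains_match(text), equals_exact(text, "1 "), equals_trimmed(text))
```

Only the equality tests reject "11" and "10"; the exact variant additionally requires the captured value to carry the same stray characters as the region.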

mariusz.fik · January 25, 2019, 4:04pm

In Connect yes, but I need to split pdf inside a Workflow.

AlbertsN · January 25, 2019, 7:20pm

Ah. That’s important to know.

You’re going to need to use a metadata based split in this case.

First, run the PDF through a Create Metadata step in passthrough. Next, you’ll run it through a Metadata Level Creation.

In there, you’re going to specify that a new document starts when Region is equal to 1 (make sure it’s a numerical comparison) where region is the area on the page with your numbers.

Finally, you’ll run it through the PDF Splitter and split based on metadata at every 1 occurrence of the Document level.

That should get you there.
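As a rough picture of what these steps do together, the following Python sketch groups pages into documents, starting a new document whenever the region text is numerically equal to 1. It is not Workflow code; `is_first_page`, `group_pages`, and the sample region values are hypothetical.

```python
def is_first_page(region_text: str) -> bool:
    """Numeric comparison: "1" and "1 " count as 1, "11" and "" do not."""
    try:
        return int(region_text.strip()) == 1
    except ValueError:
        # Empty or non-numeric region text never starts a new document.
        return False


def group_pages(region_texts: list[str]) -> list[list[int]]:
    """Return lists of page indices, one list per output document."""
    documents: list[list[int]] = []
    for index, text in enumerate(region_texts):
        # Start a new document on a first page, or when none exists yet.
        if is_first_page(text) or not documents:
            documents.append([])
        documents[-1].append(index)
    return documents


pages = ["1", "2", "3", "1", "2", "10", "11", "1 "]
print(group_pages(pages))  # [[0, 1, 2], [3, 4, 5, 6], [7]]
```

Because the comparison is numeric rather than textual, pages 10 and 11 stay inside the second document instead of triggering a split.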

mariusz.fik · February 12, 2019, 12:59pm

I’m not sure if all my data files are valid, because one of the files returns an error during the split. All the other files are split correctly using the above solution. The one that fails produces the error below:

W3001 : Error while executing plugin: 
W3671 : Move file <C:\ProgramData\Objectif Lune\PlanetPress Suite 7\PlanetPress Watch\Debug\tmp01036F43F30AQ4EE9EB1D.pdf>
to <C:\ProgramData\Objectif Lune\PlanetPress Suite 7\PlanetPress Watch\Debug\spl01036F3S2TW6M18E8DF4B.dat>
failed: Error code 0: The operation completed successfully.

Any idea?

edanting · June 21, 2023, 3:59am

Hi,

Can I just jump in here for a sec? I’m also trying to split 20000+ pages based on text in a document region.

When I run it in debug mode, it seems slow. I’m guessing this will generally be quicker when running live?

tosuji · June 21, 2023, 4:17am

Basically, the operation is faster in production than in debug mode.