I have two types of documents. I can determine the document type from a string in the very first line.
Each type has its 'first page indicator' in a different region, so I can't use the 'On Text' boundary. I wonder how difficult it would be to write JavaScript boundaries that handle my documents properly.
The idea is to have a document boundary:
if (docTypeRegion == 'Type')
    area1 equals 'Page 1' → new document
else
    area2 equals 'Page 1 of' → new document
Where:
docTypeRegion - a region that identifies the document type
area1 - first page indicator for Type1
area2 - first page indicator for Type2
The above link actually gives a pretty concrete sample to base this on.
But here's a very quick and dirty sample that gets the job done. Obviously this could be refined quite a bit, as I'm scanning nearly the entire page for the Type1 or Type2 keywords and the Page 1 text.
if (boundaries.find("Type1", region.createRegion(10, 10, 215, 279)).found && boundaries.find("Page 1", region.createRegion(10, 10, 215, 279)).found) {
    boundaries.set();
} else if (boundaries.find("Type2", region.createRegion(10, 10, 215, 279)).found && boundaries.find("Page 1", region.createRegion(10, 10, 215, 279)).found) {
    boundaries.set();
}
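A narrower variant that follows the rule from the question more closely might look like the sketch below. It is only an illustration: the "Type1" keyword, the DOC_TYPE_REGION coordinates and the helper name isFirstPage are placeholders, and the two area coordinates are borrowed from the other sample in this thread. The boundaries and region objects are passed in as parameters so the logic can also be tried outside the DataMapper with stubs.

```javascript
// Placeholder coordinates: replace them with the regions from your own documents.
var DOC_TYPE_REGION = { left: 10, top: 10, right: 100, bottom: 20 }; // hypothetical
var AREA1 = { left: 25, top: 34, right: 36, bottom: 38 };    // "Page 1" for Type1
var AREA2 = { left: 166, top: 34, right: 182, bottom: 38 };  // "Page 1 of" for Type2

// Returns true when the current page looks like the first page of a document.
// b and r stand for the DataMapper boundaries and region objects.
function isFirstPage(b, r) {
    function has(text, a) {
        return b.find(text, r.createRegion(a.left, a.top, a.right, a.bottom)).found;
    }
    if (has("Type1", DOC_TYPE_REGION)) {
        return has("Page 1", AREA1);
    }
    return has("Page 1 of", AREA2);
}

// Inside a boundaries script both globals exist, so the boundary can be set directly.
if (typeof boundaries !== "undefined" && typeof region !== "undefined") {
    if (isFirstPage(boundaries, region)) {
        boundaries.set();
    }
}
```

Checking the document type first means each indicator is only looked for in the region where that type actually prints it, instead of scanning nearly the whole page.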
Albert’s answer works if the document type is actually printed somewhere on the page. If not, and you want the DM boundaries to automatically determine the document types (or if, for instance, you have two types of documents inside the same PDF), then the following would work:
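// One entry per document type: the first-page text and the region it is printed in.
var docTypes = [
    {
        textToFind: "Page 1",
        r: {x: 25, y: 34, r: 36, b: 38}
    },
    {
        textToFind: "Page 1 of",
        r: {x: 166, y: 34, r: 182, b: 38}
    }
];
for (var i = 0; i < docTypes.length; i++) {
    var r = docTypes[i].r;
    // Look for this type's own indicator text in its own region.
    if (boundaries.find(docTypes[i].textToFind, region.createRegion(r.x, r.y, r.r, r.b)).found) {
        boundaries.set();
        break;
    }
}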
Thanks a lot for the answers. It seems I was on the right track writing it on my own; the main difference was the region coordinates. Why the heck does region take x-left, y-top, x-right, y-bottom while data extract takes x-left, x-right, y-top, y-height?
It would be nice to have the same parameter order for all region-related methods.
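Until the two orders are aligned, a small helper can do the reordering. This is only a sketch based on the parameter orders described in this post; the function name regionToExtractArgs is made up for illustration, and whether the vertical value is absolute or relative to the current position should be checked against the documentation.

```javascript
// Converts region-style coordinates (left, top, right, bottom) into the
// order described above for data extract: left, right, top, height.
function regionToExtractArgs(left, top, right, bottom) {
    return [left, right, top, bottom - top];
}

// Example with the first region from the sample above.
var extractArgs = regionToExtractArgs(25, 34, 36, 38);
// extractArgs is [25, 36, 34, 4]
```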