October 22, 2024
Chicago 12, Melborne City, USA
pdf

How to add the annotation tag in tagged PDF using PDFBox


I have a tagged PDF (PDF-UA compliance) with simple structure:
Tagged PDF

I need to add the annotation:

    <?xml version="1.0" encoding="UTF-8"?>
    <xfdf xmlns="http://ns.adobe.com/xfdf/">
      <annots>
        <text color="#FFC333" date="D:20240425000000" flags="print,nozoom,norotate" opacity="0.600000" page="0" rect="43.086,707.79,63.086,732.79" title="ABC">
          <contents-richtext>
            <body xmlns="http://www.w3.org/1999/xhtml">
              <p dir="ltr">Sample comment</p>
            </body>
          </contents-richtext>
          <popup flags="nozoom,norotate" open="yes" page="0" rect="595.0,507.78998,790.0,632.79"/>
        </text>
      </annots>
    </xfdf>

in the nested tag Annot in the tag P. Like this (I’ve made in manually in the Adobe Acrobat):
Expected result

I’m able to import the XFDF XML into the PDF:

    PDDocument document = PDDocument.load(new File("tagged.pdf"));
    
    FDFDocument fdfDoc = FDFDocument.loadXFDF(new File("annotation.xfdf"));
    List<FDFAnnotation> fdfAnnots = fdfDoc.getCatalog().getFDF().getAnnotations();
    
    List<PDAnnotation> pageAnnotations = new ArrayList<>();
    for (int i=0; i < fdfAnnots.size(); i++) {
        FDFAnnotation fdfannot = fdfAnnots.get(i);
        PDAnnotation pdfannot = PDAnnotation.createAnnotation(fdfannot.getCOSObject());
        pdfannot.constructAppearances();
        pageAnnotations.add(pdfannot);
    }
    document.getPage(0).setAnnotations(pageAnnotations);
    
    document.save(new File("tagged_with_annotation.pdf"));

but the veraPDF checker outputs:

An annotation, excluding annotations of subtype Widget, PrinterMark or Link, shall be nested within an Annot tag

and PDF Accessibility Checker (PAC tool) outputs:

Annotation is not nested inside an "Annot" structure element

as expected, because I didn’t set the ‘annot’ tag (and concrete place) for imported annotation.

Also, I’m able to obtain the tag P object:

    PDDocumentCatalog pdDocumentCatalog = document.getDocumentCatalog();
    PDStructureTreeRoot pdStructureTreeRoot = pdDocumentCatalog.getStructureTreeRoot();
    COSArray aDocument = (COSArray)(pdStructureTreeRoot.getK().getCOSObject());
    COSObject oDocument = (COSObject) aDocument.get(0);
    COSArray aPart = (COSArray)oDocument.getItem(COSName.K).getCOSObject();
    COSObject oPart = (COSObject) aPart.get(0);
    COSArray aSect = (COSArray)oPart.getItem(COSName.K).getCOSObject();
    COSObject oSect = (COSObject) aSect.get(0);
    COSArray aP = (COSArray)oSect.getItem(COSName.K).getCOSObject();
    COSObject oP= (COSObject) aP.get(0);

But I don’t understand how to edit the existing tags tree in the PDF.

Question: how to add the tag and annotation into the existing tags tree into PDF using PDFBox?

UPDATE: the example PDF and XFDF here.

UPDATE 2: I’ve updated the file tagged_with_annotation_acrobat.pdf on google drive. The previous version was incorrect – missed Annotation tag.

UPDATE 3: To select the last P, change:

COSArray aP = oSect.getCOSArray(COSName.K);

to

COSArray aH1P = oSect.getCOSArray(COSName.K);
COSDictionary oP = (COSDictionary) aH1P.getObject(1);
COSArray aP = oP.getCOSArray(COSName.K);

and for adding the annotation without outer P inside last P

    // add the annotation element
    COSDictionary anDict = new COSDictionary();
    anDict.setItem(COSName.S, COSName.ANNOT);
    anDict.setItem(COSName.P, oP);
    anDict.setItem(COSName.PG, page);
    
    PDObjectReference objRef = new PDObjectReference();
    anDict.setItem(COSName.K, objRef);
    
objRef.setReferencedObject(page.getAnnotations().get(0));
    
    aP.add(anDict);



You need to sign in to view this answers

Leave feedback about this

  • Quality
  • Price
  • Service

PROS

+
Add Field

CONS

+
Add Field
Choose Image
Choose Video