pdf - Searching a PDF for a specific string in JavaScript actions with iText

Tags: pdf itext

My goal is to find JavaScript matching a given pattern in the annotations of a PDF. To that end, I came up with the following code:

public static void main(String[] args) {

        try {

            // Reads and parses a PDF document
            PdfReader reader = new PdfReader("Test.pdf");

            // For each PDF page
            for (int i = 1; i <= reader.getNumberOfPages(); i++) {

                // Get a page a PDF page
                PdfDictionary page = reader.getPageN(i);
                // Get all the annotations of page i
                PdfArray annotsArray = page.getAsArray(PdfName.ANNOTS);

                // If page does not have annotations
                if (page.getAsArray(PdfName.ANNOTS) == null) {
                    continue;
                }

                // For each annotation
                for (int j = 0; j < annotsArray.size(); ++j) {

                    // For current annotation
                    PdfDictionary curAnnot = annotsArray.getAsDict(j);

                    // check if has JS as described below
                 PdfDictionary AnnotationAction = AnnotationDictionary.GetAsDict(PdfName.A);
                 // test if it is a JavaScript action
                 if (AnnotationAction.Get(PdfName.S).Equals(PdfName.JavaScript)){
                 // what here?
                 }


                }
            }

        } catch (Exception e) {
            e.printStackTrace();
        }

    }

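As far as I know, string comparison is done with the StringCompare library. The problem is that it compares two strings, whereas I want to know whether the JavaScript action in an annotation starts with (or contains) the following string: if (this.hostContainer) { try {
So how do I check whether the JavaScript in an annotation contains the string above?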

EDIT
A sample page with JS is located at: pdf with JS

Best answer

JavaScript actions are defined in ISO 32000-1 as follows:

12.6.4.16 JavaScript Actions

Upon invocation of a JavaScript action, a conforming processor shall execute a script that is written in the JavaScript programming language. Depending on the nature of the script, various interactive form fields in the document may update their values or change their visual appearances. Mozilla Development Center’s Client-Side JavaScript Reference and the Adobe JavaScript for Acrobat API Reference (see the Bibliography) give details on the contents and effects of JavaScript scripts. Table 217 shows the action dictionary entries specific to this type of action.

Table 217 – Additional entries specific to a JavaScript action

Key | Type | Value
S | name | (Required) The type of action that this dictionary describes; shall be JavaScript for a JavaScript action.
JS | text string or text stream | (Required) A text string or text stream containing the JavaScript script to be executed. PDFDocEncoding or Unicode encoding (the latter identified by the Unicode prefix U+FEFF) shall be used to encode the contents of the string or stream.

To support the use of parameterized function calls in JavaScript scripts, the JavaScript entry in a PDF document’s name dictionary (see 7.7.4, “Name Dictionary”) may contain a name tree that maps name strings to document-level JavaScript actions. When the document is opened, all of the actions in this name tree shall be executed, defining JavaScript functions for use by other scripts in the document.



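Thus, if you want to know whether the JavaScript action in the annotation starts with (or contains) the string if (this.hostContainer) { try { then at this point in your code
 if (AnnotationAction.Get(PdfName.S).Equals(PdfName.JavaScript)){
 // what here?
 }

you might first check whether AnnotationAction.Get(PdfName.JS) is a PdfString or a PdfStream, retrieve the contents as a string in either case, and then use the usual string comparison methods to check whether it (or any function it calls, which might be defined in the JavaScript name tree) contains the string you are searching for.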
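The final comparison step needs nothing beyond java.lang.String. The following is only a minimal sketch, assuming the script text has already been extracted into a String; the class ScriptMatcher and its two methods are hypothetical helpers introduced for illustration and are not part of iText. The sample script is taken from the output of the question's sample file.

```java
public class ScriptMatcher {
    // The string the question is looking for.
    public static final String NEEDLE = "if (this.hostContainer) { try {";

    // True if the script begins with the needle, ignoring leading whitespace.
    public static boolean startsWithNeedle(String script) {
        if (script == null) {
            return false;
        }
        return script.replaceFirst("^\\s+", "").startsWith(NEEDLE);
    }

    // True if the needle occurs anywhere in the script.
    public static boolean containsNeedle(String script) {
        return script != null && script.contains(NEEDLE);
    }

    public static void main(String[] args) {
        String script = "if (this.hostContainer) { try {\n"
                + "this.hostContainer.postMessage(['newPage', 'pp_vii', 0]);\n"
                + "} catch(e) { console.println(e); }};";
        System.out.println(startsWithNeedle(script));            // prints true
        System.out.println(containsNeedle(script));              // prints true
        System.out.println(containsNeedle("app.alert('hi');"));  // prints false
    }
}
```

Both checks are exact, character-for-character matches. If the scripts you want to detect may differ in spacing or line breaks, you would need to normalize whitespace first or use a regular expression instead.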

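Sample code

I took your code, cleaned it up a bit (in particular, it was a mix of C# and Java), and added code as described above that inspects the immediate JavaScript code in the annotation action element:

Java version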
System.out.println("file.pdf - Looking for special JavaScript actions.");
// Reads and parses a PDF document
PdfReader reader = new PdfReader(resource);

// For each PDF page
for (int i = 1; i <= reader.getNumberOfPages(); i++)
{
    System.out.printf("\nPage %d\n", i);
    // Get a PDF page
    PdfDictionary page = reader.getPageN(i);
    // Get all the annotations of page i
    PdfArray annotsArray = page.getAsArray(PdfName.ANNOTS);

    // If page does not have annotations
    if (annotsArray == null)
    {
        System.out.println("No annotations.");
        continue;
    }

    // For each annotation
    for (int j = 0; j < annotsArray.size(); ++j)
    {
        System.out.printf("Annotation %d - ", j);

        // For current annotation
        PdfDictionary curAnnot = annotsArray.getAsDict(j);

        // check if has JS as described below
        PdfDictionary annotationAction = curAnnot.getAsDict(PdfName.A);
        if (annotationAction == null)
        {
            System.out.print("no action");
        }
        // test if it is a JavaScript action
        else if (PdfName.JAVASCRIPT.equals(annotationAction.get(PdfName.S)))
        {
            PdfObject scriptObject = annotationAction.getDirectObject(PdfName.JS);
            if (scriptObject == null)
            {
                System.out.println("missing JS entry");
                continue;
            }
            final String script;
            if (scriptObject.isString())
                script = ((PdfString)scriptObject).toUnicodeString();
            else if (scriptObject.isStream())
            {
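                // Streams read from a file are PRStream instances (same package as PdfReader);
                // getStreamBytes returns the stream content with its filters decoded
                byte[] bytes = PdfReader.getStreamBytes((PRStream)scriptObject);
                script = new String(bytes, "ISO-8859-1");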
            }
            else
            {
                System.out.println("malformed JS entry");
                continue;
            }

            if (script.contains("if (this.hostContainer) { try {"))
                System.out.print("contains test string - ");

            System.out.printf("\n---\n%s\n---", script);
            // what here?
        }
        else
        {
            System.out.print("no JavaScript action");
        }
        System.out.println();
    }
}

(Test SearchActionJavaScript, method testSearchJsActionInFile)
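The stream branch of the Java version decodes the bytes as ISO-8859-1. According to the specification quoted above, a JS text stream may also use Unicode encoding, identified by the prefix U+FEFF. The following is only a sketch of how the two cases could be told apart. The class JsTextDecoder is a hypothetical helper, not iText API, and ISO-8859-1 is used here as an approximation of PDFDocEncoding: the two agree for ASCII text but differ for some other byte values.

```java
import java.nio.charset.StandardCharsets;

public class JsTextDecoder {
    // Decodes the bytes of a JS text stream: UTF-16BE if the Unicode prefix U+FEFF
    // (bytes 0xFE 0xFF) is present, otherwise ISO-8859-1 as an approximation
    // of PDFDocEncoding.
    public static String decode(byte[] bytes) {
        if (bytes.length >= 2 && (bytes[0] & 0xFF) == 0xFE && (bytes[1] & 0xFF) == 0xFF) {
            return new String(bytes, 2, bytes.length - 2, StandardCharsets.UTF_16BE);
        }
        return new String(bytes, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        String text = "if (this.hostContainer) { try {";

        // The same script text, once single-byte encoded and once as UTF-16BE with prefix
        byte[] latin = text.getBytes(StandardCharsets.ISO_8859_1);
        byte[] body = text.getBytes(StandardCharsets.UTF_16BE);
        byte[] unicode = new byte[body.length + 2];
        unicode[0] = (byte) 0xFE;
        unicode[1] = (byte) 0xFF;
        System.arraycopy(body, 0, unicode, 2, body.length);

        System.out.println(decode(latin).equals(decode(unicode))); // prints true
    }
}
```

Without the prefix check, a UTF-16 encoded script decoded as ISO-8859-1 would have a NUL character between every two letters, so a contains test for the search string would fail even though the script matches.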

C# version
using (PdfReader reader = new PdfReader(sourcePath))
{
    Console.WriteLine("file.pdf - Looking for special JavaScript actions.");

    // For each PDF page
    for (int i = 1; i <= reader.NumberOfPages; i++)
    {
        Console.Write("\nPage {0}\n", i);
        // Get a PDF page
        PdfDictionary page = reader.GetPageN(i);
        // Get all the annotations of page i
        PdfArray annotsArray = page.GetAsArray(PdfName.ANNOTS);

        // If page does not have annotations
        if (annotsArray == null)
        {
            Console.WriteLine("No annotations.");
            continue;
        }

        // For each annotation
        for (int j = 0; j < annotsArray.Size; ++j)
        {
            Console.Write("Annotation {0} - ", j);

            // For current annotation
            PdfDictionary curAnnot = annotsArray.GetAsDict(j);

            // check if has JS as described below
            PdfDictionary annotationAction = curAnnot.GetAsDict(PdfName.A);
            if (annotationAction == null)
            {
                Console.Write("no action");
            }
            // test if it is a JavaScript action
            else if (PdfName.JAVASCRIPT.Equals(annotationAction.Get(PdfName.S)))
            {
                PdfObject scriptObject = annotationAction.GetDirectObject(PdfName.JS);
                if (scriptObject == null)
                {
                    Console.WriteLine("missing JS entry");
                    continue;
                }
                String script;
                if (scriptObject.IsString())
                    script = ((PdfString)scriptObject).ToUnicodeString();
                else if (scriptObject.IsStream())
                {
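                    // MemoryStream.ToString() would only return the type name, not the content.
                    // Streams read from a file are PRStream instances; GetStreamBytes returns
                    // the stream content with its filters decoded
                    byte[] bytes = PdfReader.GetStreamBytes((PRStream)scriptObject);
                    script = System.Text.Encoding.GetEncoding("ISO-8859-1").GetString(bytes);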
                }
                else
                {
                    Console.WriteLine("malformed JS entry");
                    continue;
                }

                if (script.Contains("if (this.hostContainer) { try {"))
                    Console.Write("contains test string - ");

                Console.Write("\n---\n{0}\n---", script);
                // what here?
            }
            else
            {
                Console.Write("no JavaScript action");
            }
            Console.WriteLine();
        }
    }
}

Output

Running either version against your sample file produces:
file.pdf - Looking for special JavaScript actions.

Page 1
Annotation 0 - contains test string - 
---
if (this.hostContainer) { try {
this.hostContainer.postMessage(['newPage', 'pp_vii', 0]);
} catch(e) { console.println(e); }};
---
Annotation 1 - contains test string - 
---
if (this.hostContainer) { try {
this.hostContainer.postMessage(['newPage', 'pp_ix', 0]);
} catch(e) { console.println(e); }};
---
Annotation 2 - contains test string - 
---
if (this.hostContainer) { try {
this.hostContainer.postMessage(['newPage', 'pp_xi', 0]);
} catch(e) { console.println(e); }};
---
Annotation 3 - contains test string - 
---
if (this.hostContainer) { try {
this.hostContainer.postMessage(['newPage', 'pp_3', 0]);
} catch(e) { console.println(e); }};
---
Annotation 4 - contains test string - 
---
if (this.hostContainer) { try {
this.hostContainer.postMessage(['newPage', 'pp_15', 0]);
} catch(e) { console.println(e); }};
---
Annotation 5 - contains test string - 
---
if (this.hostContainer) { try {
this.hostContainer.postMessage(['newPage', 'pp_37', 0]);
} catch(e) { console.println(e); }};
---
Annotation 6 - contains test string - 
---
if (this.hostContainer) { try {
this.hostContainer.postMessage(['newPage', 'pp_57', 0]);
} catch(e) { console.println(e); }};
---
Annotation 7 - contains test string - 
---
if (this.hostContainer) { try {
this.hostContainer.postMessage(['newPage', 'pp_81', 0]);
} catch(e) { console.println(e); }};
---
Annotation 8 - contains test string - 
---
if (this.hostContainer) { try {
this.hostContainer.postMessage(['newPage', 'pp_111', 0]);
} catch(e) { console.println(e); }};
---
Annotation 9 - contains test string - 
---
if (this.hostContainer) { try {
this.hostContainer.postMessage(['newPage', 'pp_136', 0]);
} catch(e) { console.println(e); }};
---
Annotation 10 - contains test string - 
---
if (this.hostContainer) { try {
this.hostContainer.postMessage(['newPage', 'pp_160', 0]);
} catch(e) { console.println(e); }};
---
Annotation 11 - contains test string - 
---
if (this.hostContainer) { try {
this.hostContainer.postMessage(['newPage', 'pp_197', 0]);
} catch(e) { console.println(e); }};
---
Annotation 12 - contains test string - 
---
if (this.hostContainer) { try {
this.hostContainer.postMessage(['newPage', 'pp_179', 0]);
} catch(e) { console.println(e); }};
---
Annotation 13 - contains test string - 
---
if (this.hostContainer) { try {
this.hostContainer.postMessage(['newPage', 'pp_201', 0]);
} catch(e) { console.println(e); }};
---
Annotation 14 - contains test string - 
---
if (this.hostContainer) { try {
this.hostContainer.postMessage(['newPage', 'pp_223', 0]);
} catch(e) { console.println(e); }};
---

Page 2
No annotations.

Page 3
No annotations.

Regarding "pdf - Searching a PDF for a specific string in JavaScript actions with iText", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/41090131/
