pdf - 在 iText 中的 JavaScript 操作中搜索特定字符串的 PDF

我的目标是在 PDF 的注释中查找给定模式的 JavaScript。为此，我提供了以下代码:

public static void main(String[] args) {

        try {

            // Reads and parses a PDF document
            PdfReader reader = new PdfReader("Test.pdf");

            // For each PDF page
            for (int i = 1; i <= reader.getNumberOfPages(); i++) {

                // Get a page a PDF page
                PdfDictionary page = reader.getPageN(i);
                // Get all the annotations of page i
                PdfArray annotsArray = page.getAsArray(PdfName.ANNOTS);

                // If page does not have annotations
                if (page.getAsArray(PdfName.ANNOTS) == null) {
                    continue;
                }

                // For each annotation
                for (int j = 0; j < annotsArray.size(); ++j) {

                    // For current annotation
                    PdfDictionary curAnnot = annotsArray.getAsDict(j);

                    // check if has JS as described below
                 PdfDictionary AnnotationAction = AnnotationDictionary.GetAsDict(PdfName.A);
                 // test if it is a JavaScript action
                 if (AnnotationAction.Get(PdfName.S).Equals(PdfName.JavaScript)){
                 // what here?
                 }


                }
            }

        } catch (Exception e) {
            e.printStackTrace();
        }

    }

据我所知，比较字符串是由 StringCompare library 完成的.问题是它比较了两个字符串，但我想知道注释中的 JavaScript 操作是否以(或包含)以下字符串开头:if (this.hostContainer) { try {
那么，如何检查注释中的 JavaScript 是否包含上述字符串？

编辑
带有 JS 的示例页面位于:pdf with JS

最佳答案

JavaScript 操作在 ISO 32000-1 中定义如下:

12.6.4.16 JavaScript Actions

Upon invocation of a JavaScript action, a conforming processor shall execute a script that is written in the JavaScript programming language. Depending on the nature of the script, various interactive form fields in the document may update their values or change their visual appearances. Mozilla Development Center’s Client-Side JavaScript Reference and the Adobe JavaScript for Acrobat API Reference (see the Bibliography) give details on the contents and effects of JavaScript scripts. Table 217 shows the action dictionary entries specific to this type of action.

Table 217 – Additional entries specific to a JavaScript action

Key Type Value

S name (Required) The type of action that this dictionary describes; shall be JavaScript for a JavaScript action.

JS text string or text stream (Required) A text string or text stream containing the JavaScript script to be executed. PDFDocEncoding or Unicode encoding (the latter identified by the Unicode prefix U+FEFF) shall be used to encode the contents of the string or stream.

To support the use of parameterized function calls in JavaScript scripts, the JavaScript entry in a PDF document’s name dictionary (see 7.7.4, “Name Dictionary”) may contain a name tree that maps name strings to document-level JavaScript actions. When the document is opened, all of the actions in this name tree shall be executed, defining JavaScript functions for use by other scripts in the document.

因此，如果您有兴趣知道注释中的 JavaScript 操作是否以(或包含)以下字符串开头:if (this.hostContainer) { try {在这种情况下

 if (AnnotationAction.Get(PdfName.S).Equals(PdfName.JavaScript)){
 // what here?
 }

您可能要先检查是否 AnnotationAction.Get(PdfName.JS)是 PdfString或 PdfStream , 在任何一种情况下都以字符串形式检索内容，并检查它或其调用的任何函数(该函数可能在 JavaScript 名称树中定义)是否包含您使用常用字符串比较方法搜索的字符串。

示例代码

我拿走了你的代码，稍微清理了一下(特别是它是 C# 和 Java 的混合体)并添加了如上所述的代码，检查注释操作元素中的直接 JavaScript 代码:

java 版

System.out.println("file.pdf - Looking for special JavaScript actions.");
// Reads and parses a PDF document
PdfReader reader = new PdfReader(resource);

// For each PDF page
for (int i = 1; i <= reader.getNumberOfPages(); i++)
{
    System.out.printf("\nPage %d\n", i);
    // Get a page a PDF page
    PdfDictionary page = reader.getPageN(i);
    // Get all the annotations of page i
    PdfArray annotsArray = page.getAsArray(PdfName.ANNOTS);

    // If page does not have annotations
    if (annotsArray == null)
    {
        System.out.printf("No annotations.\n", i);
        continue;
    }

    // For each annotation
    for (int j = 0; j < annotsArray.size(); ++j)
    {
        System.out.printf("Annotation %d - ", j);

        // For current annotation
        PdfDictionary curAnnot = annotsArray.getAsDict(j);

        // check if has JS as described below
        PdfDictionary annotationAction = curAnnot.getAsDict(PdfName.A);
        if (annotationAction == null)
        {
            System.out.print("no action");
        }
        // test if it is a JavaScript action
        else if (PdfName.JAVASCRIPT.equals(annotationAction.get(PdfName.S)))
        {
            PdfObject scriptObject = annotationAction.getDirectObject(PdfName.JS);
            if (scriptObject == null)
            {
                System.out.print("missing JS entry");
                continue;
            }
            final String script;
            if (scriptObject.isString())
                script = ((PdfString)scriptObject).toUnicodeString();
            else if (scriptObject.isStream())
            {
                try (   ByteArrayOutputStream baos = new ByteArrayOutputStream()    )
                {
                    ((PdfStream)scriptObject).writeContent(baos);
                    script = baos.toString("ISO-8859-1");
                }
            }
            else
            {
                System.out.println("malformed JS entry");
                continue;
            }

            if (script.contains("if (this.hostContainer) { try {"))
                System.out.print("contains test string - ");

            System.out.printf("\n---\n%s\n---", script);
            // what here?
        }
        else
        {
            System.out.print("no JavaScript action");
        }
        System.out.println();
    }
}

(测试 SearchActionJavaScript，方法 testSearchJsActionInFile)

C#版本

using (PdfReader reader = new PdfReader(sourcePath))
{
    Console.WriteLine("file.pdf - Looking for special JavaScript actions.");

    // For each PDF page
    for (int i = 1; i <= reader.NumberOfPages; i++)
    {
        Console.Write("\nPage {0}\n", i);
        // Get a page a PDF page
        PdfDictionary page = reader.GetPageN(i);
        // Get all the annotations of page i
        PdfArray annotsArray = page.GetAsArray(PdfName.ANNOTS);

        // If page does not have annotations
        if (annotsArray == null)
        {
            Console.WriteLine("No annotations.");
            continue;
        }

        // For each annotation
        for (int j = 0; j < annotsArray.Size; ++j)
        {
            Console.Write("Annotation {0} - ", j);

            // For current annotation
            PdfDictionary curAnnot = annotsArray.GetAsDict(j);

            // check if has JS as described below
            PdfDictionary annotationAction = curAnnot.GetAsDict(PdfName.A);
            if (annotationAction == null)
            {
                Console.Write("no action");
            }
            // test if it is a JavaScript action
            else if (PdfName.JAVASCRIPT.Equals(annotationAction.Get(PdfName.S)))
            {
                PdfObject scriptObject = annotationAction.GetDirectObject(PdfName.JS);
                if (scriptObject == null)
                {
                    Console.WriteLine("missing JS entry");
                    continue;
                }
                String script;
                if (scriptObject.IsString())
                    script = ((PdfString)scriptObject).ToUnicodeString();
                else if (scriptObject.IsStream())
                {
                    using (MemoryStream stream = new MemoryStream())
                    {
                        ((PdfStream)scriptObject).WriteContent(stream);
                        script = stream.ToString();
                    }
                }
                else
                {
                    Console.WriteLine("malformed JS entry");
                    continue;
                }

                if (script.Contains("if (this.hostContainer) { try {"))
                    Console.Write("contains test string - ");

                Console.Write("\n---\n{0}\n---", script);
                // what here?
            }
            else
            {
                Console.Write("no JavaScript action");
            }
            Console.WriteLine();
        }
    }
}

输出

对您的示例文件运行任一版本时，都会得到:

file.pdf - Looking for special JavaScript actions.

Page 1
Annotation 0 - contains test string - 
---
if (this.hostContainer) { try {
this.hostContainer.postMessage(['newPage', 'pp_vii', 0]);
} catch(e) { console.println(e); }};
---
Annotation 1 - contains test string - 
---
if (this.hostContainer) { try {
this.hostContainer.postMessage(['newPage', 'pp_ix', 0]);
} catch(e) { console.println(e); }};
---
Annotation 2 - contains test string - 
---
if (this.hostContainer) { try {
this.hostContainer.postMessage(['newPage', 'pp_xi', 0]);
} catch(e) { console.println(e); }};
---
Annotation 3 - contains test string - 
---
if (this.hostContainer) { try {
this.hostContainer.postMessage(['newPage', 'pp_3', 0]);
} catch(e) { console.println(e); }};
---
Annotation 4 - contains test string - 
---
if (this.hostContainer) { try {
this.hostContainer.postMessage(['newPage', 'pp_15', 0]);
} catch(e) { console.println(e); }};
---
Annotation 5 - contains test string - 
---
if (this.hostContainer) { try {
this.hostContainer.postMessage(['newPage', 'pp_37', 0]);
} catch(e) { console.println(e); }};
---
Annotation 6 - contains test string - 
---
if (this.hostContainer) { try {
this.hostContainer.postMessage(['newPage', 'pp_57', 0]);
} catch(e) { console.println(e); }};
---
Annotation 7 - contains test string - 
---
if (this.hostContainer) { try {
this.hostContainer.postMessage(['newPage', 'pp_81', 0]);
} catch(e) { console.println(e); }};
---
Annotation 8 - contains test string - 
---
if (this.hostContainer) { try {
this.hostContainer.postMessage(['newPage', 'pp_111', 0]);
} catch(e) { console.println(e); }};
---
Annotation 9 - contains test string - 
---
if (this.hostContainer) { try {
this.hostContainer.postMessage(['newPage', 'pp_136', 0]);
} catch(e) { console.println(e); }};
---
Annotation 10 - contains test string - 
---
if (this.hostContainer) { try {
this.hostContainer.postMessage(['newPage', 'pp_160', 0]);
} catch(e) { console.println(e); }};
---
Annotation 11 - contains test string - 
---
if (this.hostContainer) { try {
this.hostContainer.postMessage(['newPage', 'pp_197', 0]);
} catch(e) { console.println(e); }};
---
Annotation 12 - contains test string - 
---
if (this.hostContainer) { try {
this.hostContainer.postMessage(['newPage', 'pp_179', 0]);
} catch(e) { console.println(e); }};
---
Annotation 13 - contains test string - 
---
if (this.hostContainer) { try {
this.hostContainer.postMessage(['newPage', 'pp_201', 0]);
} catch(e) { console.println(e); }};
---
Annotation 14 - contains test string - 
---
if (this.hostContainer) { try {
this.hostContainer.postMessage(['newPage', 'pp_223', 0]);
} catch(e) { console.println(e); }};
---

Page 2
No annotations.

Page 3
No annotations.

关于pdf - 在 iText 中的 JavaScript 操作中搜索特定字符串的 PDF，我们在Stack Overflow上找到一个类似的问题： https://stackoverflow.com/questions/41090131/

pdf - 在 iText 中的 JavaScript 操作中搜索特定字符串的 PDF

12.6.4.16 JavaScript Actions

上一篇：scala - 使用私有(private)构造函数参数扩展特征

下一篇：vba - VBA 有 ATan2 功能吗？