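android - How to parse a zipped file completely from RAM?

Tags: android memory java-native-interface zip apache-commons

Background

I need to parse zip files of various types (to get the content of some inner files for one purpose or another, including just their names).

Some of the files aren't reachable via a file path, because Android exposes them only through a Uri, and sometimes the zip file is inside another zip file. With the push to use SAF (Storage Access Framework), a file path is even less likely to be available in some cases.

For this, we have two main ways to handle it: the ZipFile class and the ZipInputStream class.

The problem

When we have a file path, ZipFile is a perfect solution. It's also very efficient in terms of speed.

However, for the rest of the cases, ZipInputStream can run into problems, such as this one, where a problematic zip file causes this exception: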

  java.util.zip.ZipException: only DEFLATED entries can have EXT descriptor
        at java.util.zip.ZipInputStream.readLOC(ZipInputStream.java:321)
        at java.util.zip.ZipInputStream.getNextEntry(ZipInputStream.java:124)
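
For context, here is a minimal sketch of the stream-based path. It assumes only that an InputStream is available (on Android it might come from contentResolver.openInputStream(uri), which is an assumption about the caller and not something shown in the question). The helper name listEntryNames is hypothetical. The getNextEntry() call, exposed in Kotlin as nextEntry, is the call that appears in the stack trace above.

```kotlin
import java.io.InputStream
import java.util.zip.ZipInputStream

// Hypothetical helper: walks the archive sequentially and collects entry names.
fun listEntryNames(input: InputStream): List<String> {
    val names = ArrayList<String>()
    ZipInputStream(input).use { zis ->
        while (true) {
            // Throws ZipException for archives the framework parser rejects.
            val entry = zis.nextEntry ?: break
            names += entry.name
        }
    }
    return names
}
```

Because the stream is read front to back, the parser relies on the local headers rather than the central directory at the end of the file, which is one reason it can behave differently from ZipFile on the same archive.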

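What I've tried

The only solution that always works is to copy the file somewhere else, where you can parse it using ZipFile, but this is inefficient, requires free storage, and requires you to delete the file when you're done with it.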
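
The copy-then-parse workaround described above can be sketched roughly as follows. The function name listEntryNamesViaTempFile and the tempDir parameter are hypothetical; on Android the directory would typically be something like the app's cache directory, which is an assumption here.

```kotlin
import java.io.File
import java.io.InputStream
import java.util.zip.ZipFile

// Copies the stream to a temporary file, lists the entries with ZipFile,
// and always deletes the copy afterwards.
fun listEntryNamesViaTempFile(input: InputStream, tempDir: File): List<String> {
    val tempFile = File.createTempFile("zip-copy", ".zip", tempDir)
    try {
        input.use { source ->
            tempFile.outputStream().use { sink -> source.copyTo(sink) }
        }
        return ZipFile(tempFile).use { zip ->
            zip.entries().toList().map { it.name }
        }
    } finally {
        // The extra storage is only needed while parsing.
        tempFile.delete()
    }
}
```

The sketch makes the cost visible: the whole archive is written once and read again, and the device needs free storage at least as large as the archive.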

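So, I found that Apache has a nice, pure-Java library (here) for parsing zip files, and for some reason its InputStream solution (called "ZipArchiveInputStream") seems to work even better than the native ZipInputStream class.

As opposed to what we have in the native framework, the library offers a bit more flexibility. For example, I could load the whole zip file into a byte array and let the library handle it as usual, and this works even for the problematic zip files I've mentioned:
org.apache.commons.compress.archivers.zip.ZipFile(SeekableInMemoryByteChannel(byteArray)).use { zipFile ->
    for (entry in zipFile.entries) {
        val name = entry.name
        // ... use the zipFile like you do with the native framework
    }
}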

Gradle dependency:
// http://commons.apache.org/proper/commons-compress/ https://mvnrepository.com/artifact/org.apache.commons/commons-compress
implementation 'org.apache.commons:commons-compress:1.20'

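Sadly, this isn't always possible, because it depends on the heap being able to hold the entire zip file, and on Android this is even more limited, because the heap size can be relatively small (the heap could be 100MB while the file is 200MB). Unlike a PC, where a huge heap can be configured, on Android it's not flexible at all.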
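
To make the heap constraint concrete, here is a rough sketch of a pre-check that could be run before trying to load an archive into a byte array. The function name likelyFitsInHeap and the safetyFactor margin are illustrative assumptions, not recommended values, and the result is only an estimate because other allocations can happen at any time.

```kotlin
// Rough estimate of whether a byte array of the given size is likely to fit in the Java heap.
// safetyFactor is an arbitrary illustrative margin.
fun likelyFitsInHeap(requiredBytes: Long, safetyFactor: Double = 0.5): Boolean {
    // A single ByteArray cannot exceed Int.MAX_VALUE elements regardless of heap size.
    if (requiredBytes < 0 || requiredBytes > Int.MAX_VALUE) return false
    val runtime = Runtime.getRuntime()
    val usedBytes = runtime.totalMemory() - runtime.freeMemory()
    // maxMemory() is the most the heap is allowed to grow to for this process.
    val headroomBytes = runtime.maxMemory() - usedBytes
    return requiredBytes <= headroomBytes * safetyFactor
}
```

For the example in the text, likelyFitsInHeap(200L * 1024 * 1024) would be expected to return false on a device whose heap limit is around 100MB, which is the case where an off-heap approach becomes interesting.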

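So, I searched for a solution that uses JNI instead, loading the entire ZIP file into a byte array there, so that it doesn't go to the heap (at least not entirely). This could be a nicer workaround because, if the ZIP can fit in the device's RAM rather than in the heap, it could prevent me from reaching OOM while also not needing an extra file.

I found this library called "larray", which seemed promising, but sadly when I tried using it, it crashed, because its requirements include having a full JVM, meaning it's not suitable for Android.

EDIT: seeing that I can't find any library or any built-in class, I tried to use JNI myself. Sadly I'm very rusty with it, and I looked at an old repository I made a long time ago to perform some operations on Bitmaps (here). This is what I came up with:

native-lib.cpp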

#include <jni.h>
#include <android/log.h>
#include <cstdio>
#include <android/bitmap.h>
#include <cstring>
#include <unistd.h>

class JniBytesArray {
public:
    uint32_t *_storedData;

    JniBytesArray() {
        _storedData = NULL;
    }
};

extern "C" {
JNIEXPORT jobject JNICALL Java_com_lb_myapplication_JniByteArrayHolder_allocate(
        JNIEnv *env, jobject obj, jlong size) {
    auto *jniBytesArray = new JniBytesArray();
    auto *array = new uint32_t[size];
    for (int i = 0; i < size; ++i)
        array[i] = 0;
    jniBytesArray->_storedData = array;
    return env->NewDirectByteBuffer(jniBytesArray, 0);
}
}

JniByteArrayHolder.kt
class JniByteArrayHolder {
    external fun allocate(size: Long): ByteBuffer

    companion object {
        init {
            System.loadLibrary("native-lib")
        }
    }
}
class MainActivity : AppCompatActivity() {
    override fun onCreate(savedInstanceState: Bundle?) {
        super.onCreate(savedInstanceState)
        setContentView(R.layout.activity_main)
        thread {
            printMemStats()
            val jniByteArrayHolder = JniByteArrayHolder()
            val byteBuffer = jniByteArrayHolder.allocate(1L * 1024L)
            printMemStats()
        }
    }

    fun printMemStats() {
        val memoryInfo = ActivityManager.MemoryInfo()
        (getSystemService(Context.ACTIVITY_SERVICE) as ActivityManager).getMemoryInfo(memoryInfo)
        val nativeHeapSize = memoryInfo.totalMem
        val nativeHeapFreeSize = memoryInfo.availMem
        val usedMemInBytes = nativeHeapSize - nativeHeapFreeSize
        val usedMemInPercentage = usedMemInBytes * 100 / nativeHeapSize
        Log.d("AppLog", "total:${Formatter.formatFileSize(this, nativeHeapSize)} " +
                "free:${Formatter.formatFileSize(this, nativeHeapFreeSize)} " +
                "used:${Formatter.formatFileSize(this, usedMemInBytes)} ($usedMemInPercentage%)")
    }
}

This doesn't seem right, because if I try to create a 1GB byte array using jniByteArrayHolder.allocate(1L * 1024L * 1024L * 1024L), it crashes without any exception or error logs.

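The questions
  • Is it possible to use JNI with Apache's library, so that it handles ZIP file content that lives within JNI's "world"?
  • If so, how can I do it? Is there any sample of how to do it? Is there a class for it? Or do I have to implement it myself? If so, can you please show how it's done in JNI?
  • If it's not possible, what other way is there to do it? Maybe an alternative to what Apache has?
  • For the JNI solution, how come it doesn't work well? How could I efficiently copy the bytes from the stream into the JNI byte array (my guess is that it will be via a buffer)?

Best answer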

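    I had a look at the JNI code you posted and made a couple of changes. Mostly, it is defining the size argument for NewDirectByteBuffer and using malloc().

    Here is the log output after allocating 800 MB: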

    D/AppLog: total:1.57 GB free:1.03 GB used:541 MB (34%)
    D/AppLog: total:1.57 GB free:247 MB used:1.32 GB (84%)



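    Here is what the buffer looks like after allocation. As you can see, the debugger reports a limit of 800 MB, which is what we expect.

    [image: debugger view of the allocated ByteBuffer]

    My C is very rusty, so I am sure there is still some work to be done. I have updated the code to be a little more robust and to allow the memory to be freed.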

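    native-lib.cpp
    #include <jni.h>
    #include <android/log.h>
    #include <climits>  // INT_MAX, SIZE_T_MAX
    #include <cstdlib>  // malloc(), free()
    #include <cstring>  // memset()

    extern "C" {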
    static jbyteArray *_holdBuffer = NULL;
    static jobject _directBuffer = NULL;
    /*
        This routine is not re-entrant and can handle only one buffer at a time. If a buffer is
        allocated then it must be released before the next one is allocated.
     */
    JNIEXPORT
    jobject JNICALL Java_com_example_zipfileinmemoryjni_JniByteArrayHolder_allocate(
            JNIEnv *env, jobject obj, jlong size) {
        if (_holdBuffer != NULL || _directBuffer != NULL) {
            __android_log_print(ANDROID_LOG_ERROR, "JNI Routine",
                                "Call to JNI allocate() before freeBuffer()");
            return NULL;
        }
    
        // Max size for a direct buffer is the max of a jint even though NewDirectByteBuffer takes a
        // long. Clamp max size as follows:
        if (size > SIZE_T_MAX || size > INT_MAX || size <= 0) {
            jlong maxSize = SIZE_T_MAX < INT_MAX ? SIZE_T_MAX : INT_MAX;
            __android_log_print(ANDROID_LOG_ERROR, "JNI Routine",
                                "Native memory allocation request must be >0 and <= %lld but was %lld.\n",
                                maxSize, size);
            return NULL;
        }
    
        jbyteArray *array = (jbyteArray *) malloc(static_cast<size_t>(size));
        if (array == NULL) {
            __android_log_print(ANDROID_LOG_ERROR, "JNI Routine",
                                "Failed to allocate %lld bytes of native memory.\n",
                                size);
            return NULL;
        }
    
        jobject directBuffer = env->NewDirectByteBuffer(array, size);
        if (directBuffer == NULL) {
            free(array);
            __android_log_print(ANDROID_LOG_ERROR, "JNI Routine",
                                "Failed to create direct buffer of size %lld.\n",
                                size);
            return NULL;
        }
        // memset() is not really needed but we call it here to force Android to count
        // the consumed memory in the stats since it only seems to "count" dirty pages. (?)
        memset(array, 0xFF, static_cast<size_t>(size));
        _holdBuffer = array;
    
        // Get a global reference to the direct buffer so Java isn't tempted to GC it.
        _directBuffer = env->NewGlobalRef(directBuffer);
        return directBuffer;
    }
    
    JNIEXPORT void JNICALL Java_com_example_zipfileinmemoryjni_JniByteArrayHolder_freeBuffer(
            JNIEnv *env, jobject obj, jobject directBuffer) {
    
        if (_directBuffer == NULL || _holdBuffer == NULL) {
            __android_log_print(ANDROID_LOG_ERROR, "JNI Routine",
                                "Attempt to free unallocated buffer.");
            return;
        }
    
        jbyteArray *bufferLoc = (jbyteArray *) env->GetDirectBufferAddress(directBuffer);
        if (bufferLoc == NULL) {
            __android_log_print(ANDROID_LOG_ERROR, "JNI Routine",
                                "Failed to retrieve direct buffer location associated with ByteBuffer.");
            return;
        }
    
        if (bufferLoc != _holdBuffer) {
            __android_log_print(ANDROID_LOG_ERROR, "JNI Routine",
                                "DirectBuffer does not match that allocated.");
            return;
        }
    
        // Free the malloc'ed buffer and the global reference. Java can not GC the direct buffer.
        free(bufferLoc);
        env->DeleteGlobalRef(_directBuffer);
        _holdBuffer = NULL;
        _directBuffer = NULL;
    }
    }
    

    I have also updated the array holder:
    class JniByteArrayHolder {
        external fun allocate(size: Long): ByteBuffer
        external fun freeBuffer(byteBuffer: ByteBuffer)
    
        companion object {
            init {
                System.loadLibrary("native-lib")
            }
        }
    }
    

    I can confirm that this code, coupled with the ByteBufferChannel class provided by Botje here, works for Android versions before API 24. The SeekableByteChannel interface was introduced in API 24 and is needed by the ZipFile utility.
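
For readers who cannot follow the link, here is a minimal sketch of what a read-only SeekableByteChannel over a ByteBuffer might look like. This is not Botje's actual class; the name ReadOnlyByteBufferChannel and all implementation details are assumptions made for illustration. It treats the bytes between index 0 and the buffer's limit as the channel content, so the buffer is expected to have been flipped after filling.

```kotlin
import java.nio.ByteBuffer
import java.nio.channels.ClosedChannelException
import java.nio.channels.NonWritableChannelException
import java.nio.channels.SeekableByteChannel

// Hypothetical read-only channel backed by a (possibly direct) ByteBuffer.
class ReadOnlyByteBufferChannel(source: ByteBuffer) : SeekableByteChannel {
    // duplicate() shares the bytes but gives this channel its own position and limit.
    private val buffer: ByteBuffer = source.duplicate().apply { rewind() }
    private val length: Long = buffer.limit().toLong()
    private var closed = false

    override fun read(dst: ByteBuffer): Int {
        ensureOpen()
        if (!buffer.hasRemaining()) return -1
        val count = minOf(dst.remaining(), buffer.remaining())
        // Copy through a bounded view so dst is never overflowed.
        val view = buffer.duplicate()
        view.limit(view.position() + count)
        dst.put(view)
        buffer.position(buffer.position() + count)
        return count
    }

    override fun write(src: ByteBuffer): Int = throw NonWritableChannelException()

    override fun position(): Long {
        ensureOpen()
        return buffer.position().toLong()
    }

    override fun position(newPosition: Long): SeekableByteChannel {
        ensureOpen()
        require(newPosition >= 0) { "negative position: $newPosition" }
        // Positions past the end are clamped here; a read there returns -1.
        buffer.position(minOf(newPosition, length).toInt())
        return this
    }

    override fun size(): Long {
        ensureOpen()
        return length
    }

    override fun truncate(size: Long): SeekableByteChannel = throw NonWritableChannelException()

    override fun isOpen(): Boolean = !closed

    override fun close() {
        closed = true
    }

    private fun ensureOpen() {
        if (closed) throw ClosedChannelException()
    }
}
```

A channel like this could be passed where the sample app passes ByteBufferChannel(byteBuffer). Closing the channel does not free the native memory; that remains the job of freeBuffer().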

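    The maximum buffer size that can be allocated is the maximum value of a jint (2^31 - 1 bytes, just under 2 GB), which is a limitation of JNI. Larger data can be accommodated (if the memory is available), but that would require multiple buffers and a way to handle them.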
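
The "multiple buffers" idea mentioned above comes down to splitting a 64-bit offset into a buffer index and an offset within that buffer. The following sketch shows only that arithmetic; the names ChunkLocation, chunkSizes and locate are hypothetical, and a real implementation would also need reads that span two buffers.

```kotlin
// Location of one byte inside a set of fixed-size chunks.
data class ChunkLocation(val chunkIndex: Int, val offsetInChunk: Int)

// Sizes of the buffers needed to hold totalSize bytes, each at most chunkSize bytes.
fun chunkSizes(totalSize: Long, chunkSize: Int): List<Int> {
    require(totalSize >= 0 && chunkSize > 0)
    val sizes = ArrayList<Int>()
    var remaining = totalSize
    while (remaining > 0) {
        val next = minOf(remaining, chunkSize.toLong()).toInt()
        sizes += next
        remaining -= next
    }
    return sizes
}

// Maps an absolute position in the data to the chunk that holds it.
fun locate(position: Long, chunkSize: Int): ChunkLocation {
    require(position >= 0 && chunkSize > 0)
    return ChunkLocation((position / chunkSize).toInt(), (position % chunkSize).toInt())
}
```

With a chunk size of 10, for example, 25 bytes need chunks of 10, 10 and 5 bytes, and position 23 falls at offset 3 of the third chunk (index 2).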

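    Here is the main activity for the sample app. An earlier version assumed that the InputStream read buffer was always filled, and it errored out when putting the data into the ByteBuffer. This has been fixed.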
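
The short-read issue mentioned above can be illustrated in isolation. The sketch below is one possible way to copy a stream into a ByteBuffer without assuming that read() fills the array; the function name readFully and the chunkSize parameter are hypothetical and not part of the sample app.

```kotlin
import java.io.InputStream
import java.nio.ByteBuffer

// Copies bytes from the stream into target until the stream ends or target is full.
// Returns the number of bytes copied.
fun readFully(input: InputStream, target: ByteBuffer, chunkSize: Int = 4096): Int {
    val chunk = ByteArray(chunkSize)
    var total = 0
    while (target.hasRemaining()) {
        // read() may return fewer bytes than requested, and -1 at end of stream.
        val count = input.read(chunk, 0, minOf(chunk.size, target.remaining()))
        if (count < 0) break
        target.put(chunk, 0, count)
        total += count
    }
    return total
}
```

Only the number of bytes actually returned by read() is put into the buffer, and the request is capped at the buffer's remaining space, so a stream that is longer than expected does not cause a BufferOverflowException.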

    MainActivity.kt
    class MainActivity : AppCompatActivity() {
        override fun onCreate(savedInstanceState: Bundle?) {
            super.onCreate(savedInstanceState)
            setContentView(R.layout.activity_main)
        }
    
        fun onClick(view: View) {
            button.isEnabled = false
            status.text = getString(R.string.running)
    
            thread {
                printMemStats("Before buffer allocation:")
                var bufferSize = 0L
                // testzipfile.zip is not part of the project but any zip can be uploaded through the
                // device file manager or adb to test.
                val fileToRead = "$filesDir/testzipfile.zip"
                val inStream =
                    if (File(fileToRead).exists()) {
                        FileInputStream(fileToRead).apply {
                            bufferSize = getFileSize(this)
                            close()
                        }
                        FileInputStream(fileToRead)
                    } else {
                        // If testzipfile.zip doesn't exist, we will just look at this one which
                        // is part of the APK.
                        resources.openRawResource(R.raw.appapk).apply {
                            bufferSize = getFileSize(this)
                            close()
                        }
                        resources.openRawResource(R.raw.appapk)
                    }
                // Allocate the buffer in native memory (off-heap).
                val jniByteArrayHolder = JniByteArrayHolder()
                val byteBuffer =
                    if (bufferSize != 0L) {
                        jniByteArrayHolder.allocate(bufferSize)?.apply {
                            printMemStats("After buffer allocation")
                        }
                    } else {
                        null
                    }
    
                if (byteBuffer == null) {
                    Log.d("Applog", "Failed to allocate $bufferSize bytes of native memory.")
                } else {
                    Log.d("Applog", "Allocated ${Formatter.formatFileSize(this, bufferSize)} buffer.")
                    val inBytes = ByteArray(4096)
                    Log.d("Applog", "Starting buffered read...")
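                    // read() returns -1 at end of stream and may return fewer bytes than
                    // requested; available() is not a reliable end-of-stream test.
                    while (true) {
                        val count = inStream.read(inBytes)
                        if (count < 0) break
                        byteBuffer.put(inBytes, 0, count)
                    }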
                    inStream.close()
                    byteBuffer.flip()
                    ZipFile(ByteBufferChannel(byteBuffer)).use {
                        Log.d("Applog", "Starting Zip file name dump...")
                        for (entry in it.entries) {
                            Log.d("Applog", "Zip name: ${entry.name}")
                            val zis = it.getInputStream(entry)
                            while (zis.available() > 0) {
                                zis.read(inBytes)
                            }
                        }
                    }
                    printMemStats("Before buffer release:")
                    jniByteArrayHolder.freeBuffer(byteBuffer)
                    printMemStats("After buffer release:")
                }
                runOnUiThread {
                    status.text = getString(R.string.idle)
                    button.isEnabled = true
                    Log.d("Applog", "Done!")
                }
            }
        }
    
        /*
            This function is a little misleading since it does not reflect the true status of memory.
            After native buffer allocation, it waits until the memory is used before counting is as
            used. After release, it doesn't seem to count the memory as released until garbage
            collection. (My observations only.) Also, see the comment for memset() in native-lib.cpp
            which is a member of this project.
        */
        private fun printMemStats(desc: String? = null) {
            val memoryInfo = ActivityManager.MemoryInfo()
            (getSystemService(Context.ACTIVITY_SERVICE) as ActivityManager).getMemoryInfo(memoryInfo)
            val nativeHeapSize = memoryInfo.totalMem
            val nativeHeapFreeSize = memoryInfo.availMem
            val usedMemInBytes = nativeHeapSize - nativeHeapFreeSize
            val usedMemInPercentage = usedMemInBytes * 100 / nativeHeapSize
            val sDesc = desc?.run { "$this:\n" }.orEmpty()
            Log.d(
                "AppLog", "$sDesc total:${Formatter.formatFileSize(this, nativeHeapSize)} " +
                        "free:${Formatter.formatFileSize(this, nativeHeapFreeSize)} " +
                        "used:${Formatter.formatFileSize(this, usedMemInBytes)} ($usedMemInPercentage%)"
            )
        }
    
        // Not a great way to do this but not the object of the demo.
        private fun getFileSize(inStream: InputStream): Long {
            var bufferSize = 0L
            while (inStream.available() > 0) {
                val toSkip = inStream.available().toLong()
                inStream.skip(toSkip)
                bufferSize += toSkip
            }
            return bufferSize
        }
    }
    

    The sample GitHub repository is here.

    Regarding "android - How to parse a zipped file completely from RAM?", a similar question was found on Stack Overflow: https://stackoverflow.com/questions/61652063/
