assembly - How are memory references located in a moving garbage collector implementation?

Tags: assembly compiler-construction garbage-collection

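In a moving garbage collector, there has to be a precise way to tell which values on the stack and on the heap are references and which are immediate values. Most of the garbage collection literature I have read seems to gloss over this detail.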

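I have looked into whether it would work to give every stack frame some kind of preamble, for example one that describes each argument before the call. But surely that only moves the problem up one level of indirection: when the GC walks the stack during a collection cycle looking for immediates and references, how does it tell the preamble apart from the rest of the stack frame?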

Could somebody explain how this is implemented in the real world?

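Here is an example program for this question that uses a first-class lexical closure, together with a diagram of its stack frame and of the parent environment, which lives on the heap:

Example program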

def foo(x) = {
    def bar(y,z) = {
        return x + y + z
    }
    return bar
}


def main() = {
    let makeBar = foo(1)
    makeBar(2,3)
}

Stack frame of bar during invocation:

[Figure: bar's stack frame during invocation]

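In this example, bar's stack frame contains a local variable x, which is a pointer to a value on the heap, whereas the arguments y and z are immediate integer values.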

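I have read that Objective Caml attaches a tag bit to every value placed on the stack, which allows a binary reference-or-immediate check on each value during a GC cycle. But this can have some unwanted side effects: integers are limited to 31 bits (on a 32-bit machine), and the code generated for primitive computations has to be adjusted to compensate. In short, it feels a bit too dirty. There must be a more elegant solution.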
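To make the tag-bit scheme concrete, here is a minimal Python sketch that simulates 32-bit words. It assumes the OCaml-style encoding of an integer n as (n << 1) | 1 and assumes heap addresses are word-aligned, so a real pointer always has its low bit clear. The function names are hypothetical; this is an illustration of the technique, not OCaml's runtime code.

```python
WORD_BITS = 32                      # simulate a 32-bit machine word
WORD_MASK = (1 << WORD_BITS) - 1


def tag_int(n: int) -> int:
    """Encode a signed 31-bit integer as the word (n << 1) | 1."""
    if not -(1 << 30) <= n < (1 << 30):
        raise OverflowError("value does not fit in 31 bits")
    return ((n << 1) | 1) & WORD_MASK


def untag_int(word: int) -> int:
    """Decode a tagged word with an arithmetic shift right by one."""
    if word & (1 << (WORD_BITS - 1)):   # top bit set: reinterpret as negative
        word -= 1 << WORD_BITS
    return word >> 1


def is_pointer(word: int) -> bool:
    """The per-slot check a GC can do: low bit 0 means 'reference'."""
    return (word & 1) == 0


def tagged_add(a: int, b: int) -> int:
    """(2x + 1) + (2y + 1) - 1 == 2(x + y) + 1, so no untagging is needed."""
    return (a + b - 1) & WORD_MASK
```

The code adjustment the question mentions is visible in tagged_add: adding two tagged words directly would give 2(x + y) + 2, so the generated code has to subtract 1 to restore the tag. In exchange, is_pointer is a single bit test that the collector can apply to any stack slot without further information.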

Is it possible to know and access this information statically? For example, by somehow passing type information to the garbage collector?

Best answer

Could somebody explain how this is implemented in the real world?

There are several possible approaches:

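  • Conservative stack scanning. Everything is treated as a potential pointer, which makes the GC imprecise. Imprecise scanning prevents objects from being relocated, which in turn prevents or complicates the implementation of semi-space/compacting GCs.
  • Tag bits, as you mentioned. This can be considered slightly less conservative scanning, but it is still imprecise.
  • The compiler keeps knowledge of the exact stack layout, i.e. where the pointers are at any given time. Since this can change from one instruction to the next, and pointers can also live in registers, this would be very complex.
    As a simplification, it is only done for specific points at which all threads cooperatively hand control over to the GC, with a known stack layout, when another thread requests a GC. These are called safepoints (explained below).
  • Other mechanisms may also be feasible, e.g. partitioning the stack into reference and non-reference entries and always making sure that references held in registers are also stored somewhere on the stack, but I don't know how practical that approach is.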
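The difference between the first and third approaches can be sketched with a toy model in Python, using the question's bar frame. Everything here is hypothetical: heap addresses are plain ints, a frame is a list of words, and STACK_MAPS stands in for the table a compiler would emit per safepoint, looked up by the return address of the call site (0x4A10 is made up). Because the map lives outside the frame and is keyed by the code address, the GC never has to tell a "preamble" apart from frame data.

```python
from typing import Dict, List, Set

# Hypothetical toy heap: object start address -> object contents.
heap: Dict[int, str] = {0x1000: "foo's environment (x = 1)", 0x2000: "unrelated object"}

# Hypothetical frame for bar at a call site (one list entry per stack slot):
# slot 0 = pointer to the parent environment, slot 1 = y, slot 2 = z.
# z is an integer that happens to equal the address 0x2000.
frame: List[int] = [0x1000, 2, 0x2000]

# Hypothetical compiler-emitted stack map, keyed by the call site's return
# address: at that safepoint only slot 0 holds a reference.
STACK_MAPS: Dict[int, Set[int]] = {0x4A10: {0}}


def conservative_roots(frame: List[int], heap: Dict[int, str]) -> Set[int]:
    """Every slot whose value looks like an object address is a possible root.
    (A real conservative scanner would also handle interior pointers.)"""
    return {i for i, word in enumerate(frame) if word in heap}


def precise_roots(pc: int) -> Set[int]:
    """The stack map for this pc says exactly which slots are references."""
    return set(STACK_MAPS[pc])


def relocate(frame: List[int], pc: int, heap: Dict[int, str],
             to_space: int = 0x9000) -> Dict[int, int]:
    """Move every object referenced directly from the frame and patch the slots.
    Safe only because the stack map guarantees those slots are pointers."""
    forwarding: Dict[int, int] = {}
    for slot in sorted(precise_roots(pc)):
        old = frame[slot]
        if old not in forwarding:          # copy each object only once
            forwarding[old] = to_space
            heap[to_space] = heap.pop(old)
            to_space += 0x10
        frame[slot] = forwarding[old]      # rewrite the stack slot in place
    return forwarding
```

A conservative scan of this frame reports slots 0 and 2. Since the collector cannot know whether z is really an integer, it must not rewrite it, so the object at 0x2000 has to stay pinned. With the stack map, relocate moves foo's environment to 0x9000, patches only slot 0, and leaves z untouched.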

Gil Tene has a good (albeit mostly JVM-specific) explanation of what a safepoint is, so I'll quote the relevant parts here:

Here is a collection of statements about "what is a safepoint" that attempt to be both correct and somewhat precise:

  1. A thread can be at a safepoint or not be at a safepoint. When at a safepoint, the thread's representation of its Java machine state is well described, and can be safely manipulated and observed by other threads in the JVM. When not at a safepoint, the thread's representation of the java machine state will NOT be manipulated by other threads in the JVM. [Note that other threads do not manipulate a thread's actual logical machine state, just its representation of that state. A simple example of changing the representation of machine state is changing the virtual address that a java reference stack variable points to as a result of relocating that object. The logical state of the reference variable is not affected by this change, as the reference still refers to the same object, and two reference variables referring to the same object will still be logically equal to each other even if they temporarily point to different virtual addresses].

[...]

  1. All [practical] JVMs apply some highly efficient mechanism for frequently crossing safepoint opportunities, where the thread does not actually enter a safepoint unless someone else indicates the need to do so. E.g. most call sites and loop backedges in generated code will include some sort of safepoint polling sequence that amounts to "do I need to go to a safepoint now?". Many HotSpot variants (OpenJDK and Oracle JDK) currently use a simple global "go to safepoint" indicator in the form of a page that is protected when a safepoint is needed, and unprotected otherwise. The safepoint polling for this mechanism amounts to a load from a fixed address in that page. If the load traps with a SEGV, the thread knows it needs to enter a safepoint. Zing uses a different, per-thread go-to-safepoint indicator of similar efficiency.

[...]
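The polling protocol described in the quote can be sketched in Python with threads. This is a toy model under stated assumptions: a plain global flag replaces HotSpot's protected polling page (Python cannot trap on a page fault), and the class and function names are hypothetical. Each worker polls at its loop back-edge; the "GC" thread raises the flag, waits until every worker has parked, and only then treats their state as safe to inspect.

```python
import threading
import time


class Safepoint:
    """Toy global 'go to safepoint' indicator polled by mutator threads.
    A plain flag stands in for HotSpot's protected polling page."""

    def __init__(self, n_threads: int) -> None:
        self.n_threads = n_threads
        self.requested = threading.Event()   # the "do I need to stop?" flag
        self.resume = threading.Event()      # set when the GC is done
        self.all_parked = threading.Event()  # set when every thread has stopped
        self.lock = threading.Lock()
        self.parked = 0

    def poll(self) -> None:
        """Cheap check placed at call sites and loop back-edges."""
        if self.requested.is_set():
            self._park()

    def _park(self) -> None:
        # In a real VM the thread's frames now match a known stack map,
        # so the collector may inspect and rewrite them.
        with self.lock:
            self.parked += 1
            if self.parked == self.n_threads:
                self.all_parked.set()
        self.resume.wait()

    def stop_the_world(self) -> None:
        self.resume.clear()
        self.requested.set()
        self.all_parked.wait()               # block until every thread parked

    def restart(self) -> None:
        self.requested.clear()
        with self.lock:
            self.parked = 0
            self.all_parked.clear()
        self.resume.set()


def worker(sp: Safepoint, counters: list, idx: int, stop: threading.Event) -> None:
    while not stop.is_set():
        counters[idx] += 1                   # stand-in for real work
        sp.poll()                            # safepoint poll on the back-edge


def run_demo(n_threads: int = 3) -> bool:
    """Return True if no worker made progress while the world was stopped."""
    sp = Safepoint(n_threads)
    stop = threading.Event()
    counters = [0] * n_threads
    threads = [threading.Thread(target=worker, args=(sp, counters, i, stop))
               for i in range(n_threads)]
    for t in threads:
        t.start()
    time.sleep(0.01)                         # let the workers run a little
    sp.stop_the_world()
    snapshot = list(counters)                # a GC would scan the stacks here
    time.sleep(0.01)
    frozen = counters == snapshot
    stop.set()                               # workers exit once resumed
    sp.restart()
    for t in threads:
        t.join()
    return frozen
```

In the common case poll costs a single flag read, which mirrors the quote's point that threads cross safepoint opportunities cheaply and only actually stop when someone requests it.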

A similar question can be found on Stack Overflow: https://stackoverflow.com/questions/33928481/
