google-cloud-platform - VM inaccessible after the first reboot when using a startup script

Tags: google-cloud-platform google-compute-engine

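I am using the startup script below in a CentOS 7 VM on GCP. After the first boot the URL is accessible. However, if I reboot the machine, or stop and then start it, the machine becomes inaccessible and the URL stops working. I thought this might be due to SELinux, so I added code to disable SELinux, but the result is the same. I reproduced this by creating several new VMs, but there is something I cannot figure out. When I run the same script manually on a VM and reboot several times, I do not run into any problems.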

#!/bin/bash -xe

# introducing sleep so network interfaces and routes can get ready before fetching software
sleep 10

if rpm -q --quiet httpd ; then 
    echo "installed"
else
  yum update -y
  yum install -y httpd php php-common
  setenforce 0
  sed -i.bak -e 's/^SELINUX=enforcing/SELINUX=disabled/g' /etc/selinux/config
  
cat > /var/www/html/index.php <<'EOF'
<?php
function metadata_value($value) {
    $opts = array(
        "http" => array(
            "method" => "GET",
            "header" => "Metadata-Flavor: Google"
        )
    );
    $context = stream_context_create($opts);
    $content = file_get_contents("http://metadata/computeMetadata/v1/$value", false, $context);
    return $content;
}
if (isset($_SERVER['HTTP_X_FORWARDED_PROTO']) && $_SERVER['HTTP_X_FORWARDED_PROTO'] == "http") {
        $redirect = 'https://' . $_SERVER['HTTP_HOST'] . $_SERVER['REQUEST_URI'];
        header('HTTP/1.1 301 Moved Permanently');
        header('Location: ' . $redirect);
        exit();
}
?>

<!doctype html>
<html>
<head>
<!-- Compiled and minified CSS -->
<link rel="stylesheet" href="https://cdnjs.cloudflare.com/ajax/libs/materialize/0.97.0/css/materialize.min.css">

<!-- Compiled and minified JavaScript -->
<script src="https://cdnjs.cloudflare.com/ajax/libs/materialize/0.97.0/js/materialize.min.js"></script>
<title>Frontend Web Server</title>
</head>
<body>
<div class="container">
<div class="row">
<div class="col s2">&nbsp;</div>
<div class="col s8">

<img src="/assets/gcp-logo.svg"/>

<div class="card blue">
<div class="card-content white-text">
<div class="card-title">Backend that serviced this request</div>
</div>
<div class="card-content white">
<table class="bordered">
  <tbody>
    <tr>
      <td>Name</td>
      <td><?php printf(metadata_value("instance/name")) ?></td>
    </tr>
    <tr>
      <td>ID</td>
      <td><?php printf(metadata_value("instance/id")) ?></td>
    </tr>
    <tr>
      <td>Hostname</td>
      <td><?php printf(metadata_value("instance/hostname")) ?></td>
    </tr>
    <tr>
      <td>Zone</td>
      <td><?php printf(metadata_value("instance/zone")) ?></td>
    </tr>
    <tr>
      <td>Machine Type</td>
      <td><?php printf(metadata_value("instance/machine-type")) ?></td>
    </tr>
    <tr>
      <td>Project</td>
      <td><?php printf(metadata_value("project/project-id")) ?></td>
    </tr>
    <tr>
      <td>Internal IP</td>
      <td><?php printf(metadata_value("instance/network-interfaces/0/ip")) ?></td>
    </tr>
    <tr>
      <td>External IP</td>
      <td><?php printf(metadata_value("instance/network-interfaces/0/access-configs/0/external-ip")) ?></td>
    </tr>
  </tbody>
</table>
</div>
</div>

<div class="card blue">
<div class="card-content white-text">
<div class="card-title">Proxy that handled this request</div>
</div>
<div class="card-content white">
<table class="bordered">
  <tbody>
    <tr>
      <td>Address</td>
      <td><?php printf($_SERVER["HTTP_HOST"]); ?></td>
    </tr>
  </tbody>
</table>
</div>

</div>
</div>
<div class="col s2">&nbsp;</div>
</div>
</div>
</html>
EOF

mkdir -p /var/www/html/group1 && cp /var/www/html/index.php /var/www/html/group1/index.php

systemctl enable httpd
systemctl restart httpd

fi

On the serial console I can see the following output:

serialport: Connected to mytower.us-central1-a.centos7 port 1 (session ID: 405c4d17b926f0906f45a53784d4abd379d6480d, active connections: 1).
DS   - 0000000000000030, ES  - 0000000000000030, FS  - 0000000000000030
GS   - 0000000000000030, SS  - 0000000000000030
CR0  - 0000000080010033, CR2 - 0000000000000000, CR3 - 00000000BF401000
CR4  - 0000000000000668, CR8 - 0000000000000000
DR0  - 0000000000000000, DR1 - 0000000000000000, DR2 - 0000000000000000
DR3  - 0000000000000000, DR6 - 00000000FFFF0FF0, DR7 - 0000000000000400
GDTR - 00000000BF3EEA98 0000000000000047, LDTR - 0000000000000000
IDTR - 00000000BEE1F018 0000000000000FFF,   TR - 0000000000000000
FXSAVE_STATE - 00000000BFF39AB0
!!!! Find image based on IP(0xBF2E6D5C) /build/work/af60adde42b1d1ad5be2a01e4924bb905248/google3/blaze-out/k8-opt/genfiles/third_party/edk2/ovmf_x64_csm_debug_workspace_dir/ovmf_x64_csm_debug_edk2_files_dir/Build/OvmfX64/DEBUG_CLANG38/X64/OvmfPkg/8254TimerDxe/8254Timer/DEBUG/Timer.dll (ImageBase=00000000BF2E5000, EntryPoint=00000000BF2E6AB5) !!!!

Has anyone run into a problem like this? Please help me find out why the VM becomes inaccessible after a reboot.

Thanks

Best answer

This is a known issue, and Google engineers are aware of it:

We are currently experiencing an issue with Google Compute Engine instances running RHEL and CentOS 7 and 8. More details on this issue are available in the following article and bugs:

Symptoms: Instances running RHEL and CentOS 7 and 8 that run yum update may fail to boot after restart with error messages referring to a combination of:

  • "X64 Exception Type - 0D(#GP - General Protection) CPU Apic ID",
  • "FXSAVE_STATE",
  • or "Find image based on IP".

This issue affects instances with specific versions of the shim package installed. To find the currently installed shim version, use the following command: rpm -q shim-x64

Affected shim versions:

  • CentOS 7: shim-x64-15-7.el7_9.x86_64
  • CentOS 8: shim-x64-15-13.el8.x86_64
  • RHEL 7: shim-x64-15-7.el7_8.x86_64
  • RHEL 8: shim-x64-15-14.el8_2.x86_64
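The check described above can be scripted. The following is a minimal sketch, assuming bash and an RPM-based image; the names `AFFECTED_SHIMS` and `is_affected_shim` are hypothetical helpers, not part of any Google or Red Hat tooling, and the version list is copied from the notice as quoted here, so it may be incomplete for later releases.

```shell
#!/bin/bash
# Affected package versions, copied from the notice quoted above.
AFFECTED_SHIMS="shim-x64-15-7.el7_9.x86_64
shim-x64-15-13.el8.x86_64
shim-x64-15-7.el7_8.x86_64
shim-x64-15-14.el8_2.x86_64"

# Hypothetical helper: returns 0 if the given package string
# exactly matches one entry of the affected list, 1 otherwise.
is_affected_shim() {
    printf '%s\n' "$AFFECTED_SHIMS" | grep -qxF -- "$1"
}

# Query the installed package only when rpm is available.
if command -v rpm >/dev/null 2>&1; then
    installed="$(rpm -q shim-x64 2>/dev/null || true)"
    if is_affected_shim "$installed"; then
        echo "AFFECTED: $installed (downgrade before rebooting)"
    else
        echo "not on the affected list: $installed"
    fi
fi
```

Running this before a reboot shows whether the instance carries one of the listed shim builds. A package that is not installed, or a version not on the list, is reported as not affected.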

Workaround: Do not update or reboot instances running RHEL or CentOS 7 and 8. If you are on an affected shim version, run yum downgrade shim\* grub2\* mokutil to downgrade to the correct version. This command may not work on CentOS 8. If you have already rebooted, you will need to attach the disk to a working instance (that has not been updated with the problematic shim binary), and copy over the working shim binary to the relevant EFI directory on the mounted disk. For RHEL, this is /boot/efi/EFI/redhat/shimx64.efi. For CentOS, this is /boot/efi/EFI/centos/shimx64.efi
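For an instance that has already been rebooted, the copy step in the workaround could look roughly like the sketch below. It assumes bash, that the rescue instance has its own EFI partition mounted at /boot/efi, and that the EFI partition of the broken disk has been mounted somewhere by hand. The function names `shim_path` and `restore_shim`, the device `/dev/sdb1`, and the mount point `/mnt/broken-efi` are all hypothetical; only the two `shimx64.efi` paths come from the notice.

```shell
#!/bin/bash
# Hypothetical helper: prints the shim path for a distro under a given
# EFI mount point (defaults to /boot/efi, as in the notice above).
shim_path() {
    local distro="$1"
    local efi_root="${2:-/boot/efi}"
    case "$distro" in
        rhel)   echo "$efi_root/EFI/redhat/shimx64.efi" ;;
        centos) echo "$efi_root/EFI/centos/shimx64.efi" ;;
        *)      echo "unknown distro: $distro" >&2; return 1 ;;
    esac
}

# Hypothetical helper: copies the working shim from the rescue instance
# over the broken one, keeping the broken binary as a .broken backup.
#   $1 = distro (rhel|centos)
#   $2 = mount point of the broken disk's EFI partition
#   $3 = EFI mount point of the working instance (default /boot/efi)
restore_shim() {
    local distro="$1"
    local broken_efi="$2"
    local working_efi="${3:-/boot/efi}"
    local src dst
    src="$(shim_path "$distro" "$working_efi")" || return 1
    dst="$(shim_path "$distro" "$broken_efi")" || return 1
    cp "$dst" "$dst.broken" || return 1
    cp "$src" "$dst"
}

# Example with an assumed device name and mount point:
#   mount /dev/sdb1 /mnt/broken-efi
#   restore_shim centos /mnt/broken-efi
#   umount /mnt/broken-efi
```

The backup copy is a precaution added in this sketch and is not part of the quoted workaround. Verify the partition layout of the attached disk (for example with `lsblk`) before mounting anything.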

Please follow the redhat thread for realtime updates. We will update here once the issue is resolved on our end.

I suggest you follow this case in the Google Issue Tracker.

You can also check the status of this issue on the Google Cloud Status Dashboard.

Regarding "google-cloud-platform - VM inaccessible after the first reboot when using a startup script", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/63187796/
