php - SVN合并具有不同字符集的分支

标签 php svn utf-8 merge

我们有一个网站,我们在其中使用 ISO-8859-1 并希望迁移到 UTF-8。它是用 PHP 制作的,过程简单且有详细记录。

在我们的例子中,因为我们在不同的国家/地区都有这个网站,所以我们想在一个国家/地区尝试一下。我们这样做很多次。我们遵循的结构非常简单:分支代码主干并将去分支代码部署到生产环境。为了保持分支更新,我们只是将更改从主干合并到分支,直到我们重新集成并关闭此功能分支。

我们希望仅在一个国家/地区或其他国家/地区对其进行测试,以减少我们犯错时的影响。

对于任何其他类型的更改,它工作得很好,但在这种情况下,在移动到 UTF-8 之后,我将无法对分支进行合并主干更改以使其保持最新。

我一直试图找到与此相关的东西但没有成功。

您知道是否有任何方法可以正确地合并不同的字符集吗?

非常感谢, 格雷戈

最佳答案

我遇到了同样的问题,我通过以下方式解决了:

使用 chardet 包安装 python3 pip install chardet。安装 diff3 util,在 Windows 上你可以从 MinGW 获得它。

编辑 svn 配置文件(在 Windows %APPDATA%\Subversion\config 上)

[helpers]
diff3-cmd = C:\\diff3wrap.bat # note \\ instead of \

所在 C:\diff3wrap.bat

@echo off
SETLOCAL ENABLEEXTENSIONS
set pythondir=path\to\python3\dir
set mingwdir=path\to\diff3\dir

set pythonpath=%pythondir%lib\site-packages\;%pythonpath%
set path=%pythondir%;%mingwdir%;%path%
rem svn pass to diff3-cmd arguments suitable for diff3 util
rem e.g. -E -m -L .working -L .merge-left.r5 -L .merge-right.r6 path\to\temp\local\file path\to\temp\base\file path\to\temp\remote\file
python C:\diff3.py %* 

其中 C:\diff3.py

#!python3
import codecs
import sys
from subprocess import Popen, PIPE

from chardet.langcyrillicmodel import Ibm866Model, Win1251CyrillicModel
from chardet.sbcharsetprober import SingleByteCharSetProber
from chardet.universaldetector import UniversalDetector
from chardet.utf8prober import UTF8Prober

detector = UniversalDetector()
# leave only necessary probers in order to speed up encoding detection
detector._mCharSetProbers = [ # in new chardet use _charset_probers
    UTF8Prober(),
    SingleByteCharSetProber(Ibm866Model),
    SingleByteCharSetProber(Win1251CyrillicModel)]


def detect_encoding(file_path):
    detector.reset()
    for line in open(file_path, 'rb'):
        detector.feed(line)
        if detector.done:
            break
    detector.close()
    encoding = detector.result["encoding"]
    # treat ascii files as utf-8
    return 'utf-8' if encoding == 'ascii' else encoding


def iconv(file_path, from_encoding, to_encoding):
    if from_encoding == to_encoding:
        return
    with codecs.open(file_path, 'r', from_encoding) as i:
        text = i.read()
    write_to_file(file_path, text, to_encoding)


def write_to_file(file_path, text, to_encoding):
    with codecs.open(file_path, 'bw') as o:
        write_bytes_to_stream(o, text, to_encoding)


def write_bytes_to_stream(stream, text, to_encoding):
    # if you want BOM in your files you should add it by hand
    if to_encoding == "UTF-16LE":
        stream.write(codecs.BOM_UTF16_LE)
    elif to_encoding == "UTF-16BE":
        stream.write(codecs.BOM_UTF16_BE)
    stream.write(text.encode(to_encoding, 'ignore'))


def main():
    # in tortoise svn when press 'merge' button in commit dialog, some arguments are added that diff3 tool doesn't know
    for f in ['--ignore-eol-style', '-w']:
        if f in sys.argv:
            sys.argv.remove(f)

    # ['diff3.py', '-E', '-m', '-L', '.working', '-L', '.merge-left.r5',  '-L', '.merge-right.r6',
    # 'local_path', 'base_path', 'remote_path']
    local_path = sys.argv[-3]
    local_encoding = detect_encoding(local_path)

    base_path = sys.argv[-2]
    base_encoding = detect_encoding(base_path)

    remote_path = sys.argv[-1]
    remote_encoding = detect_encoding(remote_path)

    # diff3 doesn't work with utf-16 that's why you have to convert all files to utf-8
    aux_encoding = 'utf-8'
    iconv(local_path, local_encoding, aux_encoding)
    iconv(base_path, base_encoding, aux_encoding)
    iconv(remote_path, remote_encoding, aux_encoding)

    sys.argv[0] = 'diff3'
    p = Popen(sys.argv, stdout=PIPE, stderr=sys.stderr)
    stdout = p.communicate()[0]
    result_text = stdout.decode(aux_encoding)
    write_bytes_to_stream(sys.stdout.buffer, result_text, local_encoding)

    # in case of conflict svn copy temp base file and temp remote file next to your file in working copy
    # with names like your_file_with_conflict.merge-left.r5 and your_file_with_conflict.merge-right.r6
    # if you resolve conflicts using merge tool, it will use this files
    # if this files and file in your working copy have different encodings,
    # then after conflict resolution your working file change encoding and this is bad
    # that's why you have to convert temp files to local file encoding
    iconv(base_path, aux_encoding, local_encoding)
    iconv(remote_path, aux_encoding, local_encoding)
    sys.exit(p.returncode)


if __name__ == '__main__':
    main()

关于php - SVN合并具有不同字符集的分支,我们在Stack Overflow上找到一个类似的问题: https://stackoverflow.com/questions/27292932/

相关文章:

php - 我怎样才能拥有一系列电子邮件并确保它们都已发送?

phpmyadmin : no database selected

php - 需要便宜的源代码控制

php - 无法在 Debian 中安装 php5-mysql

java - 如何在 Java 中执行此 MutliArray 设置?

ios - 如何从 xcode 6.4 提交代码到 SVN?

android - 下载 2.3.3 的 android 源代码并将其作为 eclipse 项目打开

python - "utf-8-sig"是否适契约(Contract)时解码 UTF-8 和 UTF-8 BOM?

javascript - 如何使用 javascript 将 html 中的表单字段转换为 UTF8?

php - Mysql/Phpmyadmin 中的德语元音变音