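We have a website that uses ISO-8859-1, and we want to migrate it to UTF-8. It is written in PHP, and the migration process itself is simple and well documented.
In our case, the website runs in several countries, so we want to try the change in one country first. We do this often, and the structure we follow is very simple: we branch the code from trunk and deploy the branch to production. To keep the branch up to date, we merge changes from trunk into the branch until we reintegrate it and close the feature branch.
We want to test the change in only one country at a time, to limit the impact if we make a mistake.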
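This works well for any other kind of change, but in this case, once the branch has moved to UTF-8, I won't be able to merge trunk changes into it to keep it up to date.
I have been trying to find something about this, without success.
Do you know of any way to correctly merge files that use different character sets?
Many thanks, 格雷戈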
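The underlying problem shows up at the byte level: after a file is re-encoded from ISO-8859-1 to UTF-8, every line containing a non-ASCII character has different bytes even though the text is the same, so a byte-oriented merge sees those lines as changed on the branch. A minimal sketch using only the standard library's difflib (the function names and the PHP sample are hypothetical) that lists which lines a byte-level diff flags after re-encoding:

```python
import difflib


def encoded_lines(text: str, encoding: str) -> list[bytes]:
    """Encode text line by line, as the file would be stored in the repository."""
    return [line.encode(encoding) for line in text.splitlines(keepends=True)]


def lines_changed_by_reencoding(text: str, old: str = "iso-8859-1",
                                new: str = "utf-8") -> list[bytes]:
    """Return the old-encoding lines that a byte-level diff reports as removed
    when the same text is re-encoded from `old` to `new`."""
    before = encoded_lines(text, old)
    after = encoded_lines(text, new)
    diff = difflib.diff_bytes(difflib.unified_diff, before, after, n=0)
    # Keep '-' lines, skipping the '--- ' file header (a removed line whose
    # content itself starts with '--' would also be skipped; fine for a demo).
    return [line for line in diff
            if line.startswith(b"-") and not line.startswith(b"---")]


if __name__ == "__main__":
    sample = "<?php\necho 'Café';\necho 'Hello';\n"
    for line in lines_changed_by_reencoding(sample):
        print(line)
    # → b"-echo 'Caf\xe9';\n"
```

Pure-ASCII lines are byte-identical in both encodings, which is why only the accented line is reported. Trunk edits to such lines are the ones likely to conflict, and new non-ASCII text merged from trunk would arrive in the wrong encoding.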
Best answer
I had the same problem and solved it like this:
Install Python 3 with the chardet package: pip install chardet
Install the diff3 utility; on Windows you can get it from MinGW.
Edit the svn config file (on Windows: %APPDATA%\Subversion\config):
[helpers]
# note \\ instead of \
diff3-cmd = C:\\diff3wrap.bat
where C:\diff3wrap.bat contains:
@echo off
SETLOCAL ENABLEEXTENSIONS
set pythondir=path\to\python3\dir
set mingwdir=path\to\diff3\dir
set pythonpath=%pythondir%\lib\site-packages\;%pythonpath%
set path=%pythondir%;%mingwdir%;%path%
rem svn passes diff3-cmd the arguments that the diff3 utility expects
rem e.g. -E -m -L .working -L .merge-left.r5 -L .merge-right.r6 path\to\temp\local\file path\to\temp\base\file path\to\temp\remote\file
python C:\diff3.py %*
and C:\diff3.py contains:
#!python3
import codecs
import sys
from subprocess import Popen, PIPE

# These module and class names come from older chardet releases;
# newer releases reorganized the Cyrillic language models.
from chardet.langcyrillicmodel import Ibm866Model, Win1251CyrillicModel
from chardet.sbcharsetprober import SingleByteCharSetProber
from chardet.universaldetector import UniversalDetector
from chardet.utf8prober import UTF8Prober

detector = UniversalDetector()
# leave only the necessary probers in order to speed up encoding detection;
# these target Cyrillic files, for ISO-8859-1 files you would likely swap the
# two SingleByteCharSetProber entries for chardet.latin1prober.Latin1Prober()
probers = [
    UTF8Prober(),
    SingleByteCharSetProber(Ibm866Model),
    SingleByteCharSetProber(Win1251CyrillicModel)]
detector._mCharSetProbers = probers  # attribute name in old chardet
detector._charset_probers = probers  # attribute name in newer chardet


def detect_encoding(file_path):
    detector.reset()
    with open(file_path, 'rb') as f:
        for line in f:
            detector.feed(line)
            if detector.done:
                break
    detector.close()
    encoding = detector.result["encoding"]
    # treat ascii files (and empty files, where nothing is detected) as utf-8
    return 'utf-8' if encoding in (None, 'ascii') else encoding


def iconv(file_path, from_encoding, to_encoding):
    if from_encoding == to_encoding:
        return
    with codecs.open(file_path, 'r', from_encoding) as i:
        text = i.read()
    write_to_file(file_path, text, to_encoding)


def write_to_file(file_path, text, to_encoding):
    with open(file_path, 'wb') as o:
        write_bytes_to_stream(o, text, to_encoding)


def write_bytes_to_stream(stream, text, to_encoding):
    # if you want a BOM in your files, you have to add it by hand
    if to_encoding == "UTF-16LE":
        stream.write(codecs.BOM_UTF16_LE)
    elif to_encoding == "UTF-16BE":
        stream.write(codecs.BOM_UTF16_BE)
    # 'ignore' silently drops characters that to_encoding cannot represent
    stream.write(text.encode(to_encoding, 'ignore'))


def main():
    # in TortoiseSVN, pressing 'merge' in the commit dialog adds arguments the diff3 tool doesn't know
    for f in ['--ignore-eol-style', '-w']:
        if f in sys.argv:
            sys.argv.remove(f)
    # e.g. ['diff3.py', '-E', '-m', '-L', '.working', '-L', '.merge-left.r5', '-L', '.merge-right.r6',
    #       'local_path', 'base_path', 'remote_path']
    local_path = sys.argv[-3]
    local_encoding = detect_encoding(local_path)
    base_path = sys.argv[-2]
    base_encoding = detect_encoding(base_path)
    remote_path = sys.argv[-1]
    remote_encoding = detect_encoding(remote_path)
    # diff3 doesn't work with utf-16, so convert all files to utf-8
    aux_encoding = 'utf-8'
    iconv(local_path, local_encoding, aux_encoding)
    iconv(base_path, base_encoding, aux_encoding)
    iconv(remote_path, remote_encoding, aux_encoding)
    sys.argv[0] = 'diff3'
    p = Popen(sys.argv, stdout=PIPE, stderr=sys.stderr)
    stdout = p.communicate()[0]
    result_text = stdout.decode(aux_encoding)
    write_bytes_to_stream(sys.stdout.buffer, result_text, local_encoding)
    # in case of a conflict, svn copies the temp base file and the temp remote file next to your file
    # in the working copy, with names like your_file_with_conflict.merge-left.r5 and
    # your_file_with_conflict.merge-right.r6; a merge tool used to resolve the conflict reads these files.
    # If they and your working file have different encodings, your working file changes encoding
    # after conflict resolution, which is bad.
    # That's why the temp files have to be converted back to the local file's encoding.
    iconv(base_path, aux_encoding, local_encoding)
    iconv(remote_path, aux_encoding, local_encoding)
    sys.exit(p.returncode)


if __name__ == '__main__':
    main()
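The script above detects encodings with chardet and Cyrillic probers, which fits the answerer's files rather than the ISO-8859-1/UTF-8 pair in the question. When only those two encodings are in play, a stdlib-only alternative (a swapped-in technique, not the answer's method) is to try a strict UTF-8 decode and fall back to ISO-8859-1, which can decode any byte sequence. The function names below are hypothetical; they roughly mirror detect_encoding and iconv above:

```python
import os
import tempfile


def detect_utf8_or_latin1(data: bytes) -> str:
    """Return 'utf-8' if data decodes strictly as UTF-8, else 'iso-8859-1'.

    Pure ASCII is valid UTF-8, so it is reported as 'utf-8', matching the
    answer's "treat ascii files as utf-8" rule. ISO-8859-1 maps every byte to
    a character, so the fallback never fails (it may simply be wrong for files
    in some other single-byte encoding).
    """
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return "iso-8859-1"
    return "utf-8"


def convert_file(path: str, from_encoding: str, to_encoding: str) -> None:
    """Re-encode a file in place. Reading and writing in binary mode means no
    newline translation happens, so CRLF/LF line endings are preserved."""
    if from_encoding == to_encoding:
        return
    with open(path, "rb") as f:
        text = f.read().decode(from_encoding)
    with open(path, "wb") as f:
        # 'strict' raises instead of silently dropping characters the target
        # encoding cannot represent (e.g. '€' has no ISO-8859-1 code point).
        f.write(text.encode(to_encoding, "strict"))


def normalize_to_utf8(path: str) -> str:
    """Convert a file to UTF-8 and return its original encoding, so the caller
    can convert the merge result back afterwards."""
    with open(path, "rb") as f:
        original = detect_utf8_or_latin1(f.read())
    convert_file(path, original, "utf-8")
    return original


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "page.php")
        with open(path, "wb") as f:
            f.write("echo 'Café';\r\n".encode("iso-8859-1"))
        original = normalize_to_utf8(path)
        with open(path, "rb") as f:
            print(original, f.read())
        # → iso-8859-1 b"echo 'Caf\xc3\xa9';\r\n"
```

The choice of 'strict' over the answer's 'ignore' trades silent data loss for a visible error. The heuristic can misfire on a Latin-1 file that happens to form valid UTF-8 sequences (such as the two characters "Ã©"), though that is rare in ordinary source files.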
Source: Stack Overflow question "php - SVN merge branches with different charsets": https://stackoverflow.com/questions/27292932/