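I have a data-cleaning task. I have a column that starts at H6 and continues downward. The column contains values that are supposed to be in snake_case, but are not. The cell values take these forms:
- Camel case: "CamelCase"
- With spaces: "Spaced Value"
- With an initial all-caps prefix: ALLCAPSPREFIX_rest
- A combination of the above
I know there is no definitive algorithm that will bring all of these to snake_case, but I would like to come up with code that converts at least most of the cells.
I tried VBA code that replaces spaces with underscores and gets the index of the underscore. Now I want to make every character after the underscore lowercase. I am also thinking of replacing two-character sequences in which a lowercase letter is followed by an uppercase letter, turning lC into l_c, because I do not want CCC converted to c_c_c, but to ccc. Before going further, though, I would like to know whether there is an easier way.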
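The idea described in the question (replace spaces with underscores, split each lowercase-to-uppercase pair, then lowercase everything) could be sketched roughly as follows. The function name ToSnakeCase is hypothetical and not from the original post, and the sketch assumes the VBScript.RegExp object is available, as it normally is in Excel on Windows.

```vba
Option Explicit

' Hypothetical helper, not from the original post.
' Assumes late binding to VBScript.RegExp works on this machine.
Function ToSnakeCase(ByVal s As String) As String
    Dim re As Object

    Set re = CreateObject("VBScript.RegExp")
    re.Global = True
    re.IgnoreCase = False

    ' Collapse each run of spaces and/or underscores into one underscore.
    re.Pattern = "[ _]+"
    s = re.Replace(Trim$(s), "_")

    ' Insert an underscore between a lowercase letter or digit and the
    ' uppercase letter that follows it (lC -> l_C). Runs of capitals such
    ' as CCC contain no such pair, so they are left together.
    re.Pattern = "([a-z0-9])([A-Z])"
    s = re.Replace(s, "$1_$2")

    ToSnakeCase = LCase$(s)
End Function
```

Under these assumptions, ToSnakeCase("CamelCase") gives camel_case and ToSnakeCase("CCC") gives ccc. A value such as HTTPResponse would come out as httpresponse, because it contains no lowercase-to-uppercase pair, so this remains a heuristic rather than a complete solution.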
Best answer
Here is one approach that may do what you need:
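Option Explicit

' Inserts an underscore where a run of letters/digits is followed by a space,
' an underscore, or a capital letter, then proper-cases the first part and
' lowercases the second part.
Function Snake_case(s As String) As String
    Dim RE As Object
    Const sPat As String = "([A-Za-z0-9]+)(?=[ _A-Z])[ _]?(\S+)"
    Const sRepl As String = "$1_$2"
    Dim v As Variant

    Set RE = CreateObject("vbscript.regexp")
    With RE
        .Global = True
        .IgnoreCase = False   ' the [A-Z] test in the lookahead must stay case-sensitive
        .Pattern = sPat
        v = Split(.Replace(s, sRepl), "_")
    End With

    ' Guard against empty input and input that contains no separator.
    If UBound(v) >= 0 Then v(0) = WorksheetFunction.Proper(v(0))
    If UBound(v) >= 1 Then v(1) = LCase(v(1))

    Snake_case = Join(v, "_")
End Function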
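To apply the function to the column from the question, one option is the worksheet formula =Snake_case(H6) filled down a helper column. Another is a small macro along these lines. The procedure name ConvertColumnH is hypothetical, and the sketch assumes the data is in column H of the active sheet and that Snake_case lives in a standard module of the same workbook.

```vba
Option Explicit

' Hypothetical driver, not part of the original answer.
' Assumes Snake_case (shown above) is available in the same workbook
' and the values to convert are in column H of the active sheet.
Sub ConvertColumnH()
    Const FIRST_ROW As Long = 6
    Dim ws As Worksheet
    Dim lastRow As Long, r As Long
    Dim original As Variant, converted As String

    Set ws = ActiveSheet
    lastRow = ws.Cells(ws.Rows.Count, "H").End(xlUp).Row

    For r = FIRST_ROW To lastRow
        original = ws.Cells(r, "H").Value
        ' Only text cells are converted; blanks, numbers and errors are skipped.
        If VarType(original) = vbString Then
            ' Leave the cell unchanged if the conversion raises an error.
            On Error Resume Next
            converted = Snake_case(CStr(original))
            If Err.Number = 0 Then ws.Cells(r, "H").Value = converted
            Err.Clear
            On Error GoTo 0
        End If
    Next r
End Sub
```

The macro overwrites the cells in place, so it is probably worth running it on a copy of the sheet first.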
Below is an explanation of the regular expression and the replacement string:
Snake_case
([A-Za-z0-9]+)(?=[ _A-Z])[ _]?(\S+)
Options: Case sensitive; ^$ match at line breaks
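- Match the regex below and capture its match into backreference number 1
([A-Za-z0-9]+)
- Match a single character present in the list below, between one and unlimited times, as many times as possible, giving back as needed (greedy)
[A-Za-z0-9]+
- Assert that the regex below can be matched starting at this position (positive lookahead)
(?=[ _A-Z])
- Match a single character present in the list below: a space, an underscore, or a character in the range A-Z
[ _A-Z]
- Match a single character from the list “ _”, between zero and one times
[ _]?
- Match the regex below and capture its match into backreference number 2
(\S+)
- Match a single non-whitespace character, between one and unlimited times (greedy)
\S+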
$1_$2
- Insert the text that was last matched by capturing group number 1
$1
- Insert the character “_” literally
_
- Insert the text that was last matched by capturing group number 2
$2
Created with RegexBuddy
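To see what the two capture groups hold for a given value, a small helper like the one below may be useful. The name DescribeGroups is hypothetical and not from the original answer. It reuses the answer's pattern unchanged and reports only the first match.

```vba
Option Explicit

' Hypothetical helper, not from the original answer: reports what the
' two capture groups hold for the first match of the pattern in s.
Function DescribeGroups(ByVal s As String) As String
    Dim re As Object, matches As Object

    Set re = CreateObject("VBScript.RegExp")
    re.Global = False
    re.IgnoreCase = False
    re.Pattern = "([A-Za-z0-9]+)(?=[ _A-Z])[ _]?(\S+)"

    Set matches = re.Execute(s)
    If matches.Count = 0 Then
        DescribeGroups = "no match"
    Else
        DescribeGroups = "$1=" & matches.Item(0).SubMatches(0) & _
                         ", $2=" & matches.Item(0).SubMatches(1)
    End If
End Function
```

In the Immediate window, ? DescribeGroups("CamelCase") prints $1=Camel, $2=Case, and a single lowercase word such as abc prints no match. Because group 1 is greedy, a value like MyCamelCase yields $1=MyCamel, $2=Case, so the pattern splits only at the last capital letter of such a run.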
On the topic of regex - converting cell values to snake_case, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/55221222/