python - Using Python 're.split' with Unicode characters

Tags: python regex unicode compiler-errors

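I am trying to split the following data field into 3 fields (pre, match and suf) and write them to a comma-delimited txt file. I am reading everything from a csv file, and the data is utf-8.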

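My problem right now is that I cannot get past the error "TypeError: coercing to Unicode: need string or buffer, list found". I have already tried setting the encoding, so I don't know where things are going wrong.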

Sample data:

 A-1 طس
 TX 35-L
 Av Rib

Splitting this on (\d+(-?[NSEW])?) should give me the following:

Column1 | Column2 | Column3
A       |1        |طس
TX      |35       |-L
Av Rib  |         |
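
To make the intended three-way split concrete, here is a minimal sketch, assuming Python 3 (where str is already Unicode). The helper name split_field is hypothetical, and the inner group is made non-capturing so that re.split does not add an extra None entry; that change is an assumption, not part of the original pattern. Note that the pieces keep whatever surrounds the match, so the first sample yields "A-" rather than "A".

```python
import re

# Outer group is capturing so re.split keeps the matched text in the result;
# the inner group is non-capturing (an assumption) to avoid extra None items.
PATTERN = re.compile(r"(\d+(?:-?[NSEW])?)")

def split_field(text):
    """Return (pre, match, suf); match and suf are empty if nothing matches."""
    # maxsplit=1 splits on the first match only, giving at most 3 pieces.
    parts = PATTERN.split(text, maxsplit=1)
    # With no match, split returns the whole string as a single item,
    # so pad the list up to three fields.
    while len(parts) < 3:
        parts.append("")
    return tuple(parts)

for sample in ["A-1 طس", "TX 35-L", "Av Rib"]:
    print(split_field(sample))
```

Stripping whitespace or the trailing hyphen from each piece would be a separate step, left out here to keep the sketch short.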

My current code looks like this:

    ## Iterate over the csv file, match each field, and split
    ## the string according to the regex pattern.

    reader = csv.reader(csvfile)

    with codecs.open(r'file.txt', 'w', 'utf-8') as outfile1:
        for row in reader:
           unicode_row = [x.decode('utf-8') for x in row]
           item = unicode_row[1]
           parsed = re.compile("\d+(-?[NSEW])?", re.UNICODE).split(unicode(item, 'utf-8'))
           outfile1.write(parsed + "\n")

Best answer

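Your error occurs because parsed is a list. re.split returns a list of strings, while the file object's write method expects a single string, so the list has to be joined into one string before it is written.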

http://docs.python.org/2/tutorial/inputoutput.html#methods-of-file-objects
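
As a rough illustration of that fix, the sketch below joins the list into one line before writing it. It assumes Python 3 and uses io.StringIO as a stand-in for the output file; the helper name format_line is hypothetical. The pattern is the one from the question, unchanged, so its optional group produces None in the split result when the suffix part does not match, and the sketch replaces those with empty strings.

```python
import io
import re

# The pattern from the question, unchanged. Its optional capturing group
# yields None in the split result whenever that group does not participate.
pattern = re.compile(r"\d+(-?[NSEW])?", re.UNICODE)

def format_line(item):
    """Split item and join the pieces into one comma-separated line."""
    parsed = pattern.split(item)  # a list, not a string
    # Replace None entries so that str.join accepts every piece.
    fields = [p if p is not None else "" for p in parsed]
    return ",".join(fields) + "\n"

# io.StringIO stands in for the real output file (an assumption).
outfile = io.StringIO()
for item in ["A-1 طس", "TX 35-L", "Av Rib"]:
    outfile.write(format_line(item))
print(outfile.getvalue())
```

If the fields can themselves contain commas, the standard csv.writer handles the quoting and may be a better fit than a plain join.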

Source: this question on using Python 're.split' with Unicode characters comes from Stack Overflow: https://stackoverflow.com/questions/20189150/
