python - Encoding/decoding German umlauts in Python 3

Tags: python python-3.x string character-encoding

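I have a problem that I have spent hours trying to solve. I'm sure it's just a small thing, but somehow I can't figure out what I'm doing wrong.

My goal is to fetch data as JSON from a public transport company and show the next departure times of the metro/tram on a display. Basically everything works, but as soon as the JSON contains an umlaut (such as "ü") I get an error message. The funny thing: the sharp s (ß) works!

This is the exact error message (it should read "Hütteldorf"):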

UnicodeEncodeError('ascii', u'H\xfctteldorf', 1, 2, 'ordinal not in range(128)')

Part of the code:

...
    apiurl = 'https://www.wienerlinien.at/ogd_realtime/monitor?rbl={rbl}&sender={apikey}'

...

        for rbl in rbls:
            r = requests.get(url, timeout=10)

            ##r.encoding = 'utf-8';
            ##print(r.json())
            ##print(r.encoding)
            ##r.encoding = 'latin1'

            if r.status_code == requests.codes.ok:  # compare the actual status; requests.codes.ok alone is always truthy
                try:
                    for monitor in r.json()['data']['monitors']:
                        rbl.station = monitor['locationStop']['properties']['title'].encode('utf-8')
                        for line in monitor['lines']:

                            #Decoding-Problem is here - ß works, ü doesn't
                            #UnicodeEncodeError('ascii', u'H\xfctteldorf', 1, 2, 'ordinal not in range(128)')
                            rbl.name = str(line['name'])
                            rbl.direction = str(line['towards'])

                            rbl.trafficjam = line['trafficjam'] #Boolean
...

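I personally think I have tried everything I could find that is possible in Python 3: encode, decode, and so on. Every time, either the sharp s or the umlaut ü fails.

Can anybody give me a hint in the right direction? Thanks a lot!

[Edit:] Here is the full source code, including a workaround (ü=ue):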

#!/usr/bin/python
# -*- coding: utf-8 -*-

import sys, getopt, time
import requests
import smbus

# Define some device parameters
I2C_ADDR  = 0x27 # I2C device address, if any error, change this address to 0x3f
LCD_WIDTH = 20   # Maximum characters per line

# Define some device constants
LCD_CHR = 1 # Mode - Sending data
LCD_CMD = 0 # Mode - Sending command

LCD_LINE_1 = 0x80 # LCD RAM address for the 1st line
LCD_LINE_2 = 0xC0 # LCD RAM address for the 2nd line
LCD_LINE_3 = 0x94 # LCD RAM address for the 3rd line
LCD_LINE_4 = 0xD4 # LCD RAM address for the 4th line

LCD_BACKLIGHT  = 0x08  # On
#LCD_BACKLIGHT = 0x00  # Off

ENABLE = 0b00000100 # Enable bit

# Timing constants
E_PULSE = 0.0005
E_DELAY = 0.0005

#Open I2C interface
bus = smbus.SMBus(1) # Rev 2 Pi uses 1

class RBL:
    id = 0
    line = ''
    station = ''
    direction = ''
    time = -1

def replaceUmlaut(s):
    s = s.replace("Ä", "Ae") # A umlaut
    s = s.replace("Ö", "Oe") # O umlaut
    s = s.replace("Ü", "Ue") # U umlaut
    s = s.replace("ä", "ae") # a umlaut
    s = s.replace("ö", "oe") # o umlaut
    s = s.replace("ü", "ue") # u umlaut
    return s

def lcd_init():
  # Initialise display
  lcd_byte(0x33,LCD_CMD) # 110011 Initialise
  lcd_byte(0x32,LCD_CMD) # 110010 Initialise
  lcd_byte(0x06,LCD_CMD) # 000110 Cursor move direction
  lcd_byte(0x0C,LCD_CMD) # 001100 Display On,Cursor Off, Blink Off 
  lcd_byte(0x28,LCD_CMD) # 101000 Data length, number of lines, font size
  lcd_byte(0x01,LCD_CMD) # 000001 Clear display
  time.sleep(E_DELAY)

def lcd_byte(bits, mode):
  # Send byte to data pins
  # bits = the data
  # mode = 1 for data
  #        0 for command

  bits_high = mode | (bits & 0xF0) | LCD_BACKLIGHT
  bits_low = mode | ((bits<<4) & 0xF0) | LCD_BACKLIGHT

  # High bits
  bus.write_byte(I2C_ADDR, bits_high)
  lcd_toggle_enable(bits_high)

  # Low bits
  bus.write_byte(I2C_ADDR, bits_low)
  lcd_toggle_enable(bits_low)

def lcd_toggle_enable(bits):
  # Toggle enable
  time.sleep(E_DELAY)
  bus.write_byte(I2C_ADDR, (bits | ENABLE))
  time.sleep(E_PULSE)
  bus.write_byte(I2C_ADDR,(bits & ~ENABLE))
  time.sleep(E_DELAY)

def lcd_string(message,line):
  # Send string to display

  message = message.ljust(LCD_WIDTH," ")

  lcd_byte(line, LCD_CMD)

  for i in range(LCD_WIDTH):
    lcd_byte(ord(message[i]),LCD_CHR)


def main(argv):

    apikey = False
    apiurl = 'https://www.wienerlinien.at/ogd_realtime/monitor?rbl={rbl}&sender={apikey}'

    #Time between updates
    st = 10

    # Initialise display
    lcd_init()
    lcd_string("Willkommen!",LCD_LINE_2)

    try:
        opts, args = getopt.getopt(argv, "hk:t:", ["help", "key=", "time="])
    except getopt.GetoptError:
        usage()
        sys.exit(2)
    for opt, arg in opts:
        if opt in ("-h", "--help"):
            usage()
            sys.exit()
        elif opt in ("-k", "--key"):
            apikey = arg
        elif opt in ("-t", "--time"):
            try:
                tmpst = int(arg)
                if tmpst > 0:
                    st = tmpst
            except ValueError:
                usage()
                sys.exit(2)


    if apikey == False or len(args) < 1:
        usage()
        sys.exit()

    rbls = []
    for rbl in args:
        tmprbl = RBL()
        tmprbl.id = rbl
        rbls.append(tmprbl)

    x = 1
    while True:
        for rbl in rbls:
            url = apiurl.replace('{apikey}', apikey).replace('{rbl}', rbl.id)
            r = requests.get(url, timeout=10)
            r.encoding = 'utf-8'

            if r.status_code == requests.codes.ok:  # compare the actual status; requests.codes.ok alone is always truthy
                try:
                    for monitor in r.json()['data']['monitors']:
                        rbl.station = monitor['locationStop']['properties']['title']
                        for line in monitor['lines']:

                            rbl.name = replaceUmlaut(str(line['name'].encode('ascii','xmlcharrefreplace').decode('ascii')))
                            rbl.direction = replaceUmlaut(str(line['towards'].encode('ascii','xmlcharrefreplace').decode('ascii')))

                            rbl.trafficjam = line['trafficjam']
                            rbl.type = line['type']
                            rbl.time1 = line['departures']['departure'][0]['departureTime']['countdown']
                            rbl.time2 = line['departures']['departure'][1]['departureTime']['countdown']
                            rbl.time3 = line['departures']['departure'][2]['departureTime']['countdown']

                            lcdShow(rbl)
                            time.sleep(st)

                except Exception as e:
                    print("Fehler (Exc): " + repr(e))
                    print(r)
                    lcd_string("Fehler (Exc):",LCD_LINE_1)
                    lcd_string(repr(e),LCD_LINE_2)
                    lcd_string("",LCD_LINE_3)
                    lcd_string("",LCD_LINE_4)
            else:
                print('Fehler bei Kommunikation mit Server')
                lcd_string("Fehler:",LCD_LINE_1)
                lcd_string("Serverkomm.",LCD_LINE_2)
                lcd_string("",LCD_LINE_3)
                lcd_string("",LCD_LINE_4)

def lcdShow(rbl):
    lcdLine1 = rbl.name + ' ' + rbl.station
    lcdLine2 = rbl.direction

    lcdLine3 = "".ljust(LCD_WIDTH-9) + ' ' + '{:0>2d}'.format(rbl.time1) + ' ' + '{:0>2d}'.format(rbl.time2) + ' ' + '{:0>2d}'.format(rbl.time3)

    if not rbl.type == "ptMetro":
        if rbl.trafficjam:
            lcdLine4 = "Stau in Zufahrt"
        else:
            lcdLine4 = "kein Stau"
    else:
        lcdLine4 = ""

    lcd_string(lcdLine1,LCD_LINE_1)
    lcd_string(lcdLine2,LCD_LINE_2)
    lcd_string(lcdLine3,LCD_LINE_3)
    lcd_string(lcdLine4,LCD_LINE_4)

    #print(lcdLine1 + '\n' + lcdLine2+ '\n' + lcdLine3+ '\n' + lcdLine4)

def usage():
    print('usage: ' + __file__ + ' [-h] [-t time] -k apikey rbl [rbl ...]\n')
    print('arguments:')
    print('  -k, --key=\tAPI key')
    print('  rbl\t\tRBL number\n')
    print('optional arguments:')
    print('  -h, --help\tshow this help')
    print('  -t, --time=\ttime between station updates in seconds, default 10')

if __name__ == "__main__":
    main(sys.argv[1:])

Best answer

I personally think I tried everything I found that is possible in Python3...encode, decode, ... Every time either the sharp s or the umlaut ü is failing.

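As noted in the comments, judging by the error message you are seeing, you appear to be running Python 2. The u'H\xfctteldorf' in the message is how Python 2 displays a unicode string.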
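A quick way to confirm which interpreter is really executing the script is to look at sys.version_info at startup. The following is only a minimal sketch: the helper name is_python3 is hypothetical and not part of the original script, and the idea that the #!/usr/bin/python shebang resolves to Python 2 on the asker's machine is an assumption, not something the question confirms.

```python
import sys


def is_python3(version_info=sys.version_info):
    """Return True when the given version tuple belongs to Python 3 or later.

    Hypothetical helper for illustration; defaults to the running interpreter.
    """
    return version_info[0] >= 3


if __name__ == "__main__":
    # Show the interpreter path and version that actually run this file.
    print(sys.executable, tuple(sys.version_info[:3]), is_python3())
```

Running this with the same command used to start the display script shows whether python or python3 is being picked up.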

Python 2 has two "string" types: str, which holds raw bytes, and unicode, which holds unicode characters. When you call .json(), you get back a data structure containing unicode strings. So line['name'] is such a unicode string.

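When you call str(line['name']), you are implicitly asking for the unicode string to be encoded into ASCII bytes. This fails because ASCII cannot represent those characters. Unfortunately, I don't know why you are doing that here. Does rbl.name need to be a str? Where is it used? What encoding does the other code that uses it expect?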
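The implicit ASCII step can be made visible by performing it explicitly. The sketch below runs on Python 3 and uses json.loads on a hand-written payload as a stand-in for r.json(); the payload values are illustrative assumptions, not real API output.

```python
import json

# Illustrative payload shaped like the fields used in the question.
payload = '{"name": "49", "towards": "H\\u00fctteldorf"}'
line = json.loads(payload)

towards = line["towards"]  # a text (unicode) string, like the values from .json()

try:
    # Explicit form of what Python 2's str(unicode_value) does implicitly.
    towards.encode("ascii")
    error = None
except UnicodeEncodeError as exc:
    error = exc

# The exception carries the codec name, the offending position and the reason.
print(repr(error))
```

The position reported by the exception (index 1 to 2) points at the ü, matching the error message quoted in the question.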

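In the comments, Jorropo suggested writing line['name'].decode("utf-8"), and you pointed out that this does not work either. That is because decoding a unicode string makes little sense, but Python 2 tries anyway: it first encodes the string as ASCII (which fails) before attempting the UTF-8 decode you asked for.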
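Python 3 strings have no decode method, so the two hidden steps have to be spelled out to reproduce this. The function py2_style_decode below is a hypothetical helper that mimics the Python 2 behaviour described above; it is a sketch for illustration, not an API of either Python version.

```python
def py2_style_decode(text, encoding="utf-8"):
    """Mimic Python 2's unicode.decode(): encode to ASCII first, then decode."""
    raw = text.encode("ascii")   # the implicit step; fails on non-ASCII text
    return raw.decode(encoding)  # the step that was actually requested


# Pure ASCII text survives both steps unchanged.
print(py2_style_decode("Karlsplatz"))

try:
    py2_style_decode("H\u00fctteldorf")
    failed_step = None
except UnicodeEncodeError:
    # The error comes from the hidden encode, not from the UTF-8 decode.
    failed_step = "encode"

print(failed_step)
```

This is why the suggested decode call reports an encoding error even though only decoding was requested.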

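Your fix will depend on what you do with rbl.name. You could:

  1. Use the unicode string directly: rbl.name = line['name']. This requires the subsequent code to expect a unicode string.
  2. Encode it to UTF-8 bytes: rbl.name = line['name'].encode('utf-8'). This requires the subsequent code to expect a UTF-8 byte sequence.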
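The two options can be compared side by side. This is a minimal Python 3 sketch with a hard-coded station name standing in for line['towards']; the variable names name_text and name_bytes are illustrative.

```python
towards = "H\u00fctteldorf"  # text string, as it would come out of r.json()

# Option 1: keep the text string; downstream code must accept text.
name_text = towards

# Option 2: encode to UTF-8; downstream code must accept bytes.
name_bytes = towards.encode("utf-8")

print(len(name_text))   # number of characters
print(len(name_bytes))  # number of bytes; the umlaut occupies two bytes in UTF-8
print(name_bytes.decode("utf-8") == name_text)  # the encoding round-trips
```

The differing lengths matter for anything that counts or pads characters, such as fixed-width display code.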

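Either way, when you try one of these it is possible (even likely) that something else will break afterwards. That depends entirely on what the rest of the code assumes rbl.name is and how it is encoded.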
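The lcd_string function from the question is one place where such an assumption lives: it pads the message with a text space and calls ord() on each character. The sketch below copies only that per-character logic into a hypothetical helper named lcd_codes (no I2C writes) and runs it on Python 3 to show that text input works while UTF-8 bytes input does not.

```python
LCD_WIDTH = 20  # same width as in the question's script


def lcd_codes(message):
    """Per-character logic of lcd_string from the question, without I2C access."""
    message = message.ljust(LCD_WIDTH, " ")
    return [ord(message[i]) for i in range(LCD_WIDTH)]


# Text input: one code per character, padded to the display width.
codes = lcd_codes("H\u00fctteldorf")

try:
    # Bytes input: bytes.ljust() rejects a text fill character on Python 3.
    lcd_codes("H\u00fctteldorf".encode("utf-8"))
    bytes_error = None
except TypeError as exc:
    bytes_error = exc

print(codes[:3], type(bytes_error).__name__)
```

Whether the display then renders the code for ü as the intended glyph depends on the LCD's character table, which the question does not specify.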

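As for why it works with u'Westbahnstraße', I can't say for sure. Could you provide a complete example, including input data that demonstrates one case working and the other failing?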
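One way to narrow this down is to list the non-ASCII characters in both station names. The sketch below uses a hypothetical helper, non_ascii_chars, and shows that both ß and ü lie outside the ASCII range, so a plain ASCII encode would be expected to reject either of them. It does not explain why one of them appeared to work in the asker's setup.

```python
def non_ascii_chars(text):
    """Return (character, hex code point) pairs for everything outside ASCII."""
    return [(ch, hex(ord(ch))) for ch in text if ord(ch) > 127]


for station in ("Westbahnstra\u00dfe", "H\u00fctteldorf"):
    # Both names contain exactly one character that ASCII cannot represent.
    print(station, non_ascii_chars(station))
```

If the real input data behaves differently from these two literals, comparing the repr() of the values returned by the API would be the next thing to check.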

This entry on python - Encoding/decoding German umlauts in Python 3 is based on a similar question found on Stack Overflow: https://stackoverflow.com/questions/59346377/
