python - USPS Package Track API does not return XML child elements for TrackSummary

Tags: python xml http usps

See the temporary workaround at the end of the question.

Summary (added 12/24/22 for clarity):

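USPS's tracking API does not return responses in the format shown in its documentation. Because there is no EventDate XML element, the actual format makes it hard to extract the event date. In the worst case I can use a regular expression, but I'd like to know whether there is a way to receive the API response shown in USPS's documentation.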

Details

On page 19 of USPS's Track and Confirm API documentation, the sample response shows <TrackSummary> with child elements (<EventTime>, <EventDate>, etc.):

[Screenshot of USPS's sample response]

Here is USPS's sample response as text:

<TrackResponse>
  <TrackInfo ID=" XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX ">
    <GuaranteedDeliveryDate>June 24, 2022</GuaranteedDeliveryDate>
    <TrackSummary>
      <EventTime>9:00 am</EventTime>
      <EventDate>June 22, 2022</EventDate>
      <Event>Delivered, To Agent</Event>
      <EventCity>AMARILLO</EventCity>
      <EventState>TX</EventState>
      <EventZIPCode>79109</EventZIPCode>
      <EventCountry/>
      <FirmName/>
      <Name>RXXXXXX XXXXXXX</Name>
      <AuthorizedAgent>false</AuthorizedAgent>
      <DeliveryAttributeCode>23</DeliveryAttributeCode>
      <GMT>14:00:00</GMT>
      <GMTOffset>-05:00</GMTOffset>
    </TrackSummary>
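If the response really had this documented shape, no regex would be needed: the date and time could be read straight from the child elements. A minimal sketch with the standard library's xml.etree.ElementTree; `summary_datetime` and `DOCUMENTED_SAMPLE` are hypothetical names, and the sample is trimmed from the documentation excerpt above.

```python
import xml.etree.ElementTree as ET
from datetime import datetime
from typing import Optional

# Trimmed from the documented sample above (ID shortened, unused fields dropped).
DOCUMENTED_SAMPLE = """\
<TrackResponse>
  <TrackInfo ID="XXXXXXXXXX">
    <TrackSummary>
      <EventTime>9:00 am</EventTime>
      <EventDate>June 22, 2022</EventDate>
      <Event>Delivered, To Agent</Event>
    </TrackSummary>
  </TrackInfo>
</TrackResponse>
"""


def summary_datetime(xml_text: str) -> Optional[datetime]:
    """Combine <EventDate> and <EventTime> from a structured <TrackSummary>, if present."""
    root = ET.fromstring(xml_text)
    # findtext returns None when the child element does not exist (e.g. a plain-text summary)
    event_date = root.findtext(".//TrackSummary/EventDate")
    event_time = root.findtext(".//TrackSummary/EventTime")
    if not event_date:
        return None
    if event_time:
        # "June 22, 2022 9:00 am": %I accepts an unpadded hour, %p matches am/pm case-insensitively
        return datetime.strptime(f"{event_date} {event_time}", "%B %d, %Y %I:%M %p")
    return datetime.strptime(event_date, "%B %d, %Y")


print(summary_datetime(DOCUMENTED_SAMPLE))  # 2022-06-22 09:00:00
```

Against the plain-text <TrackSummary> that the live API actually returns, the same function simply returns None, which makes it easy to tell the two response shapes apart.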

However, when I actually make the call, the XML response is missing these child elements in TrackSummary:

<?xml version="1.0" encoding="UTF-8"?>
<TrackResponse>
    <TrackInfo ID="9405511206213782679396">
        <TrackSummary>Your item departed our WEST PALM BEACH FL DISTRIBUTION CENTER destination facility on December 23, 2022 at 12:40 pm. The item is currently in transit to the destination.</TrackSummary>
        <TrackDetail>Arrived at USPS Regional Facility, December 23, 2022, 4:49 am, WEST PALM BEACH FL DISTRIBUTION CENTER</TrackDetail>
        <TrackDetail>In Transit to Next Facility, 12/22/2022, 9:41 pm</TrackDetail>
        <TrackDetail>In Transit to Next Facility, 12/22/2022, 1:36 pm</TrackDetail>
        <TrackDetail>Departed USPS Facility, 12/22/2022, 5:58 am, HARRISBURG, PA 17112</TrackDetail>
        <TrackDetail>Arrived at USPS Regional Origin Facility, 12/21/2022, 10:12 pm, HARRISBURG PA PACKAGE SORTING CENTER</TrackDetail>
        <TrackDetail>Departed Post Office, December 21, 2022, 4:34 pm, DALLASTOWN, PA 17313</TrackDetail>
        <TrackDetail>USPS picked up item, December 21, 2022, 2:37 pm, DALLASTOWN, PA 17313</TrackDetail>
        <TrackDetail>Shipping Label Created, USPS Awaiting Item, December 21, 2022, 2:16 pm, DALLASTOWN, PA 17313</TrackDetail>
    </TrackInfo>
</TrackResponse>

This can be reproduced with Lob's USPS Postman workspace.

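The problem I'm trying to solve is getting the date out of the TrackSummary data. Because USPS's API doesn't return the EventDate child element, that now requires a regular expression.

Is there an option in the request that makes the API return these useful XML child elements? I couldn't find one in the documentation, and every sample response I've seen includes them.

I've tried forming the request both with Python and with Lob's USPS workspace, and in both cases the XML response is missing the TrackSummary child elements.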

Long-term solution (in progress, 12/26/22)

@Parfait pointed out that I should use the Package Tracking "Fields" API instead of the Package Track API.
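A minimal sketch of what a Fields request body might look like, built with the standard library's ElementTree rather than lxml. The element names (TrackFieldRequest, Revision, ClientIp, SourceId) are my reading of the USPS Web Tools Track and Confirm documentation and should be checked against the current spec; the function name and the example arguments are hypothetical.

```python
import xml.etree.ElementTree as ET


def generate_xml_track_fields(user_id: str, tracking_numbers: list[str],
                              client_ip: str, source_id: str) -> str:
    """Build a TrackFieldRequest body (element names assumed, see lead-in)."""
    root = ET.Element("TrackFieldRequest", {"USERID": user_id})
    # Assumption: Revision 1 asks for the detailed, field-level response
    ET.SubElement(root, "Revision").text = "1"
    ET.SubElement(root, "ClientIp").text = client_ip
    ET.SubElement(root, "SourceId").text = source_id
    # one empty <TrackID ID="..."/> element per tracking number, as in the TrackRequest version
    for tracking in tracking_numbers:
        ET.SubElement(root, "TrackID", {"ID": tracking})
    return ET.tostring(root, encoding="unicode")


print(generate_xml_track_fields("YOUR_USER_ID", ["9405511206213782679396"], "127.0.0.1", "my-app"))
```

The structure mirrors the TrackRequest builder below, so switching APIs should mostly mean changing the root element and adding the extra header elements.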

Here is how I currently build the XML request for the Package Track API:

from urllib.parse import quote

from decouple import config  # assumption: USPS_USER is read with python-decouple
from lxml import etree

# base_url and url_vars are defined elsewhere in the module (not shown here)

def generate_url_tracking(tracking_numbers: list[str]) -> str:
    """generate the USPS tracking request url
    :param: tracking_numbers - list of strings of tracking numbers
    :return url: str tracking url for calling the USPS API
    """
    xml = generate_xml_tracking(tracking_numbers)
    # percent-encode the XML so '<', '>', quotes and spaces are safe in the query string
    url = f"{base_url}{url_vars['track']}{quote(xml)}"
    return url

def generate_xml_tracking(tracking_numbers: list[str]) -> str:
    """
    Generate USPS track and confirm API xml
    :param tracking_numbers: list of strings of tracking numbers
    :return: xml string
    """
    xml = etree.Element("TrackRequest", {"USERID": config("USPS_USER")})
    # one <TrackID ID="..."/> element per tracking number
    for tracking in tracking_numbers:
        etree.SubElement(xml, "TrackID", {"ID": tracking})
    xml_string = etree.tostring(xml, encoding="utf8", method="xml").decode()
    return xml_string
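Because the request XML travels in the query string, it has to be percent-encoded. A sketch using urllib.parse.urlencode; the endpoint URL and the API=TrackV2 value are assumptions based on the USPS Web Tools documentation, and build_tracking_url is a hypothetical helper, not a replacement for the base_url/url_vars setup above.

```python
from urllib.parse import urlencode

# Assumption: USPS Web Tools production endpoint; check the URL issued with your account.
USPS_BASE_URL = "https://secure.shippingapis.com/ShippingAPI.dll"


def build_tracking_url(xml: str, api: str = "TrackV2") -> str:
    """Return a GET URL carrying API=<api> and the request XML, percent-encoded."""
    # urlencode quotes '<', '>', '"', '=', '/' and turns spaces into '+'
    query = urlencode({"API": api, "XML": xml})
    return f"{USPS_BASE_URL}?{query}"


print(build_tracking_url('<TrackRequest USERID="YOUR_USER_ID"><TrackID ID="9405511206213782679396"/></TrackRequest>'))
```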

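I'll update this to a Package Tracking "Fields" API request when I have time.

Temporary workaround (12/25/22)

Until USPS's actual response matches its API documentation, this workaround extracts the last-updated date from <TrackSummary> for several different statuses (pre-shipment, delivered, RTS, etc.).

The TRACK_SUMMARIES dictionary holds the different statuses I tested. Statuses without a date (no_info, out_for_delivery_no_date) return None.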

import re
from datetime import datetime
from typing import Optional

from dateutil.parser import ParserError, parse

TRACK_SUMMARIES = {
    "delivered": """Your
     item was delivered in or at the mailbox at 10:23 am on December 24, 2022 in HOBE SOUND, FL 33455.""",
    "out_for_delivery": "Out for Delivery, December 13, 2021, 6:10 am, ARLINGTON, VA 22204.",
    "out_for_delivery_no_date": "Out for Delivery, Expected Delivery Between 9:45am and 1:45pm",
    "arrived_at_post_office": """Arrived at Post Office,
     Arrived at USPS Regional Origin Facility, December 11, 2021, 9:23 pm, HARRISBURG PA PACKAGE SORTING CENTER""",
    "acceptance": "Acceptance, December 10, 2021, 12:54 pm, DALLASTOWN, PA 17313",
    "pre_shipment": "Pre-Shipment Info Sent to USPS, USPS Awaiting Item, December 27, 2021",
    "rts": """Your item was returned to the sender on January 31, 2022 at 9:14 am in YORK, PA 17402
     because of an incorrect address.""",
    "no_info": "The Postal Service could not locate the tracking information for your request",
    "label_prepared": "A shipping label has been prepared for your item at 10:47 am on December 16, 2021 in WINSTON",
    "forwarded": """Your item was forwarded to a different address at 5:13 pm on January 4, 2022
        in REDDING, CA. This was because of forwarding instructions or because the
        address or ZIP Code on the label was incorrect.
        """,
}

def get_last_updated(track_summary: str) -> Optional[datetime]:
    """Take the USPS TrackSummary string and return the last-updated datetime, or None."""
    # remove 5-digit ZIP codes since they interfere with the date parser
    track_summary = re.sub(r"\d{5}", "", track_summary)
    months_regex = "January|February|March|April|May|June|July|August|September|October|November|December"
    # keep everything from the first month name onward (up to the end of that line)
    first_result = re.search(rf"(?={months_regex}).*", track_summary)
    # return early if there's no month
    if not first_result:
        return None
    first_result = first_result.group()
    # some summaries have am/pm and some don't; if present, cut everything after the last am/pm
    result_for_parser = re.search(r".*(?<=am|pm)", first_result)
    if result_for_parser:
        result_for_parser = result_for_parser.group()
    else:
        result_for_parser = first_result
    try:
        # fuzzy parsing skips the non-date words left in some summaries
        return parse(result_for_parser, fuzzy=True)
    except ParserError:
        return None

Sources:

Using the dateutil parser
Regex for finding months

Best answer

Finding the child with an XPath expression in xml.etree.ElementTree works well here.

ElementTree has only limited support for XPath expressions for locating elements in a tree, but that is enough to find the TrackSummary data.

Find the first TrackSummary element anywhere under the root:

root.find(".//TrackSummary").text ->
Your item departed our WEST PALM BEACH FL DISTRIBUTION CENTER destination facility on December 23, 2022 at 12:40 pm. The item is currently in transit to the destination.
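The same limited XPath subset also covers findall() and attribute predicates, which helps when one request asks for several tracking numbers and the response carries one <TrackInfo> per ID. A small sketch on a trimmed copy of the response above (RESPONSE keeps only two of its TrackDetail lines; the variable names are illustrative).

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the actual USPS response shown in the question.
RESPONSE = """\
<TrackResponse>
    <TrackInfo ID="9405511206213782679396">
        <TrackSummary>Your item departed our WEST PALM BEACH FL DISTRIBUTION CENTER destination facility on December 23, 2022 at 12:40 pm. The item is currently in transit to the destination.</TrackSummary>
        <TrackDetail>Arrived at USPS Regional Facility, December 23, 2022, 4:49 am, WEST PALM BEACH FL DISTRIBUTION CENTER</TrackDetail>
        <TrackDetail>In Transit to Next Facility, 12/22/2022, 9:41 pm</TrackDetail>
    </TrackInfo>
</TrackResponse>
"""

root = ET.fromstring(RESPONSE)

# [@ID='...'] selects the TrackInfo for one tracking number
info = root.find(".//TrackInfo[@ID='9405511206213782679396']")
summary = info.findtext("TrackSummary")
# findall returns every direct TrackDetail child, in document order
details = [detail.text for detail in info.findall("TrackDetail")]
# for a multi-ID response, list the ID of every TrackInfo
ids = [track_info.get("ID") for track_info in root.findall("TrackInfo")]

print(len(details))  # 2
print(ids)           # ['9405511206213782679396']
```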

A complete Python demo:

import xml.etree.ElementTree as ET
import datetime

document = """\
<?xml version="1.0" encoding="UTF-8"?>
<TrackResponse>
    <TrackInfo ID="9405511206213782679396">
        <TrackSummary>Your item departed our WEST PALM BEACH FL DISTRIBUTION CENTER destination facility on December 23, 2022 at 12:40 pm. The item is currently in transit to the destination.</TrackSummary>
        <TrackDetail>Arrived at USPS Regional Facility, December 23, 2022, 4:49 am, WEST PALM BEACH FL DISTRIBUTION CENTER</TrackDetail>
        <TrackDetail>In Transit to Next Facility, 12/22/2022, 9:41 pm</TrackDetail>
        <TrackDetail>In Transit to Next Facility, 12/22/2022, 1:36 pm</TrackDetail>
        <TrackDetail>Departed USPS Facility, 12/22/2022, 5:58 am, HARRISBURG, PA 17112</TrackDetail>
        <TrackDetail>Arrived at USPS Regional Origin Facility, 12/21/2022, 10:12 pm, HARRISBURG PA PACKAGE SORTING CENTER</TrackDetail>
        <TrackDetail>Departed Post Office, December 21, 2022, 4:34 pm, DALLASTOWN, PA 17313</TrackDetail>
        <TrackDetail>USPS picked up item, December 21, 2022, 2:37 pm, DALLASTOWN, PA 17313</TrackDetail>
        <TrackDetail>Shipping Label Created, USPS Awaiting Item, December 21, 2022, 2:16 pm, DALLASTOWN, PA 17313</TrackDetail>
    </TrackInfo>
</TrackResponse>
"""

def find_between(s, first, last):
    """Return the part of s between the first `first` and the next `last`, or "" if either is missing."""
    try:
        start = s.index(first) + len(first)
        end = s.index(last, start)
        return s[start:end]
    except ValueError:
        return ""

root = ET.fromstring(document)

# "December 23, 2022 at 12:40 pm" sits between " on " and the next "."
summary = root.find(".//TrackSummary").text
date_time_obj = datetime.datetime.strptime(find_between(summary, " on ", "."), "%B %d, %Y at %I:%M %p")
print('Date:', date_time_obj.date())
print('Time:', date_time_obj.time())
print('Date-time:', date_time_obj)

Result

$ python track-summary.py
Date: 2022-12-23
Time: 12:40:00
Date-time: 2022-12-23 12:40:00
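The same strptime idea can cover the <TrackDetail> lines too, although they mix two layouts ("December 23, 2022, 4:49 am" and "12/22/2022, 9:41 pm"). A hedged sketch that tries both formats in turn; detail_datetime and STAMP are hypothetical names, and other USPS statuses may use layouts that do not appear in this one response.

```python
import re
from datetime import datetime
from typing import Optional

MONTHS = "January|February|March|April|May|June|July|August|September|October|November|December"
# Date/time stamp in either layout: "December 23, 2022, 4:49 am" or "12/22/2022, 9:41 pm"
STAMP = re.compile(rf"(?:(?:{MONTHS}) \d{{1,2}}, \d{{4}}|\d{{2}}/\d{{2}}/\d{{4}}), \d{{1,2}}:\d{{2}} [ap]m")
# strptime formats for the two layouts, tried in order
FORMATS = ("%B %d, %Y, %I:%M %p", "%m/%d/%Y, %I:%M %p")


def detail_datetime(detail: str) -> Optional[datetime]:
    """Return the datetime stamped in one TrackDetail line, or None if no known layout matches."""
    match = STAMP.search(detail)
    if match is None:
        return None
    for fmt in FORMATS:
        try:
            return datetime.strptime(match.group(), fmt)
        except ValueError:
            continue  # wrong layout (or an impossible date); try the next format
    return None


print(detail_datetime("In Transit to Next Facility, 12/22/2022, 9:41 pm"))  # 2022-12-22 21:41:00
```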

Update: regular-expression parsing

Based on your updated question (Temporary workaround, 12/25/22), I added a parsing section using the re library.

Code

import re
from datetime import date, time

MONTHS = ['January', 'February', 'March', 'April', 'May', 'June', 'July',
          'August', 'September', 'October', 'November', 'December']

def get_date(date_string):
    """Return the first 'Month D, YYYY' in date_string as a date, or None."""
    pattern = re.compile(r'(January|February|March|April|May|June|July|August|September|October|November|December)\s(\d{2}|\d{1})\,\s(\d{4})')
    match = re.search(pattern, date_string)
    if not match:
        return None
    month_name, day, year = match.groups()
    month = MONTHS.index(month_name) + 1  # 'January' -> 1
    try:
        return date(int(year), month, int(day))
    except ValueError:
        return None  # e.g. "February 30"; or handle the error in a different way

def get_hour_min(hour, minute, am_pm):
    """Convert a 12-hour clock reading to [hour, minute] on a 24-hour clock."""
    hour = int(hour) % 12  # 12 am -> 0; 12 pm becomes 12 after the pm adjustment below
    if am_pm == 'pm':
        hour += 12
    return [hour, int(minute)]

def get_time(date_string):
    pattern = re.compile(r'(\d{2}|\d{1})\:(\d{2})\s*(am|pm)')
    matches = re.findall(pattern, date_string)
    if (len(matches) == 2):
        hour, min = get_hour_min(matches[0][0], matches[0][1], matches[0][2])
        start_t = time(hour, min, 0)
        hour, min = get_hour_min(matches[1][0], matches[1][1], matches[1][2])
        end_t = time(hour, min, 0)
        return [start_t, end_t]

    match = re.search(pattern, date_string)
    if not match:
        t = None
    else:
        hour, min = get_hour_min(match.groups()[0], match.groups()[1], match.groups()[2])
        try:
            t = time(hour, min, 0)
        except ValueError:
            t = None  # or handle error in a different way
    return [t, None]

TRACK_SUMMARIES = {
    "delivered": """Your
     item was delivered in or at the mailbox at 10:23 am on December 24, 2022 in HOBE SOUND, FL 33455.""",
    "out_for_delivery": "Out for Delivery, December 13, 2021, 6:10 am, ARLINGTON, VA 22204.",
    "out_for_delivery_no_date": "Out for Delivery, Expected Delivery Between 9:45am and 1:45pm",
    "arrived_at_post_office": """Arrived at Post Office,
     Arrived at USPS Regional Origin Facility, December 11, 2021, 9:23 pm, HARRISBURG PA PACKAGE SORTING CENTER""",
    "acceptance": "Acceptance, December 10, 2021, 12:54 pm, DALLASTOWN, PA 17313",
    "pre_shipment": "Pre-Shipment Info Sent to USPS, USPS Awaiting Item, December 27, 2021",
    "rts": """Your item was returned to the sender on January 31, 2022 at 9:14 am in YORK, PA 17402
     because of an incorrect address.""",
    "no_info": "The Postal Service could not locate the tracking information for your request",
    "label_prepared": "A shipping label has been prepared for your item at 10:47 am on December 16, 2021 in WINSTON",
    "forwarded": """Your item was forwarded to a different address at 5:13 pm on January 4, 2022
        in REDDING, CA. This was because of forwarding instructions or because the
        address or ZIP Code on the label was incorrect.
        """,
}

tracks = {}
# parse each summary and store [date, start time, end time, text] by key (delivered, out_for_delivery, and so on)
for key, summary in TRACK_SUMMARIES.items():
    value = summary.replace("\n", "")
    found_date = get_date(value)
    start_time, end_time = get_time(value)
    tracks[key] = [found_date, start_time, end_time, value]

# decode each stored list by key (tracks['delivered'], tracks['out_for_delivery'], and so on)
for key, (found_date, start_time, end_time, value) in tracks.items():
    found_date = found_date.strftime("%m/%d/%Y") if found_date is not None else None
    start_time = start_time.strftime("%H:%M:%S") if start_time is not None else None
    end_time = end_time.strftime("%H:%M:%S") if end_time is not None else None

    print(value)
    print(key)
    if found_date is not None:
        print('found date: ' + found_date)
    if start_time is not None:
        if end_time is None:
            print('time: ' + start_time)
        else:
            print('start time: ' + start_time + ' end time: ' + end_time)
    print('------------------------------------------------------------------------')

Result

$ python reg-express.py
Your     item was delivered in or at the mailbox at 10:23 am on December 24, 2022 in HOBE SOUND, FL 33455.
delivered
found date: 12/24/2022
time: 10:23:00
------------------------------------------------------------------------
Out for Delivery, December 13, 2021, 6:10 am, ARLINGTON, VA 22204.
out_for_delivery
found date: 12/13/2021
time: 06:10:00
------------------------------------------------------------------------
Out for Delivery, Expected Delivery Between 9:45am and 1:45pm
out_for_delivery_no_date
start time: 09:45:00 end time: 13:45:00
------------------------------------------------------------------------
Arrived at Post Office,     Arrived at USPS Regional Origin Facility, December 11, 2021, 9:23 pm, HARRISBURG PA PACKAGE SORTING CENTER
arrived_at_post_office
found date: 12/11/2021
time: 21:23:00
------------------------------------------------------------------------
Acceptance, December 10, 2021, 12:54 pm, DALLASTOWN, PA 17313
acceptance
found date: 12/10/2021
time: 12:54:00
------------------------------------------------------------------------
Pre-Shipment Info Sent to USPS, USPS Awaiting Item, December 27, 2021
pre_shipment
found date: 12/27/2021
------------------------------------------------------------------------
Your item was returned to the sender on January 31, 2022 at 9:14 am in YORK, PA 17402     because of an incorrect address.
rts
found date: 01/31/2022
time: 09:14:00
------------------------------------------------------------------------
The Postal Service could not locate the tracking information for your request
no_info
------------------------------------------------------------------------
A shipping label has been prepared for your item at 10:47 am on December 16, 2021 in WINSTON
label_prepared
found date: 12/16/2021
time: 10:47:00
------------------------------------------------------------------------
Your item was forwarded to a different address at 5:13 pm on January 4, 2022        in REDDING, CA. This was because of forwarding instructions or because the        address or ZIP Code on the label was incorrect.
forwarded
found date: 01/04/2022
time: 17:13:00
------------------------------------------------------------------------

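Date/time patterns

I pulled these from your TRACK_SUMMARIES dictionary. Here are the date and time patterns: some lines have no date, and one gives a time range instead of a single time.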

10:23 am on December 24, 2022
December 13, 2021, 6:10 am
Between 9:45am and 1:45pm
December 10, 2021, 12:54 pm
December 27, 2021
January 31, 2022 at 9:14 am
at 10:47 am on December 16, 2021
at 5:13 pm on January 4, 2022

Date parsing

(January|February|March|April|May|June|July|August|September|October|November|December)\s(\d{2}|\d{1})\,\s(\d{4})

The capture groups of this match (month, day, year) are what the code uses.
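The same pattern reads more clearly with named groups, and datetime.strptime can then replace the manual month lookup. A sketch; DATE_PATTERN and find_date are hypothetical names, and \d{1,2} matches the same day numbers as the (\d{2}|\d{1}) alternation.

```python
import re
from datetime import date, datetime
from typing import Optional

# Same date pattern as above, with named groups instead of numbered ones.
DATE_PATTERN = re.compile(
    r"(?P<month>January|February|March|April|May|June|July|August|September|October|November|December)"
    r"\s(?P<day>\d{1,2}),\s(?P<year>\d{4})"
)


def find_date(text: str) -> Optional[date]:
    """Return the first 'Month D, YYYY' in text as a date, or None."""
    match = DATE_PATTERN.search(text)
    if match is None:
        return None
    parts = match.groupdict()  # e.g. {'month': 'December', 'day': '10', 'year': '2021'}
    try:
        # strptime maps the month name to its number and rejects impossible dates
        return datetime.strptime(f"{parts['month']} {parts['day']} {parts['year']}", "%B %d %Y").date()
    except ValueError:
        return None


print(find_date("Acceptance, December 10, 2021, 12:54 pm, DALLASTOWN, PA 17313"))  # 2021-12-10
```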

Time parsing

(\d{2}|\d{1})\:(\d{2})\s*(am|pm)

The capture groups of this match (hour, minute, am/pm) are what the code uses.
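For the time, letting datetime.strptime interpret %I and %p avoids hand-written 12-hour arithmetic, including the 12 am (midnight) case. A sketch reusing the pattern above; parse_times is a hypothetical name.

```python
import re
from datetime import datetime, time

# Same time pattern as above: hour, minute, am/pm (optional whitespace before am/pm).
TIME_PATTERN = re.compile(r"(\d{2}|\d{1}):(\d{2})\s*(am|pm)")


def parse_times(text: str) -> list[time]:
    """Return every 'H:MM am/pm' in text as a 24-hour time, in order of appearance."""
    return [
        # normalize "9:45am" to "9:45 am" so one strptime format covers both spellings
        datetime.strptime(f"{hour}:{minute} {am_pm}", "%I:%M %p").time()
        for hour, minute, am_pm in TIME_PATTERN.findall(text)
    ]


print(parse_times("Out for Delivery, Expected Delivery Between 9:45am and 1:45pm"))
# [datetime.time(9, 45), datetime.time(13, 45)]
```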

References

Find string between two substrings

Converting Strings Using datetime

Regexper

regular expression 101

Original Stack Overflow question (python - USPS Package Track API does not return XML child elements for TrackSummary): https://stackoverflow.com/questions/74902976/
