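I'm writing a Python script that works with data of questionable quality. The data is stored in an SQLite database.
I'd like a compact way to specify constraints on the data. There are two kinds of constraints: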
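- Data errors: these issue an error message. For example:
  - "Column A must be an integer in the range 0-10"
  - "Column B must be a non-empty string", etc.
- Data quality warnings ("Are you sure this is right?"): these issue a warning message. Constraints such as:
  - "Warn if column C has the default value of 0": are you sure the typist didn't miss any entries?
  - "Warn if the number in column D is unusually large (> 1000)".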
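One way to keep the two kinds of constraints apart is to store them as data, each tagged with a severity, instead of hard-coding the difference in the checking loop. The following is a minimal sketch, assuming each row arrives as a dict-like mapping (as sqlite3.Row provides); the column names A to D come from the list above, while `Rule`, `ERROR`, `WARNING`, `RULES` and `check_row` are hypothetical names introduced for illustration.

```python
from typing import Any, Callable, Mapping, NamedTuple

ERROR = "error"      # data error: the value is unusable
WARNING = "warning"  # data quality warning: the value is merely suspicious


class Rule(NamedTuple):
    severity: str
    column: str
    is_ok: Callable[[Any], bool]  # returns True when the value passes
    message: str


# Hypothetical encodings of the four example constraints.
RULES = [
    Rule(ERROR, "A", lambda v: type(v) is int and 0 <= v <= 10,
         "must be an integer in the range 0-10"),
    Rule(ERROR, "B", lambda v: isinstance(v, str) and v != "",
         "must be a non-empty string"),
    Rule(WARNING, "C", lambda v: v != 0,
         "has the default value 0; did the typist miss an entry?"),
    Rule(WARNING, "D", lambda v: not (isinstance(v, (int, float)) and v > 1000),
         "is unusually large (> 1000)"),
]


def check_row(row: Mapping[str, Any], rules=RULES):
    """Return (severity, column, value, message) for every rule the row fails."""
    return [(r.severity, r.column, row[r.column], r.message)
            for r in rules if not r.is_ok(row[r.column])]


# Usage with a made-up row: A and B are errors, C is a warning, D passes.
problems = check_row({"A": 11, "B": "", "C": 0, "D": 5})
for severity, column, value, message in problems:
    print(f"{severity.upper()}: {column} = {value!r} {message}")
```

Because severity is just a field, the caller can decide separately whether errors abort a load while warnings are only logged.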
Ideally, I'd like to express my constraints in a human-readable format, for example:
'kV' MUST BE float IN RANGE 0-10
'Rating' SHOULD NOT BE DEFAULT 1.0
'Description' SHOULD NOT BE DEFAULT ""
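...but I'd welcome any improvement on my current approach (below). I'm happy to accept solutions that enforce the constraints either in Python or in the SQLite schema.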
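The human-readable rules above can be turned into check functions with a couple of regular expressions. This is a minimal sketch, assuming only the two sentence shapes shown (`MUST BE <type> IN RANGE <lo>-<hi>` and `SHOULD NOT BE DEFAULT <value>`); the grammar, `parse_rule`, `TYPES`, and the choice to map MUST to an error and SHOULD to a warning are hypothetical, not an existing library.

```python
import ast
import re

# Hypothetical mini-grammar covering only the two sentence shapes shown above.
TYPES = {"int": int, "float": float, "str": str}
RANGE_RE = re.compile(
    r"^'(?P<col>[^']+)'\s+MUST\s+BE\s+(?P<type>\w+)\s+IN\s+RANGE\s+"
    r"(?P<lo>-?\d+(?:\.\d+)?)-(?P<hi>-?\d+(?:\.\d+)?)$"
)
DEFAULT_RE = re.compile(
    r"^'(?P<col>[^']+)'\s+SHOULD\s+NOT\s+BE\s+DEFAULT\s+(?P<default>.+)$"
)


def parse_rule(text):
    """Parse one rule into (severity, column, check).

    check(value) returns None when the value passes, otherwise a message.
    MUST rules become errors and SHOULD rules become warnings.
    """
    text = text.strip()
    m = RANGE_RE.match(text)
    if m:
        expected = TYPES[m["type"]]  # KeyError for an unknown type name
        lo, hi = float(m["lo"]), float(m["hi"])
        label = f"[{m['lo']}-{m['hi']}]"

        def check_range(value):
            if type(value) is not expected:  # e.g. text stored in a REAL column
                return f"not a {expected.__name__}"
            if not lo <= value <= hi:
                return f"out of range {label}"
            return None

        return "error", m["col"], check_range

    m = DEFAULT_RE.match(text)
    if m:
        # literal_eval reads 1.0 as a float and "" as an empty string.
        default = ast.literal_eval(m["default"])

        def check_default(value):
            if value == default:
                return f"default value {default!r}; make sure this is what you want"
            return None

        return "warning", m["col"], check_default

    raise ValueError(f"cannot parse rule: {text!r}")


# Usage with a made-up row.
rules = [parse_rule(line) for line in [
    "'kV' MUST BE float IN RANGE 0-10",
    "'Rating' SHOULD NOT BE DEFAULT 1.0",
    "'Description' SHOULD NOT BE DEFAULT \"\"",
]]
row = {"kV": 11.0, "Rating": 1.0, "Description": "XLPE"}
for severity, column, check in rules:
    message = check(row[column])
    if message:
        print(f"{severity}: {column} = {row[column]!r}: {message}")
```

Keeping the rules as plain text means they can live in a separate file next to the data, while the parser stays small enough to extend with new sentence shapes as they are needed.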
Here is what I'm currently using:
import sqlite3

def is_number_in_range(number, expected_type, lower, upper):
    # Check the stored type first: SQLite may have put text in a numeric column.
    if type(number) != expected_type:
        return "not a %s" % expected_type.__name__
    elif (number < lower) or (number > upper):
        return "%s out of range [%g-%g]." % (expected_type.__name__, lower, upper)
    else:
        return "OK"

def not_default(value, expected_type, default_value):
    if type(value) != expected_type:
        return "not a %s" % expected_type.__name__
    elif value == default_value:
        return "default value of %s - make sure this is what you want." % default_value
    else:
        return "OK"

def Check_Cable_Lib(db_conn):
    # LIMIT 1 checks only the first row; drop it to check the whole table.
    res = db_conn.execute("SELECT * FROM Lib_Cable LIMIT 1")
    # Each constraint pairs a column name with a function that returns a message.
    constraints = (
        ('kV', lambda x: is_number_in_range(x, float, 0, 1000)),
        ('kA1', lambda x: is_number_in_range(x, float, 0, 10)),
        ('kA1', lambda x: not_default(x, float, 1.0)),
    )
    for cable_type in res:
        for constraint_variable, constraint_function in constraints:
            constraint_data = cable_type[constraint_variable]
            validation_message = constraint_function(constraint_data)
            print("%s = %s : %s" % (constraint_variable, constraint_data, validation_message))

stage1_db_path = "stage1.sqlite3"
db_conn = sqlite3.connect(stage1_db_path)
db_conn.row_factory = sqlite3.Row  # lets rows be indexed by column name
Check_Cable_Lib(db_conn)
Sample output:
kV = 11.0 : OK
kA1 = 1.0 : OK
kA1 = 1.0 : default value of 1.0 - make sure this is what you want.
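Edit: I know that explicitly checking types in Python is frowned upon. However, for the sake of the code that uses the data, I need to check that SQLite hasn't stored something unexpected in a column ("hello world" in an INT column, etc.). Remember that the data quality is questionable, and SQLite will happily store any type of data in any column. Catching these kinds of data-entry errors is one of the goals of this code.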
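Since the question allows enforcement in the SQLite schema, one option is a CHECK constraint built on SQLite's built-in typeof() function, which reports the storage class actually used for a value ('integer', 'real', 'text', 'blob' or 'null'). For data that is already loaded, the same function can be used in a query to list values stored with an unexpected type. A minimal sketch, assuming a cut-down Lib_Cable table with only kV and kA1 (the real schema is not shown in the question) and a hypothetical unconstrained table named Loose for the audit part.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Enforcement in the schema: each CHECK tests the storage class reported by
# typeof() as well as the allowed range. The columns are illustrative only.
conn.execute("""
    CREATE TABLE Lib_Cable (
        kV  REAL CHECK (typeof(kV) = 'real' AND kV BETWEEN 0 AND 1000),
        kA1 REAL CHECK (typeof(kA1) = 'real' AND kA1 BETWEEN 0 AND 10)
    )
""")
conn.execute("INSERT INTO Lib_Cable VALUES (11.0, 1.0)")

rejected = False
try:
    # 'hello world' cannot be converted to a number, so it would be stored as TEXT.
    conn.execute("INSERT INTO Lib_Cable VALUES ('hello world', 1.0)")
except sqlite3.IntegrityError as exc:
    rejected = True
    print("rejected:", exc)

# Auditing data already loaded into a table without CHECK constraints:
# list every value whose storage class is not REAL (NULLs show up as 'null').
conn.execute("CREATE TABLE Loose (kV REAL)")
conn.executemany("INSERT INTO Loose VALUES (?)", [(11.0,), ("hello world",), (None,)])
bad = conn.execute(
    "SELECT rowid, kV, typeof(kV) FROM Loose WHERE typeof(kV) <> 'real' ORDER BY rowid"
).fetchall()
print(bad)
```

The CHECK version stops bad data at insert time, while the audit query suits data that has already been imported. On SQLite 3.37.0 or later, declaring a table STRICT is another way to make SQLite reject values of the wrong type.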
Best answer
The following article may be of interest:
Verbalizing Business Rules by Terry Halpin
Alethic rules impose necessities, which cannot, even in principle, be violated by the business, typically because of some physical or logical law. For example: each employee was born on at most one date; no product is a component of itself. Deontic rules impose obligations, which may be violated, even though they ought not. For example: it is obligatory that each employee is married to at most one person; no smoking is permitted in any office.
From an SQL point of view, write queries that return the data that violates a rule, e.g.
SELECT *
FROM T
WHERE Column_A < 0
then test that each rule's query returns the empty set. Consider making the rules granular, e.g. separate tests for Column_A < 0 and for Column_A > 10.
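This approach can be driven from Python: keep one query per rule, each returning only the offending rows, and treat any non-empty result as a failure. A minimal sketch, assuming the T and Column_A names from the example above and an in-memory table filled with made-up sample rows; `VIOLATION_QUERIES` and `run_rules` are hypothetical names, and the typeof() rule is an extra illustration tied to the type-checking concern in the question's Edit.

```python
import sqlite3

# One granular rule per entry; each query returns the rows that break the rule.
VIOLATION_QUERIES = {
    "Column_A below 0": "SELECT rowid, Column_A FROM T WHERE Column_A < 0",
    "Column_A above 10": "SELECT rowid, Column_A FROM T WHERE Column_A > 10",
    "Column_A not an integer":
        "SELECT rowid, Column_A FROM T WHERE typeof(Column_A) <> 'integer'",
}


def run_rules(conn, queries=VIOLATION_QUERIES):
    """Return {rule name: offending rows}; an empty dict means every rule passed."""
    failures = {}
    for name, sql in queries.items():
        rows = conn.execute(sql).fetchall()
        if rows:  # a rule passes only when its query returns the empty set
            failures[name] = rows
    return failures


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE T (Column_A INTEGER)")
# Illustrative sample data only: 3.5 cannot become an integer, so it stays REAL.
conn.executemany("INSERT INTO T VALUES (?)", [(5,), (-1,), (42,), (3.5,)])

for name, rows in run_rules(conn).items():
    print(f"FAILED {name}: {rows}")
```

Because each rule is a separate query, a failure report names exactly which condition was broken and which rows broke it, which is the benefit of keeping the tests granular.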
A similar question, "python - Check that data from SQLite meets constraints like "number in range x-y"", can be found on Stack Overflow: https://stackoverflow.com/questions/9408920/