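I have a database of hymn titles from hymnbooks published in 28 languages. As I entered the titles, I researched which Unicode characters are most correct in each language (for example, the Tongan glottal stop should be U+02BB, even though it looks like an apostrophe; likewise, in Romanian, U+021A (Ț, T with comma below) is more correct than the cedilla form U+0163 (ţ), and so on).
Now that I'm doing a similar project, I want to go back and "deconstruct" my research by gathering all of the titles in one language and outputting a list of all the unique characters used in those titles.
Is there a way to do this with MySQL and/or Python? I'm thinking something that splits a string between every character, orders all the characters, and groups them together. My website is written in Python, but it's all very basic coding (I'm not too advanced yet).
EDIT: This is what my code ended up being, thanks to these responses, and it works great!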
hymnstitleslist = lookup('''
SELECT HyName FROM Hymns
WHERE HymnbookID = "'''+hbid+'''"
''')
import string
from collections import Counter
some_text = ""
for x in range(0, len(hymnstitleslist)):
    some_text = some_text+hymnstitleslist[x]['HyName']
letters = []
for i in some_text:
    letters.append(i)
letter_count = Counter(letters)
for letter,count in letter_count.iteritems():
    print "{}: {}".format(letter,count)
Best answer
I'm thinking something that splits a string between every character, orders all the characters, and groups them together.
This part is easy to do:
import string
from collections import Counter
some_text = "Lorem ipsum dolor sit amet, consectetur adipiscing elit. Pellentesque fringilla augue ac metus laoreet quis imperdiet velit congue. Vestibulum ante ipsum primis in faucibus orci luctus et ultrices posuere cubilia Curae; Quisque tincidunt lorem ut justo fermentum adipiscing. Nullam ullamcorper eros in arcu tincidunt non scelerisque ligula molestie. Vestibulum imperdiet facilisis nisi, et sodales leo sodales at. In hac habitasse platea dictumst."
letters = []
for i in some_text:
    # Each "i" is a single character (letter, space or punctuation mark)
    if i in string.letters:
        # only collect letters, not punctuation marks or spaces
        letters.append(i)
# count how many of each
letter_count = Counter(letters)
# For each letter, print the count:
for letter,count in letter_count.iteritems():
    print "{}: {}".format(letter,count)
This will give you:
C: 1
I: 1
L: 1
N: 1
Q: 1
P: 1
V: 2
a: 24
c: 19
b: 5
e: 44
d: 10
g: 6
f: 4
i: 44
h: 2
j: 1
m: 17
l: 27
o: 17
n: 18
q: 4
p: 10
s: 32
r: 19
u: 34
t: 31
v: 1
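One caveat for the hymn-title use case: in Python 2, string.letters is locale-dependent and by default holds only the ASCII letters, so characters such as U+02BB would be filtered out, and the name no longer exists in Python 3. A minimal sketch of a Unicode-aware variant, assuming Python 3 (where str is a sequence of code points), could look like the following. The function names unique_characters and describe, and the sample strings, are made up for illustration.

```python
import unicodedata
from collections import Counter


def unique_characters(titles):
    """Count every non-whitespace character across an iterable of strings."""
    counts = Counter()
    for title in titles:
        # Keep letters, punctuation and modifier characters alike;
        # only whitespace is skipped
        counts.update(ch for ch in title if not ch.isspace())
    return counts


def describe(counts):
    """Return one report line per character, sorted by code point."""
    lines = []
    for ch in sorted(counts):
        # Fall back to a placeholder for characters without a Unicode name
        name = unicodedata.name(ch, "UNKNOWN")
        lines.append("U+{:04X} {} {}: {}".format(ord(ch), ch, name, counts[ch]))
    return lines


if __name__ == "__main__":
    sample = ["ab a", "b\u02bb"]  # made-up sample strings, not real titles
    for line in describe(unique_characters(sample)):
        print(line)
```

Printing the code point and the Unicode name next to each character makes look-alikes (an apostrophe versus U+02BB, a cedilla versus a comma below) easy to tell apart, and sorting gives the ordered list the question asks for.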
I'm pulling from a MySQL table, so my data is in a dictionary. How can I combine data from all selected entries?
Well, the first step is to collect all of the data into some kind of collection, say a list:
letters = []
cur.execute(some_query) # See the Python database API for what is going on here
results = cur.fetchone()
while results:
    the_text = results[0] # if it's the first column
    for i in the_text.split():
        # By default, split() will separate on whitespace,
        # so each i is a word.
        for letter in i:
            if letter in string.letters:
                letters.append(letter)
    results = cur.fetchone() # get the next result
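Put together, the fetch loop and the counting step can be sketched as a single function. This is only an illustration, assuming Python 3 and any DB-API 2.0 cursor; the name count_title_characters is hypothetical, and an in-memory SQLite database stands in for MySQL so the example runs without a server. The Hymns, HyName and HymnbookID names come from the question; the rows are made up.

```python
import sqlite3
from collections import Counter


def count_title_characters(cursor, query, params=()):
    """Run a one-column query and count the characters in every row."""
    counts = Counter()
    # Passing params separately lets the driver do the quoting, instead of
    # concatenating values into the SQL string
    cursor.execute(query, params)
    row = cursor.fetchone()
    while row is not None:
        title = row[0]  # assumes the title is the first selected column
        counts.update(ch for ch in title if not ch.isspace())
        row = cursor.fetchone()  # get the next result
    return counts


if __name__ == "__main__":
    # In-memory SQLite stand-in; the rows below are made-up test data
    conn = sqlite3.connect(":memory:")
    cur = conn.cursor()
    cur.execute("CREATE TABLE Hymns (HyName TEXT, HymnbookID TEXT)")
    cur.executemany(
        "INSERT INTO Hymns VALUES (?, ?)",
        [("abc", "1"), ("bcd", "1"), ("xyz", "2")],
    )
    counts = count_title_characters(
        cur, "SELECT HyName FROM Hymns WHERE HymnbookID = ?", ("1",)
    )
    for ch in sorted(counts):
        print("{}: {}".format(ch, counts[ch]))
```

The placeholder syntax depends on the driver: sqlite3 uses ?, while MySQL drivers such as MySQLdb and PyMySQL use %s, so the query string would need adjusting when run against the real table.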
About python - Generate a list of unique characters in a MySQL table: a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/15240321/