python - Print the first occurrence of a word starting with each letter of the alphabet in a file

Tags: python bash awk grep

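I have a file containing a list of adjectives, A through Z.

How can I print the first word starting with A, then the first word starting with B, and so on through Z?
I think grep might be the way, but I'm open to others: awk, python, anything else.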

Some sample output:

$ cat adjectives.txt | head
Adamant: unyielding; a very hard substance
Adroit: clever, resourceful
Amatory: sexual
Animistic: quality of recurrence or reversion to earlier form
Antic: clownish, frolicsome
Arcadian: serene
Baleful: deadly, foreboding
Bellicose: quarrelsome (its synonym belligerent can also be a noun)
Bilious: unpleasant, peevish
Boorish: crude, insensitive

$ cat adjectives.txt | grep '^[ABCDE]' | head
Adamant: unyielding; a very hard substance
Adroit: clever, resourceful
Amatory: sexual
Animistic: quality of recurrence or reversion to earlier form
Antic: clownish, frolicsome
Arcadian: serene
Baleful: deadly, foreboding
Bellicose: quarrelsome (its synonym belligerent can also be a noun)
Bilious: unpleasant, peevish
Boorish: crude, insensitive

So my desired output would be:

Adamant: unyielding; a very hard substance
Baleful: deadly, foreboding
...
Irksome: annoying
Jejune: dull, puerile
...
Wheedling: flattering
Zealous: eager, devoted

The complete file is from here:

$ cat adjectives.txt
Adamant: unyielding; a very hard substance
Adroit: clever, resourceful
Amatory: sexual
Animistic: quality of recurrence or reversion to earlier form
Antic: clownish, frolicsome
Arcadian: serene
Baleful: deadly, foreboding
Bellicose: quarrelsome (its synonym belligerent can also be a noun)
Bilious: unpleasant, peevish
Boorish: crude, insensitive
Calamitous: disastrous
Caustic: corrosive, sarcastic; a corrosive substance
Cerulean: sky blue
Comely: attractive
Concomitant: accompanying
Contumacious: rebellious
Corpulent: obese
Crapulous: immoderate in appetite
Defamatory: maliciously misrepresenting
Didactic: conveying information or moral instruction
Dilatory: causing delay, tardy
Dowdy: shabby, old-fashioned; an unkempt woman
Efficacious: producing a desired effect
Effulgent: brilliantly radiant
Egregious: conspicuous, flagrant
Endemic: prevalent, native, peculiar to an area
Equanimous: even, balanced
Execrable: wretched, detestable
Fastidious: meticulous, overly delicate
Feckless: weak, irresponsible
Fecund: prolific, inventive
Friable: brittle
Fulsome: abundant, overdone, effusive
Garrulous: wordy, talkative
Guileless: naive
Gustatory: having to do with taste or eating
Heuristic: learning through trial-and-error or problem solving
Histrionic: affected, theatrical
Hubristic: proud, excessively self-confident
Incendiary: inflammatory, spontaneously combustible, hot
Insidious: subtle, seductive, treacherous
Insolent: impudent, contemptuous
Intransigent: uncompromising
Inveterate: habitual, persistent
Invidious: resentful, envious, obnoxious
Irksome: annoying
Jejune: dull, puerile
Jocular: jesting, playful
Judicious: discreet
Lachrymose: tearful
Limpid: simple, transparent, serene
Loquacious: talkative
Luminous: clear, shining
Mannered: artificial, stilted
Mendacious: deceptive
Meretricious: whorish, superficially appealing, pretentious
Minatory: menacing
Mordant: biting, incisive, pungent
Munificent: lavish, generous
Nefarious: wicked
Noxious: harmful, corrupting
Obtuse: blunt, stupid
Parsimonious: frugal, restrained
Pendulous: suspended, indecisive
Pernicious: injurious, deadly
Pervasive: widespread
Petulant: rude, ill humored
Platitudinous: resembling or full of dull or banal comments
Precipitate: steep, speedy
Propitious: auspicious, advantageous, benevolent
Puckish: impish
Querulous: cranky, whining
Quiescent: inactive, untroublesome
Rebarbative: irritating, repellent
Recalcitrant: resistant, obstinate
Redolent: aromatic, evocative
Rhadamanthine: harshly strict
Risible: laughable
Ruminative: contemplative
Sagacious: wise, discerning
Salubrious: healthful
Sartorial: relating to attire, especially tailored fashions
Sclerotic: hardening
Serpentine: snake-like, winding, tempting or wily
Spasmodic: having to do with or resembling a spasm, excitable,
intermittent
Strident: harsh, discordant; obtrusively loud
Taciturn: closemouthed, reticent
Tenacious: persistent, cohesive,
Tremulous: nervous, trembling, timid, sensitive
Trenchant: sharp, penetrating, distinct
Turbulent: restless, tempestuous
Turgid: swollen, pompous
Ubiquitous: pervasive, widespread
Uxorious: inordinately affectionate or compliant with a wife
Verdant: green, unripe
Voluble: glib, given to speaking
Voracious: ravenous, insatiable
Wheedling: flattering
Withering: devastating
Zealous: eager, devoted

Best answer

awk to the rescue!

$ awk '!a[tolower(substr($0,1,1))]++' file

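This creates a counter for each initial character and prints a line only when its counter is still zero (i.e. the first instance). tolower() is there to make the match case-insensitive; you can remove it if you don't need that. substr($0,1,1) extracts the first character of the line. awk's implicit loop repeats this for every line of the input file.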
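Since the question is also open to Python, the same idea can be sketched there. This is a minimal illustration, not part of the original answer: the function name `first_per_initial` and the `ignore_case` parameter are hypothetical, and unlike the awk one-liner it skips empty lines.

```python
def first_per_initial(lines, ignore_case=True):
    """Yield the first line seen for each initial character."""
    seen = set()  # initials already printed (plays the role of awk's array a)
    for line in lines:
        if not line:
            continue  # assumption: ignore empty lines
        key = line[0].lower() if ignore_case else line[0]
        if key not in seen:
            seen.add(key)
            yield line


sample = [
    "Adamant: unyielding; a very hard substance",
    "Adroit: clever, resourceful",
    "Baleful: deadly, foreboding",
    "Bellicose: quarrelsome (its synonym belligerent can also be a noun)",
    "Calamitous: disastrous",
]
for line in first_per_initial(sample):
    print(line)
```

To run it against a real file, pass the stripped lines instead of `sample`, for example `(l.rstrip("\n") for l in open("adjectives.txt"))`.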

With a slight change to the script

$ awk '++a[substr($0,1,1)]==2' file  

you can get the second record for each letter (if it exists), or use <3 instead of ==2 to get the first 2 records.
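The counting variant translates to Python in the same way. The sketch below is an assumption-laden illustration (the names `nth_per_initial` and `first_n_per_initial` are made up here); like the awk script above it is case-sensitive, and it skips empty lines.

```python
from collections import Counter


def nth_per_initial(lines, n=2):
    """Yield the n-th line for each initial character, like ++a[...]==n."""
    counts = Counter()
    for line in lines:
        if not line:
            continue  # assumption: ignore empty lines
        counts[line[0]] += 1
        if counts[line[0]] == n:
            yield line


def first_n_per_initial(lines, n=2):
    """Yield the first n lines for each initial character, like ++a[...]<n+1."""
    counts = Counter()
    for line in lines:
        if not line:
            continue
        counts[line[0]] += 1
        if counts[line[0]] <= n:
            yield line


words = ["Adamant", "Adroit", "Amatory", "Baleful", "Bellicose", "Calamitous"]
print(list(nth_per_initial(words)))
print(list(first_n_per_initial(words)))
```

Letters with fewer than `n` entries simply contribute nothing to `nth_per_initial`, which mirrors the "if it exists" caveat.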

If your file is already sorted and consistently cased, you can use an even simpler script

$ uniq -w1 file

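The uniq command keeps the first instance of each run of equal comparison values, and -w1 limits the comparison to the first character, so it extracts the first entry for every letter in one go. Because uniq only compares adjacent lines, this relies on the file being sorted. Add the -i flag to ignore case if the casing is inconsistent.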
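A rough Python counterpart of this adjacent-lines approach can be built on `itertools.groupby`, which also groups only consecutive items. This is a sketch under that assumption; the function name `first_of_each_run` and its `ignore_case` parameter are hypothetical.

```python
from itertools import groupby


def first_of_each_run(lines, ignore_case=False):
    """Yield the first line of each run of consecutive lines sharing an initial."""

    def key(line):
        initial = line[:1]  # empty string for an empty line
        return initial.lower() if ignore_case else initial

    for _, run in groupby(lines, key=key):
        yield next(run)  # the first line of the run


sorted_words = ["Adamant", "Adroit", "Baleful", "Bellicose", "Calamitous"]
print(list(first_of_each_run(sorted_words)))

# On unsorted input a letter can appear more than once, as with uniq.
unsorted_words = ["Adamant", "Baleful", "Adroit"]
print(list(first_of_each_run(unsorted_words)))
```

The second call shows why the sorted-input requirement matters: each new run counts as a new group.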

Scanning the file once is enough; there is no need for multiple passes.

Source: the original question on Stack Overflow: https://stackoverflow.com/questions/47400218/
