python - Convert a list of lists to a dictionary

Tags: python list dictionary

I have a data file that looks like this:

["Arts & Entertainment", "Arts & Entertainment / Animation & Comics", "Arts & Entertainment / Books & Literature", "Arts & Entertainment / Celebrity/Gossip", "Arts & Entertainment / Fine Art", "Arts & Entertainment / Humor", "Arts & Entertainment / Movies", "Arts & Entertainment / Movies / Action", "Arts & Entertainment / Movies / Comedy", "Arts & Entertainment / Movies / Documentary", "Arts & Entertainment / Movies / Drama", "Arts & Entertainment / Movies / Horror", "Arts & Entertainment / Music", "Arts & Entertainment / Music / Alternative Music", "Arts & Entertainment / Music / Blues", "Arts & Entertainment / Music / Christian Music", "Arts & Entertainment / Music / Classic Rock", "Arts & Entertainment / Music / Classical Music", "Arts & Entertainment / Music / Country Music", "Arts & Entertainment / Music / Electronic Dance Music", "Arts & Entertainment / Music / Heavy Metal", "Arts & Entertainment / Music / Pop Music", "Arts & Entertainment / Music / Rap", "Arts & Entertainment / Radio Stations", "Arts & Entertainment / Television", "Arts & Entertainment / Television / Game Show", "Arts & Entertainment / Television / Kids", "Arts & Entertainment / Television / News", "Arts & Entertainment / Television / Reality", "Arts & Entertainment / Television / Science", "Arts & Entertainment / Television / Sitcom", "Arts & Entertainment / Television / Soap Opera", "Arts & Entertainment / Television / Talk Show", "Autos", "Autos / 4-Wheel Drive/SUVs", "Autos / Buying/Selling Cars", "Autos / Certified Pre-Owned", "Autos / Convertible", "Autos / Coupe", "Autos / Crossover", "Autos / Diesel", "Autos / Electric Vehicles", "Autos / Hatchback", "Autos / Hybrid", "Autos / Luxury", "Autos / Maintenance", "Autos / Maintenance / Parts", "Autos / Maintenance / Repair", "Autos / MiniVan", "Autos / Motorcycles", "Autos / Off-Road Vehicles", "Autos / Road-Side Assistance", "Autos / Sedan", "Autos / Trucks", "Autos / Trucks / Pickup", "Autos / Vintage Cars", "Autos / Wagon", "Business & 
Industry", "Business & Industry / Advertising", "Business & Industry / Agriculture", "Business & Industry / Biotech/Biomedical", "Business & Industry / Business Software", "Business & Industry / Construction", "Business & Industry / Construction / Composites & Plastics", "Business & Industry / Forestry", "Business & Industry / Government", "Business & Industry / Green Solutions", "Business & Industry / Human Resources", "Business & Industry / Logistics", "Business & Industry / Marketing", "Business & Industry / Metals", "Business & Industry / Non-Profit Organizations", "Business & Industry / Power Industry", "Business & Industry / Public Services", "Business & Industry / Public Services / Emergency Services", "Business & Industry / Public Services / Waste Management", "Business & Industry / Purchasing", "Business & Industry / Retail Industry", "Business & Industry / Small Business", "Business & Industry / Telecom", "Career", "Career / Career Planning", "Career / Job Search", "Career / Job Search / Resume Writing/Advice", "Career / Telecommuting", "Career / U.S. 
Military", "Education", "Education / Business School", "Education / College Education", "Education / College Education / Admissions", "Education / College Education / College Life", "Education / Continuing Education", "Education / Distance Learning", "Education / Financial Aid", "Education / Financial Aid / Scholarships", "Education / Graduate School", "Education / Homeschooling", "Education / Language Learning", "Education / Language Learning / English as a 2nd Language", "Education / Primary Education", "Education / Secondary Education", "Education / Special Education", "Finance & Money", "Finance & Money / Credit/Debt & Loans", "Finance & Money / Day Trading", "Finance & Money / Exchange Traded Funds", "Finance & Money / Financial News", "Finance & Money / Financial Planning", "Finance & Money / Financial Planning / Retirement Planning", "Finance & Money / Financial Planning / Tax Planning", "Finance & Money / Foreign Exchange Trading", "Finance & Money / Hedge Fund", "Finance & Money / Insurance", "Finance & Money / Investing", "Finance & Money / Mutual Funds", "Finance & Money / Options", "Finance & Money / Stocks", "Food & Drink", "Food & Drink / Barbecues & Grilling", "Food & Drink / Beverages", "Food & Drink / Beverages / Cocktails/Beer", "Food & Drink / Beverages / Coffee/Tea", "Food & Drink / Beverages / Wine", "Food & Drink / Cuisine-Specific", "Food & Drink / Cuisine-Specific / American Cusine", "Food & Drink / Cuisine-Specific / Cajun/Creole", "Food & Drink / Cuisine-Specific / Chinese Cuisine", "Food & Drink / Cuisine-Specific / French Cuisine", "Food & Drink / Cuisine-Specific / Italian Food", "Food & Drink / Cuisine-Specific / Japanese Food", "Food & Drink / Cuisine-Specific / Mexican Cuisine", "Food & Drink / Desserts & Baking", "Food & Drink / Health/LowFat Cooking", "Food & Drink / Organic Food", "Food & Drink / Vegetarian", "Health & Fitness", "Health & Fitness / A.D.D.", "Health & Fitness / AIDS/HIV", "Health & Fitness / Allergies", "Health & 
Fitness / Alternative Medicine", "Health & Fitness / Alzheimer\\'s Disease", "Health & Fitness / Arthritis", "Health & Fitness / Asthma", "Health & Fitness / Autism/PDD", "Health & Fitness / Bipolar Disorder", "Health & Fitness / Brain Tumor", "Health & Fitness / Cancer", "Health & Fitness / Cancer / Breast Cancer", "Health & Fitness / Cancer / Lung Cancer", "Health & Fitness / Cancer / Prostate Cancer", "Health & Fitness / Cholesterol", "Health & Fitness / Chronic Fatigue Syndrome", "Health & Fitness / Chronic Obstructive Pulmonary Disease", "Health & Fitness / Chronic Pain", "Health & Fitness / Cold & Flu", "Health & Fitness / Deafness", "Health & Fitness / Dental Care", "Health & Fitness / Depression", "Health & Fitness / Dermatology", "Health & Fitness / Diabetes", "Health & Fitness / Epilepsy", "Health & Fitness / Exercise", "Health & Fitness / GERD/Acid Reflux", "Health & Fitness / Headaches/Migraines", "Health & Fitness / Heart Disease", "Health & Fitness / Heart Disease / Women\\'s Heart Disease", "Health & Fitness / Hepatitis", "Health & Fitness / Herbs for Health", "Health & Fitness / Holistic Healing", "Health & Fitness / Hypertension", "Health & Fitness / IBS/Crohn\\'s Disease", "Health & Fitness / Incest/Abuse Support", "Health & Fitness / Incontinence", "Health & Fitness / Infertility", "Health & Fitness / Men\\'s Health", "Health & Fitness / Nursing", "Health & Fitness / Nutrition", "Health & Fitness / Orthopedics", "Health & Fitness / Orthopedics / Sports Medicine", "Health & Fitness / Panic/Anxiety Disorders", "Health & Fitness / Pediatrics", "Health & Fitness / Pharmaceutical", "Health & Fitness / Physical Therapy", "Health & Fitness / Psychology/Psychiatry", "Health & Fitness / Senior Health", "Health & Fitness / Sexuality", "Health & Fitness / Sleep Disorders", "Health & Fitness / Smoking Cessation", "Health & Fitness / Substance Abuse", "Health & Fitness / Substance Abuse / Alcoholism", "Health & Fitness / Thyroid Disease", "Health & Fitness / 
Weight Loss", "Health & Fitness / Women\\'s Health", "Hobbies & Games", "Hobbies & Games / Arts & Crafts", "Hobbies & Games / Arts & Crafts / Beadwork", "Hobbies & Games / Arts & Crafts / Drawing/Sketching", "Hobbies & Games / Arts & Crafts / Needlework", "Hobbies & Games / Arts & Crafts / Painting", "Hobbies & Games / Arts & Crafts / Photography", "Hobbies & Games / Arts & Crafts / Woodworking", "Hobbies & Games / Astrology", "Hobbies & Games / Birdwatching", "Hobbies & Games / BoardGames/Puzzles", "Hobbies & Games / Candle & Soap Making", "Hobbies & Games / Card Games", "Hobbies & Games / Chess", "Hobbies & Games / Cigars", "Hobbies & Games / Collecting", "Hobbies & Games / Collecting / Antiques", "Hobbies & Games / Collecting / Book Collecting", "Hobbies & Games / Collecting / Miniatures", "Hobbies & Games / Collecting / Stamps & Coins", "Hobbies & Games / Creative Writing", "Hobbies & Games / Getting Published", "Hobbies & Games / Home Recording", "Hobbies & Games / Inventors & Patents", "Hobbies & Games / Learning a Musical Instrument", "Hobbies & Games / Learning a Musical Instrument / Guitar", "Hobbies & Games / Magic & Illusion", "Hobbies & Games / Paranormal Phenomena", "Hobbies & Games / Sci-Fi & Fantasy", "Hobbies & Games / Video Games", "Hobbies & Games / Video Games / Nintendo", "Hobbies & Games / Video Games / PSP", "Hobbies & Games / Video Games / Playstation", "Hobbies & Games / Video Games / RPG", "Hobbies & Games / Video Games / Racing", "Hobbies & Games / Video Games / X-Box", "Home & Garden", "Home & Garden / Appliances", "Home & Garden / Environmental Safety", "Home & Garden / Gardening/Landscaping", "Home & Garden / Home Repair", "Home & Garden / Interior Decorating", "News & Current Affairs", "News & Current Affairs / Law & Politics", "News & Current Affairs / Law & Politics / Immigration", "News & Current Affairs / Law & Politics / Legal Issues", "News & Current Affairs / Law & Politics / U.S. 
Government Resources", "Parenting & Family", "Parenting & Family / Adoption", "Parenting & Family / Babies & Toddlers", "Parenting & Family / Daycare/Pre-School", "Parenting & Family / Parenting Children", "Parenting & Family / Parenting Teens", "Parenting & Family / Pregnancy", "Parenting & Family / Special Needs Kids", "Pets", "Pets / Aquariums", "Pets / Cats", "Pets / Dogs", "Pets / Veterinary Medicine", "Real Estate", "Real Estate / Apartments", "Real Estate / Architecture", "Real Estate / Buying/Selling Homes", "Religion", "Religion / Alternative Religions", "Religion / Atheism/Agnosticism", "Religion / Buddhism", "Religion / Catholicism", "Religion / Christianity", "Religion / Hinduism", "Religion / Islam", "Religion / Judaism", "Religion / Latter-Day Saints", "Religion / Pagan/Wiccan", "Science", "Science / Astronomy", "Science / Biology", "Science / Chemistry", "Science / Geology", "Science / Physics", "Sensitive Content", "Sensitive Content / Gambling", "Sensitive Content / Gambling / Sports Gambling", "Society", "Society / Dating", "Society / Divorce", "Society / Gay Life", "Society / Marriage", "Society / Senior Living", "Society / Weddings", "Sports & Recreation", "Sports & Recreation / Auto Racing", "Sports & Recreation / Auto Racing / NASCAR Racing", "Sports & Recreation / Baseball", "Sports & Recreation / Basketball", "Sports & Recreation / Bicycling", "Sports & Recreation / Bicycling / Mountain Biking", "Sports & Recreation / Bodybuilding", "Sports & Recreation / Boxing", "Sports & Recreation / Canoeing/Kayaking", "Sports & Recreation / Cheerleading", "Sports & Recreation / Climbing", "Sports & Recreation / College Sports", "Sports & Recreation / Cricket", "Sports & Recreation / Figure Skating", "Sports & Recreation / Fishing", "Sports & Recreation / Fishing / Fly Fishing", "Sports & Recreation / Fishing / Freshwater Fishing", "Sports & Recreation / Fishing / Game & Fish", "Sports & Recreation / Fishing / Saltwater Fishing", "Sports & Recreation / 
Football", "Sports & Recreation / Golf", "Sports & Recreation / Horses", "Sports & Recreation / Horses / Horse Racing", "Sports & Recreation / Hunting/Shooting", "Sports & Recreation / Ice Hockey", "Sports & Recreation / Inline Skating", "Sports & Recreation / Martial Arts", "Sports & Recreation / Olympics", "Sports & Recreation / Paintball", "Sports & Recreation / Rodeo", "Sports & Recreation / Rugby", "Sports & Recreation / Running/Walking", "Sports & Recreation / Sailing", "Sports & Recreation / Scuba Diving", "Sports & Recreation / Skateboarding", "Sports & Recreation / Skiing", "Sports & Recreation / Snowboarding", "Sports & Recreation / Soccer", "Sports & Recreation / Surfing/Bodyboarding", "Sports & Recreation / Swimming", "Sports & Recreation / Table Tennis/Ping-Pong", "Sports & Recreation / Tennis", "Sports & Recreation / Volleyball", "Sports & Recreation / Waterski/Wakeboard", "Sports & Recreation / Yachting", "Style & Fashion", "Style & Fashion / Body Art", "Style & Fashion / Cosmetics", "Style & Fashion / Fashion", "Style & Fashion / Jewelry", "Technology & Computing", "Technology & Computing / Cameras & Camcorders", "Technology & Computing / Cell Phones", "Technology & Computing / Computer Certification", "Technology & Computing / Computer Networking", "Technology & Computing / Computer Peripherals", "Technology & Computing / Computer Security", "Technology & Computing / Computer Security / Antivirus Software", "Technology & Computing / Computer Security / Network Security", "Technology & Computing / Databases", "Technology & Computing / Graphics", "Technology & Computing / Graphics / 3-D Graphics", "Technology & Computing / Graphics / Animation", "Technology & Computing / Graphics / Desktop Publishing", "Technology & Computing / Graphics / Desktop Video", "Technology & Computing / Graphics / Web Design/HTML", "Technology & Computing / Home Theater Systems", "Technology & Computing / Operating Systems", "Technology & Computing / Operating Systems / 
Linux", "Technology & Computing / Operating Systems / Mac OS", "Technology & Computing / Operating Systems / Unix", "Technology & Computing / Operating Systems / Windows", "Technology & Computing / Portable Device", "Technology & Computing / Programming", "Technology & Computing / Programming / C/C++", "Technology & Computing / Programming / Java", "Technology & Computing / Programming / JavaScript", "Technology & Computing / Programming / Visual Basic", "Travel", "Travel / Adventure Travel", "Travel / Africa", "Travel / Air Travel", "Travel / Asia", "Travel / Asia / Japan", "Travel / Australia & New Zealand", "Travel / Bed & Breakfasts", "Travel / Budget Travel", "Travel / Business Travel", "Travel / Camping", "Travel / Canada", "Travel / Caribbean", "Travel / Cruises", "Travel / Europe", "Travel / Europe / Eastern Europe", "Travel / Europe / France", "Travel / Europe / Greece", "Travel / Europe / Italy", "Travel / Europe / United Kingdom", "Travel / Honeymoons/Getaways", "Travel / Hotels", "Travel / Mexico & Central America", "Travel / National Parks", "Travel / South America", "Travel / Spas", "Travel / Theme Parks", "Travel / United States", "Travel / United States / California", "Travel / United States / Florida", "Travel / United States / Hawaii", "Travel / United States / Las Vegas, Nevada", "Travel / United States / Manhattan, New York", "Travel / United States / New England", "Travel / United States / Texas", "Travel / Weather"]

I cleaned up the data file and split it, so it now looks like this:

['Arts & Entertainment']
['Arts & Entertainment', 'Animation & Comics']
['Arts & Entertainment', 'Books & Literature']
['Arts & Entertainment', 'Celebrity Gossip']
['Arts & Entertainment', 'Fine Art']
['Arts & Entertainment', 'Humor']
['Arts & Entertainment', 'Movies']
['Arts & Entertainment', 'Movies', 'Action']
['Arts & Entertainment', 'Movies', 'Comedy']
['Arts & Entertainment', 'Movies', 'Documentary']
['Arts & Entertainment', 'Movies', 'Drama']
['Arts & Entertainment', 'Movies', 'Horror']
['Arts & Entertainment', 'Music']
['Arts & Entertainment', 'Music', 'Alternative Music']
['Arts & Entertainment', 'Music', 'Blues']
['Arts & Entertainment', 'Music', 'Christian Music']
['Arts & Entertainment', 'Music', 'Classic Rock']
['Arts & Entertainment', 'Music', 'Classical Music']
['Arts & Entertainment', 'Music', 'Country Music']
['Arts & Entertainment', 'Music', 'Electronic Dance Music']
['Arts & Entertainment', 'Music', 'Heavy Metal']
['Arts & Entertainment', 'Music', 'Pop Music']
['Arts & Entertainment', 'Music', 'Rap']
['Arts & Entertainment', 'Radio Stations']
['Arts & Entertainment', 'Television']
['Arts & Entertainment', 'Television', 'Game Show']
['Arts & Entertainment', 'Television', 'Kids']
['Arts & Entertainment', 'Television', 'News']
['Arts & Entertainment', 'Television', 'Reality']
['Arts & Entertainment', 'Television', 'Science']
['Arts & Entertainment', 'Television', 'Sitcom']
['Arts & Entertainment', 'Television', 'Soap Opera']
['Arts & Entertainment', 'Television', 'Talk Show']...

Now I am trying to convert the list objects into a dictionary like the following:

{
    "Arts & Entertainment": {
        "Animation & Comics": {}, 
        "Books & Literature": {}, 
        "Celebrity Gossip": {}, 
        "Fine Art": {}, 
        "Humor": {}, 
        "Movies": {
            "Horror": {},
            "Action": {},
            "Comedy": {}, ...
        }, ...
}

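The problem is that I can't figure out how to avoid overwriting my subcategories. In the example above, the Movies sub-key has three categories, but when I run my code it contains only the key "Horror", because Horror is the last element of the last list in that category. An example of what I get: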

{
    "Arts & Entertainment": {
        "Animation & Comics": {}, 
        "Books & Literature": {}, 
        "Celebrity Gossip": {}, 
        "Fine Art": {}, 
        "Humor": {}, 
        "Movies": {
            "Horror": {} # notice there are no other categories in the movies section
        }, ...
}

The code I tried:

def cleanup_contextweb():
  contextweb_file_path = directory_path + raw_file_names[1]
  tree = {}
  with open(contextweb_file_path, 'r') as contextweb_file:
    cats = contextweb_file.read().replace('Manhattan, New York', 'Manhattan New York').replace('Las Vegas, Nevada', 'Las Vegas Nevada').replace('Celebrity/Gossip', 'Celebrity Gossip').replace('Atheism/Agnosticism', 'Atheism Agnosticism').replace('Pagan/Wiccan', 'Pagan Wiccan').split(',')
    #cats = re.sub(r'"|\[|\]', '', cats)
    cats = [map(str.strip, re.sub(r'"|\[|\]', '', cat).split('/')) for cat in cats]
    cats = sorted(cats)
    for cat in cats:
      if len(cat) == 1:
        tree[cat[0]] = {}
      elif len(cat) == 2:
        tree[cat[0]][cat[1]] = {}
      elif len(cat) == 3:
        tree[cat[0]][cat[1]] = {}
        tree[cat[0]][cat[1]][cat[2]] = {}
      elif len(cat) == 4:
        tree[cat[0]][cat[1]] = {}
        tree[cat[0]][cat[1]][cat[2]] = {}
        tree[cat[0]][cat[1]][cat[2]][cat[3]] = {}
  with open(directory_path + 'cleaned_' + raw_file_names[1], 'w') as contextweb_file_out:
    json.dump(tree, contextweb_file_out, sort_keys=True, indent=4)

  return json.dumps(tree, sort_keys=True, indent=4)

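As you can see, I am trying to build the dictionary based on how deep I need to go (how many keys I need), which I know from the length of the incoming list. Other things I tried, but have since deleted, include sorting the list of lists (cats) by sublist length and reversing it, so that all the lists with 4 elements are iterated first. I thought I could build the keys that way, since the keys would then already exist at the lower levels. That didn't really help.

Best answer

Actually, a plain for loop can also produce a very nice solution: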

>>> data
[['a', 'b', 'c', 'd'], ['a', 'b', 'c'], ['a', 's', 'd'], ['a', 'b', 'c', 'd', 'e']]
>>> tree = {}
>>> for cats in data:
...      curtree = tree
...      for c in cats:
...          curtree = curtree.setdefault(c, {})
... 
>>> tree
{'a': {'s': {'d': {}}, 'b': {'c': {'d': {'e': {}}}}}}

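The .setdefault() method ensures that a sub-dictionary is added if and only if the key (category) did not exist before. If the key already exists, the existing sub-dictionary is returned unchanged, so earlier subcategories are never overwritten.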
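To make the difference concrete, here is a small sketch contrasting plain assignment, which is what the question's code does at each level, with setdefault. The variable names `overwritten`, `preserved`, and `movies` are illustrative and not taken from the question.

```python
# Plain assignment replaces whatever was stored under the key.
overwritten = {"Movies": {"Action": {}}}
overwritten["Movies"] = {}             # the existing "Action" entry is lost
overwritten["Movies"]["Horror"] = {}

# setdefault returns the existing value when the key is present
# and only inserts the default when the key is missing.
preserved = {"Movies": {"Action": {}}}
movies = preserved.setdefault("Movies", {})  # existing dict is returned
movies["Horror"] = {}

print(overwritten)
print(preserved)
```

In the first case only "Horror" survives under "Movies"; in the second, both "Action" and "Horror" are kept.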

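curtree starts at the base dictionary tree and uses the categories to traverse/build the tree: for each category it steps down one level, creating that level only if it is missing.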
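The same loop can be applied directly to the path strings from the question. The following is a minimal sketch, not the answerer's code: `build_tree` and its `sep` parameter are hypothetical names, and it assumes the hierarchy separator is always " / " with surrounding spaces, which appears to hold in the sample data.

```python
import json


def build_tree(paths, sep=" / "):
    """Build a nested dict from strings such as 'A / B / C'.

    Assumption: levels are separated by `sep` (slash with spaces),
    while a bare '/' inside a name is part of the name itself.
    """
    tree = {}
    for path in paths:
        node = tree
        for part in path.split(sep):
            # Descend one level, creating it only when it is missing.
            node = node.setdefault(part.strip(), {})
    return tree


sample = [
    "Arts & Entertainment",
    "Arts & Entertainment / Movies",
    "Arts & Entertainment / Movies / Action",
    "Arts & Entertainment / Movies / Horror",
    "Arts & Entertainment / Celebrity/Gossip",
]
tree = build_tree(sample)
print(json.dumps(tree, sort_keys=True, indent=4))
```

Splitting on " / " rather than "/" would keep names such as "Celebrity/Gossip" in one piece, which might make the chain of replace() calls in the question unnecessary, provided the input really follows that spacing convention.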

Regarding python - Convert a list of lists to a dictionary, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/33221020/
