python - Extracting URLs from elements in a list

Tags: python pandas for-loop twitter

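I have a list, json_response, that contains Twitter data (including image URLs). I am trying to extract the URLs from the ['includes']['media'] object. However, most elements in the list have no ['media'] key, which I think is what makes the loop fail. When I run the code I get KeyError: 'media', even though I set row['image_url'] = None in the loop to account for list elements without ['media'].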

I have provided a sample of json_response. However, because of Stack Overflow's restrictions on posting URLs, the actual URLs have been replaced.

print(json.dumps(json_response[10:13], indent=4, sort_keys=True))  # look at json_response object.

[
    {
        "data": [
            {
                "author_id": "125700232",
                "created_at": "2021-12-31T07:13:04.000Z",
                "id": "1476813641265549317",
                "text": "You can\u2019t be a democrat or a liberal or progressive & besties with racists who radicalize people like this. \n\nI\u2019ve never publicly named him but since he blocked me years ago for holding him accountable, maybe I will."
            },
            {
                "author_id": "800464894361382912",
                "created_at": "2021-12-27T12:17:25.000Z",
                "id": "1475440681258737673",
                "text": "For $9 an hour, I was told to kill myself over a confusing sale sign, I'd been called worthless and stupid weekly. I've had things thrown at me, been spat on. A customer blocked me from coming on a bus so I couldn't go home. If people were kind to begin with, more would \"show up\""
            },
            {
                "attachments": {
                    "media_keys": [
                        "3_1474448924249407490"
                    ]
                },
                "author_id": "1390363055150845959",
                "created_at": "2021-12-24T18:36:32.000Z",
                "id": "1474448926891782149",
                "text": "Blocked by China boy. Spy banging snowflake @RepSwalwell"
            },
            {
                "author_id": "196428643",
                "created_at": "2021-12-21T22:22:15.000Z",
                "id": "1473418564430229505",
                "text": "I replied to an Eric Swalwell lame tweet with a Fang Fang reference yesterday and he blocked me.  Then suddenly my account was hacked and my account linked email was changed from a Manhattan ISP.  I don't think it was a coincidence."
            },
            {
                "attachments": {
                    "media_keys": [
                        "3_1462187451292819458",
                        "3_1462187494385065994"
                    ]
                },
                "author_id": "25871358",
                "created_at": "2021-11-20T22:40:05.000Z",
                "id": "1462189029919805450",
                "text": "Pearl clutch elsewhere about @RepSwalwell unfollowing you, when I told you you were a lying gaslighting jackwagon and you blocked me. Truth hurts."
            },
            {
                "author_id": "1251510910390337536",
                "created_at": "2021-10-30T01:40:32.000Z",
                "id": "1454261909759406086",
                "text": "Eric Swalwell blocked me tonight \ud83d\ude02"
            },
            {
                "author_id": "15790644",
                "created_at": "2021-07-23T20:11:58.000Z",
                "id": "1418665211221925889",
                "text": "Twitter won't allow me to follow anyone.\n\nAlso, tried to retweet Eric Swalwell's tweet and it blocked me.  And other tweets...\n\nGuess I'm like a mosquito buzzing around the head of  Jack."
            },
            {
                "attachments": {
                    "media_keys": [
                        "3_1411309517317586945"
                    ]
                },
                "author_id": "107575508",
                "created_at": "2021-07-03T13:03:05.000Z",
                "id": "1411309521251745796",
                "text": "This tweet was blocked by Twitter for retweets and quotes. In summary a team member of Eric Swalwell illegally entered Mo Brooks home to serve papers and assaulted Brooks wife. There is security camera footage. Papers being serve claim Brooks caused Jan. 6 \u2018riot\u2019."
            },
            {
                "author_id": "26182604",
                "created_at": "2021-06-07T03:58:01.000Z",
                "id": "1401750267121524738",
                "text": "I can't @ him because my words hurt his feeling and he blocked me. LOL! CNN : Democratic Rep. Eric Swalwell's suit seeks to hold Brooks, ex-President Trump and others liable for the January 6 attack."
            },
            {
                "author_id": "258617217",
                "created_at": "2021-04-06T20:37:13.000Z",
                "id": "1379533675772186630",
                "text": "George Webb Blocked me.\nI guess because I pointed out that in one of his books he connected a lady  FANG FANG from Wuhan as the same Fang Fang CCP agent that tried to seduce Eric Swalwell, \n\n2 different ladies.\nThat wasn't nice Mr. Webb"
            }
        ],
        "includes": {
            "media": [
                {
                    "media_key": "3_1474448924249407490",
                    "type": "photo",
                    "url": "url here"
                },
                {
                    "media_key": "3_1462187451292819458",
                    "type": "photo",
                    "url": "url here"
                },
                {
                    "media_key": "3_1462187494385065994",
                    "type": "photo",
                    "url": "url here"
                },
                {
                    "media_key": "3_1411309517317586945",
                    "type": "photo",
                    "url": "url here"
                }
            ],
            "users": [
                {
                    "created_at": "2010-03-23T15:50:53.000Z",
                    "description": "founder of Melanated Mingle|licensed psychotherapist|psych prof|Latin\u00e8 Ph.D|chingona",
                    "id": "125700232",
                    "name": "Dr. Lisa Xochitl Vallejos, Ph.D., LPC",
                    "username": "realdocv"
                },
                {
                    "created_at": "2016-11-20T22:24:37.000Z",
                    "description": "31, she/her",
                    "id": "800464894361382912",
                    "name": "Fluke \ud83d\udc99",
                    "username": "flukefancy"
                },
                {
                    "created_at": "2021-05-06T17:49:30.000Z",
                    "description": "",
                    "id": "1390363055150845959",
                    "name": "kbark",
                    "username": "kbark23500486"
                },
                {
                    "created_at": "2010-09-29T02:29:17.000Z",
                    "description": "",
                    "id": "196428643",
                    "name": "Bird Dog \u00d3 S\u00failleabh\u00e1in",
                    "username": "AntiqueSully"
                },
                {
                    "created_at": "2009-03-22T20:12:14.000Z",
                    "description": "I don't know what a Hoosier is, either.",
                    "id": "25871358",
                    "name": "Misty",
                    "username": "mialynneb"
                },
                {
                    "created_at": "2020-04-18T14:00:37.000Z",
                    "description": "@Rachlsbored @KittyKattKeee @Rebeccahansonn @BasedHabits @bxtchbabyy @Sexxcel @lizhomlesvoice  @psychoness_xo @VOLTRON4444 @jessiprincey @MartinaMarkota: CEO",
                    "id": "1251510910390337536",
                    "name": "\u1587\u15e9\u15ea\u15ea\u01b3\u26a1\ufe0f\u26a1\ufe0f",
                    "username": "_RadicalReality"
                },
                {
                    "created_at": "2008-08-09T17:27:58.000Z",
                    "description": "Really. There are conservatives in New York! #MAGA",
                    "id": "15790644",
                    "name": "SueDinNY",
                    "username": "SueDinNY"
                },
                {
                    "created_at": "2010-01-23T01:24:09.000Z",
                    "description": "Army Vet, served in M.I. Unit. If you disagree with me, it\u2019s because you haven\u2019t seen what I\u2019ve seen. Proud Supporter of @realDonaldTrump #maga",
                    "id": "107575508",
                    "name": "Margaret Briem",
                    "username": "LivethLifeULove"
                },
                {
                    "created_at": "2009-03-24T05:09:49.000Z",
                    "description": "Dad. IT guy. Linux geek. TTRPG fan. Critter. Drone Pilot. Sometimes I go outside. (he/him)",
                    "id": "26182604",
                    "name": "Wayne Edgar",
                    "username": "zerovertex"
                },
                {
                    "created_at": "2011-02-28T03:06:07.000Z",
                    "description": "Author-Film Maker-Researcher-Artist-Peace Seeker\n",
                    "id": "258617217",
                    "name": "F\u04e8\u042fBIDD\u03a3\u041f FI\u1102\u03a3\u01a7 \u01acV",
                    "username": "TMV_intel"
                }
            ]
        },
        "meta": {
            "newest_id": "1476813641265549317",
            "next_token": "b26v89c19zqg8o3fosqt4kos8ff8dfq3on3e08qcqvngd",
            "oldest_id": "1379533675772186630",
            "result_count": 10
        }
    },
    {
        "data": [
            {
                "attachments": {
                    "media_keys": [
                        "3_1311261760222101505"
                    ]
                },
                "author_id": "395236271",
                "created_at": "2020-09-30T11:10:19.000Z",
                "id": "1311262093992132610",
                "text": "Steve Knight didn't like it when I pointed out that his \"Trump didn't say nazis are fine people\" run against all visual evidence we have on the Unite the Right rally, and so the \"Free Speech Champion\" blocked me.\n\nFucking snowflake."
            }
        ],
        "includes": {
            "media": [
                {
                    "media_key": "3_1311261760222101505",
                    "type": "photo",
                    "url": "url here"
                }
            ],
            "users": [
                {
                    "created_at": "2011-10-21T10:49:03.000Z",
                    "description": "Harmless but a bit insane.",
                    "id": "395236271",
                    "name": "Lu\u00eds Dias",
                    "username": "lmldias"
                }
            ]
        },
        "meta": {
            "newest_id": "1311262093992132610",
            "next_token": "b26v89c19zqg8o3fn0mljncu0v5ci7xlbm3agsunyikxp",
            "oldest_id": "1311262093992132610",
            "result_count": 1
        }
    },
    {
        "data": [
            {
                "attachments": {
                    "media_keys": [
                        "3_1471578541368221703"
                    ]
                },
                "author_id": "1442527297773326344",
                "created_at": "2021-12-16T20:30:40.000Z",
                "id": "1471578543385677830",
                "text": "Hahaha @NancyPelosi @SpeakerPelosi staff has blocked me from tweeting to them! Why are they so afraid of the truth?"
            },
            {
                "attachments": {
                    "media_keys": [
                        "3_1469091211038404613"
                    ]
                },
                "author_id": "864826449601019905",
                "created_at": "2021-12-09T23:48:02.000Z",
                "id": "1469091500264935424",
                "text": "I'm blocked by Elizabeth Warren, Nancy Pelosi and now Karlyn. Interesting pattern. ;)"
            },
            {
                "attachments": {
                    "media_keys": [
                        "3_1465403503354990595"
                    ]
                },
                "author_id": "1083551928821448710",
                "created_at": "2021-11-29T19:33:16.000Z",
                "id": "1465403505045393418",
                "text": "I just realized that I've been blocked by Nancy Pelosi's daughter \ud83d\ude02"
            },
            {
                "attachments": {
                    "media_keys": [
                        "3_1462066354568273930"
                    ]
                },
                "author_id": "844569319409405958",
                "created_at": "2021-11-20T14:32:40.000Z",
                "id": "1462066368921100293",
                "text": "Tried to tag Drunk Nancy Pelosi. She Blocked me. Or shall I say her assistant blocked me. LMAO. They don\u2019t want the truth out. I don\u2019t care who she is. I front her out."
            },
            {
                "author_id": "3921070047",
                "created_at": "2021-11-03T04:26:18.000Z",
                "id": "1455753176322347009",
                "text": "\"Nancy Pelosi is not going to change your lifestyle, I can, but you've blocked me and hald of mules...\""
            },
            {
                "author_id": "345120618",
                "created_at": "2021-10-03T02:04:28.000Z",
                "id": "1444483458164670467",
                "text": "Blocked by Nancy Pelosi? I'm jealous."
            },
            {
                "author_id": "1227381497277095937",
                "created_at": "2021-10-03T00:49:28.000Z",
                "id": "1444464586506350595",
                "text": "Gosh. I can only dream of being blocked by a trash receptacle like Nancy Pelosi. What a badge of honor \ud83c\udf96 it would  be. I'll just have to keep trying.\ud83d\ude0e\ud83c\uddfa\ud83c\uddf8"
            },
            {
                "attachments": {
                    "media_keys": [
                        "3_1444377454018138115"
                    ]
                },
                "author_id": "918169011602386944",
                "created_at": "2021-10-02T19:03:40.000Z",
                "id": "1444377560385658880",
                "text": "Anybody else blocked by Nancy Pelosi? \n\nI thought it was illegal for government people to block us?"
            },
            {
                "author_id": "783746267222462464",
                "created_at": "2021-08-09T21:23:09.000Z",
                "id": "1424843718042001411",
                "text": "\" This page has been blocked by Microsoft Edge\"\n--\nSidney Powell Discusses the FBI & Nancy Pelosi\u2019s Role In The January 6th FALSE FLAG"
            },
            {
                "author_id": "158064102",
                "created_at": "2021-08-08T12:52:20.000Z",
                "id": "1424352780777508864",
                "text": "Nancy Pelosi's daughter blocked me?? sweet old little me!!"
            },
            {
                "author_id": "9484732",
                "created_at": "2021-08-02T18:30:09.000Z",
                "id": "1422263466396733441",
                "text": "Democratic leadership didn't have the votes for an extension of the eviction moratorium and were blocked by Republicans from attempting to get around their internal divisions by passing a shorter-term extension through Oct. 18. via @siobhanehughes"
            },
            {
                "author_id": "1381073800624660484",
                "created_at": "2021-07-31T22:47:32.000Z",
                "id": "1421603462643535873",
                "text": "Joe Biden\n> is spending a lot on defense that could be used to create a debt free design\n> is hiding behind Nancy Pelosi and other women in his life \n> can cancel student debt\n> if he's being blocked by the DoD then he actually can't do it"
            },
            {
                "author_id": "1278119139601715201",
                "created_at": "2021-07-27T18:41:31.000Z",
                "id": "1420091998590099458",
                "text": "What?? I got blocked by her because I said victim blaming about Elise stefanik blaming Nancy pelosi for Jan 6th. I went to answer her and I\u2019m blocked. Ppl are seriously reactionary. Geez!"
            },
            {
                "author_id": "1367196589291364357",
                "created_at": "2021-07-16T13:58:01.000Z",
                "id": "1416034388400353281",
                "text": "They were blocked by Nancy Pelosi"
            },
            {
                "author_id": "3001635726",
                "created_at": "2021-07-09T04:06:48.000Z",
                "id": "1413348887855828997",
                "text": "Blocked by Nancy Pelosi who then staged her laptop to be stolen"
            },
            {
                "author_id": "19845473",
                "created_at": "2021-07-02T03:54:13.000Z",
                "id": "1410809005258256393",
                "text": "Fox News @ChadPergram blocked me.  Don't worry he didn't fail to ask Nancy Pelosi about 49ers. News."
            },
            {
                "author_id": "979513121541967873",
                "created_at": "2021-06-01T01:37:11.000Z",
                "id": "1399540499552378881",
                "text": "Unarmed Ashli Babbitt... Behind doors that were blocked by furniture.... what threat did she pose\u2049\ufe0f\nZero.... Zero... Zero Threat\u203c\ufe0f A scared, slimy POS backed by Nancy Pelosi took her life & has been protected\u203c\ufe0f"
            },
            {
                "author_id": "1394830598087249924",
                "created_at": "2021-05-31T17:56:40.000Z",
                "id": "1399424603605323783",
                "text": "Nancy Pelosi blocked me. Badge of honor"
            },
            {
                "author_id": "969989169186557953",
                "created_at": "2021-05-21T11:15:17.000Z",
                "id": "1395699716659286018",
                "text": "Nancy Pelosi\u2019s daughter blocked me on Twitter"
            },
            {
                "attachments": {
                    "media_keys": [
                        "3_1385809440830484481"
                    ]
                },
                "author_id": "803311702032850944",
                "created_at": "2021-04-24T04:14:52.000Z",
                "id": "1385809443015831557",
                "text": "Just realized that big @danrodimer blocked me. How can he stand up to Nancy pelosi when he can't even stand up to me posting his old campaign video? #txlege #TXpolitics "
            },
            {
                "author_id": "1213210549732855808",
                "created_at": "2021-04-11T08:26:15.000Z",
                "id": "1381161660593758212",
                "text": "thinking about the fact that on my old account I was blocked by Nancy Pelosi's daughter"
            },
            {
                "author_id": "2836412739",
                "created_at": "2021-03-29T21:30:42.000Z",
                "id": "1376648033430044679",
                "text": "Lol, corrupt scumbag Nancy Pelosi blocked me. #Corruption She doesn\u2019t want her sleepy followers to see the truth."
            },
            {
                "attachments": {
                    "media_keys": [
                        "3_1373636857561513987"
                    ]
                },
                "author_id": "969989169186557953",
                "created_at": "2021-03-21T14:05:23.000Z",
                "id": "1373636860635983873",
                "text": "Nancy Pelosi\u2019s daughter blocked me too. I honestly need to make a Hall of Fame for those who have blocked me"
            },
            {
                "author_id": "1169798579768180736",
                "created_at": "2021-03-18T00:50:21.000Z",
                "id": "1372349621033443329",
                "text": "FellowAMERICANS #BlackLivesMatter @NAACP_LDF #African #Muslim We #UMMAABroadcasting BLOCKED_by #Facebook #Gmail We_DEMAND #HumanRights of  Work_Class(80% #USA One_BillionAfrican #Blacks 2.5Billion #Muslims )& #JoeBiden #KamalaHarris #NancyPelosi @POTUS @VP @SpeakerPelosi MUST_ACT"
            }
        ],
        "includes": {
            "media": [
                {
                    "media_key": "3_1471578541368221703",
                    "type": "photo",
                    "url": "url here"
                },
                {
                    "media_key": "3_1469091211038404613",
                    "type": "photo",
                    "url": "url here"
                },
                {
                    "media_key": "3_1465403503354990595",
                    "type": "photo",
                    "url": "url here"
                },
                {
                    "media_key": "3_1462066354568273930",
                    "type": "photo",
                    "url": "url here"
                },
                {
                    "media_key": "3_1444377454018138115",
                    "type": "photo",
                    "url": "url here"
                },
                {
                    "media_key": "3_1385809440830484481",
                    "type": "photo",
                    "url": "url here"
                },
                {
                    "media_key": "3_1373636857561513987",
                    "type": "photo",
                    "url": "url here"
                }
            ],
            "users": [
                {
                    "created_at": "2021-09-27T16:31:37.000Z",
                    "description": "Don't Tread On Me! Trump 2024. Patriot, Anti-Socialist, Pro-1st & 2nd Amendment. Pro-FREEDOM. I AM MAGA! #IamMAGA\nMelting Snowflake Brains with my Salty Tweets!",
                    "id": "1442527297773326344",
                    "name": "Patriot USA \ud83c\uddfa\ud83c\uddf8",
                    "username": "I_am_MAGA_USA"
                },
                {
                    "created_at": "2017-05-17T12:54:28.000Z",
                    "description": "#LetsGoBrandon #FJB #WitchesForTrump #MagicalPersistence  #LibertariansForTrump  #PeaceLoveLiberty #PatriotPaganPride #Cult45",
                    "id": "864826449601019905",
                    "name": "The\u26e4Tower\u26e4Falls",
                    "username": "Gwenhwyfar7Aine"
                },
                {
                    "created_at": "2019-01-11T02:31:26.000Z",
                    "description": "GETTR - @TheJohnD \n\nGAB - @John_Deplorable",
                    "id": "1083551928821448710",
                    "name": "John D \u2022",
                    "username": "RedWingGrips"
                },
                {
                    "created_at": "2015-10-10T19:42:38.000Z",
                    "description": "\u3134\u3147\u3139 \u3142\u3137\u3145\u314c\u3134\u314c\u3139",
                    "id": "3921070047",
                    "name": "\u2728\ud83e\udd88\u2622\ud83d\udd1e",
                    "username": "Dystar924"
                },
                {
                    "created_at": "2011-07-30T02:54:55.000Z",
                    "description": "Living the good life in sunny Scottsdale, Arizona.",
                    "id": "345120618",
                    "name": "Bill Deegan",
                    "username": "RealBillDeegan"
                },
                {
                    "created_at": "2017-10-11T17:38:45.000Z",
                    "description": "Don't most of us rely on a single strand for happiness?\n\nAfter being a Single Mom & CFO, I was ready to LIVE!\n\nA drunk driver stole that \ud83d\udcaf",
                    "id": "918169011602386944",
                    "name": "Caren R \ud83c\uddfa\ud83c\uddf8\ud83c\uddee\ud83c\uddf1\ud83c\uddec\ud83c\udde7",
                    "username": "BritishCaren"
                },
                {
                    "created_at": "2015-01-15T17:32:12.000Z",
                    "description": "I did stuff in special education. I\u2019ll always defend the public schools. Progressive feminist and registered Democrat since before you were born.",
                    "id": "2984412230",
                    "name": "Kay D\u2019Antonio",
                    "username": "KayDA26"
                },
                {
                    "created_at": "2016-10-05T19:10:46.000Z",
                    "description": "GETTR handle: Murt32_1943 @murt32\n\nForever America First. Always MAGA\n\nAdjectives: Brilliant/Gorgeous \n\nSupports the LGBFJB community\n\nSorry, I don't do DM's.",
                    "id": "783746267222462464",
                    "name": "murt32\ud83c\uddfa\ud83c\uddf8 \ud83c\udf40",
                    "username": "murt32_1943"
                },
                {
                    "created_at": "2014-08-04T22:58:39.000Z",
                    "description": "Finance |Filmmaker\ud83c\udfac| 2A Advocate\ud83e\uddf9| Content Creator \ud83c\udf9e|Political Commentary\ud83d\udce1| Senior Director \u270f|Engineer | Humility is a journey we must all take.",
                    "id": "2733732880",
                    "name": "Somebody's Uncle",
                    "username": "Dariusr0berts"
                },
                {
                    "created_at": "2020-01-03T21:28:46.000Z",
                    "description": "18,Will buy Origami Angel Merch DM me!!!! (He/Him) Private // @SadHammyFan",
                    "id": "1213210549732855808",
                    "name": "Mess",
                    "username": "punk_matthew"
                },
                {
                    "created_at": "2014-10-18T19:41:25.000Z",
                    "description": "Musician, composer, luthier, digital warrior, Patriot, #MAGA\ud83c\uddfa\ud83c\uddf8\ud83c\uddfa\ud83c\uddf8\ud83c\uddfa\ud83c\uddf8 Q, #Trump 2020!, Save the children from the Peds!",
                    "id": "2836412739",
                    "name": "Truth Hurts",
                    "username": "TruthHurtu2"
                },
                {
                    "created_at": "2019-09-06T02:26:30.000Z",
                    "description": "JOURNALIST in MEMPHIS; Our WatsApp & 2Facebooks BLOCKED, by ENEMIES of our US Constitution.",
                    "id": "1169798579768180736",
                    "name": "Arshad Khan, UMMAA Broadcasting, Rolla, MO, USA",
                    "username": "arshad_usa"
                }
            ]
        },
        "meta": {
            "newest_id": "1471578543385677830",
            "next_token": "b26v89c19zqg8o3fosqrfh7sqsqc9rs7aukssfoknvuyl",
            "oldest_id": "1372349621033443329",
            "result_count": 36
        }
    }
]

The code that should retrieve the URLs from ['includes']['media']:

for each_dict in json_lite:

    row = {}  # empty dict for data

    # 3. loop for user object
    row['image_url'] = None  # assuming user has no image url
    for user in each_dict['includes']['media']:
        # 5. user url
        # check for url of the current user only
        if 'url' in user['url']:
            row['image_url'] = user.get('url')  # if user has url
            break  # break the loop, as url is found
            
    url_df = url_df.append(row, ignore_index=True)  # append data to empty url_df
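
One likely reading of the error: the lookup each_dict['includes']['media'] is evaluated in the for statement itself, before the loop body runs, so the None default assigned to row['image_url'] is set but never gets a chance to matter. A minimal sketch that reproduces this, assuming a made-up two-element list (`sample_pages` is hypothetical, not the real data):

```python
# Hypothetical sample: the first page has media, the second has none.
sample_pages = [
    {"includes": {"media": [{"media_key": "3_1", "type": "photo", "url": "url here"}]}},
    {"includes": {"users": [{"id": "1", "username": "someone"}]}},
]

failed_at = None
for index, each_dict in enumerate(sample_pages):
    row = {"image_url": None}  # the default is set...
    try:
        # ...but this subscript raises before any loop iteration starts
        for media in each_dict["includes"]["media"]:
            row["image_url"] = media.get("url")
            break
    except KeyError as err:
        failed_at = (index, err.args[0])

print(failed_at)  # (1, 'media')
```

The failure is reported for the second element only, and the missing key is 'media', which matches the KeyError described above.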

Best answer

Not exactly the way you asked, but you could consider simply using a regex:

import json
import re

# data is the list of response dicts (json_response in the question)
urls = re.findall(r'"url": "([^"]*)"', json.dumps(data))

Output:

['url here',
 'url here',
 'url here',
 'url here',
 'url here',
 'url here',
 'url here',
 'url here',
 'url here',
 'url here',
 'url here',
 'url here']
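
If a loop closer to the question's code is preferred, one possible sketch replaces the direct subscripts with `dict.get()` and a default, so pages without 'media' (or without 'includes') fall through to None instead of raising. The helper name `first_image_url` and the `sample_pages` list are illustrative assumptions, not part of the original post.

```python
def first_image_url(page):
    """Return the first media URL in one response page, or None if there is none."""
    # .get() with a default avoids KeyError when 'includes' or 'media' is absent
    for media in page.get("includes", {}).get("media", []):
        url = media.get("url")  # some media entries may carry no 'url' key
        if url:
            return url
    return None


# Hypothetical sample pages shaped like the question's json_response
sample_pages = [
    {"includes": {"media": [{"media_key": "3_1", "type": "photo", "url": "url here"}]}},
    {"includes": {"users": [{"id": "1", "username": "someone"}]}},  # no 'media'
    {"data": []},  # no 'includes' at all
]

rows = [{"image_url": first_image_url(page)} for page in sample_pages]
print(rows)
```

The collected `rows` list can then be passed to `pd.DataFrame(rows)` in a single step. `DataFrame.append`, which the question uses, was deprecated in pandas 1.4 and removed in pandas 2.0, so building the frame from a list of dicts is the safer route on current versions.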

Regarding "python - Extracting URLs from elements in a list", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/70870424/
