python - 如何通过 python 保存 Google pdf 文件?

标签 python pdf urllib google-docs-api

requests 库可以完美地从 Google 文档(来自 How do you save a Google Sheets file as CSV from Python 3 (or 2)?)检索 csv 或 txt 文件

但是当我尝试对 google doc 中的 pdf 文件执行相同操作时,我只能获取 HTML 文件,有什么方法可以从 google doc 下载 pdf 文件吗?例如https://docs.google.com/file/d/0BxcsLDhZbUBBMWY1MzRkZGQtMjQxNC00NzQ3LWFmNzEtNzNmMzYzYmU2MDRj/edit

我尝试使用requester并且得到了这个:

>>> import requests # https://pypi.python.org/pypi/requests
>>> gdoc = 'https://docs.google.com/file/d/0BxcsLDhZbUBBMWY1MzRkZGQtMjQxNC00NzQ3LWFmNzEtNzNmMzYzYmU2MDRj/edit'
>>> print requests.get(gdoc).text

输出:

<!DOCTYPE html><html><head><meta name="google" content="notranslate"><meta http-equiv="X-UA-Compatible" content="IE=edge;"><meta name="fragment" content="!"><title>The Starfish Story (Translation in Navajo).pdf - Google Drive</title><style type="text/css">#gbar,#guser{font-size:13px;padding-right:8px;padding-top:4px !important;}#gbar{padding-left:8px;height:22px}#guser{padding-bottom:7px !important;text-align:right}.gbh,.gbd{border-top:1px solid #c9d7f1;font-size:1px}.gbh{height:0;position:absolute;top:24px;width:100%}@media all{.gb1{height:22px;margin-right:.5em;vertical-align:top}#gbar{float:left}}a.gb1,a.gb4{text-decoration:underline !important}a.gb1,a.gb4{color:#00c !important}.gbi .gb4{color:#dd8e27 !important}.gbf .gb4{color:#900 !important}</style><script>_docs_flag_initialData={"jobset":"prod","docs-aiiws":"docs_warm_sdf","info_params":{},"uls":"","icso":false,"docs_eoal":true,"docs_oogt":"NONE","docosEmbedApiJs":"\/\/docs.google.com\/comments\/d\/AAHRpnXu2c4T_cvcH9MyrCUHPNj25CBhn1z7azmidPK7l5vEFT86M59YW7kmm6hTnmTuic9OmYbD43mFsbHo7FXIzRxICAm6TFMaL7q9d34z6-gL59HUNgpG3DAaoty1Q1eA5v7R_WCJU\/api\/js?hl=de","docosUnreadCommentsEnabled":false,"docs-egc":true,"docs-chat_base_url":"talkgadget.google.com\/talkgadget\/","docs-chat_domain_rotation":true,"docs-ce":true,"docs-ut":2,"promo_url":"","promo_title":"","promo_title_prefix":"","promo_content_html":"","promo_element_id":"","promo_orientation":1,"promo_show_on_click":false,"promo_show_on_load":false,"show_promo":false,"docs-encp":false,"buildLabel":"texmex_2013-49-Thu_RC1","buildClNumber":"57718063","debugTask":"ve_32","docs-show_debug_info":false,"dcau":"https:\/\/chrome.google.com\/webstore\/detail\/apdfllckaahabafndbhieahigkjlhalf","ondlburl":"\/\/docs.google.com","drive_url":"\/\/drive.google.com","docs-sup":"\/file","docs-uptc":["lsrp","usp","urp","utm_source","utm_medium","utm_campaign","utm_term","utm_content"],"docs-cwsd":"","docs-al":[0,0,0,1,0]
,"docs-ndt":"Untitled Texmex","docs-eit":false,"docs-spfe":true,"docs-mriim":1800000,"docs-ecc":false,"docs-mnumea":false,"docs-ess":false,"ecbsl":true,"ecid":true,"eod":true,"docs-eilb":false,"docs-pedd":true,"docs-evr":true,"docs-eir":false,"docs-enmr":false,"docs-esrd":false,"share_ui":"jfk","server_time_ms":1387227430022,"gaia_session_id":"","enable_iframed_embed_api":true,"cup":"\/folder\/d\/{folderId}\/edit","docs-fut":"\/\/docs.google.com\/#folders\/{folderId}","esid":true,"esubid":false,"docs-etbs":true,"enable_kennedy":true,"onePickImportDocumentUrl":"","opbu":"https:\/\/docs.google.com\/picker","opru":"https:\/\/docs.google.com\/relay.html","opdu":false,"ophi":"texmex","opuci":"","docs-se":false,"docs-ebcrsct":false,"docs-iror":false,"xdbcmUri":"https:\/\/docs.google.com\/file\/xdbcm.html","xdbcfAllowXpc":true,"docs-corsbc":false,"xdbcfAllowHostNamePrefix":true,"docs-spdy":false,"enable_client_docos":true,"enable_anchored_docos":true,"enable_docos_tickle":true,"gv_int_native":true,"enable_a11y":true,"tpc":true,"enable_pinned_revisions":false,"enable_edit_blob_revisions":false,"upload_url":"https:\/\/docs.google.com\/upload\/resumableupload","enable_toolbar":true,"enable_feedback_button":false,"enable_microscope":true,"enable_manage_timed_text":true,"video_embed_type":"PREFER_FLASH","enable_maps_embed":true,"maps_api_uri":"https:\/\/maps.googleapis.com\/maps\/api\/js?key=AIzaSyBCjpnguVjzi6vS67NdBtyYuvCYz3yBxCY&sensor=false","maps_display_uri":"https:\/\/maps.google.com\/maps","docs_abuse_link":"https:\/\/docs.google.com\/abuse?id=0BxcsLDhZbUBBMWY1MzRkZGQtMjQxNC00NzQ3LWFmNzEtNzNmMzYzYmU2MDRj","enable_csi":true,"csi_service_name":"texmex","third_party_default_icon_urls":{"icon16":"\/\/ssl.gstatic.com\/docs\/doclist\/images\/generic_app_icon_16.png","icon32":"\/\/ssl.gstatic.com\/docs\/doclist\/images\/generic_app_icon_32.png","icon64":"\/\/ssl.gstatic.com\/docs\/doclist\/images\/generic_app_icon_64.png","icon128":"\/\/ssl.gstatic.com\/docs\/doclist\/images\/generic_app_icon_128.png"},"enable_chrome_webstore_link":true};</script><script type="text/javascript">(function(){(function(){function e(a){this.t={};this.tick=function(a,c,b){var d=void 0!=b?b:(new Date).getTime();this.t[a]=[d,c];if(void 0==b)try{window.console.timeStamp("CSI/"+a)}catch(e){}};this.tick("start",null,a)}var a;window.performance&&(a=window.performance.timing);var f=a?new e(a.responseStart):new e;window.jstiming={Timer:e,load:f};if(a){var c=a.navigationStart,d=a.responseStart;0<c&&d>=c&&(window.jstiming.srt=d-c)}if(a){var b=window.jstiming.load;0<c&&d>=c&&(b.tick("_wtsrt",void 0,c),b.tick("wtsrt_",
"_wtsrt",d),b.tick("tbsd_","wtsrt_"))}try{a=null,window.chrome&&window.chrome.csi&&(a=Math.floor(window.chrome.csi().pageT),b&&0<c&&(b.tick("_tbnd",void 0,window.chrome.csi().startE),b.tick("tbnd_","_tbnd",c))),null==a&&window.gtbExternal&&(a=window.gtbExternal.pageT()),null==a&&window.external&&(a=window.external.pageT,b&&0<c&&(b.tick("_tbnd",void 0,window.external.startE),b.tick("tbnd_","_tbnd",c))),a&&(window.jstiming.pt=a)}catch(g){}})();})();
</script><link rel="stylesheet" href="/static/file/client/css/1508097430-edit_css_ltr.css">
<link rel="shortcut icon" href="https://ssl.gstatic.com/docs/doclist/images/icon_11_pdf_favicon.ico"><link rel="chrome-webstore-item" href="https://chrome.google.com/webstore/detail/apdfllckaahabafndbhieahigkjlhalf"></head><body dir="ltr" role="application" onload='_onload()'itemscope itemtype="http://schema.org/CreativeWork/DocumentObject"><noscript><div class="docs-butterbar-container"><div class="docs-butterbar-wrap"><div class="jfk-butterBar jfk-butterBar-shown jfk-butterBar-warning">Die Datei kann in Ihrem Browser nicht geöffnet werden, da JavaScript nicht aktiviert ist. Aktivieren Sie JavaScript und laden Sie die Seite noch einmal.</div></div><br></div></noscript><meta itemprop="name" content="The Starfish Story (Translation in Navajo).pdf"><meta itemprop="faviconUrl" content="https://ssl.gstatic.com/docs/doclist/images/icon_11_pdf_favicon.ico"><meta itemprop="url" content="https://docs.google.com/file/d/0BxcsLDhZbUBBMWY1MzRkZGQtMjQxNC00NzQ3LWFmNzEtNzNmMzYzYmU2MDRj/edit"><meta itemprop="embedURL" content="https://docs.google.com/file/d/0BxcsLDhZbUBBMWY1MzRkZGQtMjQxNC00NzQ3LWFmNzEtNzNmMzYzYmU2MDRj/preview"><div id="docs-chrome" class="docs-vis-ref-chrome" tabindex="0"><div><div id="docs-header"><div id="docs-branding-container"class="docs-branding-default"><a title="Google Drive öffnen" href="//drive.google.com" target="_blank"><div id="docs-drive-logo"></div><div id="docs-branding-logo"></div></a></div><div id=gbar><nobr><a target=_blank class=gb1 href="https://www.google.com/webhp?tab=ow">Suche</a> <a target=_blank class=gb1 href="http://www.google.com/imghp?hl=de&tab=oi">Bilder</a> <a target=_blank class=gb1 href="https://maps.google.com/maps?hl=de&tab=ol">Maps</a> <a target=_blank class=gb1 href="https://play.google.com/?hl=de&tab=o8">Play</a> <a target=_blank class=gb1 href="https://www.youtube.com/?tab=o1">YouTube</a> <a target=_blank class=gb1 href="https://news.google.com/nwshp?hl=de&tab=on">News</a> <a target=_blank class=gb1 href="https://mail.google.com/mail/?tab=om">Gmail</a> <b class=gb1>Drive</b> <a target=_blank class=gb1 style="text-decoration:none" href="http://www.google.com/intl/de/options/"><u>Mehr</u> &raquo;</a></nobr></div><div id=guser width=100%><nobr><span id=gbn class=gbi></span><span id=gbf class=gbf></span><span id=gbe><a  target='_blank' href="https://docs.google.com/abuse?id=0BxcsLDhZbUBBMWY1MzRkZGQtMjQxNC00NzQ3LWFmNzEtNzNmMzYzYmU2MDRj" class=gb4>Missbrauch melden</a> | </span><a  target='_blank' href="https://docs.google.com/settings" class=gb4>Einstellungen</a> | <a target=_top id=gb_70 href="https://www.google.com/accounts/ServiceLogin?service=wise&passive=1209600&continue=https://docs.google.com/file/d/0BxcsLDhZbUBBMWY1MzRkZGQtMjQxNC00NzQ3LWFmNzEtNzNmMzYzYmU2MDRj/edit&followup=https://docs.google.com/file/d/0BxcsLDhZbUBBMWY1MzRkZGQtMjQxNC00NzQ3LWFmNzEtNzNmMzYzYmU2MDRj/edit" class=gb4>Anmelden</a></nobr></div><div class=gbh style=left:0></div><div class=gbh style=right:0></div><div style="clear:both"></div><div id="docs-titlebar-container"><div id="docs-titlebar"><div class="docs-title-outer"><div class="docs-title-widget goog-inline-block" id="docs-title-widget"><span class="docs-title" id="docs-title" role="button"><div class="docs-title-inner" id="docs-title-inner">The Starfish Story (Translation in Navajo).pdf</div></span></div><div class="docs-star-container goog-inline-block"><div id="docs-star" class="goog-inline-block" style="display:none"></div></div><div class="docs-activity-indicator-container goog-inline-block"></div></div></div><div class="docs-titlebar-buttons"><div id="docs-presence-container" class="goog-inline-block docs-titlebar-button"><div id="docs-presence" class="goog-inline-block"></div><div role="button" id="docs-chat" class="goog-inline-block jfk-button jfk-button-standard jfk-button-narrow docs-chat jfk-button-disabled" aria-disabled="true" style="display: none"><div class="docs-icon goog-inline-block "><div class="docs-icon-img-container docs-icon-img docs-icon-chat">&nbsp;</div></div></div></div><div class="goog-inline-block"><div role="button" id="docs-docos-commentsbutton" class="goog-inline-block jfk-button jfk-button-standard docs-titlebar-button jfk-button-disabled" aria-disabled="true">Kommentare</div><div id="docs-docos-caret" style="display: none"><div class="docs-docos-caret-outer"></div><div class="docs-docos-caret-inner"></div></div></div><span vsjson="{&quot;role&quot;:20,&quot;summary&quot;:&quot;Jeder, der über den Link verfügt&quot;,&quot;visibilityState&quot;:&quot;unlisted&quot;,&quot;restrictedToDomain&quot;:false,&quot;visibilityEntries&quot;:[{&quot;role&quot;:20,&quot;summary&quot;:&quot;Jeder, der über den Link verfügt&quot;,&quot;visibilityState&quot;:&quot;unlisted&quot;,&quot;restrictedToDomain&quot;:false,&quot;details&quot;:&quot;Alle Nutzer, die über den Link verfügen, sind zum Zugriff berechtigt. Es ist keine Anmeldung erforderlich.&quot;}],&quot;restrictedToSingleUserScope&quot;:false}" id="docs-titlebar-share-client-button" class="goog-inline-block"><div role="button" class="goog-inline-block jfk-button jfk-button-action docs-titlebar-button jfk-button-disabled" aria-disabled="true"><span class="goog-inline-block apps-share-sprite scb-button-icon  scb-unlisted-icon-white">&nbsp;</span>Freigeben</div></span></div></div></div><div class="docs-butterbar-container"><div class="docs-butterbar-wrap"><div class="jfk-butterBar jfk-butterBar-shown jfk-butterBar-info">Der von Ihnen verwendete Browser wird nicht mehr unterstützt. Einige Funktionen sind daher möglicherweise nicht wie gewünscht verfügbar. Führen Sie ein Upgrade auf einen <a href="http://whatbrowser.org" target="_blank" class="docs-butterbar-link-no-pad">modernen Browser</a> wie <a href="https://www.google.com/chrome/?&brand=CHVN&utm_campaign=en&utm_source=en-et-na-us-docs-ug&utm_medium=et" target="_blank" class="docs-butterbar-link-no-pad">Google Chrome</a> aus.<a href="#" onclick="this.parentNode.parentNode.removeChild(this.parentNode);return false;" class="docs-butterbar-link">Schließen</a></div></div><br></div></div><div id="docs-bars"><div id="docs-menubars"><div id="docs-menubar" role="menubar" class="docs-menubar goog-container goog-container-horizontal" tabIndex="0"><div id="docs-file-menu" role="menuitem" class="menu-button goog-control goog-control-disabled goog-inline-block">Datei</div><div id="docs-edit-menu" role="menuitem" class="menu-button goog-control goog-control-disabled goog-inline-block">Bearbeiten</div><div id="docs-view-menu" role="menuitem" class="menu-button goog-control goog-control-disabled goog-inline-block">Ansicht</div><div id="docs-help-menu" role="menuitem" class="menu-button goog-control goog-control-disabled goog-inline-block">Hilfe</div></div><div id="docs-chat-message-a11y" aria-live="polite" class="docs-offscreen" style="height: 0; width: 0; overflow: hidden"></div><div id="docs-presence-menubar"></div></div></div><div id="docs-help-anchor-wrapper"><div id="docs-help-anchor"></div><div id="docs-help-anchor-right"></div></div><div id="docs-additional-bars"></div></div><div id="docs-editor-container" class="docs-vis-ref-editor-container"><div id="docs-editor" tabindex="1" ><iframe id="gview-embed-content"class="gview-embed-iframe"src="https://docs.google.com/viewer?srcid=0BxcsLDhZbUBBMWY1MzRkZGQtMjQxNC00NzQ3LWFmNzEtNzNmMzYzYmU2MDRj&amp;pid=explorer&amp;efh=false&amp;a=v"></iframe></div></div><script type="text/javascript" src="/static/file/client/js/612129255-edit_core__de.js"></script>
<script>DOCS_initializeModules({"core":[],"app":["core"]},{"core":["\/static\/file\/client\/js\/612129255-edit_core__de.js"],"app":["\/static\/file\/client\/js\/4052761810-edit_app__de.js"]}, 'core');</script><script type="text/javascript">_main('\/file\/d\/0BxcsLDhZbUBBMWY1MzRkZGQtMjQxNC00NzQ3LWFmNzEtNzNmMzYzYmU2MDRj', {'sid': '48231c7ba8cb2d29','id': '0BxcsLDhZbUBBMWY1MzRkZGQtMjQxNC00NzQ3LWFmNzEtNzNmMzYzYmU2MDRj', 'email': '', 'title': 'The Starfish Story (Translation in Navajo).pdf', 'description': '', 'mimetype': 'application\/pdf', 'fileExtension': 'pdf', 'mediaType': 'pdf', 'revisions': [{"tags":[],"creatorDisplayName":"Terry Teller","pinned":true,"filename":"The Starfish Story (Translation in Navajo).pdf","downloadUrl":"https:\/\/docs.google.com\/uc?id=0BxcsLDhZbUBBMWY1MzRkZGQtMjQxNC00NzQ3LWFmNzEtNzNmMzYzYmU2MDRj&export=download&revid=0BxcsLDhZbUBBQmJUS0dhMVV4YWZmZStPa05xWlgxd3ZzaVJrPQ","sizeInBytes":33421,"docId":"0BxcsLDhZbUBBQmJUS0dhMVV4YWZmZStPa05xWlgxd3ZzaVJrPQ","creationDateString":"09.01.12","creator":{"isMe":false,"nickname":"Terry Teller","iconUrl":"images\/doclist\/contact_nopicture.png","editProfileUrl":"editProfile"}}],'obfuscatedUserId': 'ANONYMOUS_17612595759507348808','userDomain': '', 'embedPreviewUri': 'https:\/\/docs.google.com\/file\/d\/0BxcsLDhZbUBBMWY1MzRkZGQtMjQxNC00NzQ3LWFmNzEtNzNmMzYzYmU2MDRj\/preview','syncUpdates': [],'contentRenderer': 'gviewembed'},{"description":{"raw":"","formatted":""},"download":{"isMissingBlobRef":false,"filename":"The Starfish Story (Translation in Navajo).pdf","url":"https:\/\/docs.google.com\/uc?id=0BxcsLDhZbUBBMWY1MzRkZGQtMjQxNC00NzQ3LWFmNzEtNzNmMzYzYmU2MDRj&export=download"},"revision":{"swfUrl":"\/static\/doclist\/client\/css\/1531528182-uploaderapi.swf","busyIconImageUrl":"https:\/\/ssl.gstatic.com\/docs\/doclist\/images\/loading_small.gif"},"sharing":{"is_private":false,"visibility_is_restricted_to_domain":false,"visibility_domain_display_name":""},"basicdetails":{"mimeType":"application\/pdf","lastModifiedDateString":"28.06.12","creationDateString":"09.01.12","fileSize":"33421"},"thumbnail":{"thumbnail_128":"https:\/\/lh5.googleusercontent.com\/lpPbLs1Ej7u889Xoa2e15WTjJJ1nQZMZYEfYlE5tIq-kyhOLFz-33NfIxbrTFLcVA4YyPU6cpVkdhUaXG30aCt3u0nKvWVZw3xdt4A=s128","thumbnail_full":"https:\/\/lh5.googleusercontent.com\/lpPbLs1Ej7u889Xoa2e15WTjJJ1nQZMZYEfYlE5tIq-kyhOLFz-33NfIxbrTFLcVA4YyPU6cpVkdhUaXG30aCt3u0nKvWVZw3xdt4A=s1600"},"gviewembed":{"url":"https:\/\/docs.google.com\/viewer?srcid=0BxcsLDhZbUBBMWY1MzRkZGQtMjQxNC00NzQ3LWFmNzEtNzNmMzYzYmU2MDRj&pid=explorer&efh=false&a=v","embeduri":"https:\/\/docs.google.com\/viewer?srcid=0BxcsLDhZbUBBMWY1MzRkZGQtMjQxNC00NzQ3LWFmNzEtNzNmMzYzYmU2MDRj&pid=explorer&efh=false&a=v&chrome=false&embedded=true","nonredirectedgviewurl":"https:\/\/docs.google.com\/viewer?srcid=0BxcsLDhZbUBBMWY1MzRkZGQtMjQxNC00NzQ3LWFmNzEtNzNmMzYzYmU2MDRj&pid=explorer&efh=false&a=v&chrome=true&redirect=false","isNativeGView":false},"webstoreui":{"mimeType":"application\/pdf","fileExtension":"pdf","moreDriveAppsUrl":"https:\/\/chrome.google.com\/webstore\/category\/collection\/drive_apps"}});</script></body></html>

我尝试使用urllib并且我得到了:

>>> import urllib, codecs
>>> urllib.urlretrieve('https://docs.google.com/file/d/0BxcsLDhZbUBBMWY1MzRkZGQtMjQxNC00NzQ3LWFmNzEtNzNmMzYzYmU2MDRj/edit')
('/tmp/tmpQ5tDwR', <httplib.HTTPMessage instance at 0x16fbbd8>)
>>> codecs.open('/tmp/tmpQ5tDwR','r').read()

我得到了这个输出:http://pastebin.com/D2FM1VMU

最佳答案

这里的正确答案是使用 Google Drive API访问您的文档,而不是像普通的面向用户的网络浏览器一样编写与 Google 文档交互的脚本。

根据您的操作方式,Google 认为您想要查看该页面。而且,由于您看起来不像一个可以 native 查看 PDF 的浏览器,因此它对您很好并创建了一个 HTML 查看器页面来让您阅读 PDF。该查看器页面确实具有“下载”功能,您可以尝试解析 HTML 和 JavaScript 并触发下载,但这需要大量工作。

此外,我敢打赌 Google 云端硬盘的服务条款明确禁止编写脚本和抓取网络界面。

API 确实要求您创建 API key ,并且您可能还需要 OAuth 来处理以正确用户身份登录的情况。但一旦你这样做了,它就和你想要做的一样容易使用——而且它确实有效。你做了一个Files: get请求从其 ID(来自现有尝试的一长串垃圾)获取有关文件的信息,其中包括 downloadUrl字段,您只需获取该 URL。像这样的东西,在纯 stdlib 中:

url = 'https://www.googleapis.com/drive/v2/files/' + fileid
r = urllib2.urlopen(url)
filesinfo = json.load(r)
downloadurl = filesinfo['downloadUrl']
r2 = urllib2.urlopen(downloadurl)
data = r2.read()
当您开始添加 API key 以及可能的 OAuth 时,

requests 将使您的生活变得更加轻松,例如,您只需传递 {'key': API_KEY} 而不是调用 urllib.urlencode在字典上将其添加为查询字符串。

Google API Client Library for Python将使事情变得更加容易 - 您可以在文档页面上看到示例代码。

关于python - 如何通过 python 保存 Google pdf 文件?,我们在Stack Overflow上找到一个类似的问题: https://stackoverflow.com/questions/20620786/

相关文章:

python - 不循环地迭代参数

c++ - 是否可以在 Qt 中制作具有不同页面大小的 pdf?

python - 我怎样才能在 python 中阅读 pdf?

python - 在url中传递一个变量?

python - JSON或Python dict/list解码问题

python - 如何在 Python 中从 nltk.book 读取 nltk.text.Text 文件?

python - 将多个正则表达式合并为一个 RE

java - 使用 selenium chrome 驱动程序生成 PDF

html - 无法从 pdb 数据库网站抓取 <div id ="search-container">

http-headers - HTTP 基本身份验证不适用于 Python 3