r - 如何在 R 中使用 httr 对 shibboleth 多主机名网站进行身份验证

标签 r post shibboleth httr

注意:ipums international 和 ipums usa 可能使用相同的系统。 ipums usa 允许更快的注册。如果您想测试您的代码,请尝试 https://usa.ipums.org/usa-action/users/request_access注册!

我正在尝试以编程方式从 https://international.ipums.org/ 下载文件使用 R 语言和 httr。我需要使用 httr 而不是 RCurl,因为我需要在身份验证后下载大文件,而不是下载到 RAM 中,而是直接下载到磁盘。 this is currently only possible with httr as far as i know

下面的可重现代码记录了我从登录页面 ( https://international.ipums.org/international-action/users/login ) 到主认证后页面的最大努力。任何提示或提示将不胜感激!谢谢!

my_email <- "email@address.com"
my_password <- "password"

tf <- tempfile()

# use httr, because i need to download a large file after authentication
# and only httr supports that with its `write_disk()` option
library(httr)

# turn off ssl verify, otherwise the subsequent GET command will fail
set_config( config( ssl_verifypeer = 0L ) )

GET( "https://international.ipums.org/Shibboleth.sso/Login?target=https%3A%2F%2Finternational.ipums.org%2Finternational-action%2Fmenu" )

# connect to the starting login page of the website
( a <- GET( "https://international.ipums.org/international-action/users/login" , verbose( info = TRUE ) ) )

# which takes me through to a lot of websites, but ultimately (in my browser) lands at
shibboleth_url <- "https://live.identity.popdata.org:443/idp/Authn/UserPassword"

# construct authentication information?
base_values <- list( "j_username" = my_email , "j_password" = my_password )
idp_values <- list( "j_username" = my_email , "j_password" = my_password ,  "_idp_authn_lc_key"=subset( a$cookies , domain == "live.identity.popdata.org" )$value , "JSESSIONID" = subset( a$cookies , domain == "#HttpOnly_live.identity.popdata.org" )$value )
ipums_values <- list( "j_username" = my_email , "j_password" = my_password ,  "_idp_authn_lc_key"=subset( a$cookies , domain == "live.identity.popdata.org" )$value , "JSESSIONID" = subset( a$cookies , domain == "international.ipums.org" )$value)

# i believe this is where the main login should happen, but it looks like it's failing
GET( shibboleth_url , query = idp_values )
POST( shibboleth_url , body = base_values )
writeBin( GET( shibboleth_url , query = idp_values )$content , tf )

readLines( tf )
# The MPC account authentication system has encountered an error
# This error can sometimes occur if you did not close your browser after logging out of an application previously.  It may also occur for other reasons.  Please close your browser and try your action again."                                                                      

writeBin( GET( "https://live.identity.popdata.org/idp/profile/SAML2/Redirect/SSO" , query = idp_values )$content , tf )
POST( "https://live.identity.popdata.org/idp/profile/SAML2/Redirect/SSO" , body = idp_values )
readLines( tf )
# same error as above

# return to the main login page..
writeBin( GET( "https://international.ipums.org/international-action/menu" , query = ipums_values )$content , tf )
readLines( tf )
# ..not logged in

最佳答案

您必须使用 set_cookies()将您的 cookie 发送到服务器:

library(httr)
library(rvest)
#my_email <- "xxx"
#my_password <- "yyy"
tf <- tempfile()
set_config( config( ssl_verifypeer = 0L ) )

# Get first page
p1 <- GET( "https://international.ipums.org/international-action/users/login" , verbose( info = TRUE ) )

# Post Login credentials
b2 <- list( "j_username" = my_email , "j_password" = my_password )
c2 <- c(JSESSIONID=p1$cookies[p1$cookies$domain=="#HttpOnly_live.identity.popdata.org",]$value,
           `_idp_authn_lc_key`=p1$cookies[p1$cookies$domain=="live.identity.popdata.org",]$value)
p2 <- POST(p1$url,body = b2, set_cookies(.cookies = c2), encode="form" )

# Parse hidden fields
h2 <- read_html(p2$content)
form <-  h2 %>% html_form() 

# Post hidden fields
b3 <- list( "RelayState"=form[[1]]$fields[[1]]$value, "SAMLResponse"=form[[1]]$fields[[2]]$value)
c3 <- c(JSESSIONID=p1$cookies[p1$cookies$domain=="#HttpOnly_live.identity.popdata.org",]$value,
           `_idp_session`=p2$cookies[p2$cookies$name=="_idp_session",]$value,
           `_idp_authn_lc_key`=p2$cookies[p2$cookies$name=="_idp_authn_lc_key",]$value)
p3 <- POST( form[[1]]$url , body=b3, set_cookies(.cookies = c3), encode = "form")

# Get interesting page
c4 <- c(JSESSIONID=p3$cookies[p1$cookies$domain=="international.ipums.org" && p3$cookies$name=="JSESSIONID",]$value,
           `_idp_session`=p3$cookies[p3$cookies$name=="_idp_session",]$value,
           `_idp_authn_lc_key`=p3$cookies[p3$cookies$name=="_idp_authn_lc_key",]$value)
p4 <- GET( "https://international.ipums.org/international-action/menu", set_cookies(.cookies = c4) )
writeBin(p4$content , tf )
readLines( tf )[55]

由于结果是
[1] "    <li class=\"lastItem\"><a href=\"/international-action/users/logout\">Logout</a></li>"

我想你已经登录了...

关于r - 如何在 R 中使用 httr 对 shibboleth 多主机名网站进行身份验证,我们在Stack Overflow上找到一个类似的问题: https://stackoverflow.com/questions/34829920/

相关文章:

json - Swagger POST Json 正文参数架构 YAML

java - 单一注销不适用于 Shibboleth IdP

apache - 在代理到 tomcat 之前在 apache 中设置 REMOTE_USER

python - R 或 python 中的隐马尔可夫模型实现

r - 使用 ggalluvial 包将标签数据添加到桑基绘图轴

r - 使用 R 代码在 A4 纸上放置多个绘图

jquery $.ajax 自定义 http header 问题

r -/usr/bin/Rscript : Argument list too long

javascript - 如何以编程方式在 php 中发送帖子数据?

spring-security - Shibboleth SSO 和 Spring SP : Unable to login due to "InResponseToField" mismatch error