
I tried the sample provided in the documentation of the Requests library for Python.

With async.map(rs), I get the response codes, but I want to get the content of each page requested. This, for example, does not work:

out = async.map(rs)
print out[0].content

 Answers

92

Note

The below answer is not applicable to requests v0.13.0+. The asynchronous functionality was moved to grequests after this question was written. However, you could just replace requests with grequests below and it should work.

I've left this answer as is to reflect the original question which was about using requests < v0.13.0.


To do multiple tasks with async.map asynchronously, you have to:

  1. Define a function for what you want to do with each object (your task)
  2. Add that function as an event hook in your request
  3. Call async.map on a list of all the requests / actions

Example:

from requests import async
# If using requests > v0.13.0, use
# from grequests import async

urls = [
    'http://python-requests.org',
    'http://httpbin.org',
    'http://python-guide.org',
    'http://kennethreitz.com'
]

# A simple task to do to each response object
def do_something(response):
    print response.url

# A list to hold our things to do via async
async_list = []

for u in urls:
    # The "hooks = {..." part is where you define what you want to do
    # 
    # Note the lack of parentheses following do_something, this is
    # because the response will be used as the first argument automatically
    action_item = async.get(u, hooks = {'response' : do_something})

    # Add the task to our list of things to do via async
    async_list.append(action_item)

# Do our list of things to do via async
async.map(async_list)
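
For requests v0.13.0+ the same pattern lives in the separate grequests package, and grequests.map returns the response objects directly, so you can read .content without a hook. A minimal sketch, assuming grequests is installed (the URL list is just the one from above):

import grequests

urls = [
    'http://python-requests.org',
    'http://httpbin.org',
    'http://python-guide.org',
    'http://kennethreitz.com'
]

# Build the unsent requests, then send them concurrently.
rs = (grequests.get(u) for u in urls)
responses = grequests.map(rs)

for response in responses:
    if response is not None:  # failed requests come back as None
        print(response.url, len(response.content))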
Tuesday, June 1, 2021
 
IcedAnt
answered 7 Months ago
94

Cookie values are strings, not integers. Set them as such:

s.cookies['cookie1'] = '25'
s.cookies['cookie2'] = '25'

Demo:

>>> import requests
>>> from urllib.parse import urlparse
>>> url = 'http://httpbin.org/cookies'
>>> s = requests.Session()
>>> s.headers.update({
...     'Origin':urlparse(url).netloc,
...     'Referer':url
... })
>>> r = s.get(url)
>>> s.cookies['cookie1'] = '25'
>>> s.cookies['cookie2'] = '25'
>>> r = s.get(url, headers={'X-Requested-With':'XMLHttpRequest'})
>>> print(r.text)
{"cookies": {"cookie1": "25", "cookie2": "25"}}
Thursday, August 19, 2021
 
Henrik
answered 4 Months ago
24

Requests doesn't show the redirect because you're not actually being redirected in the HTTP sense. Wikipedia does some JavaScript trickery (probably the HTML5 History API and pushState) to change the address shown in the address bar, but that doesn't apply to Requests, of course.

In other words, both requests and your browser are correct: requests is showing the URL you actually requested (and Wikipedia actually served), while your browser's address bar is showing the 'proper', canonical URL.

You could parse the response and look for the <link rel="canonical"> tag if you want to find out the 'proper' URL from your script, or fetch articles over Wikipedia's API instead.
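
For instance, a minimal sketch of extracting the canonical URL from the HTML, assuming BeautifulSoup is installed (the article URL is only an illustration):

import requests
from bs4 import BeautifulSoup

r = requests.get('https://en.wikipedia.org/wiki/Python_(programming_language)')
soup = BeautifulSoup(r.text, 'html.parser')

# Wikipedia advertises the canonical URL of the article in a <link> tag,
# regardless of which URL you actually requested.
link = soup.find('link', rel='canonical')
if link is not None:
    print(link['href'])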

Tuesday, August 31, 2021
 
Jonesome Reinstate Monica
answered 3 Months ago
57
class ThreadedHTTPServer(ThreadingMixIn, HTTPServer):
    pass

is enough. Your client probably doesn't make concurrent requests. If you make the requests in parallel, the threaded server works as expected. Here's the client:

#!/usr/bin/env python
import sys
import urllib2

from threading import Thread

def make_request(url):
    print urllib2.urlopen(url).read()

def main():
    port = int(sys.argv[1]) if len(sys.argv) > 1 else 8000
    for _ in range(10):
        Thread(target=make_request, args=("http://localhost:%d" % port,)).start()

main()

And the corresponding server:

import time
from BaseHTTPServer import BaseHTTPRequestHandler, HTTPServer, test as _test
from SocketServer import ThreadingMixIn


class ThreadedHTTPServer(ThreadingMixIn, HTTPServer):
    pass

class SlowHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        self.send_response(200)
        self.send_header("Content-type", "text/plain")
        self.end_headers()

        self.wfile.write("Entered GET request handler")
        time.sleep(1)
        self.wfile.write("Sending response!")

def test(HandlerClass=SlowHandler, ServerClass=ThreadedHTTPServer):
    _test(HandlerClass, ServerClass)


if __name__ == '__main__':
    test()

All 10 requests finish in about 1 second. If you remove ThreadingMixIn from the server definition, the 10 requests take 10 seconds to complete.
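
For reference, a rough Python 3 sketch of the same experiment (the stdlib modules were renamed to http.server, socketserver and urllib.request); it folds the client and server into one script and is not a drop-in replacement for the code above:

#!/usr/bin/env python3
import sys
import time
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer
from socketserver import ThreadingMixIn
from threading import Thread


class ThreadedHTTPServer(ThreadingMixIn, HTTPServer):
    pass


class SlowHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        self.send_response(200)
        self.send_header("Content-type", "text/plain")
        self.end_headers()
        self.wfile.write(b"Entered GET request handler")
        time.sleep(1)
        self.wfile.write(b"Sending response!")


def make_request(url):
    print(urllib.request.urlopen(url).read())


if __name__ == '__main__':
    port = int(sys.argv[1]) if len(sys.argv) > 1 else 8000
    server = ThreadedHTTPServer(('localhost', port), SlowHandler)
    Thread(target=server.serve_forever, daemon=True).start()

    # Fire 10 concurrent requests; with ThreadingMixIn they all finish
    # in roughly 1 second instead of 10.
    threads = [Thread(target=make_request,
                      args=("http://localhost:%d" % port,))
               for _ in range(10)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    server.shutdown()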

Monday, September 20, 2021
 
coolguy
answered 3 Months ago
55

First of all, to reproduce the problem, I had to add the following line to your onStringSend function:

request.get_data()

Otherwise, I was getting “connection reset by peer” errors because the server’s receive buffer kept filling up.

Now, the immediate reason for this problem is that Response.content (which is called implicitly when stream=False) iterates over the response data in chunks of 10240 bytes:

self._content = bytes().join(self.iter_content(CONTENT_CHUNK_SIZE)) or bytes()

Therefore, the easiest way to solve the problem is to use stream=True, thus telling Requests that you will be reading the data at your own pace:

response_data = s.post(url=url, data=data, stream=True, verify=False).raw.read()

With this change, the performance of the Requests version becomes more or less the same as that of the urllib version.
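
Alternatively, you can keep stream=True and drain the body yourself with iter_content, passing a larger chunk size. A minimal sketch, reusing the same s, url and data as above (the 1 MiB chunk size is only an example):

response = s.post(url=url, data=data, stream=True, verify=False)

# Reassemble the body in 1 MiB chunks instead of the default 10240 bytes.
chunks = []
for chunk in response.iter_content(chunk_size=1024 * 1024):
    chunks.append(chunk)
response_data = b''.join(chunks)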

Please also see the “Raw Response Content” section in the Requests docs for useful advice.

Now, the interesting question remains: why is Response.content iterating in such small chunks? After talking to Cory Benfield, a core developer of Requests, it looks like there may be no particular reason. I filed issue #3186 in Requests to look further into this.

Thursday, September 23, 2021
 
JustSteveKing
answered 3 Months ago