update from Atlas with major reorg
194
attic/futures/countries/README.rst
Normal file
@@ -0,0 +1,194 @@
====================================
Configuring a local test environment
====================================

tl;dr;
======

This text explains how to configure **nginx** and **vaurien** to build a local
mirror of the data to run the flag download examples while avoiding network
traffic and introducing controlled delays and errors for testing, thanks to
the **vaurien** proxy.

Rationale and overview
======================

The flag download examples are designed to compare the performance of
different approaches to finding and downloading files from the Web. However,
we don't want to hit a public server with multiple requests per second while
testing, and we want to be able to simulate high latency and random network
errors.

For this setup I chose **nginx** as the HTTP server because it is very fast
and easy to configure, and the **vaurien** proxy because it was designed by
Mozilla to introduce delays and network errors for testing.

The archive ``flags.zip`` contains a directory ``flags/`` with 194
subdirectories, each containing a ``.gif`` image and a ``metadata.json`` file.
These images are public-domain flags copied from the CIA World Factbook [1].

[1] https://www.cia.gov/library/publications/the-world-factbook/

Once these files are unpacked to the ``flags/`` directory and **nginx** is
configured, you can experiment with the ``flags*.py`` examples without hitting
the network.

Instructions
============

1. Unpack test data
-------------------

Unpack the initial data in the ``countries/`` directory and verify that 194
directories are created in ``countries/flags/``, each with a ``.gif`` and
a ``metadata.json`` file::

  $ unzip flags.zip
  ... many lines omitted...
  creating: flags/zw/
  inflating: flags/zw/metadata.json
  inflating: flags/zw/zw.gif
  $ ls flags | wc -w
  194
  $ find flags | grep .gif | wc -l
  194
  $ find flags | grep .json | wc -l
  194
  $ ls flags/ad
  ad.gif metadata.json

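The same check can be scripted. The sketch below is only an illustration, not
part of the example code: ``check_flags_dir`` is a hypothetical helper, and it
assumes each image is named after its directory (``zw/zw.gif``), as in the
listing above.

```python
from pathlib import Path


def check_flags_dir(flags_dir):
    """Return (number of subdirectories, names of incomplete subdirectories).

    A subdirectory counts as complete when it holds both ``<name>.gif``
    and ``metadata.json``.
    """
    subdirs = sorted(p for p in Path(flags_dir).iterdir() if p.is_dir())
    incomplete = []
    for subdir in subdirs:
        has_gif = (subdir / (subdir.name + '.gif')).is_file()
        has_metadata = (subdir / 'metadata.json').is_file()
        if not (has_gif and has_metadata):
            incomplete.append(subdir.name)
    return len(subdirs), incomplete
```

Run from the ``countries/`` directory, ``check_flags_dir('flags')`` should
report 194 subdirectories and an empty list if the archive unpacked cleanly.
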
2. Install **nginx**
--------------------

Download and install **nginx**. I used version 1.6.2 -- the latest
stable version as I write this.

- Download page: http://nginx.org/en/download.html

- Beginner's guide: http://nginx.org/en/docs/beginners_guide.html

3. Configure **nginx**
----------------------

Edit the ``nginx.conf`` file to set the port and document root.
You can determine which ``nginx.conf`` is in use by running::

  $ nginx -V

The output starts with::

  nginx version: nginx/1.6.2
  built by clang 6.0 (clang-600.0.51) (based on LLVM 3.5svn)
  TLS SNI support enabled
  configure arguments:...

Among the configure arguments you'll see ``--conf-path=``. That's the
file you will edit.

Most of the content in ``nginx.conf`` is within a block labeled ``http``
and enclosed in curly braces. Within that block there can be multiple
blocks labeled ``server``. Add another ``server`` block like this one::

  server {
      listen 8001;

      location /flags/ {
          root /full-path-to.../countries/;
      }
  }

After editing ``nginx.conf`` the server must be started (if it's not
running) or told to reload the configuration file::

  $ nginx  # to start, if necessary
  $ nginx -s reload  # to reload the configuration

To test the configuration, open the URL below in a browser. You should
see the blue, yellow and red flag of Andorra::

  http://localhost:8001/flags/ad/ad.gif

If the test fails, please double check the procedure just described and
refer to the **nginx** documentation.

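The browser test can also be done from Python. This is a minimal sketch using
only the standard library; ``flag_url``, ``looks_like_gif`` and ``check_flag``
are hypothetical names introduced here, and the base URL assumes the
``listen 8001`` server block shown above.

```python
from urllib.request import urlopen

# Assumed to match the server block above (port 8001, /flags/ location).
BASE_URL = 'http://localhost:8001/flags'


def flag_url(base_url, cc):
    """Build the URL of a flag image from a two-letter country code."""
    cc = cc.lower()
    return '{}/{cc}/{cc}.gif'.format(base_url, cc=cc)


def looks_like_gif(data):
    """GIF files start with one of two six-byte signatures."""
    return data[:6] in (b'GIF87a', b'GIF89a')


def check_flag(base_url, cc, timeout=5):
    """Fetch one flag and report whether the response is a GIF image.

    urlopen raises an exception if the server is down or answers 404.
    """
    with urlopen(flag_url(base_url, cc), timeout=timeout) as response:
        return response.status == 200 and looks_like_gif(response.read())
```

With **nginx** running, ``check_flag(BASE_URL, 'ad')`` should return ``True``.
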
At this point you may run the ``flags_*2.py`` examples against the **nginx**
install by changing the ``BASE_URL`` constant in ``flags_sequential2.py``.
However, **nginx** is so fast that you will not see much difference in run
time between the sequential and the threaded versions, for example. For more
realistic testing with simulated network lag, we need **vaurien**.

4. Install and run **vaurien**
------------------------------

**vaurien** depends on gevent which, as I write this, is only available for
Python 2.5-2.7. To install vaurien I opened another shell, created another
``virtualenv`` for Python 2.7, and used that environment to install and run
vaurien::

  $ virtualenv-2.7 .env27 --no-site-packages --distribute
  New python executable in .env27/bin/python
  Installing setuptools, pip...done.
  $ . .env27/bin/activate
  (.env27)$ pip install vaurien
  Downloading/unpacking vaurien
  Downloading vaurien-1.9.tar.gz (50kB): 50kB downloaded
  ...many lines and a few minutes later...

  Successfully installed vaurien cornice gevent statsd-client vaurienclient
  greenlet http-parser pyramid simplejson requests zope.interface
  translationstring PasteDeploy WebOb repoze.lru zope.deprecation venusian
  Cleaning up...
  (.env27)$

Using that same shell with the ``.env27`` environment activated, run the
``vaurien_delay.sh`` script in the ``countries/`` directory::

  (.env27)$ ./vaurien_delay.sh
  2015-02-25 20:20:17 [69124] [INFO] Starting the Chaos TCP Server
  2015-02-25 20:20:17 [69124] [INFO] Options:
  2015-02-25 20:20:17 [69124] [INFO] * proxies from localhost:8002 to localhost:8001
  2015-02-25 20:20:17 [69124] [INFO] * timeout: 30
  2015-02-25 20:20:17 [69124] [INFO] * stay_connected: 0
  2015-02-25 20:20:17 [69124] [INFO] * pool_max_size: 100
  2015-02-25 20:20:17 [69124] [INFO] * pool_timeout: 30
  2015-02-25 20:20:17 [69124] [INFO] * async_mode: 1

The ``vaurien_delay.sh`` script adds a 1 s delay to every response.

There is also the ``vaurien_error_delay.sh`` script, which produces errors in
25% of the responses and adds a 0.5 s delay to 50% of the responses.

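To see why a fixed delay matters for the comparison, the effect can be
imitated without any server. The sketch below is an illustration only:
``fake_download`` stands in for a request through the proxy, and ``DELAY`` is
scaled down from 1 s so the script finishes quickly. None of these names come
from the example code.

```python
import time
from concurrent.futures import ThreadPoolExecutor

DELAY = 0.05  # stand-in for the 1 s proxy delay, scaled down for a quick run


def fake_download(cc):
    """Pretend to download one flag: just wait, then return the code."""
    time.sleep(DELAY)
    return cc


def sequential(cc_list):
    """One request at a time: total time grows with len(cc_list) * DELAY."""
    return [fake_download(cc) for cc in cc_list]


def threaded(cc_list):
    """One thread per request: the delays overlap instead of adding up."""
    with ThreadPoolExecutor(max_workers=len(cc_list)) as executor:
        return list(executor.map(fake_download, cc_list))


def timed(func, *args):
    """Return (result, elapsed seconds) for a single call."""
    start = time.perf_counter()
    result = func(*args)
    return result, time.perf_counter() - start
```

Calling ``timed(sequential, codes)`` and ``timed(threaded, codes)`` with the
same list of codes shows the gap that is invisible when **nginx** answers
almost instantly.
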
Platform-specific instructions
==============================

Nginx setup on Mac OS X
-----------------------

Homebrew (copy & paste code at the bottom of http://brew.sh/)::

  $ ruby -e "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/master/install)"
  $ brew doctor
  $ brew install nginx

Download and unpack::

  Docroot is: /usr/local/var/www
  /usr/local/etc/nginx/nginx.conf

  To have launchd start nginx at login:
      ln -sfv /usr/local/opt/nginx/*.plist ~/Library/LaunchAgents
  Then to load nginx now:
      launchctl load ~/Library/LaunchAgents/homebrew.mxcl.nginx.plist
  Or, if you don't want/need launchctl, you can just run:
      nginx


Nginx setup on Lubuntu 14.04.1 LTS
----------------------------------

Docroot is: /usr/share/nginx/html

61
attic/futures/countries/flags_asyncio2.py
Normal file
@@ -0,0 +1,61 @@
"""Download flags of top 20 countries by population

asyncio+aiohttp version

Sample run::

    $ python3 flags_asyncio.py
    NG retrieved.
    FR retrieved.
    IN retrieved.
    ...
    EG retrieved.
    DE retrieved.
    IR retrieved.
    20 flags downloaded in 1.08s

"""

import asyncio

import aiohttp

from flags import BASE_URL, save_flag, main


@asyncio.coroutine
def get_flag(cc):
    url = '{}/{cc}/{cc}.gif'.format(BASE_URL, cc=cc.lower())
    res = yield from aiohttp.request('GET', url)
    image = yield from res.read()
    return image


@asyncio.coroutine
def download_one(cc):
    image = yield from get_flag(cc)
    print('{} retrieved.'.format(cc))
    save_flag(image, cc.lower() + '.gif')
    return cc


@asyncio.coroutine
def downloader_coro(cc_list):
    to_do = [download_one(cc) for cc in cc_list]
    results = []
    for future in asyncio.as_completed(to_do):
        print(future)
        result = yield from future
        results.append(result)
    return results


def download_many(cc_list):
    loop = asyncio.get_event_loop()
    results = loop.run_until_complete(downloader_coro(cc_list))
    loop.close()
    return len(results)


if __name__ == '__main__':
    main(download_many)
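The file above uses the generator-based coroutine style (``@asyncio.coroutine``
and ``yield from``), which later Python versions replaced with ``async def``
and ``await``. As a rough sketch of the same ``as_completed`` pattern in the
newer syntax, with ``asyncio.sleep`` standing in for the HTTP request so that
it runs without ``aiohttp`` or the ``flags`` module (the fake image bytes are
an assumption made for illustration):

```python
import asyncio


async def get_flag(cc):
    # Stand-in for the HTTP request: wait briefly, return fake image bytes.
    await asyncio.sleep(0.01)
    return b'GIF89a' + cc.lower().encode()


async def download_one(cc):
    image = await get_flag(cc)
    return cc, len(image)


async def downloader_coro(cc_list):
    to_do = [download_one(cc) for cc in cc_list]
    results = []
    # as_completed yields awaitables in the order the downloads finish.
    for future in asyncio.as_completed(to_do):
        results.append(await future)
    return results


def download_many(cc_list):
    # asyncio.run creates the event loop, runs the coroutine and closes it.
    return len(asyncio.run(downloader_coro(cc_list)))
```
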
42
attic/futures/countries/flags_processpool.py
Normal file
@@ -0,0 +1,42 @@
"""Download flags of top 20 countries by population

ProcessPoolExecutor version

Sample run::

    $ python3 flags_threadpool.py
    BD retrieved.
    EG retrieved.
    CN retrieved.
    ...
    PH retrieved.
    US retrieved.
    IR retrieved.
    20 flags downloaded in 0.93s

"""
# BEGIN FLAGS_PROCESSPOOL
from concurrent import futures

from flags import save_flag, get_flag, show, main

MAX_WORKERS = 20


def download_one(cc):
    image = get_flag(cc)
    show(cc)
    save_flag(image, cc.lower() + '.gif')
    return cc


def download_many(cc_list):
    with futures.ProcessPoolExecutor() as executor:  # <1>
        res = executor.map(download_one, sorted(cc_list))

    return len(list(res))


if __name__ == '__main__':
    main(download_many)
# END FLAGS_PROCESSPOOL
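One detail of ``download_many`` above is that ``executor.map`` returns results
in the order of its input, not in the order the workers finish. The sketch
below illustrates this with a ``ThreadPoolExecutor`` swapped in for the
``ProcessPoolExecutor`` (both share the same ``map`` interface) and a
hypothetical ``download_one`` that only sleeps, so it runs without the
``flags`` module.

```python
import time
from concurrent import futures


def download_one(cc):
    # Hypothetical stand-in: 'BD' is made the slowest "download" on purpose.
    time.sleep(0.03 if cc == 'BD' else 0.0)
    return cc


def download_many(cc_list):
    with futures.ThreadPoolExecutor(max_workers=3) as executor:
        res = executor.map(download_one, sorted(cc_list))
    # Leaving the with block waits for all workers; the results then come
    # out in input order even though 'BD' finished last.
    return list(res)
```
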
103
attic/futures/countries/notes.txt
Normal file
@@ -0,0 +1,103 @@
Prefixes with most flags:

M 18
S 18
B 17
C 15
T 13
G 12
A 11
L 11
K 10

There are no flags with prefix X

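A tally like the one above can be produced with collections.Counter. This is
only a sketch: prefix_counts and missing_prefixes are hypothetical helpers,
and the list of country codes would have to come from the flags/ directory
names.

```python
from collections import Counter
from string import ascii_uppercase


def prefix_counts(codes):
    """Count country codes by their first letter."""
    return Counter(code[0].upper() for code in codes)


def missing_prefixes(codes):
    """Return the letters that start no country code at all."""
    present = set(prefix_counts(codes))
    return sorted(set(ascii_uppercase) - present)
```

prefix_counts(codes).most_common(9) gives the (letter, count) pairs for the
nine most frequent prefixes.
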
Errors with threadpool:

$ python3 flags_threadpool2.py _

ZT failed: 503 - Service Temporarily Unavailable
ZU failed: 503 - Service Temporarily Unavailable
ZV failed: 503 - Service Temporarily Unavailable
ZY failed: 503 - Service Temporarily Unavailable
--------------------
24 flags downloaded.
37 not found.
615 errors.
Elapsed time: 3.86s


$ python3 flags_sequential2.py
Searching for 10 flags: BD, BR, CN, ID, IN, JP, NG, PK, RU, US
BD failed: (ProtocolError('Connection aborted.', gaierror(8, 'nodename nor servname provided, or not known')),)
--------------------
0 flag downloaded.
1 error.
Elapsed time: 0.02s
*** WARNING: 9 downloads never started! ***


194 flags downloaded.
482 not found.
Elapsed time: 683.71s

real 11m23.870s
user 0m3.214s
sys 0m0.603s


$ python3 flags2.py -a
LOCAL site: http://localhost:8001/flags
Searching for 194 flags: from AD to ZW
1 concurrent conection will be used.
--------------------
194 flags downloaded.
Elapsed time: 0.90s
(.env34) 192:countries luciano$ python3 flags2.py -e
LOCAL site: http://localhost:8001/flags
Searching for 676 flags: from AA to ZZ
1 concurrent conection will be used.
--------------------
194 flags downloaded.
482 not found.
Elapsed time: 4.71s
(.env34) 192:countries luciano$ python3 flags2.py -s remote
(.env34) 192:countries luciano$ python3 flags2.py -s remote -a -l 100
REMOTE site: http://python.pro.br/fluent/data/flags
Searching for 100 flags: from AD to LK
1 concurrent conection will be used.
--------------------
100 flags downloaded.
Elapsed time: 72.58s
(.env34) 192:countries luciano$ python3 flags2.py -s remote -e
REMOTE site: http://python.pro.br/fluent/data/flags
Searching for 676 flags: from AA to ZZ
1 concurrent conection will be used.
--------------------
194 flags downloaded.
482 not found.
Elapsed time: 436.09s
(.env34) 192:countries luciano$ python3 flags2_threadpool.py -s remote -e
REMOTE site: http://python.pro.br/fluent/data/flags
Searching for 676 flags: from AA to ZZ
30 concurrent conections will be used.
--------------------
194 flags downloaded.
482 not found.
Elapsed time: 12.32s
(.env34) 192:countries luciano$ python3 flags2_threadpool.py -s remote -e -m 100
REMOTE site: http://python.pro.br/fluent/data/flags
Searching for 676 flags: from AA to ZZ
100 concurrent conections will be used.
--------------------
89 flags downloaded.
184 not found.
403 errors.
Elapsed time: 7.62s
(.env34) 192:countries luciano$

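The numbers in these runs can be cross-checked with a little arithmetic. The
sketch below assumes that the "from AA to ZZ" search covers every two-letter
combination, and takes the two elapsed times from the remote runs with 1 and
30 connections shown above; the variable names are made up for illustration.

```python
from itertools import product
from string import ascii_uppercase

# Every two-letter code from AA to ZZ (assumed to be what the search tries).
all_codes = [a + b for a, b in product(ascii_uppercase, repeat=2)]

FLAGS_AVAILABLE = 194  # flags actually present on the server
not_found = len(all_codes) - FLAGS_AVAILABLE

# Remote search over all codes: 1 connection versus 30 connections.
sequential_time = 436.09
threadpool_time = 12.32
speedup = sequential_time / threadpool_time

print(len(all_codes), 'codes;', not_found, 'without a flag')
print('speedup with 30 connections: {:.1f}x'.format(speedup))
```
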
wait_with_progress
http://compiletoi.net/fast-scraping-in-python-with-asyncio.html

http://blog.condi.me/asynchronous-part-1/