Ecosyste.ms: OpenCollective

An open API service for software projects hosted on Open Collective.

github.com/ArchiveTeam/ludios_wpull

wpull fork with fixes and faster parsing using html5-parser; used by grab-site; should go away when wpull is similarly improved
https://github.com/ArchiveTeam/ludios_wpull

Uses only ipv4 for unit tests.

1a5503a3bbb4b56af6696473844e6b469c39af3e authored almost 11 years ago by Christopher Foo <[email protected]>
Adds a delay to GoodAppTestCase to hopefully allow Travis tests to pass.

d0f89294a952f6c664b9549b615c67f7e94251b4 authored almost 11 years ago by Christopher Foo <[email protected]>
Prefers our argparse for Python 2.6.

c6667e704ec297de42fa2ef67ff525c8a95eb705 authored almost 11 years ago by Christopher Foo <[email protected]>
Fixes setting protocol_version in unit test handler. Adds more debug logs.

0dae4e2c93e0f35fb6a155d985d0d21d30869da2 authored almost 11 years ago by Christopher Foo <[email protected]>
Basic support for HTTPS.

5da1245eeba924a975b8bd0d2f33bbc374ec627a authored almost 11 years ago by Christopher Foo <[email protected]>
Removes use of unit test debug app setting which may trigger autoreload.

9289c0b3024944e543307997e5146c2302fce6a5 authored almost 11 years ago by Christopher Foo <[email protected]>
Increases unit test timeouts for network_test.py.

8ab6685db849dc613686de55f001b70aff030858 authored almost 11 years ago by Christopher Foo <[email protected]>
Backports argparse.

62f9f6819734c1ea72d0baa72b5204602fdbcd94 authored almost 11 years ago by Christopher Foo <[email protected]>
Adds assert to Connection().

096fdb50520532e9e69a321a6646171b9d2c9509 authored almost 11 years ago by Christopher Foo <[email protected]>
Includes option sanity check for warc options.

7257af4523d73c014025e88f56a6ea1952e5721b authored almost 11 years ago by Christopher Foo <[email protected]>
Implements graceful stop.

02fe9df905fbe34c9bd9ed938dd5c20cd7f40d4d authored almost 11 years ago by Christopher Foo <[email protected]>
http.py: Implements close().

62052d61e3e484edb25fd3817549857b62048091 authored almost 11 years ago by Christopher Foo <[email protected]>
Removes printing traceback to stderr.

It's already logged.

ecc1d95401b5ab74be9b6135104c4c29a6f46b7a authored almost 11 years ago by Christopher Foo <[email protected]>
Uses concurrent unit test http server. Increases writer_test timeout.

1351cce9dec4f5bad0f2105ff140b84586d7a09b authored almost 11 years ago by Christopher Foo <[email protected]>
http.py: Adds active property to Connection. More debug statements.

4ac23af1baf4bcd84e53bd372db38d660c32f418 authored almost 11 years ago by Christopher Foo <[email protected]>
Updates backport.py with latest version 727027eadace.

655e08ae4a3ebd37921ac7936f031bbad0112ee5 authored almost 11 years ago by Christopher Foo <[email protected]>
Fixes backport/functools.py str issue in 3to2.

92de86fed96b6b15c9625605acfe79f3eea7abca authored almost 11 years ago by Christopher Foo <[email protected]>
Increases unit test timeout to 30 to account for resolver delay.

f9bf7d8200886702317d4eecaa47c9ac5be9ae7c authored almost 11 years ago by Christopher Foo <[email protected]>
Fixes Resolver to properly cache empty results.

7d4addd448c040f2e8c796fe2382c5d24c3a94a6 authored almost 11 years ago by Christopher Foo <[email protected]>
Ensures unit test http server is fully started up.

228bd44c3e5f643a783156fcbaad7cd0cd19f271 authored almost 11 years ago by Christopher Foo <[email protected]>
Adds more debug statements.

14fb8a9eb58715658877d57b2412cb9b126ce04f authored almost 11 years ago by Christopher Foo <[email protected]>
Changes all IOLoop.instance() to IOLoop.current().

80b2513b8d69c9ebf5978f0c5473a476914752b1 authored almost 11 years ago by Christopher Foo <[email protected]>
Fixes Resolver to use global cache.

adde4888550501e299d19bb2e9a39058b921a05f authored almost 11 years ago by Christopher Foo <[email protected]>
Backports functools.total_ordering.

fa3af877899001993ddf8b4127e54d739b4803c1 authored almost 11 years ago by Christopher Foo <[email protected]>
Implements --rotate-dns.

1312f75358d271985f865fbb2a454b3128d2c4a4 authored almost 11 years ago by Christopher Foo <[email protected]>
Implements dns caching.

4aa3a7b12321709b87d1c5a843d4c6bc37f61601 authored almost 11 years ago by Christopher Foo <[email protected]>
Adds cache module.

0195d92b4a6e758c8ebe07379c86819e951a3dc5 authored almost 11 years ago by Christopher Foo <[email protected]>
Fixes keep alive not working. Implements --no-http-keep-alive.

301be9891e29aca158815886a46c92a876a5be0c authored almost 11 years ago by Christopher Foo <[email protected]>
Moves readme from Markdown to RST.

This allows proper rendering in PyPI.

01196fd865cfe79cc70f6f0ab8d2d82caafd5cda authored almost 11 years ago by Christopher Foo <[email protected]>
Fixes finish time formatting.

6c50acb903f2d54dbd9c3740873a46243d12af5b authored almost 11 years ago by Christopher Foo <[email protected]>
Fixes --continue option to be grouped.

7fb246ed81681abd9703ec9d4bdb011d3e7c2d4f authored almost 11 years ago by Christopher Foo <[email protected]>
Increases the unit test connection read timeout again for Travis CI.

1355019e375fed8f0cb77ff556bbd5f18540347d authored almost 11 years ago by Christopher Foo <[email protected]>
Bumps version to 0.5a1.

9580b7564b7ed70da93af780f5ffc461d1c4961f authored almost 11 years ago by Christopher Foo <[email protected]>
Bumps version to 0.4.

3e5d706cc8e384b1123f23312893fbfcdb679297 authored almost 11 years ago by Christopher Foo <[email protected]>
Merge branch 'develop'

a40402eb43e27b791b3a25e896d0819c48e53828 authored almost 11 years ago by Christopher Foo <[email protected]>
Increases unit test connection read timeout.

bfe85542d247450515b91fb68ee0265debdfa26f authored almost 11 years ago by Christopher Foo <[email protected]>
Gets the result from future directly within wait_future().

Avoids losing exceptions.

3bf6bcc6fcb105d4f4ef1c63ef291c40037a7c88 authored almost 11 years ago by Christopher Foo <[email protected]>
Sets the unittest connection timeout values.

dd37362af1171f2ccd7c4ce7eb45b3b6131382ac authored almost 11 years ago by Christopher Foo <[email protected]>
util.py sleep(), wait_future(): Uses current ioloop instead.

69930f6e933f1b13840b582da74c6cef51625c18 authored almost 11 years ago by Christopher Foo <[email protected]>
Implements --continue.

8fb21b5a8541e21a98cfe748545b904fbf017181 authored almost 11 years ago by Christopher Foo <[email protected]>
Fixes recorders not working.

4ae2d8bb9b87fbb1e73a3209b9da441bec505e45 authored almost 11 years ago by Christopher Foo <[email protected]>
writer.py: Removes utime keyword parameter.

c1eec53e83a3bb3a641eab3d642125882d3c16b6 authored almost 11 years ago by Christopher Foo <[email protected]>
Adds http short close test.

d6696d54239faf1bd5fd8046c9fa0e40bc4f9c26 authored almost 11 years ago by Christopher Foo <[email protected]>
http.py: Catches only socket related exceptions.

8fe5329b40c2aedfc6fd46f59b62f72bfc62bf9d authored almost 11 years ago by Christopher Foo <[email protected]>
Implements --no-clobber, --timestamping.

56e9f1f7dcce81da1dc4a8c4edb05c74c2c27e0c authored almost 11 years ago by Christopher Foo <[email protected]>
Groups the options.

12a70cf43173fb32695c7c2a12dff8ed5ef9df7a authored almost 11 years ago by Christopher Foo <[email protected]>
requirements.txt: Loosen restriction on lxml.

633c912f9a0d55ab08be138a5545ceaa05f8a16f authored almost 11 years ago by Christopher Foo <[email protected]>
Fixes treatment of fatal connection errors.

35770e39b156fad65ca98d408cbd122b42804255 authored about 11 years ago by Christopher Foo <[email protected]>
Normalizes the URLs before saving in database.

146251719b17fecccc80634705efcb3ad0f12e35 authored about 11 years ago by Christopher Foo <[email protected]>
Bumps version to 0.3.2.

Merge branch 'develop'

Conflicts:
wpull/version.py

2a20addf150cfcef8c7e45f048a62ce73a987608 authored about 11 years ago by Christopher Foo <[email protected]>
network.py: Fixes typo from previous commit.

7cb8f94e569f099b19b329afcfdef244989482d3 authored about 11 years ago by Christopher Foo <[email protected]>
Edits the todo task tags.

0b992427e36769c919d4e0942523ffd313d10ca4 authored about 11 years ago by Christopher Foo <[email protected]>
database.py: Works around only first row saved.

6c034f5bc3b52945df8e05f7a2f32358241bc052 authored about 11 years ago by Christopher Foo <[email protected]>
Fixes MANIFEST.in to include correct files.

dc8331bcb069f3220ed5145cfd12d3e9c25a109a authored about 11 years ago by Christopher Foo <[email protected]>
Bumps version to 0.3.1

d1cbded15cab2a7abda6528f0888f418c9ea9acb authored about 11 years ago by Christopher Foo <[email protected]>
Fixes MANIFEST.in to include correct files.

a41f167df9abe16e34818503464c12013b4502a2 authored about 11 years ago by Christopher Foo <[email protected]>
Bumps version to 0.4a1.

e403676c4d7dfa9c1ed8bb0dfd8c76e6a780bda8 authored about 11 years ago by Christopher Foo <[email protected]>
readme: Minor tweak to archive example's log option.

ea25ca668ba5ac12a57cbac653be0a53df32c92c authored about 11 years ago by Christopher Foo <[email protected]>
Bumps version to 0.3.

Merge branch 'develop'

Conflicts:
wpull/version.py

32ea6f566071ad3410f5dbaf6b02065f8a31d093 authored about 11 years ago by Christopher Foo <[email protected]>
Implements --hostnames,--exclude-hostnames

eafbfdf19ba04b5c08e0ade3332f8120ef367e5a authored about 11 years ago by Christopher Foo <[email protected]>
WARCRecorder: Records the Python version.

91962ccf0daae6c8b0ecbd89ec53ea4334bab378 authored about 11 years ago by Christopher Foo <[email protected]>
Refactors WebProcessor to branch code to functions earlier.

9e2fc9b77cc556af28b812f27bdff1897f8ec412 authored about 11 years ago by Christopher Foo <[email protected]>
Bumps version to 0.3a1.

02d1369206f02fd347b27eac3afb68a9c5f7abfd authored about 11 years ago by Christopher Foo <[email protected]>
Merge branch 'develop'

37b7950512d4860badf64b2aae2d44258ffeb716 authored about 11 years ago by Christopher Foo <[email protected]>
readme.md: Expands archive example with --database option.

4c803ef25178e1871d03a92768aab23a2059e099 authored about 11 years ago by Christopher Foo <[email protected]>
Bumps version to 0.2.

bcd801d3b2946a36a522a67c848fd2ac78553423 authored about 11 years ago by Christopher Foo <[email protected]>
Merge branch 'develop'

24df339293f04178024c2fd3adb214715e32c657 authored about 11 years ago by Christopher Foo <[email protected]>
Fixes URLInfo scheme detection for Python 2.6.

c281a97b5b6deae6f500cb1403816cd3aff72d89 authored about 11 years ago by Christopher Foo <[email protected]>
Backports additional code for python 2.6.

caf386f2679169dc21a5c3bb1078b8e15c94a500 authored about 11 years ago by Christopher Foo <[email protected]>
Backports OrderedDict.

f8e0ec1e5f302060e04f9862f968d3044fa405fb authored about 11 years ago by Christopher Foo <[email protected]>
Fixes typo in requirements-py.txt

b771ba8c30d52a6e7da30fb8e8e7498561224cac authored about 11 years ago by Christopher Foo <[email protected]>
Works around bug in 2to3. Adds requirements-py2.txt.

c8d8b484330f4cfa71aa14199c3fc86f56b9c7ef authored about 11 years ago by Christopher Foo <[email protected]>
Fixes travis config so build fails immediately in 3to2 process.

b685f7efa08d4d20ed3e03155ddec3f09a20c7e6 authored about 11 years ago by Christopher Foo <[email protected]>
Adds missing package in setup.py.

febd3bfd20da0529ecc76325a425b7833bd87ada authored about 11 years ago by Christopher Foo <[email protected]>
Fixes shell syntax in travis build file.

62f9245b68c1343ad871d100a2a1ca5d40afd0ad authored about 11 years ago by Christopher Foo <[email protected]>
Removes lib3to2 workaround.

a8b321d85211672f838582ec6208a0e70925541f authored about 11 years ago by Christopher Foo <[email protected]>
Adds lib3to2 support via backport module.

f3ebaa0d3b23a14e904e85093e81244b22eb9d3c authored about 11 years ago by Christopher Foo <[email protected]>
Fixes broken calls. Implements --header.

1c5e2966f7d916aea67727ceb8276c86948e0111 authored about 11 years ago by Christopher Foo <[email protected]>
Bumps version to 0.2a1.

52a82cb90550da352780588cf88cf99bf554f64b authored about 11 years ago by Christopher Foo <[email protected]>
Considers app as beta quality. Bumps version to 0.1.

9e6c41958cfb2cf21a1d4e97e5229aa6da7c9b06 authored about 11 years ago by Christopher Foo <[email protected]>
Fixes missing module robotexclusionrulesparser.

c915225c04700e69f654c9a41f82502b6f4bf288 authored about 11 years ago by Christopher Foo <[email protected]>
Implements handling of robots.txt

9552bc23a000a1f27fdba90d42a9d45f6a0c58e9 authored about 11 years ago by Christopher Foo <[email protected]>
URLInfo normalizes default ports.

5cfaa3a2a8e357497b1563f3eced4340a478a510 authored about 11 years ago by Christopher Foo <[email protected]>
Fixes WARCRecorder not saving HTTP headers.

e7f08725cac515a15fa625bc5b297654c575d037 authored about 11 years ago by Christopher Foo <[email protected]>
Refactors ParentFilter. Implements --include/exclude-directories.

4c8f3bcf08ed8a99b88ba052308bb8041e29baa8 authored about 11 years ago by Christopher Foo <[email protected]>
Implements --relative.

6199b0e3de6c7fd82044756b6904504cad8079ad authored about 11 years ago by Christopher Foo <[email protected]>
Implements --accept-regex, --reject-regex

3d645da9dc9ea2cea7db027354855d371092c684 authored about 11 years ago by Christopher Foo <[email protected]>
Releases in-progress items on start up.

f36388acbdb2f210f3b0de3d7e1e3a7df5757a65 authored about 11 years ago by Christopher Foo <[email protected]>
Joins base url for CSSScraper.

cab63484e0cb02c8735632471e2fb7d7bea6e966 authored about 11 years ago by Christopher Foo <[email protected]>
Refactors HTMLScraper to use unified lxml and wget logic.

f7217bf24419ae410016655af8c26e4237f5599f authored about 11 years ago by Christopher Foo <[email protected]>
Adds CSSScraper implementation.

7f0a3def995675328725f7edcae650b0a86820a9 authored about 11 years ago by Christopher Foo <[email protected]>
Fixes referer not included in requests.

10b5692da77d56ffe29dea51ca2521c9315f2990 authored about 11 years ago by Christopher Foo <[email protected]>
Implements --database option.

49410be5e55d8df69d35ae0208dbcffabd4f47b3 authored about 11 years ago by Christopher Foo <[email protected]>
Prints statistics when finished.

6c60b55a226d95abd59f3129b5477140388675ad authored about 11 years ago by Christopher Foo <[email protected]>
Implements WARC file logging.

917b1b429845a367248575f996a9db05ee5af9fe authored about 11 years ago by Christopher Foo <[email protected]>
Comments out --background which is not yet supported.

ba78ba76a3abfab3be9952e913834dbe1a2c3a8b authored about 11 years ago by Christopher Foo <[email protected]>
Fixes WARCRecorder.

279a1cf7a3f84322dad4afeebf3bdf2d1a9d865b authored about 11 years ago by Christopher Foo <[email protected]>
Implements --max-redirect.

a61d24542f1d0a3c0272723e1afef9171b43c021 authored about 11 years ago by Christopher Foo <[email protected]>
Removes saving both http and content files.

Saving the http file is sort of redundant and may use up valuable disk
space.

3ed1d140bf10fb639ed33cc80062fdfb70bbef6d authored about 11 years ago by Christopher Foo <[email protected]>
Refactors WARCRecorder to save files in --warc-tempfile.

cc1f6343cbc485a0de1e8c51da665e181b165668 authored about 11 years ago by Christopher Foo <[email protected]>