github.com/ArchiveTeam/wpull

Wget-compatible web downloader and crawler.
https://github.com/ArchiveTeam/wpull

changelog: Add entry about CacheItem fix.

[ci skip]

4fabb9c7572d812f81ba754a611b57bb54f7c0ce authored over 10 years ago by Christopher Foo <[email protected]>
cache.cacheItem: Compare key in total ordering to avoid false equals

Closes chfoo/wpull#191

bc5c3afa40b9946ce39bb3c2001a17106d1a4936 authored over 10 years ago by Christopher Foo <[email protected]>
fuzz_fusil: Update fusil test cases to match URL warning

[ci skip]

01bfd328fa3e289bf5098e77bf3d7c5c7ae21391 authored over 10 years ago by Christopher Foo <[email protected]>
document.css,scraper.css: Reject links longer than 500 chars.

7f9a3b48f43460e0e80779254cbca91bf783fb64 authored over 10 years ago by Christopher Foo <[email protected]>
scraper.javascript: Try JSON decode before is_unlikely_text

342acf5368d0b2fc484b4cf3e5e44fb390b9ab4a authored over 10 years ago by Christopher Foo <[email protected]>
scraper.util.is_unlikely_link: Include \ and html tags.

07dff57fb02651b69f177302af65774576a56010 authored over 10 years ago by Christopher Foo <[email protected]>
ftp.client: Support fetch both NLST and LIST.

845e80c75d312ccc849f1e78c7fc33061e489bb9 authored over 10 years ago by Christopher Foo <[email protected]>
ftp.ls: Fix up parsing Unix ls where 1 space between user and size.

8d74cccaf167b4b67e59c99efec85ce90255dfb2 authored over 10 years ago by Christopher Foo <[email protected]>
body: Support passthrough __iter__()

00cbc3dfb409a2e5cdb9f0b5e42461b7dd607399 authored over 10 years ago by Christopher Foo <[email protected]>
Merge branch 'issue/23_ftp_2' into develop

c434a1c917a7954c44ae4b30e485ca76cf56e8f9 authored over 10 years ago by Christopher Foo <[email protected]>
setup.py: Include ftp.ls

13095c3573b9d6152bd5d88078dc19575c3d85d8 authored over 10 years ago by Christopher Foo <[email protected]>
doc: Update module listings.

b945f0e55d2b267f954d3c1430efff6471d48d77 authored over 10 years ago by Christopher Foo <[email protected]>
http.client: Put proxy concerns into abstract.client.

a6c11789112d7b090ffad7bce60e1e129d8392c3 authored over 10 years ago by Christopher Foo <[email protected]>
Merge branch 'temp1' into issue/23_ftp_2

Conflicts:
wpull/connection.py
wpull/http/client.py

565b1879f2c7f6d3ef7138870860c155d3da2720 authored over 10 years ago by Christopher Foo <[email protected]>
Merge branch 'develop' into temp1

Conflicts:
setup.py

36931e07e5b93ea03b1f7a2e77ba872d65945063 authored over 10 years ago by Christopher Foo <[email protected]>
scraper.html: Percent decode javascript: links

Closes chfoo/wpull#141

c319dbb829aa4e120726227f89ae69ec04e768c5 authored over 10 years ago by Christopher Foo <[email protected]>
Move DB get/add/queued/dequeued hook concerns to URLTableHookWrapper.

queued/dequeued hooks get called for all add and check outs

Closes chfoo/wpull#190

50635712863567bfc791665e7a332c5a3ccb6bb6 authored over 10 years ago by Christopher Foo <[email protected]>
database: Add wrap.URLTableHookWrapper.

69964590053398a7a3a8bd90e2e622596e994327 authored over 10 years ago by Christopher Foo <[email protected]>
engine: Poison workers if producer dies to avoid deadlock.

5edfa6876978b64f19507e0f5b803e2f0c3952eb authored over 10 years ago by Christopher Foo <[email protected]>
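
The "poison workers" idea in the commit above can be illustrated with a minimal sketch. It is not wpull's engine code: it swaps in plain threads and `queue.Queue` in place of the project's event loop, and the names `POISON`, `producer`, `worker`, and `run` are hypothetical. The point is that the producer always enqueues one sentinel per worker in a `finally` block, so workers blocked on `get()` wake up and exit even if the producer fails partway through.

```python
import queue
import threading

POISON = object()  # hypothetical sentinel telling a worker to stop


def producer(q, items, num_workers):
    """Feed items to the queue; always poison the workers on exit."""
    try:
        for item in items:
            if item is None:
                # Stand-in for any failure inside the producer (assumption).
                raise ValueError("producer failed")
            q.put(item)
    finally:
        # Runs on success and on failure, so no worker waits forever.
        for _ in range(num_workers):
            q.put(POISON)


def worker(q, results):
    """Consume items until a poison pill arrives."""
    while True:
        item = q.get()
        if item is POISON:
            break
        results.append(item * 2)


def run(items, num_workers=3):
    q = queue.Queue()
    results = []
    workers = [
        threading.Thread(target=worker, args=(q, results))
        for _ in range(num_workers)
    ]
    for w in workers:
        w.start()
    try:
        producer(q, items, num_workers)
    except ValueError:
        pass  # the producer died; the workers were still poisoned
    for w in workers:
        w.join(timeout=5)
    all_stopped = all(not w.is_alive() for w in workers)
    return sorted(results), all_stopped
```

Calling `run([1, 2, None, 3])` processes the two items queued before the simulated failure and still shuts every worker down instead of deadlocking.
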
engine,hook,process.web: Use parse_url_or_log().

131a0781f38242569671c8a39eaefb86a0e00951 authored over 10 years ago by Christopher Foo <[email protected]>
url: Add parse_url_or_log()

391e8c644df9c340a6582c699e436c9292c8f4ec authored over 10 years ago by Christopher Foo <[email protected]>
changelog: Add entry about better database performance

[ci skip]

c2310d10d80c80bc64d32c651423569952943315 authored over 10 years ago by Christopher Foo <[email protected]>
sqltable: Optimize performance (0.80s→0.03s Wow!)

f3eabbe4ce32df3395785741db469c5a4317bc67 authored over 10 years ago by Christopher Foo <[email protected]>
fixup! fixup! sqltable: Fix insert().values() syntax cause CompileError

Fix "CompileError: The 'sqlite' dialect with current database version
settings does not support ...

e8026ee55fb65c0d98606b24c92b9b5dfadded0f authored over 10 years ago by Christopher Foo <[email protected]>
fixup! sqltable: Fix insert().values() syntax cause CompileError

Fix "CompileError: The 'sqlite' dialect with current database version
settings does not support ...

6e87d7a27a5bf5b74b181aa88ab2ddab453b408c authored over 10 years ago by Christopher Foo <[email protected]>
sqltable: Fix insert().values() syntax cause CompileError

Fix "CompileError: The 'sqlite' dialect with current database version
settings does not support ...

962cdeac2b1f85db6987e0aaabff19cf96807bcb authored over 10 years ago by Christopher Foo <[email protected]>
setup.py: Update packages to include wpull.database

104a43059698ccff8ba162aea5db8020faf0fa7d authored over 10 years ago by Christopher Foo <[email protected]>
doc: Update API docs with new modules.

66c5592092949e8cb7f5749b5fa874f836c810c6 authored over 10 years ago by Christopher Foo <[email protected]>
Fix database modules import references and updated function names.

4c858a3514997f417bb4860c0530cfbf9a9952ae authored over 10 years ago by Christopher Foo <[email protected]>
database: Move into package. Rename add()->add_many(). Rm url_encoding col.

Move database module into package. Separate base tables from SQL
implementations.

Rename method...

ac53a176f4bf2c09904805cea0d47d0da84c2e38 authored over 10 years ago by Christopher Foo <[email protected]>
doc: Rename template.py->api_template.py. Walk for module names.

08e54f607876b08b53d7a7bc8bc6d8683d7b7d27 authored over 10 years ago by Christopher Foo <[email protected]>
Add --link-extractors option.

f6ff5418c692c372ee52bcd2fee7cef587d5457d authored over 10 years ago by Christopher Foo <[email protected]>
scraper: Make ElementWalker an instance

5912dc2b8c2b6a1cb30fd0c98134adfbebab4824 authored over 10 years ago by Christopher Foo <[email protected]>
changelog: Update about new URL parsing.

Closes chfoo/wpull#146 Closes chfoo/wpull#147

[ci skip]

ef556420725591556f94e98edbb1558300631b19 authored over 10 years ago by Christopher Foo <[email protected]>
url_test.py: Add benchmark test

[ci skip]

9282765825077d6a3ac9b88cd6850093bed15826 authored over 10 years ago by Christopher Foo <[email protected]>
Increase unit test timeout for pypy on Travis CI.

33fbebd611fce27215728ab1982feff83d8b0794 authored over 10 years ago by Christopher Foo <[email protected]>
url: More unit test and authority and resource delimiter fix edge cases.

5662a1a823c553edfd6c29be03d56b903df2a545 authored over 10 years ago by Christopher Foo <[email protected]>
processor.web: Update debug log found->candidate

34063bec6fb174c52244d4dc806f85feccbd7121 authored over 10 years ago by Christopher Foo <[email protected]>
Merge branch 'topic/url_parse_2' into develop

4679cc3575dfba62c1f47784460b581db4f933c8 authored over 10 years ago by Christopher Foo <[email protected]>
engine: Catch ValueError during URL parse.

Bad URLs may sneak into the database.

Closes chfoo/wpull#132

d0be248f115fcd9dd20537b6f7bcbfa2e67f9c84 authored over 10 years ago by Christopher Foo <[email protected]>
url: Reinstate normalize() convenience function.

eac0ee179af98f7ee8a085dd69f786f35bb51fb6 authored over 10 years ago by Christopher Foo <[email protected]>
url: Add some lru_cache() decorators. Optimize idna check.

965703e9deb823535ba05f6d99483318a79f74de authored over 10 years ago by Christopher Foo <[email protected]>
url: Write own parse and not use urllib.parse.urlsplit. Don't unquote.

Refactor attempt number 2.

re: chfoo/wpull#146 chfoo/wpull#147

385f0219e6ddb7495c4aa02ebce27a19a23aabd4 authored over 10 years ago by Christopher Foo <[email protected]>
Move is_likely_link() to scraper.util

eec38f615704b464ff036a1106c2f63267f853fa authored over 10 years ago by Christopher Foo <[email protected]>
app_test.py: Allow for PyPy JIT warm up time.

678ffe462fcbe07ecfc034d4ca5960012cede803 authored over 10 years ago by Christopher Foo <[email protected]>
fixup! travis: Don't install and test lua or lxml packages under pypy3

8f745a100e900445f3ae7b4b3f18181d4ee67ca3 authored over 10 years ago by Christopher Foo <[email protected]>
travis: Don't install and test lua or lxml packages under pypy3

5dcbc5d96b7366aa3d9a340004b46819fbbca877 authored over 10 years ago by Christopher Foo <[email protected]>
Support PyPy 2.3.1 with Python 3.2 implementation.

59db8f9dda8337dc2af37c0083598749828f7a49 authored over 10 years ago by Christopher Foo <[email protected]>
Add --html-parser option.

Closes chfoo/wpull#140

31f48e29fda86a79074c14ef5e4d9b1195cf2cda authored over 10 years ago by Christopher Foo <[email protected]>
setup.py/requirements.txt: Require html5lib>=0.999

8e6093663ed9d83023823bc2b31ab0669be29493 authored over 10 years ago by Christopher Foo <[email protected]>
converter/htmlparse: Fix unhashable dicts with FrozenDict

01459adb4a6088a05be1280b977b3ee32635361f authored over 10 years ago by Christopher Foo <[email protected]>
collections: Add FrozenDict

77b04094051ebbfdf3f94b01b0812112f6e66cd7 authored over 10 years ago by Christopher Foo <[email protected]>
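
For context, a frozen dictionary is a read-only mapping that can be hashed, which is what lets it serve as a dict key or set member where a plain `dict` raises `TypeError`. The sketch below is one common way to write such a class, assuming hashable values; it is not taken from wpull's `collections` module, and its internals (`_data`, `_hash`) are illustrative.

```python
from collections.abc import Mapping


class FrozenDict(Mapping):
    """Immutable, hashable mapping (illustrative sketch, not wpull's code)."""

    def __init__(self, *args, **kwargs):
        self._data = dict(*args, **kwargs)
        self._hash = None  # computed lazily on first hash() call

    def __getitem__(self, key):
        return self._data[key]

    def __iter__(self):
        return iter(self._data)

    def __len__(self):
        return len(self._data)

    def __hash__(self):
        # Order-independent hash over the items; values must be hashable.
        if self._hash is None:
            self._hash = hash(frozenset(self._data.items()))
        return self._hash
```

Because `Mapping` supplies `__eq__` but no `__setitem__`, two instances with equal items compare equal and hash alike, while item assignment fails with `TypeError`.
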
document,scraper,converter: Add HTML5LibHTMLParser.

4049e5bb7f6e6a65c8810f78912dca4dd3cffabc authored over 10 years ago by Christopher Foo <[email protected]>
Update docs for document & scraper abstraction refactor changes.

84fa7f3f90aacdc20dbd9d5dc59bb8fc8c815278 authored over 10 years ago by Christopher Foo <[email protected]>
Fix up builder and convert to match document/scraper changes.

ab74368462526f2c65a1017cddcd5ae8d3871072 authored over 10 years ago by Christopher Foo <[email protected]>
scraper: Follow document reader abstractions.

7699655fc307a6868fd86f55784cefe06d3096a1 authored over 10 years ago by Christopher Foo <[email protected]>
document: More abstractions. Put lxml concerns into document.htmlparse package

24b3c1514b15645ece9b3764a4f75482fbf90762 authored over 10 years ago by Christopher Foo <[email protected]>
Add regexstream module.

ab1454d47f7baa54ddcb83057500290517a9f06b authored over 10 years ago by Christopher Foo <[email protected]>
builder: Fix missing ProxyAdapter. Fix unsafe option docstring

re: d5857eac21aa3e4fc0a4bae514afd68189c17a30#commitcomment-7711092

5df18d47a7deec4f352e209af3926612a61592da authored over 10 years ago by Christopher Foo <[email protected]>
processor.web: Refactor url filter and robots.txt concerns into processor.rule

8d4eac0fffe5a67f298d606595ab48eeab846884 authored over 10 years ago by Christopher Foo <[email protected]>
hook: Drop support for ver 1. Shove lua stuff into adapters in _luahook.py

548c186833cde3c2122968e1b7e96bbc9db4e948 authored over 10 years ago by Christopher Foo <[email protected]>
changelog: Add entry for warc-move fix.

Re: chfoo/wpull#135

caf966a30a41bb55849c12322943295b86630603 authored over 10 years ago by Christopher Foo <[email protected]>
Make --warc-move work

cd19a9b950ba4314d89ce0ea55bfac3499aea42e authored over 10 years ago by Ivan Kozik <[email protected]>
ftp.ls: Implement unix ls parsing.

078c558fb6bece98a47da5acb199eb39a8033e39 authored over 10 years ago by Christopher Foo <[email protected]>
ftp.ls.date: Implement internationalized date parsing for top 11 langs.

e4d250d338070156ae61ccfc2b6d4cbe8cf99f57 authored over 10 years ago by Christopher Foo <[email protected]>
Bump version to 0.1001a1

[ci skip]

da4653c850d265ae1ec3c32e37831dd8fb66f732 authored over 10 years ago by Christopher Foo <[email protected]>
Add --ignore-fatal-errors option.

d5857eac21aa3e4fc0a4bae514afd68189c17a30 authored over 10 years ago by Christopher Foo <[email protected]>
Don't handle StopIteration.

ede5358ef4d6a4226880e5c5ca4941ef6dd01ea8 authored over 10 years ago by Christopher Foo <[email protected]>
builder: Capture warnings if INFO or lower.

7e60223a4a0f4ce707b99a8d767f703a1fcfaeea authored over 10 years ago by Christopher Foo <[email protected]>
Merge branch 'issue/41-proxy_support' into develop

Conflicts:
doc/changelog.rst
wpull/builder.py

63257d9326f46e9a673d47d3955f64674c50144d authored over 10 years ago by Christopher Foo <[email protected]>
Support proxy authentication.

Closes chfoo/wpull#41

893c95dc75e570d33f5ea8d208860e31bb0c141c authored over 10 years ago by Christopher Foo <[email protected]>
Basic HTTP proxy support.

RE: chfoo/wpull#41

9653333f81c668dd39d7c1ca5147d23e5fb2f826 authored over 10 years ago by Christopher Foo <[email protected]>
builder: Warn unsafe/silly options last to fix not being logged.

a3ae4ef5b00eabf4867bfd46adbe9f0c5caa6f12 authored over 10 years ago by Christopher Foo <[email protected]>
Update docs and setup.py for document and scraper packages.

a90d3e548b3de98e54f018c45682abafeedee1ee authored over 10 years ago by Christopher Foo <[email protected]>
Fix modules referencing document and scraper move.

7e944bfbac2f1e7e757ba02d30b13413e733ad17 authored over 10 years ago by Christopher Foo <[email protected]>
Move scraper module into package.

475dfe4062d15b9b8f793f46cb0bfab6c3aea159 authored over 10 years ago by Christopher Foo <[email protected]>
Move document module into package.

fba668ab67ddad10acce0a3e3f05f339fa8a3aff authored over 10 years ago by Christopher Foo <[email protected]>
WIP proxy support

re: chfoo/wpull#41

[ci skip]

c5a0d689af4dc0981ec3864fa4c23e016dbe9966 authored over 10 years ago by Christopher Foo <[email protected]>
connection.py: Assert port is int.

192ecc64bc0487519b64de0a22b046122c1ce0c9 authored over 10 years ago by Christopher Foo <[email protected]>
factory: Raise error if new() is called more than once per class

0b4896b162523427bfeed4cbc2683fc81b5036d2 authored over 10 years ago by Christopher Foo <[email protected]>
ftp.ls: WIP Stateful date time heuristics for 12/24 period and DMY/MDY.

fea819d5c296d91a7656d68ce6fbaf1f7b9c028e authored over 10 years ago by Christopher Foo <[email protected]>
document: Remove to_str() used for Python 2.

5e24508c61a012a53cf4931880b91875e0d3338f authored over 10 years ago by Christopher Foo <[email protected]>
dns: Use repr() instead of coerce_str_to_ascii()

92d67b43e4bd8ee6a8c927868d614f31debb9b61 authored over 10 years ago by Christopher Foo <[email protected]>
Add assert messages to some assert statements.

fa19c68c7514909becebfb1d1a51af994b94642c authored over 10 years ago by Christopher Foo <[email protected]>
fuzz_fusil: Catch socket.error

re: https://bitbucket.org/haypo/fusil/issue/3/serverclientclose-raises-errno-107

[ci skip]

9ee3da30e52332d6aaec58671de8969ea8d794fd authored over 10 years ago by Christopher Foo <[email protected]>
sys.setdlopenflags doesn't exist on PyPy, so catch AttributeError

0aede95917ac11d8ddbfa8efce008fa5a22837d6 authored over 10 years ago by Ivan Kozik <[email protected]>
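
The guard described in the commit above could look roughly like this. `sys.setdlopenflags` is a real CPython function on Unix platforms, but the wrapper name `set_dlopen_flags_safely` is hypothetical and the sketch is not the project's actual code.

```python
import sys


def set_dlopen_flags_safely(flags):
    """Set dlopen flags when the interpreter supports it.

    Returns True if the flags were applied, or False if
    sys.setdlopenflags is missing on this interpreter or platform.
    """
    try:
        sys.setdlopenflags(flags)
    except AttributeError:
        return False
    return True
```
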
Bump version 0.1000

152b186cb20cdf05c3d2c4a766dc6dccf79618de authored over 10 years ago by Christopher Foo <[email protected]>
Merge branch 'develop'

9f1930fa7093104738cc0f3ad01ec8dcc43855fc authored over 10 years ago by Christopher Foo <[email protected]>
changelog: Put 0.1000 release date.

a09aefa133ba8073ad06efb93dc213624dbb7434 authored over 10 years ago by Christopher Foo <[email protected]>
requirements.txt: Pin trollius to 1.0.2 dev on bitbucket.

2a9cee72d00541ef276dd9ec2a16eb673b8082e5 authored over 10 years ago by Christopher Foo <[email protected]>
Move list of dependencies from readme to docs for conciseness.

Clarify supported platform and Python versions.

[ci skip]

b9bf650e8716410e7d81532af58b6a28abda02c8 authored over 10 years ago by Christopher Foo <[email protected]>
string: Normalize codec name before check for ASCII.

Closes chfoo/wpull#184

86d83511caeaad4cae7b006a5143594e1de03811 authored over 10 years ago by Christopher Foo <[email protected]>
connection: Comment out host pool count assert.

re: chfoo/wpull#182

56a6cdf4a6ee44e770064ae6ba27a267b459f539 authored over 10 years ago by Christopher Foo <[email protected]>
Add WIP LIST parser.

[ci skip]

120e64630be3e33d7a3e7e9c778a882378cf16b4 authored over 10 years ago by Christopher Foo <[email protected]>
fixup! WIP FTP MLSD support.

8b3cadcb981135ecf55714e33337e261707c038a authored over 10 years ago by Christopher Foo <[email protected]>
WIP FTP MLSD support.

f8a6a109b026a13b014d0ed79bcda79223e3c584 authored over 10 years ago by Christopher Foo <[email protected]>
fixup! http.stream: Bound max size of headers to 32 KB.

[ci skip]

c15b4a7b9ee019fe827023b338479e2044cf7463 authored over 10 years ago by Christopher Foo <[email protected]>
http.stream: Bound max size of headers to 32 KB.

7cb9bc2f01d9f5974f8947b046b214b29065cab6 authored over 10 years ago by Christopher Foo <[email protected]>
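
A bounded header read of the kind this commit describes might be sketched as follows. The 32 KB limit comes from the commit message. Everything else is an assumption for illustration: the function name `read_header_block`, the use of a blocking file-like stream, and raising `ValueError` on overflow. wpull's `http.stream` module may do this differently.

```python
import io

MAX_HEADER_SIZE = 32 * 1024  # 32 KB, per the commit message


def read_header_block(stream, max_size=MAX_HEADER_SIZE):
    """Read lines up to and including the blank line that ends the headers.

    Raises ValueError once more than max_size bytes have been read, so a
    server sending an endless header cannot exhaust memory.
    """
    data = bytearray()
    while True:
        # Ask for at most one byte past the remaining budget, so an
        # over-long line is detected without being read in full.
        line = stream.readline(max_size - len(data) + 1)
        if not line:
            break  # EOF before the blank line
        data += line
        if len(data) > max_size:
            raise ValueError("HTTP header block exceeds size limit")
        if line in (b"\r\n", b"\n"):
            break  # end of the header block
    return bytes(data)
```

For example, `read_header_block(io.BytesIO(b"HTTP/1.1 200 OK\r\nA: b\r\n\r\nbody"))` returns the status line and headers and leaves `body` unread.
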
http.stream: Handle case where HTTP header is missing.

Closes chfoo/wpull#181

0230c646b51b88cff0c5098480330b2110bc3b39 authored over 10 years ago by Christopher Foo <[email protected]>
connection.ConnectionPool: Fix race condition in clean/check_in/check_out

Closes chfoo/wpull#179

a32ed55706089ef08ea07ece77252137aa13af99 authored over 10 years ago by Christopher Foo <[email protected]>