Ecosyste.ms: OpenCollective

An open API service for software projects hosted on Open Collective.

github.com/ArchiveTeam/ludios_wpull

wpull fork with fixes and faster parsing using html5-parser; used by grab-site; should go away when wpull is similarly improved
https://github.com/ArchiveTeam/ludios_wpull

scraper.javascript: Try JSON decode before is_unlikely_text

342acf5368d0b2fc484b4cf3e5e44fb390b9ab4a authored over 10 years ago
scraper.util.is_unlikely_link: Include \ and html tags.

07dff57fb02651b69f177302af65774576a56010 authored over 10 years ago
ftp.client: Support fetch both NLST and LIST.

845e80c75d312ccc849f1e78c7fc33061e489bb9 authored over 10 years ago
ftp.ls: Fix up parsing Unix ls where 1 space between user and size.

8d74cccaf167b4b67e59c99efec85ce90255dfb2 authored over 10 years ago
body: Support passthrough __iter__()

00cbc3dfb409a2e5cdb9f0b5e42461b7dd607399 authored over 10 years ago
Merge branch 'issue/23_ftp_2' into develop

c434a1c917a7954c44ae4b30e485ca76cf56e8f9 authored over 10 years ago
setup.py: Include ftp.ls

13095c3573b9d6152bd5d88078dc19575c3d85d8 authored over 10 years ago
doc: Update module listings.

b945f0e55d2b267f954d3c1430efff6471d48d77 authored over 10 years ago
http.client: Put proxy concerns into abstract.client.

a6c11789112d7b090ffad7bce60e1e129d8392c3 authored over 10 years ago
Merge branch 'temp1' into issue/23_ftp_2

Conflicts:
wpull/connection.py
wpull/http/client.py

565b1879f2c7f6d3ef7138870860c155d3da2720 authored over 10 years ago
Merge branch 'develop' into temp1

Conflicts:
setup.py

36931e07e5b93ea03b1f7a2e77ba872d65945063 authored over 10 years ago
scraper.html: Percent decode javascript: links

Closes chfoo/wpull#141

c319dbb829aa4e120726227f89ae69ec04e768c5 authored over 10 years ago
Move DB get/add/queued/dequeued hook concerns to URLTableHookWrapper.

queued/dequeued hooks get called for all add and check outs

Closes chfoo/wpull#190

50635712863567bfc791665e7a332c5a3ccb6bb6 authored over 10 years ago
database: Add wrap.URLTableHookWrapper.

69964590053398a7a3a8bd90e2e622596e994327 authored over 10 years ago
engine: Poison workers if producer dies to avoid deadlock.

5edfa6876978b64f19507e0f5b803e2f0c3952eb authored over 10 years ago
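
The "poison workers" idea in the commit above can be sketched as follows. This is a minimal illustration of the general poison-pill pattern with asyncio, not wpull's engine code; the names `POISON`, `producer`, `worker`, and `run` are hypothetical.

```python
import asyncio

POISON = object()  # hypothetical sentinel that tells a worker to stop


async def producer(queue, items, fail_at=None):
    # Feed items to the workers; optionally simulate a crash part-way through.
    for index, item in enumerate(items):
        if index == fail_at:
            raise RuntimeError("producer died")
        await queue.put(item)


async def worker(queue, results):
    # Consume items until the poison sentinel arrives.
    while True:
        item = await queue.get()
        if item is POISON:
            return
        results.append(item)


async def run(items, num_workers=2, fail_at=None):
    queue = asyncio.Queue()
    results = []
    workers = [
        asyncio.ensure_future(worker(queue, results))
        for _ in range(num_workers)
    ]
    error = None
    try:
        await producer(queue, items, fail_at)
    except RuntimeError as exc:
        error = exc
    finally:
        # Whether the producer finished or died, every worker gets one
        # sentinel, so none of them waits forever on an empty queue.
        for _ in workers:
            await queue.put(POISON)
    await asyncio.gather(*workers)
    return results, error
```

Without the `finally` block, a producer failure would leave each worker blocked in `queue.get()`, which is the kind of deadlock the commit title describes.
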
engine,hook,process.web: Use parse_url_or_log().

131a0781f38242569671c8a39eaefb86a0e00951 authored over 10 years ago
url: Add parse_url_or_log()

391e8c644df9c340a6582c699e436c9292c8f4ec authored over 10 years ago
changelog: Add entry about better database performance

[ci skip]

c2310d10d80c80bc64d32c651423569952943315 authored over 10 years ago
sqltable: Optimize performance (0.80s→0.03s Wow!)

f3eabbe4ce32df3395785741db469c5a4317bc67 authored over 10 years ago
fixup! fixup! sqltable: Fix insert().values() syntax cause CompileError

Fix "CompileError: The 'sqlite' dialect with current database version
settings does not support ...

e8026ee55fb65c0d98606b24c92b9b5dfadded0f authored over 10 years ago
fixup! sqltable: Fix insert().values() syntax cause CompileError

Fix "CompileError: The 'sqlite' dialect with current database version
settings does not support ...

6e87d7a27a5bf5b74b181aa88ab2ddab453b408c authored over 10 years ago
sqltable: Fix insert().values() syntax cause CompileError

Fix "CompileError: The 'sqlite' dialect with current database version
settings does not support ...

962cdeac2b1f85db6987e0aaabff19cf96807bcb authored over 10 years ago
setup.py: Update packages to include wpull.database

104a43059698ccff8ba162aea5db8020faf0fa7d authored over 10 years ago
doc: Update API docs with new modules.

66c5592092949e8cb7f5749b5fa874f836c810c6 authored over 10 years ago
Fix database modules import references and updated function names.

4c858a3514997f417bb4860c0530cfbf9a9952ae authored over 10 years ago
database: Move into package. Rename add()->add_many(). Rm url_encoding col.

Move database module into package. Separate base tables from SQL
implementations.

Rename method...

ac53a176f4bf2c09904805cea0d47d0da84c2e38 authored over 10 years ago
doc: Rename template.py->api_template.py. Walk for module names.

08e54f607876b08b53d7a7bc8bc6d8683d7b7d27 authored over 10 years ago
Add --link-extractors option.

f6ff5418c692c372ee52bcd2fee7cef587d5457d authored over 10 years ago
scraper: Make ElementWalker an instance

5912dc2b8c2b6a1cb30fd0c98134adfbebab4824 authored over 10 years ago
changelog: Update about new URL parsing.

Closes chfoo/wpull#146 Closes chfoo/wpull#147

[ci skip]

ef556420725591556f94e98edbb1558300631b19 authored over 10 years ago
url_test.py: Add benchmark test

[ci skip]

9282765825077d6a3ac9b88cd6850093bed15826 authored over 10 years ago
Increase unit test timeout for pypy on Travis CI.

33fbebd611fce27215728ab1982feff83d8b0794 authored over 10 years ago
url: More unit test and authority and resource delimiter fix edge cases.

5662a1a823c553edfd6c29be03d56b903df2a545 authored over 10 years ago
processor.web: Update debug log found->candidate

34063bec6fb174c52244d4dc806f85feccbd7121 authored over 10 years ago
Merge branch 'topic/url_parse_2' into develop

4679cc3575dfba62c1f47784460b581db4f933c8 authored over 10 years ago
engine: Catch ValueError during URL parse.

Bad URLs may sneak into the database.

Closes chfoo/wpull#132

d0be248f115fcd9dd20537b6f7bcbfa2e67f9c84 authored over 10 years ago
url: Reinstate normalize() convenience function.

eac0ee179af98f7ee8a085dd69f786f35bb51fb6 authored over 10 years ago
url: Add some lru_cache() decorators. Optimize idna check.

965703e9deb823535ba05f6d99483318a79f74de authored over 10 years ago
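
For illustration, caching a pure URL helper with `functools.lru_cache` might look like the sketch below. The function `normalize_hostname` and the ASCII fast path are assumptions made for this example, not code taken from wpull's url module.

```python
import functools


@functools.lru_cache(maxsize=1024)
def normalize_hostname(hostname):
    # Hypothetical helper: lowercase a hostname and IDNA-encode it only when
    # it contains non-ASCII characters (str.isascii() needs Python 3.7+).
    if hostname.isascii():
        return hostname.lower()
    return hostname.encode("idna").decode("ascii").lower()
```

Crawlers see the same hostnames repeatedly, so a repeated call returns the cached result; `normalize_hostname.cache_info()` reports the hit and miss counts.
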
url: Write own parse and not use urllib.parse.urlsplit. Don't unquote.

Refactor attempt number 2.

re: chfoo/wpull#146 chfoo/wpull#147

385f0219e6ddb7495c4aa02ebce27a19a23aabd4 authored over 10 years ago
Move is_likely_link() to scraper.util

eec38f615704b464ff036a1106c2f63267f853fa authored over 10 years ago
app_test.py: Allow for PyPy JIT warm up time.

678ffe462fcbe07ecfc034d4ca5960012cede803 authored over 10 years ago
fixup! travis: Don't install and test lua or lxml packages under pypy3

8f745a100e900445f3ae7b4b3f18181d4ee67ca3 authored over 10 years ago
travis: Don't install and test lua or lxml packages under pypy3

5dcbc5d96b7366aa3d9a340004b46819fbbca877 authored over 10 years ago
Support PyPy 2.3.1 with Python 3.2 implementation.

59db8f9dda8337dc2af37c0083598749828f7a49 authored over 10 years ago
Add --html-parser option.

Closes chfoo/wpull#140

31f48e29fda86a79074c14ef5e4d9b1195cf2cda authored over 10 years ago
setup.py/requirements.txt: Require html5lib>=0.999

8e6093663ed9d83023823bc2b31ab0669be29493 authored over 10 years ago
converter/htmlparse: Fix unhashable dicts with FrozenDict

01459adb4a6088a05be1280b977b3ee32635361f authored over 10 years ago
collections: Add FrozenDict

77b04094051ebbfdf3f94b01b0812112f6e66cd7 authored over 10 years ago
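
A hashable, read-only mapping of the kind the two commits above refer to could be sketched like this. It is an assumption about the general shape of such a class, not the FrozenDict that wpull ships.

```python
from collections.abc import Mapping


class FrozenDict(Mapping):
    """Read-only mapping that can be hashed (illustrative sketch only)."""

    def __init__(self, *args, **kwargs):
        self._data = dict(*args, **kwargs)
        self._hash = None

    def __getitem__(self, key):
        return self._data[key]

    def __iter__(self):
        return iter(self._data)

    def __len__(self):
        return len(self._data)

    def __hash__(self):
        # Computed lazily; requires every value to be hashable as well.
        if self._hash is None:
            self._hash = hash(frozenset(self._data.items()))
        return self._hash
```

Because `Mapping` supplies equality but no mutation methods, two instances with the same items compare equal, hash equally, and can be placed in a set or used as dictionary keys, which a plain `dict` cannot.
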
document,scraper,converter: Add HTML5LibHTMLParser.

4049e5bb7f6e6a65c8810f78912dca4dd3cffabc authored over 10 years ago
Update docs for document & scraper abstraction refactor changes.

84fa7f3f90aacdc20dbd9d5dc59bb8fc8c815278 authored over 10 years ago
Fix up builder and convert to match document/scraper changes.

ab74368462526f2c65a1017cddcd5ae8d3871072 authored over 10 years ago
scraper: Follow document reader abstractions.

7699655fc307a6868fd86f55784cefe06d3096a1 authored over 10 years ago
document: More abstractions. Put lxml concerns into document.htmlparse package

24b3c1514b15645ece9b3764a4f75482fbf90762 authored over 10 years ago
Add regexstream module.

ab1454d47f7baa54ddcb83057500290517a9f06b authored over 10 years ago
builder: Fix missing ProxyAdapter. Fix unsafe option docstring

re: d5857eac21aa3e4fc0a4bae514afd68189c17a30#commitcomment-7711092

5df18d47a7deec4f352e209af3926612a61592da authored over 10 years ago
processor.web: Refactor url filter and robots.txt concerns into processor.rule

8d4eac0fffe5a67f298d606595ab48eeab846884 authored over 10 years ago
hook: Drop support for ver 1. Shove lua stuff into adapters in _luahook.py

548c186833cde3c2122968e1b7e96bbc9db4e948 authored over 10 years ago
changelog: Add entry for warc-move fix.

Re: chfoo/wpull#135

caf966a30a41bb55849c12322943295b86630603 authored over 10 years ago
Make --warc-move work

cd19a9b950ba4314d89ce0ea55bfac3499aea42e authored over 10 years ago
ftp.ls: Implement unix ls parsing.

078c558fb6bece98a47da5acb199eb39a8033e39 authored over 10 years ago
ftp.ls.date: Implement internationalized date parsing for top 11 langs.

e4d250d338070156ae61ccfc2b6d4cbe8cf99f57 authored over 10 years ago
Bump version to 0.1001a1

[ci skip]

da4653c850d265ae1ec3c32e37831dd8fb66f732 authored over 10 years ago
Add --ignore-fatal-errors option.

d5857eac21aa3e4fc0a4bae514afd68189c17a30 authored over 10 years ago
Don't handle StopIteration.

ede5358ef4d6a4226880e5c5ca4941ef6dd01ea8 authored over 10 years ago
builder: Capture warnings if INFO or lower.

7e60223a4a0f4ce707b99a8d767f703a1fcfaeea authored over 10 years ago
Merge branch 'issue/41-proxy_support' into develop

Conflicts:
doc/changelog.rst
wpull/builder.py

63257d9326f46e9a673d47d3955f64674c50144d authored over 10 years ago
Support proxy authentication.

Closes chfoo/wpull#41

893c95dc75e570d33f5ea8d208860e31bb0c141c authored over 10 years ago
Basic HTTP proxy support.

RE: chfoo/wpull#41

9653333f81c668dd39d7c1ca5147d23e5fb2f826 authored over 10 years ago
builder: Warn unsafe/silly options last to fix not being logged.

a3ae4ef5b00eabf4867bfd46adbe9f0c5caa6f12 authored over 10 years ago
Update docs and setup.py for document and scraper packages.

a90d3e548b3de98e54f018c45682abafeedee1ee authored over 10 years ago
Fix modules referencing document and scraper move.

7e944bfbac2f1e7e757ba02d30b13413e733ad17 authored over 10 years ago
Move scraper module into package.

475dfe4062d15b9b8f793f46cb0bfab6c3aea159 authored over 10 years ago
Move document module into package.

fba668ab67ddad10acce0a3e3f05f339fa8a3aff authored over 10 years ago
WIP proxy support

re: chfoo/wpull#41

[ci skip]

c5a0d689af4dc0981ec3864fa4c23e016dbe9966 authored over 10 years ago
connection.py: Assert port is int.

192ecc64bc0487519b64de0a22b046122c1ce0c9 authored over 10 years ago
factory: Raise error if new() is called more than once per class

0b4896b162523427bfeed4cbc2683fc81b5036d2 authored over 10 years ago
ftp.ls: WIP Stateful date time heuristics for 12/24 period and DMY/MDY.

fea819d5c296d91a7656d68ce6fbaf1f7b9c028e authored over 10 years ago
document: Remove to_str() used for Python 2.

5e24508c61a012a53cf4931880b91875e0d3338f authored over 10 years ago
dns: Use repr() instead of coerce_str_to_ascii()

92d67b43e4bd8ee6a8c927868d614f31debb9b61 authored over 10 years ago
Add assert messages to some assert statements.

fa19c68c7514909becebfb1d1a51af994b94642c authored over 10 years ago
fuzz_fusil: Catch socket.error

re: https://bitbucket.org/haypo/fusil/issue/3/serverclientclose-raises-errno-107

[ci skip]

9ee3da30e52332d6aaec58671de8969ea8d794fd authored over 10 years ago
sys.setdlopenflags doesn't exist on PyPy, so catch AttributeError

0aede95917ac11d8ddbfa8efce008fa5a22837d6 authored over 10 years ago
Bump version 0.1000

152b186cb20cdf05c3d2c4a766dc6dccf79618de authored over 10 years ago
Merge branch 'develop'

9f1930fa7093104738cc0f3ad01ec8dcc43855fc authored over 10 years ago
changelog: Put 0.1000 release date.

a09aefa133ba8073ad06efb93dc213624dbb7434 authored over 10 years ago
requirements.txt: Pin trollius to 1.0.2 dev on bitbucket.

2a9cee72d00541ef276dd9ec2a16eb673b8082e5 authored over 10 years ago
Move list of dependencies from readme to docs for conciseness.

Clarify supported platform and Python versions.

[ci skip]

b9bf650e8716410e7d81532af58b6a28abda02c8 authored over 10 years ago
string: Normalize codec name before check for ASCII.

Closes chfoo/wpull#184

86d83511caeaad4cae7b006a5143594e1de03811 authored over 10 years ago
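
One way to normalize a codec name before comparing it with ASCII is to resolve it through the standard `codecs` registry, as in this sketch. The helper name `is_ascii_codec` is hypothetical and the approach is an assumption, not necessarily what the commit does.

```python
import codecs


def is_ascii_codec(name):
    # Resolve aliases such as "US-ASCII" to the canonical codec name
    # before comparing; unknown codec names are treated as not ASCII.
    try:
        return codecs.lookup(name).name == "ascii"
    except LookupError:
        return False
```

A plain string comparison against "ascii" would miss aliases and differently cased spellings that servers commonly send.
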
connection: Comment out host pool count assert.

re: chfoo/wpull#182

56a6cdf4a6ee44e770064ae6ba27a267b459f539 authored over 10 years ago
Add WIP LIST parser.

[ci skip]

120e64630be3e33d7a3e7e9c778a882378cf16b4 authored over 10 years ago
fixup! WIP FTP MLSD support.

8b3cadcb981135ecf55714e33337e261707c038a authored over 10 years ago
WIP FTP MLSD support.

f8a6a109b026a13b014d0ed79bcda79223e3c584 authored over 10 years ago
fixup! http.stream: Bound max size of headers to 32 KB.

[ci skip]

c15b4a7b9ee019fe827023b338479e2044cf7463 authored over 10 years ago
http.stream: Bound max size of headers to 32 KB.

7cb9bc2f01d9f5974f8947b046b214b29065cab6 authored over 10 years ago
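
A bounded header read can be sketched as below. Only the 32 KB figure comes from the commit title; the function `read_header_block`, the constant name, and the use of a blocking file-like stream are assumptions for illustration.

```python
import io

MAX_HEADER_SIZE = 32 * 1024  # 32 KB, the limit named in the commit title


def read_header_block(stream, max_size=MAX_HEADER_SIZE):
    # Read lines up to and including the blank line that ends the headers,
    # refusing to buffer more than max_size bytes in total.
    chunks = []
    total = 0
    while True:
        # Ask for one byte more than the remaining budget so that an
        # oversized line is detected without reading it completely.
        line = stream.readline(max_size - total + 1)
        if not line:
            raise ValueError("stream ended before the header block was complete")
        total += len(line)
        if total > max_size:
            raise ValueError("header block exceeds the size limit")
        chunks.append(line)
        if line in (b"\r\n", b"\n"):
            return b"".join(chunks)
```

The cap keeps a misbehaving or hostile server from making the client buffer an unbounded amount of header data.
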
http.stream: Handle case where HTTP header is missing.

Closes chfoo/wpull#181

0230c646b51b88cff0c5098480330b2110bc3b39 authored over 10 years ago
connection.ConnectionPool: Fix race condition in clean/check_in/check_out

Closes chfoo/wpull#179

a32ed55706089ef08ea07ece77252137aa13af99 authored over 10 years ago
testing.ftp: Add get_url(). Fix unsafe-thread writer.close()

99d0d076963198ded919c7bfb00ab5cc89d25344 authored over 10 years ago
WIP Add FTP client

ef1e0bf015541d8943cdc8df6da19a2bcf730325 authored over 10 years ago
fixup! Abstract client and client session.

b291bdb14ec1c5f34430d756de024be28e2d1ce5 authored over 10 years ago
connection.ConnectionPool.session: Update docstring.

[ci skip]

0bd2577ab43a1c9d4898d35182a2f4fc4d54bb50 authored over 10 years ago