Ecosyste.ms: OpenCollective

An open API service for software projects hosted on Open Collective.

github.com/ArchiveTeam/ludios_wpull

wpull fork with fixes and faster parsing using html5-parser; used by grab-site; should go away when wpull is similarly improved
https://github.com/ArchiveTeam/ludios_wpull

util.py: Adds parse_iso8601_str().

7dd4b9576a3cc46221838ca5049c9fdaa5cdfa9e authored almost 11 years ago by Christopher Foo <[email protected]>
Moves recorder.WARCRecord to warc module. Adds some docstrings.

bc1cadf0f54a01165d16ebb97f0209c5877c1089 authored almost 11 years ago by Christopher Foo <[email protected]>
Fixes description of --delete-after to be more accurate.

b09aab59b6d7af490cd49423df9a63837ace91d4 authored almost 11 years ago by Christopher Foo <[email protected]>
Implements --no-warc-digests.

1f35ecc70497b94ce0f0aade644fa42c998bb1bb authored almost 11 years ago by Christopher Foo <[email protected]>
fuzz_fusil/runner.py: Supports writing gzip encoding.

20865ed0f6cf94756ca18c8d14b6f05b00bf6c7f authored almost 11 years ago by Christopher Foo <[email protected]>
Tweaks Fusil fuzz tester to be more aggressive; uses random content types.

7390f8626560d4bf7a5090cdca6e2f17c2d27386 authored almost 11 years ago by Christopher Foo <[email protected]>
doc/api.rst: Notes about usage of Tornado coroutines.

7088321acb810d8f114d2d7dc8f662044545e429 authored almost 11 years ago by Christopher Foo <[email protected]>
Merge branch 'topic/engine_refactor' into develop

30dc1b834c296077e0da916df581458e3188fa37 authored almost 11 years ago by Christopher Foo <[email protected]>
http.Connection: Fixes url_info not set on response.

e32fc0e74655eb4aabdfc166f8d76e277e0b12db authored almost 11 years ago by Christopher Foo <[email protected]>
processor.py: Fixes items not processed if single redirect request.

3f8d55a02e946375490f716c890859840d4ef648 authored almost 11 years ago by Christopher Foo <[email protected]>
web.RichClientSession: Joins base URL for next_location. Fixes unit tests.

5646caf332a4a48a1ec6a0f86e243e28debacc42 authored almost 11 years ago by Christopher Foo <[email protected]>
http_test.py: Handles possible closed connection in test_buffer_overflow.

ce971722a407e053901563fb06ab437db95da0bd authored almost 11 years ago by Christopher Foo <[email protected]>
changelog: Adds entry about API Engine and Processor refactor.

11bb618da3c5f97ca9b9c44ea0d2e095a5872f7c authored almost 11 years ago by Christopher Foo <[email protected]>
conversation.py: Adds some docstrings.

732560a32d452c5e6580c5d6068e113fdff0cc34 authored almost 11 years ago by Christopher Foo <[email protected]>
Refactors Engine and Processor interaction to be asynchronous.

Moves request client concern out of Engine and into Processor.
Engine handles stats start and stop....

5a3f80a3cea15e7b95bcd7ebb53c2196c0b64cd6 authored almost 11 years ago by Christopher Foo <[email protected]>
Adds docs for web module.

07996c7f916aaef0177ca7c8e2e51598ef9612ec authored almost 11 years ago by Christopher Foo <[email protected]>
Moves http.RedirectTracker, http.RichClient to web module.

Removes RobotsTxtRichClientSessionMixin.

91c8f09bd349ba8a51c67273f4d67b2e19fc7ef8 authored almost 11 years ago by Christopher Foo <[email protected]>
engine.py: Fixes typo in docstring.

b94458e0afb5c74286b15ce2f7c2f0b9a12834ec authored almost 11 years ago by Christopher Foo <[email protected]>
http.py: Adds RobotsTxtRichClientSession.

32a7d069dc8118cd67de968cbb48240964b3ce33 authored almost 11 years ago by Christopher Foo <[email protected]>
robotstxt.py: Adds RobotsTxtRichClientSessionMixin.

91aa8ebe8b36fda9cb6a16f19d41edd10fa56eb4 authored almost 11 years ago by Christopher Foo <[email protected]>
http.py: Adds RichClient.

ebb58e75faf2e0efa61347466c662a4bf54ad8ea authored almost 11 years ago by Christopher Foo <[email protected]>
Adds wrapper module documentation.

272c316c5ee920ed29b78ea702a8c352d58c60e8 authored almost 11 years ago by Christopher Foo <[email protected]>
hook.py: Documents reasons argument for accept_url().

699ae121bff24a6b3cb2514b25613f9f0dcf3186 authored almost 11 years ago by Christopher Foo <[email protected]>
Bumps version to 0.17.3.

d4770251c4d0051c169734a81fb25d6fefff3a76 authored almost 11 years ago by Christopher Foo <[email protected]>
Merge branch 'develop'

f8640b150fd4995f77a004dd6e0934f0fa47c6bb authored almost 11 years ago by Christopher Foo <[email protected]>
changelog: Updates latest to 0.17.3.

73190aec5c92df25f3d61d1f3339f2f4e9adef29 authored almost 11 years ago by Christopher Foo <[email protected]>
processor.py: Fixes AttributeError on retry_dns_error.

b1ef8081e95659876e316ad8f48ec0d17ae91da2 authored almost 11 years ago by Christopher Foo <[email protected]>
changelog: Adds entry about fix for missing ca-bundle.

c9dda7cb428047db599afa228e4409c0cbcbe3da authored almost 11 years ago by Christopher Foo <[email protected]>
setup.py: Fixes ca-bundle not included on install.

6bd06cc4a516158c823c57d6cadce7181755710e authored almost 11 years ago by Christopher Foo <[email protected]>
Bumps version to 0.17.2.

4677d9f4084acf19cfb68d34381ad3bad23a7834 authored almost 11 years ago by Christopher Foo <[email protected]>
Merge branch 'develop'

e29d9a1d104fd869499148613deebfe86de9a1fe authored almost 11 years ago by Christopher Foo <[email protected]>
changelog: Adds entry and updates latest to 0.17.2.

8883294190fd406d05192b08607802c39b278f68 authored almost 11 years ago by Christopher Foo <[email protected]>
http.py: Sets the close callback again if already connected (chfoo/wpull#27).

IOStream is wrapping the context. Maybe the context is leaked and that's
the problem?

3ec7d46449dc688bb886dc8a6dae66acccafc5d3 authored almost 11 years ago by Christopher Foo <[email protected]>
http: Adds unit tests to test TCP reset.

Does not reproduce issue at chfoo/wpull#27 :rage:.

0819076b2d6062f932f357793bcfd04de500547b authored almost 11 years ago by Christopher Foo <[email protected]>
http.py: Implements connection pool cleaning.

5c804715a3f0cdd0db403faf2a4a50c9e196ccea authored almost 11 years ago by Christopher Foo <[email protected]>
http.py: Refactors Connection to process a queue for streaming calls.

d3a5bf977725076e55940c04c326edbfbc2b537e authored almost 11 years ago by Christopher Foo <[email protected]>
extended.py: Adds IOStream methods returning a queue for streaming.

4a820c2312549cb2d96f4d4701f143141efb2d5c authored almost 11 years ago by Christopher Foo <[email protected]>
Bumps version to 0.17.1.

Merge branch 'develop'

Conflicts:
wpull/version.py

f9b4794be6369cbe0ea9fcb7e3a87cc990539d13 authored almost 11 years ago by Christopher Foo <[email protected]>
changelog: Updates latest to 0.17.1.

d527c65ba5c14fc8a7e766847cdd81f18cf258c1 authored almost 11 years ago by Christopher Foo <[email protected]>
http: Refactors ConnectionPool/HostConnectionPool interaction.

Moves semaphore release outside finally clause to maybe fix worker hang
problem (chfoo/wpull#27)...

3e9e95d613665eb2212a53ec38e127555901bffa authored almost 11 years ago by Christopher Foo <[email protected]>
Bumps version to 0.17.

51d56f19b5be0677152da251e0a243fe54850233 authored almost 11 years ago by Christopher Foo <[email protected]>
Merge commit '334da'

3089871682c0f92455be86531a25ed7c2123c205 authored almost 11 years ago by Christopher Foo <[email protected]>
Bumps version to 0.18a1.

def15367786168efca8660507bbf0a908475da91 authored almost 11 years ago by Christopher Foo <[email protected]>
changelog: Updates latest to 0.17.

334da341eef74e96a4a31fbf33fe5307319cca7a authored almost 11 years ago by Christopher Foo <[email protected]>
http: Removes connection ready queue (chfoo/wpull#27).

Can't tell if connection ready queue or the semaphore is buggy. Using a
for loop isn't that expe...

3cc362563ae69518e6ef373765ddcc6f3f158a6b authored almost 11 years ago by Christopher Foo <[email protected]>
options.py: Turns off robots if not recursive (closes chfoo/wpull#28).

7f7cbb98ca64b6a5f72291d90bba7f5deba16646 authored almost 11 years ago by Christopher Foo <[email protected]>
backport.urlparse.py: Fixes syntax error warning.

4683d296544f0abe69aab2e534de7c27ccd929af authored almost 11 years ago by Christopher Foo <[email protected]>
wrapper.py: Fixes Python 2 compatibility.

9b5081d79256d054b90852b61842ae91a3eddf03 authored almost 11 years ago by Christopher Foo <[email protected]>
changelog.rst: Adds cookies. Updates terse_options.rst.

0bae03d642f4c0b5fa1fc789c1b2addc4d992cf6 authored almost 11 years ago by Christopher Foo <[email protected]>
Implements cookie support (closes chfoo/wpull#6).

d78bd4ec7771e2e1ece5535d957acaf76ce00595 authored almost 11 years ago by Christopher Foo <[email protected]>
Adds wrapper.py.

9f439d8987a8fde971893edc7fba788a0a574af6 authored almost 11 years ago by Christopher Foo <[email protected]>
Bumps version to 0.16.1.

Merge branch 'develop'

Conflicts:
wpull/version.py

42464a6f95b7aee233047ada96e8479e95dcb68c authored almost 11 years ago by Christopher Foo <[email protected]>
Fixes unit test factory_test.py using wrong unittest import.

5614c263ee4e537e03498f43dba921dcc8d83f47 authored almost 11 years ago by Christopher Foo <[email protected]>
changelog: Updates latest to 0.16.1.

105cacae7a1c501ea9ec75b55cda399bb6ba1716 authored almost 11 years ago by Christopher Foo <[email protected]>
database.py: Fixes leaking of imported objects into documentation.

71898e967c5e44294be447d63462d90902e44454 authored almost 11 years ago by Christopher Foo <[email protected]>
app.py: Makes Builder use Factory.

f113d5859086b497b6a26d0a3e9fec4b1e312652 authored almost 11 years ago by Christopher Foo <[email protected]>
Adds factory.py.

d4c3760b4ab52bf1041cbe400230427c14a84f98 authored almost 11 years ago by Christopher Foo <[email protected]>
processor: Refactors WebProcessorSession to not need many arguments.

cbf832850ccee175d0a8c5170e5c8f1fcd6cba6f authored almost 11 years ago by Christopher Foo <[email protected]>
Refactors WebProcessor to use DemuxDocumentScraper.

e85bbc36a446b8f51d8ca43cf6a399f3cf857eb0 authored almost 11 years ago by Christopher Foo <[email protected]>
scraper.py: Adds DemuxDocumentScraper.

0231d1d19c9d6499ff9aa0c0d99cb96f4f9d6ff2 authored almost 11 years ago by Christopher Foo <[email protected]>
Refactors WebProcessor to use DemuxURLFilter.

621ac7a27ea1cd539312cf180031aefdcbb2a676 authored almost 11 years ago by Christopher Foo <[email protected]>
url.py: Adds DemuxURLFilter.

82fa8ba9500f78fa819a9f3ba03796fca0ee5280 authored almost 11 years ago by Christopher Foo <[email protected]>
docs: Fixes up features, adds info about stopping. Adds --help output.

cf5ef625ff95becac2449e1485669da8ec290412 authored almost 11 years ago by Christopher Foo <[email protected]>
Bumps version to 0.17a1.

4afbca360c218f786b39686112dd96f32a590ab1 authored almost 11 years ago by Christopher Foo <[email protected]>
Bumps version to 0.16.

8a97ca44062ceaeccef95cc171c572abaf35638b authored almost 11 years ago by Christopher Foo <[email protected]>
Merge branch 'develop'

9e0111a89295e7ca5532830835245d637bad8f27 authored almost 11 years ago by Christopher Foo <[email protected]>
changelog: Updates latest to 0.16.

52c85f8f9eb987cf9bc3d9ef63bdd9cd0db93882 authored almost 11 years ago by Christopher Foo <[email protected]>
database.py: Optimizes one query away when adding URLs.

Gets the URL string ID before and uses that for comparison.
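The optimization described above (resolve each URL string to its ID first, then compare IDs rather than strings) can be sketched as follows. This is a minimal illustration using Python's built-in `sqlite3` module, not wpull's actual code: the `url_strings`/`queued_urls` tables and the `make_db()`/`add_urls()` helpers are hypothetical.

```python
import sqlite3


def make_db():
    # Hypothetical schema: each URL string is stored once and referenced by integer ID.
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE url_strings (id INTEGER PRIMARY KEY, url TEXT UNIQUE NOT NULL)"
    )
    conn.execute(
        "CREATE TABLE queued_urls ("
        "url_string_id INTEGER PRIMARY KEY REFERENCES url_strings(id))"
    )
    return conn


def add_urls(conn, urls):
    """Queue URLs that are not already queued; return how many were newly added."""
    urls = list(dict.fromkeys(urls))  # drop duplicates, keep order
    if not urls:
        return 0
    # Step 1: ensure every URL string exists, then fetch all their IDs in one query.
    conn.executemany(
        "INSERT OR IGNORE INTO url_strings (url) VALUES (?)", [(u,) for u in urls]
    )
    placeholders = ",".join("?" * len(urls))
    id_map = dict(
        conn.execute(
            f"SELECT url, id FROM url_strings WHERE url IN ({placeholders})", urls
        )
    )
    # Step 2: compare/insert by integer ID; no per-URL string lookup is needed.
    before = conn.total_changes
    conn.executemany(
        "INSERT OR IGNORE INTO queued_urls (url_string_id) VALUES (?)",
        [(id_map[u],) for u in urls],
    )
    return conn.total_changes - before
```

For example, adding `["http://a/", "http://b/"]` and then `["http://a/", "http://c/"]` queues two URLs and then one, because `http://a/` already maps to a queued ID.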

28f00eeac255714f22dc796d7d0b95fe7eab04e0 authored almost 11 years ago by Christopher Foo <[email protected]>
Implements all the SSL options (closes chfoo/wpull#5).

1e2a72033739a41da7b63766c606b5066675f9dc authored almost 11 years ago by Christopher Foo <[email protected]>
app.py: Prints warning about unsafe options.

f9b7b38573d8b909e579f4885982e5618850af50 authored almost 11 years ago by Christopher Foo <[email protected]>
Bumps version to 0.15.2.

d89b6402d5885c13e12dec6de95f7bd71a23a2c6 authored almost 11 years ago by Christopher Foo <[email protected]>
Merge branch 'develop'

94f35c802ce067881f29b31e0bc0e061f6b4d66d authored almost 11 years ago by Christopher Foo <[email protected]>
changelog: Updates latest to 0.15.2.

a2731ff46aa29778d6f09cef35a80278b460a58d authored almost 11 years ago by Christopher Foo <[email protected]>
database.py: Uses SQLAlchemy core for bulk insert. Adds caching for IDs.

65d39339b905888c4efd6c7fd9e160feb794d5d5 authored almost 11 years ago by Christopher Foo <[email protected]>
cache.py: Fixes __iter__().

247953db44d2e2f708e701a61099a03e5c2d1607 authored almost 11 years ago by Christopher Foo <[email protected]>
Bumps version to 0.15.1.

Merge branch 'develop'

Conflicts:
wpull/version.py

5bdd4ebcc3da23a350ee4d9c03231a11de82d7ac authored almost 11 years ago by Christopher Foo <[email protected]>
changelog: Updates latest to 0.15.1.

a61250aba96e6d9de99ac7f4472b624c5da9e5c7 authored almost 11 years ago by Christopher Foo <[email protected]>
database.py: Uses first() instead of scalar().

22d11c09e257980ab9cb6fdb1a05aeed8796e7f3 authored almost 11 years ago by Christopher Foo <[email protected]>
hook.py: Fixes the docstring formatting. Adds more docstrings.

1b3c03aa7afbb27417d41e9ce9cb4b0fc5d49360 authored almost 11 years ago by Christopher Foo <[email protected]>
util.format_size: Adds docstring.

964c860a7e8f3fd4e6b1f2b313e4f2ad4d894580 authored almost 11 years ago by Christopher Foo <[email protected]>
Checks for "inline" in scripting hook unit tests.

2681603cd740308e62c6b981eb2c5b4b3b768312 authored almost 11 years ago by Christopher Foo <[email protected]>
Bumps version to 0.16a1.

9604910ed40636438a776da8587a2367e6d25268 authored almost 11 years ago by Christopher Foo <[email protected]>
Bumps version to 0.15.

bf1239ccf9be27cf10b4ac049a9ae9eeb9c77004 authored almost 11 years ago by Christopher Foo <[email protected]>
Merge branch 'develop'

be89a4006b8b5b2ca24e205f0d3435d3a4c34af2 authored almost 11 years ago by Christopher Foo <[email protected]>
changelog: Adds note about db change. Updates latest to 0.15.

e2c964da10e3cfe36a8e93647883ea09c9129c46 authored almost 11 years ago by Christopher Foo <[email protected]>
Merge branch 'issue/21-db_refactor' into develop

8cfb5690c6841b9a5c4149840290c9c6b42ceb3b authored almost 11 years ago by Christopher Foo <[email protected]>
processor,robotstxt.py: Ensures RobotsTxtPool is passed to sessions.

eecb1d17badc6b656bf8b0a7db76478260f6666a authored almost 11 years ago by Christopher Foo <[email protected]>
database.py: Abstracts the SQLAlchemy logic.

0597778f40dfe349b0c8d43608041c7ea2dfb659 authored almost 11 years ago by Christopher Foo <[email protected]>
database.py: Forces using SingletonThreadPool for SQLite.

556f2325f22e54271e1e491b7798039224c6b43a authored almost 11 years ago by Christopher Foo <[email protected]>
testing/goodapp.py: Adds infinite handler for manual testing.

4610cd5a2b068d952e0bed1242b760c87300fe08 authored almost 11 years ago by Christopher Foo <[email protected]>
database.py: Fixes top_url not set correctly.

23774ec5b0fef2cdaf4cfd661cb86db898fbc4ed authored almost 11 years ago by Christopher Foo <[email protected]>
database.py: Refactors URL insertion.

b176f113a8c9033fc6abd87c8058f11337d37f62 authored almost 11 years ago by Christopher Foo <[email protected]>
database.py: Normalizes the URL strings (chfoo/wpull#21).

d8449813b1231997b928e53215d87660207f43ff authored almost 11 years ago by Christopher Foo <[email protected]>
hook.py: Supports "replace" as part of get_urls().

7879a842a506b0fae66f50e747398eb72463bf35 authored almost 11 years ago by Christopher Foo <[email protected]>
database.URLTable: Adds remove(). engine.URLItem: Exposes URLTable.

fe7e0430c7e39a81607c3e5fb7d01354f69652b4 authored almost 11 years ago by Christopher Foo <[email protected]>
Bumps version to 0.14.1.

Merge branch 'develop'

Conflicts:
wpull/version.py

d894e75d2c892a254e2da9ba3df9fae6852598d4 authored almost 11 years ago by Christopher Foo <[email protected]>
changelog: Updates latest to 0.14.1.

098453a0064b5c17e066e6108f73c8b59bad6bdf authored almost 11 years ago by Christopher Foo <[email protected]>
recorder.py: Uses field name case overrides to match hanzo warc-tools.

7896591d9a7f093f59d45c7cf234d2b50a8d5a44 authored almost 11 years ago by Christopher Foo <[email protected]>
recorder.py: Forces root logger level to be debug here too.

c5fb30664b64e936d92cc26e10b3323665679040 authored almost 11 years ago by Christopher Foo <[email protected]>
namevalue.py: Supports case normalization overrides.

e77fa819aabe0c88bc68e2c2a35d09b3d206bd4e authored almost 11 years ago by Christopher Foo <[email protected]>