Ecosyste.ms: OpenCollective

An open API service for software projects hosted on Open Collective.

github.com/ArchiveTeam/ludios_wpull

wpull fork with fixes and faster parsing using html5-parser; used by grab-site; should go away when wpull is similarly improved
https://github.com/ArchiveTeam/ludios_wpull

Implements --quota. Closes chfoo/wpull#3.

98e063dbf910a3efbd9d6e79e484f60181e5cd05 authored almost 11 years ago
processor,hook: Renames _should_fetch. Checks filters before inserting.

Re: chfoo/wpull#70

803065fc048e4efb75fe763d2c7bb874de911cb7 authored almost 11 years ago
engine.py: Adds URLItem.child_url_record()

0fea8ce75f4316d0e3a9da2d427067da010ea6bb authored almost 11 years ago
testing/badapp.py: Allows path with queries.

d3d84716c7427192f0a66b2dfd9297a6d082f074 authored almost 11 years ago
writer.py: Refactors safe_filename and fixes Python 2 support.

e3e791693576d66e391660b77d53da4dd6fb6ac9 authored almost 11 years ago
changelog: Fixes typo in save_document.

d6e2cbc7b1390d9ebc6755e1237dcd0a4cddb574 authored almost 11 years ago
engine.set_status(): Use filename keyword instead of kwargs.

9500a65bd56850f1ee65db3ce2973a4fb9de007d authored almost 11 years ago
Saves filename to database to allow converter to work consistently.

Closes chfoo/wpull#45.
converter.py: Uses filename field instead of PathNamer.
database.py: Sche...

c589df3c076be488b91ca13babccd226f4c1fb47 authored almost 11 years ago
Implements --restrict-file-names. Closes chfoo/wpull#62.

b97cec0346bb7aaad1c297de40edb1f3b1214daf authored almost 11 years ago
conversation.py: Removes deprecated function.

d9033f38c65de1174dcd836fabeea2235c5b6d32 authored almost 11 years ago
changelog: Rewords robots.txt filter bypass.

ef955aa9dcbb3432597323e7183255331edd8a58 authored almost 11 years ago
changelog: Rewords robots.txt filter bypass.

4ef8841a939bc9e3e64be90f1c333a5ba2c2fc1d authored almost 11 years ago
Bumps version to 0.24.

54fe096feccab9d42940ad6bd9c465c3c58dd0f5 authored almost 11 years ago
Merge commit '658b4c1e2418e814feb33d96489dcb239de0bb9a'

98b4bdda6f451803a6ecb3aebe01060ef6818b05 authored almost 11 years ago
Bumps version to 0.25a1.

0a692310ea47151a4fef7d7edfc6b5649eaef6c3 authored almost 11 years ago
changelog: Updates latest to 0.24.

658b4c1e2418e814feb33d96489dcb239de0bb9a authored almost 11 years ago
url.py: Adds relaxed safe chars for URL path normalization.

Closes chfoo/wpull#68.

f3c67aab96b12c75e7bfe3fa5345649857116911 authored almost 11 years ago
writer.py: Fixes files saved into wrong path (extra directory).

Closes chfoo/wpull#60.

846f7efc931fd356c6a99e64847bf2cc087035fa authored almost 11 years ago
document.py: Fixes wrong arguments passed.

Adds another shift_jis sample and unit test.

75185a9ab461267232df650655745c66221ab4d2 authored almost 11 years ago
changelog.rst: Fixes previous merge on 2013 typos

68b925c625cc3c6ab78c7a34eecc86bb11644a40 authored almost 11 years ago
Merge branch 'develop' of github.com:chfoo/wpull into develop

Conflicts:
doc/changelog.rst

8da9966628ea755c8c07858b1c30e97004e8aac9 authored almost 11 years ago
Updates terse_options.rst.

f9ca1b1a65496e2b00aeab7b02e87ca0c7c2f88b authored almost 11 years ago
Implicitly span hosts on redirects.

Adds --no-strong-redirects. Adds reason to accept_url()'s info dict.
WebProcessorSession._should...

b7b1e8109449c47ec4bdc14a21ef3c6afd1f99b3 authored almost 11 years ago
Bumps version to 0.23.1

Merge branch 'develop'

Conflicts:
wpull/version.py

6daf4e803e949103fa2071d6aff355466975b0bd authored almost 11 years ago
changelog: Updates latest to 0.23.1.

17583ef209d62b849afae6ed4597a1d09fb7037f authored almost 11 years ago
database.py: Adds missing unique constraint.

7689db993fec4b608844946d16ccf265db92ca6c authored almost 11 years ago
Fetches robots.txt unconditionally. Adds --no-strong-robots option.

Closes chfoo/wpull#58.

8b540a38b63b514a86afb0c9bd23b001fea8aeb1 authored almost 11 years ago
Bumps version to 0.23.

a782f962b4dfd9d307953b3497d57e7156050ff3 authored almost 11 years ago
Merge commit 'f7b11a97c76f31559f17ea7d3ce142b23cf01c6d'

7d710b0d3c394b1cc608379c027e420a3184bace authored almost 11 years ago
Bumps version to 0.24a1.

e23f51d260aef213a2e4b5d598e0ae8fa4adb96a authored almost 11 years ago
changelog: Updates latest to 0.23.

f7b11a97c76f31559f17ea7d3ce142b23cf01c6d authored almost 11 years ago
scraper.py: Adds clean_link_soup() to properly strip links.

Closes chfoo/wpull#64.

f3c482309394008746060e756f5039818db37574 authored almost 11 years ago
recorder.py: Prints and flushes status on response end to show 100% bar.

f19051c046d27382e349188d07340eec2f674b26 authored almost 11 years ago
url.py: Resolves dot segments in URLs. Adds flatten_path().

Closes chfoo/wpull#63.

edbfe133bfb87fddb0811d8c35714e6965451df1 authored almost 11 years ago
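
Note on the commit above: "dot segments" are the `.` and `..` components of a URL path. The sketch below shows one way to resolve them, following the remove_dot_segments procedure in RFC 3986, section 5.2.4. It is an illustration only, not the code from wpull's url.py. The function name `remove_dot_segments` is an assumption taken from the RFC, and the behavior of wpull's `flatten_path()` is not shown here.

```python
def remove_dot_segments(path: str) -> str:
    """Resolve "." and ".." segments in a URL path.

    Illustrative sketch of RFC 3986 section 5.2.4; NOT wpull's implementation.
    """
    remaining = path
    output = []  # completed segments, each with its leading "/" if it had one

    while remaining:
        if remaining.startswith("../"):
            remaining = remaining[3:]
        elif remaining.startswith("./"):
            remaining = remaining[2:]
        elif remaining.startswith("/./"):
            remaining = remaining[2:]      # "/./x" becomes "/x"
        elif remaining == "/.":
            remaining = "/"
        elif remaining.startswith("/../"):
            remaining = remaining[3:]      # "/../x" becomes "/x"
            if output:
                output.pop()               # drop the previous segment
        elif remaining == "/..":
            remaining = "/"
            if output:
                output.pop()
        elif remaining in (".", ".."):
            remaining = ""
        else:
            # Move the first segment (with its leading "/", if any) to output.
            start = 1 if remaining.startswith("/") else 0
            next_slash = remaining.find("/", start)
            if next_slash == -1:
                output.append(remaining)
                remaining = ""
            else:
                output.append(remaining[:next_slash])
                remaining = remaining[next_slash:]

    return "".join(output)


if __name__ == "__main__":
    # Example inputs taken from RFC 3986 section 5.2.4.
    print(remove_dot_segments("/a/b/c/./../../g"))
    print(remove_dot_segments("mid/content=5/../6"))
```

A crawler benefits from this kind of normalization because `/a/b/../c` and `/a/c` then map to the same URL and are not fetched twice.
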
__main__.py: Calls stop from signal handlers within ioloop safely.

Closes chfoo/wpull#65.

9f04f96d7d3a7a2dc80f71166e13ec00e947b503 authored almost 11 years ago
changelog: Adds points about broken robots.txt and port number in hosts.

836474b15def85260e615afdecff2593541c7402 authored almost 11 years ago
web.py: Fixes clearing robots redirect url before it can be used.

Re: chfoo/wpull#58

When next_request was called multiple times, the redirect URL was set to
Non...

0595831e3122783c3c173408c31406eebb3c613a authored almost 11 years ago
request.py: Fixes port not set on Host header.

Closes chfoo/wpull#56

da82600d7aaa3e2007f999fb75f0b980fe643758 authored almost 11 years ago
url.py: Implements URLInfo.hostname_with_port()

5dc040676f732bbb98040401a296192ea51401de authored almost 11 years ago
changelog: Adds new entry about encoding improvements.

Closes chfoo/wpull#59.

6c13f95565bd8459afc2123d334eebc6916b8d71 authored almost 11 years ago
Updates requirements to include beautifulsoup4

4a14e3537c6f5ca3e3042e1612c23aca1964a8ef authored almost 11 years ago
document.py: Detects encoding ourselves before passing to lxml parser.

c936b7964eec192a6f35ae56dbca0826774916e6 authored almost 11 years ago
util.py: Uses beautifulsoup for encoding detection. Supports encoding alias.

800952cd8b15e12cbb26cfbc9f3f1173cd55e1ad authored almost 11 years ago
conversation.py: Fix docstring typo.

68803bd63f1d1626af96fbc35242af59541caffd authored almost 11 years ago
changelog: Fixes typo on date.

[ci skip]

9b3f0a39a8c6db31c997a967597c012aa600ff2a authored almost 11 years ago
Bumps version to 0.22.5.

5001acacf8b4dcee8976909c3a6b6df600464f82 authored almost 11 years ago
Merge branch 'develop'

3a92f2804183ad5b67ad71b3d6169e9520718253 authored almost 11 years ago
changelog: Updates latest to 0.22.5.

364aec69f3464432574bc4f8796aa4e6d31394f5 authored almost 11 years ago
changelog: Adds point about buffer size work around.

3aedd6eacd7a674f46c873edab3bd97be262b724 authored almost 11 years ago
Merge branch 'issue/53-fire-hose-ingress' into develop

1c8d93372373f0e8eaf8a3b59fd93878628f3772 authored almost 11 years ago
http_test.py: Removes skip on test_big.

2e978e18a703b5c874ebaf1e454fa056b6896c59 authored almost 11 years ago
extended.py: read_from_fd returns None if buffer is nearing full.

chfoo/wpull#53

7b4240d9e4e02259f410498551333eec2bdd88f3 authored almost 11 years ago
recorder.py: Truncates the WARC file to safe offset if record fails to write.

Closes chfoo/wpull#52.

cbf68bf2528134ca3c0a012d578049404cd6205f authored almost 11 years ago
scraper.py: Scrapes Refresh header.

Closes chfoo/wpull#51.

c5f6183d8901271ca670d9c8edff46b284e69e15 authored almost 11 years ago
Bumps version to 0.22.4.

5311246790c6926b0a1eabd5a1baca0e162fbc0e authored almost 11 years ago
Merge branch 'develop'

b74850857459265ee35ea1c9ade756bca52ba5b2 authored almost 11 years ago
changelog: Adds "newlines in links" entry. Updates latest to 0.22.4.

948c00f31c4128a55de7d74a34534cd02c5d5ee7 authored almost 11 years ago
scraper.py: Strips newlines from links.

Closes chfoo/wpull#55.

e3faeb6f84bd2f9108269a9f6e1c310bd8637742 authored almost 11 years ago
connection.py: Fixes handling chunked-transfer due to regex fail.

Closes chfoo/wpull#54.

a5761da2f375fd655c6ee436de3729e173ff704b authored almost 11 years ago
http_test.py: Adds extra chunked transfer encoding tests.

29d519e0cf1e7e02674cd1a788d73baae4639444 authored almost 11 years ago
http_test.py: Adds a disabled big file test

35ceec1cb2dd2a35673d8de2e8da35687d2de7f1 authored almost 11 years ago
Revert "connection.py: Sets buffer size to 50MB."

This reverts commit 341d318f7bab632a15a18930c95ed704fb73e44a.

0b06b75163ba70f02ddf8999ae481afd29457c60 authored almost 11 years ago
connection.py: Sets buffer size to 50MB.

341d318f7bab632a15a18930c95ed704fb73e44a authored almost 11 years ago
usage.rst: Clarifies that resuming requires previous options.

ab75ce0da596a9884ec69704f0a29145048a9171 authored almost 11 years ago
Bumps version to 0.22.3.

6922fccd5b4e4c53c095d32cf40e4130bdbc9812 authored almost 11 years ago
Merge branch 'develop'

2d611dae28b9f99d2a22e3f67817da6e6ceef7c7 authored almost 11 years ago
changelog: Updates latest to 0.22.3.

29ee0da9ca0d0d0eb1b3cf71f7b5227b7b3c9065 authored almost 11 years ago
app.py: Implicitly enables --span-hosts when not recursive.

360ae9b5129b3eb790fe43a563a3e6f7c9844742 authored almost 11 years ago
connection.py: Catches bad gzip error. Closes chfoo/wpull#32.

8b40bd96e4bf63194261704f0738733b1e4a047b authored almost 11 years ago
web_test.py: Uses longer timeout.

d8c1316c6bbbf8591d0d2255b2acb320a856cf6e authored almost 11 years ago
processor: Sets url item to skipped at end of fetch loop if not processed.

9b135f435ea076685280c2d4587a8abc9e5bb648 authored almost 11 years ago
Bumps version to 0.22.2

c6b38738d33277a828b098c9ff30e625c821a5ee authored almost 11 years ago
Merge branch 'develop'

5fdd30d09d44126554610f23b49321f3d194a7dc authored almost 11 years ago
changelog: Updates latest to 0.22.2.

b0e7b12876692af71e09d1c7a547a83eb99b0c50 authored almost 11 years ago
document.py: Uses plain classmethod decorator to fix docstring.

ed0ff2069321b2d94c9bbe6f4ce10a8832354727 authored almost 11 years ago
install.rst,hook.py: Fixes doc format/typos.

3f3e7e843a8c1a33b61833c6034650eacff025ce authored almost 11 years ago
database.py: Uses bulk queries and inserts for URL strings.

8c98d334de834316f609a07b0477076a42ac1358 authored almost 11 years ago
Bumps version to 0.22.1

Merge branch 'develop'

Conflicts:
wpull/version.py

3181064466e68d32f5e402829a65c1dfebd8cdfd authored almost 11 years ago
changelog: Updates latest to 0.22.1.

fe77f2c1e1a444b8db708053c273734ff55a7ea3 authored almost 11 years ago
processor.py: Fixes url_item not processed when robots.txt denies a URL.

78c00ed138e945d013b3464a3655550272700a1d authored almost 11 years ago
connection.py: Fixes handling of HTTP 204.

affa9ed1ec41229964ee06007bdff735a53d678a authored almost 11 years ago
proxy.py: Ignores StreamClosedError from clients using proxy.

7b75d8ad5b378bfbf0105dff60bd72edb32825e1 authored almost 11 years ago
PhantomJSController: Uses consistent page scrolling. Lengthen page size.

4db3319d2c20063e40a1b17d816d6c7a1882aeed authored almost 11 years ago
Bumps version to 0.22.

e15dac17789f5d17a1b53e9c06be0af5a0d3b686 authored almost 11 years ago
Merge commit '08d9f'

ae58024405e80dad3a661668cb260d7220a3872d authored almost 11 years ago
Bumps version to 0.23a1.

9056ece23ab65c7e18e2517667b7812f6676d272 authored almost 11 years ago
changelog: Updates latest to 0.22.

08d9f445362161874572269397de231b4c984bc0 authored almost 11 years ago
Documents PhantomJS scroll and snapshot behavior.

1698ae1d48a778d2fea194fb2429c19d89d81743 authored almost 11 years ago
Implements --phantomjs-scroll,--phantomjs-wait

3731c6716b4114a4fe114078ff686be8dc8804ae authored almost 11 years ago
writer.py: Adds extra_resource_path().

16513775998809316726c3fc9de9cf3cdd03b767 authored almost 11 years ago
Bumps version to 0.21.1.

Merge branch 'develop'

Conflicts:
wpull/version.py

f6c0273131b85e9fd4bb26198770db583e8cb701 authored almost 11 years ago
changelog: Updates latest to 0.21.1. usage.rst: Clarify PhantomJS support.

b36ee066cc3f88a29f29c21cdda57c9beb702c3b authored almost 11 years ago
processor.py: Supports simple PhantomJS page scrape.

9af0323fae80f2f50ad88a08508ae8ac52c5d2df authored almost 11 years ago
changelog: Updates entry about PhantomJS fixes.

5015acbfc124a720969cab60ea712a639e50d273 authored almost 11 years ago
processor.py: Hooks up PhantomJS to stats.

c85efd290c9fece710a4e93659e115cb71ea1834 authored almost 11 years ago
phantomjs,proxy: Supports HTTPS by hacking the URL.

c5992f74a95cf5c1150eddb5af5711b53063553b authored almost 11 years ago
phantomjs.py: Adds return_code docstring.

9e613b7ad87a35a1fc531f73b61784d8bc05fa0b authored almost 11 years ago
setup.py: Includes missing dependencies and latest package data.

7c07300e3991d572db929a53a2c7d78f4ced4c20 authored almost 11 years ago
usage.rst: Adds warning PhantomJS RPC and proxy servers are on localhost.

037e93badc6e61461054fd0c2593e9b0e3a13b25 authored almost 11 years ago
phantomjs.py: Calls the _send_loop() future result.

Avoids losing the stack trace.

50d881678feb0d35b6e654bfcc0b49a85906cc9f authored almost 11 years ago