Ecosyste.ms: OpenCollective

An open API service for software projects hosted on Open Collective.

github.com/ArchiveTeam/wpull

Wget-compatible web downloader and crawler.
https://github.com/ArchiveTeam/wpull

connection.py: Checks for Gzip using new util.GzipDecompressor.

Adds util.GzipDecompressor().
Closes chfoo/wpull#115.

5d5b185b832a3bd9e8647caced04bf913505e681 authored over 10 years ago by Christopher Foo <[email protected]>
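
The Gzip check mentioned in this commit can be illustrated with a short sketch. This is not wpull's code: `looks_like_gzip` and `GzipDecompressorSketch` are hypothetical names, and the approach shown (testing the two gzip magic bytes, then streaming through `zlib` with a gzip-aware `wbits` value) is an assumption about one reasonable way to do it.

```python
import gzip
import zlib

# Every gzip stream starts with these two bytes (RFC 1952).
GZIP_MAGIC = b"\x1f\x8b"


def looks_like_gzip(data: bytes) -> bool:
    """Return True if the data starts with the gzip magic bytes."""
    return data[:2] == GZIP_MAGIC


class GzipDecompressorSketch:
    """Hypothetical stand-in; not the real util.GzipDecompressor."""

    def __init__(self):
        # wbits = 16 + MAX_WBITS makes zlib expect a gzip header and trailer.
        self._decompressor = zlib.decompressobj(16 + zlib.MAX_WBITS)

    def decompress(self, chunk: bytes) -> bytes:
        """Feed one chunk and return whatever output is ready."""
        return self._decompressor.decompress(chunk)

    def flush(self) -> bytes:
        """Return any output still buffered inside zlib."""
        return self._decompressor.flush()


payload = gzip.compress(b"hello wpull")
decoder = GzipDecompressorSketch()
# Feed the body in two pieces, as a network reader would.
text = decoder.decompress(payload[:5]) + decoder.decompress(payload[5:])
text += decoder.flush()
```

Checking the magic bytes before decoding lets a caller fall back to the raw body when a server labels a response as gzip but sends something else.
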
document.py: Uses HTML hack only for HTML parsers.

8e14bf27b2fd41f5a98b8db1d65a3eb49bfc6d0d authored over 10 years ago by Christopher Foo <[email protected]>
badapp.py: Adds HEAD request handler for Py 3.4.

4316736ef33cf0ba97c92288a437774b1a6bf3df authored over 10 years ago by Christopher Foo <[email protected]>
Bumps version to 0.32.

Merge commit 'd07172c0be2f910d9d95fddb487a4917aab73c70'

Conflicts:
wpull/version.py

bf2c971521752a97b39292f6838ce6f432bd23e7 authored over 10 years ago by Christopher Foo <[email protected]>
Bumps version to 0.33a1.

006d90c7ebc76ca0eb1fca4d21428136c70643ad authored over 10 years ago by Christopher Foo <[email protected]>
changelog: Updates latest to 0.32.

d07172c0be2f910d9d95fddb487a4917aab73c70 authored over 10 years ago by Christopher Foo <[email protected]>
scraper.HTMLScraper: Scrapes links from link,url,icon element text.

59bd1894dcfe280b18a878571d7a29f5f8b0414d authored over 10 years ago by Christopher Foo <[email protected]>
converter,document: Adds XML, XHTML detection. Supports XHTML doc conversion.

458ead19fe960a49b98dd20456e2a8bc072e834f authored over 10 years ago by Christopher Foo <[email protected]>
document.py: Adds XMLDetector and detector unit tests.

226f62f703cbf639303faf092af6fc5a6ee5867f authored over 10 years ago by Christopher Foo <[email protected]>
document.py: Abstracts doc type detection into BaseDocumentDetector.

214258fee0af6c0868820e8ac09575898c2b17eb authored over 10 years ago by Christopher Foo <[email protected]>
url.py: Simplifies if cases for is_percent_encoded() and split_query()

d5c9cbbbf89bc753244cf4dc6ad80331b59d3c5d authored over 10 years ago by Christopher Foo <[email protected]>
app.py: Sets the highest min logger level possible instead of debug.

Reduce string formatting calls when --debug is not enabled.

3647ed45614ebd86c4ec4a701554288c8fc27099 authored over 10 years ago by Christopher Foo <[email protected]>
document.py: Lowers detect_response_encoding peek default to 128KB.

Peeking 1MB can cause Wpull to hang for too long.

5aa3df82ed374833813cb0c159871046b8cf67f7 authored over 10 years ago by Christopher Foo <[email protected]>
processor.py: Ignores document decode errors during scraping.

Closes chfoo/wpull#113

0a70e1e1ac1289ff0c888a766aec60062f0751d3 authored over 10 years ago by Christopher Foo <[email protected]>
scraper.py: Checks None before parse_refresh().

Closes chfoo/wpull#112

714d55fb85bc06aa05daa49595881e863a908abe authored over 10 years ago by Christopher Foo <[email protected]>
Bumps version to 0.31.

Merge commit '8ff666313f42f988c166827d62b7704c1451d9e6'

Conflicts:
wpull/version.py

5d07d997fcbb76f854e7bf8fd41c38fc3d573918 authored over 10 years ago by Christopher Foo <[email protected]>
Bumps version to 0.32a1

[ci skip]

8b459a15939e5f6174a3d86257387cc7a662d37a authored over 10 years ago by Christopher Foo <[email protected]>
changelog: Updates latest to 0.31

Closes chfoo/wpull#105. Closes chfoo/wpull#104.
[ci skip]

8ff666313f42f988c166827d62b7704c1451d9e6 authored over 10 years ago by Christopher Foo <[email protected]>
urlfilter.ParentFilter: Fixes scheme comparison.

Re: chfoo/wpull#66

1be0ac1a42a55cf984409e2b5ff50a44ec16c663 authored over 10 years ago by Christopher Foo <[email protected]>
scraper.py: Fixes AttributeError on scraping params.

Closes chfoo/wpull#110.

2c79251cb5914dcedcdbb17561f1b136defc2ae9 authored over 10 years ago by Christopher Foo <[email protected]>
Merge branch 'issue/107-scripting_record_info' into develop

6847ab1d61d9cd3cddcc6cd2d71f604713420479 authored over 10 years ago by Christopher Foo <[email protected]>
Merge branch 'issue/105-doc-stream-reader' into develop

Conflicts:
doc/changelog.rst

9bcb99f4392f5cd5e94663bdebb999c9b5504746 authored over 10 years ago by Christopher Foo <[email protected]>
urlfilter.LevelFilter: Always return True for inline.

adb724fca5130ee29dedd786470871b997675930 authored over 10 years ago by Christopher Foo <[email protected]>
urlfilter.ParentFilter: Returns True if different host or changed scheme.

Fixes to follow Wget's behavior for no_parent.
Re: chfoo/wpull#66

56bee4653565d7956b3fb8a0308fb1f832baaaf3 authored over 10 years ago by Christopher Foo <[email protected]>
document.py: Fixes CSSReader read links greater than 1MB.

8c6f29afd8be4d9a4258f951e0ce264ee8b09966 authored over 10 years ago by Christopher Foo <[email protected]>
hook.py: Supports version select. Adds record_info arg.

Closes chfoo/wpull#107.

2134c11172162c956f5461c7d3162c1442238de4 authored over 10 years ago by Christopher Foo <[email protected]>
version.py: Implements version_info.

7cc125158ac9cf513897a8a36a3e91897b5f7bec authored over 10 years ago by Christopher Foo <[email protected]>
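
As an illustration of what a `version_info` value can look like, here is a minimal sketch that turns version strings such as the `0.32a1` and `0.32` seen in this log into comparable tuples. The field layout copies Python's own `sys.version_info`; the names `VersionInfo` and `parse_version` and the accepted string format are assumptions, not wpull's actual implementation.

```python
import re
from collections import namedtuple

# Field layout borrowed from sys.version_info (an assumption for this sketch).
VersionInfo = namedtuple(
    "VersionInfo", ["major", "minor", "patch", "releaselevel", "serial"]
)

# These names happen to sort alphabetically in release order.
_LEVELS = {"a": "alpha", "b": "beta", "rc": "candidate", None: "final"}

_PATTERN = re.compile(r"(\d+)\.(\d+)(?:\.(\d+))?(?:(a|b|rc)(\d+))?")


def parse_version(text: str) -> VersionInfo:
    """Parse strings like '0.32', '0.32.1' or '0.32a1' into a tuple."""
    match = _PATTERN.fullmatch(text)
    if match is None:
        raise ValueError("unrecognised version string: %r" % text)
    major, minor, patch, level, serial = match.groups()
    return VersionInfo(
        int(major), int(minor), int(patch or 0), _LEVELS[level], int(serial or 0)
    )


alpha = parse_version("0.32a1")
final = parse_version("0.32")
```

Because the result is a plain tuple, a script can compare versions with ordinary operators, for example `parse_version("0.32a1") < parse_version("0.32")`.
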
scraper.py: Resets file offset after read. Swaps in CSS iterative read.

b82eee16bfd92107d9d57ce43087c96b6bcf4685 authored over 10 years ago by Christopher Foo <[email protected]>
document.py: Uses StreamReader instead.

TextIOWrapper closes our file and doesn't support SpooledTemporaryFile.

50e91504472f15a80d8706a5ce9ac41dd02d7907 authored over 10 years ago by Christopher Foo <[email protected]>
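
The difference described in this commit can be shown with the standard library alone. The sketch below is an assumed illustration, not the code from document.py: it decodes a `SpooledTemporaryFile` through a `codecs` stream reader and confirms the underlying file is still usable afterwards.

```python
import codecs
import tempfile

with tempfile.SpooledTemporaryFile(max_size=1024) as body:
    body.write("caf\u00e9 <a href='/x'>".encode("utf-8"))
    body.seek(0)

    # codecs.getreader() returns a StreamReader class for the encoding;
    # it only needs a read() method on the wrapped file object.
    reader = codecs.getreader("utf-8")(body)
    text = reader.read()

    # Dropping the reader does not close the file it wrapped.
    del reader
    still_open = not body.closed

    # The same file can be rewound and read again as bytes.
    body.seek(0)
    raw = body.read()
```
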
converter.py: Uses temp filename. Fixes writing end tags.

1d5d825a0ea361be730fab7283a4ae14af7201f5 authored over 10 years ago by Christopher Foo <[email protected]>
document.py: HTMLParserTarget emits end tags as well.

41e153dddd0bc7017b6da013d5c24473cdf79053 authored over 10 years ago by Christopher Foo <[email protected]>
converter.py: Rewrites converters using new iterative readers.

8144fc3e1d2a6f3c999146e74d1fcf8a118093f8 authored over 10 years ago by Christopher Foo <[email protected]>
document.py: Removes incorrect fixme comment.

It's deflate that does not have the magic bytes.
[ci skip]

10ff02fc8f61cfad8ba1215bad071949bcae0120 authored over 10 years ago by Christopher Foo <[email protected]>
document,scraper.py: WIP Uses target parser iterative document readers.

re: chfoo/wpull#105, chfoo/wpull#104
[ci skip]

e26ea2ccd0ba708248f23e1e96db42920bd57b76 authored over 10 years ago by Christopher Foo <[email protected]>
testing/badapp.py: Adds inline links to many_links.

86c7fd4721332aab796255277c3748f11fcbda26 authored over 10 years ago by Christopher Foo <[email protected]>
app.py: Adds URL filters only when enabled.

0c9288d31eb3038313069842c639746c22ce7c76 authored almost 11 years ago by Christopher Foo <[email protected]>
cookie_test.py: Fixes imports, FakeResponse for Py 2.

0d8cede69274d7935b70364cf628a733d891e44f authored almost 11 years ago by Christopher Foo <[email protected]>
Limits cookie length and max number per domain.

Closes chfoo/wpull#102.

6027c09055c9e169427b37aba9166e98afed8960 authored almost 11 years ago by Christopher Foo <[email protected]>
iostream.py: Fixes typo. conf.py: Mocks out lxml.etree.

[ci skip]

9b50ad98c77f738d3e37ce7030ecca5adb583533 authored almost 11 years ago by Christopher Foo <[email protected]>
Implements --base,--force-html

Closes chfoo/wpull#40.

7adca03296cc0d2ade05dff51cf50e8ef8057660 authored almost 11 years ago by Christopher Foo <[email protected]>
iostream.py: Rewords StreamClosedError message.

e4df33e6b733d7863bcb40f5f962f8bead6cd84b authored almost 11 years ago by Christopher Foo <[email protected]>
Adds "crawler" to description.

[ci skip]

5d916a4ad0d1e66a41838ccd9bc9498b04858d60 authored almost 11 years ago by Christopher Foo <[email protected]>
app.py: Removes Mozilla from User-agent.

Closes chfoo/wpull#101

465da114aab4bba4df73dc83e6aa91b3d86170d7 authored almost 11 years ago by Christopher Foo <[email protected]>
Includes PhantomJS version in warcinfo.

892f8fa28d4eb9057983e5872af3742ae6fc1244 authored almost 11 years ago by Christopher Foo <[email protected]>
recorder.py: Uses param object for WARCRecorder.

dbc1bd8a65756aaf0289b1f56693c7e579c31271 authored almost 11 years ago by Christopher Foo <[email protected]>
recorder.py: Uses constant for software string.

8f654f42be8f5a8cd5c660392a5dd9e2899f5feb authored almost 11 years ago by Christopher Foo <[email protected]>
Bumps version to 0.30.

Merge commit 'f2bc845bbad562389152546a68b8b4814dff25a6'

Conflicts:
wpull/version.py

fbd508b3ccb406809561dbd033811cdf552bdbbf authored almost 11 years ago by Christopher Foo <[email protected]>
Bumps version to 0.31a1.

67e826e63aab492cb407a560fe877fd2aa97e9f7 authored almost 11 years ago by Christopher Foo <[email protected]>
changelog: Updates latest to 0.30.

f2bc845bbad562389152546a68b8b4814dff25a6 authored almost 11 years ago by Christopher Foo <[email protected]>
changelog: Adds entry about Engine's use of AdjustableSemaphore.

074a4262fae3ecf54de412b2fe8ce07b17a7087b authored almost 11 years ago by Christopher Foo <[email protected]>
Merge branch 'issue/93-concurrency' into develop

Conflicts:
wpull/util.py
wpull/util_test.py

9516149b094cc1eb86d4edfc124af2bea5d5c7f4 authored almost 11 years ago by Christopher Foo <[email protected]>
changelog: Adds entry about cx_freeze support.

b782db49b75f7da39f0cadf3af8175a0aff51afa authored almost 11 years ago by Christopher Foo <[email protected]>
Merge branch 'topic/cx_freeze' into develop

b26c16ee7f3fe13e0b09cb2a31937e3d25f421e4 authored almost 11 years ago by Christopher Foo <[email protected]>
Uses get_package_filename() for phantomjs.js for cx_freeze.

99f71c8b52414919e880b952aed35c08f406d65a authored almost 11 years ago by Christopher Foo <[email protected]>
options: Changes scroll default. processor: Changes viewport size, scrolling.

options.py: Changes --phantomjs-scroll default to 10.
processor.py: Changes viewport and page si...

a9dc4f8017d80a9231175f44ed4217d58989b417 authored almost 11 years ago by Christopher Foo <[email protected]>
processor.py: Uses key constants and sends scrollTo on PhantomJS remote.

21717bf11642dd42362672b05bf20de457c78907 authored almost 11 years ago by Christopher Foo <[email protected]>
phantomjs.py: Increases timeout to account for slowness on Travis CI.

35ccd8c1cdd90297f86237f73acab45995afbbf6 authored almost 11 years ago by Christopher Foo <[email protected]>
changelog: Adds note about PhantomJS timeouts.

8a1c84b2e4937ec6655f44a044c8b3180df9308f authored almost 11 years ago by Christopher Foo <[email protected]>
proxy.py: Returns early if no request.

2921a3be182989b152c4bc479e7f888a7d3bb22b authored almost 11 years ago by Christopher Foo <[email protected]>
errors.py: Fixes docstring spelling.

f7660bc1139503ca09372dc95fbf814e99249512 authored almost 11 years ago by Christopher Foo <[email protected]>
phantomjs.py: Adds default timeouts.

Closes chfoo/wpull#47.

f39b60055972c612b15ad0fb7695a94ed4b04286 authored almost 11 years ago by Christopher Foo <[email protected]>
Supports reading certs from zip file under cx_freeze.

c3319afa7ac1969592d15bab69581f88b239cf2f authored almost 11 years ago by Christopher Foo <[email protected]>
proxy.py: Calls attach() for io stream to register socket properly.

Closes chfoo/wpull#100

e57ea86d0e02de93078463d93aaf840e34f9d7e3 authored almost 11 years ago by Christopher Foo <[email protected]>
iostream.py: Adds attach() for already connected sockets.

2400339ae572c4d0ac3f4e719d2c8fc9e606d8f6 authored almost 11 years ago by Christopher Foo <[email protected]>
proxy.py: Uses regular read_bytes callback from new iostream.

Closes chfoo/wpull#99.

0c476df23a296844bf6b826e6caf34cf603084ac authored almost 11 years ago by Christopher Foo <[email protected]>
setup.py: Adds WIP cx_freeze support.

97a84dade4db2587767fac2463da2922951f8eee authored almost 11 years ago by Christopher Foo <[email protected]>
Includes missing changes to 79bdfbe25a81bf7845aff12e340a9bee6bb3a89f.

c58ee9b8a81d2c380a060d3b86c2462fe41d2f37 authored almost 11 years ago by Christopher Foo <[email protected]>
Moves URL filters into new urlfilter module.

79bdfbe25a81bf7845aff12e340a9bee6bb3a89f authored almost 11 years ago by Christopher Foo <[email protected]>
changelog: Updates entry for issue 98.

Closes chfoo/wpull#98

24278a8a928bcd12a26ca29efb16754d49aa61f5 authored almost 11 years ago by Christopher Foo <[email protected]>
Merge branch 'issue/98-attributeerror' into develop

4f305787d559d5c18647b23dc977c6914f94679b authored almost 11 years ago by Christopher Foo <[email protected]>
app.py: Removes blocking ioloop tracebacks.

Closes chfoo/wpull#97.

0e08e32c64491a7dbe62e91a2cb39eb332b633b0 authored almost 11 years ago by Christopher Foo <[email protected]>
iostream.py: Reraises AttributeError as NetworkError.

35ea46b1fd5f18a58f824162df7a3e81c0a8ccb7 authored almost 11 years ago by Christopher Foo <[email protected]>
url.py: Switches to LRUCache.

3b78d7b16b71222b7106bfaf788f50b5658c7046 authored almost 11 years ago by Christopher Foo <[email protected]>
cache.py: Implements LRUCache.

81cad63bf6b7aac716a9d34b01f65c3c6c7818a5 authored almost 11 years ago by Christopher Foo <[email protected]>
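
For readers unfamiliar with the technique, a least-recently-used cache can be sketched in a few lines. This version uses `collections.OrderedDict` (Python 3) rather than the LinkedList that the log mentions, so it is a swapped-in technique; the class name and the `max_items` parameter are hypothetical.

```python
from collections import OrderedDict


class LRUCacheSketch:
    """Hypothetical LRU cache; evicts the least recently used key."""

    def __init__(self, max_items=100):
        self._max_items = max_items
        self._data = OrderedDict()  # oldest entry first, newest last

    def __contains__(self, key):
        return key in self._data

    def __getitem__(self, key):
        value = self._data[key]
        # A read counts as a use, so move the key to the newest end.
        self._data.move_to_end(key)
        return value

    def __setitem__(self, key, value):
        self._data[key] = value
        self._data.move_to_end(key)
        # Drop entries from the oldest end until the cache fits again.
        while len(self._data) > self._max_items:
            self._data.popitem(last=False)


cache = LRUCacheSketch(max_items=2)
cache["a"] = 1
cache["b"] = 2
_ = cache["a"]   # "a" is now more recently used than "b"
cache["c"] = 3   # evicts "b"
```

A FIFO cache would have evicted `"a"` here, since it ignores reads and removes entries strictly in insertion order.
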
Adds collections module containing LinkedList.

4b25d7f4b2d05590186dd0f34474c6b1c24f4b0a authored almost 11 years ago by Christopher Foo <[email protected]>
network,url.py: Switches to FIFOCache.

d58f7ee7f037bae0d9398ba1ac01f8ee9289bba1 authored almost 11 years ago by Christopher Foo <[email protected]>
cache.py: Adds FIFOCache.

c6e4c12e39ee45cb466f4dd29617e9b88c4bbbf8 authored almost 11 years ago by Christopher Foo <[email protected]>
iostream.py: Catches AttributeError on do_handshake().

Re: chfoo/wpull#98

2d8daa5bfd0a8fac63976f65102a635accd2471a authored almost 11 years ago by Christopher Foo <[email protected]>
Bumps version to 0.29.

5ee21a40b9b6a61e7353eb35881f1acf8e2100b3 authored almost 11 years ago by Christopher Foo <[email protected]>
Merge commit '07831965a106a6c1e84bb79de55128ccca315f99'

c7934d33e35633edb0c1852091bb8791b17956d8 authored almost 11 years ago by Christopher Foo <[email protected]>
Bumps version to 0.30a1

[ci skip]

cb423bcc8f562242506fea9a9f8ba330643f64eb authored almost 11 years ago by Christopher Foo <[email protected]>
changelog.rst: Updates latest to 0.29.

[ci skip]

07831965a106a6c1e84bb79de55128ccca315f99 authored almost 11 years ago by Christopher Foo <[email protected]>
setup.py: Moves setup args into a dict.

89255d8191fa0381bfc743934414f038bfb4810c authored almost 11 years ago by Christopher Foo <[email protected]>
web.py: Breaks loop if robots.txt is error with mock robots.txt entry.

59951ddfde97f31ed82dfb0c96784c717d719adc authored almost 11 years ago by Christopher Foo <[email protected]>
__main__.py: Hooks in pdb.

e3f5856ef50cb6fa6a10c1ff5e53fa62d0539f7d authored almost 11 years ago by Christopher Foo <[email protected]>
url.py: Caches the parsed URLInfos.

e6c33af6e298f4c8908362aa2c6c5a8bca30d33e authored almost 11 years ago by Christopher Foo <[email protected]>
Uses old style debugs for url filters instead of backward-compat style.

1bde33bf93e0cd7eb22ee929d448dc56bfa13d66 authored almost 11 years ago by Christopher Foo <[email protected]>
Implements --http-compression.

Closes chfoo/wpull#94.

36cee4967b5dd29e504f551a092a0d43e8a61fdf authored almost 11 years ago by Christopher Foo <[email protected]>
connection.py: Supports deflate compression.

1158cd296635f22de075529da36bd4e930e52e5e authored almost 11 years ago by Christopher Foo <[email protected]>
util.py: Adds DeflateDecompressor.

91e890184bbbf73bdde3542d41c5500a873d3224 authored almost 11 years ago by Christopher Foo <[email protected]>
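
A deflate decoder for HTTP has to cope with two variants: the zlib-wrapped stream the specification calls for and the raw deflate stream some servers send instead. The sketch below shows one common way to handle both, assuming the first chunk is at least two bytes long; `DeflateDecompressorSketch` is a hypothetical name and this is not taken from util.py.

```python
import zlib


class DeflateDecompressorSketch:
    """Hypothetical decoder that accepts zlib-wrapped or raw deflate."""

    def __init__(self):
        self._decompressor = None

    def decompress(self, chunk: bytes) -> bytes:
        if self._decompressor is None:
            # Try the zlib-wrapped form first.
            candidate = zlib.decompressobj()
            try:
                output = candidate.decompress(chunk)
            except zlib.error:
                # Header check failed: retry as raw deflate
                # (negative wbits means no header or trailer).
                candidate = zlib.decompressobj(-zlib.MAX_WBITS)
                output = candidate.decompress(chunk)
            self._decompressor = candidate
            return output
        return self._decompressor.decompress(chunk)

    def flush(self) -> bytes:
        if self._decompressor is None:
            return b""
        return self._decompressor.flush()


def decode(data: bytes) -> bytes:
    """Decode a complete body with a fresh decoder."""
    decoder = DeflateDecompressorSketch()
    return decoder.decompress(data) + decoder.flush()


body = b"deflate body " * 10
wrapped = zlib.compress(body)
raw_compressor = zlib.compressobj(9, zlib.DEFLATED, -zlib.MAX_WBITS)
raw = raw_compressor.compress(body) + raw_compressor.flush()
```

Both `decode(wrapped)` and `decode(raw)` return the original body.
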
util.detect_encoding(): Continues loop if encoding is None.

Closes chfoo/wpull#96

f8ccc11ed4a9fd4d31ad8f97fad1bec085264344 authored almost 11 years ago by Christopher Foo <[email protected]>
url.py: Double encodes hostname to work around issue 21103.

Closes chfoo/wpull#82

8bcaddfb53231559fbe752039b1b464577e6420f authored almost 11 years ago by Christopher Foo <[email protected]>
Implements --output-document.

Closes chfoo/wpull#42

f09a79039475e7b853fdf06a7f86b0f29b455d49 authored almost 11 years ago by Christopher Foo <[email protected]>
engine,util: Add and use AdjustableSemaphore.

Re: chfoo/wpull#93

3cf5eeefb39e6c457a566be69b09b70bf2945a5a authored almost 11 years ago by Christopher Foo <[email protected]>
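
The idea of a semaphore whose limit can be changed while it is in use can be sketched as follows. This is a thread-based illustration built on `threading.Condition`; the class name, the `set_max` method, and the use of threads at all are assumptions made to keep the example self-contained, and wpull's own AdjustableSemaphore may work differently.

```python
import threading


class AdjustableSemaphoreSketch:
    """Hypothetical semaphore whose maximum can change at runtime."""

    def __init__(self, max_count: int):
        self._max = max_count
        self._in_use = 0
        self._cond = threading.Condition()

    def acquire(self, blocking: bool = True) -> bool:
        with self._cond:
            while self._in_use >= self._max:
                if not blocking:
                    return False
                self._cond.wait()
            self._in_use += 1
            return True

    def release(self) -> None:
        with self._cond:
            if self._in_use <= 0:
                raise ValueError("release() called too many times")
            self._in_use -= 1
            self._cond.notify()

    def set_max(self, max_count: int) -> None:
        with self._cond:
            self._max = max_count
            # Wake all waiters so they re-check against the new limit.
            self._cond.notify_all()


sem = AdjustableSemaphoreSketch(1)
first = sem.acquire()
second = sem.acquire(blocking=False)   # limit of 1 is already reached
sem.set_max(2)
third = sem.acquire(blocking=False)    # fits under the raised limit
```

Lowering the limit does not interrupt holders; new acquisitions simply wait until enough slots have been released to fall below the new maximum.
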
engine.py: Releases semaphore and return if stopping on poll.

e7dcc5a0d88475270d0b52c9a7e7e369b7aac9c6 authored almost 11 years ago by Christopher Foo <[email protected]>
options.py: Sorts the choices before printing.

9bbb717b5b8843e4c366494ad18ef9a732e4aadf authored almost 11 years ago by Christopher Foo <[email protected]>
Implements --ignore-length.

Closes chfoo/wpull#44

ba9dad2193f599f25781aa3b5fabd007ce02c1a3 authored almost 11 years ago by Christopher Foo <[email protected]>
hook: Adds engine_run() script callback. Exposes factory instance.

774cd317a9059b79d603c2b37661298108cd51aa authored almost 11 years ago by Christopher Foo <[email protected]>
connection.py: Uses ConnectionParams for Connection args.

6a90a2d3858d8e696c75d62a88bd28ea1de7c5e6 authored almost 11 years ago by Christopher Foo <[email protected]>
connection.py: Sets connection close on request header if not keep-alive.

5293415ae3ba27c7b65b74160dbf20640aa5935b authored almost 11 years ago by Christopher Foo <[email protected]>