Ecosyste.ms: OpenCollective

An open API service for software projects hosted on Open Collective.

github.com/ArchiveTeam/wpull

Wget-compatible web downloader and crawler.
https://github.com/ArchiveTeam/wpull

testing/py_hook_script_stop.py: Add missing record_info arg.

08f7991738b572be653739e360222a0468c516af authored over 10 years ago by Christopher Foo <[email protected]>
Bump version to 0.35.

da5b791d358e3fdd95c2bee338f0729e543bb239 authored over 10 years ago by Christopher Foo <[email protected]>
Merge commit '977a29c048a0e55b21dabea5dfb101bb84e57fc9'

384ac8f4a62f6ba80b8d2bd2d6b6e778d3a20ce7 authored over 10 years ago by Christopher Foo <[email protected]>
Bump version to 0.36a1.

8f75e166ce694e557078cf18e25c0728203fb2e8 authored over 10 years ago by Christopher Foo <[email protected]>
changelog: Update latest to 0.35.

[ci skip]

977a29c048a0e55b21dabea5dfb101bb84e57fc9 authored over 10 years ago by Christopher Foo <[email protected]>
Format with autopep8.

a9b35babf42caef9b23d76ac47497f40c288c27d authored over 10 years ago by Christopher Foo <[email protected]>
Fix up parent merge.

bc5f704f8252c4d25ed8d0a48f1ea681d20ed55c authored over 10 years ago by Christopher Foo <[email protected]>
Merge branch 'temp1' into develop

Conflicts:
doc/changelog.rst
wpull/builder.py

103c20268fa7595ddb0981a99ef3d5a3ba10b78c authored over 10 years ago by Christopher Foo <[email protected]>
Merge branch 'issue/135-warc-move' into temp1

Conflicts:
doc/changelog.rst

b991f18ac50e5843ca5e73922478c46f02a5e43d authored over 10 years ago by Christopher Foo <[email protected]>
changelog: Add bullet about --warc-move.

[ci skip]

add60b66677305656bed6f90a6ba37e57c7cac74 authored over 10 years ago by Christopher Foo <[email protected]>
WARCRecorder._move_file_to_dest_dir: Add docstring.

[ci skip]

7ccab71fc43e073ea20ae615f80564b6a5e4f57c authored over 10 years ago by Christopher Foo <[email protected]>
Change default scripting version to 2.

95fe12162683e37740582e9b1afc5f07c07b28e7 authored over 10 years ago by Christopher Foo <[email protected]>
app.py: Fix up warc move arg.

38d90880c45e05c2a1013463e35cbca6858b027f authored over 10 years ago by Christopher Foo <[email protected]>
Rename "--move-warc-to"→"--warc-move", move all WARCs and CDX, unit tests

Renames to --warc-move for consistency. Implements moving all WARCs and
the CDX file to given di...

b1b4c41ccbe42fb264b15d702d927bf23dac77c2 authored over 10 years ago by Christopher Foo <[email protected]>
Merge branch 'yipdw-move-to' into issue/135-warc-move

ae2d7e75a17f96a0ec983b4126823993b5040d3f authored over 10 years ago by Christopher Foo <[email protected]>
Merge branch 'topic/callback_hooks' into develop

8601b4b79e2786e8688a4ccb193aaa6426112928 authored over 10 years ago by Christopher Foo <[email protected]>
hook.py: Fixes lua type conversion on exit status callback.

4e36ee365bb82408b49032c00a6d7eafcd6a7524 authored over 10 years ago by Christopher Foo <[email protected]>
builder.py: Defaults back to IPv4 preference for backward compatibility

eaf827289f6274ff10c59f5982bef99a034bc997 authored over 10 years ago by Christopher Foo <[email protected]>
network.Resolver: Fixes address preference sorting.

63e5847f9c5a0e66265a3a4474d29017573acf27 authored over 10 years ago by Christopher Foo <[email protected]>
network.Resolver: Defaults family PREFER_IPv4 for backward compatibility

fba68f90381ddfc1cce81c1beab42a65824a710f authored over 10 years ago by Christopher Foo <[email protected]>
changelog: Adds entries about HookableMixin and Resolver param change.

fe3cd52bfd4fcf2fc6cedfd502117aed1bcea796 authored over 10 years ago by Christopher Foo <[email protected]>
network.Resolver: Implements IPv4/6 preference sorting.

597e56490b318f84fafc6192e54777401cd1ff97 authored over 10 years ago by Christopher Foo <[email protected]>
Cleans up commented-out code.

d4ab47243eafd896f6f431ca4d4bbe5e9f4cd7f9 authored over 10 years ago by Christopher Foo <[email protected]>
hook.py: Implements remaining script hooks.

321bc3f85874279b7fc97e65b427f717f32b4cf5 authored over 10 years ago by Christopher Foo <[email protected]>
network.Resolver: WIP Uses AF_UNSPEC flag instead. Rename param to family.

Use of AF_INET | AF_INET6 may fail for things such as localhost.

[ci skip]

6255c39d84d3a56915221741aa3b1e286beb88c6 authored over 10 years ago by Christopher Foo <[email protected]>
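
The network.Resolver commits above describe two ideas: asking the system resolver for any address family with AF_UNSPEC, and then sorting the results by an IPv4/IPv6 preference. The following is a minimal sketch of that approach, not wpull's actual code. The function names `resolve` and `sort_by_preference` and the string values of the constants are assumptions made for illustration; only the name PREFER_IPv4 appears in the commit messages.

```python
import socket

# Hypothetical constant values for this sketch; only the name PREFER_IPv4
# is taken from the commit messages.
PREFER_IPv4 = "prefer_ipv4"
PREFER_IPv6 = "prefer_ipv6"


def sort_by_preference(addr_infos, preference=PREFER_IPv4):
    """Return getaddrinfo-style tuples with the preferred family first."""
    preferred = socket.AF_INET if preference == PREFER_IPv4 else socket.AF_INET6
    # sorted() is stable, so the resolver's order is kept within each family.
    return sorted(addr_infos, key=lambda info: info[0] != preferred)


def resolve(host, port, preference=PREFER_IPv4):
    """Look up IPv4 and IPv6 addresses, then order them by preference."""
    # AF_UNSPEC asks for any family. Address families are enumerated values,
    # not bit flags, so AF_INET | AF_INET6 is not a valid way to ask for both.
    infos = socket.getaddrinfo(
        host, port, family=socket.AF_UNSPEC, type=socket.SOCK_STREAM
    )
    return sort_by_preference(infos, preference)
```

Calling `resolve("localhost", 80)` would return the address tuples with IPv4 entries first, and passing `PREFER_IPv6` reverses that ordering. Sorting after the lookup, rather than restricting the lookup to one family, keeps the non-preferred addresses available as fallbacks.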
Spike out --move-warc-to.

7011afa0d15a963025b0e7eb4cd98fbcf7a0b307 authored over 10 years ago by David Yip <[email protected]>
WIP adds callback hooks. network.py: Use explicit errno for dns socket errors.

WIP adds explicit callback hooks into classes instead of subclassing.
Allows attaching callbacks...

48a92834c55dc3b4a5b1c6eb07ef22cc6e65e881 authored over 10 years ago by Christopher Foo <[email protected]>
app.Application: Adds run_sync()

d46503c2140ad13869d2ce884d5d1a371271069f authored over 10 years ago by Christopher Foo <[email protected]>
Merge branch 'temp1' into develop

095349ef02f32d25ed37ec6711e46ea31be8486f authored over 10 years ago by Christopher Foo <[email protected]>
Merge branch 'issue/133-pep8' into temp1

920f20adba87d7367d650c4057e6dcfc37c56410 authored over 10 years ago by Christopher Foo <[email protected]>
Reformat with autopep8.

f23ba4796427b9a2cbb9970a71e59380a8238cc6 authored over 10 years ago by Christopher Foo <[email protected]>
writer.py: Wrap long line.

631d8c4122bd1e0fee054e6b6521fc0a83328c47 authored over 10 years ago by Christopher Foo <[email protected]>
Merge branch 'lowks-master' into issue/133-pep8

241a7ed4b0293e9221f3f6679747e58942ba0e1f authored over 10 years ago by Christopher Foo <[email protected]>
Removing pep8 violations

d36f4fc24f1364f25795e4feab83d78c4fde1f14 authored over 10 years ago by Low Kian Seong <[email protected]>
app.py: Adds Application class.

SIGINT/SIGTERM signal handlers moved into
Application.setup_signal_handlers()

c81e681bdf0f485d9d0d4051f756e75d5a6349dd authored over 10 years ago by Christopher Foo <[email protected]>
Renames app.py to builder.py

b0b78d4f722c9fe15561f100abed8e599dc817e0 authored over 10 years ago by Christopher Foo <[email protected]>
Minor removal of pep8 violations

d2db7d354e621ed2fb08d6357b82c9fd9c02bee3 authored over 10 years ago by Low Kian Seong <[email protected]>
Bumps version to 0.34.1

Merge branch 'develop'

Conflicts:
wpull/version.py

f6179f725110caf5c7f1d82b7b676c9ed3ee53c2 authored over 10 years ago by Christopher Foo <[email protected]>
changelog: Updates latest to 0.34.1

22c7a58774755dcf2c14e54f9c23f2063327f148 authored over 10 years ago by Christopher Foo <[email protected]>
processor.py: Works around bad URL parse-then-format.

re: chfoo/wpull#132

966bf57ed184c7bddb0728078768231d6bdddcc4 authored over 10 years ago by Christopher Foo <[email protected]>
url.py: Removes stray brackets stripping due to bad urlparse behavior.

No longer supports IPv6 URLs with extra/misplaced brackets.

e9282a5c90256cd4cdb630e4b3a5b5218a466da0 authored over 10 years ago by Christopher Foo <[email protected]>
Clean up todo tags.

[ci skip]

29639b488ab51ea4b7e043e198771fc2e14386fb authored over 10 years ago by Christopher Foo <[email protected]>
Bumps version to 0.34.

7a6c4d39fdf49f6a85ca1376c66080f6a847b0e6 authored over 10 years ago by Christopher Foo <[email protected]>
Merge commit '4ecfaa3e3ca98ca236462446e58ffa693ff0b76f'

0956137ad2d0d33b0693976419622294a35d157e authored over 10 years ago by Christopher Foo <[email protected]>
Bumps version to 0.35a1.

[ci skip]

d4897b7b75f82dc1ace0ea01f4005955e9960d6f authored over 10 years ago by Christopher Foo <[email protected]>
changelog: Updates latest to 0.34.

[ci skip]

4ecfaa3e3ca98ca236462446e58ffa693ff0b76f authored over 10 years ago by Christopher Foo <[email protected]>
app.py: Ignores cookie file header checks with RelaxedMozillaCookieJar.

Closes chfoo/wpull#129

408593d22fb3857e436b69591268977d4640ddda authored over 10 years ago by Christopher Foo <[email protected]>
Adds debugging console (--debug-console-port)

Closes chfoo/wpull#127

2596548d1599fdbab3e8ca26c73dd888eb9f58bd authored over 10 years ago by Christopher Foo <[email protected]>
app.py: Hooks in phantomjs_snapshot option.

548366eca340c2d97fa964dadf8d67589f1cf6a9 authored over 10 years ago by Christopher Foo <[email protected]>
processor.py: Removes buggy and unneeded robots.txt special case.

Removes --no-strong-robots option.
urlfilter.DemuxURLFilter: Adds result a mapping of names to v...

2daa67a3f7dc890ea480b126427793f60466ada8 authored over 10 years ago by Christopher Foo <[email protected]>
processor.py: Closes snapshot temp file and use root path.

Closes chfoo/wpull#128

ec8c9be7e8c50f72801167e4a955aa7a3623e335 authored over 10 years ago by Christopher Foo <[email protected]>
scraper.py: Scrapes onclick, onmouseX, onkeyX, and data- attribs.

Closes chfoo/wpull#48

9aee023a2b097cfedfe6bc7294ca199ba3dd52b1 authored over 10 years ago by Christopher Foo <[email protected]>
Revert "Adds Python 3.4 to travis config."

This reverts commit 5e32a2f737f7a2109b7d3663c9f26f9375379990.

Re: chfoo/wpull#125

7f80481be587a95fcef5f7392bc0da0b15eebfab authored over 10 years ago by Christopher Foo <[email protected]>
Adds Python 3.4 to travis config.

5e32a2f737f7a2109b7d3663c9f26f9375379990 authored over 10 years ago by Christopher Foo <[email protected]>
Bumps version to 0.33.2.

b1ac4f6ea2094b26111b1ce3b61226f0ecc87462 authored over 10 years ago by Christopher Foo <[email protected]>
Merge branch 'develop'

7009383e28d045df1925e7ccce25e3d4b356ecc8 authored over 10 years ago by Christopher Foo <[email protected]>
changelog: Updates latest to 0.33.2.

[ci skip]

93041a1a64ac965ea77134672574073f0e774d0e authored over 10 years ago by Christopher Foo <[email protected]>
PhantomJS: Munges URL by rewriting scheme and appending prefix

instead of munging the hostname.

Munging the hostname may have unseen effects such as cookies l...

dc3d2e99eba9cde1b0294845f14cb062b727fda6 authored over 10 years ago by Christopher Foo <[email protected]>
Adds missing basehref.html file.

7967dfe0a7165405b19007ccfd69121f23340bf8 authored over 10 years ago by Christopher Foo <[email protected]>
scraper.HTMLScraper: Finds and uses base href link.

Closes chfoo/wpull#122

66b1cbd5031b860ca3677f7eb45381aa00ea6df9 authored over 10 years ago by Christopher Foo <[email protected]>
Bumps version to 0.33.1.

Merge branch 'develop'

Conflicts:
wpull/version.py

c6a6840abebb436df586231ac746f779874ea7e0 authored over 10 years ago by Christopher Foo <[email protected]>
changelog: Updates latest to 0.33.1.

26d87fe5d5aa322cff1613854aa22ac01e73dfcc authored over 10 years ago by Christopher Foo <[email protected]>
scraper_test.py: Removes link from false positive fixed in prev commit.

cef58153a5a55104dd9d3e4d5a186c36fa49130a authored over 10 years ago by Christopher Foo <[email protected]>
url.is_likely_link: Consults mimetypes module and forbids common TLD.

49e3f0a8074ac25b520e048904fc450c2aac110d authored over 10 years ago by Christopher Foo <[email protected]>
url,url_test.py: Simplifies bracket removal. Adds more bracket testing.

5b79985ccc345e12c8b9bba02c282059ec47cc60 authored over 10 years ago by Christopher Foo <[email protected]>
url.URLInfo: Uses better checks to format IPv6 addresses.

Removes brackets from hostname as part of normalization.

Closes chfoo/wpull#121

201259db72b2d0934a81394a29c68005476c7fca authored over 10 years ago by Christopher Foo <[email protected]>
options.py: Uses curdir for --warc-tempdir.

956fc28fb6b2f035f7ab468a71cde8094e403d9b authored over 10 years ago by Christopher Foo <[email protected]>
hook.py: Access filename after rollover.

37ead5babdf1649825098f54144e379b9cb4917f authored over 10 years ago by Christopher Foo <[email protected]>
processor.py: Uses SpooledTempFile for scripts to access a filename.

Closes chfoo/wpull#120

fa31346e15230e54eebcffc75cbaa0a897a3c538 authored over 10 years ago by Christopher Foo <[email protected]>
app.py: Hooks up the --bind-address option.

5a21c7521f3960283acd8758aca6b208c6d35099 authored over 10 years ago by Christopher Foo <[email protected]>
decompression_test.py: Adds GzipDecompressor test.

c4027a5a61214b2fcbf9e1050d4aef7ff521820f authored over 10 years ago by Christopher Foo <[email protected]>
cache_test.py: Better coverage on LRUCache touch by get vs set.

48cab4cd99757989e37cf0aa8b38fdf7de2e35a9 authored over 10 years ago by Christopher Foo <[email protected]>
factory_test.py: Better coverage on Factory container methods.

a0f5e0496f223f670a47f8e7b7a587d5ba1b35a5 authored over 10 years ago by Christopher Foo <[email protected]>
string_test.py: Adds better coverage on tuple and else case.

d57a3fa3c8c7e305b8e17f5be63e7d7721e17c99 authored over 10 years ago by Christopher Foo <[email protected]>
document_test.py: Fixes wrong class tested for test_sitemap_detect

c498478f589d9cf7c664275ef4e4c7267c4620c8 authored over 10 years ago by Christopher Foo <[email protected]>
url.py: Adds more rules to is_unlikely_link().

121942c38c410eb9797b251e507bd07f9019af4c authored over 10 years ago by Christopher Foo <[email protected]>
Bumps version to 0.33.

f2f20e263bef2aa9e19ead88823ad3e28294fbbc authored over 10 years ago by Christopher Foo <[email protected]>
Merge commit '53afeb5378ea5d42c1e3caffcc7390583390a4e5'

5b7f66348f65982417f9e006ed33168b94e59948 authored over 10 years ago by Christopher Foo <[email protected]>
Bumps version to 0.34a1.

4f2febba5fbcf8344b05aa7dbdc515f7936d8a37 authored over 10 years ago by Christopher Foo <[email protected]>
changelog: Updates latest to 0.33.

53afeb5378ea5d42c1e3caffcc7390583390a4e5 authored over 10 years ago by Christopher Foo <[email protected]>
version.py: Fixes wrong version from merge.

0ca91eb3c0f29f09a87a799baccf2ef1788321fb authored over 10 years ago by Christopher Foo <[email protected]>
document.py: Uses HTML parser on XHTML docs instead of XHTML parser.

The XHTML parser does not handle soup well.

97dbe491adc325d1c77a6a288e2800e430029265 authored over 10 years ago by Christopher Foo <[email protected]>
scraper: Handles catching of UnicodeError/LxmlError. Allows partial scrapes.

processor: Don't catch UnicodeError.

fc80999bdefebb3962af294db7d060e19b31dd9b authored over 10 years ago by Christopher Foo <[email protected]>
document.py: Catches any lxml.etree errors parsing doctype.

Closes chfoo/wpull#118.

7f7ff184181ce5f6c747c8062cf341907c404996 authored over 10 years ago by Christopher Foo <[email protected]>
Merge branch 'release/v0.32.1' into develop

0c965bcee06fb76e20134b0ac0b589f417e87eaf authored over 10 years ago by Christopher Foo <[email protected]>
Merge branch 'release/v0.32.1'

Conflicts:
wpull/version.py

43623d99f0bef3a059f46ef6718d0726415ce775 authored over 10 years ago by Christopher Foo <[email protected]>
Bumps version & changelog to 0.32.1

d5b4dbb423c12b52898dfa9d35ea4447464afcde authored over 10 years ago by Christopher Foo <[email protected]>
Merge branch 'issue/114-wait_time_hook' into develop

7db216a5a004628b939c3c7d2803ef8d181b4d5d authored over 10 years ago by Christopher Foo <[email protected]>
Adds unit tests for wait_time() hook.

4cf5e63505c672bc4a16e33d2cc2debf4e6495ed authored over 10 years ago by Christopher Foo <[email protected]>
document.py: Improves JS regex to scrape absolute URLs with space.

c25363546cd9d2393122bc40e0c6a4f9c6728ff5 authored over 10 years ago by Christopher Foo <[email protected]>
Implements JavaScript link scraping.

Adds JavaScriptReader, JavaScriptScraper.

Re: chfoo/wpull#74.

930e49c0155e76e3e27ffb1b4dd1d852157e793e authored over 10 years ago by Christopher Foo <[email protected]>
url.py: Adds is_likely_link().

2d786ad2b7bc4453a3f28f33e359e1e2b266eb81 authored over 10 years ago by Christopher Foo <[email protected]>
Removes extended module.

ef91868b62ff3aa463511709a89266bbffd54f6b authored over 10 years ago by Christopher Foo <[email protected]>
Merge branch 'topic/code_cleanup' into develop

2d0c165eda23b08ba7a99b2f0bc9ad388802574b authored over 10 years ago by Christopher Foo <[email protected]>
Moves string related functions from util to string module.

Moves to_bytes, to_str, normalized_codec_name, detect_encoding,
try_decoding, format_size, print...

0a193c6d09a636da0ee5c11de426106246cd6fd1 authored over 10 years ago by Christopher Foo <[email protected]>
Moves sleep,TimedOut,wait_future,AdjustableSemaphore from util to async.

f7c86a6dd99d32a917c4b9d21d122a6490ba70db authored over 10 years ago by Christopher Foo <[email protected]>
Moves DeflateDecompressor, gzip_decompress from util to decompression.

75d71754ad8f4c0a32dc25b9d70be07eead8a684 authored over 10 years ago by Christopher Foo <[email protected]>
Moves OrderedDefaultDict from util to collections.

474678738604f5664d2b7d7932f208e36a0bbfa6 authored over 10 years ago by Christopher Foo <[email protected]>
hook,processor: Adds wait_time() scripting callback hook.

Closes chfoo/wpull#114.

ce1495d60bfbf6d08ea7cdb80d2d1c40099ed559 authored over 10 years ago by Christopher Foo <[email protected]>
app_test.py: Cleans up logging handlers.

dfdd194a9b9df0a428aac8f8168b6295a460cce5 authored over 10 years ago by Christopher Foo <[email protected]>