Ecosyste.ms: OpenCollective

github.com/ArchiveTeam/ludios_wpull

wpull fork with fixes and faster parsing using html5-parser; used by grab-site; should go away when wpull is similarly improved
https://github.com/ArchiveTeam/ludios_wpull

changelog: Update latest to 0.35.

[ci skip]

977a29c048a0e55b21dabea5dfb101bb84e57fc9 authored over 10 years ago
Format with autopep8.

a9b35babf42caef9b23d76ac47497f40c288c27d authored over 10 years ago
Fix up parent merge.

bc5f704f8252c4d25ed8d0a48f1ea681d20ed55c authored over 10 years ago
Merge branch 'temp1' into develop

Conflicts:
doc/changelog.rst
wpull/builder.py

103c20268fa7595ddb0981a99ef3d5a3ba10b78c authored over 10 years ago
Merge branch 'issue/135-warc-move' into temp1

Conflicts:
doc/changelog.rst

b991f18ac50e5843ca5e73922478c46f02a5e43d authored over 10 years ago
changelog: Add bullet about --warc-move.

[ci skip]

add60b66677305656bed6f90a6ba37e57c7cac74 authored over 10 years ago
WARCRecorder._move_file_to_dest_dir: Add docstring.

[ci skip]

7ccab71fc43e073ea20ae615f80564b6a5e4f57c authored over 10 years ago
Change default scripting version to 2.

95fe12162683e37740582e9b1afc5f07c07b28e7 authored over 10 years ago
app.py: Fix up warc move arg.

38d90880c45e05c2a1013463e35cbca6858b027f authored over 10 years ago
Rename "--move-warc-to"→"--warc-move", move all WARCs and CDX, unit tests

Renames to --warc-move for consistency. Implements moving all WARCs and
the CDX file to given di...

b1b4c41ccbe42fb264b15d702d927bf23dac77c2 authored over 10 years ago
Merge branch 'yipdw-move-to' into issue/135-warc-move

ae2d7e75a17f96a0ec983b4126823993b5040d3f authored over 10 years ago
Merge branch 'topic/callback_hooks' into develop

8601b4b79e2786e8688a4ccb193aaa6426112928 authored over 10 years ago
hook.py: Fixes lua type conversion on exit status callback.

4e36ee365bb82408b49032c00a6d7eafcd6a7524 authored over 10 years ago
builder.py: Defaults back to IPv4 preference for backward compatibility

eaf827289f6274ff10c59f5982bef99a034bc997 authored over 10 years ago
network.Resolver: Fixes address preference sorting.

63e5847f9c5a0e66265a3a4474d29017573acf27 authored over 10 years ago
network.Resolver: Defaults family PREFER_IPv4 for backward compatibility

fba68f90381ddfc1cce81c1beab42a65824a710f authored over 10 years ago
changelog: Adds entries about HookableMixin and Resolver param change.

fe3cd52bfd4fcf2fc6cedfd502117aed1bcea796 authored over 10 years ago
network.Resolver: Implements IPv4/6 preference sorting.

597e56490b318f84fafc6192e54777401cd1ff97 authored over 10 years ago
Cleans up commented-out code.

d4ab47243eafd896f6f431ca4d4bbe5e9f4cd7f9 authored over 10 years ago
hook.py: Implements remaining script hooks.

321bc3f85874279b7fc97e65b427f717f32b4cf5 authored over 10 years ago
network.Resolver: WIP Uses AF_UNSPEC flag instead. Rename param to family.

Use of AF_INET | AF_INET6 may fail for things such as localhost.

[ci skip]

6255c39d84d3a56915221741aa3b1e286beb88c6 authored over 10 years ago
Spike out --move-warc-to.

7011afa0d15a963025b0e7eb4cd98fbcf7a0b307 authored over 10 years ago
WIP adds callback hooks. network.py: Use explicit errno for dns socket errors.

WIP adds explicit callback hooks into classes instead of subclassing.
Allows attaching callbacks...

48a92834c55dc3b4a5b1c6eb07ef22cc6e65e881 authored over 10 years ago
app.Application: Adds run_sync()

d46503c2140ad13869d2ce884d5d1a371271069f authored over 10 years ago
Merge branch 'temp1' into develop

095349ef02f32d25ed37ec6711e46ea31be8486f authored over 10 years ago
Merge branch 'issue/133-pep8' into temp1

920f20adba87d7367d650c4057e6dcfc37c56410 authored over 10 years ago
Reformat with autopep8.

f23ba4796427b9a2cbb9970a71e59380a8238cc6 authored over 10 years ago
writer.py: Wrap long line.

631d8c4122bd1e0fee054e6b6521fc0a83328c47 authored over 10 years ago
Merge branch 'lowks-master' into issue/133-pep8

241a7ed4b0293e9221f3f6679747e58942ba0e1f authored over 10 years ago
Removing pep8 violations

d36f4fc24f1364f25795e4feab83d78c4fde1f14 authored over 10 years ago
app.py: Adds Application class.

SIGINT/SIGTERM signal handlers moved into
Application.setup_signal_handlers()

c81e681bdf0f485d9d0d4051f756e75d5a6349dd authored over 10 years ago
Renames app.py to builder.py

b0b78d4f722c9fe15561f100abed8e599dc817e0 authored over 10 years ago
Minor removal of pep8 violations

d2db7d354e621ed2fb08d6357b82c9fd9c02bee3 authored over 10 years ago
Bumps version to 0.34.1

Merge branch 'develop'

Conflicts:
wpull/version.py

f6179f725110caf5c7f1d82b7b676c9ed3ee53c2 authored over 10 years ago
changelog: Updates latest to 0.34.1

22c7a58774755dcf2c14e54f9c23f2063327f148 authored over 10 years ago
processor.py: Works around bad URL parse-then-format.

re: chfoo/wpull#132

966bf57ed184c7bddb0728078768231d6bdddcc4 authored over 10 years ago
url.py: Removes stray brackets stripping due to bad urlparse behavior.

No longer supports IPv6 URLs with extra/misplaced brackets.

e9282a5c90256cd4cdb630e4b3a5b5218a466da0 authored over 10 years ago
Clean up todo tags.

[ci skip]

29639b488ab51ea4b7e043e198771fc2e14386fb authored over 10 years ago
Bumps version to 0.34.

7a6c4d39fdf49f6a85ca1376c66080f6a847b0e6 authored over 10 years ago
Merge commit '4ecfaa3e3ca98ca236462446e58ffa693ff0b76f'

0956137ad2d0d33b0693976419622294a35d157e authored over 10 years ago
Bumps version to 0.35a1.

[ci skip]

d4897b7b75f82dc1ace0ea01f4005955e9960d6f authored over 10 years ago
changelog: Updates latest to 0.34.

[ci skip]

4ecfaa3e3ca98ca236462446e58ffa693ff0b76f authored over 10 years ago
app.py: Ignores cookie file header checks with RelaxedMozillaCookieJar.

Closes chfoo/wpull#129

408593d22fb3857e436b69591268977d4640ddda authored almost 11 years ago
Adds debugging console (--debug-console-port)

Closes chfoo/wpull#127

2596548d1599fdbab3e8ca26c73dd888eb9f58bd authored almost 11 years ago
app.py: Hooks in phantomjs_snapshot option.

548366eca340c2d97fa964dadf8d67589f1cf6a9 authored almost 11 years ago
processor.py: Removes buggy and unneeded robots.txt special case.

Removes --no-strong-robots option.
urlfilter.DemuxURLFilter: Adds result a mapping of names to v...

2daa67a3f7dc890ea480b126427793f60466ada8 authored almost 11 years ago
processor.py: Closes snapshot temp file and use root path.

Closes chfoo/wpull#128

ec8c9be7e8c50f72801167e4a955aa7a3623e335 authored almost 11 years ago
scraper.py: Scrapes onclick, onmouseX, onkeyX, and data- attribs.

Closes chfoo/wpull#48

9aee023a2b097cfedfe6bc7294ca199ba3dd52b1 authored almost 11 years ago
Revert "Adds Python 3.4 to travis config."

This reverts commit 5e32a2f737f7a2109b7d3663c9f26f9375379990.

Re: chfoo/wpull#125

7f80481be587a95fcef5f7392bc0da0b15eebfab authored almost 11 years ago
Adds Python 3.4 to travis config.

5e32a2f737f7a2109b7d3663c9f26f9375379990 authored almost 11 years ago
Bumps version to 0.33.2.

b1ac4f6ea2094b26111b1ce3b61226f0ecc87462 authored almost 11 years ago
Merge branch 'develop'

7009383e28d045df1925e7ccce25e3d4b356ecc8 authored almost 11 years ago
changelog: Updates latest to 0.33.2.

[ci skip]

93041a1a64ac965ea77134672574073f0e774d0e authored almost 11 years ago
PhantomJS: Munges URL by rewriting scheme and appending prefix

instead of munging the hostname.

Munging the hostname may have unseen effects such as cookies l...

dc3d2e99eba9cde1b0294845f14cb062b727fda6 authored almost 11 years ago
Adds missing basehref.html file.

7967dfe0a7165405b19007ccfd69121f23340bf8 authored almost 11 years ago
scraper.HTMLScraper: Finds and uses base href link.

Closes chfoo/wpull#122

66b1cbd5031b860ca3677f7eb45381aa00ea6df9 authored almost 11 years ago
Bumps version to 0.33.1.

Merge branch 'develop'

Conflicts:
wpull/version.py

c6a6840abebb436df586231ac746f779874ea7e0 authored almost 11 years ago
changelog: Updates latest to 0.33.1.

26d87fe5d5aa322cff1613854aa22ac01e73dfcc authored almost 11 years ago
scraper_test.py: Removes link from false positive fixed in prev commit.

cef58153a5a55104dd9d3e4d5a186c36fa49130a authored almost 11 years ago
url.is_likely_link: Consults mimetypes module and forbids common TLD.

49e3f0a8074ac25b520e048904fc450c2aac110d authored almost 11 years ago
url,url_test.py: Simplifies bracket removal. Adds more bracket testing.

5b79985ccc345e12c8b9bba02c282059ec47cc60 authored almost 11 years ago
url.URLInfo: Uses better checks to format IPv6 addresses.

Removes brackets from hostname as part of normalization.

Closes chfoo/wpull#121

201259db72b2d0934a81394a29c68005476c7fca authored almost 11 years ago
options.py: Uses curdir for --warc-tempdir.

956fc28fb6b2f035f7ab468a71cde8094e403d9b authored almost 11 years ago
hook.py: Access filename after rollover.

37ead5babdf1649825098f54144e379b9cb4917f authored almost 11 years ago
processor.py: Uses SpooledTempFile for scripts to access a filename.

Closes chfoo/wpull#120

fa31346e15230e54eebcffc75cbaa0a897a3c538 authored almost 11 years ago
app.py: Hooks up the --bind-address option.

5a21c7521f3960283acd8758aca6b208c6d35099 authored almost 11 years ago
decompression_test.py: Adds GzipDecompressor test.

c4027a5a61214b2fcbf9e1050d4aef7ff521820f authored almost 11 years ago
cache_test.py: Better coverage on LRUCache touch by get vs set.

48cab4cd99757989e37cf0aa8b38fdf7de2e35a9 authored almost 11 years ago
factory_test.py: Better coverage on Factory container methods.

a0f5e0496f223f670a47f8e7b7a587d5ba1b35a5 authored almost 11 years ago
string_test.py: Adds better coverage on tuple and else case.

d57a3fa3c8c7e305b8e17f5be63e7d7721e17c99 authored almost 11 years ago
document_test.py: Fixes wrong class tested for test_sitemap_detect

c498478f589d9cf7c664275ef4e4c7267c4620c8 authored almost 11 years ago
url.py: Adds more rules to is_unlikely_link().

121942c38c410eb9797b251e507bd07f9019af4c authored almost 11 years ago
Bumps version to 0.33.

f2f20e263bef2aa9e19ead88823ad3e28294fbbc authored almost 11 years ago
Merge commit '53afeb5378ea5d42c1e3caffcc7390583390a4e5'

5b7f66348f65982417f9e006ed33168b94e59948 authored almost 11 years ago
Bumps version to 0.34a1.

4f2febba5fbcf8344b05aa7dbdc515f7936d8a37 authored almost 11 years ago
changelog: Updates latest to 0.33.

53afeb5378ea5d42c1e3caffcc7390583390a4e5 authored almost 11 years ago
version.py: Fixes wrong version from merge.

0ca91eb3c0f29f09a87a799baccf2ef1788321fb authored almost 11 years ago
document.py: Uses HTML parser on XHTML docs instead of XHTML parser.

The XHTML parser does not handle soup well.

97dbe491adc325d1c77a6a288e2800e430029265 authored almost 11 years ago
scraper: Handles catching of UnicodeError/LxmlError. Allows partial scrapes.

processor: Don't catch UnicodeError.

fc80999bdefebb3962af294db7d060e19b31dd9b authored almost 11 years ago
document.py: Catches any lxml.etree errors parsing doctype.

Closes chfoo/wpull#118.

7f7ff184181ce5f6c747c8062cf341907c404996 authored almost 11 years ago
Merge branch 'release/v0.32.1' into develop

0c965bcee06fb76e20134b0ac0b589f417e87eaf authored almost 11 years ago
Merge branch 'release/v0.32.1'

Conflicts:
wpull/version.py

43623d99f0bef3a059f46ef6718d0726415ce775 authored almost 11 years ago
Bumps version & changelog to 0.32.1

d5b4dbb423c12b52898dfa9d35ea4447464afcde authored almost 11 years ago
Merge branch 'issue/114-wait_time_hook' into develop

7db216a5a004628b939c3c7d2803ef8d181b4d5d authored almost 11 years ago
Adds unit tests for wait_time() hook.

4cf5e63505c672bc4a16e33d2cc2debf4e6495ed authored almost 11 years ago
document.py: Improves JS regex to scrape absolute URLs with space.

c25363546cd9d2393122bc40e0c6a4f9c6728ff5 authored almost 11 years ago
Implements JavaScript link scraping.

Adds JavaScriptReader, JavaScriptScraper.

Re: chfoo/wpull#74.

930e49c0155e76e3e27ffb1b4dd1d852157e793e authored almost 11 years ago
url.py: Adds is_likely_link().

2d786ad2b7bc4453a3f28f33e359e1e2b266eb81 authored almost 11 years ago
Removes extended module.

ef91868b62ff3aa463511709a89266bbffd54f6b authored almost 11 years ago
Merge branch 'topic/code_cleanup' into develop

2d0c165eda23b08ba7a99b2f0bc9ad388802574b authored almost 11 years ago
Moves string related functions from util to string module.

Moves to_bytes, to_str, normalized_codec_name, detect_encoding,
try_decoding, format_size, print...

0a193c6d09a636da0ee5c11de426106246cd6fd1 authored almost 11 years ago
Moves sleep,TimedOut,wait_future,AdjustableSemaphore from util to async.

f7c86a6dd99d32a917c4b9d21d122a6490ba70db authored almost 11 years ago
Moves DeflateDecompressor, gzip_decompress from util to decompression.

75d71754ad8f4c0a32dc25b9d70be07eead8a684 authored almost 11 years ago
Moves OrderedDefaultDict from util to collections.

474678738604f5664d2b7d7932f208e36a0bbfa6 authored almost 11 years ago
hook,processor: Adds wait_time() scripting callback hook.

Closes chfoo/wpull#114.

ce1495d60bfbf6d08ea7cdb80d2d1c40099ed559 authored almost 11 years ago
app_test.py: Cleans up logging handlers.

dfdd194a9b9df0a428aac8f8168b6295a460cce5 authored almost 11 years ago
connection.py: Checks for Gzip using new util.GzipDecompressor.

Adds util.GzipDecompressor().
Closes chfoo/wpull#115.

5d5b185b832a3bd9e8647caced04bf913505e681 authored almost 11 years ago
document.py: Uses HTML hack only for HTML parsers.

8e14bf27b2fd41f5a98b8db1d65a3eb49bfc6d0d authored almost 11 years ago
badapp.py: Adds HEAD request handler for Py 3.4.

4316736ef33cf0ba97c92288a437774b1a6bf3df authored almost 11 years ago
Bumps version to 0.32.

Merge commit 'd07172c0be2f910d9d95fddb487a4917aab73c70'

Conflicts:
wpull/version.py

bf2c971521752a97b39292f6838ce6f432bd23e7 authored almost 11 years ago