Ecosyste.ms: OpenCollective

An open API service for software projects hosted on Open Collective.

github.com/ArchiveTeam/wpull

Wget-compatible web downloader and crawler.
https://github.com/ArchiveTeam/wpull

Implements --warc-dedup.

Closes chfoo/wpull#36

0b0027226bc439b80b3cde369361e5657a9274a5 authored almost 11 years ago by Christopher Foo <[email protected]>
database.py: Renames URLDBRecord→URL, URLStrDBRecord→URLString.

1d206e4a356b33bd2f509c8e910345ac890dee02 authored almost 11 years ago by Christopher Foo <[email protected]>
item,processor: Adds LinkType constants.

e5521f2bcac612337220fe25e2e4219ab8539797 authored almost 11 years ago by Christopher Foo <[email protected]>
writer.py: Appends suffix if file in directory path or filename is directory.

Closes chfoo/wpull#71.

6778e6f02f8cea210f819d5e8995ab0ddfd4d51b authored almost 11 years ago by Christopher Foo <[email protected]>
Revert "readme.rst: Uses SVG version of Travis badge."

This reverts commit fff82fde5e77b695fc15ec181782a885f334c219.

Re: chfoo/wpull#92

[ci skip]

07f7bb9ac1fc6c26b4b31c3b85fcc1980dce42e2 authored almost 11 years ago by Christopher Foo <[email protected]>
readme.rst: Uses SVG version of Travis badge.

fff82fde5e77b695fc15ec181782a885f334c219 authored almost 11 years ago by Christopher Foo <[email protected]>
url.py: Relaxes URL query percent-encode escapes.

Closes chfoo/wpull#67. Closes chfoo/wpull#90.

738fa11e0dff52ae6fde26ef96c667221422cc76 authored almost 11 years ago by Christopher Foo <[email protected]>
app.py: Removes implicit span-hosts enable.

Superseded by strong redirect logic.

b951856c12248341cedaf874b7affc7b4c33e6bd authored almost 11 years ago by Christopher Foo <[email protected]>
connection.py: SSLError raises NetworkError instead of SSLVerficationError.

Closes chfoo/wpull#91.

a0cc26a19448f021c8756c85747ba1df266739a3 authored almost 11 years ago by Christopher Foo <[email protected]>
Bumps version to 0.28.

Merge branch 'develop'

Conflicts:
wpull/version.py

94c86145a4a472bc7f74d0040ef30c57be608ae0 authored almost 11 years ago by Christopher Foo <[email protected]>
Bumps version to 0.29a1.

7d11e79c2964586ac14985ec61710e75c72afa6a authored almost 11 years ago by Christopher Foo <[email protected]>
changelog.rst: Updates latest to 0.28.

28aa34b3c6b85490d54014bebbcf9654e6dfe5b9 authored almost 11 years ago by Christopher Foo <[email protected]>
options.py: Shows choices and defaults.

db829423e807ed4ae9b4af90c4de36996712aab8 authored almost 11 years ago by Christopher Foo <[email protected]>
ca-bundle.pem: Updates to Tue Jan 28 09:38:07 2014.

115eadb89051ed9c5d481b6e766f09b10c7d8c61 authored almost 11 years ago by Christopher Foo <[email protected]>
iostream.py: Closes connection if extra data from server.

80182eed7faef2fefbd9c1d52856af25d2a0897f authored almost 11 years ago by Christopher Foo <[email protected]>
connection.py: Catches Exception instead of naked.

Closes chfoo/wpull#83.

dd666f3b6906a813c3c4ca7815fa57027892c824 authored almost 11 years ago by Christopher Foo <[email protected]>
document.py: Reduces encoding file peek to 1MB.

Adds corrupt UTF-8 document unit test.

43bb21460469d8604b528af2f89b838d76cff655 authored almost 11 years ago by Christopher Foo <[email protected]>
util.py: Normalizes codec name later to not invoke chardet early.

890cf02d3e681b79d5a799219d8b697898c46722 authored almost 11 years ago by Christopher Foo <[email protected]>
{py,lua}_hook_script.{py,lua}: Adds malformed URL for extra unit testing.

de5bdd16bc46613ff4dfd582d77573e02d4e632f authored almost 11 years ago by Christopher Foo <[email protected]>
network.py: Uses ASCII in exception msg.

bbea81c072431f00574a8daa8368f8a32e741052 authored almost 11 years ago by Christopher Foo <[email protected]>
iostream.py: Catches update_handler errors.

181aff68665e9ca358611bf9ad9ea94b20ad8ad9 authored almost 11 years ago by Christopher Foo <[email protected]>
proxy.py,phantomjs.js: Rewrite all URLs.

e9a5f13173056ba389a534b5058db4a6d9c427df authored almost 11 years ago by Christopher Foo <[email protected]>
network.py: Includes the hostname in msg for DNSNotFound exceptions.

2bac82c0029627450c3d13e35ff66a40465178e2 authored almost 11 years ago by Christopher Foo <[email protected]>
app.py: Include --header for PhantomJS.

534fe38650d7ec53211c65d9bc0d9a31eeb4bcdd authored almost 11 years ago by Christopher Foo <[email protected]>
proxy.py,phantomjs.js: Uses iostream module. Fixes relative URLs rewriting.

c518ff477fdbdf9e9ec207a5d53a2b7260d5f04a authored almost 11 years ago by Christopher Foo <[email protected]>
Updates and fixes up the docstrings.

[ci skip]

f23b3c23c9517fe224f8c0c8732223639e2844ca authored almost 11 years ago by Christopher Foo <[email protected]>
Moves database.Status,URLRecord & engine.URLItem to new item module.

5f757523b52a026e94e7264cfdfc4d0776ff6d43 authored almost 11 years ago by Christopher Foo <[email protected]>
app.py: Disables PhantomJS disk-cache erroneously added in 10a200c.

Closes chfoo/wpull#88

b0afeed855d0c963d594436065a6bb21c2370724 authored almost 11 years ago by Christopher Foo <[email protected]>
Merge branch 'topic/iostream_rewrite' into develop

Conflicts:
doc/changelog.rst

eb13ecbcc0267c9b9e693c8beb9f150ffebbcd50 authored almost 11 years ago by Christopher Foo <[email protected]>
changelog: Adds entry about new iostream.

extended.py: Adds deprecation notice.

Closes chfoo/wpull#53

fcd3e074ba1ea3ed0d36479081d39c6b3941604f authored almost 11 years ago by Christopher Foo <[email protected]>
Adds payload equality unit test.

a7379ff12f9a3b165cfce5cf2648836f98dad517 authored almost 11 years ago by Christopher Foo <[email protected]>
writer.py: Fixes root not used in PathNamer.

cbaa5f9d966bf86be7ca4e1612935c476f49bbfc authored almost 11 years ago by Christopher Foo <[email protected]>
iostream.py: Loops recv to account for SSL.

2393121e3fff4f4f40003f44c73d46cf014579d0 authored almost 11 years ago by Christopher Foo <[email protected]>
connection.py: Implements proper keep-alive check. Skips impossible unit test.

278e39fd1e95414ba63ea86743cb806ed7cd947c authored almost 11 years ago by Christopher Foo <[email protected]>
connection.py: Monitors for stream close.

be0deec917c9129720c9b835793a59836e90bd9d authored almost 11 years ago by Christopher Foo <[email protected]>
iostream.py: Adds functions to monitor for socket close.

10837bf8c4498eea703d06e696f11e129db928fd authored almost 11 years ago by Christopher Foo <[email protected]>
connection.py: Closes connection if HTTP Connection is close.

d6bf56d1f250c98dc02d7a1cc7fca831713ea7ce authored almost 11 years ago by Christopher Foo <[email protected]>
iostream.py: Friendlier events error messages.

602b9e044ffc173db6cfcf264a424d2a97f90fd2 authored almost 11 years ago by Christopher Foo <[email protected]>
iostream: Logs spurious events as debug.

8310d0e37cf19b81303eeab501e497ec2c06876c authored almost 11 years ago by Christopher Foo <[email protected]>
badapp.py: Fixes TCP reset unit test.

272af831d5603f4ee5bb6d45f0af97232759598d authored almost 11 years ago by Christopher Foo <[email protected]>
iostream.py: WIP adds read/write fast path & exception handling.

47041d63bbd93a50bbcffce46aa7f0b7c2968f37 authored almost 11 years ago by Christopher Foo <[email protected]>
connection.py: Swaps in WIP iostream.

[ci skip]

8c8b13f01162db0e396e80d1f23f3605deb58639 authored almost 11 years ago by Christopher Foo <[email protected]>
iostream.py: Adds WIP SSLIOStream.

[ci skip]

602aadd1c476e6936016edc93e3aea86b514e494 authored almost 11 years ago by Christopher Foo <[email protected]>
iostream.py: Adds WIP DataBuffer and IOStream methods.

[ci skip]

f4a3ea17745fc23ec62d1cb7186c494a84ed1a57 authored almost 11 years ago by Christopher Foo <[email protected]>
web.py: Catches and rethrows bad redirect URL errors.

Closes chfoo/wpull#87

c63f609caf350e389eca38fd326af2a82b5b27f6 authored almost 11 years ago by Christopher Foo <[email protected]>
connection.py: Fixes unbound variables.

b5c08fd2b67e3983ed304b980300df60ab4b5068 authored almost 11 years ago by Christopher Foo <[email protected]>
Adds new WIP iostream module.

[ci skip]

8f6360e2f0ded74d73a52215d9c72fffd1972c26 authored almost 11 years ago by Christopher Foo <[email protected]>
Bumps version to 0.27.

Merge commit '6d785f96f6251c711aa25b1daea1aacd887c548e'

Conflicts:
wpull/version.py

ac8d254e13fec987a6d8e36250fda2a730f31637 authored almost 11 years ago by Christopher Foo <[email protected]>
Bumps version to 0.28a1.

[ci skip]

f935ee07f8ee0be6d57637f4d012823884f46c4f authored almost 11 years ago by Christopher Foo <[email protected]>
changelog: Updates latest to 0.27.

[ci skip]

6d785f96f6251c711aa25b1daea1aacd887c548e authored almost 11 years ago by Christopher Foo <[email protected]>
Don't show 0 B/s when no speed. Implements print speed for exit stats.

a130945668f2c77f7ebbfdc49edcee90b5cc17f6 authored almost 11 years ago by Christopher Foo <[email protected]>
url.py: hostname_with_port uses empty string if hostname is None.

2c4b651ab82d626e5b055d97c3ac632937dab2ac authored almost 11 years ago by Christopher Foo <[email protected]>
url.py: Fixes formatting of IPv6 address.

Closes chfoo/wpull#86.

a6d87817b6f668ddddf9797de6f9d98c13634ff8 authored almost 11 years ago by Christopher Foo <[email protected]>
document_test.py: Comments out encodings not in Py 2.6.

cbceaa39b747b823af95a185b37df832061c6ffa authored almost 11 years ago by Christopher Foo <[email protected]>
document.py: Massages encoding name for lxml. Falls back to latin1.

Closes chfoo/wpull#84.

9b72cd06dcf2328a182aa19164be08df4ab0d929 authored almost 11 years ago by Christopher Foo <[email protected]>
processor.py: Renames WebProcessorSession._parse_url() to parse_url().

ce57b5cfb57962a473bbea1167e528cab718c43a authored almost 11 years ago by Christopher Foo <[email protected]>
setup.py: Adds Py 2 and Py 3 classifiers.

622592eeeed03a0ccb2816cebdd8c03159cf7c5c authored almost 11 years ago by Christopher Foo <[email protected]>
Implements --https-only

Closes chfoo/wpull#72.

bb2aaccfebce6923c41fcbae8b4f016804e38dc2 authored almost 11 years ago by Christopher Foo <[email protected]>
Fixes SSL connection unit tests that did not actually run.

f6967d4f5d304a0b2abd29a00aae252407878f35 authored almost 11 years ago by Christopher Foo <[email protected]>
connection.HostConnectionPool: Runs the loop with add_future().

cb75f724ecb8f59d86b355ecf1704eacd707f1e7 authored almost 11 years ago by Christopher Foo <[email protected]>
app.py: Warns when no-iri is used.

bcfea00037f3610f52906782998b72accfd52c71 authored almost 11 years ago by Christopher Foo <[email protected]>
processor.py: Fixes bad URL filter bypass for strong redirects.

Allows only SpanHostsFilter to be bypassed.
Closes chfoo/wpull#81.

d984a41f5dd2344f52dce16295f18ce6dc1b2c4c authored almost 11 years ago by Christopher Foo <[email protected]>
Implements --no-iri.

Closes chfoo/wpull#9

be8eb670cdb478d7164edecc4c9304861f4a6531 authored almost 11 years ago by Christopher Foo <[email protected]>
Implements --remote-encoding.

Re: chfoo/wpull#9

25a589c1d9c1ecc999a9ac1988f5fb0c63d04648 authored almost 11 years ago by Christopher Foo <[email protected]>
Fixes option.get_argv_encoding() decode error.

d34beb1d80e0f5c07b680c85cf40ee195cbf93d5 authored almost 11 years ago by Christopher Foo <[email protected]>
util.py: Simplifies sleep().

6ba14e3ea19884311997c85d55118b0db4fef46f authored almost 11 years ago by Christopher Foo <[email protected]>
util.py: Makes printable_bytes() Py 2 compatible.

289fb088a7c283c99f9d7c4ed3f7f7c64195cec4 authored almost 11 years ago by Christopher Foo <[email protected]>
Implements --local-encoding. Coerces argv. Fixes ignored URLs args.

Re: chfoo/wpull#9
Coerces argv to bytes to get the value of the --local-encoding option.
Fixes ...

4fffd0cb9c50e597511a535d482a0baccd298747 authored almost 11 years ago by Christopher Foo <[email protected]>
document.py: Uses printable_bytes(). Checks also for .dhtml files.

6b412b510425c862d17bb24285757e169ba1fbce authored almost 11 years ago by Christopher Foo <[email protected]>
Adds util.printable_bytes().

a499d20e5f4b46f2a640526e37af7e07a66bebb3 authored almost 11 years ago by Christopher Foo <[email protected]>
phantomjs.js,processor.py: Scrolls only when page is dynamic.

695b8bf8d4ad9f402f7be191b7a01644b96f8729 authored almost 11 years ago by Christopher Foo <[email protected]>
phantomjs,processor: Implements smart scrolling.

Closes chfoo/wpull#46.

10a200cdefdb624164e0d2a6108c7a93311bbe8e authored almost 11 years ago by Christopher Foo <[email protected]>
Bumps version to 0.26.

Merge commit 'eec31fbc91d5e1f44da04ee39dda90251fd0d4cc'

Conflicts:
wpull/version.py

e892b3855ba23f6c7a5d8308521e22488534fa2e authored almost 11 years ago by Christopher Foo <[email protected]>
Bumps version to 0.27a1.

[ci skip]

d0a073f98e1a50f93987bc73fa5b33e715d863b0 authored almost 11 years ago by Christopher Foo <[email protected]>
changelog: Updates latest to 0.26.

eec31fbc91d5e1f44da04ee39dda90251fd0d4cc authored almost 11 years ago by Christopher Foo <[email protected]>
Fixes span-hosts-allow unit test URLs to point to correct port.

647e53284fa5eec6050237b59ab6f291e4e014e8 authored almost 11 years ago by Christopher Foo <[email protected]>
Implements --span-hosts-allow.

Re: chfoo/wpull#61, chfoo/wpull#66.

084036fcd48da8c3a69c1e77971c82277c7e5806 authored almost 11 years ago by Christopher Foo <[email protected]>
url.py: RecursiveFilter accepts keyword args.

7084eddeee149777c4ca6318bbe683174d67870f authored almost 11 years ago by Christopher Foo <[email protected]>
Implements --max-filename-length.

Closes chfoo/wpull#39.

862d6d798aba86bfe2163c90e914e8317f58540c authored almost 11 years ago by Christopher Foo <[email protected]>
writer.py: Simplifies safe_filename() usage.

url_to_dir_path() renamed to url_to_dir_parts(). Now returns a list.
url_to_filename(), url_to_d...

2851562815e9832f7ed7840e9134a936788dbdce authored almost 11 years ago by Christopher Foo <[email protected]>
Merge branch 'issue/76-query-string-delim' into develop

Conflicts:
doc/changelog.rst

2053db7bba6ad8f3fb95175035c74da39bf310a6 authored almost 11 years ago by Christopher Foo <[email protected]>
url.py: Makes URLInfo.url a property.

Removes redundant URLInfo.normalize. Adds url.normalize() convenience
function.

6e8bc0e285cd93fd1d41155173e603b3d66842ba authored almost 11 years ago by Christopher Foo <[email protected]>
scraper.py: Renames urljoin to urljoin_safe and returns None.

ba50c5729f6174b9024404fdc3780c8031f337f8 authored almost 11 years ago by Christopher Foo <[email protected]>
changelog.rst,terse_options.rst: Update for --sitemaps option.

dde7b6eb866390c38a854065cb1904e2b6d321ba authored almost 11 years ago by Christopher Foo <[email protected]>
Merge branch 'issue/69-sitemaps' into develop

Conflicts:
wpull/scraper.py

1173d65a491a6eb3ccbc2baffd1ded5c6cac1581 authored almost 11 years ago by Christopher Foo <[email protected]>
scraper.SitemapScraper: Catches XML parse errors.

1db1bb86acb5f07e9f4867f6c05ce2882688f224 authored almost 11 years ago by Christopher Foo <[email protected]>
url.py: Preserves whether query strings contain '='.

Closes chfoo/wpull#76

1bda12d9bb8464d1c02a4065bf80162962ee4f0a authored almost 11 years ago by Christopher Foo <[email protected]>
Adds some IPv6 URL unit tests.

6d170140e7b46f5c25258eda2e45367f51094d94 authored almost 11 years ago by Christopher Foo <[email protected]>
scraper.py: Catches and discards invalid URLs on urljoin.

Closes chfoo/wpull#77.

532674da0de14ebad44772057e4812633d4f5068 authored almost 11 years ago by Christopher Foo <[email protected]>
Implements --sitemaps. Closes chfoo/wpull#69.

262341d854efe210d7f375e6fbb1f86a965f77f4 authored almost 11 years ago by Christopher Foo <[email protected]>
document.py: Updates HTMLReader.is_html_file() to match libmagic rules.

66e41c75c53a6c3273d7e70d988204f00dfc90a0 authored almost 11 years ago by Christopher Foo <[email protected]>
options.py: Implements choices validation for --restrict-file-names.

27001a0dab3dcc5a69b57c0c0503cb5a50bf68dc authored almost 11 years ago by Christopher Foo <[email protected]>
readme: Fixes typo in requirements list.

084f1c5d95af52699c687a158ca3b1651ec01991 authored almost 11 years ago by Christopher Foo <[email protected]>
Bumps version to 0.25.

Merge commit 'c4935e0175b64ad3246cd60a66260a37281d786c'

Conflicts:
wpull/version.py

0cfafba8a7b7a5bf11908d7d016b11d2287f152c authored almost 11 years ago by Christopher Foo <[email protected]>
Bumps version to 0.26a1.

[ci skip]

c107bee94628956b230d44e14f7bc150572b0c16 authored almost 11 years ago by Christopher Foo <[email protected]>
changelog: Updates latest to 0.25.

c4935e0175b64ad3246cd60a66260a37281d786c authored almost 11 years ago by Christopher Foo <[email protected]>
Adds command scripts. Closes chfoo/wpull#50.

a6ae4614491842288bfaf599d016658b602b9bb8 authored almost 11 years ago by Christopher Foo <[email protected]>
processor.py: Logs PhantomJS scroll actions as JSON dump.

Closes chfoo/wpull#49.

55b2325b14e811c7ecf8d072c414a6da82d2d3e7 authored almost 11 years ago by Christopher Foo <[email protected]>
Implements --content-on-error. Closes chfoo/wpull#37.

4cc66fde3799c0fc5f8e60a501233dd669737a34 authored almost 11 years ago by Christopher Foo <[email protected]>
Refactors WebProcessor.__init__ to use parameter object pattern.

2a895e644db1cd13edf987059fed91c630d323c8 authored almost 11 years ago by Christopher Foo <[email protected]>