Ecosyste.ms: OpenCollective

An open API service for software projects hosted on Open Collective.

github.com/ArchiveTeam/wpull

Wget-compatible web downloader and crawler.
https://github.com/ArchiveTeam/wpull

readme.rst: Fixes link formatting.

2dcce2c61635b4ef4de793271c1c7c8763e7fa6d authored almost 11 years ago by Christopher Foo
Merge branch 'topic/improve_requirements' into develop

4c8ebd007bd9cce2ee6d9ebda1996eff2f24a069 authored almost 11 years ago by Christopher Foo
Fixes syntax error in travis config.

8b8b1de5f545c427e26daa5929000e7a7fb29032 authored almost 11 years ago by Christopher Foo
Adds lib3to2 as a Python 2 requirement.

bf42e361eae647615d5698b6b5c1eec2fe524e6a authored almost 11 years ago by Christopher Foo
options.py: Decodes arguments before processing.

4b011dd5271052d7c34eceb50ef4285a20641465 authored almost 11 years ago by Christopher Foo
Revert "app.py: Converts input URLs to strings."

This reverts commit d1992463d2bd566b2f6f0b30dc8c386a24eef28f.

827311d5b3dc09152cb8a63aad3e664bc6f7150a authored almost 11 years ago by Christopher Foo
Bumps version to 0.10.1.

Merge branch 'hotfix'

Conflicts:
wpull/version.py

3bdb44d5c9e7b91085c0d6b8899077d3daf6325b authored almost 11 years ago by Christopher Foo
Bumps version to 0.11b1.

cad07a186b4d70836660cfd480c84b94d1bb616e authored almost 11 years ago by Christopher Foo
app.py: Converts input URLs to strings.

d1992463d2bd566b2f6f0b30dc8c386a24eef28f authored almost 11 years ago by Christopher Foo
Bumps version to 0.11a1.

2380b210f208dc190128b521d18aa6bf64fb45d5 authored almost 11 years ago by Christopher Foo
Bumps version to 0.10.

0023a01e3562acc7046ec1cd2fd916266d964b22 authored almost 11 years ago by Christopher Foo
Merge branch 'develop'

6052893a8e0e32acc5d672a67b56c3d81957a6b3 authored almost 11 years ago by Christopher Foo
url.py: Fixes naive percent-encoding detection.

Adds quasi_quote(), quasi_quote_plus(), is_percent_encoded().

4d100402b55142c5cafe501c5fa6f763bf5608c9 authored almost 11 years ago by Christopher Foo
engine.py: Uses UTF-8 if not specified in record for URL item.

8efe8975a4d029ecacc2035e111327bd9c0f84e3 authored almost 11 years ago by Christopher Foo
Implements processing IRI URLs (chfoo/wpull#7).

b6158687d582b332b0f3bb7ac6b000e33ad51dc9 authored almost 11 years ago by Christopher Foo
util.py: Tests detect_encoding() more robustly.

184669d0f9b8dbae4f102bbcf18bbbaefd5fe3bd authored almost 11 years ago by Christopher Foo
url.py: Implements URLInfo IRI encoding other than UTF-8.

7d67a6a5c2afe34dd6bd832515565e45a3bb51fe authored almost 11 years ago by Christopher Foo
Implements NameValueRecord.parse to fall back to Latin-1 encoding.

d17c20aa110265b587b1db9b014e0a25791689d2 authored almost 11 years ago by Christopher Foo
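A Latin-1 fallback is safe because Latin-1 assigns a character to every one of the 256 byte values, so decoding with it can never fail. The following is a minimal Python sketch of that pattern. `decode_field_bytes` and `parse_name_value` are hypothetical helpers, not wpull's `NameValueRecord.parse`, and the simple `Name: value` line format is an assumption.

```python
def decode_field_bytes(data: bytes) -> str:
    """Decode bytes as UTF-8, falling back to Latin-1 (hypothetical helper).

    Latin-1 maps each of the 256 byte values to a code point, so the
    fallback decode never raises UnicodeDecodeError.
    """
    try:
        return data.decode('utf-8')
    except UnicodeDecodeError:
        return data.decode('latin-1')


def parse_name_value(data: bytes) -> dict:
    """Parse 'Name: value' lines into a dict (hypothetical, simplified format)."""
    fields = {}
    for line in decode_field_bytes(data).splitlines():
        name, sep, value = line.partition(':')
        if sep:  # skip lines that contain no colon
            fields[name.strip()] = value.strip()
    return fields
```

For example, `b'Title: caf\xe9'` is not valid UTF-8, so `parse_name_value` decodes it via Latin-1 and returns `{'Title': 'café'}`. UTF-8 input decodes normally.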
Requires chardet. Loosens requirements.txt.

32ff760f7ec85d34541d0fe969f2c34c6e90dec8 authored almost 11 years ago by Christopher Foo
document.py: Uses HTTP header charset for encoding detection.

CSSScraper supports encoding detection.

4820cc19f1db74ed27ef437719c8f5f62bf57f99 authored almost 11 years ago by Christopher Foo
http,util: Adds encoding detection routines.

Fixes to_str, to_bytes where decode/encode was present in Python 2
strings when it shouldn't be.

f3c845029b61c4850035576ed8c663703531b01e authored almost 11 years ago by Christopher Foo
processor,hook: Rewrites scraping due to schema and scraper API changes.

0e51390ae38fa4cd72a9318dbf101e5a84a681af authored almost 11 years ago by Christopher Foo
Schema change: Supports setting link_type, url_encoding into URL table.

dafc91427c2d5edbb14b0055cc1f42206a31ec8e authored almost 11 years ago by Christopher Foo
document.py: BaseDocumentScraper.scrape returns an info dict.

a966bc1ec826c9bd143d208a9d30f17e50e083b1 authored almost 11 years ago by Christopher Foo
document.py: Converts links from lxml to str for Python 2.

url.py: Asserts str in URLInfo.parse().

418f93f7c9f4d411ce41df7d70e02615d9f321c8 authored almost 11 years ago by Christopher Foo
Bumps version to 0.9.5.

02aab17c83416ff7cabe9a48946fc0fd93c060b8 authored almost 11 years ago by Christopher Foo
Merge branch 'develop'

b37d90f02cfa6fbe52503e3dbcc36b38fbc8eb68 authored almost 11 years ago by Christopher Foo
setup.py: Fixes installs not working.

Fixes backport configs.

c88e52b1b9e7faba1ea7047c2b57cf5048631126 authored almost 11 years ago by Christopher Foo
Bumps version to 0.9.4.

63b63d4b6b21a334d9c62eab1743c499ec655fbe authored almost 11 years ago by Christopher Foo
Merge branch 'develop'

7f3c1a09efd537d4338e2d586929b9ab05059b62 authored almost 11 years ago by Christopher Foo
writer.py: Uses url.quote instead.

a4cad84f413562102b1a8afa74cb0c15a411f548 authored almost 11 years ago by Christopher Foo
url.py: Adds percent-encoding normalization.

Normalizes paths and query strings. Percent-encoding escapes are
uppercased. Adds uppercase_perce...

a0cfb152d57c90301b7893abadfa4d7dc3eddf8c authored almost 11 years ago by Christopher Foo
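RFC 3986 (section 6.2.2.1) treats the hex digits in a percent-encoding triplet as case-insensitive, so `%2f` and `%2F` are equivalent, and it recommends normalizing them to uppercase. The following is a minimal Python sketch of only that uppercasing step. It assumes a single regex pass over the whole string. `normalize_percent_escapes` is a hypothetical name and not the (truncated) function the commit above adds.

```python
import re

# Matches one percent-encoding triplet such as %2f or %E2.
_PERCENT_TRIPLET = re.compile(r'%[0-9A-Fa-f]{2}')


def normalize_percent_escapes(text: str) -> str:
    """Uppercase the hex digits of each %XX triplet (hypothetical helper).

    Characters outside valid triplets, including a stray '%', are left as-is.
    """
    return _PERCENT_TRIPLET.sub(lambda match: match.group(0).upper(), text)
```

For example, `normalize_percent_escapes('/a%2fb?q=%e2%82%ac')` returns `'/a%2Fb?q=%E2%82%AC'`. A sequence like `%zz` is not a valid triplet and passes through unchanged.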
processor.py: Makes _parse_url() a classmethod.

2cfea47d4fb6505913f3edc5a0061e1db0360f04 authored almost 11 years ago by Christopher Foo
processor.py: Fixes minor _parse_url() log formatting.

65dba30029cb489a58cdfa3587d0de30be106b16 authored almost 11 years ago by Christopher Foo
url.py: Quotes URL paths and queries. Converts hostnames to ASCII.

1744391aa5119ab288591417060a94c0b242c0bd authored almost 11 years ago by Christopher Foo
processor.py: Ignores malformed URLs.

7cba187cfb448def494ae6e1d89b93223a557198 authored almost 11 years ago by Christopher Foo
Bumps version to 0.9.3.

c4fd9c7c0c49224fb750721c60513af23e779e8c authored almost 11 years ago by Christopher Foo
Merge branch 'develop'

232990cf9402f086b5ba2bc5ba92098d442e04f1 authored almost 11 years ago by Christopher Foo
hook.py: Fixes should_fetch not setting URLItem status.

433e51a78a66bcb778898c8b6b851de2a7750ed6 authored almost 11 years ago by Christopher Foo
Uses explicit UTF-8 encoding for output log.

0f562453aef09d292a680f182f8c426dcce6f27f authored almost 11 years ago by Christopher Foo
Uses backslashreplace for ASCIIStreamWriter errors.

d7033a79c49cd6eda77da2116a103681b8d13998 authored almost 11 years ago by Christopher Foo
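`backslashreplace` is one of Python's built-in codec error handlers. When a character cannot be represented in the target encoding, the handler writes it as a `\xNN`, `\uNNNN`, or `\UNNNNNNNN` escape instead of raising `UnicodeEncodeError`. The following is a minimal sketch of an ASCII-only text stream that uses it. `make_ascii_writer` is a hypothetical helper, and wrapping with `io.TextIOWrapper` is an assumption; it is not necessarily how wpull's `ASCIIStreamWriter` is built.

```python
import io


def make_ascii_writer(binary_stream):
    """Wrap a binary stream in a text writer that emits ASCII only.

    Non-ASCII characters are escaped by the 'backslashreplace' error
    handler. The newline argument is fixed to a line feed so that no
    platform-specific newline translation happens on write.
    """
    return io.TextIOWrapper(
        binary_stream, encoding='ascii', errors='backslashreplace', newline='\n')


buffer = io.BytesIO()
writer = make_ascii_writer(buffer)
writer.write('caf\u00e9 \u2192 ok\n')
writer.flush()
print(buffer.getvalue())  # b'caf\\xe9 \\u2192 ok\n'
```

This is useful for progress output on terminals that cannot display Unicode: the output stays readable and the write never fails.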
Explicitly uses UTF-8 encoding for WARC log.

c7aeadb49c9a238068ff56ca52fb58003b1d16c6 authored almost 11 years ago by Christopher Foo
hook.py: Fixes items not marked as done with FINISH action.

299f1f089ccd5cdd8e25462aab135d3310c4864d authored almost 11 years ago by Christopher Foo
Fixes ProgressRecorder print. Implements --ascii-print.

Fixes ProgressRecorder to print to stderr instead of stdout.

Implements --ascii-print to print ...

d4f7cee6abaf3373c82fb2499ec0766cd768ca77 authored almost 11 years ago by Christopher Foo
Explicitly closes WARC log file after use.

ba75dd52212ae831da0be9419cda084ff9f20a44 authored almost 11 years ago by Christopher Foo
Bumps version to 0.9.2.

1fe9f2a6fc32e9a34da51e213ceaaa6f8164cdc0 authored almost 11 years ago by Christopher Foo
Merge branch 'develop'

726f5b1316b70d4d3c0ee9c4260608950fdf1d36 authored almost 11 years ago by Christopher Foo
Fixes logging not working in WARC, output.

Ensures logger handlers are properly removed. Uses a separate
StreamHandler for console logging.

1745ffce68759281961dbbd354e57b592c1bae72 authored almost 11 years ago by Christopher Foo
Bumps version to 0.9.1.

Merge branch 'develop'

Conflicts:
wpull/version.py

294c8bc9862c42ebe19cfc19652dc388493347bc authored almost 11 years ago by Christopher Foo
Works around missing SpooledTemporaryFile.name attribute.

27a54a73526cd62ec3e0c2818d5acb1a029d2b81 authored almost 11 years ago by Christopher Foo
Hooks: Fixes Python/Lua types. Fixes Lua library requires.

Uses Lua tables instead of dicts when needed. Uses DLFCN to allow Lua
libraries to be linked properly.

f077c8c945bad26fe5792f6b73a1ec45c208ee24 authored almost 11 years ago by Christopher Foo
engine.py: Adds debug log to Engine._get_next_url_record.

8c74f353cde5b7048e3441c91134437a7d587588 authored almost 11 years ago by Christopher Foo
app.py: Adds DLFCN magic for Lua library linking.

cf43a42701a364c173bcbce76024ec9f59708147 authored almost 11 years ago by Christopher Foo
hook.py: Returns response body as info dict.

ce2558434adc1a7ba1637dfca344b73798a02ba7 authored almost 11 years ago by Christopher Foo
Adds Body().to_dict().

3f3706166b3faf771ec5172b4225d80b10e23f46 authored almost 11 years ago by Christopher Foo
url.py: Checks for empty URL before parsing.

e6eb965dfdddfce5945e2853ed4ee3f21c412bf0 authored almost 11 years ago by Christopher Foo
__main__.py: Moves code into main().

e8c48284894d74eddbf3912da08af1d8d290b65b authored almost 11 years ago by Christopher Foo
Updates readme example with --no-check-certificate.

9270e529a43497e5b63eb788aca3dce200a34047 authored almost 11 years ago by Christopher Foo
Bumps version to 0.10a1.

7eab71094f78633ce92555605584456a5fc285d9 authored almost 11 years ago by Christopher Foo
Bumps version to 0.9.

Merge branch 'develop'

Conflicts:
wpull/version.py

37fae5da8a1e67e13234423484e7e0dbb8245a8c authored almost 11 years ago by Christopher Foo
util.py: Fixes backwards compat with list.

c969866fb57a4a01cec2b12a2db840f5e30a317d authored almost 11 years ago by Christopher Foo
Implements basic SSL options. (chfoo/wpull#5)

Implements --no-check-certificate, --ca-certificate, --ca-directory,
--no-use-internal-ca-certs....

319b5082d853a223ed79715071f7e4f89f250cfa authored almost 11 years ago by Christopher Foo
Adds filter_pem().

62f764ebe696c4e97d7858f5e10b5ecb5e68c9ad authored almost 11 years ago by Christopher Foo
Orders the error code map and adds SSLVerficationError.

c8d26a86b665f98b9877f9befb0815ea35d78583 authored almost 11 years ago by Christopher Foo
Uses the current directory for temp files.

The temporary directory on Unix-like systems may be too small and it's
safer to assume that the ...

60dd60bbdb1cf8a3f34bb4d68d13993f826b6dfd authored almost 11 years ago by Christopher Foo
Bumps version to 0.9a1.

e03f56912abe954a14330ed9eb2eb5545fba50ae authored almost 11 years ago by Christopher Foo
Adds .noseids to .gitignore.

429a09f7231717a9683bd86aa3d615309cc5a8ba authored almost 11 years ago by Christopher Foo
Bumps version to 0.8.

Merge branch 'develop'

Conflicts:
wpull/version.py

1a4af956b695e475e334129bb8fc63739843ec6f authored almost 11 years ago by Christopher Foo
Fixes to_lua_type() in itertools.count for Python 2.6.

15f95015eecf138628ab4462c54e65cb59509a1c authored almost 11 years ago by Christopher Foo
Fixes Lua type issues under Python 2.

076aabbcb16520e6f5e2f81f8865130d866971fc authored almost 11 years ago by Christopher Foo
readme.rst: Notes that Lunatic Python is not supported in Python 3.2.

94e5e9d93bac5b41ccdd9c17460378cc9bcfb05d authored almost 11 years ago by Christopher Foo
Disables test for Lua in Python 3.2.

32ef38241c528138cac24665ab295f693499ebf6 authored almost 11 years ago by Christopher Foo
Adds more script tests. Fixes adding URLs from scripts.

2791c163d89d79c6905dd4af70db8f6e932ebdbb authored almost 11 years ago by Christopher Foo
Implements --lua-script for Lua scripting support. (chfoo/wpull#1)

e94badc2cdd65834db56ef3d43a3a0f89a397fff authored almost 11 years ago by Christopher Foo
Reads Python script in binary mode.

94a1f5f1f443037eaa99711be39a7330aff0a386 authored almost 11 years ago by Christopher Foo
Adds missing files.

5111a467fd17a969df3483f0601b7f654feab0f2 authored almost 11 years ago by Christopher Foo
Implements --python-script for basic Python scripting hooks.

15fd8e5014614e88d9a76476528976bb4b6a9aca authored almost 11 years ago by Christopher Foo
Adds to_dict() to URLRecord, Response, URLInfo.

ac69488253a12a16322771fb748c4a0b8c1df841 authored almost 11 years ago by Christopher Foo
document_test.py: Adds non-HTML URLs.

da3f2efa04ecc10caa6499f096c8c899f1c2d127 authored almost 11 years ago by Christopher Foo
Closes {Request,Response}.body.content_file created within Engine.

d30972b01f0d3bf1ef027edad7e297eb67745716 authored almost 11 years ago by Christopher Foo
Splits WebProcessorSession._test_url_filter.

Adds _is_url_filtered and _filter_url.

8930b4665b119b039ad37891373914677fd89d4b authored almost 11 years ago by Christopher Foo
Bumps version to 0.8a1.

e16fc4f0e9e2d06672e0ea0bde93a2839e31d924 authored almost 11 years ago by Christopher Foo
Bumps version to 0.7.

Merge branch 'develop'

Conflicts:
wpull/version.py

b96f66b880e5f7ad400dc8fc2641c20907fc6403 authored almost 11 years ago by Christopher Foo
Fixes robots.txt handling logic flow.

bb427ee8162c45a7f53e0ee0c690700f66c0ab06 authored almost 11 years ago by Christopher Foo
Updates readme with links and third-party code notice.

f0438e4d6b93ac1533467a8d0f40c1ca870c535c authored almost 11 years ago by Christopher Foo
Bundles robotexclusionrulesparser because it is not hosted on PyPI.

Resolves pip insecurity warning.

7dacae73c7dc0eb6aa89a8a606b25a1d63375fbb authored almost 11 years ago by Christopher Foo
Moves the URLItem status setting concern out of Engine.

3fae569ad3b6971e9b28ce96c5ee05d28a41a97c authored almost 11 years ago by Christopher Foo
Rewrites robots.txt concern into a mixin for WebProcessorSession.

cf45e830d624b64f4dbcb9621ccb948539dced32 authored almost 11 years ago by Christopher Foo
Adds URLItem. Reduces tight coupling with Engine and ProcessorSessions.

eb0a3342053a8def079c8f6095403095dd051aa1 authored almost 11 years ago by Christopher Foo
Moves RobotsTxtSubsession into robotstxt module.

cd86da17aaa9347de7b01294a58bb018c51d82b8 authored almost 11 years ago by Christopher Foo
Fixes chunked transfer encoding field match.

cca303ce333dd84a39b5b1c4ca221b2e8c1bb119 authored almost 11 years ago by Christopher Foo
Bumps version to 0.7a1.

edb72b4809c83bec19d0c47085a78a200b567596 authored almost 11 years ago by Christopher Foo
Bumps version to 0.6.

Merge branch 'develop'

Conflicts:
wpull/version.py

ae347cfb9e7630d9a779de72b414cf1f6349969b authored almost 11 years ago by Christopher Foo
badapp.py: Makes loop finite.

486f99dae5f1ff42977d662d483cdc8aa4f6c743 authored almost 11 years ago by Christopher Foo
Fixes truncate for older versions in WARCRecorder.

0a60f9af6ea2c1c069f2f19f1408204b28796dab authored almost 11 years ago by Christopher Foo
Sets the default for --read-timeout to 900.

24a6c1b5806ad896c6fd51e9287b7ff71edc7b46 authored almost 11 years ago by Christopher Foo
Implements --warc-append.

1299368c295f4dc1041709a4d4a66d223b0bacc1 authored almost 11 years ago by Christopher Foo
WARCRecorder: Sets the required target-uri for log record.

8a9f70f96797280019a4d1a6e767c43c9f87eaa6 authored almost 11 years ago by Christopher Foo
Implements --concurrent option.

9477e429645fe3a6362efd8f5f61a4d5c2a06cb6 authored almost 11 years ago by Christopher Foo
Git ignores MANIFEST, .settings.

fbd2cb052f318b772756df57f5bd9ae00d68711d authored almost 11 years ago by Christopher Foo