Ecosyste.ms: OpenCollective
An open API service for software projects hosted on Open Collective.
github.com/webrecorder/warcio
Streaming WARC/ARC library for fast web archive IO
https://github.com/webrecorder/warcio
Document memory-efficient use of capture_http
jcushman opened this issue 25 days ago
bump version to 1.7.5
ikreymer opened this pull request 2 months ago
Handle deprecation of naive datetime functions like utcnow()
tw4l opened this pull request 4 months ago
feat: try py 3.13, plus typos
wumpus opened this pull request 4 months ago
Stream Recompressor
white-gecko opened this pull request 5 months ago
Add docs and https://warcio.readthedocs.io
Florents-Tselai opened this pull request 5 months ago
py3.12 and setuptools
wumpus opened this issue 5 months ago
feat: test old ubuntu version
wumpus opened this pull request 5 months ago
doc: document how to use brotli; test brotli
wumpus opened this pull request 6 months ago
feat: add darwin and windows CI
wumpus opened this pull request 6 months ago
feat: try darwin and windows [skip actions]
wumpus opened this pull request 6 months ago
chore: finish py3.12
wumpus opened this pull request 6 months ago
Test python 3.12
white-gecko opened this pull request 6 months ago
Remove superfluous ci step
white-gecko opened this pull request 6 months ago
Add very simple test for version argument and use importlib feature instead of deprecated pkg_resources for version
white-gecko opened this pull request 6 months ago
Run pytest directly. setup.py test was removed in setuptools 72.
white-gecko opened this pull request 6 months ago
Update codecov/codecov-action from v1 to v4
white-gecko opened this pull request 6 months ago
Adjust classifiers to the actually tested build matrix
white-gecko opened this pull request 6 months ago
Migrate from setup.py to poetry/pyproject.toml
white-gecko opened this pull request 6 months ago
Add dependency for setuptools, which is required by cli get_version command
white-gecko opened this pull request 6 months ago
Bump urllib3 from 1.25.11 to 1.26.19
dependabot[bot] opened this pull request 8 months ago
Bump urllib3 from 1.25.11 to 1.26.18
dependabot[bot] opened this pull request 9 months ago
Add test to HTTPS proxies
tw4l opened this issue 9 months ago
Migrate to GitHub Actions CI and resolve dependency issues
tw4l opened this pull request 9 months ago
DeprecationWarning: datetime.datetime.utcnow() is deprecated and scheduled for removal in a future version
benoit74 opened this issue 12 months ago
warcio recompress adds "WARC-Payload-Digest" to records without understanding them
acidus99 opened this issue about 1 year ago
warcio recompress adds WARC-Block-Digest fields to records without one
acidus99 opened this issue about 1 year ago
Fix typos discovered by codespell
cclauss opened this pull request about 1 year ago
Delete .travis.yml because Travis CI is no longer free
cclauss opened this pull request about 1 year ago
"warcio check" does not warn of illegal characters in field names or values, including LF
acidus99 opened this issue over 1 year ago
warcio accepts a bare LF everywhere a CRLF is required by the spec
acidus99 opened this issue over 1 year ago
"warcio check" incorrectly reporting payload digest failures for non-HTTP WARCs
acidus99 opened this issue over 1 year ago
doc bugs linking to source code files
wumpus opened this issue over 1 year ago
Deimos/add https type
Deimos4Flare opened this pull request almost 2 years ago
Add support for the 1995 NCSA 1.5.1 webserver
omgoo opened this pull request almost 2 years ago
wget warc status code?
JohnMaguire opened this issue almost 2 years ago
webrecorder fails to open IA warc file on MacOS X Ventura 13.2.1
theopathic opened this issue almost 2 years ago
warcio cannot write wet files
mraslann opened this issue over 2 years ago
Patching WARCs using warcio
wsdookadr opened this issue over 2 years ago
GitHub Action to lint Python code
cclauss opened this pull request over 2 years ago
Trying to write to closed file when using `requests.Session`
maxyousif15 opened this issue over 2 years ago
Empty WARC files when deploying warcio on Airflow
maxyousif15 opened this issue over 2 years ago
fix utf-8 encoding
tomeksporczyk opened this pull request over 2 years ago
warcio.exceptions.ArchiveLoadFailed: Unknown archive format
KyloPrem opened this issue almost 3 years ago
Documentation: Clarify that capture_http writer with filename has no get_stream method
voltagex opened this issue almost 3 years ago
Issues with encoding of http-answers
Weyaaron opened this issue almost 3 years ago
Warcio does not support replay of sites hosted on NCSA 1.5
omgoo opened this issue almost 3 years ago
Record not followed by newline (conversion error)
mw0000 opened this issue about 3 years ago
`capture_http` fails in tests, but works otherwise
maxyousif15 opened this issue about 3 years ago
warcio check does not raise error when GZip records are truncated
anjackson opened this issue about 3 years ago
extract entire warc file?
catharsis71 opened this issue over 3 years ago
CLI Indexer: silently ignore brokenpipe signal
sebastian-nagel opened this pull request over 3 years ago
Add offline mode to skip tests that require an internet connection
Luflosi opened this pull request over 3 years ago
Failsafe if it fails to % - encode headers
manueldeprada opened this pull request over 3 years ago
Offline tests
Luflosi opened this issue over 3 years ago
get_test_file missing from the PyPI release
Apteryks opened this issue over 3 years ago
Not compatible with WARC-files/records written by ArchiveSpark
parismic opened this issue over 3 years ago
quoted-string WARC header values are not parsed correctly
JustAnotherArchivist opened this issue over 3 years ago
warcio does not preserve HTTP header whitespace
JustAnotherArchivist opened this issue over 3 years ago
warcio mangles non-ASCII HTTP headers
JustAnotherArchivist opened this issue over 3 years ago
Invalid WARCs are silently accepted instead of raising an error
JustAnotherArchivist opened this issue about 4 years ago
Add version tags to the repository
JustAnotherArchivist opened this issue about 4 years ago
Header methods do not work well with repeated headers
JustAnotherArchivist opened this issue about 4 years ago
check_digests is under-documented, confusing everyone
wumpus opened this issue about 4 years ago
Block digest verification fails on some copied record
dlazesz opened this issue about 4 years ago
warcio.bufferedreaders.BufferedReader.readline can get stuck in an infinite loop
ThomasA opened this issue about 4 years ago
Suspicion of incorrect handling of content length in WARC records
ThomasA opened this issue about 4 years ago
Migrate CI
wumpus opened this issue about 4 years ago
add digest_algorithm option in writer
ThomasLiennard opened this pull request over 4 years ago
Support ZStd Compression for WARCs
ikreymer opened this issue over 4 years ago
Plans for adding type annotations?
dnaaun opened this issue over 4 years ago
capture_http/indexer tweaks
ikreymer opened this pull request over 4 years ago
Enable writing block digests for warcinfo records
JustAnotherArchivist opened this pull request over 4 years ago
record.content_stream().read() alters the record and causes a write out to fail
thomaspreece opened this issue almost 5 years ago
Fix capture_http() with http and https proxies
ikreymer opened this pull request almost 5 years ago
capture_http doesn't work requests http/https proxies (was: capture_http and requests import order)
MaxYousif opened this issue almost 5 years ago
Fix ordering of arguments in README
baali opened this pull request almost 5 years ago
Confusing documentation around request filter
baali opened this issue almost 5 years ago
Option to read the optional headers (languages-cld2, fetchTimeMs, charset-detected)
thomas0sae opened this issue almost 5 years ago
Develop->Master for 1.7.2
ikreymer opened this pull request almost 5 years ago
%-encoding fix: if header value does not contain a multi-value separa…
ikreymer opened this pull request almost 5 years ago
Fix issues with read/write same record
ikreymer opened this pull request almost 5 years ago
ci: bound jinja2<3.0.0 for py27 fix, possible fix for #103
ikreymer opened this pull request almost 5 years ago
ArchiveIterator is adding bytes to payload HTTP header without updating Content-length
lpla opened this issue almost 5 years ago
Jinja2 started using f-strings & stopped being installable in Python 2.7
wumpus opened this issue almost 5 years ago
Error reading WAT files
MohammedElsayyed opened this issue about 5 years ago
Using warcio with scrapy - what does the payload need to look like?
Chris8080 opened this issue about 5 years ago
Use scrapy together with warcio
CuloArdido opened this issue about 5 years ago
Add feature to skip past corrupted records in a warc.gz file
lukeplausin opened this pull request over 5 years ago
include record offsets in `warcio check` output
nlevitt opened this pull request over 5 years ago
fix payload digest for chunked response in test warc
nlevitt opened this pull request over 5 years ago
writer: use 1.1 revisit profile when writing WARC/1.1 revisits, fixes #94
ikreymer opened this pull request over 5 years ago
UnicodeEncodeError when using 'warcio recompress'
zuny26 opened this issue over 5 years ago
Incorrect WARC-Profile for revisit records when using WARC/1.1
JustAnotherArchivist opened this issue over 5 years ago
WARC-Payload-Digest should only be written for HTTP records
JustAnotherArchivist opened this issue over 5 years ago
Undocumented and non-standardised default Content-Type application/warc-record
JustAnotherArchivist opened this issue over 5 years ago
Provide API for parsed warcinfo payload in conjunction with the raw form
dlazesz opened this issue over 5 years ago
Do not allow writing records which content_stream() has been modified as it results in partial or empty content
dlazesz opened this issue over 5 years ago
Threadpool executor creates zero byte warc files
naumansiddiqui4 opened this issue over 5 years ago
UTF-8 characters in Link header parameters raises exception
staylor-ds opened this issue over 5 years ago