Ecosyste.ms: OpenCollective
An open API service for software projects hosted on Open Collective.
github.com/webrecorder/warcio
Streaming WARC/ARC library for fast web archive IO
https://github.com/webrecorder/warcio
bump version to 1.7.5
ikreymer opened this pull request 29 days ago
Handle deprecation of naive datetime functions like utcnow()
tw4l opened this pull request 2 months ago
feat: try py 3.13, plus typos
wumpus opened this pull request 3 months ago
Stream Recompressor
white-gecko opened this pull request 4 months ago
Add docs and https://warcio.readthedocs.io
Florents-Tselai opened this pull request 4 months ago
py3.12 and setuptools
wumpus opened this issue 4 months ago
feat: test old ubuntu version
wumpus opened this pull request 4 months ago
doc: document how to use brotli; test brotli
wumpus opened this pull request 4 months ago
feat: add darwin and windows CI
wumpus opened this pull request 4 months ago
feat: try darwin and windows [skip actions]
wumpus opened this pull request 4 months ago
chore: finish py3.12
wumpus opened this pull request 5 months ago
Test python 3.12
white-gecko opened this pull request 5 months ago
Remove superfluous ci step
white-gecko opened this pull request 5 months ago
Add very simple test for version argument and use importlib feature instead of deprecated pkg_resources for version
white-gecko opened this pull request 5 months ago
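The switch described in this PR, from `pkg_resources` to importlib, can be sketched with the standard library's `importlib.metadata.version()`. This is a minimal sketch, not warcio's actual code; the helper name `get_version` and the `"unknown"` fallback are assumptions for illustration.

```python
from importlib.metadata import PackageNotFoundError, version


def get_version(dist_name: str = "warcio") -> str:
    """Return the installed version of a distribution (hypothetical helper)."""
    try:
        # importlib.metadata reads installed package metadata directly,
        # so no setuptools/pkg_resources import is needed at runtime
        return version(dist_name)
    except PackageNotFoundError:
        # assumed fallback when the distribution is not installed
        return "unknown"
```

Because importlib.metadata ships with Python 3.8+, this removes the runtime dependency on setuptools that the older approach needed.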
Run pytest directly. setup.py test was removed in setuptools 72.
white-gecko opened this pull request 5 months ago
Update codecov/codecov-action from v1 to v4
white-gecko opened this pull request 5 months ago
Adjust classifiers to the actually tested build matrix
white-gecko opened this pull request 5 months ago
Migrate from setup.py to poetry/pyproject.toml
white-gecko opened this pull request 5 months ago
Add dependency for setuptools, which is required by cli get_version command
white-gecko opened this pull request 5 months ago
Bump urllib3 from 1.25.11 to 1.26.19
dependabot[bot] opened this pull request 7 months ago
Bump urllib3 from 1.25.11 to 1.26.18
dependabot[bot] opened this pull request 8 months ago
Add test for HTTPS proxies
tw4l opened this issue 8 months ago
Migrate to GitHub Actions CI and resolve dependency issues
tw4l opened this pull request 8 months ago
DeprecationWarning: datetime.datetime.utcnow() is deprecated and scheduled for removal in a future version
benoit74 opened this issue 10 months ago
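The usual replacement for the deprecated `datetime.utcnow()` is the timezone-aware `datetime.now(timezone.utc)`. The sketch below formats such a value as a second-precision WARC-Date; the helper names are hypothetical and this is not warcio's own implementation.

```python
from datetime import datetime, timezone


def to_warc_date(dt: datetime) -> str:
    """Format an aware datetime as a second-precision WARC-Date in UTC."""
    if dt.tzinfo is None:
        # naive datetimes are ambiguous; refuse them instead of guessing a zone
        raise ValueError("expected a timezone-aware datetime")
    return dt.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


def warc_date_now() -> str:
    # datetime.now(timezone.utc) is the aware replacement for utcnow()
    return to_warc_date(datetime.now(timezone.utc))
```

Rejecting naive datetimes outright is a design choice; an alternative is to assume UTC for them, which silently reproduces the ambiguity the deprecation is meant to remove.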
warcio recompress adds "WARC-Payload-Digest" to records without understanding them
acidus99 opened this issue about 1 year ago
warcio recompress adds WARC-Block-Digest fields to records without one
acidus99 opened this issue about 1 year ago
Fix typos discovered by codespell
cclauss opened this pull request about 1 year ago
Delete .travis.yml because Travis CI is no longer free
cclauss opened this pull request about 1 year ago
"warcio check" does not warn of illegal characters in field names or values, including LF
acidus99 opened this issue over 1 year ago
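WARC field names follow the RFC 2616 `token` grammar: visible US-ASCII characters excluding separators. Field values may not contain control characters other than horizontal tab. A checker could flag both cases roughly as sketched below. The function names are hypothetical, and the value check is simplified: it assumes folded continuation lines were already unfolded, so any remaining CR or LF counts as illegal.

```python
# Separator characters that may not appear in a token (RFC 2616 style grammar)
SEPARATORS = frozenset('()<>@,;:\\"/[]?={} \t')


def is_valid_field_name(name: str) -> bool:
    """True if name is a non-empty token: visible US-ASCII, no separators."""
    return bool(name) and all(
        33 <= ord(c) <= 126 and c not in SEPARATORS for c in name
    )


def find_illegal_value_chars(value: str) -> list:
    """Return positions of control characters (other than HT) in a field value."""
    return [
        i for i, c in enumerate(value)
        if (ord(c) < 32 and c != "\t") or ord(c) == 127
    ]
```

Returning positions rather than a boolean lets a `check`-style tool report where in the value the problem occurs.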
warcio accepts a bare LF everywhere a CRLF is required by the spec
acidus99 opened this issue over 1 year ago
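A strict parser can split a header block on CRLF only and reject any bare LF or CR that remains, instead of accepting it as a line ending. This is a minimal sketch with a hypothetical function name, not warcio's parser.

```python
def split_header_block(block: bytes) -> list:
    """Split a WARC header block into lines, requiring CRLF terminators.

    Raises ValueError on a bare LF or bare CR instead of silently accepting it.
    """
    lines = block.split(b"\r\n")
    for lineno, line in enumerate(lines, start=1):
        if b"\n" in line or b"\r" in line:
            raise ValueError(f"bare CR or LF in header line {lineno}")
    return lines
```

A lenient reader might downgrade the error to a warning. Either way, the CRLF-only split is what makes the deviation from the spec detectable.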
"warcio check" incorrectly reporting payload digest failures for non-HTTP WARCs
acidus99 opened this issue over 1 year ago
doc bugs linking to source code files
wumpus opened this issue over 1 year ago
Deimos/add https type
Deimos4Flare opened this pull request over 1 year ago
Add support for the 1995 NCSA 1.5.1 webserver
omgoo opened this pull request almost 2 years ago
wget warc status code?
JohnMaguire opened this issue almost 2 years ago
webrecorder fails to open IA warc file on MacOS X Ventura 13.2.1
theopathic opened this issue almost 2 years ago
warcio cannot write wet files
mraslann opened this issue over 2 years ago
Patching WARCs using warcio
wsdookadr opened this issue over 2 years ago
GitHub Action to lint Python code
cclauss opened this pull request over 2 years ago
Trying to write to closed file when using `requests.Session`
maxyousif15 opened this issue over 2 years ago
Empty WARC files when deploying warcio on Airflow
maxyousif15 opened this issue over 2 years ago
fix utf-8 encoding
tomeksporczyk opened this pull request over 2 years ago
warcio.exceptions.ArchiveLoadFailed: Unknown archive format
KyloPrem opened this issue over 2 years ago
Documentation: Clarify that capture_http writer with filename has no get_stream method
voltagex opened this issue over 2 years ago
Issues with encoding of http-answers
Weyaaron opened this issue almost 3 years ago
Warcio does not support replay of sites hosted on NCSA 1.5
omgoo opened this issue almost 3 years ago
Record not followed by newline (conversion error)
mw0000 opened this issue almost 3 years ago
`capture_http` fails in tests, but works otherwise
maxyousif15 opened this issue about 3 years ago
warcio check does not raise error when GZip records are truncated
anjackson opened this issue about 3 years ago
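In a `.warc.gz` file each record is normally its own gzip member. A truncated member can be detected because the decompressor never reaches that member's end-of-stream marker and trailer. The sketch below uses only the standard `zlib` module; the function name is hypothetical, it is not how `warcio check` is implemented, and it holds each decompressed member in memory.

```python
import zlib


def count_gzip_members(data: bytes) -> int:
    """Count gzip members in a multi-member file, raising on a truncated one."""
    count = 0
    while data:
        # MAX_WBITS | 16 tells zlib to expect a gzip header and trailer
        d = zlib.decompressobj(zlib.MAX_WBITS | 16)
        d.decompress(data)
        if not d.eof:
            # input ran out before this member's end-of-stream marker and trailer
            raise ValueError(f"gzip member {count} is truncated")
        count += 1
        data = d.unused_data  # bytes belonging to the next member, if any
    return count
```

The `eof` flag is the key signal: feeding incomplete input to a decompress object does not raise an error by itself, so a checker has to test for completion explicitly.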
extract entire warc file?
catharsis71 opened this issue about 3 years ago
CLI Indexer: silently ignore brokenpipe signal
sebastian-nagel opened this pull request about 3 years ago
Add offline mode to skip tests that require an internet connection
Luflosi opened this pull request over 3 years ago
Failsafe if it fails to %-encode headers
manueldeprada opened this pull request over 3 years ago
Offline tests
Luflosi opened this issue over 3 years ago
get_test_file missing from the PyPI release
Apteryks opened this issue over 3 years ago
Not compatible with WARC-files/records written by ArchiveSpark
parismic opened this issue over 3 years ago
quoted-string WARC header values are not parsed correctly
JustAnotherArchivist opened this issue over 3 years ago
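Some fields in the WARC grammar, for example WARC-Filename, accept either plain TEXT or a quoted-string, where a backslash escapes the following character. A parser needs to strip the surrounding quotes and resolve those escapes. The sketch below has a hypothetical function name and is not warcio's parser.

```python
def unquote_header_value(value: str) -> str:
    """Return the content of a quoted-string header value; other values unchanged.

    Grammar: quoted-string = DQUOTE *(qdtext / quoted-pair) DQUOTE,
    where a quoted-pair is a backslash followed by one character.
    """
    value = value.strip()
    if len(value) < 2 or not (value.startswith('"') and value.endswith('"')):
        return value  # plain TEXT / token value
    out = []
    i, end = 1, len(value) - 1  # skip the opening and closing quotes
    while i < end:
        c = value[i]
        if c == "\\":
            if i + 1 >= end:
                # the closing quote is itself escaped, so the string never ends
                raise ValueError("unterminated quoted-string")
            out.append(value[i + 1])  # quoted-pair: keep the escaped char literally
            i += 2
        else:
            out.append(c)
            i += 1
    return "".join(out)
```

An unescaped quote inside the value is passed through rather than rejected. That is a lenient choice; a validating tool might report it instead.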
warcio does not preserve HTTP header whitespace
JustAnotherArchivist opened this issue over 3 years ago
warcio mangles non-ASCII HTTP headers
JustAnotherArchivist opened this issue over 3 years ago
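One lossless way to carry raw HTTP header bytes through a `str`-based API is to decode them as ISO-8859-1, which maps every byte 0 to 255 to exactly one code point, and to encode them the same way on output. This illustrates the general technique only; it is not necessarily how warcio addresses the issue, and the helper names are hypothetical.

```python
def decode_header_bytes(raw: bytes) -> str:
    # ISO-8859-1 maps each byte 0-255 to exactly one code point, so nothing is lost
    return raw.decode("iso-8859-1")


def encode_header_text(text: str) -> bytes:
    # Inverse of decode_header_bytes: restores the original bytes exactly
    return text.encode("iso-8859-1")
```

The trade-off is readability: UTF-8 text such as `é` shows up as `Ã©` in the decoded string, but the bytes written back to the WARC are identical to what was captured.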
Invalid WARCs are silently accepted instead of raising an error
JustAnotherArchivist opened this issue about 4 years ago
Add version tags to the repository
JustAnotherArchivist opened this issue about 4 years ago
Header methods do not work well with repeated headers
JustAnotherArchivist opened this issue about 4 years ago
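Headers such as `Set-Cookie` can legitimately repeat, so a "get one value" accessor silently drops data. A container that keeps headers as an ordered list of `(name, value)` pairs can expose an explicit "get all" method. The class and method names below are hypothetical and do not describe warcio's API.

```python
from typing import List, Optional, Tuple


class HeaderList:
    """Hypothetical ordered header container that keeps repeated names."""

    def __init__(self, headers: List[Tuple[str, str]]):
        self.headers = list(headers)  # preserves order and duplicates

    def get_all(self, name: str) -> List[str]:
        """Every value for name, compared case-insensitively, in original order."""
        key = name.lower()
        return [v for n, v in self.headers if n.lower() == key]

    def get_first(self, name: str) -> Optional[str]:
        values = self.get_all(name)
        return values[0] if values else None

    def replace_all(self, name: str, value: str) -> None:
        """Drop every existing header with this name and append a single new one."""
        key = name.lower()
        self.headers = [(n, v) for n, v in self.headers if n.lower() != key]
        self.headers.append((name, value))
```

`replace_all` appends the new header at the end rather than at the first match's position. Keeping the original position is an equally valid choice if header order matters for byte-exact output.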
check_digests is under-documented, confusing everyone
wumpus opened this issue about 4 years ago
Block digest verification fails on some copied record
dlazesz opened this issue about 4 years ago
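A WARC-Block-Digest covers exactly the bytes of the record block, so any change to those bytes during copying (line endings, re-encoded headers, extra or missing bytes) makes verification fail. The sketch below handles only the common `sha1:` plus Base32 form; the function names are hypothetical, and other algorithms or encodings such as hex would need their own branch.

```python
import base64
import hashlib


def sha1_base32_digest(block: bytes) -> str:
    """Digest label in the common 'sha1:' + Base32 form used in WARC headers."""
    return "sha1:" + base64.b32encode(hashlib.sha1(block).digest()).decode("ascii")


def verify_block_digest(block: bytes, header_value: str) -> bool:
    """Compare a recorded WARC-Block-Digest against the bytes actually present."""
    algo, _, encoded = header_value.partition(":")
    if algo.lower() != "sha1":
        # this sketch only understands sha1 with Base32 encoding
        raise ValueError(f"unsupported digest algorithm: {algo}")
    return sha1_base32_digest(block)[len("sha1:"):] == encoded.upper()
```

Recomputing the digest over the block as read, before any decoding or normalization, is what makes the comparison meaningful.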
warcio.bufferedreaders.BufferedReader.readline can get stuck in an infinite loop
ThomasA opened this issue about 4 years ago
Suspicion of incorrect handling of content length in WARC records
ThomasA opened this issue about 4 years ago
Migrate CI
wumpus opened this issue about 4 years ago
add digest_algorithm option in writer
ThomasLiennard opened this pull request about 4 years ago
Support ZStd Compression for WARCs
ikreymer opened this issue over 4 years ago
Plans for adding type annotations?
dnaaun opened this issue over 4 years ago
capture_http/indexer tweaks
ikreymer opened this pull request over 4 years ago
Enable writing block digests for warcinfo records
JustAnotherArchivist opened this pull request over 4 years ago
record.content_stream().read() alters the record and causes a write out to fail
thomaspreece opened this issue over 4 years ago
Fix capture_http() with http and https proxies
ikreymer opened this pull request over 4 years ago
capture_http doesn't work with requests http/https proxies (was: capture_http and requests import order)
MaxYousif opened this issue over 4 years ago
Fix ordering of arguments in README
baali opened this pull request almost 5 years ago
Confusing documentation around request filter
baali opened this issue almost 5 years ago
Option to read the optional headers (languages-cld2, fetchTimeMs, charset-detected)
thomas0sae opened this issue almost 5 years ago
Develop->Master for 1.7.2
ikreymer opened this pull request almost 5 years ago
%-encoding fix: if header value does not contain a multi-value separa…
ikreymer opened this pull request almost 5 years ago
Fix issues with read/write same record
ikreymer opened this pull request almost 5 years ago
ci: bound jinja2<3.0.0 for py27 fix, possible fix for #103
ikreymer opened this pull request almost 5 years ago
ArchiveIterator is adding bytes to payload HTTP header without updating Content-length
lpla opened this issue almost 5 years ago
Jinja2 started using f-strings & stopped being installable in Python 2.7
wumpus opened this issue almost 5 years ago
Error reading WAT files
MohammedElsayyed opened this issue almost 5 years ago
Using warcio with scrapy - what does the payload need to look like?
Chris8080 opened this issue about 5 years ago
Use scrapy together with warcio
CuloArdido opened this issue about 5 years ago
Add feature to skip past corrupted records in a warc.gz file
lukeplausin opened this pull request about 5 years ago
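One way to skip past a corrupted record in a multi-member `.warc.gz` is to decompress member by member and, when a member fails, scan forward to the next gzip header (bytes `1f 8b 08`) and try again. This is a minimal sketch of that resynchronization idea using only `zlib`; it is not the approach taken in the PR, and the function name is hypothetical. Because the magic bytes can also occur by chance inside compressed data, a failed attempt from a false match simply continues the scan.

```python
import zlib

GZIP_MAGIC = b"\x1f\x8b\x08"  # gzip ID bytes followed by the deflate method byte


def iter_good_members(data: bytes):
    """Yield decompressed gzip members, resyncing past damaged ones."""
    pos = 0
    while pos < len(data):
        d = zlib.decompressobj(zlib.MAX_WBITS | 16)  # expect gzip framing
        try:
            payload = d.decompress(data[pos:])
            ok = d.eof  # False if the member was cut off before its trailer
        except zlib.error:
            ok = False  # bad header, corrupt deflate data, or CRC mismatch
        if ok:
            yield payload
            pos = len(data) - len(d.unused_data)  # start of the next member
        else:
            # damaged member: jump to the next plausible gzip header
            nxt = data.find(GZIP_MAGIC, pos + 1)
            if nxt == -1:
                return
            pos = nxt
```

A real tool would also report the byte offsets it skipped, so the loss is visible rather than silent.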
include record offsets in `warcio check` output
nlevitt opened this pull request about 5 years ago
fix payload digest for chunked response in test warc
nlevitt opened this pull request about 5 years ago
writer: use 1.1 revisit profile when writing WARC/1.1 revisits, fixes #94
ikreymer opened this pull request about 5 years ago
UnicodeEncodeError when using 'warcio recompress'
zuny26 opened this issue over 5 years ago
Incorrect WARC-Profile for revisit records when using WARC/1.1
JustAnotherArchivist opened this issue over 5 years ago
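The WARC 1.0 and 1.1 specifications each define their own identical-payload-digest revisit profile URI, so a writer should pick the profile that matches the record's WARC version. A minimal lookup is sketched below. The URIs come from the specifications, while the names `REVISIT_PROFILES` and `identical_payload_profile` are hypothetical.

```python
REVISIT_PROFILES = {
    "WARC/1.0": "http://netpreserve.org/warc/1.0/revisit/identical-payload-digest",
    "WARC/1.1": "http://netpreserve.org/warc/1.1/revisit/identical-payload-digest",
}


def identical_payload_profile(warc_version: str) -> str:
    """WARC-Profile URI for an identical-payload-digest revisit in this WARC version."""
    try:
        return REVISIT_PROFILES[warc_version]
    except KeyError:
        raise ValueError(f"unsupported WARC version: {warc_version}") from None
```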
WARC-Payload-Digest should only be written for HTTP records
JustAnotherArchivist opened this issue over 5 years ago
Undocumented and non-standardised default Content-Type application/warc-record
JustAnotherArchivist opened this issue over 5 years ago
Provide API for parsed warcinfo payload in conjunction with the raw form
dlazesz opened this issue over 5 years ago
Do not allow writing records whose content_stream() has been modified, as this results in partial or empty content
dlazesz opened this issue over 5 years ago
Threadpool executor creates zero byte warc files
naumansiddiqui4 opened this issue over 5 years ago
UTF-8 characters in Link header parameters raises exception
staylor-ds opened this issue over 5 years ago
No block digest written for warcinfo records
JustAnotherArchivist opened this issue over 5 years ago