Ecosyste.ms: OpenCollective

An open API service for software projects hosted on Open Collective.

github.com/mwmbl/mwmbl

An open source, non-profit web search engine
https://github.com/mwmbl/mwmbl

Enough preprocessing

cfca015efe93bce190d3b1f0089dce1f8c0cc103 authored over 2 years ago
Run preprocessing

003cd217f4dc9c0bc587873b644e2dad8b9fc674 authored over 2 years ago
Just index a single page for now

bcd31326b8a1bb544f0e2e257c6e3467c9ee2459 authored over 2 years ago
Use a more specific exception in case we're discarding ones we shouldn't

a471bc2437ded7aea51b2cef9a7abec4c03db1fb authored over 2 years ago
Run update

ce9f52267a80c9da6223867819bb7b52df15e873 authored over 2 years ago
Catch corrupt data

09a9390c92ef03edd69bb2fb8bbfc64da818b750 authored over 2 years ago
Add util script to send batch; add logging

93307ad1ec2d31947f683073904cb38b00863ad5 authored over 2 years ago
Merge pull request #66 from mwmbl/fix-unicode-encode-error

Fix unicode encode error; bigger index

3c97fdb3a0a6211a8e1d1f0bdbbb9292f0787c88 authored over 2 years ago
Fix unicode encoding error

680fe1ca0c9d9b8f3793bbcec5572e5fe843909d authored over 2 years ago
Merge pull request #61 from milovanderlinden/issue-60-consistent-use-of-env-vars

Fix issue #60

e1e1b0057bc1312ec6966186427064d595d53454 authored over 2 years ago
10x index size

fee5cbb40050636d633bd5cd7a23e49672539ea9 authored over 2 years ago
Fix issue #60

dfd3f3962ec7bcfba6b7b20f1c6f44e114fc1740 authored over 2 years ago
Don't include web.archive.org as a curated domain

dba50b372fb50c74d31d656db64ba4dacef3a359 authored over 2 years ago
Merge pull request #58 from mwmbl/improve-ranking-for-root-domains

Improve ranking for root domains

2e40ae1dcaa8c6716785113ccb408ff554bb9fd0 authored over 2 years ago
Add a URL length penalty

43815c73225598fe2d66798adab45f6a9737a960 authored over 2 years ago
Score domain and path, weight components

a3ff2f537f3988966e14aae270be931f97fc6ccf authored over 2 years ago
Merge pull request #57 from mwmbl/clear-indexed-documents

Delete documents that have been preprocessed from the database to sav…

4b5df76ca5c41e8b024136f80a2e6667974805b6 authored over 2 years ago
Delete documents that have been preprocessed from the database to save space

9482ae5028d31f9ec48ad7b8b2038a5ed74ef788 authored over 2 years ago
Merge pull request #56 from mwmbl/allow-links-from-unknown-domains

Allow crawling links from unknown domains

6fa192daa40faed0e2e06664c1227165dd9d8cba authored over 2 years ago
Record new batches as being local

f9fefa0b62221a9a39ef780c70ce37874bd59661 authored over 2 years ago
Allow crawling links from unknown domains

e578d557891153b780867244c69ea31cd8cedc5f authored over 2 years ago
Merge pull request #55 from mwmbl/index-continuously

Index continuously

4967830ae1cbfb53d7eacba91f6e1dde6ea9f273 authored over 2 years ago
Don't require a slash for the search URL

db1aa1a9280db4c3b92100a9facb639dfb5e1164 authored over 2 years ago
Actually used the passed in timestamp

24f82a3c2f783056969e6d6c4734c8450446664e authored over 2 years ago
CONFIRMED no longer exists

d47457b834916ceefe2d2ca41d62f84db0054db9 authored over 2 years ago
Fix log message

b6f29548dba0e06052397708d9d6f52a56fe2715 authored over 2 years ago
Wrap background tasks in try/except

e9835edc452f71a8d61c05df40261985cdfaa389 authored over 2 years ago
Allow batches to fail silently

6ea3a95684ba601a218e5491f0787b0bcfcc8309 authored over 2 years ago
Queue the right type of batch

ddc8664c11d8081cd25b582e74153a5d8a882bb0 authored over 2 years ago
Queue new batches for indexing

2b52b50569f96b36e94e9f2f59631aad51d35ad4 authored over 2 years ago
Correctly insert new URLs

b8c495bda860aa916168c5650a4db870dcfae2d3 authored over 2 years ago
Prevent deadlock when inserting URLs

955d650cf434527b9c1152d407e73970c84ac209 authored over 2 years ago
Cache batches; start a background process

1457cba2c2a931b6d390e0288bd5774b65ce6536 authored over 2 years ago
Use different scores for same domain links

ff2312a5ca211813b7e98419d6a4263b0eb1f7e3 authored over 2 years ago
Fix logic in found URL logic SQL and allow crawling URLs crawled by one user for now

36b168a8f676f2da943ca396a16390a3e1188295 authored over 2 years ago
Temporarily disable startup background processes; add root domains; check for empty batches.

5e1ec9ccd54583f9a8cc74dec7b7dc69d51f1132 authored over 2 years ago
Investigate duplication of URLs in batches

e27d749e18c59ed5d98ca23c357ef63c97141b68 authored over 2 years ago
Add a script to count urls in the index

eb571fc5fee75e71fd3f44cfaa9c62d968f8f876 authored over 2 years ago
Make more robust

1d9b5cb3ca9ca59ca58abdf0392cc5837f7cb861 authored over 2 years ago
Update queued pages in the index

30e1e190729eead94916da9a4b06dad66aa9580d authored over 2 years ago
Tokenize documents and store pages to be added to the index

4330551e0fe6302914dbfddc8d8c9712b1f51cc0 authored over 2 years ago
WIP: index continuously. Retrieve batches and store in Postgres

9594915de12ddffed4ed8278fc585e96933fe309 authored over 2 years ago
Factor out connection code

b8b605daedd12b6f055ad94155bd354f289af201 authored over 2 years ago
CORS is handled by nginx

c31cea710f75fd0f1c3bd65c86bb89b18c0cdda9 authored over 2 years ago
Don't add CORS on the python side

96da534ca561ee80dc3fc04c89921feb9daffdd0 authored over 2 years ago
Use updated CORS settings

9dbb724ba9cce1561c58eb0a50cd3bee9af9a76e authored over 2 years ago
Remove seemingly extraneous backslashes

e3baf879183b5d67f4af0944643b52be199c34d7 authored over 2 years ago
Use an updated template

c245be775b86a0d845fdfaaa7af94c076aa45a62 authored over 2 years ago
Remove problematic SSL_DIRECTIVES line

01772517dae3410ffcaf7dc2239c7848190dac59 authored over 2 years ago
Enable CORS in nginx

a67ca7b298c870ba9e123d479e94523b03ac5ca8 authored over 2 years ago
Use the dokku app storage

866c17f2dcf44da0be22a3416db1a37a3f866575 authored over 2 years ago
Start processing historical data on startup

16c26920997f53399a66b033525545fae9d22391 authored over 2 years ago
Add script to process historical data

d4009506892a6113152d0d300e4c2c4f3cf98d5a authored over 2 years ago
Expose the port

eb1c59990cc1c2554fd5b2272ab26c1a6a4e44ee authored over 2 years ago
Use the correct port for dokku

d7c6dcb5c2d414e47125bd92caa6adfe7b7db3ae authored over 2 years ago
Use a database URL env var

77088a8a1b14b02d86183c1a29848537dadf4471 authored over 2 years ago
Put the resources in the package

476481c5f817ca2b94b148489bd9d5ce2f714908 authored over 2 years ago
Copy the resources

505e7521d45e50c6831db30819aebf804ddb2c2c authored over 2 years ago
Fix relative path

5ea9efcfa2fd6d3f3ab44846a247c633b592598b authored over 2 years ago
Don't depend on existing data

1c7420e5fb534e63dfeb09749be9df5da1be80ba authored over 2 years ago
Fix boto3 dependency

a003914e91433c97e523093c4db2afc199f7df45 authored over 2 years ago
Update Dockerfile for changes

363103468e67d91c5a6d8fc5fa941f651ec2fb63 authored over 2 years ago
Combine crawler and search servers

e2eb4050837f1b29889a3c96d77a77932c5909fc authored over 2 years ago
Merge pull request #53 from mwmbl/record-historical-batches

Record historical batches

77716576842eacf434e941f2bddeef3c1bcdc95b authored over 2 years ago
Use new server

14107acc754b2e3ec78cd6dd66cd4e4d6f7985fb authored over 2 years ago
Record historical batches via the API

aaca8b2b6e79ee87c8dc6ca36aaeeeda9f9dd110 authored over 2 years ago
Merge pull request #51 from mwmbl/learning-to-rank

Learning to rank

617666e3b70c5df37426c5e3c857e5d850783665 authored over 2 years ago
Refactor feature extraction

770b4b945bae2dcdc0b44ab63345be1e458cf180 authored over 2 years ago
Make order_results public

87d8b40cad09dccd5ad977ffe3b2d7fc4088aeb5 authored over 2 years ago
Refactor to allow LTR ranker

229819e57e2c1e9d80a4ffbff55bf6753c75a69f authored almost 3 years ago
Get features for each string separately

94287cec012357739c4ae0fbdb0b8dd7e046612f authored almost 3 years ago
Add domain score feature

4740d89c6a1809156b4ae2095c336f81320e1972 authored almost 3 years ago
Implement learning to rank feature extraction and thresholding

af6a28fac3688be8639da248e3cd39093c9a3270 authored almost 3 years ago
Make get_results() public for learning to rank

2d334074af66b3d762cab66bf3bfde9eff00b1df authored almost 3 years ago
Experiment with score variations (best is simple weighted domain score)

ee5ca6bcf6bbe2d78f296227d6b94df175034345 authored almost 3 years ago
Use addition instead of multiplication

6fb310c3638f1277a916f6ad57a1bf1c98afab04 authored almost 3 years ago
Scale by 0.99

4e6516ccf179c15240c36b1491223be0ddad650b authored almost 3 years ago
Handle empty list

f5afbed2e5364befd4ff91d678bb443f25af0765 authored almost 3 years ago
Rank using item score as well as match score

efafec521477a656c53d25e72a5ace19d9bab04a authored almost 3 years ago
Dedupe before indexing

e1e9e404a3f2c1f41947c61ea7a2ca37435bafa8 authored almost 3 years ago
Index link counts

f5b20d01285ce0b39dca1a8aa0c365b878c00200 authored almost 3 years ago
Store computed link counts

b5b2005323e6fdbd73f4d07107e445b4fb074356 authored almost 3 years ago
Remove unused code

00d18c34749714eda6ce5df712433e4227903a40 authored almost 3 years ago
Merge pull request #47 from mwmbl/include-metadata-in-index

Include metadata in index

d19e0e51f75b3d5912a71d47c1072f7c93709125 authored almost 3 years ago
Fixes to mwmbl API for changes to the index

04a33a134bb06264142c13ab9187bbc0239a0415 authored almost 3 years ago
Fixes for API changes

ae3b334a7fce469f764afee161e1da334e53c81c authored almost 3 years ago
Use JSON instead of struct to store metadata

326f7e3d7f412068d5f1194805f9c6c4c2e40352 authored almost 3 years ago
WIP: include metadata in index - using struct approach

e6273c7f7636e8afa7b59625705e5352c57e8417 authored almost 3 years ago
Merge pull request #46 from mwmbl/refactor-for-evaluation

Refactor to enable easier evaluation

82c46b50bc5a6acb6499d74816f5b5f78121c6f3 authored almost 3 years ago
Refactor to enable easier evaluation

e03e379ccf08f30860085d51a8ffa3bf4e4f3d15 authored almost 3 years ago
Merge pull request #42 from mwmbl/update-readme-for-new-crawler

Update readme for recent changes

4e36ee198cc29ea6f20e9f4e4177c8b87d6f0752 authored almost 3 years ago
Update readme for recent changes

c4e86ce3136a53cf08a347644d22eef42a52e90a authored almost 3 years ago
Merge branch 'master' of github.com:mwmbl/mwmbl

51f2dd2690ba65dd01adc62c396745ad47a1c7d1 authored almost 3 years ago
Merge pull request #41 from ColinEspinas/add-branding

Add branding to readme

9f78d19c8c0a63801de9103fd497939e583623dc authored almost 3 years ago
docs: better title display on readme

b2e01d33e82040aa65f064e4da4c2acdbc563ca7 authored almost 3 years ago
Merge branch 'mwmbl:master' into add-branding

95c9bcfe3bc77f7d8a165d50b497524d6ee6ab9e authored almost 3 years ago
docs: added branding to readme and required assets files

cd57372a84e4c29eeaada45a8498c266f4aedd79 authored almost 3 years ago
New index; more pages

6e5e56f99a7a78a76aad208329a1cc3b3bd5e109 authored almost 3 years ago
Merge pull request #39 from mwmbl/analyse-links

Analyse links

bdf0fd1797fd59a3980ba6a73a06c662f02f124a authored almost 3 years ago
Count unique domains instead of links

2fc999b4027978cd313f5abfd5ce1c7208b91cc9 authored almost 3 years ago