Ecosyste.ms: OpenCollective
An open API service for software projects hosted on Open Collective.
github.com/mwmbl/mwmbl
An open source, non-profit web search engine
https://github.com/mwmbl/mwmbl
Optimise URL update
77e39b4a89f59101bd603d68ea04fad7750f3ce0 authored about 2 years ago
Speed up domain parsing
66700f8a3e36c50d9387903967b19d2dfb167080 authored about 2 years ago
Try and balance URLs before adding to queue
2b36f2ccc17628cc289829d38f903ea80ba2acba authored about 2 years ago
Create a custom URL queue
603fcd4eb2b707a1df3f2baf2f12838cc22e89e4 authored about 2 years ago
Return updated URLs
01f08fd88d5205930635c7920541f2171139192a authored about 2 years ago
Don't try and update an empty list of URLs
bd0cc3863e7fcbb2472a4f84bceaec121f5a4642 authored about 2 years ago
Update URL queue separately from the other background process to speed it up
d347a17d634773b827f67615f4b37770eb956eb9 authored about 2 years ago
Fix some bugs in URL fetching query
7bd12c1ead30242b82ee6cd24e1caf5ea79e43ee authored about 2 years ago
Fix postgres install
a50f1d8ae35d1c2b391f1675a4aa147594d8f9d2 authored about 2 years ago
Install postgres client
1ab16b1fb4c0bf68c9662fdb9efad0eab32ad960 authored about 2 years ago
Add core domains
dda5a25ad0484e0891ebb4cb9cd75b0cc06371d8 authored about 2 years ago
Exclude Google Plus
ab37bbe0a5f5d2c135581059d2acd2eedcb45381 authored about 2 years ago
Allow posting extra links with lower score weighting
2336ed7f7d05bc9a070824aa2af3753ac1d7d382 authored about 2 years ago
Check the domain is correct, potential bug in psql
6edf48693b7e8bbef524ef23a4b1b236b4627772 authored about 2 years ago
Tidy, improve logging
b7984684c9ee6e2641a5292a0b249b6928c1ff83 authored about 2 years ago
Update the URL queue earlier
7c14cd99f8336723c44f7d62e3dd60b5e9d1008a authored about 2 years ago
Merge pull request #86 from mwmbl/improve-crawling
Improve crawling
0d33b4f68f682c38ca033c493b25d533df295b73 authored about 2 years ago
Reinstate background tasks
a86e172bf3b93574c1fd2fe2e41413e7074f528c authored about 2 years ago
Get results from other domains
d9cd3c585b8d3e192a39980293778692011c0f79 authored about 2 years ago
Update URL status
77f08d8f0a91e360f66bd17a0cd82553dfcbc2c3 authored about 2 years ago
Sample domains
36af579f7cec61bc45bae1a8a8f947e4e8a1a810 authored about 2 years ago
WIP: improve method of getting URLs for crawling
ea16e7b5cd0207816b62e9b0f2f11b6bf884d374 authored about 2 years ago
WIP: improve method of getting URLs for crawling
7dae39b78048e8bbcf03303769b9dbc2362b8e7d authored about 2 years ago
Don't delete an index if the sizes don't match
c69108cfcc1555a3d386c0734f97b26357c47104 authored about 2 years ago
Number of pages is an int
bb8a36a612c5764618d1569690a016fdbd76314d authored about 2 years ago
Merge branch 'master' of github.com:mwmbl/mwmbl
c01129cdb9f7d7658a302f13d2fe14e7e26aa268 authored about 2 years ago
Use the correct storage location in prod
26351a10728a7608de825b306f13638d3327fc8d authored about 2 years ago
Merge pull request #83 from omasanori/spacy-deps-rework
Rework installation of spaCy models for clarity
f3f3831a97d03cc759a43c6d7b81b3a4eb81c9a8 authored about 2 years ago
Rework installation of spaCy models for clarity
- Install the wheel package for compatibility with future pip
- Use `spacy download` for install...
Remove apt command
d85067ec09310ea38a2fc7d173a9ceedeabeb647 authored about 2 years ago
Put install in correct place
1ef60e8d5db39f772b6966e4be968e32f6b12b2f authored about 2 years ago
Install psql client
8e613dd36884794b13426278209412416d856aca authored about 2 years ago
Exclude a domain
80282cfc7a6d5a8333031f730e72e85fad0912e3 authored about 2 years ago
Format fetched url
8676abbc63540528bbed7dc5a2056441c69f0c07 authored about 2 years ago
Update README.md
57295846cb3a01573d7d5f9db8f95a7d435abc9d authored about 2 years ago
Add endpoint to fetch a URL and return title and extract
0a4e1e4aee25ad13798603ca14fbbb25ff2c61ef authored about 2 years ago
Implement validation
c7571120cc6d7669c189869a6f027b7d17b5e673 authored about 2 years ago
Separate out the curation to make it easier to store in a comment
061462460b6f33e7b93d1df05d832cf32ebc1497 authored about 2 years ago
Fix serialisation issue
6cf27fa47f0bf7879fa0b18bd0d67f2e8aeece83 authored about 2 years ago
Require the whole result
b559a50506df9879e8863ad22b99577c5ee989eb authored about 2 years ago
Merge branch 'master' into user-registration
5eab543f3b11a3d4f2a6dbc9a05dbbb2f162de7e authored about 2 years ago
Rename some parameters; return curation ID
a88a1a3e95119475288354a5ddb472508f3a5911 authored about 2 years ago
Merge pull request #78 from mwmbl/make-dev-easier
Make it easier to run mwmbl locally
efc8e8e383a4050289bc0cc80cb621bf7bd235ec authored about 2 years ago
Add curations
31c27daca4729ca483f8ce3eb3b5e7bce7b22559 authored about 2 years ago
Create a post when beginning curation
f89e1d6043119ff0b0dca910d8c881906e3a77a7 authored about 2 years ago
Follow a begin curate/update curation workflow
eadb7f3e2898ae039c5640ac09c99660914829fe authored about 2 years ago
Suggest using dokku instead of docker directly
f8ab6092b03e7ef817ab28b3bcce742bc4ab6c51 authored about 2 years ago
Allow login
8aa51e548bcae89c5c829c3371041a3aa60e0fc4 authored about 2 years ago
Actually allow registration
cf6ceedfd55d60bacbf4d6a773bc7a364370faa9 authored about 2 years ago
Make it easier to run mwmbl locally
a50bc284362c9591ae7c2b1693499f6b9414058f authored about 2 years ago
Start to implement user registration using Lemmy as a back end
d8d7149f4a131601e62c88e15c1b8e5c326f3874 authored about 2 years ago
Update matrix badge
c0f89ba6c3dc86af4e1fcb08038e324cb257aa77 authored about 2 years ago
Exclude an annoying web site
dd4dd8a75236e498c70d54457c4d262e2606dc7d authored about 2 years ago
Update index name
40f9eade9ac4d6d9c696b8736438e3b9c1399875 authored over 2 years ago
Merge pull request #74 from mwmbl/evaluate-indexing
Evaluate indexing
b6183e00ea35a0e339d2b28f1147a0559bd8c853 authored over 2 years ago
Split out URL updating from indexing
cf253ae524971612ecb4c655f5a277efdd0023b6 authored over 2 years ago
Use terms and bigrams from the beginning of the string only
f4fb9f831a13f3c442960e4999d796f9931ce58a authored over 2 years ago
Don't remove stopwords
619b6c3a932e8ae4fb8e59515c438776d0e77904 authored over 2 years ago
Don't replace full stops and commas
578b70560937da34c92b208569c70e7f38825386 authored over 2 years ago
Use a custom tokenizer
4779371cf37786b9d98cb9d943a21f2cdb7b0b69 authored over 2 years ago
Script to index local batch for evaluation
b1eea2457f8026bb9dea84ff06ecfec27245784e authored over 2 years ago
Fix bug in completions with duplicated terms
480be85cfd6226428dfb3d05e3b04367808ec504 authored over 2 years ago
Merge pull request #73 from mwmbl/completion
Completion
f7660bcd278c20c4cc2c56b370bdf9688d52495d authored over 2 years ago
Suggest searching Google if there are no search results
627f82d19f491c63cc582cb086d0b8f26ebfa824 authored over 2 years ago
Search Google if there are no results
f1c77d1389e322913de496f1770e04ce8aba63b9 authored over 2 years ago
Exclude web.archive.org as we're only crawling that right now
fe5eff7b641767905246a6eaf07f9655e6fde49b authored over 2 years ago
Require matching at least half the terms
00705703f3361b232d8e395dbe7bbe51c599bf9c authored over 2 years ago
Restrict to https and strip the prefix and / on the end
eda78707885caa17ec11a492e8b727651a4cc645 authored over 2 years ago
Simplify completions
23e47e963b66c5cc6d78d8b8daa02dcb5aca6348 authored over 2 years ago
Merge pull request #72 from mwmbl/improve-ranking-with-multi-term-search
Improve ranking with multi term search
c6773b46c4751061351dcf3dacb00104dcc116ea authored over 2 years ago
Improve printing of search results in script
74107667b47c8b06dc1f4944f2ad68d7c1d212ac authored over 2 years ago
Use heuristic ranker
3bcb7f42c1005904963e1d131e2482fe520fed25 authored over 2 years ago
Add new LTR model
c1b9e70743131be8e15dcab38a8ab99902efe422 authored over 2 years ago
Tweak features
57476ed2c887f690104321a8f402d4447b27d200 authored over 2 years ago
Get best-performing configuration
c99e813398eaad8e33185c07f19de32c54eecaf8 authored over 2 years ago
Add in match score feature (although it hurts the results)
8b50643303f03cf34cbd395df9a3ad6f7eb4ba76 authored over 2 years ago
Create a get_features function and make it work like the heuristic approach
c60b73a403a8af0fc98a19b20c4404d09e62d798 authored over 2 years ago
New LTR model trained on more data
c1d361c0a0b8c3aa2c32a10061365352e076efc1 authored over 2 years ago
Search for the term itself as well as its completion
b99d9d1c6a6774a426f98f63a7a99d5e3619a2fe authored over 2 years ago
Allow running with no background script
f40d82c4495af46d605a2609c0f14618ef8fa713 authored over 2 years ago
Merge pull request #71 from mwmbl/fix-missing-scores
Store the best items, not the worst ones
046f86f7e3ddb72334ad1935bf1c74fb5c1eabbc authored over 2 years ago
Store the best items, not the worst ones
ae658906dd82094a2016a675ee8dcbf990a29eb9 authored over 2 years ago
Merge pull request #70 from mwmbl/reduce-new-batch-contention
Reduce new batch contention
aa5878fd2f39c5657dd5cd716838c12a0f46a831 authored over 2 years ago
Reinstate correct num_pages
fc1742e24f5c278dbd0763e25df9975c33f79344 authored over 2 years ago
Use an in-memory queue
bb5186196f327beb0189ced1f3c92ff3400554b9 authored over 2 years ago
Use a randomised timeout for getting a new batch
62ba9ddc7ec5441ea36fdb47363196e2706a1e94 authored over 2 years ago
Merge pull request #69 from mwmbl/reduce-contention-for-client-queries
Reduce contention for client queries
a54e093cf1229fae715143d83296bd063b493731 authored over 2 years ago
Get URL scores in batches
2942d83673d0a3ea1d1cd4bfaecf322de3eedcca authored over 2 years ago
Use correct index path; retrieve historical batches
3709cb236f16ed9ce8672893167ed5ccfbb062ca authored over 2 years ago
args.index no longer exists
063ebb4504f679783abd4738e6828f4bb3b68dcd authored over 2 years ago
Double index size
ea32c0ba0089601263115a9d03133f2e4b4b1dc3 authored over 2 years ago
More threads for retrieving batches
2d5235f6f6504309adeb96d62c49377d8661f2f7 authored over 2 years ago
Delete unused SQL
218d87365442ec9ae2e97e1428ca8a946804bd62 authored over 2 years ago
Index batches in memory
6209382d76845f51c7a9af48c07a752bca68efb2 authored over 2 years ago
Implement new indexing approach
1bceeae3dfb393d6d9a193706f6b39a3400594e0 authored over 2 years ago
Use URL path to store locally so that we can easily get a local path from a URL
a8a6c6723985b8a86c9968ccb266b132f6a63941 authored over 2 years ago
Implement a batch cache to store files locally before preprocessing
0d1e7d841c57c24dc6f4469b13ed4c8c81940198 authored over 2 years ago
Merge pull request #68 from mwmbl/fix-missing-query
Fix missing query
27a4784d08e99b65275b17918396be99838ad43d authored over 2 years ago
Log at info level
5ce333cc9aaa368413743af22ac1859058c19822 authored over 2 years ago
Allow more tries so that popular terms can be indexed
a097ec9fbe889285eda5ca47db43ef00e13bcc39 authored over 2 years ago