Enabling search across all sites
These notes describe the setup on the docs machine that supports
search across all sites under a given entry point, that is,
https://docs.mbari.org/ or https://docs.mbari.org/internal/.
Meilisearch
Note
- Under testing.
- The steps here might be incomplete; to be updated.
Ref: https://docs.meilisearch.com/
2022-02-07 - Status of our setup:
- Basic but functional (for both public and internal sites).
- Configuration files (`*-meili.json`) for the `docs-scraper` tool are very basic (see the sketch after this list).
    - TODO: adjustments to better extract the information, indicate "levels," etc., so the search results are better presented to the user.
- Meilisearch service started at `@reboot` time.
- Launching of the scraper processes for both public and internal sites is for now done via a cronjob every couple of hours.
    - TODO: determine an appropriate mechanism to update the relevant index soon after a site is deployed.
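For reference, docs-scraper configuration files follow the format described in the docs-scraper README (index_uid, start_urls, and CSS selectors per heading level). A minimal sketch is shown below; the file name, index name, start URL, and selectors are illustrative assumptions, not the values actually in use:

# Illustrative sketch only; the real *-meili.json files live in the docs-scraper checkout.
cat > example-meili.json <<'EOF'
{
  "index_uid": "docs-public",
  "start_urls": ["https://docs.mbari.org/"],
  "selectors": {
    "lvl0": ".md-nav__item--active > .md-nav__link",
    "lvl1": ".md-content h1",
    "lvl2": ".md-content h2",
    "lvl3": ".md-content h3",
    "text": ".md-content p, .md-content li"
  }
}
EOF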
Meilisearch Server
Setting up the server
Note: At the moment, we compile Meilisearch from source, but there are published binaries that could be used instead.
cd ~/meilisearch/
git clone https://github.com/meilisearch/meilisearch.git
cd ~/meilisearch/meilisearch/
cargo build --release
Running the server
cd ~/meilisearch/meilisearch/
export MEILISEARCH_API_KEY=myMasterKey
target/release/meilisearch --no-analytics
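To check that the server is up, the standard Meilisearch health endpoint can be queried:

curl http://localhost:7700/health
# expected response: {"status":"available"}

Note that the Meilisearch binary itself reads the master key from the MEILI_MASTER_KEY environment variable (or the --master-key option), while MEILISEARCH_API_KEY is the variable read by docs-scraper (see below); it may be worth double-checking which one the running instance actually honors.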
Cronjob
A @reboot cronjob was set up to launch the service, see meilisearch_server_cronjob.sh:
@reboot /home/docsadm/mkdocs/docs-mbari-org-webhook-doc/bin/meilisearch_server_cronjob.sh
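The script itself is not reproduced here. A hypothetical sketch of what such a launcher might contain, based on the manual steps above (paths, log file, and key handling are assumptions):

#!/usr/bin/env bash
# Hypothetical sketch of meilisearch_server_cronjob.sh, not the actual script.
cd /home/docsadm/meilisearch/meilisearch/
export MEILISEARCH_API_KEY=myMasterKey   # as in the manual steps; see the master-key note above
nohup target/release/meilisearch --no-analytics >> /home/docsadm/meilisearch/server.log 2>&1 &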
Proxy-pass
In /etc/httpd/conf.d/ssl.conf:
<Location /meilisearch/>
ProxyPass http://localhost:7700/
ProxyPassReverse http://localhost:7700/
</Location>
This allows the service to be accessible externally at https://docs.mbari.org/meilisearch/,
which is needed for the external search pages to work.
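A quick end-to-end check of the proxy is to hit the health endpoint through the public URL:

curl https://docs.mbari.org/meilisearch/health
# should return the same {"status":"available"} as the direct localhost query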
Scraper
Setting up docs-scraper
Ref: https://github.com/meilisearch/docs-scraper
cd ~/meilisearch/
git clone https://github.com/meilisearch/docs-scraper.git
cd ~/meilisearch/docs-scraper
python3.9 -m pip install --user pipx
~/.local/bin/pipx install pipenv
PATH=/home/docsadm/.local/bin:$PATH
pipenv install
Scraping
cd ~/meilisearch/docs-scraper
export PATH=/home/docsadm/.local/bin:$PATH
export MEILISEARCH_API_KEY=myMasterKey
export MEILISEARCH_HOST_URL=http://localhost:7700
Public sites:
pipenv run ./docs_scraper ./public-docs-meili.json
Internal sites:
pipenv run ./docs_scraper ./internal-docs-meili.json
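After a run, an index can be spot-checked with a direct search query against the Meilisearch API. The index name below is an assumption and must match the index_uid in the corresponding *-meili.json file; depending on the Meilisearch version, the key goes in an Authorization: Bearer header or an X-Meili-API-Key header:

curl -H "Authorization: Bearer $MEILISEARCH_API_KEY" \
  "$MEILISEARCH_HOST_URL/indexes/docs-public/search?q=test"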
Cronjob
For the moment, we have set up a cronjob to run the scraping every
couple of hours, see meilisearch_scraper_cronjob.sh:
42 */3 * * * /home/docsadm/mkdocs/docs-mbari-org-webhook-doc/bin/meilisearch_scraper_cronjob.sh
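As with the server, the script is not reproduced here; a hypothetical sketch of what it might contain, mirroring the manual steps above:

#!/usr/bin/env bash
# Hypothetical sketch of meilisearch_scraper_cronjob.sh, not the actual script.
export PATH=/home/docsadm/.local/bin:$PATH
export MEILISEARCH_API_KEY=myMasterKey
export MEILISEARCH_HOST_URL=http://localhost:7700
cd /home/docsadm/meilisearch/docs-scraper
pipenv run ./docs_scraper ./public-docs-meili.json
pipenv run ./docs_scraper ./internal-docs-meili.json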
Including the search field
The generate_site_list.py script has been updated to include the search field in the site listing index.html files.
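For context, docs-scraper is designed to pair with the docs-searchbar.js front-end library, so "including the search field" typically amounts to an input element plus a small script. The sketch below is an illustration only (CDN paths per the docs-searchbar.js README; the key, index name, and selector are assumptions, and the actual markup is produced by generate_site_list.py):

# Illustration only: the real markup is generated by generate_site_list.py.
cat >> index.html <<'EOF'
<link rel="stylesheet" href="https://cdn.jsdelivr.net/npm/docs-searchbar.js/dist/cdn/docs-searchbar.min.css">
<input type="search" id="search-bar-input" placeholder="Search the docs">
<script src="https://cdn.jsdelivr.net/npm/docs-searchbar.js/dist/cdn/docs-searchbar.min.js"></script>
<script>
  docsSearchBar({
    hostUrl: 'https://docs.mbari.org/meilisearch',  // the proxy path set up above
    apiKey: 'searchOnlyKey',                        // a search-only key, not the master key
    indexUid: 'docs-public',                        // must match the scraper's index_uid
    inputSelector: '#search-bar-input'
  });
</script>
EOF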