Enabling search across all sites
These notes describe the setup on the docs machine that supports
search across all sites under a given entry point, that is,
https://docs.mbari.org/ or https://docs.mbari.org/internal/.
Meilisearch
Note
- Under testing.
- The steps here might be incomplete; to be updated.
Ref: https://docs.meilisearch.com/
2022-02-07 - Status of our setup:
- Basic but functional (for both public and internal sites).
- Configuration files (`*-meili.json`) for the `docs-scraper` tool are very basic (see the sketch after this list).
    - TODO: adjustments to better extract the information, indicate "levels," etc., so the search results are better presented to the user.
- Meilisearch service started at `@reboot` time.
- Launching of the scraper processes for both public and internal sites is for now done via a cronjob every couple of hours.
    - TODO: determine an appropriate mechanism to update the relevant index soon after a site is deployed.
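For reference, docs-scraper configuration files follow the format described in the docs-scraper README (index_uid, start_urls, and CSS selectors per heading level). A minimal sketch is shown below; the file name, index name, start URL, and selectors are illustrative assumptions, not the values actually in use:

# Illustrative sketch only; the real *-meili.json files live in the docs-scraper checkout.
cat > example-meili.json <<'EOF'
{
  "index_uid": "docs-public",
  "start_urls": ["https://docs.mbari.org/"],
  "selectors": {
    "lvl0": ".md-nav__item--active > .md-nav__link",
    "lvl1": ".md-content h1",
    "lvl2": ".md-content h2",
    "lvl3": ".md-content h3",
    "text": ".md-content p, .md-content li"
  }
}
EOF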
Meilisearch Server
Setting up the server
Note: At the moment, we compile Meilisearch from source, but there are published binaries that could be used instead.
cd ~/meilisearch/
git clone https://github.com/meilisearch/meilisearch.git
cd ~/meilisearch/meilisearch/
cargo build --release
Running the server
cd ~/meilisearch/meilisearch/
export MEILISEARCH_API_KEY=myMasterKey
target/release/meilisearch --no-analytics
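To check that the server is up, the standard Meilisearch health endpoint can be queried:

curl http://localhost:7700/health
# expected response: {"status":"available"}

Note that the Meilisearch binary itself reads the master key from the MEILI_MASTER_KEY environment variable (or the --master-key option), while MEILISEARCH_API_KEY is the variable read by docs-scraper (see below); it may be worth double-checking which one the running instance actually honors.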
Cronjob
A @reboot cronjob was set up to launch the service, see meilisearch_server_cronjob.sh:
@reboot /home/docsadm/mkdocs/docs-mbari-org-webhook-doc/bin/meilisearch_server_cronjob.sh
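The script itself is not reproduced here. A hypothetical sketch of what such a launcher might contain, based on the manual steps above (paths, log file, and key handling are assumptions):

#!/usr/bin/env bash
# Hypothetical sketch of meilisearch_server_cronjob.sh, not the actual script.
cd /home/docsadm/meilisearch/meilisearch/
export MEILISEARCH_API_KEY=myMasterKey   # as in the manual steps; see the master-key note above
nohup target/release/meilisearch --no-analytics >> /home/docsadm/meilisearch/server.log 2>&1 &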
Proxy-pass
In /etc/httpd/conf.d/ssl.conf:
<Location /meilisearch/>
ProxyPass http://localhost:7700/
ProxyPassReverse http://localhost:7700/
</Location>
This allows the service to be accessible externally at https://docs.mbari.org/meilisearch/,
which is needed for the external search pages to work.
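A quick end-to-end check of the proxy is to hit the health endpoint through the public URL:

curl https://docs.mbari.org/meilisearch/health
# should return the same {"status":"available"} as the direct localhost query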
Scraper
Setting up docs-scraper
Ref: https://github.com/meilisearch/docs-scraper
cd ~/meilisearch/
git clone https://github.com/meilisearch/docs-scraper.git
cd ~/meilisearch/docs-scraper
python3.9 -m pip install --user pipx
~/.local/bin/pipx install pipenv
PATH=/home/docsadm/.local/bin:$PATH
pipenv install
Scraping
cd ~/meilisearch/docs-scraper
export PATH=/home/docsadm/.local/bin:$PATH
export MEILISEARCH_API_KEY=myMasterKey
export MEILISEARCH_HOST_URL=http://localhost:7700
Public sites:
pipenv run ./docs_scraper ./public-docs-meili.json
Internal sites:
pipenv run ./docs_scraper ./internal-docs-meili.json
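After a run, an index can be spot-checked with a direct search query against the Meilisearch API. The index name below is an assumption and must match the index_uid in the corresponding *-meili.json file; depending on the Meilisearch version, the key goes in an Authorization: Bearer header or an X-Meili-API-Key header:

curl -H "Authorization: Bearer $MEILISEARCH_API_KEY" \
  "$MEILISEARCH_HOST_URL/indexes/docs-public/search?q=test"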
Cronjob
For the moment, we have set up a cronjob to run the scraping every
couple of hours, see meilisearch_scraper_cronjob.sh:
42 */3 * * * /home/docsadm/mkdocs/docs-mbari-org-webhook-doc/bin/meilisearch_scraper_cronjob.sh
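As with the server, the script is not reproduced here; a hypothetical sketch of what it might contain, mirroring the manual steps above:

#!/usr/bin/env bash
# Hypothetical sketch of meilisearch_scraper_cronjob.sh, not the actual script.
export PATH=/home/docsadm/.local/bin:$PATH
export MEILISEARCH_API_KEY=myMasterKey
export MEILISEARCH_HOST_URL=http://localhost:7700
cd /home/docsadm/meilisearch/docs-scraper
pipenv run ./docs_scraper ./public-docs-meili.json
pipenv run ./docs_scraper ./internal-docs-meili.json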
Including the search field
The generate_site_list.py script has been updated to include the search field in the site listing index.html files.
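For context, docs-scraper is designed to pair with the docs-searchbar.js front-end library, so "including the search field" typically amounts to an input element plus a small script. The sketch below is an illustration only (CDN paths per the docs-searchbar.js README; the key, index name, and selector are assumptions, and the actual markup is produced by generate_site_list.py):

# Illustration only: the real markup is generated by generate_site_list.py.
cat >> index.html <<'EOF'
<link rel="stylesheet" href="https://cdn.jsdelivr.net/npm/docs-searchbar.js/dist/cdn/docs-searchbar.min.css">
<input type="search" id="search-bar-input" placeholder="Search the docs">
<script src="https://cdn.jsdelivr.net/npm/docs-searchbar.js/dist/cdn/docs-searchbar.min.js"></script>
<script>
  docsSearchBar({
    hostUrl: 'https://docs.mbari.org/meilisearch',  // the proxy path set up above
    apiKey: 'searchOnlyKey',                        // a search-only key, not the master key
    indexUid: 'docs-public',                        // must match the scraper's index_uid
    inputSelector: '#search-bar-input'
  });
</script>
EOF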