Enabling search across all sites

These notes describe the setup on the docs machine that supports search across all sites under a given entry point, that is, https://docs.mbari.org/ or https://docs.mbari.org/internal/.

Meilisearch

Note

  • Under testing.
  • The steps here might be incomplete; to be updated.

Ref: https://docs.meilisearch.com/

2022-02-07 - Status of our setup:

  • Basic but functional (for both public and internal sites)
  • Configuration files (*-meili.json) for the docs-scraper tool are very basic.
    • TODO: adjustments to better extract the information, indicate "levels," etc., so the search results are better presented to the user.
  • The Meilisearch service is started by an @reboot cronjob.
  • Scraper processes for both the public and internal sites are for now launched via a cronjob every three hours.
    • TODO: determine an appropriate mechanism to update the relevant index soon after a site is deployed.

Meilisearch Server

Setting up the server

Note: At the moment we compile Meilisearch from source, but published binaries could be used as well.

cd ~/meilisearch/
git clone https://github.com/meilisearch/meilisearch.git
cd ~/meilisearch/meilisearch/
cargo build --release
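A quick way to confirm the build produced a working binary (the path follows from the steps above; this assumes the binary supports the conventional --version flag):

```shell
# Check that the release binary exists and runs (path assumed from the build above)
BIN="$HOME/meilisearch/meilisearch/target/release/meilisearch"
if [ -x "$BIN" ]; then
  "$BIN" --version
else
  echo "binary not found: $BIN"
fi
```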

Running the server

cd ~/meilisearch/meilisearch/
export MEILISEARCH_API_KEY=myMasterKey
target/release/meilisearch --no-analytics
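Once the server is up, a quick sanity check (this assumes the default port 7700; /health is a Meilisearch endpoint that does not require an API key):

```shell
# Expect a small JSON status payload if the server is running;
# fall back to a message if nothing is listening on 7700.
curl -s http://localhost:7700/health || echo "server not reachable"
```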

Cronjob

An @reboot cronjob was set up to launch the service; see meilisearch_server_cronjob.sh:

@reboot /home/docsadm/mkdocs/docs-mbari-org-webhook-doc/bin/meilisearch_server_cronjob.sh
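The script itself is not reproduced in these notes; a minimal sketch of what it presumably does, with the paths and log file name being assumptions based on the steps above:

```shell
#!/bin/bash
# Hypothetical sketch of meilisearch_server_cronjob.sh -- not the actual script.
# Binary path and key value assumed from the setup steps in these notes.
export MEILISEARCH_API_KEY=myMasterKey
BIN="$HOME/meilisearch/meilisearch/target/release/meilisearch"
LOG="$HOME/meilisearch/meilisearch.log"
if [ -x "$BIN" ]; then
  # Run in the background and append output to a log file
  nohup "$BIN" --no-analytics >> "$LOG" 2>&1 &
  echo "meilisearch started (pid $!)"
else
  echo "binary not found: $BIN (nothing started)"
fi
```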

Proxy-pass

In /etc/httpd/conf.d/ssl.conf:

<Location /meilisearch/>
  ProxyPass        http://localhost:7700/
  ProxyPassReverse http://localhost:7700/
</Location>

This allows the service to be accessible externally at https://docs.mbari.org/meilisearch/, which is needed for the external search pages to work.
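With the proxy in place, the same /health endpoint should answer externally; a quick check (assumes the server is running and the Apache config has been reloaded):

```shell
# Same /health check as locally, but through the Apache proxy-pass
curl -s --max-time 5 https://docs.mbari.org/meilisearch/health || echo "not reachable"
```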

Scraper

Setting up docs-scraper

Ref: https://github.com/meilisearch/docs-scraper

cd ~/meilisearch/
git clone https://github.com/meilisearch/docs-scraper.git
cd ~/meilisearch/docs-scraper
python3.9 -m pip install --user pipx
~/.local/bin/pipx install pipenv
PATH=/home/docsadm/.local/bin:$PATH
pipenv install

Scraping

cd ~/meilisearch/docs-scraper
export PATH=/home/docsadm/.local/bin:$PATH
export MEILISEARCH_API_KEY=myMasterKey
export MEILISEARCH_HOST_URL=http://localhost:7700

Public sites:

pipenv run ./docs_scraper ./public-docs-meili.json

Internal sites:

pipenv run ./docs_scraper ./internal-docs-meili.json
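For reference, a hypothetical configuration along the lines of the docs-scraper format — the index name, URL, and CSS selectors below are placeholders for illustration, not the actual contents of public-docs-meili.json or internal-docs-meili.json:

```json
{
  "index_uid": "public-docs",
  "start_urls": ["https://docs.mbari.org/example-site/"],
  "selectors": {
    "lvl0": ".md-nav__title",
    "lvl1": ".md-content h1",
    "lvl2": ".md-content h2",
    "lvl3": ".md-content h3",
    "text": ".md-content p, .md-content li"
  }
}
```

The lvl0–lvl3 selectors are what give the search results their hierarchical "levels," which is the adjustment noted in the TODO above.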

Cronjob

For the moment, a cronjob runs the scraping every three hours; see meilisearch_scraper_cronjob.sh:

42 */3 * * * /home/docsadm/mkdocs/docs-mbari-org-webhook-doc/bin/meilisearch_scraper_cronjob.sh
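As with the server cronjob, the script is not reproduced here; a minimal sketch of what it presumably does, combining the environment setup and the two scraper invocations above (paths are assumptions):

```shell
#!/bin/bash
# Hypothetical sketch of meilisearch_scraper_cronjob.sh -- not the actual script.
export PATH=/home/docsadm/.local/bin:$PATH
export MEILISEARCH_API_KEY=myMasterKey
export MEILISEARCH_HOST_URL=http://localhost:7700
SCRAPER_DIR="$HOME/meilisearch/docs-scraper"
if [ -d "$SCRAPER_DIR" ]; then
  cd "$SCRAPER_DIR"
  # Re-index the public and internal sites
  pipenv run ./docs_scraper ./public-docs-meili.json
  pipenv run ./docs_scraper ./internal-docs-meili.json
else
  echo "scraper directory not found: $SCRAPER_DIR (nothing scraped)"
fi
```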

Including the search field

The generate_site_list.py script has been updated to include the search field in the site listing index.html files.
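The actual markup emitted by generate_site_list.py is not shown in these notes; a hypothetical sketch of such a field wired to the proxied server, using Meilisearch's docs-searchbar.js frontend (all values below are placeholders):

```html
<!-- Hypothetical search field -- not the actual output of generate_site_list.py -->
<input type="search" id="search-bar-input" placeholder="Search docs...">
<script src="https://cdn.jsdelivr.net/npm/docs-searchbar.js@latest/dist/cdn/docs-searchbar.min.js"></script>
<script>
  docsSearchBar({
    hostUrl: 'https://docs.mbari.org/meilisearch',  // the proxy-pass endpoint above
    apiKey: '<public search key>',                  // placeholder
    indexUid: 'public-docs',                        // placeholder index name
    inputSelector: '#search-bar-input'
  });
</script>
```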