跳到主要内容

5 篇博文 含有标签「docusaurus」

查看所有标签

Docusaurus With Algolia

· 阅读需 5 分钟
Hui Liu
Author

1. 注册Algolia账号,申请 Docsearch

Docusaurus集成Algolia

  1. 使用Github账号注册Algolia
  2. Docsearch申请,需要填写以下内容
    1. Github pages 地址: https://lorchr.github.io/light-docusaurus/
    2. 邮箱 whitetulips@163.com
    3. 仓库地址 https://github.com/lorchr/light-docusaurus
  3. 等待审核,审核通过会收到一封邮件

2. 创建 Application

  1. 登录Algolia官网
  2. 创建Application,设置名称,选择免费计划
  3. 控制台打开,设置页面,点击 API Keys,记录以下内容
    1. Application ID
    2. Search-Only API Key
    3. Admin API Key

3. Web端部署爬虫

  1. 在DocSearch审核通过后,建立应用及索引,获取APPLICAITON_IDAPI_KEY
  2. 使用Algolia账号登录Algolia Crawler
  3. 使用官方模板配置Docusaurus v3 template部署爬虫
new Crawler({
appId: 'TLGHDZ3Y2I',
apiKey: '0b9a9b1f4fd5fbe9a1962088169c1262',
rateLimit: 8,
maxDepth: 10,
startUrls: ['https://lorchr.github.io/light-docusaurus/'],
sitemaps: ['https://lorchr.github.io/light-docusaurus/sitemap.xml'],
ignoreCanonicalTo: true,
discoveryPatterns: ['https://lorchr.github.io/light-docusaurus/**'],
actions: [
{
indexName: 'light-docusaurus',
pathsToMatch: ['https://lorchr.github.io/light-docusaurus/**'],
recordExtractor: ({ $, helpers }) => {
// priority order: deepest active sub list header -> navbar active item -> 'Documentation'
const lvl0 =
$(
'.menu__link.menu__link--sublist.menu__link--active, .navbar__item.navbar__link--active'
)
.last()
.text() || 'Documentation';

return helpers.docsearch({
recordProps: {
lvl0: {
selectors: '',
defaultValue: lvl0,
},
lvl1: ['header h1', 'article h1'],
lvl2: 'article h2',
lvl3: 'article h3',
lvl4: 'article h4',
lvl5: 'article h5, article td:first-child',
lvl6: 'article h6',
content: 'article p, article li, article td:last-child',
},
indexHeadings: true,
aggregateContent: true,
recordVersion: 'v3',
});
},
},
],
initialIndexSettings: {
YOUR_INDEX_NAME: {
attributesForFaceting: [
'type',
'lang',
'language',
'version',
'docusaurus_tag',
],
attributesToRetrieve: [
'hierarchy',
'content',
'anchor',
'url',
'url_without_anchor',
'type',
],
attributesToHighlight: ['hierarchy', 'content'],
attributesToSnippet: ['content:10'],
camelCaseAttributes: ['hierarchy', 'content'],
searchableAttributes: [
'unordered(hierarchy.lvl0)',
'unordered(hierarchy.lvl1)',
'unordered(hierarchy.lvl2)',
'unordered(hierarchy.lvl3)',
'unordered(hierarchy.lvl4)',
'unordered(hierarchy.lvl5)',
'unordered(hierarchy.lvl6)',
'content',
],
distinct: true,
attributeForDistinct: 'url',
customRanking: [
'desc(weight.pageRank)',
'desc(weight.level)',
'asc(weight.position)',
],
ranking: [
'words',
'filters',
'typo',
'attribute',
'proximity',
'exact',
'custom',
],
highlightPreTag: '<span class="algolia-docsearch-suggestion--highlight">',
highlightPostTag: '</span>',
minWordSizefor1Typo: 3,
minWordSizefor2Typos: 7,
allowTyposOnNumericTokens: false,
minProximity: 1,
ignorePlurals: true,
advancedSyntax: true,
attributeCriteriaComputedByMinProximity: true,
removeWordsIfNoResults: 'allOptional',
separatorsToIndex: '_',
},
},
});

4. 配置 Docusaurus

  1. 配置 .env (键值不带双引号)
APPLICATION_ID=Application ID
API_KEY=Admin API Key # 务必确认, 这是坑点 不要用 'Write API Key' 或者 'Search API Key'
  1. docusaurus.config.js
module.exports = {
// ...
presets: [[
// ...
"classic",
/** @type {import('@docusaurus/preset-classic').Options} */
({
// 这个插件会为你的站点创建一个站点地图
// 以便搜索引擎的爬虫能够更准确地爬取你的网站
sitemap: {
changefreq: "weekly",
priority: 0.5,
ignorePatterns: ["/tags/**"],
filename: "sitemap.xml",
},
})
]],
// ...
themeConfig: {
// ...
algolia: {
appId: 'YOUR_APP_ID', // Application ID
// 公开 API密钥:提交它没有危险
apiKey: 'YOUR_SEARCH_API_KEY', // Search-Only API Key
indexName: 'YOUR_INDEX_NAME'
},
}
}
  1. docsearch-config.json (爬虫配置文件)

需修改3处:

  • index_name
  • start_urls
  • sitemap_urls
{
"index_name": "light-docusaurus",
"start_urls": [
"https://lorchr.github.io/light-docusaurus/"
],
"sitemap_urls": [
"https://lorchr.github.io/light-docusaurus/sitemap.xml"
],
"sitemap_alternate_links": true,
"stop_urls": [
"/tests"
],
"selectors": {
"lvl0": {
"selector": "(//ul[contains(@class,'menu__list')]//a[contains(@class, 'menu__link menu__link--sublist menu__link--active')]/text() | //nav[contains(@class, 'navbar')]//a[contains(@class, 'navbar__link--active')]/text())[last()]",
"type": "xpath",
"global": true,
"default_value": "Documentation"
},
"lvl1": "header h1",
"lvl2": "article h2",
"lvl3": "article h3",
"lvl4": "article h4",
"lvl5": "article h5, article td:first-child",
"lvl6": "article h6",
"text": "article p, article li, article td:last-child"
},
"strip_chars": " .,;:#",
"custom_settings": {
"separatorsToIndex": "_",
"attributesForFaceting": [
"language",
"version",
"type",
"docusaurus_tag"
],
"attributesToRetrieve": [
"hierarchy",
"content",
"anchor",
"url",
"url_without_anchor",
"type"
]
},
"js_render": true,
"conversation_id": [
"833762294"
],
"nb_hits": 46250
}

5. 执行爬虫程序 - docsearch-scraper

1. 本地 执行爬虫

前置条件:

jq安装完成后, 在命令行执行 爬虫脚本

docker run -it --env-file=.env -e "CONFIG=$(cat docsearch-config.json | jq -r tostring)" algolia/docsearch-scraper

等待 容器运行完成, 如下即可

Getting https://lorchr.github.io/light-docusaurus/docs/react/hooks/custom-hooks from selenium
Getting https://lorchr.github.io/light-docusaurus/docs/react/hooks/useMemo from selenium
Getting https://lorchr.github.io/light-docusaurus/docs/react/hooks/useCallback from selenium
Getting https://lorchr.github.io/light-docusaurus/docs/javascript/versions/es-2016 from selenium
Getting https://lorchr.github.io/light-docusaurus/docs/javascript/versions/es-2015 from selenium
> DocSearch: https://lorchr.github.io/light-docusaurus/docs/plugins-and-libraries/big-screen/ 17 records)
> DocSearch: https://lorchr.github.io/light-docusaurus/docs/server/nginx/nginx-forward-proxy-vs-reverse-proxy/ 8 records)
> DocSearch: https://lorchr.github.io/light-docusaurus/docs/category/caddy/ 3 records)
> DocSearch: https://lorchr.github.io/light-docusaurus/docs/category/nginx/ 5 records)

Nb hits: 1369

2. GitHub Actions 执行爬虫

.github/workflows/ 文件夹下 创建 docsearch-scraper.yml, 用来定义 GitHub Actions 工作流

docsearch-scraper.yml

name: DocSearch-Scraper

on:
push:
branches: [main]
pull_request:
branches: [main]

jobs:
scan:
runs-on: ubuntu-latest

steps:
- name: Sleep for 10 seconds
run: sleep 10s
shell: bash

- name: Checkout repo
uses: actions/checkout@v3

- name: Run scraper
env:
APPLICATION_ID: ${{ secrets.ALGOLIA_APPLICATION_ID }}
API_KEY: ${{ secrets.ALGOLIA_API_KEY }}
run: |
CONFIG="$(cat docsearch-config.json)"
docker run -i --rm \
-e APPLICATION_ID=$APPLICATION_ID \
-e API_KEY=$API_KEY \
-e CONFIG="${CONFIG}" \
algolia/docsearch-scraper

然后在 GitHub 的 Secrets 创建

  • APPLICATION_ID
  • API_KEY

当使用 Git 推送项目到 GitHub时, Actions就会自动执行 爬虫任务

在 Docusaurus v2 中使用 Algolia DocSearch搜索功能

Welcome

· 阅读需 1 分钟
Sébastien Lorber
Docusaurus maintainer
Yangshun Tay
Front End Engineer @ Facebook

Docusaurus blogging features are powered by the blog plugin.

Simply add Markdown files (or folders) to the blog directory.

Regular blog authors can be added to authors.yml.

The blog post date can be extracted from filenames, such as:

  • 2019-05-30-welcome.md
  • 2019-05-30-welcome/index.md

A blog post folder can be convenient to co-locate blog post images:

Docusaurus Plushie

The blog supports tags as well!

And if you don't want a blog: just delete this directory, and use blog: false in your Docusaurus config.

First Blog Post

· 阅读需 1 分钟
Gao Wei
Docusaurus Core Team

Lorem ipsum dolor sit amet, consectetur adipiscing elit. Pellentesque elementum dignissim ultricies. Fusce rhoncus ipsum tempor eros aliquam consequat. Lorem ipsum dolor sit amet