MoinMoin Wiki 转换 Text 脚本

我使用 MoinMoin 作为我的 KMS，具体使用方式可以参考 [使用MoinMoin作为个人KMS][moin-kms]。另外，我将 MoinMoin 的 data 目录链接到 Dropbox 同步目录下面，从而可以备份并查看我的知识库。

我想同步阅读 KMS 数据

想在移动设备阅读 KMS 数据时，我遇到一个麻烦，MoinMoin 使用文件来保存 wiki 数据，比如「28个Unix.Linux的命令行神器」这篇 wiki 目录在 data/28(e4b8aa)Unix(2e)Linux(e79a84e591bde4bba4e8a18ce7a59ee599a8) 中，结构如下：

|---revisions
|        |---00000002
|        |---00000001
|---edit-log
|---current
|---attachments
|          |---http___coolshell.cn_wp-content_uploads_2012_07_xargs_screenshot.png
|          |---http___coolshell.cn_wp-content_uploads_2012_07_sl.jpg
|          |---http___coolshell.cn_wp-content_uploads_2012_07_mtr_screenshot.png
|          |---http___coolshell.cn_wp-content_uploads_2012_07_lftp_screenshot.png
|          |---http___coolshell.cn_wp-content_uploads_2012_07_htop_screenshot.png
|          |---http___coolshell.cn_wp-content_uploads_2012_07_calcurse_screenshot.png
|          |---http___coolshell.cn_wp-content_uploads_2012_07_multitail_screenshot.png
|          |---http___coolshell.cn_wp-content_uploads_2012_07_ack_screenshot.png
|          |---http___coolshell.cn_wp-content_uploads_2012_07_tpp_screenshot.png
|          |---http___coolshell.cn_wp-content_uploads_2012_07_powertop_screenshot.png
|          |---http___coolshell.cn_wp-content_uploads_2012_07_newsbeuter_screenshot.png
|          |---http___coolshell.cn_wp-content_uploads_2012_07_socat_screenshot.png
|          |---http___coolshell.cn_wp-content_uploads_2012_07_siege_screenshot.png
|          |---http___coolshell.cn_wp-content_uploads_2012_07_duplicity_screenshot.png
|          |---http___coolshell.cn_wp-content_uploads_2012_07_ipbt_screenshot.png
|          |---http___coolshell.cn_wp-content_uploads_2012_07_iftop_screenshot.png
|          |---http___coolshell.cn_wp-content_uploads_2012_07_curl_screenshot.png
|          |---http___coolshell.cn_wp-content_uploads_2012_07_iptraf-tcpudp.gif
|          |---http___coolshell.cn_wp-content_uploads_2012_07_vim_screenshot.png
|          |---http___coolshell.cn_wp-content_uploads_2012_07_earthquake.jpg
|          |---http___coolshell.cn_wp-content_uploads_2012_07_tmux3.png
|          |---http___coolshell.cn_wp-content_uploads_2012_07_cowsay_screenshot.png
|          |---http___coolshell.cn_wp-content_uploads_2012_07_TaskWarrior2.0.png
|          |---http___coolshell.cn_wp-content_uploads_2012_07_vifm_screenshot.png
|          |---http___coolshell.cn_wp-content_uploads_2012_07_taskwarrior_screenshot.png
|          |---http___coolshell.cn_wp-content_uploads_2012_07_linuxlogo.jpg
|          |---http___coolshell.cn_wp-content_uploads_2012_07_ranger.png
|          |---http___coolshell.cn_wp-content_uploads_2012_07_nethack_screenshot.png
|          |---http___coolshell.cn_wp-content_uploads_2012_07_rtorrent_screenshot.png
|          |---http___coolshell.cn_wp-content_uploads_2012_07_dtach+dvtm.png
|          |---http___coolshell.cn_wp-content_uploads_2012_07_gnu_screen_screenshot.png
|          |---http___coolshell.cn_wp-content_uploads_2012_07_ledger_screenshot.png
|          |---http___coolshell.cn_wp-content_uploads_2012_07_rsync_screenshot.png
|          |---http___coolshell.cn_wp-content_uploads_2012_07_byobu-tmux.jpg
|          |---http___coolshell.cn_wp-content_uploads_2012_07_ttytter_screenshot.png
|---cache
|    |---text_html
|    |---pagelinks

可以看到，中文名称的 wiki 在存储时候，MoinMoin 会将中文保存为 utf-8 码，不能直接阅读。并且我需要将目录下面有多级目录，阅读麻烦。

一个脚本

为了解决这个问题，我写了如下小脚本，帮我解决这个问题：

#!/usr/bin/env python2
# coding=utf-8

# convert MoinMoin wiki to text
# for moinmoin 1.9.x
# author: alswl
# update at: 2012-07-22

import sys
import os
import argparse
import binascii
import re
import shutil

IS_DECODE_PATH = False

def convert(root, dir, target):
    name = name_decode(dir)
    if not name:
        return
    dst = os.path.join(target, name.replace('/', '-') + '.txt')
    try:
        version = open(os.path.join(root, dir, 'current'),
                       'r').readline().strip()
        src = os.path.join(root, dir, 'revisions', version)
        shutil.copyfile(src, dst)
    except IOError, e:
        if IS_DECODE_PATH:
            dir = dir.replace('(', r'\(').replace(')', r'\)')
        sys.stderr.write('File %s, Name: %s, Message: %s\n'
                         %(dir, name, str(e)))

def name_decode(name):
    raw = ''
    lastpos = 0
    ENCODE_RE = re.compile(r'\(([\w\d]+)\)')
    match =ENCODE_RE.search(name)
    while(match):
        raw += name[lastpos : lastpos + match.start()]
        raw += binascii.unhexlify(match.groups()[0])
        lastpos += match.end()
        match = ENCODE_RE.search(name[lastpos:])
    return raw

def walk(path, target):
    for dir in os.listdir(path):
        convert(path, dir, target)

def main():
    parser = argparse.ArgumentParser(
        description='Convert moin wiki to text archieves'
        )
    parser.add_argument('--input', '-i',
                        help='the path of moinmoin/data/pages',
                        type=str,
                        required=True)
    parser.add_argument('--output', '-o',
                        help='the path os target',
                        type=str,
                        required=True)
    args = parser.parse_args()

    walk(args.input, args.output)

if __name__ == '__main__':
    main()

使用帮助：

usage: moin2txt.py [-h] --input INPUT --output OUTPUT

Convert moin wiki to text archieves

optional arguments:
  -h, --help            show this help message and exit
  --input INPUT, -i INPUT
                        the path of moinmoin/data/pages
  --output OUTPUT, -o OUTPUT
                        the path os target

使用范例：

python moin2txt.py -i /your/moin/path/data/pages -o /your/dropbox/path/kms

命令运行完，就能在对应目录生成一坨 txt 文件，文件名还是中文的，Mission complete.

我将这个命令加入了 cron，每天执行一次，保证 Dropbox 中是最新的 wiki。

原文链接: MoinMoin Wiki 转换 Text 脚本 | Log4D

3a1ff193cee606bd1e2ea554a16353ee

欢迎关注我的微信公众号：窥豹

我想同步阅读 KMS 数据#

一个脚本#

我想同步阅读 KMS 数据

一个脚本