A Simple Markdown Converter in Python

On a whim today, I wrote a Markdown converter.

import os
import re
import webbrowser
text = '''
# TextHeader
    ## Header1
        List
            - 1 
            - 2
            - 3
        > **quote**
        》 quote2
    ## Header2
        1. *italic*
        2. [@以茄之名](https://www.zhihu.com/people/e4f87c3476a926c1e2ef51b4fcd18fa3)
        3、 ![](https://pic4.zhimg.com/v2-8560440c136c746730a63813ed701f52_is.jpg)
        
    ## Header3 
        `*[article link](https://zhuanlan.zhihu.com/p/39742445)*`
        ·**code1**·
        - [x] Liked?
'''

The program starts with the inline syntax (code, strong, i and so on), which plain regex substitution can handle:

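# Inline code: a backtick or the full-width middle dot may delimit it
text = re.sub(r'[`·]([^`·]+)[`·]', r'<code>\1</code>', text)
# Bold runs before italic so the double asterisks are consumed first
text = re.sub(r'\*\*([^*]+)\*\*', r'<strong>\1</strong>', text)
# Italic: group 1 keeps the character in front of the opening asterisk
text = re.sub(r'([^*])\*([^*]+)\*', r'\1<i>\2</i>', text)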
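To make the ordering of these substitutions easier to try out, here is a minimal sketch that wraps them in a function. The name convert_inline is hypothetical and not part of the original script; the patterns are the same ones used above, written as raw strings.

```python
import re

def convert_inline(text):
    """Replace inline code, bold, and italic markers with HTML tags."""
    # A backtick or the full-width middle dot both delimit inline code.
    text = re.sub(r'[`·]([^`·]+)[`·]', r'<code>\1</code>', text)
    # Bold must run before italic, otherwise **x** would be read as italics.
    text = re.sub(r'\*\*([^*]+)\*\*', r'<strong>\1</strong>', text)
    # The pattern needs one non-asterisk character before the opening '*'.
    text = re.sub(r'([^*])\*([^*]+)\*', r'\1<i>\2</i>', text)
    return text
```

One consequence of the italic pattern: because it requires a character in front of the opening asterisk, italics at the very start of the text are left untouched.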

Next come images and links, which are slightly more complex:

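# Links first; the leading [^!] keeps ![...](...) images from matching here
text = re.sub(r'([^!])\[([^\]]+)\]\(([^)]+)\)',
              r'\1<a href="\3" target="_blank">\2</a>', text)
# Images: only the URL (group 2) is used; the alt text is discarded
text = re.sub(r'!\[([^\]]*)\]\(([^)]+)\)',
              r'<img src="\2" >', text)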
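The interplay between the two patterns is easier to see on a small input. The sketch below assumes a hypothetical helper named convert_links_and_images. Unlike the original, it also writes the captured alt text into an alt attribute, which is my own addition.

```python
import re

def convert_links_and_images(text):
    """Turn [label](url) into <a> and ![alt](url) into <img>."""
    # The leading [^!] group means an image is never mistaken for a link.
    text = re.sub(r'([^!])\[([^\]]+)\]\(([^)]+)\)',
                  r'\1<a href="\3" target="_blank">\2</a>', text)
    # Alt text may be empty, hence [^\]]* instead of [^\]]+.
    # Emitting alt="..." is an assumption; the original drops it.
    text = re.sub(r'!\[([^\]]*)\]\(([^)]+)\)',
                  r'<img src="\2" alt="\1">', text)
    return text
```

As with italics, a link needs at least one character in front of it, so a link at the very start of the text is not converted.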

Then the remaining syntax is handled line by line, so first split the text into lines:

lines = text.split('\n')
html = ''
list_flag = ''

Handle lists and to-do items:

for line in lines:
    line = line.strip(' ')
    # To-do item: "- [ ]" (open) or "- [x]" (done)
    if re.match(r'- \[[ x]\]', line):
        p_html = ''
        if re.match(r'- \[x\]', line):
            p_html = ' checked="checked"'
        line = re.sub(r'^- \[[ x]\]', '', line)
        html += '''<label class="cssCheckbox">
        <input type="checkbox" %s  />
        <span></span>%s
        </label>''' % (p_html, line)
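The to-do branch can also be read in isolation. This is a minimal sketch, assuming a hypothetical helper render_todo that returns None for lines that are not to-do items; the markup is the same label/input/span structure, just on one line.

```python
import re

# Group 1 is the box content (' ' or 'x'), group 2 the item text.
TODO_RE = re.compile(r'- \[([ x])\]\s*(.*)')

def render_todo(line):
    """Return checkbox HTML for a to-do line, or None if it is not one."""
    m = TODO_RE.match(line.strip(' '))
    if m is None:
        return None
    checked = ' checked="checked"' if m.group(1) == 'x' else ''
    return ('<label class="cssCheckbox">'
            '<input type="checkbox"%s /><span></span>%s</label>'
            % (checked, m.group(2)))
```

Capturing the box content in a group avoids matching the same line twice just to find out whether it is checked.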

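Ordered and unordered lists differ only in their opening and closing tags (ol versus ul), so a list_flag variable records which kind of list is currently open: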

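    # Unordered list item: "+ ", "- " or "* "
    elif re.match(r'[+\-*] ', line):
        if list_flag == '':
            html += '<ul>\n'
            list_flag = 'ul'
        # Anchor at the start so a "- " later in the line is left alone
        line = re.sub(r'^[+\-*] ', '', line)
        html += '<li>%s</li>\n' % (line)
    # Ordered list item: digits followed by "." or the full-width "、"
    elif re.match(r'\d+[.、] ', line):
        if list_flag == '':
            list_flag = 'ol'
            html += '<ol>\n'
        line = re.sub(r'^\d+[.、] ', '', line)
        html += '<li>%s</li>\n' % (line)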
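The list_flag idea is a small state machine, and it may help to see it on its own. The sketch below is a variation rather than the original logic: the function render_lists is hypothetical, and it additionally closes a list when the list kind changes or when the input ends, which the fragments above do not do.

```python
import re

def render_lists(lines):
    """Render list items, tracking the open list kind in list_flag."""
    html = ''
    list_flag = ''  # '', 'ul' or 'ol'
    for line in lines:
        line = line.strip(' ')
        if re.match(r'[+\-*] ', line):
            kind, item = 'ul', re.sub(r'^[+\-*] ', '', line)
        elif re.match(r'\d+[.、] ', line):
            kind, item = 'ol', re.sub(r'^\d+[.、] ', '', line)
        else:
            kind, item = '', line
        if kind != list_flag:
            # Close whatever was open, then open the new kind if any.
            if list_flag:
                html += '</%s>\n' % list_flag
            if kind:
                html += '<%s>\n' % kind
            list_flag = kind
        if kind:
            html += '<li>%s</li>\n' % item
        elif item:
            html += item + '\n'
    if list_flag:
        # Input ended while a list was still open.
        html += '</%s>\n' % list_flag
    return html
```

Comparing the current kind with list_flag on every line keeps all the open and close decisions in one place.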

With lists done, handle the remaining syntax:

    else:
        # Any other line ends the current list, if one is open
        if list_flag != '':
            html += '</%s>\n' % list_flag
            list_flag = ''
        if re.match(r'#+', line):
            # Heading level = number of leading '#' characters
            well = len(re.match(r'#+', line).group())
            line = re.sub(r'^#+\s*', '', line)
            html += '<h%i>%s</h%i>\n' % (well, line, well)
        elif re.match(r'[>》]', line):
            # Accept both the ASCII '>' and the full-width '》'
            line = re.sub(r'^[>》]\s*', '', line)
            html += '<blockquote>%s</blockquote>\n' % (line)
        else:
            html += line

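I made one small tweak here so that both > and 》 are converted to a blockquote, mainly because switching between Chinese and English punctuation is such a hassle.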
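For illustration, the heading and blockquote rules can be sketched as a single function. The name render_block is hypothetical, and capping the heading level at 6 is my own assumption (HTML only defines h1 to h6); the original does not cap it.

```python
import re

def render_block(line):
    """Render one line as a heading, a blockquote, or plain text."""
    line = line.strip(' ')
    m = re.match(r'(#+)\s*(.*)', line)
    if m:
        # The number of '#' characters gives the heading level.
        level = min(len(m.group(1)), 6)
        return '<h%d>%s</h%d>\n' % (level, m.group(2), level)
    # Both the ASCII '>' and the full-width '》' start a quote.
    m = re.match(r'[>》]\s*(.*)', line)
    if m:
        return '<blockquote>%s</blockquote>\n' % m.group(1)
    return line
```

Adding another full-width variant later is just a matter of extending the character class.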

Then add the CSS. I adapted part of the stylesheet from Marxico (马克飞象), because its blockquotes look great:

with open('markdown.html', 'w', encoding='utf-8') as f:
    f.write('''
<html>
<head>
    <meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
<style>body{
    margin: 0 auto;
    font-family: "ubuntu", "Tahoma", "Microsoft YaHei", arial,sans-serif;
    color: #444444;
    line-height: 1;
    padding: 30px;
} 
input[type='checkbox']+span::before {
  content:'\xa0';/* non-breaking space, written as a Python escape */
  display: inline-block;
  vertical-align: 0.2em;
  width:0.8em;
  height:0.8em;
  margin-right: .2em;
  border-radius:.2em;
  background: silver;/* checkbox background color */
  text-indent:0.15em;
  line-height: 0.65;
}
input[type='checkbox'] {
  /* Hide the real checkbox. display:none would be simpler, but it removes the element from the keyboard Tab focus order entirely. */
 
  position: absolute;
  clip:rect(0,0,0,0);
}
input[type='checkbox']:checked+span::before {
  content:'\u221a'; /* check mark; Python turns this escape into the √ character */
  background: yellowgreen;/* background color when checked */
}
img {
    max-width: 100%;
}
@media screen and (min-width: 1000px) {
    body {
        width: 842px;
        margin: 10px auto;
    }

    
}
h1, h2, h3, h4 {
    color: #111111;
    font-weight: 400;
    margin-top: 1em;
}

h1, h2, h3, h4, h5 {
    font-family: Georgia, Palatino, serif;
}
h1, h2, h3, h4, h5, dl{
    margin-bottom: 16px;
    padding: 0;
}

p {
    margin-top: 8px;
    margin-bottom: 3px;
}
h1 {
    font-size: 48px;
    line-height: 54px;
}
h2 {
    font-size: 36px;
    line-height: 42px;
}
h1, h2 {
    border-bottom: 1px solid #EFEAEA;
    padding-bottom: 10px;
}
h3 {
    font-size: 24px;
    line-height: 30px;
}
h4 {
    font-size: 21px;
    line-height: 26px;
}
h5 {
    font-size: 18px;
    line-height: 23px;
}
a {
    color: #0099ff;
    margin: 0 2px;
    padding: 0;
    vertical-align: baseline;
    text-decoration: none;
}
a:hover {
    text-decoration: none;
    color: #ff6600;
}
a:visited {
    /*color: purple;*/
}
ul, ol {
    padding: 0;
    padding-left: 18px;
    margin: 0;
}
li {
    line-height: 24px;
}
p, ul, ol {
    font-size: 16px;
    line-height: 24px;
}

ol ol, ul ol {
    list-style-type: lower-roman;
}

code, pre {
    font-family: Consolas, Monaco, Andale Mono, monospace;
    background-color:#f7f7f7;
    color: inherit;
}

code {
    font-family: Consolas, Monaco, Andale Mono, monospace;
    margin: 0 2px;
}

pre {
    font-family: Consolas, Monaco, Andale Mono, monospace;
    line-height: 1.7em;
    overflow: auto;
    padding: 6px 10px;
    border-left: 5px solid #6CE26C;
}

pre > code {
    font-family: Consolas, Monaco, Andale Mono, monospace;
    border: 0;
    display: inline;
    max-width: initial;
    padding: 0;
    margin: 0;
    overflow: initial;
    line-height: 1.6em;
    font-size: .95em;
    white-space: pre;
    background: 0 0;

}

code {
    color: #666555;
}

aside {
    display: block;
    float: right;
    width: 390px;
}
blockquote {
    border-left-width: 10px;
    background-color: rgba(102,128,153,0.05);
    border-top-right-radius: 5px;
    border-bottom-right-radius: 5px;
    padding: 15px 20px;
}
blockquote  cite {
    font-size:14px;
    line-height:20px;
    color:#bfbfbf;
}
blockquote cite:before {
    content: '\\2014 \\00A0'; /* backslashes doubled so Python does not read them as octal escapes */
}

blockquote p {
    color: #666;
}
hr {
    text-align: left;
    color: #999;
    height: 2px;
    padding: 0;
    margin: 16px 0;
    background-color: #e7e7e7;
    border: 0 none;
}

dl {
    padding: 0;
}

dl dt {
    padding: 10px 0;
    margin-top: 16px;
    font-size: 1em;
    font-style: italic;
    font-weight: bold;
}

dl dd {
    padding: 0 16px;
    margin-bottom: 16px;
}

dd {
    margin-left: 0;
}

table {
    *border-collapse: collapse; /* IE7 and lower */
    border-spacing: 0;
    width: 100%;
}
table {
    border: solid #ccc 1px;
}

table thead {
    background: #f7f7f7;
}

table thead tr:hover {
    background: #f7f7f7
}
table tr:hover {
    background: #fbf8e9;
    -o-transition: all 0.1s ease-in-out;
    -webkit-transition: all 0.1s ease-in-out;
    -moz-transition: all 0.1s ease-in-out;
    -ms-transition: all 0.1s ease-in-out;
    transition: all 0.1s ease-in-out;
}
table td, table th {
    border-left: 1px solid #ccc;
    border-top: 1px solid #ccc;
    padding: 10px;
    text-align: left;
}

table th {
    border-top: none;
    text-shadow: 0 1px 0 rgba(255,255,255,.5);
    padding: 5px;
    border-left: 1px solid #ccc;
}

table td:first-child, table th:first-child {
    border-left: none;
}</style></head>
<body>
''')
    f.write(html)
    f.write('</body>\n</html>')
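Building the page as one string before writing it makes the document structure easier to check. This is a minimal sketch, assuming a hypothetical helper build_page; the DOCTYPE line and the body wrapper are my additions and do not come from the original script.

```python
def build_page(body_html, css=''):
    """Wrap converted HTML and a stylesheet in a complete HTML document."""
    template = (
        '<!DOCTYPE html>\n'
        '<html>\n<head>\n'
        '<meta charset="utf-8">\n'
        '<style>%s</style>\n'
        '</head>\n<body>\n%s\n</body>\n</html>\n'
    )
    # css and body_html are arguments, not part of the format string,
    # so a literal '%' inside the CSS (e.g. "100%") is safe here.
    return template % (css, body_html)
```

The result could then be written with open('markdown.html', 'w', encoding='utf-8'), exactly as above.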

Open the page in Chrome:

webbrowser.get('C:/Program Files (x86)/CentBrowser/Application/chrome.exe %s').open(
    'file:///'+os.getcwd()+'/markdown.html')

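This turned out to be a pitfall as well: the Edge browser that ships with the system kept failing to open the page, and registering Chrome through the module's browser registration did not work either. In the end I found the solution on a site outside China.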
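An alternative that avoids hard-coding a browser path is to hand a file URI to the system default browser. This is only a sketch: the helpers page_uri and open_page are hypothetical, and I have not verified whether this sidesteps the Edge problem described above.

```python
import webbrowser
from pathlib import Path

def page_uri(path='markdown.html'):
    """Return a file:// URI for path, resolved against the current directory."""
    return Path(path).resolve().as_uri()

def open_page(path='markdown.html'):
    # webbrowser.open passes the URI to the default browser and
    # returns False if no browser could be launched.
    return webbrowser.open(page_uri(path))
```

Path.as_uri() takes care of the drive letter and of percent-encoding spaces, which manual string concatenation with 'file:///' does not.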

The final result:

"A Simple Markdown Converter in Python"

    Original author: 以茄之名
    Original URL: https://segmentfault.com/a/1190000015635627