.md image links: saving the images locally and replacing the paths, with fixes for related errors

Published: 2023/12/13

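I originally wanted to download every web image linked from my Typora .md files, save the images locally, and replace the paths in the .md text.

I could not find a ready-made program online, so I got straight to it and wrote this one myself.

The idea is to walk through the files in a folder and hand them back one at a time with yield.

Each file is then read with readlines(). I started with r mode, ran into encoding problems, and switched to rb mode, as described later.

Reading a file this way gives a list with one entry per line. Each line is matched against a regular expression; when an image link is found, the image is downloaded and the local file name is returned.

The link in the file is then replaced with the local path. That is roughly it.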

The function that collects files from a folder:

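import os


def get_files(dir):
    """
    Yield the path of every file under a directory, including subdirectories.
    :param dir: root directory to search
    :return: generator of file paths
    """
    for root, dirs, files in os.walk(dir, topdown=False):
        if 'HTML' in root or '.assets' in root:  # folder filter
            continue
        for file in files:
            if '.zip' in file:  # skip archives
                continue
            yield os.path.join(root, file)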
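
As written, get_files yields every file except .zip archives, not only Markdown files. Here is a minimal sketch of a stricter variant, assuming only files ending in .md should be processed. The name iter_markdown_files and the skip_dirs parameter are hypothetical and not part of the original script. Unlike the original, which checks whether 'HTML' appears anywhere in the path, this sketch compares whole folder names, and it walks top-down so that unwanted folders are pruned before os.walk descends into them.

```python
import os


def iter_markdown_files(root_dir, skip_dirs=("HTML",)):
    """Yield paths of .md files under root_dir (hypothetical helper)."""
    for root, dirs, files in os.walk(root_dir, topdown=True):
        # Editing dirs in place stops os.walk from entering pruned folders.
        dirs[:] = [
            d for d in dirs
            if d not in skip_dirs and not d.endswith(".assets")
        ]
        for name in sorted(files):
            if name.lower().endswith(".md"):
                yield os.path.join(root, name)
```

Restricting the walk to .md files also means that nothing which is not a note is ever rewritten.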

Reading the files returned by this function failed with an encoding error: "UnicodeDecodeError: 'gbk' codec can't decode byte 0x80 in position 48: illegal multibyte sequence"

with open(file, 'r') as f:
    md_content = f.readlines()  # text mode decodes with the platform default codec (gbk here)

Passing encoding="gbk" or encoding="utf-8" did not help either, so in the end I read the files in rb (binary) mode, which solved the problem.
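
One possible explanation, offered only as a guess: get_files returns every file that is not a .zip archive, so some of the files may not be text at all, and no text codec can decode them. Binary mode sidesteps decoding entirely. Here is a minimal sketch of a middle ground, assuming most notes are UTF-8. The helper name read_md_lines and its encodings parameter are hypothetical.

```python
def read_md_lines(path, encodings=("utf-8", "gbk")):
    """Return (lines, encoding) for a file (hypothetical helper).

    lines is a list of str when one of the encodings decodes the file,
    otherwise a list of bytes with encoding set to None.
    """
    with open(path, "rb") as f:
        raw = f.read()  # reading bytes never raises UnicodeDecodeError
    for encoding in encodings:
        try:
            return raw.decode(encoding).splitlines(keepends=True), encoding
        except UnicodeDecodeError:
            continue  # try the next candidate encoding
    return raw.splitlines(keepends=True), None
```

A caller can skip files whose encoding comes back as None instead of running the link pattern over data that is not text.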

Next, the links are matched and replaced. The code:

import re


def thread_task(file, md_content):
    """
    Task run by each worker thread.
    :param file: path of the file
    :param md_content: the file's lines as a list of bytes
    :return:
    """
    print(f'Processing: {file}')
    try:
        for index, line in enumerate(md_content):
            # Group 1 captures the URL inside the "(...)" of a Markdown image link.
            for url in re.findall(br'\((https?://[^)\s]+)\)', line):
                print(f'Downloading: {url.decode()}')
                if file_name := download_pics(url.decode(), file):
                    line = line.replace(b'(' + url + b')', f'({file_name})'.encode())
            md_content[index] = line
        with open(file, 'wb') as f:
            f.writelines(md_content)
    finally:
        sem.release()  # free the slot even if this file failed

The code uses the walrus operator (:=), so it needs Python 3.8 or later; with a small change it also runs on older versions.
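
To illustrate the small change, here is a sketch that shows the same check written both ways. The function names and the URL_IN_PARENS constant are hypothetical; on an interpreter older than 3.8 only the second function can be kept, because the walrus form is a syntax error there.

```python
import re

# Bytes pattern: group 1 is the URL inside "(...)".
URL_IN_PARENS = re.compile(br'\((https?://[^)\s]+)\)')


def first_url_walrus(line):
    """Python 3.8+: assign and test the match list in one expression."""
    if urls := URL_IN_PARENS.findall(line):
        return urls[0]
    return None


def first_url_plain(line):
    """Any Python 3: assign first, then test."""
    urls = URL_IN_PARENS.findall(line)
    if urls:
        return urls[0]
    return None
```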

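A file usually contains many image links, so I read it with readlines(), which in rb mode returns a list of bytes objects, one per line.

Matching a regular expression against this data then failed with "TypeError: cannot use a string pattern on a bytes-like object"

The fix is to put a b prefix on the pattern so that it becomes a bytes pattern. Without the b the pattern is a str pattern and can only be applied to strings. For example: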

re.findall(br'\((https?://[^)\s]+)\)', line)
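
The rule is that the pattern type and the data type must agree. Here is a small sketch that reproduces the error and shows the two ways around it; the sample line and variable names are made up for illustration.

```python
import re

line = b'![demo](https://matplotlib.org/_images/sphx_glr_dark_background_001.png)\n'

str_pattern = r'\((https?://[^)\s]+)\)'
bytes_pattern = br'\((https?://[^)\s]+)\)'

try:
    re.findall(str_pattern, line)  # str pattern applied to bytes data
    error = None
except TypeError as exc:
    error = str(exc)  # the message quoted in the article

# Option 1: keep the data as bytes and use a bytes pattern.
urls_from_bytes = re.findall(bytes_pattern, line)

# Option 2: decode the data first and use a str pattern.
urls_from_text = re.findall(str_pattern, line.decode('utf-8'))
```

Option 1 returns bytes objects, which is why the script calls .decode() on a match before passing it to the download function.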

Next is the code that downloads the images linked in a document:

import os

import requests


def download_pics(url, file):
    """
    Download one image.
    :param url: https://matplotlib.org/_images/sphx_glr_dark_background_001.png
    :param file: D:codeget_mdPYtext書籍Matplotlib 參考實例MD第10章 樣式表.md
    :return: path of the saved image relative to the .md file, or None on failure
    """
    try:
        response = requests.get(url, timeout=30)
        response.raise_for_status()  # treat HTTP errors (404, 500, ...) as failures
        img_data = response.content
    except Exception as e:
        print(f'File: {file} download failed: {e}')
        return None
    filename = os.path.basename(file)  # 第10章 樣式表.md
    dirname = os.path.dirname(file)  # D:codeget_mdPYtext書籍Matplotlib 參考實例MD
    target_dir = os.path.join(dirname, f'{filename}.assets')
    os.makedirs(target_dir, exist_ok=True)  # safe when several threads create it
    # The image is saved as <md file name>.assets/<image name>.
    with open(os.path.join(target_dir, os.path.basename(url)), 'wb') as f:
        f.write(img_data)
    print(url, 'downloaded')
    return f'{filename}.assets/{os.path.basename(url)}'

It creates a folder and saves the downloaded image into it. There is not much more to say, so I will skip the details.
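
One detail may still be worth a sketch: os.path.basename(url) keeps any query string, so a link such as a.png?size=large would produce a file name containing "?", which Windows does not allow. Here is a minimal sketch of a more careful name, assuming the last path segment of the URL is an acceptable file name. The helper local_image_name and its default parameter are hypothetical.

```python
import posixpath
from urllib.parse import unquote, urlsplit


def local_image_name(url, default="image"):
    """Return a file name for url without query string or fragment (hypothetical helper)."""
    path = urlsplit(url).path  # drops "?query" and "#fragment"
    name = posixpath.basename(unquote(path))  # last segment, percent-escapes decoded
    return name or default
```

This does not remove every character that a file system may reject, and two different URLs can still map to the same name.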

The next step is to speed things up with multiple threads:

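from threading import Thread


def main():
    for file in get_files(r'D:codeget_mdPYtext書籍'):
        with open(file, 'rb') as f:
            md_content = f.readlines()
        sem.acquire()  # blocks while the maximum number of threads is running
        Thread(target=thread_task, args=(file, md_content)).start()
        # thread_task(file, md_content)  # single-threaded alternative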
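
The script limits concurrency with a Semaphore and one Thread per file. A different technique, not the one used in this article, is the standard library's ThreadPoolExecutor, which caps the number of threads by itself. A minimal sketch, in which the helper name run_in_pool and its parameters are hypothetical:

```python
from concurrent.futures import ThreadPoolExecutor


def run_in_pool(task, items, max_workers=4):
    """Run task(item) for every item with at most max_workers threads (hypothetical helper)."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map() keeps the input order and re-raises a worker's exception here.
        return list(pool.map(task, items))
```

If this were used with the script, the task would need to read one file and do the replacement without calling sem.release(), since the pool already limits the number of threads.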

Below is the complete source, shared for anyone who needs it.

#!/usr/bin/env python
# -*- coding: UTF-8 -*-
"""
@Project ：get_md
@File    ：img to local.py
@IDE     ：PyCharm
@Author  ：Naihe
@Date    ：2021/7/6 14:51
"""
import os
import re
import threading
from threading import Thread

import requests

sem = threading.Semaphore(4)  # at most 4 worker threads at a time


def get_files(dir):
    """
    Yield the path of every file under a directory, including subdirectories.
    :param dir: root directory to search
    :return: generator of file paths
    """
    for root, dirs, files in os.walk(dir, topdown=False):
        if 'HTML' in root or '.assets' in root:  # folder filter
            continue
        for file in files:
            if '.zip' in file:  # skip archives
                continue
            yield os.path.join(root, file)


def download_pics(url, file):
    """
    Download one image.
    :param url: https://matplotlib.org/_images/sphx_glr_dark_background_001.png
    :param file: D:codeget_mdPYtext書籍Matplotlib 參考實例MD第10章 樣式表.md
    :return: path of the saved image relative to the .md file, or None on failure
    """
    try:
        response = requests.get(url, timeout=30)
        response.raise_for_status()  # treat HTTP errors (404, 500, ...) as failures
        img_data = response.content
    except Exception as e:
        print(f'File: {file} download failed: {e}')
        return None
    filename = os.path.basename(file)  # 第10章 樣式表.md
    dirname = os.path.dirname(file)  # D:codeget_mdPYtext書籍Matplotlib 參考實例MD
    target_dir = os.path.join(dirname, f'{filename}.assets')
    os.makedirs(target_dir, exist_ok=True)  # safe when several threads create it
    # The image is saved as <md file name>.assets/<image name>.
    with open(os.path.join(target_dir, os.path.basename(url)), 'wb') as f:
        f.write(img_data)
    print(url, 'downloaded')
    return f'{filename}.assets/{os.path.basename(url)}'


def thread_task(file, md_content):
    """
    Task run by each worker thread.
    :param file: path of the file
    :param md_content: the file's lines as a list of bytes
    :return:
    """
    print(f'Processing: {file}')
    try:
        for index, line in enumerate(md_content):
            # Group 1 captures the URL inside the "(...)" of a Markdown image link.
            for url in re.findall(br'\((https?://[^)\s]+)\)', line):
                print(f'Downloading: {url.decode()}')
                if file_name := download_pics(url.decode(), file):
                    line = line.replace(b'(' + url + b')', f'({file_name})'.encode())
            md_content[index] = line
        with open(file, 'wb') as f:
            f.writelines(md_content)
    finally:
        sem.release()  # free the slot even if this file failed


def main():
    for file in get_files(r'D:codeget_mdPYtext書籍'):
        with open(file, 'rb') as f:
            md_content = f.readlines()
        sem.acquire()  # blocks while the maximum number of threads is running
        Thread(target=thread_task, args=(file, md_content)).start()
        # thread_task(file, md_content)  # single-threaded alternative


if __name__ == '__main__':
    main()

Writing this up took some effort, so a like, a comment, and a follow would be much appreciated (^v^)
