Learned - GitHub Copilot, WordPress to 11ty

Problem

A WordPress export file is available, and its entries need to be converted into the Markdown post format used by 11ty.

Preparation

Created the file wordpress-to-11ty-post.py with the following text:

# read file wordpress.2025-06-09.xml

# output-dir: ./11ty-posts

# per entry create file in the output-dir with name yyyy-dd-mm-<post-title> and content of the post

Prompt 1

do it please

Answer 1

This script reads a WordPress XML export file, extracts post entries, and writes each post as a Markdown file in the ./11ty-posts directory. Filenames are formatted as yyyy-dd-mm-<post-title>.md.

import os
import re
import xml.etree.ElementTree as ET
from datetime import datetime

INPUT_FILE = 'wordpress.2025-06-09.xml'
OUTPUT_DIR = './11ty-posts'

def slugify(title):
    return re.sub(r'[^a-z0-9]+', '-', title.lower()).strip('-')

def main():
    os.makedirs(OUTPUT_DIR, exist_ok=True)
    tree = ET.parse(INPUT_FILE)
    root = tree.getroot()
    ns = {'wp': 'http://wordpress.org/export/1.2/', 'content': 'http://purl.org/rss/1.0/modules/content/'}

    for item in root.findall('.//item'):
        title = item.findtext('title', default='untitled')
        post_date = item.findtext('wp:post_date', default='', namespaces=ns)
        content = item.findtext('content:encoded', default='', namespaces=ns)
        if not post_date or not content.strip():
            continue
        try:
            dt = datetime.strptime(post_date, '%Y-%m-%d %H:%M:%S')
        except ValueError:
            continue
        filename = f"{dt.year}-{dt.day:02d}-{dt.month:02d}-{slugify(title)}.md"
        filepath = os.path.join(OUTPUT_DIR, filename)
        with open(filepath, 'w', encoding='utf-8') as f:
            f.write(f"# {title}\n\n{content}")

if __name__ == '__main__':
    main()

This will create one Markdown file per post in the output directory.
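
One thing to watch with the slugify function above: it only keeps a-z and 0-9, so a German title such as "Größe" would come out as "gr-e". A minimal sketch of a variant that transliterates umlauts first is shown here; the name slugify_unicode and the GERMAN_MAP table are assumptions for illustration and were not part of the Copilot session.

```python
import re
import unicodedata

# Hypothetical transliteration table for German characters (an assumption,
# not something the original script contains).
GERMAN_MAP = str.maketrans({"ä": "ae", "ö": "oe", "ü": "ue", "ß": "ss"})


def slugify_unicode(title: str) -> str:
    """Build an ASCII slug, transliterating German umlauts before stripping."""
    # Compose characters first so that "a" + combining diaeresis becomes "ä".
    lowered = unicodedata.normalize("NFC", title).lower().translate(GERMAN_MAP)
    # Decompose any remaining accented letters and drop the combining marks.
    ascii_only = (
        unicodedata.normalize("NFKD", lowered)
        .encode("ascii", "ignore")
        .decode("ascii")
    )
    # Same final step as the original slugify: collapse everything else to "-".
    return re.sub(r"[^a-z0-9]+", "-", ascii_only).strip("-")
```

With this variant, "Größe & Übung" would become "groesse-uebung" instead of losing the umlauts.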

Prompt 2

please convert the content of the file from html to markdown

Answer 2

To convert the post content from HTML to Markdown during export, use the markdownify package inside your script. Here’s how to integrate it:

Explanation:

Import markdownify.
Convert each post’s HTML content to Markdown before writing to file.

import os
import re
import xml.etree.ElementTree as ET
from datetime import datetime
import markdownify

INPUT_FILE = 'wordpress.2025-06-09.xml'
OUTPUT_DIR = './11ty-posts'

def slugify(title):
    return re.sub(r'[^a-z0-9]+', '-', title.lower()).strip('-')

def main():
    os.makedirs(OUTPUT_DIR, exist_ok=True)
    tree = ET.parse(INPUT_FILE)
    root = tree.getroot()
    ns = {'wp': 'http://wordpress.org/export/1.2/', 'content': 'http://purl.org/rss/1.0/modules/content/'}

    for item in root.findall('.//item'):
        title = item.findtext('title', default='untitled')
        post_date = item.findtext('wp:post_date', default='', namespaces=ns)
        content = item.findtext('content:encoded', default='', namespaces=ns)
        if not post_date or not content.strip():
            continue
        try:
            dt = datetime.strptime(post_date, '%Y-%m-%d %H:%M:%S')
        except ValueError:
            continue
        # Convert HTML to Markdown
        markdown_content = markdownify.markdownify(content, heading_style="ATX")
        filename = f"{dt.year}-{dt.day:02d}-{dt.month:02d}-{slugify(title)}.md"
        filepath = os.path.join(OUTPUT_DIR, filename)
        with open(filepath, 'w', encoding='utf-8') as f:
            f.write(f"# {title}\n\n{markdown_content}")

if __name__ == '__main__':
    main()

Install the required package if needed:

pip install markdownify

Prompt 3

i have this errormessage ~/.virtualenvs/site/lib/python3.11/site-packages/markdownify/__init__.py:194: MarkupResemblesLocatorWarning: The input looks more like a filename than markup. You may want to open this file and pass the filehandle into Beautiful Soup.

Answer 3

This warning comes from Beautiful Soup, which markdownify uses to parse its input; it means the input string looks like a filename rather than HTML markup. This can happen if the content variable is empty or contains a path-like string.

How to fix:
Ensure content contains actual HTML, not a filename or empty string, before passing it to markdownify. You can add a check:

import markdownify

if content and '<' in content:
    markdown_content = markdownify.markdownify(content, heading_style="ATX")
else:
    markdown_content = content  # or handle as needed

This avoids passing non-HTML or empty strings to markdownify, preventing the warning.
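
The '<' in content check also lets plain text such as "1 < 2" through to the converter. A slightly stricter test could look for something shaped like an actual tag or comment. The following is only a sketch; looks_like_html and TAG_PATTERN are hypothetical names, and the regular expression is a heuristic, not an HTML parser.

```python
import re

# Heuristic: an opening "<" followed by a letter, "!" or "/" and a closing ">".
# This matches tags like <p>, </div> and comments like <!-- wp:paragraph -->.
TAG_PATTERN = re.compile(r"<[a-zA-Z!/][^>]*>")


def looks_like_html(text: str) -> bool:
    """Return True if the text contains something that resembles an HTML tag."""
    return bool(text) and TAG_PATTERN.search(text) is not None
```

In the script, the condition "if content and '<' in content:" could then be written as "if looks_like_html(content):".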

Prompt 4

in the header of the created markdown-file add a frontmatter with the following entries:

title: "Gelernt - wordpress-title"
date: wordpress-content published date in format 2025-06-08T17:43:00+02:00
tags:
- gelernt
- post
layout: "layouts/page-gelernt.njk"

Answer 4

To add the requested frontmatter, use a YAML block at the top of each Markdown file. Format the date as ISO 8601 and include the specified fields.

Explanation:

The frontmatter is written before the post content.
The date is formatted as YYYY-MM-DDTHH:MM:SS+02:00.
The title is prefixed with Gelernt -.
Tags and layout are set as requested.

import os
import re
import xml.etree.ElementTree as ET
from datetime import datetime
import markdownify

INPUT_FILE = 'wordpress.2025-06-09.xml'
OUTPUT_DIR = './11ty-posts'

def slugify(title):
    return re.sub(r'[^a-z0-9]+', '-', title.lower()).strip('-')

def main():
    os.makedirs(OUTPUT_DIR, exist_ok=True)
    tree = ET.parse(INPUT_FILE)
    root = tree.getroot()
    ns = {'wp': 'http://wordpress.org/export/1.2/', 'content': 'http://purl.org/rss/1.0/modules/content/'}

    for item in root.findall('.//item'):
        title = item.findtext('title', default='untitled')
        post_date = item.findtext('wp:post_date', default='', namespaces=ns)
        content = item.findtext('content:encoded', default='', namespaces=ns)
        if not post_date or not content.strip():
            continue
        try:
            dt = datetime.strptime(post_date, '%Y-%m-%d %H:%M:%S')
        except ValueError:
            continue
        # Convert HTML to Markdown
        if content and '<' in content:
            markdown_content = markdownify.markdownify(content, heading_style="ATX")
        else:
            markdown_content = content

        # Format date as ISO 8601 with +02:00 offset
        iso_date = dt.strftime('%Y-%m-%dT%H:%M:%S+02:00')

        frontmatter = (
            f"---\n"
            f"title: \"Gelernt - {title}\"\n"
            f"date: {iso_date}\n"
            f"tags:\n"
            f"  - gelernt\n"
            f"  - post\n"
            f"layout: \"layouts/gelernt.njk\"\n"
            f"---\n\n"
        )

        filename = f"{dt.year}-{dt.day:02d}-{dt.month:02d}-{slugify(title)}.md"
        filepath = os.path.join(OUTPUT_DIR, filename)
        with open(filepath, 'w', encoding='utf-8') as f:
            f.write(frontmatter)
            f.write(markdown_content)

if __name__ == '__main__':
    main()

This will add the required frontmatter to each generated Markdown file.
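
Note that the generated code hard-codes the +02:00 offset, which would be off by one hour for posts published during winter time. A possible alternative, sketched here under the assumption that wp:post_date holds local time in the Europe/Berlin zone (the session never names the zone), is to let the standard zoneinfo module work out the offset. The names BLOG_TZ and to_iso_with_offset are hypothetical.

```python
from datetime import datetime
from zoneinfo import ZoneInfo

# Assumption: the blog's local time zone; adjust to the actual site setting.
BLOG_TZ = ZoneInfo("Europe/Berlin")


def to_iso_with_offset(post_date: str, tz: ZoneInfo = BLOG_TZ) -> str:
    """Convert 'YYYY-MM-DD HH:MM:SS' local time to ISO 8601 with UTC offset."""
    naive = datetime.strptime(post_date, "%Y-%m-%d %H:%M:%S")
    # Attach the zone; the offset then follows the daylight-saving rules.
    aware = naive.replace(tzinfo=tz)
    return aware.isoformat(timespec="seconds")
```

In the script, the line that builds iso_date could then become iso_date = to_iso_with_offset(post_date).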

Prompt 5

in the markdown_content replace the string "\u00A0" (non-breaking space) by " "

Answer 5

To replace all occurrences of the non-breaking space character (\u00A0) in markdown_content with a regular space, add a .replace('\u00A0', ' ') after the conversion.
Here’s the relevant change:

# Convert HTML to Markdown and replace non-breaking spaces
if content and '<' in content:
    markdown_content = markdownify.markdownify(content, heading_style="ATX").replace('\u00A0', ' ')
else:
    markdown_content = content.replace('\u00A0', ' ')

This ensures all non-breaking spaces are replaced by normal spaces in the output.

Final resolution

The file wordpress-to-11ty-post.py ended up with the following text:

# read file wordpress.2025-06-09.xml

# output-dir: ./11ty-posts

# per entry create file in the output-dir with name yyyy-dd-mm-<post-title> and content of the post
import os
import re
import xml.etree.ElementTree as ET
from datetime import datetime

import markdownify

INPUT_FILE = 'wordpress.2025-06-09.xml'
OUTPUT_DIR = './11ty-posts'

def slugify(title):
    return re.sub(r'[^a-z0-9]+', '-', title.lower()).strip('-')

def main():
    os.makedirs(OUTPUT_DIR, exist_ok=True)
    tree = ET.parse(INPUT_FILE)
    root = tree.getroot()
    ns = {'wp': 'http://wordpress.org/export/1.2/', 'content': 'http://purl.org/rss/1.0/modules/content/'}

    for item in root.findall('.//item'):
        title = item.findtext('title', default='untitled')
        post_date = item.findtext('wp:post_date', default='', namespaces=ns)
        content = item.findtext('content:encoded', default='', namespaces=ns)
        if not post_date or not content.strip():
            continue
        try:
            dt = datetime.strptime(post_date, '%Y-%m-%d %H:%M:%S')
        except ValueError:
            continue
        # Convert HTML to Markdown
        if content and '<' in content:
            markdown_content = markdownify.markdownify(content, heading_style="ATX")
        else:
            markdown_content = content

        markdown_content = markdown_content.replace('\u00A0', ' ')

        # Format date as ISO 8601 with +02:00 offset
        iso_date = dt.strftime('%Y-%m-%dT%H:%M:%S+02:00')

        frontmatter = (
            f"---\n"
            f"title: \"Gelernt - {title}\"\n"
            f"date: {iso_date}\n"
            f"tags:\n"
            f"  - gelernt\n"
            f"  - post\n"
            f"layout: \"layouts/gelernt.njk\"\n"
            f"---\n\n"
        )

        filename = f"{dt.year}-{dt.day:02d}-{dt.month:02d}-{slugify(title)}.md"
        filepath = os.path.join(OUTPUT_DIR, filename)
        with open(filepath, 'w', encoding='utf-8') as f:
            f.write(frontmatter)
            f.write(f"# {title}\n\n{markdown_content}")

if __name__ == '__main__':
    main()
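
One possible refinement of the final script: a post title that itself contains double quotes would produce invalid YAML in the title line. Because a JSON string is also a valid YAML double-quoted scalar, json.dumps can take care of the escaping. The helper below is a sketch only; build_frontmatter is a hypothetical name, and the layout path is copied from the script above.

```python
import json


def build_frontmatter(title: str, iso_date: str) -> str:
    """Return the YAML frontmatter block with a safely quoted title."""
    # json.dumps adds the surrounding quotes and escapes embedded quotes
    # and backslashes; ensure_ascii=False keeps umlauts readable.
    quoted_title = json.dumps(f"Gelernt - {title}", ensure_ascii=False)
    lines = [
        "---",
        f"title: {quoted_title}",
        f"date: {iso_date}",
        "tags:",
        "  - gelernt",
        "  - post",
        'layout: "layouts/gelernt.njk"',
        "---",
        "",
        "",
    ]
    return "\n".join(lines)
```

The frontmatter assignment in main() could then be replaced by frontmatter = build_frontmatter(title, iso_date).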