Gelernt - GitHub Copilot, WordPress to 11ty
Problem
A WordPress export file is available, and its entries are to be converted into the Markdown post format used by 11ty.
Preparation
Created the file wordpress-to-11ty-post.py with the following text:
# read file wordpress.2025-06-09.xml
# output-dir: ./11ty-posts
# per entry create file in the output-dir with name yyyy-dd-mm-<post-title> and content of the post
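For orientation: a WordPress export (WXR, "WordPress eXtended RSS") is an RSS feed with extra namespaces. The post HTML sits in content:encoded and the date in wp:post_date. The minimal sketch below uses a hand-written, heavily shortened sample item (hypothetical content, not taken from the real export) to show how ElementTree finds these elements through a prefix-to-URI mapping. The lookup matches on the namespace URI, not on the prefix. If an export declares a different wp version URI, findtext quietly returns the default and the script skips every entry.

```python
import xml.etree.ElementTree as ET

# Heavily shortened, hand-written WXR sample (hypothetical content).
SAMPLE = """<rss version="2.0"
     xmlns:content="http://purl.org/rss/1.0/modules/content/"
     xmlns:wp="http://wordpress.org/export/1.2/">
  <channel>
    <item>
      <title>Hallo Welt</title>
      <content:encoded><![CDATA[<p>Erster Beitrag</p>]]></content:encoded>
      <wp:post_date><![CDATA[2025-06-08 17:43:00]]></wp:post_date>
    </item>
  </channel>
</rss>"""

# Prefix -> namespace URI; ElementTree matches on the URI, the prefix is arbitrary.
NS = {
    'wp': 'http://wordpress.org/export/1.2/',
    'content': 'http://purl.org/rss/1.0/modules/content/',
}


def read_items(xml_text, ns=NS):
    """Yield (title, post_date, html) for every <item> in a WXR document."""
    root = ET.fromstring(xml_text)
    for item in root.findall('.//item'):
        yield (
            item.findtext('title', default='untitled'),
            item.findtext('wp:post_date', default='', namespaces=ns),
            item.findtext('content:encoded', default='', namespaces=ns),
        )
```

The CDATA sections are unwrapped automatically, so list(read_items(SAMPLE)) yields the title, the date string and the raw post HTML.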
Prompt 1
do it please
Answer 1
This script reads a WordPress XML export file, extracts post entries, and writes each post as a Markdown file in the ./11ty-posts directory. Filenames are formatted as yyyy-dd-mm-<post-title>.md.
import os
import re
import xml.etree.ElementTree as ET
from datetime import datetime

INPUT_FILE = 'wordpress.2025-06-09.xml'
OUTPUT_DIR = './11ty-posts'

def slugify(title):
    return re.sub(r'[^a-z0-9]+', '-', title.lower()).strip('-')

def main():
    os.makedirs(OUTPUT_DIR, exist_ok=True)
    tree = ET.parse(INPUT_FILE)
    root = tree.getroot()
    ns = {'wp': 'http://wordpress.org/export/1.2/', 'content': 'http://purl.org/rss/1.0/modules/content/'}
    for item in root.findall('.//item'):
        title = item.findtext('title', default='untitled')
        post_date = item.findtext('wp:post_date', default='', namespaces=ns)
        content = item.findtext('content:encoded', default='', namespaces=ns)
        if not post_date or not content.strip():
            continue
        try:
            dt = datetime.strptime(post_date, '%Y-%m-%d %H:%M:%S')
        except ValueError:
            continue
        filename = f"{dt.year}-{dt.day:02d}-{dt.month:02d}-{slugify(title)}.md"
        filepath = os.path.join(OUTPUT_DIR, filename)
        with open(filepath, 'w', encoding='utf-8') as f:
            f.write(f"# {title}\n\n{content}")

if __name__ == '__main__':
    main()
This will create one Markdown file per post in the output directory.
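Note that slugify keeps only the ASCII characters a-z and 0-9, so German umlauts and ß turn into hyphens: "Größe über alles" becomes gr-e-ber-alles. The sketch below is an alternative and is not part of Copilot's answer. The German transliteration table assumes a German-language blog. If the old WordPress URLs should be kept instead, the export also contains each post's own slug in wp:post_name.

```python
import re
import unicodedata

# Assumed German transliterations, applied before everything else.
GERMAN = str.maketrans({'ä': 'ae', 'ö': 'oe', 'ü': 'ue', 'ß': 'ss',
                        'Ä': 'Ae', 'Ö': 'Oe', 'Ü': 'Ue'})


def slugify_original(title):
    # The version from the script: every non [a-z0-9] run becomes '-'
    return re.sub(r'[^a-z0-9]+', '-', title.lower()).strip('-')


def slugify_translit(title):
    text = title.translate(GERMAN)
    # Split accented letters into base letter + combining mark (é -> e + ´),
    # then drop everything that is not ASCII
    text = unicodedata.normalize('NFKD', text).encode('ascii', 'ignore').decode('ascii')
    return re.sub(r'[^a-z0-9]+', '-', text.lower()).strip('-')


print(slugify_original('Größe über alles'))  # gr-e-ber-alles
print(slugify_translit('Größe über alles'))  # groesse-ueber-alles
```

Because the slug becomes part of the filename, switching the function changes the generated file names. It is best decided before the first import into 11ty.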
Prompt 2
please convert the content of the file from html to markdown
Answer 2
To convert the post content from HTML to Markdown during export, use the markdownify package inside your script. Here’s how to integrate it:
Explanation:
- Import markdownify.
- Convert each post’s HTML content to Markdown before writing it to the file; heading_style="ATX" makes markdownify emit #-style headings.
import os
import re
import xml.etree.ElementTree as ET
from datetime import datetime
import markdownify

INPUT_FILE = 'wordpress.2025-06-09.xml'
OUTPUT_DIR = './11ty-posts'

def slugify(title):
    return re.sub(r'[^a-z0-9]+', '-', title.lower()).strip('-')

def main():
    os.makedirs(OUTPUT_DIR, exist_ok=True)
    tree = ET.parse(INPUT_FILE)
    root = tree.getroot()
    ns = {'wp': 'http://wordpress.org/export/1.2/', 'content': 'http://purl.org/rss/1.0/modules/content/'}
    for item in root.findall('.//item'):
        title = item.findtext('title', default='untitled')
        post_date = item.findtext('wp:post_date', default='', namespaces=ns)
        content = item.findtext('content:encoded', default='', namespaces=ns)
        if not post_date or not content.strip():
            continue
        try:
            dt = datetime.strptime(post_date, '%Y-%m-%d %H:%M:%S')
        except ValueError:
            continue
        # Convert HTML to Markdown
        markdown_content = markdownify.markdownify(content, heading_style="ATX")
        filename = f"{dt.year}-{dt.day:02d}-{dt.month:02d}-{slugify(title)}.md"
        filepath = os.path.join(OUTPUT_DIR, filename)
        with open(filepath, 'w', encoding='utf-8') as f:
            f.write(f"# {title}\n\n{markdown_content}")

if __name__ == '__main__':
    main()
Install the required package if needed:
pip install markdownify
Prompt 3
i have this error message:
~/.virtualenvs/site/lib/python3.11/site-packages/markdownify/__init__.py:194: MarkupResemblesLocatorWarning: The input looks more like a filename than markup. You may want to open this file and pass the filehandle into Beautiful Soup.
Answer 3
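This warning comes from Beautiful Soup, which markdownify uses to parse the HTML. Beautiful Soup emits it when it receives a short string that contains no markup (no '<') but looks like a file path, for example because it contains a '/'. Posts whose content is plain text, such as a short note containing a path, can trigger it. The warning itself is harmless because the string is still parsed, but it shows that markdownify is being given text that is not HTML.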
How to fix:
Pass content to markdownify only when it actually contains HTML markup, and use plain text as-is. You can add a check:
import markdownify

if content and '<' in content:
    markdown_content = markdownify.markdownify(content, heading_style="ATX")
else:
    markdown_content = content  # or handle as needed
This avoids passing non-HTML or empty strings to markdownify, preventing the warning.
Prompt 4
in the header of the created markdown-file add a frontmatter with the following entries:
title: "Gelernt - wordpress-title"
date: wordpress-content published date in format 2025-06-08T17:43:00+02:00
tags:
  - gelernt
  - post
layout: "layouts/page-gelernt.njk"
Answer 4
To add the requested frontmatter, use a YAML block at the top of each Markdown file. Format the date as ISO 8601 and include the specified fields.
Explanation:
- The frontmatter is written before the post content.
- The date is formatted as YYYY-MM-DDTHH:MM:SS+02:00 (a fixed offset that does not change with daylight saving time).
- The title is prefixed with "Gelernt - ".
- Tags and layout are set as requested.
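The hard-coded +02:00 is correct only for posts published during Central European Summer Time. A post from January would need +01:00. Here is a minimal sketch, assuming the export carries wp:post_date_gmt next to wp:post_date, as standard WordPress exports do. The real offset is then the difference between the two timestamps. The fallback for entries without a usable GMT date (unpublished drafts are typically exported with 0000-00-00 00:00:00) is an assumption that keeps the original +02:00.

```python
from datetime import datetime, timedelta, timezone

WP_DATE_FORMAT = '%Y-%m-%d %H:%M:%S'
# Assumed fallback when no usable GMT date exists (same as the original +02:00).
FALLBACK_TZ = timezone(timedelta(hours=2))


def iso_date_with_offset(post_date, post_date_gmt):
    """Return post_date as ISO 8601 with the UTC offset WordPress actually used."""
    local = datetime.strptime(post_date, WP_DATE_FORMAT)
    try:
        gmt = datetime.strptime(post_date_gmt, WP_DATE_FORMAT)
    except ValueError:
        # e.g. '' or '0000-00-00 00:00:00' for unpublished drafts
        return local.replace(tzinfo=FALLBACK_TZ).isoformat()
    # Local time minus UTC time is exactly the offset (+1 h in winter, +2 h in summer for CET)
    return local.replace(tzinfo=timezone(local - gmt)).isoformat()


print(iso_date_with_offset('2025-06-08 17:43:00', '2025-06-08 15:43:00'))  # 2025-06-08T17:43:00+02:00
print(iso_date_with_offset('2025-01-10 09:00:00', '2025-01-10 08:00:00'))  # 2025-01-10T09:00:00+01:00
```

In main(), this would mean reading post_date_gmt with item.findtext('wp:post_date_gmt', default='', namespaces=ns) and using iso_date_with_offset(post_date, post_date_gmt) in place of the strftime call.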
import os
import re
import xml.etree.ElementTree as ET
from datetime import datetime
import markdownify

INPUT_FILE = 'wordpress.2025-06-09.xml'
OUTPUT_DIR = './11ty-posts'

def slugify(title):
    return re.sub(r'[^a-z0-9]+', '-', title.lower()).strip('-')

def main():
    os.makedirs(OUTPUT_DIR, exist_ok=True)
    tree = ET.parse(INPUT_FILE)
    root = tree.getroot()
    ns = {'wp': 'http://wordpress.org/export/1.2/', 'content': 'http://purl.org/rss/1.0/modules/content/'}
    for item in root.findall('.//item'):
        title = item.findtext('title', default='untitled')
        post_date = item.findtext('wp:post_date', default='', namespaces=ns)
        content = item.findtext('content:encoded', default='', namespaces=ns)
        if not post_date or not content.strip():
            continue
        try:
            dt = datetime.strptime(post_date, '%Y-%m-%d %H:%M:%S')
        except ValueError:
            continue
        # Convert HTML to Markdown
        if content and '<' in content:
            markdown_content = markdownify.markdownify(content, heading_style="ATX")
        else:
            markdown_content = content
        # Format date as ISO 8601 with +02:00 offset
        iso_date = dt.strftime('%Y-%m-%dT%H:%M:%S+02:00')
        frontmatter = (
            f"---\n"
            f"title: \"Gelernt - {title}\"\n"
            f"date: {iso_date}\n"
            f"tags:\n"
            f"  - gelernt\n"
            f"  - post\n"
            f"layout: \"layouts/gelernt.njk\"\n"
            f"---\n\n"
        )
        filename = f"{dt.year}-{dt.day:02d}-{dt.month:02d}-{slugify(title)}.md"
        filepath = os.path.join(OUTPUT_DIR, filename)
        with open(filepath, 'w', encoding='utf-8') as f:
            f.write(frontmatter)
            f.write(markdown_content)

if __name__ == '__main__':
    main()
This will add the required frontmatter to each generated Markdown file.
Prompt 5
in the markdown_content replace the string by
Answer 5
To replace all occurrences of the non-breaking space character (\u00A0) in markdown_content with a regular space, add a .replace('\u00A0', ' ') after the conversion.
Here’s the relevant change:
# Convert HTML to Markdown and replace non-breaking spaces
if content and '<' in content:
    markdown_content = markdownify.markdownify(content, heading_style="ATX").replace('\u00A0', ' ')
else:
    markdown_content = content.replace('\u00A0', ' ')
This ensures all non-breaking spaces are replaced by normal spaces in the output.
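The non-breaking spaces usually sit in the WordPress HTML as the entity &nbsp;. An HTML parser (Beautiful Soup, which markdownify builds on) decodes that entity to the character U+00A0, so the character ends up in the Markdown. The standard library's html.unescape performs the same decoding, which makes it easy to check the replace on its own. The sample string is hypothetical:

```python
import html

raw = 'Hallo&nbsp;Welt'              # as it may appear in the exported post HTML
decoded = html.unescape(raw)         # the entity becomes the character U+00A0
cleaned = decoded.replace('\u00A0', ' ')

print(repr(decoded))  # 'Hallo\xa0Welt'
print(repr(cleaned))  # 'Hallo Welt'
```

repr makes the difference visible. When the strings are printed normally, both look the same.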
Final resolution
The file wordpress-to-11ty-post.py ends up with this text:
# read file wordpress.2025-06-09.xml
# output-dir: ./11ty-posts
# per entry create file in the output-dir with name yyyy-dd-mm-<post-title> and content of the post
import json
import os
import re
import xml.etree.ElementTree as ET
from datetime import datetime

import markdownify

INPUT_FILE = 'wordpress.2025-06-09.xml'
OUTPUT_DIR = './11ty-posts'


def slugify(title):
    # Lowercase and replace every run of non-alphanumeric characters with a hyphen
    return re.sub(r'[^a-z0-9]+', '-', title.lower()).strip('-')


def main():
    os.makedirs(OUTPUT_DIR, exist_ok=True)
    tree = ET.parse(INPUT_FILE)
    root = tree.getroot()
    ns = {'wp': 'http://wordpress.org/export/1.2/', 'content': 'http://purl.org/rss/1.0/modules/content/'}
    for item in root.findall('.//item'):
        title = item.findtext('title', default='untitled')
        post_date = item.findtext('wp:post_date', default='', namespaces=ns)
        content = item.findtext('content:encoded', default='', namespaces=ns)
        # Skip entries without a date or without content
        if not post_date or not content.strip():
            continue
        try:
            dt = datetime.strptime(post_date, '%Y-%m-%d %H:%M:%S')
        except ValueError:
            continue
        # Convert HTML to Markdown; plain text (no '<') is used as-is,
        # which avoids Beautiful Soup's MarkupResemblesLocatorWarning
        if content and '<' in content:
            markdown_content = markdownify.markdownify(content, heading_style="ATX")
        else:
            markdown_content = content
        # Replace non-breaking spaces with regular spaces
        markdown_content = markdown_content.replace('\u00A0', ' ')
        # Format date as ISO 8601 with a fixed +02:00 offset
        iso_date = dt.strftime('%Y-%m-%dT%H:%M:%S+02:00')
        # json.dumps quotes the title and escapes " and \, so titles containing
        # quotes still give valid YAML (a JSON string is a valid YAML double-quoted scalar)
        yaml_title = json.dumps(f"Gelernt - {title}", ensure_ascii=False)
        frontmatter = (
            "---\n"
            f"title: {yaml_title}\n"
            f"date: {iso_date}\n"
            "tags:\n"
            "  - gelernt\n"
            "  - post\n"
            "layout: \"layouts/gelernt.njk\"\n"
            "---\n\n"
        )
        filename = f"{dt.year}-{dt.day:02d}-{dt.month:02d}-{slugify(title)}.md"
        filepath = os.path.join(OUTPUT_DIR, filename)
        with open(filepath, 'w', encoding='utf-8') as f:
            f.write(frontmatter)
            f.write(f"# {title}\n\n{markdown_content}")


if __name__ == '__main__':
    main()
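The loop above converts every item in the export. In WXR that also includes pages, drafts and private posts; attachments are mostly filtered out by the empty-content check. If only published blog posts should end up in 11ty, a filter on wp:post_type and wp:status could look like the following minimal sketch. The sample items are hypothetical and shortened, while real exports wrap the values in CDATA, which ElementTree reads the same way.

```python
import xml.etree.ElementTree as ET

NS = {'wp': 'http://wordpress.org/export/1.2/'}

# Hypothetical, shortened items: a published post, a draft post and a published page.
SAMPLE = """<rss xmlns:wp="http://wordpress.org/export/1.2/"><channel>
  <item><title>A</title><wp:post_type><![CDATA[post]]></wp:post_type><wp:status><![CDATA[publish]]></wp:status></item>
  <item><title>B</title><wp:post_type>post</wp:post_type><wp:status>draft</wp:status></item>
  <item><title>C</title><wp:post_type>page</wp:post_type><wp:status>publish</wp:status></item>
</channel></rss>"""


def is_published_post(item, ns=NS):
    """True for items that are blog posts with status 'publish'."""
    post_type = item.findtext('wp:post_type', default='', namespaces=ns)
    status = item.findtext('wp:status', default='', namespaces=ns)
    return post_type == 'post' and status == 'publish'


root = ET.fromstring(SAMPLE)
published = [item.findtext('title') for item in root.findall('.//item') if is_published_post(item)]
print(published)  # ['A']
```

In main(), a line such as "if not is_published_post(item, ns): continue" at the top of the for loop would apply the filter, since the script's ns dictionary already maps the wp prefix.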