Miguel edited this page 2025-02-05 08:17:05 -03:00

Email to Chronological Markdown Converter

Overview

This script converts email files (.eml) into a single chronological narrative in Markdown, optimized for processing with Large Language Models (LLMs). It extracts the essential content of each email (sender, timestamp, message body, tables, and attachment references) and drops redundant metadata, producing a clean timeline that is easy to analyze.

Purpose

The main goal is to convert email threads and nested communications into a simple, chronological text format that LLMs can effectively process to:

  • Analyze communication patterns
  • Extract key information
  • Track project development
  • Identify important decisions and their timeline
  • Understand relationships and interactions between participants

Core Functionality

Input Processing

  • Reads .eml files from the current directory
  • Handles nested .eml files found as attachments
  • Supports multiple languages (English, Italian)
  • Extracts sender name and timestamp as primary identifiers
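
The input steps above can be sketched with Python's standard `email` package. This is a minimal illustration, not the script's actual code: the function names `sender_and_timestamp`, `iter_messages`, and `load_directory` are hypothetical, and the sketch assumes every message carries a valid `Date` header and that nested emails arrive as `message/rfc822` parts (an `.eml` attached as a generic binary file would need separate handling).

```python
from email import message_from_bytes, policy
from email.utils import parseaddr, parsedate_to_datetime
from pathlib import Path


def sender_and_timestamp(msg):
    """Return (sender name, YYYYMMDDhhmmss) for a parsed message.

    Assumes a valid Date header; falls back to the address when the
    From header has no display name.
    """
    name, address = parseaddr(str(msg.get("From", "")))
    sent = parsedate_to_datetime(str(msg["Date"]))
    return (name or address, sent.strftime("%Y%m%d%H%M%S"))


def iter_messages(msg):
    """Yield the message itself, then every email nested inside it."""
    yield msg
    # walk() also descends into message/rfc822 parts, so emails nested
    # several levels deep are found without explicit recursion.
    for part in msg.walk():
        if part.get_content_type() == "message/rfc822":
            payload = part.get_payload()
            if isinstance(payload, list) and payload:
                yield payload[0]


def load_directory(directory="."):
    """Parse every .eml file in a directory (hypothetical helper)."""
    for path in sorted(Path(directory).glob("*.eml")):
        yield message_from_bytes(path.read_bytes(), policy=policy.default)
```

The timestamp here is formatted in the sender's own UTC offset. If messages come from several time zones, converting to one zone before formatting would keep the sort order reliable.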

Content Management

  • Maintains a single chronological file (cronologia.md)
  • Removes redundant metadata and formatting
  • Preserves essential content in plain text
  • Converts tables to markdown format
  • Adds links to non-email attachments (nested .eml attachments are processed as messages instead)
  • Eliminates duplicate entries
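
As an illustration of the table conversion step, the sketch below turns a table that has already been extracted into rows of cells into a Markdown table. The function name `rows_to_markdown` and the row-of-cells input shape are assumptions made for this example; how the script actually detects tables in an email body is not described here.

```python
def rows_to_markdown(rows):
    """Render rows of cells as a Markdown table; the first row is the header."""
    if not rows:
        return ""
    width = max(len(row) for row in rows)

    def fmt(row):
        # Escape pipes so cell text cannot break the column layout.
        cells = [str(cell).replace("|", "\\|").strip() for cell in row]
        cells += [""] * (width - len(cells))  # pad short rows
        return "| " + " | ".join(cells) + " |"

    header, *body = rows
    lines = [fmt(header), "| " + " | ".join(["---"] * width) + " |"]
    lines += [fmt(row) for row in body]
    return "\n".join(lines)


print(rows_to_markdown([["Item", "Qty"], ["Bolt", "4"]]))
```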

Format Structure

## YYYYMMDDhhmmss|Sender Name

Message content in plain text...
Tables converted to markdown format...

### Attachments
- [[document1.pdf]]
- [[image1.jpg]]

---
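
A minimal sketch of how one entry in this layout could be produced. The function name `render_entry` and its parameters are hypothetical; only the layout itself (heading, body, optional attachments list, `---` separator) comes from the structure shown above.

```python
def render_entry(timestamp, sender, body, attachments=()):
    """Build one chronology entry in the layout shown above."""
    lines = [f"## {timestamp}|{sender}", "", body.strip(), ""]
    if attachments:
        lines.append("### Attachments")
        lines += [f"- [[{name}]]" for name in attachments]
        lines.append("")
    lines += ["---", ""]
    return "\n".join(lines)


print(render_entry("20250205081705", "Ada", "Hello", ["document1.pdf"]))
```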

Key Features

  1. Chronological Organization

    • All messages sorted by date
    • Consistent timestamp format
    • Clear sender identification
  2. Content Cleaning

    • Removes email headers
    • Eliminates signatures
    • Strips formatting
    • Preserves table structure
  3. Attachment Handling

    • Creates 'attachments' folder
    • Maintains file references
    • Processes nested emails
    • Prevents duplicates
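
Chronological ordering and duplicate prevention can be sketched together as below. This is one possible approach, not necessarily the script's: the name `sort_and_dedupe`, the `(timestamp, sender, body)` tuple shape, and the choice to treat entries as duplicates when timestamp, sender, and whitespace-normalized body all match are assumptions.

```python
import hashlib


def sort_and_dedupe(entries):
    """Sort (timestamp, sender, body) tuples by time and drop repeats."""
    seen = set()
    unique = []
    # YYYYMMDDhhmmss strings sort chronologically as plain text.
    for timestamp, sender, body in sorted(entries, key=lambda e: e[0]):
        normalized = " ".join(body.split())
        digest = hashlib.sha256(normalized.encode("utf-8")).hexdigest()
        key = (timestamp, sender, digest)
        if key not in seen:
            seen.add(key)
            unique.append((timestamp, sender, body))
    return unique
```

Hashing the normalized body keeps the `seen` set small even for long messages, and the stable sort preserves the first copy of any repeated entry.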

LLM Processing Considerations

The output is structured to facilitate:

  1. Temporal analysis
  2. Relationship mapping
  3. Topic tracking
  4. Decision point identification
  5. Project timeline reconstruction

The consistent formatting and cleaned content allow LLMs to focus on:

  • Message content analysis
  • Temporal relationships
  • Communication patterns
  • Project development tracking
  • Key information extraction

This format lets LLMs process email communications while preserving the contextual and temporal relationships needed to follow the narrative flow of information.
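
Because every entry starts with the same heading pattern, the chronology can be split back into structured records before it is sent to an LLM, for example to batch entries by date range. The sketch below is illustrative only; `split_entries` and the dictionary keys are hypothetical names, and it assumes entries follow the heading and `---` separator layout from Format Structure.

```python
import re

# Matches "## YYYYMMDDhhmmss|Sender Name" at the start of a line.
ENTRY_HEADING = re.compile(r"^## (\d{14})\|(.+)$", re.MULTILINE)


def split_entries(markdown):
    """Split chronology text into dicts with timestamp, sender, and body."""
    matches = list(ENTRY_HEADING.finditer(markdown))
    entries = []
    for i, match in enumerate(matches):
        end = matches[i + 1].start() if i + 1 < len(matches) else len(markdown)
        body = markdown[match.end():end].strip()
        if body.endswith("---"):  # drop the trailing entry separator
            body = body[:-3].rstrip()
        entries.append({
            "timestamp": match.group(1),
            "sender": match.group(2).strip(),
            "body": body,
        })
    return entries
```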