| 1 | +# gomark Architecture |
| 2 | + |
| 3 | +This document explains the architectural decisions and design philosophy behind gomark. |
| 4 | + |
| 5 | +## Design Philosophy |
| 6 | + |
| 7 | +gomark is built on the principle of **pragmatic simplicity**: |
| 8 | + |
| 9 | +> "Solve real problems efficiently without over-engineering" |
| 10 | + |
| 11 | +### Core Principles |
| 12 | + |
| 13 | +1. **Simplicity over Complexity**: Choose the simplest solution that works |
| 14 | +2. **Performance over Features**: Fast, reliable parsing over theoretical completeness |
| 15 | +3. **Maintainability over Flexibility**: Code that's easy to understand and modify |
| 16 | +4. **Real Needs over Theoretical Needs**: Implement what's actually used |
| 17 | +5. **Direct Solutions**: Avoid layers of abstraction when direct approaches work |
| 18 | + |
| 19 | +## Architectural Decisions |
| 20 | + |
| 21 | +### 1. Token-Based Parsing ✅ |
| 22 | + |
| 23 | +**Decision**: Use single-pass tokenization followed by token-based parsing |
| 24 | + |
| 25 | +**Rationale**: |
| 26 | +- **Performance**: Single-pass tokenization reads the input once, so its cost grows linearly with input size |
| 27 | +- **Simplicity**: Tokens are easy to work with and debug |
| 28 | +- **Reusability**: Tokens can be reused by multiple parsers |
| 29 | +- **Memory Efficiency**: Tokens reference original string data |
| 30 | + |
| 31 | +**Alternative Considered**: Text-based parsing (like goldmark) |
| 32 | +**Why Rejected**: Added complexity without clear benefits for our use cases |
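
To make the tokenization decision concrete, here is a minimal sketch of a single-pass tokenizer whose tokens are substrings of the input. The names `TokenType`, `Token`, and `Tokenize`, and the choice of punctuation characters, are assumptions made for illustration and may not match gomark's actual tokenizer.

```go
package main

import "fmt"

// TokenType and Token are hypothetical names used only in this sketch.
type TokenType string

const (
	Asterisk TokenType = "*"
	Hash     TokenType = "#"
	Space    TokenType = " "
	NewLine  TokenType = "\n"
	Text     TokenType = "TEXT"
)

type Token struct {
	Type  TokenType
	Value string // substring of the input, so no character data is copied
}

// Tokenize walks the input exactly once. Each punctuation byte becomes its
// own token; every run of other bytes becomes a single Text token.
func Tokenize(input string) []Token {
	var tokens []Token
	start := -1 // start index of the pending text run, or -1 if none
	flush := func(end int) {
		if start >= 0 {
			tokens = append(tokens, Token{Text, input[start:end]})
			start = -1
		}
	}
	for i := 0; i < len(input); i++ {
		var t TokenType
		switch input[i] {
		case '*':
			t = Asterisk
		case '#':
			t = Hash
		case ' ':
			t = Space
		case '\n':
			t = NewLine
		default:
			if start < 0 {
				start = i
			}
			continue
		}
		flush(i)
		tokens = append(tokens, Token{t, input[i : i+1]})
	}
	flush(len(input))
	return tokens
}

func main() {
	for _, tok := range Tokenize("# Hi *you*") {
		fmt.Printf("%s %q\n", tok.Type, tok.Value)
	}
}
```

Because Go strings are immutable and slicing shares the backing array, each `Value` points into the original input rather than holding its own copy.
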
| 33 | + |
| 34 | +### 2. Simple AST Interface ✅ |
| 35 | + |
| 36 | +**Decision**: Use minimal `Node` interface with direct field access |
| 37 | + |
| 38 | +```go |
| 39 | +type Node interface { |
| 40 | + Type() NodeType |
| 41 | + Restore() string |
| 42 | +} |
| 43 | +``` |
| 44 | + |
| 45 | +**Rationale**: |
| 46 | +- **Performance**: Direct field access (`node.Children`) avoids the dynamic dispatch of interface method calls |
| 47 | +- **Simplicity**: Easy to understand and work with |
| 48 | +- **Focused**: Only implements what's actually needed |
| 49 | +- **Memory Efficient**: No overhead for unused tree navigation features |
| 50 | + |
| 51 | +**Alternative Considered**: Complex tree interface (like goldmark) |
| 52 | +**Why Rejected**: Analysis showed no actual usage of tree navigation in our codebase |
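
As an illustration of the minimal interface, the sketch below implements `Node` for three concrete types and reads `Children` directly on the concrete struct. Only `Node`, `NodeType`, and `ParagraphNode` appear in this document; `Text`, `Bold`, `Paragraph`, the other constants, and `restoreAll` are hypothetical names chosen for the example.

```go
package main

import (
	"fmt"
	"strings"
)

type NodeType string

const (
	ParagraphNode NodeType = "PARAGRAPH"
	TextNode      NodeType = "TEXT" // hypothetical constant
	BoldNode      NodeType = "BOLD" // hypothetical constant
)

type Node interface {
	Type() NodeType
	Restore() string
}

// Text is a leaf node holding raw text.
type Text struct{ Content string }

func (*Text) Type() NodeType    { return TextNode }
func (t *Text) Restore() string { return t.Content }

// Bold wraps its children in ** markers when restored.
type Bold struct{ Children []Node }

func (*Bold) Type() NodeType    { return BoldNode }
func (b *Bold) Restore() string { return "**" + restoreAll(b.Children) + "**" }

// Paragraph exposes Children as a plain field, with no navigation methods.
type Paragraph struct{ Children []Node }

func (*Paragraph) Type() NodeType    { return ParagraphNode }
func (p *Paragraph) Restore() string { return restoreAll(p.Children) }

// restoreAll concatenates the Markdown source of each node.
func restoreAll(nodes []Node) string {
	var sb strings.Builder
	for _, n := range nodes {
		sb.WriteString(n.Restore())
	}
	return sb.String()
}

func main() {
	p := &Paragraph{Children: []Node{
		&Text{Content: "Hello "},
		&Bold{Children: []Node{&Text{Content: "world"}}},
	}}
	for _, child := range p.Children { // direct field access
		fmt.Println(child.Type())
	}
	fmt.Println(p.Restore())
}
```
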
| 53 | + |
| 54 | +### 3. Stateless Parsers ✅ |
| 55 | + |
| 56 | +**Decision**: Each parser is independent and stateless |
| 57 | + |
| 58 | +**Rationale**: |
| 59 | +- **Simplicity**: No complex context management |
| 60 | +- **Debuggability**: Easy to test individual parsers |
| 61 | +- **Performance**: No context overhead |
| 62 | +- **Maintainability**: Clear separation of concerns |
| 63 | + |
| 64 | +**Alternative Considered**: Context-heavy parsing |
| 65 | +**Why Rejected**: Added complexity without clear benefits |
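
One way to picture stateless parsers is as values with no fields whose result depends only on the tokens passed in. The sketch below assumes a hypothetical `Parser` interface with a `Match` method that returns a node and the number of tokens consumed. The `Token`, `Node`, `BoldParser`, `TextParser`, and `Parse` names are illustrative and are not taken from gomark.

```go
package main

import "fmt"

// Token and Node are simplified, hypothetical types for this sketch.
type Token struct{ Type, Value string }

type Node struct{ Type, Content string }

// Parser is stateless: Match depends only on its argument and keeps
// nothing between calls.
type Parser interface {
	Match(tokens []Token) (*Node, int)
}

// BoldParser matches the token sequence * * TEXT * *.
type BoldParser struct{}

func (BoldParser) Match(tokens []Token) (*Node, int) {
	if len(tokens) < 5 {
		return nil, 0
	}
	opened := tokens[0].Type == "*" && tokens[1].Type == "*"
	closed := tokens[3].Type == "*" && tokens[4].Type == "*"
	if !opened || !closed || tokens[2].Type != "TEXT" {
		return nil, 0
	}
	return &Node{Type: "BOLD", Content: tokens[2].Value}, 5
}

// TextParser is the fallback: it turns any single token into text.
type TextParser struct{}

func (TextParser) Match(tokens []Token) (*Node, int) {
	if len(tokens) == 0 {
		return nil, 0
	}
	return &Node{Type: "TEXT", Content: tokens[0].Value}, 1
}

// Parse tries each parser in order at the current position and advances
// by the number of tokens the first match consumed.
func Parse(tokens []Token, parsers []Parser) []*Node {
	var nodes []*Node
	for len(tokens) > 0 {
		matched := false
		for _, p := range parsers {
			if n, size := p.Match(tokens); n != nil && size > 0 {
				nodes = append(nodes, n)
				tokens = tokens[size:]
				matched = true
				break
			}
		}
		if !matched {
			tokens = tokens[1:] // no parser matched: skip one token
		}
	}
	return nodes
}

func main() {
	tokens := []Token{
		{"*", "*"}, {"*", "*"}, {"TEXT", "hi"}, {"*", "*"}, {"*", "*"},
		{"TEXT", "!"},
	}
	for _, n := range Parse(tokens, []Parser{BoldParser{}, TextParser{}}) {
		fmt.Println(n.Type, n.Content)
	}
}
```

Since neither parser holds state, each can be tested in isolation by calling `Match` on a hand-built token slice.
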
| 66 | + |
| 67 | +### 4. String-Based Node Types ✅ |
| 68 | + |
| 69 | +**Decision**: Use `NodeType string` constants |
| 70 | + |
| 71 | +```go |
| 72 | +type NodeType string |
| 73 | +const ParagraphNode NodeType = "PARAGRAPH" |
| 74 | +``` |
| 75 | + |
| 76 | +**Rationale**: |
| 77 | +- **Debuggability**: Easy to inspect and debug |
| 78 | +- **Simplicity**: No complex type hierarchies |
| 79 | +- **Extensibility**: Easy to add new types |
| 80 | +- **JSON-Friendly**: Serializes naturally |
| 81 | + |
| 82 | +**Alternative Considered**: Interface-based type system |
| 83 | +**Why Rejected**: Unnecessary complexity for our needs |
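
The JSON-friendliness claim can be seen in a small sketch: a string-backed `NodeType` marshals as a readable string with no custom marshaler. The `nodeJSON` struct and `Encode` helper are hypothetical and exist only for this example.

```go
package main

import (
	"encoding/json"
	"fmt"
)

type NodeType string

const ParagraphNode NodeType = "PARAGRAPH"

// nodeJSON is a hypothetical wire shape used only in this sketch.
type nodeJSON struct {
	Type NodeType `json:"type"`
	Text string   `json:"text"`
}

// Encode marshals a node type and its text with the standard library.
// NodeType needs no MarshalJSON method because its underlying type is string.
func Encode(t NodeType, text string) (string, error) {
	b, err := json.Marshal(nodeJSON{Type: t, Text: text})
	if err != nil {
		return "", err
	}
	return string(b), nil
}

func main() {
	s, err := Encode(ParagraphNode, "hi")
	if err != nil {
		panic(err)
	}
	fmt.Println(s) // {"type":"PARAGRAPH","text":"hi"}
}
```
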
| 84 | + |
| 85 | +### 5. Configuration-Based Extensions ✅ |
| 86 | + |
| 87 | +**Decision**: Use configuration to enable/disable features |
| 88 | + |
| 89 | +**Rationale**: |
| 90 | +- **Performance**: Disabled features are never invoked, so they add no parsing work |
| 91 | +- **Flexibility**: Easy to customize for different use cases |
| 92 | +- **Maintainability**: Clear feature boundaries |
| 93 | +- **User-Friendly**: Simple API for configuration |
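
A minimal sketch of how configuration could gate features: the parser list is built once from the config, so a disabled feature never appears in it. The `Config` fields, `BlockParser`, and `BuildParsers` are assumptions for illustration and do not describe gomark's actual `config` package.

```go
package main

import "fmt"

// Config holds hypothetical feature flags.
type Config struct {
	EnableTables    bool
	EnableTaskLists bool
}

// BlockParser stands in for a real parser; only its name matters here.
type BlockParser struct{ Name string }

// BuildParsers returns the parsers to try, in order. Optional parsers are
// added only when enabled, and the paragraph fallback always comes last.
func BuildParsers(cfg Config) []BlockParser {
	parsers := []BlockParser{{"heading"}}
	if cfg.EnableTables {
		parsers = append(parsers, BlockParser{"table"})
	}
	if cfg.EnableTaskLists {
		parsers = append(parsers, BlockParser{"task-list"})
	}
	return append(parsers, BlockParser{"paragraph"})
}

func main() {
	for _, p := range BuildParsers(Config{EnableTables: true}) {
		fmt.Println(p.Name)
	}
}
```
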
| 94 | + |
| 95 | +### 6. Buffer-Based Rendering ✅ |
| 96 | + |
| 97 | +**Decision**: Use `bytes.Buffer` for output accumulation |
| 98 | + |
| 99 | +**Rationale**: |
| 100 | +- **Performance**: Efficient string building |
| 101 | +- **Memory**: Reusable buffers |
| 102 | +- **Simplicity**: Standard Go pattern |
| 103 | +- **Flexibility**: Easy to extend |
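
The rendering approach can be sketched with `bytes.Buffer` and a `sync.Pool` for reuse. The `Node` struct, the `render` and `RenderHTML` functions, and the HTML output shape are hypothetical. The example only shows the accumulation pattern, not gomark's renderer.

```go
package main

import (
	"bytes"
	"fmt"
	"html"
	"sync"
)

// Node is a simplified, hypothetical AST node for this sketch.
type Node struct {
	Type     string
	Content  string
	Children []*Node
}

// bufPool lets successive renders reuse buffers instead of allocating.
var bufPool = sync.Pool{
	New: func() interface{} { return new(bytes.Buffer) },
}

// render appends the HTML for n and its children to buf.
func render(buf *bytes.Buffer, n *Node) {
	switch n.Type {
	case "PARAGRAPH":
		buf.WriteString("<p>")
		for _, c := range n.Children {
			render(buf, c)
		}
		buf.WriteString("</p>")
	case "BOLD":
		buf.WriteString("<strong>")
		for _, c := range n.Children {
			render(buf, c)
		}
		buf.WriteString("</strong>")
	default: // treat anything else as escaped text
		buf.WriteString(html.EscapeString(n.Content))
	}
}

// RenderHTML borrows a buffer, renders into it, and returns the result.
// buf.String() copies the bytes, so returning the buffer to the pool is safe.
func RenderHTML(n *Node) string {
	buf := bufPool.Get().(*bytes.Buffer)
	buf.Reset()
	defer bufPool.Put(buf)
	render(buf, n)
	return buf.String()
}

func main() {
	doc := &Node{Type: "PARAGRAPH", Children: []*Node{
		{Type: "TEXT", Content: "a < b "},
		{Type: "BOLD", Children: []*Node{{Type: "TEXT", Content: "c"}}},
	}}
	fmt.Println(RenderHTML(doc))
}
```
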
| 104 | + |
| 105 | +## Package Organization |
| 106 | + |
| 107 | +### Public vs Internal |
| 108 | + |
| 109 | +**Public Packages** (goldmark-style): |
| 110 | +``` |
| 111 | +├── ast/ # AST definitions - users need access |
| 112 | +├── config/ # Configuration - users need to configure |
| 113 | +├── parser/ # Parser interfaces - users may extend |
| 114 | +├── renderer/ # Renderer interfaces - users may extend |
| 115 | +``` |
| 116 | + |
| 117 | +**Internal Implementation**: |
| 118 | +``` |
| 119 | +└── parser/internal/ # Parser implementations - users don't need access |
| 120 | +``` |
| 121 | + |
| 122 | +**Rationale**: |
| 123 | +- Public APIs allow extensibility where it matters |
| 124 | +- Internal packages keep implementation details hidden |
| 125 | +- Follows goldmark patterns for familiarity |
| 126 | + |
| 127 | +## Performance Optimizations |
| 128 | + |
| 129 | +### 1. Minimal Allocations |
| 130 | +- Reuse token slices where possible |
| 131 | +- Buffer pooling in renderers |
| 132 | +- Direct field access instead of method calls |
| 133 | + |
| 134 | +### 2. Single-Pass Processing |
| 135 | +- Tokenization is single-pass |
| 136 | +- No multiple traversals of input text |
| 137 | +- Direct token-to-AST conversion |
| 138 | + |
| 139 | +### 3. Focused Features |
| 140 | +- Only implement actually-used functionality |
| 141 | +- No complex tree operations unless needed |
| 142 | +- Disable unused extensions for zero overhead |
| 143 | + |
| 144 | +## Intentional Limitations |
| 145 | + |
| 146 | +These are **conscious decisions**, not oversights: |
| 147 | + |
| 148 | +### 1. HTML Attributes |
| 149 | +**Current**: Basic HTML tags without attributes |
| 150 | +**Rationale**: Complex attribute parsing adds significant complexity for minimal benefit |
| 151 | + |
| 152 | +### 2. Multi-Character Tokens |
| 153 | +**Current**: Single-character tokenization |
| 154 | +**Rationale**: Works for all supported Markdown features and keeps the implementation simpler |
| 155 | + |
| 156 | +### 3. Complex Tree Navigation |
| 157 | +**Current**: Direct field access only |
| 158 | +**Rationale**: No actual usage found in codebase analysis |
| 159 | + |
| 160 | +### 4. Parsing Context |
| 161 | +**Current**: Stateless parsers |
| 162 | +**Rationale**: Sufficient for the current feature set and much simpler |
| 163 | + |
| 164 | +## Recent Improvements |
| 165 | + |
| 166 | +### Fixed Blockquote Blank Lines (GitHub Issue #19) |
| 167 | +**Problem**: Blank lines in blockquotes weren't rendered correctly |
| 168 | +**Solution**: Enhanced `Blockquote.Restore()` to handle `LineBreak` nodes properly |
| 169 | +**Result**: Blank lines in blockquotes are now preserved |
| 170 | + |
| 171 | +### Package Refactoring |
| 172 | +**Problem**: Everything was in `internal/` packages |
| 173 | +**Solution**: Made key packages public for extensibility |
| 174 | +**Result**: goldmark-style architecture with better extensibility |
| 175 | + |
| 176 | +## Comparison with goldmark |
| 177 | + |
| 178 | +| Aspect | goldmark | gomark | |
| 179 | +|--------|----------|--------| |
| 180 | +| **Complexity** | High | Low | |
| 181 | +| **Performance** | Good | Excellent | |
| 182 | +| **Extensibility** | Very High | Moderate | |
| 183 | +| **Maintainability** | Moderate | High | |
| 184 | +| **Learning Curve** | Steep | Gentle | |
| 185 | +| **Feature Set** | Comprehensive | Focused | |
| 186 | + |
| 187 | +## When to Choose gomark |
| 188 | + |
| 189 | +✅ **Choose gomark when**: |
| 190 | +- You need fast, reliable markdown parsing |
| 191 | +- You want simple, maintainable code |
| 192 | +- You're building applications, not markdown libraries |
| 193 | +- You need good performance with moderate extensibility |
| 194 | + |
| 195 | +❌ **Choose goldmark when**: |
| 196 | +- You need maximum extensibility |
| 197 | +- You're building a markdown processing library |
| 198 | +- You need complex AST transformations |
| 199 | +- You need full CommonMark compliance, including edge cases |
| 200 | + |
| 201 | +## Future Evolution |
| 202 | + |
| 203 | +gomark is designed to evolve pragmatically: |
| 204 | + |
| 205 | +1. **Add features only when needed**: No speculative features |
| 206 | +2. **Maintain simplicity**: New features shouldn't complicate existing code |
| 207 | +3. **Performance first**: New features shouldn't hurt performance |
| 208 | +4. **Backward compatibility**: Changes should be additive |
| 209 | + |
| 210 | +### Potential Future Additions |
| 211 | + |
| 212 | +**Only if there's demonstrated need**: |
| 213 | +- AST walking API (if users request it) |
| 214 | +- More output formats (if users request them) |
| 215 | +- Advanced HTML attributes (if simple approach proves insufficient) |
| 216 | +- Text-based parsing (if token-based proves limiting) |
| 217 | + |
| 218 | +## Conclusion |
| 219 | + |
| 220 | +gomark represents a **pragmatic approach** to markdown parsing: |
| 221 | + |
| 222 | +- **Goldmark-inspired architecture** for familiarity and extensibility |
| 223 | +- **Performance-focused implementation** for real-world applications |
| 224 | +- **Simple, maintainable code** that developers can understand and modify |
| 225 | +- **Focused feature set** that solves real problems without over-engineering |
| 226 | + |
| 227 | +This approach delivers excellent performance and maintainability while providing enough extensibility for most real-world use cases. |