Tree representation of markdown documents

The tree representation of a markdown document is a graph representing a hierarchy, in which all nodes, except the root node, have one and only one parent (out-tree). When representing a markdown document, the header is used to create the root node. All headings are children of the root or of other headings, depending on the level of the heading, like the titles of chapters and sections in a book. The text blocks of the markdown become the children of the headings where the text is contained.

In LM markdown, a default header is created when the tree is formed if the document has none. Metadata are attached to the heading or text block that follows them. Metadata not followed by any text are attached to a newly created text block without content. Because of these modifications, unfolding the tree hierarchy and recreating the file may not reproduce the original document.

Tree creation

This module provides functionality to represent a Markdown document as a tree. The headings constitute the nodes of the tree (HeadingNode) and the text blocks constitute the leaves (TextNode). Metadata in the markdown document are converted to metadata properties of nodes. Each metadata block is interpreted as an annotation of the block that follows it, be it a heading or a text block.

In general, trees are not constructed manually but are built from blocks of parsed markdown:

blocks = load_markdown("my_markdown.md")
root = blocks_to_tree(blocks)

After working on the tree, it may be transformed back into a block list:

blocks = tree_to_blocks(root)
save_markdown("my_markdown.md", blocks)

The first block in the list must be a header or a metadata block, from which the root node of the tree is built. If no header block is present at the beginning of the list, one is created with default values. Other parent nodes are built from the heading blocks, and the leaf nodes from the text blocks. Except for the first block, metadata blocks are used to annotate the nodes with properties saved as metadata.

---  # This will become the metadata of the root node
title: The document  # This will also be the content of the root node
description: header
---

---  # This metadata block will annotate the heading that follows
summary: introductory words
---

# Introduction

This is the text of the introduction. The property 'summary' will by
default be applied to this text too, since it is a descendant of the
heading for which the metadata were defined.

Importantly, while a block list contains blocks of header/metadata, heading, and text types, a tree only has heading and text nodes. The root node is a heading node whose content is taken from the title property of the header metadata block.
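The reduction from three block types to two node types can be sketched as follows. This is a simplified, flat illustration that skips the hierarchy: `attach_metadata`, the tuple encoding of blocks, and the default title are all hypothetical, and the handling of two consecutive metadata blocks is an assumption derived from the rule that metadata without following text get an empty text block.

```python
def attach_metadata(blocks: list[tuple[str, object]]) -> list[dict]:
    """Pair each metadata block with the heading or text block after it."""
    # The first block supplies the root; a default header is created if absent
    if blocks and blocks[0][0] == "metadata":
        header, rest = dict(blocks[0][1]), blocks[1:]
    else:
        header, rest = {"title": "Title"}, blocks  # assumed default value
    nodes = [
        {"kind": "heading", "content": header.get("title", ""), "metadata": header}
    ]
    pending = None  # metadata waiting for the block that follows
    for kind, payload in rest:
        if kind == "metadata":
            if pending is not None:
                # Metadata without following text: attach to an empty text block
                nodes.append({"kind": "text", "content": "", "metadata": pending})
            pending = dict(payload)
        else:
            nodes.append({"kind": kind, "content": payload, "metadata": pending or {}})
            pending = None
    if pending is not None:
        nodes.append({"kind": "text", "content": "", "metadata": pending})
    return nodes


blocks = [
    ("metadata", {"title": "The document", "description": "header"}),
    ("metadata", {"summary": "introductory words"}),
    ("heading", "Introduction"),
    ("text", "This is the text of the introduction."),
    ("metadata", {"note": "trailing"}),
]
nodes = attach_metadata(blocks)
for node in nodes:
    print(node["kind"], repr(node["content"]), node["metadata"])
```

Every entry in the result is either a heading or a text item, and each carries the metadata of the block that preceded it.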

Main functions
  • blocks_to_tree() - Convert block list to tree
  • tree_to_blocks() - Convert tree to block list
  • load_tree() - Load markdown file/string into tree
  • save_tree() - Save tree to markdown file
  • serialize_tree() - Serialize tree to string
  • traverse_tree() - Apply function to each node with filtering
  • traverse_tree_nodetype() - Apply function to nodes of specific type
  • extract_content() - Extract information bottom-up from children to parents
  • propagate_content() - Propagate information top-down from parents to children
  • pre_order_traversal(), post_order_traversal() - Tree traversal utilities
  • fold_tree() - Accumulate value across tree
  • get_text_nodes(), get_heading_nodes() - Get node collections by type
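
The difference between top-down and bottom-up processing in the functions listed above can be illustrated with a small self-contained sketch. The `Node` class and the functions `pre_order`, `post_order`, and `fold` below are hypothetical and written only for this example; the signatures of the library functions may differ.

```python
from dataclasses import dataclass, field
from typing import Callable, Iterator


@dataclass
class Node:
    """Hypothetical minimal node used only for this illustration."""
    content: str
    children: list["Node"] = field(default_factory=list)


def pre_order(node: Node) -> Iterator[Node]:
    """Yield a node before its descendants (parents first)."""
    yield node
    for child in node.children:
        yield from pre_order(child)


def post_order(node: Node) -> Iterator[Node]:
    """Yield a node after its descendants (children first)."""
    for child in node.children:
        yield from post_order(child)
    yield node


def fold(node: Node, func: Callable[[int, Node], int], initial: int) -> int:
    """Accumulate a value across all nodes, visiting them in pre-order."""
    acc = initial
    for n in pre_order(node):
        acc = func(acc, n)
    return acc


root = Node("Doc", [Node("A", [Node("a1"), Node("a2")]), Node("B")])
pre = [n.content for n in pre_order(root)]
post = [n.content for n in post_order(root)]
total_chars = fold(root, lambda acc, n: acc + len(n.content), 0)
print(pre)   # ['Doc', 'A', 'a1', 'a2', 'B']
print(post)  # ['a1', 'a2', 'A', 'B', 'Doc']
print(total_chars)  # 9
```

A pre-order visit suits propagating information from parents to children, while a post-order visit suits collecting information from children into their parents.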

Behaviour

Functions report errors through a logger (passed as a LoggerBase parameter). Error blocks are reported to the logger, but processing continues. No exceptions are raised that cause side effects. Tree operations work through references, with side effects documented per function.

Note

The tree is a representation of the blocks in the block list, and not a copy of them. It is used to operate on the markdown data through side effects.
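
A minimal sketch of what operating through side effects means in practice is given below. The `Block` and `Node` classes are hypothetical simplifications; the only assumption carried over from the text is that a node keeps a reference to its block rather than a copy.

```python
from dataclasses import dataclass


@dataclass
class Block:
    """Hypothetical stand-in for a parsed markdown block."""
    content: str


class Node:
    """Hypothetical node that wraps a block without copying it."""

    def __init__(self, block: Block):
        self.block = block  # a reference to the block, not a copy

    def set_content(self, content: str) -> None:
        self.block.content = content


blocks = [Block("Introduction"), Block("Original text.")]
nodes = [Node(b) for b in blocks]

# Editing through the node changes the block in the original list
nodes[1].set_content("Edited through the tree.")
print(blocks[1].content)  # Edited through the tree.
```

When an independent copy is needed, the copy methods of the node classes are the appropriate tool.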

HeadingNode

Bases: MarkdownNode

Represents a heading node in the markdown tree structure.

Use accessor functions to read properties.

Source code in lmm/markdown/tree.py
class HeadingNode(MarkdownNode):
    """ Represents a heading node in the markdown tree structure.

    Use accessor functions to read properties.
    """
    def __init__(
        self,
        block: HeadingBlock | HeaderBlock,
        parent: "HeadingNode | None" = None,
    ):
        # this can only happen if type checks were ignored
        assert isinstance(
            block, (HeadingBlock, HeaderBlock)
        ), f"Invalid block type: {type(block)}"
        # type checker complains here, but it will enforce the
        # type at any subsequent access to the block data member,
        # simulating covariance
        self.block: HeadingBlock | HeaderBlock  # type: ignore
        super().__init__(block, parent)

    # Overrides
    @override
    def is_heading_node(self) -> bool:
        return True

    # Copy
    def naked_copy(self) -> 'HeadingNode':
        """Make a deep copy of this node and take it off the tree"""
        block_copy: HeadingBlock | HeaderBlock = (
            self.block.model_copy(deep=True)
        )
        new_node = HeadingNode(block_copy)
        new_node.metadata = copy.deepcopy(self.metadata)
        if self.metadata_block:
            new_node.metadata_block = self.metadata_block.model_copy(
                deep=True
            )

        return new_node

    def node_copy(self) -> 'HeadingNode':
        """Make a deep copy of this node, maintaining links to
        children. This creates a new branch root with reference to
        all children, but detached from the upper tree.
        """

        newnode = self.naked_copy()
        newnode.children = self.children
        return newnode

    def tree_copy(self) -> 'HeadingNode':
        """Make a deep copy of this node and its children
        (copy subtree)"""

        new_node = self.naked_copy()

        for child in self.children:
            child_copy = child.tree_copy()
            child_copy.parent = new_node
            new_node.children.append(child_copy)

        return new_node

    # Utility functions to retrieve basic properties
    def get_text_children(self) -> list['TextNode']:
        """Returns a list of children text nodes."""
        return [n for n in self.children if isinstance(n, TextNode)]

    def get_heading_children(self) -> list['HeadingNode']:
        """Returns a list of children heading nodes."""
        return [
            n for n in self.children if isinstance(n, HeadingNode)
        ]

    def heading_level(self) -> int:
        """Returns an integer for the heading level. The root
        node, if corresponding to a markdown header, has level 0."""
        if isinstance(self.block, HeadingBlock):
            return self.block.level
        return 0  # Root level for HeaderBlock

    def get_content(self) -> str:
        """Returns the title of the heading represented by the node"""
        if self.is_header_node():
            return self.get_metadata_string_for_key('title') or ""
        else:
            return str(self.block.get_content())

    def set_content(self, content: str) -> None:
        """Sets the text of the title of the heading represented
        by the node
        """
        match self.block:
            case HeaderBlock():
                self.set_metadata_for_key('title', content)
            case HeadingBlock():
                self.block.content = content

    def get_info(self, indent: int = 0) -> str:
        """
        Reports information about a node, including its type, content,
        and metadata.

        Args:
            indent: The indentation level for pretty printing
        """
        import yaml

        indent_str = "  " * indent

        # Collect block type and content
        info: str
        if self.is_header_node():
            info = "Header node:\n"
            info += f"{indent_str}Content: {self.get_content()}"
        else:
            info = "Heading node\n"
            info += (
                f"{indent_str}Heading (Level {self.heading_level()}):"
                + f" {self.get_content()}"
            )

        # Info on children
        if self.children:
            info += f"\nHas {self.count_children()} children, of "
            info += (
                f"which {len(self.get_text_children())} are "
                + "text children"
            )
        else:
            info += "\nNode with no children"

        # Collect metadata
        if self.metadata:
            info += (
                f"\n{indent_str}Metadata:"
                + f"\n{yaml.safe_dump(self.get_metadata())}"
            )

        if not info[-1] == '\n':
            info += "\n"
        return info

    def add_child(self, child_node: MarkdownNode) -> None:
        """
        Add a child node to this node.

        Args:
            child_node: The node to add as a child

        Note:
            one cannot add a heading node with a level equal or higher
            than that of the parent node. The level of the heading
            node is adjusted downwards automatically. Beyond the 
            lowest heading level, a text node is added.
        """
        if isinstance(child_node, HeadingNode):
            if self.heading_level() == LOWEST_HEADING_LEVEL:
                self.children.append(
                    TextNode.from_content(
                        content="#" * (LOWEST_HEADING_LEVEL + 1) + " "
                        + child_node.get_content(),
                        metadata=child_node.metadata,
                        parent=self,
                    )
                )
                return
            elif child_node.heading_level() <= self.heading_level():
                new_node = HeadingNode(
                    block=HeadingBlock(
                        level=self.heading_level() + 1,
                        content=child_node.get_content(),
                    )
                )
                new_node.metadata = child_node.metadata
                new_node.metadata_block = MetadataBlock(
                    content=child_node.metadata
                )
                child_node = new_node

        child_node.parent = self
        self.children.append(child_node)

add_child(child_node)

Add a child node to this node.

Parameters:

Name        Type          Description                 Default
child_node  MarkdownNode  The node to add as a child  required

Note

A heading node cannot be added with a level equal to or higher than that of the parent node: the level of the added heading is adjusted downwards automatically. Beyond the lowest heading level, a text node is added instead.

Source code in lmm/markdown/tree.py
def add_child(self, child_node: MarkdownNode) -> None:
    """
    Add a child node to this node.

    Args:
        child_node: The node to add as a child

    Note:
        one cannot add a heading node with a level equal or higher
        than that of the parent node. The level of the heading
        node is adjusted downwards automatically. Beyond the 
        lowest heading level, a text node is added.
    """
    if isinstance(child_node, HeadingNode):
        if self.heading_level() == LOWEST_HEADING_LEVEL:
            self.children.append(
                TextNode.from_content(
                    content="#" * (LOWEST_HEADING_LEVEL + 1) + " "
                    + child_node.get_content(),
                    metadata=child_node.metadata,
                    parent=self,
                )
            )
            return
        elif child_node.heading_level() <= self.heading_level():
            new_node = HeadingNode(
                block=HeadingBlock(
                    level=self.heading_level() + 1,
                    content=child_node.get_content(),
                )
            )
            new_node.metadata = child_node.metadata
            new_node.metadata_block = MetadataBlock(
                content=child_node.metadata
            )
            child_node = new_node

    child_node.parent = self
    self.children.append(child_node)

get_content()

Returns the title of the heading represented by the node

Source code in lmm/markdown/tree.py
def get_content(self) -> str:
    """Returns the title of the heading represented by the node"""
    if self.is_header_node():
        return self.get_metadata_string_for_key('title') or ""
    else:
        return str(self.block.get_content())

get_heading_children()

Returns a list of children heading nodes.

Source code in lmm/markdown/tree.py
def get_heading_children(self) -> list['HeadingNode']:
    """Returns a list of children heading nodes."""
    return [
        n for n in self.children if isinstance(n, HeadingNode)
    ]

get_info(indent=0)

Reports information about a node, including its type, content, and metadata.

Parameters:

Name    Type  Description                                Default
indent  int   The indentation level for pretty printing  0

Source code in lmm/markdown/tree.py
def get_info(self, indent: int = 0) -> str:
    """
    Reports information about a node, including its type, content,
    and metadata.

    Args:
        indent: The indentation level for pretty printing
    """
    import yaml

    indent_str = "  " * indent

    # Collect block type and content
    info: str
    if self.is_header_node():
        info = "Header node:\n"
        info += f"{indent_str}Content: {self.get_content()}"
    else:
        info = "Heading node\n"
        info += (
            f"{indent_str}Heading (Level {self.heading_level()}):"
            + f" {self.get_content()}"
        )

    # Info on children
    if self.children:
        info += f"\nHas {self.count_children()} children, of "
        info += (
            f"which {len(self.get_text_children())} are "
            + "text children"
        )
    else:
        info += "\nNode with no children"

    # Collect metadata
    if self.metadata:
        info += (
            f"\n{indent_str}Metadata:"
            + f"\n{yaml.safe_dump(self.get_metadata())}"
        )

    if not info[-1] == '\n':
        info += "\n"
    return info

get_text_children()

Returns a list of children text nodes.

Source code in lmm/markdown/tree.py
def get_text_children(self) -> list['TextNode']:
    """Returns a list of children text nodes."""
    return [n for n in self.children if isinstance(n, TextNode)]

heading_level()

Returns an integer for the heading level. The root node, if corresponding to a markdown header, has level 0.

Source code in lmm/markdown/tree.py
def heading_level(self) -> int:
    """Returns an integer for the heading level. The root
    node, if corresponding to a markdown header, has level 0."""
    if isinstance(self.block, HeadingBlock):
        return self.block.level
    return 0  # Root level for HeaderBlock

naked_copy()

Make a deep copy of this node and take it off the tree

Source code in lmm/markdown/tree.py
def naked_copy(self) -> 'HeadingNode':
    """Make a deep copy of this node and take it off the tree"""
    block_copy: HeadingBlock | HeaderBlock = (
        self.block.model_copy(deep=True)
    )
    new_node = HeadingNode(block_copy)
    new_node.metadata = copy.deepcopy(self.metadata)
    if self.metadata_block:
        new_node.metadata_block = self.metadata_block.model_copy(
            deep=True
        )

    return new_node

node_copy()

Make a deep copy of this node, maintaining links to children. This creates a new branch root with reference to all children, but detached from the upper tree.

Source code in lmm/markdown/tree.py
def node_copy(self) -> 'HeadingNode':
    """Make a deep copy of this node, maintaining links to
    children. This creates a new branch root with reference to
    all children, but detached from the upper tree.
    """

    newnode = self.naked_copy()
    newnode.children = self.children
    return newnode

set_content(content)

Sets the text of the title of the heading represented by the node

Source code in lmm/markdown/tree.py
def set_content(self, content: str) -> None:
    """Sets the text of the title of the heading represented
    by the node
    """
    match self.block:
        case HeaderBlock():
            self.set_metadata_for_key('title', content)
        case HeadingBlock():
            self.block.content = content

tree_copy()

Make a deep copy of this node and its children (copy subtree)

Source code in lmm/markdown/tree.py
def tree_copy(self) -> 'HeadingNode':
    """Make a deep copy of this node and its children
    (copy subtree)"""

    new_node = self.naked_copy()

    for child in self.children:
        child_copy = child.tree_copy()
        child_copy.parent = new_node
        new_node.children.append(child_copy)

    return new_node
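
The three copy methods differ in what they share with the original tree. The sketch below mirrors their logic on a hypothetical, stripped-down `Node` class (no blocks, no metadata), so it illustrates the sharing semantics rather than the library classes themselves.

```python
import copy


class Node:
    """Hypothetical minimal node mirroring the three copy variants."""

    def __init__(self, content: str):
        self.content = content
        self.parent: "Node | None" = None
        self.children: list["Node"] = []

    def add_child(self, child: "Node") -> None:
        child.parent = self
        self.children.append(child)

    def naked_copy(self) -> "Node":
        # Copy of the node alone: no parent, no children
        return Node(copy.deepcopy(self.content))

    def node_copy(self) -> "Node":
        # New branch root that refers to the very same list of children
        new = self.naked_copy()
        new.children = self.children
        return new

    def tree_copy(self) -> "Node":
        # Independent copy of the whole subtree
        new = self.naked_copy()
        for child in self.children:
            new.add_child(child.tree_copy())
        return new


root = Node("Doc")
section = Node("Section")
root.add_child(section)
section.add_child(Node("Paragraph"))

naked = section.naked_copy()
shared = section.node_copy()
deep = section.tree_copy()

print(naked.parent, naked.children)              # None []
print(shared.children is section.children)       # True
print(deep.children[0] is section.children[0])   # False
```

Because the copy made by `node_copy` shares its list of children with the original, the children still point to the original node as their parent, and appending to one list is visible through the other. `tree_copy` is the variant to use when the copy must be fully independent.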

MarkdownNode

Bases: ABC

Represents a node in the markdown tree structure.

Each node contains a markdown block (heading or text), a reference to its parent, a list of children, and associated metadata. Both heading and text blocks have textual content.

Use accessor functions to read properties of the node:

  • get_metadata()/fetch_metadata()
  • get_metadata_for_key()/fetch_metadata_for_key()
  • get_metadata_string_for_key()/fetch_metadata_string_for_key()

The names of the functions indicate use:

  • get_* - node's own metadata
  • fetch_* - with inheritance
  • *_string_* - returns string representation

Source code in lmm/markdown/tree.py
class MarkdownNode(ABC):
    """
    Represents a node in the markdown tree structure.

    Each node contains a markdown block (heading or text), a reference
    to its parent, a list of children, and associated metadata. Both
    heading and text blocks have textual content.

    Use accessor functions to read properties of the node:

    - get_metadata()/fetch_metadata()
    - get_metadata_for_key()/fetch_metadata_for_key()
    - get_metadata_string_for_key()/fetch_metadata_string_for_key()

    The names of the functions indicate use:

    - get_* - node's own metadata
    - fetch_* - with inheritance
    - *_string_* - returns string representation
    """

    def __init__(
        self, block: Block, parent: 'HeadingNode | None' = None
    ):
        """
        Initialize a new MarkdownNode.

        Args:
            block: The original block (heading or text)
            parent: The parent node (None for root)
        """
        self.block: Block = block  # Original block
        self.parent: 'HeadingNode | None' = parent  # Parent node
        self.children: list['MarkdownNode'] = []  # Child nodes
        self.metadata: MetadataDict = {}  # Associated metadata
        # Original metadata block. This is a private immutable data
        # member that is used at serialization to reconstitute the
        # original markdown. Changes in self.metadata will be
        # incorporated in the original metadata block when the block
        # is serialized.
        self.metadata_block: HeaderBlock | MetadataBlock | None = None

    # Copy
    @abstractmethod
    def naked_copy(self) -> Self:
        """Make a deep copy of this node and take it off the tree."""
        pass

    @abstractmethod
    def node_copy(self) -> Self:
        """Make a copy of this node, but keep links to children."""
        pass

    @abstractmethod
    def tree_copy(self) -> Self:
        """Make a deep copy of this node and its children
        (copy subtree)."""
        pass

    # Utility functions to retrieve basic properties
    def is_header_node(self) -> bool:
        """A node initialized from a header block."""
        return self.metadata_block is not None and isinstance(
            self.metadata_block, HeaderBlock
        )

    def is_root_node(self) -> bool:
        """A node w/o parents, not necessarily a header."""
        return self.parent is None

    def is_text_node(self) -> bool:
        "Is this a TextNode"
        return False

    def is_heading_node(self) -> bool:
        "Is this a HeadingNode"
        return False

    @abstractmethod
    def get_text_children(self) -> list['TextNode']:
        pass

    @abstractmethod
    def get_heading_children(self) -> list['HeadingNode']:
        pass

    def get_parent(self) -> 'HeadingNode | None':
        return self.parent

    def count_children(self) -> int:
        return len(self.children)

    @abstractmethod
    def heading_level(self) -> int | None:
        pass

    # Content and metadata
    @abstractmethod
    def get_content(self) -> str:
        """Returns text of headings or of text nodes."""
        pass

    @abstractmethod
    def set_content(self, content: str) -> None:
        """Set text of headings or of text nodes."""
        pass

    def has_metadata_key(
        self,
        key: str,
        inherit: bool = False,
        include_header: bool = False,
    ) -> bool:
        if inherit:
            return (
                self.fetch_metadata_for_key(key, include_header)
                is not None
            )
        return bool(self.metadata) and (key in self.metadata)

    def get_metadata(self, key: str | None = None) -> MetadataDict:
        """
        Get the metadata of the current node. For the header node,
        the document header is the metadata.

        Returns:
            a conformant dictionary.
        """
        if not key:
            return copy.deepcopy(self.metadata)
        elif self.metadata and key in self.metadata:
            return {key: self.metadata[key]}
        return {}

    def get_metadata_for_key(
        self, key: str, default: MetadataValue = None
    ) -> MetadataValue:
        """
        Get the key value in the metadata of the current node. For the
        root node, the header value for that key is returned.

        Args:
            key: the key for which the metadata is searched
            default: a default value if the key is absent

        Returns:
            the key value, or a default value if the key is not
                found (None if no default specified).
        """
        if key in self.metadata:
            return self.metadata[key]
        return default

    def get_metadata_string_for_key(
        self, key: str, default: str | None = None
    ) -> str | None:
        """
        Get the string representation of the key value in the
        metadata of the current node. If the value is a dict
        or list, return None. If the key is not present, return
        a default value. For the root node, the header value for
        that key is returned.

        Args:
            key: the key for which the metadata is searched
            default: a default value if the key is absent

        Returns:
            the key value, a default value if the key is not
                found (None if no default specified). If the value
                is not a primitive value of the int, float, str, or
                bool type, returns None.
        """
        if key in self.metadata:
            value: MetadataValue = self.metadata[key]
            if is_metadata_primitive(value):
                return str(value)
            else:
                return None
        return default

    def set_metadata_for_key(
        self, key: str, value: MetadataValue
    ) -> None:
        """Set the metadata value at key of the current node.

        Args:
            key: the key where the value should be set
            value: the metadata value
        """
        self.metadata[key] = value

    def fetch_metadata(
        self, key: str | None = None, include_header: bool = True
    ) -> MetadataDict:
        """
        Returns the effective metadata for this node by traversing up
        the tree if necessary to find inherited metadata. The metadata
        are those of the first node with metadata. If a key is given,
        only a dictionary with that key will be returned. If
        include_header is False, the header node is not considered.

        Args:
            key: the key for which the metadata is searched
            include_header: if the header metadata should be included
                in the search

        Returns:
            A dictionary giving the effective metadata for this node
        """

        if not key:
            if self.metadata:
                if not include_header and self.is_header_node():
                    return {}
                return self.metadata.copy()
            elif self.parent:
                return self.parent.fetch_metadata(
                    None,
                    include_header,
                )
            return {}
        else:
            value = self.fetch_metadata_for_key(key, include_header)
            return {key: value} if value else {}

    def fetch_metadata_for_key(
        self,
        key: str,
        include_header: bool = True,
        default: MetadataValue = None,
    ) -> MetadataValue:
        """
        Returns the value for a specific metadata key by traversing up
        the tree if necessary to find inherited metadata. If
        include_header is False, the header node is not considered.

        This function extends the concept of metadata inheritance to
        look for a specific key in the node's metadata or its
        ancestors' metadata.

        Args:
            key: The specific metadata key to look for
            include_header: If to include the header in the search
            default: the value to return if the key is not found

        Returns:
            The value for the specified key, or a default value if not
            found in the node or any of its ancestors (or None if no
            default was specified).
        """

        if not key:
            return default

        if self.metadata and key in self.metadata:
            if not include_header and self.is_header_node():
                return default
            return self.metadata[key]
        elif self.parent:
            return self.parent.fetch_metadata_for_key(
                key, include_header, default
            )
        return default

    def fetch_metadata_string_for_key(
        self,
        key: str,
        include_header: bool = True,
        default: str | None = None,
    ) -> str | None:
        """
        Returns the string representation of the value of a specific
        metadata key by traversing up the tree if necessary to find
        inherited metadata. If the key is not present, return a
        default value. If the value is a dict or list, return None. If
        include_header is False, the header node is not considered.

        This function extends the concept of metadata inheritance to
        look for a specific key in the node's metadata or its
        ancestors' metadata.

        Args:
            key: The specific metadata key to look for
            include_header: If to include the header in the search
            default: the value to return if the key is not found

        Returns:
            The value for the specified key, or a default value if not
            found in the node or any of its ancestors (or None if no
            default was specified). If the value is not a primitive
            value of the int, float, str, or bool type returns None.
        """

        if not key:
            return default

        if self.metadata and key in self.metadata:
            if not include_header and self.is_header_node():
                return default
            return self.get_metadata_string_for_key(key)
        elif self.parent:
            return self.parent.fetch_metadata_string_for_key(
                key, include_header, default
            )
        return default

    def as_dict(
        self, inherit: bool = False, include_header: bool = True
    ) -> NodeDict:
        """
        Return a dictionary representation of the node.

        Args:
            inherit: if True, use the metadata of the nearest node
                with metadata (this node or an ancestor)
            include_header: if False, the header is not considered as
                a metadata source.

        Returns: a dictionary with keys 'content' and
        'metadata'.
        """
        if not inherit:
            return {
                'content': self.get_content(),
                'metadata': self.get_metadata(),
            }
        else:
            return {
                'content': self.get_content(),
                'metadata': self.fetch_metadata(None, include_header),
            }

    @abstractmethod
    def get_info(self) -> str:
        """Return a string rpresentation of the node with info
        on its properties"""
        pass

__init__(block, parent=None)

Initialize a new MarkdownNode.

Parameters:

Name    Type                Description                           Default
block   Block               The original block (heading or text)  required
parent  HeadingNode | None  The parent node (None for root)       None

Source code in lmm/markdown/tree.py
def __init__(
    self, block: Block, parent: 'HeadingNode | None' = None
):
    """
    Initialize a new MarkdownNode.

    Args:
        block: The original block (heading or text)
        parent: The parent node (None for root)
    """
    self.block: Block = block  # Original block
    self.parent: 'HeadingNode | None' = parent  # Parent node
    self.children: list['MarkdownNode'] = []  # Child nodes
    self.metadata: MetadataDict = {}  # Associated metadata
    # Original metadata block. This is a private immutable data
    # member that is used at serialization to reconstitute the
    # original markdown. Changes in self.metadata will be
    # incorporated in the original metadata block when the block
    # is serialized.
    self.metadata_block: HeaderBlock | MetadataBlock | None = None

as_dict(inherit=False, include_header=True)

Return a dictionary representation of the node.

Parameters:

Name Type Description Default
inherit bool

if True, return the metadata of the first node with metadata, searching from this node up the tree

False
include_header bool

if False, the header is not considered as a metadata source.

True

Returns: a dictionary with keys 'content' and 'metadata'.

Source code in lmm/markdown/tree.py
def as_dict(
    self, inherit: bool = False, include_header: bool = True
) -> NodeDict:
    """
    Return a dictionary representation of the node.

    Args:
        inherit: if True, return the metadata of the first node
            with metadata, searching from this node up the tree
        include_header: if False, the header is not considered as
            a metadata source.

    Returns: a dictionary with keys 'content' and
    'metadata'.
    """
    if not inherit:
        return {
            'content': self.get_content(),
            'metadata': self.get_metadata(),
        }
    else:
        return {
            'content': self.get_content(),
            'metadata': self.fetch_metadata(None, include_header),
        }

fetch_metadata(key=None, include_header=True)

Returns the effective metadata for this node by traversing up the tree if necessary to find inherited metadata. The metadata are those of the first node with metadata. If a key is given, only a dictionary with that key will be returned. If include_header is False, the header node is not considered.

Parameters:

Name Type Description Default
key str | None

the key for which the metadata is searched

None
include_header bool

if the header metadata should be included in the search

True

Returns:

Type Description
MetadataDict

A dictionary giving the effective metadata for this node

Source code in lmm/markdown/tree.py
def fetch_metadata(
    self, key: str | None = None, include_header: bool = True
) -> MetadataDict:
    """
    Returns the effective metadata for this node by traversing up
    the tree if necessary to find inherited metadata. The metadata
    are those of the first node with metadata. If a key is given,
    only a dictionary with that key will be returned. If
    include_header is False, the header node is not considered.

    Args:
        key: the key for which the metadata is searched
        include_header: if the header metadata should be included
            in the search

    Returns:
        A dictionary giving the effective metadata for this node
    """

    if not key:
        if self.metadata:
            if not include_header and self.is_header_node():
                return {}
            return self.metadata.copy()
        elif self.parent:
            return self.parent.fetch_metadata(
                None,
                include_header,
            )
        return {}
    else:
        value = self.fetch_metadata_for_key(key, include_header)
        return {key: value} if value else {}
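
The lookup rule in the listing above can be sketched with a small stand-in class. `SketchNode` is hypothetical and is not the library's `MarkdownNode`; it only mirrors the branch structure shown in the listing, and its `is_header` flag is an assumption that stands in for `is_header_node()`.

```python
class SketchNode:
    """Hypothetical stand-in for a tree node (not the lmm class)."""

    def __init__(self, metadata=None, parent=None, is_header=False):
        self.metadata = dict(metadata or {})
        self.parent = parent
        self.is_header = is_header  # assumption: replaces is_header_node()

    def fetch_metadata(self, include_header=True):
        # The first node with non-empty metadata wins; dictionaries
        # found further up the tree are not merged in.
        if self.metadata:
            if not include_header and self.is_header:
                return {}
            return self.metadata.copy()
        if self.parent is not None:
            return self.parent.fetch_metadata(include_header)
        return {}


root = SketchNode({"title": "Doc"}, is_header=True)
section = SketchNode({"topic": "trees"}, parent=root)
bare = SketchNode(parent=section)   # no metadata of its own
orphan = SketchNode(parent=root)    # no metadata, direct child of root

print(bare.fetch_metadata())                        # taken from section
print(orphan.fetch_metadata(include_header=False))  # header excluded
```

Judging from the listing, a keyed call whose value is falsy (0, False, or an empty string) also yields an empty dictionary, because the result is built with an `if value` test.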

fetch_metadata_for_key(key, include_header=True, default=None)

Returns the value for a specific metadata key by traversing up the tree if necessary to find inherited metadata. If include_header is False, the header node is not considered.

This function extends the concept of metadata inheritance to look for a specific key in the node's metadata or its ancestors' metadata.

Parameters:

Name Type Description Default
key str

The specific metadata key to look for

required
include_header bool

Whether to include the header in the search

True
default MetadataValue

the value to return if the key is not found

None

Returns:

Type Description
MetadataValue

The value for the specified key, or a default value if not found in the node or any of its ancestors (or None if no default was specified).

Source code in lmm/markdown/tree.py
def fetch_metadata_for_key(
    self,
    key: str,
    include_header: bool = True,
    default: MetadataValue = None,
) -> MetadataValue:
    """
    Returns the value for a specific metadata key by traversing up
    the tree if necessary to find inherited metadata. If
    include_header is False, the header node is not considered.

    This function extends the concept of metadata inheritance to
    look for a specific key in the node's metadata or its
    ancestors' metadata.

    Args:
        key: The specific metadata key to look for
        include_header: Whether to include the header in the search
        default: the value to return if the key is not found

    Returns:
        The value for the specified key, or a default value if not
        found in the node or any of its ancestors (or None if no
        default was specified).
    """

    if not key:
        return default

    if self.metadata and key in self.metadata:
        if not include_header and self.is_header_node():
            return default
        return self.metadata[key]
    elif self.parent:
        return self.parent.fetch_metadata_for_key(
            key, include_header, default
        )
    return default
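
Unlike the whole-dictionary lookup, the keyed lookup keeps climbing until some node actually has the key, so different keys can resolve at different ancestors. The following is a minimal sketch under assumptions: `KeyNode` and `fetch_for_key` are hypothetical names, and the `is_header` flag stands in for `is_header_node()`.

```python
class KeyNode:
    """Hypothetical stand-in for a tree node (not the lmm class)."""

    def __init__(self, metadata=None, parent=None, is_header=False):
        self.metadata = dict(metadata or {})
        self.parent = parent
        self.is_header = is_header  # assumption: replaces is_header_node()

    def fetch_for_key(self, key, include_header=True, default=None):
        if not key:
            return default
        if key in self.metadata:
            # A hit on the header is discarded when the header is excluded.
            if not include_header and self.is_header:
                return default
            return self.metadata[key]
        if self.parent is not None:
            return self.parent.fetch_for_key(key, include_header, default)
        return default


header = KeyNode({"author": "A. Writer", "lang": "en"}, is_header=True)
chapter = KeyNode({"lang": "it"}, parent=header)
paragraph = KeyNode({"note": "draft"}, parent=chapter)

print(paragraph.fetch_for_key("lang"))    # resolved at the chapter
print(paragraph.fetch_for_key("author"))  # resolved at the header
print(paragraph.fetch_for_key("author", include_header=False, default="n/a"))
```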

fetch_metadata_string_for_key(key, include_header=True, default=None)

Returns the string representation of the value of a specific metadata key by traversing up the tree if necessary to find inherited metadata. If the key is not present, return a default value. If the value is a dict or list, return None. If include_header is False, the header node is not considered.

This function extends the concept of metadata inheritance to look for a specific key in the node's metadata or its ancestors' metadata.

Parameters:

Name Type Description Default
key str

The specific metadata key to look for

required
include_header bool

Whether to include the header in the search

True
default str | None

the value to return if the key is not found

None

Returns:

Type Description
str | None

The value for the specified key, or a default value if not found in the node or any of its ancestors (or None if no default was specified). If the value is not a primitive value of the int, float, str, or bool type, returns None.

Source code in lmm/markdown/tree.py
def fetch_metadata_string_for_key(
    self,
    key: str,
    include_header: bool = True,
    default: str | None = None,
) -> str | None:
    """
    Returns the string representation of the value of a specific
    metadata key by traversing up the tree if necessary to find
    inherited metadata. If the key is not present, return a
    default value. If the value is a dict or list, return None. If
    include_header is False, the header node is not considered.

    This function extends the concept of metadata inheritance to
    look for a specific key in the node's metadata or its
    ancestors' metadata.

    Args:
        key: The specific metadata key to look for
        include_header: Whether to include the header in the search
        default: the value to return if the key is not found

    Returns:
        The value for the specified key, or a default value if not
        found in the node or any of its ancestors (or None if no
        default was specified). If the value is not a primitive
        value of the int, float, str, or bool type, returns None.
    """

    if not key:
        return default

    if self.metadata and key in self.metadata:
        if not include_header and self.is_header_node():
            return default
        return self.get_metadata_string_for_key(key)
    elif self.parent:
        return self.parent.fetch_metadata_string_for_key(
            key, include_header, default
        )
    return default

get_content() abstractmethod

Returns text of headings or of text nodes.

Source code in lmm/markdown/tree.py
@abstractmethod
def get_content(self) -> str:
    """Returns text of headings or of text nodes."""
    pass

get_info() abstractmethod

Return a string representation of the node with info on its properties

Source code in lmm/markdown/tree.py
@abstractmethod
def get_info(self) -> str:
    """Return a string representation of the node with info
    on its properties"""
    pass

get_metadata(key=None)

Get the metadata of the current node. For the header node, the document header is the metadata.

Returns:

Type Description
MetadataDict

a conformant dictionary.

Source code in lmm/markdown/tree.py
def get_metadata(self, key: str | None = None) -> MetadataDict:
    """
    Get the metadata of the current node. For the header node,
    the document header is the metadata.

    Returns:
        a conformant dictionary.
    """
    if not key:
        return copy.deepcopy(self.metadata)
    elif self.metadata and key in self.metadata:
        return {key: self.metadata[key]}
    return {}
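
The two branches of the listing above differ in how much they copy: without a key the whole dictionary is deep-copied, while with a key the returned dictionary is new but its value is the same object held by the node. A minimal sketch of that difference follows; `get_metadata_sketch` is a hypothetical free function that operates on a plain dict rather than on a node.

```python
import copy


def get_metadata_sketch(metadata, key=None):
    """Hypothetical free-function version of the listing above."""
    if not key:
        return copy.deepcopy(metadata)
    if key in metadata:
        return {key: metadata[key]}
    return {}


store = {"tags": ["a"]}

full = get_metadata_sketch(store)
full["tags"].append("b")      # deep copy: store is unaffected

single = get_metadata_sketch(store, "tags")
single["tags"].append("c")    # same list object: store sees the change

print(store)
print(full)
```

Callers that intend to mutate a list or dict value obtained through the keyed form may want to copy it first.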

get_metadata_for_key(key, default=None)

Get the key value in the metadata of the current node. For the root node, the header value for that key is returned.

Parameters:

Name Type Description Default
key str

the key for which the metadata is searched

required
default MetadataValue

a default value if the key is absent

None

Returns:

Type Description
MetadataValue

the key value, or a default value if the key is not found (None if no default specified).

Source code in lmm/markdown/tree.py
def get_metadata_for_key(
    self, key: str, default: MetadataValue = None
) -> MetadataValue:
    """
    Get the key value in the metadata of the current node. For the
    root node, the header value for that key is returned.

    Args:
        key: the key for which the metadata is searched
        default: a default value if the key is absent

    Returns:
        the key value, or a default value if the key is not
            found (None if no default specified).
    """
    if key in self.metadata:
        return self.metadata[key]
    return default

get_metadata_string_for_key(key, default=None)

Get the string representation of the key value in the metadata of the current node. If the value is a dict or list, return None. If the key is not present, return a default value. For the root node, the header value for that key is returned.

Parameters:

Name Type Description Default
key str

the key for which the metadata is searched

required
default str | None

a default value if the key is absent

None

Returns:

Type Description
str | None

the key value, a default value if the key is not found (None if no default specified). If the value is not a primitive value of the int, float, str, or bool type, returns None.

Source code in lmm/markdown/tree.py
def get_metadata_string_for_key(
    self, key: str, default: str | None = None
) -> str | None:
    """
    Get the string representation of the key value in the
    metadata of the current node. If the value is a dict
    or list, return None. If the key is not present, return
    a default value. For the root node, the header value for
    that key is returned.

    Args:
        key: the key for which the metadata is searched
        default: a default value if the key is absent

    Returns:
        the key value, a default value if the key is not
            found (None if no default specified). If the value
            is not a primitive value of the int, float, str, or
            bool type, returns None.
    """
    if key in self.metadata:
        value: MetadataValue = self.metadata[key]
        if is_metadata_primitive(value):
            return str(value)
        else:
            return None
    return default
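
The string conversion distinguishes three cases: a primitive value is converted with `str`, a present but non-primitive value gives None (not the default), and only a missing key gives the default. The sketch below illustrates this on a plain dict. Both `is_primitive` and `metadata_string` are hypothetical names; `is_primitive` is an assumption about what `is_metadata_primitive` checks, based on the documented list of int, float, str, and bool.

```python
from typing import Any, Dict, Optional


def is_primitive(value: Any) -> bool:
    # Assumption: mirrors the documented rule (int, float, str, bool).
    return isinstance(value, (int, float, str, bool))


def metadata_string(
    metadata: Dict[str, Any], key: str, default: Optional[str] = None
) -> Optional[str]:
    """Hypothetical helper: string form of a metadata value."""
    if key in metadata:
        value = metadata[key]
        return str(value) if is_primitive(value) else None
    return default


meta = {"page": 3, "draft": True, "tags": ["a", "b"]}

print(metadata_string(meta, "page"))          # primitive, converted
print(metadata_string(meta, "tags", "?"))     # present but not primitive
print(metadata_string(meta, "missing", "?"))  # absent, default returned
```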

is_header_node()

A node initialized from a header block.

Source code in lmm/markdown/tree.py
def is_header_node(self) -> bool:
    """A node initialized from a header block."""
    return self.metadata_block is not None and isinstance(
        self.metadata_block, HeaderBlock
    )

is_heading_node()

Is this a HeadingNode

Source code in lmm/markdown/tree.py
def is_heading_node(self) -> bool:
    "Is this a HeadingNode"
    return False

is_root_node()

A node w/o parents, not necessarily a header.

Source code in lmm/markdown/tree.py
def is_root_node(self) -> bool:
    """A node w/o parents, not necessarily a header."""
    return self.parent is None

is_text_node()

Is this a TextNode

Source code in lmm/markdown/tree.py
def is_text_node(self) -> bool:
    "Is this a TextNode"
    return False

naked_copy() abstractmethod

Make a deep copy of this node and take it off the tree.

Source code in lmm/markdown/tree.py
@abstractmethod
def naked_copy(self) -> Self:
    """Make a deep copy of this node and take it off the tree."""
    pass

node_copy() abstractmethod

Make a copy of this node, but keep links to children.

Source code in lmm/markdown/tree.py
@abstractmethod
def node_copy(self) -> Self:
    """Make a copy of this node, but keep links to children."""
    pass

set_content(content) abstractmethod

Set text of headings or of text nodes.

Source code in lmm/markdown/tree.py
@abstractmethod
def set_content(self, content: str) -> None:
    """Set text of headings or of text nodes."""
    pass

set_metadata_for_key(key, value)

Set the metadata value at key of the current node.

Parameters:

Name Type Description Default
key str

the key where the value should be set

required
value MetadataValue

the metadata value

required
Source code in lmm/markdown/tree.py
def set_metadata_for_key(
    self, key: str, value: MetadataValue
) -> None:
    """Set the metadata value at key of the current node.

    Args:
        key: the key where the value should be set
        value: the metadata value
    """
    self.metadata[key] = value

tree_copy() abstractmethod

Make a deep copy of this node and its children (copy subtree).

Source code in lmm/markdown/tree.py
@abstractmethod
def tree_copy(self) -> Self:
    """Make a deep copy of this node and its children
    (copy subtree)."""
    pass

NodeDict

Bases: TypedDict

A dictionary representation of a node.

Fields

content (str): the string content of the node (the title for a heading node, the text of a text node)
metadata (MetadataDict): the metadata

Source code in lmm/markdown/tree.py
class NodeDict(TypedDict):
    """A dictionary representation of a node.

    Fields:
        content (str): the string content of the node (the title for
            a heading node, the text of a text node)
        metadata (MetadataDict): the metadata
    """

    content: str
    metadata: MetadataDict
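
As an illustration of how such a dictionary might be consumed, the sketch below redeclares the type locally so that it runs on its own. The `MetadataDict` alias is an assumption (a plain string-keyed dict), and `describe` is a hypothetical helper, not part of the module.

```python
from typing import Any, Dict, TypedDict

# Assumption: a stand-in for the library's MetadataDict alias.
MetadataDict = Dict[str, Any]


class NodeDict(TypedDict):
    """Local redeclaration of the two documented fields."""

    content: str
    metadata: MetadataDict


def describe(node: NodeDict) -> str:
    """Hypothetical helper: one-line summary of a node dictionary."""
    keys = ", ".join(sorted(node["metadata"])) or "none"
    return f"{node['content']!r} (metadata keys: {keys})"


entry: NodeDict = {"content": "Introduction", "metadata": {"lang": "en"}}
print(describe(entry))
```

At runtime a `TypedDict` is an ordinary dict, so the field declarations are only enforced by a static type checker.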

TextNode

Bases: MarkdownNode

Represents a text node in the markdown tree.

Use accessor functions to retrieve properties.

Source code in lmm/markdown/tree.py
class TextNode(MarkdownNode):
    """Represents a text node in the markdown tree.

    Use accessor functions to retrieve properties.
    """
    def __init__(
        self,
        block: TextBlock | ErrorBlock,
        parent: "HeadingNode | None" = None,
    ):
        # this can only happen if type checks were ignored
        assert isinstance(
            block, (TextBlock, ErrorBlock)
        ), f"Invalid block type: {type(block)}"
        # type checker complains here, but it will enforce the
        # type at any subsequent access to the block data member,
        # simulating covariance
        self.block: TextBlock | ErrorBlock  # type: ignore
        super().__init__(block, parent)

    @staticmethod
    def from_content(
        content: str,
        metadata: MetadataDict | None = None,
        parent: HeadingNode | None = None,
    ) -> 'TextNode':
        """Create a text node from content and metadata.

        Args:
            content: the text content of the node
            metadata: a dictionary with metadata (optional)
            parent: if not None, the parent heading of the node

        Returns: 
            a text node.    
        """
        newnode = TextNode(
            block=TextBlock.from_text(content), parent=parent
        )
        if metadata:
            newnode.metadata = metadata
        return newnode

    # Overrides
    @override
    def is_text_node(self) -> bool:
        return True

    # Copy
    def naked_copy(self) -> 'TextNode':
        """Make a deep copy of this node and take it off the tree"""
        block_copy: TextBlock | ErrorBlock = self.block.model_copy(
            deep=True
        )
        new_node = TextNode(block_copy)
        new_node.metadata = copy.deepcopy(self.metadata)
        if self.metadata_block:
            new_node.metadata_block = self.metadata_block.model_copy(
                deep=True
            )

        return new_node

    def node_copy(self) -> 'TextNode':
        """Make a deep copy of this node (same as naked_copy for
        TextNodes)."""

        return self.naked_copy()

    def tree_copy(self) -> 'TextNode':
        """Make a deep copy of this node"""

        # Copy self
        return self.naked_copy()

    # Utility functions to retrieve basic properties
    def get_text_children(self) -> list['TextNode']:
        """Always returns an empty list for text nodes."""
        return []  # Text nodes have no children

    def get_heading_children(self) -> list['HeadingNode']:
        """Always returns an empty list for text nodes."""
        return []  # Text nodes have no children

    def heading_level(self) -> None:
        """Returns None as text nodes have no level."""
        return None  # Text nodes don't have levels

    def get_content(self) -> str:
        """Returns the content of the markdown text represented by
        the node."""
        return self.block.get_content()

    def set_content(self, content: str) -> None:
        """Set the text content of the node.

        Args:
            content: the new text content.
        """
        self.block.content = content

    def get_metadata(self, key: str | None = None) -> MetadataDict:
        """
        Get the metadata of the current node. If a key is given and
        the node lacks it, the value from the parent heading is used.

        Args:
            key: the key for the metadata value to retrieve. If None,
                the whole dictionary.

        Returns:
            a conformant dictionary.

        Note: 
            in most cases, text will be annotated at the heading 
            level, not at individual paragraphs. The metadata of text
            nodes overrides the default metadata from the heading.
        """
        if not key:
            return copy.deepcopy(self.metadata)
        elif self.metadata and key in self.metadata:
            return {key: self.metadata[key]}
        elif (
            self.parent is not None
            and self.parent.metadata
            and key in self.parent.metadata
        ):
            return {key: self.parent.metadata[key]}
        return {}

    def get_metadata_for_key(
        self, key: str, default: MetadataValue = None
    ) -> MetadataValue:
        """
        Get the key value in the metadata of the current node. If the
        key is not in the node, return the value from the parent heading.

        Args:
            key: the key for which the metadata is searched
            default: a default value if the key is absent

        Returns:
            the key value, or a default value if the key is not
                found (None if no default specified).

        Note: 
            in most cases, text will be annotated at the heading 
            level, not at individual paragraphs. The metadata of text
            nodes overrides the default metadata from the heading.
        """
        if key in self.metadata:
            return self.metadata[key]
        elif self.parent is not None and key in self.parent.metadata:
            return self.parent.metadata[key]
        return default

    def get_metadata_string_for_key(
        self, key: str, default: str | None = None
    ) -> str | None:
        """
        Get the string representation of the key value in the
        metadata of the current node. If the value is a dict
        or list, return None. If the key is not in the node,
        use the value from the parent heading. If the key is
        absent there too, return a default value.

        Args:
            key: the key for which the metadata is searched
            default: a default value if the key is absent

        Returns:
            the key value, a default value if the key is not
                found (None if no default specified). If the value
                is not a primitive value of the int, float, str, or
                bool type, returns None.

        Note: 
            in most cases, text will be annotated at the heading 
            level, not at individual paragraphs. The metadata of text
            nodes overrides the default metadata from the heading.
        """
        if key in self.metadata:
            value: MetadataValue = self.metadata[key]
            if is_metadata_primitive(value):
                return str(value)
            else:
                return None
        elif self.parent is not None and key in self.parent.metadata:
            value: MetadataValue = self.parent.metadata[key]
            if is_metadata_primitive(value):
                return str(value)
            else:
                return None
        return default

    def set_metadata_for_key(
        self, key: str, value: MetadataValue
    ) -> None:
        """Set the metadata value at key of the current node.

        Args:
            key: the key where the value should be set
            value: the metadata value
        """
        self.metadata[key] = value

    def get_info(self, indent: int = 0) -> str:
        """
        Reports information about a node, including its type, content,
        and metadata.

        Args:
            indent: The indentation level for pretty printing
        """
        import yaml

        indent_str = "  " * indent

        # Collect block type and content
        info = "Text node"
        info += "\n" if self.get_parent() else " (freestanding)\n"

        content = self.get_content()
        if len(content) > 50:
            content = content[:47] + "..."
        if len(content) > 0:
            info += f"{indent_str}Text: {content}"
        else:
            info += "Placeholder text block for metadata"

        # Collect metadata
        if self.metadata:
            info += (
                f"\n{indent_str}Metadata:"
                + f"\n{yaml.safe_dump(self.metadata)}"
            )

        if not info[-1] == '\n':
            info += "\n"
        return info

from_content(content, metadata=None, parent=None) staticmethod

Create a text node from content and metadata.

Parameters:

Name Type Description Default
content str

the text content of the node

required
metadata MetadataDict | None

a dictionary with metadata (optional)

None
parent HeadingNode | None

if not None, the parent heading of the node

None

Returns:

Type Description
TextNode

a text node.

Source code in lmm/markdown/tree.py
@staticmethod
def from_content(
    content: str,
    metadata: MetadataDict | None = None,
    parent: HeadingNode | None = None,
) -> 'TextNode':
    """Create a text node from content and metadata.

    Args:
        content: the text content of the node
        metadata: a dictionary with metadata (optional)
        parent: if not None, the parent heading of the node

    Returns: 
        a text node.    
    """
    newnode = TextNode(
        block=TextBlock.from_text(content), parent=parent
    )
    if metadata:
        newnode.metadata = metadata
    return newnode

get_content()

Returns the content of the markdown text represented by the node.

Source code in lmm/markdown/tree.py
def get_content(self) -> str:
    """Returns the content of the markdown text represented by
    the node."""
    return self.block.get_content()

get_heading_children()

Always returns an empty list for text nodes.

Source code in lmm/markdown/tree.py
def get_heading_children(self) -> list['HeadingNode']:
    """Always returns an empty list for text nodes."""
    return []  # Text nodes have no children

get_info(indent=0)

Reports information about a node, including its type, content, and metadata.

Parameters:

Name Type Description Default
indent int

The indentation level for pretty printing

0
Source code in lmm/markdown/tree.py
def get_info(self, indent: int = 0) -> str:
    """
    Reports information about a node, including its type, content,
    and metadata.

    Args:
        indent: The indentation level for pretty printing
    """
    import yaml

    indent_str = "  " * indent

    # Collect block type and content
    info = "Text node"
    info += "\n" if self.get_parent() else " (freestanding)\n"

    content = self.get_content()
    if len(content) > 50:
        content = content[:47] + "..."
    if len(content) > 0:
        info += f"{indent_str}Text: {content}"
    else:
        info += "Placeholder text block for metadata"

    # Collect metadata
    if self.metadata:
        info += (
            f"\n{indent_str}Metadata:"
            + f"\n{yaml.safe_dump(self.metadata)}"
        )

    if not info[-1] == '\n':
        info += "\n"
    return info
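
The content preview in the listing above keeps the reported text at 50 characters or fewer by cutting at 47 characters and appending an ellipsis. A minimal sketch of that rule as a standalone function follows; `preview` and its `limit` parameter are hypothetical and do not exist in the module.

```python
def preview(content: str, limit: int = 50) -> str:
    """Hypothetical helper: shorten text the way the listing does."""
    if len(content) > limit:
        # Reserve three characters for the ellipsis.
        return content[: limit - 3] + "..."
    return content


long_text = "a" * 60
print(preview(long_text))
print(len(preview(long_text)))
print(preview("short text"))
```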

get_metadata(key=None)

Get the metadata of the current node. If a key is given and the node lacks it, the value from the parent heading is used.

Parameters:

Name Type Description Default
key str | None

the key for the metadata value to retrieve. If None, the whole dictionary.

None

Returns:

Type Description
MetadataDict

a conformant dictionary.

Note

in most cases, text will be annotated at the heading level, not at individual paragraphs. The metadata of text nodes overrides the default metadata from the heading.

Source code in lmm/markdown/tree.py
def get_metadata(self, key: str | None = None) -> MetadataDict:
    """
    Get the metadata of the current node. If a key is given and
    the node lacks it, the value from the parent heading is used.

    Args:
        key: the key for the metadata value to retrieve. If None,
            the whole dictionary.

    Returns:
        a conformant dictionary.

    Note: 
        in most cases, text will be annotated at the heading 
        level, not at individual paragraphs. The metadata of text
        nodes overrides the default metadata from the heading.
    """
    if not key:
        return copy.deepcopy(self.metadata)
    elif self.metadata and key in self.metadata:
        return {key: self.metadata[key]}
    elif (
        self.parent is not None
        and self.parent.metadata
        and key in self.parent.metadata
    ):
        return {key: self.parent.metadata[key]}
    return {}
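
In the listing above the fallback to the parent heading applies only to keyed lookups, and only one level up; a call without a key returns the text node's own metadata. The sketch below reproduces that asymmetry with two hypothetical stand-in classes, `Heading` and `Text`, which are not the library's node classes.

```python
import copy


class Heading:
    """Hypothetical stand-in for a heading node."""

    def __init__(self, metadata=None):
        self.metadata = dict(metadata or {})


class Text:
    """Hypothetical stand-in for a text node."""

    def __init__(self, metadata=None, parent=None):
        self.metadata = dict(metadata or {})
        self.parent = parent

    def get_metadata(self, key=None):
        if not key:
            # No key: only the node's own metadata, deep-copied.
            return copy.deepcopy(self.metadata)
        if key in self.metadata:
            return {key: self.metadata[key]}
        if self.parent is not None and key in self.parent.metadata:
            # Keyed lookup falls back to the direct parent heading.
            return {key: self.parent.metadata[key]}
        return {}


heading = Heading({"topic": "trees", "lang": "en"})
text = Text({"lang": "it"}, parent=heading)

print(text.get_metadata())         # own metadata only
print(text.get_metadata("lang"))   # the text node overrides the heading
print(text.get_metadata("topic"))  # taken from the heading
```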

get_metadata_for_key(key, default=None)

Get the key value in the metadata of the current node. If the key is not in the node, return the value from the parent heading.

Parameters:

Name Type Description Default
key str

the key for which the metadata is searched

required
default MetadataValue

a default value if the key is absent

None

Returns:

Type Description
MetadataValue

the key value, or a default value if the key is not found (None if no default specified).

Note

in most cases, text will be annotated at the heading level, not at individual paragraphs. The metadata of text nodes overrides the default metadata from the heading.

Source code in lmm/markdown/tree.py
def get_metadata_for_key(
    self, key: str, default: MetadataValue = None
) -> MetadataValue:
    """
    Get the key value in the metadata of the current node. If the
    key is not in the node, return the value from the parent heading.

    Args:
        key: the key for which the metadata is searched
        default: a default value if the key is absent

    Returns:
        the key value, or a default value if the key is not
            found (None if no default specified).

    Note: 
        in most cases, text will be annotated at the heading 
        level, not at individual paragraphs. The metadata of text
        nodes overrides the default metadata from the heading.
    """
    if key in self.metadata:
        return self.metadata[key]
    elif self.parent is not None and key in self.parent.metadata:
        return self.parent.metadata[key]
    return default

get_metadata_string_for_key(key, default=None)

Get the string representation of the key value in the metadata of the current node. If the value is a dict or list, return None. If the key is not in the node, use the value from the parent heading. If the key is absent there too, return a default value.

Parameters:

Name Type Description Default
key str

the key for which the metadata is searched

required
default str | None

a default value if the key is absent

None

Returns:

Type Description
str | None

the key value, a default value if the key is not found (None if no default specified). If the value is not a primitive value of the int, float, str, or bool type, returns None.

Note

in most cases, text will be annotated at the heading level, not at individual paragraphs. The metadata of text nodes overrides the default metadata from the heading.

Source code in lmm/markdown/tree.py
def get_metadata_string_for_key(
    self, key: str, default: str | None = None
) -> str | None:
    """
    Get the string representation of the key value in the
    metadata of the current node. If the value is a dict
    or list, return None. If the key is not in the node,
    use the value from the parent heading. If the key is
    absent there too, return a default value.

    Args:
        key: the key for which the metadata is searched
        default: a default value if the key is absent

    Returns:
        the key value, a default value if the key is not
            found (None if no default specified). If the value
            is not a primitive value of the int, float, str, or
            bool type, returns None.

    Note: 
        in most cases, text will be annotated at the heading 
        level, not at individual paragraphs. The metadata of text
        nodes overrides the default metadata from the heading.
    """
    if key in self.metadata:
        value: MetadataValue = self.metadata[key]
        if is_metadata_primitive(value):
            return str(value)
        else:
            return None
    elif self.parent is not None and key in self.parent.metadata:
        value: MetadataValue = self.parent.metadata[key]
        if is_metadata_primitive(value):
            return str(value)
        else:
            return None
    return default

get_text_children()

Always returns an empty list for text nodes.

Source code in lmm/markdown/tree.py
def get_text_children(self) -> list['TextNode']:
    """Always returns an empty list for text nodes."""
    return []  # Text nodes have no children

heading_level()

Returns None as text nodes have no level.

Source code in lmm/markdown/tree.py
def heading_level(self) -> None:
    """Returns None as text nodes have no level."""
    return None  # Text nodes don't have levels

naked_copy()

Make a deep copy of this node and take it off the tree

Source code in lmm/markdown/tree.py
def naked_copy(self) -> 'TextNode':
    """Make a deep copy of this node and take it off the tree"""
    block_copy: TextBlock | ErrorBlock = self.block.model_copy(
        deep=True
    )
    new_node = TextNode(block_copy)
    new_node.metadata = copy.deepcopy(self.metadata)
    if self.metadata_block:
        new_node.metadata_block = self.metadata_block.model_copy(
            deep=True
        )

    return new_node

node_copy()

Make a deep copy of this node (same as naked_copy for TextNodes).

Source code in lmm/markdown/tree.py
def node_copy(self) -> 'TextNode':
    """Make a deep copy of this node (same as naked_copy for
    TextNodes)."""

    return self.naked_copy()

set_content(content)

Set the text content of the node.

Parameters:

Name Type Description Default
content str

the new text content.

required
Source code in lmm/markdown/tree.py
def set_content(self, content: str) -> None:
    """Set the text content of the node.

    Args:
        content: the new text content.
    """
    self.block.content = content

set_metadata_for_key(key, value)

Set the metadata value at key of the current node.

Parameters:

Name Type Description Default
key str

the key where the value should be set

required
value MetadataValue

the metadata value

required
Source code in lmm/markdown/tree.py
def set_metadata_for_key(
    self, key: str, value: MetadataValue
) -> None:
    """Set the metadata value at key of the current node.

    Args:
        key: the key where the value should be set
        value: the metadata value
    """
    self.metadata[key] = value

tree_copy()

Make a deep copy of this node

Source code in lmm/markdown/tree.py
def tree_copy(self) -> 'TextNode':
    """Make a deep copy of this node"""

    # Copy self
    return self.naked_copy()

blocks_to_tree(blocks, logger=get_logger(__name__))

Builds a tree representation of a list of markdown blocks.

Parameters:

Name Type Description Default
blocks list[Block]

The list of blocks parsed from a markdown file

required

Returns:

Type Description
MarkdownTree

A root node, or None for an empty block list.

Note

Converting a non-empty block list to a tree adds a header block in front, if one is missing, and an empty text block after any metadata block that is not followed by a text or heading block. If the block list starts with a heading, the added header takes the content of the heading as its title.

Note

the nodes contain references to blocks. To avoid side effects, copy the blocks first:

root = blocks_to_tree(blocklist_copy(blocks))
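
How heading levels decide the parent of each new heading can be sketched without the library. The following is a simplified illustration under stated assumptions: `SketchHeading` and `build_outline` are hypothetical names, headings are given as `(level, title)` pairs, and text and metadata blocks are left out.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class SketchHeading:
    """Hypothetical stand-in for a heading node."""

    level: int
    title: str
    parent: SketchHeading | None = field(
        default=None, repr=False, compare=False
    )
    children: list[SketchHeading] = field(default_factory=list)


def build_outline(headings: list[tuple[int, str]]) -> SketchHeading:
    root = SketchHeading(level=0, title="root")
    current = root
    for level, title in headings:
        # Climb until a node with a strictly lower level is found;
        # the root has no parent, so the climb always stops there.
        while current.parent is not None and current.level >= level:
            current = current.parent
        node = SketchHeading(level, title, parent=current)
        current.children.append(node)
        current = node
    return root


outline = build_outline(
    [(1, "Intro"), (2, "Scope"), (2, "Terms"), (1, "Methods")]
)
top_titles = [c.title for c in outline.children]
intro_titles = [c.title for c in outline.children[0].children]
```

Here the two level-2 headings become siblings under "Intro", and "Methods" climbs back to the root.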
Source code in lmm/markdown/tree.py
def blocks_to_tree(
    blocks: list[Block], logger: LoggerBase = get_logger(__name__)
) -> MarkdownTree:
    """
    Builds a tree representation of a list of markdown blocks.

    Args:
        blocks: The list of blocks parsed from a markdown file

    Returns:
        A root node, or None for an empty block list.

    Note:
        conversion to tree of a non-empty block list adds a
        header block in front, if missing, and an empty text
        block after metadata blocks without following text or
        heading block, to the original list of markdown blocks.
        If the block list starts with a heading, adds a header
        with the content of the heading as title.

    Note:
        the nodes contain references to blocks. To avoid side
        effects, copy the blocks first:
        ```
        root = blocks_to_tree(blocklist_copy(blocks))
        ```
    """
    if not blocks:
        return None

    # Report error blocks in logger
    report_error_blocks(blocks, logger)

    # Enforce the first block being header
    header_block: HeaderBlock
    match blocks[0]:
        case HeaderBlock():
            header_block = blocks[0]
        case MetadataBlock() as bl:
            # using private as "protected" member function
            header_block = HeaderBlock._from_metadata_block(bl) # type: ignore
        case HeadingBlock() as bl:
            if bl.get_content():
                header_block = HeaderBlock(
                    content={'title': bl.get_content()}
                )
            else:
                header_block = HeaderBlock.from_default()
            blocks = [header_block] + blocks
        case TextBlock() | ErrorBlock():
            header_block = HeaderBlock.from_default()
            blocks = [header_block] + blocks

    # Create root node as containing a HeadingBlock with the
    # document title as content.
    root_title = str(header_block.content["title"])
    root_block = HeadingBlock(level=0, content=root_title)
    root_node = HeadingNode(root_block)
    root_node.metadata = header_block.content
    # Store the original header block for reconstitution
    root_node.metadata_block = header_block

    current_node = root_node
    current_metadata: MetadataDict | None = None

    # Process remaining blocks
    current_metadata_block: MetadataBlock | None = None

    def _find_appropriate_parent(
        cur_node: HeadingNode, new_heading_level: int
    ) -> HeadingNode:
        """
        Finds the appropriate parent node for a new heading based on
        its level.

        Args:
            cur_node: The current node in the tree
            new_heading_level: The level of the new heading

        Returns:
            The appropriate parent node for the new heading
        """
        # For HeadingNode, we can now safely access the level
        while (
            cur_node.parent
            and cur_node.heading_level() >= new_heading_level
        ):
            cur_node = cur_node.parent
        return cur_node

    for block in blocks[1:]:
        match block:
            case HeadingBlock():
                # Appropriate parent depending on level
                parent = _find_appropriate_parent(
                    current_node, block.level
                )

                new_node = HeadingNode(block)
                if current_metadata:
                    new_node.metadata = current_metadata
                    new_node.metadata_block = current_metadata_block
                    current_metadata = None
                    current_metadata_block = None

                parent.add_child(new_node)
                current_node = new_node

            case MetadataBlock():
                # Handle consecutive metadata blocks
                if current_metadata:
                    # Create empty text node with the metadata
                    empty_text_block = TextBlock(content="")
                    text_node = TextNode(empty_text_block)
                    text_node.metadata = current_metadata
                    text_node.metadata_block = current_metadata_block
                    current_node.add_child(text_node)

                current_metadata = block.content
                current_metadata_block = block

            case TextBlock():
                text_node = TextNode(block)
                if current_metadata:
                    text_node.metadata = current_metadata
                    text_node.metadata_block = current_metadata_block
                    current_metadata = None
                    current_metadata_block = None

                current_node.add_child(text_node)

            case ErrorBlock():
                # Text node that contains the offending text
                text_node = TextNode(block)

                # Add to current node
                current_node.add_child(text_node)

    # Handle any remaining metadata
    if current_metadata:
        empty_text_block = TextBlock(content="")
        text_node = TextNode(empty_text_block)
        text_node.metadata = current_metadata
        text_node.metadata_block = current_metadata_block
        current_node.add_child(text_node)

    return root_node

extract_content(root_node, output_key, extract_func, filter_func=lambda _: True)

Extracts information from the content of the children, processes it, and saves it under output_key in the metadata of the parents. The extraction proceeds in post-order traversal to aggregate information bottom-up. To collect information from lower levels, write extract_func so that it uses the information stored under output_key in previous rounds of the traversal.

Parameters:

Name Type Description Default
root_node HeadingNode

the heading node from which the traversal starts

required
output_key str

the key of the heading metadata where information is stored

required
extract_func Callable[[Sequence[MarkdownNode]], MetadataValue]

The function that extracts the information. Args: a list of MarkdownNodes. Returns: a valid metadata value.

required
filter_func Callable[[HeadingNode], bool]

A predicate function to filter HeadingNodes to which extract_func is applied.

lambda _: True

Returns:

Type Description
HeadingNode

the root node where the traversal was started.

Note

this is a convenience function to process information in the tree, conceptually equivalent to the extraction of information from neighbours of a graph.

See also

propagate_content: extract information from parents to children.

Example
def count_words(root: HeadingNode) -> HeadingNode:
    KEY = "wordscount"

    def proc_sum(data: Sequence[MarkdownNode]) -> str:
        buff: list[str] = []
        for d in data:
            match d:
                case TextNode():
                    buff.append(d.get_content())
                case HeadingNode():
                    # summary string stored by an earlier round of
                    # the traversal; its words are counted too
                    value = str(d.get_metadata_for_key(KEY, ""))
                    if value:
                        buff.append(value)
                case _:
                    raise RuntimeError("Unrecognized node")

        count: int = len((" ".join(buff)).split())
        return f"There are {count} words."

    return extract_content(root, KEY, proc_sum)
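
The bottom-up aggregation can also be sketched without the library. This is a minimal sketch assuming integer counts stored in the metadata: `SketchNode`, `extract_sketch`, and `sum_words` are hypothetical names, not part of the module.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Callable


@dataclass
class SketchNode:
    """Hypothetical stand-in: headings have children, text nodes do not."""

    content: str
    is_heading: bool = False
    metadata: dict[str, int] = field(default_factory=dict)
    children: list[SketchNode] = field(default_factory=list)


def extract_sketch(
    node: SketchNode,
    output_key: str,
    extract_func: Callable[[list[SketchNode]], int],
) -> SketchNode:
    # Post-order: visit the children first, so that their output_key
    # is already filled in when the parent is processed.
    for child in node.children:
        extract_sketch(child, output_key, extract_func)
    if node.is_heading:
        node.metadata[output_key] = extract_func(node.children)
    return node


KEY = "wordcount"


def sum_words(children: list[SketchNode]) -> int:
    total = 0
    for child in children:
        if child.is_heading:
            total += child.metadata.get(KEY, 0)  # aggregated earlier
        else:
            total += len(child.content.split())
    return total


section = SketchNode("Section", True, children=[SketchNode("three more words")])
root = SketchNode("Title", True, children=[SketchNode("two words"), section])
extract_sketch(root, KEY, sum_words)
```

Because the section is visited before the root, the root can add the section's stored count to the words of its own text children.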
Source code in lmm/markdown/tree.py
def extract_content(
    root_node: HeadingNode,
    output_key: str,
    extract_func: Callable[[Sequence[MarkdownNode]], MetadataValue],
    filter_func: Callable[[HeadingNode], bool] = lambda _: True,
) -> HeadingNode:
    """Extracts information from children content, processes it, and
    saves it in the output_key of metadata of parents. The extraction
    proceeds in post-order traversal to aggregate information bottom-
    up.
    To collect information from lower levels, code extract_func to
    use the information stored in output_key at previous rounds of
    traversal.

    Args:
        root_node: the heading node from which the traversal starts
        output_key: the key of the heading metadata where information
            is stored
        extract_func: The function that extracts the information.
            Args: a list of `MarkdownNode`s, returns: a valid
            metadata value
        filter_func: A predicate function to filter `HeadingNode`s to
            which `extract_func` is applied.

    Returns:
        the root node where the traversal was started.

    Note:
        this is a convenience function to process information in
        the tree, conceptually equivalent to the extraction of
        information from neighbours of a graph.

    See also:
        `propagate_content`: extract information from parents to
        children.

    Example:
        ```python
        def count_words(root: HeadingNode) -> HeadingNode:
            KEY = "wordscount"

            def proc_sum(data: Sequence[MarkdownNode]) -> str:
                buff: list[str] = []
                for d in data:
                    match d:
                        case TextNode():
                            buff.append(d.get_content())
                        case HeadingNode():
                            # summary string stored by an earlier
                            # round; its words are counted too
                            value = str(d.get_metadata_for_key(KEY, ""))
                            if value:
                                buff.append(value)
                        case _:
                            raise RuntimeError("Unrecognized node")

                count: int = len((" ".join(buff)).split())
                return f"There are {count} words."

            return extract_content(root, KEY, proc_sum)
        ```
    """

    def process_node(node: MarkdownNode) -> None:
        if not node.is_heading_node():
            return
        if not filter_func(node): # type: ignore (type checked)
            return

        value: MetadataValue = extract_func(node.children)
        if not node.metadata:
            node.metadata = {}
        node.metadata[output_key] = value

    post_order_traversal(root_node, process_node)
    return root_node

fold_tree(node, fold_func, initial_value, traversal_func=post_order_traversal)

Applies fold_func to accumulate a value across the tree using the specified traversal function. The fold function has no access to the children and parent of the node.

Parameters:

Name Type Description Default
node MarkdownNode

The root node of the tree or subtree

required
fold_func Callable[[U, MarkdownNode], U]

The function to apply to accumulate values

required
initial_value U

The initial value for the accumulation

required
traversal_func TraversalFunc

The traversal function to use (post_order_traversal by default)

post_order_traversal

Returns:

Type Description
U

The accumulated value
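
A fold over a tree can be sketched as follows. This is an illustration under stated assumptions, not the module's code: `SketchNode` and `fold_sketch` are hypothetical names, and the traversal is fixed to post-order instead of being a parameter.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Callable, TypeVar

U = TypeVar("U")


@dataclass
class SketchNode:
    """Hypothetical stand-in for a tree node."""

    content: str
    children: list[SketchNode] = field(default_factory=list)


def fold_sketch(
    node: SketchNode, fold_func: Callable[[U, SketchNode], U], initial: U
) -> U:
    # Post-order: fold the children first, then the node itself.
    acc = initial
    for child in node.children:
        acc = fold_sketch(child, fold_func, acc)
    # Pass a childless copy, mirroring the statement that the fold
    # function has no access to the children and parent of the node.
    return fold_func(acc, SketchNode(node.content))


tree = SketchNode("Title", [SketchNode("alpha beta"), SketchNode("gamma")])
word_total = fold_sketch(tree, lambda acc, n: acc + len(n.content.split()), 0)
visit_order = fold_sketch(tree, lambda acc, n: acc + [n.content], [])
```

The accumulator is threaded through the nodes in visiting order, so the same function can produce a count, a list, or any other value of the accumulator type.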

Source code in lmm/markdown/tree.py
def fold_tree(
    node: MarkdownNode,
    fold_func: Callable[[U, MarkdownNode], U],
    initial_value: U,
    traversal_func: TraversalFunc = post_order_traversal,
) -> U:
    """
    Applies fold_func to accumulate a value across the tree using the
        specified traversal function. The fold function has no
        access to the children and parent of the node.

    Args:
        node: The root node of the tree or subtree
        fold_func: The function to apply to accumulate values
        initial_value: The initial value for the accumulation
        traversal_func: The traversal function to use
            (post_order_traversal by default)

    Returns:
        The accumulated value
    """
    result = [initial_value]

    def accumulate(n: MarkdownNode) -> None:
        result[0] = fold_func(result[0], n.naked_copy())

    traversal_func(node, accumulate)
    return result[0]

get_heading_nodes(root)

Return (references to) heading nodes in tree

Source code in lmm/markdown/tree.py
def get_heading_nodes(root: MarkdownNode) -> list[HeadingNode]:
    """Return (references to) heading nodes in tree"""
    return traverse_tree_nodetype(root, lambda x: x, HeadingNode)

get_text_nodes(root)

Return (references to) text nodes in tree

Source code in lmm/markdown/tree.py
def get_text_nodes(root: MarkdownNode) -> list[TextNode]:
    """Return (references to) text nodes in tree"""
    return traverse_tree_nodetype(root, lambda x: x, TextNode)

load_tree(source, logger)

Load a pandoc markdown file or string into a tree.

This function wraps blocks_to_tree and adds console logging for errors.

Parameters:

Name Type Description Default
source str | Path

Path to a markdown file or a string containing markdown content. If a single-line string without newlines is provided, it's treated as a file path.

required

Returns:

Type Description
MarkdownTree

The root object of the tree.

Source code in lmm/markdown/tree.py
def load_tree(source: str | Path, logger: LoggerBase) -> MarkdownTree:
    """Load a pandoc markdown file or string into a tree.

    This function wraps blocks_to_tree and adds console logging for
    errors.

    Args:
        source: Path to a markdown file or a string containing
            markdown content. If a single-line string without newlines
            is provided, it's treated as a file path.

    Returns:
        The root object of the tree.
    """

    # Pure parsing function  (no exceptions raised)
    blocks = load_blocks(source, logger=logger)
    if not blocks:
        return None

    # Enforce constraint on first block
    if not isinstance(blocks[0], HeaderBlock):
        header = HeaderBlock.from_default(str(source))
        blocks = [header] + blocks

    return blocks_to_tree(blocks, logger)

post_order_traversal(node, visit_func)

Performs a post-order traversal of the tree, applying visit_func to each node.

Parameters:

Name Type Description Default
node MarkdownNode

The root node of the tree or subtree

required
visit_func Callable[[MarkdownNode], None]

The function to apply to each node

required
Note

this function may be used with side effects on the tree

Source code in lmm/markdown/tree.py
def post_order_traversal(
    node: MarkdownNode, visit_func: Callable[[MarkdownNode], None]
) -> None:
    """
    Performs a post-order traversal of the tree, applying visit_func
        to each node.

    Args:
        node: The root node of the tree or subtree
        visit_func: The function to apply to each node

    Note:
        this function may be used with side effects on the tree
    """
    for child in node.children:
        post_order_traversal(child, visit_func)
    visit_func(node)

pre_order_traversal(node, visit_func)

Performs a pre-order traversal of the tree, applying visit_func to each node.

Parameters:

Name Type Description Default
node MarkdownNode

The root node of the tree or subtree

required
visit_func Callable[[MarkdownNode], None]

The function to apply to each node

required
Note

this function may be used with side effects on the tree
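
The difference between the two traversal orders can be seen on a small tree. The sketch below uses a hypothetical `SketchNode` class and local `pre_order` and `post_order` functions that follow the same visiting rule as the listings in this section.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Callable


@dataclass
class SketchNode:
    """Hypothetical stand-in for a tree node."""

    name: str
    children: list[SketchNode] = field(default_factory=list)


def pre_order(node: SketchNode, visit: Callable[[SketchNode], None]) -> None:
    visit(node)  # parent first
    for child in node.children:
        pre_order(child, visit)


def post_order(node: SketchNode, visit: Callable[[SketchNode], None]) -> None:
    for child in node.children:
        post_order(child, visit)
    visit(node)  # parent last


tree = SketchNode("A", [SketchNode("B", [SketchNode("C")]), SketchNode("D")])

pre_names: list[str] = []
post_names: list[str] = []
pre_order(tree, lambda n: pre_names.append(n.name))
post_order(tree, lambda n: post_names.append(n.name))
```

Pre-order suits top-down work, where a parent must be handled before its children; post-order suits bottom-up aggregation.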

Source code in lmm/markdown/tree.py
def pre_order_traversal(
    node: MarkdownNode, visit_func: Callable[[MarkdownNode], None]
) -> None:
    """
    Performs a pre-order traversal of the tree, applying visit_func to
        each node.

    Args:
        node: The root node of the tree or subtree
        visit_func: The function to apply to each node

    Note:
        this function may be used with side effects on the tree
    """
    visit_func(node)
    for child in node.children:
        pre_order_traversal(child, visit_func)

propagate_content(root_node, collect_func, select, filter_func=lambda _: True)

Use information from parent nodes to develop or replace children text nodes. The function traverses the tree top-down in pre-order.

Parameters:

Name Type Description Default
root_node HeadingNode

the heading node from which the traversal starts

required
collect_func Callable[[HeadingNode], str | TextNode]

a function that creates text from a parent node, such as from the metadata of the parent node. Args: the parent node. Returns: a string or a text node. If a string, a text node will be created with the string as content. If a text node, it will be given the heading node as parent. Return a text node if you need to store information in the metadata of the text node. Return a simple string in all other cases. Example of returning a text node: return TextNode.from_content("new content", {'key': "value"})

required
select bool

If True, replace all text children of the heading with the new text node, keeping the heading children. If False, insert the new text node before the existing children.

required
filter_func Callable[[HeadingNode], bool]

A predicate function selecting heading nodes that will be processed.

lambda _: True

Returns:

Type Description
HeadingNode

the root node where the traversal was started.

Note

This function changes the structure of the children of the tree, but does not change the structure of the parent nodes. For more general operations on the tree structure, call pre_order_traversal directly.

See also

extract_content: propagate information from children to parents; propagate_property (treeutils): example of use
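
The effect of the select flag can be sketched on a small tree. This is a simplified illustration: `SketchNode`, `propagate_sketch`, `make_tree`, and `summary` are hypothetical names, and the collecting function returns plain strings only.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Callable


@dataclass
class SketchNode:
    """Hypothetical stand-in: headings have children, text nodes do not."""

    content: str
    is_heading: bool = False
    children: list[SketchNode] = field(default_factory=list)


def propagate_sketch(
    node: SketchNode, collect_func: Callable[[SketchNode], str], select: bool
) -> SketchNode:
    if not node.is_heading:
        return node
    new_text = SketchNode(collect_func(node))
    if select:
        # Replace the text children, keep the sub-headings.
        node.children = [new_text] + [c for c in node.children if c.is_heading]
    else:
        # Keep everything and put the new text first.
        node.children.insert(0, new_text)
    # Pre-order: the heading is processed before its descendants.
    for child in node.children:
        propagate_sketch(child, collect_func, select)
    return node


def make_tree() -> SketchNode:
    section = SketchNode("Section", True, [SketchNode("section text")])
    return SketchNode("Title", True, [SketchNode("old text"), section])


def summary(heading: SketchNode) -> str:
    return f"About: {heading.content}"


replaced = propagate_sketch(make_tree(), summary, select=True)
extended = propagate_sketch(make_tree(), summary, select=False)
replaced_contents = [c.content for c in replaced.children]
extended_contents = [c.content for c in extended.children]
```

With select set to True the old text disappears and only the generated text and the sub-heading remain; with False the generated text is placed in front of the existing children.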

Source code in lmm/markdown/tree.py
def propagate_content(
    root_node: HeadingNode,
    collect_func: Callable[[HeadingNode], str | TextNode],
    select: bool,
    filter_func: Callable[[HeadingNode], bool] = lambda _: True,
) -> HeadingNode:
    """Use information from parent nodes to develop or replace
    children text nodes. The function traverses the tree top-down
    in pre-order.

    Args:
        root_node: the heading node from which the traversal starts
        collect_func: a function that creates text from a parent node,
            such as from the metadata of the parent node.
            Args: the parent node.
            Returns: a string or a text node. If a string, a text
                node will be created with the string as content. If
                a text node, it will be given the heading node as
                parent. Return a text node if you need to store
                information in the metadata of the text node. Return
                a simple string in all other cases. Example of
                returning a text node:
                    return TextNode.from_content("new content",
                                                 {'key': "value"})
        select: whether to replace all text children with a new
            text node child containing the text, or add the text
            node to the existing children.
        filter_func: A predicate function selecting heading nodes
            that will be processed.

    Returns:
        the root node where the traversal was started.

    Note:
        This function changes the structure of the children of the
        tree, but does not change the structure of the parent nodes.
        For more general operations on the tree structure, call
        pre_order_traversal directly.

    See also:
        `extract_content`: propagate information from children
        to parents; `propagate_property` (treeutils): example of use
    """

    def process_node(node: MarkdownNode) -> None:
        heading_node: HeadingNode
        if node.is_heading_node():
            heading_node = node  # type: ignore (type checked)
        else:
            return

        if not filter_func(heading_node):
            return

        value: str | TextNode = collect_func(heading_node)
        if isinstance(value, TextNode):
            new_node = value
            new_node.parent = heading_node
        else:
            new_node: TextNode = TextNode.from_content(
                value, {}, heading_node
            )
        if select:
            new_children: list[MarkdownNode] = [new_node]
            for child in node.children:
                if isinstance(child, HeadingNode):
                    new_children.append(child)
            heading_node.children = new_children
        else:
            heading_node.children.insert(0, new_node)

    pre_order_traversal(root_node, process_node)
    return root_node

save_tree(file_name, tree)

Write a markdown tree to a markdown file.

Parameters:

Name Type Description Default
file_name str | Path

Path to the output file (string or Path object)

required
tree MarkdownTree

the node with descendants to be serialized

required
Source code in lmm/markdown/tree.py
def save_tree(file_name: str | Path, tree: MarkdownTree) -> None:
    """Write a markdown tree to a markdown file.

    Args:
        file_name: Path to the output file (string or Path object)
        tree: the node with descendants to be serialized
    """
    content = serialize_tree(tree)
    from .ioutils import save_markdown
    from lmm.utils import logger

    save_markdown(file_name, content, logger)

serialize_tree(node)

Serialize a markdown tree to a string.

Parameters:

Name Type Description Default
node MarkdownTree

the node with descendants to be serialized

required
Source code in lmm/markdown/tree.py
def serialize_tree(node: MarkdownTree) -> str:
    """Serialize a markdown tree to a string.

    Args:
        node: the node with descendants to be serialized
    """
    blocks = tree_to_blocks(node)
    return serialize_blocks(blocks)

traverse_tree(node, map_func, filter_func=lambda _: True, traversal_func=pre_order_traversal)

Applies map_func to each node in the tree that satisfies the predicate filter_func(n), visiting the nodes with the specified traversal function. The traversal produces a list whose elements have the return type of the map function.

Parameters:

Name Type Description Default
node MarkdownNode

The root node of the tree or subtree

required
map_func Callable[[MarkdownNode], T]

The function to apply to each node

required
filter_func Callable[[MarkdownNode], bool]

A predicate to select the nodes to which the map function will be applied and add to the list

lambda _: True
traversal_func TraversalFunc

The traversal function to use (pre_order_traversal by default)

pre_order_traversal

Returns:

Type Description
list[T]

A list containing the results of applying map_func to each node

Example
def collect_contents(root: MarkdownNode) -> list[str]:
    return traverse_tree(root, lambda n: n.get_content())
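
The combination of map function, filter, and traversal order can be sketched as follows. `SketchNode`, `traverse_sketch`, and the two local traversal functions are hypothetical stand-ins written for this illustration.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Callable, TypeVar

T = TypeVar("T")


@dataclass
class SketchNode:
    """Hypothetical stand-in for a tree node."""

    content: str
    children: list[SketchNode] = field(default_factory=list)


def pre_order(node: SketchNode, visit: Callable[[SketchNode], None]) -> None:
    visit(node)
    for child in node.children:
        pre_order(child, visit)


def post_order(node: SketchNode, visit: Callable[[SketchNode], None]) -> None:
    for child in node.children:
        post_order(child, visit)
    visit(node)


def traverse_sketch(
    node: SketchNode,
    map_func: Callable[[SketchNode], T],
    filter_func: Callable[[SketchNode], bool] = lambda _: True,
    traversal_func: Callable[..., None] = pre_order,
) -> list[T]:
    result: list[T] = []

    def collect(n: SketchNode) -> None:
        # Only nodes accepted by the filter contribute to the list.
        if filter_func(n):
            result.append(map_func(n))

    traversal_func(node, collect)
    return result


tree = SketchNode(
    "Title", [SketchNode("intro"), SketchNode("Section", [SketchNode("body")])]
)
upper = traverse_sketch(tree, lambda n: n.content.upper())
leaves = traverse_sketch(tree, lambda n: n.content, lambda n: not n.children)
bottom_up = traverse_sketch(tree, lambda n: n.content, traversal_func=post_order)
```

The order of the resulting list follows the traversal function, so switching to post-order lists the children before their parents.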
Source code in lmm/markdown/tree.py
def traverse_tree(
    node: MarkdownNode,
    map_func: Callable[[MarkdownNode], T],
    filter_func: Callable[[MarkdownNode], bool] = lambda _: True,
    traversal_func: TraversalFunc = pre_order_traversal,
) -> list[T]:
    """
    Applies map_func to each node in the tree using the specified
        traversal function, for nodes satisfying the predicate
        filter_func(n). The traversal produces a list of the
        return type of the map function.

    Args:
        node: The root node of the tree or subtree
        map_func: The function to apply to each node
        filter_func: A predicate to select the nodes to which the
            map function will be applied and add to the list
        traversal_func: The traversal function to use
            (pre_order_traversal by default)

    Returns:
        A list containing the results of applying map_func to
            each node

    Example:
        ```python
        def collect_contents(root: MarkdownNode) -> list[str]:
            return traverse_tree(root, lambda n: n.get_content())
        ```
    """
    result: list[T] = []

    def collect_results(n: MarkdownNode) -> None:
        if filter_func(n):
            result.append(map_func(n))

    traversal_func(node, collect_results)
    return result

traverse_tree_nodetype(node, map_func, node_type, filter_func=lambda _: True, traversal_func=pre_order_traversal)

Applies map_func to each node in the tree using the specified traversal function, for nodes of the specified type. The type must be a subclass of MarkdownNode. The traversal produces a list of the return type of the map function.

Parameters:

Name Type Description Default
node MarkdownNode

The root node of the tree or subtree

required
map_func Callable[[MN], T]

The function to apply to each node

required
node_type type[MN]

The type of nodes to apply map_func to and include in the output list

required
filter_func Callable[[MN], bool]

A predicate function to filter the traversed nodes (defaults to true)

lambda _: True
traversal_func Callable[[MarkdownNode, Callable[[MarkdownNode], None]], None]

The traversal function to use (pre_order_traversal by default)

pre_order_traversal

Returns:

Type Description
list[T]

A list containing the results of applying map_func to nodes of type node_type

Example
def collect_titles(root: MarkdownNode) -> list[str]:
    return traverse_tree_nodetype(root,
                                  lambda n: n.get_content(),
                                  HeadingNode)
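
Selecting nodes by type during a traversal can be sketched with an isinstance check. The classes `SketchNode`, `SketchHeading`, and `SketchText`, and the function `collect_by_type`, are hypothetical names used only for this illustration.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Callable, TypeVar


@dataclass
class SketchNode:
    """Hypothetical base class for the two node kinds."""

    content: str
    children: list[SketchNode] = field(default_factory=list)


class SketchHeading(SketchNode):
    """Hypothetical heading node."""


class SketchText(SketchNode):
    """Hypothetical text node."""


N = TypeVar("N", bound=SketchNode)
T = TypeVar("T")


def collect_by_type(
    node: SketchNode, map_func: Callable[[N], T], node_type: type[N]
) -> list[T]:
    result: list[T] = []

    def visit(n: SketchNode) -> None:
        # Pre-order visit; map_func only sees nodes of the given type.
        if isinstance(n, node_type):
            result.append(map_func(n))
        for child in n.children:
            visit(child)

    visit(node)
    return result


tree = SketchHeading(
    "Title",
    [SketchText("intro"), SketchHeading("Section", [SketchText("body")])],
)
titles = collect_by_type(tree, lambda n: n.content, SketchHeading)
texts = collect_by_type(tree, lambda n: n.content, SketchText)
```

Tying the type of the map function's argument to node_type lets a type checker accept calls to methods that exist only on that node type.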
Source code in lmm/markdown/tree.py
def traverse_tree_nodetype(
    node: MarkdownNode,
    map_func: Callable[[MN], T],
    node_type: type[MN],
    filter_func: Callable[[MN], bool] = lambda _: True,
    traversal_func: Callable[
        [MarkdownNode, Callable[[MarkdownNode], None]], None
    ] = pre_order_traversal,
) -> list[T]:
    """
    Applies map_func to each node in the tree using the specified
        traversal function, for nodes of the specified type. The type
        must be a subclass of MarkdownNode. The traversal produces a
        list of the return type of the map function.

    Args:
        node: The root node of the tree or subtree
        map_func: The function to apply to each node
        node_type: The type of nodes to apply map_func to and include
            in the output list
        filter_func: A predicate function to filter the traversed
            nodes (defaults to true)
        traversal_func: The traversal function to use
            (pre_order_traversal by default)

    Returns:
        A list containing the results of applying map_func to nodes of
            type node_type

    Example:
        ```python
        def collect_titles(root: MarkdownNode) -> list[str]:
            return traverse_tree_nodetype(root,
                                          lambda n: n.get_content(),
                                          HeadingNode)
        ```
    """
    result: list[T] = []

    def collect_results(n: MarkdownNode) -> None:
        if isinstance(n, node_type) and filter_func(n):
            # If n is of the correct type, apply map_func
            result.append(map_func(n))

    traversal_func(node, collect_results)
    return result
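
The collect-by-type pattern above can be tried in isolation with the following minimal sketch. It does not use the lmm classes: `Node`, `Heading`, `Text`, `pre_order` and `collect_by_type` are hypothetical stand-ins written for illustration, assuming only the behaviour visible in the source listing (type check, then filter, then map).

```python
from dataclasses import dataclass, field
from typing import Callable, TypeVar


@dataclass
class Node:  # hypothetical stand-in for MarkdownNode
    content: str
    children: list["Node"] = field(default_factory=list)


class Heading(Node):  # hypothetical stand-in for HeadingNode
    pass


class Text(Node):  # hypothetical stand-in for TextNode
    pass


N = TypeVar("N", bound=Node)
T = TypeVar("T")


def pre_order(node: Node, visit: Callable[[Node], None]) -> None:
    """Visit the node first, then its children from left to right."""
    visit(node)
    for child in node.children:
        pre_order(child, visit)


def collect_by_type(
    root: Node,
    map_func: Callable[[N], T],
    node_type: type[N],
    filter_func: Callable[[N], bool] = lambda _: True,
) -> list[T]:
    """Apply map_func to nodes of node_type that pass filter_func."""
    result: list[T] = []

    def visit(n: Node) -> None:
        if isinstance(n, node_type) and filter_func(n):
            result.append(map_func(n))

    pre_order(root, visit)
    return result


tree = Heading("Title", [
    Text("intro"),
    Heading("Section", [Text("body")]),
])
titles = collect_by_type(tree, lambda n: n.content, Heading)
long_texts = collect_by_type(
    tree, lambda n: n.content.upper(), Text, lambda n: len(n.content) > 4
)
```

Because the type check runs before the filter, the predicate only ever receives nodes of the requested type, which is why it can safely use attributes specific to that type.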

tree_copy(root)

Make a deep copy of a non-empty tree or subtree. Returns: a root node with a copy of the tree.

Source code in lmm/markdown/tree.py
def tree_copy(root: MarkdownNode) -> MarkdownNode:
    """Make a deep copy of a non-empty tree or subtree.
    Returns:
        a root node with a copy of the tree.
    """
    return root.tree_copy()

tree_to_blocks(root_node)

Reconstitutes the original block list from the tree representation.

Parameters:

Name Type Description Default
root_node MarkdownNode | MarkdownTree

The root node of the tree

required

Returns:

Type Description
list[Block]

The reconstituted list of blocks.

Note

The returned blocks contain references to node components, so modifying a block also modifies the tree. To avoid side effects, copy the tree first:

tree_to_blocks(tree_copy(root_node))
Source code in lmm/markdown/tree.py
def tree_to_blocks(
    root_node: MarkdownNode | MarkdownTree,
) -> list[Block]:
    """
    Reconstitutes the original block list from the tree
        representation.

    Args:
        root_node: The root node of the tree

    Returns:
        The reconstituted list of blocks.

    Note:
        the blocks contain references to node components. To
        avoid side effects, copy the tree first:
        ```python
        tree_to_blocks(tree_copy(root_node))
        ```
    """
    if not root_node:
        return []

    blocks: list[Block] = []

    # Special handling for the root node
    # The root node is artificial (created from the header metadata)
    # So we only add the header metadata, not the heading itself
    if root_node.is_header_node():
        if root_node.metadata_block:
            newblock = root_node.metadata_block
            if root_node.metadata:
                newblock.content = root_node.metadata
            blocks.append(newblock)
        elif root_node.metadata:
            blocks.append(HeaderBlock(content=root_node.metadata))
        else:
            blocks.append(HeaderBlock.from_default())

    # Process all child nodes
    def process_node(node: MarkdownNode) -> None:
        # Skip the root node as it's handled separately
        if node == root_node:
            return

        match node:
            case HeadingNode():
                # For heading nodes, first add metadata if present
                if node.metadata_block:
                    newblock = node.metadata_block
                    if node.metadata:
                        newblock.content = node.metadata
                    blocks.append(newblock)
                elif node.metadata:
                    blocks.append(
                        MetadataBlock(content=node.metadata)
                    )
                else:
                    pass
                # Then add the heading block
                blocks.append(node.block)
            case TextNode():
                # For text nodes, first add metadata if present
                if node.metadata_block:
                    newblock = node.metadata_block
                    if node.metadata:
                        newblock.content = node.metadata
                    blocks.append(newblock)
                elif node.metadata:
                    blocks.append(
                        MetadataBlock(content=node.metadata)
                    )
                else:
                    pass
                # Then add the text block
                blocks.append(node.block)
            case MarkdownNode():
                # workaround type check limitation
                raise RuntimeError(
                    "Unreachable code reached: instance of abstract type"
                )

    # Perform pre-order traversal
    pre_order_traversal(root_node, process_node)

    return blocks
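
The note about side effects can be illustrated with a small sketch. The classes `Block` and `Node` and the function `to_blocks` below are hypothetical stand-ins, not the lmm API; `copy.deepcopy` plays the role that `tree_copy` has in the library.

```python
import copy
from dataclasses import dataclass, field


@dataclass
class Block:  # hypothetical stand-in for a text block
    content: str


@dataclass
class Node:  # hypothetical stand-in for a tree node
    block: Block
    children: list["Node"] = field(default_factory=list)


def to_blocks(node: Node) -> list[Block]:
    """Pre-order unfolding that returns the nodes' own block objects."""
    blocks = [node.block]
    for child in node.children:
        blocks.extend(to_blocks(child))
    return blocks


root = Node(Block("Title"), [Node(Block("first paragraph"))])

# Unfolding a deep copy leaves the original tree untouched
safe = to_blocks(copy.deepcopy(root))
safe[1].content = "edited"
unchanged = root.children[0].block.content

# Unfolding the tree itself shares the block objects with the nodes
shared = to_blocks(root)
shared[1].content = "edited"
changed = root.children[0].block.content
```

In this sketch the first edit is invisible to the tree, while the second one changes the node's content, which is the kind of side effect the note warns about.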

Tree utils

Utility functions to extract information from markdown trees.

Main functions

Aggregate: Exchange information between child and parent nodes. These functions revise the tree and have side effects on the tree.
inherit_metadata, inherit_parent_properties, bequeath_properties

Traversal: Traverse the tree and extract information, allowing inheritance of metadata directly or through a function. No side effects.
collect_text, collect_headings, collect_dictionaries, collect_table_of_contents, collect_annotated_textblocks

Select nodes: Select nodes from tree.
get_nodes, get_textnodes, get_headingnodes, get_nodes_with_metadata

Map: Apply a function to each node in the tree. Has side effects on the tree.
pre_order_map_tree, post_order_map_tree

Fold:
count_words

Prune: Remove nodes from tree. Has side effects on the tree.
prune_tree

Behaviour

Except for inherit_parent_properties, which validates its input and raises a ValueError exception, these functions do not raise exceptions. Functions are generally pure with respect to the tree unless noted otherwise (e.g., Aggregate, Map, and Prune functions have side effects on the input nodes).

bequeath_properties(node, keys, new_keys=[], inherit=False, include_header=False, filter_func=lambda _: True)

Extract a property from the metadata of a heading node and bequeath it to the child text nodes.

Parameters:

Name Type Description Default
node HeadingNode

the root or branch node to work on

required
keys list[str]

the properties to be given to the text nodes

required
new_keys list[str]

the new key names

[]
inherit bool

keys are bequeathed to grandchildren down the tree.

False
include_header bool

include the header node in the pool of nodes that bequeath properties to children

False
filter_func Callable[[HeadingNode], bool]

a predicate to filter the heading nodes that bequeath properties.

lambda _: True

Returns:

Type Description
HeadingNode

the root node of the modified branch

Note

If new_keys is empty, the properties will retain their original names in the child nodes.

Source code in lmm/markdown/treeutils.py
def bequeath_properties(
    node: HeadingNode,
    keys: list[str],
    new_keys: list[str] = [],
    inherit: bool = False,
    include_header: bool = False,
    filter_func: Callable[[HeadingNode], bool] = lambda _: True,
) -> HeadingNode:
    """Extract a property from the metadata of a heading node and
    bequeath it to the child text nodes.

    Args:
        node: the root or branch node to work on
        keys: the properties to be given to the text nodes
        new_keys: the new key names
        inherit: keys are bequeathed to grandchildren down
            the tree.
        include_header: include the header node in the pool of
            nodes that bequeath properties to children
        filter_func: a predicate to filter the heading nodes
            that bequeath properties.

    Returns:
        the root node of the modified branch

    Note:
        If `new_keys` is empty, the properties will retain their
        original names in the child nodes.
    """

    if not (new_keys):
        new_keys = keys

    def process_node(n: HeadingNode) -> HeadingNode:
        for child in n.get_text_children():
            for k, nk in zip(keys, new_keys):
                value: MetadataValue = (
                    child.fetch_metadata_for_key(k, include_header)
                    if inherit
                    else child.get_metadata_for_key(k)
                )
                if value:
                    child.set_metadata_for_key(nk, value)
        return n

    traverse_tree_nodetype(
        node,
        process_node,
        HeadingNode,
        filter_func,
        post_order_traversal,
    )
    return node

collect_annotated_textblocks(root, inherit=True, include_header=False, filter_func=lambda _: True)

Unfold the tree to a block list, replacing headings with metadata blocks, such that text blocks are annotated by the metadata of the parent heading. If inherit is true, inherited metadata are sought up to the first parent heading with metadata. If include_header is true, the header is considered as a source of metadata.

Parameters:

Name Type Description Default
root MarkdownTree

the tree

required
inherit bool

metadata keys are inherited from parent if True

True
include_header bool

whether to inherit from the header

False
filter_func Callable[[TextNode], bool]

a predicate to filter text nodes to include in the final block list.

lambda _: True

Returns:

Type Description
list[Block]

a block list

Note

the metadata_block members, corresponding to the private_ member of MetadataBlocks, are ignored.

Source code in lmm/markdown/treeutils.py
def collect_annotated_textblocks(
    root: MarkdownTree,
    inherit: bool = True,
    include_header: bool = False,
    filter_func: Callable[[TextNode], bool] = lambda _: True,
) -> list[Block]:
    """Unfold the tree to a block list, replacing headings with
    metadata blocks, such that text blocks are annotated by the
    metadata of the parent heading. If inherit is true, inherited
    metadata are sought up to the first parent heading with
    metadata. If include_header is true, the header is considered
    as a source of metadata.

    Args:
        root: the tree
        inherit: metadata keys are inherited from parent if True
        include_header: whether to inherit from the header
        filter_func: a predicate to filter text nodes to include
            in the final block list.

    Returns:
        a block list

    Note:
        the metadata_block members, corresponding to the
        `private_` member of MetadataBlocks, are ignored.
    """

    if not root:
        return []

    dicts: list[NodeDict] = traverse_tree_nodetype(
        root.tree_copy(),
        lambda x: x.as_dict(inherit, include_header),
        TextNode,
        filter_func,
    )
    blocks: list[Block] = []
    for d in dicts:
        if 'metadata' in d:
            blocks.append(MetadataBlock._from_dict(d['metadata']))
        blocks.append(TextBlock(content=str(d['content'])))
    return blocks

collect_dictionaries(root, filter_func=lambda _: True, map_func=lambda x: x.as_dict(True, False))

Unfold a tree or branch into a list of dictionaries, containing the node content and its metadata, or a selection of metadata as specified by map_func, optionally filtered by filter_func.

Parameters:

Name Type Description Default
root MarkdownTree

the root node of the tree or branch

required
map_func Callable[[MarkdownNode], NodeDict]

a function mapping a MarkdownNode to a dictionary with text content and metadata fields. The default copies the content and the metadata of the node, inheriting from parents until metadata is found, but excluding the header.

lambda x: x.as_dict(True, False)
filter_func Callable[[MarkdownNode], bool]

a function that filters the nodes to which map_func is applied.

lambda _: True

Returns:

Type Description
list[NodeDict]

a list of dictionaries with keys 'content' (the text) and 'metadata', or with the keys specified by map_func

Note

Use map_func to collect information from the parents of the node hierarchically.

Source code in lmm/markdown/treeutils.py
def collect_dictionaries(
    root: MarkdownTree,
    filter_func: Callable[[MarkdownNode], bool] = lambda _: True,
    map_func: Callable[
        [MarkdownNode], NodeDict
    ] = lambda x: x.as_dict(True, False),
) -> list[NodeDict]:
    """Unfold a tree or branch into a list of dictionaries,
    containing the node content and its metadata, or a selection of
    metadata as specified by map_func, optionally filtered by
    filter_func.

    Args:
        root: the root node of the tree or branch
        map_func (opt): a function mapping a MarkdownNode to a
            dictionary with text content and metadata fields. The
            default copies the content and the metadata of the node,
            inheriting from parents until a metadata is found,
            but excluding the header.
        filter_func (opt): a function that filters the nodes to which
            map_func is applied.

    Returns:
        a list of dictionaries with keys 'content' (the text) and
        'metadata', or with the keys specified by map_func

    Note:
        Use `map_func` to collect information from the parents of
        the node hierarchically.
    """
    if not root:
        return []

    return traverse_tree(root.tree_copy(), map_func, filter_func)

collect_headings(root, filter_func=lambda _: True)

Collect all heading text of the node and its descendants.

Parameters:

Name Type Description Default
root MarkdownTree

The root node of the tree, or a heading node

required
filter_func Callable[[HeadingNode], bool]

an optional filter on the node

lambda _: True

Returns:

Type Description
list[str]

A list of the accumulated text.

Source code in lmm/markdown/treeutils.py
def collect_headings(
    root: MarkdownTree,
    filter_func: Callable[[HeadingNode], bool] = lambda _: True,
) -> list[str]:
    """
    Collect all heading text of the node and its descendants.

    Args:
        root: The root node of the tree, or a heading node
        filter_func: an optional filter on the node

    Returns:
        A list of the accumulated text.
    """

    if not root:
        return []

    mapf: Callable[[HeadingNode], str] = lambda x: x.get_content()
    return traverse_tree_nodetype(
        root, mapf, HeadingNode, filter_func
    )

collect_table_of_contents(root)

A specialized collect function to extract a table of contents from the markdown tree.

Parameters:

Name Type Description Default
root MarkdownNode

The root node of the tree

required

Returns:

Type Description
list[dict[str, int | str]]

A list of dictionaries representing the table of contents

Source code in lmm/markdown/treeutils.py
def collect_table_of_contents(
    root: MarkdownNode,
) -> list[dict[str, int | str]]:
    """
    A specialized collect function to extract a table of contents
    from the markdown tree.

    Args:
        root: The root node of the tree

    Returns:
        A list of dictionaries representing the table of contents
    """

    def collect_headings(node: HeadingNode) -> dict[str, int | str]:
        return {
            'level': node.heading_level(),
            'content': node.get_content(),
        }

    return traverse_tree_nodetype(root, collect_headings, HeadingNode)
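
As a usage sketch, the list returned by this function can be rendered as an indented outline. The helper `render_toc` below is hypothetical and not part of lmm; it assumes only the two keys shown in the source, 'level' and 'content', and that top-level headings have level 1 (a lower level, if the root reports one, is also printed flush left).

```python
from __future__ import annotations


def render_toc(entries: list[dict[str, int | str]], indent: str = "  ") -> str:
    """Format entries with 'level' and 'content' keys as an indented outline."""
    lines: list[str] = []
    for entry in entries:
        level = int(entry["level"])
        # Level 1 headings are flush left; deeper levels are indented
        lines.append(f"{indent * max(level - 1, 0)}{entry['content']}")
    return "\n".join(lines)


# Hand-written sample in the shape returned by collect_table_of_contents
toc = [
    {"level": 1, "content": "Introduction"},
    {"level": 2, "content": "Motivation"},
    {"level": 1, "content": "Methods"},
]
outline = render_toc(toc)
```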

collect_text(root, filter_func=lambda _: True)

Collect all text in the text node descendants of the node.

Parameters:

Name Type Description Default
root MarkdownTree

The root node of the tree, or a heading node

required
filter_func Callable[[TextNode], bool]

an optional filter on the node

lambda _: True

Returns:

Type Description
list[str]

A list containing the accumulated text.

Source code in lmm/markdown/treeutils.py
def collect_text(
    root: MarkdownTree,
    filter_func: Callable[[TextNode], bool] = lambda _: True,
) -> list[str]:
    """
    Collect all text in the text node descendants of the node.

    Args:
        root: The root node of the tree, or a heading node
        filter_func: an optional filter on the node

    Returns:
        A list containing the accumulated text.
    """

    if not root:
        return []

    mapf: Callable[[TextNode], str] = lambda x: x.get_content()
    return traverse_tree_nodetype(root, mapf, TextNode, filter_func)

count_words(root)

Count the total number of words in the tree representation of the Markdown document.

Parameters:

Name Type Description Default
root MarkdownTree

The root node of the tree

required

Returns:

Type Description
int

The total number of words in the document

Source code in lmm/markdown/treeutils.py
def count_words(root: MarkdownTree) -> int:
    """
    Count the total number of words in the tree representation
     of the Markdown document.

    Args:
        root: The root node of the tree

    Returns:
        The total number of words in the document
    """

    if not root:
        return 0

    def count_words_in_node(node: MarkdownNode) -> int:
        if isinstance(node.block, TextBlock):
            return len(node.get_content().split())
        else:
            return 0

    return fold_tree(
        root, lambda acc, node: acc + count_words_in_node(node), 0
    )
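
The fold used here threads an accumulator through every node of the tree. The sketch below shows the idea with hypothetical stand-ins (`Node`, `fold`, `words`); the visiting order of the library's `fold_tree` is not shown in this section, so the sketch assumes pre-order, which makes no difference for a sum.

```python
from dataclasses import dataclass, field
from typing import Callable, TypeVar

A = TypeVar("A")


@dataclass
class Node:  # hypothetical stand-in for MarkdownNode
    content: str
    is_text: bool = True
    children: list["Node"] = field(default_factory=list)


def fold(node: Node, func: Callable[[A, Node], A], initial: A) -> A:
    """Thread an accumulator through the node and its descendants."""
    acc = func(initial, node)
    for child in node.children:
        acc = fold(child, func, acc)
    return acc


def words(node: Node) -> int:
    # Headings contribute nothing; only text content is counted
    return len(node.content.split()) if node.is_text else 0


doc = Node("My title", is_text=False, children=[
    Node("one two three"),
    Node("Section", is_text=False, children=[Node("four  five")]),
])
total = fold(doc, lambda acc, n: acc + words(n), 0)
```

Splitting on whitespace with `str.split()` ignores repeated spaces, so the double space in the last text node does not inflate the count.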

get_headingnodes(root, naked_copy=True, filter_func=lambda _: True)

Find all heading nodes, or all heading nodes that satisfy a predicate function

Parameters:

Name Type Description Default
root MarkdownNode

The root node of the tree

required
naked_copy bool

if True, return naked copies of the nodes (deep copies without parent or children); otherwise return references to the nodes in the tree (defaults to True)

True
filter_func Callable[[HeadingNode], bool]

a predicate to select the heading nodes

lambda _: True

Returns:

Type Description
list[HeadingNode]

A list of the heading nodes, or of those where filter_func is true

Source code in lmm/markdown/treeutils.py
def get_headingnodes(
    root: MarkdownNode,
    naked_copy: bool = True,
    filter_func: Callable[[HeadingNode], bool] = lambda _: True,
) -> list[HeadingNode]:
    """
    Find all heading nodes, or all heading nodes that satisfy a
    predicate function

    Args:
        root: The root node of the tree
        naked_copy: if True, naked copy, otherwise reference (defaults
            to True)
        filter_func: a predicate to select the heading nodes

    Returns:
        A list of the heading nodes, or of those where filter_func is
        true
    """

    return get_nodes(root, naked_copy, HeadingNode, filter_func)

get_nodes(root, naked_copy=True, node_type=MarkdownNode, filter_func=lambda _: True)

Find all nodes of node_type, or all such nodes that satisfy a predicate function

Parameters:

Name Type Description Default
root MarkdownNode

The root node of the tree

required
naked_copy bool

naked copy is a deep copy of a node taken off the tree (without parent or children); if False, gives a reference. Defaults to True.

True
node_type type[MN]

the type of node to select (default to all)

MarkdownNode
filter_func Callable[[MN], bool]

a predicate to select the nodes (defaults to all nodes)

lambda _: True

Returns:

Type Description
list[MN]

A list of the nodes of type node_type, or of those where filter_func is true

See also

get_textnodes, get_headingnodes, get_nodes_with_metadata for examples of use

Source code in lmm/markdown/treeutils.py
def get_nodes(
    root: MarkdownNode,
    naked_copy: bool = True,
    node_type: type[MN] = MarkdownNode,
    filter_func: Callable[[MN], bool] = lambda _: True,
) -> list[MN]:
    """
    Find all nodes of node_type, or all such nodes that satisfy a
    predicate function

    Args:
        root: The root node of the tree
        naked_copy: naked copy is a deep copy of a node taken off the
            tree (without parent or children); if False, gives a
            reference. Defaults to True.
        node_type: the type of node to select (default to all)
        filter_func: a predicate to select the nodes (defaults
            to all nodes)

    Returns:
        A list of the nodes of type node_type, or of those where
        filter_func is true

    See also:
        `get_textnodes`, `get_headingnodes`,
        `get_nodes_with_metadata` for examples of use
    """
    nodes: list[MN]
    if naked_copy:
        nodes = traverse_tree_nodetype(
            root, lambda x: x.naked_copy(), node_type, filter_func
        )
    else:
        nodes = traverse_tree_nodetype(
            root, lambda x: x, node_type, filter_func
        )
    return nodes
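
The difference between a naked copy and a reference matters as soon as the selected nodes are edited. The following sketch uses a hypothetical `Node` class and `select` function to show it; the `naked_copy` method here is an assumption modelled on the description above (a deep copy without parent or children), not the library's implementation.

```python
import copy
from dataclasses import dataclass, field


@dataclass
class Node:  # hypothetical stand-in for MarkdownNode
    content: str
    metadata: dict[str, str] = field(default_factory=dict)
    children: list["Node"] = field(default_factory=list)

    def naked_copy(self) -> "Node":
        """Deep copy of the node alone, detached from its children."""
        return Node(self.content, copy.deepcopy(self.metadata))


def select(root: Node, naked_copy: bool = True) -> list[Node]:
    """Collect all nodes in pre-order, as copies or as references."""
    picked = [root.naked_copy() if naked_copy else root]
    for child in root.children:
        picked.extend(select(child, naked_copy))
    return picked


tree = Node("Title", children=[Node("paragraph", {"status": "draft"})])

copies = select(tree)
copies[1].metadata["status"] = "final"  # the tree is unaffected
after_copy_edit = tree.children[0].metadata["status"]

refs = select(tree, naked_copy=False)
refs[1].metadata["status"] = "final"  # edits the tree in place
after_ref_edit = tree.children[0].metadata["status"]
```

Copies are the safer default for read-only queries; references are useful when the selection is a first step towards modifying the tree.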

get_nodes_with_metadata(root, metadata_key, node_type=MarkdownNode, naked_copy=True)

Find all nodes that have a specific metadata key.

Parameters:

Name Type Description Default
root MarkdownNode

The root node of the tree

required
metadata_key str

The metadata key to search for

required
node_type type[MN]

the node type, defaults to MarkdownNode

MarkdownNode
naked_copy bool

if True, return naked copies of the nodes (deep copies without parent or children); otherwise return references to the nodes in the tree (defaults to True)

True

Returns:

Type Description
list[MN]

A list of nodes that have the specified metadata key

Source code in lmm/markdown/treeutils.py
def get_nodes_with_metadata(
    root: MarkdownNode,
    metadata_key: str,
    node_type: type[MN] = MarkdownNode,
    naked_copy: bool = True,
) -> list[MN]:
    """
    Find all nodes that have a specific metadata key.

    Args:
        root: The root node of the tree
        metadata_key: The metadata key to search for
        node_type: the node type, defaults to MarkdownNode
        naked_copy: if True, naked copy, otherwise reference (defaults
            to True)

    Returns:
        A list of nodes that have the specified metadata key
    """

    return get_nodes(
        root,
        naked_copy,
        node_type,
        lambda x: bool(x.metadata) and metadata_key in x.metadata,
    )

get_textnodes(root, naked_copy=True, filter_func=lambda _: True)

Find all text nodes, or all text nodes that satisfy a predicate function

Parameters:

Name Type Description Default
root MarkdownNode

The root node of the tree

required
naked_copy bool

if True, return naked copies of the nodes (deep copies without parent or children); otherwise return references to the nodes in the tree (defaults to True)

True
filter_func Callable[[TextNode], bool]

a predicate to select the text nodes

lambda _: True

Returns:

Type Description
list[TextNode]

A list of the text nodes, or of those where filter_func is true

Source code in lmm/markdown/treeutils.py
def get_textnodes(
    root: MarkdownNode,
    naked_copy: bool = True,
    filter_func: Callable[[TextNode], bool] = lambda _: True,
) -> list[TextNode]:
    """
    Find all text nodes, or all text nodes that satisfy a predicate
    function

    Args:
        root: The root node of the tree
        naked_copy: if True, naked copy, otherwise reference (defaults
            to True)
        filter_func: a predicate to select the text nodes

    Returns:
        A list of the text nodes, or of those where filter_func is
        true
    """

    return get_nodes(root, naked_copy, TextNode, filter_func)

inherit_metadata(node, exclude, inherit=True, include_header=False, filter_func=lambda _: True)

Copy the metadata of headings into those of children nodes. If the metadata property is already defined on the child, no copy is made.

Parameters:

Name Type Description Default
node MarkdownNode

the root node of the branch to work on

required
exclude list[str]

a list of keys that are not inherited.

required
inherit bool

inheritance goes up the hierarchy to the first heading with metadata

True
include_header bool

if inherit, whether to consider the header for inheritance

False
filter_func Callable[[MarkdownNode], bool]

a filter for the children nodes to process.

lambda _: True

Returns:

Type Description
MarkdownNode

the modified branch

Note

when inherit is true, the inherit search stops at the first parent heading that has any metadata.

Source code in lmm/markdown/treeutils.py
def inherit_metadata(
    node: MarkdownNode,
    exclude: list[str],
    inherit: bool = True,
    include_header: bool = False,
    filter_func: Callable[[MarkdownNode], bool] = lambda _: True,
) -> MarkdownNode:
    """Copy the metadata of headings into those of children nodes. If
    the metadata property is already defined on the child, no copy is
    made.

    Args:
        node: the root node of the branch to work on
        exclude: a list of keys that are not inherited.
        inherit: inheritance goes up the hierarchy to the first
            heading with metadata
        include_header: if inherit, whether to consider the header
            for inheritance
        filter_func: a filter for the children nodes to process.

    Returns:
        the modified branch

    Note:
        when inherit is true, the inherit search stops at the first
        parent heading that has any metadata.
    """

    def _meta_from_parent(n: MarkdownNode) -> None:
        if not filter_func(n):
            return

        meta = n.get_metadata()  # get_metadata() delivers copy
        parent = n.parent
        if parent:
            if inherit:
                parentmeta = parent.fetch_metadata(
                    None, include_header
                )
            else:
                parentmeta = parent.get_metadata()
            for k in parentmeta.keys():
                if k in exclude:
                    continue
                if k not in meta:
                    meta[k] = parentmeta[k]
        n.metadata = meta

    pre_order_traversal(node, _meta_from_parent)
    return node
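
The inheritance rule (search upwards, stop at the first ancestor with any metadata, never overwrite keys the child already has) can be sketched as follows. `Node`, `nearest_metadata` and `inherit` are hypothetical stand-ins; in particular, treating the node without a parent as the header is an assumption of this sketch.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Node:  # hypothetical stand-in for MarkdownNode
    name: str
    metadata: dict[str, str] = field(default_factory=dict)
    parent: Node | None = None
    children: list[Node] = field(default_factory=list)

    def add(self, child: Node) -> Node:
        child.parent = self
        self.children.append(child)
        return child


def nearest_metadata(node: Node | None, include_header: bool) -> dict[str, str]:
    """Walk up from node; return a copy of the first non-empty metadata."""
    while node is not None:
        is_header = node.parent is None  # assumption: the root is the header
        if node.metadata and (include_header or not is_header):
            return dict(node.metadata)
        node = node.parent
    return {}


def inherit(node: Node, exclude: list[str], include_header: bool = False) -> None:
    """Pre-order pass: fill missing keys of each node from its ancestors."""
    inherited = nearest_metadata(node.parent, include_header)
    for key, value in inherited.items():
        if key not in exclude and key not in node.metadata:
            node.metadata[key] = value
    for child in node.children:
        inherit(child, exclude, include_header)


root = Node("header", {"author": "A"})
chapter = root.add(Node("Chapter", {"topic": "trees", "draft": "yes"}))
section = chapter.add(Node("Section"))
para = section.add(Node("text", {"topic": "leaves"}))
inherit(root, exclude=["draft"])
```

In this example the section receives the chapter's topic but not the excluded draft key, the paragraph keeps its own topic, and the header's author is not passed on because include_header is False.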

inherit_parent_properties(node, properties, destination_names, include_header=False, filter_func=lambda _: True)

Copy specified metadata properties from a parent to the metadata of its immediate children.

Parameters:

Name Type Description Default
node MarkdownNode

the root node of the branch to work on

required
properties list[str]

a list of metadata properties to copy

required
destination_names list[str] | None

the names of the keys to copy the properties into. If None, the same key names of the parent are used

required
include_header bool

also inherit properties of header

False
filter_func Callable[[MarkdownNode], bool]

a filter for the children nodes to process.

lambda _: True

Returns:

Type Description
MarkdownNode

the modified node (with side effects on its children)

Behaviour

Raises ValueError if destination_names is not None and is not of the same length as properties.

Note

Properties are inherited from the immediate parent only. This function does not propagate properties recursively from grandparents or higher ancestors in a single pass.

Source code in lmm/markdown/treeutils.py
def inherit_parent_properties(
    node: MarkdownNode,
    properties: list[str],
    destination_names: list[str] | None,
    include_header: bool = False,
    filter_func: Callable[[MarkdownNode], bool] = lambda _: True,
) -> MarkdownNode:
    """Copy specified metadata properties from a parent to the
    metadata of its immediate children.

    Args:
        node: the root node of the branch to work on
        properties: a list of metadata properties to copy
        destination_names: the names of the keys to copy the 
            properties into. If None, the same key names of the 
            parent are used
        include_header: also inherit properties of header
        filter_func: a filter for the children nodes to process.

    Returns:
        the modified node (with side effects on its children)

    Behaviour:
        raises ValueError if `destination_names` is not None and is
        not of the same length as `properties`.

    Note:
        Properties are inherited from the immediate parent only. This
        function does not propagate properties recursively from
        grandparents or higher ancestors in a single pass.
    """
    if destination_names is not None:
        if not len(destination_names) == len(properties):
            raise ValueError(
                "inherit_parent_properties: "
                "destination_names is of length "
                f"{len(destination_names)}, but properties"
                f" is of length {len(properties)}"
            )
    else:
        destination_names = properties

    def _add_parent_summary(n: MarkdownNode) -> None:
        if not filter_func(n):
            return
        if n.parent:
            for p, d in zip(properties, destination_names):
                property: str | None = n.parent.get_metadata_string_for_key(p)
                if property:
                    n.set_metadata_for_key(d, property)

    post_order_traversal(node, _add_parent_summary)
    return node
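
The pairing of source and destination keys, together with the length check, can be sketched on plain dictionaries. The helpers `resolve_destinations` and `copy_from_parent` are hypothetical and written for illustration; they mirror the validation and the `zip` pairing in the listing above but do not use the lmm node classes.

```python
from __future__ import annotations


def resolve_destinations(
    properties: list[str], destination_names: list[str] | None
) -> list[tuple[str, str]]:
    """Pair each source key with its destination key, validating lengths."""
    if destination_names is None:
        destination_names = properties
    elif len(destination_names) != len(properties):
        raise ValueError(
            f"destination_names has length {len(destination_names)}, "
            f"but properties has length {len(properties)}"
        )
    return list(zip(properties, destination_names))


def copy_from_parent(
    parent_meta: dict[str, str],
    child_meta: dict[str, str],
    pairs: list[tuple[str, str]],
) -> dict[str, str]:
    """Copy the parent's non-empty values into the child under the new keys."""
    for source, dest in pairs:
        value = parent_meta.get(source)
        if value:
            child_meta[dest] = value
    return child_meta


pairs = resolve_destinations(
    ["summary", "topic"], ["parent_summary", "parent_topic"]
)
child = copy_from_parent({"summary": "About trees"}, {"title": "Leaves"}, pairs)
```

Renaming the keys on the way down avoids clashes between a child's own property and the one copied from its parent.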

post_order_map_tree(node, map_func)

Applies map_func in post-order to the nodes of the tree.

Parameters:

Name Type Description Default
node MarkdownNode

The root node of the tree or subtree

required
map_func Callable[[MarkdownNode], MarkdownNode]

The function to apply to each node that returns a new node

required

Returns:

Type Description
MarkdownNode

A new tree with the same structure, but transformed by map_func

Note

Make a deep copy of the root node prior to calling this function to prevent side effects: post_order_map_tree(node.tree_copy(), map_func)

Source code in lmm/markdown/treeutils.py
def post_order_map_tree(
    node: MarkdownNode,
    map_func: Callable[[MarkdownNode], MarkdownNode],
) -> MarkdownNode:
    """
    Applies map_func in post-order to the nodes of the tree.

    Args:
        node: The root node of the tree or subtree
        map_func: The function to apply to each node that returns a
            new node

    Returns:
        A new tree with the same structure, but transformed by
            map_func

    Related functions:
        post_order_traversal: has the same purpose, but with a
            different parameter function signature
            `Callable[[MarkdownNode], None]` and return type None

    Note:
        Make a deep copy of the root node prior to calling this
        function to prevent side effects:
        `post_order_map(node.tree_copy())`
    """
    node.children = [
        post_order_map_tree(child, map_func)
        for child in node.children
    ]
    return map_func(node)

pre_order_map_tree(node, map_func)

Applies map_func in pre-order to the nodes of the tree.

Parameters:

Name Type Description Default
node MarkdownNode

The root node of the tree or subtree

required
map_func Callable[[MarkdownNode], MarkdownNode]

The function to apply to each node that returns a new node

required

Returns:

Type Description
MarkdownNode

A new tree with the same structure, but transformed by map_func

Note

Make a deep copy of the root node prior to calling this function to prevent side effects: pre_order_map_tree(node.tree_copy(), map_func)
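
A minimal sketch of the parent-first order follows. Node and pre_order_map are hypothetical stand-ins rather than lmm code; the helper mirrors the documented behaviour in that the parent is mapped first and the children are then read from the node that was passed in.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Node:
    """Hypothetical stand-in for MarkdownNode."""
    label: str
    children: list[Node] = field(default_factory=list)


def pre_order_map(node: Node, map_func: Callable[[Node], Node]) -> Node:
    # Map the parent first, then attach the mapped children. The
    # children are taken from the input node, not from the mapped one.
    mapped = map_func(node)
    mapped.children = [pre_order_map(c, map_func) for c in node.children]
    return mapped


order: list[str] = []


def tag(n: Node) -> Node:
    # Records the visiting order and returns a fresh node without
    # children; pre_order_map attaches the mapped children afterwards.
    order.append(n.label)
    return Node(f"<{n.label}>")


root = Node("root", [Node("a", [Node("a1")]), Node("b")])
result = pre_order_map(root, tag)

print(order)         # ['root', 'a', 'a1', 'b']
print(result.label)  # <root>
print(root.label)    # root
```

Here tag returns new nodes, so the input tree is left alone. A map_func that returns the node it was given would have its children list overwritten, which is the side effect the note warns about.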

Source code in lmm/markdown/treeutils.py
def pre_order_map_tree(
    node: MarkdownNode,
    map_func: Callable[[MarkdownNode], MarkdownNode],
) -> MarkdownNode:
    """
    Applies map_func in pre-order to the nodes of the tree.

    Args:
        node: The root node of the tree or subtree
        map_func: The function to apply to each node that returns a
            new node

    Returns:
        A new tree with the same structure, but transformed by
            map_func

    Related functions:
        pre_order_traversal: has the same purpose, but with a
            different parameter function signature
            `Callable[[MarkdownNode], None]` and return type None

    Note:
        Make a deep copy of the root node prior to calling this
        function to prevent side effects:
        `pre_order_map_tree(node.tree_copy(), map_func)`
    """

    mapped_node = map_func(node)
    mapped_node.children = [
        pre_order_map_tree(child, map_func) for child in node.children
    ]

    return mapped_node

print_tree_info(node, indent=0)

Print information about a node and its descendants, including its type, content, and metadata.

Parameters:

Name Type Description Default
node MarkdownNode

The node to print information for

required
indent int

The indentation level for pretty printing

0

Returns:

Type Description
None

None
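
The shape of the output can be sketched without lmm. The Node class and the describe helper below are hypothetical; describe returns the lines instead of printing them so that the result is easy to inspect, and it leaves out the metadata lines. It reproduces the two-space indentation per level and the truncation of long text to 47 characters plus an ellipsis.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical stand-in: kind is 'heading' or 'text'."""
    kind: str
    content: str
    level: int = 0
    children: list[Node] = field(default_factory=list)


def describe(node: Node, indent: int = 0) -> list[str]:
    pad = "  " * indent
    if node.kind == "heading":
        lines = [f"{pad}Heading (Level {node.level}): {node.content}"]
    else:
        text = node.content
        if len(text) > 50:
            # Keep the displayed text at 50 characters in total.
            text = text[:47] + "..."
        lines = [f"{pad}Text: {text}"]
    # Descendants are listed one indentation level deeper.
    for child in node.children:
        lines.extend(describe(child, indent + 1))
    return lines


tree = Node("heading", "Title", 1, [
    Node("text", "x" * 60),
    Node("heading", "Section", 2, [Node("text", "short")]),
])
report = describe(tree)
print("\n".join(report))
```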

Source code in lmm/markdown/treeutils.py
def print_tree_info(node: MarkdownNode, indent: int = 0) -> None:
    """
    Print information about a node and its descendants, including its
    type, content, and metadata.

    Args:
        node: The node to print information for
        indent: The indentation level for pretty printing

    Returns:
        None
    """
    indent_str = "  " * indent

    # Print node type and content
    if isinstance(node, HeadingNode):
        print(
            f"{indent_str}Heading (Level {node.heading_level()}): "
            + f"{node.get_content()}"
        )
    elif isinstance(node, TextNode):
        content = node.get_content()
        if len(content) > 50:
            content = content[:47] + "..."
        print(f"{indent_str}Text: {content}")
    else:
        print(f"{indent_str}Other: {type(node.block)}")

    # Print metadata
    if node.metadata:
        print(f"{indent_str}  Metadata: {node.metadata}")

    # Print title_ from effective metadata (if it exists)
    title_path = node.fetch_metadata_for_key('title')
    if title_path:
        print(f"{indent_str}  Effective title path: {title_path}")

    # Print children recursively
    for child in node.children:
        print_tree_info(child, indent + 1)

propagate_property(node, key, *, inherited_keys=[], add_key_info=True, select=False)

Extract a property from the metadata of a heading node and transform it into a child text node with content given by that property.

Parameters:

Name Type Description Default
node HeadingNode

the root or branch node to work on

required
key str

the property to be moved into a text node

required
inherited_keys list[str]

the keys that the new text node inherits

[]
add_key_info bool

if True, the metadata of the added text child node will have a 'type' property whose value is the key of the transferred property.

True
select bool

if True, replaces all children of the heading node with the new node containing the property of the metadata. If the heading node has no such property, then the text children are not altered.

False

Returns:

Type Description
HeadingNode

the root node of the modified branch

Note

This function changes the structure of the tree.
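
The transformation can be sketched on a stand-in data structure. Everything below is hypothetical and not lmm code: Node replaces HeadingNode and TextNode, move_property is an illustrative name, the select option is left out, and placing the new text node before the existing children is an assumption, since the placement is decided by propagate_content.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Any


@dataclass
class Node:
    """Hypothetical stand-in for HeadingNode / TextNode."""
    kind: str  # 'heading' or 'text'
    content: str
    metadata: dict[str, Any] = field(default_factory=dict)
    children: list[Node] = field(default_factory=list)


def move_property(node: Node, key: str,
                  inherited: tuple[str, ...] = (),
                  add_key_info: bool = True) -> Node:
    # Descend into sub-headings first (assumed traversal).
    for child in node.children:
        if child.kind == "heading":
            move_property(child, key, inherited, add_key_info)
    if node.kind == "heading" and key in node.metadata:
        # The property leaves the heading and becomes text content.
        meta: dict[str, Any] = {"type": key} if add_key_info else {}
        text = Node("text", str(node.metadata.pop(key)), meta)
        for k in inherited:
            if k in node.metadata:
                text.metadata[k] = node.metadata[k]
        # Assumption: the new text node is placed before its siblings.
        node.children = [text, *node.children]
    return node


root = Node("heading", "Intro",
            {"summary": "Short overview", "lang": "en"},
            [Node("text", "Body text")])
move_property(root, "summary", inherited=("lang",))

print(root.metadata)              # {'lang': 'en'}
print(root.children[0].content)   # Short overview
print(root.children[0].metadata)  # {'type': 'summary', 'lang': 'en'}
```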

Source code in lmm/markdown/treeutils.py
def propagate_property(
    node: HeadingNode,
    key: str,
    *,
    inherited_keys: list[str] = [],
    add_key_info: bool = True,
    select: bool = False,
) -> HeadingNode:
    """Extract a property from the metadata of a heading node and
    transform it into a child text node with content given by that
    property.

    Args:
        node: the root or branch node to work on
        key: the property to be moved into a text node
        inherited_keys: the keys that the new text node inherits
        add_key_info: if True, the metadata of the added text child
            node will have a 'type' property whose value is the key
            of the transferred property.
        select: if True, replaces all children of the heading node
            with the new node containing the property of the metadata.
            If the heading node has no such property, then the text
            children are not altered.

    Returns:
        the root node of the modified branch

    Note:
        This function changes the structure of the tree.
    """

    def process_node(n: HeadingNode) -> TextNode:
        node: TextNode = TextNode.from_content(
            content=str(n.metadata.pop(key)),
            metadata={'type': key} if add_key_info else {},
        )
        for k in inherited_keys:
            if n.has_metadata_key(k):
                node.set_metadata_for_key(
                    k, n.get_metadata_for_key(k)
                )
        return node

    return propagate_content(
        node, process_node, select, lambda n: key in n.metadata.keys()
    )

prune_tree(node, filter_func)

Prune all nodes of the tree that do not satisfy the predicate filter_func.

Parameters:

Name Type Description Default
node MarkdownTree

The root node of the tree or branch to prune

required
filter_func Callable[[MarkdownNode], bool]

A predicate function that returns True if a node should be kept, and False if it should be pruned.

required

Returns:

Type Description
MarkdownTree

The root of a new tree copy with pruned nodes, or None if the root itself is pruned.
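
A small sketch of the pruning behaviour, using a hypothetical Node class and prune helper in place of the lmm types, with copy.deepcopy standing in for tree_copy(). It illustrates that a pruned node takes its whole subtree with it and that the input tree is not modified.

```python
from __future__ import annotations

import copy
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class Node:
    """Hypothetical stand-in for MarkdownNode."""
    label: str
    children: list[Node] = field(default_factory=list)


def prune(node: Optional[Node],
          keep: Callable[[Node], bool]) -> Optional[Node]:
    if node is None or not keep(node):
        return None
    root = copy.deepcopy(node)  # work on a copy of the input

    def visit(n: Node) -> None:
        # Drop rejected children, then continue with the survivors only.
        n.children = [c for c in n.children if keep(c)]
        for c in n.children:
            visit(c)

    visit(root)
    return root


tree = Node("root", [
    Node("draft", [Node("note")]),
    Node("a", [Node("draft"), Node("a1")]),
])
pruned = prune(tree, lambda n: n.label != "draft")

print([c.label for c in pruned.children])              # ['a']
print([c.label for c in pruned.children[0].children])  # ['a1']
print(len(tree.children))                              # 2
```

The node labelled "note" satisfies the predicate but disappears together with its pruned parent.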

Source code in lmm/markdown/treeutils.py
def prune_tree(
    node: MarkdownTree,
    filter_func: Callable[[MarkdownNode], bool],
) -> MarkdownTree:
    """
    Prune all nodes of the tree that do not satisfy the predicate
    filter_func.

    Args:
        node: The root node of the tree or branch to prune
        filter_func: A predicate function that returns True if a 
            node should be kept, and False if it should be pruned.

    Returns:
        The root of a new tree copy with pruned nodes, or None if the 
        root itself is pruned.
    """
    if node is None:
        return None

    if not filter_func(node):
        return None

    def _visit_func(n: MarkdownNode) -> None:
        survivors: list[MarkdownNode] = []
        for child in n.children:
            if filter_func(child):
                survivors.append(child)
        n.children = survivors

    root = node.tree_copy()
    pre_order_traversal(root, _visit_func)
    return root