Utilities
Utilities to index text.
Main functions
- base_hash: a hash generation function for strings
- generate_uuid: a UUID representation of a string
- generate_random_string: a random string of the required length
Behaviour
These functions are supposedly pure.
base_hash(input_string)
Generate human-readable hash to check changes in strings.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| input_string | str | an input string | required |
Returns:
| Type | Description |
|---|---|
| str | a hash string |
Source code in lmm/utils/hash.py
generate_random_string(length=18)
Generates a random string.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| length | int | the length of the random string (defaults to 18 characters). | 18 |
Returns:
| Type | Description |
|---|---|
| str | a random string of the required length. |
Source code in lmm/utils/hash.py
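As a rough illustration of what such a function can look like, the sketch below builds a random string of a given length with the standard library. The name random_string_sketch and the alphabet (ASCII letters and digits) are assumptions; the actual character set used by generate_random_string is not documented here.

```python
import secrets
import string

# Hypothetical alphabet: the real function may draw from a different set.
ALPHABET = string.ascii_letters + string.digits


def random_string_sketch(length: int = 18) -> str:
    """Return a random string with exactly `length` characters."""
    return "".join(secrets.choice(ALPHABET) for _ in range(length))


token = random_string_sketch()
print(len(token))  # 18
```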
generate_uuid(text_input, namespace_uuid=uuid.NAMESPACE_URL)
Generates a UUID Version 5 from a given text string using a specified namespace.
UUID v5 is based on SHA-1 hashing, ensuring that the same text input with the same namespace will always produce the same UUID.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| text_input | str | The string from which to generate the UUID. | required |
| namespace_uuid | UUID object | The namespace UUID. Defaults to uuid.NAMESPACE_URL. You can use other predefined namespaces (e.g., uuid.NAMESPACE_DNS) or define your own. | NAMESPACE_URL |
Returns:
| Name | Type | Description |
|---|---|---|
| str | str | The generated UUID v5 as a hyphenated string (36 chars). |
Source code in lmm/utils/hash.py
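The deterministic behaviour described above can be reproduced with the standard library's uuid.uuid5. The following is a minimal sketch, not the library's code; the helper name uuid_from_text and the sample URL are made up for the illustration.

```python
import uuid


def uuid_from_text(
    text_input: str, namespace_uuid: uuid.UUID = uuid.NAMESPACE_URL
) -> str:
    """Return the UUID v5 of text_input as a hyphenated string."""
    return str(uuid.uuid5(namespace_uuid, text_input))


first = uuid_from_text("https://example.com/page")
second = uuid_from_text("https://example.com/page")
other = uuid_from_text("https://example.com/page", uuid.NAMESPACE_DNS)

print(first == second)  # True: same text and namespace give the same UUID
print(first == other)   # False: a different namespace gives a different UUID
```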
Utilities to read from and write to disk and to print errors to the console. Errors are not propagated; instead, functions return a null value.
append_postfix_to_filename(filename, postfix)
Appends a postfix string to the name of a file.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| filename | str | The original name of the file (e.g., "my_document.txt"). | required |
| postfix | str | The string to append (e.g., "_new"). | required |
Returns:
| Name | Type | Description |
|---|---|---|
| str | str | The new filename with the postfix appended. |
Source code in lmm/utils/ioutils.py
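One plausible reading of this description is that the postfix goes between the file stem and its extension. The sketch below implements that reading with pathlib; the name append_postfix_sketch is hypothetical, and the placement before the extension is an assumption rather than documented behaviour.

```python
from pathlib import Path


def append_postfix_sketch(filename: str, postfix: str) -> str:
    """Insert postfix between the stem and the extension (assumed behaviour)."""
    path = Path(filename)
    return str(path.with_name(f"{path.stem}{postfix}{path.suffix}"))


print(append_postfix_sketch("my_document.txt", "_new"))  # my_document_new.txt
```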
check_allowed_content(input_string, allowed_list)
Extracts strings delimited by single quotes from input_string and checks if any of them are in the allowed_list.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| input_string | str | The string to extract quoted content from. | required |
| allowed_list | list[str] | List of strings to check against. | required |
Returns:
| Type | Description |
|---|---|
| bool | True if any extracted string is in allowed_list, False otherwise. |
Source code in lmm/utils/ioutils.py
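The check described above can be sketched with a regular expression that captures text between single quotes. This is an illustration under assumptions: the name check_allowed_sketch is hypothetical, and the real function may treat nested or unbalanced quotes differently.

```python
import re


def check_allowed_sketch(input_string: str, allowed_list: list[str]) -> bool:
    """Return True if any single-quoted substring is in allowed_list."""
    quoted = re.findall(r"'([^']*)'", input_string)
    return any(item in allowed_list for item in quoted)


print(check_allowed_sketch("status is 'open' or 'closed'", ["closed"]))  # True
print(check_allowed_sketch("status is 'open'", ["closed"]))              # False
```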
clean_text_concat(text_segments)
Concatenates a list of strings, merging overlapping tails/heads if the overlap constitutes at least one whole word.
The merge condition requires:
1. The tail of text A matches the head of text B.
2. The match represents a complete word boundary on both sides:
   - The character preceding the overlap in A must not be alphanumeric (or A starts with the overlap).
   - The character following the overlap in B must not be alphanumeric (or B ends with the overlap).
3. The overlap contains at least one alphanumeric character (to ensure it is "at least a word" and not just whitespace or punctuation).
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| text_segments | list[str] | A list of strings to concatenate. | required |
Returns:
| Type | Description |
|---|---|
| str | A single concatenated string with overlaps merged. |
Source code in lmm/utils/ioutils.py
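The three merge conditions can be expressed compactly in code. The sketch below is an independent illustration, not the library's implementation: the names merge_pair and concat_sketch are hypothetical, it prefers the longest qualifying overlap, and it joins segments without a separator when no overlap qualifies. Both of these choices are assumptions.

```python
def merge_pair(a: str, b: str) -> str:
    """Merge b onto a, dropping a whole-word overlap between a's tail and b's head."""
    for size in range(min(len(a), len(b)), 0, -1):
        overlap = b[:size]
        if not a.endswith(overlap):
            continue  # condition 1: tail of a must equal head of b
        # condition 2: word boundary before the overlap in a and after it in b
        before_ok = size == len(a) or not a[-size - 1].isalnum()
        after_ok = size == len(b) or not b[size].isalnum()
        # condition 3: the overlap holds at least one alphanumeric character
        if before_ok and after_ok and any(ch.isalnum() for ch in overlap):
            return a + b[size:]
    return a + b  # no qualifying overlap: plain concatenation (assumption)


def concat_sketch(text_segments: list[str]) -> str:
    """Fold merge_pair over the segments from left to right."""
    result = ""
    for segment in text_segments:
        result = merge_pair(result, segment) if result else segment
    return result


print(concat_sketch(["the quick brown", "brown fox jumps"]))
# the quick brown fox jumps
print(concat_sketch(["unhappy", "happy days"]))
# unhappyhappy days  ("happy" is not a whole word at the end of "unhappy")
```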
create_interface(f, argv)
Waits for Enter key presses and handles Ctrl-C, enabling interactive execution of the function f and debugging. The first command-line argument is the markdown file on which the module acts. An optional second command-line argument is the file to which changes are saved. A third command-line argument, if True, creates a loop for interactive editing.
Source code in lmm/utils/ioutils.py
list_files_with_extensions(folder_path, extensions)
Lists all files in a given folder that match a set of specified extensions.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| folder_path | str \| Path | The full path to the folder to search. | required |
| extensions | str \| list[str] | A single semicolon-separated string of file extensions (e.g., ".txt;.md;py") OR a standard list of strings (e.g., ['.txt', 'md']). Extensions may or may not start with a dot. | required |
Returns:
| Type | Description |
|---|---|
| list[str] | A list of full paths (as strings) for all matching files. Returns an empty list if no files are found. |
Raises:
| Type | Description |
|---|---|
| FileNotFoundError | If the specified folder_path does not exist. |
| NotADirectoryError | If the specified folder_path is not a directory. |
| ValueError | If the extensions string contains invalid characters for a filename. |
Source code in lmm/utils/ioutils.py
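The extension handling described above (semicolon-separated string or list, optional leading dot) can be sketched as follows. The names normalize_extensions and list_files_sketch are hypothetical. Case-insensitive matching and a non-recursive search are assumptions, and the ValueError check for invalid characters is left out.

```python
from __future__ import annotations

from pathlib import Path


def normalize_extensions(extensions: str | list[str]) -> set[str]:
    """Turn '.txt;md' or ['.txt', 'md'] into {'.txt', '.md'}."""
    items = extensions.split(";") if isinstance(extensions, str) else extensions
    cleaned = (item.strip().lower() for item in items)
    return {ext if ext.startswith(".") else f".{ext}" for ext in cleaned if ext}


def list_files_sketch(
    folder_path: str | Path, extensions: str | list[str]
) -> list[str]:
    """List files directly inside folder_path whose suffix is requested."""
    folder = Path(folder_path)
    if not folder.exists():
        raise FileNotFoundError(f"Folder not found: {folder}")
    if not folder.is_dir():
        raise NotADirectoryError(f"Not a directory: {folder}")
    wanted = normalize_extensions(extensions)
    return sorted(
        str(p) for p in folder.iterdir()
        if p.is_file() and p.suffix.lower() in wanted
    )
```

Calling list_files_sketch("docs", ".txt;md") and list_files_sketch("docs", ["txt", ".md"]) would select the same files, since both forms normalize to the same set of suffixes.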
parse_external_boolean(value)
Sanitizes an externally provided boolean value.
Source code in lmm/utils/ioutils.py
process_string_quotes(input_string)
Processes a string to ensure consistent internal quoting.
Rules:
- If the string contains the character " anywhere other than as its first or last character, replace each such occurrence with ' and make sure the string starts and ends with ".
- If the string contains the character ', make sure the string starts and ends with ".
In short, the function produces a string that quotes internal text consistently, starting from a string that may use different quoting styles.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| input_string | str | The string to be processed. | required |
Returns:
| Type | Description |
|---|---|
| str | The processed string with consistent quoting. |
Source code in lmm/utils/ioutils.py
string_to_path_or_string(input_string)
Takes a string as argument. If the string is a single line, checks whether it is the path of an existing file. If so, returns a Path object for that file. Otherwise, returns the string.
A string is considered one line if it contains no newlines, or if it only has a single trailing newline character.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| input_string | str | The input string to check | required |
Returns:
| Type | Description |
|---|---|
| Path \| str | Path object if the string represents an existing file, otherwise the original string |
Source code in lmm/utils/ioutils.py
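The decision logic described above (one line, possibly with a single trailing newline, that names an existing file) might look like the following sketch. The name path_or_string_sketch is hypothetical, and returning the original string when the path cannot be checked is an assumption.

```python
from __future__ import annotations

from pathlib import Path


def path_or_string_sketch(input_string: str) -> Path | str:
    """Return a Path for a one-line string naming an existing file, else the string."""
    # A single trailing newline still counts as "one line".
    candidate = input_string[:-1] if input_string.endswith("\n") else input_string
    if not candidate or "\n" in candidate:
        return input_string
    try:
        path = Path(candidate)
        return path if path.is_file() else input_string
    except (OSError, ValueError):
        # Strings that are not valid paths on this platform are returned as text.
        return input_string
```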
validate_file(source, logger=logger)
Returns: None for failure, Path object otherwise
Source code in lmm/utils/ioutils.py
Centralized logging configuration for the ML Markdown project.
This module provides a standardized way to configure and use Python's logging module across the entire project. It ensures consistent log formatting, appropriate log levels, and centralized configuration.
Usage

    from lmm.utils.logging import (
        get_logger,
        ConsoleLogger,
        FileLogger,
        ExceptionConsoleLogger,
    )

    # Use the abstract interface implementations
    console_logger = ConsoleLogger(__name__)
    file_logger = FileLogger(__name__, "app.log")
    exception_logger = ExceptionConsoleLogger(__name__)

    # Or use the traditional logger
    logger = get_logger(__name__)
ConsoleLogger
Bases: LoggerBase
A console logger implementation that uses logging.Logger as a delegate. Logs messages to the console using Python's built-in logging module.
Source code in lmm/utils/logging.py
__init__(name=None)
Initialize the ConsoleLogger with a specific logger name, typically __name__, to use the module name.
Source code in lmm/utils/logging.py
critical(msg)
Log a critical message.
Source code in lmm/utils/logging.py
error(msg)
Log an error message.
Source code in lmm/utils/logging.py
get_level()
Get the current logging level
Source code in lmm/utils/logging.py
info(msg)
Log an informational message.
Source code in lmm/utils/logging.py
set_level(level)
Set the logging level for the logger.
Source code in lmm/utils/logging.py
warning(msg)
Log a warning message.
Source code in lmm/utils/logging.py
ExceptionConsoleLogger
Bases: LoggerBase
A console logger implementation that raises exceptions on error and critical calls.
This logger behaves like ConsoleLogger for info, warning, and set_level methods, but raises exceptions when error() or critical() methods are called. The message is still logged before the exception is raised.
Source code in lmm/utils/logging.py
__init__(name='')
Initialize the ExceptionConsoleLogger with a specific logger name.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| name | str | The name of the logger, typically __name__, to use the module name | '' |
Source code in lmm/utils/logging.py
critical(msg)
Log a critical message and raise an exception.
Source code in lmm/utils/logging.py
error(msg)
Log an error message and raise an exception.
Source code in lmm/utils/logging.py
get_level()
Get the current logging level
Source code in lmm/utils/logging.py
info(msg)
Log an informational message.
Source code in lmm/utils/logging.py
set_level(level)
Set the logging level for the logger.
Source code in lmm/utils/logging.py
warning(msg)
Log a warning message.
Source code in lmm/utils/logging.py
FileConsoleLogger
Bases: LoggerBase
A file logger implementation that uses logging.Logger as a delegate. Logs messages to a specified file using Python's built-in logging module, and relays the messages to the console as well.
This logger allows independent control of logging levels for both file and console outputs.
Source code in lmm/utils/logging.py
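Independent levels for file and console output are commonly obtained with one logging.Logger that carries two handlers, each with its own level. The sketch below shows that mechanism with the standard library only; it is not the FileConsoleLogger code, and the logger name "levels_sketch" and the temporary log path are made up for the example.

```python
import logging
import os
import tempfile

log_path = os.path.join(tempfile.mkdtemp(), "app.log")

logger = logging.getLogger("levels_sketch")
logger.setLevel(logging.DEBUG)  # let the handlers do the filtering
logger.propagate = False        # keep the root logger out of the example

console_handler = logging.StreamHandler()
console_handler.setLevel(logging.WARNING)  # console: warnings and above
file_handler = logging.FileHandler(log_path)
file_handler.setLevel(logging.INFO)        # file: info and above

logger.addHandler(console_handler)
logger.addHandler(file_handler)

logger.info("written to the file only")
logger.warning("written to both the file and the console")

with open(log_path) as handle:
    file_lines = handle.read().splitlines()
```

Changing console_handler.setLevel(...) later affects only the console output, which is the kind of separation that set_console_level and set_file_level expose.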
__init__(name='', log_file='app.log', console_level=logging.INFO, file_level=logging.INFO)
Initialize the FileConsoleLogger with a specific logger name, file path, and separate logging levels for console and file.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| name | str | The name of the logger, typically __name__, to use the module name | '' |
| log_file | str \| Path | Path to the log file where messages will be written | 'app.log' |
| console_level | int | The logging level for console output (default: logging.INFO) | INFO |
| file_level | int | The logging level for file output (default: logging.INFO) | INFO |
Source code in lmm/utils/logging.py
critical(msg)
Log a critical message.
Source code in lmm/utils/logging.py
error(msg)
Log an error message.
Source code in lmm/utils/logging.py
get_console_level()
Get the current logging level for the console logger.
Returns:
| Type | Description |
|---|---|
| int | The console logger's current level |
Source code in lmm/utils/logging.py
get_file_level()
Get the current logging level for the file logger.
Returns:
| Type | Description |
|---|---|
| int | The file logger's current level |
Source code in lmm/utils/logging.py
get_level()
Get the current logging level for the file logger.
Returns:
| Type | Description |
|---|---|
| int | The file logger's current level |
Source code in lmm/utils/logging.py
info(msg)
Log an informational message.
Source code in lmm/utils/logging.py
set_console_level(level)
Set the logging level for the console logger only.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| level | int | The logging level for console output | required |
Source code in lmm/utils/logging.py
set_file_level(level)
Set the logging level for the file logger only.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| level | int | The logging level for file output | required |
Source code in lmm/utils/logging.py
set_level(level)
Set the logging level for both file and console loggers.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| level | int | The logging level to set for both outputs | required |
Source code in lmm/utils/logging.py
warning(msg)
Log a warning message.
Source code in lmm/utils/logging.py
FileLogger
Bases: LoggerBase
A file logger implementation that uses logging.Logger as a delegate. Logs messages to a specified file using Python's built-in logging module.
Source code in lmm/utils/logging.py
__init__(name='', log_file='app.log')
Initialize the FileLogger with a specific logger name and file path.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| name | str | The name of the logger, typically __name__, to use the module name | '' |
| log_file | str \| Path | Path to the log file where messages will be written | 'app.log' |
Source code in lmm/utils/logging.py
critical(msg)
Log a critical message.
Source code in lmm/utils/logging.py
error(msg)
Log an error message.
Source code in lmm/utils/logging.py
get_level()
Get the current logging level
Source code in lmm/utils/logging.py
info(msg)
Log an informational message.
Source code in lmm/utils/logging.py
set_level(level)
Set the logging level for the logger.
Source code in lmm/utils/logging.py
warning(msg)
Log a warning message.
Source code in lmm/utils/logging.py
LoggerBase
Bases: ABC
Abstract interface for logging functionality.
Source code in lmm/utils/logging.py
critical(msg)
abstractmethod
Log a critical message.
Source code in lmm/utils/logging.py
error(msg)
abstractmethod
Log an error message.
Source code in lmm/utils/logging.py
get_level()
abstractmethod
Get the current logging level
Source code in lmm/utils/logging.py
info(msg)
abstractmethod
Log an informational message.
Source code in lmm/utils/logging.py
set_level(level)
abstractmethod
Set the logging level for the logger.
Source code in lmm/utils/logging.py
warning(msg)
abstractmethod
Log a warning message.
Source code in lmm/utils/logging.py
LoglistLogger
Bases: LoggerBase
Maintains a list of logged errors and warnings that can be inspected by the object creator.
Source code in lmm/utils/logging.py
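The idea of a logger that keeps its messages in memory for later inspection can be illustrated with a small stand-alone class. This is a simplified sketch, not the LoglistLogger implementation: the class name ListLoggerSketch is hypothetical, and the level filter used here (keep records at or above the given level) is an assumption that may differ from the filter documented for get_logs.

```python
import logging


class ListLoggerSketch:
    """Collects log messages in a list instead of printing them."""

    def __init__(self) -> None:
        self.records: list[tuple[int, str]] = []

    def info(self, msg: str) -> None:
        self.records.append((logging.INFO, msg))

    def warning(self, msg: str) -> None:
        self.records.append((logging.WARNING, msg))

    def error(self, msg: str) -> None:
        self.records.append((logging.ERROR, msg))

    def get_logs(self, level: int = 0) -> list[str]:
        # Assumed filter: keep messages recorded at `level` or above.
        return [msg for lvl, msg in self.records if lvl >= level]

    def count_logs(self, level: int = 0) -> int:
        return len(self.get_logs(level))

    def clear_logs(self) -> None:
        self.records.clear()


log = ListLoggerSketch()
log.info("started")
log.warning("input file is empty")
log.error("could not parse header")
print(log.count_logs())                # 3
print(log.get_logs(logging.WARNING))   # ['input file is empty', 'could not parse header']
```

A caller can pass such an object to a function, run it, and then check count_logs() to decide whether anything went wrong, without relying on exceptions or console output.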
__init__()
Initialize the logger.
Source code in lmm/utils/logging.py
clear_logs()
Clear the logs from the cache
Source code in lmm/utils/logging.py
count_logs(level=0)
The number of recorded logs. Zero means there were no recorded logs.
Source code in lmm/utils/logging.py
critical(msg)
Log a critical message.
Source code in lmm/utils/logging.py
error(msg)
Log an error message.
Source code in lmm/utils/logging.py
get_level()
Get the current logging level
Source code in lmm/utils/logging.py
get_logs(level=0)
Returns a list of strings with the log messages.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| level | int | A filter on the logs. Possible values: 0 or less: returns all messages; WARNING or less: omit info; ERROR or less: omit warning; CRITICAL or more: only errors and critical. | 0 |
Source code in lmm/utils/logging.py
info(msg)
Log an informational message.
Source code in lmm/utils/logging.py
set_level(level)
Set the logging level for the logger.
Source code in lmm/utils/logging.py
warning(msg)
Log a warning message.
Source code in lmm/utils/logging.py
add_file_handler(log_file)
Add a file handler to the root logger to write logs to a file.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| log_file | str \| Path | Path to the log file | required |
Source code in lmm/utils/logging.py
get_logger(name)
Get a logger with the specified name.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| name | str | The name of the logger, typically __name__, to use the module name | required |
Returns:
| Type | Description |
|---|---|
| LoggerBase | A configured logger instance |
Source code in lmm/utils/logging.py
get_logging_logger(name)
Get a logger with the specified name.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| name | str | The name of the logger, typically __name__, to use the module name | required |
Returns:
| Type | Description |
|---|---|
| Logger | A configured logger instance |
Source code in lmm/utils/logging.py
set_log_level(level)
Set the log level for all loggers.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| level | int | The logging level (e.g., logging.DEBUG, logging.INFO) | required |
Source code in lmm/utils/logging.py
The utility class LazyLoadingDict stores memoized language model class objects, or indeed objects of any class, produced by a factory function.
The LazyLoadingDict class has three main uses that may be combined.
- The first is to create objects based on a definition using a dictionary interface. The key of the dictionary is the definition that provides the object instance; different instances may be created based on the definition.
- The second is to memoize the objects created by the definition.
- The third is to enable runtime errors when an invalid definition is given.
The class is instantiated by providing the factory function in the constructor. The factory function takes one argument of the type of the dictionary key and returns a value whose type determines the type of the values in the dictionary. To trigger runtime errors when invalid definitions are provided, use keys of StrEnum or BaseModel-derived types (for examples, see the documentation of the class).
LazyLoadingDict
Bases: dict[KeyT, ValueT]
A lazy dictionary class that memoizes objects of type ValueT. To restrict the keys used, use a StrEnum key type (see example below). Any hashable object type may be used as key, depending on how the dictionary is used.
Example:

    from enum import StrEnum

    # We define here permissible keys by inheriting from StrEnum
    class LMSource(StrEnum):
        Anthropic = 'Anthropic'
        Gemini = 'Gemini'
        OpenAI = 'OpenAI'

    # We then define a factory function that creates a model object
    # designated by the key, i.e. a function that maps the possible
    # keys to instances that are memoized. In the example, ModelClass
    # objects are stored in the dictionary (code not included):
    def create_model_instance(model_name: LMSource) -> ModelClass:
        print(f"Created instance of {model_name}")
        return ModelClass(model_name=model_name)

    # The lazy dictionary is created by giving the factory function
    # in the constructor.
    lazy_dict = LazyLoadingDict(create_model_instance)

    # The objects are created or retrieved as the value of the key:
    openai_model = lazy_dict['OpenAI']

    # If the argument of the factory is derived from StrEnum, calling
    # the dictionary with an invalid key will throw a ValueError:
    model = lazy_dict[LMSource('OpenX')]
This is a more elaborate example, where a whole specification is used to create objects and memoize them:

    from typing import Literal
    from pydantic import BaseModel, ConfigDict

    # This defines the supported model sources. Runtime errors
    # are provided by BaseModel below.
    LanguageModelSource = Literal[
        'Anthropic',
        'Gemini',
        'Mistral',
        'OpenAI'
    ]

    # This defines source + model
    class LanguageModelSpecification(BaseModel):
        source_name: LanguageModelSource
        model_name: str

        # This is required to make instances hashable, so that they
        # can be used as keys in the dictionary
        model_config = ConfigDict(frozen=True)

    # Langchain model type specified here.
    def _create_model_instance(
        model: LanguageModelSpecification,
    ) -> BaseLM[BaseMsg]:
        # Factory function to create Langchain models while checking
        # permissible sources, provided as key values. source_name is
        # a plain string (a Literal), so the cases match on its value:
        match model.source_name:
            case 'OpenAI':
                from langchain_openai.chat_models import ChatOpenAI

                return ChatOpenAI(
                    model=model.model_name,
                    temperature=0.1,
                    max_retries=2,
                    use_responses_api=False,
                )
            # ... (rest of code not shown)

    # The memoized dictionary. langchain_models is parametrized like
    # a dict[LanguageModelSpecification, BaseLM[BaseMsg]]
    langchain_models = LazyLoadingDict(_create_model_instance)

    # Example of use
    model_spec = {'source_name': "OpenAI", 'model_name': "gpt-4o"}
    model = langchain_models[
        LanguageModelSpecification(**model_spec)
    ]
A Pydantic model class may also be used to create a more flexible dictionary. In the previous example, only the sources listed in LanguageModelSource can be specified without raising exceptions. However, a Pydantic model class may also constrain the objects saved in the dictionary through validation without limiting them to a finite set. Thus, if source_name were a str in the above example, a LanguageModelSpecification constructed with any string would be accepted.
In the following example, the runtime error is generated in the factory function, because literals do not give rise to runtime errors in themselves.

    from typing import Literal

    ModelSource = Literal["OpenAI", "Cohere"]

    def _model_factory(src: ModelSource) -> ModelClass:
        match src:
            case "OpenAI":
                return ModelClass("OpenAI")  # code not shown
            case "Cohere":
                return ModelClass("Cohere")  # code not shown
            case _:
                # required to raise error
                raise ValueError(f"Invalid model source: {src}")

    model_factory = LazyLoadingDict(_model_factory)
It is also possible to assign to the dictionary directly, thus bypassing the factory function. In this case, the only checks are those that Pydantic may perform when the object is assigned.
Expected behaviour: may raise ValidationError and ValueError.
Source code in lmm/utils/lazy_dict.py
__setitem__(key, value)
Allow direct setting of key/value pairs.
This bypasses the factory function for the given key. Once set directly, the factory function will not be called for this key unless the key is deleted first.
Raises:
| Type | Description |
|---|---|
| ValueError | If the key already exists in the dictionary. |
Source code in lmm/utils/lazy_dict.py
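The combination of lazy creation, memoization, and guarded direct assignment can be sketched with dict.__missing__. The class below is a minimal illustration written for this page, not the LazyLoadingDict source; the names LazyDictSketch, make_value, and calls are hypothetical.

```python
from typing import Callable, TypeVar

KeyT = TypeVar("KeyT")
ValueT = TypeVar("ValueT")


class LazyDictSketch(dict[KeyT, ValueT]):
    """A dict that builds missing values with a factory and remembers them."""

    def __init__(self, factory: Callable[[KeyT], ValueT]) -> None:
        super().__init__()
        self._factory = factory

    def __missing__(self, key: KeyT) -> ValueT:
        # Called by dict.__getitem__ only when the key is absent.
        value = self._factory(key)
        super().__setitem__(key, value)
        return value

    def __setitem__(self, key: KeyT, value: ValueT) -> None:
        # Direct assignment bypasses the factory but refuses to overwrite.
        if key in self:
            raise ValueError(f"Key already exists: {key!r}")
        super().__setitem__(key, value)


calls: list[str] = []


def make_value(name: str) -> str:
    calls.append(name)  # record each factory invocation
    return name.upper()


lazy = LazyDictSketch(make_value)
lazy["a"]            # factory runs
lazy["a"]            # memoized, factory does not run again
lazy["b"] = "manual" # set directly, factory never runs for "b"
print(calls)         # ['a']
```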
apply_markdown_heuristics(page_text)
Applies simple heuristics to convert extracted raw text into basic Markdown format.
This function attempts to:
1. Clean up excessive whitespace.
2. Ensure proper paragraph separation (Markdown requires two newlines).
3. (Placeholder for advanced logic) Detect headings or lists based on patterns.
Source code in lmm/utils/importpdfs.py
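The first two heuristics listed above can be approximated with a few regular expressions. This is a sketch under assumptions, not the function's actual code: the name markdown_heuristics_sketch is hypothetical, single line breaks are left untouched, and heading or list detection is not attempted.

```python
import re


def markdown_heuristics_sketch(page_text: str) -> str:
    """Normalize whitespace and paragraph breaks in extracted text."""
    text = page_text.replace("\r\n", "\n")
    text = re.sub(r"[ \t]+", " ", text)     # collapse runs of spaces and tabs
    text = re.sub(r" ?\n ?", "\n", text)    # trim spaces around line breaks
    text = re.sub(r"\n{2,}", "\n\n", text)  # one blank line between paragraphs
    return text.strip()


raw = "Title  line \n\n\n\nSecond   paragraph\t here.\n"
print(repr(markdown_heuristics_sketch(raw)))
# 'Title line\n\nSecond paragraph here.'
```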
convert_folder_to_markdown(input_dir, output_dir)
Reads all PDF files from an input directory and converts them to Markdown in an output directory.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| input_dir | str | The path to the folder containing PDF files. | required |
| output_dir | str | The path where the Markdown files will be saved. | required |
Source code in lmm/utils/importpdfs.py
convert_pdf_to_md(pdf_path, output_dir)
Converts a single PDF file into a Markdown file.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| pdf_path | Path | Path object to the input PDF file. | required |
| output_dir | Path | Path object for the output directory. | required |
Source code in lmm/utils/importpdfs.py