Configuration

NOUZ looks for config.yaml in this order: the path in NOUZ_CONFIG, the current working directory, then the installed server directory.

The file is optional. Without config.yaml, the server starts with default settings in LUCA mode (pure graph).
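
The lookup order above can be sketched roughly as follows. This is a minimal illustration, not the server's actual code: `find_config` is a hypothetical helper, and falling through to the next location when NOUZ_CONFIG points at a missing file is an assumption.

```python
from pathlib import Path
from typing import Mapping, Optional


def find_config(env: Mapping[str, str], cwd: Path, server_dir: Path) -> Optional[Path]:
    """Return the first existing config file in the documented order, else None."""
    candidates = []
    if env.get("NOUZ_CONFIG"):
        # 1. Explicit path from the environment.
        candidates.append(Path(env["NOUZ_CONFIG"]))
    # 2. Current working directory, 3. installed server directory.
    candidates.append(cwd / "config.yaml")
    candidates.append(server_dir / "config.yaml")
    for path in candidates:
        # ASSUMPTION: a missing file is skipped rather than treated as an error.
        if path.is_file():
            return path
    return None  # no file found: the server would run with LUCA defaults
```

A caller would pass `os.environ`, `Path.cwd()`, and the directory the server is installed in; a `None` result corresponds to the default LUCA mode.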

Minimal Config

From source:

bash
cp config.template.yaml config.yaml

On Windows PowerShell:

powershell
Copy-Item config.template.yaml config.yaml

The smallest useful config sets only the mode:

yaml
mode: prizma  # luca | prizma | sloi

Place this file in the project working directory or pass its absolute path through NOUZ_CONFIG. That is enough to start with semantic classification.

Full Config

yaml
# Operating mode: luca | prizma | sloi
mode: prizma

# Meta-root: level 0 anchor note.
# Useful for large bases; leave empty when starting simple.
# Excluded from all semantic operations.
meta_root: ""

# Semantic etalons — domain descriptions for classification.
# Each etalon: 2-3 sentences in the domain's subject language.
# Avoid common words that appear across multiple domains.
etalons:
  - sign: S
    name: Systems Analysis
    text: >
      Methodology for analysing complex objects: feedback loops,
      emergent properties, self-regulation, bifurcation points.
      Cybernetics, synergetics, dissipative structures, catastrophe
      theory, autopoiesis — tools for understanding how the whole
      exceeds the sum of its parts. Not data and not code — a way
      of thinking about how parts form a whole and why systems
      behave non-linearly.
  - sign: D
    name: Data & Science
    text: >
      Physics and cosmology: from subatomic particles to the large-scale
      structure of the Universe. Lagrangians, curvature tensors, scattering
      cross-sections, quarks, bosons, fermions, plasma, vacuum fluctuations,
      cosmic microwave background, cosmological constant, decoherence.
      Pure science about the nature of matter, energy and spacetime.
  - sign: E
    name: Engineering
    text: >
      Software engineering, machine learning and infrastructure: writing
      and debugging code, deployment, containerisation, neural networks,
      inference, tokenisation, data serialisation, microservices, CI/CD,
      automated testing, refactoring, Git, Docker, Kubernetes, APIs.
      The practical discipline of building computational systems from
      architecture to production.

# Hierarchy levels.
# Values are technically configurable, but the standard L1-L5 scale is recommended:
# docs, examples, and SLOI validation assume this layout.
levels:
  core: 1
  pattern: 2
  module: 3
  quant: 4
  artifact: 5

# Classification and bridge thresholds
thresholds:
  # Minimum difference between max and min cosine for sign assignment.
  # If spread < sign_spread → the difference between domains is too weak.
  sign_spread: 0.05

  # Minimum absolute cosine to the nearest core.
  # If max_cosine >= confident_cosine → sign_source = "auto" (reliable).
  # If max_cosine < confident_cosine → sign_source = "weak_auto" (best guess,
  #   bridges to the same core are NOT blocked).
  confident_cosine: 0.6

  # Minimum normalized % for a core to appear in a compound sign.
  # After spread normalization: if adjusted_score / total * 100 >= threshold,
  # the core is included. Allows compound signs like "SE" when two cores both score ≥ 30%.
  pattern_second_sign_threshold: 30.0

  # Minimum cosine similarity to propose a semantic bridge.
  # Only proposed between notes with different cores (cross-domain).
  semantic_bridge_threshold: 0.55

  # Minimum structural similarity for an analogy bridge.
  # Analogy bridges connect notes from different cores with similar
  # graph positions (core_mix, level, degree, tags).
  structural_bridge_threshold: 0.55

  # Minimum cosine similarity to auto-link a file to a parent.
  # Used by process_orphans and add_entity when auto_parents=true.
  # Raw cosine only — ensures meaningful proximity.
  parent_link_threshold: 0.55

  # Reliability threshold after spread normalization.
  # If dominant core >= confident_spread% → sign_source = "auto".
  # Below → sign_source = "weak_auto" (bridges not blocked).
  confident_spread: 60.0

# Artifact signs. This is a sign dictionary, not embedding etalons.
# L5 artifacts get artifact_sign by content heuristics.
# L4 quants can include artifact_sign as part of the composite sign.
artifact_signs:
  - sign: β
    name: Note
    text: "Short note, observation, marginal thought."
  - sign: δ
    name: Concept
    text: "Definition, concept, entity description."
  - sign: ζ
    name: Reference
    text: "External source, documentation, link, citation."
  - sign: σ
    name: Log
    text: "Chronicle of events, session record, dialogue log."
  - sign: μ
    name: News
    text: "News item, update, release note."
  - sign: λ
    name: Hypothesis
    text: "Hypothesis, assumption, speculative idea."
  - sign: 🝕
    name: Specification
    text: "Technical specification, instruction, requirements."

# Extra sign characters for filename parsing (optional).
# Only for extracting signs from names, NOT for classification.
sign_chars: ""
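
The compound-sign rule described in the pattern_second_sign_threshold comment can be illustrated with a short sketch. The config does not define "spread normalization", so this example assumes it means subtracting the lowest cosine from every score; `compound_sign` is a hypothetical name, not a NOUZ function.

```python
from typing import Dict


def compound_sign(cosines: Dict[str, float], second_sign_threshold: float = 30.0) -> str:
    """Join the signs of all cores whose normalized share reaches the threshold."""
    floor = min(cosines.values())
    # ASSUMPTION: spread normalization = subtract the lowest cosine from each score.
    adjusted = {sign: cos - floor for sign, cos in cosines.items()}
    total = sum(adjusted.values())
    if total == 0:
        return ""  # every core scores the same, so there is nothing to rank
    # Share of each core in percent: adjusted_score / total * 100.
    shares = {sign: score / total * 100 for sign, score in adjusted.items()}
    ranked = sorted(shares, key=lambda s: shares[s], reverse=True)
    return "".join(sign for sign in ranked if shares[sign] >= second_sign_threshold)


# S and E both hold at least 30% of the adjusted total, D holds 0%.
print(compound_sign({"S": 0.70, "D": 0.55, "E": 0.66}))
```

Under this assumption the weakest core always gets a share of zero, so a compound sign can contain at most all cores but one.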

Profiles (Optional)

Define profiles to switch between etalon sets through the PROFILE environment variable:

yaml
mode: prizma
profiles:
  default:
    mode: prizma
    etalons: []
  research:
    mode: sloi
    etalons:
      - sign: T
        name: Theory
        text: >
          Scientific theories, hypotheses, formal models...

Select the active profile in the shell:

bash
export PROFILE=research
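
One plausible way to read the profile mechanism is as an overlay: keys from the selected profile replace the top-level keys. The sketch below works on an already parsed config dictionary. `resolve_profile` is a hypothetical helper, and both the overlay semantics and the error on an unknown profile are assumptions rather than documented behaviour.

```python
from typing import Any, Dict


def resolve_profile(config: Dict[str, Any], profile_name: str = "default") -> Dict[str, Any]:
    """Overlay the selected profile's keys on the top-level config."""
    base = {key: value for key, value in config.items() if key != "profiles"}
    profiles = config.get("profiles") or {}
    if not profiles:
        return base  # no profiles section: the top-level config is used as-is
    if profile_name not in profiles:
        # ASSUMPTION: an unknown profile name is an error, not a silent fallback.
        raise KeyError(f"profile {profile_name!r} is not defined in config.yaml")
    return {**base, **profiles[profile_name]}


config = {
    "mode": "prizma",
    "profiles": {
        "default": {"mode": "prizma", "etalons": []},
        "research": {"mode": "sloi", "etalons": [{"sign": "T", "name": "Theory"}]},
    },
}
print(resolve_profile(config, "research")["mode"])
```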

Parameters

mode

| Value | Description |
| --- | --- |
| luca | Pure graph. YAML frontmatter and connections only. No embeddings needed. Default mode. |
| prizma | Semantics + graph. Embeddings classify notes by domain. Flexible hierarchy. |
| sloi | Strict 5-level hierarchy with validation. Requires embeddings. Skipping a level is an error. |
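
To make the sloi rule concrete, here is a minimal sketch of a level check. It assumes that "skipping a level" means a child sits more than one level below its parent; `check_link` and the way note types are passed in are hypothetical, and the level numbers are the defaults from the full config.

```python
# Default hierarchy levels from the full config.
LEVELS = {"core": 1, "pattern": 2, "module": 3, "quant": 4, "artifact": 5}


def check_link(parent_type: str, child_type: str, mode: str) -> None:
    """Raise ValueError when a parent-to-child link skips a level in sloi mode."""
    gap = LEVELS[child_type] - LEVELS[parent_type]
    # ASSUMPTION: only sloi validates the gap; luca and prizma accept any link.
    if mode == "sloi" and gap > 1:
        raise ValueError(
            f"{parent_type} (L{LEVELS[parent_type]}) -> {child_type} "
            f"(L{LEVELS[child_type]}) skips {gap - 1} level(s)"
        )


check_link("core", "pattern", "sloi")   # adjacent levels: accepted
check_link("core", "module", "prizma")  # flexible hierarchy: accepted
```

Calling `check_link("core", "module", "sloi")` would raise, because the pattern level is missing between the two notes.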

etalons

A list of domains. Each etalon contains:

  • sign: short domain symbol. The example uses S, D, E, but you can choose other letters or symbols as long as they are used consistently in the config and do not conflict with artifact_signs.
  • name: human-readable name of the domain.
  • text: descriptive text of 2–3 sentences, the basis of classification. Write in the subject language of your domain and use domain-specific jargon. Avoid words that appear across multiple domains.

Etalon Quality

Run calibrate_cores and check the pairwise cosine between etalons. Raw cosine between transformer embeddings is usually high (0.6–0.75) because of anisotropy, so judge separation by the mean-centered values: they should be noticeably lower than the raw ones and should differ between pairs. If all pairs look almost the same, strengthen domain specificity and remove common words.
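
The difference between raw and mean-centered cosine can be seen on toy vectors. This sketch is not what calibrate_cores does internally: it assumes mean-centering means subtracting the average of the etalon vectors themselves, and the four-dimensional vectors are invented stand-ins for real embeddings.

```python
import math
from itertools import combinations
from typing import Dict, List, Tuple


def cosine(a: List[float], b: List[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def mean_center(vectors: Dict[str, List[float]]) -> Dict[str, List[float]]:
    """Subtract the per-dimension mean of all vectors (ASSUMED centering scheme)."""
    dim = len(next(iter(vectors.values())))
    mean = [sum(v[i] for v in vectors.values()) / len(vectors) for i in range(dim)]
    return {sign: [x - m for x, m in zip(v, mean)] for sign, v in vectors.items()}


def pairwise(vectors: Dict[str, List[float]]) -> Dict[Tuple[str, str], float]:
    return {(a, b): cosine(vectors[a], vectors[b]) for a, b in combinations(vectors, 2)}


# Toy etalon vectors: a large shared component plus a small domain-specific part.
etalons = {
    "S": [1.4, 1.0, 1.0, 1.0],
    "D": [1.0, 1.4, 1.0, 1.0],
    "E": [1.0, 1.2, 1.3, 1.0],
}
raw = pairwise(etalons)
centered = pairwise(mean_center(etalons))
for pair in raw:
    print(pair, round(raw[pair], 3), round(centered[pair], 3))
```

The shared component dominates the raw values and makes every pair look alike; removing it exposes how differently the pairs actually relate.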

artifact_signs

A dictionary of material types for L5 artifacts. These are not embedding etalons: the server chooses artifact_sign with content-structure heuristics. For example, a log gets σ, a specification gets 🝕, a hypothesis gets λ.
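
Since etalon signs must not conflict with artifact_signs, a small consistency check over the parsed config can catch mistakes early. This is a hypothetical helper, not part of NOUZ; it only assumes that both lists are parsed into dictionaries with a `sign` key, as in the full config.

```python
from typing import Dict, List


def check_signs(etalons: List[Dict[str, str]], artifact_signs: List[Dict[str, str]]) -> List[str]:
    """Return a list of problems; an empty list means the signs are consistent."""
    problems = []
    seen = set()
    reserved = {entry["sign"] for entry in artifact_signs}
    for etalon in etalons:
        sign = etalon["sign"]
        if sign in seen:
            problems.append(f"duplicate etalon sign: {sign}")
        if sign in reserved:
            problems.append(f"etalon sign {sign} conflicts with artifact_signs")
        seen.add(sign)
    return problems


etalons = [{"sign": "S"}, {"sign": "D"}, {"sign": "E"}]
artifact_signs = [{"sign": "β"}, {"sign": "δ"}, {"sign": "σ"}]
print(check_signs(etalons, artifact_signs))
```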

thresholds

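| Parameter | Default | Description |
| --- | --- | --- |
| sign_spread | 0.05 | Minimum difference between the highest and lowest cosine for a sign to be assigned |
| confident_cosine | 0.6 | Minimum absolute cosine to the nearest core for sign_source = "auto" |
| pattern_second_sign_threshold | 30.0 | Minimum normalized share (%) for a core to be included in a compound sign |
| semantic_bridge_threshold | 0.55 | Minimum cosine similarity to propose a semantic bridge |
| structural_bridge_threshold | 0.55 | Minimum structural similarity for an analogy bridge |
| parent_link_threshold | 0.55 | Minimum cosine similarity to auto-link a file to a parent |
| confident_spread | 60.0 | Minimum dominant-core share (%) after spread normalization for sign_source = "auto" |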
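
How sign_spread and confident_cosine interact can be sketched as follows. The logic follows the comments in the full config; `classify` is a hypothetical name, and the sketch leaves out confident_spread because the documentation does not say how the two reliability checks are combined.

```python
from typing import Dict, Optional, Tuple


def classify(
    cosines: Dict[str, float],
    sign_spread: float = 0.05,
    confident_cosine: float = 0.6,
) -> Tuple[Optional[str], Optional[str]]:
    """Return (sign, sign_source), or (None, None) when domains are too close."""
    best = max(cosines, key=lambda sign: cosines[sign])
    spread = cosines[best] - min(cosines.values())
    if spread < sign_spread:
        return None, None  # difference between domains is too weak
    source = "auto" if cosines[best] >= confident_cosine else "weak_auto"
    return best, source


print(classify({"S": 0.70, "D": 0.55, "E": 0.60}))  # clear winner above 0.6
print(classify({"S": 0.52, "D": 0.40, "E": 0.45}))  # winner, but a best guess
print(classify({"S": 0.62, "D": 0.60, "E": 0.61}))  # spread below sign_spread
```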

Environment Variables

| Variable | Required | Description |
| --- | --- | --- |
| OBSIDIAN_ROOT | Yes | Absolute path to the vault |
| NOUZ_CONFIG | No | Absolute path to config.yaml; if omitted, the server looks in the current working directory |
| PROFILE | No | Profile name from config.yaml (default: default) |
| EMBED_PROVIDER | No | openai-compatible or ollama (default: openai) |
| EMBED_ENABLED | No | true or false (default: true) |
| EMBED_API_URL | For prizma/sloi | URL of an OpenAI-compatible embedding API |
| EMBED_MODEL | No | Embedding model name |
| EMBED_API_KEY | No | API key for cloud providers |
| LLM_API_URL | No | URL for LLM (tag extraction) |
| LLM_MODEL | No | LLM model name |

bash
export OBSIDIAN_ROOT=/path/to/vault
export EMBED_API_URL=http://127.0.0.1:1234/v1
export EMBED_MODEL=nomic-embed-text
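
The Required column can be read as a small validation routine. The sketch below is an illustration only: `read_settings` is hypothetical, and treating a missing EMBED_API_URL as an error in prizma and sloi modes is an assumption drawn from the table.

```python
from typing import Dict, Mapping


def read_settings(env: Mapping[str, str], mode: str) -> Dict[str, object]:
    """Collect the documented variables and enforce the required ones."""
    if not env.get("OBSIDIAN_ROOT"):
        raise ValueError("OBSIDIAN_ROOT is required")
    # ASSUMPTION: embedding modes cannot start without an embedding API URL.
    if mode in ("prizma", "sloi") and not env.get("EMBED_API_URL"):
        raise ValueError(f"EMBED_API_URL is required for mode {mode}")
    return {
        "obsidian_root": env["OBSIDIAN_ROOT"],
        "profile": env.get("PROFILE", "default"),
        "embed_enabled": env.get("EMBED_ENABLED", "true").strip().lower() == "true",
        "embed_api_url": env.get("EMBED_API_URL"),
        "embed_model": env.get("EMBED_MODEL"),
    }


settings = read_settings(
    {
        "OBSIDIAN_ROOT": "/path/to/vault",
        "EMBED_API_URL": "http://127.0.0.1:1234/v1",
        "EMBED_MODEL": "nomic-embed-text",
    },
    mode="prizma",
)
print(settings["profile"], settings["embed_enabled"])
```

In real use the first argument would be `os.environ`; passing a plain mapping keeps the sketch easy to test.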

Compatible Embedding Providers

| Provider | URL | Note |
| --- | --- | --- |
| LM Studio | http://127.0.0.1:1234/v1 | Recommended for local use |
| Ollama | http://127.0.0.1:11434 | Uses /api/embeddings endpoint |
| OpenAI | https://api.openai.com/v1 | Add EMBED_API_KEY |
| Any OpenAI-compatible | | Standard /v1/embeddings endpoint |
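
The two endpoint styles in the table differ in both path and payload. The sketch below only builds the request URL and JSON body and sends nothing; `embedding_request` is a hypothetical helper. It relies on the OpenAI-style embeddings request using an `input` field and Ollama's /api/embeddings using a `prompt` field.

```python
from typing import Dict, Tuple


def embedding_request(provider: str, base_url: str, model: str, text: str) -> Tuple[str, Dict[str, str]]:
    """Return (url, json_payload) for a single embedding request."""
    base = base_url.rstrip("/")
    if provider == "ollama":
        # Ollama: base URL has no /v1 suffix, text goes in "prompt".
        return f"{base}/api/embeddings", {"model": model, "prompt": text}
    # OpenAI-compatible: base URL already ends in /v1, text goes in "input".
    return f"{base}/embeddings", {"model": model, "input": text}


url, payload = embedding_request("ollama", "http://127.0.0.1:11434", "nomic-embed-text", "hello")
print(url, payload)
```

For cloud providers the API key would normally travel in an `Authorization: Bearer ...` header, which this sketch leaves out.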

Writing Your Own Etalons

The S/D/E etalons above are a starting example of three well-separated domains: systems analysis, scientific data, and engineering practice. You can reuse them as a template for text quality, but the domains themselves should match your base. Most setups need 2–4 domains with dense subject language and clear boundaries between neighboring areas. After changing etalons, run calibrate_cores and check Etalon Quality.
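
Before running calibrate_cores, a quick word-overlap check can flag vocabulary that blurs the boundary between etalons. This is a rough, hypothetical pre-check and not a NOUZ tool; the minimum word length of five letters is an arbitrary choice to skip short function words.

```python
import re
from collections import Counter
from typing import Dict, List


def shared_words(etalon_texts: Dict[str, str], min_length: int = 5) -> List[str]:
    """Words of at least min_length letters that occur in two or more etalon texts."""
    counts: Counter = Counter()
    for text in etalon_texts.values():
        # Count each word once per etalon, ignoring case, digits, and punctuation.
        words = set(re.findall(r"[^\W\d_]+", text.lower()))
        counts.update(word for word in words if len(word) >= min_length)
    return sorted(word for word, n in counts.items() if n >= 2)


texts = {
    "S": "Feedback loops and emergent systems",
    "E": "Building computational systems and microservices",
}
print(shared_words(texts))
```

Any word the check reports is a candidate for replacement with more domain-specific terminology.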
