agree devlog - 01/29
The main shared transit data structure
The central question was: what shape does the data take as it moves between the Python visitor and the Go CLI core?
Landed on this:
from typing import Literal

from pydantic import BaseModel

AgreeType = Literal["some", "source", "partial"]

class AgreeMetadata(BaseModel):
    starting_line_number: int
    agree_type: AgreeType

class AgreeAttribute(BaseModel):
    name: str
    starting_line_number: int
    types: list[str]

class AgreeSchema(BaseModel):
    name: str
    metadata: AgreeMetadata
    attributes: list[AgreeAttribute]

class AgreeEntity(BaseModel):
    name: str
    schemas: list[AgreeSchema]
In memory during parsing, the data is an untyped dict of dicts, which works well with a stack: I always know which class or attribute I'm currently inside and can push new data into that part. For transit, since this crosses a language boundary, the pydantic models with their named attributes serialize cleanly to JSON, which Go can consume.
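A minimal sketch of the stack idea, using only the stdlib ast module. StackVisitor, the record keys, and the sample User class are made up for illustration and are not the actual parser; the point is only that the top of the stack is always the class currently being visited.

```python
import ast

class StackVisitor(ast.NodeVisitor):
    """Hypothetical visitor: tracks the enclosing class on a stack of plain dicts."""

    def __init__(self):
        self.stack = []     # dicts for the classes currently being visited
        self.classes = {}   # finished result: class name -> untyped dict

    def visit_ClassDef(self, node):
        record = {
            "name": node.name,
            "starting_line_number": node.lineno,
            "attributes": {},
        }
        self.stack.append(record)
        self.generic_visit(node)   # visit the class body with `record` on top
        self.stack.pop()           # leaving the class, so the stack shrinks again
        self.classes[node.name] = record

    def visit_AnnAssign(self, node):
        # Record annotated names while some class is on the stack. This sketch
        # does not exclude annotated locals inside methods.
        if self.stack and isinstance(node.target, ast.Name):
            self.stack[-1]["attributes"][node.target.id] = {
                "starting_line_number": node.lineno,
                "types": [ast.unparse(node.annotation)],
            }


# Hypothetical input module.
SOURCE = '''
class User:
    id: int
    email: str
'''

visitor = StackVisitor()
visitor.visit(ast.parse(SOURCE))
```

After the visit, visitor.classes holds one dict per class and visitor.stack is empty again, because each class pops its own record on the way out.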
Indexing concern
One thing I thought about: in the CLI core, we'll need to query on certain attributes of array members, like agree_type or entity name. Naively that's O(n) per query, since you'd have to check each array member. Could be slow on a massive repo.
The alternative was to index during ingestion: build a multi-map keyed by whatever we care about (agree_type, entity_name, etc.) so queries are O(1). Building the index is still O(n), but you only pay that once.
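For the record, the rejected indexing approach would look roughly like this. It's a sketch over hypothetical flat records (the UserModel/UserDTO/OrderModel names and the build_index helper are invented here), with a defaultdict standing in for the multi-map.

```python
from collections import defaultdict

# Hypothetical flat records, shaped loosely like the schemas in this log.
schemas = [
    {"entity": "User", "name": "UserModel", "agree_type": "source"},
    {"entity": "User", "name": "UserDTO", "agree_type": "partial"},
    {"entity": "Order", "name": "OrderModel", "agree_type": "source"},
]

def build_index(records, key):
    """One O(n) pass: group records by the value stored under `key`."""
    index = defaultdict(list)
    for record in records:
        index[record[key]].append(record)
    return index

by_type = build_index(schemas, "agree_type")
by_entity = build_index(schemas, "entity")

# Each lookup is now an average-case O(1) dict access instead of a scan.
source_names = [s["name"] for s in by_type["source"]]
```

The cost is one extra structure per queried key, which is part of why separating things at parse time looked more attractive.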
Ended up not doing that. Instead rebuilt the entity model directly inside the parser:
class AgreeEntity(BaseModel):
    name: str
    source_schema: AgreeSchema | None = None
    other_schemas: list[AgreeSchema] = []
During parsing, we're building a dict[str, AgreeEntity] where the key is just the entity name. That stores the name twice (once as the key, once on the entity) but makes insertion during parsing fast. More importantly, the source schema is already separated out, so when the CLI gets list[AgreeEntity] it can just read entity.source_schema and immediately compare it against the others. No searching, no indexing, O(1).
One thing I noticed: the stack resets per module, but it should really reset per class. Once a class definition is done, the stack should be empty. Still needs fixing. Also, union operators (|) in type annotations aren't handled yet; those will need their own visitor logic.
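One possible shape for the union handling, as a sketch: in the stdlib ast, an annotation like A | B parses as an ast.BinOp whose op is ast.BitOr, so it can be flattened recursively. flatten_union and annotation_types are hypothetical helpers, not existing visitor code.

```python
import ast

def flatten_union(annotation: ast.expr) -> list[str]:
    """Split `A | B | C` into ["A", "B", "C"]; other annotations come back whole."""
    if isinstance(annotation, ast.BinOp) and isinstance(annotation.op, ast.BitOr):
        # `A | B | C` parses as (A | B) | C, so recurse into both sides.
        return flatten_union(annotation.left) + flatten_union(annotation.right)
    return [ast.unparse(annotation)]

def annotation_types(source_line: str) -> list[str]:
    """Parse one annotated assignment and return its type names."""
    node = ast.parse(source_line).body[0]
    if not isinstance(node, ast.AnnAssign):
        raise ValueError("expected an annotated assignment")
    return flatten_union(node.annotation)

types = annotation_types("nickname: str | None = None")
nested = annotation_types("tags: list[str] | dict[str, int] | None")
```

This only covers the | operator. Optional[...] and Union[...] are subscripts rather than BinOp nodes and would need separate handling.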
Getting data from Python to Go
Still need to figure out the handoff. Since Go will call Python via exec, the options are stdout or some other kind of serialized handoff. Leaning toward stdout with JSON but haven't committed yet.
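If stdout with JSON wins, the Python side could be as small as the sketch below. emit and the payload are hypothetical; the stream parameter exists only so the function can be exercised against an in-memory buffer instead of the real stdout.

```python
import io
import json
import sys

def emit(entities, stream=None):
    """Write the payload as one JSON document followed by a newline."""
    stream = sys.stdout if stream is None else stream
    json.dump(entities, stream)
    stream.write("\n")

# Hypothetical payload mirroring the entity shape from this log.
payload = [
    {
        "name": "User",
        "source_schema": {"name": "UserModel", "attributes": []},
        "other_schemas": [],
    }
]

# In the real script this would be `emit(payload)` with the default stream.
buffer = io.StringIO()
emit(payload, buffer)
round_trip = json.loads(buffer.getvalue())
```

One consequence of this route: anything else printed to stdout would corrupt the JSON, so logging and diagnostics would have to go to stderr.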