Pruzhany Chronicles
Interactive Edition Guide
Introduction
The Pruzhany Chronicles transforms the archived pages of two historical Yiddish newspapers into a vivid portrait of a community on the eve of its destruction. Using generative AI to read the surviving editions from Pruzhany, Poland, the project extracts local entities—individuals, businesses, life events—and weaves them with Holocaust databases, library collections, and yizkor books to reassemble a data-driven record of the town's final years.
This guide walks through one edition—December 16, 1938—to document how a newspaper scan becomes a structured database. Each section details one layer of the data model, from physical text extraction through semantic event mapping, with every value drawn from the live edition bundle. Work is ongoing to process the approximately 500 remaining editions of the Pruzaner Sztyme and Pruzaner Lebn spanning 1934–1939, preserved by the Historical Jewish Press.
Three-Layer Architecture
The data model separates the newspaper content into three increasingly semantic layers, plus a cross-cutting enrichment layer. This separation reflects a fundamental insight: the physical layout of a newspaper page, the editorial structure of its content, and the real-world events it describes are three distinct things that happen to share the same physical medium.
Edition at a Glance
This guide explores the 1938-12-16 edition of the Pruzhaner Shtime — 4 pages containing 8,053 words across 73 text blocks, organized into 72 content units. The enrichment layer adds 92 named individuals, 20 locations, and 6 documented events.
Entity Counts
Pages & Blocks (Layer 1)
The physical layer represents the newspaper as the OCR pipeline sees it. Each of the 4 pages is divided into text blocks — contiguous regions of text identified by their bounding box coordinates on the page image. These coordinates allow the digital edition to link every piece of transcribed text back to its precise location on the original newspaper.
Page Summary
| Block ID | Bbox [x, y, w, h] | Transcription | Conf. | Content Unit |
|---|---|---|---|---|
| blk-5746 | [1597, 2803, 490, 136] | אונזערע פריינד פאמיליעס פאמעראנץ-שאפירא צו דער חתונה פון זייער טאכטער און שוועסט… | 0.90 | |
| blk-5747 | [1597, 3088, 490, 122] | ר ח ל ! צו דיין חתונה מיט ה' חיים שור אונזער הארציקן מזל טוב טייבל, דארע, לאשע (… | 0.90 | |
| blk-5748 | [1597, 2117, 490, 146] | אונזערע פריינט פאמיליעס פאמעראניץ-שאפירא צו דער חתונה פון זייער טאכטער, שוועסטער… | 0.90 | |
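Because each bounding box is stored as [x, y, w, h], linking a block back to the scan comes down to converting that box into corner coordinates. The following is a minimal sketch, assuming the origin is the top-left corner of the page image and the units are pixels (the guide states neither). The `Block` class and `crop_box` helper are hypothetical names; the values are copied from the table above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Block:
    block_id: str
    bbox: tuple[int, int, int, int]  # [x, y, w, h] as listed in the table
    confidence: float


def crop_box(block: Block) -> tuple[int, int, int, int]:
    """Convert [x, y, w, h] into (left, top, right, bottom) corners."""
    x, y, w, h = block.bbox
    return (x, y, x + w, y + h)


blocks = [
    Block("blk-5746", (1597, 2803, 490, 136), 0.90),
    Block("blk-5747", (1597, 3088, 490, 122), 0.90),
    Block("blk-5748", (1597, 2117, 490, 146), 0.90),
]

# All three blocks share x = 1597, so sorting by y orders them
# top to bottom within the column (assumes y grows downward).
ordered = sorted(blocks, key=lambda b: b.bbox[1])
for block in ordered:
    print(block.block_id, crop_box(block))
```

The corner form is the same (left, upper, right, lower) box that Pillow's `Image.crop` accepts, so a viewer could cut the block image out of the page scan directly.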
Content Units (Layer 2)
Content Units represent the editorial structure of the newspaper. Each unit is a coherent editorial object — an article, a notice, an advertisement, a congratulatory message, or an obituary. A single Content Unit may span multiple physical blocks (for example, an article that flows across two columns).
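The one-to-many relationship between a Content Unit and its blocks can be sketched as a simple grouping step. This is an illustration only: the `content_unit_id` field name and the block IDs `blk-a`, `blk-b`, and `blk-c` are placeholders, not values from the edition bundle.

```python
from collections import defaultdict

# Hypothetical block records; only the two cu- IDs appear in this guide.
blocks = [
    {"id": "blk-a", "content_unit_id": "cu-13690b"},
    {"id": "blk-b", "content_unit_id": "cu-13690b"},
    {"id": "blk-c", "content_unit_id": "cu-13694a"},
]


def group_blocks_by_unit(blocks):
    """Map each Content Unit ID to the list of block IDs it spans."""
    units = defaultdict(list)
    for block in blocks:
        units[block["content_unit_id"]].append(block["id"])
    return dict(units)


units = group_blocks_by_unit(blocks)
print(units)
```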
Showing 2 notice units.
| ID | Title | People | Locs |
|---|---|---|---|
| cu-13690b | Entertainment: White-Blue Ball at Bereze Tarbut | 1 | 0 |
| cu-13694a | Commerce for the Holidays: Store Hours | 0 | 0 |

Details for cu-13690b
Page location: Page 4 (2 blocks)
Yiddish Text
פארוויילונג אין בערעזער „תרבות".
שבת דעם 17 טן ד. מ. וועלן אלע פריינד
זיך טרעפן אין די פיין דעקארירטע זאלן פון
„תרבות". ווי מיר ווייסן פון אלע יארן אז דער
ווייס בלויער באל לאזט איבער א גוטן איינ-
דרוק אויף אלע באזוכער. ווי די איניציאטארן
גיבן די גרעסטע מי אז דער אוונט זאל זיין
וואס אימפאזאנטער. נאך מער היינט מיט'ן ספע-
ציעלן דזשאז ארקעסטער פון קאברין אונטער
דער לייטונג פון ה' סעלעצקי ווי ער איז בא-
דייטנד פארגרעסערט. אלזא, מארגן טרעפן מי-
זיך אלע 9 אוונט אין „תרבות" אויף דעם טרא-
דיציאנעלן ווייס בלויען באל.
English Translation
Entertainment in Bereze 'Tarbut'
Saturday the 17th of this month, all friends will meet in the finely decorated halls of 'Tarbut'. As we know from all previous years, the **White-Blue Ball** leaves a good impression on all visitors. The initiators are making the greatest effort to ensure the evening is as impressive as possible. Even more so today with the special jazz orchestra from Kobrin under the direction of Mr. Selecki, which has been significantly enlarged. Therefore, tomorrow we will all meet at **9 PM** in 'Tarbut' for the traditional **White-Blue Ball**.
Life Events (Layer 3)
Life Events represent the semantic layer — real-world occurrences that generated newspaper coverage. The Pomerantz–Shor wedding, for instance, produced 27 Content Units: a formal announcement from the bride's father plus 26 congratulatory messages. The event groups these Content Units together and records each unit's role (primary announcement, congratulation, or mention).
This edition contains 6 events with associated Content Units. Events sourced from the newspaper text are marked edition; those added from external Holocaust research are marked historical.
| Event Name | Type | Source | Participants | Units |
|---|---|---|---|---|
| Visit of Mrs. Barzach to Pruzhany | community_event | edition | 2 | 1 |
| Death of Rabbi Chaim Feldman | death | edition | 4 | 2 |
| JNF Hanukkah Academy | community_event | edition | 0 | 2 |
| Birth of Judewicz Daughter | birth | edition | 6 | 23 |
| Rachel Pomerantz & Chaim Shor Wedding | wedding | edition | 5 | 27 |
| White-Blue Ball in Bereze | community_event | edition | 0 | 1 |
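The grouping of Content Units under an event, each with a role, could be modelled roughly as follows. This is a sketch under stated assumptions: the `LifeEvent` class, the `unit_roles` field, the role strings, and the `cu-` IDs are all hypothetical. Only the event ID and the 1 + 26 split come from this guide.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class LifeEvent:
    event_id: str
    event_type: str
    source: str  # "edition" or "historical"
    # Maps a Content Unit ID to its role within the event (assumed shape).
    unit_roles: dict[str, str] = field(default_factory=dict)

    def role_counts(self) -> Counter:
        """Count how many Content Units play each role."""
        return Counter(self.unit_roles.values())


wedding = LifeEvent("event-pomerantz-shor-wedding", "wedding", "edition")

# Placeholder unit IDs: one primary announcement plus 26 congratulations.
wedding.unit_roles["cu-example-announcement"] = "primary"
for i in range(26):
    wedding.unit_roles[f"cu-example-congrats-{i:02d}"] = "congratulation"

print(len(wedding.unit_roles), dict(wedding.role_counts()))
```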
Community Connections Made Visible
The table above shows individual events in isolation. But the real power of the semantic layer emerges when events are cross-referenced — revealing how the same people appear across different life events, their relationships reconstructed from the newspaper's own pages.
Social overlap between two life events in the 1938-12-16 edition
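Cross-referencing two events reduces to a set intersection over their participants. A minimal sketch, assuming each event record carries a `participant_ids` list (a hypothetical field name). Apart from `event-pomerantz-shor-wedding` and `person-rachel-pomerantz`, every ID below is a placeholder rather than real edition data.

```python
def shared_participants(event_a: dict, event_b: dict) -> set[str]:
    """Return the person IDs that appear in both events."""
    return set(event_a["participant_ids"]) & set(event_b["participant_ids"])


# Illustrative records only; the overlap shown here is invented.
wedding = {
    "id": "event-pomerantz-shor-wedding",
    "participant_ids": [
        "person-rachel-pomerantz",
        "person-example-a",
        "person-example-b",
    ],
}
birth = {
    "id": "event-example-birth",
    "participant_ids": ["person-example-b", "person-example-c"],
}

overlap = shared_participants(wedding, birth)
print(sorted(overlap))
```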
The Enrichment Layer
The three layers above capture what is in the newspaper — its physical form, editorial structure, and the events it describes. The enrichment layer goes further: it adds structured knowledge drawn from external research, transforming brief textual mentions into richly annotated records. Every named person, place, organization, and topic becomes a node in a growing knowledge graph.
This enrichment accumulates gradually. The current edition represents a snapshot of ongoing research — some entities have extensive biographical data and verified Holocaust fates, while others remain skeletal entries awaiting further investigation. The Data Quality section quantifies these gaps.
Five Entity Types
The sections that follow explore each entity type in detail. Here is what the enrichment layer captures for this edition:
References are maintained in both directions — a person lists all content units where they appear, and each content unit lists all people mentioned in it — supporting different access patterns depending on whether you're exploring an entity or reading an article.
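One way to keep both directions in step is to derive the reverse index from the forward one and check that the two agree. The sketch below assumes the forward references are a mapping from content unit ID to person IDs; the function names and the `person-example-*` IDs are hypothetical.

```python
from collections import defaultdict


def build_person_index(unit_people: dict[str, list[str]]) -> dict[str, list[str]]:
    """Invert unit -> people into person -> units."""
    person_units = defaultdict(list)
    for unit_id, person_ids in unit_people.items():
        for person_id in person_ids:
            person_units[person_id].append(unit_id)
    return dict(person_units)


def references_agree(unit_people, person_units) -> bool:
    """True when every (unit, person) link is recorded in both directions."""
    forward = {(u, p) for u, people in unit_people.items() for p in people}
    backward = {(u, p) for p, units in person_units.items() for u in units}
    return forward == backward


# Placeholder links for illustration.
unit_people = {
    "cu-13690b": ["person-example-a"],
    "cu-13694a": [],
    "cu-example": ["person-example-a", "person-example-b"],
}
person_units = build_person_index(unit_people)
print(person_units, references_agree(unit_people, person_units))
```

A check like `references_agree` is useful when the two directions are stored separately and edited by hand, since a missing back-reference would otherwise go unnoticed.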
People
The enrichment layer identifies 92 individuals mentioned in this edition. Each person entity includes biographical data and, where possible, research into their fate during the Holocaust. This research draws on databases from Yad Vashem, the Arolsen Archives, memorial books, and other sources.
For most individuals in this edition, their Holocaust fate remains unknown — a common challenge in Holocaust research. The fate classifications range from definitive ("Perished," "Survived") to probabilistic ("Likely Perished," "Likely Survived" based on their known location and circumstances).
47 people connected by 64 family relationships across 15 family groups
Locations
The enrichment layer includes 20 geographic locations, organized in a hierarchical tree reflecting administrative containment: streets sit within towns, towns within districts, districts within regions, and regions within countries.
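Administrative containment of this kind is commonly stored as a parent pointer on each location. The sketch below assumes a `parent_id` field (hypothetical) and walks from a town up to its country. Only `location-pruzhany` is an ID shown in this guide; the other two IDs are assumed to follow the same pattern.

```python
# Assumed record shape: each location points at its containing location.
locations = {
    "location-belarus": {"type": "country", "parent_id": None},
    "location-brest-region": {"type": "region", "parent_id": "location-belarus"},
    "location-pruzhany": {"type": "town", "parent_id": "location-brest-region"},
}


def path_to_root(location_id: str, locations: dict) -> list[str]:
    """Return the chain of IDs from a location up to its top-level ancestor."""
    path = []
    seen = set()
    current = location_id
    while current is not None:
        if current in seen:  # guard against a malformed, cyclic hierarchy
            raise ValueError(f"cycle in location hierarchy at {current}")
        seen.add(current)
        path.append(current)
        current = locations[current]["parent_id"]
    return path


print(" < ".join(path_to_root("location-pruzhany", locations)))
```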
Each location is linked to external geographic databases (Wikidata and GeoNames) for interoperability with the European Holocaust Research Infrastructure (EHRI). Locations of Holocaust significance include structured metadata about ghettos, camps, and massacre sites.
The map below shows how this single newspaper edition connected readers across four continents, from the Pruzhany heartland to the Argentine diaspora and from Eretz Israel to Newfoundland. Dot size reflects the number of content units mentioning each location.
country Belarus 0
  region Brest Region 0
    town Pruzhany פרוזשאנע 7
country Canada קאנאדע 0
country Israel ישראל 0
country Lithuania ליטע 0
country Poland פוילן 1
Pruzhany
פרוזשאנע
- Type: town
- Country: Belarus (historically: Poland (Second Polish Republic))
- Coordinates: 52.5567, 24.4567
- Wikidata: Q1866937
- GeoNames: 622997
- Names:
  - en: Pruzhany
  - pl: Prużana
  - be: Пружаны
  - ru: Пружаны
  - yi: פרוזשאנע
- Holocaust History: German occupation began June 1941. July 1941: 17 prominent Jews executed in Lachy (Lyakhi) forest. Ghetto established late 1941 with over 10,000 Jews from Pruzhany and surrounding areas. Judenrat formed under Chairman Itzel Janowicz. LIQUIDATION: January 28-31, 1943 - Jews transported by sled 12km to Linovo (Orantschice) railway station, then by freight train to Auschwitz. Most perished upon arrival. Some survivors joined partisan groups in forests near Kobrin/Pruzhany (Chapayev Brigade, Kirov Unit). Liberation: July 20, 1944. Sources: Pinkas Pruzhany (1983), Morris Sorid memoirs
Topics & Organizations
Topics (5)
Topics categorize Content Units by theme, grouping related editorial content regardless of physical position in the newspaper.
Organizations (8)
Organizations mentioned in the newspaper include local businesses, community institutions, and cultural organizations, linked to the Content Units where they appear.
Data Quality & Coverage
Where is the data complete, and where are the gaps? This section examines field coverage across the edition's entities and visualizes the Holocaust fate research that underpins the biographical data. These metrics help assess readiness for using this edition as ground truth when processing future editions.
Field Coverage
Percentage of entities where key fields have been filled in (excluding “unknown” values for Holocaust fate).
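The coverage figure can be computed as the share of entities whose field holds a real value. A minimal sketch, assuming entities are dictionaries and that a missing key, `None`, an empty string, and the literal `"unknown"` all count as unfilled. The sample records are invented, and `birth_year` is an assumed field name.

```python
def field_coverage(entities, field, unfilled=(None, "", "unknown")) -> float:
    """Percentage of entities whose `field` holds a value not in `unfilled`."""
    if not entities:
        return 0.0
    filled = sum(1 for entity in entities if entity.get(field) not in unfilled)
    return 100.0 * filled / len(entities)


# Invented sample records, not data from the edition.
people = [
    {"id": "person-example-a", "holocaust_fate": "Perished", "birth_year": 1901},
    {"id": "person-example-b", "holocaust_fate": "unknown"},
    {"id": "person-example-c", "holocaust_fate": "Survived", "birth_year": None},
    {"id": "person-example-d"},
]

print(field_coverage(people, "holocaust_fate"))
print(field_coverage(people, "birth_year"))
```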
Holocaust Fate Distribution
Each square represents approximately 1% of the 92 named individuals.
Timeline
Birth and death years of named individuals, relative to the newspaper's publication
Technical Reference
This section documents key design decisions in the data model — the choices that aren't obvious from the data itself but matter for reviewing correctness and extending the model for future editions.
Why String IDs?
All entity identifiers are semantic strings like person-rachel-pomerantz or cu-ball-announcement rather than auto-incrementing integers. This choice offers several advantages:
- Self-documenting: You can read the data without looking up IDs in a separate table
- Prefix-based polymorphism: The UI can determine entity type from the prefix (person-, cu-, location-), enabling a single selection variable to hold any entity
- Stability: IDs don't change when the database is rebuilt (unlike auto-increment)
- Debuggability: Log messages and JSON output are human-readable
ID Prefix Reference
| Prefix | Entity Type | Example |
|---|---|---|
| page- | Page | page-1938-12-16-4 |
| blk- | Block | blk-5746 |
| cu- | Content Unit | cu-13685 |
| event- | Event | event-pomerantz-shor-wedding |
| person- | Person | person-rachel-pomerantz |
| location- | Location | location-pruzhany |
| topic- | Topic | topic-lifecycle-events |
| org- | Organization | org-kehillah-pruzhany |
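Prefix-based dispatch over the table above might look like the sketch below. The mapping mirrors the prefix table; the `entity_type` function name and its error handling are assumptions, and the guide does not say how the actual UI performs this lookup.

```python
# Prefix (without the trailing hyphen) -> entity type, from the table above.
PREFIX_TO_TYPE = {
    "page": "Page",
    "blk": "Block",
    "cu": "Content Unit",
    "event": "Event",
    "person": "Person",
    "location": "Location",
    "topic": "Topic",
    "org": "Organization",
}


def entity_type(entity_id: str) -> str:
    """Resolve an entity type from the text before the first hyphen."""
    prefix, sep, rest = entity_id.partition("-")
    if not sep or not rest or prefix not in PREFIX_TO_TYPE:
        raise ValueError(f"unrecognized entity ID: {entity_id!r}")
    return PREFIX_TO_TYPE[prefix]


for example in ("person-rachel-pomerantz", "page-1938-12-16-4", "blk-5746"):
    print(example, "->", entity_type(example))
```

Splitting on the first hyphen only is deliberate: IDs such as page-1938-12-16-4 contain further hyphens that belong to the identifier, not the prefix.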
Why snake_case?
All field names use snake_case (e.g., holocaust_fate, unit_ids) for consistency with the SQLite database layer. Earlier versions used camelCase, but the migration to a relational database standardized on snake_case throughout the stack.
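A key-renaming pass of the kind such a migration needs can be sketched as follows. This is not the project's migration code: the camelCase inputs `holocaustFate` and `unitIds` are back-formed from the snake_case names above, and both helper functions are hypothetical.

```python
import re


def camel_to_snake(name: str) -> str:
    """Insert an underscore before each interior capital, then lowercase."""
    return re.sub(r"(?<!^)(?=[A-Z])", "_", name).lower()


def convert_keys(value):
    """Recursively rename dict keys; lists are walked, scalars pass through."""
    if isinstance(value, dict):
        return {camel_to_snake(k): convert_keys(v) for k, v in value.items()}
    if isinstance(value, list):
        return [convert_keys(v) for v in value]
    return value


legacy = {"unitIds": ["cu-13685"], "details": [{"holocaustFate": "unknown"}]}
print(convert_keys(legacy))
```

This simple rule splits before every capital letter, so a key containing an acronym (for example a run of consecutive capitals) would need special handling.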
Why Separate Files?
Each data layer is stored in its own JSON file because each has a different authoring cadence:
- Pages/Blocks: Change when OCR is re-run or corrected
- Content Units: Change during editorial curation
- Enrichment: Accumulates gradually through research
- Life Events: Curated as editorial understanding deepens
Separation allows independent updates without merge conflicts. The Edition Bundle format (1938-12-16-edition.json) merges everything into a single self-contained document for distribution and validation.
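Assembling a bundle from per-layer files could be as simple as the sketch below. The guide names only the bundle file, so the per-layer file names, the bundle's top-level keys, and the `build_bundle` function are all assumptions. The demo writes empty placeholder layers to a temporary directory so that it runs without real data.

```python
import json
import tempfile
from pathlib import Path

# Hypothetical per-layer file names and bundle keys.
LAYER_FILES = {
    "pages": "pages.json",
    "content_units": "content_units.json",
    "events": "events.json",
    "enrichment": "enrichment.json",
}


def build_bundle(layer_dir: Path, edition_date: str) -> dict:
    """Read each layer file and merge them into one self-contained dict."""
    bundle = {"edition_date": edition_date}
    for key, filename in LAYER_FILES.items():
        text = (layer_dir / filename).read_text(encoding="utf-8")
        bundle[key] = json.loads(text)
    return bundle


with tempfile.TemporaryDirectory() as tmp:
    layer_dir = Path(tmp)
    for filename in LAYER_FILES.values():
        (layer_dir / filename).write_text("[]", encoding="utf-8")

    bundle = build_bundle(layer_dir, "1938-12-16")
    out_path = layer_dir / "1938-12-16-edition.json"
    # ensure_ascii=False keeps Yiddish text readable in the output file.
    out_path.write_text(
        json.dumps(bundle, ensure_ascii=False, indent=2), encoding="utf-8"
    )
    reloaded = json.loads(out_path.read_text(encoding="utf-8"))

print(sorted(reloaded))
```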
Event Source Classification
Events carry a source field distinguishing their provenance:
- edition: Events documented in the newspaper itself (weddings, bar mitzvahs, community news)
- historical: Events added from external Holocaust research (ghetto establishment, deportations, massacres)
Events of type holocaust_event always have source: 'historical'. This classification ensures that the newspaper's own voice is never confused with retrospective knowledge.
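The rule that holocaust_event implies a historical source lends itself to an automated check. A minimal sketch, assuming events are dictionaries with `id`, `type`, and `source` keys; the `source_errors` function and the second sample event are hypothetical.

```python
VALID_SOURCES = {"edition", "historical"}


def source_errors(events) -> list[str]:
    """Return one message per event that violates the source rules."""
    errors = []
    for event in events:
        source = event.get("source")
        if source not in VALID_SOURCES:
            errors.append(f"{event['id']}: unknown source {source!r}")
        elif event.get("type") == "holocaust_event" and source != "historical":
            errors.append(
                f"{event['id']}: holocaust_event must have source 'historical'"
            )
    return errors


events = [
    {"id": "event-pomerantz-shor-wedding", "type": "wedding", "source": "edition"},
    # Deliberately mislabeled placeholder record to show a failure.
    {"id": "event-example", "type": "holocaust_event", "source": "edition"},
]

errors = source_errors(events)
for message in errors:
    print(message)
```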
רייזטשע און משה יודעוויטש מעלדן וועגן געבורט פון זייער טעכטערל
Rivtche and Moshe Judewicz announce the birth of their little daughter
— for Tzvia