
Introduction

The Project

The Pruzhany Chronicles transforms the archived pages of two historical Yiddish newspapers into a vivid portrait of a community on the eve of its destruction. Using generative AI to read the surviving editions from Pruzhany, Poland, the project extracts local entities—individuals, businesses, life events—and weaves them with Holocaust databases, library collections, and yizkor books to reassemble a data-driven record of the town's final years.

This Guide

This guide walks through one edition—December 16, 1938—to document how a newspaper scan becomes a structured database. Each section details one layer of the data model, from physical text extraction through semantic event mapping, with every value drawn from the live edition bundle. Work is ongoing to process the approximately 500 remaining editions of the Pruzaner Sztyme and Pruzaner Lebn spanning 1934–1939, preserved by the Historical Jewish Press.

Three-Layer Architecture

The data model separates the newspaper content into three increasingly semantic layers, plus a cross-cutting enrichment layer. This separation reflects a fundamental insight: the physical layout of a newspaper page, the editorial structure of its content, and the real-world events it describes are three distinct things that happen to share the same physical medium.

Blocks reference content units, and content units are grouped into events.
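As a rough illustration of how the three layers relate, here is a minimal Python sketch. The class names and most field names (Block, ContentUnit, Event, unit_id, block_ids) are assumptions made for this example, not the project's confirmed schema, and the link between blk-5746 and cu-13685 is hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Block:
    """Layer 1: a physical text region on a page image."""
    id: str
    page_id: str
    bbox: Tuple[int, int, int, int]  # x, y, w, h
    unit_id: Optional[str] = None    # reference from block to content unit


@dataclass
class ContentUnit:
    """Layer 2: one editorial object (article, notice, advertisement)."""
    id: str
    title: str
    block_ids: List[str] = field(default_factory=list)


@dataclass
class Event:
    """Layer 3: a real-world occurrence covered by one or more units."""
    id: str
    event_type: str
    unit_ids: List[str] = field(default_factory=list)


# Hypothetical linkage: the IDs appear in this guide, but whether this
# block belongs to this unit is an assumption made for illustration.
block = Block("blk-5746", "page-1938-12-16-4", (1597, 2803, 490, 136),
              unit_id="cu-13685")
unit = ContentUnit("cu-13685", "Wedding congratulation (illustrative title)",
                   block_ids=[block.id])
event = Event("event-pomerantz-shor-wedding", "wedding", unit_ids=[unit.id])

# Walk from the physical layer up to the semantic layer.
assert block.unit_id == unit.id and unit.id in event.unit_ids
```

Keeping the references as plain ID strings, rather than nested objects, is one way to let each layer be stored and edited on its own.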

Edition at a Glance

This guide explores the 1938-12-16 edition of the Pruzhaner Shtime — 4 pages containing 8,053 words across 73 text blocks, organized into 72 content units. The enrichment layer adds 92 named individuals, 20 locations, and 6 documented events.

Entity Counts

4 Pages
73 Text Blocks
72 Content Units
6 Events
92 People
20 Locations
8 Organizations
5 Topics

Pages & Blocks (Layer 1)

The physical layer represents the newspaper as the OCR pipeline sees it. Each of the 4 pages is divided into text blocks — contiguous regions of text identified by their bounding box coordinates on the page image. These coordinates allow the digital edition to link every piece of transcribed text back to its precise location on the original newspaper.

Page Summary

| Block ID | Bbox [x, y, w, h] | Transcription | Conf. | Content Unit |
| --- | --- | --- | --- | --- |
| blk-5746 | [1597, 2803, 490, 136] | אונזערע פריינד פאמיליעס פאמעראנץ-שאפירא צו דער חתונה פון זייער טאכטער און שוועסט… | 0.90 | |
| blk-5747 | [1597, 3088, 490, 122] | ר ח ל ! צו דיין חתונה מיט ה' חיים שור אונזער הארציקן מזל טוב טייבל, דארע, לאשע (… | 0.90 | |
| blk-5748 | [1597, 2117, 490, 146] | אונזערע פריינט פאמיליעס פאמעראניץ-שאפירא צו דער חתונה פון זייער טאכטער, שוועסטער… | 0.90 | |
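To show how a bounding box can be mapped back onto the page image, the sketch below converts the [x, y, w, h] form into corner coordinates and into page-relative fractions. It assumes that x and y mark the top-left corner in image pixels, which the guide does not state explicitly; the function names and the page dimensions passed to the second function are hypothetical.

```python
def bbox_to_corners(bbox):
    """Convert [x, y, w, h] to (left, top, right, bottom).

    Assumes x, y is the top-left corner in page-image pixels.
    """
    x, y, w, h = bbox
    return (x, y, x + w, y + h)


def bbox_to_fractions(bbox, page_width, page_height):
    """Express the box as fractions of the page size.

    Useful for drawing an overlay on a scaled copy of the scan.
    The page dimensions are supplied by the caller.
    """
    x, y, w, h = bbox
    return (x / page_width, y / page_height, w / page_width, h / page_height)


corners = bbox_to_corners([1597, 2803, 490, 136])  # blk-5746
print(corners)  # (1597, 2803, 2087, 2939)
```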

Content Units (Layer 2)

Content Units represent the editorial structure of the newspaper. Each unit is a coherent editorial object — an article, a notice, an advertisement, a congratulatory message, or an obituary. A single Content Unit may span multiple physical blocks (for example, an article that flows across two columns).
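One way to recover the block list of each Content Unit from the physical layer is to group blocks by the unit they reference. This is a sketch only: the unit_id field on a block and the two block IDs are assumptions for illustration (the guide reports that cu-13690b spans 2 blocks but does not name them).

```python
from collections import defaultdict


def group_blocks_by_unit(blocks):
    """Map each content unit ID to the IDs of the blocks that reference it."""
    units = defaultdict(list)
    for block in blocks:
        units[block["unit_id"]].append(block["id"])
    return dict(units)


# Hypothetical block IDs; only the unit IDs come from this guide.
blocks = [
    {"id": "blk-example-1", "unit_id": "cu-13690b"},
    {"id": "blk-example-2", "unit_id": "cu-13690b"},
    {"id": "blk-example-3", "unit_id": "cu-13694a"},
]

grouped = group_blocks_by_unit(blocks)
print(len(grouped["cu-13690b"]))  # 2
```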

Showing 2 notice units.
ID: cu-13690b
Title: Entertainment: White-Blue Ball at Bereze Tarbut
People: 1
Locs: 0
Page location: Page 4, 2 blocks

Yiddish Text

פארוויילונג אין בערעזער „תרבות". שבת דעם 17 טן ד. מ. וועלן אלע פריינד זיך טרעפן אין די פיין דעקארירטע זאלן פון „תרבות". ווי מיר ווייסן פון אלע יארן אז דער ווייס בלויער באל לאזט איבער א גוטן איינ- דרוק אויף אלע באזוכער. ווי די איניציאטארן גיבן די גרעסטע מי אז דער אוונט זאל זיין וואס אימפאזאנטער. נאך מער היינט מיט'ן ספע- ציעלן דזשאז ארקעסטער פון קאברין אונטער דער לייטונג פון ה' סעלעצקי ווי ער איז בא- דייטנד פארגרעסערט. אלזא, מארגן טרעפן מי- זיך אלע 9 אוונט אין „תרבות" אויף דעם טרא- דיציאנעלן ווייס בלויען באל.

English Translation

Entertainment in Bereze 'Tarbut'

Saturday the 17th of this month, all friends will meet in the finely decorated halls of 'Tarbut'. As we know from all previous years, the White-Blue Ball leaves a good impression on all visitors. The initiators are making the greatest effort to ensure the evening is as impressive as possible. Even more so today with the special jazz orchestra from Kobrin under the direction of Mr. Selecki, which has been significantly enlarged. Therefore, tomorrow we will all meet at 9 PM in 'Tarbut' for the traditional White-Blue Ball.
Category: Local & Regional Communal News
Cross-references: (references, 100%) 
ID: cu-13694a
Title: Commerce for the Holidays: Store Hours
People: 0
Locs: 0

Life Events (Layer 3)

Life Events represent the semantic layer — real-world occurrences that generated newspaper coverage. The Pomerantz–Shor wedding, for instance, produced 27 Content Units: a formal announcement from the bride's father plus 26 congratulatory messages. The event groups these Content Units together and records each unit's role (primary announcement, congratulation, or mention).

This edition contains 6 events with associated Content Units. Events sourced from the newspaper text are marked edition; those added from external Holocaust research are marked historical.

| Event Name | Type | Source | Participants | Units |
| --- | --- | --- | --- | --- |
| Visit of Mrs. Barzach to Pruzhany | community_event | edition | 2 | 1 |
| Death of Rabbi Chaim Feldman | death | edition | 4 | 2 |
| JNF Hanukkah Academy | community_event | edition | 0 | 2 |
| Birth of Judewicz Daughter | birth | edition | 6 | 23 |
| Rachel Pomerantz & Chaim Shor Wedding | wedding | edition | 5 | 27 |
| White-Blue Ball in Bereze | community_event | edition | 0 | 1 |
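A sketch of how an event record might carry each Content Unit's role, using the wedding as the example. The units list, the role strings ("primary", "congratulation") and the generated unit IDs are all assumptions for illustration; only the event ID, type, source and the 1 + 26 split come from this guide.

```python
from collections import Counter

# Placeholder unit IDs: the real identifiers are not listed in this guide.
wedding = {
    "id": "event-pomerantz-shor-wedding",
    "type": "wedding",
    "source": "edition",
    "units": [{"unit_id": "cu-wedding-announcement", "role": "primary"}]
    + [
        {"unit_id": f"cu-wedding-congrats-{n:02d}", "role": "congratulation"}
        for n in range(1, 27)
    ],
}


def role_counts(event):
    """Count how many linked content units play each role in an event."""
    return Counter(link["role"] for link in event["units"])


counts = role_counts(wedding)
print(counts["primary"], counts["congratulation"], sum(counts.values()))
# 1 26 27
```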

Community Connections Made Visible

The table above shows individual events in isolation. But the real power of the semantic layer emerges when events are cross-referenced — revealing how the same people appear across different life events, their relationships reconstructed from the newspaper's own pages.

The same edition also announces the birth of a daughter to Rivtche and Moshe Judewicz, generating a formal birth announcement and 22 congratulatory messages. Strikingly, 19 individuals appear in both events, among them Rivtche Judewicz herself, the new mother, who joins the Kaplan and Judewicz families in congratulating the Pomerantz–Shor wedding.
Wedding only (Pomerantz–Shor): 20
Both events: 19
Birth only (Judewicz): 26

Social overlap between two life events in the 1938-12-16 edition
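The overlap figures can be derived with plain set operations once each event has a set of person IDs. The sketch below uses generated placeholder IDs sized to match the reported counts; the real person IDs and the way participants are stored on an event are not shown in this guide, so both are assumptions.

```python
def overlap(first, second):
    """Split two sets of person IDs into exclusive and shared groups."""
    return {
        "only_first": first - second,
        "both": first & second,
        "only_second": second - first,
    }


# Placeholder IDs, sized to match the counts reported above.
shared = {f"person-shared-{n}" for n in range(19)}
wedding_people = shared | {f"person-wedding-{n}" for n in range(20)}
birth_people = shared | {f"person-birth-{n}" for n in range(26)}

result = overlap(wedding_people, birth_people)
print({name: len(group) for name, group in result.items()})
# {'only_first': 20, 'both': 19, 'only_second': 26}
```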

The Enrichment Layer

The three layers above capture what is in the newspaper — its physical form, editorial structure, and the events it describes. The enrichment layer goes further: it adds structured knowledge drawn from external research, transforming brief textual mentions into richly annotated records. Every named person, place, organization, and topic becomes a node in a growing knowledge graph.

This enrichment accumulates gradually. The current edition represents a snapshot of ongoing research — some entities have extensive biographical data and verified Holocaust fates, while others remain skeletal entries awaiting further investigation. The Data Quality section quantifies these gaps.

Five Entity Types

The sections that follow explore each entity type in detail. Here is what the enrichment layer captures for this edition:

92 People: Biographical data, family relationships, Holocaust fate
20 Locations: Hierarchical geography with coordinates and historical context
6 Events: Edition events and historical events with participants
5 Topics: Thematic tags grouping related content units
8 Organizations: Named institutions, businesses, and communal bodies

References are maintained in both directions — a person lists all content units where they appear, and each content unit lists all people mentioned in it — supporting different access patterns depending on whether you're exploring an entity or reading an article.
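Bidirectional references can be kept consistent by deriving one direction from the other. The sketch below builds the person-to-units index from a unit-to-people mapping and checks that a stored copy agrees with it. The mapping shape is an assumption, and the pairing of these particular IDs is hypothetical.

```python
from collections import defaultdict


def invert_references(unit_to_people):
    """Build person -> [unit IDs] from unit -> [person IDs]."""
    person_to_units = defaultdict(list)
    for unit_id, person_ids in unit_to_people.items():
        for person_id in person_ids:
            person_to_units[person_id].append(unit_id)
    return dict(person_to_units)


def references_agree(unit_to_people, person_to_units):
    """True when both directions describe exactly the same links."""
    derived = invert_references(unit_to_people)
    normalise = lambda index: {key: sorted(ids) for key, ids in index.items() if ids}
    return normalise(derived) == normalise(person_to_units)


# Hypothetical links between IDs that appear elsewhere in this guide.
unit_to_people = {
    "cu-13685": ["person-rachel-pomerantz"],
    "cu-13690b": [],
}
person_to_units = invert_references(unit_to_people)
print(person_to_units)  # {'person-rachel-pomerantz': ['cu-13685']}
```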

People

The enrichment layer identifies 92 individuals mentioned in this edition. Each person entity includes biographical data and, where possible, research into their fate during the Holocaust. This research draws on databases from Yad Vashem, the Arolsen Archives, memorial books, and other sources.

For most individuals in this edition, their Holocaust fate remains unknown — a common challenge in Holocaust research. The fate classifications range from definitive ("Perished," "Survived") to probabilistic ("Likely Perished," "Likely Survived" based on their known location and circumstances).

47 people connected by 64 family relationships across 15 family groups

Family groups: Judewicz–Kaplan, Shapiro–Pomerantz, Perlstein, Zuber–Zuber-Swirsky, Rosenblum, Katz, Elman, Spector, Shapiro, Pomeranetz, Pruzhanski, Tschernevsky, Goldberg, Epstein, Laike
Fate: Perished, Survived, Likely Survived, Unknown
Relationship: Spouse, Parent–Child, Sibling, In-Law, Other

Locations

The enrichment layer includes 20 geographic locations, organized in a hierarchical tree reflecting administrative containment: streets sit within towns, towns within districts, districts within regions, and regions within countries.
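A containment hierarchy like this is commonly stored with a parent reference on each location. The following sketch walks those references to produce the full path for a town. The parent_id field and the IDs for the region and country are assumptions; only location-pruzhany and the place names appear in this guide.

```python
def ancestry(location_id, locations):
    """Return place names from the top of the hierarchy down to location_id."""
    path = []
    current = location_id
    while current is not None:
        node = locations[current]
        path.append(node["name"])
        current = node["parent_id"]
    return list(reversed(path))


# parent_id and the two upper-level IDs are hypothetical.
locations = {
    "location-belarus": {"name": "Belarus", "type": "country", "parent_id": None},
    "location-brest-region": {
        "name": "Brest Region", "type": "region", "parent_id": "location-belarus",
    },
    "location-pruzhany": {
        "name": "Pruzhany", "type": "town", "parent_id": "location-brest-region",
    },
}

print(ancestry("location-pruzhany", locations))
# ['Belarus', 'Brest Region', 'Pruzhany']
```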

Each location is linked to external geographic databases (Wikidata and GeoNames) for interoperability with the European Holocaust Research Infrastructure (EHRI). Locations of Holocaust significance include structured metadata about ghettos, camps, and massacre sites.

The edition's location map shows how this single newspaper edition connected readers across four continents, from the Pruzhany heartland to the Argentine diaspora and from Eretz Israel to Newfoundland. Dot size reflects the number of content units mentioning each location.


country: Belarus (0)
  region: Brest Region (0)
    town: Pruzhany, פרוזשאנע (7)
country: Canada, קאנאדע (0)
country: Israel, ישראל (0)
country: Lithuania, ליטע (0)
country: Poland, פוילן (1)

Pruzhany

פרוזשאנע

Type: town
Country: Belarus (historically: Poland (Second Polish Republic))
Coordinates: 52.5567, 24.4567
Wikidata: Q1866937
GeoNames: 622997
Names:
  en: Pruzhany
  pl: Prużana
  be: Пружаны
  ru: Пружаны
  yi: פרוזשאנע
Holocaust History: German occupation began in June 1941. In July 1941, 17 prominent Jews were executed in the Lachy (Lyakhi) forest. A ghetto was established in late 1941, holding over 10,000 Jews from Pruzhany and surrounding areas, and a Judenrat was formed under Chairman Itzel Janowicz. Liquidation, January 28-31, 1943: Jews were transported by sled 12 km to the Linovo (Orantschice) railway station, then by freight train to Auschwitz. Most perished upon arrival. Some survivors joined partisan groups in the forests near Kobrin and Pruzhany (Chapayev Brigade, Kirov Unit). Liberation: July 20, 1944. Sources: Pinkas Pruzhany (1983), Morris Sorid memoirs.

Topics & Organizations

Topics (5)

Topics categorize Content Units by theme, grouping related editorial content regardless of physical position in the newspaper.

Organizations (8)

Organizations mentioned in the newspaper include local businesses, community institutions, and cultural organizations, linked to the Content Units where they appear.

Zionist
Business
Social Welfare
Education
Healthcare

Data Quality & Coverage

Where is the data complete, and where are the gaps? This section examines field coverage across the edition's entities and visualizes the Holocaust fate research that underpins the biographical data. These metrics help assess readiness for using this edition as ground truth when processing future editions.

Field Coverage

Percentage of entities where key fields have been filled in (excluding “unknown” values for Holocaust fate).

| Field | Filled | Coverage |
| --- | --- | --- |
| Content Units with English translation | 72/72 | 100% |
| Blocks assigned to Content Units | 73/73 | 100% |
| People with determined Holocaust fate | 25/92 | 27% |
| People with family relationships | 52/92 | 57% |
| People with Yiddish name | 88/92 | 96% |
| Locations with coordinates | 20/20 | 100% |
| Locations with Wikidata ID | 19/20 | 95% |
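The coverage percentages are simple ratios, with "unknown" treated as unfilled for Holocaust fate. A minimal sketch of that calculation follows. The function name and the stand-in records are assumptions; only the holocaust_fate field name and the 25/92 figure come from this guide.

```python
def field_coverage(entities, field, empty_values=(None, "")):
    """Return (filled, total, percent) for one field across a list of dicts."""
    total = len(entities)
    filled = sum(1 for entity in entities if entity.get(field) not in empty_values)
    percent = round(100 * filled / total) if total else 0
    return filled, total, percent


# Stand-in records: 25 people with some determined fate and 67 marked
# "unknown". The value "determined" is a placeholder, not a real category.
people = [{"holocaust_fate": "determined"}] * 25 + [{"holocaust_fate": "unknown"}] * 67

print(field_coverage(people, "holocaust_fate", empty_values=(None, "", "unknown")))
# (25, 92, 27)
```

Passing the set of "empty" values as a parameter keeps the same function usable for fields where "unknown" is not a meaningful value.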

Holocaust Fate Distribution

Each square represents approximately 1% of the 92 named individuals.

| Fate | People | Share |
| --- | --- | --- |
| Unknown | 67 | 73% |
| Perished | 11 | 12% |
| Likely Survived | 6 | 7% |
| Survived | 5 | 5% |
| N/A | 2 | 2% |
| Died Before | 1 | 1% |
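As a quick arithmetic check, the shares follow from the counts by rounding each one to the nearest whole percent. The variable names in this sketch are illustrative.

```python
# Counts as reported in the distribution above.
fate_counts = {
    "Unknown": 67,
    "Perished": 11,
    "Likely Survived": 6,
    "Survived": 5,
    "N/A": 2,
    "Died Before": 1,
}

total = sum(fate_counts.values())
shares = {fate: round(100 * count / total) for fate, count in fate_counts.items()}

print(total)                 # 92
print(shares["Unknown"])     # 73
print(sum(shares.values()))  # 100
```

For these particular counts the rounded shares happen to sum to exactly 100, which is why one square per percentage point fills the grid. In general, independently rounded percentages need not sum to 100.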

Timeline

Birth and death years of named individuals, relative to the newspaper's publication

[Timeline figure. Horizontal axis: years 1880 to 1950, with the publication date marked as 16 Kislev 5699. Legend: birth year, death (perished), death (survived), death (died before war), Holocaust event, publication date.]

Technical Reference

This section documents key design decisions in the data model — the choices that aren't obvious from the data itself but matter for reviewing correctness and extending the model for future editions.

Why String IDs?

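All entity identifiers are prefixed strings, such as person-rachel-pomerantz or cu-ball-announcement, rather than bare auto-incrementing integers. (Block and Content Unit IDs shown in this guide, such as blk-5746 and cu-13685, pair the prefix with a numeric suffix.) This choice offers several advantages: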

  • Self-documenting: You can read the data without looking up IDs in a separate table
  • Prefix-based polymorphism: The UI can determine entity type from the prefix (person-, cu-, location-), enabling a single selection variable to hold any entity
  • Stability: IDs don't change when the database is rebuilt (unlike auto-increment)
  • Debuggability: Log messages and JSON output are human-readable

ID Prefix Reference

| Prefix | Entity Type | Example |
| --- | --- | --- |
| page- | Page | page-1938-12-16-4 |
| blk- | Block | blk-5746 |
| cu- | Content Unit | cu-13685 |
| event- | Event | event-pomerantz-shor-wedding |
| person- | Person | person-rachel-pomerantz |
| location- | Location | location-pruzhany |
| topic- | Topic | topic-lifecycle-events |
| org- | Organization | org-kehillah-pruzhany |
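Prefix-based polymorphism can be sketched as a lookup over the prefixes in the reference table. The function name is an assumption, and the real UI may resolve types differently.

```python
# Prefixes and type names as listed in the ID Prefix Reference.
ID_PREFIXES = {
    "page-": "Page",
    "blk-": "Block",
    "cu-": "Content Unit",
    "event-": "Event",
    "person-": "Person",
    "location-": "Location",
    "topic-": "Topic",
    "org-": "Organization",
}


def entity_type(entity_id):
    """Return the entity type implied by an ID's prefix."""
    for prefix, type_name in ID_PREFIXES.items():
        if entity_id.startswith(prefix):
            return type_name
    raise ValueError(f"Unrecognised ID prefix: {entity_id!r}")


print(entity_type("person-rachel-pomerantz"))  # Person
print(entity_type("cu-13685"))                 # Content Unit
```

Because no prefix in the table is itself the start of another, the order of the lookup does not affect the result.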

Why snake_case?

All field names use snake_case (e.g., holocaust_fate, unit_ids) for consistency with the SQLite database layer. Earlier versions used camelCase, but the migration to a relational database standardized on snake_case throughout the stack.
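A migration of this kind can be sketched as a recursive key rename. The guide does not show the earlier camelCase spellings, so holocaustFate and unitIds below are assumed for illustration, and this converter is not necessarily the one the project used.

```python
import re

# Zero-width boundary between a lowercase letter or digit and an uppercase letter.
_WORD_BOUNDARY = re.compile(r"(?<=[a-z0-9])(?=[A-Z])")


def to_snake_case(name):
    """Convert a camelCase field name to snake_case."""
    return _WORD_BOUNDARY.sub("_", name).lower()


def convert_keys(value):
    """Recursively rename dict keys, leaving values untouched."""
    if isinstance(value, dict):
        return {to_snake_case(key): convert_keys(item) for key, item in value.items()}
    if isinstance(value, list):
        return [convert_keys(item) for item in value]
    return value


legacy = {"holocaustFate": "unknown", "unitIds": ["cu-13685"]}  # assumed old spellings
print(convert_keys(legacy))
# {'holocaust_fate': 'unknown', 'unit_ids': ['cu-13685']}
```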

Why Separate Files?

Each data layer is stored in its own JSON file because each has a different authoring cadence:

  • Pages/Blocks: Change when OCR is re-run or corrected
  • Content Units: Change during editorial curation
  • Enrichment: Accumulates gradually through research
  • Life Events: Curated as editorial understanding deepens

Separation allows independent updates without merge conflicts. The Edition Bundle format (1938-12-16-edition.json) merges everything into a single self-contained document for distribution and validation.
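The merge step might look roughly like the sketch below, which reads one JSON file per layer and writes a single bundle. The per-layer file names, the top-level keys and the edition_date field are assumptions; only the output name pattern (1938-12-16-edition.json) comes from this guide.

```python
import json
from pathlib import Path

# Hypothetical per-layer file stems; the real layout is not documented here.
LAYER_FILES = ("pages", "content_units", "events", "enrichment")


def build_bundle(layer_dir, edition_date):
    """Load each layer's JSON file and merge them under one top-level dict."""
    bundle = {"edition_date": edition_date}
    for name in LAYER_FILES:
        path = Path(layer_dir) / f"{name}.json"
        bundle[name] = json.loads(path.read_text(encoding="utf-8"))
    return bundle


def write_bundle(bundle, out_dir):
    """Write the bundle as <date>-edition.json and return the output path."""
    out_path = Path(out_dir) / f"{bundle['edition_date']}-edition.json"
    # ensure_ascii=False keeps Yiddish text readable in the output file.
    out_path.write_text(
        json.dumps(bundle, ensure_ascii=False, indent=2), encoding="utf-8"
    )
    return out_path
```

Calling write_bundle(build_bundle("layers/", "1938-12-16"), "dist/") would, under these assumptions, produce dist/1938-12-16-edition.json.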

Event Source Classification

Events carry a source field distinguishing their provenance:

  • edition — Events documented in the newspaper itself (weddings, bar mitzvahs, community news)
  • historical — Events added from external Holocaust research (ghetto establishment, deportations, massacres)

Events of type holocaust_event always have source: 'historical'. This classification ensures that the newspaper's own voice is never confused with retrospective knowledge.
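The rule can be expressed as a small validation pass over the event list. This is a sketch assuming each event is a dict with id, type and source keys; the second event ID is hypothetical.

```python
ALLOWED_SOURCES = {"edition", "historical"}


def source_violations(events):
    """Return IDs of events whose source field breaks the classification rules."""
    bad = []
    for event in events:
        if event["source"] not in ALLOWED_SOURCES:
            bad.append(event["id"])
        elif event["type"] == "holocaust_event" and event["source"] != "historical":
            bad.append(event["id"])
    return bad


events = [
    {"id": "event-pomerantz-shor-wedding", "type": "wedding", "source": "edition"},
    # Hypothetical ID, deliberately mislabelled to trigger the check.
    {"id": "event-example-deportation", "type": "holocaust_event", "source": "edition"},
]

print(source_violations(events))  # ['event-example-deportation']
```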

רייזטשע און משה יודעוויטש מעלדן וועגן געבורט פון זייער טעכטערל

Rivtche and Moshe Judewicz announce the birth of their little daughter

— for Tzvia

Generated from the Pruzhany Digital Archive edition bundle. Schema version: 1.0.0.

A project of DS Media LLC