Event & Case Clustering

Clustering makes SESAMm’s ESG data cleaner and easier to use. By grouping articles about the same topic, we reduce duplication and surface only the most relevant, high-quality content. This structure lets you follow how ESG issues evolve over time without getting lost in dozens of repetitive articles.

How It Works (In a Nutshell)

Clean up: We filter out noise and irrelevant articles.
Summarize: An AI model reads each article and creates short summaries of the controversy.
Group: Articles are grouped into:
1. Events: short-term developments
2. Cases: longer-term stories made up of multiple events
Simplify: Each group gets a clean title and summary so you can quickly understand what’s going on.

As a result, you see only the most relevant documents and a clear timeline of how each ESG topic unfolds.

The Step-by-Step Process

1. Pre-Processing: Removing the Noise

Before grouping articles, we filter out low-quality and duplicate content, keeping only the most important article from each set of duplicates. We also use AI to check whether each article actually describes a valid ESG controversy, so that only meaningful content is clustered.
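
As an illustration of the duplicate filter only, here is a minimal Python sketch. The field names duplicate_set and importance are hypothetical, since this page does not say how duplicates are detected or how importance is measured, and the AI validity check is not modeled.

```python
def keep_best_per_duplicate_set(articles):
    """Keep one article per duplicate set: the one with the highest importance.

    Assumes each article is a dict with hypothetical keys
    "duplicate_set" and "importance".
    """
    best = {}
    for article in articles:
        key = article["duplicate_set"]
        current = best.get(key)
        if current is None or article["importance"] > current["importance"]:
            best[key] = article
    return list(best.values())


# Toy data: a1 and a2 are duplicates of each other, a3 stands alone.
articles = [
    {"id": "a1", "duplicate_set": "s1", "importance": 0.4},
    {"id": "a2", "duplicate_set": "s1", "importance": 0.9},
    {"id": "a3", "duplicate_set": "s2", "importance": 0.5},
]
kept = keep_best_per_duplicate_set(articles)
print([a["id"] for a in kept])
```

Only the surviving articles would move on to the summary step.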

2. Document Insights

Once articles are filtered, our AI generates two summaries for each article:

Main story summary: This summary provides a comprehensive overview of the ESG event or controversy. It consolidates key contextual information to give users a complete picture of the ongoing situation.
Novelty summary: This summary focuses exclusively on new insights or developments related to the broader ESG event. It highlights what is new or updated compared to what is already known.

By generating these two summaries, the module enables users to understand the full context of an ESG situation and to detect new developments as they emerge.

3. Creating Embeddings

Next, we turn the summaries into “embeddings”: numerical representations that let us compare the meaning of different texts. We use:

The Main Story summary for Case grouping
The Main Story and Novelty summaries together for Event grouping

This allows us to detect when different documents are really talking about the same thing, even across languages.
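
To make the idea concrete, here is a small Python sketch of how the text to embed could be chosen for each level, and how two embeddings could be compared. It is an illustration under assumptions: the function names are hypothetical, cosine similarity is a common choice but is not confirmed as the measure used here, and the vectors are toy values rather than output from a real embedding model.

```python
import math


def embedding_text(main_story, novelty, level):
    """Choose the text to embed for a given grouping level."""
    if level == "case":
        return main_story                    # Main Story only
    if level == "event":
        return f"{main_story}\n{novelty}"    # Main Story + Novelty
    raise ValueError(f"unknown level: {level!r}")


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors (1.0 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Toy vectors standing in for real embeddings.
doc_a = [0.9, 0.1, 0.0]
doc_b = [0.8, 0.2, 0.0]   # close in meaning to doc_a
doc_c = [0.0, 0.1, 0.9]   # a different topic

print(cosine_similarity(doc_a, doc_b) > cosine_similarity(doc_a, doc_c))
```

Documents whose embeddings point in nearly the same direction are candidates for the same group, whatever language the original article was written in.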

4. Clustering into Cases and Events

We group documents into:

Cases: We use a clustering algorithm to group articles into bigger, long-running topics. If the content is closely related, it’s grouped even if published months apart.
Events: Inside each Case, we create smaller clusters of articles. Here we introduce a temporal component: documents published close together in time are more likely to be grouped, while documents published too far apart are generally split into separate Events.

This ensures that Events remain time-bounded and reflect actual developments in an ESG controversy. For example, it prevents a strike in 2022 from being grouped with a similar strike in 2024.
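
The effect of the temporal component can be sketched in Python as follows. This is a deliberate simplification: it uses a hard time-gap threshold (the max_gap_days parameter and its default value are hypothetical) and ignores semantic similarity, whereas the description above treats time proximity as one factor that makes grouping more or less likely.

```python
from datetime import date


def split_into_events(docs, max_gap_days=90):
    """Split the documents of one Case into time-bounded Events.

    docs is a list of (doc_id, published) tuples. A new Event starts
    whenever the gap to the previous document exceeds max_gap_days.
    """
    events = []
    for doc_id, published in sorted(docs, key=lambda d: d[1]):
        if events and (published - events[-1][-1][1]).days <= max_gap_days:
            events[-1].append((doc_id, published))
        else:
            events.append([(doc_id, published)])
    return events


# Two similar strikes within the same Case, two years apart.
case_docs = [
    ("strike-2022-a", date(2022, 3, 1)),
    ("strike-2024-a", date(2024, 5, 10)),
    ("strike-2022-b", date(2022, 3, 15)),
]
events = split_into_events(case_docs)
for event in events:
    print([doc_id for doc_id, _ in event])
```

With these toy dates, the two 2022 articles form one Event and the 2024 article forms another.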

5. Clean Titles & Summaries

Once the clustering is done:

Cases get a clear title and summary, created by analyzing the most representative events.
Events get their own summaries so you can quickly understand what happened and when.
Other metadata is added, including the intensity score, ESG risk, and ESG sub-risk.

Note: Structural constraints

Each Event belongs to exactly one Case.
Each Event and each Case is associated with a single entity.

This guarantees clarity and consistency in how events are tracked and reported across the system.
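
If you load Cases and Events into your own tooling, these constraints can be checked with a few lines of Python. The sketch below is illustrative: the class and field names are hypothetical and do not reflect the actual data schema, and the check that an Event shares its Case's entity is an assumption that goes slightly beyond what is stated above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Case:
    case_id: str
    entity_id: str   # a Case is tied to a single entity


@dataclass(frozen=True)
class Event:
    event_id: str
    case_id: str     # a single field: an Event belongs to exactly one Case
    entity_id: str   # an Event is tied to a single entity


def check_constraints(cases, events):
    """Return a list of human-readable violations (empty if none)."""
    problems = []
    cases_by_id = {c.case_id: c for c in cases}
    seen = set()
    for e in events:
        if e.event_id in seen:
            problems.append(f"{e.event_id}: listed more than once")
        seen.add(e.event_id)
        case = cases_by_id.get(e.case_id)
        if case is None:
            problems.append(f"{e.event_id}: unknown case {e.case_id}")
        elif case.entity_id != e.entity_id:
            # Assumption: an Event shares the entity of its Case.
            problems.append(f"{e.event_id}: entity differs from its case")
    return problems


cases = [Case("case-1", "entity-A")]
events = [
    Event("event-1", "case-1", "entity-A"),
    Event("event-2", "case-1", "entity-A"),
]
print(check_constraints(cases, events))
```

An empty list means every Event points to one known Case and no Event appears twice.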