Background
We manage Teramind deployments for enterprises across Turkey as an authorized Teramind partner. One of our largest on-site deployments runs approximately 1,500 active users on a dedicated Teramind server.
In late May 2026, this environment began exhibiting two simultaneous symptoms that quickly escalated to a critical incident.
The Symptoms
1. tmsrv Crash Loop
The Teramind server process (tmsrv) was crashing and restarting repeatedly. Each restart was preceded by SIGABRT, the signal raised when a process calls abort(), which indicated that tmsrv was deliberately terminating itself rather than being killed by an external signal.
2. work_time Under-Reporting
Employee work time data in the dashboards was severely under-reported. Sessions that clearly showed activity were logging zero or near-zero productive time. At first glance this looked like a monitoring policy misconfiguration.
Both symptoms appeared together, which was the first signal that they shared a common cause.
Initial Response: TMU 878
We opened a case with Teramind support. Their recommendation was to apply TMU 878, a maintenance update that addressed several tmsrv stability issues.
We applied TMU 878. The crash loop continued.
The update did not change the behavior at all. This meant the root cause was something TMU 878 was not designed to address — and we needed to find it ourselves.
Tracing the Root Cause
We pulled a core dump from the crashed tmsrv process and analyzed the stack trace.
The crash originated in BackgroundWorker::threadFunc(), which called flush_log_and_abort():
#4 flush_log_and_abort()
#5 teramind::server::BackgroundWorker::threadFunc()
#6 libboost_thread.so.1.74.0
#7 start_thread
flush_log_and_abort() is an internal Teramind function that flushes pending log/data writes and then calls abort(). It is triggered when the process encounters a state it considers unrecoverable — typically a database write failure or a constraint violation.
We turned our attention to the database.
The Integer Overflow
Inspecting the PostgreSQL schema, we checked the column types on the highest-traffic tables. The mon_mail_attachment table stores every email attachment event captured by Teramind agents.
SELECT column_name, data_type
FROM information_schema.columns
WHERE table_name = 'mon_mail_attachment'
AND column_name = 'mon_mail_attachment_id';
Result: integer — a signed 32-bit integer with a maximum value of 2,147,483,647.
We then checked the current sequence value:
SELECT last_value FROM mon_mail_attachment_mon_mail_attachment_id_seq;
The sequence had hit the ceiling. Every new INSERT into mon_mail_attachment failed with an integer overflow. BackgroundWorker treated the failed write as unrecoverable and called flush_log_and_abort(), and tmsrv restarted, only to crash again on the next email attachment event.
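To make the ceiling concrete, here is a minimal Python sketch of the range check that a signed 32-bit column imposes. It is an illustration only: the helper name fits_int32 is ours, and it does not describe how PostgreSQL or Teramind perform the check internally.

```python
import struct

# Largest value a signed 32-bit integer (PostgreSQL "integer") can hold.
INT32_MAX = 2**31 - 1  # 2,147,483,647


def fits_int32(value: int) -> bool:
    """Return True if value can be stored in a signed 32-bit integer."""
    try:
        struct.pack("<i", value)  # "i" is a 4-byte signed int
        return True
    except struct.error:
        return False


print(fits_int32(INT32_MAX))      # True: the last id that still fits
print(fits_int32(INT32_MAX + 1))  # False: the next id is out of range
```

Once the next id no longer fits, every insert that needs a fresh id fails in the same way, which matches the repeating crash pattern described above.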
The work_time under-reporting was a side effect: when the background worker responsible for data persistence crashed mid-cycle, it also dropped the buffered productivity metrics for that window.
At 1,500 users with active email monitoring, this table accumulates roughly 2–5 million rows per month. The 32-bit limit was always going to be hit. It just took long enough that no one had encountered it yet.
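How much runway a sequence has left is simple arithmetic, sketched below in Python. The function name and the sample inputs are hypothetical. The consumption rate is best measured from the sequence itself (for example, two last_value readings taken a month apart) rather than from row counts, because PostgreSQL does not hand sequence values back when an insert rolls back or when rows are deleted later.

```python
INT32_MAX = 2_147_483_647


def months_until_exhaustion(last_value: int, ids_per_month: int,
                            max_value: int = INT32_MAX) -> float:
    """Estimate how many months remain before the sequence reaches max_value."""
    if ids_per_month <= 0:
        raise ValueError("ids_per_month must be positive")
    remaining = max(max_value - last_value, 0)
    return remaining / ids_per_month


# Hypothetical inputs: sequence at 2.0 billion, consuming 5 million ids a month.
print(round(months_until_exhaustion(2_000_000_000, 5_000_000), 1))  # 29.5
```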
The Fix
TMU 878 did not alter the data type of mon_mail_attachment_id. The fix was a one-line schema migration:
ALTER TABLE mon_mail_attachment
ALTER COLUMN mon_mail_attachment_id TYPE bigint;
bigint is a signed 64-bit integer with a maximum value of 9,223,372,036,854,775,807 — effectively unlimited for any realistic Teramind deployment.
We applied this on May 31, 2026 at approximately 22:00, after business hours, without taking the agents offline.
Immediate result:
- tmsrv crash loop stopped
- Agent connections stabilized within minutes
- work_time data returned to normal on the next reporting cycle
Why No Downtime?
The change ran after business hours, and agent connections recovered on their own once tmsrv stopped crashing. It is worth being precise about what PostgreSQL does here, though: changing an integer column to bigint is not a catalog-only change. The two types have different on-disk widths (4 and 8 bytes) and are not binary coercible, so ALTER COLUMN ... TYPE rewrites the table and its indexes while holding an ACCESS EXCLUSIVE lock, which blocks reads and writes on that table until the statement finishes. Run time grows with table size, so on a large table you should schedule the change for a quiet window and confirm there is enough free disk space for the rewritten copy.
The Near-Miss
The fix was applied on Saturday, May 31. Monday, June 2 was the first business day after a public holiday.
Had we not caught and resolved this proactively, the customer would have opened their Teramind dashboards on Monday morning — the first working day after a long weekend — to find:
- All agent data missing for the entire holiday period
- Productivity reports showing zero values
- Behavior alert triggers misfiring due to missing data
For a 1,500-user deployment with management relying on these dashboards, that would have been a serious escalation. The timing made it worse: a support ticket opened on Monday morning after a holiday would have taken hours to reach the right people.
Preventing the Same Issue on Other Deployments
If you run an on-site Teramind deployment with significant email monitoring volume, check whether your mon_mail_attachment_id column is still typed as integer:
SELECT
    column_name,
    data_type,
    (SELECT last_value
       FROM mon_mail_attachment_mon_mail_attachment_id_seq) AS current_seq,
    2147483647 AS int_max,
    ROUND(
        (SELECT last_value FROM mon_mail_attachment_mon_mail_attachment_id_seq)::numeric
        / 2147483647 * 100, 2
    ) AS pct_used
FROM information_schema.columns
WHERE table_name = 'mon_mail_attachment'
  AND column_name = 'mon_mail_attachment_id';
If data_type is integer and pct_used is above 70%, apply the migration before your next high-volume period:
ALTER TABLE mon_mail_attachment
ALTER COLUMN mon_mail_attachment_id TYPE bigint;
The same risk applies to any other high-volume event table that uses a 32-bit sequence. Tables worth auditing on busy deployments:
| Table | Risk Factor |
|---|---|
| mon_mail_attachment | High (every attachment = 1 row) |
| mon_web_file | Medium–High (file uploads/downloads) |
| mon_keystroke | High on typing-intensive deployments |
| mon_screen | Medium (screenshot events) |
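When auditing several tables at once, the 70% rule can be applied to the collected readings in one pass. The Python sketch below assumes you have already gathered each table's column type and sequence last_value (for example with the audit query above). The SequenceStatus type, the function names, and the sample readings are all hypothetical.

```python
from typing import NamedTuple

INT32_MAX = 2_147_483_647


class SequenceStatus(NamedTuple):
    table: str
    data_type: str   # column type as reported by information_schema
    last_value: int  # current value of the backing sequence


def pct_used(status: SequenceStatus) -> float:
    """Share of the 32-bit id range already consumed, in percent."""
    return round(status.last_value / INT32_MAX * 100, 2)


def needs_migration(status: SequenceStatus, threshold_pct: float = 70.0) -> bool:
    """Flag integer-typed id columns that are past the threshold."""
    return status.data_type == "integer" and pct_used(status) > threshold_pct


# Hypothetical readings, not real measurements.
readings = [
    SequenceStatus("mon_mail_attachment", "integer", 2_100_000_000),
    SequenceStatus("mon_web_file", "integer", 900_000_000),
    SequenceStatus("mon_keystroke", "bigint", 3_000_000_000),
]

flagged = [r.table for r in readings if needs_migration(r)]
print(flagged)  # ['mon_mail_attachment']
```

Columns already typed as bigint are skipped regardless of their sequence value, since the 32-bit ceiling no longer applies to them.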
Data Growth Management
As a complementary measure, we deploy an automated cleanup daemon on large on-site installations. The daemon runs daily and removes event data older than 12 months using Teramind's built-in tm.pl utility:
/usr/local/teramind/scripts/tm.pl \
-func remove_user_data_ex \
-keep_months 12 \
-no_disk_check
The -no_disk_check flag is important on Master servers that proxy to Terabi nodes — without it the script may abort with a disk space check error even when space is not the concern.
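This does not eliminate the need for the bigint migration: 12 months of data at 1,500 users is still a large volume, and deleting old rows does not rewind the sequence, so the id counter keeps climbing toward the 32-bit ceiling no matter how much data is purged. What the cleanup does is prevent unbounded growth and keep database performance healthy long-term.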
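Retention and id consumption are separate things, which a toy model makes easy to see. The Python sketch below is not Teramind code: it uses a plain counter as a stand-in for the sequence, made-up volumes, and a simple "keep the last N months" rule. It shows the number of stored rows leveling off while the highest id keeps rising, because a sequence never reuses values freed by deletes.

```python
from itertools import count


def simulate(months: int, events_per_month: int, keep_months: int):
    """Return (rows_retained, highest_id) after the given number of months."""
    next_id = count(start=1)  # stand-in for the id sequence; never rewinds
    rows = {}                 # id -> month in which the row was inserted
    for month in range(1, months + 1):
        for _ in range(events_per_month):
            rows[next(next_id)] = month
        # Retention step: drop rows older than keep_months.
        cutoff = month - keep_months
        rows = {rid: m for rid, m in rows.items() if m > cutoff}
    return len(rows), max(rows)


# Toy volumes: 1,000 events a month for 24 months, keeping 12 months.
print(simulate(24, 1000, 12))  # (12000, 24000)
```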
What We Reported to Teramind
Following the incident, we sent Teramind support a detailed write-up. The key points:
- TMU 878 did not address this issue. The integer → bigint migration for mon_mail_attachment_id was not part of the update.
- We recommend this migration be included in an official TMU release so deployments are patched through the standard update path.
- Existing on-site deployments should be audited proactively — any customer that has been running Teramind with heavy email monitoring for several years is potentially at risk.
Summary
| Item | Details |
|---|---|
| Symptom | tmsrv crash loop + work_time under-reporting |
| Root cause | mon_mail_attachment_id hit 32-bit integer max (2,147,483,647) |
| Vendor fix | TMU 878, which did not resolve the issue |
| Actual fix | ALTER COLUMN mon_mail_attachment_id TYPE bigint |
| Downtime | No agent downtime (applied after business hours; the type change rewrites the table under an exclusive lock) |
| Affected users | ~1,500 |
| Time to fix | ~15 minutes after root cause identified |
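If you manage a Teramind on-site deployment and this looks familiar, run the audit query above. The migration is a single statement and permanent, but because PostgreSQL rewrites the table while holding an exclusive lock, plan it for a quiet window.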
For Teramind deployments in Turkey — licensing, setup, or ongoing management — get in touch.