PHI Anonymization

PHI anonymization protects Protected Health Information (PHI) from unauthorized access and misuse. It transforms healthcare data so that it can no longer identify individuals, even when combined with other available data.

Sensitive patient information such as names, addresses, medical record numbers, diagnosis details, and biometric identifiers is either removed or altered to preserve privacy.

1. Audio Input: Original voice recording
2. Transcription: Convert to text
3. PHI Detection: Identify sensitive data
4. Anonymization: Replace with tones
5. Synthesis: Generate safe audio
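The anonymization step in this flow can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not Nijta's implementation: it assumes the earlier steps have already produced the time spans (in seconds) that contain PHI, and the sample rate, tone frequency, and function names are all hypothetical.

```python
import math

SAMPLE_RATE = 16_000  # samples per second (assumed value)

def tone(n_samples, freq_hz=1000.0, amplitude=0.3):
    """Return n_samples of a sine tone as floats in [-1, 1]."""
    return [amplitude * math.sin(2 * math.pi * freq_hz * i / SAMPLE_RATE)
            for i in range(n_samples)]

def mask_with_tones(samples, phi_spans):
    """Overwrite each (start_s, end_s) span of mono audio with a tone.

    samples:   list of floats (mono audio)
    phi_spans: time spans in seconds flagged by the PHI detection step
    """
    out = list(samples)  # copy so the original recording is untouched
    for start_s, end_s in phi_spans:
        start = max(0, int(start_s * SAMPLE_RATE))
        end = min(len(out), int(end_s * SAMPLE_RATE))
        if end > start:
            out[start:end] = tone(end - start)
    return out

# Hypothetical input: 2 seconds of constant "speech" with one PHI span.
audio = [0.1] * (2 * SAMPLE_RATE)
safe = mask_with_tones(audio, [(0.5, 1.0)])
```

The output has the same length as the input, so any timestamps attached to the transcript stay valid after masking.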

Benefits of PHI anonymization

Enables secure and compliant storage of sensitive voice and text data

Supports legal and ethical data sharing with third parties

Facilitates AI model training and analytics without compromising patient privacy

Preserves the utility of original data for innovation and research

Enhances patient trust by protecting identity and sensitive health details

Real-World Anonymization Example

Original Transcription

"Hi, this is Dr. Emily Carter, calling regarding patient Michael Reynolds, date of birth 07/09/1982, who had a follow-up scheduled at Stanford Medical Center for March 15th, 2025. His insurance ID is AETN-4455-2391."

Anonymized Output

"Hi, this is <tone>, calling regarding patient <tone>, date of birth <tone>, who had a follow-up scheduled at <tone> for <tone>. His insurance ID is <tone>."

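Identified PHI entities: Dr. Emily Carter (name), Michael Reynolds (name), 07/09/1982 (date of birth), Stanford Medical Center (facility), March 15th, 2025 (appointment date), AETN-4455-2391 (insurance ID).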

HIPAA-defined PHI Identifiers

#	Identifier	Example
1	Name	John Doe, Mary Smith
2	Geographic data (smaller than a state)	123 Main St, Springfield, IL 62704
3	Dates related to an individual (excluding year)	Birthdate: 05/14/1982, Discharge Date: 09/22
4	Telephone numbers	(555) 123-4567
5	Fax numbers	(555) 987-6543
6	Email addresses	johndoe@example.com
7	Social Security numbers	123-45-6789
8	Medical record numbers	MRN: 567890123
9	Health plan beneficiary numbers	Medicare ID: A123456789
10	Account numbers	Patient account: 00349876
11	Certificate/license numbers	Driver's license: D1234567
12	Vehicle identifiers and serial numbers	License plate: XYZ-789, VIN: 1HGCM82633A004352
13	Device identifiers and serial numbers	Pacemaker SN: 1029384756
14	Web URLs	www.patientportal.com/johndoe
15	IP address numbers	192.168.1.1
16	Biometric identifiers (incl. voiceprints)	Voiceprint used for patient ID; Fingerprint scan
17	Full-face photos and comparable images	Passport photo, ID card image
18	Any unique identifying code or characteristic	Internal system ID: X5A-KL89-P1
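Some of these identifiers follow a fixed pattern and can be caught with simple rules. The sketch below is a hedged illustration only: the regular expressions are assumptions written to match the example values above (US-style phone and SSN formats), and the function names are hypothetical. Free-form identifiers such as names or addresses need a trained entity recognizer rather than patterns like these.

```python
import re

# Illustrative patterns for a few structured identifiers only.
PATTERNS = {
    "email": re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b"),
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "phone": re.compile(r"\(\d{3}\)\s?\d{3}-\d{4}"),
    "ip": re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b"),
}

def find_identifiers(text):
    """Return (label, matched_text) pairs in order of appearance."""
    hits = []
    for label, pattern in PATTERNS.items():
        for m in pattern.finditer(text):
            hits.append((m.start(), label, m.group()))
    return [(label, value) for _, label, value in sorted(hits)]

def redact(text, placeholder="<tone>"):
    """Replace every pattern match with the placeholder."""
    for pattern in PATTERNS.values():
        text = pattern.sub(placeholder, text)
    return text

sample = ("Call (555) 123-4567 or email johndoe@example.com. "
          "SSN 123-45-6789, IP 192.168.1.1")
found = find_identifiers(sample)
redacted = redact(sample)
```

Here `found` lists the phone number, email address, SSN, and IP address in the order they occur, and `redacted` is the same sentence with each of them replaced by `<tone>`.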

Nijta's Unique Advantage - Voiceprint Protection

Voiceprints are explicitly listed under biometric identifiers and are protected PHI under HIPAA. They can uniquely identify individuals and must be anonymized before data is reused, shared, or stored.

This is a core feature of our platform: we detect and anonymize voiceprints automatically in audio to ensure full HIPAA compliance.

Supported Languages

Language	WER (%)
Spanish	3.0
Italian	4.0
English	4.2
German	4.5
French	7.1

Speech-to-Text

Speech-to-Text, or Automatic Speech Recognition (ASR), converts speech or audio into written text. Our ASR technology is built on OpenAI's Whisper, which is known for strong multilingual speech recognition. We have extended it with in-house innovations, including phonetic timestamps. These markers capture the timing of specific phonetic elements within the audio, enabling more granular analysis and synchronization.

Our ASR component supports multiple languages and handles code-switching, transcribing audio that blends several languages. Built-in automatic language detection means users do not have to specify the language manually, and precise timestamps make it easy to navigate and review audio content.

Benchmarks for Top 10 Supported Languages

Rank	Language	WER (%) on FLEURS
1	Spanish	3.0
2	Italian	4.0
3	English	4.2
4	Portuguese	4.3
5	German	4.5
6	Japanese	5.0
7	Polish	5.6
8	Russian	5.6
9	Dutch	6.1
10	Indonesian	6.4
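For readers unfamiliar with the metric, word error rate (WER) is the number of word substitutions, deletions, and insertions needed to turn the system's transcript into the reference transcript, divided by the number of words in the reference. The sketch below computes it with a standard edit-distance table. It is a minimal illustration: the function name is hypothetical, and any text normalization (casing, punctuation) applied in the benchmark above is not specified in the source and is left out.

```python
def word_error_rate(reference, hypothesis):
    """WER = (substitutions + deletions + insertions) / reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1] / len(ref)

# One substituted word out of six reference words.
wer = word_error_rate("the patient has a follow up",
                      "the patient had a follow up")
```

A WER of 3.0% therefore means roughly three word-level errors per hundred reference words.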

Speaker Diarization

Speaker diarization is the process of identifying and segmenting individual speakers within an audio recording. It plays a crucial role wherever multiple participants are involved, such as meetings, interviews, or call center conversations. By accurately distinguishing between speakers, diarization helps create clear, organized transcripts, improves sentiment analysis, and enhances audio data analytics. It is particularly valuable for compliance, customer experience monitoring, and research, where knowing who said what is essential for accurate analysis and reporting.

Monster is Nijta's new speaker diarization system. This first release offers precise segmentation, multilingual support, and robust performance on noisy data. From medical conversations to customer service calls and teleconferences, the model adapts to varied acoustic conditions and maintains high accuracy where other diarization systems fall short.

Precise speaker diarization

Seamless handling of overlapping speech

Robust performance across varied acoustic environments

Unlimited number of speakers
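To show how diarization output yields a "who said what" transcript, the sketch below combines speaker segments with timestamped words by assigning each word to the speaker whose segment overlaps it most. The data structures, names, and sample values are assumptions made for illustration and do not describe Monster's actual output format.

```python
from dataclasses import dataclass

@dataclass
class Segment:
    speaker: str
    start: float  # seconds
    end: float

@dataclass
class Word:
    text: str
    start: float  # seconds
    end: float

def overlap(a_start, a_end, b_start, b_end):
    """Length in seconds of the intersection of two time intervals."""
    return max(0.0, min(a_end, b_end) - max(a_start, b_start))

def assign_speakers(words, segments):
    """Label each word with the speaker whose segment overlaps it most."""
    labelled = []
    for w in words:
        best_speaker, best_overlap = "unknown", 0.0
        for s in segments:
            ov = overlap(w.start, w.end, s.start, s.end)
            if ov > best_overlap:
                best_speaker, best_overlap = s.speaker, ov
        labelled.append((best_speaker, w.text))
    return labelled

def to_transcript(labelled):
    """Group consecutive words from the same speaker into one line."""
    lines = []
    for speaker, text in labelled:
        if lines and lines[-1][0] == speaker:
            lines[-1][1].append(text)
        else:
            lines.append((speaker, [text]))
    return [f"{spk}: {' '.join(ws)}" for spk, ws in lines]

# Hypothetical diarization and ASR output.
segments = [Segment("A", 0.0, 2.0), Segment("B", 2.0, 4.0)]
words = [Word("hello", 0.0, 0.5), Word("doctor", 0.6, 1.2),
         Word("hi", 2.1, 2.4), Word("there", 2.5, 3.0)]
transcript = to_transcript(assign_speakers(words, segments))
```

With overlapping speech a word can fall inside several segments at once; picking the largest overlap is one simple policy among several.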

Biometric Anonymization

Biometric anonymization is the process of altering or removing unique vocal characteristics to prevent speaker re-identification while preserving the usability of the audio. Since voice carries biometric markers such as pitch, tone, and speech patterns, anonymization techniques ensure that speech data remains valuable for transcription, analytics, and AI training without compromising individual privacy. A pseudo voice with a choice of gender is created from a large random pool of speakers to completely prevent re-identification.

This feature is currently available in English and French. Additional languages can be requested for your particular use case.

Legal guarantees

The effectiveness of Nijta's biometric anonymization solution is legally guaranteed by the French Data Protection Authority (CNIL), based on the following three criteria from Opinion 05/2014 of the Article 29 Working Party on anonymisation techniques. These criteria determine whether anonymized voice data can still be traced back to an individual, which in turn affects compliance with privacy regulations such as GDPR and HIPAA.

Linkability

Linkability refers to the ability to connect anonymized biometric voice data across different datasets. If anonymized voice recordings can still be matched to an individual by comparing them with other available data, they may not meet strict anonymization standards. Effective biometric anonymization ensures that no meaningful links can be established between datasets.

Singling Out

Singling out occurs when an individual can be uniquely identified within a dataset, even without knowing their name. In voice data, if certain speech patterns or biometric markers remain distinguishable, an attacker could isolate a speaker and potentially re-identify them. True anonymization removes or masks these unique traits to prevent this risk.

Inference

Inference is the possibility of deducing sensitive information about an individual from anonymized voice data. For example, if anonymization removes direct identifiers but retains enough characteristics, machine learning models could still infer speaker identity or demographic details. Strong anonymization techniques ensure that no personally revealing attributes can be reconstructed from the processed data.

Voice data passed through Nijta’s biometric anonymization solution cannot be re-identified using any of these methods, ensuring complete safety and compliance.
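As a rough illustration of the linkability criterion, the toy sketch below counts how often an anonymized recording can be matched back to its source by comparing speaker embeddings with cosine similarity. Everything here is an assumption for illustration: the embedding vectors are made-up numbers, the threshold is arbitrary, and the function names are hypothetical. It is not the evaluation procedure used by CNIL or by Nijta.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("embeddings must be non-zero")
    return dot / (norm_a * norm_b)

def linked_fraction(originals, anonymized, threshold=0.7):
    """Fraction of anonymized items whose best-matching original is
    their true source (same index) with similarity >= threshold."""
    linked = 0
    for i, anon in enumerate(anonymized):
        scores = [cosine_similarity(anon, orig) for orig in originals]
        best = max(range(len(scores)), key=scores.__getitem__)
        if best == i and scores[best] >= threshold:
            linked += 1
    return linked / len(anonymized)

# Made-up 3-dimensional "speaker embeddings" for three speakers.
originals = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
weakly_anonymized = [[0.9, 0.1, 0.0], [0.1, 0.9, 0.0], [0.0, 0.1, 0.9]]
strongly_anonymized = [[0.0, 1.0, 0.0], [0.0, 0.0, 1.0], [1.0, 0.0, 0.0]]

weak_score = linked_fraction(originals, weakly_anonymized)
strong_score = linked_fraction(originals, strongly_anonymized)
```

In this toy data every weakly anonymized vector still points at its own source, while none of the strongly anonymized vectors do. A real assessment would use embeddings from a trained speaker verification model and a much larger speaker population.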

Set your data free
Start using Nijta.

Anonymise your first voice recording in minutes.
