Dimtse — Privacy Notice

Effective date: _to be set at launch_ · Version: v1 (draft)

Dimtse ("Dimtse", "the platform", "we", "us") is a contributor platform for building
open speech and language resources for East African languages. Dimtse is operated by
Samic Ventures LLC, a limited liability company organized in the State of Wyoming,
United States ("the operator"). This notice explains what we collect from contributors,
why, who it is shared with, and the rights you have over it. It is written to be read,
not to be skimmed past. If anything here is unclear, contact us before you contribute.

This notice covers contributors (people who record their voice, rate audio, correct
transcripts, or submit text on Dimtse). It does not cover end-users of products
built by the operator (for example the Dewul phone service), which have their own
notices.

1. What we collect

When you contribute, we collect only what the work requires:

| Data | Why we collect it |
|---|---|
| Voice recordings | The core contribution: clips of you reading prompts, used to build open speech datasets and to train speech (ASR/TTS) models. |
| Ratings | Your 1–5 scores of audio naturalness / intelligibility, used to evaluate model quality (MOS panels). |
| Transcript corrections / text | Your fixes to machine transcripts, and any text you author, used to create gold-standard training pairs. |
| Display name | Shown in your contributor profile and, if you opt in, in dataset attribution. You may use a pseudonym. |
| Language, country (optional) | To balance datasets across dialects and to report fairness metrics. |
| Payout method + payout handle | E.g. "telebirr" plus the phone number or account the money goes to, so we can pay you. This is the most sensitive item we hold (see §5). |
| Basic operational logs | Timestamps and counts of accepted items, to compute what we owe you and to prevent abuse. |

We do not collect government IDs, precise location, contacts, or any data beyond
the table above. We do not use third-party advertising trackers.

2. The two consent scopes

Before you record or rate anything, Dimtse shows you a consent screen. Your agreement
covers two distinct, clearly stated uses, and recording does not begin until you
agree:

1. Open dataset release under CC-BY-SA-4.0. Your accepted recordings, ratings, and
   text corrections may be published as part of an open dataset under the Creative
   Commons Attribution-ShareAlike 4.0 license. This means others may reuse and build on
   the data, including commercially, as long as they credit the source and share
   derivatives under the same license.
2. Use for AI model training. The same accepted contributions may be used to train,
   fine-tune, and evaluate speech and language models (ASR, TTS, NLU), including models
   that the operator may license openly and commercially (for example, the voice
   used by the Dewul service).

Both scopes are presented together and recorded with a version stamp so we always know
exactly what you agreed to. Agreeing to contribute means agreeing to both; if you are
not comfortable with either, please do not contribute.
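For illustration only, a version-stamped consent record of the kind described above could be sketched as follows. The `ConsentRecord` shape, the scope names, and the `record_consent` helper are hypothetical and are not taken from Dimtse's actual system; the sketch only shows that both scopes are accepted together and that the notice version is stored with the agreement.

```python
from dataclasses import dataclass
from datetime import datetime, timezone

# Hypothetical scope names for the two uses described in this section.
REQUIRED_SCOPES = frozenset({"open_dataset_cc_by_sa_4", "model_training"})


@dataclass(frozen=True)
class ConsentRecord:
    contributor_key: str   # internal key, not a name or phone number
    notice_version: str    # version of the notice that was shown, e.g. "v1"
    scopes: frozenset      # the scopes the contributor agreed to
    agreed_at: datetime    # when the agreement was recorded (UTC)


def record_consent(contributor_key: str, notice_version: str, scopes: set) -> ConsentRecord:
    """Create a consent record only if both scopes were accepted together."""
    if frozenset(scopes) != REQUIRED_SCOPES:
        raise ValueError("both consent scopes must be accepted before contributing")
    return ConsentRecord(
        contributor_key, notice_version, REQUIRED_SCOPES, datetime.now(timezone.utc)
    )


consent = record_consent("c-001", "v1", {"open_dataset_cc_by_sa_4", "model_training"})
print(consent.notice_version, sorted(consent.scopes))
```

Because the record is immutable and carries its own version, a later change to the notice would not alter what an earlier record says was agreed.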

3. Fair pay

Contribution on Dimtse is paid work, not a donation.

  • You are paid a stated rate per accepted item (per voice recording, per rating),
    shown to you before you start.
  • Rates are set to be fair for the local cost of living: benchmarked to sit above
    the applicable local minimum wage and in line with ethical research-participant norms
    (Masakhane principles), not driven to the cheapest possible number.
  • The rate that applied when you contributed is the rate you are paid; we do not
    retroactively lower it.
  • We compute what you are owed from your accepted-item counts and pay out via the method
    and handle you provide.
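As an illustration only, the payout arithmetic described above can be sketched as follows. The `AcceptedItem` record, the item kinds, and the rates are hypothetical and are not Dimtse's actual figures; the point is that each item carries the rate shown when it was contributed, so a later rate change does not alter what is owed.

```python
from dataclasses import dataclass
from decimal import Decimal


@dataclass(frozen=True)
class AcceptedItem:
    """One accepted contribution (hypothetical record shape)."""
    kind: str              # e.g. "recording" or "rating"
    rate_at_time: Decimal  # rate shown to the contributor when the item was submitted


def amount_owed(items: list[AcceptedItem]) -> Decimal:
    """Sum the rate that applied to each accepted item at contribution time."""
    return sum((item.rate_at_time for item in items), Decimal("0"))


# Illustrative numbers only: two recordings at 2.00 and three ratings at 0.50.
items = [AcceptedItem("recording", Decimal("2.00"))] * 2 + [
    AcceptedItem("rating", Decimal("0.50"))
] * 3
print(amount_owed(items))  # 5.50
```

`Decimal` is used instead of floating point so that sums of currency amounts stay exact.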

4. Who we share data with

  • Open dataset recipients (the public). Only the **de-identified contribution
    content** you consented to release (audio, ratings, corrected text) plus, if you opted
    in, your chosen attribution name. Never your payout handle, phone number, or any
    contact detail.
  • The operator's model-training pipeline (internal), under the consent scopes above.
  • Payment processors / mobile-money providers, only the minimum needed to pay you.
  • We do not sell your personal data, and we do not share it with advertisers.

5. How we protect your payout handle and contact details

Your payout handle is the one piece of directly identifying data we must keep, and
we treat it accordingly:

  • It is stored encrypted at rest and is accessible only to the payout process.
  • It is never included in any dataset, model, export, or public artifact.
  • It is never released, even in aggregate.
  • Access is limited to operating the payout, and it is deleted on request (see §7).

6. De-identification

Before any contribution leaves the platform for a dataset or model, we remove direct
identifiers from text:

  • Phone numbers, email addresses, and long digit runs (account/card/ID numbers) are
    stripped and replaced with placeholders.
  • Obvious self-introduced names in transcripts are redacted.
  • Recordings are referenced by an internal key, never by your name or number.

De-identification is applied in code, automatically, as a gate the data must pass
through, not as a manual afterthought. (For data originating from the operator's Dewul
phone service, additional, stricter rules apply: raw call audio and personal transcripts
are never publicly released. Only de-identified, derived data is released, and only with
caller consent. See `docs/DATA_GOVERNANCE.md`.)
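A minimal sketch of what such a text gate could look like, for illustration only. The regular expressions, the placeholder tokens, and the eight-digit threshold for a "long digit run" are assumptions, not Dimtse's actual rules. Name redaction is omitted because it cannot be done reliably with patterns alone.

```python
import re

# Assumed patterns; a production gate would need locale-aware rules.
EMAIL = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")
# Eight or more digits, optionally separated by single spaces, dots, dashes, or
# parentheses. Phone numbers and account/card/ID numbers are hard to tell apart
# by shape, so this sketch uses one placeholder for both.
NUMBER = re.compile(r"\+?\d(?:[\s().-]?\d){7,}")


def deidentify(text: str) -> str:
    """Replace email addresses and long number-like runs with placeholders."""
    text = EMAIL.sub("[EMAIL]", text)
    text = NUMBER.sub("[NUMBER]", text)
    return text


def passes_gate(text: str) -> bool:
    """True only if no pattern still matches, i.e. the text may be exported."""
    return not (EMAIL.search(text) or NUMBER.search(text))


sample = "Call me on +251 911 234 567 or write to abebe@example.com"
clean = deidentify(sample)
print(clean)  # Call me on [NUMBER] or write to [EMAIL]
```

Keeping `passes_gate` separate from `deidentify` lets an export step refuse any text that still matches a pattern, rather than trusting that the substitution was applied.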

7. Your rights: withdrawal and deletion

You stay in control of your contributions:

  • Withdraw at any time. You may stop contributing whenever you like; this does not
    affect pay already earned for accepted items.
  • Right to deletion. You may ask us to delete your contributions and your personal
    details (display name, payout handle, country). On a verified request we:
      • remove your recordings, ratings, and text from our active store, and
      • purge them from all derived datasets and model-training inputs at the next build,
        so withdrawn contributions stop flowing into future releases.
  • Important limits, stated honestly:
      • Open datasets already published under CC-BY-SA-4.0 cannot be recalled from people
        who already downloaded them; that is inherent to an open license. We remove your
        data from the *next* release and our copies; we cannot un-distribute past copies.
      • A model that was already trained on your data before withdrawal cannot have that
        single contribution surgically removed; we exclude your data from subsequent
        training and releases.
  • Access / correction. You may ask what we hold about you and have it corrected.

To exercise any of these, contact us (§9). We aim to respond within 30 days.

8. Retention

  • Payout handle / contact details: kept only as long as needed to pay you and meet
    legal/accounting obligations, then deleted.
  • Contributions: kept while the project is active or until you withdraw, subject to
    the open-license limit in §7.
  • Operational logs: kept for a limited period for accounting and abuse prevention.

9. Contact and governing law

  • Operator: Samic Ventures LLC, Wyoming, USA.
  • Contact: privacy@dimtse.ai (placeholder — set the real address at launch).
  • Governing law: this notice is governed by the laws of the State of Wyoming, USA,
    without regard to its conflict-of-laws rules. Where mandatory local data-protection law
    applies to you as a contributor, we honor the stronger protection.

10. African data-ethics commitments

Dimtse exists to build language resources with East African communities, not to
extract from them. We commit to:

  • Fair wage for contribution (§3).
  • Attribution option: you may choose to be credited as a contributor in released
    datasets.
  • Community benefit: the resulting datasets are released openly (CC-BY-SA-4.0) so
    the communities whose languages they represent can use and build on them, and
    performance is reported across dialects and genders rather than hidden.
  • No parachute research: local co-investigators share authorship, budget, and
    governance of the resulting resources.

11. Changes

If we change this notice materially, we will update the version and effective date and
notify active contributors. Your existing consent record always reflects the version you
agreed to.

Voice for African languages — built by Samic Ventures LLC.
© 2026 Samic Ventures LLC. Contributor data is consented, fairly paid, and de-identified.