S3 Raw Data Integration

Use this integration if you store raw user-level data in S3 and want to send it to Magify.

Before You Start

Before setting up the integration, make sure you have:

  • access to your S3 bucket with source data
  • the sync script provided by the Magify team
  • access to the Magify S3 bucket

To receive the script and destination access, contact Magify Support.


How the Integration Works

This integration transfers raw user-level data from your S3 storage to a Magify S3 bucket on a regular schedule using a script.

The script runs on your side in a stateful environment (for example, a virtual machine or a container with persistent storage) on a schedule (for example, via cron) and is not a one-time import.

On each run, the script:

  • connects to your S3 using access and secret keys
  • reads raw user-level data
  • applies filtering rules (for example, by app)
  • converts data into .csv.gz format
  • uploads files to the Magify S3 bucket

Magify provides access to its S3 bucket, which is used as the destination in the script configuration.
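The per-run steps above can be sketched in Python. The helper below separates the pure transformation (filtering and .csv.gz conversion) from the S3 calls; bucket names, keys, and the filter column are placeholders, not the actual script's configuration.

```python
import gzip
import io

import pandas as pd

def filter_rows(df, column, value):
    """Keep only rows where `column` equals `value` (mirrors the "filter" block)."""
    return df[df[column] == value]

def to_csv_gz(df):
    """Serialize a DataFrame to singly gzip-compressed CSV bytes (.csv.gz)."""
    return gzip.compress(df.to_csv(index=False).encode("utf-8"))

def sync_object(s3, src_bucket, key, dst_bucket, filt):
    """Transfer one object: download, filter, convert, upload.

    `s3` is a boto3 S3 client created with your access and secret keys.
    """
    body = s3.get_object(Bucket=src_bucket, Key=key)["Body"].read()
    df = pd.read_parquet(io.BytesIO(body))  # or pd.read_csv for CSV sources
    df = filter_rows(df, filt["column"], filt["value"])
    out_key = key.rsplit(".", 1)[0] + ".csv.gz"
    s3.put_object(Bucket=dst_bucket, Key=out_key, Body=to_csv_gz(df))
```

This is a sketch of one object's round trip, not the provided script itself; the real script also keeps sync state between runs.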


Requirements

To run the integration, you need a runtime environment for the script and access to both S3 storages.

Runtime Environment

The script must run in a stateful environment:

  • virtual machine
  • server
  • container with a mounted volume/PVC

If you use Kubernetes, you can run the script in a container. In this case, attach a persistent volume (PVC) to store the state file or a database (for example, SQLite).


Task Scheduling

The script should be run on a schedule (for example, via cron).

Minimum frequency:

  • at least once per day (for example, at night or the next morning)
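For example, a crontab entry for a daily run might look like this (the interpreter, script path, and log path are placeholders):

```
# Run the sync script daily at 03:00, after the previous day's data is complete
0 3 * * * /usr/bin/python3 /opt/magify/s3_sync.py >> /var/log/magify_sync.log 2>&1
```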

S3 Access

You need access to:

  • your source S3 bucket with raw data
  • the Magify S3 bucket (provided with the script)

Data Format

Source (your S3)

Data can be stored in:

  • Parquet
  • CSV
  • other tabular formats

Data must be user-level.

Typical data:

  • MMP events (AppsFlyer Data Locker, Adjust Raw Data, and others) that include media source breakdown and attribution fields

The script reads source files directly and can filter data by fields (for example, app_id, dates, and others).
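Such field-level filtering can be sketched with pandas; the column names app_id and event_time below are illustrative, not a fixed schema.

```python
import pandas as pd

def filter_events(df, app_id, start, end):
    """Keep one app's rows within a half-open [start, end) date range.

    Column names are illustrative; ISO-8601 date strings compare correctly
    as plain strings, so no datetime parsing is needed here.
    """
    mask = (
        (df["app_id"] == app_id)
        & (df["event_time"] >= start)
        & (df["event_time"] < end)
    )
    return df[mask]
```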


Destination (Magify S3)

Format:

  • .csv.gz (a CSV file compressed with gzip exactly once)
  • doubly compressed files (for example, .gz.csv.gz) are not recommended

Structure:

  • a table with columns matching a raw MMP export (for example, AppsFlyer CSV)

Data must remain user-level without pre-aggregation.

Magify loads this data into its analytics system and calculates metrics by media source.
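Producing a correctly formed .csv.gz file takes a single gzip pass; a minimal round trip with pandas (the rows and column names are illustrative):

```python
import os
import tempfile

import pandas as pd

# Illustrative user-level rows; a real export keeps the raw MMP columns
# (media source, attribution fields, and so on) without aggregation.
df = pd.DataFrame({
    "appsflyer_id": ["id-1", "id-2"],
    "media_source": ["organic", "meta"],
    "event_name": ["purchase", "install"],
})

# One gzip pass produces a valid .csv.gz file
path = os.path.join(tempfile.mkdtemp(), "events.csv.gz")
df.to_csv(path, index=False, compression="gzip")

# pandas infers the compression from the extension when reading back
check = pd.read_csv(path)
```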


Script Configuration

To get the script, contact Magify Support.

Filtering

"filter": {
  "column": "...",
  "mode": "equals",
  "value": "..."
}

Used to:

  • select data for a specific app_id
  • limit the export period
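For example, to keep only one app's rows (the value here is a placeholder):

```json
"filter": {
  "column": "app_id",
  "mode": "equals",
  "value": "com.example.game"
}
```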

Source S3

"source": {
  "bucket": "...",
  "access_key_id": "...",
  "secret_access_key": "...",
  "region": "us-east-1",
  "endpoint_url": null
}

Parameters:

  • bucket — S3 bucket name
  • access_key_id / secret_access_key — credentials with read access
  • region — bucket region
  • endpoint_url — endpoint (if applicable)

Destination S3

Configured the same way as source, using parameters provided by Magify.


Workers

"max_workers": 8

Number of workers used for data processing. Can be increased depending on available resources.
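A worker pool of this kind can be sketched with Python's concurrent.futures; process_key below is a stand-in for the real per-file work (download, filter, convert, upload).

```python
from concurrent.futures import ThreadPoolExecutor

MAX_WORKERS = 8  # mirrors "max_workers" in the script configuration

def process_key(key):
    # Stand-in for the real per-file work; here it just transforms the key
    return key.upper()

keys = ["2024/01/01/part-0.parquet", "2024/01/02/part-0.parquet"]
with ThreadPoolExecutor(max_workers=MAX_WORKERS) as pool:
    # Files are processed concurrently, up to MAX_WORKERS at a time
    results = list(pool.map(process_key, keys))
```

Threads suit S3 transfers because the work is I/O-bound; raising the value mainly trades memory and network bandwidth for throughput.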


last_sync_at

"last_sync_at": "..."

Stores the timestamp of the last synchronization; the script uses it to determine which data is new since the previous run.
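This is why the runtime environment must be stateful: the value has to survive between runs. A minimal sketch of reading and updating such a state file (the file name and JSON shape are assumptions, not the actual script's format):

```python
import json
import os
from datetime import datetime, timezone

STATE_PATH = "state.json"  # must live on persistent storage (volume/PVC)

def load_last_sync_at(path=STATE_PATH):
    """Return the previous run's timestamp, or None on the first run."""
    if not os.path.exists(path):
        return None
    with open(path) as f:
        return json.load(f).get("last_sync_at")

def save_last_sync_at(path=STATE_PATH):
    """Record the current run so the next run only picks up newer data."""
    with open(path, "w") as f:
        json.dump({"last_sync_at": datetime.now(timezone.utc).isoformat()}, f)
```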


Sync Frequency

The schedule is defined on your side (for example, via cron) and depends on your data freshness requirements.

Recommended logic:

  • daily runs — for previous-day data
  • weekly runs — for periodic updates

Testing

Before enabling regular data export, test the setup.

Test Run

  • use a test S3 bucket (if available) or the Magify bucket
  • run a short test over a limited period (for example, a few days of historical data)
  • verify that the pipeline completes without errors

Data Validation

Magify checks:

  • .csv.gz file format
  • correctness of CSV structure

You can also validate data locally (for example, using pandas).
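A minimal local check with pandas might look like this; the expected column set is hypothetical and should be adjusted to your actual export schema.

```python
import pandas as pd

# Hypothetical columns of a raw MMP export; adjust to your real schema
EXPECTED = {"appsflyer_id", "media_source", "event_name", "event_time"}

def validate(path):
    """Basic local check of a .csv.gz file before sending it to Magify."""
    df = pd.read_csv(path, compression="gzip")  # fails fast on bad gzip/CSV
    missing = EXPECTED - set(df.columns)
    assert not missing, f"missing columns: {missing}"
    assert len(df) > 0, "file is empty"
    return df
```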


Cleanup

After testing, you can request cleanup of data in the Magify S3 bucket.


Initial Data Load

For the initial load, send data starting from the point at which the Magify SDK is integrated in your app and a correct UserId is being passed.

For the first runs, limit the data to a few days.

After validation, you can:

  • backfill historical data
  • and/or switch to regular daily exports as the app version with the correct UserId is rolled out

Related articles

Smadex

Verve

Chartboost

MobileFuse

TikTok

Yandex