S3 Raw Data Integration
Use this integration if you store raw user-level data in S3 and want to send it to Magify.
Before You Start
Before setting up the integration, make sure you have:
- access to your S3 bucket with source data
- the sync script provided by the Magify team
- access to the Magify S3 bucket
To receive the script and destination access, contact Magify Support.
How the Integration Works
This integration transfers raw user-level data from your S3 storage to a Magify S3 bucket on a regular schedule using a script.
The script runs on your side in a stateful environment (for example, a virtual machine or a container with persistent storage) on a schedule (for example, via cron) and is not a one-time import.
On each run, the script:
- connects to your S3 using access and secret keys
- reads raw user-level data
- applies filtering rules (for example, by app)
- converts data into .csv.gz format
- uploads files to the Magify S3 bucket
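The steps above can be sketched in Python. This is a minimal local sketch, not the actual Magify script: local directories stand in for the two S3 buckets, and the `app_id` column name and filter value are assumptions.

```python
import csv
import gzip
from pathlib import Path

def sync_once(source_dir: str, dest_dir: str, app_id: str) -> list[str]:
    """One sync run: read raw CSV files, filter rows by app_id,
    and write .csv.gz files to the destination directory."""
    uploaded = []
    for src in sorted(Path(source_dir).glob("*.csv")):
        with src.open(newline="") as f:
            rows = [r for r in csv.DictReader(f) if r.get("app_id") == app_id]
        if not rows:
            continue  # nothing matched the filter in this file
        out = Path(dest_dir) / (src.stem + ".csv.gz")
        # gzip.open in text mode gives single compression: .csv.gz
        with gzip.open(out, "wt", newline="") as gz:
            writer = csv.DictWriter(gz, fieldnames=list(rows[0].keys()))
            writer.writeheader()
            writer.writerows(rows)
        uploaded.append(out.name)
    return uploaded
```

In the real script, the read and upload steps go through S3 instead of the local filesystem, but the filter-convert-upload flow is the same.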
Magify provides access to its S3 bucket, which is used as the destination in the script configuration.
Requirements
To run the integration, you need a runtime environment for the script and access to both S3 storages.
Runtime Environment
The script must run in a stateful environment:
- virtual machine
- server
- container with a mounted volume/PVC
If you use Kubernetes, you can run the script in a container. In this case, attach a persistent volume (PVC) to store the state file or a database (for example, SQLite).
Task Scheduling
The script should be run on a schedule (for example, via cron).
Minimum frequency:
- at least once per day (for example, at night or the next morning)
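For example, a crontab entry for a nightly run could look like this (the script path, interpreter, and log location are assumptions):

```cron
# m h dom mon dow  command
0 3 * * * /usr/bin/python3 /opt/magify/sync.py --config /opt/magify/config.json >> /var/log/magify-sync.log 2>&1
```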
S3 Access
You need access to:
- your source S3 bucket with raw data
- the Magify S3 bucket (provided with the script)
Data Format
Source (your S3)
Data can be stored in:
- Parquet
- CSV
- other tabular formats
Data must be user-level.
Typical data:
- MMP events (AppsFlyer Data Locker, Adjust Raw Data, and others) that include media source breakdown and attribution fields
The script reads source files directly and can filter data by fields (for example, app_id, dates, and others).
Destination (Magify S3)
Format:
- .csv.gz (single compression); double compression such as .gz.csv.gz is not recommended
Structure:
- a table with columns matching a raw MMP export (for example, AppsFlyer CSV)
Data must remain user-level without pre-aggregation.
Magify loads this data into its analytics system and uses it to calculate metrics by media source.
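One way to check that an output file is single-compressed is to verify that the file itself is gzip but its decompressed payload is plain CSV, not another gzip stream. A stdlib sketch (column names are placeholders):

```python
import csv
import gzip

def write_csv_gz(path: str, rows: list[dict], fieldnames: list[str]) -> None:
    # gzip.open in text mode compresses exactly once: csv -> .csv.gz
    with gzip.open(path, "wt", newline="") as gz:
        writer = csv.DictWriter(gz, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(rows)

def is_single_gzip(path: str) -> bool:
    # A correct .csv.gz starts with the gzip magic bytes, and its
    # decompressed payload does NOT start with them again.
    with open(path, "rb") as f:
        if f.read(2) != b"\x1f\x8b":
            return False
    with gzip.open(path, "rb") as gz:
        return gz.read(2) != b"\x1f\x8b"
```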
Script Configuration
To get the script, contact Magify Support.
Filtering
"filter": {
"column": "...",
"mode": "equals",
"value": "..."
}
Used to:
- select data for a specific app_id
- limit the export period
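The semantics of such a filter rule can be illustrated in a few lines of Python. Only the "equals" mode shown in the snippet is implemented; the column and value below are placeholders.

```python
def matches(row: dict, flt: dict) -> bool:
    """Apply one filter rule from the config to a single row."""
    if flt["mode"] == "equals":
        return row.get(flt["column"]) == flt["value"]
    raise ValueError(f"unsupported filter mode: {flt['mode']}")

flt = {"column": "app_id", "mode": "equals", "value": "com.example.app"}
rows = [
    {"app_id": "com.example.app", "event": "install"},
    {"app_id": "other.app", "event": "install"},
]
filtered = [r for r in rows if matches(r, flt)]
```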
Source S3
"source": {
"bucket": "...",
"access_key_id": "...",
"secret_access_key": "...",
"region": "us-east-1",
"endpoint_url": null
}
Parameters:
- bucket — S3 bucket name
- access_key_id / secret_access_key — credentials with read access
- region — bucket region
- endpoint_url — endpoint (if applicable)
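Assuming the script uses boto3 (an assumption; the actual client library may differ), the config block maps onto client arguments like this. Note the endpoint_url handling: null means "use the default AWS endpoint".

```python
def s3_client_kwargs(cfg: dict) -> dict:
    """Translate a "source" or "destination" config block into keyword
    arguments for boto3.client("s3")."""
    kwargs = {
        "aws_access_key_id": cfg["access_key_id"],
        "aws_secret_access_key": cfg["secret_access_key"],
        "region_name": cfg["region"],
    }
    # endpoint_url is only passed for S3-compatible storage
    if cfg.get("endpoint_url"):
        kwargs["endpoint_url"] = cfg["endpoint_url"]
    return kwargs
```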
Destination S3
Configured the same way as source, using parameters provided by Magify.
Workers
"max_workers": 8
Number of workers used for data processing. Can be increased depending on available resources.
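A typical way to apply max_workers is a thread pool over the per-file work; a sketch, where process_one stands in for the real download/filter/upload step:

```python
from concurrent.futures import ThreadPoolExecutor

def process_all(keys: list[str], max_workers: int = 8) -> list[str]:
    """Process S3 object keys concurrently with a bounded thread pool."""
    def process_one(key: str) -> str:
        return key + ".csv.gz"  # placeholder for the real per-file work
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map preserves input order in its results
        return list(pool.map(process_one, keys))
```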
last_sync_at
"last_sync_at": "..."
Stores the timestamp of the last synchronization and is used to track sync progress.
Sync Frequency
The schedule is defined on your side (for example, via cron) and depends on your data freshness requirements.
Recommended logic:
- daily runs — for previous-day data
- weekly runs — for periodic updates
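The date windows implied by this logic can be computed as follows (inclusive start/end dates; a sketch, not part of the provided script):

```python
from datetime import date, timedelta

def previous_day_window(today: date) -> tuple[date, date]:
    """Daily run: export only yesterday's data."""
    prev = today - timedelta(days=1)
    return prev, prev

def previous_week_window(today: date) -> tuple[date, date]:
    """Weekly run: export the last 7 full days up to yesterday."""
    return today - timedelta(days=7), today - timedelta(days=1)
```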
Testing
Before enabling regular data export, test the setup.
Test Run
- use a test S3 bucket (if available) or the Magify bucket
- run the script and verify that the pipeline completes without errors
- run a short test using a limited period (for example, a few days of historical data)
Data Validation
Magify checks:
- .csv.gz file format
- correctness of the CSV structure
You can also validate data locally (for example, using pandas).
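The same checks can be done with the standard library alone if pandas is not available. A sketch; the expected column set below is an assumption and should be replaced with your actual MMP export columns:

```python
import csv
import gzip

# Placeholder columns; use the real raw MMP export schema here.
EXPECTED_COLUMNS = {"app_id", "event_time", "event_name", "media_source"}

def validate_csv_gz(path: str) -> list[str]:
    """Return a list of problems found in an export file (empty = OK)."""
    problems = []
    try:
        with gzip.open(path, "rt", newline="") as gz:
            reader = csv.reader(gz)
            header = next(reader, None)
            if header is None:
                return ["file is empty"]
            missing = EXPECTED_COLUMNS - set(header)
            if missing:
                problems.append(f"missing columns: {sorted(missing)}")
            for i, row in enumerate(reader, start=2):
                if len(row) != len(header):
                    problems.append(f"line {i}: expected {len(header)} fields, got {len(row)}")
    except OSError as exc:  # covers not-gzip and unreadable files
        problems.append(f"cannot read gzip: {exc}")
    return problems
```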
Cleanup
After testing, you can request cleanup of data in the Magify S3 bucket.
Initial Data Load
For the initial load, send data starting from the point when the Magify SDK is already used in your app and a correct UserId is passed.
For the first runs, limit the data to a few days.
After validation, you can:
- backfill historical data
- and/or switch to regular daily exports as the app version that passes the correct UserId is rolled out