Data Connection

GCP Connector (Google Cloud Storage)

The GCP Connector enables Replenit to ingest data directly from Google Cloud Storage (GCS). Data is accessed securely via a service account and processed within Replenit's ingestion layer.

The ingestion layer parses, normalizes, and unifies data for downstream processing.

Data Sources overview showing Google Cloud Storage connections for Customer, Product, and Order entities

How It Works

Supported Ingestion Modes

Historical Ingestion

  • One-time full data load
  • Used to initialize datasets

Ongoing Ingestion

  • Scheduled ingestion (e.g. daily, hourly)
  • Incremental updates
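Incremental ingestion typically works off a watermark: only records updated since the last successful sync are loaded. A minimal sketch of that selection logic (the field names and watermark handling here are illustrative, not Replenit's internal implementation):

```python
from datetime import datetime, timezone

def incremental_batch(records, last_sync):
    """Return only records updated since the previous sync (the watermark)."""
    return [r for r in records
            if datetime.fromisoformat(r["updated_at"]) > last_sync]

records = [
    {"customer_id": "C1", "updated_at": "2025-01-01T09:00:00+00:00"},
    {"customer_id": "C2", "updated_at": "2025-01-02T11:30:00+00:00"},
]
watermark = datetime(2025, 1, 1, 12, 0, tzinfo=timezone.utc)
print([r["customer_id"] for r in incremental_batch(records, watermark)])  # ['C2']
```

A historical load is simply the same flow with the watermark set to the beginning of time, so every record qualifies.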

Requirements

Data Entities

Replenit requires three core datasets:

Entity     Description
Customer   User-level data
Order      Transaction data
Product    Catalog data

Step 1: Prepare Data

Datasets should represent the full customer lifecycle and be linkable via identifiers.

Data Format

Preferred: Parquet

Replenit recommends the Parquet format:

  • Efficient for large-scale data
  • Schema consistency
  • Faster ingestion

Supported Formats

  • JSON (including nested payloads)
  • CSV
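The practical difference between the supported formats is that JSON preserves types and nesting while CSV flattens everything to text rows. A quick stdlib sketch of the same order record in both serializations (field names here follow the logical fields described later in this guide):

```python
import csv
import io
import json

order = {"order_id": "12345", "customer_id": 3253833,
         "order_date": "2025-01-01T10:00:00Z", "total_amount": 120.0}

# JSON: preserves numeric types and supports nested payloads.
json_payload = json.dumps(order)

# CSV: flat rows only; values become text and must be re-cast on ingestion.
buf = io.StringIO()
writer = csv.DictWriter(buf, fieldnames=order.keys())
writer.writeheader()
writer.writerow(order)
csv_payload = buf.getvalue()

print(json.loads(json_payload)["total_amount"])  # 120.0
```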

Data Flexibility

Replenit supports any data structure:

  • Flat tables or nested JSON
  • No strict limit on fields
  • Additional attributes are allowed

Replenit performs:

Parsing → Mapping → Normalization → Enrichment

ℹ️ Data can be API-aligned or provided as raw batch data. Replenit handles ETL.
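As an illustration of the normalization step, nested JSON payloads can be flattened into dot-separated columns before mapping. This is a generic sketch, not Replenit's actual pipeline code:

```python
def flatten(obj, prefix=""):
    """Recursively flatten a nested dict into dot-separated keys."""
    flat = {}
    for key, value in obj.items():
        path = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            flat.update(flatten(value, path))
        else:
            flat[path] = value
    return flat

payload = {"identifiers": {"userId": 3253833},
           "transaction": {"orderId": "12345", "currency": "EUR"}}
print(flatten(payload))
# {'identifiers.userId': 3253833, 'transaction.orderId': '12345', 'transaction.currency': 'EUR'}
```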

Data Structure

gs://your-bucket/customers/
gs://your-bucket/orders/
gs://your-bucket/products/

Suggested Dataset Structure

Customer Dataset (Preferred: Parquet)

Example file:

customers_date=YYYYMMDD.parquet
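Daily exports following the `customers_date=YYYYMMDD.parquet` pattern can be named programmatically; a small helper (the function name is our own, only the file-name pattern comes from this guide):

```python
from datetime import date

def customer_file_name(day: date) -> str:
    """Build a daily customer export name in the customers_date=YYYYMMDD pattern."""
    return f"customers_date={day.strftime('%Y%m%d')}.parquet"

print(customer_file_name(date(2025, 1, 31)))  # customers_date=20250131.parquet
```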

Required Fields

  • customer_id

Recommended Fields

  • email
  • created_at
  • updated_at
  • country
  • city
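Before uploading, it can be worth checking customer records against these field lists. A minimal validation sketch (the helper and its return shape are illustrative, not part of Replenit's API):

```python
REQUIRED = {"customer_id"}
RECOMMENDED = {"email", "created_at", "updated_at", "country", "city"}

def check_customer(record):
    """Return (missing required fields, missing recommended fields) for one record."""
    keys = set(record)
    return sorted(REQUIRED - keys), sorted(RECOMMENDED - keys)

record = {"customer_id": "C1", "email": "user@example.com", "country": "DE"}
print(check_customer(record))  # ([], ['city', 'created_at', 'updated_at'])
```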

Order Dataset (JSON / CSV / Parquet)

Example column:

transaction_data_json_payload

Example Structure

{
  "identifiers": {
    "userId": 3253833,
    "email": "user@example.com"
  },
  "transaction": {
    "orderId": "12345",
    "orderDate": "2025-01-01T10:00:00Z",
    "totalAmount": 120,
    "currency": "EUR"
  },
  "products": [
    {
      "productId": "SKU-001",
      "price": 60,
      "quantity": 2
    }
  ]
}

Required Logical Fields

Field        Example Path
order_id     transaction.orderId
customer_id  identifiers.userId
order_date   transaction.orderDate

Recommended Fields

Field         Example Path
total_amount  transaction.totalAmount
currency      transaction.currency
product_id    products[].productId
quantity      products[].quantity
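Putting the path tables together, the logical order fields can be pulled out of the example payload with a small dotted-path walker (a generic sketch; the `get_path` helper is ours, not a Replenit function):

```python
import json

def get_path(obj, dotted):
    """Walk a dotted path through nested dicts, e.g. 'transaction.orderId'."""
    for part in dotted.split("."):
        obj = obj[part]
    return obj

payload = json.loads("""{
  "identifiers": {"userId": 3253833},
  "transaction": {"orderId": "12345", "orderDate": "2025-01-01T10:00:00Z",
                  "totalAmount": 120, "currency": "EUR"},
  "products": [{"productId": "SKU-001", "price": 60, "quantity": 2}]
}""")

order = {
    "order_id": get_path(payload, "transaction.orderId"),
    "customer_id": get_path(payload, "identifiers.userId"),
    "order_date": get_path(payload, "transaction.orderDate"),
    "product_ids": [p["productId"] for p in payload["products"]],
}
print(order["order_id"], order["product_ids"])  # 12345 ['SKU-001']
```

Array paths such as `products[].productId` fan out to one value per list element, which is why they are handled with a comprehension rather than a single dotted lookup.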

Product Dataset (JSON / CSV / Parquet)

Example column:

product_data_json_payload

Example Structure

{
  "productId": "UMT-U180065",
  "taxonomy": ["Bath", "Care"],
  "brand": "BrandX",
  "price": 49.99,
  "currency": "EUR"
}

Required Logical Fields

Field       Example Path
product_id  productId

Recommended Fields

Field     Example Path
category  taxonomy[]
brand     brand
price     price
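Mapping the example product payload onto these logical fields looks like this in outline (joining the taxonomy levels with `/` is our illustrative choice for the category value, not a Replenit requirement):

```python
product = {"productId": "UMT-U180065", "taxonomy": ["Bath", "Care"],
           "brand": "BrandX", "price": 49.99, "currency": "EUR"}

def product_fields(p):
    """Map a raw product payload onto the logical fields above."""
    return {
        "product_id": p["productId"],                  # required
        "category": "/".join(p.get("taxonomy", [])),   # recommended
        "brand": p.get("brand"),
        "price": p.get("price"),
    }

print(product_fields(product)["category"])  # Bath/Care
```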

Data Relationships

Customer (customer_id / identifiers.userId)
Order (customer_id + product_id)
Product (product_id)

Orders link the other two datasets: each order references a customer_id that must resolve in the Customer dataset and product_ids that must resolve in the Product dataset.

Key Requirements

  • customer_id must be consistent across datasets
  • product_id must match between orders and products
  • Timestamp format should be ISO 8601
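A quick referential-integrity check along these lines can catch ID mismatches before ingestion (the dataset values here are made up for illustration):

```python
# IDs present in the Customer and Product datasets.
customers = {"3253833"}
products = {"SKU-001", "SKU-002"}

orders = [
    {"order_id": "12345", "customer_id": "3253833", "product_id": "SKU-001"},
    {"order_id": "12346", "customer_id": "9999999", "product_id": "SKU-003"},
]

# Orders whose IDs do not resolve against the other two datasets.
bad = [o["order_id"] for o in orders
       if o["customer_id"] not in customers or o["product_id"] not in products]
print(bad)  # ['12346']
```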

Recommended Data Scope

  • Up to 24 months of historical data
  • Full product catalog
  • Complete order history

Step 2: Create Service Account

  1. Go to Google Cloud Console → IAM & Admin → Service Accounts
  2. Create a service account
  3. Assign the role Storage Object Viewer
  4. Create a JSON key
  5. Download the key file
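Before uploading the downloaded key, a quick local sanity check of its structure can save a failed configuration round-trip. This assumes the standard GCP service-account key layout; the helper itself is illustrative:

```python
import json

# Core fields every GCP service-account JSON key contains.
EXPECTED = {"type", "project_id", "private_key", "client_email"}

def check_key(text):
    """Report what is missing from a service-account key file, if anything."""
    key = json.loads(text)
    missing = EXPECTED - set(key)
    if key.get("type") != "service_account":
        missing.add("type=service_account")
    return sorted(missing)

# Placeholder values for illustration only.
sample = json.dumps({"type": "service_account", "project_id": "my-project",
                     "private_key": "-----BEGIN PRIVATE KEY-----...",
                     "client_email": "replenit@my-project.iam.gserviceaccount.com"})
print(check_key(sample))  # []
```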

Step 3: Grant Access

Grant the service account access to your bucket:

  • Role: Storage Object Viewer
  • Scope: Bucket

Step 4: Add Data Source

You can configure your data sources and provide access under the Health & Data Management section in your Replenit panel.

Add Data Source modal showing Entity, Directory Address, Bucket Name, and Credential fields

Configuration Fields

Field                     Description
Entity                    Customer / Order / Product
Entity Directory Address  customers / orders / products
Bucket Name               GCS bucket
Credential                JSON key file

ℹ️ Repeat the data source configuration for each entity: Customer, Order, and Product.

Step 5: Verify Data Sources

After configuration, verify that:

  • All sources are Active
  • Correct directories are mapped

Step 6: Historical Data Load

  1. Go to Automation Jobs
  2. Start a historical job
  3. Select the data source
  4. Run the job

Step 7: Ongoing Sync

  • Configure daily job
  • Enable incremental ingestion

Step 8: Monitoring

Field                 Description
Status                Completed / Failed
TransferredFileCount  Files processed
FailedFileCount       Errors
Last Run              Timestamp
Automation Jobs panel showing historical, one-time, and daily job statuses with execution details
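To illustrate how these fields can be used, a sketch that summarizes job records and surfaces anything needing attention (the record shape mirrors the fields above; the values are made up):

```python
jobs = [
    {"Status": "Completed", "TransferredFileCount": 120, "FailedFileCount": 0},
    {"Status": "Failed", "TransferredFileCount": 10, "FailedFileCount": 3},
]

# Jobs that failed outright or transferred with per-file errors.
needs_attention = [j for j in jobs
                   if j["Status"] == "Failed" or j["FailedFileCount"] > 0]
total_files = sum(j["TransferredFileCount"] for j in jobs)
print(len(needs_attention), total_files)  # 1 130
```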

Expected Timeline

Task                  Time
Data preparation      4–8 hours
Access setup          2–4 hours
Configuration         2–4 hours
Historical ingestion  4–6 hours
Validation            3–4 hours
Total                 15–26 hours

Common Issues

Issue          Cause
Access denied  Missing IAM roles
No data        Incorrect directory
Job failure    Schema mismatch
Missing data   ID mismatch

Security

  • Service Account JSON authentication
  • Read-only access supported
  • No modification to source data

Summary

  • Direct ingestion from GCS
  • Parquet preferred, JSON/CSV supported
  • Flexible schema handling
  • Replenit performs ETL and normalization

Need help or have questions?

Our team is ready to assist you. Reach out to us at support@replen.it

Email Support