Data Connection

GCP Connector (Google Cloud Storage)

The GCP Connector enables Replenit to ingest data directly from Google Cloud Storage (GCS). Data is accessed securely via a service account and processed within Replenit's ingestion layer.

The ingestion layer parses, normalizes, and unifies data for downstream processing.

Data Sources overview showing Google Cloud Storage connections for Customer, Product, and Order entities

How It Works

Supported Ingestion Modes

Historical Ingestion

  • One-time full data load
  • Used to initialize datasets

Ongoing Ingestion

  • Scheduled ingestion (e.g. daily, hourly)
  • Incremental updates
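Incremental ingestion typically works off a watermark: only records updated since the last successful sync are loaded. A minimal sketch of that selection logic (the field names and watermark handling here are illustrative, not Replenit's internal implementation):

```python
from datetime import datetime, timezone

def incremental_batch(records, last_sync):
    """Return only records updated since the previous sync (the watermark)."""
    return [r for r in records
            if datetime.fromisoformat(r["updated_at"]) > last_sync]

records = [
    {"customer_id": "C1", "updated_at": "2025-01-01T09:00:00+00:00"},
    {"customer_id": "C2", "updated_at": "2025-01-02T11:30:00+00:00"},
]
watermark = datetime(2025, 1, 1, 12, 0, tzinfo=timezone.utc)
print([r["customer_id"] for r in incremental_batch(records, watermark)])  # ['C2']
```

A historical load is simply the same flow with the watermark set to the beginning of time, so every record qualifies.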

Requirements

Data Entities

Replenit requires three core datasets:

Entity     Description
Customer   User-level data
Order      Transaction data
Product    Catalog data

Step 1: Prepare Data

Datasets should represent the full customer lifecycle and be linkable via identifiers.

Data Format

Preferred: Parquet

Replenit recommends the Parquet format:

  • Efficient for large-scale data
  • Schema consistency
  • Faster ingestion

Supported Formats

  • JSON (including nested payloads)
  • CSV
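The practical difference between the supported formats is that JSON preserves types and nesting while CSV flattens everything to text rows. A quick stdlib sketch of the same order record in both serializations (field names here follow the logical fields described later in this guide):

```python
import csv
import io
import json

order = {"order_id": "12345", "customer_id": 3253833,
         "order_date": "2025-01-01T10:00:00Z", "total_amount": 120.0}

# JSON: preserves numeric types and supports nested payloads.
json_payload = json.dumps(order)

# CSV: flat rows only; values become text and must be re-cast on ingestion.
buf = io.StringIO()
writer = csv.DictWriter(buf, fieldnames=order.keys())
writer.writeheader()
writer.writerow(order)
csv_payload = buf.getvalue()

print(json.loads(json_payload)["total_amount"])  # 120.0
```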

Data Flexibility

Replenit supports any data structure:

  • Flat tables or nested JSON
  • No strict limit on fields
  • Additional attributes are allowed

Replenit performs:

Parsing → Mapping → Normalization → Enrichment

ℹ️ Data can be API-aligned or provided as raw batch data. Replenit handles ETL.
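As an illustration of the normalization step, nested JSON payloads can be flattened into dot-separated columns before mapping. This is a generic sketch, not Replenit's actual pipeline code:

```python
def flatten(obj, prefix=""):
    """Recursively flatten a nested dict into dot-separated keys."""
    flat = {}
    for key, value in obj.items():
        path = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            flat.update(flatten(value, path))
        else:
            flat[path] = value
    return flat

payload = {"identifiers": {"userId": 3253833},
           "transaction": {"orderId": "12345", "currency": "EUR"}}
print(flatten(payload))
# {'identifiers.userId': 3253833, 'transaction.orderId': '12345', 'transaction.currency': 'EUR'}
```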

Data Structure

gs://your-bucket/customers/
gs://your-bucket/orders/
gs://your-bucket/products/

Suggested Dataset Structure

Customer Dataset (Preferred: Parquet)

Example file:

customers_date=YYYYMMDD.parquet
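Daily exports following the `customers_date=YYYYMMDD.parquet` pattern can be named programmatically; a small helper (the function name is our own, only the file-name pattern comes from this guide):

```python
from datetime import date

def customer_file_name(day: date) -> str:
    """Build a daily customer export name in the customers_date=YYYYMMDD pattern."""
    return f"customers_date={day.strftime('%Y%m%d')}.parquet"

print(customer_file_name(date(2025, 1, 31)))  # customers_date=20250131.parquet
```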

Required Fields

  • customer_id

Recommended Fields

  • email
  • created_at
  • updated_at
  • country
  • city
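Before uploading, it can be worth checking customer records against these field lists. A minimal validation sketch (the helper and its return shape are illustrative, not part of Replenit's API):

```python
REQUIRED = {"customer_id"}
RECOMMENDED = {"email", "created_at", "updated_at", "country", "city"}

def check_customer(record):
    """Return (missing required fields, missing recommended fields) for one record."""
    keys = set(record)
    return sorted(REQUIRED - keys), sorted(RECOMMENDED - keys)

record = {"customer_id": "C1", "email": "user@example.com", "country": "DE"}
print(check_customer(record))  # ([], ['city', 'created_at', 'updated_at'])
```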

Order Dataset (JSON / CSV / Parquet)

Example column:

transaction_data_json_payload

Example Structure

{
  "identifiers": {
    "userId": 3253833,
    "email": "user@example.com"
  },
  "transaction": {
    "orderId": "12345",
    "orderDate": "2025-01-01T10:00:00Z",
    "totalAmount": 120,
    "currency": "EUR"
  },
  "products": [
    {
      "productId": "SKU-001",
      "price": 60,
      "quantity": 2
    }
  ]
}

Required Logical Fields

Field        Example Path
order_id     transaction.orderId
customer_id  identifiers.userId
order_date   transaction.orderDate

Recommended Fields

Field         Example Path
total_amount  transaction.totalAmount
currency      transaction.currency
product_id    products[].productId
quantity      products[].quantity
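Putting the path tables together, the logical order fields can be pulled out of the example payload with a small dotted-path walker (a generic sketch; the `get_path` helper is ours, not a Replenit function):

```python
import json

def get_path(obj, dotted):
    """Walk a dotted path through nested dicts, e.g. 'transaction.orderId'."""
    for part in dotted.split("."):
        obj = obj[part]
    return obj

payload = json.loads("""{
  "identifiers": {"userId": 3253833},
  "transaction": {"orderId": "12345", "orderDate": "2025-01-01T10:00:00Z",
                  "totalAmount": 120, "currency": "EUR"},
  "products": [{"productId": "SKU-001", "price": 60, "quantity": 2}]
}""")

order = {
    "order_id": get_path(payload, "transaction.orderId"),
    "customer_id": get_path(payload, "identifiers.userId"),
    "order_date": get_path(payload, "transaction.orderDate"),
    "product_ids": [p["productId"] for p in payload["products"]],
}
print(order["order_id"], order["product_ids"])  # 12345 ['SKU-001']
```

Array paths such as `products[].productId` fan out to one value per list element, which is why they are handled with a comprehension rather than a single dotted lookup.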

Product Dataset (JSON / CSV / Parquet)

Example column:

product_data_json_payload

Example Structure

{
  "productId": "UMT-U180065",
  "taxonomy": ["Bath", "Care"],
  "brand": "BrandX",
  "price": 49.99,
  "currency": "EUR"
}

Required Logical Fields

Field       Example Path
product_id  productId

Recommended Fields

Field     Example Path
category  taxonomy[]
brand     brand
price     price
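Mapping the example product payload onto these logical fields looks like this in outline (joining the taxonomy levels with `/` is our illustrative choice for the category value, not a Replenit requirement):

```python
product = {"productId": "UMT-U180065", "taxonomy": ["Bath", "Care"],
           "brand": "BrandX", "price": 49.99, "currency": "EUR"}

def product_fields(p):
    """Map a raw product payload onto the logical fields above."""
    return {
        "product_id": p["productId"],                  # required
        "category": "/".join(p.get("taxonomy", [])),   # recommended
        "brand": p.get("brand"),
        "price": p.get("price"),
    }

print(product_fields(product)["category"])  # Bath/Care
```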

Data Relationships

Customer (customer_id / identifiers.userId)
Order (customer_id + product_id)
Product (product_id)

Orders link the other two datasets: each order references a customer_id that must resolve in the Customer dataset and product_ids that must resolve in the Product dataset.

Key Requirements

  • customer_id must be consistent across datasets
  • product_id must match between orders and products
  • Timestamp format should be ISO 8601
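A quick referential-integrity check along these lines can catch ID mismatches before ingestion (the dataset values here are made up for illustration):

```python
# IDs present in the Customer and Product datasets.
customers = {"3253833"}
products = {"SKU-001", "SKU-002"}

orders = [
    {"order_id": "12345", "customer_id": "3253833", "product_id": "SKU-001"},
    {"order_id": "12346", "customer_id": "9999999", "product_id": "SKU-003"},
]

# Orders whose IDs do not resolve against the other two datasets.
bad = [o["order_id"] for o in orders
       if o["customer_id"] not in customers or o["product_id"] not in products]
print(bad)  # ['12346']
```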

Recommended Data Scope

  • Up to 24 months of historical data
  • Full product catalog
  • Complete order history

Step 2: Create Service Account

  1. Go to Google Cloud Console → IAM & Admin → Service Accounts
  2. Create a service account
  3. Assign the role Storage Object Viewer
  4. Create a JSON key
  5. Download the key file
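Before uploading the downloaded key, a quick local sanity check of its structure can save a failed configuration round-trip. This assumes the standard GCP service-account key layout; the helper itself is illustrative:

```python
import json

# Core fields every GCP service-account JSON key contains.
EXPECTED = {"type", "project_id", "private_key", "client_email"}

def check_key(text):
    """Report what is missing from a service-account key file, if anything."""
    key = json.loads(text)
    missing = EXPECTED - set(key)
    if key.get("type") != "service_account":
        missing.add("type=service_account")
    return sorted(missing)

# Placeholder values for illustration only.
sample = json.dumps({"type": "service_account", "project_id": "my-project",
                     "private_key": "-----BEGIN PRIVATE KEY-----...",
                     "client_email": "replenit@my-project.iam.gserviceaccount.com"})
print(check_key(sample))  # []
```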

Step 3: Grant Access

Grant the service account access to your bucket:

  • Role: Storage Object Viewer
  • Scope: Bucket

Step 4: Add Data Source

You can configure your data sources and provide access under the Health & Data Management section in your Replenit panel.

Add Data Source modal showing Entity, Directory Address, Bucket Name, and Credential fields

Configuration Fields

Field                     Description
Entity                    Customer / Order / Product
Entity Directory Address  customers / orders / products
Bucket Name               GCS bucket
Credential                JSON key file

ℹ️ Repeat the data source configuration for each entity: Customer, Order, and Product.

Step 5: Verify Data Sources

After configuration, verify that:

  • All sources are Active
  • Correct directories are mapped

Step 6: Historical Data Load

  1. Go to Automation Jobs
  2. Start a historical job
  3. Select the data source
  4. Run the job

Step 7: Ongoing Sync

  • Configure daily job
  • Enable incremental ingestion

Step 8: Monitoring

Field                 Description
Status                Completed / Failed
TransferredFileCount  Files processed
FailedFileCount       Errors
Last Run              Timestamp
Automation Jobs panel showing historical, one-time, and daily job statuses with execution details
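To illustrate how these fields can be used, a sketch that summarizes job records and surfaces anything needing attention (the record shape mirrors the fields above; the values are made up):

```python
jobs = [
    {"Status": "Completed", "TransferredFileCount": 120, "FailedFileCount": 0},
    {"Status": "Failed", "TransferredFileCount": 10, "FailedFileCount": 3},
]

# Jobs that failed outright or transferred with per-file errors.
needs_attention = [j for j in jobs
                   if j["Status"] == "Failed" or j["FailedFileCount"] > 0]
total_files = sum(j["TransferredFileCount"] for j in jobs)
print(len(needs_attention), total_files)  # 1 130
```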

Expected Timeline

Task                  Time
Data preparation      4–8 hours
Access setup          2–4 hours
Configuration         2–4 hours
Historical ingestion  4–6 hours
Validation            3–4 hours
Total                 15–26 hours

Common Issues

Issue          Cause
Access denied  Missing IAM roles
No data        Incorrect directory
Job failure    Schema mismatch
Missing data   ID mismatch

Security

  • Service Account JSON authentication
  • Read-only access supported
  • No modification to source data

Summary

  • Direct ingestion from GCS
  • Parquet preferred, JSON/CSV supported
  • Flexible schema handling
  • Replenit performs ETL and normalization

Need help or have questions?

Our team is ready to assist you. Reach out to us at support@replen.it

Email Support