Skip to content

S3 object storage (AWS, Minio, Wasabi, Linode, etc.)

S3

Lyftdata supports reading from and writing to S3 compatible object storage (AWS, Minio, Wasabi, Linode, etc).

Configure Lyftdata to read data from S3

Add Lyftdata input s3 to a job and configure:

  • Bucket Name: The storage service container for created blobs
  • Object Names: The objects names. If we are listing these are search prefixes!
  • Region: S3 Region
  • Endpoint: S3 Endpoint, default: https://s3.amazonaws.com
  • Access Key: Access Key or username
  • Secret Key: Secret Key or password
  • Security Token: Security Token
  • Session Token: Session Token
  • Role Arn: The role ARN for the Assumed Role using above credentials
  • Object Name Field: The field that a name from an operation should be stored in
  • Creation Time Field: The field that the blob creation time should be stored in
  • Last Modified Field: The field that the blob last modified time should be stored in
  • Content Length Field: The field that the blob content length information should be stored in
  • Content Type Field: The field that the blob content type information should be stored in
  • Etag Field: The field that the object ETag should be stored in
  • Data Field: A field that the object data should be nested in

Configure Lyftdata to write data to from S3

Add Lyftdata output s3 to a job and configure:

  • Bucket Name: The storage service container for created blobs
  • Object Names: The objects names. If we are listing these are search prefixes!
  • Region: S3 Region
  • Endpoint: S3 Endpoint, default: https://s3.amazonaws.com
  • Access Key: Access Key or username
  • Secret Key: Secret Key or password
  • Security Token: Security Token
  • Session Token: Session Token
  • Role Arn: The role ARN for the Assumed Role using above credentials

Recommendations for files and folders

  • each file should not be larger than 100MB-150MB (compressed gzip or parquet)
  • Y/M/ or Y/M/D/ or Y/M/D/H/ folders

Create S3 bucket in AWS console

  • Bucket and objects not public
  • Block all public access: On
  • ACLs are disabled. All objects in this bucket are owned by this account. Access to this bucket and its objects is specified using only policies.
  • create IAM user and policy (see below)
  • do not enable console access
  • attach policy directly
  • create policy in separate window
  • attach policy to new user

AWS user policy for programmatic access

{
"Version": "2012-10-17",
"Statement": [
{
"Sid": "VisualEditor0",
"Effect": "Allow",
"Action": "s3:ListBucket",
"Resource": [
"arn:aws:s3:::*BUCKETNAME",
"arn:aws:s3:::*BUCKETNAME/*"
]
},
{
"Sid": "VisualEditor1",
"Effect": "Allow",
"Action": [
"s3:PutObject",
"s3:GetObject"
],
"Resource": "arn:aws:s3:::*BUCKETNAME/*"
}
]
}