
Document Version: 1.0
Purpose: Analyze the issue of excessively slow object movement during pnpm migrate:r2:raw-prefix execution and plan resolution strategies. Code generation is not included.


  • Execution: pnpm migrate:r2:raw-prefix (Moves R2 objects under the raw/ prefix)
  • Target: Approximately 2,190 objects (excluding raw/ and prod/)
  • Symptoms: The user interrupted the run (^C) after about 50 objects had been processed. With sequential processing at several seconds per object, the full run is estimated to take 1 to 3 hours or more.

| Step | Action | Notes |
| --- | --- | --- |
| 1 | wrangler r2 object get bucket/key -f TMP_FILE --remote | Download (R2 → Local) |
| 2 | wrangler r2 object put bucket/raw/key --file TMP_FILE --remote | Upload (Local → R2) |
| 3 | Delete local temporary file | |
| 4 | wrangler r2 object delete bucket/key --remote | Delete |

  • Per object: three wrangler CLI calls (get, put, delete) plus local disk I/O.
  • Data Path: R2 → Local → R2 (the object body crosses the network twice).
  • Execution Mode: Sequential (each object completes before the next one starts). 2,190 objects × 3 calls (get + put + delete) ≈ 6,570 CLI calls, executed one after another.
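
The figures above can be reproduced with a short calculation. This is only a rough sketch: the 2-second and 5-second per-object times are assumptions standing in for "several seconds per object", not measured values.

```typescript
// Rough duration estimate for the sequential get -> put -> delete flow.
// The per-object times passed in below are assumptions, not measurements.
const OBJECT_COUNT = 2190;
const CLI_CALLS_PER_OBJECT = 3; // get, put, delete

// Total wall-clock minutes when objects are processed strictly one at a time.
function estimateSequentialMinutes(objectCount: number, secondsPerObject: number): number {
  return (objectCount * secondsPerObject) / 60;
}

const totalCliCalls = OBJECT_COUNT * CLI_CALLS_PER_OBJECT;
const lowMinutes = estimateSequentialMinutes(OBJECT_COUNT, 2);
const highMinutes = estimateSequentialMinutes(OBJECT_COUNT, 5);

console.log({ totalCliCalls, lowMinutes, highMinutes });
// { totalCliCalls: 6570, lowMinutes: 73, highMinutes: 182.5 }
```

Under these assumed per-object times, the run lands between roughly 1.2 and 3 hours, which matches the 1 to 3 hour estimate.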

| Cause | Explanation |
| --- | --- |
| Sequential Processing | Only one object processed at a time. Significant CPU and network idle time. |
| Double Transfer of Body Data | Full download via get followed by full upload via put. Data traverses the client even for transfers within the same bucket. |
| CLI Overhead | Each call requires launching npx wrangler + authentication/parsing. High call frequency. |
| Single Temporary File | Processes one object at a time, so disk bottlenecks are relatively small, but I/O still occurs. |

| Approach | Overview | Advantages | Disadvantages/Risks |
| --- | --- | --- | --- |
| A. S3 CopyObject + DeleteObject | Copies within the same bucket using S3's CopyObject API. Body moves only within R2. Deletes original with DeleteObject (or DeleteObjects batch) after copying. | Data does not pass through the client. Only CopyObject and DeleteObject are called, reducing cost and latency per call. Parallel calls possible using existing @aws-sdk/client-s3 (e.g., 20-50 concurrent). Up to 1,000 keys/request can be deleted in batches using DeleteObjects. | Requires adjusting concurrency and retries to match R2/S3 rate limits. |
| B. Current Approach + Parallelization | Maintain wrangler get/put/delete, but process multiple objects simultaneously (e.g., 5-10). | Relatively small implementation change scope. | Object body double transmission and CLI overhead remain. Requires N temporary files simultaneously. Increases rate limit and local disk burden. |
| C. Queue + Worker Distribution | Send each key as a queue message; Workers perform copy+delete. | Theoretically horizontally scalable. | Limited benefit for one-time migration relative to complexity. Worker CPU/time constraints, queue configuration and monitoring overhead. |

3.2 Recommended Approach: A. S3 CopyObject + DeleteObject + Parallel/Batch

  • CopyObject (same bucket):
    • Perform only server-side copy using CopySource: bucket/key and Key: raw/key.
    • Clients handle metadata only; object bodies move only within R2.
  • Parallel Copy:
    • Execute N concurrent CopyObject operations (e.g., 20-50). Process the keys in batches rather than launching all 2,190 at once.
    • Set N considering R2 rate limits and timeouts (start small, increase if no issues).
  • Deletion:
    • Batch delete up to 1,000 keys/request using DeleteObjects (S3 Multi-Object Delete).
    • Choose policy: “Perform all copies first → verify then batch delete” or “Batch copy first, then batch delete only that batch”.
  • Utilize existing assets:
    • Use existing loadR2FileList (S3 ListObjectsV2) for list queries.
    • Authentication and endpoints remain the same as existing R2_ACCESS_KEY_ID, R2_SECRET_ACCESS_KEY, R2_ACCOUNT_ID, etc.
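
The approach above could be sketched as follows. This is a minimal illustration, not the migration script itself. The helper names (buildCopyInput, buildDeleteBatches, mapWithConcurrency) are hypothetical. The plain objects they return mirror the input fields of CopyObjectCommand and DeleteObjectsCommand in @aws-sdk/client-s3, and URL-encoding each key segment in CopySource is an assumption that should be checked against the SDK documentation.

```typescript
// Plain-object shapes mirroring the CopyObject and DeleteObjects request inputs.
interface CopyInput { Bucket: string; CopySource: string; Key: string; }
interface DeleteBatchInput { Bucket: string; Delete: { Objects: { Key: string }[]; Quiet: boolean }; }
type Outcome<R> = { ok: true; value: R } | { ok: false; error: unknown };

const RAW_PREFIX = "raw/";

// Server-side copy request for one key: bucket/key -> bucket/raw/key.
function buildCopyInput(bucket: string, key: string): CopyInput {
  // Assumption: encode each path segment but keep "/" separators intact.
  const encodedKey = key.split("/").map((segment) => encodeURIComponent(segment)).join("/");
  return { Bucket: bucket, CopySource: `${bucket}/${encodedKey}`, Key: RAW_PREFIX + key };
}

// Split keys into DeleteObjects-sized batches (at most 1,000 keys per request).
function buildDeleteBatches(bucket: string, keys: string[], batchSize = 1000): DeleteBatchInput[] {
  const batches: DeleteBatchInput[] = [];
  for (let i = 0; i < keys.length; i += batchSize) {
    const Objects = keys.slice(i, i + batchSize).map((Key) => ({ Key }));
    batches.push({ Bucket: bucket, Delete: { Objects, Quiet: true } });
  }
  return batches;
}

// Run `task` over `items` with at most `limit` calls in flight at once.
// Failures are recorded per item instead of aborting the whole run.
async function mapWithConcurrency<T, R>(
  items: T[],
  limit: number,
  task: (item: T) => Promise<R>,
): Promise<Outcome<R>[]> {
  const results: Outcome<R>[] = new Array(items.length);
  let next = 0;
  async function worker(): Promise<void> {
    while (next < items.length) {
      const index = next++;
      try {
        results[index] = { ok: true, value: await task(items[index]) };
      } catch (error) {
        results[index] = { ok: false, error };
      }
    }
  }
  const workerCount = Math.max(1, Math.min(limit, items.length));
  await Promise.all(Array.from({ length: workerCount }, () => worker()));
  return results;
}

// Example wiring with the SDK (not executed here; `client`, `bucket`, `keys` are placeholders):
//   const outcomes = await mapWithConcurrency(keys, 20, (key) =>
//     client.send(new CopyObjectCommand(buildCopyInput(bucket, key))));
//   const copied = keys.filter((_, i) => outcomes[i].ok);
//   for (const batch of buildDeleteBatches(bucket, copied)) {
//     await client.send(new DeleteObjectsCommand(batch));
//   }
```

Recording a per-key outcome rather than rejecting on the first failure keeps the list of successfully copied keys available, so only those keys are passed on to the batch delete.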

4. Implementation Considerations (Planning Level)

| Item | Details |
| --- | --- |
| CopyObject Source Format | When copying within the same bucket via the S3 API, CopySource takes the form bucket/key, URL-encoded where the key requires it. Follow the SDK documentation for the exact format. |
| Concurrency Limit | Start with 20-50 concurrent CopyObject operations. If 429 (throttling) responses occur, retry with backoff and reduce concurrency. |
| Deletion Timing | Either (1) batch delete only once every copy has succeeded, or (2) track the keys whose copy succeeded and delete those periodically with DeleteObjects. Retry failed keys or review them manually. |
| dry-run / --no-delete | Supported as before. dry-run outputs target keys without copying or deleting. --no-delete performs CopyObject only, skipping DeleteObjects. |
| Progress and Logging | Output progress per batch. Failed keys are logged or saved to a file for re-execution or manual handling. |
| Idempotency | If raw/key already exists, decide whether CopyObject should overwrite it or whether keys already present under raw/ should be skipped. |
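
The retry-with-backoff item could look roughly like the sketch below. The names withRetry, isThrottled, and backoffDelayMs are hypothetical, and the base delay, cap, and attempt count are placeholder values to tune after measurement. The check relies on AWS SDK v3 errors carrying the HTTP status under $metadata.httpStatusCode. Reducing concurrency after throttling is not shown here.

```typescript
// Minimal shape of the error field inspected below.
interface MaybeHttpError { $metadata?: { httpStatusCode?: number } }

// True when the failure looks like a 429 (throttling) response.
function isThrottled(error: unknown): boolean {
  const status = (error as MaybeHttpError | null | undefined)?.$metadata?.httpStatusCode;
  return status === 429;
}

// Exponential backoff: baseMs, 2x baseMs, 4x baseMs, ... capped at capMs.
// The default values are assumptions, not recommendations from R2.
function backoffDelayMs(attempt: number, baseMs = 200, capMs = 5000): number {
  return Math.min(capMs, baseMs * 2 ** attempt);
}

const defaultSleep = (ms: number): Promise<void> =>
  new Promise<void>((resolve) => setTimeout(resolve, ms));

// Retry `operation` only on throttling errors, up to `maxAttempts` total tries.
// Any other error, or a throttling error on the last attempt, is rethrown.
async function withRetry<T>(
  operation: () => Promise<T>,
  maxAttempts = 5,
  sleep: (ms: number) => Promise<void> = defaultSleep,
): Promise<T> {
  for (let attempt = 0; ; attempt++) {
    try {
      return await operation();
    } catch (error) {
      const isLastAttempt = attempt >= maxAttempts - 1;
      if (!isThrottled(error) || isLastAttempt) throw error;
      await sleep(backoffDelayMs(attempt));
    }
  }
}
```

Each CopyObject call can be wrapped in withRetry, so a throttled key is retried a few times before it ends up in the failed-key list for re-execution or manual handling.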

| Metric | Current (Sequential get/put/delete) | Improvement (CopyObject+DeleteObject, Parallel/Batch) |
| --- | --- | --- |
| Network per Object | Body transferred 2 times (download + upload) | Body transferred 0 times (server-side copy within R2) |
| API Calls per Object | 3 (get, put, delete) | 1 copy, plus 1 batch delete per 1,000 keys |
| Execution Method | Sequential, 2,190 iterations | Parallel copy (e.g., 20-50 concurrent), batch delete (1,000 keys/request) |
| Estimated Duration | 1-3+ hours | Expected to range from a few minutes to about 20 minutes depending on concurrency and rate limits (specific values will be adjusted after implementation and measurement). |
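
The call counts in the comparison follow from simple arithmetic, sketched below. The duration helper is a rough model only: it assumes copies run in waves of equal size, and the seconds-per-copy argument is a hypothetical value that would need to be measured.

```typescript
// Call-count comparison for roughly 2,190 objects.
const TOTAL_OBJECTS = 2190;
const DELETE_BATCH_SIZE = 1000; // DeleteObjects limit per request

const currentCalls = TOTAL_OBJECTS * 3;                           // get + put + delete per object
const copyCalls = TOTAL_OBJECTS;                                  // one CopyObject per object
const deleteCalls = Math.ceil(TOTAL_OBJECTS / DELETE_BATCH_SIZE); // batched deletes
const improvedCalls = copyCalls + deleteCalls;

// Rough model: copies proceed in waves of `concurrency` calls, each wave
// taking `secondsPerCopy`. Both inputs are assumptions to be measured.
function estimateCopyMinutes(objects: number, concurrency: number, secondsPerCopy: number): number {
  return (Math.ceil(objects / concurrency) * secondsPerCopy) / 60;
}

console.log({ currentCalls, copyCalls, deleteCalls, improvedCalls });
// { currentCalls: 6570, copyCalls: 2190, deleteCalls: 3, improvedCalls: 2193 }
console.log(estimateCopyMinutes(TOTAL_OBJECTS, 20, 1).toFixed(1), "minutes at an assumed 1 s per copy");
```

The total drops from about 6,570 CLI invocations to about 2,193 API requests, and none of the remaining requests carries an object body through the client.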

| Item | Description |
| --- | --- |
| Issue | Current migration is very slow due to sequential execution of get → put → delete per object. |
| Cause | Sequential processing, duplicate body transmission, repeated wrangler CLI calls. |
| Recommendation | Use S3 CopyObject (same bucket) + DeleteObjects (batch), with parallel execution for CopyObject. |
| Next Steps | Design and implement migration script (or dedicated module) using the above approach, then validate with dry-run/small-scale testing before full execution. |

This document defines speed improvement directions only; code changes will be handled as a separate task.