Amazon hosts some public S3 buckets containing the full TPC-H data set (1 TB corpus).
Yes, you can generate the data yourself, but you can also just use the copy that is already in Amazon. This will save you some time.
To use the 1 TB data set, point at the following files:
s3://redshift-demo/tpc-h/1024/customer/customer.tbl
s3://redshift-demo/tpc-h/1024/lineitem/lineitem.tbl
s3://redshift-demo/tpc-h/1024/nation/nation.tbl
s3://redshift-demo/tpc-h/1024/orders/orders.tbl
s3://redshift-demo/tpc-h/1024/part/part.tbl
s3://redshift-demo/tpc-h/1024/partsupp/partsupp.tbl
s3://redshift-demo/tpc-h/1024/region/region.tbl
s3://redshift-demo/tpc-h/1024/supplier/supplier.tbl
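All eight paths above follow the same pattern, so they can be built from the table name alone. Below is a minimal Python sketch that builds each URI and prints a Redshift COPY statement for it. It rests on several assumptions that are not stated in this post: that the bucket is readable from your account, that the target tables already exist in your cluster, that the .tbl files are pipe-delimited, and that you have an IAM role with S3 read access. The role ARN and the helper names (table_uri, copy_statement) are hypothetical placeholders.

```python
# Table names exactly as they appear in the S3 paths listed above.
TPCH_TABLES = [
    "customer", "lineitem", "nation", "orders",
    "part", "partsupp", "region", "supplier",
]

# Common prefix of the 1 TB data set paths.
BASE = "s3://redshift-demo/tpc-h/1024"


def table_uri(table: str) -> str:
    """Return the S3 URI of the .tbl file for one TPC-H table."""
    return f"{BASE}/{table}/{table}.tbl"


def copy_statement(table: str, iam_role: str) -> str:
    """Build a Redshift COPY statement for one table.

    Assumes the target table already exists and that the file is
    pipe-delimited; adjust DELIMITER if your copy of the data differs.
    """
    return (
        f"COPY {table} FROM '{table_uri(table)}' "
        f"IAM_ROLE '{iam_role}' DELIMITER '|';"
    )


if __name__ == "__main__":
    # Hypothetical role ARN; replace with a role that can read from S3.
    role = "arn:aws:iam::123456789012:role/ExampleRedshiftRole"
    for name in TPCH_TABLES:
        print(copy_statement(name, role))
```

Running the script only prints the statements; you would still execute them yourself in your SQL client.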
——————————————————————————————————————————
I put a lot of thought into these blog posts so that I can share the information in a clear and useful way. If you have any comments, thoughts, or questions, or you need someone to consult with, feel free to contact me:
https://www.linkedin.com/in/omid-vahdaty/
Omid, is this data still available?
Someone told me recently that the data was removed from the AWS buckets. I would recommend using the TPC-H data generation utility (dbgen) from the TPC website.