Map output: Snappy works well when large amounts of data flow from the Mappers to the Reducers. You might not see a significant difference if the data volume between Map and Reduce is low.

Temporary intermediate files (applicable only to native MapReduce; not available in Pig as of 0.9.2): If you have a series of MR jobs chained together, Snappy compression is a good way to store the intermediate files. Please make sure these intermediate files are cleaned up promptly so we don't run into disk space issues on the cluster.

Permanent storage: Snappy compression is not space-efficient, and storing data on HDFS is expensive because of 3-way replication.

Plain text files: Like Gzip, Snappy is not splittable. Do not store plain text files in Snappy-compressed form; instead, use a container format such as SequenceFile.
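As a rough illustration of the map output case, the sketch below assembles the two Hadoop properties that turn on Snappy compression for data shuffled from Mappers to Reducers. The class and method names (`SnappyMapOutputSettings`, `mapOutputProperties`, `toGenericOptions`) are hypothetical and exist only for this example. The property names shown are the ones used by Hadoop 2 and later; Hadoop 1.x clusters used `mapred.compress.map.output` and `mapred.map.output.compression.codec` instead, so check which applies to your cluster. The sketch also assumes the job driver parses generic options (for example via `ToolRunner`), otherwise `-D` flags are ignored.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical helper, not part of Hadoop: it only builds property strings.
public class SnappyMapOutputSettings {

    // Hadoop 2+ property names (assumption: adjust for Hadoop 1.x clusters).
    static final String COMPRESS_KEY = "mapreduce.map.output.compress";
    static final String CODEC_KEY = "mapreduce.map.output.compress.codec";
    static final String SNAPPY_CODEC = "org.apache.hadoop.io.compress.SnappyCodec";

    // Properties that enable Snappy for intermediate map output only.
    // Final job output is not affected by these two settings.
    static Map<String, String> mapOutputProperties() {
        Map<String, String> props = new LinkedHashMap<>();
        props.put(COMPRESS_KEY, "true");
        props.put(CODEC_KEY, SNAPPY_CODEC);
        return props;
    }

    // Render the properties as "-D key=value" generic options for the command line.
    static String toGenericOptions(Map<String, String> props) {
        StringJoiner joiner = new StringJoiner(" ");
        for (Map.Entry<String, String> entry : props.entrySet()) {
            joiner.add("-D " + entry.getKey() + "=" + entry.getValue());
        }
        return joiner.toString();
    }

    public static void main(String[] args) {
        System.out.println(toGenericOptions(mapOutputProperties()));
    }
}
```

The printed flags can be placed after the main class name in a `hadoop jar` invocation. The same two properties can also be set programmatically on the job's configuration object in the driver.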