I have a directory of <20MB pdf files (each pdf represents an ad) on an AWS EC2 large instance. I'm trying to upload each pdf file to S3 using ruby and DM-Paperclip.
Most files upload successfully but some seem to take hours with the CPU hanging at 100%. I've located the line of code that causes the issue by printing debug statements in the relevant section.
# Takes an array of pdf file paths and uploads each to S3 using dm-paperclip
def save_pdfs(pdfs_files)
pdf_files.each do |path|
pdf = File.open(path)
ad = Ad.new
ad.pdf.assign(pdf) # <= Last debug statment is printed before this line
begin
ad.save
rescue => e
# log error
ensure
pdf.close
end
end
To help troubleshoot the issue I attached strace to the process while it was stuck at 100%. The result was hundreds of thousands of lines like this:
...
stat("/etc/localtime", {st_mode=S_IFREG|0644, st_size=3543, ...}) = 0
stat("/etc/localtime", {st_mode=S_IFREG|0644, st_size=3543, ...}) = 0
stat("/etc/localtime", {st_mode=S_IFREG|0644, st_size=3543, ...}) = 0
... 500K lines
Followed by a few thousand:
...
brk(0x1224d0000) = 0x1224d0000
brk(0x1224f3000) = 0x1224f3000
brk(0x122514000) = 0x122514000
...
During an upload that doesn't hang, strace looks like this:
...
ppoll([{fd=12, events=POLLOUT}], 1, NULL, NULL, 8) = 1 ([{fd=12, revents=POLLOUT}])
fstat(12, {st_mode=S_IFSOCK|0777, st_size=0, ...}) = 0
fcntl(12, F_GETFL) = 0x2 (flags O_RDWR)
write(12, "%PDF-1.3\n%\342\343\317\323\n8 0 obj\n<</Filter"..., 4096) = 4096
ppoll([{fd=12, events=POLLOUT}], 1, NULL, NULL, 8) = 1 ([{fd=12, revents=POLLOUT}])
write(12, "S\34\367\23~\277u\272,h\204_\35\215\35\341\347\324\310\307u\370#\364\315\t~^\352\272\26\374"..., 4096) = 4096
ppoll([{fd=12, events=POLLOUT}], 1, NULL, NULL, 8) = 1 ([{fd=12, revents=POLLOUT}])
write(12, "\216%\267\2454`\350\177\4\36\315\211\7B\217g\33\217!e\347\207\256\264\245vy\377\304\256\307\375"..., 4096) = 4096
...
The pdf files that cause this issue seem random. They are all valid pdf files, and they are all relatively small. They vary between ~100KB to ~50MB.
Is the strace with the seemingly excessive stat system calls related to my issue?
It appears to be a problem of the original file permissions on failed files sent along with the file from origin computer. All pdf's(files Tobe saved into server) to be 0644 assigned in the script or if script uses original permissions at pickup to send from client. Basically the server OS and configurations is rejecting it because the file permissions are not 0644 on write to disc.