I am trying to ingest data into Elasticsearch.
The data is located in a shared folder. If possible, I want to delete the zip file once I have ingested it into Elasticsearch.
It is a .zip file that unzips to a single large file in the following format:
#ReferenceID 123das
#FamilyID abc
#ArchiveDate 1483237892226 (epoch time in milliseconds)
#SenderID user1
#RecipientID user2
#RecipientID user3 (note: there can be more than one RecipientID)
#Content
This is secret content of the document, and it is not encrypted.
#EndDoc
#ReferenceID 123das/1 (the "/1" means first attachment)
#FamilyID abc
#ArchiveDate 1483237892227 (epoch time in milliseconds)
#SenderID user1
#RecipientID user2
#RecipientID user3 (note: there can be more than one RecipientID)
#Content
This is the secret attachment content
#EndDoc
#ReferenceID...
...
#EndDoc
...multiple of these until End of File
Basically, each #ReferenceID ... #EndDoc block represents one document to be ingested into Elasticsearch.
My question is: can this be done using Logstash and Beats? How would I go about it? Any pointers are appreciated.
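If Logstash and Beats turn out not to fit, one alternative (not something the question or answer proposes) is to send the documents straight to Elasticsearch's `_bulk` API. The sketch below only builds the newline-delimited JSON request body from already-parsed document dicts. The index name `documents` and the function name `to_bulk_ndjson` are hypothetical, and reusing `ReferenceID` as `_id` is a design choice so that re-ingesting the same file overwrites documents instead of duplicating them.

```python
import json
from typing import Any, Dict, Iterable


def to_bulk_ndjson(docs: Iterable[Dict[str, Any]], index: str) -> str:
    """Build an Elasticsearch _bulk request body (hypothetical helper)."""
    lines = []
    for doc in docs:
        # Action line: index into `index`, using ReferenceID as the document _id.
        lines.append(json.dumps({"index": {"_index": index, "_id": doc["ReferenceID"]}}))
        # Source line: json.dumps escapes embedded newlines, so each doc stays on one line.
        lines.append(json.dumps(doc))
    # The bulk API requires every line, including the last, to end with a newline.
    return "".join(line + "\n" for line in lines)
```

The resulting string would be POSTed to `/_bulk` with `Content-Type: application/x-ndjson`. For a very large export, it is probably wiser to send it in several smaller batches than in one request.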
Logstash's file{} input cannot handle a zipped file. See the GitHub issue.
Filebeat can't either... yet! See this PR, which seems to be working its way through.
Just FYI, the s3{} input can.
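Since neither the file{} input nor Filebeat reads the zip directly, one workaround is to open the archive yourself with Python's standard `zipfile` module and stream its contents without extracting them to disk. This also covers the wish to delete the zip after ingestion. `process_zip` and its `handle_lines` callback are hypothetical names. The sketch assumes the text inside the zip is UTF-8 and that `handle_lines` raises an exception if ingestion fails, so the zip is only removed after every member was handled without error.

```python
import io
import os
import zipfile
from typing import Callable, Iterable


def process_zip(
    path: str,
    handle_lines: Callable[[Iterable[str]], None],
    encoding: str = "utf-8",
) -> None:
    """Stream each file in the zip as text lines, then delete the zip (hypothetical helper)."""
    with zipfile.ZipFile(path) as zf:
        for name in zf.namelist():
            if name.endswith("/"):
                continue  # skip directory entries
            with zf.open(name) as raw:
                # Decompresses on the fly; the large unzipped file never touches disk.
                handle_lines(io.TextIOWrapper(raw, encoding=encoding))
    # Only reached if handle_lines raised no exception for any member.
    os.remove(path)
```

One caveat if `handle_lines` sends data through the `_bulk` API: a bulk request can return HTTP 200 even when individual items failed. The callback would therefore need to check the top-level `errors` flag in the response and raise, otherwise the zip could be deleted after a partial ingest.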