I have several crawlers that crawl multiple sites and store the contents in a database. The logs from the program are stored in CloudWatch Logs.
When a crawler successfully pulls back content, the log lines look similar to the ones below:
HTTP GET: 200 - https://www.thecheyennepost.com/news/national/r
HTTP GET: 200 - https://www.thecheyennepost.com/news/f-e-warren-hous
The issue I'm dealing with is identifying when 4xx errors pop up. Below is an example:
HTTP GET: 429 - https://www.livingstonparishnews.com/search/?l=25&sort=
HTTP GET: 429 - https://www.livingstonparishnews.com/search/?l=25&sort=rele
HTTP GET: 429 - https://www.ktbs.com/search/?l=25&s=start_time&sd=desc&f=
I tried using status_code=4* as the filter, but that didn't return anything.
I just want to be able to filter any and all 4xx errors.
Any help that can be provided would be greatly appreciated.
Yes! Now you can with Logs Insights :)
First... you need to have the new UI, or some other way to get to the "Logs Insights" service... haha
CloudWatch -> CloudWatch Logs -> Log groups -> [your service logs]
With the new UI you can see this button (or search for Logs Insights in the search bar of the AWS console):
Now you can see this:
Now, in your case, you need this query (tell me if you need to filter on anything else).
Looking at your logs, there are spaces around the status code, so I think this is the best approach:
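A minimal sketch of what such a query could look like, assuming the log lines always start with "HTTP GET: " followed by the status code and a space, as in the samples from the question. The query text, the `INSIGHTS_QUERY` name, and the `filter_4xx` helper are illustrative assumptions. The Python part only reproduces the same regex locally so the pattern can be checked against the sample lines before running it in Logs Insights.

```python
import re

# Assumed Logs Insights query (illustrative): keep only messages whose
# status code starts with 4, relying on the space after the code.
INSIGHTS_QUERY = r"""
fields @timestamp, @message
| filter @message like /HTTP GET: 4[0-9][0-9] /
| sort @timestamp desc
"""

# Same regular expression, used locally to sanity-check the pattern.
STATUS_4XX = re.compile(r"HTTP GET: 4[0-9][0-9] ")


def filter_4xx(lines):
    """Return only the log lines whose HTTP status code is 4xx."""
    return [line for line in lines if STATUS_4XX.search(line)]


# Sample lines copied from the question (URLs are truncated there too).
sample = [
    "HTTP GET: 200 - https://www.thecheyennepost.com/news/national/r",
    "HTTP GET: 429 - https://www.livingstonparishnews.com/search/?l=25&sort=",
    "HTTP GET: 429 - https://www.ktbs.com/search/?l=25&s=start_time&sd=desc&f=",
]

matches = filter_4xx(sample)
for line in matches:
    print(line)
```

Anchoring the pattern on "HTTP GET: " and the trailing space keeps it from matching a 4 that merely appears somewhere inside a URL.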
And that's all.
Now run the query and you will see only the logs that contain 4xx status codes. I hope that solves your problem.
NOTE: if you go to Logs Insights directly from the search bar, you need to select the log group(s) that the query should scan, using the combo box at the top of the query box.