I have this method, SearchPdf(string path, string keyword), where path is the path of the folder that contains all the PDF files to search, and keyword is the keyword to look for in each PDF's text or in its file name.
I'm using Spire.Pdf to read the PDFs.
Here is the method:
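using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.IO;
using System.Threading.Tasks;
using Spire.Pdf;

public static ConcurrentBag<KeyValuePair<string, string>> SearchPdf(string path, string keyword)
{
    var results = new ConcurrentBag<KeyValuePair<string, string>>();
    var directory = new DirectoryInfo(path);
    var files = directory.GetFiles("*.pdf", SearchOption.AllDirectories);
    // Normalize the keyword once, before the loop, so the captured
    // variable is not reassigned concurrently by the worker threads
    keyword = keyword.ToLower().Trim();
    Parallel.ForEach(files, file =>
    {
        Console.WriteLine("\n\rSearching for: " + keyword + " in file: " + file.Name + "\n\r");
        // The full path already contains the file name, so one check covers both
        bool found = file.FullName.ToLower().Contains(keyword);
        // Open the PDF file
        var document = new PdfDocument(file.FullName);
        try
        {
            // Iterate over the pages and stop at the first match
            for (int i = 0; !found && i < document.Pages.Count; i++)
            {
                // Extract the page text and look for the keyword
                var text = document.Pages[i].ExtractText();
                found = text.ToLower().Contains(keyword);
            }
        }
        finally
        {
            // Release the document even if extraction throws
            document.Close();
        }
        // Record each matching file once, not once per matching page
        if (found)
        {
            results.Add(new KeyValuePair<string, string>(keyword, file.FullName));
        }
    });
    return results;
}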
Everything works fine, but when I have more than 200 keywords to search for and more than 1,500 files, it's a bit slow. Is there anything I can do to optimize this loop?
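
One direction that might help, as a rough sketch rather than a tested solution: if SearchPdf is called once per keyword, every PDF is opened and its text extracted again for each of the 200+ keywords. The sketch below reads each file once and checks all keywords against the extracted text. The names PdfSearchSketch, SearchPdfMany, and extractPages are hypothetical and not part of Spire.Pdf; extractPages is an assumed hook that returns the text of each page for a given file path.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Threading.Tasks;

public static class PdfSearchSketch
{
    // extractPages is a hypothetical hook: given a file path, it returns
    // the text of each page. It is an assumption, not a Spire.Pdf API.
    public static ConcurrentBag<KeyValuePair<string, string>> SearchPdfMany(
        string path,
        IEnumerable<string> keywords,
        Func<string, IEnumerable<string>> extractPages)
    {
        var results = new ConcurrentBag<KeyValuePair<string, string>>();

        // Normalize and de-duplicate the keywords once, up front.
        var normalized = keywords
            .Select(k => k.Trim().ToLowerInvariant())
            .Where(k => k.Length > 0)
            .Distinct()
            .ToArray();

        var files = new DirectoryInfo(path).GetFiles("*.pdf", SearchOption.AllDirectories);

        Parallel.ForEach(files, file =>
        {
            // Extract the text once per file; this is the expensive step.
            // The full path is prepended so file-name matches are covered too.
            var haystack = file.FullName.ToLowerInvariant() + "\n" +
                string.Join("\n", extractPages(file.FullName)).ToLowerInvariant();

            // Checking every keyword against the in-memory text is cheap
            // compared with re-parsing the PDF for each keyword.
            foreach (var keyword in normalized)
            {
                if (haystack.Contains(keyword))
                {
                    results.Add(new KeyValuePair<string, string>(keyword, file.FullName));
                }
            }
        });

        return results;
    }
}
```

With Spire.Pdf, extractPages would presumably wrap the same PdfDocument and ExtractText calls used in the method above. Whether this helps depends on where the time actually goes, so it would be worth measuring before and after.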