archiveopteryx VS dbmail

1k views Asked by At

I am after a database system to store my email messages (over 4 million now) which would allow me to search over emails easily programmatically, so far I only found these two, and alternative is come up with my own schemea and sync the IMAP/Pop3 folders as the emails come in. However reading this : DBMAIL for search seems due to the way dbmail keeps the emails, searching over the emails is not very possible.

My main goal is to be able to read the emails and search through them lets say using python.

3

There are 3 answers

0
Daniel Vérité On

Some other alternative choices:

http://sqmail.sourceforge.net/ is written in Python but unsupported since a long time.

http://www.manitou-mail.org/ which is written in Perl (by me!) with an optional GUI in Qt/C++.

2
Entrop On

I'm looking for something almost exactly the same, storage and search across a huge corpus of emails and attachments without and clear winners. Would love to compare notes. Appears that archiveopteryx are dbmail the two front runners with mature db schemas after a quick googling. Both seem to be suites optimized around mta integration delivery/pop/imap type services, though I'm still investigating both.

I also found this http://www.flaterco.com/aemail/aedocs.html but it seems unmaintained since about 2003.

I can't seem to figure out how to send a PM or connect with people on this service yet.

0
CozC On

A little bit late, but you can do your searching easily by creating views in your database and then do your searching.

/* to, from and subject*/
create view as EmailHeaders
select 
`m`.`physmessage_id` AS `physmessage_id`,
`ph`.`internal_date` AS `internal_date`,
max(if(`n`.`headername` = 'from',`v`.`headervalue`,NULL)) AS `From`,
max(if(`n`.`headername` = 'to',`v`.`headervalue`,NULL)) AS `To`,
max(if(`n`.`headername` = 'subject',`v`.`headervalue`,NULL)) AS `Subject`
from 
(
    (
        (`dbmail_messages` `m` 
            join `dbmail_header` `h` on (`m`.`physmessage_id` = `h`.`physmessage_id`)
        ) 
        join `dbmail_physmessage` `ph` on(`m`.`physmessage_id` = `ph`.`id`)
    ) 
    join `dbmail_headername` `n` on(`h`.`headername_id` = `n`.`id`)
) 
join `dbmail_headervalue` `v` on(`h`.`headervalue_id` = `v`.`id`))
where 
`n`.`headername` = 'to' 
or `n`.`headername` = 'from' 
or `n`.`headername` = 'subject' 
group by `m`.`physmessage_id`

You can also create views to disaplay messages, but is not efficient (memory consuming) but you can use something like this

select  
    physmessage_id, 
    sum(prts.size) emailSize 
from dbmail_partlists list, dbmail_mimeparts prts, EmailHeaders eh
where list.part_id=prts.id 
   and list.is_header=0
   and eh.physmessage_id=list.physmessage_id
   and prts like '%your_word_here%'
group by list.physmessage_id 

For small servers these queries are enough, but for large ones, you may need to employ other methods, but essentially this is it.