How do email providers like Gmail and Yahoo detect spam and mark it as such? How do they know which mail is spam and which is not?
I am only interested in the theory.
Any help is appreciated.
I may not be entirely accurate, but some of the signals they look at are:
- hidden text in the e-mail (HTML)
- missing or malformed fields in the mail headers (a legitimate mail gives proper information such as the from address, to address, reply-to address, subject, etc.)
- language, grammar, etc.
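To make the first two signals above a bit more concrete, here is a minimal Python sketch that scores a raw message on missing headers and hidden HTML text. The rule weights, the list of "required" headers, and the `heuristic_score` name are assumptions made up for illustration; this is not how any particular provider does it, and the language/grammar signal is left out entirely.

```python
import re
from email import message_from_string

# Headers this sketch expects a legitimate mail to carry (an assumption).
REQUIRED_HEADERS = ("From", "To", "Subject")

# Inline CSS commonly used to hide text from the reader (illustrative, not exhaustive).
HIDDEN_STYLE = re.compile(
    r"display\s*:\s*none|visibility\s*:\s*hidden|font-size\s*:\s*0",
    re.IGNORECASE,
)


def heuristic_score(raw_message: str) -> int:
    """Return a spam score; higher means more suspicious. Weights are hypothetical."""
    msg = message_from_string(raw_message)
    score = 0

    # +1 for every expected header that is absent or empty.
    for name in REQUIRED_HEADERS:
        value = msg.get(name)
        if value is None or not value.strip():
            score += 1

    # +2 for every HTML part that contains text hidden via inline CSS.
    for part in msg.walk():
        if part.get_content_type() == "text/html":
            payload = part.get_payload(decode=True) or b""
            charset = part.get_content_charset() or "utf-8"
            html = payload.decode(charset, errors="replace")
            if HIDDEN_STYLE.search(html):
                score += 2

    return score


suspicious = (
    "From: someone@example.com\n"
    "Content-Type: text/html\n"
    "\n"
    '<html><body>Hi<span style="display:none">cheap pills</span></body></html>\n'
)
clean = (
    "From: alice@example.com\n"
    "To: bob@example.com\n"
    "Subject: Lunch\n"
    "\n"
    "See you at noon.\n"
)
print(heuristic_score(suspicious), heuristic_score(clean))
```

A real filter would combine many more rules and compare the total against a tuned threshold rather than relying on two checks.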
Mailchimp has a couple of good articles about the topic:
I would search for research papers on this topic, for example, http://scholar.google.com/scholar?hl=en&q=spam+filter
Unfortunately, people usually don't blog about it.
How SpamAssassin works is a good starting point for research. The big mail providers also have the advantage of being able to see when the same email has been sent to a large pool of users.
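As a rough illustration of the "sent to a large pool of users" idea, the sketch below fingerprints each message body and counts how many distinct recipients have received it. Everything here is an assumption for illustration: the `BulkDetector` class, the threshold, and the use of an exact SHA-256 hash after whitespace and case normalization. A real provider would presumably need fuzzier matching, since spammers vary the text slightly per recipient.

```python
import hashlib
import re
from collections import defaultdict


class BulkDetector:
    """Hypothetical detector: flags a body once enough distinct recipients got it."""

    def __init__(self, threshold: int = 3):
        self.threshold = threshold
        # Maps body fingerprint -> set of recipients who received that body.
        self._recipients = defaultdict(set)

    @staticmethod
    def fingerprint(body: str) -> str:
        # Collapse whitespace and lowercase so trivial variations hash the same.
        normalized = re.sub(r"\s+", " ", body.strip().lower())
        return hashlib.sha256(normalized.encode("utf-8")).hexdigest()

    def observe(self, recipient: str, body: str) -> bool:
        """Record one delivery; return True if the body now looks like bulk mail."""
        key = self.fingerprint(body)
        self._recipients[key].add(recipient)
        return len(self._recipients[key]) >= self.threshold


detector = BulkDetector(threshold=3)
results = [
    detector.observe("a@example.com", "Buy now"),
    detector.observe("b@example.com", "buy   NOW "),
    detector.observe("c@example.com", "Buy now"),
    detector.observe("d@example.com", "Hello, are we still on for lunch?"),
]
print(results)
```

Bulk delivery alone does not prove a message is spam (newsletters are bulk too), so a signal like this would normally be just one input among several.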