For a given Web page, we first wish to determine whether it contains meaningful content, i.e., longer, informative text resembling a newspaper article, which need not be contiguous. We then wish to extract this main content without the surrounding or interleaved boilerplate. Beyond the main goal of separating the main content from the boilerplate, we would also like to distinguish between various subtypes of the main content, such as headlines, user comments, related content, supplemental content, and the like.
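The task described above can be illustrated with a minimal sketch, assuming the page has already been split into text blocks and each block has already been assigned a label. The names `Label`, `Block`, `has_meaningful_content`, and `extract_main_content`, as well as the 50-word threshold, are hypothetical and chosen only for illustration; how the labels are actually assigned is not shown here.

```python
from dataclasses import dataclass
from enum import Enum


class Label(Enum):
    """Hypothetical label set: boilerplate plus subtypes of main content."""
    BOILERPLATE = "boilerplate"
    ARTICLE_TEXT = "article_text"
    HEADLINE = "headline"
    USER_COMMENT = "user_comment"
    RELATED_CONTENT = "related_content"
    SUPPLEMENTAL = "supplemental"


@dataclass
class Block:
    """One text block of a page together with its assigned label."""
    text: str
    label: Label


def extract_main_content(blocks):
    """Keep every non-boilerplate block, preserving document order.

    The kept blocks need not be contiguous: boilerplate that sits
    between them is simply skipped.
    """
    return [b for b in blocks if b.label is not Label.BOILERPLATE]


def has_meaningful_content(blocks, min_words=50):
    """Decide whether a page carries enough main content.

    min_words is an assumed threshold, not a value from the text.
    """
    words = sum(len(b.text.split()) for b in extract_main_content(blocks))
    return words >= min_words


# Example page: main content interleaved with boilerplate.
page = [
    Block("Home | About | Login", Label.BOILERPLATE),
    Block("Big news today", Label.HEADLINE),
    Block("word " * 60, Label.ARTICLE_TEXT),
    Block("Copyright notice", Label.BOILERPLATE),
    Block("Nice article!", Label.USER_COMMENT),
]

if has_meaningful_content(page):
    for block in extract_main_content(page):
        print(block.label.value, len(block.text.split()))
```

Keeping the subtype on each block, rather than collapsing everything into a binary main/boilerplate decision, lets a consumer choose later whether, say, user comments should count as part of the extracted text.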