Caroline web crawler | Just pics, please

Our web crawler continually searches the Internet for porn, identifying itself with a user-agent of "Caroline". If you run a website and would prefer that we not crawl or link to your pages, you can instruct our crawler to ignore your website: just add "CarolineBot" to the list of disallowed user-agents in your robots.txt file (we follow Google's robots.txt specifications). You'll still see Caroline visit your site initially, but it won't follow any links and won't add your pages to our search results.
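As a rough illustration, the opt-out described above can be checked locally before you publish it. The sketch below uses Python's standard urllib.robotparser module; the robots.txt contents, the example.com URL, and the "SomeOtherBot" name are assumptions made for the example, and Python's parser may not match our crawler's behavior in every edge case.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt that opts a site out of Caroline only.
ROBOTS_TXT = """\
User-agent: CarolineBot
Disallow: /
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if robots_txt lets user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    url = "https://example.com/gallery/1.html"  # placeholder URL
    # Caroline is disallowed everywhere on the site.
    print("CarolineBot allowed:", is_allowed(ROBOTS_TXT, "CarolineBot", url))
    # Crawlers that are not named in the file are unaffected.
    print("SomeOtherBot allowed:", is_allowed(ROBOTS_TXT, "SomeOtherBot", url))
```

Because the rule names only CarolineBot, other crawlers keep their normal access to the site.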

Once Caroline determines that one of your pages may contain adult pictures or tubes, it will download a few images (or a video) to create thumbnails and save the URL of your web page. It will then run some additional tests to ensure your page is really porn-related and safe for people to browse (e.g., no malware or excessive advertising). If all these tests pass, your page will start to appear in our search results within a day or two.

If you want us to include your pages in our search results, here are some tips to ensure they are crawled successfully:

Ensure other popular sites link to your pages. Caroline follows links from one webpage to the next, so if no one links to your pages, we won't be able to find them.
Don't block web crawlers in robots.txt. If you use a robots.txt file to prevent all web crawlers from accessing your site (e.g., "User-agent: *" followed by "Disallow: /"), Caroline will not add any of your pages to our search database. The original robots.txt standard has no "Allow" directive, so if you want to block specific crawlers while still allowing Caroline, you need to block each of them individually. See robotstxt.org for details.
Photos should link directly to image files. Caroline looks for full-size images by checking the file extension of each link (e.g., does it end in '.jpg'?). Some webmasters link each thumbnail to a separate HTML page instead of the image itself; Caroline will not add pages like this to our search database. If your image URLs have no file extension, or use an unusual one (.jpg and .jpeg are safe), Caroline will not find them.
Tubes should use the standard HTML5 <video> tag. Because we aim to support mobile devices as well as desktop computers, we do not link to galleries that only support Flash or other non-standard tube videos.
Keep the number of images or movies reasonable. Caroline will not index pages with too few images, nor will it index pages with too many. The exact numbers may change over time, but between 10 and 30 images per page should be safe. For movie clip galleries, we look for at least 4 clips.
Keep the number of outgoing links reasonable, too. If your page has hundreds of links to other pages, Caroline will mark it as spam and ignore it.
Don't spam keywords. We ignore <meta> tags, so spamming keywords there won't get you anything. We also examine the text of the web page itself; if it contains lots of unrelated keywords, we don't index it.
Quality matters. We don't index pages with small images or short (less than 8 minutes), low-bitrate tubes.
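To make a few of the tips above concrete, here is a minimal self-check you could run against your own gallery HTML. It is only a sketch and not our crawler's actual logic: the GalleryScanner and check_page names are hypothetical, the thresholds simply restate the numbers quoted above (which may change), and counting <video> tags as "clips" is an assumption.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse

# Illustrative values taken from the tips above; the real limits may differ.
IMAGE_EXTENSIONS = (".jpg", ".jpeg")
MIN_IMAGES, MAX_IMAGES = 10, 30
MIN_CLIPS = 4


class GalleryScanner(HTMLParser):
    """Counts direct image links, other links, and <video> tags."""

    def __init__(self):
        super().__init__()
        self.image_links = 0
        self.other_links = 0
        self.video_tags = 0

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "a" and attrs.get("href"):
            # Look only at the URL path so query strings are ignored.
            path = urlparse(attrs["href"]).path.lower()
            if path.endswith(IMAGE_EXTENSIONS):
                self.image_links += 1
            else:
                self.other_links += 1
        elif tag == "video":
            self.video_tags += 1


def check_page(html: str) -> dict:
    """Summarize how a page measures up against the quoted guidelines."""
    scanner = GalleryScanner()
    scanner.feed(html)
    scanner.close()
    return {
        "image_links": scanner.image_links,
        "other_links": scanner.other_links,
        "video_tags": scanner.video_tags,
        "image_count_ok": MIN_IMAGES <= scanner.image_links <= MAX_IMAGES,
        "clip_count_ok": scanner.video_tags >= MIN_CLIPS,
    }


if __name__ == "__main__":
    # Hypothetical gallery: 12 thumbnails linking straight to .jpg files.
    links = "".join(
        f'<a href="/full/{i}.jpg"><img src="/thumb/{i}.jpg"></a>' for i in range(12)
    )
    print(check_page(links + '<a href="/about.html">About</a>'))
```

A thumbnail that links to an HTML page rather than an image file is counted under other_links here, which mirrors why such galleries are skipped.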

If you'd like to talk to us about our web crawler, please use our feedback page.
