Public @LifeSizeHD Videocenters need to upgrade to the latest release or they may get crawled to death.

We recently had a problem with one of our LifeSize Videocenters: the hard disks would fill up within a day or two, despite the fact that no new videos had been recorded or uploaded. A reboot would fix it, but the reboot took a very long time (20-30 minutes). At first we thought a log file might be filling up, which had been an issue before.


After working with LifeSize tech support, we found that the problem was Chinese search engines crawling the videocenter and trying to download all the videos. Here is the explanation from the LifeSize tech we worked with:

Video Center creates a tar file in /tmp when a video’s archive is downloaded. The archive in /tmp is cleared only when it is older than 24 hours. Archives are cleaned up at midnight.

Their videocenter is on a public IP and it is being crawled by Chinese search engines: Sogou and Baidu. Sogou’s indexing spider is asking for the archive link for some of the videos multiple times. Each time Video Center gets this request, it is creating a tmp file that gets cleared two days later.
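To see how much space these archives are taking before the disk fills, something like the following could be run on the box. This is only a sketch under assumptions: the explanation above does not say how the archive files are named, so the `.tar` suffix, the flat `/tmp` layout, and the function name `summarize_archives` are all guesses for illustration.

```python
import os
import time

def summarize_archives(directory="/tmp", suffix=".tar", max_age_hours=24, now=None):
    """Return (total_bytes, stale_paths) for archive files in `directory`.

    Assumption: archives sit directly in the directory and end in `suffix`.
    A file counts as stale when its mtime is older than `max_age_hours`.
    """
    now = time.time() if now is None else now
    cutoff = now - max_age_hours * 3600
    total_bytes = 0
    stale = []
    with os.scandir(directory) as entries:
        for entry in entries:
            # Skip directories, symlinks and files that do not look like archives.
            if not entry.is_file(follow_symlinks=False):
                continue
            if not entry.name.endswith(suffix):
                continue
            info = entry.stat(follow_symlinks=False)
            total_bytes += info.st_size
            if info.st_mtime < cutoff:
                stale.append(entry.path)
    return total_bytes, sorted(stale)
```

The function only reports; it deliberately deletes nothing, since removing files out from under Video Center is not something the tech recommended.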

Here’s a snapshot of the requests coming from the search engines:

root@209:/var/log/apache2# grep archive access.log.1

220.181.89.170 - - [08/Dec/2012:07:52:30 -0600] "GET /videos/video/894/archive/ HTTP/1.1" 200 14952 "-" "Sogou web spider/4.0(+http://www.sogou.com/docs/help/webmasters.htm#07)"

180.76.5.140 - - [08/Dec/2012:08:44:26 -0600] "GET /videos/video/623/archive/ HTTP/1.1" 200 819548 "-" "Mozilla/5.0 (compatible; Baiduspider/2.0; +http://www.baidu.com/search/spider.html)"

220.181.89.170 - - [08/Dec/2012:08:55:38 -0600] "GET /videos/video/871/archive/ HTTP/1.1" 200 14952 "-" "Sogou web spider/4.0(+http://www.sogou.com/docs/help/webmasters.htm#07)"

220.181.89.170 - - [08/Dec/2012:08:57:37 -0600] "GET /videos/video/894/archive/ HTTP/1.1" 200 14952 "-" "Sogou web spider/4.0(+http://www.sogou.com/docs/help/webmasters.htm#07)"

220.181.89.170 - - [08/Dec/2012:09:01:43 -0600] "GET /videos/video/871/archive/ HTTP/1.1" 200 14952 "-" "Sogou web spider/4.0(+http://www.sogou.com/docs/help/webmasters.htm#07)"

220.181.89.170 - - [08/Dec/2012:09:04:55 -0600] "GET /videos/video/894/archive/ HTTP/1.1" 200 14952 "-" "Sogou web spider/4.0(+http://www.sogou.com/docs/help/webmasters.htm#07)"

220.181.89.170 - - [08/Dec/2012:09:06:58 -0600] "GET /videos/video/871/archive/ HTTP/1.1" 200 14952 "-" "Sogou web spider/4.0(+http://www.sogou.com/docs/help/webmasters.htm#07)"

220.181.89.170 - - [08/Dec/2012:09:10:56 -0600] "GET /videos/video/894/archive/ HTTP/1.1" 200 14952 "-" "Sogou web spider/4.0(+http://www.sogou.com/docs/help/webmasters.htm#07)"

220.181.89.170 - - [08/Dec/2012:09:12:54 -0600] "GET /videos/video/871/archive/ HTTP/1.1" 200 14952 "-" "Sogou web spider/4.0(+http://www.sogou.com/docs/help/webmasters.htm#07)"

220.181.89.170 - - [08/Dec/2012:09:16:07 -0600] "GET /videos/video/894/archive/ HTTP/1.1" 200 14951 "-" "Sogou web spider/4.0(+http://www.sogou.com/docs/help/webmasters.htm#07)"

220.181.89.170 - - [08/Dec/2012:09:20:18 -0600] "GET /videos/video/894/archive/ HTTP/1.1" 200 14951 "-" "Sogou web spider/4.0(+http://www.sogou.com/docs/help/webmasters.htm#07)"

220.181.89.170 - - [08/Dec/2012:09:18:16 -0600] "GET /videos/video/871/archive/ HTTP/1.1" 200 14951 "-" "Sogou web spider/4.0(+http://www.sogou.com/docs/help/webmasters.htm#07)"
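A grep shows the raw requests, but it is easier to see the repetition when the hits are tallied per crawler and per video. Here is a rough sketch that does that, assuming the access log is in Apache's combined format with plain ASCII quotes (as the file on disk would be). The function name `count_archive_requests` and the regular expressions are mine, not anything from LifeSize.

```python
import re
from collections import Counter

# Combined log format:
# ip ident user [time] "method path protocol" status size "referer" "user-agent"
LOG_LINE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" '
    r'(?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)
# Archive download URLs as they appear in the log excerpt.
ARCHIVE_PATH = re.compile(r'^/videos/video/(\d+)/archive/?$')

def count_archive_requests(lines):
    """Return a Counter keyed by (user agent, video id) for archive requests."""
    counts = Counter()
    for line in lines:
        entry = LOG_LINE.match(line)
        if entry is None:
            continue  # not a combined-format line, ignore it
        archive = ARCHIVE_PATH.match(entry.group("path"))
        if archive:
            counts[(entry.group("agent"), archive.group(1))] += 1
    return counts
```

Passing an open file handle for `access.log.1` and printing `counts.most_common()` would list the most requested archives first, along with the crawler asking for them.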

The immediate solution was to disable video downloads on the videocenter.


If you have a firewall in front of your Videocenter, you can add rules to block these search engine crawlers from accessing the Video Center IP.
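If you go the firewall route, you need the addresses the crawlers are coming from. A small sketch like this could pull them out of the access log, again assuming Apache's combined log format. The user-agent substrings are taken from the log lines in this post; the function name `crawler_addresses` is made up for the example. Keep in mind that crawlers use many addresses, so a list built from one log file will be incomplete.

```python
import ipaddress

# Substrings of the user-agent strings seen in the access log.
CRAWLER_TOKENS = ("Sogou web spider", "Baiduspider")

def crawler_addresses(lines, tokens=CRAWLER_TOKENS):
    """Return the distinct client IPs whose user agent contains one of `tokens`."""
    found = set()
    for line in lines:
        # Splitting on double quotes puts the user agent in the sixth field:
        # prefix, request, status/size, referer, separator, user agent.
        parts = line.split('"')
        if len(parts) < 6:
            continue
        agent = parts[5]
        if not any(token in agent for token in tokens):
            continue
        fields = parts[0].split()
        if not fields:
            continue
        try:
            found.add(ipaddress.ip_address(fields[0]))
        except ValueError:
            continue  # first field was not an IP address
    # Sort IPv4 before IPv6, then numerically within each family.
    ordered = sorted(found, key=lambda addr: (addr.version, int(addr)))
    return [str(addr) for addr in ordered]
```

The output is just a list of strings; how they become rules depends entirely on your firewall.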

After we reported this problem, LifeSize modified the downloads page of the videocenter so that robots are told not to follow the download links. Sogou and Baidu honour these directives, so if you upgrade to the latest version of the software (2.1.2(3)) you won’t have this problem.
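I don't know exactly which mechanism LifeSize used, but the usual ways to tell a robot not to follow links are a `robots` meta tag on the page or `rel="nofollow"` on individual links. As a sketch of how you might check a saved copy of the downloads page after upgrading, the following lists the links a well-behaved crawler would still follow. The class and function names are hypothetical, and the sample markup in the usage note is not taken from Video Center.

```python
from html.parser import HTMLParser

class FollowableLinks(HTMLParser):
    """Collect links and note any page-wide or per-link nofollow hints."""

    def __init__(self):
        super().__init__()
        self.page_nofollow = False
        self.links = []  # list of (href, followable) pairs

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "meta" and (attributes.get("name") or "").lower() == "robots":
            content = attributes.get("content") or ""
            directives = {part.strip().lower() for part in content.split(",")}
            # "none" is shorthand for "noindex, nofollow".
            if directives & {"nofollow", "none"}:
                self.page_nofollow = True
        elif tag == "a" and attributes.get("href"):
            rel = (attributes.get("rel") or "").lower().split()
            self.links.append((attributes["href"], "nofollow" not in rel))

def followable_links(html):
    """Return the hrefs a crawler that honours nofollow would still follow."""
    parser = FollowableLinks()
    parser.feed(html)
    parser.close()
    if parser.page_nofollow:
        return []
    return [href for href, followable in parser.links if followable]
```

For example, given a page with a plain link to `/videos/video/894/` and a `rel="nofollow"` link to `/videos/video/894/archive/`, only the first would be returned. If an archive link still shows up in the result, the crawlers would presumably keep requesting it.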

Note: a reboot fixes the problem because the videocenter deletes all the tmp files when it reboots.


The Free-Range Technologist © 2014