We are scanning phonebooks in Mexico, one column at a time, at 600 dpi. We started from the PHP sample code; our processImage call specifies Spanish and the txt export format. About 50% of the time, however, the curl_exec call returns an empty string. Here is a sample scan: http://digitalfire.com/culiacan/pictures/168.jpg
Also, we are interested in any recommendations specific to scanning phonebooks.
Thanks.
UPDATE
I emailed the code but have not heard back, so I will paste it here in sections:
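function OCR($fileName) {
    $applicationId = 'edited out';
    $password = 'edited out';

    // Step 1: upload the image and create a processing task.
    $url = 'http://cloud.ocrsdk.com/processImage?language=spanish&exportFormat=txt';
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
    curl_setopt($ch, CURLOPT_USERPWD, "$applicationId:$password");
    curl_setopt($ch, CURLOPT_POST, 1);
    // CURLFile (PHP 5.5+) replaces the "@filename" upload syntax, which PHP 7 removed.
    $post_array = array("my_file" => new CURLFile($fileName));
    curl_setopt($ch, CURLOPT_POSTFIELDS, $post_array);
    $response = curl_exec($ch);
    $error = curl_error($ch);
    curl_close($ch);
    // Report the cURL error instead of handing an empty response to the XML parser.
    if ($response === false || $response === '') {
        throw new Exception("processImage returned no data: $error");
    }
    $xml = simplexml_load_string($response);
    $arr = $xml->task[0]->attributes();
    $taskid = $arr["id"];

    // Step 2: poll the task status every 5 seconds until processing finishes.
    $url = 'http://cloud.ocrsdk.com/getTaskStatus';
    $qry_str = "?taskid=$taskid";
    do {
        sleep(5);
        $ch = curl_init();
        curl_setopt($ch, CURLOPT_URL, $url.$qry_str);
        curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
        curl_setopt($ch, CURLOPT_USERPWD, "$applicationId:$password");
        $response = curl_exec($ch);
        $error = curl_error($ch);
        curl_close($ch);
        if ($response === false || $response === '') {
            throw new Exception("getTaskStatus returned no data: $error");
        }
        $xml = simplexml_load_string($response);
        $arr = $xml->task[0]->attributes();
        $status = (string)$arr["status"];
        // Stop on failure states so the loop cannot run forever.
        if (in_array($status, array("ProcessingFailed", "NotEnoughCredits", "Deleted"))) {
            throw new Exception("Task $taskid ended with status $status");
        }
    } while ($status != "Completed");

    // Step 3: download the recognized text from the result URL.
    $url = (string)$arr["resultUrl"];
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
    $response = curl_exec($ch);
    curl_close($ch);
    return $response;
}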
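
When curl_exec returns an empty string, simplexml_load_string returns false and the following ->task access fails without saying why. A small guard around the parsing step could make that case visible. This is only a minimal sketch: parseTaskResponse is a hypothetical helper name (not part of the sample code), and the XML in the example is a made-up body in the shape the function above expects.

```php
<?php
// Hypothetical helper (not from the sample code): turn a raw API response
// into an array of task attributes, failing loudly on an empty or malformed body.
function parseTaskResponse($response) {
    if ($response === false || trim((string)$response) === '') {
        throw new RuntimeException('Empty response from server');
    }
    // Collect libxml errors internally instead of emitting PHP warnings.
    $previous = libxml_use_internal_errors(true);
    $xml = simplexml_load_string($response);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);
    if ($xml === false || !isset($xml->task)) {
        throw new RuntimeException('Not a task document: ' . substr($response, 0, 200));
    }
    // Copy the attributes of the first <task> element into a plain array of strings.
    $task = array();
    foreach ($xml->task[0]->attributes() as $name => $value) {
        $task[$name] = (string)$value;
    }
    return $task;
}

// Example with an assumed response body (illustrative values only).
$sample = '<response><task id="abc-123" status="Queued"/></response>';
$task = parseTaskResponse($sample);
echo $task['id'], ' ', $task['status'], "\n";
```

With a guard like this, an empty reply surfaces as an exception at the point where it happens, which should make it easier to log which files and which calls are affected.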