OCR multiple documents in a folder

Hi,

I have a script that reads the files in a folder and, when it finds a PDF, calls the OCR API to scan it. My script is below. The problem is that it stops after scanning one document; the loop does not go on to the next file. I hope someone can advise where I went wrong.

    <?php

    $dir = "/xampp/htdocs/ocr/ocrfiles/";
    $dh  = opendir($dir);
    while (false !== ($filenamedir = readdir($dh))) {
        $files[] = $filenamedir;
    }

    sort($files);

    echo "<pre>";
    print_r($files);
    echo "</pre>";

    $starter = 0;
    $arraySize = count($files);

    while ($starter < $arraySize) {
        $verifyDate = $files[$starter];
        echo $starter . ": " . $verifyDate . "<BR>";
        $ext = pathinfo($verifyDate, PATHINFO_EXTENSION);

        if ($ext == "pdf") {
            echo "run abbyy api here <BR>";

      // 1. Send image to Cloud OCR SDK using processImage call
      // 2. Get response as xml
      // 3. Read taskId from xml

      // !!!!!!!!!! Enter your data here !!!!!!!!!!
      $applicationId = 'API_ID';
      $password = 'API_PASSWORD';
      $fileName = $verifyDate;

      // Get path to file that we are going to recognize
      $local_directory=dirname(__FILE__).'/ocrfiles';
      $filePath = $local_directory.'/'.$fileName;
      if(!file_exists($filePath))
      {
        die('File '.$filePath.' not found.');
      }

      // Recognize with the English language and export the result as XML
      // You can use combination of languages like ?language=english,russian or
      // ?language=english,french,dutch
      // For details, see API reference for processImage method
      $url = 'http://cloud.ocrsdk.com/processImage?language=english&exportFormat=xml';

      // Send HTTP POST request and get the XML response
      $curlHandle = curl_init();
      curl_setopt($curlHandle, CURLOPT_URL, $url);
      curl_setopt($curlHandle, CURLOPT_RETURNTRANSFER, 1);
      curl_setopt($curlHandle, CURLOPT_USERPWD, "$applicationId:$password");
      curl_setopt($curlHandle, CURLOPT_POST, 1);
      $post_array = array(
          "my_file"=>"@".$filePath,
      );
      curl_setopt($curlHandle, CURLOPT_POSTFIELDS, $post_array); 
      $response = curl_exec($curlHandle);
      if($response == FALSE) {
        $errorText = curl_error($curlHandle);
        curl_close($curlHandle);
        die($errorText);
      }
      curl_close($curlHandle);

      // Parse xml response
      $xml = simplexml_load_string($response);
      $arr = $xml->task[0]->attributes();

      // Task id
      $taskid = $arr["id"];

      // 4. Get task information in a loop until task processing finishes
      // 5. If response contains "Completed" status - extract url with result
      // 6. Download recognition result (text) and display it

      $url = 'http://cloud.ocrsdk.com/getTaskStatus';
      $qry_str = "?taskid=$taskid";

      // Check task status in a loop until it is finished
      // TODO: support states indicating error
      do
      {
        sleep(5);
        $curlHandle = curl_init();
        curl_setopt($curlHandle, CURLOPT_URL, $url.$qry_str);
        curl_setopt($curlHandle, CURLOPT_RETURNTRANSFER, 1);
        curl_setopt($curlHandle, CURLOPT_USERPWD, "$applicationId:$password");
        $response = curl_exec($curlHandle);
        curl_close($curlHandle);

        // parse xml
        $xml = simplexml_load_string($response);
        $arr = $xml->task[0]->attributes();
      }
      while($arr["status"] != "Completed");

      // Result is ready. Download it

      $url = $arr["resultUrl"];   
      $curlHandle = curl_init();
      curl_setopt($curlHandle, CURLOPT_URL, $url);
      curl_setopt($curlHandle, CURLOPT_RETURNTRANSFER, 1);
      // Warning! This is for easier out-of-the box usage of the sample only.
      // The URL to the result has https:// prefix, so SSL is required to
      // download from it. For whatever reason PHP runtime fails to perform
      // a request unless SSL certificate verification is off.
      curl_setopt($curlHandle, CURLOPT_SSL_VERIFYPEER, false);
      $response = curl_exec($curlHandle);
      curl_close($curlHandle);

      // Output the downloaded recognition result
      echo $response . "<BR><BR>";

    //document $ amount checker process ends
        }
        $starter++;

    }

    ?>
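
Two things in the script above can end or stall the run before the second file is reached: both `die()` calls terminate the whole script rather than just the current iteration, and the status loop only exits when the status is "Completed". Whether either is the actual cause here is not confirmed. The following is a minimal sketch of a loop structure that avoids both, not a drop-in fix. The function names `listPdfFiles`, `waitForCompletion` and `processFolder` are hypothetical, the `$ocr` and `$fetchStatus` callbacks stand in for the cURL calls in the script, and the in-progress status names are assumptions that should be checked against the API reference.

```php
<?php
// Hypothetical helper: returns the sorted names of the PDF files in $dir.
function listPdfFiles(string $dir): array
{
    $pdfs = [];
    foreach (scandir($dir) ?: [] as $name) {
        // Case-insensitive extension check, so "scan.PDF" is picked up too.
        $isPdf = strtolower(pathinfo($name, PATHINFO_EXTENSION)) === 'pdf';
        if ($isPdf && is_file($dir . '/' . $name)) {
            $pdfs[] = $name;
        }
    }
    sort($pdfs);
    return $pdfs;
}

// Hypothetical helper: polls $fetchStatus until it returns "Completed".
// Throws if any other non-pending status comes back or the attempts run out,
// so an unfinished task cannot keep the loop waiting forever.
function waitForCompletion(callable $fetchStatus, int $maxAttempts = 60, int $delaySeconds = 5): void
{
    // Assumed in-progress status names; verify them against the API reference.
    $pending = ['Submitted', 'Queued', 'InProgress'];
    for ($attempt = 0; $attempt < $maxAttempts; $attempt++) {
        $status = (string) $fetchStatus();
        if ($status === 'Completed') {
            return;
        }
        if (!in_array($status, $pending, true)) {
            throw new RuntimeException("Task ended with status: $status");
        }
        sleep($delaySeconds);
    }
    throw new RuntimeException('Gave up waiting for the task to complete');
}

// Hypothetical driver: runs $ocr on each PDF and records a failure per file
// instead of calling die(), so one bad file does not end the whole run.
function processFolder(string $dir, callable $ocr): array
{
    $results = [];
    foreach (listPdfFiles($dir) as $name) {
        try {
            $results[$name] = $ocr($dir . '/' . $name);
        } catch (Exception $e) {
            $results[$name] = 'ERROR: ' . $e->getMessage();
        }
    }
    return $results;
}
```

With this shape, the upload, polling and download steps would live inside the `$ocr` callback and throw an exception on failure; `processFolder(__DIR__ . '/ocrfiles', $ocr)` then returns one entry per PDF, either the result or an error message.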

Hope someone can help out soon. Thanks!
