Question:
I'm trying to implement a Simples Nacional query; it works much like the Receita Federal query by CNPJ.
Details I've noticed so far:
- After loading, the page executes an AJAX request (from the captcha2.js file) that returns 3 items in JSON: {Token, Dados, ContentType}. The Token is stored in a cookie that probably serves to validate the image after submit. Dados is the base64 of the image. ContentType is the image type.
- It generates the cnpj input by concatenating a number, which is also stored in another hidden input (id: ctl00_ContentPlaceHolderConteudo_HiddenField1).
- There are other inputs (__VIEWSTATE, __EVENTVALIDATION, __EVENTARGUMENT, __EVENTTARGET) that store values; the last two are usually empty.
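As a rough illustration of the JSON described in the first item, the sketch below splits such a payload into a cookie string and a data URI for an img tag. The function name parseCaptchaPayload and the sample values are made up for the example; the cookie name captcha_token is the one used in my code further down, and the real payload would come from the endpoint that captcha2.js calls.

```php
<?php
// Hypothetical sample payload with the three keys described above.
$sampleJson = json_encode([
    'Token'       => 'abc123',
    'Dados'       => base64_encode('fake-image-bytes'),
    'ContentType' => 'image/png',
]);

/**
 * Splits the captcha JSON into a cookie string and a data URI.
 * Hypothetical helper, not part of the site's API.
 */
function parseCaptchaPayload(string $json): array
{
    $data = json_decode($json, true);
    if (!is_array($data) || !isset($data['Token'], $data['Dados'], $data['ContentType'])) {
        throw new InvalidArgumentException('Unexpected captcha payload');
    }

    return [
        // Cookie name assumed from the code in this question.
        'cookie'  => 'captcha_token=' . $data['Token'],
        // Dados is already base64, so it can be embedded directly.
        'dataUri' => 'data:' . $data['ContentType'] . ';base64,' . $data['Dados'],
    ];
}

$parsed = parseCaptchaPayload($sampleJson);
echo $parsed['cookie'], PHP_EOL;
echo $parsed['dataUri'], PHP_EOL;
```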
I capture the initial data and set the headers on the second request to retrieve the data. The problem is that it always returns an error (Invalid anti-robot characters. Try again.).
Here is the code I have so far (it may be messy or contain unnecessary lines; these are only tests, and I still need to clean it up):
ConsultaCnpjSimplesNacional.php:
class ConsultaCnpjSimplesNacional
{
    /**
     * Returns an array of parameters for the Simples Nacional CNPJ query
     * @return array
     */
    public static function getParams()
    {
        $ckfile = tempnam("/tmp", "CURLCOOKIE");
        // ini_set('xdebug.var_display_max_depth', 5);
        // ini_set('xdebug.var_display_max_children', 256);
        // ini_set('xdebug.var_display_max_data', 10000000000000);
        $urlConsulta = 'http://www8.receita.fazenda.gov.br/SimplesNacional/Aplicacoes/ATBHE/ConsultaOptantes.app/ConsultarOpcao.aspx';
        $chInicial = curl_init();
        curl_setopt($chInicial, CURLOPT_URL, $urlConsulta);
        curl_setopt($chInicial, CURLOPT_RETURNTRANSFER, 1);
        curl_setopt($chInicial, CURLOPT_VERBOSE, 1);
        curl_setopt($chInicial, CURLOPT_HEADER, 1);
        curl_setopt($chInicial, CURLOPT_FOLLOWLOCATION, 1);
        curl_setopt($chInicial, CURLOPT_COOKIEJAR, $ckfile);
        curl_setopt($chInicial, CURLOPT_COOKIEFILE, $ckfile);
        curl_setopt($chInicial, CURLOPT_TIMEOUT, 20000);
        curl_setopt($chInicial, CURLOPT_CONNECTTIMEOUT, 20000);
        $response = curl_exec($chInicial);
        require_once __DIR__ . DIRECTORY_SEPARATOR . 'simple_html_dom.php';
        $html = str_get_html($response);
        $inputViewStateValue = $html->getElementById('__VIEWSTATE')->value;
        $inputEventValidationValue = $html->getElementById('__EVENTVALIDATION')->value;
        $inputHiddenField1Value = $html->getElementById('ctl00_ContentPlaceHolderConteudo_HiddenField1')->value;
        $inputHddServidorCaptchaValue = $html->getElementById('hddServidorCaptcha')->value;
        $urlCaptchaContainer = $html->getElementById('captcha-container')->{'data-url'};
        $html->clear();
        unset($html);
        // This URL is defined in the captcha2.js JavaScript file
        //$chInicial = curl_init($urlCaptchaContainer . '/Captcha/Inicializa.ashx');
        curl_setopt($chInicial, CURLOPT_URL, $urlCaptchaContainer. '/Captcha/Inicializa.ashx');
        curl_setopt($chInicial, CURLOPT_RETURNTRANSFER, 1);
        curl_setopt($chInicial, CURLOPT_VERBOSE, 1);
        curl_setopt($chInicial, CURLOPT_HEADER, 1);
        curl_setopt($chInicial, CURLOPT_FOLLOWLOCATION, 1);
        curl_setopt($chInicial, CURLOPT_COOKIEJAR, $ckfile);
        curl_setopt($chInicial, CURLOPT_COOKIEFILE, $ckfile);
        curl_setopt($chInicial, CURLOPT_HTTPHEADER, ['Content-type' => 'application/x-www-form-urlencoded']);
        $response = curl_exec($chInicial);
        $header_size = curl_getinfo($chInicial, CURLINFO_HEADER_SIZE);
        $response = substr($response, $header_size);
        curl_close($chInicial);
        $jsonResponse = json_decode($response);
        $browser = new Browser();
        if ($browser->getBrowser() == Browser::BROWSER_IE /*&& $browser->getVersion() <= 8*/) {
            $randomName = utf8_encode(\Yii::$app->security->generateRandomString());
            $pasta = Url::to("@webroot/assets/temp_images_ie/$randomName.png");
            $imgReturn = Url::to("@web/assets/temp_images_ie/$randomName.png");
            file_put_contents("$pasta", $jsonResponse->Dados);
        } else {
            $imgReturn = 'data:image/png;base64,' . $jsonResponse->Dados;
        }
        return [
            'cookie' => 'captcha_token=' . $jsonResponse->Token,
            'viewState' => $inputViewStateValue,
            'eventValidation' => $inputEventValidationValue,
            'hiddenField1' => $inputHiddenField1Value,
            'hddServidorCaptcha' => $inputHddServidorCaptchaValue,
            'captchaBase64' => $imgReturn
        ];
    }
    public static function consulta($cnpj, $captcha, $stringCookie, $viewState, $eventValidation, $hiddenField, $servidorCaptcha)
    {
        $result = [];
        $ch = curl_init('http://www8.receita.fazenda.gov.br/SimplesNacional/Aplicacoes/ATBHE/ConsultaOptantes.app/ConsultarOpcao.aspx');
        $ckfile = tempnam("/tmp", "CURLCOOKIE");
        $options = [
            CURLOPT_HTTPHEADER => [
                'Accept' => 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8',
                'Accept-Encoding' => 'gzip, deflate',
                'Accept-Language' => 'pt-BR,pt;q=0.8,en-US;q=0.6,en;q=0.4',
                'Connection' => 'keep-alive',
                'Content-type' => 'application/x-www-form-urlencoded',
                'Cookie' => $stringCookie,
                'DNT' => 1,
                'Host' => 'www8.receita.fazenda.gov.br',
                'Origin' => 'http://www8.receita.fazenda.gov.br',
                'Referer' => 'http://www8.receita.fazenda.gov.br/SimplesNacional/Aplicacoes/ATBHE/ConsultaOptantes.app/ConsultarOpcao.aspx',
                'User-Agent' => 'Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/40.0.2214.93 Safari/537.36'
            ],
            CURLOPT_COOKIEJAR => $ckfile,
            CURLOPT_COOKIEFILE => $ckfile,
            CURLOPT_POST => TRUE,
            CURLOPT_RETURNTRANSFER => TRUE,
            CURLOPT_FOLLOWLOCATION => 1,
            CURLOPT_POSTFIELDS => [
                '__EVENTTARGET' => null,
                '__EVENTARGUMENT' => null,
                '__VIEWSTATE' => $viewState,
                '__EVENTVALIDATION' => $eventValidation,
                "ctl00\$ContentPlaceHolderConteudo\$$hiddenField" => $cnpj,
                'ctl00$ContentPlaceHolderConteudo$HiddenField1' => $hiddenField,
                'ctl00$ContentPlaceHolderConteudo$hddServidorCaptcha' => $servidorCaptcha,
                'ctl00$ContentPlaceHolderConteudo$txtTexto_captcha_serpro_gov_br' => $captcha,
                'ctl00$ContentPlaceHolderConteudo$btnConfirmar' => 'Consultar'
            ],
            CURLOPT_TIMEOUT => 20000,
            CURLOPT_CONNECTTIMEOUT => 20000
        ];
        curl_setopt_array($ch, $options);
        $response = curl_exec($ch);
        curl_close($ch);
        require_once __DIR__ . DIRECTORY_SEPARATOR . 'simple_html_dom.php';
        echo $response;
        exit();
        $html = str_get_html($response);
    }
}
query_cnpj_simples.php
<?php
/**
 * @var yii\web\View $this
 */
$this->title = 'Consulta CNPJ Simples Nacional';
try {
    $params = \app\common\components\consulta\ConsultaCnpjSimplesNacional::getParams();
} catch (Exception $e) {
    return Json::encode(['error' => 'error', 'mensagem' => $e->getMessage()]);
}
$form = ActiveForm::begin([
        'id' => 'form-consulta-cnpj',
        'enableClientScript' => false,
        'action' => Url::to(['consulta/processa-cnpj-simples'], true),
        'method' => 'POST'
    ]
);
?>
<img id="image_captcha" class="img-thumbnail" src="<?= $params['captchaBase64'] ?>"/><br/><br/>
<input type="hidden" id="cookie" name="cookie" value="<?= $params['cookie'] ?>">
<input type="hidden" id="viewState" name="viewState" value="<?= $params['viewState'] ?>">
<input type="hidden" id="eventValidation" name="eventValidation" value="<?= $params['eventValidation'] ?>">
<input type="hidden" id="hiddenField1" name="hiddenField1" value="<?= $params['hiddenField1'] ?>">
<input type="hidden" id="hddServidorCaptcha" name="hddServidorCaptcha" value="<?= $params['hddServidorCaptcha'] ?>">
<input type="text" name="input_captcha" id="input_captcha" placeholder="Digite o código da imagem">
<input type="text" name="cnpj" id="cnpj" placeholder="Digite o CNPJ" value="00175318000103">
<input type="submit" value="Enviar">
<?php ActiveForm::end();?>
ConsultaController.php
/**
 * Processes the CNPJ query request.
 */
public function actionProcessaCnpjSimples()
{
    try {
        $post = Yii::$app->request->post();
        if (!isset($post['cnpj']) ||
            !isset($post['input_captcha']) ||
            !isset($post['cookie']) ||
            !isset($post['viewState']) ||
            !isset($post['eventValidation']) ||
            !isset($post['hiddenField1']) ||
            !isset($post['hddServidorCaptcha'])
        )
            throw new Exception('Informe todos os campos!', 99);
        $formatter = new Formatter();
        $cnpj = $formatter->customOnlyNumberFormat($post['cnpj']);
        $return['code'] = 0;
        $return['message'] = Yii::t('app', 'Dados encontrados!');
        $resultado = ConsultaCnpjSimplesNacional::consulta(
            $cnpj,
            $post['input_captcha'],
            $post['cookie'],
            $post['viewState'],
            $post['eventValidation'],
            $post['hiddenField1'],
            $post['hddServidorCaptcha']
        );
        //$return = array_merge($return, $resultado);
    } catch (\Exception $e) {
        $return = ['code' => $e->getCode(), 'message' => $e->getMessage()];
    }
    echo '<pre>';
    print_r($return);
    echo '</pre>';
}
Note that the current structure is built on the Yii2 framework (I didn't think it was necessary to include it as a tag), but it can easily be taken apart and implemented elsewhere, since the "heart" is the ConsultaCnpjSimplesNacional class.
Has anyone ever needed to implement a query for Simples Nacional, or could you indicate a way of not validating the images (captcha)?
The class uses PHP Simple HTML DOM Parser to parse the HTML and read the field values.
Answer:
When making the captcha request, you are not getting the image associated with the query.
That's because the server tells queries apart by cookie, which you didn't set. There are two ways to do this:
- saving the cookie in a variable in the first query, and then setting that same cookie when getting the captcha
- doing it automatically, using a cookie container before making any query. This is close to the way browsers handle cookies.
$filename = 'C:\pasta\cookie.txt'; // must contain the absolute path to the file
curl_setopt($ch, CURLOPT_COOKIEJAR, $filename);
curl_setopt($ch, CURLOPT_COOKIEFILE, $filename);
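The first option (keeping the cookie in a variable) could look roughly like the sketch below. It assumes the first request was made with CURLOPT_HEADER enabled, so the raw response headers are available as a string. The function name cookieHeaderFromResponse and the sample header block, including the cookie names, are hypothetical.

```php
<?php
/**
 * Collects the "name=value" pairs from the Set-Cookie lines of a raw header
 * block and joins them into a value usable as a Cookie request header.
 * Hypothetical helper for illustration.
 */
function cookieHeaderFromResponse(string $rawHeaders): string
{
    $pairs = [];
    foreach (preg_split('/\r\n|\n/', $rawHeaders) as $line) {
        if (stripos($line, 'Set-Cookie:') === 0) {
            // Keep only "name=value"; attributes such as path or HttpOnly are dropped.
            $cookie  = trim(substr($line, strlen('Set-Cookie:')));
            $pairs[] = trim(explode(';', $cookie, 2)[0]);
        }
    }

    return implode('; ', $pairs);
}

// Made-up header block in the shape curl returns when CURLOPT_HEADER is on.
$rawHeaders = "HTTP/1.1 200 OK\r\n"
    . "Set-Cookie: ASP.NET_SessionId=xyz; path=/; HttpOnly\r\n"
    . "Set-Cookie: other=1; path=/\r\n"
    . "Content-Type: text/html\r\n\r\n";

$cookieValue = cookieHeaderFromResponse($rawHeaders);
echo $cookieValue, PHP_EOL; // ASP.NET_SessionId=xyz; other=1

// The same value would then be sent on the captcha request, for example:
// curl_setopt($ch, CURLOPT_COOKIE, $cookieValue);
```

The cookie container is usually less work, since curl then also handles cookies set by later responses; the manual variant is mostly useful when you need to see exactly what is being sent.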
This is a basic problem that I identified by looking over the code. Perhaps there are others, and identifying them is difficult, as programming a robot with curl in PHP is first and foremost a reverse engineering job.
I suggest enabling curl's verbose mode with curl_setopt($ch, CURLOPT_VERBOSE, 1); and comparing the headers and GET/POST data you send with what you observe in the Network tab of Chrome's Developer Tools when doing a query manually (check Preserve log). If everything is identical, it will work.
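One thing worth checking during that comparison: CURLOPT_HTTPHEADER takes a plain list of "Name: value" strings, so headers passed as an associative array may not show up in the verbose output the way you expect. The sketch below converts an associative array into that list form and reports which headers differ from the ones copied from the browser. Both function names and the sample values are hypothetical.

```php
<?php
/**
 * Turns ['Name' => 'value'] into ['Name: value'], the list-of-strings form
 * that CURLOPT_HTTPHEADER expects.
 */
function toHeaderLines(array $headers): array
{
    $lines = [];
    foreach ($headers as $name => $value) {
        $lines[] = $name . ': ' . $value;
    }

    return $lines;
}

/**
 * Returns the browser headers that are missing from, or different in, the
 * script's headers. Header names are compared case-insensitively.
 */
function diffHeaders(array $browser, array $script): array
{
    $browser = array_change_key_case($browser, CASE_LOWER);
    $script  = array_change_key_case($script, CASE_LOWER);

    $diff = [];
    foreach ($browser as $name => $value) {
        if (!array_key_exists($name, $script) || $script[$name] !== $value) {
            $diff[$name] = ['browser' => $value, 'script' => $script[$name] ?? null];
        }
    }

    return $diff;
}

// Made-up values: the first array stands for what the Network tab shows.
$browserHeaders = [
    'Content-Type' => 'application/x-www-form-urlencoded',
    'Cookie'       => 'captcha_token=abc',
];
$scriptHeaders = [
    'content-type' => 'application/x-www-form-urlencoded',
];

$headerLines = toHeaderLines($browserHeaders);
$missing     = diffHeaders($browserHeaders, $scriptHeaders);

print_r($headerLines);
print_r($missing);
```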