php – Extract data from a Facebook profile by searching for an email address

Question:

I need to check whether a Facebook profile exists for a given email address.

As far as I can tell, the API offers no way to do this.

However, the Facebook site has this URL:

https://www.facebook.com/search/all/?q=@

If I replace the @ with a valid email address, it finds the profile.

My question is:

1 – How can I request this URL dynamically from PHP with file_get_contents and get the profile name and photo?

Note that when I open the URL in a browser with a valid email address, it shows the name, profile picture, and so on.

Thanks

Answer:

<?php
date_default_timezone_set('Asia/Tokyo');

ini_set('error_reporting', E_ALL & ~E_STRICT & ~E_DEPRECATED); // & ~E_NOTICE
ini_set('log_errors', true);
ini_set('html_errors', false);
ini_set('display_errors', true);

define('CHARSET', 'UTF-8');

ini_set('default_charset', CHARSET);
mb_http_output(CHARSET);
mb_internal_encoding(CHARSET);
mb_regex_encoding(CHARSET);

header('Content-Type: text/html; charset='.CHARSET);


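/*
The relevant part starts here. The code above is only bootstrap.
*/

$email = 'email@que.deseja.buscar'; // placeholder: the email address to search for
// URL-encode the address so characters such as "@" and "+" survive in the query string.
$url = 'https://www.facebook.com/search/all/?q='.urlencode($email);
$data = file_get_contents($url);
if ($data === false) {
    exit('Request failed.');
}
// Decode entities and strip the comment markers that wrap the result markup.
$data = html_entity_decode($data);
$data = str_replace(array('<!-- ', ' -->'), '', $data);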

class Foo {

    private $data;
    private $dom;

    public function __construct($data) {
        $this->data = $data;
        $this->dom = new DOMDocument();
        $this->dom->validateOnParse = false;
        $this->dom->preserveWhiteSpace = true;
    }

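    public function htmlGetContentBySelector($query, $data = null) {
        if (!empty($data)) {
            $this->data = $data;
        }
        // Collect libxml parse warnings internally instead of printing them.
        libxml_use_internal_errors(true);
        @$this->dom->loadHTML($this->data);
        libxml_clear_errors();
        libxml_use_internal_errors(false);
        $xpath = new DOMXPath($this->dom);
        $xpath_resultset = $xpath->query($query);
        // Return an empty string when nothing matches; saveHTML(null) would dump the whole document.
        if ($xpath_resultset === false || $xpath_resultset->length === 0) {
            return '';
        }
        return $this->dom->saveHTML($xpath_resultset->item(0));
    }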
}

$c = new Foo($data);

$query = "//code[@id='u_0_d']";
$rs = $c->htmlGetContentBySelector($query);
// The full result
// Prints the whole block
//echo $rs; exit;

/*
Now let's filter and extract the parts we care about.

Here we get the photo.
*/
$query = "//img[@class='_fbBrowseXuiResult__profileImage img']";
$pic = $c->htmlGetContentBySelector($query, $rs);
echo $pic;

/*
Output:
<img class="_fbBrowseXuiResult__profileImage img" src="https://scontent-nrt1-1.xx.fbcdn.net/v/t1.0-1/c17.0.100.100/p100x100/FOTO-DO-PERFIL" width="100" height="100" alt="PROFILE NAME">
*/

/*
The profile name and URL.
*/
$query = "//div[@class='_gll']";
$name = $c->htmlGetContentBySelector($query, $rs);
echo $name;

/*
Output:
<div class="_gll"><div><a href="https://www.facebook.com/pagina-da-pessoa"><div class="_5d-4"><div class="_5d-5">PROFILE NAME</div></div></a></div></div>
*/

/*
The company where the person works.
*/
$query = "//div[@class='_glm']";
$job = $c->htmlGetContentBySelector($query, $rs);
echo $job;

/*
Output:
<div class="_glm"><div class="_pac" data-bt="{" ct>Job: <a href="https://www.facebook.com/pages/pagina-da-empresa">COMPANY NAME</a><div class="_1my"></div>
</div></div>
*/

The results still contain HTML markup, but they are easy to process further if you want to strip the HTML and keep only the data.
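As an illustration of that last step, here is a minimal sketch that pulls plain values out of fragments shaped like the outputs shown above. The helper name fragmentXPath is hypothetical, and the sample strings are placeholders modeled on those outputs, not real Facebook responses.

```php
<?php
// Placeholder fragments modeled on the sample outputs in this answer.
$picHtml  = '<img class="_fbBrowseXuiResult__profileImage img" src="https://scontent-nrt1-1.xx.fbcdn.net/v/t1.0-1/c17.0.100.100/p100x100/FOTO-DO-PERFIL" width="100" height="100" alt="PROFILE NAME">';
$nameHtml = '<div class="_gll"><div><a href="https://www.facebook.com/pagina-da-pessoa"><div class="_5d-4"><div class="_5d-5">PROFILE NAME</div></div></a></div></div>';

/**
 * Hypothetical helper: parse an HTML fragment and return a DOMXPath over it.
 */
function fragmentXPath(string $html): DOMXPath
{
    $dom = new DOMDocument();
    libxml_use_internal_errors(true);   // collect parse warnings instead of printing them
    // The XML declaration hints UTF-8 so non-ASCII names are not mangled.
    $dom->loadHTML('<?xml encoding="UTF-8">' . $html);
    libxml_clear_errors();
    libxml_use_internal_errors(false);
    return new DOMXPath($dom);
}

// string(...) makes evaluate() return a plain string instead of a node list.
$photoUrl   = fragmentXPath($picHtml)->evaluate('string(//img/@src)');
$nameXPath  = fragmentXPath($nameHtml);
$profileUrl = $nameXPath->evaluate('string(//a/@href)');
$name       = trim($nameXPath->evaluate('string(//a)'));

echo $name, "\n", $profileUrl, "\n", $photoUrl, "\n";
```

The XPath expressions here deliberately avoid the obfuscated class names, so this step keeps working as long as the fragment still contains one link and one image.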

The $rs variable contains something like this:

string(1558) "<code id="u_0_d"><!-- <div class="_4-u2 _4-u8"><div id="all_search_results" data-bt="{"session_id":"5505924b49749c699b44850e32fe24fa","typeahead_sid":null,"result_type":"all","referrer":"","path":"\\/search\\/all\\/","experience_type":"simplepps"}"><div class="_1yt"><div class="_3u1 _gli _5und" data-bt="{"id":1251714145,"rank":null,"abtest_version":null,"abtest_params":[null],"section":"main_column","owner_id":null,"sub_id":null,"browse_location":null,"query_data":{"q":"email\\u0040que.deseja.buscar"},"is_headline":false}"><div class="_401d"><div class="clearfix"><a class="_fbBrowseXuiResult__profileImageLink _8o _8s lfloat _ohe" href="https://www.facebook.com/pagina.da.pessoa" aria-hidden="true" tabindex="-1"><img class="_fbBrowseXuiResult__profileImage img" src="https://scontent-nrt1-1.xx.fbcdn.net/v/t1.0-1/c17.0.100.100/p100x100/xxxxxx-FOTO-DA-PESSOa-xxxxx_n.jpg?oh=74ae0b9e2cc130f9800f98d35d64ce36&oe=58AAB17B" width="100" height="100" alt="wa wa" /></a><div class="_42ef"><div class="_glj"><div class="clearfix"><div class="_glk rfloat _ohf"></div><div class="_gll"><div><a href="https://www.facebook.com/pagina.da.pessoa"><div class="_5d-4"><div class="_5d-5">NOME DA PESSOA   </div></div></a></div></div></div><div><div class="_glm"><div class="_pac" data-bt="{"ct":"sub_headers"}">Job: <a href="https://www.facebook.com/pages/página-empresa-onde-trabalha/codigo-qualquer">NOME DA EMPRESA ONDE TRABALHA</a><div class="_1my"></div></div></div><div class="_glo"></div></div><div class="_glp"></div><div class="_3t0c"></div></div></div></div></div></div></div></div></div> --></code>"
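As the dump shows, the result markup sits inside an HTML comment within the code element. The script above handles this by stripping the comment markers from the whole page with str_replace. A possible alternative, sketched here on a shortened placeholder page rather than a real response, is to read the comment node directly with XPath:

```php
<?php
// Shortened placeholder page: the result markup is wrapped in an HTML comment.
$page = '<html><body><code id="u_0_d"><!-- <div class="_gll"><div>'
      . '<a href="https://www.facebook.com/pagina.da.pessoa"><div class="_5d-5">PERSON NAME</div></a>'
      . '</div></div> --></code></body></html>';

$dom = new DOMDocument();
libxml_use_internal_errors(true);   // collect parse warnings instead of printing them
$dom->loadHTML($page);
libxml_clear_errors();
libxml_use_internal_errors(false);

$xpath = new DOMXPath($dom);

// comment() selects the comment node that is a direct child of the code element.
$comment = $xpath->query("//code[@id='u_0_d']/comment()")->item(0);

// The comment's value is the hidden markup; it is empty if the layout changed.
$inner = $comment !== null ? trim($comment->nodeValue) : '';

echo $inner, "\n";
```

The string in $inner can then be parsed as its own fragment, which avoids touching unrelated comments elsewhere in the page.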

Note: The Facebook URL will not return data for profiles that are configured to hide it.

I can't say whether the result can contain more than one profile. But since each email address is unique to one profile, it is reasonable to extract the name, URL, and profile picture from the first result without worrying about that.
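If the page ever did return several results, looping over every match instead of taking only the first one would cover that case. The following is a minimal sketch on made-up markup with two results. The _gll class comes from the output above, while the two profiles are invented for illustration.

```php
<?php
// Made-up markup with two results, using the "_gll" class seen in the output above.
$html = '<div>'
      . '<div class="_gll"><a href="https://www.facebook.com/first.person">FIRST PERSON</a></div>'
      . '<div class="_gll"><a href="https://www.facebook.com/second.person">SECOND PERSON</a></div>'
      . '</div>';

$dom = new DOMDocument();
libxml_use_internal_errors(true);   // collect parse warnings instead of printing them
$dom->loadHTML('<?xml encoding="UTF-8">' . $html);
libxml_clear_errors();
libxml_use_internal_errors(false);

$xpath = new DOMXPath($dom);

// Collect every matching link rather than only item(0).
$profiles = array();
foreach ($xpath->query("//div[@class='_gll']//a") as $link) {
    $profiles[] = array(
        'name' => trim($link->textContent),
        'url'  => $link->getAttribute('href'),
    );
}

print_r($profiles);
```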

It is also important to note that the values of the class and id attributes can change. The script above may stop working for that or any other reason, because it is a workaround and not an official, documented method.
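One way to notice such a change early is to treat "no match" as an explicit condition instead of passing it along silently. This is a minimal sketch under that assumption. The function name firstMatchHtml and the class name _renamed are hypothetical.

```php
<?php
/**
 * Hypothetical helper: return the HTML of the first node matching $query,
 * or null when nothing matches (for example after a markup change).
 */
function firstMatchHtml(string $html, string $query): ?string
{
    $dom = new DOMDocument();
    libxml_use_internal_errors(true);   // collect parse warnings instead of printing them
    $dom->loadHTML('<?xml encoding="UTF-8">' . $html);
    libxml_clear_errors();
    libxml_use_internal_errors(false);

    $nodes = (new DOMXPath($dom))->query($query);
    if ($nodes === false || $nodes->length === 0) {
        return null;
    }

    $out = $dom->saveHTML($nodes->item(0));
    return $out === false ? null : $out;
}

// Placeholder fragment modeled on the sample output above.
$html = '<div class="_gll"><a href="https://www.facebook.com/pagina-da-pessoa">PROFILE NAME</a></div>';

$found   = firstMatchHtml($html, "//div[@class='_gll']");
$missing = firstMatchHtml($html, "//div[@class='_renamed']");

if ($missing === null) {
    echo "Selector no longer matches; the page markup may have changed.\n";
}
echo $found, "\n";
```

Returning null makes it easy for the caller to log the failure or stop, rather than echoing an empty or unrelated block.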

Be aware that an abnormal volume of requests can get your IP address blocked, so use it sparingly.
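To keep the request volume low, one option is to cache responses and enforce a pause between real requests. The class below is a minimal sketch of that idea and not part of the original script. The name ThrottledFetcher is hypothetical, and the demo uses a stub in place of file_get_contents so that it runs without network access.

```php
<?php
/**
 * Hypothetical helper: caches responses per URL and waits at least
 * $minIntervalSeconds between two real fetches.
 */
final class ThrottledFetcher
{
    private $fetch;                 // callable(string $url): string
    private $minInterval;
    private $cache = array();
    private $lastCall = 0.0;

    public function __construct(callable $fetch, float $minIntervalSeconds)
    {
        $this->fetch = $fetch;
        $this->minInterval = $minIntervalSeconds;
    }

    public function get(string $url): string
    {
        // Serve repeated lookups from memory without touching the network.
        if (isset($this->cache[$url])) {
            return $this->cache[$url];
        }

        // Sleep until the minimum interval since the last real fetch has passed.
        $wait = $this->lastCall + $this->minInterval - microtime(true);
        if ($wait > 0) {
            usleep((int) round($wait * 1000000));
        }

        $this->cache[$url] = (string) call_user_func($this->fetch, $url);
        $this->lastCall = microtime(true);
        return $this->cache[$url];
    }
}

// Demo with a stub fetcher; in real use you would pass 'file_get_contents'
// and a much longer interval (the right value is an assumption you must choose).
$calls = 0;
$fetcher = new ThrottledFetcher(function (string $url) use (&$calls) {
    $calls++;
    return 'body of ' . $url;
}, 0.2);

$fetcher->get('https://www.facebook.com/search/all/?q=a%40example.com');
$fetcher->get('https://www.facebook.com/search/all/?q=a%40example.com'); // served from cache
$fetcher->get('https://www.facebook.com/search/all/?q=b%40example.com'); // waits, then fetches

echo $calls, "\n"; // 2
```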
