Introduction
I am developing my presentation site and I want to include my Stack Overflow profile info/posts/data (e.g. top tag, score, and so on).
I found data.stackexchange.com to retrieve the desired data, but I can't work out how to show this data on my site.
On GitHub I found these prerequisites: https://github.com/StackExchange/StackExchange.DataExplorer#prerequisites which basically say that I must be a .NET programmer to display this data, but I am a PHP programmer; I work with Apache, MySQL, and PHP.
I know there are plenty of PHP MSSQL functions I could use, but how can I connect to the Stack Exchange database (as a guest/limited user, I assume), and with which username and password?
Even if this is not quite on-topic here, where can I find more info on how to display Stack Overflow data on my site?
I recommend checking out http://simplehtmldom.sourceforge.net/
Something like this should get the reputation using the PHP Simple HTML DOM Parser:
include('simple_html_dom.php');
$html = file_get_html('https://stackoverflow.com/users/5039442/thetaskmaster');
$reputation = $html->find('.reputation', 0)->plaintext;
Although CONFUS3D's answer is a good solution, any change to the site's user interface may break your page.
I suggest using the Stack Exchange API instead, which lets you retrieve most of the data you'll probably need.
Any API query returns a JSON object. I use this PHP class to retrieve that object:
class ApiReader {
    public function getResponse($url) {
        $cH = curl_init();
        curl_setopt($cH, CURLOPT_URL, $url);
        curl_setopt($cH, CURLOPT_HEADER, 0);
        curl_setopt($cH, CURLOPT_RETURNTRANSFER, true);
        curl_setopt($cH, CURLOPT_TIMEOUT, 30);
        curl_setopt($cH, CURLOPT_USERAGENT, "cURL"); // must be a quoted string, not a bare constant
        curl_setopt($cH, CURLOPT_FOLLOWLOCATION, true);
        curl_setopt($cH, CURLOPT_ENCODING, "gzip"); // the API always returns gzipped responses
        $result = curl_exec($cH);
        if (curl_errno($cH)) {
            $retur = FALSE;
        } else {
            $status = curl_getinfo($cH, CURLINFO_HTTP_CODE);
            $retur = ($status == 200) ? $result : FALSE;
        }
        curl_close($cH);
        return $retur;
    }
}
I use this little trick to test the site even when I am offline.
On your host, save all the JSON objects you need, then declare two variables: $UInfo_API, containing the API query, and $UInfo_Syn, which gets the content of the saved JSON object:
$UInfo_API = "https://api.stackexchange.com/2.2/users/5039442?site=stackoverflow";
$UInfo_Syn = file_get_contents("yourjsonobject.json");
Then save the result in a variable, checking whether the getResponse() method failed. After that, you have the data on tap:
$sear = new ApiReader();
$uInfo = $sear->getResponse($UInfo_API);
$uInfo = ($uInfo !== FALSE) ? json_decode($uInfo, TRUE) : json_decode($UInfo_Syn, TRUE);
$rep = $uInfo["items"][0]["reputation"];
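The saved JSON object itself can be produced the same way. A minimal sketch of that caching step, assuming the file name and user ID from this answer (fetch the response once while online and store it, so $UInfo_Syn has something to fall back on later):

```php
<?php
// Fetch a Stack Exchange API response and cache it to a local file.
// Returns the raw JSON string, or false on failure.
function cacheApiResponse($url, $cacheFile) {
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_TIMEOUT, 10);
    curl_setopt($ch, CURLOPT_ENCODING, "gzip"); // the API gzips responses
    $json = curl_exec($ch);
    curl_close($ch);
    if ($json !== false) {
        file_put_contents($cacheFile, $json);
    }
    return $json;
}

cacheApiResponse(
    "https://api.stackexchange.com/2.2/users/5039442?site=stackoverflow",
    "yourjsonobject.json"
);
```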
I'm trying to fetch the number of followers of an Instagram account through web scraping and cURL. Using their API might be easier, but I want to know why this won't work, because in many cases I've gotten the data through the HTML.
static $url = 'https://www.instagram.com/cats_of_instagram/';

function getUrlContent($url) {
    try {
        $curl_connection = curl_init($url);
        curl_setopt($curl_connection, CURLOPT_CONNECTTIMEOUT, 30);
        curl_setopt($curl_connection, CURLOPT_RETURNTRANSFER, true);
        curl_setopt($curl_connection, CURLOPT_SSL_VERIFYPEER, false);
        // Data are stored in $data
        $data = curl_exec($curl_connection);
        $position = strpos($data, "<span data-reactid=\".0.1.0.0:0.1.3.1.0.2\"> followers</span>");
        print_r($position);
        curl_close($curl_connection);
    } catch (Exception $e) {
        return $e->getMessage();
    }
}
The problem is that strpos does not return a position:
$position = strpos($data,"<span data-reactid=\".0.1.0.0:0.1.3.1.0.2\"> followers</span>");
You can't do that.
The element you're looking for is rendered by JavaScript after the page has loaded.
curl doesn't wait for scripts to run (nor does it run any); it just returns the HTML.
You can easily verify this by printing $data, or by looking at the page's source.
To "see" the element you're looking for, you need to look in the DOM inspector instead.
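To make that check concrete, here's a small sketch: a helper that reports whether a marker string appears in raw HTML. Run it on curl's output and you'll see the follower span never appears server-side (the sample HTML below is invented for illustration):

```php
<?php
// Helper: does a marker string appear in the raw (pre-JavaScript) HTML?
function markerInRawHtml($html, $marker) {
    return strpos($html, $marker) !== false;
}

// Simulated raw response: the follower span only exists after scripts run,
// so the marker is absent from what curl returns.
$rawHtml = '<html><body><script>renderFollowers();</script></body></html>';
var_dump(markerInRawHtml($rawHtml, ' followers</span>')); // bool(false)
```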
We have a JIRA instance that our custom PHP app (built in Laravel) pulls from; for each issue it checks whether a specific branch or tag exists:
chdir($path . $repo);
exec("git rev-parse --verify " . $branch, $branch_dump, $return_var);
if ($return_var == 0) {
    return true;
} else {
    return false;
}
However, we have migrated all of our git projects to GitLab and that method no longer works, since you need root to get into GitLab's repo data directory.
We looked at GitLab's API and found that we could do this:
http://gitlab/api/v3/projects/10/repository/commits/OUR-TAG-HERE?private_token=XXX
However this requires us to specify an arbitrary GitLab project ID (10 in this case) and therefore isn't predictable, so we can't programmatically execute the search for each JIRA API return like we did before. This method would work if we could simply search for tags using the project name only, but I can't find a way to do that.
Here's an overview of how the app works:
JIRA contains all issues we want
Each issue contains several custom fields we use to search our git repos with, generically they are "Repo Name" and "Tag Name"
Our Laravel app connects to JIRA's api and harvests all issues into an array we use to build a table listing information about each issue
The two custom fields "Repo Name" and "Tag Name" are matched against our git repositories to determine which of several options to provide the end user (clone tag, create tag if repo exists but no tag exists, or none if neither)
We briefly considered adding another custom field to our JIRA issues which we would fill with GitLab's project ID, but we have hundreds of issues and it is an inelegant solution that really only acts as another potential point of failure, to say nothing of the extra maintenance.
Any ideas?
The best solution I found to this issue was to use the API to get the list of projects and use that list to pair name and ID.
For example, this code will output the tag names for all your projects:
// Get the projects list via the API
$header = array("PRIVATE-TOKEN: <YOUR_TOKEN>");
$ch = curl_init("https://<YOUR_GITLAB_DOMAIN>/api/v3/projects/");
curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, 0);
curl_setopt($ch, CURLOPT_HTTPHEADER, $header);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
$result = curl_exec($ch);
curl_close($ch);

// Parse the returned list into an array
$projectsArray = json_decode($result, true);

// Loop over the array of projects, accessing each project's tags via the API
foreach ($projectsArray as $project) {
    echo $project["name"] . " Tags:<br>";
    $tagURL = "https://<YOUR_GITLAB_DOMAIN>/api/v3/projects/" . $project["id"] . "/repository/tags";
    $ch2 = curl_init($tagURL);
    curl_setopt($ch2, CURLOPT_SSL_VERIFYPEER, 0);
    curl_setopt($ch2, CURLOPT_HTTPHEADER, $header);
    curl_setopt($ch2, CURLOPT_RETURNTRANSFER, 1);
    $result2 = curl_exec($ch2);
    curl_close($ch2);
    $tagsArray = json_decode($result2, true);
    foreach ($tagsArray as $tag) {
        echo $tag["name"] . "<br>";
    }
    echo "<br>";
}
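If you only need one project's ID, you can also build a name-to-ID map from the same /projects response instead of looping over every project's tags. A sketch (the sample array below just mimics the shape the endpoint returns):

```php
<?php
// Pair project names with IDs so later code can look up the ID it needs
// by name rather than hard-coding it.
function buildProjectIdMap(array $projectsArray) {
    $map = array();
    foreach ($projectsArray as $project) {
        $map[$project["name"]] = $project["id"];
    }
    return $map;
}

// Example with the shape the /projects endpoint returns:
$projectsArray = array(
    array("id" => 10, "name" => "my-app"),
    array("id" => 11, "name" => "my-lib"),
);
$idMap = buildProjectIdMap($projectsArray);
echo $idMap["my-app"]; // 10
```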
Since arbitrary project IDs are still required by the GitLab API for this functionality we've scrapped the API altogether. Instead we're now simply cURLing HTTP response codes. Here's one of our methods to see if the issue has a tag:
public function HasTag($projectName, $nameSpace, $tagName)
{
    $url = $this->gitLabUrl . '/' . $nameSpace . '/' . $projectName . '/tags/' . $tagName;
    $ch = curl_init(); // Initiate curl
    curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, false); // Disable SSL verification
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true); // Return the response instead of printing it
    curl_setopt($ch, CURLOPT_URL, $url); // Set the url
    curl_exec($ch); // Execute
    $info = curl_getinfo($ch);
    curl_close($ch);
    if ($info['http_code'] == 200) {
        return true;
    } else {
        return false;
    }
}
And here's our method to check for a branch:
public function HasBranch($projectName, $nameSpace, $branchName)
{
    $url = $this->gitLabUrl . '/' . $nameSpace . '/' . $projectName . '/tree/' . $branchName;
    $ch = curl_init(); // Initiate curl
    curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, false); // Disable SSL verification
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true); // Return the response instead of printing it
    curl_setopt($ch, CURLOPT_URL, $url); // Set the url
    curl_exec($ch); // Execute
    $info = curl_getinfo($ch);
    curl_close($ch);
    if ($info['http_code'] == 200) {
        return true;
    } else {
        return false;
    }
}
As you can see this is pretty simple and hacky, but it works for our implementation because none of the projects being accessed are private (our GitLab instance is purely internal).
Hopefully in the future GitLab will remove the ID requirement from its API.
I need to create some categories in Disqus. I tried to do it with JavaScript, but that doesn't work because it requires a POST request, and JSONP only works with GET requests. So I tried to use cURL server-side. Here is my code:
public function createDisqusCategory($title, $forum)
{
    $access_token = ACCESS_TOKEN;
    $secret_key = SECRET_KEY;
    $public_key = PUBLIC_KEY;
    $url = 'https://disqus.com/api/3.0/categories/create.json';
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_POST, 1);
    curl_setopt($ch, CURLOPT_HTTPHEADER, array('Content-Type: application/json'));
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
    curl_setopt($ch, CURLOPT_POSTFIELDS, "access_token=$access_token&api_secret=$secret_key&api_key=$public_key&forum=$forum&title=$title");
    $result = curl_exec($ch);
    curl_close($ch);
    return $result;
}
and the response is {"code": 22, "response": "You do not have admin privileges on forum '...'"}
How can I solve this problem?
Does your application have Default access set to "Read, write and manage forums"? If not, you'll either need to add a "scope" parameter to your POSTFIELDS, or set default access to manage forums in your application settings. Here's our documentation on scopes: http://disqus.com/api/docs/permissions/
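For the first option, here is a sketch of the POST body with a scope parameter appended. The placeholder credentials are assumptions, and you should verify the exact scope string against the permissions documentation for your application:

```php
<?php
// Hypothetical placeholders standing in for the question's constants.
$access_token = "<ACCESS_TOKEN>";
$secret_key   = "<SECRET_KEY>";
$public_key   = "<PUBLIC_KEY>";
$forum        = "myforum";
$title        = "My category";

// Same form-encoded body as the question, plus a scope parameter.
$fields = "access_token=" . urlencode($access_token)
        . "&api_secret="  . urlencode($secret_key)
        . "&api_key="     . urlencode($public_key)
        . "&forum="       . urlencode($forum)
        . "&title="       . urlencode($title)
        . "&scope="       . urlencode("read,write,admin");
echo $fields;
```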
On another note, categories in Disqus are limited to use with the API, so it's not useful in any way unless you're querying comments/threads using a custom script. If you are, I'd also advise keeping it to about 5 categories maximum, or else it can really slow down queries.
I am trying to write a tool in PHP that detects whether a remote website uses Flash. So far I have written a script that detects whether embed or object elements exist, which is an indicator that Flash may be present, but some sites obfuscate or dynamically generate their code, which renders this function useless.
include_once('simple_html_dom.php');

$flashTotalCount = 0;

function file_get_contents_curl($url) {
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_HEADER, 0);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);
    $data = curl_exec($ch);
    curl_close($ch);
    return $data;
}

// find() needs a simple_html_dom object, not the raw HTML string
$html = str_get_html(file_get_contents_curl($url));

foreach ($html->find('embed') as $pageEmbed) {
    $flashTotalCount++;
}
foreach ($html->find('object') as $pageObject) {
    $flashTotalCount++;
}

if ($flashTotalCount == 0) {
    echo "NO FLASH";
} else {
    echo "FLASH";
}
Would anyone know of a way to check whether a website uses Flash, or whether it's possible to get header information indicating that Flash is being used?
Any advice would be helpful.
As far as I understand, Flash can be loaded by JavaScript, so you would need to actually execute the web page. For that you'll have to use a tool like this:
http://seleniumhq.org/docs/02_selenium_ide.html#the-waitfor-commands-in-ajax-applications
I don't think it is usable from PHP.
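Short of executing the page, one partial workaround from PHP is to scan the raw HTML, including script bodies, for tell-tale Flash references: .swf URLs, SWFObject calls, or the Flash MIME type. This is only a heuristic and won't catch everything, but it covers many JavaScript-embedded players. A sketch:

```php
<?php
// Heuristic: look for common Flash fingerprints in the raw markup.
function looksLikeFlash($html) {
    return preg_match('/\.swf\b|swfobject|application\/x-shockwave-flash/i', $html) === 1;
}

var_dump(looksLikeFlash('<script>swfobject.embedSWF("player.swf", "el", "640", "480", "9");</script>')); // bool(true)
var_dump(looksLikeFlash('<p>No Flash here</p>')); // bool(false)
```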
I am trying to retrieve the content of web pages and check if each page contains certain error keywords I am monitoring. (Instead of manually loading each URL every time to check on the sites, I hope to do this programmatically and flag errors when they occur.)
I have tried XMLHttpRequest. I can get the HTML content, like what I see when I "view source" on the page. But the pages I monitor run on SharePoint and the web parts are generated dynamically. I believe that if an error occurs when loading those parts, I would not be able to flag it, as the HTML I pull will not contain the errors, just the usual paths to the web parts.
cURL seems to do the same. I just read about DOMDocument, and I was wondering whether DOMDocument processes the code or just breaks the HTML into a hierarchical structure.
I only wish to have the content of the URL (like what you get when you save a website as txt in IE, not the HTML). Or if I can further process the HTML, that would be good too. How can I do that? Any help will be really appreciated. :)
Why do you want to strip the HTML? It's better to use it!
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 5);
$data = curl_exec($ch);
curl_close($ch);

// libxml_use_internal_errors(true);
$oDom = new DOMDocument();
$oDom->loadHTML($data);

// Go through the DOM and look for the error (it works the same whether it's
// <p class="error">error message</p> or whatever)
$errors = $oDom->getElementsByTagName("error"); // or however you get errors
foreach ($errors as $error) {
    if (strstr($error->nodeValue, 'SOME ERROR')) {
        echo 'SOME ERROR occurred';
    }
}
If you don't want to do that, you can just do:
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 5);
$data = curl_exec($ch);
curl_close($ch);
if (strstr($data, 'SOME_ERROR')) {
    echo 'SOME ERROR occurred';
}
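And if you really do want plain text like "save as txt" gives you, a rough sketch: remove script and style blocks first, then strip the remaining tags before running the keyword check. The regex here is a simplification and will miss pathological markup:

```php
<?php
// Reduce an HTML page to its visible-ish text for keyword scanning.
function htmlToText($html) {
    // Drop script/style blocks, since strip_tags keeps their contents.
    $html = preg_replace('/<(script|style)\b[^>]*>.*?<\/\1>/is', '', $html);
    return trim(strip_tags($html));
}

$page = '<html><head><script>var x = 1;</script></head><body><p>An error occurred</p></body></html>';
echo htmlToText($page); // An error occurred
```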