I have a small element on my website that displays the validity of the current page's markup. At the moment, it is statically set to "HTML5 Valid", because I regularly check that the page is, in fact, valid HTML5. If it isn't, I fix the issues so it stays valid.
I would like this element to be dynamic, though. Is there a way to send the current URL to the W3C Validation Service, receive the result, and pass it to a PHP or JavaScript function? Does the W3C offer an API for this, or do I have to code it manually?
Maintainer of the W3C HTML Checker (aka validator) here. In fact the checker does expose an API that lets you do, for example:
https://validator.w3.org/nu/?doc=https%3A%2F%2Fgoogle.com%2F&out=json
…which gives you the results back as JSON. There’s also a POST interface.
You can find more details here:
https://github.com/validator/validator/wiki/Service-»-HTTP-interface
https://github.com/validator/validator/wiki/Service-»-Input-»-POST-body
https://github.com/validator/validator/wiki/Service-»-Input-»-GET
https://github.com/validator/validator/wiki/Output-»-JSON
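As a rough sketch of how the GET interface above could feed the badge from the question: the snippet below builds the request URL and turns a decoded JSON result into a badge label. `buildCheckerUrl` and `badgeLabel` are hypothetical helper names, the badge strings are assumptions, and treating only messages of type "error" as failures is an assumption as well. Fetching the URL (with curl or file_get_contents) and calling json_decode on the body is left out.

```php
<?php
// Hypothetical helper: build the checker URL for a page, with JSON output.
function buildCheckerUrl(string $pageUrl): string
{
    return 'https://validator.w3.org/nu/?' . http_build_query([
        'doc' => $pageUrl, // http_build_query URL-encodes the value
        'out' => 'json',
    ]);
}

// Hypothetical helper: map a decoded checker response to a badge label.
// Assumption: the page counts as valid when no message has type "error".
function badgeLabel(array $result): string
{
    foreach ($result['messages'] ?? [] as $message) {
        if (($message['type'] ?? '') === 'error') {
            return 'HTML5 Invalid';
        }
    }
    return 'HTML5 Valid';
}
```

For example, `buildCheckerUrl('https://google.com/')` reproduces the URL shown above.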
They do not have an API that I am aware of.
As such, my suggestion would be:
Send a request (GET) to the result page (http://validator.w3.org/check?uri=) with your page's URL (using file_get_contents() or curl). Parse the response for the valid message (DOMDocument or simple string search).
Note: This is a brittle solution. It may break if anything changes on W3C's side. However, it works, and this tool has been available for several years.
Also, if you truly want this on your live site I'd strongly recommend some kind of caching. Doing this on every page request is expensive. Honestly, this should be a development tool. Something that is run and reports the errors to you. Keep the badge static.
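To illustrate the caching suggestion, here is a minimal file-based sketch. `cachedResult` is a hypothetical helper, and the cache file path, the lifetime, and the `$fetch` callback (whatever code actually contacts the validator) are all assumptions you would supply yourself.

```php
<?php
// Return the cached string if the cache file is younger than $ttl seconds;
// otherwise call $fetch, store its result in the file, and return it.
function cachedResult(string $cacheFile, int $ttl, callable $fetch): string
{
    if (is_file($cacheFile) && (time() - filemtime($cacheFile)) < $ttl) {
        return file_get_contents($cacheFile);
    }
    $fresh = $fetch();
    // LOCK_EX avoids two requests writing the file at the same time.
    file_put_contents($cacheFile, $fresh, LOCK_EX);
    return $fresh;
}
```

With a lifetime of, say, one day, the validator is contacted at most once per day per cache file instead of on every page request.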
Here is an example of how to use the W3C API to validate HTML in PHP:
$curl = curl_init();
curl_setopt_array($curl, array(
CURLOPT_URL => "https://validator.w3.org/nu/?out=json",
CURLOPT_RETURNTRANSFER => true,
CURLOPT_ENCODING => "",
CURLOPT_MAXREDIRS => 10,
CURLOPT_TIMEOUT => 30,
CURLOPT_HTTP_VERSION => CURL_HTTP_VERSION_1_1,
CURLOPT_CUSTOMREQUEST => "POST",
CURLOPT_POSTFIELDS => '<... your html text to validate ...>',
CURLOPT_HTTPHEADER => array(
"User-Agent: Any User Agent",
"Cache-Control: no-cache",
"Content-Type: text/html; charset=utf-8"
),
));
$response = curl_exec($curl);
$err = curl_error($curl);
curl_close($curl);
if ($err) {
//handle error here
die('sorry etc...');
}
$resJson = json_decode($response, true);
$response contains JSON like the following, which $resJson holds as a PHP array:
{
"messages": [
{
"type": "error",
"lastLine": 13,
"lastColumn": 110,
"firstColumn": 5,
"message": "Attribute “el” not allowed on element “link” at this point.",
"extract": "css\">\n <link el=\"stylesheet\" href=\"../css/plugins/awesome-bootstrap-checkbox/awesome-bootstrap-checkbox.min.css\">\n <",
"hiliteStart": 10,
"hiliteLength": 106
},
{
"type": "info",
"lastLine": 294,
"lastColumn": 30,
"firstColumn": 9,
"subType": "warning",
"message": "Empty heading.",
"extract": ">\n <h1 id=\"promo_codigo\">\n ",
"hiliteStart": 10,
"hiliteLength": 22
},....
Check https://github.com/validator/validator/wiki/Service-»-Input-»-POST-body for more details.
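Once the response is decoded, you will probably want to pull out the entries that matter. The sketch below assumes the structure shown in the sample above ("type", "subType", "lastLine", "message"); `summarizeMessages` is a hypothetical helper name, and the output format is just one possible choice.

```php
<?php
// Collect one readable line per error or warning in the decoded response.
// Plain "info" entries without subType "warning" are skipped.
function summarizeMessages(array $resJson): array
{
    $lines = [];
    foreach ($resJson['messages'] ?? [] as $m) {
        $isError = ($m['type'] ?? '') === 'error';
        $isWarning = ($m['type'] ?? '') === 'info'
            && ($m['subType'] ?? '') === 'warning';
        if ($isError || $isWarning) {
            $lines[] = sprintf(
                '%s at line %d: %s',
                $isError ? 'error' : 'warning',
                $m['lastLine'] ?? 0,
                $m['message'] ?? ''
            );
        }
    }
    return $lines;
}
```

Calling `summarizeMessages($resJson)` returns an array of strings, one per error or warning, in the order the checker reported them.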
I am currently attempting to query the user information for any given Discord account. It's pretty simple: I give the website a Discord user ID and it outputs that user's account information.
I believe my issue is directly related to how Discord authorizes its bots, and after about an hour of searching on Google I figured it would be better to ask here.
Any help much appreciated!
Current Code:
$IDURL = "https://discord.com/api/v8/users/" . $UserID;
$CurlInitID = curl_init();
curl_setopt_array( $CurlInitID, [
CURLOPT_URL => $IDURL,
CURLOPT_POST => true,
CURLOPT_HTTPHEADER => [
"Authorization: Bot *private key removed*"
]
]);
$Return = curl_exec($CurlInitID);
curl_close($CurlInitID);
die($Return);
Return From Website:
{"message": "405: Method Not Allowed", "code": 0}
I want to programmatically update product information, such as quantity, price, etc.
(from outside of the Magento source directory.)
How can I do that?
Magento is pretty easy to bootstrap. If you want a standalone script where you can access all your functions, just add the following at the top of your PHP file:
define('MAGENTO', realpath(dirname(__FILE__)));
require_once MAGENTO . '/../app/Mage.php'; //or whatever location your Mage.php file is
Mage::app(Mage_Core_Model_Store::ADMIN_CODE); //initialization with another store is possible
After that you can load all your models. To update your products, I suggest two approaches. The regular one:
Mage::getModel('catalog/product')->setStoreId($myStoreId)->load($myProductId)
->setPrice(50)
->save();
Or via the API model:
$api = Mage::getModel('catalog/product_api_v2');
$api->update($productId, $productData, $store, $identifierType);
I'd highly recommend using the REST APIs available in Magento 2.x to create and update products and their attributes.
Note: You can use either OAuth or Bearer tokens in Magento 2 to authenticate and authorize your API calls.
You can find additional information on all the APIs available in Magento 2.1 here:
http://devdocs.magento.com/swagger/index_21.html
You'll find specifics of the APIs you need under the grouping titled catalogProductRepositoryV1
Get info on product(s) using a search/filter criteria -> GET /V1/products
Get info on a specific Product -> GET /V1/products/{sku}
Create New Product -> POST /V1/products
Create/Update specific Product -> PUT /V1/products/{sku}
I haven't tested the code, but I think something like this should do the trick:
$BEARER_TOKEN_TO_USE_FOR_TRANSACTION = 'XYZ';
$REQUEST_HEADER = array(
"Authorization: Bearer " . $BEARER_TOKEN_TO_USE_FOR_TRANSACTION,
"cache-control: no-cache",
"content-type: application/json"
);
$REQUEST_URL='INSTANCE_URL/rest/V1/products';
$PRODUCT_DATA_TO_USE ='{
"product": {
ENTER_PRODUCT_ATTRIBUTES_AS_JSON
} }';
$CURL_OBJ = curl_init($REQUEST_URL);
$CURL_OPTIONS_ARRAY_TO_USE = array (
CURLOPT_HTTPHEADER => $REQUEST_HEADER,
CURLOPT_CUSTOMREQUEST => "POST",
CURLOPT_POSTFIELDS => $PRODUCT_DATA_TO_USE,
CURLOPT_URL => $REQUEST_URL,
CURLOPT_RETURNTRANSFER => true,
CURLOPT_HTTP_VERSION => CURL_HTTP_VERSION_1_1,
CURLOPT_ENCODING => "",
CURLOPT_MAXREDIRS => 10,
CURLOPT_TIMEOUT => 30,
);
curl_setopt_array($CURL_OBJ, $CURL_OPTIONS_ARRAY_TO_USE);
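$result = curl_exec($CURL_OBJ);
curl_close($CURL_OBJ);
$result = json_decode($result);
// json_decode() returns an object here, so print it with print_r()
echo 'Output -> ' . print_r($result, true);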
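For the update case (the PUT /V1/products/{sku} endpoint listed above), the request pieces could be assembled roughly like this. This is an untested-against-Magento sketch: `buildProductUpdateRequest` is a hypothetical helper, the base URL and token are placeholders, and the `{"product": {...}}` payload shape is simply carried over from the snippet above. The returned values would then go into CURLOPT_CUSTOMREQUEST, CURLOPT_URL, CURLOPT_HTTPHEADER and CURLOPT_POSTFIELDS.

```php
<?php
// Hypothetical helper: assemble method, URL, headers and JSON body for
// updating one product by SKU. It only builds values; it sends nothing.
function buildProductUpdateRequest(
    string $baseUrl,
    string $token,
    string $sku,
    array $attributes
): array {
    return [
        'method'  => 'PUT',
        // rawurlencode keeps SKUs with spaces or slashes safe in the path
        'url'     => rtrim($baseUrl, '/') . '/rest/V1/products/' . rawurlencode($sku),
        'headers' => [
            'Authorization: Bearer ' . $token,
            'Content-Type: application/json',
        ],
        'body'    => json_encode(['product' => $attributes]),
    ];
}
```

For example, `buildProductUpdateRequest('https://shop.example.com', 'XYZ', 'my-sku', ['price' => 50])` yields a body of `{"product":{"price":50}}`.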
I'm trying to help out someone who is trying to access an API using PHP. My ColdFusion code works fine posting to the API, but we can't get the PHP to work. In CF the code uses URL params to send the data:
<cfhttp url="https://example.com/_api/proxyApi.cfc" method="post" result="httpResult" charset="UTF-8">
<cfhttpparam type="url" name="method" value="apiauth"/>
<cfhttpparam type="url" name="argumentCollection" value="#jsData#"/>
</cfhttp>
A dump of the resulting call from the API shows the variables in the URL like this:
method = apiauth is the main authorization function, and the JSON string in argumentCollection is then passed by apiauth to the correct function in the API.
From PHP, his cURL call posts the values as form data rather than in the URL, and the API complains that the required information is missing because it's in the wrong scope. I've been trying to figure out how to make cURL use the URL scope instead:
$curl = curl_init();
curl_setopt_array($curl, array(
CURLOPT_RETURNTRANSFER => true,
CURLOPT_URL => $target_url,
CURLOPT_POST => 1,
CURLOPT_FOLLOWLOCATION => true,
CURLOPT_MAXREDIRS => 2,
CURLOPT_AUTOREFERER => true,
CURLOPT_POSTFIELDS => array(
'method' => 'apiauth',
'argumentCollection' => $json
)
));
The same dump from the API shows the same data, but in the wrong scope:
It seems like if we can get the data in the right scope we'll make progress, but my PHP knowledge is dangerously limited.
You are sending an empty POST in your CF example.
<cfhttpparam type="url" is processed as a query string parameter, as in:
https://example.com/_api/proxyApi.cfc?method=apiauth&argumentCollection=...
Thus your dump of the URL scope (the key-value-paired query string) shows the data.
To put those parameters into your POST body, you would use:
<cfhttpparam type="formfield"
And then your FORM scope would show the data.
Your PHP cURL does the latter: it adds your parameters to the POST body.
If you want the cURL to work as your example CF code, do this instead:
// add the parameters to the URL's query string
// start with & instead of ?, if the URL already contains a query string, see comment below snippet
$target_url .= '?'.'method=apiauth'.'&'.'argumentCollection='.urlencode($json);
$curl = curl_init();
curl_setopt_array($curl, array(
CURLOPT_RETURNTRANSFER => true,
CURLOPT_URL => $target_url,
CURLOPT_FOLLOWLOCATION => true,
CURLOPT_MAXREDIRS => 2,
CURLOPT_AUTOREFERER => true
));
no query string in $target_url:
$target_url = 'https://example.com/_api/proxyApi.cfc';
$target_url .= '?'.'method=apiauth'.'&'.'argumentCollection='.urlencode($json);
query string in $target_url:
$target_url = 'https://example.com/_api/proxyApi.cfc?p=';
$target_url .= '&'.'method=apiauth'.'&'.'argumentCollection='.urlencode($json);
On a side note: You probably don't want to send JSON via query string as the query string has a limit of about 2000 chars (depends on browser and webserver). If your JSON is complex, your query string will be truncated and mess everything up. Use the POST body for this instead.
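As a variation on the string concatenation above, http_build_query can do the encoding and joining for you. `appendQuery` is a hypothetical helper name; the sketch just picks ? or & depending on whether the URL already has a query string.

```php
<?php
// Append parameters to a URL's query string.
// Uses ? when the URL has no query string yet, & otherwise.
function appendQuery(string $url, array $params): string
{
    $separator = (strpos($url, '?') === false) ? '?' : '&';
    // http_build_query URL-encodes each value, like urlencode() above
    return $url . $separator . http_build_query($params);
}
```

Usage would look like `$target_url = appendQuery($target_url, ['method' => 'apiauth', 'argumentCollection' => $json]);` before passing it to CURLOPT_URL.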
I'm new to cURL.
I have been trying to scrape the contents of this Amazon link (i.e., image, book title, author and price of the 20 books) into an HTML page. So far, all I have managed is to print the page using the code below:
<?php
function curl($url) {
$options = Array(
CURLOPT_RETURNTRANSFER => TRUE,
CURLOPT_FOLLOWLOCATION => TRUE,
CURLOPT_AUTOREFERER => TRUE,
CURLOPT_CONNECTTIMEOUT => 120,
CURLOPT_TIMEOUT => 120,
CURLOPT_MAXREDIRS => 10,
CURLOPT_URL => $url,
);
$ch = curl_init();
curl_setopt_array($ch, $options);
$data = curl_exec($ch);
curl_close($ch);
return $data;
}
$url = "http://www.amazon.in/gp/bestsellers/books/1318209031/ref=zg_bs_nav_b_2_1318203031";
$results_page = curl($url);
echo $results_page;
I have tried using regex and failed. I have spent six hours straight on this and am really tired, so I'm hoping to find a solution here. Thanks in advance. :)
UPDATE: Found a really helpful site (click here) for beginners like me (it does not use cURL, though).
You really should be using the AWSECommerce API, but here's a way to leverage Yahoo's YQL service:
<?php
$query = sprintf(
'http://query.yahooapis.com/v1/public/yql?q=%s',
urlencode('SELECT * FROM html WHERE url = "http://www.amazon.in/gp/bestsellers/books/1318209031/ref=zg_bs_nav_b_2_1318203031" AND xpath=\'//div[@class="zg_itemImmersion"]\'')
);
$xml = new SimpleXMLElement($query, null, true);
foreach ($xml->results->div as $product) {
vprintf("%s\n", array(
$product->div[1]->div[1]->a,
));
}
/*
Engineering Thermodynamics
A Textbook of Fluids Mechanics
The Design of Everyday Things
A Forest History of India
Computer Networking
The Story of Microsoft
Private Empire: ExxonMobil and Americ...
Project Management Metrics, KPIs, and...
Design and Analysis of Experiments: I...
IES - 2013: General English
Foundation of Software Testing: ISTQB...
Faster: 100 Ways to Improve your Digi...
A Textbook of Fluid Mechanics and Hyd...
Software Engineering for Embedded Sys...
Communication Skills for Engineers
Making Things Move DIY Mechanisms for...
Virtual Instrumentation Using Labview
Geometric Dimensioning and Tolerancin...
Power System Protection & Switchgear...
Computer Networks
*/
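If you would rather parse the HTML you already fetched with the curl() function from the question, a DOM-based approach is usually less fragile than regex. The sketch below is only an illustration: `extractLinkTexts` is a hypothetical helper, the class name is taken from the XPath in the snippet above, and the assumption that each title sits in a link inside that div may not match the real page.

```php
<?php
// Return the non-empty text of every link inside divs with the given class.
function extractLinkTexts(string $html, string $class): array
{
    // Real-world pages are rarely well-formed; collect parser warnings
    // internally instead of printing them.
    libxml_use_internal_errors(true);
    $doc = new DOMDocument();
    $doc->loadHTML($html);
    libxml_clear_errors();

    $xpath = new DOMXPath($doc);
    $texts = [];
    foreach ($xpath->query('//div[@class="' . $class . '"]//a') as $link) {
        $text = trim($link->textContent);
        if ($text !== '') { // skips image-only links
            $texts[] = $text;
        }
    }
    return $texts;
}
```

You would call it as `extractLinkTexts($results_page, 'zg_itemImmersion')` and adjust the XPath once you have inspected the actual markup.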
I am trying to scrape an ASPX page with PHP cURL. The page shows its data in pages: it initially loads with the GET method, but when you select a page number from the drop-down it submits the page using the POST method.
I want to fetch the data of a particular page number by passing POST fields to cURL, but I couldn't get that to work.
I have written some dummy code to get the records of the 5th page, but it always returns the results of the first page.
Sample code
$url = 'http://www.ticketalternative.com/SitePages/Search.aspx?catid=All&pattern=Enter%20Artist%2c%20Team%2c%20or%20Venue';
$file=file_get_contents($url);
//<input type="hidden" name="__VIEWSTATE" id="__VIEWSTATE" value=
preg_match_all("#<input.*?name=\"__VIEWSTATE\".*?value=\"(.*?)\".*?>.*?<input.*?name=\"__EVENTVALIDATION\".*?value=\"(.*?)\".*?>#mis", $file, $arr_viewstate);
$viewstate = urlencode($arr_viewstate[1][0]);
$eventvalidation = urlencode($arr_viewstate[2][0]);
$options = array(
CURLOPT_RETURNTRANSFER => true, // return web page
CURLOPT_HEADER => true, // include response headers in the output
CURLOPT_FOLLOWLOCATION => true, // follow redirects
CURLOPT_ENCODING => "", // handle all encodings
CURLOPT_USERAGENT => "spider", // who am i
CURLOPT_AUTOREFERER => true, // set referer on redirect
CURLOPT_CONNECTTIMEOUT => 120, // timeout on connect
CURLOPT_TIMEOUT => 1120, // timeout on response
CURLOPT_MAXREDIRS => 10, // stop after 10 redirects
CURLOPT_POST => true,
CURLOPT_VERBOSE => true,
CURLOPT_POSTFIELDS => '__EVENTTARGET='.urlencode('ctl00$ContentPlaceHolder1$SearchResults1$SearchResultsGrid$ctl13$ctl05').'&__EVENTARGUMENT='.urlencode('').'&__VIEWSTATE='.$viewstate.'&__EVENTVALIDATION='.$eventvalidation.'&__LASTFOCUS='.urlencode('').'&ctl00$ContentPlaceHolder1$SearchResults1$SearchResultsGrid$ctl13$ctl05=4');
$ch = curl_init($url);
curl_setopt_array($ch,$options);
$result = curl_exec($ch);
curl_close($ch);
preg_match_all('/<a id=\".*?LinkToVenue\" href=\"(.*?)\">(.*?)<\/a>/ms',$result,$matches);
print_r($matches);
Can anybody help me with this? Where am I going wrong? I think it's not working because the page first loads with the GET method, and the page links then use POST.
How can I get the records of a particular page number?
Regards
I sometimes write scrapers in PHP when a client requires it, but I would never attempt to scrape an ASP.NET site with PHP. For that you need Perl, Python, or Ruby. All three have a Mechanize library that usually makes it easy.