How to change values in php? - php

How can I change the value of a price (in WordPress) which is set up for numeric values? I want to change the value to display text or a number taken from a URL (a scraping API).
Right now my class_core.php file shows this:
/* ==========================================================================
   Price Display
   ========================================================================== */
function PRICE($val){
    // RETURN IF NOT NUMERIC
    if(!is_numeric($val) && defined('WLT_JOBS') ){ return $val; }
    if(isset($GLOBALS['CORE_THEME']['currency'])){
        $seperator = "."; $sep = ","; $digs = 2;
        if(is_numeric($val)){
            $val = number_format($val, $digs, $seperator, $sep);
        }
        $val = hook_price_filter($val);
        // RETURN IF EMPTY
        if($val == ""){ return $val; }
        // LEFT/RIGHT POSITION
        if(isset($GLOBALS['CORE_THEME']['currency']['position']) && $GLOBALS['CORE_THEME']['currency']['position'] == "right"){
            if(substr($val,-3) == ".00"){ $val = substr($val,0,-3); }
            $val = $val.$GLOBALS['CORE_THEME']['currency']['symbol'];
        }else{
            $val = $GLOBALS['CORE_THEME']['currency']['symbol'].$val;
        }
    }

PHP is a scripting language. You don't have to declare what kind of variable you will be using; you just declare the name, and the type of the variable changes automatically depending on what data you store in it.
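For instance (a quick illustrative snippet, not from the original question):
<?php
$val = "100";                // starts life as a string
var_dump(is_numeric($val));  // bool(true) - a numeric string
$val = 100.50;               // now it is a float, no declaration needed
$val = "one hundred";        // and now it is plain text
var_dump(is_numeric($val));  // bool(false)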

If you have a URL that contains some information, like www.xyz.com/dddddd/ddddd, you can use cURL to obtain a result...
(ref: http://www.jonasjohn.de/snippets/php/curl-example.htm)
function curl_download($Url){
    // is cURL installed yet?
    if (!function_exists('curl_init')){
        die('Sorry cURL is not installed!');
    }
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, $Url);
    curl_setopt($ch, CURLOPT_REFERER, "http://www.example.org/yay.htm");
    curl_setopt($ch, CURLOPT_USERAGENT, "MozillaXYZ/1.0");
    curl_setopt($ch, CURLOPT_HEADER, 0);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_TIMEOUT, 10);
    $output = curl_exec($ch);
    curl_close($ch);
    return $output;
}
and then in your code...
$url_for_value = "www.xyz.com/dddddd/ddddd";
// remember to add http colon and two slashes in front of url...
// stackoverflow tools won't let me do that here...
$val = curl_download($url_for_value);
function PRICE($val){
    if(!is_numeric($val) && defined('WLT_JOBS') ){
        // if not numeric, e.g. "$100", strip off the non-numeric characters.
        preg_match_all('/(\d+)/', $val, $match);
        $digits = implode('', $match[1]);
        // Do we have a valid number now?
        if (!is_numeric($digits)) {
            // perform other tests on the return info from the cURL function?
            return $val;
        }
        $val = $digits;
    }
if(isset($GLOBALS['CORE_THEME']['currency'])){ ....
Note: It's certainly admirable to have a need for a specific function, and then use that need to motivate you to learn new skills. This project assumes a certain amount of experience with HTML, PHP and WordPress. If you don't feel comfortable with that stuff yet, that's okay; we all started knowing nothing.
Here's a possible learning roadmap:
--HTML: Learn the organization of a website, its elements, and how to create forms, buttons, etc.
--PHP: This is a scripting language; it runs on a server.
--CSS: You will need this for WordPress. (Why? Because we insist on you using a child theme, and that will require you to understand how CSS works.)
--JavaScript: Although not absolutely required, lots of existing tools use this.
There are a lot of free tutorials on this stuff. I'd probably start at http://html.net/ or somewhere like that. Do all the tutorials.
After that you get to jump into WordPress. Start small, modify a few sites, then grow to writing your own plugins. At that point, I think you should be able to easily create the functionality you are looking for.
If not, it could well be quicker to hire the job out. eLance is your friend.

Related

What would be the best way to collect the titles (in bulk) of a subreddit

I am looking to collect the titles of all of the posts on a subreddit, and I wanted to know what would be the best way of going about this?
I've looked around and found some stuff talking about Python and bots. I've also had a brief look at the API and am unsure in which direction to go.
As I do not want to commit only to find out 90% of the way through that it won't work, I'm asking if someone could point me in the right direction regarding language and extras, such as any software needed (for example, pip for Python).
My own experience is in web languages such as PHP, so I initially thought a web app would do the trick, but I am unsure whether this would be the best way and how to go about it.
So, as my question stands:
What would be the best way to collect the titles (in bulk) of a
subreddit?
Or if that is too subjective
How do I retrieve and store all the post titles of a subreddit?
Preferably it needs to:
do more than 1 page of (25) results
save to a .txt file
Thanks in advance.
PHP; in 25 lines:
$subreddit = 'pokemon';
$max_pages = 10;
// Set variables with default data
$page = 0;
$after = '';
$titles = '';
do {
    $url = 'http://www.reddit.com/r/' . $subreddit . '/new.json?limit=25&after=' . $after;
    // Set URL you want to fetch
    $ch = curl_init($url);
    // Set curl option of header to false (we don't need the headers)
    curl_setopt($ch, CURLOPT_HEADER, 0);
    // Set curl option of nobody to false as we need the body
    curl_setopt($ch, CURLOPT_NOBODY, 0);
    // Set curl timeout of 5 seconds
    curl_setopt($ch, CURLOPT_TIMEOUT, 5);
    // Set curl to return output as string
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
    // Execute curl
    $output = curl_exec($ch);
    // Get HTTP code of request
    $status = curl_getinfo($ch, CURLINFO_HTTP_CODE);
    // Close curl
    curl_close($ch);
    // If http code is 200 (success)
    if ($status == 200) {
        // Decode JSON into PHP object
        $json = json_decode($output);
        // Set after for next curl iteration (reddit's pagination)
        $after = $json->data->after;
        // Loop through each post and append its title
        foreach ($json->data->children as $k => $v) {
            $titles .= $v->data->title . "\n";
        }
    }
    // Increment page number
    $page++;
    // Loop through while current page number is less than maximum pages
} while ($page < $max_pages);
// Save titles to text file
file_put_contents(dirname(__FILE__) . '/' . $subreddit . '.txt', $titles);

Catching all common types of redirection, header, meta, JavaScript, etc

I'm in need of a function that tests whether a URL is redirected by whatever means.
So far, I have used cURL to catch header redirects, but there are obviously more ways to achieve a redirect.
Eg.
<meta http-equiv="refresh" content="0;url=/somewhere/on/this/server" />
or JS scripts
window.location = 'http://melbourne.ag';
etc.
I was wondering if anybody has a solution that covers them all. I'll keep working on mine and will post the result here.
Also, a quick way of parsing
<meta http-equiv="refresh"...
in PHP anyone?
I thought this would be included in PHP's native get_meta_tags() ... but I thought wrong :/
It can be done for markup languages (any simple markup parser will do), but it cannot be done in general for programming languages like JavaScript.
Redirection in a program in a Web document is equivalent to halting that program. You are asking for a program that is able to tell whether another, arbitrary program will halt. This is known in computer science as the halting problem, the first undecidable problem.
That is, you will only be able to tell correctly for a subset of resources whether redirection will occur.
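Since get_meta_tags() only returns name/content pairs, the markup-parser route for the <meta http-equiv="refresh"> case could look roughly like this sketch (the helper name and the regexp are my own assumptions, not from the original answer):
// Return the target of a <meta http-equiv="refresh"> tag, or null if there is none.
function get_meta_refresh_target($html) {
    $doc = new DOMDocument();
    @$doc->loadHTML($html); // suppress warnings on real-world markup
    foreach ($doc->getElementsByTagName('meta') as $meta) {
        if (strtolower($meta->getAttribute('http-equiv')) == 'refresh') {
            // content looks like "0;url=/somewhere/on/this/server"
            if (preg_match('/url\s*=\s*(.+)$/i', $meta->getAttribute('content'), $m)) {
                return trim($m[1]);
            }
        }
    }
    return null;
}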
Halfway there; I'll add the JS checks when I've written them...
function checkRedirect($url){
    // returns the redirected URL or the original
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_HEADER, true);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_URL, $url);
    $out = curl_exec($ch);
    curl_close($ch);
    $out = str_replace("\r", "", $out);
    $headers_end = strpos($out, "\n\n");
    if( $headers_end !== false ) {
        $out = substr($out, 0, $headers_end);
    }
    $headers = explode("\n", $out);
    foreach($headers as $header) {
        if(strtolower(substr($header, 0, 10)) == "location: " ) {
            $target = substr($header, 10);
            return $target;
        }
    }
    return $url;
}

PHP - Parse_url only get pages

I'm working on a little webcrawler as a side project at the moment, basically having it collect all hrefs on a page and then subsequently parse those. My problem is:
How can I only get the actual page results? At the moment I'm using the following:
foreach($page->getElementsByTagName('a') as $link)
{
    $compare_url = parse_url($link->getAttribute('href'));
    if (@$compare_url['host'] == "")
    {
        $links[] = 'http://'.@$base_url['host'].'/'.$link->getAttribute('href');
    }
    elseif ( @$base_url['host'] == @$compare_url['host'] )
    {
        $links[] = $link->getAttribute('href');
    }
}
As you can see, this will bring in JPEGs, EXE files, etc. I only need to pick up web pages like .php, .html, .asp, etc.
I'm not sure if there is some function able to work this one out or if it will need to be regex from some sort of master list?
Thanks
Since the URL string alone isn't connected with the resource behind it in any way, you will have to go out and ask the webserver about it. For this there's an HTTP method called HEAD, so you won't have to download everything.
You can implement this with cURL in PHP like this:
function curl_head($url) {
    $curl = curl_init($url);
    curl_setopt($curl, CURLOPT_NOBODY, true);
    curl_setopt($curl, CURLOPT_HEADER, true);
    curl_setopt($curl, CURLOPT_MAXREDIRS, 5);
    curl_setopt($curl, CURLOPT_FOLLOWLOCATION, true);
    curl_setopt($curl, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($curl, CURLOPT_HTTP_VERSION, CURL_HTTP_VERSION_1_1);
    $content = curl_exec($curl);
    curl_close($curl);
    // redirected heads just pile up one after another
    $parts = explode("\r\n\r\n", trim($content));
    // return only the last one
    return end($parts);
}

function is_html($url) {
    $header = curl_head($url);
    // look for the content-type part of the header response
    return preg_match('/content-type\s*:\s*text\/html/i', $header);
}
var_dump(is_html('http://github.com'));
This version only accepts text/html responses and doesn't check whether the response is a 404 or another error (it does, however, follow redirects up to 5 jumps). You can tweak the regexp or add some error handling, either from the curl response or by matching against the header string's first line.
Note: Webservers will run scripts behind these URLs to give you responses. Be careful not to overload hosts with probing, or to grab "delete" or "unsubscribe" type links.
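As a rough example of the "first line" idea, a status check could look like this (the helper name is my own; it assumes the header block returned by curl_head() above):
// Accept only responses whose final status line reports 200 OK.
function is_ok($header) {
    $lines = explode("\r\n", trim($header));
    // the first line looks like "HTTP/1.1 200 OK"
    return preg_match('#^HTTP/\d\.\d\s+200#', $lines[0]) === 1;
}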
To check whether a page has a valid extension (html, php, ...), use this function:
function check($url){
    $extensions = array("php","html"); // Add extensions here
    foreach($extensions as $ext){
        if(substr($url, -(strlen($ext)+1)) == ".".$ext){
            return 1;
        }
    }
    return 0;
}
foreach($page->getElementsByTagName('a') as $link) {
    $compare_url = parse_url($link->getAttribute('href'));
    if (@$compare_url['host'] == "") {
        if(check($link->getAttribute('href'))){
            $links[] = 'http://'.@$base_url['host'].'/'.$link->getAttribute('href');
        }
    }
    elseif ( @$base_url['host'] == @$compare_url['host'] ) {
        if(check($link->getAttribute('href'))){
            $links[] = $link->getAttribute('href');
        }
    }
}
Consider using preg_match to check the type of the link (application, picture, HTML file) and, depending on the result, decide what to do.
Another (simpler) option is to use explode and take the last part of the URL, which comes after a "." (the extension).
For instance:
// If the URL has any one of the following extensions, ignore it.
$forbid_ext = array('jpg','gif','exe');
foreach($page->getElementsByTagName('a') as $link) {
    $compare_url = parse_url($link->getAttribute('href'));
    if (@$compare_url['host'] == "")
    {
        if(check_link_type($link->getAttribute('href')))
            $links[] = 'http://'.@$base_url['host'].'/'.$link->getAttribute('href');
    }
    elseif ( @$base_url['host'] == @$compare_url['host'] )
    {
        if(check_link_type($link->getAttribute('href')))
            $links[] = $link->getAttribute('href');
    }
}
function check_link_type($url)
{
    global $forbid_ext;
    $parts = explode(".", $url);
    $ext = end($parts);
    if(in_array($ext, $forbid_ext))
        return false;
    return true;
}
UPDATE (instead of checking 'forbidden' extensions, let's look for good ones):
$good_ext = array('html','php','asp');
function check_link_type($url)
{
    global $good_ext;
    $parts = explode(".", $url);
    $ext = end($parts);
    if($ext == "" || in_array($ext, $good_ext))
        return true;
    return false;
}

How to skip links containing file extensions while web scraping using PHP

Here is a function that validates a .edu TLD and checks that the URL does not point to a .pdf or a .doc document.
public function validateEduDomain($url) {
    if( preg_match('/^https?:\/\/[A-Za-z]+[A-Za-z0-9\.-]+\.edu/i', $url) && !preg_match('/\.(pdf)|(doc)$/i', $url) ) {
        return TRUE;
    }
    return FALSE;
}
Now I am encountering links that point to jpg, rtf and other files that simple_html_dom tries to parse and return the content of. I want to avoid this by skipping all such links. The problem is that the list is non-exhaustive, and I want the code to skip all such links. How am I supposed to do that?
Trying to filter URLs by guessing what's behind them will always fail in a number of cases. Assuming you are using cURL to download, you should check whether the response Content-Type header is among the acceptable ones:
<?php
require "simple_html_dom.php";

$curl = curl_init();
curl_setopt($curl, CURLOPT_RETURNTRANSFER, true); // default is to output it

$urls = array(
    "google.com",
    "https://www.google.com/logos/2012/newyearsday-2012-hp.jpg",
    "http://cran.r-project.org/doc/manuals/R-intro.pdf",
);
$acceptable_types = array("text/html", "application/xhtml+xml");

foreach ($urls as $url) {
    curl_setopt($curl, CURLOPT_URL, $url);
    $contents = curl_exec($curl);
    // we need to handle content-types like "text/html; charset=utf-8"
    list($response_type) = explode(";", curl_getinfo($curl, CURLINFO_CONTENT_TYPE));
    if (in_array($response_type, $acceptable_types)) {
        echo "accepting {$url}\n";
        // create a simple_html_dom object from string
        $obj = str_get_html($contents);
    } else {
        echo "rejecting {$url} ({$response_type})\n";
    }
}
running the above results in:
accepting google.com
rejecting https://www.google.com/logos/2012/newyearsday-2012-hp.jpg (image/jpeg)
rejecting http://cran.r-project.org/doc/manuals/R-intro.pdf (application/pdf)
Update the last regex to something like this:
!preg_match('/\.(pdf|doc|jpg|rtf)$/i', $url) )
This will filter out the jpg and rtf documents. You have to add any further extensions to the regex to omit them.
Update
I don't think it's possible to block every sort of extension, and I personally do not recommend it for scraping use either. You would have to skip some extensions to keep crawling. Why don't you change your regex filter to match the ones you would like to accept instead, like:
preg_match('/\.(html|php|aspx)$/i', $url) )
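As a hedged sketch of how that whitelist could be wrapped up, stripping query strings first so something like page.php?id=1 still matches (the function name and the accepted list are my own assumptions, not from the answer above):
// Accept a URL only when its path ends in one of the whitelisted extensions.
function has_page_extension($url) {
    $accept = array('html', 'htm', 'php', 'aspx');
    $path = parse_url($url, PHP_URL_PATH);  // drops the query string and fragment
    $ext  = strtolower(pathinfo((string)$path, PATHINFO_EXTENSION));
    return in_array($ext, $accept);
}

var_dump(has_page_extension('http://example.edu/page.php?id=1')); // bool(true)
var_dump(has_page_extension('http://example.edu/paper.pdf'));     // bool(false)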

Can I Send URL with Parameters via PHP and retrieve the data?

I'm starting to help a friend who runs a website with small bits of coding work, and all the code required will be PHP. I am a C# developer, so this will be a new direction.
My first stand-alone task is as follows:
The website is informed of a new species of fish. The scientific name is entered into, say, two input controls, one for the genus (X) and another for the species (Y). These names will need to be sent to a website in the format:
http://www.fishbase.org/Summary/speciesSummary.php?genusname=X&speciesname=Y&lang=English
Once on the resulting page, there are further links for common names and synonyms.
What I would like to be able to do is to find these links, and call the URL (as this will contain all the necessary parameters to get the particular data) and store some of it.
I want to save data from both calls and, once completed, convert it all into xml which can then be uploaded to the website's database.
All I'd like to know is (a) can this be done, and (b) how difficult is it?
Thanks in advance
Martin
If I understand you correctly you want your script to download a page and process the downloaded data. If so, the answers are:
a) yes
b) not difficult
:)
OK... here's some more information. I would use the cURL extension; see:
http://php.net/manual/en/book.curl.php
<?php
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, "example.com");
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
$output = curl_exec($ch);
curl_close($ch);
?>
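To sketch the rest of the question's flow under the same approach: fetch the summary page for a genus/species pair and collect the hrefs on it, so the "common names" / "synonyms" pages can be requested in a second pass. The URL and parameter names come from the question; the function name and the example species are my own assumptions.
function get_summary_links($genus, $species) {
    $url = 'http://www.fishbase.org/Summary/speciesSummary.php'
         . '?genusname=' . urlencode($genus)
         . '&speciesname=' . urlencode($species)
         . '&lang=English';

    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
    $html = curl_exec($ch);
    curl_close($ch);

    $links = array();
    if ($html !== false) {
        $doc = new DOMDocument();
        @$doc->loadHTML($html); // suppress warnings on real-world markup
        foreach ($doc->getElementsByTagName('a') as $a) {
            $links[] = $a->getAttribute('href');
        }
    }
    return $links;
}

print_r(get_summary_links('Gadus', 'morhua')); // hypothetical example species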
I used a thing called Snoopy (http://sourceforge.net/projects/snoopy/) four years ago.
I scraped about 500 customer profiles from a website that published them, in just a few hours.
a) Yes
b) Not difficult when you have experience.
Google for cURL first, or allow_url_fopen.
file_get_contents() will do the job:
$data = file_get_contents('http://www.fishbase.org/Summary/speciesSummary.php?genusname=X&speciesname=Y&lang=English');
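If the genus and species come straight from the input controls, it may be safer to let PHP URL-encode them; a minimal sketch (the parameter names are from the question, the values are placeholders):
$genus   = 'X';
$species = 'Y';
// Build a properly URL-encoded query string from the form values
$query = http_build_query(array(
    'genusname'   => $genus,
    'speciesname' => $species,
    'lang'        => 'English',
));
$data = file_get_contents('http://www.fishbase.org/Summary/speciesSummary.php?' . $query);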
// Send a URL request
function send_url($url, $type = false, $debug = false) { // $type = 'json' or 'xml'
    $result = '';
    if (function_exists('curl_init')) {
        $ch = curl_init();
        curl_setopt($ch, CURLOPT_URL, $url);
        curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
        $result = curl_exec($ch);
        curl_close($ch);
    } else {
        if (($content = @file_get_contents($url)) !== false) $result = $content;
    }
    if ($type == 'json') {
        $result = json_decode($result, true);
    } elseif ($type == 'xml') {
        if (($xml = @simplexml_load_string($result)) !== false) $result = $xml;
    }
    if ($debug) echo '<pre>' . print_r($result, true) . '</pre>';
    return $result;
}
$data = send_url('http://ip-api.com/json/212.76.17.140', 'json', true);
