Scrape page and separate internal from external links - php

I'm building a small PHP scraper and writing a function that should separate internal from external links.
I pass the function a copy of the HTML source along with the base host address:
$source = file_get_contents('http://www.example.com');
$host = "example.com";
Here is my function so far:
function find_page_links($source, $host){
if($source){
$htmlDoc = new DomDocument();
@$htmlDoc->loadHTML($source); // @ suppresses warnings from malformed markup
$int_links = array();
$ext_links = array();
// GET LINKS
foreach($htmlDoc->getElementsByTagName('a') as $link) {
$url = trim($link->getAttribute('href'));
$title = trim($link->getAttribute('title'));
$text = trim($link->nodeValue);
$rel = trim($link->getAttribute('rel'));
$pos = strpos($url,$host);
if( $pos === false ){ // NO MATCH EXTERNAL
if( (substr($url, 0, 1) == '/') ||
(substr($url, 0, 1) == '#') )
{
// INTERNAL
$int_links[] = array( 'link_url' => $url,
'link_text' => $text,
'link_title' => $title,
'link_rel' => $rel
);
}else{
// EXTERNAL
$ext_links[] = array( 'link_url' => $url,
'link_text' => $text,
'link_title' => $title,
'link_rel' => $rel
);
}
}else{
if( $pos < 20 ){
// INTERNAL
$int_links[] = array( 'link_url' => $url,
'link_text' => $text,
'link_title' => $title,
'link_rel' => $rel );
}else{
// EXTERNAL
$ext_links[] = array( 'link_url' => $url,
'link_text' => $text,
'link_title' => $title,
'link_rel' => $rel
);
}
} // end else
} // end foreach
$content = array();
$content['int_links'] = $int_links;
$content['ext_links'] = $ext_links;
return $content ;
}
}
So here's what happens: the function loads the HTML via DomDocument.
I create two arrays to store the internal and external links,
then loop through the document with getElementsByTagName('a').
It then uses strpos to check whether the host address "example.com" appears in the link URL. If there is no match (false), the link is treated as external, but I do a further check to make sure the link URL doesn't start with a forward slash, e.g. "/contact-us.php", which would mean it's internal. In the same check I also look for a "#" at the beginning, which would be an anchor link on the page.
So that was the case where $pos === false (no match).
If the host is in the link URL, it's a match. I then check the position of the host in the string; if it appears early, the link is internal, e.g.:
http://example.com/about/
But if the position is 20 or greater (just a number plucked from the air), the link is probably something like a Google Plus or Facebook link, where the host is present but much further along the string, which means it's external,
e.g.: http://www.facebook.com/plugins/like.php?href=http://example.com/
Phew...
If you know any better ways to tell external and internal links apart, please let me know. My results vary a lot depending on the site and on whether links use the full path.
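One alternative to position-based strpos checks is to compare the host component that parse_url() extracts from each href. The following is only a minimal sketch: the function name classify_link and the 'other' category (for mailto:, javascript: and similar schemes) are assumptions for illustration, and it deliberately treats subdomains other than "www." as external.

```php
<?php
// Hypothetical helper: classify a single href against the site's host.
// Returns 'internal', 'external', or 'other' (non-HTTP schemes, unparsable URLs).
function classify_link($url, $host)
{
    $url = trim($url);

    // Protocol-relative URLs (//cdn.example.net/x) carry their own host.
    if (strpos($url, '//') === 0) {
        $url = 'http:' . $url;
    }

    $parts = parse_url($url);
    if ($parts === false) {
        return 'other'; // seriously malformed URL
    }

    // mailto:, javascript:, tel: and similar are not page links.
    if (isset($parts['scheme'])
        && !in_array(strtolower($parts['scheme']), array('http', 'https'), true)) {
        return 'other';
    }

    // No host component: "/contact-us.php", "#top", "page.php?x=1".
    if (!isset($parts['host'])) {
        return 'internal';
    }

    // Compare hosts case-insensitively, ignoring a leading "www.".
    $normalize = function ($h) {
        $h = strtolower($h);
        return strpos($h, 'www.') === 0 ? substr($h, 4) : $h;
    };

    return $normalize($parts['host']) === $normalize($host) ? 'internal' : 'external';
}
```

Because only the host component is compared, a Facebook "like" URL that merely contains example.com in its query string is classified as external without needing a magic position threshold.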

Related

update_post_meta and update_field not working without any error received

I am working on a WordPress site that automatically imports all its data from an API via a cron job. I'm now at the stage of saving the field data from the API. The problem is that neither update_post_meta nor update_field works.
I have tried switching between the two methods of saving, but neither works. There are no errors and no results either (which is pretty weird to me). I checked the site's built-in plugin and it uses update_post_meta.
add_action('wp_ajax_nopriv_get_properties_from_api','get_properties_from_api');
add_action('wp_ajax_get_properties_from_api','get_properties_from_api');
function get_properties_from_api(){
$file = get_stylesheet_directory() . '/report.txt';
$current_page=(! empty($_POST['current_page'])) ? $_POST['current_page'] : 1;
$properties = [];
$results = wp_remote_retrieve_body(wp_remote_get('https://www.realestateview.com.au/listing_api?rm=search&company=castlemain&code=29GKRRSgkVdQVM&CID=5813&json=1&ptr=r&con=S&portalview=residential&rn=1&pg='. $current_page));
file_put_contents($file, "Current page: ". $current_page. "\n\n", FILE_APPEND);
$results=json_decode($results, true);
if(!is_array($results['Listings']) || empty($results['Listings'])){
return false;
}
$properties[]=$results;
foreach($properties[0] as $property){
$property_slug = sanitize_title($property->TitleNoHTML, '-', $property->OrderID);
$inserted_property = wp_insert_post([
'post_name' => $property_slug,
'post_title' => $property->TitleNoHTML,
'post_type' => 'property',
'post_status' => 'publish',
]);
if(is_wp_error($inserted_property)){
continue;
}
$fillable=[
//Basic information
get_the_title($inserted_property) => 'TitleNoHTML',
'REAL_HOMES_property_price' => 'PriceText',
'REAL_HOMES_property_size' => 'LandSizeText',
'REAL_HOMES_property_bedrooms' => 'BedroomsCount',
'REAL_HOMES_property_bathrooms' => 'BathroomsCount',
'REAL_HOMES_property_garage' => 'LockUpGaragesCount',
'REAL_HOMES_featured' => 'FeaturedProperty',
//$this->REAL_HOMES_property_id =
//$this->REAL_HOMES_property_year_built =
//Location on Map
'REAL_HOMES_property_address' => 'AddressText',
'REAL_HOMES_property_location' => 'Suburb',
'REAL_HOMES_property_map' => 'DisplayTrueAddress',
//Gallery
'REAL_HOMES_property_images' => 'PhotoOriginalURL',
//Floor Plans
//$this->inspiry_floor_plan_name =
'inspiry_floor_plan_price' => 'PriceText',
//$this->inspiry_floor_plan_price_postfix =
// $this->inspiry_floor_plan_size =
// $this->inspiry_floor_plan_size_postfix =
'inspiry_floor_plan_bedrooms' => 'BedroomsCount',
'inspiry_floor_plan_bathrooms' =>'BathroomsCount',
// $this->inspiry_floor_plan_descr =
'inspiry_floor_plan_image' => 'FloorplanThumbURL',
//Property Video
'inspiry_video_group_image' => 'PhotoThumbURL',
//$this->inspiry_video_group_title =
'inspiry_video_group_url' => 'VideoURL',
//DEPRECATED FIELDS
// $this->REAL_HOMES_360_virtual_tour =
// $this->REAL_HOMES_tour_video_url_divider =
// $this->REAL_HOMES_tour_video_url =
// $this->REAL_HOMES_tour_video_image =
//Agent
//$this->REAL_HOMES_agent_display_option =
'REAL_HOMES_agents' => 'ContactAgentName',
//Energy Performance
// $this->REAL_HOMES_energy_class =
// $this->REAL_HOMES_energy_performance =
// $this->REAL_HOMES_epc_current_rating =
// $this->REAL_HOMES_epc_potential_rating =
//Misc
// $this->REAL_HOMES_sticky =
// $this->inspiry_property_label =
// $this->inspiry_property_label_color =
// $this->REAL_HOMES_attachments =
'inspiry_property_owner_name' => 'ClientName',
//$this->inspiry_property_owner_contact =
'inspiry_property_owner_address' => 'ClientAddress',
// $this->REAL_HOMES_property_private_note =
// $this->inspiry_message_to_reviewer =
//Homepage slider
// $this->REAL_HOMES_add_in_slider =
// $this->REAL_HOMES_slider_image =
// $this->REAL_HOMES_page_banner_image =
//Additional fields
'inspiry_InspectionDateandStartTime' => 'ISOInspectionStart',
'inspiry_InspectionDateandFinishTime' => 'ISOInspectionFinish',
];
foreach($fillable as $key => $TitleNoHTML){
update_post_meta($inserted_property, $key, $property->$TitleNoHTML);
}
}
$current_page = $current_page + 1;
wp_remote_post(admin_url('admin-ajax.php?action=get_properties_from_api'), [
'blocking' => false,
'sslverify' => false,
'body' => [
'current_page' => $current_page
]
]);
}
I expected it to save at least some data or, failing that, to produce an error, but for some reason neither happens. I tried var_dump on some variables and they look fine. Can anyone help me find out where I've gone wrong?
To replace the WordPress cron with a real cron job, set up a cron job that fetches a web page using wget. First create a WordPress page that contains your PHP code, then have cron request that page with wget.
The real cron job command will look like this:
wget -q -O - http://yourdomain.com/your_cron_page >/dev/null 2>&1
-q tells wget to operate quietly (i.e. not to output the usual status information)
-O - tells it to write the downloaded page to standard output, which the shell then redirects to /dev/null
Keep in mind that anyone can access this page, so you might want to set some restrictions.
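One simple restriction is a shared secret passed in the query string and checked before the page does any work. This is only a sketch under assumptions: the function name cron_request_allowed, the token parameter and the 'change-me' value are all hypothetical placeholders, not part of WordPress.

```php
<?php
// Hypothetical guard for the page that wget requests.
// Returns true only when the caller supplied the expected secret.
function cron_request_allowed($providedToken, $expectedToken)
{
    if (!is_string($providedToken) || $providedToken === '' || $expectedToken === '') {
        return false;
    }
    // hash_equals() compares in constant time, which avoids leaking
    // information about the secret through response timing.
    return hash_equals($expectedToken, $providedToken);
}

// Possible use at the top of the cron page (token value is a placeholder):
// $token = isset($_GET['token']) ? $_GET['token'] : '';
// if (!cron_request_allowed($token, 'change-me')) {
//     http_response_code(403);
//     exit;
// }
```

The cron command would then request the page with the same token appended, for example http://yourdomain.com/your_cron_page?token=change-me.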

Prevent writing of log/file if visitor is a spider?

I have a website status checker that writes the latest URLs checked to a log file (URL, status, e.g. up or down, and date checked). The trouble I'm now finding is that it also records visits from spiders such as Googlebot, so the latest site checks are being written multiple times per second...
Here is my log writing function:
public function log($url, $status) {
if (strpos($url, "/") !== false):
if (strpos($url, "http://") === false):
$url = "http://" . $url;
endif;
$parse = parse_url($url);
$url = $parse['host'];
endif;
if (!empty($url)):
$arrayToWrite = array(
array(
"url" => $url,
"status" => $status,
"date" => date("m/d/Y h:i")
)
);
if (file_exists($this->logfile)):
$fileContents = file_get_contents($this->logfile);
$arrayFromFile = unserialize($fileContents);
foreach ($arrayFromFile as $k => $tmpArray):
if ($tmpArray['url'] == $url):
unset($arrayFromFile[$k]);
endif;
endforeach;
if (is_array($arrayFromFile)):
array_splice($arrayFromFile, 9);
$arrayToWrite = array_merge($arrayToWrite, $arrayFromFile);
endif;
endif;
file_put_contents($this->logfile, serialize($arrayToWrite));
endif;
}
What changes could I make so that it ignores bot/spider visits and only writes real visitors?
Referencing this answer: how to detect search engine bots with php?
You can use $_SERVER['HTTP_USER_AGENT'] to check whether the visitor identifies as a spider. The user agent is a long string, so test whether it contains a bot name rather than comparing the whole string:
$bots = array("googlebot", "msn", "add other bots");
$agent = isset($_SERVER['HTTP_USER_AGENT']) ? strtolower($_SERVER['HTTP_USER_AGENT']) : '';
foreach ($bots as $bot) {
if (strpos($agent, $bot) !== false) {
// Don't save url
break;
}
}
A List of Spiders

Inserting data by sending PUT request in CodeIgniter (Phil Sturgeon Rest Server)

A PUT request and a POST request that I send through a REST client behave differently in my API. It is implemented in CodeIgniter with Phil Sturgeon's REST server.
function station_put(){
$data = array(
'name' => $this->input->post('name'),
'number' => $this->input->post('number'),
'longitude' => $this->input->post('longitude'),
'lat' => $this->input->post('latitude'),
'typecode' => $this->input->post('typecode'),
'description' => $this->input->post('description'),
'height' => $this->input->post('height'),
'mult' => $this->input->post('mult'),
'exp' => $this->input->post('exp'),
'elevation' => $this->input->post('elevation')
);
$id_returned = $this->station_model->add_station($data);
$this->response(array('id'=>$id_returned,'message'=>"Successfully created."),201);
}
This request successfully inserts a row on the server, BUT every value except the id is NULL.
If you rename the function to station_post, it inserts the data correctly.
Would somebody please point out why the PUT request does not work? I am using the latest version of Google Chrome.
By the way, this API will be integrated into a Backbone-driven app. Do I really need to use PUT? Or is there a workaround for the model saving function in Backbone when using POST?
Finally answered. Instead of $this->input->post or $this->input->put, it must be $this->put or $this->post, because the data is not coming from a form.
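The underlying reason is that PHP only fills $_POST for POST requests, so anything that reads $_POST sees nothing inside a PUT handler; the body has to be read from the php://input stream instead. The sketch below illustrates that idea for a URL-encoded body. The helper name parse_urlencoded_body is hypothetical, and in the REST server you would normally just call $this->put('name') rather than do this yourself.

```php
<?php
// Hypothetical helper: decode an application/x-www-form-urlencoded body
// (the format a REST client typically sends for simple key/value data).
function parse_urlencoded_body($raw)
{
    $fields = array();
    parse_str($raw, $fields); // fills $fields with decoded key => value pairs
    return $fields;
}

// In a real PUT handler the raw body comes from the request stream:
// $fields = parse_urlencoded_body(file_get_contents('php://input'));
// $name   = isset($fields['name']) ? $fields['name'] : null;
```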
CodeIgniter's put_stream was also failing to fetch the PUT data for me, so I had to handle the PHP PUT request myself. I found this function useful; it may save someone some time:
function parsePutRequest()
{
// Fetch content and determine boundary
$raw_data = file_get_contents('php://input');
$boundary = substr($raw_data, 0, strpos($raw_data, "\r\n"));
// Fetch each part
$parts = array_slice(explode($boundary, $raw_data), 1);
$data = array();
foreach ($parts as $part) {
// If this is the last part, break
if ($part == "--\r\n") break;
// Separate content from headers
$part = ltrim($part, "\r\n");
list($raw_headers, $body) = explode("\r\n\r\n", $part, 2);
// Parse the headers list
$raw_headers = explode("\r\n", $raw_headers);
$headers = array();
foreach ($raw_headers as $header) {
list($name, $value) = explode(':', $header, 2); // limit 2: header values may contain ':'
$headers[strtolower($name)] = ltrim($value, ' ');
}
// Parse the Content-Disposition to get the field name, etc.
if (isset($headers['content-disposition'])) {
$filename = null;
preg_match(
'/^(.+); *name="([^"]+)"(; *filename="([^"]+)")?/',
$headers['content-disposition'],
$matches
);
list(, $type, $name) = $matches;
isset($matches[4]) and $filename = $matches[4];
// handle your fields here
switch ($name) {
// this is a file upload
case 'userfile':
file_put_contents($filename, $body);
break;
// default for all other files is to populate $data
default:
$data[$name] = substr($body, 0, strlen($body) - 2);
break;
}
}
}
return $data;
}

How to attach an image to a post using wordpress and xml-rpc? [duplicate]

Does anyone know how to create a new post with a photo attached in WordPress using XML-RPC?
I am able to create a new post and upload a new picture separately, but there seems to be no way to attach the uploaded photo to the created post.
Below is the code I'm currently using.
<?php
DEFINE('WP_XMLRPC_URL', 'http://www.blog.com/xmlrpc.php');
DEFINE('WP_USERNAME', 'username');
DEFINE('WP_PASSWORD', 'password');
require_once("./IXR_Library.php");
$rpc = new IXR_Client(WP_XMLRPC_URL);
$status = $rpc->query("system.listMethods"); // method name
if(!$status){
print "Error (".$rpc->getErrorCode().") : ";
print $rpc->getErrorMessage()."\n";
exit;
}
$content['post_type'] = 'post'; // post type
$content['title'] = 'Post Title '.date("F j, Y, g:i a"); // post title
$content['categories'] = array($response[1]['categoryName']); // post categories
$content['description'] = '<p>Hello World!</p>'; // post body
$content['mt_keywords'] = 'tag keyword 1, tag keyword 2, tag keyword 3'; // post tags
$content['mt_allow_comments'] = 1; // allow comments
$content['mt_allow_pings'] = 1; // allow pings
$content['custom_fields'] = array(array('key'=>'Key Name', 'value'=>'Value One')); // custom fields
$publishBool = true;
if(!$rpc->query('metaWeblog.newPost', '', WP_USERNAME, WP_PASSWORD, $content, $publishBool)){
die('An error occurred - '.$rpc->getErrorCode().":".$rpc->getErrorMessage());
}
$postID = $rpc->getResponse();
echo 'POST ID: '.$postID.'<br/>';
if($postID){ // if post has successfully created
$fs = filesize(dirname(__FILE__).'/image.jpg');
$file = fopen(dirname(__FILE__).'/image.jpg', 'rb');
$filedata = fread($file, $fs);
fclose($file);
$data = array(
'name' => 'image.jpg',
'type' => 'image/jpg',
'bits' => new IXR_Base64($filedata),
'overwrite' => false // do not overwrite an existing file
);
$status = $rpc->query(
'metaWeblog.newMediaObject',
$postID,
WP_USERNAME,
WP_PASSWORD,
$data
);
print_r($rpc->getResponse()); // Array ( [file] => image.jpg [url] => http://www.blog.com/wp-content/uploads/2011/09/image.jpg [type] => image/jpg )
}
?>
I've been involved with WordPress sites (my current employer runs three of them), and posting content daily and in bulk has forced me to use what I do best: scripts!
They're PHP-based and quick and easy to use and deploy. And security? Just use .htaccess to secure them.
From my research, file handling over XML-RPC is one thing WordPress really sucks at. Once you upload a file, you can't associate that attachment with a particular post! I know, it's annoying.
So I decided to figure it out for myself. It took me a week to sort out. You will need 100% control over an XML-RPC-compliant publishing client, or this won't mean anything to you!
You will need, from your WordPress installation:
class-IXR.php, located in /wp-includes
class-wp-xmlrpc-server.php, located in /wp-includes
class-IXR.php is needed if you craft your own posting tool, like me. It has a correctly working base64 encoder. Don't trust the one that comes with PHP.
You also need some programming experience to follow this. I will try to be clear.
Modify class-wp-xmlrpc-server.php
Download this file to your computer through FTP. Back up a copy, just in case.
Open the file in a text editor. If it shows up as one unbroken block (it usually shouldn't; if it does, the file uses Unix-style line endings), open it in another editor such as UltraEdit.
Pay attention to the mw_newMediaObject function. This is our target. A little note here: WordPress borrows functionality from Blogger and Movable Type. Although WordPress also has its own set of XML-RPC methods, it keeps this functionality common so that clients work no matter what platform is in use.
Look for function mw_newMediaObject($args). Typically this is around line 2948; your text editor's status bar shows which line you are on. If you still can't find it, use your editor's search/find function.
Scroll down a little and you should have something that looks like this:
$name = sanitize_file_name( $data['name'] );
$type = $data['type'];
$bits = $data['bits'];
We will add a line after the $name variable. See below.
$name = sanitize_file_name( $data['name'] );
$post = $data['post']; //the post ID to attach to.
$type = $data['type'];
$bits = $data['bits'];
Note the new $post variable. This means that whenever you make a new file upload request, you can pass a 'post' argument naming the post to attach to.
How you find your post number depends on how you add posts with your XML-RPC client. Typically you get it back as the result of posting. It is a numeric value.
Once you've made the edit above, move on to around line 3000.
// Construct the attachment array
// attach to post_id 0
$post_id = 0;
$attachment = array(
'post_title' => $name,
'post_content' => '',
'post_type' => 'attachment',
'post_parent' => $post_id,
'post_mime_type' => $type,
'guid' => $upload[ 'url' ]
);
So here's why no image is associated with any post! The post_parent argument always defaults to 0!
That's not going to be the case anymore.
// Construct the attachment array
// attach to the post ID sent in the request
$post_id = $post;
$attachment = array(
'post_title' => $name,
'post_content' => '',
'post_type' => 'attachment',
'post_parent' => $post_id,
'post_mime_type' => $type,
'guid' => $upload[ 'url' ]
);
$post_id now takes the value of $post, which comes from the XML-RPC request. Once this is committed to the attachment, it will be associated with whatever post you choose!
This can be improved. A default value can be assigned so that nothing breaks if no value is sent. On my side, I set the default in my client, and no one but me accesses the XML-RPC interface.
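The default mentioned above could be handled with a small guard like the following. It is only a sketch: resolve_attachment_parent is a hypothetical helper name, not a WordPress function, and it assumes the client sends the post ID under the 'post' key as described in this answer.

```php
<?php
// Hypothetical helper, not part of WordPress: pick the parent post ID
// from the upload struct, falling back to 0 (unattached) when the
// client sent nothing usable.
function resolve_attachment_parent(array $data)
{
    if (isset($data['post']) && is_numeric($data['post']) && (int) $data['post'] > 0) {
        return (int) $data['post'];
    }
    return 0; // same behaviour as the unmodified WordPress code
}
```

Inside the edited function, the first added line would then read $post = resolve_attachment_parent($data); so a request without a 'post' argument behaves exactly as before.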
With the changes done, save the file and re-upload it to the same path where you found it. Again, make sure to keep backups.
Be wary of WordPress updates that affect this file. If one does, you will need to reapply this edit!
Include class-IXR.php in your PHP posting script. If you're using something else, well, I can't help you there. :(
Hope this helps some people.
When you post, WordPress scans the post for IMG tags.
If WP finds an image that is already loaded in its media library, it automatically attaches it to the post.
Basically you have to:
post the media (image) first
grab its URL
include the URL of the image in an IMG tag in the body of your post
then create the post
Here is some sample code. It needs error handling and some more documentation.
$username ="***";
$password ="****";
$xmlrpc = 'http://localhost/web/blog/xmlrpc.php';
include '../blog/wp-includes/class-IXR.php';
$client = new IXR_Client($xmlrpc);
$author = "test";
$title = "Test Posting";
$categories = "chess,coolbeans";
$body = "This is only a test, disregard <br />";
$tempImagesfolder = "tempImages";
$img = "1338494719chessBoard.jpg";
$attachImage = uploadImage($tempImagesfolder,$img);
$body .= "<img src='$attachImage' width='256' height='256' />";
createPost($title,$body,$categories,$author);
/*
*/
function createPost($title,$body,$categories,$author){
global $username, $password,$client;
$authorID = findAuthor($author); //lookup id of author
/* $categories is a comma-separated list */
$cats = preg_split('/,/', $categories, -1, PREG_SPLIT_NO_EMPTY);
foreach ($cats as $key => $data){
createCategory($data,"","");
}
//$time = time();
//$time += 86400;
$data = array(
'title' => $title,
'description' => $body,
'dateCreated' => (new IXR_Date(time())),
//'dateCreated' => (new IXR_Date($time)), //publish in the future
'mt_allow_comments' => 0, // 1 to allow comments
'mt_allow_pings' => 0,// 1 to allow trackbacks
'categories' => $cats,
'wp_author_id' => $authorID //id of the author if set
);
$published = 0; // 0 - draft, 1 - published
$res = $client->query('metaWeblog.newPost', '', $username, $password, $data, $published);
}
/*
*/
function uploadImage($tempImagesfolder,$img){
global $username, $password,$client;
$filename = $tempImagesfolder ."/" . $img;
$fs = filesize($filename);
$file = fopen($filename, 'rb');
$filedata = fread($file, $fs);
fclose($file);
$data = array(
'name' => $img,
'type' => 'image/jpg',
'bits' => new IXR_Base64($filedata),
'overwrite' => false // do not overwrite an existing file
);
$res = $client->query('wp.uploadFile',1,$username, $password,$data);
$returnInfo = $client->getResponse();
return $returnInfo['url']; //return the url of the posted Image
}
/*
*/
function findAuthor($author){
global $username, $password,$client;
$client->query('wp.getAuthors', 0, $username, $password);
$authors = $client->getResponse();
foreach ($authors as $key => $data){
// echo $authors[$key]['user_login'] . $authors[$key]['user_id'] ."</br>";
if($authors[$key]['user_login'] == $author){
return $authors[$key]['user_id'];
}
}
return "not found";
}
/*
*/
function createCategory($catName,$catSlug,$catDescription){
global $username, $password,$client;
$res = $client->query('wp.newCategory', '', $username, $password,
array(
'name' => $catName,
'slug' => $catSlug,
'parent_id' => 0,
'description' => $catDescription
)
);
}
After calling the metaWeblog.newMediaObject method, we need to edit the image entry in the database to add a parent (the post previously created with metaWeblog.newPost).
If we try metaWeblog.editPost, it returns error 401, which comes from this check in the WordPress source:
// Use wp.editPost to edit post types other than post and page.
if ( ! in_array( $postdata[ 'post_type' ], array( 'post', 'page' ) ) )
return new IXR_Error( 401, __( 'Invalid post type' ) );
The solution is to call wp.editPost, which takes the following arguments:
$blog_id = (int) $args[0];
$username = $args[1];
$password = $args[2];
$post_id = (int) $args[3];
$content_struct = $args[4];
So, just after newMediaObject, we do:
$status = $rpc->query(
'metaWeblog.newMediaObject',
$postID,
WP_USERNAME,
WP_PASSWORD,
$data
);
$response = $rpc->getResponse();
if( isset($response['id']) ) {
// ATTACH IMAGE TO POST
$image['post_parent'] = $postID;
if( !$rpc->query('wp.editPost', '1', WP_USERNAME, WP_PASSWORD, $response['id'], $image)) {
die( 'An error occurred - ' . $rpc->getErrorCode() . ":" . $rpc->getErrorMessage() );
}
echo 'image: ' . $rpc->getResponse();
// SET FEATURED IMAGE
$updatePost['custom_fields'] = array( array( 'key' => '_thumbnail_id', 'value' => $response['id'] ) );
if( !$rpc->query( 'metaWeblog.editPost', $postID, WP_USERNAME, WP_PASSWORD, $updatePost, $publishBool ) ) {
die( 'An error occurred - ' . $rpc->getErrorCode() . ":" . $rpc->getErrorMessage() );
}
echo 'update: ' . $rpc->getResponse();
}
I used the Incutio XML-RPC Library for PHP to test; the rest of the code is exactly as in the question.
Here's some sample code to attach an image from a path not supported by WordPress (wp-content):
<?php
function attach_wordpress_images($productpicture,$newid)
{
include('../../../../wp-load.php');
$upload_dir = wp_upload_dir();
$dirr = $upload_dir['path'].'/';
$filename = $dirr . $productpicture;
# print "the path is : $filename \n";
# print "Filename: $filename \n";
$uploads = wp_upload_dir(); // Array of key => value pairs
# echo $uploads['basedir'] . '<br />';
$productpicture = str_replace('/uploads','',$productpicture);
$localfile = $uploads['basedir'] .'/' .$productpicture;
# echo "Local path = $localfile \n";
if (!file_exists($filename))
{
echo "Could not find $filename!";
die ("no image for bottle $newid!");
}
if (!copy($filename, $localfile))
{
wp_delete_post($newid);
echo "Failed to copy the file $filename to $localfile ";
die("Failed to copy the file $filename to $localfile ");
}
$wp_filetype = wp_check_filetype(basename($localfile), null );
$attachment = array(
'post_mime_type' => $wp_filetype['type'],
'post_title' => preg_replace('/\.[^.]+$/', '', basename($localfile)),
'post_content' => '',
'post_status' => 'inherit'
);
$attach_id = wp_insert_attachment( $attachment, $localfile, $newid );
// you must first include the image.php file
// for the function wp_generate_attachment_metadata() to work
require_once(ABSPATH . 'wp-admin/includes/image.php');
$attach_data = wp_generate_attachment_metadata( $attach_id, $localfile );
wp_update_attachment_metadata( $attach_id, $attach_data );
}
?>
I had to do this several months ago. It is possible, but it is hacky and undocumented, and I had to dig through the WordPress source to figure it out. What I wrote up back then:
One thing that was absolutely un-documented was a method to attach an image to a post. After some digging I found attach_uploads() which is a function that wordpress calls every time a post is created or edited over xml-rpc. What it does is search through the list of un-attached media objects and see if the new/edited post contains a link to them. Since I was trying to attach images so that the theme’s gallery would use them I didn’t necessarily want to link to the images within the post, nor did I want to edit wordpress. So what I ended up doing was including the image url within an html comment. -- danieru.com
Like I said, it's messy, but I searched high and low for a better method and I'm reasonably sure none exists.
As of WordPress 3.5, metaWeblog.newMediaObject supports this more or less natively,
so it is no longer necessary to hack class-wp-xmlrpc-server.php.
Instead, your XML-RPC client needs to send the post number in a field called post_id. (Previously, with the hack, it was the field 'post'.)
Hope that helps someone out.
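Going by the description above, the upload struct only needs one extra field. The following sketch shows its shape; build_media_struct is a hypothetical helper name, and the 'bits' value is assumed to be wrapped by whatever your client library uses for base64 data (new IXR_Base64($filedata) with the IXR client).

```php
<?php
// Hypothetical helper: build the struct sent to metaWeblog.newMediaObject,
// including the post_id field so the upload is attached to an existing post.
function build_media_struct($name, $mimeType, $bits, $postId)
{
    return array(
        'name'      => $name,
        'type'      => $mimeType,
        'bits'      => $bits,          // e.g. new IXR_Base64($filedata)
        'overwrite' => false,          // keep any existing file of that name
        'post_id'   => (int) $postId,  // the post to attach the upload to
    );
}

// Possible use with the IXR client from the question:
// $data = build_media_struct('image.jpg', 'image/jpeg', new IXR_Base64($filedata), $postID);
// $rpc->query('metaWeblog.newMediaObject', 1, WP_USERNAME, WP_PASSWORD, $data);
```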

Get keyword from a (search engine) referrer url using PHP

I am trying to get the search keyword from a referrer URL. Currently I am using the following code for Google URLs, but sometimes it does not work...
$query_get = "(q|p)";
$referrer = "http://www.google.com/search?hl=en&q=learn+php+2&client=firefox";
preg_match('/[?&]'.$query_get.'=(.*?)[&]/',$referrer,$search_keyword);
Is there another, cleaner, working way to do this?
Thank you,
Prasad
If you're using PHP5 take a look at http://php.net/parse_url and http://php.net/parse_str
Example:
// The referrer
$referrer = 'http://www.google.com/search?hl=en&q=learn+php+2&client=firefox';
// Extract the query string from the URL (returned as a string)
$parsed = parse_url( $referrer, PHP_URL_QUERY );
// Parse the query string into an array
parse_str( $parsed, $query );
// Output the result
echo $query['q'];
Different search engines use different query string parameters. After trying Wiliam's method, I came up with my own. (Yahoo uses 'p', but sometimes 'q'.)
$referrer = "http://search.yahoo.com/search?p=www.stack+overflow%2Ccom&ei=utf-8&fr=slv8-msgr&xargs=0&pstart=1&b=61&xa=nSFc5KjbV2gQCZejYJqWdQ--,1259335755";
$referrer_query = parse_url($referrer);
$referrer_query = $referrer_query['query'];
$q = "(?:q|p)"; // Yahoo uses both parameters; I use switch() to pick the pattern for each search engine
preg_match('/[?&]'.$q.'=(.*?)(?:&|$)/',$referrer,$keyword);
$keyword = urldecode($keyword[1]);
echo $keyword; //Outputs "www.stack overflow,com"
Thank you,
Prasad
To supplement the other answers, note that the query string parameter that contains the search terms varies by search provider. This snippet of PHP shows the correct parameter to use:
$search_engines = array(
'q' => 'alltheweb|aol|ask|bing|google',
'p' => 'yahoo',
'wd' => 'baidu',
'text' => 'yandex'
);
Source: http://betterwp.net/wordpress-tips/get-search-keywords-from-referrer/
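The map above can be combined with parse_url() and parse_str() to pick the right parameter per search engine. This is a minimal sketch, not the code from the linked article: the function name extract_search_keyword and the host-matching regex are assumptions for illustration.

```php
<?php
// Hypothetical helper: return the search keyword from a referrer URL,
// or null when the host is not a known engine or no keyword is present.
function extract_search_keyword($referrer)
{
    // parameter name => engines that use it (taken from the map above)
    $search_engines = array(
        'q'    => 'alltheweb|aol|ask|bing|google',
        'p'    => 'yahoo',
        'wd'   => 'baidu',
        'text' => 'yandex',
    );

    $host  = parse_url($referrer, PHP_URL_HOST);
    $query = parse_url($referrer, PHP_URL_QUERY);
    if (!$host || !$query) {
        return null; // no host or no query string at all
    }

    $params = array();
    parse_str($query, $params); // also URL-decodes the values

    foreach ($search_engines as $param => $engines) {
        // Match the engine name as a whole host label, e.g. "www.google.co.in".
        $isEngine = preg_match('/(^|\.)(' . $engines . ')\./i', $host);
        if ($isEngine && isset($params[$param]) && $params[$param] !== '') {
            return $params[$param];
        }
    }
    return null;
}
```

With the referrers used elsewhere in this thread, the Google URL yields "learn php 2" and the Yahoo URL yields "www.stack overflow,com"; a bare https://www.google.co.in/ referrer yields null.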
<?php
class GET_HOST_KEYWORD
{
public function get_host_and_keyword($_url) {
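$params = array();
$chunk_url = parse_url($_url);
$_data["host"] = isset($chunk_url['host']) ? $chunk_url['host'] : '';
// parse_str() needs a result array; the one-argument form was removed in PHP 8
parse_str(isset($chunk_url['query']) ? $chunk_url['query'] : '', $params);
$_data["keyword"] = !empty($params['p']) ? $params['p'] : (!empty($params['q']) ? $params['q'] : '');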
return $_data;
}
}
// Sample Example
$obj = new GET_HOST_KEYWORD();
print_r($obj->get_host_and_keyword('http://www.google.co.in/search?sourceid=chrome&ie=UTF-&q=hire php php programmer'));
// sample output
//Array
//(
// [host] => www.google.co.in
// [keyword] => hire php php programmer
//)
// $search_engines = array(
// 'q' => 'alltheweb|aol|ask|ask|bing|google',
// 'p' => 'yahoo',
// 'wd' => 'baidu',
// 'text' => 'yandex'
//);
?>
$query = parse_url($request, PHP_URL_QUERY);
This one should work for Google, Bing and, sometimes, Yahoo Search:
if( isset($_SERVER['HTTP_REFERER']) && $_SERVER['HTTP_REFERER']) {
$query = getSeQuery($_SERVER['HTTP_REFERER']);
echo $query;
} else {
echo "I think they spelled REFERER wrong? Anyways, your browser says you don't have one.";
}
function getSeQuery($url = false) {
$segments = parse_url($url);
$keywords = null;
if($query = isset($segments['query']) ? $segments['query'] : (isset($segments['fragment']) ? $segments['fragment'] : null)) {
parse_str($query, $segments);
$keywords = isset($segments['q']) ? $segments['q'] : (isset($segments['p']) ? $segments['p'] : null);
}
return $keywords;
}
I believe Google and Yahoo have changed their behavior so that search keywords and other parameters are no longer included in the referrer URL, which means they cannot be read from the HTTP referrer.
Please let me know if the recommendations above still provide the search keywords.
Below is what I now receive from the HTTP referrer on my website:
from google: https://www.google.co.in/
from yahoo: https://in.yahoo.com/
Ref: https://webmasters.googleblog.com/2012/03/upcoming-changes-in-googles-http.html
