Find and replace problem - php

My website has 2 database tables. One of them is posts_table and the other one holds the videos.
At the moment I am getting the text, images, etc. normally from the post_table table.
In my CMS, when we add a video, a shortcode is added to the post:
[media id=487 width=660 height=440]
This shortcode automatically gets the link of a video from vid_table, where the id is the same as the one in the shortcode.
So what I want is:
I need to do the same thing that the shortcode does. When a video is added in the CMS, the shortcode is shown in the post. I need to remove the shortcode and, in its place, play the video whose link is stored in vid_table.
I have some problems with my English, so if you don't understand, please tell me.
Any kind of help will be great.
Thank you.
EDITED: So I want to replace the whole media tag with a Flash player that plays the URL belonging to the ID in the media tag.
Bump! Can anyone help, please?

This is actually quite a sophisticated problem. I was bored and made a basic tag parser. Right now it has some problems:
HTML rendering should be implemented in a separate class (and a template engine such as Twig should do the rendering);
Tag parsing is way too naive and will probably give you unexpected results if a tag's syntax is incorrect;
The [media] tag does not support IE. You would have to change the source itself (method TagParser::renderMedia()).
Some features to note:
Extra parameters will be rendered as attributes for the [link] tag, e.g. [link id=25 class=foo] will output example.
Parameters may contain spaces if you quote them: [link id=25 class="foo bar"] will output example.
If DataProvider::findById() does not return a 'content' key in its array, the parser will output http://example.com
The code is too long to paste here; you can find it on gist. Just put each file in the directory specified by the first commented line and you should be set. Run example.php to see it in action. You can find out some more details about using this script by looking at the unit test.
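To illustrate the quoted-parameter behaviour described above, here is a minimal sketch of how such a tag could be split into a name and attributes. It is not the code from the gist; the function name parseTag() and the returned array shape are made up for this example.

```php
<?php
// Hypothetical helper (not from the gist): splits a tag such as
// [link id=25 class="foo bar"] into its name and an attribute array.
function parseTag(string $tag): ?array
{
    // Whole tag: [name key=value key="quoted value" ...]
    $tagPattern = '/^\[(\w+)((?:\s+\w+=(?:"[^"]*"|[^\s\]]+))*)\s*\]$/';
    if (!preg_match($tagPattern, $tag, $m)) {
        return null; // not a well-formed tag
    }

    $attrs = [];
    // Each pair: group 2 holds a quoted value, group 3 an unquoted one.
    preg_match_all('/(\w+)=(?:"([^"]*)"|([^\s\]]+))/', $m[2], $pairs, PREG_SET_ORDER);
    foreach ($pairs as $pair) {
        $attrs[$pair[1]] = $pair[3] ?? $pair[2];
    }

    return ['name' => $m[1], 'attrs' => $attrs];
}
```

Calling parseTag('[link id=25 class="foo bar"]') gives the name link with the attributes id and class, and a malformed tag gives null instead of a half-parsed result.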

What exactly do you want? You can get the media id out of the text using:
$text = 'some stuff [media id=468 width=660 height=440] more stuff';
// Capture only the digits after "media id="; group 1 holds the id.
if (preg_match('/\[media id=(\d+)/', $text, $results)) {
    $id = $results[1];
}
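Going one step further, the whole shortcode can be swapped for player markup in one pass with preg_replace_callback(). This is only a sketch: the function name renderMediaShortcodes() is made up, the $videoUrls array stands in for a lookup against vid_table, and the video element is a placeholder for whatever player markup you actually use.

```php
<?php
// Hypothetical helper: replaces every [media id=.. width=.. height=..]
// shortcode with player markup. $videoUrls (id => url) is an assumption
// standing in for a query against vid_table.
function renderMediaShortcodes(string $text, array $videoUrls): string
{
    $pattern = '/\[media\s+id=(\d+)\s+width=(\d+)\s+height=(\d+)\]/';

    $result = preg_replace_callback(
        $pattern,
        function (array $m) use ($videoUrls): string {
            [$tag, $id, $width, $height] = $m;
            if (!isset($videoUrls[$id])) {
                return $tag; // unknown id: leave the shortcode untouched
            }
            $url = htmlspecialchars($videoUrls[$id], ENT_QUOTES);
            // Placeholder markup; swap in your own player here.
            return sprintf(
                '<video src="%s" width="%d" height="%d" controls></video>',
                $url,
                $width,
                $height
            );
        },
        $text
    );

    return $result ?? $text; // preg_replace_callback() returns null on error
}
```

In a real setup you would probably collect the ids first, fetch all the URLs from vid_table in a single query, and then pass the resulting array in.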

Related

Grab text between tag and specific property names and store into PHP array

I have a question: I want to grab ALL the product names from the URL http://www.tokopedia.com/lbagstore
The URL above displays all the products.
In the View Source menu I see that each product name sits inside this tag:
<b itemprop="name"> [product name] </b>
I have a PHP script like the one below:
<?
$html=file_get_contents("https://www.tokopedia.com/lbagstore");
preg_match("'<b itemprop=\"name\">(.*?)</b>'si", $html, $match);
$productname = $match[1];
echo $productname;
?>
but all I get is a blank page.
I especially have difficulty putting the names into an array and displaying them all.
Can anyone help me fix this code? Thanks!
You are "inspecting" the html code of the page instead of "displaying the SOURCE CODE". If you want to extract the data from a website, you need to display its source code, then you can get what you want from it using a regex.
I checked the code myself and there are no <b itemprop="name"> [product name] <b> within the source code, that's why you don't have any results. The only way to see that piece of code was to inspect the code displayed instead of the source code ;)
If you change your code to this, you will be able to see what the real code looks like and then you will be able to adapt your regex to grab the names of the products you want.
$html = file_get_contents("https://www.tokopedia.com/lbagstore");
var_dump($html);
preg_match("'<b itemprop=\"name\">(.*?)</b>'si", $html, $match);
$productname = $match[1];
echo $productname;
Just add the var_dump to display the text. Also, if you can't scrape what you want from the website and you need to do it quickly, I can recommend a free Google Chrome extension called "grepsr" (https://chrome.google.com/webstore/search/grepsr). I tested it and could extract the names of the products within 5 minutes.
Edit: Also, if you want to grab the names of all the products on the page, you will have to use preg_match_all() instead of preg_match().
I hope this helps ;)
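As a rough illustration of the preg_match_all() suggestion, here is a small sketch that collects every match into an array. The sample HTML is invented for the example (the real page source does not contain this markup), and the function name extractProductNames() is hypothetical.

```php
<?php
// Hypothetical helper: returns every product name found in the given HTML.
function extractProductNames(string $html): array
{
    // preg_match_all() fills $matches[1] with the first capture group
    // of every match, not just the first one.
    preg_match_all('#<b itemprop="name">(.*?)</b>#si', $html, $matches);

    return array_map('trim', $matches[1]);
}

// Invented sample markup, only to show the shape of the result.
$sample = '<b itemprop="name"> Red Bag </b><span>x</span><b itemprop="name">Blue Bag</b>';
foreach (extractProductNames($sample) as $name) {
    echo $name, PHP_EOL;
}
```

Once you know what the real source looks like, only the pattern needs to change.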

XPath in PHP: Get all text nodes, except navigation

I’m writing a custom parser/data extractor for some pretty shitty HTML.
Changing the HTML is out of the question.
I will spare you the details of the hoops I’ve had to jump through but I’ve now come pretty close to my original goal. I’m using a combination of DOMDocument getElementByName, regular expression replace (I know, I know...), and XPath queries.
I need to get all the text out of the body of the document. I would like for the navigation to remain a separate entity, at least in the abstract. Here’s what I’m doing now:
$contentnodes = $xpath->query("//body//*[not(self::a)]/text()|//body//ul/li/a");
foreach ($contentnodes as $contentnode) {
$type = $contentnode->nodeName;
$content = $contentnode->nodeValue;
$output[] = array( $type, $content);
}
This works, except that of course it treats all of the links on the page differently, and I only want it to do that to the navigation.
What XPath syntax can I use so that, in the first part of that query, before the |, I tell it to get all the text nodes of body’s children except ul > li > a?
Please note that I cannot rely on the presence of p tags or h1 tags or anything sensible like that to make educated guesses about content.
Thanks
Update: @hr_117’s answer below works. I’ve also found that you can use multiple not statements like so:
//body//text()[not(parent::a/parent::li/parent::ul)][not(parent::h1)]
You may try something like this:
//body//text()[not(parent::a/parent::li/parent::ul)]|//body//ul/li/a
or, dropping whitespace-only text nodes as well:
//body//*[not(self::a/parent::li/parent::ul)]/text()[normalize-space()]|//body//ul/li/a
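Here is a small sketch of the first query run through DOMXPath, using the same loop as in the question. The HTML snippet and the function name collectNodes() are made up for the example.

```php
<?php
// Hypothetical helper: runs the query and returns [nodeName, nodeValue] pairs.
function collectNodes(string $html): array
{
    libxml_use_internal_errors(true); // keep messy HTML from emitting warnings
    $doc = new DOMDocument();
    $doc->loadHTML($html);
    libxml_clear_errors();

    $xpath = new DOMXPath($doc);
    // Text nodes everywhere except inside ul > li > a, plus the nav links themselves.
    $query = '//body//text()[not(parent::a/parent::li/parent::ul)]|//body//ul/li/a';

    $output = [];
    foreach ($xpath->query($query) as $node) {
        $output[] = [$node->nodeName, $node->nodeValue];
    }

    return $output;
}

// Invented sample: one navigation link and one ordinary inline link.
$sample = '<html><body><h1>Title</h1><ul><li><a href="/a">Home</a></li></ul>'
        . '<p>Intro <a href="/x">inline link</a> text.</p></body></html>';
print_r(collectNodes($sample));
```

The navigation link comes back as an a element, while the inline link only contributes a plain #text node, which is the split the question asks for.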

How to get Wikipedia "clean" content?

I'm using the MediaWiki API to get content from Wikipedia pages.
I've written code which generates the following query (for example):
http://en.wikipedia.org/w/api.php?action=query&prop=revisions&rvprop=content&rvsection=0&titles=hawaii
This retrieves only the lead section of the Wikipedia page about Hawaii.
The problem is that, as you might notice, there are a lot of irrelevant substrings such as:
"[[Molokai|Moloka{{okina}}i]], [[Lanai|Lāna{{okina}}i]], [[Kahoolawe|Kaho{{okina}}olawe]], [[Maui]] and the [[Hawaii (island)|".
All those brackets [[]] are not relevant, and I wonder whether there is an elegant method to pull only 'clean' content from such pages?
Thanks in advance.
You can get clean HTML text from Wikipedia with this query:
https://en.wikipedia.org/w/api.php?action=query&prop=extracts&titles=hawaii
If you want just plain text, without HTML, try this:
https://en.wikipedia.org/w/api.php?action=query&prop=extracts&titles=hawaii&explaintext
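From PHP, the same query could be put together roughly like this. It is a sketch, not a complete client: the function names are made up, format=json is added on the assumption that you want JSON back, and the sample response only imitates the query/pages/extract shape with invented values.

```php
<?php
// Hypothetical helper: builds the extracts query URL for one page title.
function buildExtractUrl(string $title, bool $plainText = true): string
{
    $params = [
        'action' => 'query',
        'prop'   => 'extracts',
        'titles' => $title,
        'format' => 'json', // assumption: JSON output is wanted
    ];
    if ($plainText) {
        $params['explaintext'] = 1; // plain text instead of HTML
    }

    return 'https://en.wikipedia.org/w/api.php?' . http_build_query($params);
}

// Hypothetical helper: pulls title => extract pairs out of a JSON response.
function extractsFromResponse(string $json): array
{
    $data = json_decode($json, true);
    $extracts = [];
    foreach ($data['query']['pages'] ?? [] as $page) {
        if (isset($page['title'], $page['extract'])) {
            $extracts[$page['title']] = $page['extract'];
        }
    }

    return $extracts;
}

// Invented sample response, only to show the expected shape.
$sample = '{"query":{"pages":{"12345":{"pageid":12345,"title":"Hawaii","extract":"Hawaii is a state."}}}}';
print_r(extractsFromResponse($sample));
```

The real response would come from something like file_get_contents(buildExtractUrl('hawaii')), provided URL fopen wrappers are enabled on your server.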
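Please try this (note that the square brackets have to be escaped, otherwise they are read as a character class):
$relevant = preg_replace('/\[\[.*?\]\]/', '', $string);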
EDIT: just found this - hope it is helpful
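If you would rather keep the visible link text than delete the links altogether, a variation along these lines might work. It is a sketch with a made-up function name; it only handles simple [[target]] and [[target|label]] links and leaves templates such as {{okina}} alone.

```php
<?php
// Hypothetical helper: turns [[target|label]] into label and [[target]] into target.
function unwrapWikiLinks(string $wikitext): string
{
    // Optional "target|" prefix is dropped; whatever remains is kept as the label.
    $result = preg_replace('/\[\[(?:[^\]|]*\|)?([^\]]*)\]\]/', '$1', $wikitext);

    return $result ?? $wikitext; // preg_replace() returns null on error
}

echo unwrapWikiLinks('[[Maui]] and the [[Hawaii (island)|Big Island]]'), PHP_EOL;
```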

Regex to alter img attributes in Wordpress Filter

I have a custom theme I've developed for a photographer client, and I need to implement lazy-loading of the images so that the blog loads faster. It is horribly slow due to the number of images he currently has, even when only showing five posts. To do this I'm using the JAIL jQuery plugin, but I need to be able to modify the image tags for it to work properly. Basically, I have to replace the src attribute with a placeholder and set a data-href attribute to the source URL. I cannot seem to find a solution that works properly inside a WordPress filter; I'm basically filtering the_content() hook in the posts. Does anyone know how I could accomplish this?
The standard Stack Overflow cliché for these questions is that you should use a DOM parser. That is actually correct, but not quite feasible (for performance reasons) for output manipulation.
To accomplish what you want you could try:
$html = preg_replace_callback(
'#(<img\s[^>]*src)="([^"]+)"#',
"callback_img", $html);
Then define a callback like this:
function callback_img($match) {
list(, $img, $src) = $match;
return "$img=\"placeholder\" data-href=\"$src\" ";
}
Note that this regex is only workable if all your image links follow this scheme consistently (they all should be using double quotes for example).
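To tie this to the filter mentioned in the question, the same regex could be wrapped in a function and registered on the_content roughly as follows. This is a sketch: the function name lazyload_images() and the placeholder file name are made up, and the add_filter() call is guarded so the snippet also runs outside WordPress.

```php
<?php
// Hypothetical filter callback: moves each src into data-href and
// puts a placeholder image in its place.
function lazyload_images(string $content, string $placeholder = 'placeholder.gif'): string
{
    $result = preg_replace_callback(
        '#(<img\s[^>]*src)="([^"]+)"#',
        function (array $m) use ($placeholder): string {
            // $m[1] is everything up to and including "src", $m[2] the original URL.
            return $m[1] . '="' . $placeholder . '" data-href="' . $m[2] . '"';
        },
        $content
    );

    return $result ?? $content; // fall back to the untouched content on regex error
}

// Only register the filter when running inside WordPress.
if (function_exists('add_filter')) {
    add_filter('the_content', 'lazyload_images');
}
```

The same caveat applies as above: tags that use single quotes or no quotes around src are left unchanged.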

PHP Summarize any URL

How can I, in PHP, get a summary of any URL? By summary, I mean something similar to the URL descriptions in Google web search results.
Is this possible? Is there already some kind of tool I can plug in to so I don't have to generate my own summaries?
I don't want to use metadata descriptions if possible.
-Dylan
What displays in Google is (generally) the META description tag. If you don't want to use that, you could use the page title instead though.
If you don't want to use metadata descriptions (btw, this is exactly what they are for), you have a lot of research and work to do. Essentially, you have to guess which part of the page is content and which is just navigation/fluff. Indeed, Google has exactly that; note however, that extracting valuable information from useless fluff is their #1 competency and they've been researching and improving that for a decade.
You can, of course, make an educated guess (e.g. "look for an element with ID or class maincontent" and get the first paragraph from it) and maybe it will be OK. The real question is, how good do you want the results to be? (Facebook has something similar for linking to websites, sometimes the summary just insists that an ad is the main content).
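A rough sketch of that kind of educated guess, using DOMDocument: look for an element whose id or class is maincontent and return its first paragraph. The function name guessSummary() and the sample markup are invented, and real pages will need a longer list of candidate markers.

```php
<?php
// Hypothetical helper: returns the first paragraph inside an element whose
// id or class matches $marker, or null if nothing suitable is found.
function guessSummary(string $html, string $marker = 'maincontent'): ?string
{
    libxml_use_internal_errors(true); // real-world HTML is rarely valid
    $doc = new DOMDocument();
    $doc->loadHTML($html);
    libxml_clear_errors();

    $xpath = new DOMXPath($doc);
    // Match id="marker" or a class attribute containing the marker as a whole word.
    $query = sprintf(
        '//*[@id="%1$s" or contains(concat(" ", normalize-space(@class), " "), " %1$s ")]//p',
        $marker
    );

    $paragraphs = $xpath->query($query);
    if ($paragraphs === false || $paragraphs->length === 0) {
        return null;
    }

    return trim($paragraphs->item(0)->textContent);
}

// Invented sample page.
$sample = '<html><body><div id="nav"><p>Menu</p></div>'
        . '<div id="maincontent"><p> First paragraph. </p><p>Second.</p></div></body></html>';
echo guessSummary($sample), PHP_EOL;
```

When the guess fails and null comes back, falling back to the page title or the META description is probably the sensible default.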
The following will allow you to parse the contents of a page's title tag. Note: PHP must be configured to allow file_get_contents to retrieve URLs. Otherwise you'll have to use curl to retrieve the page HTML.
$title_open = '<title>';
$title_close = '</title>';
$page = file_get_contents( 'http://www.domain.com' );
$n = stripos( $page, $title_open ) + strlen( $title_open );
$m = stripos( $page, $title_close);
$title = substr( $page, $n, $m - $n );
While I hate promoting a service, I have found this:
embed.ly
It has an API that returns JSON with all the data you need.
But I am still searching for a free/open-source library to do the same thing.
