I have an article formatted in HTML. It contains a whole lot of jargon words that perhaps some people wouldn't understand.
I also have a glossary of terms (MySQL table) with definitions which would be helpful to these people.
I want to go through the HTML of my article and find instances of these glossary terms and replace them with some nice JavaScript which will show a 'tooltip' with a definition for the term.
I've nearly got this working, but I'm still having some problems:
Terms are being found inside other words (e.g. APS is in Perhaps).
I have to make sure that it doesn't do this to alt and title attributes, linked text, etc., so it should only touch text that doesn't have any formatting applied. BUT it still needs to work inside tables and paragraphs.
Here is the code I have:
$query_glossary = "SELECT word FROM glossary_terms WHERE status = 1 ORDER BY LENGTH(word) DESC";
$result_glossary = mysql_query_run($query_glossary);
//reset mysql via seek so we don't have to do the query again
mysql_data_seek($result_glossary,0);
while($glossary = mysql_fetch_array($result_glossary)) {
//once done we can replace the words with a nice tip
$glossary_word = $glossary['word'];
$glossary_word = preg_quote($glossary_word,'/');
$article['content'] = preg_replace_callback('/[\s]('.$glossary_word.')[\s](.*?>)/i','article_checkOpenTag',$article['content'],10);
}
And here is the PHP function:
function article_checkOpenTag($matches) {
if (strpos($matches[0], '<') === false) {
return $matches[0];
}
else {
$query_term = "SELECT word,glossary_term_id,info FROM glossary_terms WHERE word = '".escape($matches[1])."'";
$result_term = mysql_query_run($query_term);
$term = mysql_fetch_array($result_term);
# CREATING A RELEVANT LINK
$glossary_id = $term['glossary_term_id'];
$glossary_link = SITEURL.'/glossary/term/'.string_to_url($term['word']).'-'.$term['glossary_term_id'];
# SOME DESCRIPTION STUFF FOR THE TOOLTIP
if(strlen($term['info'])>400) {
$glossary_info = substr(strip_tags($term['info']),0,350).' ...<br /> Read More';
}
else {
$glossary_info = $term['info'];
}
return ' '.$term['word'].'',$glossary_info,400,1,0,1).'">'.$matches[1].'</a> '.$matches[2];
}
}
Move the load from the server to the client. Assuming that your "dictionary of slang" changes infrequently and that you want to "add nice tooltips" to words across a lot of articles, you can export it into a .js file and add a corresponding <script> entry to your pages: just a static file that a web browser can easily cache.
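The export step itself can stay in PHP. A minimal sketch, assuming you have already fetched the glossary rows into a `word => definition` array; the function name `build_glossary_js` and the global `GLOSSARY` variable are hypothetical:

```php
<?php
// Build the contents of a static glossary.js from word => definition pairs.
function build_glossary_js(array $definitions): string
{
    // JSON_FORCE_OBJECT: always emit an object, even for an empty glossary.
    // JSON_HEX_TAG: escape < and > so a definition containing "</script>" stays safe if ever inlined.
    // JSON_THROW_ON_ERROR: fail loudly (e.g. on invalid UTF-8) instead of returning false.
    $json = json_encode(
        $definitions,
        JSON_FORCE_OBJECT | JSON_HEX_TAG | JSON_UNESCAPED_UNICODE | JSON_THROW_ON_ERROR
    );
    return 'var GLOSSARY = ' . $json . ";\n";
}
```

You could then regenerate the file with something like `file_put_contents('glossary.js', build_glossary_js($definitions))` whenever the glossary table changes (the path is only an example).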
Then write a client-side JS script that finds the DOM node holding the "content with slang", parses out the occurrences of the words from your dictionary, and wraps them with some HTML to show tooltips. Everything in JS, everything client-side.
If this approach is not suitable and you're going to do the job in your PHP backend, at least consider caching the processed content.
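A minimal file-cache sketch, assuming `$process` is whatever function performs your glossary replacement and `$glossaryVersion` is some value that changes whenever the glossary does (for example the latest update time of the table, which is an assumption about your schema). The name `cached_process` is hypothetical:

```php
<?php
// Cache processed article HTML on disk so the glossary pass runs once per article/glossary version.
function cached_process(string $content, string $glossaryVersion, callable $process, ?string $dir = null): string
{
    $dir = $dir ?? sys_get_temp_dir();
    // The cache key changes when either the article text or the glossary changes.
    $file = $dir . '/glossary-' . md5($glossaryVersion . "\0" . $content) . '.html';
    if (is_file($file)) {
        return file_get_contents($file);
    }
    $html = $process($content);
    file_put_contents($file, $html, LOCK_EX);
    return $html;
}
```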
I also see that you insert a description for every "jargon word" found in the content. What if a word is very frequent across an article? You get overhead. Keep the descriptions separate and put them into JS as an object. The server-side task is then only to find words which have a description and mark them with some short tag, for instance <em>. Your JS script should find those <em> elements, pick the description from the object (an associative array mapping words to descriptions), and construct the tooltip dynamically on the "mouseover" event.
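On the PHP side, the marking step could look like the sketch below. It also addresses both problems from the question: `\b` word boundaries stop APS from matching inside Perhaps, and walking only DOM text nodes means alt/title attributes are never touched, while an `ancestor::a` check skips linked text. Table cells and paragraphs are ordinary text nodes, so they are still covered. This is a sketch, assuming terms made of word characters (a term such as C++ would need different boundaries); the `glossary` class and the function name are hypothetical:

```php
<?php
// Wrap whole-word glossary terms in <em class="glossary">, touching text nodes only.
function mark_glossary_terms(string $html, array $terms): string
{
    if ($terms === []) {
        return $html;
    }
    // Longest terms first (like the question's ORDER BY LENGTH), quoted so they match literally.
    usort($terms, fn($a, $b) => strlen($b) - strlen($a));
    $quoted = array_map(fn($t) => preg_quote($t, '/'), $terms);
    // \b...\b stops "APS" from matching inside "Perhaps".
    $pattern = '/\b(' . implode('|', $quoted) . ')\b/i';

    $doc = new DOMDocument();
    $previous = libxml_use_internal_errors(true);
    // The XML declaration makes libxml read the string as UTF-8; the div is our wrapper.
    $doc->loadHTML('<?xml encoding="UTF-8" ?><div>' . $html . '</div>');
    libxml_clear_errors();
    libxml_use_internal_errors($previous);
    $root = $doc->getElementsByTagName('div')->item(0);

    // Attributes (alt, title) are not text nodes; link text and scripts are skipped explicitly.
    $xpath = new DOMXPath($doc);
    $query = './/text()[not(ancestor::a) and not(ancestor::script) and not(ancestor::style)]';
    foreach ($xpath->query($query, $root) as $node) {
        $parts = preg_split($pattern, $node->nodeValue, -1, PREG_SPLIT_DELIM_CAPTURE);
        if (count($parts) === 1) {
            continue; // no glossary term in this text node
        }
        $fragment = $doc->createDocumentFragment();
        foreach ($parts as $i => $part) {
            if ($i % 2 === 1) { // odd indexes are the captured terms
                $em = $doc->createElement('em');
                $em->setAttribute('class', 'glossary');
                $em->appendChild($doc->createTextNode($part));
                $fragment->appendChild($em);
            } elseif ($part !== '') {
                $fragment->appendChild($doc->createTextNode($part));
            }
        }
        $node->parentNode->replaceChild($fragment, $node);
    }

    // Serialize only the wrapper's children, i.e. the original fragment.
    $out = '';
    foreach ($root->childNodes as $child) {
        $out .= $doc->saveHTML($child);
    }
    return $out;
}
```

For example, `mark_glossary_terms($article['content'], $words)`, with `$words` taken from the question's glossary query, returns the article with each term wrapped once per occurrence and no definition text embedded.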
Interestingly enough, I was not searching for a question like yours at all, but while reading I realized that your question is one I had been through quite some time ago.
It was basically a system that parses a dictionary and spits out augmented HTML.
Instead, my suggestions would be:
Use a database if you want, but a cached, generated CSV file could be faster to use as the dictionary
Use a hook in your rendering system to parse the actual content against this dictionary
Caching the page could be useful too
I described a solution on my blog (in French, sorry about that), but it outlines something you can actually use for this.
I wrote it as a MODx plugin called "ContentAbbrGenerator", but the core of the plugin can be applied outside that framework.
Anyway, you can download the zip file, take the regexes, and adapt them.
My objective
Use one file that is read to determine the kind of HTML decoration.
Generate HTML from author-entered content whose authors don't know about accessibility and tags (dfn and/or abbr).
Make it reusable.
Make it i18n-friendly. In French, for instance, we use the English term, but assistive technology reads the English word with French pronunciation and it sounds weird. So we had to use the lang="" attribute to make it clear.
What I did
Basically, the text you give becomes more semantic.
Imagine the following dictionary:
en;abbr;HTML;Hyper Text Markup Language;es
en;abbr;abbr;Abbreviation
Then, the content entered in the CMS could be text like this:
<p>Have you ever wanted to do not hassle with HTML abbr tags but was too lazy to hand-code them all!? That is my solution :)</p>
That gets translated into:
<p>Have you ever wanted to do not hassle with <abbr title="Hyper Text Markup Language" lang="es">HTML</abbr> <abbr title="Abbreviation">abbr</abbr> tags but was too lazy to hand-code them all!? That is my solution :)</p>
Everything depends on one CSV file that you can generate from your database.
The conventions I used
The file /abbreviations.txt, publicly available on the server (it could be generated), is the dictionary, with one definition per acronym
An implementation only has to read the file and apply it BEFORE sending the page to the client
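A minimal PHP sketch of that read-and-apply step, using the two dictionary lines above. The field meanings are my reading of the example (language, tag name, term, title, optional lang attribute), fields are assumed to contain no semicolons, and the function names are hypothetical; this is not the plugin's actual code. It splits the HTML into tags and text so that tag names such as abbr are never rewritten, and it uses a single preg_replace_callback() pass so freshly inserted tags are not matched again:

```php
<?php
// Parse "lang;tag;term;title[;lang-attr]" lines into term => settings.
function load_abbr_dictionary(string $text): array
{
    $dict = [];
    foreach (preg_split('/\R/', trim($text)) as $line) {
        $fields = array_map('trim', explode(';', $line));
        if (count($fields) < 4 || $fields[2] === '') {
            continue; // skip blank or malformed lines
        }
        $dict[$fields[2]] = [
            'tag' => $fields[1],          // e.g. abbr or dfn
            'title' => $fields[3],
            'lang' => $fields[4] ?? '',   // optional lang="" attribute
        ];
    }
    return $dict;
}

// Decorate every whole-word term found in the text parts of $html.
function apply_abbr_dictionary(string $html, array $dict): string
{
    if ($dict === []) {
        return $html;
    }
    $terms = array_keys($dict);
    usort($terms, fn($a, $b) => strlen($b) - strlen($a)); // longest first
    $pattern = '/\b(' . implode('|', array_map(fn($t) => preg_quote($t, '/'), $terms)) . ')\b/';

    // Even indexes are text, odd indexes are tags (kept untouched).
    $parts = preg_split('/(<[^>]*>)/', $html, -1, PREG_SPLIT_DELIM_CAPTURE);
    foreach ($parts as $i => $part) {
        if ($i % 2 === 1) {
            continue;
        }
        $parts[$i] = preg_replace_callback($pattern, function (array $m) use ($dict): string {
            $entry = $dict[$m[1]];
            $attrs = ' title="' . htmlspecialchars($entry['title'], ENT_QUOTES) . '"';
            if ($entry['lang'] !== '') {
                $attrs .= ' lang="' . htmlspecialchars($entry['lang'], ENT_QUOTES) . '"';
            }
            return '<' . $entry['tag'] . $attrs . '>' . $m[1] . '</' . $entry['tag'] . '>';
        }, $part);
    }
    return implode('', $parts);
}
```

Fed the dictionary and the sample paragraph above, this produces the translated paragraph shown earlier.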
The tooltips
I strongly recommend using the tooltip plugin that Twitter Bootstrap implements. It basically reads the title attribute of any marked-up tags you choose.
Have a look here: Bootstrap from Twitter with Tooltip helper.
PS: I'm completely sold on the patterns Twitter put forward with this Bootstrap project; it's worth a look!
Related
Currently I'm using cURL to scrape a website. I want to reliably get the title, description and keywords.
//Parse for the title, description and keywords
if (strlen($link_html) > 0)
{
$tags = get_meta_tags($link); // name
$link_keywords = $tags['keywords']; // php documentation
$link_description = $tags['description'];
}
The only problem is that people now use all kinds of meta tags, such as Open Graph's <meta property="og:title" content="The Rock" />. They also vary the case of the tags a lot: <title>, <Title>, <TITLE>, <tiTle>. It's very difficult to get these reliably.
I really need some code that will extract these values consistently: if a title, keywords and description are provided, it should find them. Right now it is very hit and miss.
Perhaps a way to extract all titles into a titles array? Then the scraping web developer can choose the best one to record in their database. The same applying to keywords and description.
This is not a duplicate. I have searched through Stack Overflow and found no solution that places all "title", "keywords" and "description" type tags into arrays.
Generally, get_meta_tags() should get you most of what you need; you just need to set up cascading checks that sample the required field from each metadata system until one is found. For example, something like this:
function get_title($url) {
$tags = get_meta_tags($url);
$props = get_meta_props($url); // not a built-in PHP function, see below
return $tags["title"] ?? $props["og:title"] ?? null; // ...plus further fallbacks
}
The above implementation is obviously not efficient (if we implemented all the getters like this, you'd reload the URL for each getter), and I didn't implement get_meta_props(), which is problematic to implement correctly using preg_* and tedious to implement using DOMDocument.
Still, a correct implementation is conceptually simple but a lot of work, which is a classic scenario for an external library! Fortunately, there is one for just that, called simply "Embed". You can find it on GitHub, or with Composer just run:
composer require embed/embed
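If you would rather not add a dependency, the question's "arrays of candidates" idea can be sketched with DOMDocument. This is a minimal sketch that works on an HTML string you have already fetched (for example with cURL); the function name and the set of recognised keys are assumptions, not an exhaustive list:

```php
<?php
// Collect every title/description/keywords candidate from an already-fetched HTML string.
function collect_meta_candidates(string $html): array
{
    $found = ['title' => [], 'description' => [], 'keywords' => []];
    // Which name="..." / property="..." values feed which bucket (extend as needed).
    $map = [
        'title' => 'title', 'og:title' => 'title', 'twitter:title' => 'title',
        'description' => 'description', 'og:description' => 'description',
        'twitter:description' => 'description', 'keywords' => 'keywords',
    ];

    $doc = new DOMDocument();
    $previous = libxml_use_internal_errors(true); // real pages are rarely valid HTML
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    // The HTML parser lowercases element names, so <TITLE> and <tiTle> both match here.
    foreach ($doc->getElementsByTagName('title') as $node) {
        $found['title'][] = trim($node->textContent);
    }
    foreach ($doc->getElementsByTagName('meta') as $meta) {
        // Classic tags use name="...", Open Graph uses property="og:...".
        $key = strtolower($meta->getAttribute('name') ?: $meta->getAttribute('property'));
        $content = trim($meta->getAttribute('content'));
        if (isset($map[$key]) && $content !== '') {
            $found[$map[$key]][] = $content;
        }
    }
    return $found;
}
```

Each bucket keeps document order, so the caller can pick the first entry or choose the best candidate itself.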
My school has a very annoying way to view my timetable.
You have to click through 5 links to get to my timetable.
This is the link for my class (it updates weekly without the link changing):
https://webuntis.a12.nl/WebUntis/?school=roc%20a12#Timetable?type=1&departmentId=0&id=2147
I want to display the content from that page on my website, but with my own stylesheet.
I don't mean this:
<?php
$homepage = file_get_contents('http://www.example.com/');
echo $homepage;
?>
or an iframe.
I think this is better done using jQuery and AJAX. You can get jQuery to load the target page, use selectors to pick out what you need, and then attach it to your document tree. You should then be able to style it any way you like.
I would recommend using the cURL library: http://www.php.net/manual/en/curl.examples.php
But you have to extract the part of the page you want to display, because you will get the whole HTML document.
You'd probably read the whole page into a string variable (using file_get_contents, as you mentioned, for example) and then parse the content. Here you have some possibilities:
Regular expressions
Walking the DOM tree (e.g. using PHP's DOMDocument classes)
After that, you'd most likely replace all the style="..." or class="..." information with your own.
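A minimal sketch of the DOM route, assuming you have already fetched the page into a string and know an XPath expression for the part you want; the function name and the example query in the usage note are hypothetical. It returns that fragment with every style and class attribute removed so your own stylesheet takes over:

```php
<?php
// Return the first node matching $xpathQuery as HTML, minus style and class attributes.
function extract_and_restyle(string $html, string $xpathQuery): string
{
    $doc = new DOMDocument();
    $previous = libxml_use_internal_errors(true); // scraped pages are rarely valid HTML
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    $matches = $xpath->query($xpathQuery);
    if ($matches === false || $matches->length === 0) {
        return ''; // nothing matched (or the query was invalid)
    }
    $node = $matches->item(0);
    // Strip presentation from the node itself and all of its descendants.
    foreach ($xpath->query('descendant-or-self::*', $node) as $element) {
        $element->removeAttribute('style');
        $element->removeAttribute('class');
    }
    return $doc->saveHTML($node);
}
```

For example, `extract_and_restyle($page, '//div[@id="timetable"]')` would return just that div, ready to be styled by your own CSS.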
I've built a blog similar to WordPress. On my home page, I take the entire blog post, run it through a function, and only display an excerpt of it. I want to go through and shrink my videos to a specific width/height. The code in the post could look like:
[vimeo width="700" height="400"] // (the 700 & 400 could be any values).
I basically want to find that, then change it to:
[vimeo width="300" height="200"] // this will be preset/hard coded.
You can use regular expressions through preg_replace() to do the filtering. Just load your whole blog post into $BlogPost. The regex pattern may need to be altered to allow for variations in syntax and spacing (e.g. width = '700').
<?php
$FilteredBlogPost = preg_replace('/(.*vimeo width=")\d+(" height=")\d+(".*)/im', '${1}300${2}200${3}', $BlogPost);
?>
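To cope with such variations, one option is to match each whole [vimeo ...] shortcode first and then rewrite only its width and height values, which also tolerates extra attributes. A sketch, assuming numeric values and quotes that are double, single or absent; the function name is hypothetical:

```php
<?php
// Rewrite the width/height of every [vimeo ...] shortcode in a post.
function shrink_vimeo_shortcodes(string $post, int $width = 300, int $height = 200): string
{
    return preg_replace_callback(
        '/\[vimeo\b[^\]]*\]/i', // one whole shortcode, e.g. [vimeo width="700" height="400"]
        function (array $m) use ($width, $height): string {
            $tag = $m[0];
            // Accept width="700", width='700' or width = 700 (the \1 reuses the same quote).
            $tag = preg_replace('/\bwidth\s*=\s*(["\']?)\d+\1/i', 'width="' . $width . '"', $tag);
            $tag = preg_replace('/\bheight\s*=\s*(["\']?)\d+\1/i', 'height="' . $height . '"', $tag);
            return $tag;
        },
        $post
    );
}
```

Calling `shrink_vimeo_shortcodes($BlogPost)` returns the post with every Vimeo shortcode set to 300 by 200, leaving other markup alone.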
Unless you want to scrape the video, put it on your server, resize it with a dedicated video library and then stream it from your server, the solution is client-side (i.e. HTML, not PHP).
Here is a nice tutorial on how to achieve that with YouTube videos; I think exactly the same applies in your case too.
I would like to integrate my Tumblr feed into my website. It seems that Tumblr has an API for this, but I'm not quite sure how to use it. From what I understand, I request the page, and Tumblr returns an XML file with the contents of my blog. But how do I then turn this XML into meaningful HTML? Must I parse it with PHP, turning the relevant tags into headers and so on? I tell myself it cannot be that painful. Anyone have any insights?
There's a JavaScript include that does this now, available from Tumblr (you have to log in to see it): http://www.tumblr.com/developers
It winds up being something like this:
<script type="text/javascript" src="http://{username}.tumblr.com/js"></script>
You can use PHPTumblr, an API wrapper written in PHP which makes retrieving posts a breeze.
If you go to http://yourblog.tumblr.com/api/read, where "yourblog" should be replaced with the name of your blog (be careful: if you host your Tumblr blog on a custom domain, like I do, use that), you'll see the XML version of your blog. It comes up really messy for me in Firefox for some reason, so I use Chrome; try a couple of different browsers, since seeing the XML file well-formed and indented helps.
Once you're looking at the XML version of your blog, notice that each post has a bunch of data in attribute="value" form. Here's an example from my blog:
<post id="11576453174" url="http://wamoyo.com/post/11576453174" url-with-slug="http://wamoyo.com/post/11576453174/100-year-old-marathoner-finishes-race" type="link" date-gmt="2011-10-17 18:01:27 GMT" date="Mon, 17 Oct 2011 14:01:27" unix-timestamp="1318874487" format="html" reblog-key="E2Eype7F" slug="100-year-old-marathoner-finishes-race" bookmarklet="true">
So, there are lots of ways to do this. I'll show you the one I used and drop my code at the bottom of this post so you can tailor it to your needs. Notice the type="link" part? Or the id="11576453174"? These are the values you're going to use to pull data into your PHP script.
Here's the example:
<!-- The Latest Text Post -->
<?php
echo "";
$request_url = "http://wamoyo.com/api/read?type=regular"; //get xml file
$xml = simplexml_load_file($request_url); //load it
$title = $xml->posts->post->{'regular-title'}; //load post title into $title
$post = $xml->posts->post->{'regular-body'}; //load post body into $post
$link = $xml->posts->post['url']; //load url of blog post into $link
$small_post = substr(strip_tags($post),0,350); //strip HTML tags first so none are cut in half, then shorten post body to 350 characters
echo // spit that baby out with some stylish html
'<div class="panel" style="width:220px;margin:0 auto;text-align:left;">
<h1 class="med georgia bold italic black">'.$title.'</h1>'
. '<br />'
. '<span>'.$small_post.'</span>' . '...'
. '<br /><br /><div style="text-align:right;"><a class="bold italic blu georgia" href="'.$link.'">Read More...</a></div>
</div>
<img style="position:relative;top:-6px;" src="pic/shadow.png" alt="" />
';
?>
So, this is actually fairly simple. The PHP script here places data (like the post title and post text) from the XML file into PHP variables, and then echoes out those variables along with some HTML to create a div which features a snippet from a blog post. This one features the most recent text post. Feel free to use it; just change that first URL to your own blog, and then choose whatever values you want from your XML file.
For example, let's say you want not the most recent but the second most recent "photo" post. You have to change the request_url to this:
$request_url = "http://wamoyo.com/api/read?type=photo&start=1";
Or let's say you want the most recent post with a specific tag:
$request_url = "http://wamoyo.com/api/read?tagged=events";
Or let's say you want a specific post; just use the id:
$request_url = "http://wamoyo.com/api/read?id=11576453174";
So all you have to do is tack on a ? followed by whatever parameter you need, and separate multiple parameters with &.
If you want to do something fancier, you'll need the Tumblr API docs here: http://www.tumblr.com/docs/en/api/v2
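Rather than concatenating the ? and & by hand, PHP's http_build_query() can assemble the query string and URL-encode the values. A small sketch; the helper name is hypothetical and the parameters are the ones used above:

```php
<?php
// Build a Tumblr v1 "api/read" URL from a blog address and query parameters.
function build_tumblr_read_url(string $blog, array $params = []): string
{
    $url = rtrim($blog, '/') . '/api/read';
    // http_build_query() joins the pairs with & (passed explicitly) and URL-encodes the values.
    return $params === [] ? $url : $url . '?' . http_build_query($params, '', '&');
}
```

For example, `build_tumblr_read_url('http://wamoyo.com', ['type' => 'photo', 'start' => 1])` gives the same URL as the photo example above.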
Hope this was helpful!
There are two main ways to do this. First, you can parse the XML, pulling out the content from the tags you need (there are a few ways to do this, depending on whether you use a SAX or DOM parser). This is the quick and dirty solution.
You can also use an XSLT transformation to convert the XML source directly to the HTML you want. This is more involved, since you have to learn the syntax for XSLT templates, which is a bit verbose.
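For the parsing route, PHP's SimpleXML is enough. A sketch that loops over every post rather than just the first one, assuming the same posts/post structure and regular-title element used in the code above; the function name and the output markup are hypothetical:

```php
<?php
// Render every text post from Tumblr's v1 read XML as a series of <article> blocks.
function tumblr_posts_to_html(string $xml): string
{
    $doc = simplexml_load_string($xml);
    if ($doc === false) {
        return ''; // not well-formed XML
    }
    $html = '';
    foreach ($doc->posts->post as $post) {
        // Cast to string before escaping: SimpleXML returns element objects.
        $title = htmlspecialchars((string) $post->{'regular-title'}, ENT_QUOTES);
        $url = htmlspecialchars((string) $post['url'], ENT_QUOTES);
        $html .= '<article><h2><a href="' . $url . '">' . $title . '</a></h2></article>';
    }
    return $html;
}
```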
I have a client who has a separate vCard on each of many pages. These are being pasted into a WordPress text field. (Not the most efficient way to maintain a list of people, but I won't editorialize after the fact.) My mission is to write something to parse through all the addresses in the vCards and dump the information into a central database. This would allow all the disparate pages to become addresses replete with lat and lng coordinates from Google and display a lovely front page with pins galore.
This page would show all the vCards from the rest of the pages of the site.
Here is a sanitized example of a vCard on the site; in reality it would be surrounded by a lot of dubious HTML code:
<div class="vcard">
<span class="fn org">XYZ Org Name</span><br />
<span class="url">http://www.someurl.com/</span>
<div class="adr"><span class="street-address">1234 Main Ave</span><br />
<span class="locality">Chicago</span><br />
<span class="region">IL</span><br /><span class="postal-code">60647</span></div>
</div>
Each page has one of these, and spidering through the entire site and collecting them into an array is a bit out of my league. I can handle dumping them into a database using PHP and MySQL.
Any and all advice would be welcome!
EDIT: Not sure how important this is, but I am pulling the data from a different server.
I believe you are looking for an HTML parser. Here is an HTML parsing module for Python.
You need to parse the relevant data out of all the HTML files and then do whatever with it.
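I haven't tried any PHP HTML parsers, so I can't recommend one, but since you are working on a web server, I'm hoping it has Perl. Take a look at Perl HTML parsers.
# this snippet (methods of an HTML::Parser subclass) collects the organization names
my @org_names;
my $in_org = 0;
sub start {
my ($self, $tag, $attr, $attrseq, $origtext) = @_;
# see if we find <span class="fn org">
$in_org = 1 if $tag =~ /^span$/i && ($attr->{'class'} // '') =~ /^fn org$/i;
}
sub text {
my ($self, $text) = @_;
# $origtext in start() is only the tag itself; the name arrives here as text
push(@org_names, $text) if $in_org;
}
sub end {
$in_org = 0;
}
Now you have an @org_names array that contains all the organization names.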
Try the DOMDocument class' loadHTML method. Then you can use DOMDocument methods to select the nodes, attributes and values you want. Or if you're familiar with XPath, you can also instantiate a DOMXPath object to query against the loaded DOMDocument to select the desired data.
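A sketch of that DOMDocument/DOMXPath approach applied to the sample vCard from the question. Fetching the pages from the other server (for example with file_get_contents() or cURL) is left out; the function name and the output keys are hypothetical. The contains(concat(...)) expression is the usual XPath way to match one class among several, so `fn org` still matches `org`:

```php
<?php
// Pull hCard-style vCards (as in the question's sample) out of an HTML page.
function parse_vcards(string $html): array
{
    $doc = new DOMDocument();
    $previous = libxml_use_internal_errors(true); // tolerate the "dubious HTML"
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);
    $xpath = new DOMXPath($doc);

    // XPath predicate for "the class attribute contains this token".
    $hasClass = function (string $token): string {
        return "contains(concat(' ', normalize-space(@class), ' '), ' $token ')";
    };
    // Output key => hCard class name used in the markup.
    $fields = [
        'org' => 'org', 'url' => 'url', 'street' => 'street-address',
        'locality' => 'locality', 'region' => 'region', 'postal_code' => 'postal-code',
    ];

    $cards = [];
    foreach ($xpath->query('//*[' . $hasClass('vcard') . ']') as $card) {
        $row = [];
        foreach ($fields as $key => $token) {
            // Search only inside this vCard; null when the field is missing.
            $nodes = $xpath->query('.//*[' . $hasClass($token) . ']', $card);
            $row[$key] = $nodes->length > 0 ? trim($nodes->item(0)->textContent) : null;
        }
        $cards[] = $row;
    }
    return $cards;
}
```

Each returned row can then be inserted into MySQL (and geocoded) the way you already planned.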