Making a google search engine with standards
Introduction
Search engines are good. Google has been the undisputed king of search engines for quite some time now, and even if MSN is trying its best to win over surfers with a new engine, a massive advertising budget and a new "css" based layout, Google is still on top. Yahoo and Ask Jeeves have always been alternatives, and still are, while the 90's big shots AltaVista and WebCrawler have taken a big step down.
This article will demonstrate how you can build your own search engine using Google's API service and some PHP magic. Google has opened up its search results through this API, and they are free for anyone to use. The only drawbacks are a limit of 1000 queries per day and a maximum of 10 results per request.
And while we're at it, we might as well do it right this time, using semantic XHTML markup and accessible forms. This will also make it possible for you to add your own search engine to your site, with Google's powerful back-end serving up the results.
The retro-cool google
Google is loved by many and hated by few. It's light, fast, (almost) ad-free and secure. But a quick look under the hood makes any web developer wonder why the biggest of them all is still using markup that belongs to the 90's: inaccessible forms, tables for layout, markup errors, unescaped ampersands in the URL, old-school <b> tags and so on. Now, in order to change all this you could write them an email explaining your concern about their coding. Or you could create your own search engine, built on the same reliable back-end but re-coded with modern front-end technologies. We will do the latter in this article.
Google API
As mentioned before, Google released their search results to the public some time ago. All you have to do is sign up for a Google account at https://www.google.com/accounts/NewAccount if you don't already have one. Once there, you can request an API key. If you have problems obtaining one, see their help pages for more info. This key is exactly what it sounds like - a key to access their massive database of basically all web pages available on the world wide wisdom. We will use this key in the example, so make sure you have one before you continue.
PHP and NuSOAP
We will use PHP to create the program. PHP is an open-source server-side scripting language whose syntax borrows a lot from Perl and C. Ask your host if they support it (all good hosts do) or try to install it on your local machine. Not only will we use PHP, but also a library called NuSOAP. It contains many useful classes, especially for tasks like accessing the Google results and parsing them. You can download the package at http://sourceforge.net/projects/nusoap/. Once downloaded, unzip the .zip archive and locate the folder named "lib". In there you will find several files with the extension ".php". These are the files we will use to access the NuSOAP library.
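To give a rough idea of what a NuSOAP call to Google could look like, here is a minimal sketch. The ten parameter names are the ones Google's doGoogleSearch operation expects, as far as I know. The helper names buildSearchParams() and runSearch() are made up for this illustration and are not taken from the sample file, and the WSDL address in the comment is an assumption you should check against Google's documentation.

```php
<?php
// Hypothetical helper: builds the argument list for the doGoogleSearch
// SOAP operation. Only the array keys are Google's; the defaults are
// illustrative assumptions.
function buildSearchParams($key, $query, $start = 0, $maxResults = 10)
{
    return array(
        'key'        => $key,
        'q'          => $query,
        'start'      => max(0, (int) $start),
        'maxResults' => min(10, max(1, (int) $maxResults)), // the API caps this at 10
        'filter'     => true,   // hide very similar results
        'restrict'   => '',
        'safeSearch' => false,
        'lr'         => '',
        'ie'         => '',
        'oe'         => '',
    );
}

// Hypothetical helper: $client is expected to be a NuSOAP client built
// from the Google WSDL, for example (assumed address):
//   $client = new soapclient('http://api.google.com/GoogleSearch.wsdl', 'wsdl');
// Newer NuSOAP releases name the class nusoap_client instead.
function runSearch($client, array $params)
{
    return $client->call('doGoogleSearch', $params);
}
```

Passing the client in as an argument keeps the sketch independent of which NuSOAP class name your copy of the library uses.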
Getting Started
To get started with the tutorial, make sure you go through these steps:
- Get an API key at https://www.google.com/accounts/NewAccount
- Create a new directory called "search" in the root directory of your site
- Download the NuSOAP library at http://sourceforge.net/projects/nusoap/
- Unzip the NuSOAP .zip archive, locate the lib folder and copy all files in the folder into the search folder you just created
- Read the next section
Google did wrong - let's make it right
Semantic markup and web standards are the way of the future. It's like any tool or technological advance - first comes availability and then perfection. Google's markup is quite far from perfection, but we know better. After a quick analysis of the search results page I decided that a definition list would probably be the best way to code the results, since each result has the typical fields: a title and one or more descriptions. So this is basically what we want:
<form>
<p>[search_form]</p>
</form>
<p class="results">[results_info]</p>
<dl>
<dt><a href="[url]">[title]</a></dt>
<dd>[description]</dd>
<dd class="url">[url] - [size]</dd>
[ ... repeated ... ]
</dl>
<p class="nav">[page_nav]</p>
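As a rough sketch of how PHP could fill in the definition list part of that template, consider the function below. The function name renderResults() and the array keys (url, title, description, size) are assumptions made for this example. The sample file may organise its data differently.

```php
<?php
// Hypothetical helper: turns an array of results into the <dl> structure
// shown above. All values are treated as plain text and escaped, so the
// output stays well-formed even if a URL contains a raw ampersand.
function renderResults(array $items)
{
    if (count($items) === 0) {
        return ''; // an empty <dl> would not validate
    }

    $html = "<dl>\n";
    foreach ($items as $item) {
        $url   = htmlspecialchars($item['url'], ENT_QUOTES, 'UTF-8');
        $title = htmlspecialchars($item['title'], ENT_QUOTES, 'UTF-8');
        $text  = htmlspecialchars($item['description'], ENT_QUOTES, 'UTF-8');
        $size  = htmlspecialchars($item['size'], ENT_QUOTES, 'UTF-8');

        $html .= '<dt><a href="' . $url . '">' . $title . "</a></dt>\n";
        $html .= '<dd>' . $text . "</dd>\n";
        $html .= '<dd class="url">' . $url . ' - ' . $size . "</dd>\n";
    }
    return $html . "</dl>\n";
}
```

Note that this version escapes the description too, so it only suits snippets that contain no markup of their own.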
Getting the PHP stuff working
Now is the time to download the index.php sample file. This is the file that contains all the vital PHP instructions to create the search engine. Once downloaded, upload it to the search folder in your root directory or open it in your favourite text editor. Let's take a look at the general sections of the PHP file:
- Preferences
- Main program
- Sub programs
- XHTML rendering
1. Preferences
These are the preferences. Modify them as you wish, but never change the Google API key once you have obtained it.
- $pref->text - defines what text should be displayed if Google doesn't find any text.
- $pref->title - defines what title should be displayed if Google doesn't find any title.
- $pref->key - this is where you should enter your Google API key - read the previous section for more info on how to obtain one.
- $pref->results - sets the number of results to be displayed on one page. The Google API has a limit of max 10 results.
- $pref->start - sets the starting item to be displayed. If set to "Auto", a page navigation will appear at the bottom.
- $pref->safe - toggles the SafeSearch filter on or off. "false" means off and "true" means on.
- $pref->filter - toggles the Google filter for similar pages on the same domain on or off. "false" means off and "true" means on.
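A filled-in set of preferences might look something like the sketch below. The property names are the ones listed above. The values, the use of stdClass and the resolveStart() helper are only assumptions for illustration, so compare them with what the sample file actually does.

```php
<?php
// Illustrative values only; replace the key placeholder with your own.
$pref = new stdClass();
$pref->text    = 'No description available';
$pref->title   = 'Untitled';
$pref->key     = 'YOUR-GOOGLE-API-KEY'; // placeholder, not a real key
$pref->results = 10;      // the API returns at most 10 per request
$pref->start   = 'auto';  // or a fixed number such as 0
$pref->safe    = false;
$pref->filter  = true;

// Hypothetical helper: works out the index of the first result to request.
// In "auto" mode the value comes from a "start" request parameter,
// which is assumed to be set by the page navigation links.
function resolveStart($pref, array $get)
{
    if (is_string($pref->start) && strtolower($pref->start) === 'auto') {
        return isset($get['start']) ? max(0, (int) $get['start']) : 0;
    }
    return max(0, (int) $pref->start);
}
```

Checking with isset() before reading the request parameter avoids an "Undefined index" notice on the first page load.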
2. Main program
The main program contains four statements:
require_once('nusoap.php');
$query = $_GET['q'];
$results = getResults($query);
$title = getTitle($query);
The first line includes the NuSOAP lib into the document. The second line collects the query string. The third collects the search results in a variable called $results and the fourth line collects the document title into $title.
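Reading $_GET['q'] directly triggers an "Undefined index" notice when the page is opened without a query. One possible way around that, assuming you are free to edit the main program, is a small helper like the one below. The name readQuery() is made up for this example.

```php
<?php
// Hypothetical helper: returns the trimmed search query, or an empty
// string when no usable "q" parameter was sent.
function readQuery(array $get)
{
    if (!isset($get['q']) || !is_string($get['q'])) {
        return '';
    }
    return trim($get['q']);
}

// Drop-in replacement for the second statement of the main program.
$query = readQuery($_GET);
```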
3. Sub programs
I am not going into detail on every aspect of these functions, but here is a quick review. getTitle(); is very simple and builds the title depending on whether a search has been made and what the query was. getResults(); takes the $query string, collects the Google results using the NuSOAP functions, then returns the data. The markup is cleaned up with some regular expressions using the preg_replace(); function in order to remove unwanted tags, correct ampersands and convert <b> tags to XHTML. The function also contains an advanced page and results calculator that is used if you have set $pref->start to "auto".
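To make the clean-up step a little more concrete, here is a minimal sketch of that kind of preg_replace() work. The expressions in the sample file may well differ. The function name cleanSnippet() is an assumption, and so is the idea that <br> is one of the unwanted tags.

```php
<?php
// Hypothetical helper: tidies one snippet of result markup.
function cleanSnippet($html)
{
    // <b>...</b> becomes <strong>...</strong>
    $html = preg_replace('#<(/?)b>#i', '<$1strong>', $html);

    // Line-break tags are replaced by a single space
    $html = preg_replace('#<br\s*/?>#i', ' ', $html);

    // Escape ampersands that do not already start an entity
    $html = preg_replace('/&(?!(?:[a-z][a-z0-9]*|#[0-9]+|#x[0-9a-f]+);)/i', '&amp;', $html);

    return trim($html);
}
```

The negative lookahead in the last expression leaves existing entities such as &amp; or &#169; alone, so running the function twice does not double-escape anything.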
4. XHTML rendering
The last section will look familiar to any web programmer. A few things should be noted here:
- The default style is my childish attempt to imitate Google's well-known layout. You can change this to whatever you like.
- Please note the <title><?php echo $title; ?></title> line. That means that the PHP will take care of the title.
- The form also contains some PHP commands. Leave them where they are or experiment as you wish.
- The most important line is <?php echo $results; ?>. This is where the data comes in from the PHP program.
Wrapping it all up
Once you have configured the preferences, just save your file, point your browser to http://www.yoursite.com/search/index.php and start googling. Or you can have a look at our example demo. You can alter and refine the PHP code as far as your skills allow, or you can stick to modifying the XHTML rendering in the fourth section of the document. Either way, you will have a fully standards-compliant, valid and highly customizable Google search engine at your disposal, ready to use on your site or wherever you like. Remember that the Google API only allows 1000 queries per day, so if the search results are empty, you might need to cool off and take a walk.
This article was written by David Hellsing
David Hellsing is a designer and web developer living in Gothenburg, Sweden. He is the founder and gentle dictator of Stylegala and the Swedish design firm monc.
There are 117 guest comments so far.
Nice article. I knew there was this Google API, but I haven't studied whether it would be of any use to me and my site. I gather you can configure this PHP code to have the search only cover pages on my site, right?
Anyways. I will miss WaV, but this articles section might not be as bad either ;)
Interesting read, and a very viable solution in style.
Google is quite different from other big companies in their approach to innovation and change, but it might be some time before they can move on with web standards. Corporate inertia sucks. On the bright side, we now have no excuse for having non standards compliant local google searches.
Yes, Tommy, you can restrict the searches to one domain only by adding a " site:domain.com" after the query word.
Why not just use FindForward.com?
Because — without wishing to state the obvious (or start a fire) — you'd not learn much, would you?
Besides, the use of .
Sure, you can "use" findforward just like any other search engine or portal. But this article is not about how to "use" a search engine, but how to build one yourself and possibly add it to your site.
Is there a way to restrict the results to searching a few selected domains? For Example: I want my search engine to search Apple.com, Microsoft.com, and Slashdot.org.
Unfortunately, the Google API only allows you to search a single domain at a time if you want to conduct a site restricted search.
The way to do this is to add "site:yourdomain.com" to the search query by changing this line of code:
$query = $_GET['q'];
to:
$query = $_GET['q']." site:yourdomain.com";
You can also add the "site:" term in the getResults(); function if you don't want the modified query to appear in your page's title or in your search field. Change this line of code:
'q' => $query,
to:
'q' => $query." site:yourdomain.com",
This should enable you to create a Google search engine that returns results for your specified domain.
dont know
Great article, but I have one wonder: why do the search results differ from the original Google?
Example:
Can anyone explain that?
Good question Niek. I honestly don't know the exact reason, but google has been dancing quite frequently during the last months, and we are probably looking at different datacenters or the API results might be behind in some way.
Funny that stylegala's own search page for charming design is number 4 in the results for the term itself, thanks to your link in the comment :)
Good article about nice feature Google offers. Though the service seems usually quite a lot slower than their "public" search. Just one note about Google and web standards. Did you know that they have a version at www.google.com/xhtml. This is actually Google for mobile, but I find it really neat even with desktop computer, loads fast, validates and doesn't display ads :)
testing
I would like to use this code on a site -- but I don't have access to the root directory. Is there any way to alter this to work somewhere like mydomain.com/dir/search ?
Thanks in advance!
ahem. ive been foolish. never mind!
"Keith Smolar wrote:
Is there a way to restrict the results to searching a few selected domains? For Example: I want my search engine to search Apple.com, Microsoft.com, and Slashdot.org."
Yes, this can be done easily using the "site:domain.com". All you have to do is make a variable that will be used in a selection of any specified website. Now you can do this with selection boxes or a drop-down menu, or anyother way you can figure out. For assistance, give me a haallllerr, send me a bump in my inbox, my email is joshua.english gmail com. (keep the bots out). Okay cool, peace out people..
Forgot this: what I mentioned above will only search one domain at a time, but you can search all of them by using all three in standard timing separation. The results will be shown separately on the page, though. I have done this sucKcesssfulllly on my own personal website.
I just ran into some major trouble. I am trying to add:
try your search on "AllTheWeb", "Yahoo,,,,,
anyway i cant figure out how to link it to "AllTheWeb" and have it use the $query variable. Right now it links to like:
Say $query = Dog
Well instead of my link going to:
http://www.alltheweb.com/search?q=dog
it is going to:
http://www.alltheweb.com/search?q=$query
I am not an expert in PHP, and I need some help. This should be pretty easy, I assume. Thanks in advance, and please email me if you know how to fix my problem. Thank you again. My email is:
joshua.english (at) gmail (dot) com
Actually, it is more than possible to search multiple sites using the Google API. In fact, I've drawn up a basic example building upon the example provided in this article.
Cheers,
Peter
Hi I was wondering when I tried out the test index.php why do I get this error:
Notice: Undefined index: q in C:PublicWebswww.firstenergy.comwebnewsitesearchindex.php on line 33
What do I have to change in order for the script to query only my site, not the whole web?
I want to know what it takes to develop search engine like google.
or a search engine ten times less accessed like say by 20 million users a day.
Appreciate your advise.
thanks
plz give me some more useful tutorials
Please could you name companies which create professional search engines like Google? I heard there are some sites which have the program available for download for money.
The tutorial was great. I did just as said but it simple keeps saying search returned no useful results. My host supports php and I have a brand new Google API key.
new to php ... im getting a an error..
Fatal error: Cannot redeclare class soapclient in /var/www/domains/linux.4communityrealestate.com/docs/search/nusoap.php on line 7240
can anyone help me.. i havent ventured into customizing things.. just want to learn a bit about about php..
thanks
Hi
Thanks for that search engine :))
Everything's fine, but when I try to find something like:
http://www.stylegala.com/files/search/index.php?q=dir.bg
I receive only ??????????????? marks instead of the correctly translated words
Compare to google:
http://www.google.com/search?hl=en&q=dir.bg&btnG=Google+Search
How can I fix that?
Thanks again
daniel
This article is awesome, and was exactly the information I was looking for. I've implemented a modified version over at http://simplepie.org/search/
Anyway, I noticed that the number of pages (down at the bottom of the page) tends to get out of sync with the actual number of results. I wasn't sure if this was a bug that you were aware of, and before I go trying to fix it, I wanted to see if you already had a solution on-hand.
At http://simplepie.org/search/?q=geoffrey you can see that after I click past the first page of results, I get the same results on the second page that I had on the first page. Also, it shows only one page available, but also gives a 'Next' button.
Anyway, I know it isn't supposed to work that way, so if you don't have a solution, then leave a note in these comments that you don't, and I'll go to work trying to fix it myself.
Thanks!
@ Fraser - You're including the script twice. I know, because I accidentally did the same thing. If you include NuSOAP with your other includes, you need to remove it from the top part of this script from Stylegala.
Excellent article. This bit of script just saved me many hours of time and money. Many thanks.
There is a little lag time - I'm assuming that's because it's converting the html to standards compliance mode. ??
I'd like to know if I could connect my application to search engines other than Google (e.g. AltaVista, Yahoo etc.). Does anyone know if there is a tutorial like this for other search engines? Thanks
Love the script, time and money sparing !
That's a great article, but I find some critical aspects that are not handled. There are many keywords for which no results are shown. How come? Can you explain it, please, and give a solution?
http://www.cowburn.info/playground/search/index.php?q=multi-column+lists
Very cool!!! Please, explain how did you do this Peter?
Hi, I get these PHP notices. What should I do about them?
PHP Notice: Undefined index: start in D:phsearchindex.php on line 111
PHP Notice: Undefined variable: prev in D:phsearchindex.php on line 124
PHP Notice: Undefined index: q in D:phsearchindex.php on line 174
This is a very good article, was what I am looking for during 3 days.... I got an error..
Fatal error: Cannot redeclare class soapclient in /home/www/virtual/buddhist-events.org/ligga_net/htdocs/search/nusoap.php on line 7240
I saw Ryan Parman explained it above but I still don't know what to do exactly....Anyone could give me an advice.... My email. thaidirr(at)gmail.com
Please!...........
tx all for
this api will work in a private site, yes ?
gerardo was here
8-)
thanks links
If i was to make my own Yellow Pages or White Pages site, will this API work?
How do I populate my own database? Does anyone know of any tutorial?
thanks
Very sweet article -I wish there was a workaround for the 1,000 queries per day.
Hey im a newbie to web page design, i have the search box on the page however it produces no results...feel free to send me further information @ getpayed@gmail.com
Thanks
Thanks
thanks
thankss
thank you
Can you create a search engine with this so that the only results people receive are from one country and not 50? E.g. .co.uk but not .com
thanks
h illl
Thank you David,
It looks good...but I get an error saying:
[ SoapClient() expects parameter 2 to be array ... ]
This is on line 75 or 76 on the index.php file.
Any help? Please?
Gomi S.
How do I get a API key? I found this at Googles FAQ:
"We are no longer issuing new API keys for the SOAP Search API."
Plz send me a mail if you can help! :)
thanks a lot.
thanks
great article
thanks
http://infomechanic.blogspot.com
great stuff , very nice
what line of the php file is it on ?
nice article I had difficulty with the coding though
I had a mismatch in the array also, solved it now though, there was an error on line 33
how do make a search engine
and also can you tell me how to get your search engine on your site plz send me your htlmcode for your search bar
Very good article. This bit of script just saved me many hours of time and money. Many thanks
Appreciate your advise.
thanks
thanks...
Thanks for the great article. Looking forward to more of your post.
Good post! Search engine is cool.
Great article, but I have one wonder: why do the search results differ from the original Google ?
I have done this sucKcesssfulllly on my own personal website. Thanks !!!
How to restrict your search to a particular domain zone with this script? for example in .cn zone?
Thanks
thanks for the input
Many thanks.
thank you very very much.
I am having the same issue as what "skins" wrote: the search shows no results. I built one of these long ago but lost everything and now had to rebuild it. I am thinking that when you get no results, it's because the API key is new and some automatic configuration has to happen on Google's side, and that this takes time. If I am wrong, could someone clarify this for "skins" and me?
thanks for the
thanks for the he mate
thanks ..
thanks ...
this is for my personal use