How does a search engine work? Review of programs for searching documents and data.

Professional Internet search requires specialized software, as well as specialized search engines and search services.

PROGRAMS

http://dr-watson.wix.com/home – a program for studying large bodies of text to identify entities and the connections between them. Its output is a report on the object under study.

http://www.fmsasg.com/ – Sentinel Visualizer, one of the best programs in the world for visualizing connections and relationships. The company has fully localized its products into Russian and set up a Russian-language hotline.

http://www.newprosoft.com/ – “Web Content Extractor” is the most powerful, easy-to-use software for extracting data from web sites. It also has an effective Visual Web spider.

SiteSputnik – a software package that has no analogues in the world: it searches the Visible and Invisible Internet using all the search engines the user needs and processes the results.

WebSite-Watcher – monitors web pages, including password-protected ones, as well as forums, RSS feeds, newsgroups and local files. It has a powerful filter system. Monitoring runs automatically, and the results are delivered in a user-friendly form. The version with advanced features costs 50 euros. The program is constantly updated.

http://www.scribd.com/ is one of the most popular platforms in the world, increasingly used in Russia as well, for publishing documents, books, etc. for free access, with a very convenient search by title, topic and so on.

http://www.atlasti.com/ is one of the most powerful and effective tools for qualitative information analysis available to individual users and to small and even medium-sized businesses. The program is multifunctional: it combines a unified information environment, in which text, tabular, audio and video files are handled as a single whole, with tools for qualitative analysis and visualization.

Ashampoo ClipFinder HD – an ever-growing share of the information flow comes as video, so competitive intelligence officers need tools for working with this format. This free utility is one of them: it searches video hosting sites such as YouTube by specified criteria. The program is easy to use and shows all search results on one page with details such as title, duration and upload date. A Russian interface is available.

http://www.advego.ru/plagiatus/ – the program was made by SEO specialists but is quite suitable as an Internet intelligence tool. Plagiatus shows how unique a text is, where it comes from, and the percentage of matching text. It can also check the uniqueness of the text at a given URL. The program is free.
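Plagiatus's own algorithm is not described here, so the following is only a minimal sketch of one common way to estimate text uniqueness: split texts into overlapping word n-grams ("shingles") and count the share of the checked text's shingles that appear in none of the known sources. The shingle size of 3 and the function names are illustrative assumptions, not Advego's method.

```python
import re


def shingles(text: str, size: int = 3) -> set[tuple[str, ...]]:
    """Split text into overlapping word n-grams ("shingles")."""
    words = re.findall(r"\w+", text.lower())
    return {tuple(words[i:i + size]) for i in range(len(words) - size + 1)}


def uniqueness(candidate: str, sources: list[str], size: int = 3) -> float:
    """Percentage of the candidate's shingles that occur in no source text."""
    cand = shingles(candidate, size)
    if not cand:
        return 100.0  # too short to compare: treat as unique
    seen: set[tuple[str, ...]] = set()
    for src in sources:
        seen |= shingles(src, size)
    return 100.0 * len(cand - seen) / len(cand)


text = "the quick brown fox jumps over the lazy dog"
source = "a quick brown fox jumps over a sleeping cat"
print(f"{uniqueness(text, [source]):.1f}% unique")  # prints 57.1% unique
```

Three of the seven shingles are shared with the source, so the text comes out about 57% unique; real services also normalize word forms and search the web for candidate sources.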

http://neiron.ru/toolbar/ – includes an add-on that combines Google and Yandex search and supports competitive analysis by assessing the effectiveness of sites and contextual advertising. Available as a plugin for Firefox and Google Chrome.

http://web-data-extractor.net/ is a universal solution for obtaining any data available on the Internet. Configuring extraction from any page takes a few mouse clicks: select the area of the page you want to save, and Datacol automatically builds an extraction rule for that block.
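How Datacol derives its extraction rule is not described, so this is only a hedged sketch of the underlying idea: once the user has pointed at a block, the rule can be as simple as "take the text of every element with this tag and CSS class". It uses Python's standard html.parser; the tag/class rule and the names BlockExtractor and extract are hypothetical.

```python
from html.parser import HTMLParser


class BlockExtractor(HTMLParser):
    """Collects the text of every element whose tag and CSS class match one rule."""

    def __init__(self, tag: str, css_class: str) -> None:
        super().__init__()
        self.tag = tag
        self.css_class = css_class
        self.depth = 0  # > 0 while inside a matching element
        self.current: list[str] = []
        self.results: list[str] = []

    def handle_starttag(self, tag, attrs):
        if self.depth:
            if tag == self.tag:  # same tag nested inside the block
                self.depth += 1
            return
        classes = (dict(attrs).get("class") or "").split()
        if tag == self.tag and self.css_class in classes:
            self.depth = 1
            self.current = []

    def handle_endtag(self, tag):
        if self.depth and tag == self.tag:
            self.depth -= 1
            if self.depth == 0:
                # Collapse whitespace in the captured text.
                self.results.append(" ".join("".join(self.current).split()))

    def handle_data(self, data):
        if self.depth:
            self.current.append(data)


def extract(html: str, tag: str, css_class: str) -> list[str]:
    """Apply the (hypothetical) extraction rule to one page."""
    parser = BlockExtractor(tag, css_class)
    parser.feed(html)
    parser.close()
    return parser.results


page = '<div class="item price">  $10 </div><div class="item">skip</div><div class="price">$<b>20</b></div>'
print(extract(page, "div", "price"))  # ['$10', '$20']
```

Matching on a class rather than on a position in the page keeps the rule working when the number of items changes; it breaks, of course, if the site renames its classes.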

CaptureSaver is a professional Internet research tool and simply an indispensable program: it captures, stores and exports any Internet information, including not only web pages and blogs but also RSS news, email, images and much more. It offers very broad functionality, an intuitive interface and a ridiculously low price.

http://www.orbiscope.net/en/software.html – a web monitoring system at very affordable prices.

http://www.kbcrawl.co.uk/ – research software that also works on the “Invisible Internet”.

http://www.copernic.com/en/products/agent/index.html – the program searches more than 90 search engines using more than 10 parameters. It combines results, eliminates duplicates, filters out broken links and shows the most relevant results. It comes in free, personal and professional versions and has more than 20 million users.
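Copernic's merging logic is not described here. As an illustration of what "combine results and eliminate duplicates" can mean, the sketch below normalizes URLs so that the same page returned by different engines counts once, then fuses the ranked lists with reciprocal rank fusion, a standard rank-merging technique swapped in for the example (k = 60 is a conventional default; the engine names are made up).

```python
from collections import defaultdict
from urllib.parse import urlsplit


def normalize(url: str) -> str:
    """Crude canonical key: ignore scheme, host case, 'www.' and trailing slash."""
    parts = urlsplit(url.strip())
    host = parts.netloc.lower().removeprefix("www.")
    path = parts.path.rstrip("/") or "/"
    return host + path + (f"?{parts.query}" if parts.query else "")


def merge_results(ranked_lists: dict[str, list[str]], k: int = 60) -> list[str]:
    """Fuse several engines' ranked URL lists with reciprocal rank fusion (RRF)."""
    scores: dict[str, float] = defaultdict(float)
    first_url: dict[str, str] = {}
    for urls in ranked_lists.values():
        seen = set()
        for rank, url in enumerate(urls, start=1):
            key = normalize(url)
            if key in seen:  # skip repeats within one engine's list
                continue
            seen.add(key)
            first_url.setdefault(key, url)  # keep the first URL form we met
            scores[key] += 1.0 / (k + rank)  # pages ranked high by many engines win
    ranked = sorted(scores, key=scores.get, reverse=True)
    return [first_url[key] for key in ranked]


results = {
    "engine_a": ["https://www.example.com/", "https://example.org/page", "https://example.net/x"],
    "engine_b": ["http://example.org/page/", "https://example.net/x", "https://example.com"],
}
print(merge_results(results))
# ['https://example.org/page', 'https://www.example.com/', 'https://example.net/x']
```

RRF needs only rank positions, not the engines' internal scores, which is why it suits metasearch, where those scores are never exposed.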

Maltego is fundamentally new software for establishing relationships between subjects, events and objects, both in real life and on the Internet.

SERVICES

https://hunter.io/ – an effective service for finding and verifying email addresses.

https://www.whatruns.com/ is an easy-to-use yet effective scanner that shows what is and is not working on a website and where its security holes are. It is also available as a Chrome plugin.

https://www.crayon.co/ is an American budget platform for market and competitive intelligence on the Internet.

http://www.cs.cornell.edu/~bwong/octant/ – a tool for identifying and locating Internet hosts.

https://iplogger.ru/ – a simple and convenient service for determining someone else’s IP.

http://linkurio.us/ is a powerful new product for economic security specialists and corruption investigators. It processes and visualizes huge amounts of unstructured information from financial sources.

http://www.intelsuite.com/en – English-language online platform for competitive intelligence and monitoring.

http://yewno.com/about/ presents itself as the first operating system for turning information into knowledge and visualizing unstructured information. It currently supports English, French, German, Spanish and Portuguese.

https://start.avalancheonline.ru/landing/?next=%2F – forecasting and analytical services by Andrey Masalovich.

https://www.outwit.com/products/hub/ – a complete set of stand-alone programs for professional work in web 1.

https://github.com/search?q=user%3Acmlh+maltego – extensions for Maltego.

http://www.whoishostingthis.com/ – search engine for hosting, IP addresses, etc.

http://appfollow.ru/ – analysis of applications based on reviews, ASO optimization, positions in tops and search results for the App Store, Google Play and Windows Phone Store.

http://spiraldb.com/ is a service, available as a Chrome plugin, that provides a lot of valuable information about any website.

https://millie.northernlight.com/dashboard.php?id=93 – a free service that collects and structures key information on industries and companies. It offers dashboards based on text analysis.

http://byratino.info/ – collection of factual data from publicly available sources on the Internet.

http://www.datafox.co/ – CI platform collects and analyzes information on companies of interest to clients. There is a demo.

https://unwiredlabs.com/home – a specialized application with an API for determining the geolocation of any Internet-connected device.

http://visualping.io/ – a service for monitoring websites and, above all, the photos and images on them. Even if a photo appears for only a second, it will reach the subscriber's email. There is a Google Chrome plugin.

http://spyonweb.com/ is a research tool that allows for in-depth analysis of any Internet resource.

http://bigvisor.ru/ – the service allows you to track advertising campaigns for certain segments of goods and services, or specific organizations.

http://www.itsec.pro/2013/09/microsoft-word.html – instructions from Artem Ageev on using Windows programs for competitive intelligence needs.

http://granoproject.org/ is an open source tool for researchers who track networks of connections between individuals and organizations in politics, economics, crime, etc. Allows you to connect, analyze and visualize information obtained from various sources, as well as show significant connections.

http://imgops.com/ – a service for extracting metadata from graphic files and working with them.

http://sergeybelove.ru/tools/one-button-scan/ – a small online scanner for checking security holes in websites and other resources.

http://isce-library.net/epi.aspx – a service for finding primary sources from a fragment of English text.

https://www.rivaliq.com/ is an effective tool for conducting competitive intelligence in Western, primarily European and American markets for goods and services.

http://watchthatpage.com/ is a service that allows you to automatically collect new information from monitored Internet resources. The service is free.

http://falcon.io/ is a kind of Rapportive for the web. It does not replace Rapportive but provides additional tools; Rapportive, by contrast, gives a general profile of a person, pieced together from social network data and mentions on the web.

https://addons.mozilla.org/ru/firefox/addon/update-scanner/ – add-on for Firefox. Monitors web page updates. Useful for websites that do not have news feeds (Atom or RSS).
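The basic mechanism behind update monitors like this one can be sketched as fingerprinting: store a hash of a page's text and compare it on the next visit. This is an assumption about the general technique, not Update Scanner's code; the regular-expression tag stripping is deliberately crude, and the helper names (fingerprint, has_changed, check) are hypothetical.

```python
from __future__ import annotations

import hashlib
import re
import urllib.request


def fingerprint(html: str) -> str:
    """SHA-256 of the page text with tags removed and whitespace collapsed."""
    text = re.sub(r"<[^>]+>", " ", html)  # crude tag stripping
    text = " ".join(text.split())
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def has_changed(old_html: str, new_html: str) -> bool:
    """True if the tag-stripped text differs; edits to tags or whitespace alone are ignored."""
    return fingerprint(old_html) != fingerprint(new_html)


def fetch(url: str) -> str:
    """Download a page (network access required)."""
    with urllib.request.urlopen(url, timeout=30) as resp:
        charset = resp.headers.get_content_charset() or "utf-8"
        return resp.read().decode(charset, errors="replace")


def check(url: str, previous: str | None) -> tuple[bool, str]:
    """Fetch `url` and compare it with the fingerprint saved on the previous run."""
    current = fingerprint(fetch(url))
    return (previous is not None and previous != current), current


print(has_changed("<p>Price: 10</p>", "<p>Price: 12</p>"))  # True
```

A scheduler (cron, for example) would call check() periodically and save the returned fingerprint for the next comparison; a real monitor would also ignore rotating elements such as dates and ad blocks.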

http://agregator.pro/ – aggregator of news and media portals. Used by marketers, analysts, etc. to analyze news flows on certain topics.

http://price.apishops.com/ – automated web service for monitoring prices for selected product groups, specific online stores and other parameters.

http://www.la0.ru/ is a convenient, up-to-date service for analyzing links and backlinks to a website.

www.recordedfuture.com is a powerful tool for data analysis and visualization, implemented as an online service built on cloud computing.

http://advse.ru/ is a service with the slogan “Find out everything about your competitors.” It finds competitors' websites for given search queries and analyzes competitors' advertising campaigns in Google and Yandex.

http://spyonweb.com/ – the service allows you to identify sites with the same characteristics, including those using the same Google Analytics statistics service identifiers, IP addresses, etc.
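The idea of finding related sites through shared analytics identifiers can be illustrated with a few regular expressions. This is a minimal sketch assuming the common ID formats UA-&lt;account&gt;-&lt;property&gt; (Universal Analytics) and G-&lt;code&gt; (Google Analytics 4); how spyonweb actually collects and matches identifiers is not described in the source, and these patterns can produce false positives.

```python
import re
from collections import defaultdict

# Assumed ID formats: Universal Analytics "UA-<account>-<property>" and GA4 "G-<code>".
UA_RE = re.compile(r"\bUA-(\d{4,10})-\d{1,4}\b")
GA4_RE = re.compile(r"\bG-[A-Z0-9]{6,12}\b")


def analytics_ids(html: str) -> set[str]:
    """Return analytics IDs found in a page's HTML (UA IDs reduced to the account part)."""
    ids = {f"UA-{account}" for account in UA_RE.findall(html)}
    ids.update(GA4_RE.findall(html))
    return ids


def group_by_shared_id(pages: dict[str, str]) -> dict[str, list[str]]:
    """Map each identifier to the sites using it, keeping only IDs shared by 2+ sites."""
    owners: dict[str, list[str]] = defaultdict(list)
    for site, html in pages.items():
        for ident in sorted(analytics_ids(html)):
            owners[ident].append(site)
    return {ident: sites for ident, sites in owners.items() if len(sites) > 1}


pages = {
    "a.example": "<script>ga('create', 'UA-1234567-1', 'auto');</script>",
    "b.example": "<script>ga('create', 'UA-1234567-3', 'auto');</script>",
    "c.example": "<script>gtag('config', 'G-ABC123XYZ9');</script>",
}
print(group_by_shared_id(pages))  # {'UA-1234567': ['a.example', 'b.example']}
```

Dropping the property suffix is the key step: two sites with different UA properties under the same account number most likely belong to the same owner.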

http://www.connotate.com/solutions – a line of products for competitive intelligence, information flow management and turning information into information assets. It ranges from complex platforms to simple, inexpensive services that provide effective monitoring while condensing information and delivering only the results you need.

http://www.clearci.com/ – a competitive intelligence platform for businesses of all sizes, from start-ups and small companies to Fortune 500 corporations. Delivered as SaaS.

http://startingpage.com/ is a Google add-on that lets you search Google without your IP address being recorded. It fully supports all Google search capabilities, including in Russian.

http://newspapermap.com/ is a unique service that is very useful for a competitive intelligence officer. It links geolocation with a search engine for online media. That is, you select the region, city or language you are interested in, see the place on the map along with a list of online versions of newspapers and magazines, click the appropriate button and read. It supports Russian and has a very user-friendly interface.

http://infostream.com.ua/ – “Infostream”, a very convenient news monitoring system from D.V. Lande, one of the classics of Internet search. It stands out for first-class source selection and is affordable for any budget.

http://www.instapaper.com/ is a very simple and effective tool for saving the necessary web pages. Can be used on computers, iPhones, iPads, etc.

http://screen-scraper.com/ – allows you to automatically extract all information from web pages, download the vast majority of file formats, and automatically enter data into various forms. It saves downloaded files and pages in databases and performs many other extremely useful functions. Works on all major platforms, has fully functional free and very powerful professional versions.

http://www.mozenda.com/ – a web service for multifunctional web monitoring and for delivering the information the user needs from selected sites. It has several pricing plans and is affordable even for small businesses.

http://www.recipdonor.com/ - the service allows you to automatically monitor everything that happens on competitors' websites.

http://www.spyfu.com/ – the same, for when your competitors are foreign.

www.webground.su is a Runet monitoring service created by Internet search professionals. It covers all the major providers of information, news, etc., and its monitoring can be configured individually to suit the user's needs.

SEARCH ENGINES

https://www.idmarch.org/ is, in terms of quality, the best search engine for the world's archive of PDF documents. More than 18 million PDF documents have been indexed so far, ranging from books to secret reports.

http://www.marketvisual.com/ is a unique search engine that allows you to search for owners and top management by full name, company name, position, or a combination thereof. The search results contain not only the objects you are looking for, but also their connections. Designed primarily for English-speaking countries.

http://worldc.am/ is a search engine for freely accessible photographs linked to geolocation.

https://app.echosec.net/ is a public search engine that describes itself as the most advanced analytical tool for law enforcement, security and intelligence professionals. It lets you search for photos posted on various sites, social platforms and social networks by specific geolocation coordinates. Seven data sources are currently connected; by the end of the year there will be more than 450. Thanks to Dementy for the tip.

http://www.quandl.com/ is a search engine for seven million financial, economic and social databases.

http://bitzakaz.ru/ – a search engine for tenders and government orders, with additional paid functions.

Website-Finder – finds sites that Google indexes poorly. Its only limitation is that it returns just 30 websites per keyword. The program is easy to use.

http://www.dtsearch.com/ is a powerful search engine that can process terabytes of text. It works on the desktop, on the web and on intranets, supports both static and dynamic data, and can search within all MS Office programs. Searches can use phrases, words, tags, indexes and much more. It is billed as the only available federated search engine. Both paid and free versions exist.

http://www.strategator.com/ – searches, filters and aggregates information about a company from tens of thousands of web sources, covering the USA, Great Britain and the major EEC countries. It is highly relevant and user-friendly, and comes in free and paid ($14 per month) options.

http://www.shodanhq.com/ is an unusual search engine. Right after it appeared, it was nicknamed “Google for hackers.” It does not search for pages; instead, it identifies IP addresses and the types of routers, computers, servers and workstations located at a given address, traces chains of DNS servers, and offers many other functions of interest for competitive intelligence.

http://search.usa.gov/ is a search engine covering the websites and open databases of all US government agencies. These databases contain a lot of practical, useful information, including information usable in Russia.

http://visual.ly/ – today visualization is increasingly used to present data. This is the first infographic search engine on the Web. Along with the search engine, the portal has powerful data visualization tools that do not require programming skills.

http://go.mail.ru/realtime – search for discussions of topics, events, objects, subjects in real or customizable time. The previously highly criticized search in Mail.ru works very effectively and provides interesting, relevant results.

Zanran has only just launched but already works well. It is the first and only data search engine that extracts data from PDF files, Excel spreadsheets and HTML pages.

http://www.ciradar.com/Competitive-Analysis.aspx is one of the world's best information retrieval systems for competitive intelligence on the deep web. Retrieves almost all types of files in all formats on the topic of interest. Implemented as a web service. The prices are more than reasonable.

http://public.ru/ – Effective search and professional analysis of information, media archive since 1990. The online media library offers a wide range of information services: from access to electronic archives of Russian-language media publications and ready-made thematic press reviews to individual monitoring and exclusive analytical research based on press materials.

Cluuz is a young search engine with ample opportunities for competitive intelligence, especially on the English-language Internet. Allows you not only to find, but also to visualize and establish connections between people, companies, domains, e-mails, addresses, etc.

www.wolframalpha.com – the search engine of tomorrow. In response to a search request, it provides statistical and factual information available on the request object, including visualized information.

www.ist-budget.ru – universal search in databases of government procurement, tenders, auctions, etc.

At first glance, it may seem that only Yandex can be better than Google, and even that is not a fact. These companies invest huge amounts of money in innovation and development. Does anyone really have a chance not only to compete with the leaders, but also to win? Lifehacker's answer: “Yes!” There are several search engines that have succeeded. Let's look at our heroes.

DuckDuckGo

What is this

This is a fairly well-known open source search engine with servers located in the USA. In addition to its own crawler, it uses results from other sources: Yahoo! Search BOSS, Wikipedia and Wolfram|Alpha.

The better

DuckDuckGo positions itself as a search engine that provides maximum privacy and confidentiality. The system does not collect any data about the user, does not store logs (no search history), and the use of cookies is as limited as possible.

DuckDuckGo does not collect or share personal information from users. This is our privacy policy.
Gabriel Weinberg, founder of DuckDuckGo

Why do you need this

All major search engines are trying to personalize search results based on data about the person in front of the monitor. This phenomenon is called the “filter bubble”: the user sees only those results that are consistent with his preferences or that the system deems as such.

DuckDuckGo creates an objective picture that does not depend on your past behavior on the Internet, and eliminates thematic advertising from Google and Yandex based on your queries. With DuckDuckGo, it’s easy to search for information in foreign languages: Google and Yandex by default give preference to Russian-language sites, even if the query is entered in another language.

Nigma

What is this

Nigma is a Russian metasearch system developed by Moscow State University graduates Viktor Lavrenko and Vladimir Chernyshov. It searches through the indexes of Google, Bing, Yandex and others, and also has its own search algorithm.

The better

Searching through the indexes of all major search engines allows you to generate relevant results. In addition, Nigma divides the results into several thematic groups (clusters) and invites the user to narrow the search field, discarding unnecessary ones or highlighting priority ones. Thanks to the Mathematics and Chemistry modules, you can solve mathematical problems and request the results of chemical reactions directly in the search bar.

Why do you need this

Eliminates the need to search for the same query in different search engines. The cluster system makes it easy to manipulate search results. For example, Nigma collects results from online stores into a separate cluster. If you do not intend to buy anything, then simply exclude this group. By selecting the “English-language sites” cluster, you will receive results only in English. The Mathematics and Chemistry modules will help schoolchildren.

Unfortunately, the project is not currently being developed, as the developers have shifted their activity to the Vietnamese market. Nevertheless, Nigma is not yet outdated, and in some respects it still outperforms Google. Let's hope development resumes.

not Evil

What is this

not Evil is a system that searches the anonymous Tor network. To use it, you need to connect to that network, for example by launching the Tor Browser. not Evil is not the only search engine of its kind: there is also LOOK (the default search in the Tor Browser, accessible from the regular Internet), TORCH (one of the oldest search engines on the Tor network) and others. We settled on not Evil because of its clear allusion to Google itself (just look at the start page).

The better

It searches where Google, Yandex and other search engines are generally closed.

Why do you need this

The Tor network contains many resources that cannot be found on the law-abiding Internet. And as government control over the content of the Internet tightens, their number will grow. Tor is a kind of Network within the Network: with its own social networks, torrent trackers, media, trading platforms, blogs, libraries, and so on.

YaCy

What is this

YaCy is a decentralized search engine that works on the principle of P2P networks. Each computer on which the main software module is installed scans the Internet independently, that is, it is analogous to a search robot. The results obtained are collected into a common database that is used by all YaCy participants.
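To make the peer-to-peer principle more concrete, here is a toy model: each peer publishes the words of the pages it crawled, and each word's list of URLs is stored on the peer chosen by hashing that word, so any participant can answer a query without a central server. YaCy is generally described as distributing its word index through a distributed hash table, but this hash-modulo scheme and the class names Peer and ToyNetwork are simplifications for illustration, not YaCy's actual protocol.

```python
import hashlib
from collections import defaultdict


class Peer:
    """One participant, storing the slice of the shared index it is responsible for."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.index: dict[str, set[str]] = defaultdict(set)  # word -> URLs


class ToyNetwork:
    def __init__(self, peer_names: list[str]) -> None:
        self.peers = [Peer(n) for n in peer_names]

    def responsible_peer(self, word: str) -> Peer:
        # Hash-partition the word space across peers (a stand-in for a real DHT).
        digest = hashlib.sha1(word.encode("utf-8")).digest()
        return self.peers[int.from_bytes(digest[:4], "big") % len(self.peers)]

    def publish(self, url: str, text: str) -> None:
        """Called by whichever peer crawled `url`: send each word's entry to its owner."""
        for word in set(text.lower().split()):
            self.responsible_peer(word).index[word].add(url)

    def search(self, query: str) -> set[str]:
        """AND query: intersect the URL sets fetched from each word's owner."""
        words = query.lower().split()
        if not words:
            return set()
        postings = [self.responsible_peer(w).index.get(w, set()) for w in words]
        return set.intersection(*postings)


net = ToyNetwork(["peer-1", "peer-2", "peer-3"])
net.publish("http://a.example", "open source search engine")
net.publish("http://b.example", "commercial search engine")
print(sorted(net.search("search engine")))  # ['http://a.example', 'http://b.example']
print(net.search("open search"))  # {'http://a.example'}
```

Because no single node holds the whole index, there is no one owner who could filter results, which is the property the section attributes to YaCy.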

The better

It’s difficult to say whether this is better or worse, since YaCy is a completely different approach to organizing search. The absence of a single server and owner company makes the results completely independent of anyone's preferences. The autonomy of each node eliminates censorship. YaCy is capable of searching the deep web and non-indexed public networks.

Why do you need this

If you are a supporter of open source software and a free Internet, not subject to the influence of government agencies and large corporations, then YaCy is your choice. It can also be used to organize a search within a corporate or other autonomous network. And even though YaCy is not very useful in everyday life, it is a worthy alternative to Google in terms of the search process.

Pipl

What is this

Pipl is a system designed to search for information about a specific person.

The better

The authors of Pipl claim that their specialized algorithms search more efficiently than “regular” search engines. In particular, priority sources of information include social network profiles, comments, member lists, and various databases that publish information about people, such as court decisions. Pipl's leadership in this area is confirmed by assessments from Lifehacker.com, TechCrunch and other publications.

Why do you need this

If you need to find information about a person living in the US, Pipl will be much more effective than Google. Russian court databases, however, are apparently inaccessible to the search engine, so it does not cope as well with Russian citizens.

What is this

Another specialized search engine. It searches open sources for all kinds of sounds (household, nature, cars, people, etc.). The service does not support queries in Russian, but it offers an impressive list of Russian-language tags you can search by.

The better

The results contain only sounds and nothing extra. In the search settings you can set the desired format and sound quality. All sounds found are available for download. You can also search for sounds by sample.

Why do you need this

If you need to quickly find the sound of a musket shot, the drumming of a sapsucker woodpecker, or the cry of Homer Simpson, this service is for you. And I picked these only from the available Russian-language tags; in English the range is even wider. Seriously, though, a specialized service has a specialized audience. But what if it comes in handy for you too?

The life of alternative search engines is often fleeting. Lifehacker asked the former general director of the Ukrainian branch of Yandex, Sergei Petrenko, about the long-term prospects of such projects.

As for the fate of alternative search engines, it is simple: to be very niche projects with a small audience, therefore without clear commercial prospects or, conversely, with complete clarity of their absence.

If you look at the examples in the article, you can see that such search engines either specialize in a narrow but popular niche, which, perhaps, has not yet grown enough to be noticeable on the radars of Google or Yandex, or they are testing an original hypothesis in ranking, which is not yet applicable in regular search.

For example, if search on Tor suddenly turns out to be in demand, that is, if results from there are needed by at least one percent of Google's audience, then ordinary search engines will of course start working out how to find them and show them to users. If audience behavior shows that, for a significant share of users across a significant number of queries, results produced without user-dependent factors seem more relevant, then Yandex or Google will begin to produce such results.

“Be better” in the context of this article does not mean “be better at everything.” Yes, in many aspects our heroes are far from Google and Yandex (even far from Bing). But each of these services gives the user something that the search industry giants cannot offer.

Hello, dear readers of the blog. In the early days of the Internet, its few users had enough of their own bookmarks. However, as you remember, the Internet grew exponentially, and very soon it became harder to find your way through all its diversity.

Then directories appeared (Yahoo, Dmoz and others), in which their authors added sites and sorted them into categories. This immediately made life easier for the then still few users of the global network. Many of these directories are still alive today.

But after some time, the size of their databases became so large that the developers first thought about creating a search within them, and then about creating an automated system for indexing all Internet content in order to make it accessible to everyone.

The main search engines of the Russian-speaking segment of the Internet

As you know, this idea was implemented with stunning success, but things turned out well only for a handful of companies that managed not to disappear from the Internet. Almost all the search engines of the first wave have since either disappeared, languished, or been bought by more successful competitors.

A search engine is a very complex and, importantly, very resource-intensive mechanism (meaning not only material resources but human ones too). Behind the seemingly simple … or its ascetic Google counterpart stand thousands of employees, hundreds of thousands of servers and many billions in investment, all needed for this colossus to keep running and stay competitive.

Entering this market now and starting from scratch is more of a utopia than a real business project. For example, one of the world's richest corporations, Microsoft, has been trying to gain a foothold in the search market for decades, and only now their search engine Bing is slowly beginning to meet their expectations. And before that there was a whole series of failures and setbacks.

And what can we say about entering this market without serious financial backing? Our domestic search engine Nigma, for example, has many useful and innovative features, but its traffic is thousands of times lower than that of the leaders of the Russian market. For comparison, take a look at Yandex's daily audience:

In this regard, we can assume that the list of the main (best and luckiest) search engines of the Runet and of the Internet as a whole has already taken shape, and the only intrigue is who will eventually devour whom, or how their market shares will be distributed if they all survive and stay afloat.

The structure of the Russian search engine market is very clear: there are two or three main players and a couple of minor ones. In general, a rather unique situation has developed on the Runet, which, as far as I understand, has occurred in only two other countries in the world.

I'm talking about the fact that the Google search engine, having come to Russia in 2004, has still not been able to take leadership. In fact, they tried to buy Yandex around this period, but something didn’t work out there and now “our Russia”, along with the Czech Republic and China, are those places where the almighty Google, if not defeated, then, in any case, met serious resistance.

In fact, anyone can see the current standings of the best search engines on the Runet. Just paste this URL into your browser's address bar:

http://www.liveinternet.ru/stat/ru/searches.html?period=month;total=yes

The point is that most Runet sites use the LiveInternet counter.

After entering the given Url, you will see a picture that is not very attractive and presentable, but it well reflects the essence of the matter. Pay attention to the top five search engines from which sites in Russian receive traffic:

Of course, not all resources with Russian-language content are in the .RU zone. There are also .SU and .RF, and general zones like .COM or .NET are full of Internet projects aimed at the Runet, but the sample is still quite representative.

This dependence can be presented in a more colorful way, as, for example, someone did online for his presentation:

This doesn't change the essence. There are a couple of leaders and several very, very far behind search engines. By the way, I have already written about many of them. Sometimes it can be quite interesting to plunge into the history of success or, conversely, to delve into the reasons for the failures of once promising search engines.

So, in order of importance for Russia and the Runet as a whole, I will list them and give them brief characteristics:

Searching on Google has already become a household word for many people on the planet. I used to like this search engine's “translated results” option, which gave you answers from all over the world but in your native language; unfortunately, it is no longer available (at least on google.ru).

Lately I have also been puzzled by the quality of their output (the search engine results page, SERP). Personally, I always search the Runet's own search engine first (out of habit) and turn to Google only if I don't find a sensible answer there.

Their results used to make me happy, but lately they have only puzzled me: sometimes real nonsense comes up. Their drive to increase contextual advertising revenue and the constant reshuffling of search results to discredit SEO promotion may well backfire. In any case, this search engine has a competitor on the Runet, and a serious one at that.

I think that it is unlikely that anyone will specifically go to Go.mail.ru to search the RuNet. Therefore, traffic to entertainment projects from this search engine can be significantly more than ten percent. Owners of such projects should pay attention to this system.

However, in addition to the clear leaders in the search engine market of the Russian-language segment of the Internet, there are several more players whose share is quite low, but nevertheless the very fact of their existence makes it necessary to say a few words about them.

Runet search engines from the second echelon

Internet-wide search engines

By and large, on the scale of the entire Internet there is only one serious player - Google. This is the undisputed leader, but it still has some competition.

First of all, there is Bing again, which holds a very strong position in the American market, especially since its engine also powers all Yahoo services (almost a third of the entire US search market).

Secondly, because users from China make up a huge share of all Internet users, China's main search engine, Baidu, has wedged itself into the world rankings. It was founded in 2000 and now holds about 80% of the entire national audience in China.

It is hard to say anything more definite about Baidu, but some on the Internet claim that the top positions in its results go not only to the sites most relevant to the query but also to those who paid for them (directly to the search engine, not to an SEO agency). Naturally, this applies primarily to commercial queries.

In general, the statistics make it clear why Google readily agrees to degrade its search results in exchange for higher profits from contextual advertising. They are not really afraid of losing users, because in most cases users have nowhere else to go. The situation is somewhat sad, but we'll see what happens next.

By the way, to make life even harder for optimizers, and perhaps to protect the privacy of its users, Google has recently started encrypting the queries users send from their browsers to its servers. Soon it will no longer be possible to see in the statistics of visitor counters which queries brought Google users to your site.
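Visitor counters typically learned the search phrase from the `q` parameter of the Google URL passed in the HTTP referrer. A minimal sketch of that extraction using Python's standard `urllib.parse`; the example URLs are illustrative (not captured logs), and the bare `https://www.google.com/` referrer stands in for what an encrypted search passes on.

```python
from urllib.parse import urlparse, parse_qs

def query_from_referrer(referrer):
    """Return the search phrase from a Google referrer URL, or None if absent."""
    parsed = urlparse(referrer)
    # Only look at referrers coming from a Google host.
    if "google." not in parsed.netloc:
        return None
    # parse_qs decodes '+' to spaces and returns a list of values per key.
    values = parse_qs(parsed.query).get("q")
    return values[0] if values else None

# Unencrypted era: the query travels inside the referrer.
print(query_from_referrer("http://www.google.com/search?q=inverted+index"))
# Encrypted search: the referrer arrives without a query string.
print(query_from_referrer("https://www.google.com/"))
```

Once the query string is gone, the counter has nothing to parse, which is exactly why the statistics go blank.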

Of course, besides the search engines mentioned in this post, there are thousands of others: regional, specialized, exotic, and so on. Trying to list and describe them all in one article would be impossible, and probably unnecessary. Instead, let's say a few words about how hard it is to build a search engine and how costly it is to keep it up to date.

The vast majority of search engines work on similar principles and pursue the same goal: to give users an answer to their question. Moreover, this answer must be relevant (matching the question), comprehensive and, no less important, fresh (up to date).

Solving this problem is not easy, especially since the search engine has to analyze the contents of billions of Internet pages on the fly, weed out the unnecessary ones, and build a results list from the rest in which the most appropriate answers to the user's question appear first.

This extremely complex task is solved by collecting information from those pages in advance with various indexing robots. The robots collect links from pages they have already visited and load the contents of the linked pages into the search engine's database. There are bots that index text: a regular bot and a fast bot that lives on news sites and frequently updated resources, so that the latest data always appears in the results.
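The link-following loop can be sketched as a breadth-first crawl. This is a minimal illustration, not any engine's actual robot: a hypothetical in-memory dictionary stands in for the web (a real bot would fetch pages over HTTP and parse their HTML), and `max_depth` is an assumed parameter limiting how far links are followed.

```python
from collections import deque

# Hypothetical in-memory "web": URL -> (page text, outgoing links).
FAKE_WEB = {
    "a.example/": ("home page", ["a.example/news", "b.example/"]),
    "a.example/news": ("latest news", ["a.example/"]),
    "b.example/": ("another site", ["c.example/"]),
    "c.example/": ("far away page", []),
}

def crawl(seeds, max_depth):
    """Breadth-first crawl: store each page's text, queue its unseen links."""
    database = {}                      # stands in for the engine's page store
    queue = deque((url, 0) for url in seeds)
    seen = set(seeds)
    while queue:
        url, depth = queue.popleft()
        if url not in FAKE_WEB:
            continue                   # dead link: nothing to load
        text, links = FAKE_WEB[url]
        database[url] = text
        if depth == max_depth:
            continue                   # do not follow links past the depth limit
        for link in links:
            if link not in seen:
                seen.add(link)
                queue.append((link, depth + 1))
    return database

print(sorted(crawl(["a.example/"], max_depth=1)))
```

The `seen` set keeps the robot from revisiting pages that link back to each other, such as the home page and its news page here.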

In addition, there are robots that index images (for later display in image search), favicons and site mirrors (for comparing and possibly merging them), as well as bots that check whether pages submitted by users or through webmaster tools are working.

Indexing itself and the subsequent updating of the index databases take considerable time. Google does this much faster than its competitors, at least faster than Yandex, which needs a week or two.

Typically, a search engine breaks the text of a page into individual words and reduces them to their base forms, so that it can answer queries phrased in different morphological forms. Everything extraneous, such as HTML tags and extra spaces, is removed, and the remaining words are sorted alphabetically, each with its positions in the document recorded next to it.

Such a structure is called an inverted index. It lets the engine search not the web pages themselves but structured data stored on its own servers.
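The per-page steps above (strip markup, reduce words to base forms, sort them with their positions) can be sketched as follows. The suffix stripper is a deliberately crude English stand-in, assumed for illustration only; real engines use proper morphological analyzers for each language.

```python
import re
from collections import defaultdict

# Crude English suffixes, for illustration only.
SUFFIXES = ("ing", "ed", "s")

def stem(word):
    """Strip the first matching suffix if at least three letters remain."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

def normalize(html):
    """Remove tags, split into words, stem them, record word positions."""
    text = re.sub(r"<[^>]+>", " ", html)        # drop HTML tags
    words = re.findall(r"\w+", text.lower())      # split on non-word characters
    positions = defaultdict(list)
    for pos, word in enumerate(words):
        positions[stem(word)].append(pos)
    # Terms in alphabetical order, each with its positions in the document.
    return sorted(positions.items())

print(normalize("<p>Indexing robots <b>indexed</b> pages</p>"))
# [('index', [0, 2]), ('page', [3]), ('robot', [1])]
```

Because "Indexing" and "indexed" both reduce to "index", a query in either form finds this page.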

The number of such servers is in the tens or even hundreds of thousands for Yandex (which mostly searches Russian-language sites, plus some Ukrainian and Turkish ones), and in the millions for Google (which searches in hundreds of languages).

Many servers have replicas, which both protect the data and speed up query processing by spreading the load. Just imagine the cost of maintaining this whole operation.

A user's query is sent by the load balancer to whichever server segment is currently least loaded. The engine then determines the region the query came from and analyzes the query morphologically. If a similar query was entered recently, the user is served results from the cache so that the servers do not have to process it again.
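A toy version of this front end, routing each query to the least-loaded segment and keeping recent answers in an LRU cache, might look like this. Segment names, the load counter (here simply requests routed so far) and the backend function are hypothetical; real engines use far more elaborate balancing and caching.

```python
from collections import OrderedDict

class QueryFrontend:
    """Toy front end: route to the least-loaded segment, cache recent answers."""

    def __init__(self, segment_loads, cache_size=2):
        # Hypothetical segment names -> load (here: requests routed so far).
        self.loads = dict(segment_loads)
        self.cache = OrderedDict()     # normalized query -> results, oldest first
        self.cache_size = cache_size

    def handle(self, query, backend):
        key = " ".join(query.lower().split())   # treat case/spacing variants alike
        if key in self.cache:
            self.cache.move_to_end(key)          # mark as recently used
            return self.cache[key], "cache"
        segment = min(self.loads, key=self.loads.get)   # least-loaded segment
        self.loads[segment] += 1
        results = backend(segment, key)
        self.cache[key] = results
        if len(self.cache) > self.cache_size:
            self.cache.popitem(last=False)       # evict least recently used query
        return results, segment

def fake_backend(segment, query):
    return [f"{query} result from {segment}"]

front = QueryFrontend({"seg-a": 3, "seg-b": 1})
print(front.handle("Weather Moscow", fake_backend))   # routed to seg-b
print(front.handle("weather moscow", fake_backend))   # served from the cache
```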

If the query has not been cached yet, it is passed on to the part of the system where the search engine's index database is located. The response is a list of all pages that are at least somewhat related to the query. Not only exact occurrences are taken into account, but also other morphological forms and other related variants.
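Looking up "all pages at least somewhat related to the query" in an inverted index amounts to merging the posting lists of the query words. A minimal sketch, assuming a tiny in-memory collection and an OR lookup (any query word suffices); morphology is left out here for brevity.

```python
import re
from collections import defaultdict

def build_index(docs):
    """Positional inverted index: word -> {doc_id: [positions]}."""
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        for pos, word in enumerate(re.findall(r"\w+", text.lower())):
            index[word].setdefault(doc_id, []).append(pos)
    return index

def candidates(index, query):
    """All documents containing at least one query word (OR lookup)."""
    found = set()
    for word in query.lower().split():
        found |= set(index.get(word, {}))     # keys of the posting dict
    return found

docs = {
    1: "search engines build an index",
    2: "the index is stored on servers",
    3: "servers answer search queries",
}
index = build_index(docs)
print(sorted(candidates(index, "search index")))  # [1, 2, 3]
print(index["servers"])                           # {2: [5], 3: [0]}
```

No document text is scanned at query time: only the posting lists of the query words are read, which is what makes the index so much faster than going through the pages themselves.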

These pages then have to be ranked, and at this stage the algorithm (artificial intelligence) comes into play. In effect, the user's query is multiplied into all plausible interpretations, and answers to many queries are searched for simultaneously (using query-language operators, some of which are also available to ordinary users).
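Real ranking formulas are secret and use many factors, but the basic idea of weighting can be shown with a toy score: how often the query words occur relative to the page length, plus a bonus when they stand close together. Both factors and their combination are assumptions chosen for illustration.

```python
import re

def relevance(text, query):
    """Toy document weight: query-word density plus a proximity bonus."""
    words = re.findall(r"\w+", text.lower())
    terms = query.lower().split()
    if not words or not terms:
        return 0.0
    positions = {t: [i for i, w in enumerate(words) if w == t] for t in terms}
    if any(not p for p in positions.values()):
        return 0.0  # require every query word to occur (AND semantics)
    # Occurrences of the query words relative to the document length.
    density = sum(len(p) for p in positions.values()) / len(words)
    bonus = 0.0
    if len(terms) >= 2:
        # Smallest distance between the first two query words, for brevity.
        gap = min(abs(a - b) for a in positions[terms[0]] for b in positions[terms[1]])
        bonus = 1.0 / gap if gap else 0.0
    return density + bonus

pages = {
    "near": "fast search engine",
    "far": "search is slow but the engine is fast",
}
ranked = sorted(pages, key=lambda name: relevance(pages[name], "search engine"), reverse=True)
print(ranked)  # ['near', 'far']
```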

As a rule, the results contain one page from each site (sometimes more). Ranking algorithms are now very complex and take many factors into account. In addition, to correct them, assessors are used: people who manually evaluate reference sites, which makes it possible to tune the algorithm as a whole.
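The "one page per site" rule can be sketched as a filter over already-scored results: sort by score and keep only the best page(s) from each host. The scores and URLs below are hypothetical, and `per_site` is an assumed knob for the "sometimes more" case.

```python
from urllib.parse import urlparse

def one_page_per_site(scored_pages, per_site=1):
    """Keep at most `per_site` best-scoring pages from each host, best first."""
    kept, counts = [], {}
    for url, score in sorted(scored_pages, key=lambda item: item[1], reverse=True):
        host = urlparse(url).netloc
        if counts.get(host, 0) < per_site:
            counts[host] = counts.get(host, 0) + 1
            kept.append((url, score))
    return kept

results = [
    ("https://a.example/page1", 0.9),
    ("https://a.example/page2", 0.8),
    ("https://b.example/", 0.7),
]
print(one_page_per_site(results))
# [('https://a.example/page1', 0.9), ('https://b.example/', 0.7)]
```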

In short, it is a murky business. We could talk about this for a long time, but it is already clear that satisfying the users of a search engine is very, very hard. And there will always be those who don't like something, like you and me, dear readers.

Good luck to you! See you soon on the pages of the blog.


To say that in our age of information technology, with the endless growth of the data available to individuals and society alike, there are many problems with processing and searching information is a truism by now. Who doesn't raise this topic? So, rather than burdening you with subjective (and, in part, objective) judgments about the problem drawn from various sources, I will move straight to its solution. Today we'll talk about search, that is, about programs and serious information systems that find the documents and data we need.

Upgrading "direct search"

Not so long ago, when the trees were big and even an enterprise's local network did not hold much information, any search was done by simply going through the handful of available files and checking their names and contents one by one. This kind of search is called direct, and programs (utilities) using direct search are traditionally included in all operating systems and toolkits. But even modern computers are not powerful enough for fast and adequate direct search over gigantic volumes of data: searching a couple of hundred documents on a disk and searching a huge library plus several dozen mailboxes are two different things. That is why direct search programs are clearly fading into the background as universal tools.
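Direct search can be sketched in a few lines: walk the folder tree and check every file's name and contents on each query. This is an illustrative sketch; the default extensions are an assumption, and the cost of every query grows with the total volume of files, which is exactly why it does not scale.

```python
import os

def direct_search(root, phrase, extensions=(".txt", ".html")):
    """Scan every matching file under `root`; no index is built or used."""
    phrase = phrase.lower()
    hits = []
    for folder, _subfolders, files in os.walk(root):
        for name in files:
            if not name.lower().endswith(extensions):
                continue
            path = os.path.join(folder, name)
            # Check the file name first, then read and scan the whole file.
            if phrase in name.lower():
                hits.append(path)
                continue
            with open(path, encoding="utf-8", errors="ignore") as f:
                if phrase in f.read().lower():
                    hits.append(path)
    return sorted(hits)
```

For example, `direct_search("C:/Docs", "contract")` (a hypothetical folder) rereads every text file under it each time it is called.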

Of course, this kind of search has long been out of demand in the corporate sector: the volumes are simply different. That is why, for many years now and especially recently, technologies capable of quickly and accurately finding documents of various formats from various sources have been more than relevant. Not long ago Microsoft's "father" Bill Gates, apparently envious of the phenomenal success of the Google Internet search engine, announced at a press conference that the software industry (and not only it) intends to do everything possible to develop and deepen search engines and search technologies. But it is too early to speak of any phenomenally effective program from Microsoft or a competitive Internet service (MSN still falls short of Google). So let's turn to existing developments.

Index, query, relevance

Modern technologies are based on two fundamental processes: indexing the available information, and processing the query and returning the results. As for the first, any program (be it a desktop search engine, a corporate information system or an Internet search engine) creates its own search area. That is, it processes documents and builds an index of them (an organized structure that contains information about the processed data). From then on, it is this index that is used for work: for quickly obtaining a list of the documents matching a query. What follows, although by no means simple technologically, is quite understandable to the average user. The program processes the query (a keyword phrase) and displays a list of documents containing that phrase. Since the information is held in a structured index, query processing is much faster (tens and hundreds of times!) than with direct search, because documents are selected not by enumerating files but by analyzing the text data in the index.

The program displays the found documents in the results list ordered by relevance, that is, by how well each document matches the query text. Different technologies, of course, use different methods for searching and for determining relevance (the number of occurrences of a word and how often it is mentioned in the document, the ratio of these parameters to the total number of words in the document, the distance between the words of the query phrase in the searched files, and so on). These parameters determine the "weight" of a document, and depending on it a file appears at a certain position in the results list. With Internet search the situation is even more complicated, since many other factors must be taken into account (Google's PageRank is an example). But that is a topic for a separate article, so we won't touch the Internet here.

Review of search engines

This material examines the capabilities of several popular search programs that boast both decent speed and good functionality. But showing off in brochures is one thing, and standing up to an expert's gaze is quite another. And our experts were, no more and no less, an entire office of people who like to tinker with software to test its usability. A set of programs was installed on the test computer (Athlon 2.2 GHz, 1 GB RAM, 160 GB Seagate 7200 rpm IDE hard drive, Windows XP): dtSearch Desktop, Bloodhound (Ishcheika) Prof Deluxe, Google Desktop Search, SearchInform, Copernic Desktop Search and ISYS Desktop. For the tests, a text database of documents in doc, txt and html formats was compiled, with a total size of a full 20 gigabytes. A group of comrades led by your humble servant tested and compared the programs and shared their subjective impressions of each. A summary of the findings follows.

dtSearch Desktop

According to its developers, this program is the fastest, most convenient and best search engine, just like every other program in this review, in fact. The dtSearch interface is fairly simple, but some windows and tabs are somewhat overloaded with elements, which makes the program look hard to use. In reality there are no particular difficulties. The only really unpleasant point is the lack of Russian language support in the interface: although the program can search documents in several languages, its interface is English only.

But dtSearch is one of the few programs that can index web pages to a user-specified "depth" (though this requires buying the dtSearch Spider add-on separately). This comes on top of support for disk files in various text formats and for email in Outlook mailboxes. At the same time, the program cannot work with databases, which are such a tasty morsel for search engines because of the large volumes of information they hold and their wide use in companies, and therefore in corporate networks. dtSearch's indexing speed turned out to be up to par. Looking ahead, I will say that this program indexed the given amount of information on a par with another competitor, ISYS, and shared second place with it among the fastest systems. dtSearch indexed the 20-gigabyte test set in 6 hours 13 minutes, creating a 7.9 GB index for subsequent searches.

As for search capabilities, they are also up to par. First, dtSearch has morphological search (searching for a word in all its morphological forms). With it you no longer have to wonder, say, in which grammatical case a certain word was used in the document you need. Morphological search is almost always justified, so any professional search engine should have it.

Phonetic search ("search by sound") is an unusual feature even for professional search engines. The program looks for words that sound the same as the word you entered. Best of all, this function also works for Russian: for a given query word, the results include not only that word but also words that sound like it.
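dtSearch's own phonetic algorithm is not described here. As a stand-in, the classic American Soundex code shows the idea: words that sound alike get the same four-character code. Soundex was designed for English names, so this sketch does not handle Cyrillic.

```python
# Soundex digit groups; vowels and h, w, y get no code.
CODES = {**dict.fromkeys("bfpv", "1"), **dict.fromkeys("cgjkqsxz", "2"),
         **dict.fromkeys("dt", "3"), "l": "4", **dict.fromkeys("mn", "5"), "r": "6"}

def soundex(word):
    """Return the four-character American Soundex code of an English word."""
    word = "".join(ch for ch in word.lower() if ch.isalpha())
    if not word:
        return ""
    result = word[0].upper()          # the first letter is kept as is
    prev = CODES.get(word[0], "")
    for ch in word[1:]:
        code = CODES.get(ch, "")
        if code and code != prev:     # adjacent letters with one code count once
            result += code
        if ch not in "hw":            # h and w do not separate equal codes
            prev = code
    return (result + "000")[:4]       # pad or truncate to four characters

def sounds_like(a, b):
    return soundex(a) == soundex(b)

print(soundex("Robert"), soundex("Rupert"))   # R163 R163
```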

Search with error correction (fuzzy search) is a very important function. It finds words containing spelling errors, which may be typos or errors in documents produced by character recognition systems, for example. A simple example: you are looking for the word "keyboard", and some document contains a misspelled form such as "keybaord". Obviously the author meant "keyboard" and simply made a typo. A fuzzy search will detect this and include the document in the results. dtSearch also has a setting that determines how many erroneous characters are tolerated.
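A common way to implement such tolerance is Levenshtein edit distance: the number of single-character insertions, deletions and substitutions needed to turn one word into another. This sketch assumes that technique (dtSearch's actual method is not documented here), with `max_errors` playing the role of the tolerance setting.

```python
def edit_distance(a, b):
    """Levenshtein distance between two strings, computed row by row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                 # delete ca
                           cur[j - 1] + 1,              # insert cb
                           prev[j - 1] + (ca != cb)))   # substitute (free if equal)
        prev = cur
    return prev[-1]

def fuzzy_match(word, vocabulary, max_errors=1):
    """Return vocabulary words within `max_errors` edits of `word`."""
    return [w for w in vocabulary if edit_distance(word, w) <= max_errors]

words = ["keybaord", "keyboards", "clipboard"]
print(fuzzy_match("keyboard", words, max_errors=2))   # ['keybaord', 'keyboards']
```

Note that swapping two adjacent letters ("oa" for "ao") costs two edits in plain Levenshtein distance, so the tolerance must be at least 2 to catch that typo.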

Synonym search uses a list of synonyms for various words. For example, if you enter the word "fast", the program will also find "high-speed" and other synonyms of "fast", provided they are in the synonym list. dtSearch does not ship with a ready-made synonym list; you can use lists from the Internet (which requires a connection, and that is not always convenient) or create your own.
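Synonym search can be sketched as query expansion: each query word is replaced by a group containing the word and its synonyms, and a document matches if it contains at least one word from every group. The small English synonym list is a hypothetical user-made list, and treating it as symmetric is an assumption.

```python
# Hypothetical user-maintained synonym list.
SYNONYMS = {
    "fast": {"quick", "rapid", "high-speed"},
}

def expand_query(words, synonyms):
    """Turn each query word into the set of the word and its synonyms."""
    expanded = []
    for word in words:
        group = {word} | synonyms.get(word, set())
        # Treat the list as symmetric: "quick" should also find "fast".
        for head, alternatives in synonyms.items():
            if word in alternatives:
                group |= {head} | alternatives
        expanded.append(group)
    return expanded

def matches(document_words, expanded):
    """True if every query group has at least one word in the document."""
    doc = set(document_words)
    return all(group & doc for group in expanded)

query = expand_query(["fast", "search"], SYNONYMS)
print(matches("a high-speed search tool".split(), query))   # True
```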

In addition, dtSearch can search by phrases made of words joined by logical operators. Each word in a query can be given its own "weight", that is, importance. A useful option is a dictionary of unimportant words (stop words) that are ignored during search, but this dictionary is also empty, and you will have to fill it yourself.
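Term weights and a stop-word dictionary can be combined in a simple score: add up the weights of the query words a document contains, skipping stop words entirely. The dictionary-based query format and the stop-word list are hypothetical, not dtSearch's syntax.

```python
import re

# A hypothetical stop-word dictionary (dtSearch ships it empty).
STOP_WORDS = {"the", "a", "of", "in"}

def weighted_score(text, weighted_query, stop_words=STOP_WORDS):
    """Sum the weights of query words found in the text, ignoring stop words.

    `weighted_query` maps word -> weight, e.g. {"contract": 3, "supply": 1}.
    """
    words = set(re.findall(r"\w+", text.lower())) - stop_words
    return sum(weight for term, weight in weighted_query.items()
               if term not in stop_words and term in words)

query = {"contract": 3, "supply": 1, "the": 5}
print(weighted_score("The supply contract of 2005", query))   # 4
```

Even though "the" carries a large weight in this query, it contributes nothing because it is in the stop-word list.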

Next, let's look at how the program works on a network. In fact, dtSearch offers no specific networking features. Still, it can be used on a network: for example, you can create an index and put it in a public (shared) folder. The program itself can be installed on each user's computer or also placed in a shared folder, with a separate shortcut created for each user using command-line parameters described in the help file supplied with the program. The program can also be installed across the network automatically using an MSI file, which takes the settings of each connected user into account.

Overall, it is a good program in the professional search engine category. It may earn a good rating, but winning users' trust and respect may not be easy for dtSearch because of certain factors (the interface is not entirely smooth, Russian-speaking users are left out, and there are no standout networking features). As for the search itself, the program had no problems with Russian text, nor with the advertised morphology or fuzzy search. The system found the necessary documents quite adequately, both with a simple one-word query and with a couple of paragraphs or a whole document used as the key phrase.

Official site:
Distribution size: 23 MB

Bloodhound (Ishcheika) Prof Deluxe

Judging by the program's Russian name (Ishcheika, "bloodhound"), you can guess that it supports the Russian language. That is already nice. The interface is somewhat unusual overall, but it looks very attractive. Convenience is another matter. It is a very debatable criterion, but a multi-window design is probably not the best choice (the query is entered in one window, the results are shown in another, and so on).

Bloodhound also uses indexes for fast search, but it indexes much more slowly than the other programs. This is very strange, especially since its query processing capabilities are very weak, so the index structure should not be complex. Most likely, this is due to unoptimized algorithms. The program was a clear outsider in both indexing and search speed: building the index took about six times longer than with dtSearch and ISYS. Indexing the 20 gigabytes of texts took Bloodhound 38 hours 46 minutes, and the resulting "search area" took up almost as much disk space as the original data: 19 gigabytes.

Bloodhound can serve as an alternative to the standard Windows search, and it is unlikely to be capable of more. That its primary task is simple file search is indicated not only by the small number of functions for analyzing query text and the advanced search by file attributes, but even by the results window, which provides direct links to the found files and to the folders containing them. The results window is not very informative, in the sense that you can read a found file in full only by opening it: there is no built-in file viewer. But an excerpt from the file where the searched word was found is displayed; overall, this display scheme is very reminiscent of Internet search engines.

As for query processing, there is no such thing as searching by a block of text: the most you can search for is a phrase, if only because there is no multi-line input field. The entered phrase is analyzed, though, and Bloodhound offers the standard set here: logical operators, wildcard (mask) search and exact-phrase search in quotes... not a lot. The program has some rudiments of morphological search, but they are so crude that they most likely get in the way of correct operation (during the tests, many bugs with incorrect use of morphology were noticed).

But the program lets you specify file attributes when searching (document date, file name, folder name), and the same search operators can be used in these queries. You can also search emails by specifying their fields (From, Subject, etc.).

So much for the search itself; what else is interesting about the program that, according to its official website, earned it so many awards? It is hard to say what is so special about it. Most likely, it is the attractive Bloodhound interface (in appearance, that is, not in usability).

Index operations are quite standard; a nice feature is the ability to update indexes on a schedule. Indexes can also be used over a network, which deserves more detail.

Despite its primitive queries, the program can be used to search for files, so using it on a network can be justified. Although this is a stretch: in a large network the priority, given the huge amount of information, is fast data search with complex queries, and this program clearly has problems with speed. That said, networking in Bloodhound is well thought out. A separate application, Bloodhound Server, is designed specifically for it. It works the same way as the desktop Bloodhound (they share one search engine), but for documents located on a central server or on shared resources of the corporate network. Bloodhound Server creates new indexes on shared resources or uses previously created ones. Any user of the corporate network can connect to the search server and use a web browser to access any document in the current index. You have to agree this scheme is extremely convenient: files on your own network can be searched the same way as information on the Internet through, say, Google.

Weighing all the advantages and disadvantages of this program, the conclusion suggests itself that its capabilities are most likely not enough for corporate networks (despite the well-organized networking), but for a home computer or even a home network it may well do. Although neither its speed nor its search capabilities inspire optimism...

Official website in Russian:
Distribution size: 6 MB

Google Desktop Search + GDS Enterprise

Of course, we could not ignore such a famous developer. The Google name already says a lot. People who have used the most powerful Internet search engine for years will, without a doubt, decide to install this particular search engine on their computer. Just think: Google on your home computer! However, without being swayed by a heavily promoted brand, let's try to assess the capabilities of Google's "desktop" search engine soberly and, above all, objectively.

The first thing that catches your eye is that the program has no shell of its own. Google Desktop Search runs in a browser window, so the entire interface of the desktop version was inherited from its older Internet sibling. Whether this is good or bad is debatable: some people like the minimalist design of this search engine, while others want a full-fledged application with all kinds of buttons and so on.

What catches your eye right after the design? The fact that Google Desktop Search starts indexing everything on the computer without asking! And most interestingly, Google Desktop Search itself does not let you choose which paths to index. You have to download a separate program, TweakGDS, which extends the Google Desktop settings somewhat, including specifying the locations to index. Although by the time you figure all this out, it will already have indexed a standard hard drive, so this setting is more likely needed when working with large amounts of data, which is very important for corporate networks (the Enterprise version). However, downloading TweakGDS does not guarantee that your problems will be solved: it needs the Microsoft .NET Framework and Microsoft Scripting Runtime to work. Well... the installation, like access to the settings, could have been made simpler. Then again, the developers' logic is understandable: why write something new when there is a ready-made search engine? Port it to the local computer, let the user "enjoy" it, and the famous name will turn "this" into yet another masterpiece. But let's end this lyrical digression and move on to search.

As for query analysis and presentation of results, everything here is exactly the same as in Google on the Internet: the same results layout and the same standard set of logical operators for queries. In general, Google Desktop Search, like the previous program, is intended only for finding files, and of course it has no built-in viewer for them. The number of file formats Google Desktop Search supports is quite sufficient, and it is also nice that it searches visited web pages, taking the data from the browser cache. Search and indexing speeds are quite acceptable, though only for home use: Google Desktop Search handled the impressive 20 gigabytes of texts in 8 hours 17 minutes. No system administrator would want to spend several days processing the information on a large enterprise's corporate network. On the plus side, the index it created was about the same size (4.5 GB) as that of another search engine tested in this review, SearchInform.

A big advantage (or disadvantage, you decide) of Google Desktop Search is its plugin support, which can change a lot for the better. On the other hand, connecting and configuring plugins complicates installing the search engine so much that you begin to wonder whether it is all worth it when you could install a normal, full-fledged program with everything already built in. After all, each feature requires installing a new plugin; even full support for archives needs a separate gadget. It is tempting that all these add-on modules are free. However, leaving the desktop version aside, configuring GDS Enterprise properly may be beyond you: not for nothing do Google specialists offer to set up their software on your network for a mere $10,000.

If you do go through the setup and installation (or pay $10,000 to a Google rapid response team), you will find that the complexity of installation is more than made up for by very flexible settings for corporate networks. An important aspect of using Google Desktop on a corporate network is support for group policies, which makes it possible to configure settings for each user.

To summarize, the most reasonable use for this program is a home or work computer. After all, for an ordinary computer, it’s enough just to install the program - it will do the rest itself (it won’t even ask you anything).

Google Desktop Search Enterprise, however, will be acceptable where flexible network policy configuration for the search engine is urgently needed, the ability to process search queries is of secondary importance, and you are prepared for the time (or money) spent on setting up the program to be the main cost.

Official site:
Distribution size including TweakGDS: 1.2 MB

Copernic Desktop Search


The program interface evokes extremely positive emotions - everything is done in accordance with generally accepted standards, nothing superfluous, in a word, a pleasant design. For a beginner, understanding the Copernic Desktop Search interface will be very easy. Although, it is somewhat confusing that the designers clearly created the program interface taking into account the fact that the program will work in the standard Windows XP theme. When using the classic theme, the program does not look so nice. But this is more a matter of taste.

On first launch, the program prompts you to create search indexes. It seemed somewhat odd that after we selected folders for indexing, the program offered no button such as "Start indexing", and indexing did not start automatically either; only later did we notice that Copernic waits to start indexing until the computer is idle. You will have to dig a little deeper into the program's options to configure everything properly. It should be noted that the options for automatic index creation are quite broad: a built-in scheduler and the ability to index while the computer is idle, in the background, at low priority. Indexing was not very fast, 10 hours 51 minutes, which is slower than the other search engines except Bloodhound; still, Copernic was about three and a half times faster than the iSleuthHound Technologies product.
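The indexing figures reported so far can be put side by side with a little arithmetic. The times and index sizes come from this review; ISYS and SearchInform are left out because their times are not given in this part, and 1 GB is assumed to be 1024 MB.

```python
# Figures from this review for the 20 GB test set:
# program -> (hours, minutes, index size in GB or None if not reported).
RESULTS = {
    "dtSearch Desktop": (6, 13, 7.9),
    "Bloodhound Prof Deluxe": (38, 46, 19.0),
    "Google Desktop Search": (8, 17, 4.5),
    "Copernic Desktop Search": (10, 51, None),
}
CORPUS_GB = 20

def summarize(results, corpus_gb=CORPUS_GB):
    """Return (name, minutes, MB/s, index-to-corpus ratio), fastest first."""
    rows = []
    for name, (hours, minutes, index_gb) in results.items():
        total_min = hours * 60 + minutes
        mb_per_s = corpus_gb * 1024 / (total_min * 60)   # assuming 1 GB = 1024 MB
        ratio = None if index_gb is None else index_gb / corpus_gb
        rows.append((name, total_min, round(mb_per_s, 2), ratio))
    return sorted(rows, key=lambda row: row[1])

for name, minutes, speed, ratio in summarize(RESULTS):
    size = "n/a" if ratio is None else f"{ratio:.0%} of corpus"
    print(f"{name:25} {minutes:5} min  {speed:5} MB/s  index {size}")
```

By these numbers, Bloodhound took about 6.2 times as long as dtSearch (2326 vs. 373 minutes), and Copernic was about 3.6 times faster than Bloodhound (651 vs. 2326 minutes).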

Now about the index structure. In general, there is nothing special about it. File types can be selected both broadly and in detail: first you choose what you want to index (Documents, Images, Videos, Music), and on another tab of the options window you can select specific file types by extension. You can also configure the index so that, for example, pictures smaller than 16x16 or sound files shorter than 10 seconds are not indexed. Besides files in folders, Copernic can index emails and address book contacts from Microsoft Outlook and Microsoft Outlook Express, as well as Internet Explorer Favorites and History.
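Filters like these boil down to a per-file decision made before indexing. A minimal sketch under stated assumptions: the category-to-extension table is hypothetical, "smaller than 16x16" is read as either side under 16 pixels, and image dimensions or audio length are supplied by the caller (reading them from the files is out of scope).

```python
import os

# Hypothetical category table and thresholds mirroring the options above.
CATEGORIES = {
    "Documents": {".doc", ".txt", ".html"},
    "Images": {".jpg", ".png", ".gif"},
    "Music": {".mp3", ".wav"},
}
MIN_IMAGE_SIDE = 16       # skip pictures smaller than 16x16
MIN_AUDIO_SECONDS = 10    # skip sound files shorter than 10 seconds

def should_index(path, enabled, width=None, height=None, seconds=None):
    """Decide whether a file goes into the index under the chosen filters."""
    ext = os.path.splitext(path)[1].lower()
    category = next((c for c, exts in CATEGORIES.items() if ext in exts), None)
    if category not in enabled:
        return False                      # unknown or disabled file type
    if category == "Images" and width is not None and height is not None:
        return width >= MIN_IMAGE_SIDE and height >= MIN_IMAGE_SIDE
    if category == "Music" and seconds is not None:
        return seconds >= MIN_AUDIO_SECONDS
    return True

print(should_index("icon.png", {"Images"}, width=8, height=8))   # False
```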

The search capabilities, however, are very weak. During the tests it even turned out that the program does not search the content of Russian-language txt and html documents, finding them only by title. The only thing the program offers to make search more effective is the standard set of logical operators, and even that was discovered by experiment, since it is not documented. By the way, the help is also lacking: it is only available online, which, you will agree, is very inconvenient, and there is not much of it there either. Apparently, the developers decided that a simple interface does not call for proper help. Continuing with search capabilities: despite the weak query analysis, the program offers an interesting search scheme. The user can select a file type (images, videos, music, etc.), enter a query and choose attributes specific to that type. For sound files these can be mp3 tag values (artist, album, date, etc.), for images their size (resolution), and so on; each type has its own settings. After a search by file type, the program shows a very informative list in the results window, and if your query also matches files of other types, you can open them by clicking a link.

Separately, it is worth mentioning the results display window. Below the list of found files, the contents of these files are displayed (a similar scheme is often used in email clients). True, text viewing can only be done in the native format, and there is no plain text display mode, which is not always convenient, since opening a document in this case takes more time. But, given that Copernic can search for images and music, it is possible to view these multimedia files.

The basic principles of operation of this program are described, now let's see what Copernic Desktop Search can offer us for working with the network... In principle, you can watch for a very long time, but you will hardly be able to see anything. In other words, this program was not intended to be network-based. Copernic Desktop Search is a home search engine exclusively.

The only sensible place for this program, then, is a home computer. There it will handle all simple queries of one or two words and find the necessary information, while search by file type, support for multimedia files, low-priority background indexing and a pleasant interface should help it win the trust of inexperienced users.

Official site:
Distribution size: 2.6 MB

ISYS Desktop

A very powerful program. In terms of features it comes close to SearchInform, the next search system on our list. Its installation file, however, is larger than 40 MB! It is hard to say what takes up so much space, since SearchInform, with similar functionality, is only about 15 MB.

The installation, or rather what precedes it, is also not very pleasant: you must register before you can even download the program; there is no way around it. Next, the interface. It looks very neat and nothing superfluous catches the eye, although these are the impressions of someone who has already got used to it. A beginner will find it hard to work out where everything is, what to click and where, finally, to search. Reading the help before starting work is highly recommended; it will save a lot of time and nerves. On top of all this, the program has no support for the Russian language at all, which is a serious drawback. The windows are not overloaded with controls, but the price of this is a multi-module design and extra windows. For example, search queries are entered in one program while indexes are managed in another, and the queries themselves are typed into separate pop-up windows. It is hard to say which is better, an overloaded interface or multiple windows everywhere; it is rather a matter of taste.

To create indexes, the program provides features that simplify configuring a new index. These include several ready-made templates: for the "My Documents" folder, "Mail", "Mail and Documents", "Specific Folder", "Folder with a selection of file types", and others. Such templates make creating the first indexes easier. The index management utility has a rather unfriendly interface that can be intimidating at first (admittedly a very subjective assessment), but on closer inspection it offers many useful options and is not difficult to use. ISYS Desktop can index data from a variety of sources and provides many flexible indexing settings. Additional indexing features include support for SQL, FTP, TRIM Context, WORLDOX 2002 and scripts. If you choose the "Folder with a selection of file types" template when creating an index, you can select the file types to index manually (by extension). The list of supported file types is huge, but you cannot add your own type (extension) to it. There is also an indexing scheduler. Indexing 20 gigabytes of data took ISYS Desktop 6 hours 13 minutes, a good result, and produced a 7.9 GB index.
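
A "Folder with a selection of file types" index comes down to an inverted index (word to list of files) restricted to the chosen extensions. The Python sketch below illustrates that idea only; it is not how ISYS Desktop works internally, the names `tokenize` and `build_index` are hypothetical, and to stay self-contained it reads file contents from an in-memory dict instead of the disk.

```python
import re
from collections import defaultdict
from pathlib import PurePosixPath


def tokenize(text: str) -> list[str]:
    # Lower-case word tokens; in Python 3, \w in a str pattern also matches Cyrillic letters.
    return re.findall(r"\w+", text.lower())


def build_index(files: dict[str, str], extensions: set[str]) -> dict[str, set[str]]:
    """Build an inverted index: word -> set of paths of files containing it.

    Only files whose extension is in `extensions` are indexed, mimicking
    the manual selection of file types (by extension) for a new index.
    """
    wanted = {ext.lower() for ext in extensions}
    index: dict[str, set[str]] = defaultdict(set)
    for path, text in files.items():
        if PurePosixPath(path).suffix.lower() not in wanted:
            continue  # this file type was not selected for indexing
        for word in tokenize(text):
            index[word].add(path)
    return dict(index)


files = {
    "docs/report.txt": "Quarterly sales report",
    "docs/notes.html": "<p>Sales meeting notes</p>",
    "music/track.mp3": "binary audio data",
}
index = build_index(files, {".txt", ".html"})
print(sorted(index["sales"]))  # ['docs/notes.html', 'docs/report.txt']
```

Because the .mp3 file was not among the selected types, none of its words reach the index. A real indexer would also strip HTML tags and extract text from binary formats such as Word or PDF.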

The program's search capabilities are quite good and go well beyond conventional support for logical operations. Its advanced features include synonyms and a sorting filter (by path, name and file creation date). The set of logical operators is somewhat wider than the standard one, and beyond them the program supports many other operators that can, in principle, replace some kinds of search; for example, partial-word search (matching part of a word) can be fully replaced by special operators. I was very surprised that the program has no morphological search. This is a serious omission, since morphological analysis greatly improves search quality. There is also no list of significant words, although there is an extensive list of insignificant (stop) words. The developers also advertise "approximate search" and "heuristic analysis" functions.
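
Logical operators, synonyms and a stop-word list can be combined over an inverted index (word to set of document IDs). The Python sketch below is a generic illustration, not ISYS's query language; the function names and the split of AND, OR and NOT into `must`, `should` and `must_not` lists are assumptions made for the example.

```python
def expand(term: str, synonyms: dict[str, list[str]]) -> set[str]:
    """A query term matches itself plus any synonyms listed for it."""
    return {term} | set(synonyms.get(term, []))


def postings(index: dict[str, set[str]], term: str, synonyms: dict[str, list[str]]) -> set[str]:
    """Documents containing the term or any of its synonyms."""
    docs: set[str] = set()
    for t in expand(term, synonyms):
        docs |= index.get(t, set())
    return docs


def boolean_search(index, must, should=(), must_not=(), synonyms=None, stop_words=frozenset()):
    """AND over `must`, OR over `should`, NOT over `must_not`; stop words are ignored."""
    synonyms = synonyms or {}
    must = [t for t in must if t not in stop_words]
    should = [t for t in should if t not in stop_words]
    if not must and not should:
        return set()  # only stop words (or nothing) left: no meaningful query
    result = set().union(*index.values())  # start from every indexed document
    for term in must:
        result &= postings(index, term, synonyms)
    if should:
        result &= set().union(*(postings(index, t, synonyms) for t in should))
    for term in must_not:
        result -= postings(index, term, synonyms)
    return result


index = {
    "car": {"a", "b"},
    "automobile": {"c"},
    "red": {"a", "c"},
    "blue": {"b"},
    "the": {"a", "b", "c"},
}
print(sorted(boolean_search(index, ["the", "car"], must_not=["blue"],
                            synonyms={"car": ["automobile"]}, stop_words={"the"})))
# ['a', 'c']
```

Here "the" is dropped as a stop word, the synonym "automobile" widens "car" to document c, and NOT "blue" removes document b.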

ISYS offers several types of search query, but the difference between them is purely visual: each type is simply a different window for entering the query, and none of these windows offers any search technique beyond those listed above.

The search results are very informative: a list of documents sorted by relevance, with a preview of the selected document below it. Unlike Copernic Desktop Search, the preview is available only as plain text; we could not display documents in their native format, whether Word, HTML or PDF, although this is not very critical. Found documents can be grouped by various criteria (by default, by relevance). You can also browse the results folder by folder, which is convenient when a search returns a very large number of documents.

The program is also well suited to a corporate network, since it offers good options for organizing network search. Network search is built around a public index containing data indexed from publicly available network resources.

Overall, the ISYS program deserves attention, or at least a closer look. It is a mature product with a huge number of functions (not everyone will need all of them, of course). Whether query processing will improve in future versions is unknown, but at the moment the program can be recommended for almost any use. Since it is still too heavy for home systems, its main place is on corporate networks.

Official site:
Distribution size: 40 MB

SearchInform

It is probably better not to start with the SearchInform interface. First, one detail of the installation process deserves mention: the program cannot be installed without an Internet connection, because before the first launch it requires (free) user registration and sends all the entered data to the server. Apparently the developers took this measure to fight piracy, but it does not make installation any easier.

The program interface follows all the generally accepted conventions, but at first glance it looks somewhat cumbersome. When you first use the program it seems too complicated, and it is sometimes hard to remember which menu or tab holds the option you need; with longer use, however, the interface no longer seems so daunting. The main thing is to read the help first.

Once you have a basic grasp of the interface, you can create an index. The process itself is very simple, and even by eye the indexing speed is noticeably higher than that of every other search engine in this review. The test figures show that SearchInform indexes almost twice as fast as dtSearch and ISYS: it processed the 20 gigabytes of test data in a record 3 hours 17 minutes. Its index was also the smallest, 4.4 GB, which is 100 megabytes less than that of Google Desktop Search.
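
How much faster SearchInform is can be checked directly from the indexing times reported in this review for the same 20 GB data set. The short Python calculation below is only an arithmetic check on those published figures; it converts each time to minutes, then to throughput and to a ratio against SearchInform's time.

```python
# Indexing times reported in this review for the same 20 GB test data.
DATA_GB = 20
RESULTS = {  # name -> (hours, minutes)
    "SearchInform 1.5.02": (3, 17),
    "dtSearch 7.0": (6, 3),
    "ISYS Desktop 7.0": (6, 13),
}


def to_minutes(hours: int, minutes: int) -> int:
    return hours * 60 + minutes


def throughput_gb_per_hour(gb: float, minutes: int) -> float:
    return gb * 60 / minutes


baseline = to_minutes(*RESULTS["SearchInform 1.5.02"])
for name, (h, m) in RESULTS.items():
    total = to_minutes(h, m)
    print(f"{name}: {total} min, "
          f"{throughput_gb_per_hour(DATA_GB, total):.1f} GB/h, "
          f"{total / baseline:.2f}x SearchInform's time")
```

The output is 197 min and 6.1 GB/h for SearchInform, 363 min, 3.3 GB/h and 1.84x for dtSearch, and 373 min, 3.2 GB/h and 1.89x for ISYS, so the other two programs need a little under twice as long.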

Besides ordinary files and folders, the program can index email, connect to and index databases (!) and other external sources (DMS, CRM); a dictionary for morphological search can be specified right at indexing time, and all file attributes can be indexed. After creating the index, when you try your first test search you may be somewhat confused: there are two types of search here, so which one do you need? As mentioned earlier, the main thing is to read the help, and then everything becomes clear. The program can perform two kinds of search: phrase search and search for documents similar in content to the query text.

A description of all the main query analysis functions was given above, so here we only list the search capabilities this program offers. Let's start with phrase search: morphological search, of course; citation (exact phrase) search; logical operations; partial-word search (matching at the beginning, end or middle of a word, or an exact match); mixed citation search (all query words must be present in the document, but not necessarily in the order entered); search with error correction; synonyms; "almost citation" search (the phrase is searched for as a citation, but other words may appear between the query words); and more. Some of these options have their own settings. You can also use a dictionary of unimportant (stop) words, for which the program already has a ready-made list, and a dictionary of priority search words (which, of course, you will have to fill in yourself).
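
Several of these phrase-search modes can be illustrated on a document split into word tokens. The Python sketch below shows the matching rules only as this review describes them, not SearchInform's code; the function names and the `max_gap` limit for "almost citation" search are hypothetical.

```python
def word_matches(word: str, fragment: str, mode: str = "exact") -> bool:
    """Partial-word matching: at the start, end or middle of a word, or the whole word."""
    word, fragment = word.lower(), fragment.lower()
    if mode == "prefix":
        return word.startswith(fragment)
    if mode == "suffix":
        return word.endswith(fragment)
    if mode == "infix":
        return fragment in word  # anywhere inside the word
    if mode == "exact":
        return word == fragment
    raise ValueError(f"unknown mode: {mode!r}")


def all_words_any_order(tokens: list[str], query: list[str]) -> bool:
    """'Mixed citation': every query word occurs; order does not matter."""
    return set(query) <= set(tokens)


def near_phrase(tokens: list[str], query: list[str], max_gap: int) -> bool:
    """'Almost citation': query words in order, with at most `max_gap` other
    words between consecutive query words (max_gap=0 means an exact phrase)."""
    if not query:
        return False

    def rest_matches(pos: int, k: int) -> bool:
        # query[k] was matched at tokens[pos]; try every allowed place for query[k + 1].
        if k == len(query) - 1:
            return True
        last = min(len(tokens) - 1, pos + max_gap + 1)
        return any(
            tokens[j] == query[k + 1] and rest_matches(j, k + 1)
            for j in range(pos + 1, last + 1)
        )

    return any(tok == query[0] and rest_matches(i, 0) for i, tok in enumerate(tokens))


doc = "a b b x c".split()
print(near_phrase(doc, ["a", "b", "c"], max_gap=1))  # True
print(near_phrase(doc, ["a", "b", "c"], max_gap=0))  # False
```

`near_phrase` tries every allowed position for the next word rather than only the nearest one: in the example, taking the first "b" leaves "c" too far away, while the second "b" succeeds.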

That, briefly, covers the main features of phrase search.

Now let's look at the program's distinctive feature: the search for similar documents. The developers stress that this is by no means ordinary text search but precisely a "similarity search", and it is described this way everywhere; whatever you call it, what matters is how it works. A quick search on the Internet shows that so-called similarity search is a new development in text analysis: it finds texts that are similar in meaning. The pleasant surprise was that, in our test queries, theory matched practice quite well! The program really does find documents with similar content and lists them sorted by percentage of similarity.
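
The review does not say how SearchInform measures similarity. A common textbook way to rank documents by a "percentage of similarity" is the cosine similarity between word-frequency vectors, and the Python sketch below uses that technique purely as a stand-in illustration, not as the program's algorithm; the function names are hypothetical.

```python
import math
import re
from collections import Counter


def tf_vector(text: str) -> Counter:
    """Term-frequency vector: how many times each lower-cased word occurs."""
    return Counter(re.findall(r"\w+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine of the angle between two sparse vectors (0.0 if either is empty)."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def similar_documents(query_text: str, docs: dict[str, str]) -> list[tuple[str, float]]:
    """Return (name, similarity %) for documents sharing words with the query, most similar first."""
    q = tf_vector(query_text)
    scored = [(name, round(100 * cosine(q, tf_vector(text)), 1)) for name, text in docs.items()]
    return sorted((s for s in scored if s[1] > 0), key=lambda s: s[1], reverse=True)


docs = {
    "a.txt": "search engine indexing speed",
    "b.txt": "indexing speed of a search engine",
    "c.txt": "cooking pasta recipe",
}
print(similar_documents("search engine indexing speed", docs))
# [('a.txt', 100.0), ('b.txt', 81.6)]
```

Plain word counts capture shared vocabulary rather than meaning; practical systems usually add weighting such as TF-IDF, morphology and stop-word removal on top of this basic measure.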

Next, let's look at what SearchInform (specifically its corporate edition, SearchInform Corporate) offers for a corporate network. It consists of two applications: a server part and a client part. The server part processes the specified indexes on its own, and users search them according to the access rights assigned to them. Users can be set up automatically from Windows accounts (technically, SearchInform uses Windows NTFS authentication) or manually, by adding each user separately. Each user can be allowed or denied access to specific indexes, and users can be combined into groups. Overall, SearchInform's network settings outdo Google's in flexibility and Bloodhound Server's in convenience and simplicity.
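
Per-index permissions with users and groups can be modeled as allow and deny lists keyed by index. The Python sketch below is a hypothetical model, not SearchInform's actual permission system; in particular, the rule that an explicit deny overrides any allow is an assumption.

```python
from dataclasses import dataclass, field


@dataclass
class AccessPolicy:
    """Hypothetical per-index access model: users, groups, allow and deny lists."""
    groups_of: dict[str, set[str]] = field(default_factory=dict)  # user -> groups
    allowed: dict[str, set[str]] = field(default_factory=dict)    # index -> users/groups
    denied: dict[str, set[str]] = field(default_factory=dict)     # index -> users/groups

    def principals(self, user: str) -> set[str]:
        """The user's own name plus every group the user belongs to."""
        return {user} | self.groups_of.get(user, set())

    def can_search(self, user: str, index: str) -> bool:
        who = self.principals(user)
        if who & self.denied.get(index, set()):
            return False  # assumption: an explicit deny overrides any allow
        return bool(who & self.allowed.get(index, set()))


policy = AccessPolicy(
    groups_of={"alice": {"sales"}, "bob": {"sales"}},
    allowed={"contracts": {"sales"}},
    denied={"contracts": {"bob"}},
)
print(policy.can_search("alice", "contracts"), policy.can_search("bob", "contracts"))
# True False
```

Granting access to a group rather than to individual users keeps the lists short, while the deny list handles exceptions such as "bob" above.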

Official site:
Distribution size: 14.7 MB

Comparison of indexing speeds

Search system	Indexing time (20 GB of data)	Index size
Bloodhound Prof Deluxe 4.5	38 hours 46 minutes	19 GB
ISYS Desktop 7.0	6 hours 13 minutes	7.9 GB
dtSearch 7.0	6 hours 3 minutes	8.6 GB
Google Desktop Search Enterprise	8 hours 17 minutes	4.5 GB
Copernic Desktop Search *	10 hours 51 minutes	7 GB
SearchInform 1.5.02	3 hours 17 minutes	4.4 GB

* Most of the .html and .txt documents containing Russian text were indexed but could be found only by file name.

Summary

All programs are worthy of attention.

Based on our tests and a careful examination of each program in the review, some conclusions can be drawn. Google Desktop Search and Copernic Desktop Search are well suited to inexperienced users as home search systems. They handle simple queries well, do not overload the user with settings and, moreover, are completely free. Google's attempt to enter the corporate search market is not yet very successful: to work properly, the program needs additional modules, and it is far from easy to configure. So it is no accident that both Copernic and Google call their products "Desktop Search": their niche is desktop search.

The more powerful solutions, dtSearch, ISYS and SearchInform, do not miss out either and offer their own desktop versions, though for a reasonable fee rather than for free like the Google and Copernic programs. Power, speed and functionality naturally come at a price. But the main focus of the dtSearch, ISYS and SearchInform developers is, of course, the corporate sector. Network features, functionality, and indexing and search speed are what set these products apart from their competitors. Based on the test results, the favorite is SearchInform: it can search for similar documents, has the fastest indexing and search, and offers a good set of functions.

05/10/2016

FileSeek is a handy utility that lets users easily find the files they need on their hard drive. This multifunctional program can find data using various filters. Setting up a search is quick and takes little effort, and users can adjust the search parameters to suit their needs. FileSeek is very fast and can scan hundreds of files. Each result shows the document name, file size, line number and date of last modification. You can also create, manage and sync multiple profiles for different projects. It is possible to switch to another language...
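
A result list that shows the document name, line number, file size and modification date can be approximated with a few lines of Python. This is a generic sketch, not FileSeek's implementation; `Hit`, `matching_lines` and `scan_file` are hypothetical names, and files are assumed to be UTF-8 text.

```python
import os
from dataclasses import dataclass
from datetime import datetime


@dataclass
class Hit:
    path: str
    line_no: int        # 1-based line number
    line: str
    size: int           # file size in bytes
    modified: datetime  # last modification time


def matching_lines(text: str, term: str) -> list[tuple[int, str]]:
    """Case-insensitive substring search; returns (line number, line) pairs."""
    needle = term.lower()
    return [(n, line) for n, line in enumerate(text.splitlines(), start=1)
            if needle in line.lower()]


def scan_file(path: str, term: str) -> list[Hit]:
    """Search one text file and attach its size and modification date to each hit."""
    info = os.stat(path)
    with open(path, encoding="utf-8", errors="replace") as f:
        text = f.read()
    modified = datetime.fromtimestamp(info.st_mtime)
    return [Hit(path, n, line, info.st_size, modified) for n, line in matching_lines(text, term)]


print(matching_lines("alpha\nBeta gamma\ndelta beta", "beta"))
# [(2, 'Beta gamma'), (3, 'delta beta')]
```

Reading with `errors="replace"` keeps the scan going when a file is not valid UTF-8, at the cost of garbling the affected characters in the reported line.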

30/03/2016

Copernic Desktop Search is a convenient application for finding many kinds of information. Users of any level can use it to find email messages, attachments and other documents. Its simple interface helps find files in formats such as doc, docx, pdf, xlsx and others, as well as music, graphics, images and video. The application's advantages include its small size and light load on the computer: it makes modest use of processor time, disk space and RAM. Copernic Desktop Search (CDS) offers a variety of filters.

21/10/2015

NeoSearch is one of the most convenient applications for finding the data a user needs on a computer. It has a convenient, clear and stylish interface with simple functions that even a beginner can easily handle. After installation, the program starts indexing files: neoSearch checks the status of all files currently on the computer. The whole process takes very little time, and a progress bar shows how it is going. Afterwards, the search results are displayed on the screen simultaneously in the form of four documents, which are max...

01/12/2014

Wise JetSearch is a program for searching directly for files or folders on a computer's local drives or on removable storage media. It is a good replacement for the standard file search built into the operating system. It works with NTFS and FAT drives; files are searched by a user-specified pattern, name or other parameters. Working with Wise JetSearch is quite simple: enter keywords, select a drive and start the search...
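
Searching a drive by a name pattern can be sketched with Python's standard `fnmatch` and `os.walk`. This illustrates wildcard file search in general, not Wise JetSearch's own code; case-insensitive matching (as on Windows) and the helper names `name_matches` and `find_files` are assumptions.

```python
import fnmatch
import os
from collections.abc import Iterator


def name_matches(name: str, pattern: str) -> bool:
    """Case-insensitive wildcard match: '*' = any run of characters, '?' = exactly one."""
    return fnmatch.fnmatchcase(name.lower(), pattern.lower())


def find_files(root: str, pattern: str) -> Iterator[str]:
    """Walk `root` (for example, the root of the selected drive) and yield matching file paths."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if name_matches(name, pattern):
                yield os.path.join(dirpath, name)


print(name_matches("Report_01.DOCX", "report_??.doc*"))  # True
print(name_matches("report_1.doc", "report_??.doc*"))    # False
```

Typical use is `for path in find_files(root, "*.pdf"): print(path)`. Walking a whole drive this way is slow; dedicated tools on NTFS can read file-system metadata directly instead of visiting every folder.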

29/09/2014

FileSearchy is designed to simplify finding files on your computer. Unlike other search programs, it searches by name in real time, finding the files you need instantly. It can also search the contents of files such as doc and pdf and highlights the text it finds. The program supports tabs, so several searches can be run in separate tabs. It supports search by date, file size and registry entries. It is possible to search for multiple lines as well as exclude lines that should not be...

05/09/2014

SoftPerfect Network Search Engine (NSE) is a program for quickly searching shared files on a local network. It indexes and organizes files so that, with the right query, any user with access to the network can find the desired file in a few seconds. The network administrator has plenty of options, since the program offers a wide range of useful functions, for example selective indexing of files and folders. Installing and configuring the program does not take long, since it is not complicated: it is enough to copy a few files to the folder with...