INTERVIEW ON THE PRICE OF BUSINESS SHOW, MEDIA PARTNER OF THIS SITE.
Recently Kevin Price, Host of the nationally syndicated Price of Business Show, interviewed Elizabeth Thede.
Today, I want to reveal some secrets you need to know about text search engines. But first, I need to start with some background on how a text search engine works. Imagine your data.
You may picture your emails as they appear in Microsoft Outlook, your PDFs as they look in Adobe Reader, and your “Office” files as they appear in Microsoft Word, Excel, Access, PowerPoint and OneNote. And most of these applications let you do basic searches for keywords like “fall” or “autumn.”
But for a search engine like dtSearch to instantly search across terabytes of mixed format data, it cannot pull up each file individually in its associated application. Instead, it has to go to the source and parse the binary format data. Now if you looked at the binary format of most of these file types, you’d see a blur of binary codes. In fact, in many of these binary formats, it is very difficult to even visually pick out the text. The job of a search engine is to fully parse that data.
The first step in parsing data is to figure out the correct data type, as the specifications for parsing each particular file format are very different. Even going from, say, a .doc Microsoft Word document to a .docx Microsoft Word document means working with a completely different file format specification. And this brings us to the first secret you can expect a search engine like dtSearch to uncover.
Secret No. 1. You cannot hide a file from a search engine by giving it a phony extension. So, even if you name your PDF files with PowerPoint extensions and your email files with OneNote extensions, a search engine can figure out the actual underlying file type and parse it accordingly.
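To make the idea concrete, here is a minimal Python sketch of how file type detection can work from the leading bytes of a file rather than its name. It is only an illustration and not how dtSearch itself is implemented; the `SIGNATURES` table and the `sniff_type` function are hypothetical names, and a real search engine recognizes far more formats and looks deeper than the first few bytes.

```python
# Hypothetical signature table: well-known leading bytes for a few formats.
SIGNATURES = [
    (b"%PDF-", "pdf"),
    (b"PK\x03\x04", "zip"),   # also the container for .docx, .xlsx, .pptx
    (b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1", "ole2"),  # legacy .doc, .xls, .ppt
    (b"Rar!\x1a\x07", "rar"),
]


def sniff_type(data: bytes) -> str:
    """Guess a file type from its leading bytes, ignoring the file name."""
    for magic, label in SIGNATURES:
        if data.startswith(magic):
            return label
    return "unknown"


# A PDF saved as "slides.pptx" still begins with the PDF signature.
mislabeled = {"name": "slides.pptx", "data": b"%PDF-1.7\n..."}
detected = sniff_type(mislabeled["data"])
```

In this sketch the file name is never consulted, which is why renaming the file changes nothing about what gets detected.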
Secret No. 2. Just like you can’t hide a file from a search engine by giving it a phony extension, you also can’t hide a file from a search engine by nesting it in another file. A search engine like dtSearch can see right through files compressed in a ZIP or RAR archive attached, say, to an email, or even a file nested inside another file, like an Excel spreadsheet embedded inside a Microsoft Word file.
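As a rough illustration of looking inside nested containers, the Python sketch below walks a ZIP archive and descends into any ZIP found inside it. It uses only the standard library, so it covers ZIP but not RAR or email attachments, and it is not a description of dtSearch internals; `make_zip` and `walk_archive` are hypothetical helper names.

```python
import io
import zipfile


def make_zip(files: dict) -> bytes:
    """Build an in-memory ZIP from a {name: bytes} mapping (demo helper)."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        for name, content in files.items():
            zf.writestr(name, content)
    return buf.getvalue()


def walk_archive(data: bytes, prefix: str = ""):
    """Yield (path, bytes) for every file, descending into nested ZIPs."""
    with zipfile.ZipFile(io.BytesIO(data)) as zf:
        for info in zf.infolist():
            if info.is_dir():
                continue
            payload = zf.read(info)
            path = prefix + info.filename
            if zipfile.is_zipfile(io.BytesIO(payload)):
                # The member is itself an archive: recurse into it.
                yield from walk_archive(payload, path + "!/")
            else:
                yield path, payload


inner = make_zip({"secret.txt": b"hidden"})
outer = make_zip({"notes.txt": b"plain", "inner.zip": inner})
found = dict(walk_archive(outer))
```

Because a .docx or .xlsx file is itself a ZIP container, the same recursion would also open an Office document placed inside an archive and list its internal parts.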
Secret No. 3. You also can’t hide data from a search engine by putting it outside the main text of a file or email. Here’s what I mean by that. Microsoft Office files, PDFs and emails can sometimes include obscure metadata. You can find that data when you pull up the file in its associated application, but it can take some clicking around. And if you don’t go looking for it, you might not even realize it is there. However, such metadata is stored “openly” in the binary format of documents, emails and the like and accordingly is an “open book” to a search engine.
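For one concrete case, a .docx file is a ZIP container that normally keeps its core metadata (author, last editor and so on) in a part named docProps/core.xml. The Python sketch below reads that part directly. It is a simplified illustration under that assumption, with a hypothetical function name `core_properties`, and the sample document is a stripped-down stand-in built for the demo rather than a complete Word file.

```python
import io
import zipfile
import xml.etree.ElementTree as ET


def core_properties(docx_bytes: bytes) -> dict:
    """Return the core metadata fields stored in a .docx container."""
    with zipfile.ZipFile(io.BytesIO(docx_bytes)) as zf:
        root = ET.fromstring(zf.read("docProps/core.xml"))
    # Strip the XML namespace from each tag: "{uri}creator" -> "creator".
    return {child.tag.split("}")[-1]: (child.text or "") for child in root}


# Stand-in for a real document: only the metadata part is present.
CORE_XML = (
    '<cp:coreProperties '
    'xmlns:cp="http://schemas.openxmlformats.org/package/2006/metadata/core-properties" '
    'xmlns:dc="http://purl.org/dc/elements/1.1/">'
    "<dc:creator>Jane Doe</dc:creator>"
    "<cp:lastModifiedBy>John Roe</cp:lastModifiedBy>"
    "</cp:coreProperties>"
)
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("docProps/core.xml", CORE_XML)

props = core_properties(buf.getvalue())
```

Nothing here opens Microsoft Word or clicks through a properties dialog; the fields are simply read out of the stored file.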
Secret No. 4. White on white or black on black text may be hidden in an ordinary associated application view, but it is just as apparent as black on white text to a search engine. So if you are thinking of hiding data that way, it won’t work against a search engine.
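The reason is that text extraction reads the stored characters and does not depend on how they would be painted on screen. The Python sketch below shows this with HTML as a stand-in format, since the standard library can parse it directly; the class and function names are hypothetical, and Office and PDF files would need their own parsers.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect every run of text, regardless of styling attributes."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        text = data.strip()
        if text:
            self.chunks.append(text)


def extract_text(html: str) -> str:
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


page = (
    "<p>Quarterly report</p>"
    '<p style="color:#fff;background:#fff">secret merger</p>'
)
text = extract_text(page)
```

The second paragraph would be invisible in a browser, yet it lands in the extracted text exactly like the first, because the color is just an attribute the extractor never evaluates.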
Secret No. 5. Each binary file has its own hash value, which works as a practically unique fingerprint, and even that is searchable by a search engine like dtSearch. A hash value is generated by feeding every byte of a binary file into a hash-generating algorithm. If you edit even one character of a document, the result is a completely different hash value, while two byte-for-byte identical copies of a file produce the same one. dtSearch can not only generate a hash value for each file in a dataset, but also search the dataset for specific hash values.
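Here is a small Python sketch of both halves of that idea: computing a hash for each file and then looking up files by a known hash. SHA-256 is an assumption chosen for the example, as the text does not say which algorithm dtSearch uses, and `file_hash` and `find_by_hash` are hypothetical names.

```python
import hashlib


def file_hash(data: bytes) -> str:
    """Hash every byte of a file's contents (SHA-256 assumed here)."""
    return hashlib.sha256(data).hexdigest()


def find_by_hash(dataset: dict, wanted: str) -> list:
    """Return the names of all files whose contents hash to `wanted`."""
    return [name for name, data in dataset.items() if file_hash(data) == wanted]


dataset = {
    "a.txt": b"fall report",
    "copy_of_a.txt": b"fall report",   # identical bytes, different name
    "edited.txt": b"Fall report",      # one character changed
}
matches = find_by_hash(dataset, file_hash(b"fall report"))
```

The lookup finds both identical copies whatever they are named, while the version with a single changed character no longer matches.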
Now some general background on dtSearch and how you can get a copy yourself to try. The company has enterprise and developer products that instantly search through terabytes of Office files, emails, databases and web data. All products offer over 25 different search options, and all can go deep into data to find information like credit card numbers.
Because dtSearch can search through terabytes, large enterprises like Fortune 100 companies and federal, state and international government agencies are dtSearch customers. But anyone can download a fully functional 30-day evaluation version of dtSearch Desktop at dtSearch.com to instantly search terabytes of their own data, and see for themselves what secrets dtSearch can find.
LISTEN TO THE INTERVIEW IN ITS ENTIRETY HERE:
The Price of Business is one of the longest running shows of its kind in the country and is in markets coast to coast. The Host, Kevin Price, is a multi-award winning author, broadcast journalist, and syndicated columnist. Learn more about the show and its digital partners at www.PriceofBusiness.com (scroll down to the bottom of the page).