Why Is PubMed a Good Database
J Med Libr Assoc. 2007 Oct; 95(4): 442–445.
Comparing test searches in PubMed and Google Scholar
Received 2006 Oct; Accepted 2007 May.
INTRODUCTION
Google Scholar has been met with both enthusiasm and criticism since its introduction in 2004. This search engine provides a simple way to access "peer-reviewed papers, theses, books, abstracts, and articles from academic publishers' sites, professional societies, preprint repositories, universities and other scholarly organizations" [1]. An obvious strength of Google Scholar is its intuitive interface, as the main search engine interface consists of a simple query box. In contrast, databases such as PubMed use search interfaces that offer a greater variety of advanced features. These additional features, while powerful, often add complexity that may require a substantial investment of time to master. It has been observed that Google Scholar may allow searchers to "find some resources they can use rather than be frustrated by a database's search screen" [2]. Some even feel that "Google Scholar's simplicity may eventually consume PubMed" [3].
Along with ease of use, Google Scholar carries the familiar "Google" brand name. As Kennedy and Price so aptly stated, "College students AND professors might not know that library databases exist, but they sure know Google" [4]. The familiarity of Google may allow librarians and educators to ease students into the scholarly searching process by starting with Google Scholar and eventually moving to more complex systems. Felter noted that "as researchers work with Google Scholar and reach limitations of searching capabilities and options, they may become more receptive to other products" [5].
Google Scholar is also thought to provide increased access to gray literature [2], as it retrieves more than journal articles and includes preprint archives, conference proceedings, and institutional repositories [6]. Google Scholar also includes links to the online collections of some academic libraries. Including these access points in Google Scholar retrieval sets may ultimately help more users reach more of their own institution's subscriptions [7].
While its advantages are substantial, Google Scholar is not without flaws. The shortcomings of the system and its search interface have been well documented in the literature and include lack of reliable advanced search functions, lack of controlled vocabulary, and issues regarding scope of coverage and currency. Table 1 summarizes some of the reported criticisms of Google Scholar.
Table 1 Criticisms of Google Scholar
Vine found that while Google Scholar pulls in data from PubMed, many PubMed records are missing [20], and that Google Scholar also lacks features available in MEDLINE [12]. Others have noted that Google Scholar should not be the first or sole choice when searching for patient care information, clinical trials, or literature reviews [23,24]. Thorough review and testing of Google Scholar, similar to that used to evaluate licensed resources, is necessary to better understand its strengths and limitations. As Jacso states, "professional searchers must do sample test searches and correctly interpret the results to corroborate claims and get factual information about databases" [18]. This paper compares and contrasts a variety of test searches in PubMed and Google Scholar to gain a better understanding of Google Scholar's searching capabilities.
METHODOLOGY
Ten searches were performed in PubMed using a variety of available search features. The searches were repeated in Google Scholar to approximate a user's approach to those same topics in that search engine. The searches, performed between August and September 2006, were by topic, author, title, journal name, and/or combinations of those fields (Appendix online). Topics included iron-deficiency anemia, bupropion for smoking cessation, and articles by specific authors in specific journals. The topics selected were loosely based on questions received during reference transactions or were previously developed for use during instruction.
For each search, the citations received via Google Scholar and PubMed were examined to determine a variety of characteristics including format, date, Medical Subject Headings (MeSH) where appropriate, uniqueness, duplications, and full-text availability from the author's institution.
Most searches were narrowed by date to produce sets of a reasonable size to allow comparison of unique items retrieved by each system. The search results were analyzed to determine possible reasons for the retrieval of unique items in each resource and to gather information on the general features of the Google Scholar results.
RESULTS
In eight of the ten searches, Google Scholar returned larger retrieval sets than PubMed (Table 2). Table 3 illustrates the characteristics of the items retrieved by Google Scholar, and Table 4 provides information on PubMed retrieval sets. Most items retrieved by Google Scholar were journal articles (Table 3). Items in other formats included: 9 books, 11 book reviews, 2 Web pages, 1 subject index listing, 1 thesis, 1 newsletter item, 1 bibliography, 4 author replies, 1 annual meeting abstract, and 1 draft document. These results yielded few gray literature items.
Table 2 Number of retrieved items
Table 3 Characteristics of Google Scholar results
Table 4 Characteristics of PubMed results
The main title link in Google Scholar citations was used to determine if full text was found. Full text was available in 46.96% (116/247) of the total citations retrieved. In most cases, it was assumed that full-text access was based on the institutional subscriptions available to the author of this study. Some items retrieved might have been freely available. In 22.67% (56/247) of the results, the Google Scholar citation was simply a link out to a PubMed record. As shown in Table 4, nearly half (48.98%; 72/147) of PubMed citations provided full-text access through the author's institution.
The unique items retrieved by each interface were examined to determine why they were missed by the other system. Across all searches, Google Scholar retrieved a total of 247 citations, 125 (50.61%) of which were unique to Google Scholar. Analysis revealed the following characteristics:
- Thirty-two items (12.96%) retrieved by Google Scholar were formats other than journal articles.
- Some unique Google Scholar items (10 items, 4.05%) appeared in journals not indexed by PubMed.
- Google Scholar covered a wider date range and returned 4 items (1.62%) older than 1950 that were not in PubMed.
- Google Scholar retrieved items based on its ability to search the full text of many articles rather than solely on citation data.
PubMed retrieved a total of 147 citations across all searches, and, of these, 46 (31.29%) were unique.
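The percentages reported in this section can be rechecked from the raw counts given in the text. The following is a minimal sketch of that arithmetic; the helper name `pct` and the dictionary labels are illustrative only and are not part of the study.

```python
def pct(part, whole):
    """Return part/whole as a percentage, rounded to two decimals."""
    return round(100 * part / whole, 2)

GS_TOTAL = 247  # citations retrieved by Google Scholar across all searches
PM_TOTAL = 147  # citations retrieved by PubMed across all searches

# (numerator, denominator) pairs as reported in the text
reported = {
    "Google Scholar: full text available": (116, GS_TOTAL),
    "Google Scholar: link out to PubMed record": (56, GS_TOTAL),
    "Google Scholar: unique items": (125, GS_TOTAL),
    "Google Scholar: formats other than journal articles": (32, GS_TOTAL),
    "Google Scholar: journals not indexed by PubMed": (10, GS_TOTAL),
    "Google Scholar: items older than 1950": (4, GS_TOTAL),
    "PubMed: full text available": (72, PM_TOTAL),
    "PubMed: unique items": (46, PM_TOTAL),
}

for label, (part, whole) in reported.items():
    print(f"{label}: {part}/{whole} = {pct(part, whole):.2f}%")

# Counts of the non-article formats listed in the text (books, book reviews,
# Web pages, and so on); they should add up to the thirty-two items reported.
other_formats = [9, 11, 2, 1, 1, 1, 1, 4, 1, 1]
print("Items in other formats:", sum(other_formats))
```

Running the sketch reproduces each reported figure to two decimal places, which suggests the percentages were computed against the per-system totals (247 and 147) rather than against the counts of unique items.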
DISCUSSION
Assumptions about search engine performance based purely on retrieval quantities can be misleading without closer investigation of the results. For example, Table 2 shows that many of the searches returned similar numbers of citations. In search #1 (dietary supplements as a treatment for iron deficiency anemia), PubMed returned twenty-five citations, while Google Scholar returned twenty-six citations. However, only four citations were common to both systems. In search #2 (Mobius syndrome), Google Scholar returned eleven citations and PubMed returned ten, but only two citations were retrieved by both systems.
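To make the point about similar totals and small overlap concrete, the counts reported for searches #1 and #2 can be turned into unique-item counts and a simple overlap ratio. This is only a sketch: the function name `overlap_summary` is hypothetical, and the Jaccard index (shared items divided by all distinct items) is an added illustration that the study itself does not report.

```python
def overlap_summary(n_pubmed, n_scholar, n_common):
    """Summarize overlap between two retrieval sets from their sizes alone."""
    unique_pubmed = n_pubmed - n_common
    unique_scholar = n_scholar - n_common
    distinct_total = n_pubmed + n_scholar - n_common  # size of the union
    return {
        "unique_pubmed": unique_pubmed,
        "unique_scholar": unique_scholar,
        "distinct_total": distinct_total,
        "jaccard": n_common / distinct_total,
    }

# Counts taken from the text: (PubMed, Google Scholar, common to both)
searches = {
    "search #1 (iron deficiency anemia)": (25, 26, 4),
    "search #2 (Mobius syndrome)": (10, 11, 2),
}

for name, counts in searches.items():
    summary = overlap_summary(*counts)
    print(
        f"{name}: {summary['unique_pubmed']} unique to PubMed, "
        f"{summary['unique_scholar']} unique to Google Scholar, "
        f"overlap ratio {summary['jaccard']:.2f}"
    )
```

Although the two systems returned nearly equal numbers of citations in both searches, most of the items in each set were unique to one system, which is why the totals alone say little about how similar the results are.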
Terminology was observed to be a major factor affecting retrieval and the ability of both systems to return unique items. Some unique items retrieved by Google Scholar were off topic. These "false hits" appear to be related to Google Scholar's full-text searching along with a lack of controlled vocabulary. For example, the purpose of search #7 was to find articles on the topic of "wine" that appeared in the New England Journal of Medicine. Google Scholar retrieved eight items where the word "wine" appeared in the full text but was not the main topic of the article. In one case, it retrieved an article in which the authors acknowledge a colleague with the surname Wine. Google Scholar also returned items that contained the search terminology but did not match the intention of the search. In the search for information about dietary supplements in the treatment of iron deficiency (search #1), Google Scholar returned some citations about high iron stores rather than deficiency (Table 5 online). Google Scholar matches a word or sequence of letters rather than the concept or meaning.
The complete citations for all unique items retrieved by PubMed were examined. One possible explanation for why Google Scholar failed to retrieve the same items was that many were indexed under the appropriate MeSH term, although the search phrase might not have appeared in the title or abstract. For example, search #9 was designed to retrieve articles by Visek about the topic of ammonia. While ammonia was not searched specifically as a MeSH term, PubMed automatically mapped it to MeSH. Of the unique citations retrieved by PubMed, some were indexed under ammonia although this term did not appear in the citation (Table 5 online). While Google Scholar offers the ability to use a tilde (~) to retrieve alternative terminology, this ability does not provide the control that subject headings do.
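The contrast between matching words in full text and matching assigned subject headings can be sketched with a toy example. Everything here is hypothetical: the three records, their text, and their subject labels are invented for illustration and are not drawn from the study's results, and the two functions are simplified stand-ins, not how Google Scholar or PubMed actually works.

```python
# Hypothetical records: "text" stands in for searchable full text,
# "subjects" for headings assigned by an indexer.
records = [
    {"id": "A",
     "text": "We thank Dr. Wine for helpful comments on the manuscript.",
     "subjects": {"Hypertension"}},
    {"id": "B",
     "text": "Moderate alcohol intake and coronary outcomes.",
     "subjects": {"Wine", "Coronary Disease"}},
    {"id": "C",
     "text": "Red wine consumption and mortality.",
     "subjects": {"Wine"}},
]

def keyword_search(items, term):
    """Return ids of records whose text contains the term as a letter sequence."""
    needle = term.lower()
    return [r["id"] for r in items if needle in r["text"].lower()]

def subject_search(items, heading):
    """Return ids of records indexed under the given subject heading."""
    wanted = heading.lower()
    return [r["id"] for r in items
            if wanted in {s.lower() for s in r["subjects"]}]

print("Keyword match:", keyword_search(records, "wine"))
print("Subject match:", subject_search(records, "Wine"))
```

In this toy data, the keyword search picks up record A, where "Wine" is only a surname in an acknowledgment, and misses record B, which is about the topic but never uses the word. The subject search does the reverse. This mirrors the two effects described above: full-text matching produces off-topic hits, while subject indexing retrieves relevant items whose citations lack the search term.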
CONCLUSION
Performing a direct and exact comparison between searches in Google Scholar and PubMed is not possible as the systems function in very different manners. For example, PubMed searches a well-defined set of journals, while Google Scholar includes resources beyond journals and the exact scope of coverage is not extensively described. Because the systems are not searching identical data, the results are often different.
Although these two systems are difficult to compare, it is still important to explore the differences between them. Librarians should understand the strengths and weaknesses of Google Scholar and be prepared to explain them to their users [14]. It may also be wise to consider including Google Scholar in bibliographic instructional sessions and to convey how it compares to other search interfaces [11]. For example, Google Scholar does not offer the number and extent of special searching and limiting features available in PubMed. However, Google Scholar provides some advantages in that it is an easy place to begin a search to find an initial retrieval of possibly worthwhile articles. It also offers searchers the ability to find citations to older items that they would miss if they use only PubMed. Additionally, Google Scholar has the potential to provide access to the gray literature. This increased access to a part of the biomedical literature, which can be difficult to search, may have implications for the public health field [25].
One of the most advantageous features of searching PubMed is the ability to utilize the MeSH vocabulary, as Google Scholar does not currently implement controlled vocabulary searching mechanisms. MeSH provides a powerful method of narrowing results and homing in on what the searcher needs. PubMed also offers substantially more features that allow searchers to narrow their retrieval to citations from clearly identified sources, as detailed in NLM's List of Journals Indexed for MEDLINE and List of Serials Indexed for Online Users [26]. The problem faced today by searchers is not a lack of information but rather an overload of information. For a researcher conducting human studies, writing a dissertation, finding information pertinent to patient care, or conducting an in-depth literature review, Google Scholar does not appear to be a replacement for PubMed, though it may serve effectively as an adjunct resource to complement databases with more fully developed searching features. It is important to note that both PubMed and Google Scholar are often upgraded with new features or with intended improvement of existing functions. It may be worthwhile to repeat this study in one or two years to determine if further refinements have improved their performance.
Acknowledgments
The author thanks the following individuals who offered invaluable advice and support: Pauline Cochrane, Robin Beck, Sandra De Groote, AHIP, Victoria Pifalo, and Ann Carol Weller.
Footnotes
Supplemental Table 5 and an appendix are available with the online version of this journal.
REFERENCES
1. Google Scholar help: about Google Scholar. [Web document]. Mountain View, CA: Google, 2005 [cited Aug 2006]. <http://scholar.google.com/intl/en/scholar/about.html>.
2. Kesselman M, Watstein SB. Google Scholar(tm) and libraries: point/counterpoint. Ref Serv Rev. 2005;33(4):380–7.
3. Abbasi K. Simplicity and complexity in health care: what medicine can learn from Google and iPod. J R Soc Med. 2005 Sep;98(9):389.
4. Kennedy S, Price G. Big news: "Google Scholar" is born. [Web document]. Resource Shelf. UK: Free Pint Limited, 2004. [cited Aug 2006]. <http://www.resourceshelf.com/2004/11/18/wow-its-google-scholar/>.
5. Felter LM. Google Scholar, Scirus, and the scholarly search revolution. Searcher. 2005 Feb;13(2):43–8.
6. Giles J. Science in the Web age: start your engines. Nature. 2005 Dec 1;438(7068):554–5.
7. Notess GR. Scholarly Web searching: Google Scholar and Scirus. Online. 2005 Jul–Aug;29(4):39–41.
8. Jacso P. Google Scholar (redux). [Web document]. Peter's Digital Reference Shelf, 2005. [cited Aug 2006]. <http://www.galegroup.com/reference/archive/200506/google>.
9. Jacso P. Google Scholar: the pros and the cons. Online Inform Rev. 2005;29(2):208–14.
10. Burright M. Google Scholar: science & technology. [Web document]. Issues Sci Technol Libr 2006 Winter;45. [cited Aug 2006]. <http://www.istl.org/06-winter/databases2.html>.
11. Giustini D, Barsky E. A look at Google Scholar, PubMed, and Scirus: comparisons and recommendations. J CHLA/J ABSC. 2005;26:85–9.
12. Vine R. Google Scholar [electronic resources review]. J Med Libr Assoc. 2006 Jan;94(1):97–9.
13. Myhill M. The ADVISOR reviews … Google Scholar. [Web document]. The Charleston ADVISOR 2005 Apr;6(4). [cited Sep 2006]. <http://www.charlestonco.com/review.cfm?id=225>.
14. Gardner S, Eng S. Gaga over Google? Scholar in the social sciences. Library Hi Tech News. 2005;8:42–5.
15. Friend FJ. Google Scholar: potentially good for users of academic information. J Electronic Publishing [serial online]. 2006 Jan;9(1). [cited Sep 2006]. <http://hdl.handle.net/2027/spo.3336451.0009.105>.
16. Steinbrook R. Searching for the right search: reaching the medical literature. N Engl J Med. 2006 Jan 5;354(1):4–7.
17. Wleklinski JM. Studying Google Scholar: wall to wall coverage? Online. 2005 May–Jun;29(3):22–6.
18. Jacso P. As we may search: comparison of major features of the Web of Science, Scopus, and Google Scholar citation-based and citation-enhanced databases. Current Science. 2005 Nov;89(9):1537–47.
19. Tennant R. Google, the naked emperor. Libr J. 2005 Aug;130(13):29.
20. Vine R. Google Scholar is a full year late indexing PubMed content. [Web document]. SiteLines, Feb 2005. [cited Aug 2006]. <http://www.workingfaster.com/sitelines/archives/2005_02.html#000282>.
21. Crawford W. Google: a company, not a religion. EContent. 2005 Nov;28(11):42.
22. Jacso P. Side-by-side2, native search engines vs. Google Scholar. [Web document]. University of Hawai'i, 2005. [cited Aug 2006]. <http://www2.hawaii.edu/~jacso/scholarly/side-by-side2.htm>.
23. Henderson J. Google Scholar: a source for clinicians? CMAJ. 2005 Jun 7;172(12):1549–50.
24. Giustini D. How Google is changing medicine. BMJ. 2005 Dec 24;331(7531):1487–8.
25. Turner AM, Liddy ED, Bradley J, Wheatley JA. Modeling public health interventions for improved access to the gray literature. J Med Libr Assoc. 2005 Oct;93(4):487–94.
26. National Library of Medicine. List of journals indexed for MEDLINE and list of serials indexed for online users terms and conditions. [Web document]. Bethesda, MD: The Library, 2007. [cited 29 Mar 2007]. <http://www.nlm.nih.gov/tsd/serials/terms_cond.html>.
Source: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2000776/