• Open access
  • Published: 14 August 2018

Defining the process to literature searching in systematic reviews: a literature review of guidance and supporting studies

  • Chris Cooper   ORCID: orcid.org/0000-0003-0864-5607 1 ,
  • Andrew Booth 2 ,
  • Jo Varley-Campbell 1 ,
  • Nicky Britten 3 &
  • Ruth Garside 4  

BMC Medical Research Methodology, volume 18, Article number: 85 (2018)


Abstract

Background

Systematic literature searching is recognised as a critical component of the systematic review process. It involves a systematic search for studies and aims for a transparent report of study identification, leaving readers clear about what was done to identify studies, and how the findings of the review are situated in the relevant evidence.

Information specialists and review teams appear to work from a shared and tacit model of the literature search process. How this tacit model has developed and evolved is unclear, and it has not been explicitly examined before.

The purpose of this review is to determine if a shared model of the literature searching process can be detected across systematic review guidance documents and, if so, how this process is reported in the guidance and supported by published studies.

Method

A literature review.

Two types of literature were reviewed: guidance and published studies. Nine guidance documents were identified, including the Cochrane and Campbell Handbooks. Published studies were identified through ‘pearl growing’, citation chasing, a search of PubMed using the systematic review methods filter, and the authors’ topic knowledge.

The relevant sections within each guidance document were then read and re-read, with the aim of determining key methodological stages. Methodological stages were identified and defined. These data were reviewed to identify agreements and areas of unique guidance between guidance documents. Consensus across multiple guidance documents was used to inform the selection of ‘key stages’ in the process of literature searching.

Results

Eight key stages were determined relating specifically to literature searching in systematic reviews. They were: who should undertake the literature search, the aims and purpose of literature searching, preparation, the search strategy, searching databases, supplementary searching, managing references, and reporting the search process.

Conclusions

Eight key stages to the process of literature searching in systematic reviews were identified. These key stages are consistently reported in the nine guidance documents, suggesting consensus on the key stages of literature searching, and therefore the process of literature searching as a whole, in systematic reviews. Further research to determine the suitability of using the same process of literature searching for all types of systematic review is indicated.


Background

Systematic literature searching is recognised as a critical component of the systematic review process. It involves a systematic search for studies and aims for a transparent report of study identification, leaving review stakeholders clear about what was done to identify studies, and how the findings of the review are situated in the relevant evidence.

Information specialists and review teams appear to work from a shared and tacit model of the literature search process. How this tacit model has developed and evolved is unclear, and it has not been explicitly examined before. This is in contrast to the information science literature, which has developed information processing models as an explicit basis for dialogue and empirical testing. Without an explicit model, research in the process of systematic literature searching will remain immature and potentially uneven, and the development of shared information models will be assumed but never articulated.

One way of developing such a conceptual model is by formally examining the implicit “programme theory” as embodied in key methodological texts. The aim of this review is therefore to determine if a shared model of the literature searching process in systematic reviews can be detected across guidance documents and, if so, how this process is reported and supported.

Method

Identifying guidance

Key texts (henceforth referred to as “guidance”) were identified based upon their accessibility to, and prominence within, United Kingdom systematic reviewing practice. The United Kingdom occupies a prominent position in the science of health information retrieval, as quantified by such objective measures as the authorship of papers, the number of Cochrane groups based in the UK, membership and leadership of groups such as the Cochrane Information Retrieval Methods Group, the HTA-I Information Specialists’ Group and historic association with such centres as the UK Cochrane Centre, the NHS Centre for Reviews and Dissemination, the Centre for Evidence Based Medicine and the National Institute for Clinical Excellence (NICE). Coupled with the linguistic dominance of English within medical and health science and the science of systematic reviews more generally, this offers a justification for a purposive sample that favours UK, European and Australian guidance documents.

Nine guidance documents were identified. These documents provide guidance for different types of reviews, namely: reviews of interventions, reviews of health technologies, reviews of qualitative research studies, reviews of social science topics, and reviews to inform guidance.

Whilst these guidance documents occasionally offer additional guidance on other types of systematic reviews, we have focused on the core and stated aims of these documents as they relate to literature searching. Table  1 sets out: the guidance document, the version audited, their core stated focus, and a bibliographical pointer to the main guidance relating to literature searching.

Once a list of key guidance documents was determined, it was checked by six senior information professionals based in the UK for relevance to current literature searching in systematic reviews.

Identifying supporting studies

In addition to identifying guidance, the authors sought to populate an evidence base of supporting studies (henceforth referred to as “studies”) that contribute to existing search practice. Studies were first identified by the authors from their knowledge of this topic area and, subsequently, through systematically citation chasing key studies (‘pearls’ [1]) located within each key stage of the search process. These studies are identified in Additional file 1: Appendix Table 1. Citation chasing was conducted by analysing the bibliography of references for each study (backwards citation chasing) and through Google Scholar (forward citation chasing). A search of PubMed using the systematic review methods filter was undertaken in August 2017 (see Additional file 1). The search terms used were: (literature search*[Title/Abstract]) AND sysrev_methods[sb], and 586 results were returned. These results were sifted by CC for relevance to the key stages in Fig. 1.

Figure 1. The key stages of literature search guidance as identified from nine key texts
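For readers wishing to re-run or adapt the PubMed search described above, the following minimal sketch uses NCBI’s E-utilities via the Biopython package. This is our illustration rather than the authors’ original procedure: the contact email address is a placeholder, the August 2017 search date is not reproduced, and the result count will therefore differ from the 586 records reported.

# Minimal sketch: re-running the reported PubMed query via NCBI E-utilities
# (requires the Biopython package). The query string is as reported in the
# text; the email address is a placeholder required by NCBI.
from Bio import Entrez

Entrez.email = "your.name@example.org"

query = "(literature search*[Title/Abstract]) AND sysrev_methods[sb]"
handle = Entrez.esearch(db="pubmed", term=query, retmax=1000)
record = Entrez.read(handle)
handle.close()

print(record["Count"])        # total number of matching records
print(record["IdList"][:10])  # first ten PubMed IDs, for export and sifting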

Extracting the data

To reveal the implicit process of literature searching within each guidance document, the relevant sections (chapters) on literature searching were read and re-read, with the aim of determining key methodological stages. We defined a key methodological stage as a distinct step in the overall process for which specific guidance is reported, and action is taken, that collectively would result in a completed literature search.

The chapter or section sub-heading for each methodological stage was extracted into a table using the exact language as reported in each guidance document. The lead author (CC) then read and re-read these data, and the paragraphs of the document to which the headings referred, summarising section details. This table was then reviewed, using comparison and contrast to identify agreements and areas of unique guidance. Consensus across multiple guidelines was used to inform selection of ‘key stages’ in the process of literature searching.

Having determined the key stages to literature searching, we then read and re-read the sections relating to literature searching again, extracting specific detail relating to the methodological process of literature searching within each key stage. Again, the guidance was then read and re-read, first on a document-by-document basis and, secondly, across all the documents above, to identify both commonalities and areas of unique guidance.

Results and discussion

Our findings

We were able to identify consensus across the guidance on literature searching for systematic reviews suggesting a shared implicit model within the information retrieval community. Whilst the structure of the guidance varies between documents, the same key stages are reported, even where the core focus of each document is different. We were able to identify specific areas of unique guidance, where a document reported guidance not summarised in other documents, together with areas of consensus across guidance.

Unique guidance

Only one document provided guidance on the topic of when to stop searching [ 2 ]. This guidance from 2005 anticipates a topic of increasing importance with the current interest in time-limited (i.e. “rapid”) reviews. Quality assurance (or peer review) of literature searches was only covered in two guidance documents [ 3 , 4 ]. This topic has emerged as increasingly important as indicated by the development of the PRESS instrument [ 5 ]. Text mining was discussed in four guidance documents [ 4 , 6 , 7 , 8 ] where the automation of some manual review work may offer efficiencies in literature searching [ 8 ].

Agreement between guidance: Defining the key stages of literature searching

Where there was agreement on the process, we determined that this constituted a key stage in the process of literature searching to inform systematic reviews.

From the guidance, we determined eight key stages that relate specifically to literature searching in systematic reviews. These are summarised in Fig. 1. The data extraction table to inform Fig. 1 is reported in Table 2. Table 2 reports the areas of common agreement and it demonstrates that the language used to describe key stages and processes varies significantly between guidance documents.

For each key stage, we set out the specific guidance, followed by discussion on how this guidance is situated within the wider literature.

Key stage one: Deciding who should undertake the literature search

The guidance.

Eight documents provided guidance on who should undertake literature searching in systematic reviews [ 2 , 4 , 6 , 7 , 8 , 9 , 10 , 11 ]. The guidance affirms that people with relevant expertise of literature searching should ‘ideally’ be included within the review team [ 6 ]. Information specialists (or information scientists), librarians or trial search co-ordinators (TSCs) are indicated as appropriate researchers in six guidance documents [ 2 , 7 , 8 , 9 , 10 , 11 ].

How the guidance corresponds to the published studies

The guidance is consistent with studies that call for the involvement of information specialists and librarians in systematic reviews [12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26] and which demonstrate how their training as ‘expert searchers’ and ‘analysers and organisers of data’ can be put to good use [13] in a variety of roles [12, 16, 20, 21, 24, 25, 26]. These arguments make sense in the context of the aims and purposes of literature searching in systematic reviews, explored below. The need for ‘thorough’ and ‘replicable’ literature searches was fundamental to the guidance and recurs in key stage two. Studies have found poor reporting, and a lack of replicable literature searches, to be a weakness in systematic reviews [17, 18, 27, 28], and they argue that the involvement of information specialists/librarians would be associated with better reporting and better quality literature searching. Indeed, Meert et al. [29] demonstrated that involving a librarian as a co-author of a systematic review correlated with a higher score in the literature searching component of the review [29]. As ‘new styles’ of rapid and scoping reviews emerge, where decisions on how to search are more iterative and creative, a clear role emerges here too [30].

Knowing where to search for studies was noted as important in the guidance, with no agreement as to the appropriate number of databases to be searched [ 2 , 6 ]. Database (and resource selection more broadly) is acknowledged as a relevant key skill of information specialists and librarians [ 12 , 15 , 16 , 31 ].

Whilst arguments for including information specialists and librarians in the process of systematic review might be considered self-evident, Koffel and Rethlefsen [ 31 ] have questioned if the necessary involvement is actually happening [ 31 ].

Key stage two: Determining the aim and purpose of a literature search

The aim: Five of the nine guidance documents use adjectives such as ‘thorough’, ‘comprehensive’, ‘transparent’ and ‘reproducible’ to define the aim of literature searching [ 6 , 7 , 8 , 9 , 10 ]. Analogous phrases were present in a further three guidance documents, namely: ‘to identify the best available evidence’ [ 4 ] or ‘the aim of the literature search is not to retrieve everything. It is to retrieve everything of relevance’ [ 2 ] or ‘A systematic literature search aims to identify all publications relevant to the particular research question’ [ 3 ]. The Joanna Briggs Institute reviewers’ manual was the only guidance document where a clear statement on the aim of literature searching could not be identified. The purpose of literature searching was defined in three guidance documents, namely to minimise bias in the resultant review [ 6 , 8 , 10 ]. Accordingly, eight of nine documents clearly asserted that thorough and comprehensive literature searches are required as a potential mechanism for minimising bias.

The need for thorough and comprehensive literature searches appears as uniform within the eight guidance documents that describe approaches to literature searching in systematic reviews of effectiveness. Reviews of effectiveness (of intervention or cost), accuracy and prognosis, require thorough and comprehensive literature searches to transparently produce a reliable estimate of intervention effect. The belief that all relevant studies have been ‘comprehensively’ identified, and that this process has been ‘transparently’ reported, increases confidence in the estimate of effect and the conclusions that can be drawn [ 32 ]. The supporting literature exploring the need for comprehensive literature searches focuses almost exclusively on reviews of intervention effectiveness and meta-analysis. Different ‘styles’ of review may have different standards however; the alternative, offered by purposive sampling, has been suggested in the specific context of qualitative evidence syntheses [ 33 ].

What is a comprehensive literature search?

Whilst the guidance calls for thorough and comprehensive literature searches, it lacks clarity on what constitutes a thorough and comprehensive literature search, beyond the implication that all of the literature search methods in Table 2 should be used to identify studies. Egger et al. [ 34 ], in an empirical study evaluating the importance of comprehensive literature searches for trials in systematic reviews, defined a comprehensive search for trials as:

a search not restricted to English language;

where Cochrane CENTRAL or at least two other electronic databases had been searched (such as MEDLINE or EMBASE); and

at least one of the following search methods has been used to identify unpublished trials: searches for (i) conference abstracts, (ii) theses, (iii) trials registers; and (iv) contacts with experts in the field [34].

Tricco et al. (2008) used a similar threshold of bibliographic database searching AND a supplementary search method in a review when examining the risk of bias in systematic reviews. Their criteria were: one database (limited using the Cochrane Highly Sensitive Search Strategy (HSSS)) and handsearching [ 35 ].

Together with the guidance, this would suggest that comprehensive literature searching requires the use of BOTH bibliographic database searching AND supplementary search methods.

Comprehensiveness in literature searching, in the sense of how much searching should be undertaken, remains unclear. Egger et al. recommend that ‘investigators should consider the type of literature search and degree of comprehension that is appropriate for the review in question, taking into account budget and time constraints’ [34]. This view tallies with the Cochrane Handbook, which stipulates clearly that study identification should be undertaken ‘within resource limits’ [9]. This would suggest that the limits to comprehensiveness are recognised, but it raises questions about how this is decided and reported [36].

What is the point of comprehensive literature searching?

The purpose of thorough and comprehensive literature searches is to avoid missing key studies and to minimise bias [6, 8, 10, 34, 37, 38, 39], since a systematic review based only on published (or easily accessible) studies may have an exaggerated effect size [35]. Felson (1992) sets out potential biases that could affect the estimate of effect in a meta-analysis [40] and Tricco et al. summarise the evidence concerning bias and confounding in systematic reviews [35]. Egger et al. point to non-publication of studies, publication bias, language bias and MEDLINE bias as key biases [34, 35, 40, 41, 42, 43, 44, 45, 46]. Comprehensive searches are not the sole factor needed to mitigate these biases but their contribution is thought to be significant [2, 32, 34]. Fehrmann (2011) suggests that describing the search process in detail, and noting where standard comprehensive search techniques have been applied, increases confidence in the search results [32].

Does comprehensive literature searching work?

Egger et al., and other study authors, have demonstrated a change in the estimate of intervention effectiveness where relevant studies were excluded from meta-analysis [34, 47]. This would suggest that missing studies in literature searching alters the reliability of effectiveness estimates, which is an argument for comprehensive literature searching. Conversely, Egger et al. found that ‘comprehensive’ searches still missed studies and that comprehensive searches could, in fact, introduce bias into a review rather than preventing it, through the identification of low quality studies that are then included in the meta-analysis [34]. Studies query whether identifying and including low quality or grey literature studies changes the estimate of effect [43, 48], and question whether time is better invested in updating systematic reviews rather than searching for unpublished studies [49], or in mapping studies for review as opposed to aiming for high sensitivity in literature searching [50].

Aim and purpose beyond reviews of effectiveness

The need for comprehensive literature searches is less certain in reviews of qualitative studies, and for reviews where a comprehensive identification of studies is difficult to achieve (for example, in public health) [33, 51, 52, 53, 54, 55]. Literature searching for qualitative studies, and in public health topics, typically generates a greater number of studies to sift than in reviews of effectiveness [39], and demonstrating the ‘value’ of studies identified or missed is harder [56], since the study data do not typically support meta-analysis. Nussbaumer-Streit et al. (2016) have registered a review protocol to assess whether abbreviated literature searches (as opposed to comprehensive literature searches) have an impact on conclusions across multiple bodies of evidence, not only on effect estimates [57], which may develop this understanding. It may be that decision makers and users of systematic reviews are willing to trade the certainty from a comprehensive literature search and systematic review in exchange for different approaches to evidence synthesis [58], and that comprehensive literature searches are not necessarily a marker of literature search quality, as previously thought [36]. Different approaches to literature searching [37, 38, 59, 60, 61, 62] and developing the concept of when to stop searching are important areas for further study [36, 59].

The study by Nussbaumer-Streit et al. has been published since the submission of this literature review [ 63 ]. Nussbaumer-Streit et al. (2018) conclude that abbreviated literature searches are viable options for rapid evidence syntheses, if decision-makers are willing to trade the certainty from a comprehensive literature search and systematic review, but that decision-making which demands detailed scrutiny should still be based on comprehensive literature searches [ 63 ].

Key stage three: Preparing for the literature search

Six documents provided guidance on preparing for a literature search [ 2 , 3 , 6 , 7 , 9 , 10 ]. The Cochrane Handbook clearly stated that Cochrane authors (i.e. researchers) should seek advice from a trial search co-ordinator (i.e. a person with specific skills in literature searching) ‘before’ starting a literature search [ 9 ].

Two key tasks were perceptible in preparing for a literature search [2, 6, 7, 10, 11]. First, to determine if there are any existing or on-going reviews, or if a new review is justified [6, 11]; and, secondly, to develop an initial literature search strategy to estimate the volume of relevant literature (and the quality of a small sample of relevant studies [10]) and to indicate the resources required for literature searching and the review of the studies that follows [7, 10].

Three documents summarised guidance on where to search to determine if a new review was justified [2, 6, 11]. These focused on searching databases of systematic reviews (the Cochrane Database of Systematic Reviews (CDSR) and the Database of Abstracts of Reviews of Effects (DARE)), institutional registries (including PROSPERO), and MEDLINE [6, 11]. It is worth noting, however, that as of 2015, DARE (and NHS EED) are no longer being updated and so the relevance of these resources will diminish over time [64]. One guidance document, ‘Systematic reviews in the Social Sciences’, noted, however, that databases are not the only source of information and that unpublished reports, conference proceedings and grey literature may also be required, depending on the nature of the review question [2].

Two documents reported clearly that this preparation (or ‘scoping’) exercise should be undertaken before the actual search strategy is developed [7, 10].

The guidance offers the best available source on preparing the literature search, since the published studies do not typically report how scoping informed the development of their search strategies, nor how their search approaches were developed. Text mining has been proposed as a technique to develop search strategies in the scoping stages of a review, although this work is still exploratory [65]. ‘Clustering documents’ and word frequency analysis have also been tested to identify search terms and studies for review [66, 67]. Preparing for literature searches and scoping constitutes an area for future research.

Key stage four: Designing the search strategy

The Population, Intervention, Comparator, Outcome (PICO) structure was the most commonly reported structure promoted for designing a literature search strategy. Five documents suggested that the eligibility criteria or review question will determine which concepts of PICO will be populated to develop the search strategy [1, 4, 7, 8, 9]. The NICE handbook promoted multiple structures, namely PICO, SPICE (Setting, Perspective, Intervention, Comparison, Evaluation) and multi-stranded approaches [4].

With the exception of the Joanna Briggs Institute reviewers’ manual, the guidance offered detail on selecting key search terms, synonyms, Boolean language, selecting database indexing terms and combining search terms. The CEE handbook suggested that ‘search terms may be compiled with the help of the commissioning organisation and stakeholders’ [10].

The use of limits, such as language or date limits, was discussed in all documents [2, 3, 4, 6, 7, 8, 9, 10, 11].
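To make the structure described in this key stage concrete, the sketch below assembles a hypothetical PICO-structured search string in Python: synonyms are combined with OR within each populated concept, the concept blocks are combined with AND, and an illustrative date limit is appended. The terms, the omission of the comparator concept, and the PubMed-style limit syntax are all illustrative assumptions rather than a validated strategy.

# Hypothetical sketch of a PICO-structured strategy: OR within concepts,
# AND between concepts, with an illustrative date limit. All terms are
# placeholders; real strategies also use database index terms (e.g. MeSH)
# and interface-specific syntax.
population   = ["adolescen*", "teenager*", '"young people"']
intervention = ["exercis*", '"physical activity"']
outcome      = ["depress*", '"low mood"']
# The comparator concept is often left unpopulated in the search strategy.

def or_block(terms):
    # e.g. (adolescen* OR teenager* OR "young people")
    return "(" + " OR ".join(terms) + ")"

concepts = [population, intervention, outcome]
strategy = " AND ".join(or_block(c) for c in concepts)
strategy += " AND (2010:2018[dp])"  # illustrative date limit; syntax varies by interface

print(strategy)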

Search strategy structure

The guidance typically relates to reviews of intervention effectiveness, so PICO – with its focus on intervention and comparator – is the dominant model used to structure literature search strategies [68]. PICOS – where the S denotes study design – is also commonly used in effectiveness reviews [6, 68]. As the NICE handbook notes, alternative models to structure literature search strategies have been developed and tested. Booth provides an overview of formulating questions for evidence based practice [69] and has developed a number of alternatives to the PICO structure, namely: BeHEMoTh (Behaviour of interest; Health context; Exclusions; Models or Theories) for use when systematically identifying theory [55]; SPICE (Setting, Perspective, Intervention, Comparison, Evaluation) for the identification of social science and evaluation studies [69]; and, working with Cooke and colleagues, SPIDER (Sample, Phenomenon of Interest, Design, Evaluation, Research type) [70]. SPIDER has been compared to PICO and PICOS in a study by Methley et al. [68].

The NICE handbook also suggests the use of multi-stranded approaches to developing literature search strategies [4]. Glanville developed this idea in a study by Whiting et al. [71] and a worked example of this approach is included in the development of a search filter by Cooper et al. [72].

Writing search strategies: Conceptual and objective approaches

Hausner et al. [73] provide guidance on writing literature search strategies, delineating between conceptually and objectively derived approaches. The conceptual approach, advocated by and explained in the guidance documents, relies on the expertise of the literature searcher to identify key search terms and then to develop those terms to include synonyms and controlled vocabulary. Hausner and colleagues set out the objective approach [73] and describe what may be done to validate it [74].
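As a simplified illustration of an objectively derived approach (and not the specific method of Hausner et al.), the sketch below counts word frequencies in a small, invented development set of records known to be relevant; frequently occurring words become candidate search terms that would then be tested against a validation set.

# Simplified sketch of an objectively derived approach: candidate search
# terms are taken from word frequencies in records already known to be
# relevant, rather than chosen conceptually. Records and stopwords are
# invented placeholders.
import re
from collections import Counter

development_set = [
    "Librarian involvement and the reporting of literature searches in systematic reviews.",
    "Information specialists and comprehensive literature searching for meta-analysis.",
    "Search strategy reporting in systematic reviews of interventions.",
]

stopwords = {"and", "the", "of", "in", "for"}
frequencies = Counter(
    word
    for record in development_set
    for word in re.findall(r"[a-z]+", record.lower())
    if word not in stopwords
)

# The most frequent words are candidate free-text terms to test for recall
# against a separate validation set (not shown).
print(frequencies.most_common(5))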

The use of limits

The guidance documents offer direction on the use of limits within a literature search. Limits can be used to focus literature searching on specific study designs, or by other markers (such as date), which restricts the number of studies returned by a literature search. The use of limits should be described and their implications explored [34], since limiting literature searching can introduce bias (explored above). Craven et al. have suggested the use of a supporting narrative to explain decisions made in the process of developing literature searches, and this advice would usefully capture decisions on the use of search limits [75].

Key stage five: Determining the process of literature searching and deciding where to search (bibliographic database searching)

Table 2 summarises the process of literature searching as reported in each guidance document. Searching bibliographic databases was consistently reported as the ‘first step’ to literature searching in all nine guidance documents.

Three documents reported specific guidance on where to search, in each case specific to the type of review their guidance informed, and as a minimum requirement [ 4 , 9 , 11 ]. Seven of the key guidance documents suggest that the selection of bibliographic databases depends on the topic of review [ 2 , 3 , 4 , 6 , 7 , 8 , 10 ], with two documents noting the absence of an agreed standard on what constitutes an acceptable number of databases searched [ 2 , 6 ].

The guidance documents summarise ‘how to’ search bibliographic databases in detail and this guidance is further contextualised above in terms of developing the search strategy. The documents provide guidance on selecting bibliographic databases, in some cases stating acceptable minima (for example, the Cochrane Handbook states Cochrane CENTRAL, MEDLINE and EMBASE), and in other cases simply listing the bibliographic databases available to search. Studies have explored the value of searching specific bibliographic databases, with Wright et al. (2015) noting the contribution of CINAHL in identifying qualitative studies [76], Beckles et al. (2013) questioning the contribution of CINAHL to identifying clinical studies for guideline development [77], and Cooper et al. (2015) exploring the role of UK-focused bibliographic databases in identifying UK-relevant studies [78]. The host of the database (e.g. OVID or ProQuest) has been shown to alter the search returns offered. Younger and Boddy [79] report differing search returns from the same database (AMED) where the ‘host’ was different [79].

The average number of bibliographic databases searched in systematic reviews has risen in the period 1994–2014 (from 1 to 4) [80], but there remains (as attested to by the guidance) no consensus on what constitutes an acceptable number of databases searched [48]. This is perhaps because the number of databases searched is the wrong question; researchers should instead focus on which databases were searched and why, and which databases were not searched and why. The discussion should re-orientate to the differential value of sources, but researchers need to think about how to report this in studies to allow findings to be generalised. Bethel (2017) has proposed ‘search summaries’, completed by the literature searcher, to record where included studies were identified, whether from databases (and which databases specifically) or supplementary search methods [81]. Search summaries document both the yield and accuracy of searches, which could prospectively inform resource use and decisions to search or not to search specific databases in topic areas. The prospective use of such data presupposes, however, that past searches are a potential predictor of future search performance (i.e. that each topic is considered representative and not unique). In offering a body of practice, these data would be of greater practical use than current studies, which are little more than individual case studies [82, 83, 84, 85, 86, 87, 88, 89, 90].
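The sketch below illustrates the kind of record a search summary might hold, using invented figures: for each source, the number of records retrieved, the included studies it identified, those it identified uniquely, and a simple precision figure.

# Hypothetical 'search summary': where the included studies were found,
# and with what yield and precision. All figures are invented.
sources = {
    "MEDLINE":          {"retrieved": 1450, "included": {"s1", "s2", "s3", "s4"}},
    "EMBASE":           {"retrieved": 1720, "included": {"s1", "s2", "s3", "s5"}},
    "CINAHL":           {"retrieved": 310,  "included": {"s2", "s6"}},
    "Citation chasing": {"retrieved": 55,   "included": {"s7"}},
}

for name, data in sources.items():
    found_elsewhere = set().union(
        *(other["included"] for other_name, other in sources.items() if other_name != name)
    )
    unique = data["included"] - found_elsewhere
    precision = len(data["included"]) / data["retrieved"]
    print(f"{name}: retrieved {data['retrieved']}, "
          f"included studies found {len(data['included'])} "
          f"(unique {len(unique)}), precision {precision:.3f}")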

When to search databases is another question posed in the literature. Beyer et al. [91] report that databases can be prioritised for literature searching which, whilst not addressing the question of which databases to search, may at least bring clarity as to which databases to search first [91]. Paradoxically, this links to studies that suggest PubMed should be searched in addition to MEDLINE (OVID interface) since this improves the currency of systematic reviews [92, 93]. Cooper et al. (2017) have tested the idea of database searching not as a primary search method (as suggested in the guidance) but as a supplementary search method, in order to manage the volume of studies identified for an environmental effectiveness systematic review. Their case study compared the effectiveness of database searching versus a protocol using supplementary search methods and found that the latter identified more relevant studies for review than searching bibliographic databases [94].

Key stage six: Determining the process of literature searching and deciding where to search (supplementary search methods)

Table 2 also summarises the process of literature searching which follows bibliographic database searching. As Table 2 sets out, guidance that supplementary literature search methods should be used in systematic reviews recurs across documents, but the order in which these methods are used, and the extent to which they are used, varies. We noted inconsistency in the labelling of supplementary search methods between guidance documents.

Rather than focus on the guidance on how to use the methods (which has been summarised in a recent review [ 95 ]), we focus on the aim or purpose of supplementary search methods.

The Cochrane Handbook reported that ‘efforts’ to identify unpublished studies should be made [9]. Four guidance documents [2, 3, 6, 9] acknowledged that searching beyond bibliographic databases was necessary since ‘databases are not the only source of literature’ [2]. Only one document reported any guidance on determining when to use supplementary methods: the IQWiG handbook reported that the use of handsearching (in their example) could be determined on a ‘case-by-case basis’, which implies that the use of these methods is optional rather than mandatory. This is in contrast to the guidance (above) on bibliographic database searching.

The issue for supplementary search methods is similar in many ways to the issue of searching bibliographic databases: demonstrating value. The purpose and contribution of supplementary search methods in systematic reviews is increasingly acknowledged [37, 61, 62, 96, 97, 98, 99, 100, 101], but the value of these search methods for identifying studies and data remains unclear. In a recently published review, Cooper et al. (2017) reviewed the literature on supplementary search methods to determine the advantages, disadvantages and resource implications of using them [95]. This review also summarises the key guidance and empirical studies and seeks to address the question of when to use these search methods and when not to [95]. The guidance is limited in this regard and, as Table 2 demonstrates, offers conflicting advice on the order of searching and the extent to which these search methods should be used in systematic reviews.

Key stage seven: Managing the references

Five of the documents provided guidance on managing references, for example downloading, de-duplicating and managing the output of literature searches [ 2 , 4 , 6 , 8 , 10 ]. This guidance typically itemised available bibliographic management tools rather than offering guidance on how to use them specifically [ 2 , 4 , 6 , 8 ]. The CEE handbook provided guidance on importing data where no direct export option is available (e.g. web-searching) [ 10 ].

The literature on using bibliographic management tools is not large relative to the number of ‘how to’ videos on platforms such as YouTube (see, for example, [102]). These videos confirm the overall lack of published ‘how to’ guidance identified in this study and offer useful instruction on managing references. Bramer et al. set out methods for de-duplicating data and reviewing references in EndNote [103, 104] and Gall tests the direct search function within EndNote to access databases such as PubMed, finding a number of limitations [105]. Coar et al. and Ahmed et al. consider the role of the free-source tool Zotero [106, 107]. Managing references is a key administrative function in the review process, particularly for documenting searches as required by PRISMA guidance.
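As a minimal illustration of the de-duplication step, and not a substitute for the EndNote methods described by Bramer et al., the sketch below drops records that share a normalised title and publication year; the record structure (a list of dictionaries, for example parsed from an RIS export) is an assumption.

# Minimal de-duplication sketch: drop records sharing a normalised title and
# publication year. Real reference managers use more sophisticated matching.
import re

def normalise(title):
    return re.sub(r"[^a-z0-9 ]", "", title.lower()).strip()

def deduplicate(records):
    seen = set()
    unique = []
    for record in records:
        key = (normalise(record["title"]), record.get("year"))
        if key not in seen:
            seen.add(key)
            unique.append(record)
    return unique

records = [
    {"title": "A trial of intervention X.", "year": 2015, "source": "MEDLINE"},
    {"title": "A Trial of Intervention X",  "year": 2015, "source": "EMBASE"},  # duplicate
]
print(len(deduplicate(records)))  # -> 1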

Key stage eight: Documenting the search

The Cochrane Handbook was the only guidance document to recommend a specific reporting guideline: Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) [9]. Six documents provided guidance on reporting the process of literature searching, with specific criteria to report [3, 4, 6, 8, 9, 10]. There was consensus on reporting: the databases searched (and the host via which they were searched), the search strategies used, and any use of limits (e.g. date, language, search filters; the CRD handbook called for these limits to be justified [6]). Three guidance documents reported that the number of studies identified should be recorded [3, 6, 10]. The number of duplicates identified [10], the screening decisions [3], a comprehensive list of grey literature sources searched (and full detail for other supplementary search methods) [8], and an annotation of search terms tested but not used [4] were identified as unique items in four documents.

The Cochrane Handbook was the only guidance document to note that the full search strategies for each database should be included in an appendix of the review [9].
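One way to capture the consensus reporting items above is a structured, per-database search log. The sketch below shows one hypothetical shape for such a record; the field names and values are illustrative only.

# Hypothetical per-database search log covering the consensus reporting
# items: database, host, date searched, full strategy, limits applied and
# the number of records retrieved. Values are illustrative only.
from dataclasses import dataclass, field
from typing import List

@dataclass
class SearchRecord:
    database: str
    host: str
    date_searched: str
    strategy: List[str]                       # the full strategy, line by line
    limits: List[str] = field(default_factory=list)
    records_retrieved: int = 0

log = SearchRecord(
    database="MEDLINE",
    host="Ovid",
    date_searched="2017-08-01",
    strategy=["1. exp Example Condition/", "2. example*.ti,ab.", "3. 1 or 2"],
    limits=["English language", "2000 to current"],
    records_retrieved=1234,
)
print(log)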

All guidance documents should ultimately deliver completed systematic reviews that fulfil the requirements of the PRISMA reporting guidelines [ 108 ]. The guidance broadly requires the reporting of data that corresponds with the requirements of the PRISMA statement although documents typically ask for diverse and additional items [ 108 ]. In 2008, Sampson et al. observed a lack of consensus on reporting search methods in systematic reviews [ 109 ] and this remains the case as of 2017, as evidenced in the guidance documents, and in spite of the publication of the PRISMA guidelines in 2009 [ 110 ]. It is unclear why the collective guidance does not more explicitly endorse adherence to the PRISMA guidance.

Reporting of literature searching is a key area in systematic reviews since it sets out clearly what was done and underpins how far the conclusions of the review can be believed [52, 109]. Despite strong endorsement in the guidance documents, and specific support in PRISMA guidance and other related reporting standards (such as ENTREQ for qualitative evidence synthesis and STROBE for observational studies), authors still highlight the prevalence of poor standards of literature search reporting [31, 110, 111, 112, 113, 114, 115, 116, 117, 118, 119]. To explore issues experienced by authors in reporting literature searches, and to look at the uptake of PRISMA, Rader et al. [120] surveyed over 260 review authors to determine common problems, and their work summarises the practical aspects of reporting literature searching [120]. Atkinson et al. [121] have also analysed reporting standards for literature searching, summarising recommendations and gaps in the reporting of search strategies [121].

One area that is less well covered by the guidance, but which nevertheless appears in this literature, is the quality appraisal or peer review of literature search strategies. The PRESS checklist is the most prominent example; it aims to provide evidence-based guidance for the peer review of electronic search strategies [5, 122, 123]. A corresponding guideline for the documentation of supplementary search methods does not yet exist, although this idea is currently being explored.

How the reporting of the literature searching process corresponds to critical appraisal tools is an area for further research. In the survey undertaken by Rader et al. (2014), 86% of survey respondents (153/178) identified a need for further guidance on what aspects of the literature search process to report [120]. The PRISMA statement offers a brief summary of what to report but little practical guidance on how to report it [108]. Critical appraisal tools for systematic reviews, such as AMSTAR 2 (Shea et al. [124]) and ROBIS (Whiting et al. [125]), can usefully be read alongside PRISMA guidance, since they offer greater detail on how the reporting of the literature search will be appraised and therefore a proxy for what to report [124, 125]. A study comparing PRISMA with quality appraisal checklists for systematic reviews would begin to address the call, identified by Rader et al., for further guidance on what to report [120].

Limitations

Other handbooks exist.

A potential limitation of this literature review is the focus on guidance produced in Europe (the UK specifically) and Australia. We justify our selection of the nine guidance documents reviewed in this literature review in the section “Identifying guidance”. In brief, these nine guidance documents were selected as the most relevant health care guidance informing UK systematic reviewing practice, given that the UK occupies a prominent position in the science of health information retrieval. We acknowledge the existence of other guidance documents, such as those from North America (e.g. the Agency for Healthcare Research and Quality (AHRQ) [126], the Institute of Medicine [127] and the guidance and resources produced by the Canadian Agency for Drugs and Technologies in Health (CADTH) [128]). We comment further on this directly below.

The handbooks are potentially linked to one another

What is not clear is the extent to which the guidance documents inter-relate or provide guidance uniquely. The Cochrane Handbook, first published in 1994, is notably a key source of reference in guidance and systematic reviews beyond Cochrane reviews. It is not clear to what extent broadening the sample of guidance handbooks to include North American handbooks, and guidance handbooks from other relevant countries, would alter the findings of this literature review or develop further support for the process model. Since we cannot be clear, we raise this as a potential limitation of this literature review. Based on our initial review of a sample of North American and other guidance documents (undertaken before selecting the guidance documents considered in this review), however, we do not consider that including these further handbooks would significantly alter the findings of this literature review.

This is a literature review

A further limitation of this review was that the review of published studies is not a systematic review of the evidence for each key stage. It is possible that other relevant studies could help contribute to the exploration and development of the key stages identified in this review.

Conclusions

This literature review would appear to demonstrate the existence of a shared model of the literature searching process in systematic reviews. We call this model ‘the conventional approach’, since it appears to be common convention across nine different guidance documents.

The findings reported above reveal eight key stages in the process of literature searching for systematic reviews. These key stages are consistently reported in the nine guidance documents which suggests consensus on the key stages of literature searching, and therefore the process of literature searching as a whole, in systematic reviews.

In Table 2, we demonstrate consensus regarding the application of literature search methods. All guidance documents distinguish between primary and supplementary search methods. Bibliographic database searching is consistently the first method of literature searching referenced in each guidance document. Whilst the guidance uniformly supports the use of supplementary search methods, there is little evidence of a consistent process, with guidance diverging across documents. This may reflect differences in the core focus of each document, linked to differences in identifying effectiveness studies or qualitative studies, for instance.

Eight of the nine guidance documents reported on the aims of literature searching. The shared understanding was that literature searching should be thorough and comprehensive in its aim and that this process should be reported transparently so that it can be reproduced. Whilst only three documents explicitly link this understanding to minimising bias, it is clear that comprehensive literature searching is implicitly linked to ‘not missing relevant studies’, which is approximately the same point.

Defining the key stages in this review helps categorise the scholarship available, and it prioritises areas for development or further study. The supporting studies on preparing for literature searching (key stage three, ‘preparation’) were, for example, comparatively few, and yet this key stage represents a decisive moment in literature searching for systematic reviews. It is where the search strategy structure is determined, search terms are chosen or discarded, and the resources to be searched are selected. Information specialists, librarians and researchers are well placed to develop these and other areas within the key stages we identify.

This review calls for further research to determine the suitability of using the conventional approach. The publication dates of the guidance documents which underpin the conventional approach may raise questions as to whether the process which they each report remains valid for current systematic literature searching. In addition, it may be useful to test whether it is desirable to use the same process model of literature searching for qualitative evidence synthesis as that for reviews of intervention effectiveness, which this literature review demonstrates is presently recommended best practice.

Abbreviations

BeHEMoTh: Behaviour of interest; Health context; Exclusions; Models or Theories

CDSR: Cochrane Database of Systematic Reviews

CENTRAL: The Cochrane Central Register of Controlled Trials

DARE: Database of Abstracts of Reviews of Effects

ENTREQ: Enhancing transparency in reporting the synthesis of qualitative research

IQWiG: Institute for Quality and Efficiency in Healthcare

NICE: National Institute for Clinical Excellence

PICO: Population, Intervention, Comparator, Outcome

PRISMA: Preferred Reporting Items for Systematic Reviews and Meta-Analyses

SPICE: Setting, Perspective, Intervention, Comparison, Evaluation

SPIDER: Sample, Phenomenon of Interest, Design, Evaluation, Research type

STROBE: STrengthening the Reporting of OBservational studies in Epidemiology

TSC: Trial Search Co-ordinators

Booth A. Unpacking your literature search toolbox: on search styles and tactics. Health Information & Libraries Journal. 2008;25(4):313–7.


Petticrew M, Roberts H. Systematic reviews in the social sciences: a practical guide. Oxford: Blackwell Publishing Ltd; 2006.


Institute for Quality and Efficiency in Health Care (IQWiG). IQWiG methods resources: 7. Information retrieval. 2014. Available from: https://www.ncbi.nlm.nih.gov/books/NBK385787/ .

NICE: National Institute for Health and Care Excellence. Developing NICE guidelines: the manual 2014. Available from: https://www.nice.org.uk/media/default/about/what-we-do/our-programmes/developing-nice-guidelines-the-manual.pdf .

Sampson M, McGowan J, Lefebvre C, Moher D, Grimshaw J. Peer Review of Electronic Search Strategies: PRESS; 2008.


Centre for Reviews & Dissemination. Systematic reviews – CRD’s guidance for undertaking reviews in healthcare. York: Centre for Reviews and Dissemination, University of York; 2009.

EUnetHTA: European Network for Health Technology Assessment. Process of information retrieval for systematic reviews and health technology assessments on clinical effectiveness. 2016. Available from: http://www.eunethta.eu/sites/default/files/Guideline_Information_Retrieval_V1-1.pdf .

Kugley S, Wade A, Thomas J, Mahood Q, Jørgensen AMK, Hammerstrøm K, Sathe N. Searching for studies: a guide to information retrieval for Campbell systematic reviews. Oslo: Campbell Collaboration; 2017. Available from: https://www.campbellcollaboration.org/library/searching-for-studies-information-retrieval-guide-campbell-reviews.html

Lefebvre C, Manheimer E, Glanville J. Chapter 6: searching for studies. In: Higgins JPT, Green S, editors. Cochrane Handbook for Systematic Reviews of Interventions; 2011.

Collaboration for Environmental Evidence. Guidelines for systematic review and evidence synthesis in environmental management. Environmental Evidence; 2013. Available from: http://www.environmentalevidence.org/wp-content/uploads/2017/01/Review-guidelines-version-4.2-final-update.pdf .

The Joanna Briggs Institute. Joanna Briggs Institute reviewers’ manual: 2014 edition. The Joanna Briggs Institute; 2014. Available from: https://joannabriggs.org/assets/docs/sumari/ReviewersManual-2014.pdf

Beverley CA, Booth A, Bath PA. The role of the information specialist in the systematic review process: a health information case study. Health Inf Libr J. 2003;20(2):65–74.


Harris MR. The librarian's roles in the systematic review process: a case study. Journal of the Medical Library Association. 2005;93(1):81–7.


Egger JB. Use of recommended search strategies in systematic reviews and the impact of librarian involvement: a cross-sectional survey of recent authors. PLoS One. 2015;10(5):e0125931.

Li L, Tian J, Tian H, Moher D, Liang F, Jiang T, et al. Network meta-analyses could be improved by searching more sources and by involving a librarian. J Clin Epidemiol. 2014;67(9):1001–7.


McGowan J, Sampson M. Systematic reviews need systematic searchers. J Med Libr Assoc. 2005;93(1):74–80.

Rethlefsen ML, Farrell AM, Osterhaus Trzasko LC, Brigham TJ. Librarian co-authors correlated with higher quality reported search strategies in general internal medicine systematic reviews. J Clin Epidemiol. 2015;68(6):617–26.

Weller AC. Mounting evidence that librarians are essential for comprehensive literature searches for meta-analyses and Cochrane reports. J Med Libr Assoc. 2004;92(2):163–4.

Swinkels A, Briddon J, Hall J. Two physiotherapists, one librarian and a systematic literature review: collaboration in action. Health Info Libr J. 2006;23(4):248–56.

Foster M. An overview of the role of librarians in systematic reviews: from expert search to project manager. EAHIL. 2015;11(3):3–7.

Lawson L. Operating outside library walls. 2004.

Vassar M, Yerokhin V, Sinnett PM, Weiher M, Muckelrath H, Carr B, et al. Database selection in systematic reviews: an insight through clinical neurology. Health Inf Libr J. 2017;34(2):156–64.

Townsend WA, Anderson PF, Ginier EC, MacEachern MP, Saylor KM, Shipman BL, et al. A competency framework for librarians involved in systematic reviews. Journal of the Medical Library Association : JMLA. 2017;105(3):268–75.

Cooper ID, Crum JA. New activities and changing roles of health sciences librarians: a systematic review, 1990-2012. Journal of the Medical Library Association : JMLA. 2013;101(4):268–77.

Crum JA, Cooper ID. Emerging roles for biomedical librarians: a survey of current practice, challenges, and changes. Journal of the Medical Library Association : JMLA. 2013;101(4):278–86.

Dudden RF, Protzko SL. The systematic review team: contributions of the health sciences librarian. Med Ref Serv Q. 2011;30(3):301–15.

Golder S, Loke Y, McIntosh HM. Poor reporting and inadequate searches were apparent in systematic reviews of adverse effects. J Clin Epidemiol. 2008;61(5):440–8.

Maggio LA, Tannery NH, Kanter SL. Reproducibility of literature search reporting in medical education reviews. Academic medicine : journal of the Association of American Medical Colleges. 2011;86(8):1049–54.

Meert D, Torabi N, Costella J. Impact of librarians on reporting of the literature searching component of pediatric systematic reviews. Journal of the Medical Library Association : JMLA. 2016;104(4):267–77.

Morris M, Boruff JT, Gore GC. Scoping reviews: establishing the role of the librarian. Journal of the Medical Library Association : JMLA. 2016;104(4):346–54.

Koffel JB, Rethlefsen ML. Reproducibility of search strategies is poor in systematic reviews published in high-impact pediatrics, cardiology and surgery journals: a cross-sectional study. PLoS One. 2016;11(9):e0163309.


Fehrmann P, Thomas J. Comprehensive computer searches and reporting in systematic reviews. Research Synthesis Methods. 2011;2(1):15–32.

Booth A. Searching for qualitative research for inclusion in systematic reviews: a structured methodological review. Systematic Reviews. 2016;5(1):74.


Egger M, Juni P, Bartlett C, Holenstein F, Sterne J. How important are comprehensive literature searches and the assessment of trial quality in systematic reviews? Empirical study. Health technology assessment (Winchester, England). 2003;7(1):1–76.

Tricco AC, Tetzlaff J, Sampson M, Fergusson D, Cogo E, Horsley T, et al. Few systematic reviews exist documenting the extent of bias: a systematic review. J Clin Epidemiol. 2008;61(5):422–34.

Booth A. How much searching is enough? Comprehensive versus optimal retrieval for technology assessments. Int J Technol Assess Health Care. 2010;26(4):431–5.

Papaioannou D, Sutton A, Carroll C, Booth A, Wong R. Literature searching for social science systematic reviews: consideration of a range of search techniques. Health Inf Libr J. 2010;27(2):114–22.

Petticrew M. Time to rethink the systematic review catechism? Moving from ‘what works’ to ‘what happens’. Systematic Reviews. 2015;4(1):36.

Betrán AP, Say L, Gülmezoglu AM, Allen T, Hampson L. Effectiveness of different databases in identifying studies for systematic reviews: experience from the WHO systematic review of maternal morbidity and mortality. BMC Med Res Methodol. 2005;5

Felson DT. Bias in meta-analytic research. J Clin Epidemiol. 1992;45(8):885–92.


Franco A, Malhotra N, Simonovits G. Publication bias in the social sciences: unlocking the file drawer. Science. 2014;345(6203):1502–5.

Hartling L, Featherstone R, Nuspl M, Shave K, Dryden DM, Vandermeer B. Grey literature in systematic reviews: a cross-sectional study of the contribution of non-English reports, unpublished studies and dissertations to the results of meta-analyses in child-relevant reviews. BMC Med Res Methodol. 2017;17(1):64.

Schmucker CM, Blümle A, Schell LK, Schwarzer G, Oeller P, Cabrera L, et al. Systematic review finds that study data not published in full text articles have unclear impact on meta-analyses results in medical research. PLoS One. 2017;12(4):e0176210.

Egger M, Zellweger-Zahner T, Schneider M, Junker C, Lengeler C, Antes G. Language bias in randomised controlled trials published in English and German. Lancet (London, England). 1997;350(9074):326–9.

Moher D, Pham B, Lawson ML, Klassen TP. The inclusion of reports of randomised trials published in languages other than English in systematic reviews. Health technology assessment (Winchester, England). 2003;7(41):1–90.

Pham B, Klassen TP, Lawson ML, Moher D. Language of publication restrictions in systematic reviews gave different results depending on whether the intervention was conventional or complementary. J Clin Epidemiol. 2005;58(8):769–76.

Mills EJ, Kanters S, Thorlund K, Chaimani A, Veroniki A-A, Ioannidis JPA. The effects of excluding treatments from network meta-analyses: survey. BMJ : British Medical Journal. 2013;347

Hartling L, Featherstone R, Nuspl M, Shave K, Dryden DM, Vandermeer B. The contribution of databases to the results of systematic reviews: a cross-sectional study. BMC Med Res Methodol. 2016;16(1):127.

van Driel ML, De Sutter A, De Maeseneer J, Christiaens T. Searching for unpublished trials in Cochrane reviews may not be worth the effort. J Clin Epidemiol. 2009;62(8):838–44.e3.

Buchberger B, Krabbe L, Lux B, Mattivi JT. Evidence mapping for decision making: feasibility versus accuracy - when to abandon high sensitivity in electronic searches. German medical science : GMS e-journal. 2016;14:Doc09.

Lorenc T, Pearson M, Jamal F, Cooper C, Garside R. The role of systematic reviews of qualitative evidence in evaluating interventions: a case study. Research Synthesis Methods. 2012;3(1):1–10.

Gough D. Weight of evidence: a framework for the appraisal of the quality and relevance of evidence. Res Pap Educ. 2007;22(2):213–28.

Barroso J, Gollop CJ, Sandelowski M, Meynell J, Pearce PF, Collins LJ. The challenges of searching for and retrieving qualitative studies. West J Nurs Res. 2003;25(2):153–78.

Britten N, Garside R, Pope C, Frost J, Cooper C. Asking more of qualitative synthesis: a response to Sally Thorne. Qual Health Res. 2017;27(9):1370–6.

Booth A, Carroll C. Systematic searching for theory to inform systematic reviews: is it feasible? Is it desirable? Health Info Libr J. 2015;32(3):220–35.

Kwon Y, Powelson SE, Wong H, Ghali WA, Conly JM. An assessment of the efficacy of searching in biomedical databases beyond MEDLINE in identifying studies for a systematic review on ward closures as an infection control intervention to control outbreaks. Syst Rev. 2014;3:135.

Nussbaumer-Streit B, Klerings I, Wagner G, Titscher V, Gartlehner G. Assessing the validity of abbreviated literature searches for rapid reviews: protocol of a non-inferiority and meta-epidemiologic study. Systematic Reviews. 2016;5:197.

Wagner G, Nussbaumer-Streit B, Greimel J, Ciapponi A, Gartlehner G. Trading certainty for speed - how much uncertainty are decisionmakers and guideline developers willing to accept when using rapid reviews: an international survey. BMC Med Res Methodol. 2017;17(1):121.

Ogilvie D, Hamilton V, Egan M, Petticrew M. Systematic reviews of health effects of social interventions: 1. Finding the evidence: how far should you go? J Epidemiol Community Health. 2005;59(9):804–8.

Royle P, Milne R. Literature searching for randomized controlled trials used in Cochrane reviews: rapid versus exhaustive searches. Int J Technol Assess Health Care. 2003;19(4):591–603.

Pearson M, Moxham T, Ashton K. Effectiveness of search strategies for qualitative research about barriers and facilitators of program delivery. Eval Health Prof. 2011;34(3):297–308.

Levay P, Raynor M, Tuvey D. The contributions of MEDLINE, other bibliographic databases and various search techniques to NICE public health guidance. Evid Based Libr Inf Pract. 2015;10(1):19.

Nussbaumer-Streit B, Klerings I, Wagner G, Heise TL, Dobrescu AI, Armijo-Olivo S, et al. Abbreviated literature searches were viable alternatives to comprehensive searches: a meta-epidemiological study. J Clin Epidemiol. 2018;102:1–11.

Briscoe S, Cooper C, Glanville J, Lefebvre C. The loss of the NHS EED and DARE databases and the effect on evidence synthesis and evaluation. Res Synth Methods. 2017;8(3):256–7.

Stansfield C, O'Mara-Eves A, Thomas J. Text mining for search term development in systematic reviewing: a discussion of some methods and challenges. Res Synth Methods. 2017.

Petrova M, Sutcliffe P, Fulford KW, Dale J. Search terms and a validated brief search filter to retrieve publications on health-related values in Medline: a word frequency analysis study. Journal of the American Medical Informatics Association : JAMIA. 2012;19(3):479–88.

Stansfield C, Thomas J, Kavanagh J. 'Clustering' documents automatically to support scoping reviews of research: a case study. Res Synth Methods. 2013;4(3):230–41.


Methley AM, Campbell S, Chew-Graham C, McNally R, Cheraghi-Sohi S. PICO, PICOS and SPIDER: a comparison study of specificity and sensitivity in three search tools for qualitative systematic reviews. BMC Health Serv Res. 2014;14:579.

Booth A. Clear and present questions: formulating questions for evidence based practice. Library Hi Tech. 2006;24(3):355–68.

Cooke A, Smith D, Booth A. Beyond PICO: the SPIDER tool for qualitative evidence synthesis. Qual Health Res. 2012;22(10):1435–43.

Whiting P, Westwood M, Bojke L, Palmer S, Richardson G, Cooper J, et al. Clinical effectiveness and cost-effectiveness of tests for the diagnosis and investigation of urinary tract infection in children: a systematic review and economic model. Health technology assessment (Winchester, England). 2006;10(36):iii-iv, xi-xiii, 1–154.

Cooper C, Levay P, Lorenc T, Craig GM. A population search filter for hard-to-reach populations increased search efficiency for a systematic review. J Clin Epidemiol. 2014;67(5):554–9.

Hausner E, Waffenschmidt S, Kaiser T, Simon M. Routine development of objectively derived search strategies. Systematic Reviews. 2012;1(1):19.

Hausner E, Guddat C, Hermanns T, Lampert U, Waffenschmidt S. Prospective comparison of search strategies for systematic reviews: an objective approach yielded higher sensitivity than a conceptual one. J Clin Epidemiol. 2016;77:118–24.

Craven J, Levay P. Recording database searches for systematic reviews - what is the value of adding a narrative to peer-review checklists? A case study of nice interventional procedures guidance. Evid Based Libr Inf Pract. 2011;6(4):72–87.

Wright K, Golder S, Lewis-Light K. What value is the CINAHL database when searching for systematic reviews of qualitative studies? Syst Rev. 2015;4:104.

Beckles Z, Glover S, Ashe J, Stockton S, Boynton J, Lai R, et al. Searching CINAHL did not add value to clinical questions posed in NICE guidelines. J Clin Epidemiol. 2013;66(9):1051–7.

Cooper C, Rogers M, Bethel A, Briscoe S, Lowe J. A mapping review of the literature on UK-focused health and social care databases. Health Inf Libr J. 2015;32(1):5–22.

Younger P, Boddy K. When is a search not a search? A comparison of searching the AMED complementary health database via EBSCOhost, OVID and DIALOG. Health Inf Libr J. 2009;26(2):126–35.

Lam MT, McDiarmid M. Increasing number of databases searched in systematic reviews and meta-analyses between 1994 and 2014. Journal of the Medical Library Association : JMLA. 2016;104(4):284–9.

Bethel A. Search summary tables for systematic reviews: results and findings. HLC Conference; 2017a.

Aagaard T, Lund H, Juhl C. Optimizing literature search in systematic reviews - are MEDLINE, EMBASE and CENTRAL enough for identifying effect studies within the area of musculoskeletal disorders? BMC Med Res Methodol. 2016;16(1):161.

Adams CE, Frederick K. An investigation of the adequacy of MEDLINE searches for randomized controlled trials (RCTs) of the effects of mental health care. Psychol Med. 1994;24(3):741–8.

Kelly L, St Pierre-Hansen N. So many databases, such little clarity: searching the literature for the topic aboriginal. Canadian family physician Medecin de famille canadien. 2008;54(11):1572–3.

Lawrence DW. What is lost when searching only one literature database for articles relevant to injury prevention and safety promotion? Injury Prevention. 2008;14(6):401–4.

Lemeshow AR, Blum RE, Berlin JA, Stoto MA, Colditz GA. Searching one or two databases was insufficient for meta-analysis of observational studies. J Clin Epidemiol. 2005;58(9):867–73.

Sampson M, Barrowman NJ, Moher D, Klassen TP, Pham B, Platt R, et al. Should meta-analysts search Embase in addition to Medline? J Clin Epidemiol. 2003;56(10):943–55.

Stevinson C, Lawlor DA. Searching multiple databases for systematic reviews: added value or diminishing returns? Complementary Therapies in Medicine. 2004;12(4):228–32.

Suarez-Almazor ME, Belseck E, Homik J, Dorgan M, Ramos-Remus C. Identifying clinical trials in the medical literature with electronic databases: MEDLINE alone is not enough. Control Clin Trials. 2000;21(5):476–87.

Taylor B, Wylie E, Dempster M, Donnelly M. Systematically retrieving research: a case study evaluating seven databases. Res Soc Work Pract. 2007;17(6):697–706.

Beyer FR, Wright K. Can we prioritise which databases to search? A case study using a systematic review of frozen shoulder management. Health Info Libr J. 2013;30(1):49–58.

Duffy S, de Kock S, Misso K, Noake C, Ross J, Stirk L. Supplementary searches of PubMed to improve currency of MEDLINE and MEDLINE in-process searches via Ovid. Journal of the Medical Library Association : JMLA. 2016;104(4):309–12.

Katchamart W, Faulkner A, Feldman B, Tomlinson G, Bombardier C. PubMed had a higher sensitivity than Ovid-MEDLINE in the search for systematic reviews. J Clin Epidemiol. 2011;64(7):805–7.

Cooper C, Lovell R, Husk K, Booth A, Garside R. Supplementary search methods were more effective and offered better value than bibliographic database searching: a case study from public health and environmental enhancement (in press). Res Synth Methods. 2017.

Cooper C, Booth A, Britten N, Garside R. A comparison of results of empirical studies of supplementary search techniques and recommendations in review methodology handbooks: a methodological review (in press). Syst Rev. 2017.

Greenhalgh T, Peacock R. Effectiveness and efficiency of search methods in systematic reviews of complex evidence: audit of primary sources. BMJ (Clinical research ed). 2005;331(7524):1064–5.


Hinde S, Spackman E. Bidirectional citation searching to completion: an exploration of literature searching methods. PharmacoEconomics. 2015;33(1):5–11.

Levay P, Ainsworth N, Kettle R, Morgan A. Identifying evidence for public health guidance: a comparison of citation searching with web of science and Google scholar. Res Synth Methods. 2016;7(1):34–45.

McManus RJ, Wilson S, Delaney BC, Fitzmaurice DA, Hyde CJ, Tobias RS, et al. Review of the usefulness of contacting other experts when conducting a literature search for systematic reviews. BMJ (Clinical research ed). 1998;317(7172):1562–3.

Westphal A, Kriston L, Holzel LP, Harter M, von Wolff A. Efficiency and contribution of strategies for finding randomized controlled trials: a case study from a systematic review on therapeutic interventions of chronic depression. Journal of public health research. 2014;3(2):177.

Matthews EJ, Edwards AG, Barker J, Bloor M, Covey J, Hood K, et al. Efficient literature searching in diffuse topics: lessons from a systematic review of research on communicating risk to patients in primary care. Health Libr Rev. 1999;16(2):112–20.

Bethel A. Endnote Training (YouTube videos). 2017b [Available from: http://medicine.exeter.ac.uk/esmi/workstreams/informationscience/is_resources,_guidance_&_advice/ ].

Bramer WM, Giustini D, de Jonge GB, Holland L, Bekhuis T. De-duplication of database search results for systematic reviews in EndNote. Journal of the Medical Library Association : JMLA. 2016;104(3):240–3.

Bramer WM, Milic J, Mast F. Reviewing retrieved references for inclusion in systematic reviews using EndNote. Journal of the Medical Library Association : JMLA. 2017;105(1):84–7.

Gall C, Brahmi FA. Retrieval comparison of EndNote to search MEDLINE (Ovid and PubMed) versus searching them directly. Medical reference services quarterly. 2004;23(3):25–32.

Ahmed KK, Al Dhubaib BE. Zotero: a bibliographic assistant to researcher. J Pharmacol Pharmacother. 2011;2(4):303–5.

Coar JT, Sewell JP. Zotero: harnessing the power of a personal bibliographic manager. Nurse Educ. 2010;35(5):205–7.

Moher D, Liberati A, Tetzlaff J, Altman DG, The PRISMA Group. Preferred reporting items for systematic reviews and meta-analyses: the PRISMA statement. PLoS Med. 2009;6(7):e1000097.

Sampson M, McGowan J, Tetzlaff J, Cogo E, Moher D. No consensus exists on search reporting methods for systematic reviews. J Clin Epidemiol. 2008;61(8):748–54.

Toews LC. Compliance of systematic reviews in veterinary journals with preferred reporting items for systematic reviews and meta-analysis (PRISMA) literature search reporting guidelines. Journal of the Medical Library Association : JMLA. 2017;105(3):233–9.

Booth A. "brimful of STARLITE": toward standards for reporting literature searches. Journal of the Medical Library Association : JMLA. 2006;94(4):421–9. e205

Faggion CM Jr, Wu YC, Tu YK, Wasiak J. Quality of search strategies reported in systematic reviews published in stereotactic radiosurgery. Br J Radiol. 2016;89(1062):20150878.

Mullins MM, DeLuca JB, Crepaz N, Lyles CM. Reporting quality of search methods in systematic reviews of HIV behavioral interventions (2000–2010): are the searches clearly explained, systematic and reproducible? Research Synthesis Methods. 2014;5(2):116–30.

Yoshii A, Plaut DA, McGraw KA, Anderson MJ, Wellik KE. Analysis of the reporting of search strategies in Cochrane systematic reviews. Journal of the Medical Library Association : JMLA. 2009;97(1):21–9.

Bigna JJ, Um LN, Nansseu JR. A comparison of quality of abstracts of systematic reviews including meta-analysis of randomized controlled trials in high-impact general medicine journals before and after the publication of PRISMA extension for abstracts: a systematic review and meta-analysis. Syst Rev. 2016;5(1):174.

Akhigbe T, Zolnourian A, Bulters D. Compliance of systematic reviews articles in brain arteriovenous malformation with PRISMA statement guidelines: review of literature. Journal of clinical neuroscience : official journal of the Neurosurgical Society of Australasia. 2017;39:45–8.

Tao KM, Li XQ, Zhou QH, Moher D, Ling CQ, Yu WF. From QUOROM to PRISMA: a survey of high-impact medical journals' instructions to authors and a review of systematic reviews in anesthesia literature. PLoS One. 2011;6(11):e27611.

Wasiak J, Tyack Z, Ware R, Goodwin N, Faggion CM Jr. Poor methodological quality and reporting standards of systematic reviews in burn care management. Int Wound J. 2016.

Tam WW, Lo KK, Khalechelvam P. Endorsement of PRISMA statement and quality of systematic reviews and meta-analyses published in nursing journals: a cross-sectional study. BMJ Open. 2017;7(2):e013905.

Rader T, Mann M, Stansfield C, Cooper C, Sampson M. Methods for documenting systematic review searches: a discussion of common issues. Res Synth Methods. 2014;5(2):98–115.

Atkinson KM, Koenka AC, Sanchez CE, Moshontz H, Cooper H. Reporting standards for literature searches and report inclusion criteria: making research syntheses more transparent and easy to replicate. Res Synth Methods. 2015;6(1):87–95.

McGowan J, Sampson M, Salzwedel DM, Cogo E, Foerster V, Lefebvre C. PRESS peer review of electronic search strategies: 2015 guideline statement. J Clin Epidemiol. 2016;75:40–6.

Sampson M, McGowan J, Cogo E, Grimshaw J, Moher D, Lefebvre C. An evidence-based practice guideline for the peer review of electronic search strategies. J Clin Epidemiol. 2009;62(9):944–52.

Shea BJ, Reeves BC, Wells G, Thuku M, Hamel C, Moran J, et al. AMSTAR 2: a critical appraisal tool for systematic reviews that include randomised or non-randomised studies of healthcare interventions, or both. BMJ (Clinical research ed). 2017;358:j4008.

Whiting P, Savović J, Higgins JPT, Caldwell DM, Reeves BC, Shea B, et al. ROBIS: a new tool to assess risk of bias in systematic reviews was developed. J Clin Epidemiol. 2016;69:225–34.

Relevo R, Balshem H. Finding evidence for comparing medical interventions: AHRQ and the effective health care program. J Clin Epidemiol. 2011;64(11):1168–77.

Institute of Medicine. Standards for Systematic Reviews. 2011 [Available from: http://www.nationalacademies.org/hmd/Reports/2011/Finding-What-Works-in-Health-Care-Standards-for-Systematic-Reviews/Standards.aspx ].

CADTH: Resources 2018.


Acknowledgements

CC acknowledges the supervision offered by Professor Chris Hyde.

This publication forms a part of CC’s PhD. CC’s PhD was funded through the National Institute for Health Research (NIHR) Health Technology Assessment (HTA) Programme (Project Number 16/54/11). The open access fee for this publication was paid for by Exeter Medical School.

RG and NB were partially supported by the National Institute for Health Research (NIHR) Collaboration for Leadership in Applied Health Research and Care South West Peninsula.

The views expressed are those of the author(s) and not necessarily those of the NHS, the NIHR or the Department of Health.

Author information

Authors and Affiliations

Institute of Health Research, University of Exeter Medical School, Exeter, UK

Chris Cooper & Jo Varley-Campbell

HEDS, School of Health and Related Research (ScHARR), University of Sheffield, Sheffield, UK

Andrew Booth

Nicky Britten

European Centre for Environment and Human Health, University of Exeter Medical School, Truro, UK

Ruth Garside


Contributions

CC conceived the idea for this study and wrote the first draft of the manuscript. CC discussed this publication in PhD supervision with AB and separately with JVC. CC revised the publication with input and comments from AB, JVC, RG and NB. All authors revised the manuscript prior to submission. All authors read and approved the final manuscript.

Corresponding author

Correspondence to Chris Cooper.

Ethics declarations

Competing interests

The authors declare that they have no competing interests.

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Additional file

Additional file 1: Appendix tables and PubMed search strategy. Key studies used for pearl growing per key stage, working data extraction tables and the PubMed search strategy. (DOCX 30 kb)

Rights and permissions

Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License ( http://creativecommons.org/licenses/by/4.0/ ), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver ( http://creativecommons.org/publicdomain/zero/1.0/ ) applies to the data made available in this article, unless otherwise stated.


About this article

Cite this article

Cooper, C., Booth, A., Varley-Campbell, J. et al. Defining the process to literature searching in systematic reviews: a literature review of guidance and supporting studies. BMC Med Res Methodol 18 , 85 (2018). https://doi.org/10.1186/s12874-018-0545-3


Received: 20 September 2017

Accepted: 06 August 2018

Published: 14 August 2018

DOI: https://doi.org/10.1186/s12874-018-0545-3


Keywords

  • Literature Search Process
  • Citation Chasing
  • Tacit Models
  • Unique Guidance
  • Information Specialists



A systematic approach to searching: an efficient and complete method to develop literature searches

Affiliations

  • 1 Biomedical Information Specialist, Medical Library, Erasmus MC-Erasmus University Medical Centre, Rotterdam, The Netherlands.
  • 2 Medical Library, Erasmus MC-Erasmus University Medical Centre, Rotterdam, The Netherlands.
  • 3 Spencer S. Eccles Health Sciences Library, University of Utah, Salt Lake City, UT.
  • 4 Department of Family Medicine, School for Public Health and Primary Care (CAPHRI), Maastricht University, Maastricht, The Netherlands, and Kleijnen Systematic Reviews, York, United Kingdom.
  • PMID: 30271302
  • PMCID: PMC6148622
  • DOI: 10.5195/jmla.2018.283

Creating search strategies for systematic reviews, finding the best balance between sensitivity and specificity, and translating search strategies between databases is challenging. Several methods describe standards for systematic search strategies, but a consistent approach for creating an exhaustive search strategy has not yet been fully described in enough detail to be fully replicable. The authors have established a method that describes step by step the process of developing a systematic search strategy as needed in the systematic review. This method describes how single-line search strategies can be prepared in a text document by typing search syntax (such as field codes, parentheses, and Boolean operators) before copying and pasting search terms (keywords and free-text synonyms) that are found in the thesaurus. To help ensure term completeness, we developed a novel optimization technique that is mainly based on comparing the results retrieved by thesaurus terms with those retrieved by the free-text search words to identify potentially relevant candidate search terms. Macros in Microsoft Word have been developed to convert syntaxes between databases and interfaces almost automatically. This method helps information specialists in developing librarian-mediated searches for systematic reviews as well as medical and health care practitioners who are searching for evidence to answer clinical questions. The described method can be used to create complex and comprehensive search strategies for different databases and interfaces, such as those that are needed when searching for relevant references for systematic reviews, and will assist both information specialists and practitioners when they are searching the biomedical literature.
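As a rough illustration of the term-optimization idea described in this abstract, the toy sketch below (not the authors' Word macros) ranks words that are frequent in records retrieved only by the thesaurus terms, and rare in records already retrieved by the current free-text terms, as candidate free-text search terms. The record fields and the scoring rule are assumptions made for the example.

    # Toy sketch of candidate-term identification by comparing two retrieval sets.
    from collections import Counter
    import re

    def candidate_terms(thesaurus_hits, freetext_hits, top_n=20):
        """thesaurus_hits / freetext_hits: lists of dicts with 'id', 'title', 'abstract'."""
        covered = {r["id"] for r in freetext_hits}
        missed = [r for r in thesaurus_hits if r["id"] not in covered]

        def words(record):
            text = f"{record.get('title', '')} {record.get('abstract', '')}".lower()
            return re.findall(r"[a-z][a-z-]{3,}", text)  # crude tokeniser for the example

        counts = Counter(w for r in missed for w in words(r))
        baseline = Counter(w for r in freetext_hits for w in words(r))
        # Favour words common in records the free-text terms missed, rare in records already found.
        scored = {w: c / (1 + baseline[w]) for w, c in counts.items()}
        return sorted(scored, key=scored.get, reverse=True)[:top_n]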

MeSH terms

  • Abstracting and Indexing / standards*
  • Databases, Factual / standards*
  • Information Storage and Retrieval / methods*
  • Medical Subject Headings
  • Review Literature as Topic*
  • Vocabulary, Controlled


  • Open access
  • Published: 01 February 2021

An open source machine learning framework for efficient and transparent systematic reviews

  • Rens van de Schoot   ORCID: orcid.org/0000-0001-7736-2091 1 ,
  • Jonathan de Bruin   ORCID: orcid.org/0000-0002-4297-0502 2 ,
  • Raoul Schram 2 ,
  • Parisa Zahedi   ORCID: orcid.org/0000-0002-1610-3149 2 ,
  • Jan de Boer   ORCID: orcid.org/0000-0002-0531-3888 3 ,
  • Felix Weijdema   ORCID: orcid.org/0000-0001-5150-1102 3 ,
  • Bianca Kramer   ORCID: orcid.org/0000-0002-5965-6560 3 ,
  • Martijn Huijts   ORCID: orcid.org/0000-0002-8353-0853 4 ,
  • Maarten Hoogerwerf   ORCID: orcid.org/0000-0003-1498-2052 2 ,
  • Gerbrich Ferdinands   ORCID: orcid.org/0000-0002-4998-3293 1 ,
  • Albert Harkema   ORCID: orcid.org/0000-0002-7091-1147 1 ,
  • Joukje Willemsen   ORCID: orcid.org/0000-0002-7260-0828 1 ,
  • Yongchao Ma   ORCID: orcid.org/0000-0003-4100-5468 1 ,
  • Qixiang Fang   ORCID: orcid.org/0000-0003-2689-6653 1 ,
  • Sybren Hindriks 1 ,
  • Lars Tummers   ORCID: orcid.org/0000-0001-9940-9874 5 &
  • Daniel L. Oberski   ORCID: orcid.org/0000-0001-7467-2297 1 , 6  

Nature Machine Intelligence volume  3 ,  pages 125–133 ( 2021 ) Cite this article

66k Accesses

178 Citations

142 Altmetric

Metrics details

  • Computational biology and bioinformatics
  • Computer science
  • Medical research

A preprint version of the article is available at arXiv.

To help researchers conduct a systematic review or meta-analysis as efficiently and transparently as possible, we designed a tool to accelerate the step of screening titles and abstracts. For many tasks—including but not limited to systematic reviews and meta-analyses—the scientific literature needs to be checked systematically. Scholars and practitioners currently screen thousands of studies by hand to determine which studies to include in their review or meta-analysis. This is error prone and inefficient because of extremely imbalanced data: only a fraction of the screened studies is relevant. The future of systematic reviewing will be an interaction with machine learning algorithms to deal with the enormous increase of available text. We therefore developed an open source machine learning-aided pipeline applying active learning: ASReview. We demonstrate by means of simulation studies that active learning can yield far more efficient reviewing than manual reviewing while providing high quality. Furthermore, we describe the options of the free and open source research software and present the results from user experience tests. We invite the community to contribute to open source projects such as our own that provide measurable and reproducible improvements over current practice.

With the emergence of online publishing, the number of scientific manuscripts on many topics is skyrocketing 1 . All of these textual data present opportunities to scholars and practitioners while simultaneously confronting them with new challenges. Scholars often conduct systematic reviews and meta-analyses to develop comprehensive overviews of the relevant topics 2 . The process entails several explicit and, ideally, reproducible steps, including identifying all likely relevant publications in a standardized way, extracting data from eligible studies and synthesizing the results. Systematic reviews differ from traditional literature reviews in that they are more replicable and transparent 3 , 4 . Such systematic overviews of literature on a specific topic are pivotal not only for scholars, but also for clinicians, policy-makers, journalists and, ultimately, the general public 5 , 6 , 7 .

Given that screening the entire research literature on a given topic is too labour intensive, scholars often develop quite narrow searches. Developing a search strategy for a systematic review is an iterative process aimed at balancing recall and precision 8 , 9 ; that is, including as many potentially relevant studies as possible while simultaneously limiting the total number of studies retrieved. The vast number of publications in the field of study often leads to a relatively precise search, with the risk of missing relevant studies. The process of systematic reviewing is error prone and extremely time intensive 10 . In fact, if the literature of a field is growing faster than the amount of time available for systematic reviews, adequate manual review of this field then becomes impossible 11 .

The rapidly evolving field of machine learning has aided researchers by allowing the development of software tools that assist in developing systematic reviews 11 , 12 , 13 , 14 . Machine learning offers approaches to overcome the manual and time-consuming screening of large numbers of studies by prioritizing relevant studies via active learning 15 . Active learning is a type of machine learning in which a model can choose the data points (for example, records obtained from a systematic search) it would like to learn from and thereby drastically reduce the total number of records that require manual screening 16 , 17 , 18 . In most so-called human-in-the-loop 19 machine-learning applications, the interaction between the machine-learning algorithm and the human is used to train a model with a minimum number of labelling tasks. Unique to systematic reviewing is that not only do all relevant records (that is, titles and abstracts) need to be seen by a researcher, but an extremely diverse range of concepts also need to be learned, thereby requiring flexibility in the modelling approach as well as careful error evaluation 11 . In the case of systematic reviewing, the algorithm(s) are interactively optimized for finding the most relevant records, instead of finding the most accurate model. The term researcher-in-the-loop was introduced 20 as a special case of human-in-the-loop with three unique components: (1) the primary output of the process is a selection of the records, not a trained machine learning model; (2) all records in the relevant selection are seen by a human at the end of the process 21 ; and (3) the use case requires a reproducible workflow and complete transparency 22 .

Existing tools that implement such an active learning cycle for systematic reviewing are described in Table 1 ; see the Supplementary Information for an overview of all of the software that we considered (note that this list was based on a review of software tools 12 ). However, existing tools have two main drawbacks. First, many are closed source applications with black box algorithms, which is problematic as transparency and data ownership are essential in the era of open science 22 . Second, to our knowledge, existing tools lack the necessary flexibility to deal with the large range of possible concepts to be learned by a screening machine. For example, in systematic reviews, the optimal type of classifier will depend on variable parameters, such as the proportion of relevant publications in the initial search and the complexity of the inclusion criteria used by the researcher 23 . For this reason, any successful system must allow for a wide range of classifier types. Benchmark testing is crucial to understand the real-world performance of any machine learning-aided system, but such benchmark options are currently mostly lacking.

In this paper we present an open source machine learning-aided pipeline with active learning for systematic reviews called ASReview. The goal of ASReview is to help scholars and practitioners to get an overview of the most relevant records for their work as efficiently as possible while being transparent in the process. The open, free and ready-to-use software ASReview addresses all concerns mentioned above: it is open source, uses active learning and allows multiple machine learning models. It also has a benchmark mode, which is especially useful for comparing and designing algorithms. Furthermore, it is intended to be easily extensible, allowing third parties to add modules that enhance the pipeline. Although this paper focuses on systematic reviews, ASReview can handle any text source.

In what follows, we first present the pipeline for manual versus machine learning-aided systematic reviews. We then show how ASReview has been set up and how ASReview can be used in different workflows by presenting several real-world use cases. We subsequently demonstrate the results of simulations that benchmark performance and present the results of a series of user-experience tests. Finally, we discuss future directions.

Pipeline for manual and machine learning-aided systematic reviews

The pipeline of a systematic review without active learning traditionally starts with researchers doing a comprehensive search in multiple databases 24 , using free text words as well as controlled vocabulary to retrieve potentially relevant references. The researcher then typically verifies that the key papers they expect to find are indeed included in the search results. The researcher downloads a file of records containing the text to be screened into a reference manager; in the case of systematic reviewing, this file contains the titles and abstracts (and potentially other metadata such as the authors’ names, journal name and DOI) of potentially relevant references. Ideally, two or more researchers then screen the records’ titles and abstracts on the basis of the eligibility criteria established beforehand 4 . After all records have been screened, the full texts of the potentially relevant records are read to determine which of them will be ultimately included in the review. Most records are excluded in the title and abstract phase. Typically, only a small fraction of the records belong to the relevant class, making title and abstract screening an important bottleneck in the systematic reviewing process 25 . For instance, a recent study analysed 10,115 records and excluded 9,847 after title and abstract screening, a drop of more than 95% 26 . ASReview therefore focuses on this labour-intensive step.

The research pipeline of ASReview is depicted in Fig. 1 . The researcher starts with a search exactly as described above and subsequently uploads a file containing the records (that is, metadata containing the text of the titles and abstracts) into the software. Prior knowledge is then selected, which is used for training the first model and for presenting the first record to the researcher. As screening is a binary classification problem, the reviewer must select at least one key record to include and at least one to exclude on the basis of background knowledge. More prior knowledge may result in improved efficiency of the active learning process.

Figure 1: The symbols indicate whether the action is taken by a human, a computer, or whether both options are available.

A machine learning classifier is trained to predict study relevance (labels) from a representation of the record-containing text (feature space) on the basis of prior knowledge. We have purposefully chosen not to include an author name or citation network representation in the feature space to prevent authority bias in the inclusions. In the active learning cycle, the software presents one new record to be screened and labelled by the user. The user’s binary label (1 for relevant versus 0 for irrelevant) is subsequently used to train a new model, after which a new record is presented to the user. This cycle continues until a user-specified stopping criterion has been reached. The user then has a file with (1) records labelled as either relevant or irrelevant and (2) unlabelled records ordered from most to least probable to be relevant, as predicted by the current model. This set-up helps the user to move through a large database much more quickly than in the manual process, while the decision process remains transparent.
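The cycle described above can be sketched in a few lines with scikit-learn. This is an illustrative sketch rather than ASReview's internal code: ask_human stands in for the screening interface, the fixed query budget stands in for the user-specified stopping criterion, and the prior knowledge is assumed to contain at least one inclusion and one exclusion.

    # Illustrative certainty-based active-learning loop (not ASReview's own API).
    import numpy as np
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.naive_bayes import MultinomialNB

    def screen(texts, prior_labels, ask_human, max_queries=500):
        """texts: list of title+abstract strings; prior_labels: {index: 1 or 0} seed set
        containing at least one relevant and one irrelevant record."""
        X = TfidfVectorizer().fit_transform(texts)       # built once, reused every cycle
        labels = dict(prior_labels)
        for _ in range(max_queries):                     # placeholder stopping criterion
            idx = np.fromiter(labels, dtype=int)
            clf = MultinomialNB().fit(X[idx], [labels[i] for i in idx])
            proba = clf.predict_proba(X)[:, 1]           # probability of relevance
            unseen = [i for i in range(len(texts)) if i not in labels]
            if not unseen:
                break
            query = max(unseen, key=lambda i: proba[i])  # certainty-based: most likely relevant
            labels[query] = ask_human(texts[query])      # 1 = relevant, 0 = irrelevant
        ranking = sorted((i for i in range(len(texts)) if i not in labels),
                         key=lambda i: proba[i], reverse=True)
        return labels, ranking                           # labelled set + ranked remainder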

Software implementation for ASReview

The source code 27 of ASReview is available open source under an Apache 2.0 license, including documentation 28 . Compiled and packaged versions of the software are available on the Python Package Index 29 or Docker Hub 30 . The free and ready-to-use software ASReview implements oracle, simulation and exploration modes. The oracle mode is used to perform a systematic review with interaction by the user, the simulation mode is used for simulation of the ASReview performance on existing datasets, and the exploration mode can be used for teaching purposes and includes several preloaded labelled datasets.

The oracle mode presents records to the researcher, who classifies them. Multiple file formats are supported: (1) RIS files are used by digital libraries such as IEEE Xplore, Scopus and ScienceDirect; the citation managers Mendeley, RefWorks, Zotero and EndNote support the RIS format too. (2) Tabular datasets with the .csv, .xlsx and .xls file extensions. CSV files should be comma separated and UTF-8 encoded; for CSV files, the software accepts a set of predetermined labels in line with the ones used in RIS files. Each record in the dataset should hold the metadata of, for example, a scientific publication. The mandatory metadata are text fields, for example the titles or abstracts of scientific papers. If available, both are used to train the model, but at least one is needed. An advanced option is available that splits the titles and abstracts in the feature-extraction step and weights the two feature matrices independently (for TF–IDF only). Other metadata such as author, date, DOI and keywords are optional but not used for training the models. When using ASReview in the simulation or exploration mode, an additional binary variable is required to indicate historical labelling decisions. This column, which is automatically detected, can also be used in the oracle mode as background knowledge from a previous selection of relevant papers before entering the active learning cycle. If this column is unavailable, the user has to select at least one relevant record, which can be identified by searching the pool of records. At least one irrelevant record should also be identified; the software allows the user to search for specific records or presents random records, which are most likely to be irrelevant given the extremely imbalanced data.
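As an illustration of the tabular input described above, the sketch below loads a CSV file, checks that a title or abstract column is present, builds the text used for training and picks up an optional column with historical labelling decisions. It is not ASReview's own loader, and the column names 'title', 'abstract' and 'included' are assumptions made for the example.

    # Minimal sketch of preparing a tabular dataset (illustrative column names).
    import pandas as pd

    def load_records(path):
        df = pd.read_csv(path, encoding="utf-8")          # comma-separated, UTF-8
        if "title" not in df.columns and "abstract" not in df.columns:
            raise ValueError("At least a title or an abstract column is required.")
        title = df["title"].fillna("").astype(str) if "title" in df.columns else ""
        abstract = df["abstract"].fillna("").astype(str) if "abstract" in df.columns else ""
        text = (title + " " + abstract).str.strip()       # text actually used for training
        labels = df["included"] if "included" in df.columns else None  # optional prior decisions
        return text.tolist(), labels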

The software has a simple yet extensible default model: a naive Bayes classifier, TF–IDF feature extraction, a dynamic resampling balance strategy 31 and certainty-based sampling 17 , 32 for the query strategy. These defaults were chosen on the basis of their consistently high performance in benchmark experiments across several datasets 31 . Moreover, the low computation time of these default settings makes them attractive in applications, given that the software should be able to run locally. Users can change the settings, shown in Table 2 , and technical details are described in our documentation 28 . Users can also add their own classifiers, feature extraction techniques, query strategies and balance strategies.

ASReview has a number of implemented features (see Table 2 ). First, there are several classifiers available: (1) naive Bayes; (2) support vector machines; (3) logistic regression; (4) neural networks; (5) random forests; (6) LSTM-base, which consists of an embedding layer, an LSTM layer with one output, a dense layer and a single sigmoid output node; and (7) LSTM-pool, which consists of an embedding layer, an LSTM layer with many outputs, a max pooling layer and a single sigmoid output node. The feature extraction techniques available are Doc2Vec 33 , embedding LSTM, embedding with IDF or TF–IDF 34 (the default is unigram, with the option to run n-grams while other parameters are set to the defaults of Scikit-learn 35 ) and sBERT 36 . The available query strategies for the active learning part are (1) random selection, ignoring model-assigned probabilities; (2) uncertainty-based sampling, which chooses the most uncertain record according to the model (that is, closest to 0.5 probability); (3) certainty-based sampling (max in ASReview), which chooses the record most likely to be included according to the model; and (4) mixed sampling, which uses a combination of random and certainty-based sampling.
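The four query strategies amount to different rules for picking the next record from the model-assigned probabilities. The snippet below is an illustrative sketch, not ASReview's own implementation; the function name, the probability dictionary and the 95/5 mixing ratio are assumptions for the example.

    # Sketch of the query strategies, given relevance probabilities for unlabelled records.
    import random

    def select_next(proba_by_index, strategy="max", mix_ratio=0.95):
        """proba_by_index: dict mapping record index -> predicted probability of relevance."""
        if strategy == "random":
            return random.choice(list(proba_by_index))
        if strategy == "uncertainty":            # record the model is least sure about
            return min(proba_by_index, key=lambda i: abs(proba_by_index[i] - 0.5))
        if strategy == "max":                    # certainty-based: most likely relevant
            return max(proba_by_index, key=proba_by_index.get)
        if strategy == "mixed":                  # mostly certainty-based, occasionally random
            chosen = "max" if random.random() < mix_ratio else "random"
            return select_next(proba_by_index, strategy=chosen)
        raise ValueError(f"Unknown query strategy: {strategy}")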

There are several balance strategies that rebalance and reorder the training data. This is necessary because the data is typically extremely imbalanced; we have therefore implemented the following balance strategies: (1) full sampling, which uses all of the labelled records; (2) undersampling the irrelevant records so that the included and excluded records are in some particular ratio (closer to one); and (3) dynamic resampling, a novel method similar to undersampling in that it decreases the imbalance of the training data 31 . However, in dynamic resampling, the number of irrelevant records is decreased, whereas the number of relevant records is increased by duplication such that the total number of records in the training data remains the same. The ratio between relevant and irrelevant records is not fixed over iterations, but is dynamically updated depending on the number of labelled records, the total number of records and the ratio between relevant and irrelevant records. Details of all of the described algorithms can be found in the code and documentation referred to above.
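As a rough illustration of the dynamic resampling idea, the sketch below duplicates relevant records and subsamples irrelevant ones while keeping the size of the training set fixed. The target ratio rule is a simplification for the example, not the exact formula used by ASReview, and it assumes at least one relevant and one irrelevant record are available.

    # Rough sketch of rebalancing a training set while preserving its size.
    import random

    def dynamic_resample(relevant_idx, irrelevant_idx, target_ratio=0.5, seed=0):
        rng = random.Random(seed)
        total = len(relevant_idx) + len(irrelevant_idx)   # keep training-set size fixed
        n_rel = max(1, int(round(target_ratio * total)))
        n_irr = total - n_rel
        rel = [relevant_idx[i % len(relevant_idx)] for i in range(n_rel)]   # duplicate relevant
        irr = rng.sample(irrelevant_idx, min(n_irr, len(irrelevant_idx)))   # subsample irrelevant
        sample = rel + irr
        rng.shuffle(sample)
        return sample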

By default, ASReview converts the records’ texts into a document-term matrix; terms are converted to lowercase and no stop words are removed by default (but this can be changed). As the document-term matrix is identical in each iteration of the active learning cycle, it is generated in advance of model training and stored in the (active learning) state file. Each row of the document-term matrix can easily be requested from the state file. Records are internally identified by their row number in the input dataset. In oracle mode, the record that is selected to be classified is retrieved from the state file, and the record text and other metadata (such as title and abstract) are retrieved from the original dataset (from the file or the computer’s memory). ASReview can run on your local computer, or on a (self-hosted) local or remote server. Data (all records and their labels) remain on the user’s computer. Data ownership and confidentiality are crucial and no data are processed or used in any way by third parties. This is unique by comparison with some of the existing systems, as shown in the last column of Table 1 .

Real-world use cases and high-level function descriptions

Below we highlight a number of real-world use cases and high-level function descriptions for using the pipeline of ASReview.

ASReview can be integrated in classic systematic reviews or meta-analyses. Such reviews or meta-analyses entail several explicit and reproducible steps, as outlined in the PRISMA guidelines 4 . Scholars identify all likely relevant publications in a standardized way, screen retrieved publications to select eligible studies on the basis of defined eligibility criteria, extract data from eligible studies and synthesize the results. ASReview fits into this process, particularly in the abstract screening phase. ASReview does not replace the initial step of collecting all potentially relevant studies. As such, results from ASReview depend on the quality of the initial search process, including selection of databases 24 and construction of comprehensive searches using keywords and controlled vocabulary. However, ASReview can be used to broaden the scope of the search (by keyword expansion or by omitting limitations in the search query), resulting in a higher number of initial papers, which limits the risk of missing relevant papers during the search phase (that is, more focus on recall instead of precision).

Furthermore, many reviewers nowadays move towards meta-reviews when analysing very large literature streams, that is, systematic reviews of systematic reviews 37 . This can be problematic as the various reviews included could use different eligibility criteria and are therefore not always directly comparable. Due to the efficiency of ASReview, scholars using the tool could instead conduct such a study by analysing the primary papers directly rather than relying on the published systematic reviews. Furthermore, ASReview supports the rapid updating of a systematic review. The included papers from the initial review are used to train the machine learning model before screening of the updated set of papers starts. This allows the researcher to quickly screen the updated set of papers on the basis of decisions made in the initial run.

As an example case, let us look at the current literature on COVID-19 and the coronavirus. An enormous number of papers are being published on COVID-19. It is very time consuming to manually find relevant papers (for example, to develop treatment guidelines). This is especially problematic as urgent overviews are required. Medical guidelines rely on comprehensive systematic reviews, but the medical literature is growing at breakneck pace and the quality of the research is not universally adequate for summarization into policy 38 . Such reviews must entail adequate protocols with explicit and reproducible steps, including identifying all potentially relevant papers, extracting data from eligible studies, assessing potential for bias and synthesizing the results into medical guidelines. Researchers need to screen (tens of) thousands of COVID-19-related studies by hand to find relevant papers to include in their overview. Using ASReview, this can be done far more efficiently by selecting key papers that match their (COVID-19) research question in the first step; this should start the active learning cycle and lead to the most relevant COVID-19 papers for their research question being presented next. A plug-in was therefore developed for ASReview 39 , which contained three databases that are updated automatically whenever a new version is released by the owners of the data: (1) the CORD-19 database, developed by the Allen Institute for AI, covering publications on COVID-19 and other coronavirus research (for example, SARS and MERS) from PubMed Central, the WHO COVID-19 database of publications, the preprint servers bioRxiv and medRxiv, and papers contributed by specific publishers 40 . The CORD-19 dataset is updated daily by the Allen Institute for AI and is likewise updated daily in the plug-in. (2) In addition to the full dataset, we automatically construct a daily subset of the database with studies published after December 1st, 2019 to search for relevant papers published during the COVID-19 crisis. (3) A separate dataset of COVID-19 related preprints, containing metadata of preprints from over 15 preprint servers across disciplines, published since January 1st, 2020 41 . The preprint dataset is updated weekly by the maintainers and then automatically updated in ASReview as well. As this dataset is not readily available to researchers through regular search engines (for example, PubMed), its inclusion in ASReview provided added value to researchers interested in COVID-19 research, especially if they want a quick way to screen preprints specifically.

Simulation study

To evaluate the performance of ASReview on a labelled dataset, users can employ the simulation mode. As an example, we ran simulations based on four labelled datasets with version 0.7.2 of ASReview. All scripts to reproduce the results in this paper can be found on Zenodo ( https://doi.org/10.5281/zenodo.4024122 ) 42 , whereas the results are available at OSF ( https://doi.org/10.17605/OSF.IO/2JKD6 ) 43 .

First, we analysed the performance for a study systematically describing studies that performed viral metagenomic next-generation sequencing in common livestock such as cattle, small ruminants, poultry and pigs 44 . Studies were retrieved from Embase ( n  = 1,806), Medline ( n  = 1,384), Cochrane Central ( n  = 1), Web of Science ( n  = 977) and Google Scholar ( n  = 200, the top relevant references). After deduplication this led to 2,481 studies obtained in the initial search, of which 120 were inclusions (4.84%).

A second simulation study was performed on the results for a systematic review of studies on fault prediction in software engineering 45 . Studies were obtained from ACM Digital Library, IEEExplore and the ISI Web of Science. Furthermore, a snowballing strategy and a manual search were conducted, accumulating to 8,911 publications of which 104 were included in the systematic review (1.2%).

A third simulation study was performed on a review of longitudinal studies that applied unsupervised machine learning techniques to longitudinal data of self-reported symptoms of post-traumatic stress assessed after trauma exposure 46 , 47 ; 5,782 studies were obtained by searching PubMed, Embase, PsychInfo and Scopus and through a snowballing strategy in which both the references and the citations of the included papers were screened. Thirty-eight studies were included in the review (0.66%).

A fourth simulation study was performed on the results for a systematic review on the efficacy of angiotensin-converting enzyme inhibitors, from a study collecting various systematic review datasets from the medical sciences 15 . The collection is a subset of 2,544 publications from the TREC 2004 Genomics Track document corpus 48 . This is a static subset from all MEDLINE records from 1994 through 2003, which allows for replicability of results. Forty-one publications were included in the review (1.6%).

Performance metrics

We evaluated the four datasets using three performance metrics. We first assess the work saved over sampling (WSS), which is the percentage reduction in the number of records needed to screen that is achieved by using active learning instead of screening records at random; WSS is measured at a given level of recall of relevant records, for example 95%, indicating the work reduction in screening effort at the cost of failing to detect 5% of the relevant records. For some researchers it is essential that all relevant literature on the topic is retrieved; this entails that the recall should be 100% (that is, WSS@100%). We also propose the proportion of relevant references found after having screened the first 10% of the records (RRF10%). This is a useful metric for getting a quick overview of the relevant literature.
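To make the two metrics concrete, the sketch below computes WSS and RRF from the order in which a (simulated) screening run presented the records, using one common formalisation of WSS; the exact bookkeeping in the published simulations may differ slightly.

    # Sketch of WSS@recall and RRF@fraction from a screening order and true labels.
    import math

    def wss(labels_in_order, recall=0.95):
        """labels_in_order: true labels (1 = relevant, 0 = irrelevant) in screening order."""
        n, n_rel = len(labels_in_order), sum(labels_in_order)
        needed = max(1, math.ceil(recall * n_rel))        # relevant records that must be found
        found = 0
        for screened, label in enumerate(labels_in_order, start=1):
            found += label
            if found >= needed:
                # Proportion left unscreened, minus what random screening would leave at this recall.
                return (n - screened) / n - (1 - recall)
        return 0.0

    def rrf(labels_in_order, fraction=0.10):
        """Proportion of all relevant records found within the first `fraction` of the screening."""
        n_rel = sum(labels_in_order)
        cutoff = max(1, int(round(fraction * len(labels_in_order))))
        return sum(labels_in_order[:cutoff]) / n_rel if n_rel else 0.0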

For every dataset, 15 runs were performed with one random inclusion and one random exclusion (see Fig. 2 ). The classical review performance with randomly found inclusions is shown by the dashed line. The average work saved over sampling at 95% recall for ASReview is 83% and ranges from 67% to 92%. Hence, 95% of the eligible studies will be found after screening only 8% to 33% of the studies. Furthermore, the proportion of relevant abstracts found after reading 10% of the abstracts ranges from 70% to 100%. In short, our software would have saved many hours of work.

Figure 2: a–d, Results of the simulation studies for a study systematically reviewing studies that performed viral metagenomic next-generation sequencing in common livestock ( a ), a systematic review of studies on fault prediction in software engineering ( b ), longitudinal studies that applied unsupervised machine learning techniques to longitudinal data of self-reported symptoms of post-traumatic stress assessed after trauma exposure ( c ), and a systematic review on the efficacy of angiotensin-converting enzyme inhibitors ( d ). Fifteen runs (shown with separate lines) were performed for every dataset, with only one random inclusion and one random exclusion. The classical review performances with randomly found inclusions are shown by the dashed lines.

Usability testing (user experience testing)

We conducted a series of user experience tests to learn from end users how they experience the software and implement it in their workflow. The study was approved by the Ethics Committee of the Faculty of Social and Behavioral Sciences of Utrecht University (ID 20-104).

Unstructured interviews

The first user experience (UX) test—carried out in December 2019—was conducted with an academic research team in a substantive research field (public administration and organizational science) that has conducted various systematic reviews and meta-analyses. It was composed of three university professors (ranging from assistant to full) and three PhD candidates. In one 3.5 h session, the participants used the software and provided feedback via unstructured interviews and group discussions. The goal was to provide feedback on installing the software and testing the performance on their own data. After these sessions we prioritized the feedback in a meeting with the ASReview team, which resulted in the release of v.0.4 and v.0.6. An overview of all releases can be found on GitHub 27 .

A second UX test was conducted with four experienced researchers developing medical guidelines based on classical systematic reviews, and two experienced reviewers working at a pharmaceutical non-profit organization who work on updating reviews with new data. In four sessions, held in February to March 2020, these users tested the software following our testing protocol. After each session we implemented the feedback provided by the experts and asked them to review the software again. The main feedback was about how to upload datasets and select prior papers. Their feedback resulted in the release of v.0.7 and v.0.9.

Systematic UX test

In May 2020 we conducted a systematic UX test. Two groups of users were distinguished: an inexperienced group and a group of experienced users who had already used ASReview. Due to the COVID-19 lockdown, the usability tests were conducted via video calling, with one person giving instructions to the participant and one person observing, a set-up known as human-moderated remote testing 49 . During the tests, one person (SH) asked the questions and helped the participant with the tasks, while the other person (MH), a user experience professional at the IT department of Utrecht University, observed and took notes.

To analyse the notes, thematic analysis was used, a method that analyses data by dividing the information into themes, each with a distinct meaning 50 , using the NVivo 12 software 51 . When something went wrong, the text was coded as a showstopper; when something did not go smoothly, the text was coded as doubtful; and when something went well, the subject was coded as superb. The features the participants requested for future versions of the ASReview tool were discussed with the lead engineer of the ASReview team and were submitted to GitHub as issues or feature requests.

The answers to the quantitative questions can be found at the Open Science Framework 52 . The participants ( N  = 11) rated the tool with a grade of 7.9 (s.d. = 0.9) on a scale from one to ten (Table 2 ). The inexperienced users on average rated the tool with an 8.0 (s.d. = 1.1, N  = 6). The experienced users on average rated the tool with a 7.8 (s.d. = 0.9, N  = 5). The participants described the usability test with words such as helpful, accessible, fun, clear and obvious.

The UX tests resulted in the new releases v0.10 and v0.10.1 and the major release v0.11, which is a major revision of the graphical user interface. The documentation has been upgraded to make installing and launching ASReview more straightforward. We made setting up the project, selecting a dataset and finding past knowledge more intuitive and flexible. We also added a project dashboard with information on progress and advanced settings.

Continuous input via the open source community

Finally, the ASReview development team receives continuous feedback from the open science community about, among other things, the user experience. In every new release we implement features listed by our users. Recurring UX tests are performed to keep up with the needs of users and improve the value of the tool.

We designed a system to accelerate the step of screening titles and abstracts to help researchers conduct a systematic review or meta-analysis as efficiently and transparently as possible. Our system uses active learning to train a machine learning model that predicts relevance from texts using a limited number of labelled examples. The classifier, feature extraction technique, balance strategy and active learning query strategy are flexible. We provide an open source software implementation, ASReview, and compare it with state-of-the-art systems across a wide range of real-world systematic reviewing applications. Based on our experiments, ASReview provides defaults for its parameters, which exhibited good performance on average across the applications we examined. However, we stress that in practical applications these defaults should be carefully examined; for this purpose, the software provides a simulation mode to users. We encourage users and developers to perform further evaluation of the proposed approach in their application, and to take advantage of the open source nature of the project by contributing further developments.

Drawbacks of machine learning-based screening systems, including our own, remain. First, although the active learning step greatly reduces the number of manuscripts that must be screened, it also prevents a straightforward evaluation of the system’s error rates without further onerous labelling. Providing users with an accurate estimate of the system’s error rate in the application at hand is therefore a pressing open problem. Second, although, as argued above, the use of such systems is not limited in principle to reviewing, no empirical benchmarks of actual performance in these other situations yet exist to our knowledge. Third, machine learning-based screening systems automate the screening step only; although the screening step is time-consuming and a good target for automation, it is just one part of a much larger process, including the initial search, data extraction, coding for risk of bias, summarizing results and so on. Although some other works, similar to our own, have looked at (semi-)automating some of these steps in isolation 53 , 54 , to our knowledge the field is still far removed from an integrated system that would truly automate the review process while guaranteeing the quality of the produced evidence synthesis. Integrating the various tools that are currently under development to aid the systematic reviewing pipeline is therefore a worthwhile topic for future development.

Possible future research could also focus on the performance of identifying full-text articles with different document lengths and domain-specific terminologies, or even other types of text, such as newspaper articles and court cases. When the selection of past knowledge is not possible based on expert knowledge, alternative methods could be explored. For example, unsupervised learning or pseudolabelling algorithms could be used to improve training 55 , 56 . In addition, as the NLP community pushes forward the state of the art in feature extraction methods, these are easily added to our system as well. In all cases, performance benefits should be carefully evaluated using benchmarks for the task at hand. To this end, common benchmark challenges should be constructed that allow for an even comparison of the various tools now available. To facilitate such a benchmark, we have constructed a repository of publicly available systematic reviewing datasets 57 .

The future of systematic reviewing will be an interaction with machine learning algorithms to deal with the enormous increase of available text. We invite the community to contribute to open source projects such as our own, as well as to common benchmark challenges, so that we can provide measurable and reproducible improvement over current practice.

Data availability

The results described in this paper are available at the Open Science Framework ( https://doi.org/10.17605/OSF.IO/2JKD6 ) 43 . The answers to the quantitative questions of the UX test can be found at the Open Science Framework (OSF.IO/7PQNM) 52 .

Code availability

All code to reproduce the results described in this paper can be found on Zenodo ( https://doi.org/10.5281/zenodo.4024122 ) 42 . All code for the software ASReview is available under an Apache 2.0 license ( https://doi.org/10.5281/zenodo.3345592 ) 27 , is maintained on GitHub 63 and includes documentation ( https://doi.org/10.5281/zenodo.4287120 ) 28 .

Bornmann, L. & Mutz, R. Growth rates of modern science: a bibliometric analysis based on the number of publications and cited references. J. Assoc. Inf. Sci. Technol. 66 , 2215–2222 (2015).

Gough, D., Oliver, S. & Thomas, J. An Introduction to Systematic Reviews (Sage, 2017).

Cooper, H. Research Synthesis and Meta-analysis: A Step-by-Step Approach (SAGE Publications, 2015).

Liberati, A. et al. The PRISMA statement for reporting systematic reviews and meta-analyses of studies that evaluate health care interventions: explanation and elaboration. J. Clin. Epidemiol. 62 , e1–e34 (2009).

Boaz, A. et al. Systematic Reviews: What have They Got to Offer Evidence Based Policy and Practice? (ESRC UK Centre for Evidence Based Policy and Practice London, 2002).

Oliver, S., Dickson, K. & Bangpan, M. Systematic Reviews: Making Them Policy Relevant. A Briefing for Policy Makers and Systematic Reviewers (UCL Institute of Education, 2015).

Petticrew, M. Systematic reviews from astronomy to zoology: myths and misconceptions. Brit. Med. J. 322 , 98–101 (2001).

Lefebvre, C., Manheimer, E. & Glanville, J. in Cochrane Handbook for Systematic Reviews of Interventions (eds. Higgins, J. P. & Green, S.) 95–150 (John Wiley & Sons, 2008); https://doi.org/10.1002/9780470712184.ch6 .

Sampson, M., Tetzlaff, J. & Urquhart, C. Precision of healthcare systematic review searches in a cross-sectional sample. Res. Synth. Methods 2 , 119–125 (2011).

Wang, Z., Nayfeh, T., Tetzlaff, J., O’Blenis, P. & Murad, M. H. Error rates of human reviewers during abstract screening in systematic reviews. PLoS ONE 15 , e0227742 (2020).

Marshall, I. J. & Wallace, B. C. Toward systematic review automation: a practical guide to using machine learning tools in research synthesis. Syst. Rev. 8 , 163 (2019).

Harrison, H., Griffin, S. J., Kuhn, I. & Usher-Smith, J. A. Software tools to support title and abstract screening for systematic reviews in healthcare: an evaluation. BMC Med. Res. Methodol. 20 , 7 (2020).

O’Mara-Eves, A., Thomas, J., McNaught, J., Miwa, M. & Ananiadou, S. Using text mining for study identification in systematic reviews: a systematic review of current approaches. Syst. Rev. 4 , 5 (2015).

Wallace, B. C., Trikalinos, T. A., Lau, J., Brodley, C. & Schmid, C. H. Semi-automated screening of biomedical citations for systematic reviews. BMC Bioinf. 11 , 55 (2010).

Cohen, A. M., Hersh, W. R., Peterson, K. & Yen, P.-Y. Reducing workload in systematic review preparation using automated citation classification. J. Am. Med. Inform. Assoc. 13 , 206–219 (2006).

Kremer, J., Steenstrup Pedersen, K. & Igel, C. Active learning with support vector machines. WIREs Data Min. Knowl. Discov. 4 , 313–326 (2014).

Miwa, M., Thomas, J., O’Mara-Eves, A. & Ananiadou, S. Reducing systematic review workload through certainty-based screening. J. Biomed. Inform. 51 , 242–253 (2014).

Settles, B. Active Learning Literature Survey (Minds@UW, 2009); https://minds.wisconsin.edu/handle/1793/60660

Holzinger, A. Interactive machine learning for health informatics: when do we need the human-in-the-loop? Brain Inform. 3 , 119–131 (2016).

Van de Schoot, R. & De Bruin, J. Researcher-in-the-loop for Systematic Reviewing of Text Databases (Zenodo, 2020); https://doi.org/10.5281/zenodo.4013207

Kim, D., Seo, D., Cho, S. & Kang, P. Multi-co-training for document classification using various document representations: TF–IDF, LDA, and Doc2Vec. Inf. Sci. 477 , 15–29 (2019).

Nosek, B. A. et al. Promoting an open research culture. Science 348 , 1422–1425 (2015).

Kilicoglu, H., Demner-Fushman, D., Rindflesch, T. C., Wilczynski, N. L. & Haynes, R. B. Towards automatic recognition of scientifically rigorous clinical research evidence. J. Am. Med. Inform. Assoc. 16 , 25–31 (2009).

Gusenbauer, M. & Haddaway, N. R. Which academic search systems are suitable for systematic reviews or meta‐analyses? Evaluating retrieval qualities of Google Scholar, PubMed, and 26 other resources. Res. Synth. Methods 11 , 181–217 (2020).

Borah, R., Brown, A. W., Capers, P. L. & Kaiser, K. A. Analysis of the time and workers needed to conduct systematic reviews of medical interventions using data from the PROSPERO registry. BMJ Open 7 , e012545 (2017).

de Vries, H., Bekkers, V. & Tummers, L. Innovation in the Public Sector: a systematic review and future research agenda. Public Adm. 94 , 146–166 (2016).

Van de Schoot, R. et al. ASReview: Active Learning for Systematic Reviews (Zenodo, 2020); https://doi.org/10.5281/zenodo.3345592

De Bruin, J. et al. ASReview Software Documentation 0.14 (Zenodo, 2020); https://doi.org/10.5281/zenodo.4287120

ASReview PyPI Package (ASReview Core Development Team, 2020); https://pypi.org/project/asreview/

Docker container for ASReview (ASReview Core Development Team, 2020); https://hub.docker.com/r/asreview/asreview

Ferdinands, G. et al. Active Learning for Screening Prioritization in Systematic Reviews—A Simulation Study (OSF Preprints, 2020); https://doi.org/10.31219/osf.io/w6qbg

Fu, J. H. & Lee, S. L. Certainty-enhanced active learning for improving imbalanced data classification. In 2011 IEEE 11th International Conference on Data Mining Workshops 405–412 (IEEE, 2011).

Le, Q. V. & Mikolov, T. Distributed representations of sentences and documents. Preprint at https://arxiv.org/abs/1405.4053 (2014).

Ramos, J. Using TF–IDF to determine word relevance in document queries. In Proc. 1st Instructional Conference on Machine Learning Vol. 242, 133–142 (ICML, 2003).

Pedregosa, F. et al. Scikit-learn: machine learning in Python. J. Mach. Learn. Res. 12 , 2825–2830 (2011).

Reimers, N. & Gurevych, I. Sentence-BERT: sentence embeddings using siamese BERT-networks. Preprint at https://arxiv.org/abs/1908.10084 (2019).

Smith, V., Devane, D., Begley, C. M. & Clarke, M. Methodology in conducting a systematic review of systematic reviews of healthcare interventions. BMC Med. Res. Methodol. 11 , 15 (2011).

Wynants, L. et al. Prediction models for diagnosis and prognosis of COVID-19: systematic review and critical appraisal. Brit. Med. J . 369 , 1328 (2020).

Van de Schoot, R. et al. Extension for COVID-19 Related Datasets in ASReview (Zenodo, 2020). https://doi.org/10.5281/zenodo.3891420 .

Lu Wang, L. et al. CORD-19: The COVID-19 open research dataset. Preprint at https://arxiv.org/abs/2004.10706 (2020).

Fraser, N. & Kramer, B. Covid19_preprints (FigShare, 2020); https://doi.org/10.6084/m9.figshare.12033672.v18

Ferdinands, G., Schram, R., Van de Schoot, R. & De Bruin, J. Scripts for ‘ASReview: Open Source Software for Efficient and Transparent Active Learning for Systematic Reviews’ (Zenodo, 2020); https://doi.org/10.5281/zenodo.4024122

Ferdinands, G., Schram, R., van de Schoot, R. & de Bruin, J. Results for ‘ASReview: Open Source Software for Efficient and Transparent Active Learning for Systematic Reviews’ (OSF, 2020); https://doi.org/10.17605/OSF.IO/2JKD6

Kwok, K. T. T., Nieuwenhuijse, D. F., Phan, M. V. T. & Koopmans, M. P. G. Virus metagenomics in farm animals: a systematic review. Viruses 12 , 107 (2020).

Hall, T., Beecham, S., Bowes, D., Gray, D. & Counsell, S. A systematic literature review on fault prediction performance in software engineering. IEEE Trans. Softw. Eng. 38 , 1276–1304 (2012).

van de Schoot, R., Sijbrandij, M., Winter, S. D., Depaoli, S. & Vermunt, J. K. The GRoLTS-Checklist: guidelines for reporting on latent trajectory studies. Struct. Equ. Model. Multidiscip. J. 24 , 451–467 (2017).

van de Schoot, R. et al. Bayesian PTSD-trajectory analysis with informed priors based on a systematic literature search and expert elicitation. Multivar. Behav. Res. 53 , 267–291 (2018).

Cohen, A. M., Bhupatiraju, R. T. & Hersh, W. R. Feature generation, feature selection, classifiers, and conceptual drift for biomedical document triage. In Proc. 13th Text Retrieval Conference (TREC, 2004).

Vasalou, A., Ng, B. D., Wiemer-Hastings, P. & Oshlyansky, L. Human-moderated remote user testing: protocols and applications. In 8th ERCIM Workshop, User Interfaces for All Vol. 19 (ERCIM, 2004).

Joffe, H. in Qualitative Research Methods in Mental Health and Psychotherapy: A Guide for Students and Practitioners (eds Harper, D. & Thompson, A. R.) Ch. 15 (Wiley, 2012).

NVivo v. 12 (QSR International Pty, 2019).

Hindriks, S., Huijts, M. & van de Schoot, R. Data for UX-test ASReview - June 2020. OSF https://doi.org/10.17605/OSF.IO/7PQNM (2020).

Marshall, I. J., Kuiper, J. & Wallace, B. C. RobotReviewer: evaluation of a system for automatically assessing bias in clinical trials. J. Am. Med. Inform. Assoc. 23 , 193–201 (2016).

Nallapati, R., Zhou, B., dos Santos, C. N., Gulcehre, Ç. & Xiang, B. Abstractive text summarization using sequence-to-sequence RNNs and beyond. In Proc. 20th SIGNLL Conference on Computational Natural Language Learning 280–290 (Association for Computational Linguistics, 2016).

Xie, Q., Dai, Z., Hovy, E., Luong, M.-T. & Le, Q. V. Unsupervised data augmentation for consistency training. Preprint at https://arxiv.org/abs/1904.12848 (2019).

Ratner, A. et al. Snorkel: rapid training data creation with weak supervision. VLDB J. 29 , 709–730 (2020).

Systematic Review Datasets (ASReview Core Development Team, 2020); https://github.com/asreview/systematic-review-datasets

Wallace, B. C., Small, K., Brodley, C. E., Lau, J. & Trikalinos, T. A. Deploying an interactive machine learning system in an evidence-based practice center: Abstrackr. In Proc. 2nd ACM SIGHIT International Health Informatics Symposium 819–824 (Association for Computing Machinery, 2012).

Cheng, S. H. et al. Using machine learning to advance synthesis and use of conservation and environmental evidence. Conserv. Biol. 32 , 762–764 (2018).

Yu, Z., Kraft, N. & Menzies, T. Finding better active learners for faster literature reviews. Empir. Softw. Eng . 23 , 3161–3186 (2018).

Ouzzani, M., Hammady, H., Fedorowicz, Z. & Elmagarmid, A. Rayyan—a web and mobile app for systematic reviews. Syst. Rev. 5 , 210 (2016).

Przybyła, P. et al. Prioritising references for systematic reviews with RobotAnalyst: a user study. Res. Synth. Methods 9 , 470–488 (2018).

ASReview: Active learning for Systematic Reviews (ASReview Core Development Team, 2020); https://github.com/asreview/asreview

Acknowledgements

We would like to thank the Utrecht University Library, focus area Applied Data Science, and departments of Information and Technology Services, Test and Quality Services, and Methodology and Statistics, for their support. We also want to thank all researchers who shared data, participated in our user experience tests or who gave us feedback on ASReview in other ways. Furthermore, we would like to thank the editors and reviewers for providing constructive feedback. This project was funded by the Innovation Fund for IT in Research Projects, Utrecht University, the Netherlands.

Author information

Authors and Affiliations

Department of Methodology and Statistics, Faculty of Social and Behavioral Sciences, Utrecht University, Utrecht, the Netherlands

Rens van de Schoot, Gerbrich Ferdinands, Albert Harkema, Joukje Willemsen, Yongchao Ma, Qixiang Fang, Sybren Hindriks & Daniel L. Oberski

Department of Research and Data Management Services, Information Technology Services, Utrecht University, Utrecht, the Netherlands

Jonathan de Bruin, Raoul Schram, Parisa Zahedi & Maarten Hoogerwerf

Utrecht University Library, Utrecht University, Utrecht, the Netherlands

Jan de Boer, Felix Weijdema & Bianca Kramer

Department of Test and Quality Services, Information Technology Services, Utrecht University, Utrecht, the Netherlands

Martijn Huijts

School of Governance, Faculty of Law, Economics and Governance, Utrecht University, Utrecht, the Netherlands

Lars Tummers

Department of Biostatistics, Data management and Data Science, Julius Center, University Medical Center Utrecht, Utrecht, the Netherlands

Daniel L. Oberski

Contributions

R.v.d.S. and D.O. originally designed the project, with later input from L.T. J.d.Br. is the lead engineer, software architect and supervises the code base on GitHub. R.S. coded the algorithms and simulation studies. P.Z. coded the very first version of the software. J.d.Bo., F.W. and B.K. developed the systematic review pipeline. M.Huijts is leading the UX tests and was supported by S.H. M.Hoogerwerf developed the architecture of the produced (meta)data. G.F. conducted the simulation study together with R.S. A.H. performed the literature search comparing the different tools together with G.F. J.W. designed all the artwork and helped with formatting the manuscript. Y.M. and Q.F. are responsible for the preprocessing of the metadata under the supervision of J.d.Br. R.v.d.S, D.O. and L.T. wrote the paper with input from all authors. Each co-author has written parts of the manuscript.

Corresponding author

Correspondence to Rens van de Schoot .

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Peer review information Nature Machine Intelligence thanks Jian Wu and the other, anonymous, reviewer(s) for their contribution to the peer review of this work.

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

Overview of software tools supporting systematic reviews.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/ .

About this article

Cite this article

van de Schoot, R., de Bruin, J., Schram, R. et al. An open source machine learning framework for efficient and transparent systematic reviews. Nat Mach Intell 3 , 125–133 (2021). https://doi.org/10.1038/s42256-020-00287-7

Received : 04 June 2020

Accepted : 17 December 2020

Published : 01 February 2021

Issue Date : February 2021

DOI : https://doi.org/10.1038/s42256-020-00287-7

  • Research article
  • Open access
  • Published: 15 February 2021

Systematic literature review of machine learning methods used in the analysis of real-world data for patient-provider decision making

  • Alan Brnabic 1 &
  • Lisa M. Hess   ORCID: orcid.org/0000-0003-3631-3941 2  

BMC Medical Informatics and Decision Making volume  21 , Article number:  54 ( 2021 ) Cite this article

27k Accesses

44 Citations

3 Altmetric

Metrics details

Machine learning is a broad term encompassing a number of methods that allow the investigator to learn from the data. These methods may permit large real-world databases to be more rapidly translated to applications to inform patient-provider decision making.

This systematic literature review was conducted to identify published observational research that employed machine learning to inform decision making at the patient-provider level. The search strategy was implemented and studies meeting eligibility criteria were evaluated by two independent reviewers. Relevant data related to study design, statistical methods, and strengths and limitations were identified; study quality was assessed using a modified version of the Luo checklist.

A total of 34 publications from January 2014 to September 2020 were identified and evaluated for this review. There were diverse methods, statistical packages and approaches used across identified studies. The most common methods included decision tree and random forest approaches. Most studies applied internal validation but only two conducted external validation. Most studies utilized one algorithm, and only eight studies applied multiple machine learning algorithms to the data. Seven items on the Luo checklist failed to be met by more than 50% of published studies.

Conclusions

A wide variety of approaches, algorithms, statistical software and validation strategies were employed in the application of machine learning methods to inform patient-provider decision making. There is a need to ensure that multiple machine learning approaches are used, that the model selection strategy is clearly defined, and that both internal and external validation are performed, so that decisions for patient care are made with the highest quality evidence. Future work should routinely employ ensemble methods incorporating multiple machine learning algorithms.

Peer Review reports

Traditional methods of analyzing large real-world databases (big data) and other observational studies are focused on outcomes that inform at the population level. The findings from real-world studies are relevant to populations as a whole, but the ability to predict or provide meaningful evidence at the patient level is much less well established, owing to the complexity of clinical decision making and the variety of factors taken into account by the health care provider [ 1 , 2 ]. Using traditional methods that produce population estimates and measures of variability, it is very challenging to accurately predict how any one patient will fare, even when applying findings from subgroup analyses. The care of patients is nuanced, and multiple non-linear, interconnected factors must be taken into account in decision making. When the available data are relevant only at the population level, health care decision making is less informed as to the optimal course of care for a given patient.

Clinical prediction models are an approach to utilizing patient-level evidence to help inform healthcare decision makers about patient care. These models are also known as prediction rules or prognostic models and have been used for decades by health care professionals [ 3 ]. Traditionally, these models combine patient demographic, clinical and treatment characteristics in the form of a statistical or mathematical model, usually regression, classification or neural networks, but deal with a limited number of predictor variables (usually below 25). The Framingham Heart Study is a classic example of the use of longitudinal data to build a traditional decision-making model. Multiple risk calculators and estimators have been built to predict a patient’s risk of a variety of cardiovascular outcomes, such as atrial fibrillation and coronary heart disease [ 4 , 5 , 6 ]. In general, these studies use multivariable regression evaluating risk factors identified in the literature. Based on these findings, a scoring system is derived for each factor to predict the likelihood of an adverse outcome based on a patient’s score across all risk factors evaluated.
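
As a purely illustrative example of this traditional workflow (and emphatically not the actual Framingham algorithm), the sketch below fits a multivariable logistic regression to simulated risk-factor data and rescales the coefficients into a simple integer point score of the kind used by such risk calculators. All variable names, coefficients and data are assumptions.

```python
# Toy illustration of building a traditional prediction rule: fit a
# multivariable regression on simulated risk factors, then convert the
# coefficients into a simple integer point score.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 1000
X = np.column_stack([
    rng.normal(55, 10, n),    # age (years)
    rng.normal(130, 15, n),   # systolic blood pressure (mmHg)
    rng.integers(0, 2, n),    # smoker (0/1)
])
logit = -10 + 0.08 * X[:, 0] + 0.03 * X[:, 1] + 0.7 * X[:, 2]  # assumed true model
y = rng.random(n) < 1 / (1 + np.exp(-logit))                   # simulated adverse outcome

model = LogisticRegression(max_iter=1000).fit(X, y)
# Scale coefficients so the weakest predictor is worth one point.
points = np.round(model.coef_[0] / np.abs(model.coef_[0]).min())
print(dict(zip(["age", "sbp", "smoker"], points)))
```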

With the advent of more complex data collection and readily available datasets for patients in routine clinical care, both sample sizes and the number of potential predictor variables (such as genomic data) can exceed the tens of thousands, establishing the need for alternative approaches that can rapidly process large amounts of information. Artificial intelligence (AI), and particularly machine learning methods (a subset of AI), is increasingly being utilized in clinical research for prediction models, pattern recognition and deep-learning techniques used to combine complex information, for example genomic and clinical data [ 7 , 8 , 9 ]. In the health care sciences, these methods are applied in place of a human expert to perform tasks that would otherwise take considerable time and expertise and would likely be prone to error. The underlying concept is that a machine learns by trial and error from the data itself, making predictions without a pre-defined set of decision rules. Put simply, machine learning can be understood as “learning from data” [ 8 ].

There are two types of learning from the data, unsupervised and supervised. Unsupervised learning is a type of machine learning algorithm used to draw inferences from datasets consisting of input data without labelled responses. The most common unsupervised learning method is cluster analysis, which is used for exploratory data analysis to find hidden patterns or grouping in data. Supervised learning involves making a prediction based on a set of pre-specified input and output variables. There are a number of statistical tools used for supervised learning. Some examples include traditional statistical prediction methods like regression models (e.g. regression splines, projection pursuit regression, penalized regression) that involve fitting a model to data, evaluating the fit and estimating parameters that are later used in a predictive equation. Other tools include tree-based methods (e.g. classification and regression trees [CART] and random forests), which successively partition a data set based on the relationships between predictor variables and a target (outcome) variable. Other examples include neural networks, discriminant functions and linear classifiers, support vector classifiers and machines. Often, predictive tools are built using various forms of model aggregation (or ensemble learning) that may combine models based on resampled or re-weighted data sets. These different types of models can be fitted to the same data using model averaging.
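
The contrast between the two learning modes can be illustrated with a brief sketch on synthetic data: an unsupervised clustering algorithm is given no outcome labels, whereas a supervised tree-based classifier is trained against a pre-specified target variable. The data and model choices below are assumptions for illustration only.

```python
# Minimal sketch, on synthetic data, of unsupervised versus supervised
# learning: clustering without outcome labels versus a tree-based classifier
# trained against a target variable.
from sklearn.datasets import make_classification
from sklearn.cluster import KMeans
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=500, n_features=10, random_state=1)

clusters = KMeans(n_clusters=2, n_init=10, random_state=1).fit_predict(X)  # unsupervised: finds structure
clf = RandomForestClassifier(random_state=1).fit(X, y)                     # supervised: predicts the outcome
print("clusters:", clusters[:5], "predictions:", clf.predict(X[:5]))
```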

Classical statistical regression methods used for prediction modeling are well understood in the statistical sciences and by the scientific community that employs them. These methods tend to be transparent and are usually hypothesis driven, but they can overlook complex associations and offer limited flexibility when a large number of variables is investigated. In addition, when using classic regression modeling, choosing the ‘right’ model is not straightforward. Non-traditional machine learning algorithms and approaches may overcome some of these limitations of classical regression models in this new era of big data, but they are not a complete solution, as they must be considered in the context of the limitations of the data used in the analysis [ 2 ].

While machine learning methods can be used for population-based models as well as to inform patient-provider decision making, it is important to note that the data, model and outputs used to inform the care of an individual patient must meet the highest standards of research quality, as the choice made will likely have an impact on both short- and long-term patient outcomes. While a range of uncertainty can be expected for population-based estimates, the risk of error for patient-level models must be minimized to ensure quality patient care. The risks and concerns of utilizing machine learning for individual patient decision making have been raised by ethicists [ 10 ]. These risks include, but are not limited to, the lack of transparency, limited data regarding the confidence of the findings, and the risk of reducing patient autonomy in choice by relying on data in a way that may foster a more paternalistic model of healthcare. These are all important and valid concerns, and therefore the use of machine learning for patient care must meet the highest standards to ensure that shared, and not simply informed, evidence-based decision making is supported by these methods.

A systematic literature review published in 2018 evaluated the statistical methods that have been used to enable large, real-world databases to be applied at the patient-provider level [ 11 ]. Briefly, that study identified a total of 115 articles, most commonly using logistic regression (n = 52, 45.2%), Cox regression (n = 24, 20.9%) and linear regression (n = 17, 14.8%). Interestingly, however, several studies were observed to utilize novel statistical approaches such as machine learning, recursive partitioning and the development of mathematical algorithms to predict patient outcomes. More recently, publications have emerged describing the use of Individualized Treatment Recommendation algorithms and Outcome Weighted Learning for personalized medicine using large observational databases [ 12 , 13 ]. Therefore, this systematic literature review was designed to pursue this observation further and to more comprehensively evaluate the use of machine learning methods to support patient-provider decision making, and to critically evaluate the strengths and weaknesses of these methods. For the purposes of this work, data supporting patient-provider decision making were defined as data that provide information specifically on a treatment or intervention choice; while both population-based and risk-estimator data are certainly valuable for patient care and decision making, this study was designed to evaluate data that would specifically inform a choice made by the patient with the provider. The overarching goal is to provide evidence of how large datasets can be used to inform decisions at the patient level using machine learning-based methods, and to evaluate the quality of such work to support informed decision making.

This study originated from a systematic literature review conducted in MEDLINE and PsycINFO; a refreshed search was conducted in September 2020 to obtain newer publications (Table 1 ). Eligible studies were those that analyzed prospective or retrospective observational data, reported quantitative results, and described statistical methods specifically applicable to patient-level decision making. Specifically, patient-level decision making referred to studies that provided data for or against a particular intervention at the patient level, so that the data could be used to inform decision making at the patient-provider level. Studies did not meet this criterion if only population-based estimates, mortality risk predictors, or satisfaction with care were evaluated. Additionally, studies designed to improve diagnostic tools and those evaluating health care system quality indicators did not meet the patient-provider decision-making criterion. Eligible statistical methods for this study were limited to machine learning-based approaches. Eligibility was assessed by two reviewers and any discrepancies were discussed; a third reviewer was available to serve as a tie breaker in case of differing opinions. The final set of eligible publications was then abstracted into a Microsoft Excel document. Study quality was evaluated using a modified Luo scale, which was developed specifically as a tool to standardize high-quality publication of machine learning models [ 14 ]. In the modified version used for this study, the optional items were removed and three terms were clarified: item 6 (define the prediction problem) was renamed “define the model,” item 7 (prepare data for model building) was renamed “model building and validation,” and item 8 (build the predictive model) was renamed “model selection,” to state more succinctly what was being evaluated under each criterion. Data were abstracted, and both the extracted data and the Luo checklist items were reviewed and verified by a second reviewer to ensure comprehensiveness and quality. In all cases of differences in eligibility assessment or data entry, the reviewers met and agreed on the final set of data to be included in the database for data synthesis, with a third reviewer used as a tie breaker in case of discrepancies. Data were summarized descriptively and qualitatively according to the following categories: publication and study characteristics; patient characteristics; statistical methodologies used, including statistical software packages; strengths and weaknesses; and interpretation of findings.

The search strategy was run on September 1, 2020 and identified a total of 34 publications that utilized machine learning methods for individual patient-level decision making (Fig.  1 ). The most common reason for study exclusion, as expected, was that the study did not meet the patient-level decision-making criterion. A summary of the characteristics of the eligible studies and the patient data is included in Table 2 . Most of the real-world data sources were retrospective databases or designs (n = 27, 79.4%), primarily utilizing electronic health records. Six analyses utilized prospective cohort studies and one utilized data from a cross-sectional study.

Fig. 1 PRISMA diagram of screening and study identification

General approaches to machine learning

The types of classification or prediction machine learning algorithms used are reported in Table 2 . These included decision tree/random forest analyses (19 studies) [ 15 , 16 , 17 , 18 , 19 , 20 , 21 , 22 , 23 , 24 , 25 , 26 , 27 , 28 , 29 , 30 , 31 , 32 , 33 ] and neural networks (19 studies) [ 24 , 25 , 26 , 27 , 28 , 29 , 30 , 32 , 34 , 35 , 36 , 37 , 38 , 39 , 40 , 41 , 42 , 43 , 44 ]. Other approaches included latent growth mixture modeling [ 45 ], support vector machine classifiers [ 46 ], LASSO regression [ 47 ], boosting methods [ 23 ] and a novel Bayesian approach [ 26 , 40 , 48 ]. Within these analytical approaches, a variety of methods were used to evaluate model fit, such as the Akaike Information Criterion, the Bayesian Information Criterion and the Lo-Mendell-Rubin likelihood ratio test [ 22 , 45 , 47 ]. While most studies reported the area under the curve (AUC) of receiver operating characteristic (ROC) curves (Table 3 ), analyses also included sensitivity/specificity [ 16 , 19 , 24 , 30 , 41 , 42 , 43 ], positive predictive value [ 21 , 26 , 32 , 38 , 40 , 41 , 42 , 43 ], and a variety of less common approaches such as the geometric mean [ 16 ], the Matthews correlation coefficient (which ranges from −1.0, completely erroneous information, to +1.0, perfect prediction) [ 46 ], defining true/false negatives/positives by means of a confusion matrix [ 17 ], calculating the root mean square error of the predicted versus original outcome profiles [ 37 ], and identifying the model with the best average performance in training and in cross-validation [ 36 ].
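
For reference, the sketch below shows how several of the performance measures reported by these studies (AUC, sensitivity, specificity, positive predictive value, the Matthews correlation coefficient and the confusion matrix) can be computed from a set of hypothetical predictions; the numbers are invented for illustration only.

```python
# Sketch of common discrimination measures computed on hypothetical
# predicted probabilities and true labels.
import numpy as np
from sklearn.metrics import roc_auc_score, confusion_matrix, matthews_corrcoef

y_true = np.array([0, 0, 1, 1, 1, 0, 1, 0, 1, 0])
y_prob = np.array([0.10, 0.40, 0.80, 0.70, 0.30, 0.20, 0.90, 0.60, 0.65, 0.05])
y_pred = (y_prob >= 0.5).astype(int)  # assumed 0.5 classification threshold

tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
print("AUC:", roc_auc_score(y_true, y_prob))
print("sensitivity:", tp / (tp + fn), "specificity:", tn / (tn + fp))
print("PPV:", tp / (tp + fp))
print("MCC:", matthews_corrcoef(y_true, y_pred))
```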

Statistical software packages

The statistical programs used to perform machine learning varied widely across these studies; no consistencies were observed (Table 2 ). As noted above, one study using decision tree analysis used Quinlan’s C5.0 decision tree algorithm [ 15 ], while a second used an earlier version of this program (C4.5) [ 20 ]. Other decision tree analyses utilized various versions of R [ 18 , 19 , 22 , 24 , 27 , 47 ], International Business Machines (IBM) Statistical Package for the Social Sciences (SPSS) [ 16 , 17 , 33 , 47 ], the Azure Machine Learning Platform [ 30 ], or models programmed in Python [ 23 , 25 , 46 ]. Artificial neural network analyses used Neural Designer [ 34 ] or Statistica V10 [ 35 ]. Six studies did not report the software used for analysis [ 21 , 31 , 32 , 37 , 41 , 42 ].

Families of machine learning algorithms

As also summarized in Table 2 , more than one third of all publications (n = 13, 38.2%) applied only one family of machine learning algorithms to model development [ 16 , 17 , 18 , 19 , 20 , 34 , 37 , 41 , 42 , 43 , 46 , 48 ], and only four studies utilized five or more methods [ 23 , 25 , 28 , 45 ]. One applied an ensemble of six different algorithms, with the software set to run 200 iterations [ 23 ], and another ran seven algorithms [ 45 ].

Internal and external validation

Evaluation of study publication quality identified the most common gap in publications as the lack of external validation, which was conducted by only two studies [ 15 , 20 ]. Seven studies predefined the success criteria for model performance [ 20 , 21 , 23 , 35 , 36 , 46 , 47 ], and five studies discussed the generalizability of the model [ 20 , 23 , 34 , 45 , 48 ]. Six studies [ 17 , 18 , 21 , 22 , 35 , 36 ] discussed the balance between model accuracy and model simplicity or interpretability, which was also a criterion of quality publication in the Luo scale [ 14 ]. The items on the checklist that were least frequently met are presented in Fig.  2 . The complete quality assessment evaluation for each item in the checklist is included in Additional file 1 : Table S1.

Fig. 2 Least frequently met study quality items, modified Luo scale [ 14 ]

There were a variety of approaches taken to validate the models developed (Table 3 ). Internal validation with splitting into a testing and validation dataset was performed in all studies. The cohort splitting approach was conducted in multiple ways, using a 2:1 split [ 26 ], 60/40 split [ 21 , 36 ], a 70/30 split [ 16 , 17 , 22 , 30 , 33 , 35 ], 75/25 split [ 27 , 40 ], 80/20 split [ 46 ], 90/10 split [ 25 , 29 ], splitting the data based on site of care [ 48 ], a 2/1/1 split for training, testing and validation [ 38 ], and splitting 60/20/20, where the third group was selected for model selection purposes prior to validation [ 34 ]. Nine studies did not specifically mention the form of splitting approach used [ 15 , 18 , 19 , 20 , 24 , 29 , 39 , 45 , 47 ], but most of those noted the use of k fold cross validation. One training set corresponded to 90% of the sample [ 23 ], whereas a second study was less clear, as input data were at the observation level with multiple observations per patient, and 3 of the 15 patients were included in the training set [ 37 ]. The remaining studies did not specifically state splitting the data into testing and validation samples, but most specified they performed five-fold cross validation (including one that generally mentioned cohort splitting) [ 18 , 45 ] or ten-fold cross validation strategies [ 15 , 19 , 20 , 28 ].
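
The two internal validation patterns described above, a single cohort split and k-fold cross-validation, are sketched below on synthetic data; the 70/30 split and the choice of a random forest are illustrative assumptions rather than a recommendation.

```python
# Sketch of internal validation: a single 70/30 cohort split and 5-fold
# cross-validation, both on synthetic data.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split, cross_val_score

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)

# 70/30 cohort split (training versus hold-out testing)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0)
clf = RandomForestClassifier(random_state=0).fit(X_train, y_train)
print("hold-out accuracy:", clf.score(X_test, y_test))

# 5-fold cross-validation on the full sample
scores = cross_val_score(RandomForestClassifier(random_state=0), X, y, cv=5, scoring="roc_auc")
print("5-fold AUC:", scores.mean())
```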

External validation was conducted in only two studies (5.9%). Hische and colleagues conducted a decision tree analysis designed to identify patients with impaired fasting glucose [ 20 ]. Their model was developed in a cohort of patients from the Berlin Potsdam Cohort Study (n = 1527) and was found to have a positive predictive value of 56.2% and a negative predictive value of 89.1%. The model was then tested on an independent sample from the Dresden Cohort (n = 1998) with a family history of type II diabetes; in external validation, the positive predictive value was 43.9% and the negative predictive value was 90.4% [ 20 ]. Toussi and colleagues conducted both internal and external validation of their decision tree analysis, which evaluated individual physician prescribing behaviors using a database of 463 patient electronic medical records [ 15 ]. For the internal validation step, the cross-validation option of Quinlan’s C5.0 decision tree learning algorithm was used because the study sample was too small to split into testing and validation samples, and external validation was conducted by comparing outcomes to published treatment guidelines. Unfortunately, they found little concordance between physician behavior and guidelines, potentially because the timing of the data did not match the period in which the guidelines were implemented, emphasizing the need for a contemporaneous external control [ 15 ].

Handling of missing values

Missing values were addressed in most studies (n = 21, 61.8%) in this review, but the remaining thirteen studies did not mention whether there were missing data or how they were handled (Table 3 ). Among studies that reported methods related to missing data, a wide variety of approaches were used with real-world datasets. The full information maximum likelihood method was used to estimate model parameters in the presence of missing data in the model developed by Hertroijs and colleagues, but patients with missing covariate values at baseline were excluded from the validation of the model [ 45 ]. In another study, missing covariate values were included in models as a discrete category [ 48 ]. Four studies removed patients with missing data from the model [ 46 ], resulting in the loss of 16%-41% of samples in three of these studies [ 17 , 36 , 47 ]. In a study of diabetes, missing data on the primary outcome variables were reported for 59% of men and 70% of women [ 16 ]; in this study, single imputation was used, with CART (IBM SPSS Modeler V14.2.03) for continuous variables and the weighted K-Nearest Neighbor approach in RapidMiner (V.5) for categorical variables [ 16 ]. Other studies reported exclusion but not the specific impact on sample size [ 29 , 31 , 38 , 44 ]. Imputation was conducted in a variety of ways in studies with missing data [ 22 , 25 , 28 , 33 ]. Single imputation was used in the study by Bannister and colleagues, followed by multiple imputation in the final model to evaluate differences in model parameters [ 22 ]. One study imputed with a standard last-imputation-forward approach [ 26 ]. Spline techniques were used to impute missing data in the training set of one study [ 37 ]. Missingness was largely retained as an informative variable, with only variables missing for 85% or more of participants excluded, by Alaa et al. [ 23 ], while Hearn et al. used a combination of imputation and exclusion strategies [ 40 ]. Lastly, missing or incomplete data were imputed using a model-based approach by Toussi et al. [ 15 ] and using an optimal-impute algorithm by Bertsimas et al. [ 21 ].
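
The following brief sketch contrasts two of the missing data strategies reported above, complete-case exclusion and imputation (mean and k-nearest-neighbour), on a deliberately tiny, hypothetical array.

```python
# Sketch of missing data handling: complete-case exclusion versus mean and
# k-nearest-neighbour imputation on a small illustrative array.
import numpy as np
from sklearn.impute import SimpleImputer, KNNImputer

X = np.array([[1.0, 2.0],
              [np.nan, 3.0],
              [4.0, np.nan],
              [5.0, 6.0]])

complete_cases = X[~np.isnan(X).any(axis=1)]                   # exclusion: loses half the rows here
mean_imputed = SimpleImputer(strategy="mean").fit_transform(X) # single imputation with column means
knn_imputed = KNNImputer(n_neighbors=2).fit_transform(X)       # imputation from nearest neighbours
print(complete_cases.shape, mean_imputed, knn_imputed, sep="\n")
```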

Strengths and weaknesses noted by authors

The publications summarized the strengths and weaknesses of the machine learning methods they employed. The low complexity and simplicity of machine learning-based models were noted as strengths of this approach [ 15 , 20 ], and machine learning approaches were noted to be both powerful and efficient when applied to large datasets [ 19 ]. One study noted that parameters that were significant at the patient level were included even though, under traditional population-level regression model development, they would not have been significant and would therefore have been excluded [ 34 ]. Another publication noted that the value of machine learning is highly dependent on the model selection strategy and parameter optimization, and that machine learning in and of itself will not provide better estimates unless these steps are conducted properly [ 23 ].

Even when properly planned, machine learning approaches are not without issues that deserve attention in future studies employing these techniques. Within the eligible publications, reported weaknesses included overfitting the model through the inclusion of too much detail [ 15 ]. Additional limitations stem from the data sources used for machine learning, such as the lack of availability of all desired variables and missing data, both of which can affect the development and performance of these models [ 16 , 34 , 36 , 48 ]. The lack of all relevant variables was noted as a particular concern for retrospective database studies, where the investigator is limited to what has been recorded [ 26 , 28 , 29 , 38 , 40 ]. Importantly, and as observed in this review, the lack of external validation was also stated as a limitation within the included studies [ 28 , 30 , 38 , 42 ].

Limitations can also arise on the part of the research team, such as the need for both clinical and statistical expertise in the development and execution of studies using machine learning-based methodology; users are warned against applying these methods blindly [ 22 ]. The importance of including clinical and statistical experts in the research team was noted in one study and highlighted as a strength of that work [ 21 ].

This study systematically reviewed and summarized the methods and approaches used for machine learning as applied to observational datasets that can inform patient-provider decision making. Machine learning methods have been applied much more broadly across observational studies than in the context of individual decision making, so the summary of this work does not necessarily apply to all machine learning-based studies. The focus of this work is on an area that remains largely unexplored, which is how to use large datasets in a manner that can inform and improve patient care in a way that supports shared decision making with reliable evidence that is applicable to the individual patient. Multiple publications cite the limitations of using population-based estimates for individual decisions [ 49 , 50 , 51 ]. Specifically, a summary statistic at the population level does not apply to each person in that cohort. Population estimates represent a point on a potentially wide distribution, and any one patient could fall anywhere within that distribution and be far from the point estimate value. On the other extreme, case reports or case series provide very specific individual-level data, but are not generalizable to other patients [ 52 ]. This review and summary provides guidance and suggestions of best practices to improve and hopefully increase the use of these methods to provide data and models to inform patient-provider decision making.

It was common for a single modeling strategy to be employed within the identified publications. It has long been known that relying on a single estimation algorithm can produce a fair amount of uncertainty and variability [ 53 ]. To overcome this limitation, multiple algorithms and multiple iterations of the models need to be performed. This, combined with the more powerful analytics available in recent years, provides a new standard for machine learning algorithm choice and development. While in some cases a single model may fit the data well and provide an accurate answer, the certainty of the model can be supported through novel approaches such as model averaging [ 54 ]. Few studies in this review combined multiple families of modeling strategies with multiple iterations of the models. This should become a best practice in the future and is recommended as an additional criterion for assessing study quality among machine learning-based modeling [ 54 ].
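
As one possible way of operationalizing the use of multiple algorithm families, the sketch below averages a tree-based model, a regression model and a neural network via soft voting; the specific estimators and synthetic data are assumptions, and model averaging can equally be implemented with other ensembling schemes.

```python
# Sketch of combining several families of algorithms rather than relying on a
# single model, here via soft-voting model averaging on synthetic data.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.neural_network import MLPClassifier
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)

ensemble = VotingClassifier(
    estimators=[
        ("tree", RandomForestClassifier(random_state=0)),
        ("regression", LogisticRegression(max_iter=1000)),
        ("neural_net", MLPClassifier(max_iter=1000, random_state=0)),
    ],
    voting="soft",  # average predicted probabilities across families
)
print("5-fold AUC of the averaged model:",
      cross_val_score(ensemble, X, y, cv=5, scoring="roc_auc").mean())
```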

External validation is critical to ensure model accuracy, but it was rarely conducted in the publications included in this review. The reasons for this could be many, such as the lack of appropriate datasets or a lack of awareness of the importance of external validation [ 55 ]. As model development using machine learning increases, there is a need for external validation prior to the application of models in any patient-provider setting; the generalizability of models is largely unknown without these data. Publications that did not conduct external validation also did not note the need for it to be completed: generalizability was discussed in only five studies, one of which had also conducted the external validation, and of the remaining four studies, the need for future external validation was noted in only one [ 48 ]. Other reviews conducted more broadly to evaluate machine learning methods similarly found a low rate of external validation (6.6% versus 5.9% in this study) [ 56 ], and showed that prediction accuracy was lower under external validation than under cross-validation alone. The current review, with its focus on machine learning to support decision making at a practical level, suggests that external validation is an important gap that should be filled before these models are used for patient-provider decision making.
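
The distinction between internal and external validation can be sketched as follows: a model is developed on one cohort and its discrimination is then assessed on a separate cohort that played no role in development. Both cohorts below are simulated stand-ins; because they are generated independently, the external AUC will typically be close to chance, which illustrates how performance can degrade outside the development data.

```python
# Sketch of external validation: develop the model on one cohort and check
# its discrimination on an entirely separate (here simulated) cohort.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

X_dev, y_dev = make_classification(n_samples=1500, n_features=15, random_state=1)
X_ext, y_ext = make_classification(n_samples=800, n_features=15, random_state=2)  # stands in for an external cohort

model = LogisticRegression(max_iter=1000).fit(X_dev, y_dev)
print("external AUC:", roc_auc_score(y_ext, model.predict_proba(X_ext)[:, 1]))
```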

Luo and others suggest that k -fold validation may be used, with proper stratification of the response variable, as part of the model selection strategy [ 14 , 55 ]. The studies identified in this review generally conducted 5- or 10-fold validation. There is no formal rule for the selection of the value of k , which is typically based on the size of the dataset; as k increases, bias is reduced but variance increases. While this tradeoff has to be accounted for, k  = 5–10 has been found to be reasonable for most study purposes [ 57 ].

The evidence from identified publications suggests that the ethical concerns of lack of transparency and failure to report confidence in the findings are largely warranted. These limitations can be addressed through the use of multiple modeling approaches (to clarify the ‘black box’ nature of these approaches) and by including both external and high k-fold validation (to demonstrate the confidence in findings). To ensure these methods are used in a manner that improves patient care, the expectations of population-based risk prediction models of the past are no longer sufficient. It is essential that the right data, the right set of models, and appropriate validation are employed to ensure that the resulting data meet standards for high quality patient care.

This study did not evaluate the quality of the underlying real-world data used to develop, test or validate the algorithms. While not directly part of the evaluation in this review, researchers should be aware that all limitations of real-world data sources apply regardless of the methodology employed. When observational datasets are used for machine learning-based research, the investigator should be aware of the extent to which the methods they are using depend on the data structure and availability, and should evaluate a proposed data source to ensure it is appropriate for the machine learning project [ 45 ]. Importantly, databases should be evaluated to fully understand the variables included, as well as those variables that may have prognostic or predictive value but are not included in the dataset. The lack of important variables remains a concern with the use of retrospective databases for machine learning. Concerns about confounding (particularly unmeasured confounding), bias (including immortal time bias) and the patient selection criteria for inclusion in the database must also be evaluated [ 58 , 59 ]. These factors should be considered prior to implementing these methods, yet they are not always at the forefront of consideration when machine learning approaches are applied. The Luo checklist is a valuable tool to ensure that any machine learning study meets high research standards for patient care, and it importantly includes the evaluation of missing or potentially incorrect data (i.e. outliers) and generalizability [ 14 ]. This should be supplemented by a thorough evaluation of the potential data sources prior to implementation of the modeling work, and by ensuring that multiple modeling methods are applied.

This review found a wide variety of approaches, methods, statistical software and validation strategies employed in the application of machine learning methods to inform patient-provider decision making. Based on these findings, there is a need to ensure that multiple modeling approaches are employed in the development of machine learning-based models for patient care, which requires the highest research standards to reliably support shared, evidence-based decision making. Models should be evaluated against clear criteria for model selection, and both internal and external validation are needed before these models are applied to inform patient care. Few studies have yet reached that bar of evidence to inform patient-provider decision making.

Availability of data and materials

All data generated or analyzed during this study are included in this published article and its supplementary information files.

Abbreviations

AI: Artificial intelligence

AUC: Area under the curve

CART: Classification and regression trees

LASSO: Logistic least absolute shrinkage and selector operator

Steyerberg EW, Claggett B. Towards personalized therapy for multiple sclerosis: limitations of observational data. Brain. 2018;141(5):e38-e.

Fröhlich H, Balling R, Beerenwinkel N, Kohlbacher O, Kumar S, Lengauer T, et al. From hype to reality: data science enabling personalized medicine. BMC Med. 2018;16(1):150.

Steyerberg EW. Clinical prediction models. Berlin: Springer; 2019.

Schnabel RB, Sullivan LM, Levy D, Pencina MJ, Massaro JM, D’Agostino RB Sr, et al. Development of a risk score for atrial fibrillation (Framingham Heart Study): a community-based cohort study. Lancet. 2009;373(9665):739–45.

D’Agostino RB, Wolf PA, Belanger AJ, Kannel WB. Stroke risk profile: adjustment for antihypertensive medication. The Framingham Study. Stroke. 1994;25(1):40–3.

Framingham Heart Study: Risk Functions 2020. https://www.framinghamheartstudy.org/ .

Gawehn E, Hiss JA, Schneider G. Deep learning in drug discovery. Mol Inf. 2016;35:3–14.

Vamathevan J, Clark D, Czodrowski P, Dunham I, Ferran E, Lee G, et al. Applications of machine learning in drug discovery and development. Nat Rev Drug Discov. 2019;18(6):463–77.

Marcus G. Deep learning: a critical appraisal. arXiv preprint arXiv:1801.00631. 2018.

Grote T, Berens P. On the ethics of algorithmic decision-making in healthcare. J Med Ethics. 2020;46(3):205–11.

Brnabic A, Hess L, Carter GC, Robinson R, Araujo A, Swindle R. Methods used for the applicability of real-world data sources to individual patient decision making. Value Health. 2018;21:S102.

Fu H, Zhou J, Faries DE. Estimating optimal treatment regimes via subgroup identification in randomized control trials and observational studies. Stat Med. 2016;35(19):3285–302.

Liang M, Ye T, Fu H. Estimating individualized optimal combination therapies through outcome weighted deep learning algorithms. Stat Med. 2018;37(27):3869–86.

Luo W, Phung D, Tran T, Gupta S, Rana S, Karmakar C, et al. Guidelines for developing and reporting machine learning predictive models in biomedical research: a multidisciplinary view. J Med Internet Res. 2016;18(12):e323.

Toussi M, Lamy J-B, Le Toumelin P, Venot A. Using data mining techniques to explore physicians’ therapeutic decisions when clinical guidelines do not provide recommendations: methods and example for type 2 diabetes. BMC Med Inform Decis Mak. 2009;9(1):28.

Ramezankhani A, Hadavandi E, Pournik O, Shahrabi J, Azizi F, Hadaegh F. Decision tree-based modelling for identification of potential interactions between type 2 diabetes risk factors: a decade follow-up in a Middle East prospective cohort study. BMJ Open. 2016;6(12):e013336.

Pei D, Zhang C, Quan Y, Guo Q. Identification of potential type II diabetes in a Chinese population with a sensitive decision tree approach. J Diabetes Res. 2019;2019:4248218.

Neefjes EC, van der Vorst MJ, Verdegaal BA, Beekman AT, Berkhof J, Verheul HM. Identification of patients with cancer with a high risk to develop delirium. Cancer Med. 2017;6(8):1861–70.

Mubeen AM, Asaei A, Bachman AH, Sidtis JJ, Ardekani BA, Initiative AsDN. A six-month longitudinal evaluation significantly improves accuracy of predicting incipient Alzheimer’s disease in mild cognitive impairment. J Neuroradiol. 2017;44(6):381–7.

Hische M, Luis-Dominguez O, Pfeiffer AF, Schwarz PE, Selbig J, Spranger J. Decision trees as a simple-to-use and reliable tool to identify individuals with impaired glucose metabolism or type 2 diabetes mellitus. Eur J Endocrinol. 2010;163(4):565.

Bertsimas D, Dunn J, Pawlowski C, Silberholz J, Weinstein A, Zhuo YD, et al. Applied informatics decision support tool for mortality predictions in patients with cancer. JCO Clin Cancer Inform. 2018;2:1–11.

Bannister CA, Halcox JP, Currie CJ, Preece A, Spasic I. A genetic programming approach to development of clinical prediction models: a case study in symptomatic cardiovascular disease. PLoS ONE. 2018;13(9):e0202685.

Alaa AM, Bolton T, Di Angelantonio E, Rudd JHF, van der Schaar M. Cardiovascular disease risk prediction using automated machine learning: a prospective study of 423,604 UK Biobank participants. PLoS ONE. 2019;14(5):e0213653.

Baxter SL, Marks C, Kuo TT, Ohno-Machado L, Weinreb RN. Machine learning-based predictive modeling of surgical intervention in glaucoma using systemic data from electronic health records. Am J Ophthalmol. 2019;208:30–40.

Dong Y, Xu L, Fan Y, Xiang P, Gao X, Chen Y, et al. A novel surgical predictive model for Chinese Crohn’s disease patients. Medicine (Baltimore). 2019;98(46):e17510.

Hill NR, Ayoubkhani D, McEwan P, Sugrue DM, Farooqui U, Lister S, et al. Predicting atrial fibrillation in primary care using machine learning. PLoS ONE. 2019;14(11):e0224582.

Kang AR, Lee J, Jung W, Lee M, Park SY, Woo J, et al. Development of a prediction model for hypotension after induction of anesthesia using machine learning. PLoS ONE. 2020;15(4):e0231172.

Karhade AV, Ogink PT, Thio Q, Cha TD, Gormley WB, Hershman SH, et al. Development of machine learning algorithms for prediction of prolonged opioid prescription after surgery for lumbar disc herniation. Spine J. 2019;19(11):1764–71.

Kebede M, Zegeye DT, Zeleke BM. Predicting CD4 count changes among patients on antiretroviral treatment: Application of data mining techniques. Comput Methods Programs Biomed. 2017;152:149–57.

Kim I, Choi HJ, Ryu JM, Lee SK, Yu JH, Kim SW, et al. A predictive model for high/low risk group according to oncotype DX recurrence score using machine learning. Eur J Surg Oncol. 2019;45(2):134–40.

Kwon JM, Jeon KH, Kim HM, Kim MJ, Lim S, Kim KH, et al. Deep-learning-based out-of-hospital cardiac arrest prognostic system to predict clinical outcomes. Resuscitation. 2019;139:84–91.

Kwon JM, Lee Y, Lee Y, Lee S, Park J. An algorithm based on deep learning for predicting in-hospital cardiac arrest. J Am Heart Assoc. 2018;7(13):26.

Scheer JK, Smith JS, Schwab F, Lafage V, Shaffrey CI, Bess S, et al. Development of a preoperative predictive model for major complications following adult spinal deformity surgery. J Neurosurg Spine. 2017;26(6):736–43.

Lopez-de-Andres A, Hernandez-Barrera V, Lopez R, Martin-Junco P, Jimenez-Trujillo I, Alvaro-Meca A, et al. Predictors of in-hospital mortality following major lower extremity amputations in type 2 diabetic patients using artificial neural networks. BMC Med Res Methodol. 2016;16(1):160.

Rau H-H, Hsu C-Y, Lin Y-A, Atique S, Fuad A, Wei L-M, et al. Development of a web-based liver cancer prediction model for type II diabetes patients by using an artificial neural network. Comput Methods Programs Biomed. 2016;125:58–65.

Ng T, Chew L, Yap CW. A clinical decision support tool to predict survival in cancer patients beyond 120 days after palliative chemotherapy. J Palliat Med. 2012;15(8):863–9.

Pérez-Gandía C, Facchinetti A, Sparacino G, Cobelli C, Gómez E, Rigla M, et al. Artificial neural network algorithm for online glucose prediction from continuous glucose monitoring. Diabetes Technol Therapeut. 2010;12(1):81–8.

Azimi P, Mohammadi HR, Benzel EC, Shahzadi S, Azhari S. Use of artificial neural networks to decision making in patients with lumbar spinal canal stenosis. J Neurosurg Sci. 2017;61(6):603–11.

Bowman A, Rudolfer S, Weller P, Bland JDP. A prognostic model for the patient-reported outcome of surgical treatment of carpal tunnel syndrome. Muscle Nerve. 2018;58(6):784–9.

Hearn J, Ross HJ, Mueller B, Fan CP, Crowdy E, Duhamel J, et al. Neural networks for prognostication of patients with heart failure. Circ. 2018;11(8):e005193.


Isma’eel HA, Cremer PC, Khalaf S, Almedawar MM, Elhajj IH, Sakr GE, et al. Artificial neural network modeling enhances risk stratification and can reduce downstream testing for patients with suspected acute coronary syndromes, negative cardiac biomarkers, and normal ECGs. Int J Cardiovasc Imaging. 2016;32(4):687–96.

Isma’eel HA, Sakr GE, Serhan M, Lamaa N, Hakim A, Cremer PC, et al. Artificial neural network-based model enhances risk stratification and reduces non-invasive cardiac stress imaging compared to Diamond-Forrester and Morise risk assessment models: a prospective study. J Nucl Cardiol. 2018;25(5):1601–9.

Jovanovic P, Salkic NN, Zerem E. Artificial neural network predicts the need for therapeutic ERCP in patients with suspected choledocholithiasis. Gastrointest Endosc. 2014;80(2):260–8.

Zhou HF, Huang M, Ji JS, Zhu HD, Lu J, Guo JH, et al. Risk prediction for early biliary infection after percutaneous transhepatic biliary stent placement in malignant biliary obstruction. J Vasc Interv Radiol. 2019;30(8):1233-41.e1.

Hertroijs DF, Elissen AM, Brouwers MC, Schaper NC, Köhler S, Popa MC, et al. A risk score including body mass index, glycated haemoglobin and triglycerides predicts future glycaemic control in people with type 2 diabetes. Diabetes Obes Metab. 2018;20(3):681–8.

Oviedo S, Contreras I, Quiros C, Gimenez M, Conget I, Vehi J. Risk-based postprandial hypoglycemia forecasting using supervised learning. Int J Med Inf. 2019;126:1–8.

Khanji C, Lalonde L, Bareil C, Lussier MT, Perreault S, Schnitzer ME. Lasso regression for the prediction of intermediate outcomes related to cardiovascular disease prevention using the TRANSIT quality indicators. Med Care. 2019;57(1):63–72.

Anderson JP, Parikh JR, Shenfeld DK, Ivanov V, Marks C, Church BW, et al. Reverse engineering and evaluation of prediction models for progression to type 2 diabetes: an application of machine learning using electronic health records. J Diabetes Sci Technol. 2016;10(1):6–18.

Patsopoulos NA. A pragmatic view on pragmatic trials. Dialogues Clin Neurosci. 2011;13(2):217–24.

Lu CY. Observational studies: a review of study designs, challenges and strategies to reduce confounding. Int J Clin Pract. 2009;63(5):691–7.

Morgenstern H. Ecologic studies in epidemiology: concepts, principles, and methods. Annu Rev Public Health. 1995;16(1):61–81.

Vandenbroucke JP. In defense of case reports and case series. Ann Intern Med. 2001;134(4):330–4.

Buckland ST, Burnham KP, Augustin NH. Model selection: an integral part of inference. Biometrics. 1997;53:603–18.

Zagar A, Kadziola Z, Lipkovich I, Madigan D, Faries D. Evaluating bias control strategies in observational studies using frequentist model averaging. 2020 (submitted).

Kang J, Schwartz R, Flickinger J, Beriwal S. Machine learning approaches for predicting radiation therapy outcomes: a clinician’s perspective. Int J Radiat Oncol Biol Phys. 2015;93(5):1127–35.

Scott IM, Lin W, Liakata M, Wood J, Vermeer CP, Allaway D, et al. Merits of random forests emerge in evaluation of chemometric classifiers by external validation. Anal Chim Acta. 2013;801:22–33.

Kuhn M, Johnson K. Applied predictive modeling. Berlin: Springer; 2013.

Hess L, Winfree K, Muehlenbein C, Zhu Y, Oton A, Princic N. Debunking Myths While Understanding Limitations. Am J Public Health. 2020;110(5):E2-E.

Thesmar D, Sraer D, Pinheiro L, Dadson N, Veliche R, Greenberg P. Combining the power of artificial intelligence with the richness of healthcare claims data: Opportunities and challenges. PharmacoEconomics. 2019;37(6):745–52.


Acknowledgements

Not applicable.

Funding

No funding was received for the conduct of this study.

Author information

Authors and Affiliations

Eli Lilly and Company, Sydney, NSW, Australia

Alan Brnabic

Eli Lilly and Company, Indianapolis, IN, USA

Lisa M. Hess


Contributions

AB and LMH contributed to the design, implementation, analysis and interpretation of the data included in this study. AB and LMH wrote, revised and finalized the manuscript for submission. AB and LMH have both read and approved the final manuscript.

Corresponding author

Correspondence to Lisa M. Hess .

Ethics declarations

Ethics approval and consent to participate, consent for publication, competing interests.

Authors are employees of Eli Lilly and Company and receive salary support in that role.

Additional information

Publisher's note.

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

Additional file 1: Table S1. Study quality of eligible publications, modified Luo scale [14].


About this article

Cite this article.

Brnabic, A., Hess, L.M. Systematic literature review of machine learning methods used in the analysis of real-world data for patient-provider decision making. BMC Med Inform Decis Mak 21 , 54 (2021). https://doi.org/10.1186/s12911-021-01403-2


Received : 07 July 2020

Accepted : 20 January 2021

Published : 15 February 2021

DOI : https://doi.org/10.1186/s12911-021-01403-2


Keywords

  • Machine learning
  • Decision making
  • Decision tree
  • Random forest
  • Automated neural network



  • Open access
  • Published: 15 January 2022

Automation of literature screening using machine learning in medical evidence synthesis: a diagnostic test accuracy systematic review protocol

  • Yuelun Zhang 1   na1 ,
  • Siyu Liang 2   na1 ,
  • Yunying Feng 3   na1 ,
  • Qing Wang 4 ,
  • Feng Sun 5 ,
  • Shi Chen 2 ,
  • Yiying Yang 3 ,
  • Huijuan Zhu 2 &
  • Hui Pan 2  

Systematic Reviews volume  11 , Article number:  11 ( 2022 ) Cite this article

7836 Accesses

14 Citations

13 Altmetric

Metrics details

Systematic review is an indispensable tool for optimal evidence collection and evaluation in evidence-based medicine. However, the explosive increase in original publications makes it difficult to accomplish critical appraisal and regular updates. Artificial intelligence (AI) algorithms have been applied to automate the literature screening procedure in medical systematic reviews. These studies used different algorithms and reported results with great variance. It is therefore imperative to systematically review and analyse the automatic methods developed for literature screening and their effectiveness as reported in current studies.

An electronic search for automatic methods for literature screening in systematic reviews will be conducted using the PubMed, Embase, ACM Digital Library, and IEEE Xplore Digital Library databases, supplemented by a search in Google Scholar. Two reviewers will independently conduct the primary screening of the articles and the data extraction; disagreements will be resolved by discussion with a methodologist. Data will be extracted from eligible studies, including the basic characteristics of each study, information on the training and validation sets, and the function and performance of the AI algorithms, and summarised in a table. The risk of bias and applicability of the eligible studies will be assessed independently by the two reviewers based on the Quality Assessment of Diagnostic Accuracy Studies (QUADAS-2) tool. Quantitative analyses, if appropriate, will also be performed.

Automating the systematic review process can substantially reduce the workload of evidence-based practice. Results from this systematic review will provide an essential summary of the current development of AI algorithms for automatic literature screening in medical evidence synthesis and help to inspire further studies in this field.

Systematic review registration

PROSPERO CRD42020170815 (28 April 2020).

Peer Review reports

Systematic reviews synthesise the results of multiple original publications to provide clinicians with comprehensive knowledge and the current optimal evidence for answering specific research questions. The major steps of a systematic review are defining a structured review question, developing inclusion criteria, searching the databases, screening for relevant studies, collecting data from relevant studies, critically assessing the risk of bias, undertaking meta-analyses where appropriate, and assessing reporting biases [ 1 , 2 , 3 ]. A systematic review aims to provide a complete, exhaustive summary of the current literature relevant to a research question using an objective and transparent approach. In light of these characteristics, systematic reviews, in particular those combining high-quality evidence, which used to sit at the very top of the medical evidence pyramid [ 4 ] and are now regarded as an indispensable lens for viewing evidence [ 5 ], are widely used by reviewers in the practice of evidence-based medicine.

However, conducting systematic reviews for clinical decision making is time-consuming and labour-intensive, as reviewers must perform a thorough search to identify any publications that may be relevant, read through the abstracts of all retrieved records, and identify potential candidates for further full-text screening [ 6 ]. For original research articles, the median time from publication to first inclusion in a systematic review ranges from 2.5 to 6.5 years [ 7 ], and it usually takes over a year to publish a systematic review from the time of the literature search [ 8 ]. Yet with advances in clinical research, this evidence, and the systematic review conclusions it generates, may be out of date within a few years. With the explosive increase in original research articles, reviewers find it difficult to identify the most relevant evidence in time, let alone update systematic reviews periodically [ 9 ]. Researchers are therefore exploring automatic methods to improve the efficiency of evidence synthesis while reducing the workload of systematic reviews.

Recent progress in computer science points to a promising future in which more intelligent work can be accomplished with the aid of automatic technologies, such as pattern recognition and machine learning (ML). As a subset of artificial intelligence (AI), ML utilises algorithms to build mathematical models from training data in order to make predictions or decisions without being explicitly programmed [ 10 ]. ML has been applied across the medical field, for example in diagnosis, prognosis, genetic analysis, and drug screening, to support clinical decision making [ 11 , 12 , 13 , 14 ]. For systematic reviews specifically, models for automatic literature screening have been explored to reduce repetitive work and save reviewers' time [ 15 , 16 ].

To date, limited research has focused on the automatic methods used for biomedical literature screening in the systematic review process. Automated literature classification systems [ 17 ] and hybrid relevance rating models [ 18 ] have been tested on specific datasets, but broader review datasets and further performance improvements are still required. To address this gap in knowledge, this article describes the protocol for a systematic review aimed at summarising the existing automatic methods for screening relevant biomedical literature in the systematic review process and at evaluating the accuracy of these AI tools.

The primary objective of this review is to assess the diagnostic accuracy of AI algorithms (index test) compared with gold-standard human investigators (reference standard) for identifying relevant records among those retrieved by electronic search in a systematic review. The secondary objective is to describe the time and work saved by AI algorithms in literature screening. Additionally, we plan to conduct subgroup analyses to explore the potential factors associated with the accuracy of AI algorithms.

Study registration

We prepared this protocol following the Preferred Reporting Items for Systematic Review and Meta-Analysis Protocols (PRISMA-P) [ 19 ]. This systematic review has been registered on PROSPERO (Registration number: CRD42020170815, 28 April 2020).

Review question

Our review question was refined using the PRISMA-DTA framework, as detailed in Table 1 . In this systematic review, “literatures” refers to the records that are the subjects of the diagnostic test (the “participants” in Table 1 ), and “studies” refers to the studies included in our review.

Inclusion and exclusion criteria

We will include studies in medical research that report a structured study question, describe the source of the training or validation sets, develop or employ AI models for automatic literature screening, and use the screening results from human investigators as the reference standard.

We will exclude traditional clinical studies in human participants, editorials, commentaries, or other non-original reports. Pure methodological studies in AI algorithms without application in evidence synthesis will be excluded as well.

Information source and search strategy

An experienced methodologist will conduct searches in major public electronic medical and computer science databases, including PubMed, Embase, ACM Digital Library, and IEEE Xplore Digital Library, for publications from January 2000 to the present. We set this time range because, to the best of our knowledge, AI algorithms developed before 2000 are unlikely to be applicable to evidence synthesis [ 20 ]. In addition to the database search, we will identify further relevant studies by checking the reference lists of the included studies. Related abstracts and preprints will be searched in Google Scholar. No language restrictions will be applied. We will use free-text words, MeSH/EMTREE terms, IEEE Terms, INSPEC Terms, and the ACM Computing Classification System to develop strategies around three major concepts: systematic review, literature screening, and AI. Multiple synonyms for each concept will be incorporated into the search. The Systematic Review Toolbox ( http://systematicreviewtools.com/ ) will also be used to detect potential automation methods in medical research evidence synthesis. The detailed search strategy used in PubMed is shown in Supplementary Material 1.
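The protocol's actual strategy is given in Supplementary Material 1; purely as an illustration of the concept-block approach described above, the minimal Python sketch below joins synonyms with OR within each concept and combines the concepts with AND in PubMed-style syntax. The terms and the [tiab] field tags are illustrative choices, not the protocol's own search terms.

```python
# A minimal sketch (not the protocol's search strategy): OR the synonyms within
# each concept block, then AND the three concept blocks together, PubMed-style.
concepts = {
    "systematic review": ["systematic review[tiab]", "evidence synthesis[tiab]", "meta-analysis[tiab]"],
    "literature screening": ["literature screening[tiab]", "citation screening[tiab]", "study selection[tiab]"],
    "artificial intelligence": ["artificial intelligence[tiab]", "machine learning[tiab]",
                                "text mining[tiab]", "support vector machine[tiab]"],
}

def build_query(concept_blocks):
    """Join synonyms with OR inside each block, then join the blocks with AND."""
    blocks = ["(" + " OR ".join(terms) + ")" for terms in concept_blocks.values()]
    return " AND ".join(blocks)

print(build_query(concepts))
```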

Study selection

Records with titles and abstracts retrieved from the online electronic databases will be downloaded and imported into EndNote X9.3.2 (Thomson Reuters, Toronto, Ontario, Canada) for further processing after duplicates are removed.
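As a small aside on this step, the sketch below shows one simple way duplicates could be removed programmatically; it is a hedged illustration, not EndNote's deduplication logic, and the record fields ("doi", "title") are assumed for the example.

```python
# A minimal deduplication sketch: keep the first record per normalised DOI,
# falling back to a normalised title when no DOI is present. Illustrative only.
import re

def normalise(text):
    return re.sub(r"[^a-z0-9]+", " ", (text or "").lower()).strip()

def deduplicate(records):
    seen, unique = set(), []
    for rec in records:
        key = ("doi", normalise(rec.get("doi"))) if rec.get("doi") else ("title", normalise(rec.get("title")))
        if key not in seen:
            seen.add(key)
            unique.append(rec)
    return unique

records = [
    {"doi": "10.1000/x1", "title": "Automated citation screening"},
    {"doi": "10.1000/X1", "title": "Automated citation screening."},  # duplicate by DOI
    {"title": "A different record"},
]
print(len(deduplicate(records)))  # 2
```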

All studies will be screened independently by two authors based on titles and abstracts. Those that do not meet the inclusion criteria will be excluded, with specific reasons recorded. Disagreements will be resolved by discussion with a methodologist if necessary. After the initial screening, the full texts of potentially relevant studies will be independently reviewed by the two authors to make final inclusion decisions. Conflicts will be resolved in the same way as in the initial screening. Excluded studies will be listed and annotated according to the PRISMA-DTA flowchart.

Data collection

A data collection form will be used for information extraction. Data from the eligible studies will be independently extracted and verified by two investigators. Disagreements will be resolved through discussion and by consulting the original publication. We will also try to contact the authors to collect any missing data. If a study does not report detailed accuracy data, or does not provide enough data to calculate them, it will be omitted from the quantitative data synthesis.

The following data will be extracted from the original studies: study characteristics, information on the training and validation sets, and the function and performance of the AI algorithms. The definitions of the variables used in data extraction are shown in Table 2 .

Risk of bias assessment, applicability, and levels of evidence

Two authors will independently assess risk of bias and applicability with a checklist based on the Quality Assessment of Diagnostic Accuracy Studies (QUADAS-2) tool [ 21 ]. QUADAS-2 contains four domains, covering patient selection, index test, reference standard, and flow and timing. The risk of bias is classified as “low”, “high”, or “unclear”. Studies with a high risk of bias will be excluded in the sensitivity analysis.

In this systematic review, the “participants” are the records being screened rather than human subjects, and the index test is the AI model used for automatic literature screening. Therefore, we will slightly revise QUADAS-2 to fit our research context (Table 3 ). We deleted one signalling question from QUADAS-2: “Was there an appropriate interval between index test and reference standard?” The purpose of this question in the original version of QUADAS-2 is to judge the bias caused by a change in disease status between the index test and the reference test. The “disease status” in our context, that is, the final inclusion status of a record, cannot change between the two assessments, so this concern does not apply.

The level of the body of evidence will be evaluated using the Grading of Recommendations, Assessment, Development and Evaluations (GRADE) framework [ 22 ].

Diagnostic accuracy measures

For each study, we will extract the data needed to construct a two-by-two contingency table from the main text, the appendices, or by contacting the corresponding authors, and will collect sensitivity, specificity, precision, negative predictive value (NPV), positive predictive value (PPV), negative likelihood ratio (NLR), positive likelihood ratio (PLR), diagnostic odds ratio (DOR), F-measure, and accuracy with 95% CIs. If the outcomes cannot be formulated in a two-by-two contingency table, we will extract the reported performance data. Where possible, we will also assess the area under the curve (AUC), as the two-by-two contingency table may not be available in some scenarios.
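As a concrete illustration of these measures (leaving aside the 95% CIs), the sketch below computes them from a single two-by-two table in which "positive" means a record the algorithm flags as relevant and the human reviewers' decision is the reference standard; the counts are invented.

```python
# A minimal sketch of the accuracy measures listed above, from one 2x2 table.
def diagnostic_metrics(tp, fp, fn, tn):
    sens = tp / (tp + fn)                    # sensitivity (recall)
    spec = tn / (tn + fp)                    # specificity
    ppv  = tp / (tp + fp)                    # positive predictive value (precision)
    npv  = tn / (tn + fn)                    # negative predictive value
    plr  = sens / (1 - spec)                 # positive likelihood ratio
    nlr  = (1 - sens) / spec                 # negative likelihood ratio
    dor  = plr / nlr                         # diagnostic odds ratio
    f1   = 2 * ppv * sens / (ppv + sens)     # F-measure
    acc  = (tp + tn) / (tp + fp + fn + tn)   # overall accuracy
    return {"sensitivity": sens, "specificity": spec, "PPV": ppv, "NPV": npv,
            "PLR": plr, "NLR": nlr, "DOR": dor, "F1": f1, "accuracy": acc}

# Invented counts: 95 relevant records found, 5 missed, 300 irrelevant flagged,
# 2,600 irrelevant records correctly excluded.
print(diagnostic_metrics(tp=95, fp=300, fn=5, tn=2600))
```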

Qualitative and quantitative synthesis of results

We will qualitatively describe the application of AI in literature screening and evaluate and compare the accuracy of the AI tools. If adequate detail and sufficiently homogeneous data are available for quantitative meta-analysis, we will combine the accuracy of AI algorithms in literature screening using the random-effects Rutter-Gatsonis hierarchical summarised receiver operating characteristic curve (HSROC) model, which is recommended by the Cochrane Collaboration for combining evidence on diagnostic accuracy [ 23 ]. The model incorporates the effect of threshold, allowing thresholds to differ among studies. The combined point estimates of accuracy will be derived from the summarised receiver operating characteristic (ROC) curve.

Subgroup analyses and meta-regression will be used to explore between-study heterogeneity. We will explore the following predefined sources of heterogeneity: (1) AI algorithm type, (2) study area of the validation set (targeting specific diseases, interventions, or a general area), (3) electronic databases searched (PubMed, EMBASE, or others), and (4) the proportion of eligible to original studies (the number of eligible records identified in the screening step divided by the number of records identified during the electronic search). Furthermore, we will analyse possible sources of heterogeneity from both dataset and methodological perspectives by including them as covariates in the HSROC model, following the recommendations of the Cochrane Handbook for Diagnostic Test Accuracy Reviews [ 23 ]. We will regard a factor as a source of heterogeneity if the coefficient of the corresponding covariate in the HSROC model is statistically significant. We will not evaluate reporting bias (e.g. publication bias), since the assumptions underlying the commonly used methods, such as funnel plots or Egger’s test, may not hold in our research context. Data will be analysed using R software, version 4.0.2 (R Foundation for Statistical Computing, Vienna, Austria), with a two-tailed type I error probability of 0.05 ( α  = 0.05).
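The HSROC model itself is usually fitted in specialised statistical software; purely to make the idea of random-effects pooling concrete, and emphatically not as a substitute for the HSROC model specified above, the sketch below pools logit-transformed sensitivities across a few invented studies using DerSimonian-Laird weights.

```python
# A simplified univariate random-effects pooling of sensitivity on the logit
# scale (DerSimonian-Laird); NOT the Rutter-Gatsonis HSROC model in the protocol.
import math

def pool_logit_sensitivity(studies):
    """studies: list of (tp, fn) pairs; returns the pooled sensitivity."""
    y = [math.log((tp + 0.5) / (fn + 0.5)) for tp, fn in studies]   # logit sensitivity (0.5 continuity correction)
    v = [1 / (tp + 0.5) + 1 / (fn + 0.5) for tp, fn in studies]     # approximate within-study variance
    w = [1 / vi for vi in v]
    y_fixed = sum(wi * yi for wi, yi in zip(w, y)) / sum(w)
    q = sum(wi * (yi - y_fixed) ** 2 for wi, yi in zip(w, y))       # Cochran's Q
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - (len(studies) - 1)) / c)                   # between-study variance
    w_re = [1 / (vi + tau2) for vi in v]
    y_re = sum(wi * yi for wi, yi in zip(w_re, y)) / sum(w_re)
    return 1 / (1 + math.exp(-y_re))                                # back-transform to a proportion

print(pool_logit_sensitivity([(90, 10), (180, 20), (45, 15)]))      # invented study counts
```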

Systematic reviews have developed rapidly over the last decades and play a key role in enabling the spread of evidence-based practice. Although a systematic review costs less than primary research, it is still time-consuming and labour-intensive. Conducting a systematic review begins with an electronic database search for a specific research question, after which at least two reviewers read each abstract of the retrieved records to identify candidates for full-text screening. On average, only 2.9% of retrieved records are relevant and included in the final synthesis [ 24 ]; reviewers typically have to find the proverbial needle in a haystack of irrelevant titles and abstracts. Computational scientists have developed various algorithms for automatic literature screening. Developing an automatic literature screening instrument would save resources and improve the quality of systematic reviews by liberating reviewers from repetitive work. In this systematic review, we aim to describe and evaluate the development process and algorithms used in various AI literature screening systems, in order to build a pipeline for updating existing tools and creating new models.

The accuracy of automatic literature screening instruments varies widely across algorithms and review topics [ 17 ]. Automatic screening systems can reach a sensitivity as high as 95%, albeit at the expense of specificity, since reviewers aim to include every publication relevant to the review topic. Because the automatic systems may have low specificity, it is also important to evaluate how much reviewing work can be saved in the screening step. We will therefore not only assess the diagnostic accuracy of AI screening algorithms compared with human investigators, but also collect information on the work saved by AI algorithms in literature screening. Additionally, we plan to conduct subgroup analyses to identify potential factors associated with the accuracy and efficiency of AI algorithms.
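One workload measure often reported for screening automation is work saved over sampling at a fixed recall level (WSS@R), introduced in the citation-classification study by Cohen and colleagues listed in the references below; the sketch shows the standard formula WSS@R = (TN + FN)/N − (1 − R) with invented counts.

```python
# Work saved over sampling at recall R: the proportion of records reviewers can
# skip, relative to random sampling, while still keeping recall at R.
def wss(tn, fn, n_total, recall_level=0.95):
    return (tn + fn) / n_total - (1 - recall_level)

# Invented example: of 10,000 retrieved records, the classifier lets reviewers
# skip 6,000 true negatives and 50 false negatives at 95% recall.
print(wss(tn=6000, fn=50, n_total=10000))  # 0.555
```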

To the best of our knowledge, this will be the first systematic review to evaluate AI algorithms for automatic literature screening in evidence synthesis. Few systematic reviews have focused on the application of AI algorithms in medical practice, and the literature search strategies of previously published systematic reviews rarely use specific algorithms as search terms. Most of them use general terms such as “artificial intelligence” and “machine learning”, which may miss studies that report only one specific algorithm. To include as many AI-related studies as possible, our search strategy contains the AI algorithms commonly used in the past 50 years and was reviewed by an expert in ML. The process of literature screening can be assessed within the framework of a diagnostic test. Findings from this proposed systematic review will provide a comprehensive and essential summary of the application of AI algorithms for automatic literature screening in evidence synthesis. The review may also help to improve and promote automatic methods in evidence synthesis by locating and identifying potential weaknesses in current AI models and methods.

Availability of data and materials

The datasets used and analysed during the current study are available from the corresponding author on reasonable request.

Abbreviations

  • AI: Artificial intelligence
  • AUC: Area under the curve
  • DOR: Diagnostic odds ratio
  • GRADE: Grading of Recommendations, Assessment, Development and Evaluations
  • HSROC: Hierarchical summarised receiver operating characteristic curve
  • NLR: Negative likelihood ratio
  • NPV: Negative predictive value
  • PLR: Positive likelihood ratio
  • PPV: Positive predictive value
  • PRISMA-P: Preferred Reporting Items for Systematic Review and Meta-Analysis Protocols
  • QUADAS-2: Quality Assessment of Diagnostic Accuracy Studies
  • ROC: Receiver operating characteristic curve
  • SVM: Support vector machine

Higgins J, Thomas J, Chandler J, et al. Cochrane handbook for systematic reviews of interventions, version 6.0 (updated July 2019). Cochrane; 2019.


Mulrow CD, Cook D. Systematic reviews: synthesis of best evidence for health care decisions. ACP Press; 1998.

Armstrong R, Hall BJ, Doyle J, Waters E. ‘Scoping the scope’ of a cochrane review. J Public Health. 2011;33(1):147–50.


Paul M, Leibovici L. Systematic review or meta-analysis? Their place in the evidence hierarchy. Clin Microbiol Infect. 2014;20(2):97–100.


Murad MH, Asi N, Alsawas M, Alahdab F. New evidence pyramid. Evid Based Med. 2016;21(4):125.

Bigby M. Evidence-based medicine in a nutshell: a guide to finding and using the best evidence in caring for patients. Arch Dermatol. 1998;134(12):1609–18.


Bragge P, Clavisi O, Turner T, Tavender E, Collie A, Gruen RL. The global evidence mapping initiative: scoping research in broad topic areas. BMC Med Res Methodol. 2011;11(1):92.

Sampson M, Shojania KG, Garritty C, Horsley T, Ocampo M, Moher D. Systematic reviews can be produced and published faster. J Clin Epidemiol. 2008;61(6):531–6.

Shojania K, Sampson M, Ansari M, Ji J, Doucette S, Moher D. How quickly do systematic reviews go out of date? A survival analysis. Ann Intern Med. 2007;147(4):224–33.

Bishop CM. Pattern recognition and machine learning: Springer; 2006.

Wang L-Y, Chakraborty A, Comaniciu D. Molecular diagnosis and biomarker identification on SELDI proteomics data by ADTBoost method. Paper presented at: 2005 IEEE Engineering in Medicine and Biology 27th Annual Conference. 2006.

Cetin MS, Houck JM, Vergara VM, Miller RL, Calhoun V. Multimodal based classification of schizophrenia patients. Paper presented at: 2015 37th Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC). 2015.

Sun Y, Loparo K. Information extraction from free text in clinical trials with knowledge-based distant supervision. Paper presented at: 2019 IEEE 43rd Annual Computer Software and Applications Conference (COMPSAC). 2019.

Li M, Lu Y, Niu Z, Wu F-X. United complex centrality for identification of essential proteins from PPI networks. IEEE/ACM Transact Comput Biol Bioinform. 2015;14(2):370–80.

Whittington C, Feinman T, Lewis SZ, Lieberman G, Del Aguila M. Clinical practice guidelines: machine learning and natural language processing for automating the rapid identification and annotation of new evidence. J Clin Oncol. 2019;37.

Turner MD, Chakrabarti C, Jones TB, et al. Automated annotation of functional imaging experiments via multi-label classification. Front Neurosci. 2013;7:240.

Cohen AM, Hersh WR, Peterson K, Yen P-Y. Reducing workload in systematic review preparation using automated citation classification. J Am Med Inform Assoc. 2006;13(2):206–19.


Rúbio TR, Gulo CA. Enhancing academic literature review through relevance recommendation: using bibliometric and text-based features for classification. Paper presented at: 2016 11th Iberian Conference on Information Systems and Technologies (CISTI). 2016.

Shamseer L, Moher D, Clarke M, et al. Preferred reporting items for systematic review and meta-analysis protocols (PRISMA-P) 2015: elaboration and explanation. BMJ. 2015;350:g7647.

Jonnalagadda SR, Goyal P, Huffman MD. Automating data extraction in systematic reviews: a systematic review. Syst Rev. 2015;4:78.

Whiting PF, Rutjes AW, Westwood ME, et al. QUADAS-2: a revised tool for the quality assessment of diagnostic accuracy studies. Ann Intern Med. 2011;155(8):529–36.

Guyatt GH, Oxman AD, Vist GE, et al. GRADE: an emerging consensus on rating quality of evidence and strength of recommendations. BMJ. 2008;336(7650):924–6.

Macaskill P, Gatsonis C, Deeks J, Harbord R, Takwoingi Y. Cochrane handbook for systematic reviews of diagnostic test accuracy. Version 09 0. London: The Cochrane Collaboration; 2010.

Sampson M, Tetzlaff J, Urquhart C. Precision of healthcare systematic review searches in a cross-sectional sample. Res Synth Methods. 2011;2(2):119–25.


Acknowledgements

We thank Professor Siyan Zhan (Department of Epidemiology and Biostatistics, School of Public Health, Peking University Health Science Center) for her critical comments in designing this study. We also thank Dr. Bin Zhang (Institute of Medical Information/Medical Library, Chinese Academy of Medical Sciences & Peking Union Medical College) for her critical suggestions in developing the search strategies.

Funding

This study will be supported by the Undergraduate Innovation and Entrepreneurship Training Program (Number 202010023001). The sponsors have no role in study design, data collection, data analysis, interpretation of findings, or decisions for dissemination.

Author information

Yuelun Zhang, Siyu Liang, and Yunying Feng contributed equally to this work and should be regarded as co-first authors.

Authors and Affiliations

Medical Research Center, Peking Union Medical College Hospital, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing, China

Yuelun Zhang

Department of Endocrinology, Peking Union Medical College Hospital, Chinese Academy of Medical Sciences and Peking Union Medical College, 1 Shuaifuyuan, Dongcheng District, Beijing, China

Siyu Liang, Shi Chen, Huijuan Zhu & Hui Pan

Eight-year Program of Clinical Medicine, Peking Union Medical College Hospital, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing, China

Yunying Feng, Yiying Yang & Xin He

Research Institute of Information and Technology, Tsinghua University, Beijing, China

Qing Wang

Department of Epidemiology and Biostatistics, School of Public Health, Peking University Health Science Center, Beijing, China

Feng Sun


Contributions

H Pan conceived this research. This protocol was designed by YL Zhang, SY Liang, and YY Feng. YY Yang, X He, Q Wang, F Sun, S Chen, and HJ Zhu provided critical suggestions and comments on the manuscript. YL Zhang, SY Liang, and YY Feng wrote the manuscript. All authors read and approved the final manuscript. H Pan is the guarantor for this manuscript.

Corresponding author

Correspondence to Hui Pan .

Ethics declarations

Ethics approval and consent to participate.

This research is exempt from ethics approval because the work is carried out on published documents.

Consent for publication

Not applicable.

Competing interests

The authors declare that they have no competing interests.

Additional information

Publisher’s note.

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

Additional file 1: Supplementary Table 1. Search strategy for PubMed.


About this article

Cite this article.

Zhang, Y., Liang, S., Feng, Y. et al. Automation of literature screening using machine learning in medical evidence synthesis: a diagnostic test accuracy systematic review protocol. Syst Rev 11 , 11 (2022). https://doi.org/10.1186/s13643-021-01881-5


Received : 20 August 2020

Accepted : 27 December 2021

Published : 15 January 2022

DOI : https://doi.org/10.1186/s13643-021-01881-5


Keywords

  • Evidence-based practice
  • Natural language processing
  • Systematic review
  • Diagnostic test accuracy



Duke University Libraries

Literature Reviews


Introduction to AI


Generative AI tools have been receiving a lot of attention lately because they can create content like text, images, and music. These tools employ machine learning algorithms that can produce unique and sometimes unexpected results. Generative AI has opened up exciting possibilities in different fields, such as language models like GPT and image generators.

However, students need to approach these tools with awareness and responsibility. Here are some key points to consider:

Novelty and Creativity : Generative AI tools can produce content that is both innovative and unexpected. They allow users to explore new ideas, generate unique artworks, and even compose original music. This novelty is one of their most exciting aspects.

Ethical Considerations : While generative AI offers creative potential, it also raises ethical questions. Students should be aware of potential biases, unintended consequences, and the impact of their generated content. Responsible use involves considering the broader implications.

Academic Integrity : When using generative AI tools for academic purposes, students should consult their instructors. Policies regarding the use of AI-generated content may vary across institutions. Always seek guidance to ensure compliance with academic integrity standards.

In summary, generative AI tools are powerful and fascinating, but students should approach them thoughtfully, seek guidance, and adhere to institutional policies. Please refer to the Duke Community Standard  for questions related to ethical AI use.


Research Rabbit is a literature mapping tool that takes one paper and performs backward- and forward citation searching in addition to recommending "similar work." It scans the Web for publicly available content to build its "database" of work.

Best suited for...

Disciplines whose literature is primarily published in academic journals.

Considerations

  • Integrates with Zotero
  • Works mostly with just journal articles
  • Potential for bias in citation searching/mapping

»   researchrabbit.ai   «
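Research Rabbit's own backend is not public, so the sketch below is only a hedged illustration of the backward (references) and forward (citations) chasing idea, using the public Semantic Scholar Graph API; the endpoint paths and the "data"/"citedPaper"/"citingPaper" response fields are assumptions about that API and are not part of Research Rabbit.

```python
# A hedged sketch of backward/forward citation chasing for one seed paper via the
# Semantic Scholar Graph API (endpoint and field names are assumptions here, and
# this is NOT how Research Rabbit itself is implemented).
import requests

BASE = "https://api.semanticscholar.org/graph/v1/paper"

def chase(seed_id):
    refs = requests.get(f"{BASE}/{seed_id}/references", params={"fields": "title,year"}).json()
    cits = requests.get(f"{BASE}/{seed_id}/citations", params={"fields": "title,year"}).json()
    backward = [item.get("citedPaper") for item in refs.get("data", [])]   # papers the seed cites
    forward = [item.get("citingPaper") for item in cits.get("data", [])]   # papers citing the seed
    return backward, forward

# Example call with a DOI-style identifier (requires network access):
# backward, forward = chase("DOI:10.1186/s13643-021-01881-5")
```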


Elicit

What is it?

Elicit is a tool that semi-automates time-intensive research processes, such as summarizing papers, extracting data, and synthesizing information. Elicit pulls academic literature from Semantic Scholar, an academic search engine that also uses machine learning to summarize information.

Best suited for...

Empirical research (e.g., the sciences, especially biomedicine).

Considerations

  • Both free and paid versions
  • Doesn't work well in identifying facts or in theoretical/non-empirical research (e.g., the humanities)
  • Potential biases in the natural language processing (NLP) algorithms
  • Summarized information and extracted data will still need to be critically analyzed and verified for accuracy by the user

»   elicit.com   «


Think of Consensus as ChatGPT for research! Consensus is "an AI-powered search engine designed to take in research questions, find relevant insights within research papers, and synthesize the results using the power of large language models" (Consensus.app). Consensus runs its language model over its entire body of scientific literature (which is sourced from Semantic Scholar) and extracts the “key takeaway” from every paper.

Best suited for...

The social sciences and sciences (non-theoretical disciplines).

Considerations

  • Free and paid versions
  • Similar to Elicit, Consensus should not be used to ask questions about basic facts
  • Consensus recommends that you ask questions related to research that has already been conducted by scientists
  • Potential for biases in the input data from participants

»   consensus.app   «


Dubbed the "AI-powered Swiss Army Knife for information discovery," Perplexity is used for answering questions (including basic facts, a function that many other AI tools are not adept at doing), exploring topics in depth utilizing Microsoft's Copilot, organizing your research into a library, and interacting with your data (including asking questions about your files).

Best suited for...

Perplexity has wide-reaching applications and could be useful across disciplines.

Considerations

  • Free and paid pro versions (the pro version utilizes Microsoft's Copilot AI tool)
  • Available in desktop, iOS, and Android apps
  • See Perplexity's blog for more info
  • Your personal information and data on how you use the tool are stored for analytical purposes (however, this feature can be turned off in settings)
  • Features a browser plug-in, Perplexity Companion, that is essentially a blend of Google and ChatGPT

»   perplexity.ai   «

Did you know that as Duke faculty, staff, and students, we have free access to ChatGPT-4 via Microsoft Copilot?

Log in with your Duke credentials to start using it today.


The OG of generative AI tools, ChatGPT-4 is the latest iteration of the popular chatbot, answering questions and generating text that sounds like it was written by a human. While not a replacement for conducting research, it can be helpful when it comes to brainstorming topics or research questions and also as a writing tool (rewriting or paraphrasing content, assessing tone, etc.).

Best suited for...

All users across all disciplines.

Considerations

  • ChatGPT-3.5 is the default version for free and paid-tier chat users.
  • Since it can't verify its sources, be wary of hallucinations (made-up citations) that can look very real.
  • It is not 100% accurate! While ChatGPT-4 is touted as being 40% more accurate than its predecessor, users are still expected to verify the information it generates.
  • There is always the potential for bias, since ChatGPT was trained on a massive dataset of websites, articles, books, etc. (much of which is inherently biased because it was created by humans).

For ChatGPT-4 (access provided by Duke and requires login) »   copilot.microsoft.com   «

For ChatGPT-3.5 (free) »   chat.openai.com   «

  • Last Updated: Feb 15, 2024 1:45 PM
  • URL: https://guides.library.duke.edu/lit-reviews


International Conference on Advanced Information Systems Engineering

CAiSE 2022: Advanced Information Systems Engineering, pp 129–146

Systematic Literature Review Search Query Refinement Pipeline: Incremental Enrichment and Adaptation

  • Maisie Badami 11 ,
  • Boualem Benatallah 11 &
  • Marcos Baez 12  
  • Conference paper
  • First Online: 03 June 2022

1449 Accesses

Part of the Lecture Notes in Computer Science book series (LNCS, volume 13295)

Systematic literature reviews (SLRs) are at the heart of evidence-based research, collecting and integrating empirical evidence regarding specific research questions. A leading step in the search for relevant evidence is composing Boolean search queries, which are still at the core of how information retrieval systems support advanced literature search. Building these queries requires translating the general aims of the research questions into actionable search terms that are combined into potentially complex Boolean expressions. Researchers thus face the daunting task of building and refining search queries in their quest for sufficient coverage and proper representation of the literature. In this paper, we propose an adaptive Boolean query generation and refinement pipeline for SLR search. Our approach uses a reinforcement learning technique to learn the optimal modifications for a query based on feedback collected from researchers about the query's retrieval performance. Empirical evaluations on 10 SLR datasets showed that our approach achieves performance comparable to that of queries manually composed by SLR authors.
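The paper's own pipeline and evaluation are described in the full text and its supplementary material; purely to illustrate the reinforcement-learning idea in the abstract above, the sketch below uses Bernoulli Thompson sampling to choose among a few candidate query modifications. The modification names and reward probabilities are invented, and the reviewer feedback is simulated rather than real retrieval performance.

```python
# A hedged sketch of bandit-style selection among candidate query modifications
# (Bernoulli Thompson sampling); actions and reward rates are invented, and the
# reward here is simulated rather than real feedback on retrieval performance.
import random

actions = ["add synonym (OR)", "add restrictive term (AND)", "drop weakest term"]
true_reward_prob = dict(zip(actions, [0.6, 0.3, 0.45]))   # hidden, simulated environment
alpha = {a: 1.0 for a in actions}                         # Beta prior: observed successes + 1
beta = {a: 1.0 for a in actions}                          # Beta prior: observed failures + 1

for _ in range(200):
    sampled = {a: random.betavariate(alpha[a], beta[a]) for a in actions}
    choice = max(sampled, key=sampled.get)                # act greedily on the sampled rates
    reward = 1 if random.random() < true_reward_prob[choice] else 0
    alpha[choice] += reward
    beta[choice] += 1 - reward

print(max(actions, key=lambda a: alpha[a] / (alpha[a] + beta[a])))  # most promising modification
```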

Keywords

  • Systematic reviews
  • Query enrichment
  • Query adaptation
  • Reinforcement learning
  • Word embedding


The per-iteration regret is the mean difference between the reward of the choice with the best rewards and that of the action taken by the algorithm [ 28 ].

We define a query clause as a conjunction of a set of terms, such as (t1 AND t3).

In a query cluster such as (t1 OR t2), t1 is considered a sibling term of t2.

For full details about the datasets, experimental details and in-depth results please refer to our supplementary material at https://tinyurl.com/496zuar3 and implementation details on https://tinyurl.com/2rp4m5cs .

Popular dataset repositories, at https://zenodo.org and http://figshare.com .

We only included the results of three seed types in the table, the full list is available in Appendix at https://tinyurl.com/496zuar3 .

Adamo, G., Ghidini, C., Di Francescomarino, C.: What is a process model composed of? A systematic literature review of meta-models in BPM. arXiv preprint arXiv:2011.09177 (2020)

Badami, M., Baez, M., Zamanirad, S., et al.: On how cognitive computing will plan your next systematic review. arXiv preprint arXiv:2012.08178 (2020)

Barišić, A., Goulão, M., Amaral, V.: Domain-specific language domain analysis and evaluation: a systematic literature review. Universidade Nova da Lisboa, Faculdade de Ciencias e Technologia (2015)


Brochu, E., Cora, V.M., et al.: A tutorial on Bayesian optimization of expensive cost functions, with application to active user modeling and hierarchical reinforcement learning. arXiv preprint arXiv:1012.2599 (2010)

Carpineto, C., Romano, G.: A survey of automatic query expansion in information retrieval. ACM Comput. Surveys (CSUR) 44 (1), 1–50 (2012)


van Dinter, R., Tekinerdogan, B., Catal, C.: Automation of systematic literature reviews: A systematic literature review. Information & Soft. Tech, p. 106589 (2021)

Frank, M., Hilbrich, M., Lehrig, S., Becker, S.: Parallelization, modeling, and performance prediction in the multi-/many core area: a systematic literature review. In: 2017 IEEE 7th International Symposium on Cloud and Service Computing (SC2), pp. 48–55. IEEE (2017)

Garousi, V., Felderer, M.: Experience-based guidelines for effective and efficient data extraction in systematic reviews in software engineering. In: Proceedings of EASE 2017, pp. 170–179 (2017)

Hall, T., Beecham, S., Bowes, D., Gray, D., Counsell, S.: A systematic literature review on fault prediction performance in software engineering. IEEE Trans. Softw. Eng. 38 (6), 1276–1304 (2011)

Jamshidi, P., Ahmad, A., Pahl, C.: Cloud migration research: a systematic review. IEEE Trans. Cloud Comput. 1 (2), 142–157 (2013)

Kim, Y., Seo, J., Croft, W.B.: Automatic Boolean query suggestion for professional search. In: Proceedings of SIGIR, pp. 825–834 (2011)

Kitchenham, B., Brereton, O.P., Budgen, D., Turner, M., Bailey, J., Linkman, S.: Systematic literature reviews in software engineering-a systematic literature review. Inf. Softw. Technol. 51 (1), 7–15 (2009)

Kitchenham, B., Charters, S.: Guidelines for performing systematic literature reviews in software engineering (2007)

Kohavi, R., Longbotham, R., Sommerfield, D., et al.: Controlled experiments on the web: survey and practical guide. DMKD 18 (1), 140–181 (2009)


Kuzi, S., Shtok, A., Kurland, O.: Query expansion using word embeddings. In: Proceedings of CIKM, pp. 1929–1932 (2016)

Lee, G.E., Sun, A.: Seed-driven document ranking for systematic reviews in evidence-based medicine. In: SIGIR, pp. 455–464 (2018)

Li, H., Scells, H., Zuccon, G.: Systematic review automation tools for end-to-end query formulation. In: Proceedings of the 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 2141–2144 (2020)

Manning, C.D., Surdeanu, M., et al.: The stanford coreNLP natural language processing toolkit. In: Proceedings of ACL, pp. 55–60 (2014)

Marcos-Pablos, S., García-Peñalvo, F.J.: Decision support tools for SLR search string construction. In: Proceedings of TEEM 2018, pp. 660–667 (2018)

Mergel, G.D., Silveira, M.S., da Silva, T.S.: A method to support search string building in systematic literature reviews through visual text mining. In: Proceedings of the 30th Annual ACM Symposium on Applied Computing, pp. 1594–1601 (2015)

Mikolov, T., Sutskever, I., Chen, K., et al.: Distributed representations of words and phrases and their compositionality. In: NeurIPS, pp. 3111–3119 (2013)

Miller, G.A.: WordNet: An Electronic Lexical Database. MIT Press, Cambridge (1998)

Ouzzani, M., Hammady, H., Fedorowicz, Z., Elmagarmid, A.: Rayyan-a web and mobile app for systematic reviews. Syst. Rev. 5 (1), 210 (2016)

Pennington, J., Socher, R., Manning, C.D.: GloVe: global vectors for word representation. In: Proceedings of EMNLP, pp. 1532–1543 (2014)

Qin, C., Eichelberger, H., Schmid, K.: Enactment of adaptation in data stream processing with latency implications-a systematic literature review. Inf. Softw. Technol. 111 , 1–21 (2019)

Radjenović, D., Heričko, M., Torkar, R., Živkovič, A.: Software fault prediction metrics: a systematic literature review. Inf. Softw. Technol. 55 (8), 1397–1418 (2013)

Robertson, S.: Understanding inverse document frequency: on theoretical arguments for IDF. J. Doc. 60 (5), 503–520 (2004)

Russo, D., Van Roy, B., Kazerouni, A., et al.: A tutorial on Thompson sampling. arXiv preprint arXiv:1707.02038 (2017)

Salton, G., Buckley, C.: Improving retrieval performance by relevance feedback. J. Am. Soc. Inf. Sci. 41 (4), 288–297 (1990)

Scells, H., Zuccon, G.: Generating better queries for systematic reviews. In: ACM SIGIR, pp. 475–484 (2018)

Scells, H., Zuccon, G., Koopman, B.: Automatic Boolean query refinement for systematic review literature search. In: WWW, pp. 1646–1656 (2019)

Scells, H., Zuccon, G., Koopman, B.: A comparison of automatic Boolean query formulation for systematic reviews. Inf. Retrieval J. 24 (1), 3–28 (2021)

Sutton, R.S., Barto, A.G.: Reinforcement Learning: An Introduction. MIT Press, Cambridge (2018)

Tabebordbar, A., Beheshti, A., Benatallah, B., et al.: Feature-based and adaptive rule adaptation in dynamic environments. DSE 5 (3), 207–223 (2020)

Teixeira, E.N., Aleixo, F.A., de Sousa Amâncio, F.D., OliveiraJr, E., Kulesza, U., Werner, C.: Software process line as an approach to support software process reuse: a systematic literature review. Inf. Softw. Technol. 116 , 106175 (2019)

Wahono, R.S.: A systematic literature review of software defect prediction. J. Softw. Eng. 1 (1), 1–16 (2015)

Wallace, B.C., Small, K., Brodley, C.E., et al.: Who should label what? Instance allocation in multiple expert active learning. In: SDM, pp. 176–187. SIAM (2011)

Williams, J.J., Kim, J., Rafferty, A., et al.: AXIS: generating explanations at scale with learner sourcing and machine learning. In: L@Scale, pp. 379–388 (2016)


Author information

Authors and Affiliations

University of New South Wales (UNSW), Sydney, Australia

Maisie Badami & Boualem Benatallah

LIRIS - University of Claude Bernard Lyon 1, Villeurbanne, France

Marcos Baez


Corresponding author

Correspondence to Maisie Badami .

Editor information

Editors and Affiliations

Department of Service and Information System Engineering (ESSI), Universitat Politècnica de Catalunya, Barcelona, Spain

Xavier Franch

Ghent University, Gent, Belgium

Geert Poels

Frederik Gailly

KU Leuven, Leuven, Belgium

Monique Snoeck


Copyright information

© 2022 Springer Nature Switzerland AG

About this paper

Cite this paper.

Badami, M., Benatallah, B., Baez, M. (2022). Systematic Literature Review Search Query Refinement Pipeline: Incremental Enrichment and Adaptation. In: Franch, X., Poels, G., Gailly, F., Snoeck, M. (eds) Advanced Information Systems Engineering. CAiSE 2022. Lecture Notes in Computer Science, vol 13295. Springer, Cham. https://doi.org/10.1007/978-3-031-07472-1_8


DOI : https://doi.org/10.1007/978-3-031-07472-1_8

Published : 03 June 2022

Publisher Name : Springer, Cham

Print ISBN : 978-3-031-07471-4

Online ISBN : 978-3-031-07472-1




Heliyon, volume 8, issue 5 (May 2022)

Examining the developments in scheduling algorithms research: A bibliometric approach

Associated data.

The data that support the findings of this study are openly available in figshare at https://doi.org/10.6084/m9.figshare.19494359 .

This study examined developments in the field of scheduling algorithms over the last 30 years (1992–2021) to help researchers gain new insight and uncover emerging areas of growth for further research in this field. To this end, the study carried out a bibliometric analysis of 12,644 peer-reviewed documents extracted from the Scopus database, using the Bibliometrix R package via the Biblioshiny web interface. The results establish the development status of the field of scheduling algorithms, its growth rate, and emerging thematic areas for further research, as well as institution and country collaborations. The study also identified the most impactful and leading authors, keywords, sources, and publications in this field. These findings can help both budding and established researchers find new research foci and collaboration opportunities and make informed decisions as they work on scheduling algorithms and their applications.

  • • Scheduling algorithms research has an annual scientific production growth rate of 16.62%.
  • • The keyword “scheduling” is the most relevant keyword. It has the highest betweenness centrality of 101.94.
  • • Buyya Rajkumar is the most productive author in scheduling algorithms research.
  • • Edge Computing is the most discussed topic concerning scheduling algorithms research in 2020 and 2021.
  • • Institutional collaborations have not been well-established in this field.

Keywords: Bibliometric Analysis, Scheduling Algorithms, Bibliometrix, Science Mapping, Biblioshiny

1. Introduction

Scheduling is the allocation of resources to tasks to optimize performance measures such as waiting time, throughput, and makespan. The study of scheduling spans over 60 years across fields such as Operations Research, Management, Computer Science, Industrial Engineering, and Health ( An et al., 2012 ; Burdett and Kozan, 2018 ; Kumar et al., 2018 ; Sharma et al., 2021a , 2021b ; Yan et al., 2021 ). Its advent in Computer Science resulted from the need to solve scheduling problems arising from the development of operating systems (OS) ( Leung, 2004 ), which created the need for efficient scheduling algorithms to manage computational resources such as CPU time, storage, memory, and I/O devices. Classic scheduling algorithms include First Come First Serve (FCFS), Shortest Job First (SJF), Priority scheduling, Shortest Remaining Time (SRT), Round Robin, Min-Min, Max-Min, and Multilevel queue; their major goal is to ensure fairness in allocating resources to tasks. Different variants of these algorithms have been developed ( Chandiramani et al., 2019 ; Nazar et al., 2019 ; Omotehinwa et al., 2019a , 2019b ; Sharma et al., 2021a ), and their areas of application cut across different domains: load balancing in cloud computing ( Das et al., 2017 ; Ghosh and Banerjee, 2018 ), cost minimization in fog computing, and energy balancing in the Internet of Things (IoT) ( Choudhari et al., 2018 ; Tychalas and Karatza, 2020 ). This study aims to carry out a bibliometric analysis of developments in scheduling algorithms over the last 30 years. According to Verbeek et al. (2002) , bibliometric analysis is "the collection, the handling, and the analysis of quantitative bibliographic data, derived from scientific publications". In science and applied sciences, bibliometric analysis has become an accepted standard for descriptive and evaluative measurement of the impact of research outputs and a key component of research evaluation methodology ( Ellegaard and Wallin, 2015 ). The use of bibliometrics across disciplines has continued to increase: studies similar to this one ( Dao et al., 2017 ; Rahimi et al., 2022 ; Shishido and Estrella, 2018 ; Yu et al., 2018 ) have carried out bibliometric analyses of the genetic algorithm, grids and clouds, cloud computing technology, and the Non-dominated Sorting Genetic Algorithm II (NSGA-II). A systematic review helps to gain new insight and uncover meaningful knowledge from the cumulative data of a research field ( Radhakrishnan et al., 2017 ), but bibliometric tools make extracting new knowledge from an enormous body of literature far faster and less cumbersome than attempting to uncover it through a traditional systematic literature review. A brief comparison of bibliometric analysis and systematic review studies is presented in Table 1 .

Table 1

Differences, advantages, and disadvantages of bibliometric analysis and systematic review study.

It is important to note that these two methods can complement each other. For example, the result of a bibliographic coupling produced through bibliometric analysis could be subjected to content analysis through a systematic review to draw inferences about the clusters presented in the coupling.

Scheduling algorithms have vast areas of application, and it would be tedious to carry out a traditional systematic literature review to identify the leading sources, the most relevant authors and institutions, author and country collaborations, emerging thematic areas, top keywords, and the scientific production of researchers in this area. Such information can help both established and budding researchers: the top keywords can improve search results; the most relevant authors, articles, and sources can guide where to publish and which publications to review; and the collaboration and thematic analyses point to potential collaborators and new areas of research focus. Existing survey papers have not addressed the general landscape or developments of this field; they have largely taken a narrow path, focusing on specific topics or areas of application of scheduling algorithms. Several studies ( Almansour and Allah, 2019 ; Arora and Banyal, 2022 ; Arunarani et al., 2019 ; Ghafari et al., 2022 ; Kumar et al., 2019 ; Sana and Li, 2021 ) provided extensive reviews of scheduling techniques in cloud computing; others ( Agrawal et al., 2021 ; Davis and Burns, 2011 ; Olofintuyi et al., 2020 ) reviewed operating system scheduling algorithms; while studies such as Maipan-Uku et al. (2017) , Prajapati and Shah (2014) , and Yousif et al. (2015) focused on scheduling in grid computing. A general overview of scheduling algorithms research is therefore needed at this time to understand trends over time and to discover new research foci. It is also worth noting that, at the time of writing, there is no bibliometric analysis of scheduling algorithms, which makes this study timely. To understand the landscape of scheduling algorithms research, this study set out to answer the following questions:

  • i. What is the annual scientific production growth rate in scheduling algorithms research?
  • ii. What is the scientific production of authors? Which are the most relevant publications and sources, globally and locally, in scheduling algorithms research?
  • iii. What are the emerging, evolving, developed, and underdeveloped thematic areas in scheduling algorithms research?
  • iv. What does scientific production by country look like? How established are the collaboration networks of authors, institutions, and countries in scheduling algorithms research?
  • v. What are the trends in topics, the top keywords, and the co-occurrence network of keywords in scheduling algorithms research?

2. Methodology

The methodology used in this study is bibliometric analysis, a widely used methodology for the statistical evaluation of scientific outputs. The analysis was carried out using Bibliometrix, an open-source science mapping tool for the statistical measurement of productivity within a research domain ( Aria and Cuccurullo, 2017 ). According to Dervis (2019) , "Bibliometrix is an R statistical package for analyzing and visualizing the bibliometric data from Web of Science and Scopus databases. It is written in R language, which operates under the GNU operating system". It was developed by Massimo Aria of the University of Naples and Corrado Cuccurullo of the University of Campania, both in Italy ( Aria and Cuccurullo, 2017 ). Other bibliometric tools include Pajek, ScientoTex, CiteSpace, BibExcel, and VOSviewer; however, they are not as comprehensive, and their analyses are not as easily reproduced as with Bibliometrix. Moral-Muñoz et al. (2020) stated that Bibliometrix is outstanding in the variety of analyses that can be carried out with it.

In this section, the standard bibliometric workflow of study design, data collection, data analysis, and data visualization and interpretation presented by Zupic and Čater (2015) , together with the procedure for bibliometric analysis itemized by Donthu et al. (2021) , was followed.

2.1. Study design

This study was designed to examine the developments in scheduling algorithms within the last 30 years, specifically from 1992 to 2021. The specific questions answered by this study are enumerated in section 1 .

2.2. Data collection

The data used in this study were extracted from the Scopus database. Scopus is a repository for abstracts and citations of peer-reviewed literature such as journals and conference proceedings and, according to its blog ( Scopus, 2020 ), is the largest of such repositories. The literature search involved defining a search string containing keywords commonly used in and/or related to scheduling algorithms research. The search string (TITLE-ABS-KEY ("Scheduling Algorithm∗") AND TITLE-ABS-KEY ("Multilevel queue" OR "Round Robin" OR "Shortest Job First" OR "First Come First Serve" OR "Shortest Remaining Time" OR "Priority queue" OR "Makespan" OR "average waiting time" OR "turnaround time" OR "Context Switch∗" OR "Completion time" OR "Resource allocation" OR "Time Quantum" OR "Grid computing" OR "Cloud Computing" OR "Edge Computing" OR "Fuzzy Computing" OR "Evolutionary Computing" OR "Green Computing")) AND (EXCLUDE (SRCTYPE, "Undefined")) AND (EXCLUDE (PUBSTAGE, "AIP")) AND (LIMIT-TO (PUBYEAR > 1991 AND PUBYEAR < 2022)) AND (LIMIT-TO (LANGUAGE, "English")) returned 12,881 documents. The wildcard asterisk (∗) at the end of a keyword allows zero or more characters to complete it; for example, "Scheduling algorithm∗" matches both "Scheduling algorithm" and "Scheduling algorithms". Inclusion and exclusion criteria were imposed on the search so that only documents written in English were included and all articles in press ("AIP") were excluded. The 12,881 records were exported to CSV via the Scopus document search interface; Scopus does not allow more than 2,000 records to be exported at a time, so the records were exported in batches and merged using Microsoft Excel. Biblioshiny, the web interface that allows Bibliometrix to be used without coding, was then used to filter the data: documents of type "Letter", "Note", "Editorial", and "Retracted", documents tagged "NaN – No Author Name", and documents with mixed language tags (English/Croatian and English/Japanese) were removed. The filtering returned the 12,644 documents analyzed in this study. The search was conducted on 21st March 2022. Table 2 presents the summary of the main information about the data. The dataset is composed of six types of documents: Articles (n = 5,689), Books (n = 7), Book chapters (n = 98), Conference papers (n = 6,802), Erratum (n = 1), and Review papers (n = 47). Information about the document contents includes Author's Keywords (DE) and Keywords Plus (ID): the author's keywords count is the total number of keywords supplied by authors across all documents, whereas Keywords Plus is the total number of keywords generated by Scopus from the titles, keywords, and abstracts of the documents in this dataset ( Aria and Cuccurullo, 2017 ). Other information includes the total number of authors across all documents, co-authors per document, and the collaboration index. After the Scopus data were cleaned, the bibliometrix package was loaded in RStudio with library("bibliometrix") and the function biblioshiny() was invoked to launch the Biblioshiny web interface. The CSV file containing the data extracted from Scopus was loaded through the Biblioshiny interface, and the Keyword Co-occurrence Network, Thematic Map, Collaboration Network, Trend Topics, Source, Document, and Author impact analyses were carried out.
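For readers who prefer to script the import rather than use the Biblioshiny upload dialog, the following R sketch outlines one way the merged Scopus export could be read into a bibliometrix data frame; the file name is a placeholder and the filtering only approximates the cleaning described above.

```r
# install.packages("bibliometrix")  # uncomment if the package is not installed
library(bibliometrix)

# Convert the merged Scopus CSV export into a bibliographic data frame.
# "scopus_merged.csv" is a placeholder name for the batch-merged export.
M <- convert2df(file = "scopus_merged.csv", dbsource = "scopus", format = "csv")

# Drop the document types excluded in this study (done through Biblioshiny in the paper).
# The labels must match the document-type values present in the exported data.
M <- M[!M$DT %in% c("LETTER", "NOTE", "EDITORIAL", "RETRACTED"), ]

# Descriptive overview of the cleaned collection (top 10 of each ranking).
results <- biblioAnalysis(M, sep = ";")
summary(results, k = 10)

# Alternatively, launch the Biblioshiny web interface and upload the file there:
# biblioshiny()
```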

Table 2

Main information and summary of the dataset.

2.3. Data analysis

The file containing the 12,644 documents was converted into bibliographic data frames for analysis after being uploaded through the Biblioshiny web interface in the browser. Descriptive analysis was carried out to determine the most relevant authors, the most cited authors, the most productive countries (based on the first author's affiliation), the top journals, and the top keywords (based on frequency of occurrence). Citation analysis was carried out to determine the most cited references, and co-occurrence, collaboration, and co-word analyses were also carried out; the results are presented as networks.

2.4. Data visualization and interpretation

Data visualization and interpretation are presented in section 3 .

3. Results and discussion

A preliminary analysis with a smaller sample yielded markedly different results, suggesting that an insufficiently large sample can heavily bias the outcome. The sample must therefore be representative of the field being studied, and the search must be designed to capture both its broad and its narrow themes, to keep bias in the results negligible.

In this section, the analysis results are presented based on the questions in the study design section.

3.1. What is the annual scientific production growth rate in scheduling algorithms research?

Table 3 shows the number of publications for each year from 1992 to 2021. The highest number of publications in scheduling algorithms research was recorded in 2019, representing 8.04% (1,016) of the total number of publications. The sudden decrease in the number of publications in 2020 and 2021 could be a result of the global impact of the COVID-19 pandemic; the lowest annual growth rate of -31.9% was recorded in 2020 ( Table 3 ).

Table 3

Rate of production of articles per year and the annual growth rate.

It can be observed from Figure 1 that scheduling algorithms research has an annual scientific production growth rate of 16.62% ( Eq. (2) ). This growth is exponential and, going by the "Rule of 70" ( Eq. (1) ) ( Mahajan, 2021 ), annual scientific production at this growth rate is expected to double roughly every 4 years (70/16.62 ≈ 4.2).

Figure 1

Annual scientific growth of scheduling algorithms research.

Rule of 70:

t = 70 / f  (1)

where t is the doubling time in years and f is the annual percentage growth rate.

The annual percentage growth rate over the whole period is determined as the compound annual growth rate:

f = ((TC_final / TC_initial)^(1/n) - 1) × 100  (2)

where TC_initial and TC_final are the numbers of publications in the first and last year of the period and n is the number of years spanned.

The AGR is determined by calculating the percentage increase in the number of publications for each year (Eq. (3)):

AGR = ((TCY - TPY) / TPY) × 100  (3)

where TCY is the total number of publications in the current year and TPY is the total number of publications in the previous year.
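As a worked example of Eqs. (1)–(3), the base-R sketch below computes the year-on-year AGR, the compound annual growth rate, and the Rule-of-70 doubling time from a vector of yearly publication counts; the counts are illustrative, not the values in Table 3.

```r
# Illustrative yearly publication counts (not the actual values in Table 3).
years  <- 2017:2021
counts <- c(500, 640, 700, 560, 610)

# Eq. (3): year-on-year annual growth rate (percentage change).
agr <- (counts[-1] - counts[-length(counts)]) / counts[-length(counts)] * 100

# Eq. (2): compound annual growth rate over the whole period.
n    <- length(counts) - 1
cagr <- ((counts[length(counts)] / counts[1])^(1 / n) - 1) * 100

# Eq. (1): Rule-of-70 doubling time in years.
doubling_time <- 70 / cagr

data.frame(year = years[-1], AGR = round(agr, 1))
round(cagr, 2)
round(doubling_time, 1)
```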

3.2. What is the scientific production of authors? Which are the most relevant publications, most relevant sources globally and locally in scheduling algorithms research?

Table 4 lists the top 10 most productive authors in scheduling algorithms research based on the number of publications and the contributions of authors to each publication. Fractional counting measures the contributions of authors to publications with more than one author ( Sivertsen et al., 2019 ). In normalized fractional counting ( Sivertsen et al., 2019 ), each author of an n-authored paper contributes 1/n, and the sum of an author's fractional contributions over all of their papers is the number of articles fractionalized. For example, if in the last two years an author has had one publication with four authors and another with three authors, the number of articles fractionalized is 1/4 + 1/3 ≈ 0.25 + 0.33 = 0.58.
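The fractional counting described above reduces to a one-line computation; the R sketch below reproduces the worked example with a hypothetical author who has one four-authored and one three-authored paper.

```r
# Number of co-authors on each paper by a hypothetical author: one 4-authored
# paper and one 3-authored paper, as in the example above.
coauthors_per_paper <- c(4, 3)

# Normalized fractional counting: an n-authored paper contributes 1/n.
articles_fractionalized <- sum(1 / coauthors_per_paper)
round(articles_fractionalized, 2)  # 1/4 + 1/3 = 0.58
```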

Table 4

Top 10 most productive authors in scheduling algorithm research.

The first or lead author of a research article is generally considered the most valuable contributor, whereas the corresponding author is usually the most senior researcher or principal investigator responsible for the design of the research. The corresponding author is, in most cases, the first or the last author of a research article; however, a recent study by Yu and Yin (2021) found that an increase in the number of authors of an article decreases the chances of the first or last author being the corresponding author. From Table 4 , Wang J.-B is the most valuable contributor, with 30 articles as first author and 29 as corresponding author. Assuming that the contributions of all authors are equal, through fractional counting Wang J.-B also has the highest number of articles fractionalized. Yuan J and Zandieh M have 33 and 35 articles as corresponding authors, respectively, suggesting that they are senior researchers in this field who supervised most of the research they co-authored. Buyya R is the most productive, with a total of 65 articles. It is essential to point out that the publication count for each author was based on author ID rather than on the frequency of appearance of names on documents, since two or more authors could share the same last name and initials and would otherwise be counted as one author.

Figure 2 shows the top ten authors' production over time. The bubble size indicates the number of documents produced per year by each author, the lines represent the span of production over time (timeline), and the intensity of the bubble colour reflects the number of citations, with deeper colour indicating more citations. It can be observed that Buyya R has the longest timeline (2001–2021) and has been consistent in scheduling algorithms research; he published three articles in 2012 and recorded 193.8 total citations per year for that year, and he has an H-index of 30 based on the available data, which makes him the most impactful author in this field. The highest number of publications in a single year (13) was recorded by Zandieh M and Wang J.-B in 2009 and 2010, respectively, as indicated by the size of the bubbles for those years. However, Zandieh M has the shortest timeline, from 2008 to 2014.
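Author-level impact metrics such as the H-index reported here can also be computed in bibliometrix directly. The sketch below is indicative only: it assumes the data frame M from the data collection step, and the Hindex() helper's arguments may vary slightly across package versions.

```r
# H-index and related impact metrics for selected authors in the collection M.
# Author names must match the format used in M (typically "SURNAME INITIALS").
impact <- Hindex(M, field = "author", elements = c("BUYYA R", "LI K"), sep = ";")
impact$H  # table of h-index, g-index, m-index and total citations per author
```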

Figure 2

Top 10 authors' publications over time. The bubble size indicates the number of documents produced per year by the authors. The lines represent the span of production over time (timeline). The number of citations determines the intensity of the colour of the bubble; deeper colour indicates higher citations.

Li K. has produced 43 documents, has an H-index of 26, and had one publication in this field in 2021.

Table 5 reveals the top 10 most cited publications locally and globally. In this study, the relevance of a research constituent, such as an author, publication, source, or institution, is measured by the frequency of document production and/or citations. The local citation count measures the number of times each publication was cited by other publications within the dataset analyzed in this study, while the global citation count (total citations) measures the number of times each publication was cited across the entire Scopus database; the global citation count therefore also indicates the degree of relevance of a publication to other fields of study. The study by Beloglazov et al. (2012) is the most significant and impactful to other fields, as it has the highest number of global citations (1,943) in the top 10 list, while the study by Kwok and Ahmad (1999) is the most locally cited by documents within the analyzed dataset.

Table 5

Top 10 most cited publications based on local and global citations. The local citation measures the number of times each publication was cited by other publications within the dataset analyzed in this study. The global citations (total citations) measure the number of times each of the publications was cited across the entire Scopus database.

Interestingly, the most productive author in this field, Buyya Rajkumar of the University of Melbourne, Australia (see Table 4 ), co-authored three of the publications in the top 10 list of most cited publications. Also, all publications on the list have more global citations than local citations, which is ideal; it indicates that these publications are relevant and impactful not only to scheduling algorithms research but also to other fields. Content analysis of the literature presented in Tables 5 and 6 was carried out to determine the scheduling algorithms proposed, applied, or reviewed.

Table 6

The most relevant recent research based on the total number of citations. The relevance of a research constituent, such as an author, publication, source, or institution, is measured by the frequency of document production and/or citations.

Citation count is one of the most popular metrics used to measure the impact of an article in its field ( Aksnes et al., 2019 ). Therefore, to examine recent research in this field that has gained momentum and relevance, using citation count as the measure of impact, the top 5 articles of 2021, the top 3 of 2020, and the top 2 of 2019 are presented in Table 6 . Chen et al. (2021) , Yang et al. (2020a , 2020b) , and Cheng et al. (2019) are the most cited articles of 2021, 2020, and 2019, respectively. Their broad research areas are Cloud Computing, distributed machines, the Internet of Things, Packet Scheduling, and Edge Computing.

The top 10 most relevant sources based on the number of published documents are presented in Figure 3 . Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) published 641 articles, the highest number of publications by any source in this dataset, has a total global citation count of 4,607 based on the available data, and has an H-index of 28. The global citation count measures the number of citations received by each source across the entire Scopus database, whereas the most locally cited sources ( Figure 4 ) are ranked by the number of citations each source received from documents within the dataset used in this research, which spans 1992 to 2021.

Figure 3

Top 10 Most relevant sources based on the number of publications. The number of publications determines the intensity of the colour of the ball and its size; deeper colour and bigger size indicate a higher number of publications.

Figure 4

Top 10 Most locally cited sources. The local citation measures the number of times each source was cited by publications within the dataset analyzed. The number of publications determines the intensity of the colour of the ball and its size; deeper colour and bigger size indicate a higher number of publications.

From Figure 4 , the European Journal of Operational Research, published by Elsevier, is the most locally cited source, with 5,751 local citations, an H-index of 31, and a global citation count of 2,967. Future Generation Computer Systems, also published by Elsevier and ranked 9th on both the most relevant sources and most locally cited lists, has an H-index of 40 and a local citation count of 1,508.

3.3. What are the emerging, evolving, developed, and underdeveloped thematic areas in scheduling algorithms research?

The thematic structure of scheduling algorithms research is shown in Figure 5 . The co-word analysis network and clustering are based on the study by Cobo et al. (2011) . The bibliometrix function thematicMap() received the following parameters: Field = "ID", No. of words = 500, No. of labels = 10, Minimum cluster frequency per thousand documents = 5, Label size = 0.3, Clustering algorithm = "Louvain", Layout = "Auto". To avoid bias, the keywords in the search query ( Scheduling Algorithm; Scheduling Algorithms; Multilevel queue; Round Robin; Shortest Job First; First Come First Serve; Shortest Remaining Time; Priority queue; Makespan; average waiting time; turnaround time; Context Switch; Context Switches; Completion time; Resource allocation; Time Quantum; Grid computing; Cloud Computing; Edge Computing; Fuzzy Computing; Evolutionary Computing; Green Computing ) were excluded from the co-word analysis and clustering. The clusters are generated from Keywords Plus (ID): author keywords, keywords in articles, and other important keywords within the abstracts of the literature. The thematic map provides a clear interpretation of connections between themes ( Grivel et al., 1995 ). It visualizes the semantic strength of the themes' internal association (correlation between concepts) and external association (cohesiveness of nodes), measured as density and centrality, respectively, and helps to identify the important concepts within a field. The upper-right quadrant (Q1) shows the driving or motor themes, the lower-right quadrant (Q2) the basic themes, the lower-left quadrant (Q3) the emerging or declining themes, and the upper-left quadrant (Q4) the developed themes that are less used and possibly understudied in this field. Figure 5 reveals that themes such as multitasking, task scheduling, and energy utilization in the green cluster are the most important (high centrality) in scheduling algorithms research. The cluster straddles the motor-theme (Q1) and basic-theme (Q2) quadrants, which indicates that these themes are fairly developed.
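The same thematic map can be produced from a script rather than the Biblioshiny form. The sketch below approximates the configuration described above; it assumes the data frame M from the data collection step, and the parameter values mirror those listed in the text (the clustering algorithm is selected in the Biblioshiny form and is not set explicitly here).

```r
# Thematic map built from Keywords Plus (field "ID"), approximating the
# Biblioshiny settings reported above (500 words, 10 labels per cluster,
# minimum cluster frequency of 5, label size 0.3).
tm <- thematicMap(M,
                  field    = "ID",
                  n        = 500,
                  minfreq  = 5,
                  n.labels = 10,
                  size     = 0.3)

plot(tm$map)  # the quadrant plot of motor, basic, emerging/declining and niche themes
tm$clusters   # cluster membership with centrality and density values
```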

Figure 5

Thematic map of concepts in scheduling algorithms research. The themes/sub-themes are generated from Keyword Plus (ID); author keywords, keywords in articles, and other essential keywords within the abstract of the literature. The upper-right quadrant (Q1) shows the driving or motor themes, the lower-right quadrant (Q2) shows the basic themes, the lower-left quadrant (Q3) shows the emerging or declining themes, and the upper-left quadrant (Q4) shows the developed themes less used and possibly understudied.

The themes in the blue cluster are characterized by low density and low centrality, which shows that they are underdeveloped and only marginally important in scheduling algorithms research. These themes are likely to move towards Q3 over time, that is, to keep declining; the trend analysis in Figure 9 also shows that they have not recurred in recent years, and the small size of the cluster indicates that there are only a few themes in this category. In Q4, scheduling is connected to concepts such as optimization, genetic algorithms, heuristic algorithms, computational complexity, and heuristic methods. This cluster is characterized by high density and low centrality; hence, these themes are well developed but less used and possibly understudied in scheduling algorithms research.

Figure 9

Trends of topics in scheduling algorithms research.

3.4. What is the country's scientific production like? How established is the collaboration network of authors, institutions, and countries in scheduling algorithms research?

Table 7 reveals that China is the leading country in scheduling algorithms research, with 14,055 author appearances on documents within the analyzed dataset and a total of 44,248 citations. Authors from the United States of America appeared 4,851 times, with a total citation count of 25,807 and an average of 28.71 citations per article. It is also notable that three of the six inhabited continents, Africa, Oceania, and South America, are not represented in the top 10 countries leading research in scheduling algorithms.

Table 7

Top 10 most productive countries based on the frequency of appearance in publications in scheduling algorithms research.

The network diagrams in Figures 6, 7, and 11 were generated using an unsupervised community detection clustering algorithm, the Louvain algorithm ( Blondel et al., 2008 ). Other parameters set were: number of nodes = 50, minimum number of edges = 2, normalization = association, remove isolated nodes = yes, layout = Auto. These parameters can be passed to the network functions in bibliometrix through R or through the Biblioshiny interface; in this study, they were passed through the Biblioshiny web interface. The authors' collaboration network is presented in the network graph in Figure 6 . The graph comprises nodes that represent the authors and edges that indicate connections (links) between them. An edge shows that the connected authors have published at least two documents together. The size of a node indicates the number of documents published by the author in collaboration, and the thickness of an edge indicates the frequency of collaboration; a thicker edge means more collaborations between the connected authors (nodes). Figure 6 reveals that Zomaya A. Y. from the University of Sydney and Lee Y. C. from Macquarie University, both in Australia, have a well-established collaboration. A well-established collaboration can also be seen between Wu C.-C. from Feng Chia University, Taiwan, and Yin Y. from Kunming University, China, as indicated by the thicker edge connecting the two authors. Wang L. from Tsinghua University, China, and Wang J.-B from Shenyang Aerospace University, China, are the biggest facilitators of collaborations in the green and red clusters of nodes, with betweenness centralities of 193.16 and 193.63, respectively. Overall, author collaboration is quite sparse; Table 2 reports a collaboration index of 1.46, which is quite low, and collaborations are largely facilitated by the geographical proximity of authors.
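A scripted equivalent of the author collaboration network is sketched below; it assumes the data frame M from the data collection step and mirrors the parameters listed above (50 nodes, a minimum of two shared documents per edge, association normalization, Louvain clustering, and isolated nodes removed).

```r
# Author collaboration network: nodes are authors, edges are co-authored documents.
author_net <- biblioNetwork(M, analysis = "collaboration",
                            network = "authors", sep = ";")

# Plot the 50 most connected authors, keeping only edges with at least two
# shared documents, using association normalization and Louvain clustering.
networkPlot(author_net,
            n               = 50,
            normalize       = "association",
            cluster         = "louvain",
            edges.min       = 2,
            remove.isolates = TRUE,
            Title           = "Author collaboration network",
            type            = "auto",
            labelsize       = 0.7)
```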

Figure 6

Social structure: Authors' collaboration network in scheduling algorithm research. The edges show that the connected authors have published at least two documents together. The size of the nodes indicates the number of documents published by the author in collaboration, and the thickness of the edges indicates the frequency of collaboration; a thicker edge means more collaborations between the connected authors (nodes).

Figure 7

Social structure: Institution collaboration network in scheduling algorithm research. The connection between nodes indicates institutional collaboration. The thickness of the edges indicates the frequency of collaboration, and a thicker edge means more collaborations between the connected institutions.

Figure 11

Keywords co-occurrence network in scheduling algorithms research. The edges connecting the keywords (nodes) depict an established relationship between the connected nodes. A thicker edge indicates a high frequency of co-occurrence of the connected keywords in multiple documents.

Figure 7 also shows that institutional collaborations are largely shaped by the geographical proximity of institutions: connections are mostly between institutions within the same country. All the institutions clustered in green are based in Iran, the purple cluster is dominated by institutions from Australia, the pink cluster comprises institutions in China, and the red cluster is also dominated by institutions in China, save for the collaboration established with the National University of Singapore. Shenyang Aerospace University and the Dalian University of Technology, both in China, have a well-established collaboration. Some of the most relevant institutions in this field in terms of the number of articles published are Tsinghua University, China (362 articles); Beijing University of Posts and Telecommunications, China (360); Islamic Azad University, Iran (242); National University of Defense Technology, China (223); and Hunan University, China (221).

Figure 8 shows the countries' collaboration map; only countries with at least five documents are shown. There is a well-established collaboration between China and the United States of America, as indicated by the thick line connecting the two countries, and Australia also has a well-established collaboration with China. The collaborations between Egypt and Saudi Arabia, Australia and Jordan, India and China, Japan and Poland, India and Australia, and the United States of America and Brazil are much weaker, as indicated by the thin lines connecting these countries.

Figure 8

Social structure: Countries collaboration map. The thickness of connecting lines indicates the frequency of collaboration.

3.5. What are the trends in topics, top keywords, and co-occurrence network of keywords in scheduling algorithms research?

Keyword analysis helps in understanding the trends in topics and concepts that are gaining attention in a research field. The results of the keyword analysis are shown in Figures 9 and 10. The trend topic analysis was carried out on the keywords in the titles of articles only, with Minimum frequency = 50, No. of words per year = 5, and Word stemming = No; the bigram frequency measure was used to count how often a pair of keywords co-occurs. Figure 9 reveals the trends in topics for each year between 2004 and 2021; the years before 2004 did not have pairs of words with a frequency of at least 50, so they are not captured in the result. Packet scheduling, fair scheduling, and real-time scheduling were the trending topics between 2004 and 2007, with packet scheduling the most discussed. The pair of words with the highest frequency in all publication titles (indicated by the biggest bubble, in 2015) is "scheduling algorithm", with a frequency of 1,560. Figure 9 also reveals that Edge Computing is the most discussed and trending topic in scheduling algorithms research in 2020 and 2021. Researchers, especially early-career researchers interested in scheduling algorithms, can look to edge computing to identify new problems that are gaining the attention of other researchers and that could be addressed from a scheduling algorithms perspective. The results also show that scheduling algorithms research covers diverse areas such as cloud data, Reinforcement Learning, Ant Colony Optimization, Virtual Machines, Edge Computing, and Mobile Edge. It can also be observed (see Figure 9 ) that the number of keywords has increased continuously over the years, peaking around 2017, which implies a rapid expansion of knowledge in this field.

Figure 10

Most relevant keywords in scheduling algorithms research. The size of a keyword is determined by its frequency of occurrence in the entire dataset.

The word cloud in Figure 10 further reveals the most frequently used and most relevant keywords in scheduling algorithms research; the size of a keyword is determined by its frequency of occurrence in the entire dataset. The word cloud was generated from the Author Keywords (DE), publication titles, and abstracts, a collection referred to as Keywords Plus (ID) in the Biblioshiny interface. Other parameters were: Word occurrence = Frequency, Shape = Circle, Ellipticity = 0.65, Padding = 1, Font size = 1, Font type = Impact. The top three most relevant keywords in scheduling algorithms research are scheduling (freq = 6,795), resource allocation (freq = 2,136), and cloud computing (freq = 2,135), which is why these keywords appear larger than others in the word cloud. Other relevant keywords include makespan, optimization, algorithms, multitasking, and quality of service.

Keyword Co-occurrence Network (KCN) analysis is essential for gaining new insight into a field based on the patterns and links of co-occurring keywords in the analyzed literature ( Radhakrishnan et al., 2017 ). The keyword co-occurrence network was generated using the Louvain clustering algorithm. The Keywords Plus field, which comprises keywords from titles, abstracts, and author keywords, was clustered; the other parameters are as stated for Figures 6 and 7. Figure 11 shows the KCN in scheduling algorithms research. The KCN is composed of nodes (keywords) and edges (links). An edge connecting two nodes depicts an established relationship between them, and a thicker edge indicates a stronger association, that is, a higher number of documents in which the connected keywords co-occur. The thick edge between "scheduling" and "optimization" shows that these two keywords co-occur in more documents than any other connected pair of keywords. The importance of a keyword (node) within the network is shown by the number of links (edges) to that keyword. "Scheduling" is the most essential keyword, with the highest betweenness centrality of 101.94, meaning that it forms the bridge between major research themes in this field ( Painter et al., 2019 ).
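The keyword co-occurrence network can be generated the same way as the collaboration networks by switching the analysis to co-occurrences over the keyword field; as before, this is a sketch that assumes the data frame M and the parameters reported above.

```r
# Keyword co-occurrence network over Keywords Plus.
kcn <- biblioNetwork(M, analysis = "co-occurrences",
                     network = "keywords", sep = ";")

networkPlot(kcn,
            n               = 50,
            normalize       = "association",
            cluster         = "louvain",
            edges.min       = 2,
            remove.isolates = TRUE,
            Title           = "Keyword co-occurrence network",
            type            = "auto",
            labelsize       = 0.7)
```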

The clusters are characterized by the research areas in which the keywords frequently co-occur. The keywords in the blue cluster are closest to research areas such as packet scheduling and Long-Term Evolution; the green cluster to cloud computing, green computing, fog computing, virtual machines, and big data; while the keywords in the red cluster often appear in CPU scheduling and parallel and distributed systems research.

4. Conclusion

This study conducted an exploratory bibliometric analysis of publications in scheduling algorithms research. The results revealed that scheduling algorithms research has an annual scientific production growth rate of 16.62%. The keyword "scheduling" is the most important keyword in scheduling algorithms research, with the highest betweenness centrality of 101.94; it forms the bridge between the major research themes in this field. Edge Computing is the most discussed and trending topic in scheduling algorithms research in 2020 and 2021. Institutional collaborations have not been well established in this field.

A limitation of this study is that the dataset used is from a single source, Scopus, because the Biblioshiny interface does not provide a facility to merge BibTeX files from different sources. Analysis of a combination of data from sources such as Web of Science, Google Scholar, and Scopus may give better insight into the field. However, studies ( Bar-Ilan, 2018 ; Martín-Martín et al., 2018 , 2021 ) have shown that Scopus and Web of Science have most documents in common, that is, a large intersection of documents. According to Vieira and Gomes (2009) , two-thirds of Scopus documents are available in Web of Science, and ninety-three per cent of the documents found in Web of Science are also in the Scopus database ( Martín-Martín et al., 2021 ).

Future work may merge documents from different databases and filter out the duplicates, which may give a more accurate picture of the interests in this field. This study did not include preprints and articles in press in its search string; it would also be interesting to see the percentage of preprints in this field compared with refereed works, so future work could include preprints and articles in press for these insights.

The findings of this study can help both budding and established researchers to find new research focus, relevant sources, and collaboration opportunities and make informed decisions as they research scheduling algorithms and their applications.

Declarations

Author contribution statement.

Temidayo Oluwatosin Omotehinwa: Conceived and designed the experiments; Performed the experiments; Analyzed and interpreted the data; Contributed reagents, materials, analysis tools or data; Wrote the paper.

Funding statement

This research did not receive any specific grant from funding agencies in the public, commercial, or not-for-profit sectors.

Data availability statement

Declaration of interests statement.

The authors declare no conflict of interest.

Additional information

No additional information is available for this paper.

Acknowledgements

I appreciate the efforts of Agbo, Friday Joseph and Dr. Emmanuel Mogaji at the Centre for Multidisciplinary Research and Innovation (CEMRI) for their assistance during the data collection period and proofreading. Also, the immense contributions of the section editor and reviewers to improve the quality of the manuscript are highly appreciated. I am very grateful to the publisher for granting a full waiver of the article processing charge.

  • Agrawal P., Gupta A.K., Mathur P. CPU scheduling in operating system: a review. Lect. Notes Netw. Syst. 2021;166:279–289.
  • Aksnes D.W., Langfeldt L., Wouters P. Citations, citation indicators, and research quality: an overview of basic concepts and theories. Sage Open. 2019;9(1).
  • Almansour N., Allah N.M. 2019 International Conference on Computer and Information Sciences, ICCIS 2019. 2019, May 15. A survey of scheduling algorithms in cloud computing.
  • An Y.J., Kim Y.D., Jeong B.J., Kim S.D. Scheduling healthcare services in a home healthcare system. J. Oper. Res. Soc. 2012;63(11):1589–1599.
  • Aria M., Cuccurullo C. bibliometrix: an R-tool for comprehensive science mapping analysis. J. Informetr. 2017;11(4):959–975.
  • Arora N., Banyal R.K. Hybrid scheduling algorithms in cloud computing: a review. Int. J. Electr. Comput. Eng. 2022;12(1):880–895.
  • Arunarani A.R., Manjula D., Sugumaran V. Task scheduling techniques in cloud computing: a literature survey. Future Generat. Comput. Syst. 2019;91:407–415.
  • Bar-Ilan J. Tale of three databases: the implication of coverage demonstrated for a sample query. Front. Res. Metr. Anal. 2018;3(6):1–9.
  • Beloglazov A., Abawajy J., Buyya R. Energy-aware resource allocation heuristics for efficient management of data centres for Cloud computing. Future Generat. Comput. Syst. 2012;28(5):755–768.
  • Bini E., Buttazzo G.C. Measuring the performance of schedulability tests. R. Time Syst. 2005;30(1):129–154.
  • Blondel V.D., Guillaume J.L., Lambiotte R., Lefebvre E. Fast unfolding of communities in large networks. J. Stat. Mech. Theor. Exp. 2008;2008(10):P10008.
  • Burdett R.L., Kozan E. An integrated approach for scheduling health care activities in a hospital. Eur. J. Oper. Res. 2018;264(2):756–773.
  • Buyya R., Murshed M. GridSim: a toolkit for the modelling and simulation of distributed resource management and scheduling for Grid computing. Concurrency Comput. Pract. Ex. 2002;14(13–15):1175–1220.
  • Buyya R., Ranjan R., Calheiros R.N. Proceedings of the 2009 International Conference on High-Performance Computing and Simulation, HPCS 2009. 2009. Modeling and simulation of scalable cloud computing environments and the cloudsim toolkit: challenges and opportunities; pp. 1–11.
  • Chandiramani K., Verma R., Sivagami M. A modified priority preemptive algorithm for CPU scheduling. Procedia Comput. Sci. 2019;165:363–369.
  • Chen H., Zhu X., Liu G., Pedrycz W. Uncertainty-aware online scheduling for real-time workflows in cloud service environment. IEEE Transact. Serv. Comput. 2021;14(4):1167–1178.
  • Cheng X., Lyu F., Quan W., Zhou C., He H., Shi W., Shen X. Space/aerial-assisted computing offloading for IoT applications: a learning-based approach. IEEE J. Sel. Area. Commun. 2019;37(5):1117–1129.
  • Choudhari T., Moh M., Moh T.S. Proceedings of the ACMSE 2018 Conference, 2018-January. 2018. Prioritized task scheduling in fog computing; pp. 1–8.
  • Cobo M.J., Lopez-Herrera A.G., Herrera-Viedma E., Herrera F. An approach for detecting, quantifying, and visualizing the evolution of a research field. J. Informetr. 2011;5(1):146–166.
  • Dao S.D., Abhary K., Marian R. A bibliometric analysis of Genetic Algorithms throughout the history. Comput. Ind. Eng. 2017;110:395–403.
  • Das N.K.C., George M.S., Jaya P. Proceedings of the 2017 IEEE International Conference on Communication and Signal Processing, ICCSP 2017. 2017. Incorporating weighted round-robin in honeybee algorithm for enhanced load balancing in cloud environment; pp. 384–389.
  • Davis R.I., Burns A. A survey of hard real-time scheduling for multiprocessor systems. ACM Comput. Surv. 2011;43(4).
  • Dervis H. Bibliometric analysis using bibliometrix an R package. J. Sci. Res. 2019;8(3):156–160.
  • Dhinesh Babu L.D., Venkata Krishna P. Honey bee behavior inspired load balancing of tasks in cloud computing environments. Appl. Soft Comput. 2013;13(5):2292–2303.
  • Donthu N., Kumar S., Mukherjee D., Pandey N., Lim W.M. How to conduct a bibliometric analysis: an overview and guidelines. J. Bus. Res. 2021;133:285–296.
  • Ellegaard O., Wallin J.A. The bibliometric analysis of scholarly production: how great is the impact? Scientometrics. 2015;105(3):1809–1831.
  • Gatti R., Shivashankar. Improved resource allocation scheme for optimizing the performance of cell-edge users in LTE-A system. J. Ambient Intell. Hum. Comput. 2021;12(1):811–819.
  • Ghafari R., Kabutarkhani F.H., Mansouri N. Task scheduling algorithms for energy optimization in cloud environment: a comprehensive review. Cluster Comput. 2022;25(2):1035–1093.
  • Ghosh S., Banerjee C. Proceedings - 2018 4th IEEE International Conference on Research in Computational Intelligence and Communication Networks, ICRCICN 2018. 2018. Dynamic time quantum priority based round robin for load balancing in cloud environment; pp. 33–37.
  • Goudarzi M., Wu H., Palaniswami M., Buyya R. An application placement technique for concurrent IoT applications in edge and fog computing environments. IEEE Trans. Mobile Comput. 2021;20(4):1298–1311.
  • Grivel L., Mutschke P., Polanco X. Thematic mapping on bibliographic databases by cluster analysis: a description of the SDOC environment with SOLIS. Knowl. Organ. 1995;22(2):70–77.
  • Kumar M., Mittal M.L., Soni G., Joshi D. A hybrid TLBO-TS algorithm for integrated selection and scheduling of projects. Comput. Ind. Eng. 2018;119:121–130.
  • Kumar M., Sharma S.C., Goel A., Singh S.P. A comprehensive survey for scheduling techniques in cloud computing. J. Netw. Comput. Appl. 2019;143:1–33.
  • Kwok Y.K., Ahmad I. Static scheduling algorithms for allocating directed task graphs to multiprocessors. ACM Comput. Surv. 1999;31(4):406–471.
  • Leung J.Y.T. Handbook of Scheduling: Algorithms, Models, and Performance Analysis. CRC Press; 2004.
  • Liu J., Mao Y., Zhang J., Letaief K.B. IEEE International Symposium on Information Theory - Proceedings, 2016-August. 2016. Delay-optimal computation task scheduling for mobile-edge computing systems; pp. 1451–1455.
  • Liu S., Wang Z., Wei G., Li M. Distributed set-membership filtering for multirate systems under the round-robin scheduling over sensor networks. IEEE Trans. Cybern. 2020;50(5):1910–1920.
  • Lv Z., Chen D., Lou R., Wang Q. Intelligent edge computing based on machine learning for smart city. Future Generat. Comput. Syst. 2021;115:90–99.
  • Mahajan S. When to postpone approximating: The Rule of 69.3ish. Am. J. Phys. 2021;89(2):131–133.
  • Maipan-uku J., Rabiu I., Mishra A. Immediate/batch mode scheduling algorithms for grid computing: a review. Int. J. Regul. Govern. 2017;5(7):1–13.
  • Martín-Martín A., Orduna-Malea E., Delgado López-Cózar E. Coverage of highly-cited documents in Google Scholar, Web of Science, and Scopus: a multidisciplinary comparison. Scientometrics. 2018;116(3):2175–2188.
  • Martín-Martín A., Thelwall M., Orduna-Malea E., Delgado López-Cózar E. Google Scholar, Microsoft Academic, Scopus, Dimensions, Web of Science, and OpenCitations' COCI: a multidisciplinary comparison of coverage via citations. Scientometrics. 2021;126(1):871–906.
  • McKeown N. The iSLIP scheduling algorithm for input-queued switches. IEEE/ACM Trans. Netw. 1999;7(2):188–201.
  • Moral-Muñoz J.A., Herrera-Viedma E., Santisteban-Espejo A., Cobo M.J. Software tools for conducting bibliometric analysis in science: an up-to-date review. Profesional de La Informacion. 2020;29(1):1699–2407.
  • Nazar T., Javaid N., Waheed M., Fatima A., Bano H., Ahmed N. Vol. 25. Springer; Cham: 2019. Modified shortest Job first for load balancing in cloud-fog computing; pp. 63–76. (Lecture Notes on Data Engineering and Communications Technologies).
  • Olofintuyi S.S., Omotehinwa T.O., Owotogbe J.S. A survey of variants of round robin CPU scheduling algorithms. FUDMA J. Sci. 2020;4(4):526–546.
  • Omotehinwa T.O., Azeez S.I., Olofintuyi S.S. A simplified improved dynamic round robin (SIDRR) CPU scheduling algorithm. Int. J. Informat. Proc. Commun. 2019;7(2):122–140.
  • Omotehinwa T.O., Igbaoreto A., Oyekanmi E. An improved round robin CPU scheduling algorithm for asymmetrically distributed burst times. Afr. J. MIS. 2019;1(4):50–68. https://afrjmis.net
  • Painter D.T., Daniels B.C., Jost J. Network analysis for the digital humanities: principles, problems, extensions. Isis. 2019;110(3):538–554.
  • Prajapati H.B., Shah V.A. International Conference on Advanced Computing and Communication Technologies, ACCT. 2014. Scheduling in grid computing environment; pp. 315–324.
  • Radhakrishnan S., Erbis S., Isaacs J.A., Kamarthi S. Novel keyword co-occurrence network-based methods to foster systematic reviews of scientific literature. PLoS One. 2017;12(3).
  • Rahimi I., Gandomi A.H., Deb K., Chen F., Nikoo M.R. Scheduling by NSGA-II: review and bibliometric analysis. Processes. 2022;10(1):1–31.
  • Sana M.U., Li Z. Efficiency aware scheduling techniques in cloud computing: a descriptive literature review. PeerJ Comp. Sci. 2021;7:1–37.
  • Scopus. Elsevier Scopus Blog; 2020. About. https://blog.scopus.com//about
  • Sharma D.K., Kwatra K., Manwani M., Arora N., Goel A. Optimized resource allocation technique using self-balancing fast MinMin algorithm. Lecture Notes Data Eng. Commun. Technol. 2021;54:473–487.
  • Sharma R., Nitin N., AlShehri M.A.R., Dahiya D. Priority-based joint EDF–RM scheduling algorithm for individual real-time task on distributed systems. J. Supercomput. 2021;77(1):890–908.
  • Shishido H.Y., Estrella J.C. Proceedings - International Conference of the Chilean Computer Science Society, SCCC, 2017-October. 2018. Bibliometric analysis of workflow scheduling in grids and clouds; pp. 1–9.
  • Sivertsen G., Rousseau R., Zhang L. Measuring scientific contributions with modified fractional counting. J. Informetr. 2019;13(2):679–694.
  • Sundararaj V. Optimal task assignment in mobile cloud computing by queue based ant-bee algorithm. Wireless Pers. Commun. 2019;104(1):173–197.
  • Tychalas D., Karatza H. A scheduling algorithm for a fog computing system with bag-of-tasks jobs: simulation and performance evaluation. Simulat. Model. Pract. Theor. 2020;98:101982.
  • Verbeek A., Debackere K., Luwel M., Zimmermann E. Measuring progress and evolution in science and technology - I: the multiple uses of bibliometric indicators. Int. J. Manag. Rev. 2002;4(2):179–211.
  • Vieira E., Gomes J. A comparison of Scopus and Web of Science for a typical university. Scientometrics. 2009;81(2):587–600.
  • Yan P., Cai X., Ni D., Chu F., He H. Two-stage matching-and-scheduling algorithm for real-time private parking-sharing programs. Comput. Oper. Res. 2021;125:105083.
  • Yang H.H., Liu Z., Quek T.Q.S., Poor H.V. Scheduling policies for federated learning in wireless networks. IEEE Trans. Commun. 2020;68(1):317–333.
  • Yang L., Yao H., Wang J., Jiang C., Benslimane A., Liu Y. Multi-UAV-enabled load-balance mobile-edge computing for IoT networks. IEEE Internet Things J. 2020;7(8):6898–6908.
  • Yoo T., Goldsmith A. On the optimality of multiantenna broadcast scheduling using zero-forcing beamforming. IEEE J. Sel. Area. Commun. 2006;24(3):528–541.
  • Yousif A., Nor S.M., Abdualla A.H., Bashir M.B. Job scheduling algorithms on grid computing: state-of-the-art. Int. J. Grid Distrib. Comput. 2015;8(6):125–140.
  • Yu J., Yang Z., Zhu S., Xu B., Li S., Zhang M. Proceedings of 2018 IEEE 3rd Advanced Information Technology, Electronic and Automation Control Conference, IAEAC 2018. 2018. A bibliometric analysis of cloud computing technology research; pp. 2353–2358.
  • Yu J., Yin C. The relationship between the corresponding author and its byline position: an investigation based on the academic big data. J. Phys. Conf. 2021;1883(1).
  • Yuan H., Bi J., Zhou M., Liu Q., Ammari A.C. Biobjective task scheduling for distributed green data centers. IEEE Trans. Autom. Sci. Eng. 2021;18(2):731–742.
  • Zupic I., Čater T. Bibliometric methods in management and organization. Organ. Res. Methods. 2015;18(3):429–472.


Title: Problematic Machine Behavior: A Systematic Literature Review of Algorithm Audits

Abstract: While algorithm audits are growing rapidly in commonality and public importance, relatively little scholarly work has gone toward synthesizing prior work and strategizing future research in the area. This systematic literature review aims to do just that, following PRISMA guidelines in a review of over 500 English articles that yielded 62 algorithm audit studies. The studies are synthesized and organized primarily by behavior (discrimination, distortion, exploitation, and misjudgement), with codes also provided for domain (e.g. search, vision, advertising, etc.), organization (e.g. Google, Facebook, Amazon, etc.), and audit method (e.g. sock puppet, direct scrape, crowdsourcing, etc.). The review shows how previous audit studies have exposed public-facing algorithms exhibiting problematic behavior, such as search algorithms culpable of distortion and advertising algorithms culpable of discrimination. Based on the studies reviewed, it also suggests some behaviors (e.g. discrimination on the basis of intersectional identities), domains (e.g. advertising algorithms), methods (e.g. code auditing), and organizations (e.g. Twitter, TikTok, LinkedIn) that call for future audit attention. The paper concludes by offering the common ingredients of successful audits, and discussing algorithm auditing in the context of broader research working toward algorithmic justice.




  19. Examining the developments in scheduling algorithms research: A

    Scheduling algorithms have vast areas of application, and it will be a tedious work to carry out a traditional systematic literature review to find out the leading sources, most relevant authors and institutions, authors and country collaborations, emerging thematic areas, top keywords, and the scientific production of researchers in this area.

  20. Problematic Machine Behavior: A Systematic Literature Review of

    While algorithm audits are growing rapidly in commonality and public importance, relatively little scholarly work has gone toward synthesizing prior work and strategizing future research in the area. This systematic literature review aims to do just that, following PRISMA guidelines in a review of over 500 English articles that yielded 62 algorithm audit studies. The studies are synthesized ...

  21. Cuckoo Search Algorithm for Optimization Problems—A Literature Review

    This article presents a literature review of the cuckoo search algorithm. The objective of this study is to summarize the overview and the application of CS in all categories that were reviewed. From the various researches in the literature, it was proven that standard CS in combination with Lévy flight was the main technique that is used in CS.

  22. An automated method for developing search strategies for systematic

    Our search term selection strategy focuses on cumulative effect and seeks to create an open-source search software in Python. Software design. Software design describes the structure of the software to be implemented, the data models used by the system, the interfaces, and, sometimes, the algorithms used [32]. Requirements usually precede the ...

  23. Point-of-Care Ultrasound (POCUS) in Adult Cardiac Arrest: Clinical Review

    Point-of-Care Ultrasound (POCUS) is a rapid and valuable diagnostic tool available in emergency and intensive care units. In the context of cardiac arrest, POCUS application can help assess cardiac activity, identify causes of arrest that could be reversible (such as pericardial effusion or pneumothorax), guide interventions like central line placement or pericardiocentesis, and provide real ...

  24. Over 170 million fake reviews were removed from Maps and Search thanks

    In brief: Google receives around 20 million business-related contributions to Maps and Search every day, including hours, ratings, photos, reviews, videos and more. Unfortunately, fake reviews are ...

  25. Literature review as a research methodology: An ...

    Journals & Books , Pages 333-339 Literature review as a research methodology: An overview and guidelines Add to Mendeley https://doi.org/10.1016/j.jbusres.2019.07.039 Knowledge production within the field of business research is accelerating at a tremendous speed while at the same time remaining fragmented and interdisciplinary.