The Coupler: a new bibliometric tool for relational citation, bibliographic coupling and co-citation analysis

Introduction: The use of programming languages in Metric Information Studies has gained ground among the scientific community in the area, as they are practical and free or have low computational cost. Objectives: To present The Coupler, a new, free and alternative bibliometric tool built on the R programming language and aimed at relational citation analysis, with a focus on bibliographic coupling. Methodology: We ground the relational analysis of citation, co-citation and bibliographic coupling in a mathematical perspective and present the source code of the tool and the conditions under which it was built, with access granted to all. We use generic data to test the tool and expose all its features; to demonstrate its use with real data, we process bibliometric, patentometric, altmetric and natural language data. Results: The tool builds citation, coupling and co-citation matrices, and calculates bibliographic coupling frequencies normalized via Salton's Cosine and the Jaccard Index. Furthermore, The Coupler builds bibliographic coupling networks and identifies the coupling units responsible for each coupling pair, a feature that distinguishes it from traditional bibliometric software. Conclusion: The tool behaves as expected when processing both generic data and bibliometric, patentometric, altmetric and natural language data. Among the results, and in contrast to other software, we highlight the identification of coupling units and the calculation of coupling frequencies normalized via Salton's Cosine and the Jaccard Index.


Univariate citation analysis refers to the impact and the measurement of scientific community recognition based on the citations received by a certain set of documents, authors, institutions, or any other unit of analysis. Examples include mean citation rates per year per article, the Impact Factor, and CiteScore. The second type, relational citation analysis, seeks to establish groupings between different units of analysis. Bibliographic coupling and co-citation analyses belong to this type.
Bibliographic coupling analysis (BC) was proposed by Kessler (1962; 1963) to identify how articles are grouped together through the sharing of cited references. That is, if two articles share at least one reference (they cite the same reference), these articles are bibliographically coupled. Kessler (1963) evaluated BC according to two types of groupings between articles, called GA and GB: GA) given a set of articles, they are all coupled together (they cite at least one reference in common); GB) all articles are coupled to a certain article.
The number of cited references shared by two articles was called bibliographic coupling strength (or frequency of bibliographic coupling), and the references responsible for coupling these two articles are defined as coupling units. In this way, BC is configured as a relationship between citing articles (citing-citing).
Years later, Small (1973) coined the term co-citation analysis (CA) based on the co-occurrence of two documents cited together in at least one list of references; that is, CA measures in how many documents two other documents are co-cited (cited concomitantly). Thus, this analysis reveals the relationship between cited articles (cited-cited). In summary, while BC quantifies the similarity of references between two documents, CA exposes the recurrence with which two references are cited jointly. This quantification can be understood as the co-citation frequency between two articles.
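The two definitions above can be illustrated with plain set operations. The following is a minimal Python sketch, not The Coupler's R code; the document names, reference labels and function names are hypothetical and chosen only for illustration.

```python
# Hypothetical reference lists: each citing document maps to the set of items it cites.
references = {
    "doc1": {"r1", "r2", "r3"},
    "doc2": {"r2", "r3", "r4"},
    "doc3": {"r3"},
}


def coupling_units(refs, a, b):
    """References shared by citing documents a and b (the coupling units)."""
    return refs[a] & refs[b]


def coupling_strength(refs, a, b):
    """Number of shared references (bibliographic coupling strength)."""
    return len(coupling_units(refs, a, b))


def cocitation_frequency(refs, x, y):
    """Number of citing documents whose reference list contains both x and y."""
    return sum(1 for cited in refs.values() if x in cited and y in cited)


print(sorted(coupling_units(references, "doc1", "doc2")))  # ['r2', 'r3']
print(coupling_strength(references, "doc1", "doc2"))       # 2
print(cocitation_frequency(references, "r2", "r3"))        # 2
```

Coupling compares two citing documents (citing-citing), whereas co-citation counts over all citing documents for a pair of cited items (cited-cited), which is why the first two functions take citing units and the third takes cited units.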
Bibliographic coupling and co-citation relational analyses have evolved over time: White (1981) and White and Griffith (1981) proposed author co-citation analysis (ACA). From this perspective, ACA verifies in how many articles two authors were cited simultaneously. For this, each cited author is counted only once per citing article, so the cited author, and no longer the cited reference, is the unit considered.
Zhao and Strotmann (2008) enunciated author bibliographic coupling (ABA). This analysis couples two researchers from one of two perspectives: using cited documents or cited authors. In the first case, the number of references two researchers share is computed. In the second, each cited author is counted only once, and the number of authors cited in common by the two researchers is calculated.
Since the first studies by Kessler (1962; 1963), relational analyses have gained ground in the field of bibliometrics, and various methods for normalizing coupling frequencies, such as the Jaccard Index (JI) and Salton's Cosine (SC), have been debated since Sen and Gan (1983). These methods can be described by Equations 1 and 2:

$$SC(A,B) = \frac{f(A,B)}{\sqrt{|A| \cdot |B|}}$$

$$JI(A,B) = \frac{f(A,B)}{|A| + |B| - f(A,B)}$$

where A and B are any two analysis units to be coupled, $|A|$ and $|B|$ are the numbers of units cited by A and by B, and $f(A,B)$ represents how many coupling units A and B share (units they cite in common).
Traditionally, relational citation analyses are the result of matrix operations on the citation adjacency matrix of the analyzed situation. That is, given a citation matrix $M$, the matrix containing the bibliographic coupling frequencies ($M_{BC}$) is given by $M_{BC} = M \times M^{T}$, while the matrix containing the co-citation frequencies ($M_{CC}$) is given by $M_{CC} = M^{T} \times M$. This is exemplified in Equation 3:

$$
M = \begin{array}{c|cccc} & i & j & k & l \\ \hline A & 1 & 1 & 1 & 1 \\ B & 1 & 0 & 0 & 1 \\ C & 1 & 0 & 0 & 0 \end{array}
\qquad
M \times M^{T} = \begin{array}{c|ccc} & A & B & C \\ \hline A & 4 & 2 & 1 \\ B & 2 & 2 & 1 \\ C & 1 & 1 & 1 \end{array}
\qquad
M^{T} \times M = \begin{array}{c|cccc} & i & j & k & l \\ \hline i & 3 & 1 & 1 & 2 \\ j & 1 & 1 & 1 & 1 \\ k & 1 & 1 & 1 & 1 \\ l & 2 & 1 & 1 & 2 \end{array}
$$

where A, B and C are citing units and i, j, k, l are cited units. The second matrix corresponds to the bibliographic coupling matrix, and the third matrix corresponds to the co-citation matrix. To illustrate, units A and B have coupling strength equal to 2 (the coupling units are i and l); units i and l have a co-citation frequency equal to 2 (they were simultaneously cited by A and B).
The elements of the main diagonal of the coupling and co-citation matrices represent a reflexive connection of a certain element with itself. In co-citation matrices, this reflexive relationship represents the number of elements that cited it. For example, item i was cited by A, B and C, so the main diagonal element related to i is equal to 3. In coupling matrices, these values carry little meaning. Grácio and Oliveira (2015) warn about this situation and suggest four alternatives: taking the highest frequency of the element with the others; adopting the mean frequency of each element as the diagonal value; recording zeros on the diagonal; or defining the diagonal as a set of missing values. Among them, the last alternative has greater acceptance and use in the research community, as it is more easily executed and has less conceptual bias.
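The matrix relations described above can be checked numerically. The sketch below is a dependency-free Python illustration, not the tool's R implementation; it assumes the citation matrix implied by the example (A cites i, j, k, l; B cites i, l; C cites i), and the helper names `transpose` and `matmul` are hypothetical.

```python
# Citation matrix of the example: rows are citing units A, B, C;
# columns are cited units i, j, k, l (1 = cites, 0 = does not cite).
M = [
    [1, 1, 1, 1],  # A cites i, j, k, l
    [1, 0, 0, 1],  # B cites i, l
    [1, 0, 0, 0],  # C cites i
]


def transpose(m):
    """Swap rows and columns."""
    return [list(col) for col in zip(*m)]


def matmul(x, y):
    """Plain matrix product of two lists of lists."""
    y_cols = transpose(y)
    return [[sum(a * b for a, b in zip(row, col)) for col in y_cols] for row in x]


coupling = matmul(M, transpose(M))    # citing x citing
cocitation = matmul(transpose(M), M)  # cited x cited

print(coupling)    # [[4, 2, 1], [2, 2, 1], [1, 1, 1]]
print(cocitation)  # [[3, 1, 1, 2], [1, 1, 1, 1], [1, 1, 1, 1], [2, 1, 1, 2]]
```

The entry for A and B in the first product is 2 and the entry for i and l in the second product is 2, in line with the values discussed in the text.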
A further result can be observed: the cardinality of the list of cited units of A is equal to 4 (i, j, k, l), of B is equal to 2 (i, l), and of C is equal to 1 (i). From this information, it is possible to normalize the coupling frequency values via Salton's Cosine and/or the Jaccard Index, as shown in Equation 4, respectively.
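As a worked check, the normalized values for the three pairs of this example can be derived from the cardinalities above. The derivation assumes the usual forms of the two measures, $SC(X,Y) = f(X,Y)/\sqrt{|X| \cdot |Y|}$ and $JI(X,Y) = f(X,Y)/(|X| + |Y| - f(X,Y))$, where $f(X,Y)$ is a notation adopted here for the number of cited units shared by X and Y.

```latex
\begin{aligned}
SC(A,B) &= \frac{2}{\sqrt{4 \cdot 2}} \approx 0.7071, & JI(A,B) &= \frac{2}{4 + 2 - 2} = 0.5 \\
SC(A,C) &= \frac{1}{\sqrt{4 \cdot 1}} = 0.5, & JI(A,C) &= \frac{1}{4 + 1 - 1} = 0.25 \\
SC(B,C) &= \frac{1}{\sqrt{2 \cdot 1}} \approx 0.7071, & JI(B,C) &= \frac{1}{2 + 1 - 1} = 0.5
\end{aligned}
```

The values for the pair A and B agree with those the tool reports for this pair in its "Coupling Frequencies" output (0.7071068 and 0.5).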
Equations 3 and 4 make explicit a consequence of the matrix product between the citation matrix $M$ and its transpose: the coupling matrices (normalized or not) and the co-citation matrix must be square and symmetric about the main diagonal. That is, the elements of the upper triangular part are the same as those of the lower triangular part. This result implies that neither BC (citing-citing relationship) nor CA (cited-cited relationship) represents a directed relationship, unlike the citation matrix (citing-cited relationship).

METHODOLOGY
Three steps were organized to present The Coupler. The first step describes its computational aspects, such as its construction, hosting and features. In this step, the tool's design, its graphical interface, its code, and how the user can run The Coupler online or offline are presented, in addition to information about the file formats compatible with the tool (Figures 1 and 2).
In the second step, all of The Coupler's features are demonstrated using the data present in Equations 3 and 4. All processing possibilities supported by the tool are demonstrated, as well as data export (Figures 3 to 14).
In the third step, tests are performed using bibliometric, patentometric, altmetric and natural language data. The bibliometric data used in the tests were obtained by searching for the term "altmetrics" in all fields in the Web of Science database, which returned three results: Doc. 1) Maricato and Martins (2017); Doc. 2) Rocha and Silva (2020); Doc. 3) Gouveia (2019). The three documents were coupled using the authors cited in each document as coupling units (Figure 15a). To extract the cited authors, the "export selected authors" function of the VOSviewer software was used (Appendix 2).
For the patentometric data, three patents present in the Derwent Innovations Index database, with codes CN113317206 (2021), CN112450208 (2021) and CN112229155 (2021), were used. In this case, the three patents were coupled via the patents they cite (Appendix 3). This result is shown in Figure 15b.
As for altmetric data, data from Delbianco (2022, p. 114) were used, referring to Twitter followers of three scientific journal profiles in the area of Information Science: 1) Acervo: Revista do Arquivo Nacional; 2) AtoZ: novas práticas em informação e conhecimento; 3) Ciência da Informação em Revista.In this way, the three profiles were coupled via common followers (the number of followers in common was verified).Coupling using followers as the coupling unit is shown in Figure 15c.
As a last analysis, natural language processing was performed by comparing the words in common in the abstracts of the three aforementioned articles (Doc. 1: Maricato and Martins (2017); Doc. 2: Rocha and Silva (2020); Doc. 3: Gouveia (2019)). To extract the words, the "export selected terms" function of the VOSviewer software was used. By default, the software suggests using the most relevant 60% of the processed abstract words. For this analysis, this default was used, and the abstracts were coupled via words in common (Appendix 4). This analysis is shown in Figure 15d.
All tests in the subsequent sections were performed using both the online and offline versions of The Coupler. The computer used in all operations has a Windows 10 operating system, an Intel Core(TM) i7-8550U CPU @ 1.80GHz-1.99GHz processor and 8 gigabytes of RAM.

THE COUPLER
The Coupler was developed on the theoretical bases of relational citation analysis. It is a web application (web-app) for relational citation analysis, with a focus on coupling analyses. The tool is capable of building citation, coupling and co-citation matrices. In addition to this matrix perspective, the tool calculates bibliographic coupling frequencies normalized via Salton's Cosine and the Jaccard Index, admitting any type of analysis unit and any type of coupling unit. Furthermore, it builds the bibliographic coupling network and identifies the coupling units responsible for each coupling pair. The latter feature distinguishes it from traditional bibliometric software.
The source code is presented in Appendix 1 and can be executed via the R programming language (by pasting the code into R) or accessed via the website https://rafaelcastanha.shinyapps.io/thecoupler. To host the web-app, the private shinyapps.io server by RStudio was used, as it allows Shiny applications to be deployed on the web.
At the time this paper was submitted, The Coupler was in the process of registration at the National Institute of Industrial Property (INPI) through the Unesp Innovation Agency (AUIN). The registration requested is a computer program registration, based on the Copyright Law (Law No. 9.610/1998) and the Software Law (Law No. 9.609/1998). Consequently, after registration, the author guarantees maintenance of The Coupler and assistance to its users.
The availability of the tool as an online application facilitates dissemination and access from different devices and operating systems, such as Linux, macOS and Windows. The Coupler can also be accessed via mobile devices such as tablets and cell phones running systems such as Android or iOS, and through different browsers such as Google Chrome, Microsoft Edge and Safari, among others. In this way, the app does not require prior installation of the R programming language, only the file to be processed.
The tool, both online and offline (accessed via R), only supports .txt files organized in columns with a header and delimited by tabs, commas or semicolons. Taking as an example the units A, B and C from Equation 3, Figure 1 presents these three types of organization. Thus, The Coupler requires prior, and possibly manual, organization of the files to be processed. Some software, such as VOSviewer, provides the option of extracting cited units (authors, articles or journals) through the features "export selected authors", "export selected cited references" or "export selected sources".
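To make the expected input concrete, the sketch below parses a column-organized text into one set of cited items per unit. It is a Python illustration under stated assumptions, not the tool's R code: the layout (one column per analysis unit, the header holding the unit names, shorter columns padded with empty cells) and the sample content are assumptions, and `read_units` is a hypothetical helper.

```python
import csv
import io


def read_units(text, delimiter=";"):
    """Parse column-organized text: header row = unit names, cells below = cited items."""
    rows = list(csv.reader(io.StringIO(text), delimiter=delimiter))
    header, body = rows[0], rows[1:]
    units = {name.strip(): set() for name in header}
    for row in body:
        for name, cell in zip(header, row):
            cell = cell.strip()
            if cell:  # skip the empty cells of shorter columns
                units[name.strip()].add(cell)
    return units


# Assumed semicolon-delimited layout for units A, B and C.
sample = "A;B;C\ni;i;i\nj;l;\nk;;\nl;;\n"
for unit, cited in read_units(sample).items():
    print(unit, sorted(cited))
```

The same function reads tab- or comma-delimited text by passing `delimiter="\t"` or `delimiter=","`.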
The manual organization of the mentioned items can be understood as a preprocessing step of The Coupler. This step can be very useful for papers that are not indexed in databases capable of automatically extracting cited references (such as Web of Science, Scopus and Dimensions), for non-digitized papers, from which the extraction must be done manually, and for non-bibliographic data, whose organization or collection would be carried out manually.
Software such as VOSviewer and Bibliometrix process data coming directly from databases in .csv, .xlsx or .ris formats. The Coupler does not have this feature; however, the tool is capable of processing any type of unit, such as lists of: DOIs (Digital Object Identifiers), ORCIDs, researchers who make up research groups or departments, research areas of interest, references suggested in course teaching plans, followers on social networks, among others. In this way, the tools mentioned above can be complementary, and not mutually exclusive. The choice of the .txt extension is justified because the format is light and easy to access on several devices. Thus, the only requirement for using The Coupler is a previously organized file. Figure 2 illustrates the graphical interface of the tool.

Figure 2. The Coupler graphical interface
Source: by the author
When inserting the tabulated file in "Select the file", the user must choose the type of separator in the file: comma or semicolon. After that, when clicking on "Coupling!", the tool will return six results, which are shown in tab format in Figure 2: i) the bibliographic coupling network; ii) the coupling frequencies between each pair of units and their normalizations via Salton's Cosine and the Jaccard Index (Equations 1 and 2), organized in table format; iii) the coupling units responsible for coupling each pair of units, organized in table format; iv) the citation matrix; v) the coupling matrix between the citing units; vi) the co-citation matrix between the cited units.
The user can export all six results: the tool allows saving the image of the bibliographic coupling network and offers the option to download the table containing the bibliographic coupling frequencies between each pair of analyzed units, the table containing all the coupling units, and the citation, coupling and co-citation matrices. Except for the image, all files are saved in tab-delimited .txt format and organized in columns.
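A tab-delimited export of this kind can be sketched as follows. This is a Python illustration only; the exact layout of the files written by The Coupler is not specified here, so the header row with an empty corner cell is an assumption, and `matrix_to_tsv` is a hypothetical helper.

```python
import csv
import io


def matrix_to_tsv(row_labels, col_labels, matrix):
    """Serialize a labelled matrix as tab-delimited text, one matrix row per line."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
    writer.writerow([""] + list(col_labels))  # assumed header: empty corner + column labels
    for label, row in zip(row_labels, matrix):
        writer.writerow([label] + list(row))
    return buffer.getvalue()


citation = [[1, 1, 1, 1], [1, 0, 0, 1], [1, 0, 0, 0]]
print(matrix_to_tsv(["A", "B", "C"], ["i", "j", "k", "l"], citation))
```

Text in this form opens directly as columns in spreadsheet software.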
In addition, the user can choose to view the network from three perspectives by clicking on "Normalizations": i) without normalization: the tool returns the coupling network weighted by the absolute values of the coupling frequencies; ii) Salton's Cosine: the tool returns the coupling network weighted by the coupling frequencies normalized via Salton's Cosine; iii) Jaccard Index: the tool returns the coupling network weighted by the coupling frequencies normalized via the Jaccard Index.
Although the central idea of the tool is bibliometric analysis and, consequently, the use of bibliographic units such as authors or articles, the tool does not distinguish whether the input data deal with bibliographic elements or not, which brings greater diversity to its use. Thus, similarity (coupling) or co-occurrence (co-citation) analyses can be extended to any analysis units, such as altmetric and patentometric units, or even to any type of analysis of intersection (similarity) between sets.

DEMONSTRATION OF ANALYSIS VIA THE COUPLER
To present The Coupler's analyses and tests, the file shown in Figure 1 was initially used to demonstrate how the tool works. After this test, bibliometric, patentometric, altmetric and natural language data were used. The units in the files of Figure 1 can be understood as any analysis unit (authors, documents, institutions, among others). For the first test, Figure 3 (a, b, c) shows the coupling network and the normalization possibilities: "without normalization", Salton's Cosine or Jaccard Index.
When switching between the three options, the edges vary in thickness: the thicker the edge, the greater the proximity between the analyzed units, that is, the greater the frequency (strength) of bibliographic coupling. The visualization of the bibliographic coupling network uses the igraph graph library for R. Several software programs are capable of generating networks; in this context, the focus of The Coupler is not the visualization of the graph itself, but rather the relational aspects behind the representation, found in the tabs that follow "Bibliographic Coupling Network".
Thus, Figure 4 presents the results found in "Coupling Frequencies", with the following elements: i) "X1" and "X2": the units of analysis to be compared (coupled); ii) "refs_X1" and "refs_X2": cardinality of the items cited by "X1" and "X2", respectively; iii) "Coupling": number of items cited in common by X1 and X2; iv) "Saltons_Cosine": coupling values normalized via Salton's Cosine; v) "Jaccard_Index": coupling values normalized via the Jaccard Index.

As an example, the first line reads: unit A has a list of references made up of 4 elements, unit B has a list of references made up of 2 elements, and both cited 2 elements in common. This value normalized via Salton's Cosine is equal to 0.7071068, and normalized via the Jaccard Index is equal to 0.5.
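The row just described can be reproduced with a short computation. The following Python sketch is an illustration, not the tool's R code; it reuses the column names listed for Figure 4, assumes the usual forms of Salton's Cosine and the Jaccard Index, and assumes every unit cites at least one item (otherwise the divisions are undefined). `coupling_table` is a hypothetical name.

```python
from itertools import combinations
from math import sqrt

# Cited items of the generic example units.
cited = {"A": {"i", "j", "k", "l"}, "B": {"i", "l"}, "C": {"i"}}


def coupling_table(cited):
    """One row per pair of units, with absolute and normalized coupling values."""
    rows = []
    for x1, x2 in combinations(cited, 2):
        n1, n2 = len(cited[x1]), len(cited[x2])
        shared = len(cited[x1] & cited[x2])
        rows.append({
            "X1": x1,
            "X2": x2,
            "refs_X1": n1,
            "refs_X2": n2,
            "Coupling": shared,
            "Saltons_Cosine": shared / sqrt(n1 * n2),
            "Jaccard_Index": shared / (n1 + n2 - shared),
        })
    return rows


for row in coupling_table(cited):
    print(row)
```

For the pair A and B this yields a coupling of 2, a cosine of about 0.7071 and a Jaccard value of 0.5, matching the first line read above.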
In this tab, the "Download Data" button exports these results. Exporting the data generates a .txt file named "Coupling Frequencies.txt", as shown in Figure 5. The data in Figure 5 are tab-delimited and organized into columns, which facilitates their use in spreadsheets. In addition, the "Search" field can be used to find the units of analysis the user wants to focus on, and the results can be browsed page by page. By default, the tool displays the first 25 results.

Moreover, the code behind The Coupler excludes null columns, that is, in case a unit has not cited any reference, this unit will be excluded from the analysis, since it will not couple with any other unit.This case is analogous to articles that do not have references, that is, in which the author of the article does not cite anyone in his/her study.
Next, the "Coupling Units" tab brings the most relevant and unprecedented result among all those present in the tool: the identification of the coupling units, that is, the elements in common for each pair of analyzed units. Figure 6 illustrates the identification of all coupling units.
The next tab is the "Citation Matrix". This tab displays the matrix composed of citing and cited units, similar to the citation matrix $M$ in Equation 3. That is, the citation adjacency matrix that gives rise to the analyses of bibliographic coupling and co-citation is displayed. This matrix is shown in Figure 8. The citation adjacency matrix in Figure 8 represents the asymmetric citation relationship between units A, B and C and elements i, j, k, l. As in the previous examples, this matrix can be exported, as shown in Figure 9.
From the citation matrix, the coupling and co-citation matrices can be obtained through matrix operations between the citation matrix $M$ and its transpose $M^{T}$. Thus, the next tab, the "Coupling Matrix", represents the $M \times M^{T}$ relation and is shown in Figure 10. Figure 10 shows, in absolute values and in matrix form, the bibliographic coupling relationship between units A, B and C. This result corresponds to the "Coupling" column of the "Coupling Frequencies" tab. This redundancy is offered to the user because, traditionally, relational analyses are treated from a matrix point of view. The export of these data is similar to the others and is shown in Figure 11.
The coupling matrix in Figure 11 generated by The Coupler has zeros on its main diagonal. Essentially, the use of zeros is well accepted. These values differ from those found in Equation 3, in which the product between the citation matrix and its transpose returns, on the diagonal, the product of vector A, B or C with itself. However, this does not affect the result of the bibliographic coupling.
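The zero-diagonal convention can be sketched directly from the lists of cited items. This is a Python illustration rather than the tool's R code; `coupling_matrix` is a hypothetical helper, and the data are the generic units A, B and C.

```python
cited = {"A": {"i", "j", "k", "l"}, "B": {"i", "l"}, "C": {"i"}}


def coupling_matrix(cited):
    """Coupling frequencies between units, with zeros recorded on the main diagonal."""
    names = list(cited)
    matrix = [
        [0 if a == b else len(cited[a] & cited[b]) for b in names]
        for a in names
    ]
    return names, matrix


names, matrix = coupling_matrix(cited)
print(names)   # ['A', 'B', 'C']
print(matrix)  # [[0, 2, 1], [2, 0, 1], [1, 1, 0]]
```

Only the diagonal changes with respect to the plain matrix product; the off-diagonal entries, which carry the coupling frequencies, are the same.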

As a final result, we have the co-citation matrix between the cited units i, j, k and l. This matrix is shown in Figure 12 and expresses the matrix relationship between the transposed citation matrix and the citation matrix itself ($M^{T} \times M$). The matrix in Figure 12 is similar to that found in Equation 3. In essence, a co-citation matrix quantifies the frequency with which two units were cited concomitantly. The main diagonal of this matrix represents the citation frequency received by each unit. Thus, the values of the main diagonal, 3, 1, 1 and 2, indicate that the units i, j, k and l were cited 3, 1, 1 and 2 times, respectively. It is possible to export the co-citation matrix in .txt format, as shown in Figure 13. The co-citation matrix export can be considered the last result provided by The Coupler. Thus, from the set of data obtained through the tool, the user can proceed with the analyses.
As a last demonstration, if the set of units to be analyzed does not show any coupling, the application will show the alert message "WARNING: No couplings between units", as shown in Figure 14. In case none of the analysis units are coupled together, none of the six analyses (Figures 3, 4, 6, 8, 10, 12) will be processed. In other words, for the coupling frequencies, the coupling units and the citation, coupling and co-citation matrices to be processed, at least two units must be coupled to each other.
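The condition behind this warning can be expressed as a simple check over all pairs of units. The Python sketch below is illustrative only and does not reproduce the tool's R logic; `any_coupling` and `check_input` are hypothetical names, while the message text is the one quoted above.

```python
from itertools import combinations


def any_coupling(cited):
    """True if at least one pair of units shares at least one cited item."""
    return any(cited[a] & cited[b] for a, b in combinations(cited, 2))


def check_input(cited):
    """Return the warning text when no pair is coupled, otherwise None."""
    if not any_coupling(cited):
        return "WARNING: No couplings between units"
    return None


print(check_input({"A": {"i"}, "B": {"j"}, "C": {"k"}}))  # warning text
print(check_input({"A": {"i", "j"}, "B": {"j"}}))         # None
```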

If necessary, the user can use the exported data from the citation, coupling and co-citation matrices in software such as Ucinet to build citation, coupling and co-citation networks. It is noteworthy that exporting data in tab-delimited format facilitates their use in electronic spreadsheets such as Microsoft Excel or Google Sheets.

BIBLIOMETRIC, PATENTOMETRIC, ALTMETRIC DATA AND NATURAL LANGUAGE
The main idea of The Coupler is to promote the analysis of proximities between any sets; regardless of its bibliometric focus, it allows any type of analysis, whether within the scope of MIS or not. Thus, as a final demonstration of the tool, bibliometric, patentometric, altmetric and natural language data were processed. After processing, the coupling frequencies (similarities) and the coupling units between each analyzed pair were displayed. These four analyses differ little from those previously exemplified (Figures 3 to 13), since The Coupler does not differentiate the units of analysis. This characteristic favors its use: by admitting diverse data, the tool becomes extremely versatile and can be applied in any type of similarity and/or co-occurrence analysis. Examples include identifying co-authors in common between researchers (taking Figure 1 as a basis, the researchers would be A, B and C and the co-authors i, j, k and l), common members of research groups, co-occurrence and/or similarity of keywords, and references cited in common by course teaching plans, among others.

All calculations shown in Figure 15 were manually checked, and the tool correctly calculated Coupling, Salton's Cosine, and Jaccard Index values, and correctly identified all units of analysis and all coupling units; the latter is The Coupler's standout feature. In analyses of publications in a particular area, institution or specific theme, the coupling units represent a fundamental part of the intellectual structure of the set of analyzed papers. Identifying the elements responsible for coupling two items may provide means for identifying the main influences of a domain, based on the recurrence of certain units in the different lists of analyzed references.

FINAL CONSIDERATIONS
This research presented and demonstrated the use of the software The Coupler, a new free tool for relational analysis focused on bibliographic coupling (or several similarity analyses). Firstly, the analyses of citation, bibliographic coupling and co-citation were grounded from a mathematical and bibliometric point of view, and, from these foundations, the new tool was presented.
The Coupler consists of a web application capable of generating citation, coupling and co-citation matrices, in addition to calculating normalized coupling frequencies via Salton's Cosine and Jaccard Index, building the bibliographic coupling network valued by the normalizations and identifying the coupling units (elements responsible for connecting two units of analysis).
This last result can be considered the tool's most prominent item, since this type of identification is not common in other bibliometric tools. Likewise, the calculation of coupling frequencies normalized via Salton's Cosine and the Jaccard Index also gives The Coupler its own originality, since normalizations are extremely useful for comparisons between different contexts.
Furthermore, the tool was submitted to a demonstration using generic data (Figures 3 to 13) as well as tests using real data, commonly used in bibliometric, patentometric, altmetric analysis and in natural language processing.The Coupler responded to the tests as expected, calculating all coupling frequencies between the units of analysis, in an absolute and normalized way, in addition to identifying all coupling units.
As a limitation, the web-app version will have problems processing large datasets, since its hosting on the shinyapps.io server currently has an instance size limit of 1 gigabyte. For large sets, we suggest using The Coupler in its offline version and adjusting the memory limit used by the R software. For memory adjustments, the functions memory.size and memory.limit are used. In simulated tests, The Coupler, launched offline via R, completed the analysis of a file containing 1001 citing articles and 10010 cited items in approximately 50 minutes and 3 seconds. That is, when processing 1001 items, the tool completed the calculation of 500500 coupling interactions together with their respective normalized values, in addition to the citation, coupling and co-citation matrices. It is noteworthy that the computational cost may vary according to the computer used. This test was performed on the computer mentioned in the Methodology section.
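The figure of 500500 coupling interactions is the number of unordered pairs that can be formed from the 1001 citing articles, as the following short calculation shows:

```latex
\binom{1001}{2} = \frac{1001 \times 1000}{2} = 500500
```

Because the number of pairs grows quadratically with the number of citing units, doubling the number of units roughly quadruples the number of coupling interactions to be computed.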
With this, the tool was able to process diverse data and can thus be considered an alternative to other bibliometric software aimed at relational analysis. As next steps, we expect to keep The Coupler up to date and free of any problems that may occur, in addition to carrying out future implementations requested by users, such as the direct acceptance of files exported from the Web of Science and Scopus databases and the conversion of the tool into an executable file. Finally, we invite the entire community to use The Coupler.

• Funding: This study was partially funded by the Coordination for the Improvement of Higher Education Personnel - Brazil (CAPES), Financial Code: 88887.678240/2022-00.
• Conflicts of interest: The author certifies that he has no commercial or associative interest that represents a conflict of interest in relation to the manuscript.
• Ethical approval: Not applicable.
• Availability of data and material: All data generated and analyzed during the present study are available in the body of the original text and in its annexes.
• Authors' contributions: Conceptualization, Data Curation, Formal Analysis, Funding Acquisition, Research, Methodology, Resources, Supervision, Writing - Original Draft, Writing - Review & Editing: CASTANHA, R. G.

Figure 1. Types of file organization

Figure 3. Bibliographic coupling network generated via The Coupler using: a) without normalization; b) Salton's Cosine; c) Jaccard Index

Figure 4. Coupling Frequencies generated via The Coupler

Figure 5. Export of results in "Coupling Frequencies"

Figure 6. Coupling units identified via The Coupler

Figure 7. Export of coupling units via The Coupler

Figure 8. Citation Matrix generated via The Coupler

Figure 9. Citation matrix export via The Coupler

Figure 10. Coupling Matrix generated via The Coupler

Figure 11. Coupling matrix export via The Coupler

Figure 12. Co-citation matrix generated via The Coupler

Figure 13. Co-citation matrix export via The Coupler

Figure 14. The Coupler warning message