Applying the checklist to “Efficient Graph-Based Document Similarity”

  1. What phenomena or properties are being investigated? Why are they of interest?
    – similarities between documents are to be detected using an efficient graph that is based on semantic analysis
    – common methods operate, for example, on the distribution of words within the text
    – ambiguous words and synonyms frequently become a problem with these methods
    – comparisons between different texts are also difficult, for example because the texts use different vocabulary
  2.  Has the aim of the research been articulated? What are the specific hypotheses and research questions? Are these elements convincingly connected to each other?
    – there are no real hypotheses
    – the authors want to show that their algorithm:
    1. yields comparison results that correlate more strongly with the human notion of document similarity
    2. is also applicable to short documents
    3. is efficient because it is graph-based
  3. To what extent is the work innovative? Is this reflected in the claims?
    – comparing at the semantic level improves on the common methods –> similarities are found more reliably
  4. What would disprove the hypothesis? Does it have any improbable consequences?
    – the paper names no negative aspects
    – the hypothesis would be disproved if the approach were not faster, or if its document comparison results were equal to or worse than those of existing methods
  5. What are the underlying assumptions? Are they sensible?
    – texts can be compared on the basis of semantic relationships
  6. Has the work been critically questioned? Have you satisfied yourself that it is sound science?
    – detailed model description for the similarity functions
    – but no limitations or critical cases are considered
    –> rather no
  7. What forms of evidence are to be used? If it is a model or a simulation, what demonstrates that the results have practical validity?
    – a mathematical model as the basis of the similarity functions
    – experiments with real datasets
  8. How is the evidence to be measured? Are the chosen methods of measurement objective, appropriate, and reasonable?
    – the search-time complexity is considered –> sensible, since the goal is a method that is faster than existing ones
    – comparison of the experimental results –> also sensible, since the authors want to achieve better search results
  9. What are the qualitative aims, and what makes the quantitative measures you have chosen appropriate to those aims?
    – quality: faster and better comparison results
  10. What compromises or simplifications are inherent in your choice of measure?
    – cannot be answered, since I did not write the paper and was therefore not responsible for the choice of measure
  11. Will the outcomes be predictive?
  12. What is the argument that will link the evidence to the hypothesis?
    – the results of the experiments and the run-time comparison(?)
  13. To what extent will positive results persuasively confirm the hypothesis? Will negative results disprove it?
  14. What are the likely weaknesses of or limitations to your approach?
    – for short sentences, at least one unit must be found that can be linked to
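
Item 1 above notes that methods based on word distribution struggle with synonyms and differing vocabulary. A minimal sketch of that failure mode, assuming a plain bag-of-words cosine similarity as the baseline (a common textbook measure picked here for illustration, not the method of the paper; the function name `bow_cosine` and the example sentences are made up):

```python
import math
from collections import Counter


def bow_cosine(text_a: str, text_b: str) -> float:
    """Cosine similarity between simple word-count vectors (hypothetical helper)."""
    counts_a = Counter(text_a.lower().split())
    counts_b = Counter(text_b.lower().split())
    # Only words that occur in both texts contribute to the dot product.
    dot = sum(counts_a[w] * counts_b[w] for w in counts_a.keys() & counts_b.keys())
    norm_a = math.sqrt(sum(c * c for c in counts_a.values()))
    norm_b = math.sqrt(sum(c * c for c in counts_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Identical wording: the vectors coincide.
same_words = bow_cosine("the car is fast", "the car is fast")

# Same meaning, but no shared word: the measure sees nothing in common.
synonyms = bow_cosine("car drives fast", "automobile moves quickly")

print(same_words, synonyms)
```

The second pair shares no surface word, so a purely word-based measure cannot relate the two sentences, however close their meaning. This is the gap that a comparison on the semantic level is meant to close.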

One thought on “Applying the checklist to ‘Efficient Graph-Based Document Similarity’”

  1. Katrin says:

    Here is my proposed solution for the checklist:

    Scientific Argumentation checklist from Zobel, applied to the paper “Efficient Graph-Based Document Similarity” by Paul et al., Proc. ESWC 2016
    ********************************************************************************************

    Regarding hypotheses and questions,

    1. What phenomena or properties are being investigated? Why are they of interest?

    The relatedness of documents is used in many applications. Newer approaches use graph-based document models, which take relations between entities into account. However, operations such as comparison are expensive in graphs. The paper presents an efficient approach for finding similarities between documents based on graphs.

    2. Has the aim of the research been articulated? What are the specific hypotheses and research questions? Are these elements convincingly connected to each other?

    The aim was clearly articulated (to find an efficient way to compare graph-based document models). However, there are neither specific hypotheses nor a clear research question.

    3. To what extent is the work innovative? Is this reflected in the claims?

    The approach takes special characteristics of the graph of a document into account. It exploits hierarchical and transversal relations between entities, which reduces computing costs.

    4. What would disprove the hypothesis? Does it have any improbable consequences?

    I assume the hypothesis is that taking the special characteristics of hierarchical and transversal relations into account speeds up the computation of graph-based document similarity. The hypothesis would be disproved if experiments showed no gain in speed and no reduction in computing costs. An improbable consequence could be that hierarchical and transversal relations introduce other characteristics as well, which would even lower the speed of computation.

    5. What are the underlying assumptions? Are they sensible?

    Underlying assumptions might be that hierarchical and transversal relations do not introduce side effects which affect computing. In my opinion this assumption is sensible, as those relations still make up the graph, i.e. they are relations in general. It is just a more special view.

    6. Has the work been critically questioned? Have you satisfied yourself that it is sound science?

    There is no real critical assessment of the work. A discussion section is missing. The paper contains an extensive evaluation of the presented approach, but possible drawbacks or disadvantages have not been named.

    Regarding evidence and measurement,

    1. What forms of evidence are to be used? If it is a model or a simulation, what demonstrates that the results have practical validity?

    Experiments and comparisons with other measures have been performed in order to show that the presented approach is indeed faster and less expensive in computing costs.

    2. How is the evidence to be measured? Are the chosen methods of measurement objective, appropriate, and reasonable?

    Experiments have been run on standard datasets which have been acknowledged as gold standards in the community. A comparison has been drawn with well-known and state-of-the-art approaches, so the evidence can indeed be seen as objective, appropriate and reasonable. However, especially for the speed test, the authors fail to state which hardware was used for the experiments. This is a crucial fact for reproducibility.

    3. What are the qualitative aims, and what makes the quantitative measures you have chosen appropriate to those aims?

    Qualitative aims: document similarity has been assessed by computing several correlation coefficients; speed has been measured by timing the run of an execution task.
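
The two kinds of measurement named in this answer can be sketched as follows. This is only an illustration under assumptions: the answer does not say which correlation coefficients were used, so Pearson and Spearman are assumed here as two common choices, the scores are invented, and the helper names are hypothetical.

```python
import math
import time


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    std_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    std_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (std_x * std_y)


def ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


# Invented scores: human similarity judgments vs. scores of some system.
human = [1.0, 2.0, 3.0, 4.0, 5.0]
system = [0.1, 0.2, 0.4, 0.8, 1.6]

# Wall-clock timing of the task; the result depends on the hardware used.
start = time.perf_counter()
rho = spearman(human, system)
r = pearson(human, system)
elapsed = time.perf_counter() - start

print(f"Spearman={rho:.3f} Pearson={r:.3f} elapsed={elapsed:.6f}s")
```

Because the system scores rise monotonically but not linearly with the human scores, the rank correlation is perfect while the Pearson value stays below 1. The measured time varies from machine to machine, which is why run-time figures are hard to reproduce without a description of the hardware.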

    4. What compromises or simplifications are inherent in your choice of measure?

    The authors compared their approach only to a selected number of other methods (those which are most comparable).

    5. Will the outcomes be predictive?

    Yes. The experiments show clear trends: the presented approach does indeed find better results (better similarity than with other approaches), and it is faster.

    6. What is the argument that will link the evidence to the hypothesis?

    The numbers show that the presented approach outperforms those which have been used for comparison.

    7. To what extent will positive results persuasively confirm the hypothesis? Will negative results disprove it?

    Positive results confirm the hypothesis, as speed and correlation are measured, which corresponds directly to the goal of the work. Negative results would indeed disprove the hypothesis.

    8. What are the likely weaknesses of or limitations to your approach?

    The authors do not state any.
