Homework Checklist – Kai

Regarding hypotheses and questions

  • What phenomena or properties are being investigated? Why are they of interest?
    • Graph-based document models are often slow, but they can give promising results. The paper investigates ways to improve their performance. This is of interest because document relatedness is used in many applications (e.g. document retrieval and recommendation).
  • Has the aim of the research been articulated? What are the specific hypotheses and
    research questions? Are these elements convincingly connected to each other?

    • Not really hypotheses / research questions.
    • The aim is to develop a faster algorithm than other graph-traversal-based approaches.
    • Specific claims:

      • higher correlation with human notions of document similarity
      • holds for short documents with few annotations
      • document similarity is computed (more) efficiently
  • To what extent is the work innovative? Is this reflected in the claims?
    • They use a different/new approach for efficient knowledge-graph-based semantic similarity (the preprocessing step called Semantic Document Expansion??)
  • What would disprove the hypothesis? Does it have any improbable consequences?
    • There is not really a hypothesis, only claims. These would be disproved if the experiments showed the approach to be slower or worse than the other approaches.
  • What are the underlying assumptions? Are they sensible?
  • Has the work been critically questioned? Have you satisfied yourself that it is
    sound science?

Regarding evidence and measurement

  • What forms of evidence are to be used? If it is a model or a simulation, what
    demonstrates that the results have practical validity?

    • Running a simulation with a standard benchmark (which enables comparison with other work).
  • How is the evidence to be measured? Are the chosen methods of measurement
    objective, appropriate, and reasonable?

    • It is measured by the results of the experiments. The measurement is objective, appropriate, and reasonable.
  • What are the qualitative aims, and what makes the quantitative measures you have
    chosen appropriate to those aims?

    • Qualitative aims: the approach is faster / better than similar approaches
    • Quantitative measures: correlation, time, ranking in comparison to other approaches
  • What compromises or simplifications are inherent in your choice of measure?
    • I don’t know. Maybe the fact that it relies on a standard benchmark.
  • Will the outcomes be predictive?
  • What is the argument that will link the evidence to the hypothesis?
    • I think plain, objective data is enough as an argument. So no real argument is given.
  • To what extent will positive results persuasively confirm the hypothesis? Will
    negative results disprove it?

    • Positive results ARE the proof that confirms the "hypothesis" (the claim that it’s better); thus negative results would easily disprove it.
  • What are the likely weaknesses of or limitations to your approach?
    • Works for short sentences only if an entity can be found to link to (maybe it’s not that improbable that sentences don’t contain entities, I don’t know).
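
The correlation measure listed among the quantitative measures above can be made concrete with a small sketch. It assumes the benchmark supplies human similarity ratings for document pairs and that the system produces one score per pair. The function names and all numbers are invented for illustration and are not taken from the paper.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equally long lists (needs non-zero variance)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    dev_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    dev_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (dev_x * dev_y)


def ranks(values):
    """Return 1-based ranks; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def spearman(xs, ys):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(xs), ranks(ys))


# Hypothetical data: human ratings and system scores for five document pairs.
human = [1.0, 2.0, 3.0, 4.0, 5.0]
system = [0.10, 0.20, 0.35, 0.50, 0.90]

print("Pearson: ", pearson(human, system))
print("Spearman:", spearman(human, system))
```

Pearson rewards a linear relationship between the two lists, while Spearman only checks whether the ordering of the pairs agrees, so reporting both says more than either one alone.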

One thought on "Homework Checklist – Kai"

  1. Katrin says:

    Here is my proposed solution for the checklist:
    Scientific Argumentation checklist from Zobel, applied to the paper "Efficient Graph-Based Document Similarity" by Paul et al., Proc. ESWC 2016
    ********************************************************************************************

    Regarding hypotheses and questions,

    1. What phenomena or properties are being investigated? Why are they of interest?

    The relatedness of documents is used in a lot of applications. Newer approaches use graph-based document models, which take relations between entities into account. However, operations such as comparison are expensive on graphs. The paper presents an efficient approach for finding similarities between documents based on graphs.

    2. Has the aim of the research been articulated? What are the specific hypotheses and research questions? Are these elements convincingly connected to each other?

    The aim was clearly articulated (to find an efficient way to compare graph-based document models). However, there are neither specific hypotheses nor a clear research question.

    3. To what extent is the work innovative? Is this reflected in the claims?

    The approach takes special characteristics of a document's graph into account. It exploits hierarchical and transversal relations between entities, which reduces computing costs.

    4. What would disprove the hypothesis? Does it have any improbable consequences?

    I assume the hypothesis is that taking the special characteristics of hierarchical and transversal relations into account speeds up the computation of graph-based document similarity. The hypothesis would be disproved if experiments showed no gain in speed and no reduction in computing costs. An improbable consequence could be that hierarchical and transversal relations introduce other characteristics as well, which would even slow down the computation.

    5. What are the underlying assumptions? Are they sensible?

    Underlying assumptions might be that hierarchical and transversal relations do not introduce side effects which affect computing. In my opinion this assumption is sensible, as those relations still make up the graph, i.e. they are relations in general. It is just a more special view.

    6. Has the work been critically questioned? Have you satisfied yourself that it is sound science?

    There is no real critical assessment of the work. A discussion section is missing. The paper contains an extensive evaluation of the presented approach, but possible drawbacks or disadvantages have not been named.

    Regarding evidence and measurement,

    1. What forms of evidence are to be used? If it is a model or a simulation, what demonstrates that the results have practical validity?

    Experiments and comparisons with other measures have been performed in order to show that the presented approach is indeed faster and less expensive in computing costs.

    2. How is the evidence to be measured? Are the chosen methods of measurement objective, appropriate, and reasonable?

    Experiments have been run on standard datasets which are acknowledged as gold standards in the community. A comparison has been drawn with well-known and state-of-the-art approaches, so the evidence can indeed be seen as objective, appropriate and reasonable. However, especially for the speed test, the authors fail to state which hardware was used for the experiments. This is a crucial fact for reproducibility.

    3. What are the qualitative aims, and what makes the quantitative measures you have chosen appropriate to those aims?

    Qualitative aims: document similarity has been assessed by computing different correlation coefficients; speed has been measured by timing the run time of an execution task.

    4. What compromises or simplifications are inherent in your choice of measure?

    The authors compared their approach only to a selected number of other methods (those which are most comparable).

    5. Will the outcomes be predictive?

    Yes. The experiments show clear trends: the presented approach does indeed find better results (better similarity than with other approaches), and it is faster.

    6. What is the argument that will link the evidence to the hypothesis?

    The numbers show that the presented approach outperforms those which have been used for comparison.

    7. To what extent will positive results persuasively confirm the hypothesis? Will negative results disprove it?

    Positive results confirm the hypothesis, as speed and correlation are measured, and both correspond directly to the goal of the work. Negative results would indeed disprove the hypothesis.

    8. What are the likely weaknesses of or limitations to your approach?

    The authors do not state any.
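
As a side note on the missing hardware information mentioned under evidence question 2: a run-time measurement becomes easier to reproduce when the environment is recorded together with the timing. The following is only a minimal sketch using the Python standard library. The helper names `time_call` and `environment_info` and the dummy workload are my own assumptions and have nothing to do with the authors' actual setup.

```python
import platform
import time


def time_call(func, *args, repeats=5):
    """Run func(*args) several times and return the fastest wall-clock time in seconds."""
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        func(*args)
        timings.append(time.perf_counter() - start)
    # The minimum is the run least disturbed by other processes.
    return min(timings)


def environment_info():
    """Collect basic facts about the machine the measurement ran on."""
    return {
        "system": platform.system(),
        "machine": platform.machine(),
        "processor": platform.processor(),
        "python": platform.python_version(),
    }


def dummy_workload(n):
    """Placeholder for the task being timed (hypothetical)."""
    return sum(i * i for i in range(n))


if __name__ == "__main__":
    seconds = time_call(dummy_workload, 100_000)
    print(f"best of 5 runs: {seconds:.6f} s")
    for key, value in environment_info().items():
        print(f"{key}: {value}")
```

Reporting the environment next to each timing lets readers judge whether a speed difference comes from the algorithm or from the machine.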
