Oh god

Regarding hypotheses and questions,
• What phenomena or properties are being investigated? Why are they of interest?
-The paper investigates the retrieval of documents based on their relatedness to a query document
-Word-based approaches are susceptible to problems caused by polysemy and synonymy; documents with different vocabularies, text lengths or languages are hard to compare
-A use case is searching for related documents, for example: „[…]news website might want to recommend content with regards to the article a user is reading.“

• Has the aim of the research been articulated? What are the specific hypotheses and
research questions? Are these elements convincingly connected to each other?

-The aim is to present a scalable approach for related-document search using entity-based document similarity
-The research question is: how well does the model perform?

• To what extent is the work innovative? Is this reflected in the claims?

-They state it directly: „[…]document models based on explicit semantics become competitive
compared to the predominant vector-space document models based on implicit semantics“
-So until now, models based on explicit semantics were inferior to vector-space models

• What would disprove the hypothesis? Does it have any improbable consequences?

-If the presented model performed „a lot worse“ than vector-space models?

• What are the underlying assumptions? Are they sensible?

-The underlying assumptions are not stated explicitly, but I guess there are some assumptions related to document retrieval models. Beyond that, I don’t know.

• Has the work been critically questioned? Have you satisfied yourself that it is sound science?

-Many of their similarity functions lean on other works; they are either inspired by them or explain how they differ from them

Regarding evidence and measurement,
• What forms of evidence are to be used? If it is a model or a simulation, what
demonstrates that the results have practical validity?

-Tests are run with real-world resources (DBpedia, which is a knowledge base of structured data extracted from Wikipedia, and Lucene, which is a full-text search library)

• How is the evidence to be measured? Are the chosen methods of measurement
objective, appropriate, and reasonable?
-The experiments use several similarity functions and benchmark tests

• What are the qualitative aims, and what makes the quantitative measures you have
chosen appropriate to those aims?

-The qualitative aim is to retrieve highly related documents

Quantitative measures:
-use of the standard benchmark for multiple-sentence document similarity
-testing the approach on short documents
-comparing their approach with other approaches that perform unsupervised semantic similarity assessment

• What compromises or simplifications are inherent in your choice of measure?

-Perhaps fixing the free variables beforehand, or limiting the evaluation to a fixed set of documents?

• Will the outcomes be predictive?

-Well, I couldn’t predict the results from the definitions alone

• What is the argument that will link the evidence to the hypothesis?


• To what extent will positive results persuasively confirm the hypothesis? Will
negative results disprove it?

-If positive, the results support the paper’s claim that „[…]document models based on explicit semantics become competitive
compared to the predominant vector-space document models based on implicit semantics“

-And yes, negative results would disprove it

• What are the likely weaknesses of or limitations to your approach?

-Maybe applying the model to other data sets that aren’t „neatly“ organised?

Abstract: outlier detection in temporal data

Outlier mining is an important data analysis task that identifies objects which deviate strongly from the regular objects in their local neighborhood. Various detection methods exist to identify these objects, for instance density-based, distance-based or probability-based methods.
Anomaly detection is used in application domains that involve problems such as bank fraud, medical problems or the investigation of criminal activity.
With the increasing volume of incoming data and the low cost of gathering it, interest in temporal databases has been growing.
This paper provides an overview of existing outlier mining methods for temporal data.
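To make the idea of a distance-based method more concrete, here is a minimal sketch in Python. It is not taken from any of the surveyed papers: the function names, the sample readings, and the choices of `k` and `threshold` are all assumptions made for illustration. Each (time, value) reading is treated as a 2-D point and scored by the distance to its k-th nearest neighbor, which is a simplification for temporal data.

```python
import math

def knn_distance_scores(points, k=2):
    """Score each point by the Euclidean distance to its k-th nearest neighbor."""
    scores = []
    for i, p in enumerate(points):
        # Distances from p to every other point, smallest first.
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        scores.append(dists[k - 1])
    return scores

def find_outliers(points, k=2, threshold=5.0):
    """Return the points whose k-th nearest neighbor is farther away than threshold."""
    scores = knn_distance_scores(points, k)
    return [p for p, s in zip(points, scores) if s > threshold]

# Hypothetical sensor readings as (time step, value) pairs.
readings = [(0, 1.0), (1, 1.1), (2, 0.9), (3, 1.0), (4, 9.0), (5, 1.05)]
outliers = find_outliers(readings, k=2, threshold=5.0)
print(outliers)
```

In this toy series only the reading at time step 4 lies far from its neighbors. Real temporal methods would usually also account for trends and seasonality, which this sketch ignores.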


(So my plan is to briefly explain outlier mining and why it is important; after that I also want to relate temporal data to outlier mining.)

A few selected publications

I mainly paid attention to how often a publication was cited, whether it had an appropriate number of references for its length, and how it was structured.

(For example, I did not pay attention to where it was published or which citation indices the author had.)

The first three publications each give an overview of one aspect of the topic, while the last three go into the topic in somewhat more detail.

Chandola, Varun, Arindam Banerjee, and Vipin Kumar. „Anomaly detection: A survey.“ ACM computing surveys (CSUR) 41.3 (2009): 15.

Roddick, John F., and Myra Spiliopoulou. „A survey of temporal knowledge discovery paradigms and methods.“ Knowledge and Data Engineering, IEEE Transactions on 14.4 (2002): 750-767.

Gupta, Manish, et al. „Outlier detection for temporal data.“ Synthesis Lectures on Data Mining and Knowledge Discovery 5.1 (2014): 1-129.

Phua, Clifton, et al. „A comprehensive survey of data mining-based fraud detection research.“ arXiv preprint arXiv:1009.6119 (2010).

Zhang, Yang, Nirvana Meratnia, and Paul Havinga. „Outlier detection techniques for wireless sensor networks: A survey.“ Communications Surveys & Tutorials, IEEE 12.2 (2010): 159-170.

Hill, David J., and Barbara S. Minsker. „Anomaly detection in streaming environmental sensor data: A data-driven modeling approach.“ Environmental Modelling & Software 25.9 (2010): 1014-1022.

„Warp Drive Research Key to Interstellar Travel“ – Summary

Physicist Harold „Sonny“ White at NASA’s Johnson Space Center in Houston is investigating the feasibility of building a real warp-drive engine. He has assembled a tabletop experiment designed to create tiny distortions in spacetime, the malleable fabric of the universe.

If the experiment is successful, it may eventually lead to the development of a system that could generate a bubble of warped spacetime around a spacecraft. Such a warp drive would distort the spacetime along the craft’s path, allowing it to sidestep the laws of physics that prohibit faster-than-light travel. Such a spacecraft could cross the distances between stars in just a matter of weeks.

Although White’s idea is jeered at by other physicists and has been allocated a budget of only $50,000, he gets a lot of support from a surprising number of scientists, engineers and amateur space enthusiasts who believe in his dream. This has resulted in the founding of various organizations (the 100 Year Starship project, the Tau Zero Foundation, Icarus Interstellar) that seek to lay the groundwork for an unmanned interstellar mission that could be launched by the end of the century.

With current spacecraft propulsion it is impossible to reach far-away planets like Jupiter or Saturn in a reasonable time, let alone any of the nearby stars that could harbor habitable planets. The Voyager 1 probe travels at a blistering 38,610 miles per hour, but even at that speed it would take at least 70,000 years to reach those stars.
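The order of magnitude of that travel time can be checked with a quick back-of-the-envelope calculation. The sketch below assumes the target is the nearest star, Proxima Centauri, at roughly 4.24 light-years; that distance and the miles-per-light-year conversion are my own assumptions and are not taken from the article. Only the Voyager 1 speed comes from the summary.

```python
# Assumed constants (not from the article).
PROXIMA_CENTAURI_LY = 4.24        # approximate distance to the nearest star, in light-years
MILES_PER_LIGHT_YEAR = 5.879e12   # approximate conversion factor
HOURS_PER_YEAR = 24 * 365.25

# Speed quoted in the summary.
VOYAGER_1_MPH = 38_610

def travel_time_years(distance_ly, speed_mph):
    """Years needed to cover distance_ly light-years at a constant speed_mph."""
    distance_miles = distance_ly * MILES_PER_LIGHT_YEAR
    hours = distance_miles / speed_mph
    return hours / HOURS_PER_YEAR

years = travel_time_years(PROXIMA_CENTAURI_LY, VOYAGER_1_MPH)
print(f"{years:,.0f} years")
```

Under these assumptions the result is a little above 70,000 years, which is consistent with the figure in the summary.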

The distance to the stars is not the only problem, even once the theoretical speed is reached.
Interstellar dust can cause huge damage to a craft travelling at millions of miles per hour. This can be prevented by heavily shielding the craft, but the extra shielding means that more fuel is needed to accelerate it.
Another problem that needs to be addressed is deceleration, so that the craft does not fly past its destination. One suggestion is to fire the engine against the direction of travel, but this would mean that the spacecraft has to carry an even heavier load of fuel.

Other, less hypothetical possibilities besides the warp drive are also being researched.
Icarus Interstellar is coordinating a study of a proposed mission that would use fusion power to propel the spacecraft. With nuclear fusion as its driving force, a probe could be thousands of times faster than Voyager 1, but researchers have not managed to build a working fusion power plant in the past 50 years. Nevertheless, the research continues avidly.

Exploring other star systems might not be only a matter of curiosity. Jill Tarter argues that it is essential for humanity’s long-term survival. As long as humanity is confined to Earth, it is at risk of extinction through a planetary catastrophe such as a nuclear war, a pandemic or an asteroid impact.

Punctuation Game – Put in the missing punctuation marks (, ; -)

We live in the era of Big Data, with storage and transmission capacity measured not just in terabytes but in petabytes (where peta- denotes a quadrillion, or a thousand trillion). Data collection is constant and even insidious, with every click and every „like“ stored somewhere for something. This book reminds us that data is anything but „raw“, that we shouldn’t think of data as a natural resource but as a cultural one that needs to be generated, protected and interpreted. The book’s essays describe eight episodes in the history of data from the predigital to the digital. Together they address such issues as the ways that different kinds of data and different domains of inquiry are mutually defining; how data are variously „cooked“ in the processes of their collection and use; and conflicts over what can or can’t be „reduced“ to data. Contributors discuss the intellectual history of data as a concept; describe early financial modeling and some unusual sources for astronomical data; discover the prehistory of the database in newspaper clippings and index cards; and consider contemporary „dataveillance“ of our online habits as well as the complexity of scientific data curation.

During succession, ecosystem development occurs; but in the long-term absence of catastrophic disturbance, a decline phase eventually follows. We studied six long-term chronosequences in Australia, Sweden, Alaska, Hawaii and New Zealand; for each, the decline phase was associated with a reduction in tree basal area and an increase in the substrate nitrogen-to-phosphorus ratio, indicating increasing phosphorus limitation over time. These changes were often associated with reductions in litter decomposition rates, phosphorus release from litter, and biomass and activity of decomposer microbes. Our findings suggest that the maximal biomass phase reached during succession cannot be maintained in the long-term absence of major disturbance, and that similar patterns of decline occur in forested ecosystems spanning the tropical, temperate and boreal zones.


This time the task was to determine the cause of the ambiguity in short text excerpts. The options to choose from were (1) improper syntax (word order), (2) missing comma, (3) unclear pronoun reference, or (4) grouping of conflicting words.


At this time, the Department of Energy is only considering Yucca Mountain as a possible storage site for nuclear waste. Other possible sites are excluded from discussion.

[Syntax: correctly it should read „considering only Yucca Mountain“; placing „only“ before „considering“ makes it ambiguous]

If the airplane waits too long to take off the de-ice fluid can dissipate.

[Missing comma: „too long to take off, the de-ice fluid can dissipate“]

The Lunar Module was only designed to hold two astronauts and to have a life time of forty-five hours.

[Syntax: „designed to hold only two astronauts and to have a life time of only…“]

The beams are positioned with respect to the chopper blade so that while one beam passes the output of the opposite beam is completely blocked.

[Missing comma: „passes, the output“; on a first reading it could be understood as the chopper blade being blocked]

The Hindenburg was filled with hydrogen because it is lighter than air…The report claimed that a hull wire could have ruptured a gas cell if it fractured.

[Maybe because of the „it“? So: unclear pronoun reference]

Avoiding complicated multi-ordered calculations, the equations come from fundamental definitions of mass flow, work, and efficiency.


To provide spill protection, all tanks were equipped with basins and automatic shutoff devices or overfill alarms or ball float valves.

[Missing comma: commas are probably missing, because the grouping in the „or“ enumeration is not clear]

Being the first step in introducing CFD, Jones had to set up conservative assumptions.


As with any system errors occur in localization.

[Missing comma: „system, errors“]

Having a model would help designers predict the effects of engine operation over all speeds.


Field of study – Just try it out

The time after school is probably one of the most important moments for the further course of one’s life. For the first time, you can decide freely which path in life you want to take. But sometimes you also meet people who make their decisions at this time spontaneously, without thinking much about the future. Hi.

I am currently studying computer science, but I did not choose this subject for any particular reason, since after school I was still fairly open to any possible degree programme. In the end it even came to the point where I asked my parents for advice. In hindsight, though, their recommendations were not quite as overwhelming as I had thought, since they drew rather simple analogies between my free time and computer games.

Since I did not really have much against computer science, I simply chose the degree without thinking much about it and wanted to see how it would go for me. With this attitude a lot could really have gone wrong, but contrary to expectations it went so well that it was actually fun. A sports match is also more fun when you score well, play the way you want to and are winning. Of course such a game can have its low points, where you hope for better moments, but these were never so serious that I thought about switching subjects.

Given this approach, it is probably not really surprising that I will let chance decide what I do after my studies, since I have no big plans. I guess anything can happen, so that one might end up in a small web design firm or perhaps in a stock corporation like Opel.

And after that, it is off on a trip around the world.