NewsFlash is a news app that lets you catch up on the news in a few minutes a day. It aggregates news articles from around the world into individual events and summarizes each one into just the essentials.
The app itself is quite minimalistic. The user experience revolves around the Feed, which displays a list of trending events. Each event contains an image extracted from one of its source articles (displayed with a nice parallax scrolling effect), a summary of the event, and links to the five most relevant sources. All images are loaded asynchronously and cached for quick access later on.
The backend is the real star of the show. I ended up writing a custom, versatile RESTful API, mostly in Python and Flask (with some C holding it together in places). It aggregates thousands of articles every day and uses natural language processing to find the most similar ones, group them into events, and extract contextual details such as a location (if one exists), keywords, and a likely category. It also uses the respective social media APIs to look up how each article performed, the sum of which constitutes an event's Social Score. After articles have been aggregated into individual events, the backend extracts the text from all of the articles and summarizes it. My algorithm is a combination of: TF-IDF (term frequency–inverse document frequency), which extracts important words and ideas common to all the articles; Centroid, which clusters sentences into general ideas; and TextSum, which extracts the most relevant sentences from the most important ideas.
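To give a rough feel for the summarization step, here is a minimal sketch of the TF-IDF and centroid ideas in plain Python. It is not the NewsFlash backend code: the function names (`tokenize`, `split_sentences`, `summarize`), the naive sentence splitter, and the smoothed IDF formula are all assumptions made for illustration, and the TextSum stage is left out. It weights words by TF-IDF, averages the sentence vectors into a single centroid, and keeps the sentences closest to that centroid.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase a string and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def split_sentences(text):
    """Naive splitter (an assumption): break after ., ! or ? plus whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def summarize(articles, max_sentences=2):
    """Return the sentences closest to the TF-IDF centroid, in original order."""
    sentences = [s for article in articles for s in split_sentences(article)]
    tokenized = [tokenize(s) for s in sentences]
    n = len(sentences)
    if n == 0:
        return []

    # Document frequency: the number of sentences each term appears in.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    # Smoothed IDF (an illustrative choice) so common terms keep some weight.
    idf = {term: math.log((1 + n) / (1 + count)) + 1 for term, count in df.items()}

    # One TF-IDF vector per sentence.
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        vectors.append({t: (c / len(tokens)) * idf[t] for t, c in tf.items()})

    # The centroid is the average of all sentence vectors.
    centroid = Counter()
    for vec in vectors:
        for term, weight in vec.items():
            centroid[term] += weight / n

    # Rank sentences by similarity to the centroid and keep the best ones.
    ranked = sorted(range(n), key=lambda i: cosine(vectors[i], centroid), reverse=True)
    return [sentences[i] for i in sorted(ranked[:max_sentences])]
```

Calling `summarize(list_of_article_texts, max_sentences=3)` on the articles grouped into one event would return up to three sentences that share the most weighted vocabulary with the event as a whole, so a stray off-topic sentence tends to rank last.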
Our team made a video demo of the app for the end-of-semester showcase! Check it out here:
© Wei Xiong, 2017