NGI ENGINEROOM

Explorations in Next Generation Internet Emerging

EU Engineroom focuses on identifying and evaluating the key enabling technologies and topics that will underpin the Next Generation Internet in 2025.

Engineroom’s three key pillars:

Consortium's partners:

Sources

Unique terms: 0+

Media articles: 0
Working papers: 0

Analysis period: two years and three months

Multiple gigabytes of raw and processed data

Why ArXiv and SSRN?

Avoids the lengthy publication process of scientific journals

Broad coverage:

SSRN's e-Library provides almost 800,000 research papers from 365,000 researchers across 30 disciplines

ArXiv provides open access to almost 1,500,000 e-prints, mostly in STEM fields

Methodology

Project Goal and General Idea

The major aim is to identify the key technologies that will shape the development of the Internet until 2025

Strong focus on the relationship between technological areas and social issues

Data-driven approach with heterogeneous sources of data

Trend analysis

Analysis based on the frequency of appearance of all unigrams and bigrams in the texts

The average monthly change in each analysed term's frequency is estimated by OLS regression

The regression coefficient reveals the trending unigrams and bigrams
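A minimal sketch of the trend step in Python. It assumes that a term's monthly frequency is its count divided by all unigrams and bigrams counted that month, and that the trend is the slope of a simple OLS fit of frequency on month index; the project's exact normalisation and model specification are not described here, and the function names are hypothetical.

```python
from collections import Counter
from typing import List, Sequence

Document = List[str]  # a tokenised text


def ngrams(tokens: Sequence[str]) -> List[str]:
    """All unigrams followed by all bigrams of a token sequence."""
    bigrams = [f"{a} {b}" for a, b in zip(tokens, tokens[1:])]
    return list(tokens) + bigrams


def monthly_frequencies(months: List[List[Document]], term: str) -> List[float]:
    """Relative frequency of `term` in each month.

    Assumption: frequency = term count / all n-grams counted in that month.
    """
    freqs = []
    for docs in months:
        counts: Counter = Counter()
        for tokens in docs:
            counts.update(ngrams(tokens))
        total = sum(counts.values())
        freqs.append(counts[term] / total if total else 0.0)
    return freqs


def ols_slope(y: Sequence[float]) -> float:
    """OLS slope of y regressed on month index 0..n-1 (with intercept); needs n >= 2."""
    n = len(y)
    x_mean = (n - 1) / 2
    y_mean = sum(y) / n
    sxy = sum((x - x_mean) * (yi - y_mean) for x, yi in enumerate(y))
    sxx = sum((x - x_mean) ** 2 for x in range(n))
    return sxy / sxx
```

A positive slope marks a rising term; computing it for every unigram and bigram and sorting by the slope gives a candidate list of trending terms.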

Co-occurrence analysis

Exploring the relationship between topics

Identifying pairs of terms that are mentioned together in media articles

For every media website, the number of articles containing both terms is divided by the number of articles containing the previously identified keyword of interest
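The ratio above can be sketched as follows, assuming each article has already been reduced to its source website and the set of terms detected in it (the term matching itself is not shown). `cooccurrence_share` is a hypothetical name.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple

Article = Tuple[str, Set[str]]  # (website, set of terms found in the article)


def cooccurrence_share(articles: Iterable[Article], keyword: str, other: str) -> Dict[str, float]:
    """Per website: articles containing both terms / articles containing the keyword."""
    with_keyword: Dict[str, int] = defaultdict(int)
    with_both: Dict[str, int] = defaultdict(int)
    for site, terms in articles:
        if keyword in terms:
            with_keyword[site] += 1
            if other in terms:
                with_both[site] += 1
    # Websites that never mention the keyword are left out (ratio undefined).
    return {site: with_both[site] / n for site, n in with_keyword.items()}
```

Normalising by the keyword's own article count per website keeps outlets of very different sizes comparable.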

Issue mapping

Articles are categorised across two dimensions: geography (EU vs US) and covered topic (social vs technological)

Words are ranked based on their frequency in articles classified as social and non-social (technological)
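A minimal sketch of the two-way classification and the word ranking. The EU source list, the EU marker words and the social-topic vocabulary below are hypothetical placeholders, and every article not labelled EU is treated as US, which assumes the corpus contains only EU and US sources.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Dict, List, Set, Tuple

# Hypothetical placeholder lists, not the project's actual vocabularies.
EU_SOURCES: Set[str] = {"euobserver.com"}
EU_MARKERS: Set[str] = {"eu", "europe", "european", "brussels"}
SOCIAL_TERMS: Set[str] = {"privacy", "discrimination", "surveillance", "democracy"}


@dataclass
class Article:
    source: str
    tokens: List[str]


def classify(article: Article) -> Dict[str, str]:
    """Label an article along the geography and topic dimensions."""
    words = set(article.tokens)
    is_eu = article.source in EU_SOURCES or bool(words & EU_MARKERS)
    region = "EU" if is_eu else "US"  # assumption: non-EU means US
    topic = "social" if words & SOCIAL_TERMS else "technology"
    return {"region": region, "topic": topic}


def rank_words(articles: List[Article], topic: str, top_n: int = 10) -> List[Tuple[str, int]]:
    """Most frequent words in the articles classified under `topic`."""
    counts: Counter = Counter()
    for article in articles:
        if classify(article)["topic"] == topic:
            counts.update(article.tokens)
    return counts.most_common(top_n)
```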

Wikipedia network analysis

Matching the keywords to Wikipedia articles and parsing their text to extract hyperlinks

Generating the network of hyperlinks that connects the articles with one another

Using a community detection algorithm (the Louvain method) to identify clusters of nodes
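A sketch of the three Wikipedia steps using networkx, whose `louvain_communities` function is available in recent releases. It assumes the raw wikitext of each matched article has already been downloaded, extracts `[[Target]]` and `[[Target|label]]` links with a simplified regular expression, and keeps only links between the matched articles. The function names and example pages are hypothetical.

```python
import re
from typing import Dict, List, Set

import networkx as nx

# Simplified wikitext link pattern: captures the target of [[Target]] or
# [[Target|label]] and drops any #section anchor. Assumption: templates,
# File:/Category: links, redirects and first-letter case are not handled.
LINK_RE = re.compile(r"\[\[([^\]|#]+)[^\]]*\]\]")


def extract_links(wikitext: str) -> Set[str]:
    """Titles of the articles linked from a page's wikitext."""
    return {m.group(1).strip() for m in LINK_RE.finditer(wikitext)}


def build_network(pages: Dict[str, str]) -> nx.Graph:
    """Undirected graph over the matched articles; an edge means one links to the other."""
    graph = nx.Graph()
    graph.add_nodes_from(pages)
    for title, text in pages.items():
        for target in extract_links(text):
            if target in pages and target != title:
                graph.add_edge(title, target)
    return graph


def detect_clusters(graph: nx.Graph, seed: int = 0) -> List[Set[str]]:
    """Louvain community detection; a fixed seed makes the result reproducible."""
    return nx.community.louvain_communities(graph, seed=seed)
```

Treating links as undirected edges is a simplification; keeping direction (`nx.DiGraph`) or adding articles linked from outside the keyword set would give a larger, "expanded" network.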

Main Programming Tools

Topic identification

The 132 most trending NGI-related keywords are identified

These are grouped into 21 wider areas

The size of the bubble is based on the regression coefficient

Bigger bubble: more robust trend
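A sketch of the selection and bubble sizing, assuming one trend coefficient per keyword (for example the OLS slope from the trend analysis) and a hypothetical keyword-to-area lookup. Scaling the bubble area, rather than the radius, linearly with the coefficient is a design assumption, not a documented choice of the project.

```python
import math
from collections import defaultdict
from typing import Dict, List, Tuple


def top_keywords(coefficients: Dict[str, float], n: int = 132) -> List[Tuple[str, float]]:
    """The n keywords with the largest positive trend coefficients."""
    rising = [(kw, c) for kw, c in coefficients.items() if c > 0]
    return sorted(rising, key=lambda kc: kc[1], reverse=True)[:n]


def bubbles_by_area(
    keywords: List[Tuple[str, float]],
    area_of: Dict[str, str],
    max_radius: float = 40.0,
) -> Dict[str, List[Tuple[str, float]]]:
    """Group keywords by area and give each one a bubble radius.

    Assumption: bubble area is proportional to the coefficient, so the radius
    scales with its square root; the largest coefficient gets max_radius.
    """
    if not keywords:
        return {}
    largest = max(c for _, c in keywords)
    grouped: Dict[str, List[Tuple[str, float]]] = defaultdict(list)
    for kw, c in keywords:
        grouped[area_of.get(kw, "Other")].append((kw, max_radius * math.sqrt(c / largest)))
    return dict(grouped)
```

Scaling by area keeps the visual weight of a bubble proportional to its coefficient; scaling the radius instead would exaggerate differences between strong and weak trends.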

Topic co-occurrence

The goal is to dive deeper into emerging technologies

Relationship between social issues and technology

These pairs frequently appear together in news articles or in Reddit comments about a topic

News co-occurrence

Reddit co-occurrence

Short topic list


Linked keywords: open internet, net neutrality, personal data, cambridge analytica, identity theft, black box, ai research

Linked keywords: hate speech, alt-right, extremist content, sexism, gender discrimination, #metoo, child safety, trafficking, parental control, youtube kids, diversity, racism, accessibility, 5G networks, care robots, voice assistants and chatbots, online safety

Linked keywords: smart contracts, distributed ledgers, facial recognition, digital assistant, voice assistant

Linked keywords: cybersecurity, ransomware, cyberwar, cyber threats, meltdown, nonpetya, hacking, quantum computing, encryption, critical infrastructure, autonomous weapons, killer robots, equifax

Linked keywords: Machine learning, deep learning, algorithmic bias, algorithmic accountability, artificial intelligence, black box, open AI, data lakes, transparency

Linked keywords: election hacking, election meddling, fake news, foreign intelligence, new media, filter bubble, echo chamber, media literacy, weaponisation of information, advertising, cambridge analytica, bots, fake accounts, platform economics, media platform, conspiracy theories

Linked keywords: privacy, informed consent, smart cities, self-driving cars, facial recognition, surveillance, data brokers

Linked keywords: open internet, net neutrality, free speech, internet freedom, gig economy, ico, worker's rights, tech giants, distributed ledgers, consumer protection

Linked keywords: blockchain, cryptocurrency, smart devices, energy efficiency, mining, renewable energy, data storage

Issue mapping

Articles are classified in two dimensions: EU/US, social issue/technology

EU axis: articles from European sources or concerning Europe

Social issues axis: articles containing words from a pre-defined list of social topics

Trending words are mapped to article types based on their number of occurrences

Top right corner: EU articles on social issues

Bottom left corner: US articles on technology
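One way to place words on such a map, sketched under assumptions: each occurrence of a word counts towards its article's region and topic, and the coordinates are balance scores in [-1, 1], x = (EU - US) / (EU + US) and y = (social - technology) / (social + technology), so EU social words land top right and US technology words bottom left. The project's actual scaling is not described, and `issue_map` is a hypothetical name.

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple

# Each labelled article: (region "EU" or "US", topic "social" or "technology", tokens)
Labelled = Tuple[str, str, List[str]]


def issue_map(articles: Iterable[Labelled], words: Iterable[str]) -> Dict[str, Tuple[float, float]]:
    """(x, y) position of each word; words that never occur are omitted."""
    counts: Dict[str, Counter] = {w: Counter() for w in words}
    for region, topic, tokens in articles:
        for tok in tokens:
            if tok in counts:
                counts[tok][region] += 1  # tally for the geography axis
                counts[tok][topic] += 1   # tally for the topic axis
    coords: Dict[str, Tuple[float, float]] = {}
    for word, c in counts.items():
        total = c["EU"] + c["US"]  # equals c["social"] + c["technology"]
        if total == 0:
            continue
        x = (c["EU"] - c["US"]) / total
        y = (c["social"] - c["technology"]) / total
        coords[word] = (x, y)
    return coords
```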

Charts

Application to explore trending keywords by source

Common terms: compare the trend of the keyword across sources

Trend robustness

Case study

Topic clusters around online privacy

Online privacy is a widely discussed issue in academia.
To identify the main research topics, we carried out a quick topic modeling exercise.

First, we web-scraped working papers on online privacy from SSRN, which covers the topic from a social sciences perspective.

On this dataset, we performed document clustering using tf-idf, multidimensional scaling, k-means and pyLDAvis.
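A dependency-free sketch of the tf-idf and k-means part of this pipeline. It uses a smoothed idf, L2-normalised vectors and a deterministic farthest-point initialisation instead of random seeding; these are assumptions, not the settings used in the study. Multidimensional scaling (used for a 2-D plot of document distances) and pyLDAvis (used to visualise an LDA topic model) are not reproduced.

```python
import math
from collections import Counter, defaultdict
from typing import Dict, List

Vector = Dict[str, float]  # sparse term -> weight


def tfidf(docs: List[List[str]]) -> List[Vector]:
    """L2-normalised tf-idf vectors with smoothed idf = ln((1 + N) / (1 + df)) + 1."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        vec = {t: c * (math.log((1 + n) / (1 + df[t])) + 1) for t, c in tf.items()}
        norm = math.sqrt(sum(v * v for v in vec.values())) or 1.0
        vectors.append({t: v / norm for t, v in vec.items()})
    return vectors


def sq_dist(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two sparse vectors."""
    return sum((a.get(k, 0.0) - b.get(k, 0.0)) ** 2 for k in a.keys() | b.keys())


def centroid(vectors: List[Vector]) -> Vector:
    """Component-wise mean of sparse vectors."""
    total: Dict[str, float] = defaultdict(float)
    for vec in vectors:
        for t, v in vec.items():
            total[t] += v
    return {t: v / len(vectors) for t, v in total.items()}


def kmeans(vectors: List[Vector], k: int, iterations: int = 20) -> List[int]:
    """Cluster label for each vector (Lloyd's algorithm)."""
    # Deterministic farthest-point initialisation (assumption, instead of k-means++).
    centres = [vectors[0]]
    while len(centres) < k:
        centres.append(max(vectors, key=lambda v: min(sq_dist(v, c) for c in centres)))
    labels = [0] * len(vectors)
    for _ in range(iterations):
        labels = [min(range(k), key=lambda j: sq_dist(v, centres[j])) for v in vectors]
        for j in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == j]
            if members:  # keep the previous centre if a cluster becomes empty
                centres[j] = centroid(members)
    return labels
```

On unit-length tf-idf vectors, squared Euclidean distance equals 2 minus twice the cosine similarity, so this k-means groups papers by cosine similarity of their vocabulary.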

The preliminary results:

SSRN topic clusters around online privacy

Wikipedia

Network of trending keywords (based on arXiv, SSRN and Techno Media)

Expanded Network of trending keywords (based on arXiv, SSRN and Techno Media)

Network around NGI related technologies

Network around NGI related social issues and values keywords

About

EU ENGINEROOM has received funding from the European Union's Horizon 2020 research and innovation programme under the Grant Agreement no 780643. The content of this website does not represent the opinion of the European Union, and the European Union is not responsible for any use that might be made of such content.
