Text as Data: Text Mining and Analysis in Economics, Social Sciences and Finance using R

Prof. Dr. Ulrich Fritsche

Course

Announcements

Dear participants,

I did not offer the opportunity to sign the attendance sheet on every day of the course. I have therefore set up a Google Sheet: please follow the link, fill in a row with your name and e-mail address (please use the address you used to register for the course), and mark an "x" for the days you attended. Part-time attendance counts as attendance. Thank you.

https://docs.google.com/spreadsheets/d/142q8QlTBlze2DVp1n5u_zRVDxWD1IM72i9AeI9nUyfo/edit?usp=sharing

Regards

Ulrich Fritsche

Announcement created on: 09/04/2018 07:10

Dear participants,

due to an important meeting at faculty level scheduled for Tuesday, 14-16h, we will change the workshop schedule slightly:

We will extend the lunch break (scheduled for 13h-14h) until 16h. There are several cafés close to campus, some with free or free-but-limited WiFi access (e.g. Campus Suite, Balzac, Stadtgeflüster, ...).

Exercise 2 (frequency analysis), originally scheduled for 14h-16h, will be moved to 16h-18h.

I apologize for any inconvenience.

Kind regards

Ulrich Fritsche

Announcement created on: 15/03/2018 15:12

Dear participants,

we checked for compatibility issues in the computer lab yesterday. Below you will find a link, active for 30 days, to a Dropbox folder containing code, data, slides and materials. Please download the folder to your preferred destination (e.g. "own documents"). Dropbox allows you to download folders, including subfolders, as a ZIP file. Please be aware that some exercises use relative directory references, so please keep all files in their respective sub-directories.

https://www.dropbox.com/sh/ksroo8t9actamg7/AADdmiK541FkbxINw9A6yDola?dl=0

Please check that you are able to download the material. I will also bring a USB stick to the lectures for emergencies. If you have questions, please do not hesitate to contact me.

Kind regards

Ulrich Fritsche

Announcement created on: 08/03/2018 10:54

Dear participants,

this message is for those of you intending to work with your own laptops. Universität Hamburg is a member of the "eduroam" network. If you work at an eduroam member institution, make sure that your laptop, whether private or university-owned, can access the internet with the appropriate eduroam settings (see https://www.rrz.uni-hamburg.de/services/netz/wlan/wlan-eduroam.html). If you are not sure, contact your IT department or help desk.

If you are a member of the WiSo faculty, bring your own laptop and have a regular RRZ account (either as a student or as a research associate, Wissenschaftlicher Mitarbeiter_in), you can of course use the WiFi everywhere on campus. If you are not familiar with this, see https://www.rrz.uni-hamburg.de/services/netz/wlan.html for instructions.

I will check the R and RStudio versions in the computer lab over the next few days for compatibility. For the lucky ones who can install the software on their laptop (or have it installed for them): please install R version 3.4.3 and RStudio version 1.1.423. All code was tested and runs with these versions (on both private and university-owned laptops).

For those who are not working at Universität Hamburg/ WiSo Fakultät: the graduate school (Dr. Beckmann) has requested the creation of some guest accounts (with user ID and password). We will hand out the temporary account details on the first day of the course. You are not allowed to share these details with unauthorized persons.

Before the course starts, please select, copy and run the following code snippet in R:

# Clear the workspace
rm(list = ls())

# Packages used in the course exercises
packages <- c("rvest",
              "readtext",
              "tm",
              "slam",
              "reshape",
              "ggplot2",
              "SnowballC",
              "openNLP",
              "Matrix",
              "igraph",
              "topicmodels",
              "reshape2",
              "ldatuning",
              "tokenizers",
              "SparseM",
              "LiblineaR",
              "glmnet",
              "doParallel")

# Install all packages from CRAN
install.packages(packages)
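
If you would rather not reinstall packages you already have, something along the following lines may help. This is only a sketch: the helper `missing_packages()` and the short example vector `needed` are hypothetical names chosen for illustration and are not part of the snippet above. The actual installation call is left commented out so that nothing is installed by accident.

```r
# Hypothetical helper (not part of the course material):
# returns the packages from `pkgs` that are not yet installed.
missing_packages <- function(pkgs,
                             installed = rownames(installed.packages())) {
  setdiff(pkgs, installed)
}

# Example with a few of the course packages (illustrative subset)
needed <- c("tm", "topicmodels", "ggplot2")
to_install <- missing_packages(needed)

if (length(to_install) > 0) {
  message("Still missing: ", paste(to_install, collapse = ", "))
  # install.packages(to_install)  # uncomment to install only the missing ones
} else {
  message("All listed packages are already installed.")
}

# Show the R version in use, for comparison with the tested version
message(R.version.string)
```

To check the full list, pass the `packages` vector from the snippet above instead of `needed`.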

I will probably upload code and data on Wednesday, after testing in the lab.

Kind regards

Ulrich Fritsche

Announcement created on: 05/03/2018 15:34

Dear participants,

I have uploaded some introductory texts. You can find the PDFs here:

https://www.dropbox.com/sh/2zrsor94f6u7lcq/AADMmm378uCS3KWbY6XqqDAua?dl=0

Furthermore, I have prepared the schedule for the four days. You can download a PDF here:

https://www.dropbox.com/s/sh46mjmemhvvbm4/Plan%20for%20text%20mining%20course.pdf?dl=0

I will upload slides about one week in advance. I will furthermore upload code snippets and data. Code and data will be available on the first day of the course.

Kind regards

Ulrich Fritsche

Announcement created on: 08/02/2018 12:44

Description

COURSE CONCEPT

Based on a fundamental knowledge of R (recommended: an introductory course in R programming), we investigate the possibilities for text retrieval and text mining in economics and the social sciences. The course consists of basic introductory presentations and practical R exercises in the CIP pool. Topics include:

  • Basic concepts in R (very brief)
  • Text crawling
  • Zipf’s law
  • Frequency analysis
  • Basic lexicometrics
  • Term extraction and collocation
  • Basic topic models

INTRODUCTORY LITERATURE

  • Levshina, Natalia (2015): How to do Linguistics with R. Amsterdam: John Benjamins Publishing Company.
  • Lemke, Matthias; Wiedemann, Gregor (eds.) (2015): Text Mining in den Sozialwissenschaften. Grundlagen und Anwendungen zwischen qualitativer und quantitativer Diskursanalyse. Wiesbaden: VS Verlag für Sozialwissenschaften (Kritische Studien zur Demokratie).

General Data

  • Abbreviation
    20-Text as Data
  • Semester
    winter semester 17/18
  • Target Groups
  • Course Type
  • Course Language
  • Departments
    Faculty of Economics and Social Sciences

Place and Time

Date
  • Place
    Von Melle Park 9, Room A514
  • Time
    from 19/03/2018 to 22/03/2018 from 09:00 to 16:00

Recognition Modalities

  • Number of Semester Hours
    2
  • Amount of Credit Points
    4
  • Creditable as
    • WiSo doctoral program: WiSo methods for Economics
    • WiSo doctoral program: WiSo methods for Social Economics
    • WiSo doctoral program: WiSo methods for Social Sciences

Registration Modalities

  • Type of Place Allocation
    Manual Place Allocation (after the registration deadline)
  • Information about Registration
  • Max. Number of Participants