CPE 466: Knowledge Discovery in Data
Lab 1 materials

Dataset

SB277Utter.json: Vaccination-Discussion Dataset
Digital Democracy: Information about SB277

Queries

query01.txt
query02.txt
query03.txt
query04.txt
query05.txt
query06.txt
query07.txt
query08.txt
query09.txt
query10.txt

Information Needs

InfoNeed01.txt
InfoNeed02.txt
InfoNeed03.txt
InfoNeed04.txt
InfoNeed05.txt

Stopword Removal Materials

Lists of stopwords: ranks.nl
Stopwords in MySQL: MySQL stopwords
Onix Text Retrieval Toolkit: Stopword list

Stopword Files

Ranks.nl small: stopwords-short.txt
Ranks.nl medium: stopwords-medium.txt
Ranks.nl large: stopwords-long.txt
MySQL: stopwords-mysql.txt
Onix: stopwords-onix.txt
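The stopword files above are meant to be used as filters. A minimal Python sketch of that filtering step follows. It assumes, hypothetically, that each file holds one stopword per line. The helper names (`load_stopwords`, `tokenize`, `remove_stopwords`) and the simple regex tokenizer are illustrative choices, not part of the lab handout.

```python
import re

def load_stopwords(lines):
    """Build a stopword set from an iterable of lines (e.g. an open file).

    Assumption (hypothetical): one stopword per line. Blank lines are
    skipped and words are lowercased so matching is case-insensitive.
    """
    return {line.strip().lower() for line in lines if line.strip()}

def tokenize(text):
    """Split text into lowercase alphanumeric tokens (a deliberately simple tokenizer)."""
    return re.findall(r"[a-z0-9]+", text.lower())

def remove_stopwords(tokens, stopwords):
    """Keep only the tokens that are not in the stopword set, preserving order."""
    return [t for t in tokens if t not in stopwords]

if __name__ == "__main__":
    # With a real list you would pass an open file instead, e.g.:
    #   with open("stopwords-short.txt") as f:
    #       stops = load_stopwords(f)
    stops = load_stopwords(["the", "of", "a", "is", ""])
    tokens = tokenize("The bill is a matter of public health")
    print(remove_stopwords(tokens, stops))  # ['bill', 'matter', 'public', 'health']
```

Using a `set` keeps each membership test at constant expected time, which matters when the larger lists (Ranks.nl large, MySQL, Onix) are applied to every utterance in the dataset.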

Stemming Materials

Porter Stemming Algorithm: Official Web Page
Porter Algorithm original paper: def.txt
Porter Algorithm in Java: java.txt
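To give a feel for how the Porter algorithm works, here is a Python sketch of only its first step (Step 1a), which handles plural "s" suffixes. It is not a complete stemmer: the later steps, which depend on the measure *m* of the stem, are omitted. Consult def.txt for the full rule set. The function name `porter_step1a` is hypothetical.

```python
def porter_step1a(word):
    """Apply only Step 1a of the Porter stemmer to a lowercase word.

    Rules are checked longest suffix first, and only the first match fires:
      SSES -> SS   (caresses -> caress)
      IES  -> I    (ponies   -> poni)
      SS   -> SS   (caress   -> caress)
      S    -> ''   (cats     -> cat)
    """
    if word.endswith("sses"):
        return word[:-2]  # drop "es": sses -> ss
    if word.endswith("ies"):
        return word[:-2]  # drop "es": ies -> i
    if word.endswith("ss"):
        return word       # ss is left unchanged
    if word.endswith("s"):
        return word[:-1]  # drop a single trailing s
    return word

if __name__ == "__main__":
    for w in ["caresses", "ponies", "ties", "caress", "cats", "vaccines"]:
        print(w, "->", porter_step1a(w))
```

Checking the suffixes from longest to shortest is what keeps "caress" from losing its final "s". The same ordering principle appears throughout the remaining steps of the algorithm.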

September 29, 2015, dekhtyar at calpoly.edu