csm23
Class TextProcessing

java.lang.Object
  |
  +--csm23.TextProcessing

public class TextProcessing
extends java.lang.Object

Title: Text Processing
Description: version 1.0
Copyright: Copyright (c) 2001
Company: Unis


Constructor Summary
TextProcessing()
           
 
Method Summary
 void clear()
          Deletes the information in the hashtable
 java.util.Hashtable getNumUniWords()
          Returns the number of unique words.
 java.util.Hashtable getNumWords()
          Returns the total number of words.
 java.util.Hashtable getTotal_wfrequency()
          Returns the total word frequency.
 java.util.Hashtable wordfreq(java.lang.String fileName, boolean total_wordlist_done, boolean ignoreCase, boolean includeNumber, boolean ignoreTags)
          Calculates the words and their frequencies for the given file.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

TextProcessing

public TextProcessing()
Method Detail

getTotal_wfrequency

public java.util.Hashtable getTotal_wfrequency()
Returns the total word frequency. When "wordfreq" is run on multiple files, the total word frequency across those files is computed at the same time.

Returns:
total word frequency
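
To illustrate what a running total across several files might look like, here is a minimal sketch. It does not use or reproduce csm23.TextProcessing: the class TotalFrequencySketch, its method names, and the use of Integer values instead of "Counter" objects are all assumptions made for this example.

```java
import java.util.Hashtable;
import java.util.Map;

// Hypothetical helper, not part of csm23: keeps a running total of word
// frequencies while per-file tables are added one at a time.
final class TotalFrequencySketch {
    private final Hashtable<String, Integer> total = new Hashtable<>();

    // Adds the counts of one file to the running total.
    void addFile(Map<String, Integer> perFile) {
        for (Map.Entry<String, Integer> e : perFile.entrySet()) {
            // merge() stores the value if the key is new, otherwise sums the two counts.
            total.merge(e.getKey(), e.getValue(), Integer::sum);
        }
    }

    // Returns the accumulated table (assumed analogue of getTotal_wfrequency).
    Hashtable<String, Integer> getTotal() {
        return total;
    }

    // Empties the accumulated table (assumed analogue of clear).
    void clear() {
        total.clear();
    }
}
```

If one file contributes {cat=2, dog=1} and a second contributes {cat=1, bird=4}, the accumulated table holds cat=3, dog=1 and bird=4.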

getNumWords

public java.util.Hashtable getNumWords()
Returns the total number of words.

Returns:
the total number of words

getNumUniWords

public java.util.Hashtable getNumUniWords()
Returns the number of unique words.

Returns:
the number of unique words
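
The difference between the total and the unique word count can be sketched from a single frequency table. This is only an illustration: the documented methods return a Hashtable whose keys and values are not described here, so the class WordCountSketch and its int-returning methods are hypothetical and may not match how csm23.TextProcessing stores these numbers.

```java
import java.util.Map;

// Hypothetical helper, not part of csm23: derives two counts from a
// word -> frequency table.
final class WordCountSketch {
    // Total words: every occurrence counts, so sum all frequencies.
    static int totalWords(Map<String, Integer> freq) {
        int sum = 0;
        for (int n : freq.values()) {
            sum += n;
        }
        return sum;
    }

    // Unique words: each distinct key counts once.
    static int uniqueWords(Map<String, Integer> freq) {
        return freq.size();
    }
}
```

For a table {the=2, cat=3}, totalWords gives 5 and uniqueWords gives 2.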

clear

public void clear()
Deletes the information in the hashtable


wordfreq

public java.util.Hashtable wordfreq(java.lang.String fileName,
                                    boolean total_wordlist_done,
                                    boolean ignoreCase,
                                    boolean includeNumber,
                                    boolean ignoreTags)
Calculates the words and their frequencies for the given file.

Parameters:
fileName - the input file
total_wordlist_done - whether the given file was processed (recommended: false)
ignoreCase - if true, letter case is ignored (recommended: true)
includeNumber - if true, numbers are included in the computation (recommended: false)
ignoreTags - if true, tags are ignored (recommended: true)
Returns:
the word-frequency pairs (keys are String objects holding the words; values are "Counter" type objects)
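
The following sketch shows one way the three filtering flags could affect a word count. It is a stand-alone approximation, not the csm23 implementation: it works on a String rather than a file, it omits total_wordlist_done, the nested Counter class is a hypothetical stand-in for the undocumented "Counter" type, and it assumes that "tags" means angle-bracketed markup such as <b>.

```java
import java.util.Hashtable;
import java.util.Locale;

// Hypothetical re-creation for illustration only; not the csm23 code.
final class WordFreqSketch {
    // Stand-in for the "Counter" value type, whose real API is not documented here.
    static final class Counter {
        private int count;
        void increment() { count++; }
        int value() { return count; }
    }

    static Hashtable<String, Counter> count(String text, boolean ignoreCase,
                                            boolean includeNumber, boolean ignoreTags) {
        String input = text;
        if (ignoreTags) {
            // Assumption: a tag is anything between '<' and '>'.
            input = input.replaceAll("<[^>]*>", " ");
        }
        if (ignoreCase) {
            input = input.toLowerCase(Locale.ROOT);
        }
        Hashtable<String, Counter> freq = new Hashtable<>();
        // Split on every run of characters that are neither letters nor digits.
        for (String token : input.split("[^\\p{L}\\p{N}]+")) {
            if (token.isEmpty()) {
                continue; // split() yields an empty first token when the text starts with a separator
            }
            if (!includeNumber && token.matches("\\p{N}+")) {
                continue; // skip tokens made only of digits
            }
            Counter c = freq.get(token);
            if (c == null) {
                c = new Counter();
                freq.put(token, c);
            }
            c.increment();
        }
        return freq;
    }
}
```

With the recommended flag values, WordFreqSketch.count("<b>The</b> cat and the 2 dogs", true, false, true) yields four entries: "the" with a count of 2, and "cat", "and" and "dogs" with a count of 1 each. The tag text and the number are dropped.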