public class PubPageCrawler extends ACrawler
Constructor and Description |
---|
PubPageCrawler(PDatabase pdb, java.lang.String page, int transLev, boolean refPubMode, boolean download) Sets up the pubpage crawling. |
Modifier and Type | Method and Description |
---|---|
protected void | crawl() Downloads the page if needed, grabs BibTeX if any, extracts authors, title and year, and crawls referring publications if transitivity level is above 0. |
Publication | getPublication() |
Methods inherited from class ACrawler: getTime, interrupt, launch, run, scheduleCrawlers, waitForCrawlers
Methods inherited from class java.lang.Thread: activeCount, checkAccess, clone, countStackFrames, currentThread, destroy, dumpStack, enumerate, getAllStackTraces, getContextClassLoader, getDefaultUncaughtExceptionHandler, getId, getName, getPriority, getStackTrace, getState, getThreadGroup, getUncaughtExceptionHandler, holdsLock, interrupted, isAlive, isDaemon, isInterrupted, join, join, join, resume, setContextClassLoader, setDaemon, setDefaultUncaughtExceptionHandler, setName, setPriority, setUncaughtExceptionHandler, sleep, sleep, start, stop, stop, suspend, toString, yield
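The transitivity level mentioned in the description of crawl() can be pictured as a depth limit on following referring publications. The following is only a minimal sketch under stated assumptions, not the actual PubPageCrawler logic: the class TransitivitySketch, the REFERRERS map and the static crawl helper are all hypothetical, and the referrer graph is hard-coded instead of being scraped from pubpages.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

public class TransitivitySketch {
    // Hypothetical referrer graph: publication id -> ids of publications referring to it.
    static final Map<String, List<String>> REFERRERS = Map.of(
        "A", List.of("B", "C"),
        "B", List.of("D"),
        "C", List.of(),
        "D", List.of("E"));

    // Returns the publications visited when starting at pub with the given level.
    // Assumption: each step down to a referrer lowers the level by one.
    static List<String> crawl(String pub, int transLev) {
        List<String> visited = new ArrayList<>();
        visited.add(pub); // basic information is grabbed at every level
        if (transLev > 0) {
            for (String ref : REFERRERS.getOrDefault(pub, List.of())) {
                visited.addAll(crawl(ref, transLev - 1));
            }
        }
        return visited;
    }

    public static void main(String[] args) {
        System.out.println(crawl("A", 0)); // only A itself
        System.out.println(crawl("A", 1)); // A and its direct referrers
        System.out.println(crawl("A", 2)); // additionally referrers of referrers
    }
}
```

The sketch does not guard against cycles in the referrer graph; with a bounded level the recursion still terminates, but a publication could be visited more than once.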
public PubPageCrawler(PDatabase pdb, java.lang.String page, int transLev, boolean refPubMode, boolean download)
pdb - PDatabase object which contains information for database-specific crawling.
page - Pubpage URL or HTML content; the 'download' parameter determines which.
transLev - Transitivity level. 0: only basic information is grabbed; 1: referring publications are also grabbed; 2: referrers of referrers are also grabbed.
refPubMode - If true, the pubpage is handled as the pubpage of a referring publication, which may need different patterns.
download - If true, the URL given in the 'page' parameter is used to download the page; otherwise 'page' is used as the already downloaded HTML content.
public Publication getPublication()
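The interplay of the 'page' and 'download' constructor parameters can be illustrated with a small stand-alone sketch. It is an assumption-laden illustration, not code from the crawler: PageSourceSketch, resolveHtml and the fetcher argument are hypothetical names, and the fetcher is a stub so that no network access is needed.

```java
import java.util.function.Function;

public class PageSourceSketch {
    // If download is true, page is treated as a URL and handed to the fetcher;
    // otherwise page is taken to be the HTML content itself.
    static String resolveHtml(String page, boolean download, Function<String, String> fetcher) {
        return download ? fetcher.apply(page) : page;
    }

    public static void main(String[] args) {
        // Stub fetcher standing in for a real HTTP download.
        Function<String, String> fakeFetcher = url -> "<html>fetched from " + url + "</html>";

        System.out.println(resolveHtml("http://example.org/pub/1", true, fakeFetcher));
        System.out.println(resolveHtml("<html>already here</html>", false, fakeFetcher));
    }
}
```

Passing the fetcher as a parameter keeps the sketch testable; how the real class performs the download is not stated in this documentation.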