dirhunt package
Submodules
dirhunt.cli module
dirhunt.crawler module
- class dirhunt.crawler.Crawler(max_workers=None, interesting_extensions=None, interesting_files=None, std=None, progress_enabled=True, timeout=10, depth=3, not_follow_subdomains=False, exclude_sources=(), not_allow_redirects=False, proxies=None, delay=0, limit=1000, to_file=None, user_agent=None, cookies=None, headers=None)
Bases: ThreadPoolExecutor
- create_report(to_file)
Write a report with the current json() state to a file. This file can be read later to resume an analysis.
- property options_file
- urls_info = None
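The save-and-resume idea behind create_report can be sketched with the standard json module. This is only an illustration: the helper names `write_report` and `read_report` and the layout of the `state` dictionary are assumptions, not dirhunt's actual API or report format.

```python
import json
import os
import tempfile


def write_report(state, to_file):
    """Hypothetical helper: save a JSON-serializable state dict to a file."""
    with open(to_file, "w", encoding="utf-8") as report:
        json.dump(state, report)


def read_report(from_file):
    """Hypothetical helper: load a previously saved state dict."""
    with open(from_file, encoding="utf-8") as report:
        return json.load(report)


# Illustrative state only; the real json() layout is not documented here.
state = {
    "processed": ["http://example.com/"],
    "pending": ["http://example.com/admin/"],
}

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "report.json")
    write_report(state, path)
    # A later run could start from the restored state instead of from scratch.
    restored = read_report(path)
```

Because the report is plain JSON, the restored dictionary compares equal to the state that was written, which is what makes resuming possible.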
dirhunt.exceptions module
- exception dirhunt.exceptions.EmptyError(extra_body='')
Bases: DirHuntError
- exception dirhunt.exceptions.IncompatibleVersionError(extra_body='')
Bases: DirHuntError
- exception dirhunt.exceptions.RequestError(extra_body='')
Bases: DirHuntError
dirhunt.management module
dirhunt.processors module
- class dirhunt.processors.Error(crawler_url, error)
Bases: ProcessBase
- key_name = 'error'
- name = 'Error'
- class dirhunt.processors.GenericProcessor(response, crawler_url)
Bases: ProcessBase
- key_name = 'generic'
- name = 'Generic'
- class dirhunt.processors.ProcessBase(response, crawler_url)
Bases: object
- property flags
- index_file = None
- key_name = ''
- name = ''
- status_code = 0
- class dirhunt.processors.ProcessBlankPageRequest(response, crawler_url)
Bases: ProcessHtmlRequest
- key_name = 'blank'
- name = 'Blank page'
- class dirhunt.processors.ProcessCssStyleSheet(response, crawler_url)
Bases: ProcessBase
- key_name = 'css'
- name = 'CSS StyleSheet'
- class dirhunt.processors.ProcessHtmlRequest(response, crawler_url)
Bases: ProcessBase
- key_name = 'html'
- name = 'HTML document'
- class dirhunt.processors.ProcessIndexOfRequest(response, crawler_url)
Bases: ProcessHtmlRequest
- files = None
- property flags
- index_titles = ('index of', 'directory listing for')
- key_name = 'index_of'
- name = 'Index Of'
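The index_titles tuple suggests that directory listings are recognised by their page title. A minimal sketch of such a check follows; the function name `looks_like_index` and the matching strategy (case-insensitive prefix match) are assumptions for illustration, and the real processor may inspect the page differently.

```python
# Values copied from ProcessIndexOfRequest.index_titles above.
INDEX_TITLES = ("index of", "directory listing for")


def looks_like_index(title):
    """Hypothetical check: does a page title start with a known listing prefix?"""
    normalized = title.strip().lower()
    # str.startswith accepts a tuple and returns True if any prefix matches.
    return normalized.startswith(INDEX_TITLES)


is_apache_listing = looks_like_index("Index of /backup")
is_python_listing = looks_like_index("Directory listing for /static/")
is_regular_page = looks_like_index("Welcome to example.com")
```

The two prefixes correspond to the titles commonly produced by Apache-style listings and by Python's built-in HTTP server.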
- class dirhunt.processors.ProcessJavaScript(response, crawler_url)
Bases: ProcessBase
- key_name = 'js'
- name = 'JavaScript'
- class dirhunt.processors.ProcessNotFound(response, crawler_url)
Bases: ProcessBase
- property flags
- key_name = 'not_found'
- name = 'Not Found'
dirhunt.url module
- class dirhunt.url.Url(address)
Bases: object
- property directories
- property directory_path
- property domain
- property domain_port
Domain with the port, if there is one.
- property fragment
- property full_path
- property is_absolute
Whether the address is a full address rather than only a path.
- property name
- property only_domain
Domain without the port.
- property path
- property port
- property protocol
- property protocol_domain
- property query
- property url
- property urlparsed
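The distinction between domain_port, only_domain and is_absolute can be illustrated with the standard library's urllib.parse. This sketch does not use dirhunt.url.Url itself; the variable names merely mirror its property names, and the exact values that class returns are not confirmed by this page.

```python
from urllib.parse import urlsplit

parts = urlsplit("http://example.com:8080/admin/login.php?next=/#top")

protocol = parts.scheme        # scheme part of the address
domain_port = parts.netloc     # domain with the port, if there is one
only_domain = parts.hostname   # domain without the port
port = parts.port              # port as an integer, or None when absent
path = parts.path
query = parts.query
fragment = parts.fragment

# A full address has both a scheme and a network location;
# a bare path such as "/admin/" has neither.
is_absolute = bool(parts.scheme and parts.netloc)
relative = urlsplit("/admin/")
relative_is_absolute = bool(relative.scheme and relative.netloc)
```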
dirhunt.utils module
- dirhunt.utils.force_url(url)
Transform domain.com to http://domain.com
Tries the most common protocols until one of them answers. Checks the destination URL in case the server redirects the response, so that the attempt can be invalidated.
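One possible reading of this behaviour is sketched below. It is not dirhunt's implementation: `force_url_sketch` and the injected `probe` callable are hypothetical, the protocol order is assumed, and the fallback to http:// when nothing answers is also an assumption. `probe` stands in for an HTTP request and returns the final URL after redirects, or None when there is no answer.

```python
def force_url_sketch(url, probe, protocols=("http", "https")):
    """Hypothetical sketch: add a protocol to a bare domain.

    probe(candidate) must return the final URL after redirects,
    or None if the server did not answer.
    """
    if "://" in url:
        return url  # already has a protocol, nothing to do
    for protocol in protocols:
        candidate = "{}://{}".format(protocol, url)
        final_url = probe(candidate)
        if final_url is None:
            continue  # no answer, try the next protocol
        # Assumed rule: a redirect to another protocol invalidates the attempt.
        if final_url.startswith(protocol + ":"):
            return candidate
    # Assumed fallback when no protocol gave a valid answer.
    return "http://{}".format(url)


def fake_probe(candidate):
    """Stand-in for a real request: this server redirects http to https."""
    if candidate.startswith("http://"):
        return "https://" + candidate[len("http://"):]
    return candidate


redirected = force_url_sketch("domain.com", fake_probe)
unchanged = force_url_sketch("https://domain.com/x", fake_probe)
unreachable = force_url_sketch("domain.com", lambda candidate: None)
```

Injecting `probe` keeps the sketch free of network access; a real version would perform the request with an HTTP client and read the final URL from the response.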