dirhunt package
Submodules
dirhunt.cli module
dirhunt.crawler module
class dirhunt.crawler.Crawler(max_workers=None, interesting_extensions=None, interesting_files=None, std=None, progress_enabled=True, timeout=10, depth=3, not_follow_subdomains=False, exclude_sources=(), not_allow_redirects=False, proxies=None, delay=0, limit=1000, to_file=None, user_agent=None, cookies=None, headers=None)
Bases: concurrent.futures.thread.ThreadPoolExecutor
- create_report(to_file): Write a report with the current json() state to a file. This file can be read later to continue an analysis.
- property options_file
- urls_info = None
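The create_report description implies a save-and-resume pattern: the current state is dumped as JSON and read back later. A minimal sketch of that pattern follows, assuming the state is a JSON-serializable dict. The helpers `save_report` and `load_report` and the state keys are hypothetical names for illustration and do not reflect dirhunt's actual report layout.

```python
import json
import os
import tempfile


def save_report(state, to_file):
    """Write a JSON-serializable state dict to to_file (hypothetical helper)."""
    with open(to_file, "w", encoding="utf-8") as fh:
        json.dump(state, fh)


def load_report(from_file):
    """Read back a state dict previously written by save_report."""
    with open(from_file, encoding="utf-8") as fh:
        return json.load(fh)


# Illustrative state only; the real report keys are not documented here.
state = {
    "processed": ["http://domain.com/"],
    "pending": ["http://domain.com/admin/"],
}
path = os.path.join(tempfile.mkdtemp(), "report.json")
save_report(state, path)
restored = load_report(path)
```

Because the round trip goes through plain JSON, only JSON-compatible values (strings, numbers, lists, dicts) survive unchanged; anything richer would need its own serialization step.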
dirhunt.exceptions module
dirhunt.management module
dirhunt.processors module
class dirhunt.processors.Error(crawler_url, error)
Bases: dirhunt.processors.ProcessBase
- key_name = 'error'
- name = 'Error'
class dirhunt.processors.GenericProcessor(response, crawler_url)
Bases: dirhunt.processors.ProcessBase
- key_name = 'generic'
- name = 'Generic'
class dirhunt.processors.Message(error, level='ERROR')
Bases: dirhunt.processors.Error
class dirhunt.processors.ProcessBase(response, crawler_url)
Bases: object
- property flags
- index_file = None
- key_name = ''
- name = ''
- status_code = 0
-
property
class dirhunt.processors.ProcessBlankPageRequest(response, crawler_url)
Bases: dirhunt.processors.ProcessHtmlRequest
- key_name = 'blank'
- name = 'Blank page'
class dirhunt.processors.ProcessCssStyleSheet(response, crawler_url)
Bases: dirhunt.processors.ProcessBase
- key_name = 'css'
- name = 'CSS StyleSheet'
class dirhunt.processors.ProcessHtmlRequest(response, crawler_url)
Bases: dirhunt.processors.ProcessBase
- key_name = 'html'
- name = 'HTML document'
class dirhunt.processors.ProcessIndexOfRequest(response, crawler_url)
Bases: dirhunt.processors.ProcessHtmlRequest
- files = None
- property flags
- index_titles = ('index of', 'directory listing for')
- key_name = 'index_of'
- name = 'Index Of'
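The index_titles tuple suggests that directory listings are recognized by their page title. How dirhunt actually applies these values is not stated here; the sketch below assumes a case-insensitive prefix match, and `looks_like_index` is a hypothetical helper name.

```python
# Values copied from the index_titles attribute above.
INDEX_TITLES = ("index of", "directory listing for")


def looks_like_index(title):
    """Return True if a page title starts with a known listing prefix.

    The prefix rule is an assumption made for illustration.
    """
    # str.startswith accepts a tuple and matches any of its entries.
    return title.strip().lower().startswith(INDEX_TITLES)
```

For example, `looks_like_index("Index of /backup")` matches the first prefix, while a title such as "Welcome" matches neither.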
class dirhunt.processors.ProcessJavaScript(response, crawler_url)
Bases: dirhunt.processors.ProcessBase
- key_name = 'js'
- name = 'JavaScript'
class dirhunt.processors.ProcessNotFound(response, crawler_url)
Bases: dirhunt.processors.ProcessBase
- property flags
- key_name = 'not_found'
- name = 'Not Found'
-
property
dirhunt.url module
class dirhunt.url.Url(address)
Bases: object
- property directories
- property directory_path
- property domain
- property domain_port: Domain with the port, if there is one.
- property fragment
- property full_path
- property is_absolute: Whether the address is only a path or a full address.
- property name
- property only_domain: Domain without the port.
- property path
- property port
- property protocol
- property protocol_domain
- property query
- property url
- property urlparsed
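Several of the Url property names line up with the parts of a parsed address. The sketch below uses the standard library's urllib.parse to show what such fields typically contain. The mapping from property names to stdlib fields is inferred from the names and the short descriptions above, not taken from dirhunt's implementation, and `describe` is a hypothetical helper.

```python
from urllib.parse import urlsplit


def describe(address):
    """Split an address into fields named after the Url properties above."""
    parts = urlsplit(address)
    return {
        "protocol": parts.scheme,
        "domain_port": parts.netloc,    # domain with the port, if there is one
        "only_domain": parts.hostname,  # domain without the port
        "port": parts.port,
        "path": parts.path,
        "query": parts.query,
        "fragment": parts.fragment,
        # Assumed rule: absolute means both a protocol and a domain are present.
        "is_absolute": bool(parts.scheme and parts.netloc),
    }


info = describe("http://domain.com:8080/admin/login.php?next=/#top")
relative = describe("/admin/")
```

Here `info` carries a protocol and a domain, so it counts as absolute under the assumed rule, while `relative` holds only a path.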
-
property
dirhunt.utils module
dirhunt.utils.force_url(url)
Transform domain.com to http://domain.com.
Try the most common protocols until one returns an answer. If the server redirects the response, check the destination URL to decide whether the result is invalid.
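The behaviour described for force_url can be sketched without any network access by injecting the request step. This is an illustration under stated assumptions, not dirhunt's code: `force_url_sketch`, the `probe` callable, the protocol order, and the rule that a redirect pointing away from the candidate invalidates it are all hypothetical.

```python
import re


def force_url_sketch(url, probe, protocols=("http", "https")):
    """Return url with a protocol, trying each candidate in order.

    probe(candidate) is a hypothetical callable that returns the redirect
    target as a string, returns None when there is no redirect, or raises
    OSError when the server gives no answer.
    """
    if re.match(r"^https?://", url, re.IGNORECASE):
        return url  # already has a protocol
    for protocol in protocols:
        candidate = "{}://{}".format(protocol, url)
        try:
            location = probe(candidate)
        except OSError:
            continue  # no answer: try the next protocol
        if location is not None and not location.lower().startswith(candidate.lower()):
            continue  # redirected elsewhere: treat this protocol as invalid
        return candidate
    # Fallback when no protocol produced a usable answer.
    return "{}://{}".format(protocols[0], url)


def fake_probe(candidate):
    # Simulated server: plain HTTP redirects to HTTPS, HTTPS answers directly.
    if candidate.startswith("http://"):
        return "https://domain.com/"
    return None


result = force_url_sketch("domain.com", fake_probe)
```

Passing the probe as a parameter keeps the sketch testable offline; a real version would replace it with an HTTP request that does not follow redirects and inspects the returned location.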