Thursday, March 28, 2013

Introducing dumpmon: A Twitter-bot that Monitors Paste-Sites for Account/Database Dumps and Other Interesting Content


TL;DR

I created a Twitter-bot which monitors multiple paste sites for different types of content (account/database dumps, network device configuration files, etc.). You can find it on Twitter and on Github.

Introduction

Paste-sites such as Pastebin, Pastie, Slexy, and many others offer users the ability to upload raw text of their choice, often anonymously. This is helpful in many scenarios, such as sending a crash report to someone or pasting temporary code. However, in addition to some people not being careful with what they upload (leaving passwords and other sensitive data in the text), attackers have started using these sites to share post-compromise data, including user account data, database dumps, URLs of compromised sites, and more.

Since so many users upload text to these sites, it's often difficult to find these interesting files manually. While techniques such as Google Alerts can be applied, the results are often a day or two old, and the pastes have sometimes already been deleted. This prompted me to create a tool which monitors these sites in "real-time" (less than a minute of delay for the slowest sites) for specific expressions, and then automatically ranks, aggregates, and posts the results to Twitter for further analysis. I call this tool dumpmon.


Similar Tools

There are a couple of similar tools available which do essentially the same thing as dumpmon - with just a few key differences:
  • @PastebinLeaks - with its last tweet on December 16, 2011, PastebinLeaks no longer appears to provide pastebin monitoring. However, I really like how it integrated quite a few different expressions, such as one for HTTP passwords, Cisco and Juniper configuration files, etc. Unfortunately, as far as I can tell PastebinLeaks is closed-source.
  • @PastebinDorks - This bot (intentionally closed-source, still in "alpha") is still active and posts a few tweets per day. This bot appears to be primarily concerned with account credential dumps. I think the idea of assigning a numerical rank to a tweet could help determine the usefulness of a paste, but it makes the actual data found unclear.

My goal with dumpmon is to create the "next step" of paste site monitoring with the following key features:
  • Open-Source. I'm always open to contributions via GitHub. I'm working on creating all the documentation, which should be up soon.
  • Monitors more than just Pastebin (full site listing in Appendix)
  • Supports multiple file types (e.g., Cisco configuration files and honeypot logs)
  • For large account dumps, simply gives you the raw information (Emails: x, Hashes: y) directly in the tweet

In the future, I would like to look into implementing the following features:
  • Automatically run found hashes through large wordlists and post the results
  • Allow users to tweet a regular expression they want monitored to the bot. The bot will then tweet them the paste once it finds a match
  • Search for interesting details from other sources of information (such as popular forums, etc.) instead of just paste sites
  • Allow caching of "most interesting" results to prevent deletion
  • Create daily/monthly reports that show the amount of detected data for aiding in password research

With those features outlined - let me quickly show you how I built the bot. Don't care? Just go straight to the bot here.

Bot Architecture

Here is the general architecture of the bot that's currently running:

As you can see, each site runs in its own separate thread, which monitors for new pastes, downloads each one, and matches it against a series of regular expressions. Then, if it finds a match, it will build and post a tweet that looks like the following:



If hashes are found, the tweet will also include the number of hashes as well as the ratio of emails to hashes. The "Keywords" attribute gives an approximate ratio of "positive keywords" found out of a given list (such as "Target: ", "available dbs", "member_id", "hacked by", "database: ", etc.), with value subtracted for each blacklist regex that matches. It's just another metric to help determine if a paste is "interesting." It should also be noted that the emails counted are unique.
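To make those metrics concrete, here is a minimal sketch of how they could be computed. It is not dumpmon's actual code: the regular expressions, the blacklist entry, the one-point-per-blacklist-match penalty, the case-insensitive email de-duplication, and the names `summarize` and `format_stats` are all assumptions made for illustration.

```python
import re

# Illustrative patterns only; the real bot's expressions may differ.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")
HASH_RE = re.compile(r"\b[a-fA-F0-9]{32}\b")  # assumes MD5-length hex digests

# Keywords quoted in the post; matched case-insensitively here (assumption).
POSITIVE_KEYWORDS = ["target: ", "available dbs", "member_id",
                     "hacked by", "database: "]

# Hypothetical blacklist entry, purely for illustration.
BLACKLIST = [re.compile(r"lorem ipsum", re.IGNORECASE)]


def summarize(text):
    """Return email/hash counts, their ratio, and a keyword score."""
    # Unique emails only; lowercasing before de-duplication is an assumption.
    emails = {m.lower() for m in EMAIL_RE.findall(text)}
    hashes = HASH_RE.findall(text)

    lowered = text.lower()
    hits = sum(1 for kw in POSITIVE_KEYWORDS if kw in lowered)
    # Assumption: each matching blacklist regex subtracts one "hit".
    penalty = sum(1 for rx in BLACKLIST if rx.search(text))
    keywords = max(hits - penalty, 0) / float(len(POSITIVE_KEYWORDS))

    ratio = len(emails) / float(len(hashes)) if hashes else None
    return {"emails": len(emails), "hashes": len(hashes),
            "ratio": ratio, "keywords": keywords}


def format_stats(stats):
    """Build a short summary string; the layout is a guess, not the real tweet."""
    parts = ["Emails: %d" % stats["emails"]]
    if stats["hashes"]:
        parts.append("Hashes: %d" % stats["hashes"])
        parts.append("E/H: %.2f" % stats["ratio"])
    parts.append("Keywords: %.2f" % stats["keywords"])
    return " ".join(parts)
```

Calling `format_stats(summarize(paste_text))` on a downloaded paste would give a one-line summary that could be appended to the paste URL in a tweet.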

Don't Bite the Hand that Feeds

It's commonly the case that the most time-expensive part of web scraping is actually fetching the content. While I could speed up this process by using an event-driven framework such as Gevent, Twisted, or others, I wanted to do my best to respect the sites hosting the content. Also, I didn't want the tool to get temporarily blocked... for a third time (my bad, Pastebin). With this being the case, my bot uses the following algorithm to only get new pastes using polite time constraints.
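As a rough illustration of that idea, a polite polling loop could look something like the sketch below. This is not dumpmon's actual code: the function names, the callback-style interface, and the delay values are assumptions chosen to keep the example self-contained.

```python
import time


def new_paste_ids(listing, last_seen):
    """Return the IDs in `listing` (newest first) that appear before `last_seen`."""
    fresh = []
    for paste_id in listing:
        if paste_id == last_seen:
            break
        fresh.append(paste_id)
    # If last_seen has scrolled off the listing, everything counts as new.
    return fresh


def monitor(fetch_listing, fetch_paste, handle,
            listing_delay=30, paste_delay=1,
            max_rounds=None, sleep=time.sleep):
    """Poll one site, fetching only unseen pastes and pausing between requests.

    fetch_listing() returns recent paste IDs, newest first.
    fetch_paste(paste_id) returns the raw text of one paste.
    handle(paste_id, text) does the regex matching and tweeting.
    The delay values are placeholders, not the bot's real settings.
    """
    last_seen = None
    rounds = 0
    while max_rounds is None or rounds < max_rounds:
        listing = fetch_listing()
        fresh = new_paste_ids(listing, last_seen)
        if listing:
            last_seen = listing[0]
        for paste_id in reversed(fresh):  # process oldest first
            handle(paste_id, fetch_paste(paste_id))
            sleep(paste_delay)            # pause between paste downloads
        sleep(listing_delay)              # pause before re-checking the listing
        rounds += 1
```

To match the one-thread-per-site design, each site could get its own `threading.Thread(target=monitor, args=(...))` with site-specific fetch functions and delays.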



Appendix

Currently, dumpmon supports the following paste-types:
  • Account/Database dumps
  • Google API Keys
  • Cisco Configuration Files (Juniper to be added soon)
  • Honeypot Log Dumps
Dumpmon also supports the following paste-sites:


If you can think of any other paste sites you want added, let me know!




- Jordan

22 comments:

  1. Try all of them here:
    http://awk.freeshell.org/ListOfPastebins
    http://www.similarsitesearch.com/alternatives-to/pastebin.com

    and the obvious:
    https://pastee.org/

    http://paste2.org/

  2. It seems you wrote a tool that acts using the same concept as pystemon. https://github.com/cvandeplas/pystemon

    pystemon does support a lot more sites and has a very flexible configuration. Perhaps you're interested to join efforts?

    Replies
    1. Hey Christophe,

      Thanks for the comment! The tools certainly do look similar - I can't believe I hadn't seen your tool before I got started developing dumpmon! Looks like "great minds think alike".

      And I like a lot of the things you've done with pystemon! Unfortunately, I think the structure of our solutions is so different that it would be difficult to simply combine the two into one product. Also, I looked at the other sites, and all of them had around 1 paste every 4 hours, and they mostly seemed to be "garbage pastes". I'll keep watch and see if it'd be worth the effort of making a quick module for them.

      Although, with all that being said - I'm always open to contributions! I try to give credit where credit is due (as will be seen in a blog post shortly as well as on the Github "contributors" section). If you have any ideas to make dumpmon better, please don't hesitate to let me know!

  3. Hi Jordan,

    Thanks for your work and for sharing this.

    I've made the necessary changes to settings.py and worked through the errors I was getting there. Now I'm getting the following:

    Traceback (most recent call last):
    File "dumpmon.py", line 12, in <module>
    from lib.Pastebin import Pastebin, PastebinPaste
    File "/home/dave/Projects/dumpmon/lib/Pastebin.py", line 1, in <module>
    from .Site import Site
    File "/home/dave/Projects/dumpmon/lib/Site.py", line 7, in <module>
    from settings import USE_DB, DB_HOST, DB_PORT
    ImportError: cannot import name USE_DB

    Any ideas about what could be going on? I haven't made any changes from the original source other than the settings.py file.

    Thank you very much,

    Dave

    Replies
    1. Hi David - thanks for giving dumpmon a shot!

      You actually found something I've been meaning to do. I have added the DB settings in my actual settings.py file, but never updated the example settings file. I've done that now, and you should see an update.

      If you do not want to use a Mongo database, just set "USE_DB" to False, and you should be good to go! Adding DB support is a new feature, and I will be updating the readme shortly.

      Let me know if you have any questions!

      -Jordan

  4. Hello there

    Good Tool and good work. I have installed all the dependencies and generated settings.py with all the requirements but when I am trying to run the script I get the following error:

    Traceback (most recent call last):
    File "dumpmon.py", line 17, in <module>
    from twitter import Twitter, OAuth
    ImportError: cannot import name Twitter

    Any feedback will be appreciated.

    Thanks :)

    Replies
    1. Hey Phiber_Optik, thanks for trying dumpmon!

      I think the error may be due to the Twitter Python library you have installed. There are two competing ones, and I started using one - but switched to the other when maxme and I discussed Python 3 compatibility. The library I am using now (which is also Python 3 compatible) is here: https://github.com/sixohsix/twitter

      Let me know if that helps fix your problem! If you have the other one installed, you may need to remove it first, since I believe they are imported with the same name.

  5. I followed your instructions and it seems everything is working now :)

    Just a proof of concept
    python dumpmon.py -v
    [u'robert_ambridge@hotmail.com']
    [u'contactsDrumsdrums@thomann.dePh']
    [u'abuse@abuse.online.nl']
    [u'monte@ispi.net', u'monte@ohrt.com']

    Thank you very much

  6. [python-twitter](https://code.google.com/p/python-twitter/)
    $ pip install python-twitter
    $ pip install beautifulsoup4
    $ pip install requests
    $ pip install pymongo <-- for MongoDB support (must have mongod running!)

    I got passed on the onpython-pip install python-twitter
    Requirement already satisfied (use --upgrade to upgrade): python-twitter in ./site-packages
    Cleaning up...

    python-pip install python-twitter
    Requirement already satisfied (use --upgrade to upgrade): python-twitter in ./site-packages
    Cleaning up...

    python-pip install requests
    Requirement already satisfied (use --upgrade to upgrade): requests in ./site-packages
    Cleaning up...


    python-pip install pymongo
    Requirement already satisfied (use --upgrade to upgrade): pymongo in /usr/lib64/python2.7/site-packages
    Cleaning up...

    Next, edit the settings.py to include your Twitter application settings. <---- where do i get this?

    I'm kinda lost. I would love to set up my own dumpmon, thanks.



  7. Exception in thread Thread-1:
    Traceback (most recent call last):
    File "/usr/lib/python2.7/threading.py", line 810, in __bootstrap_inner
    self.run()
    File "/usr/lib/python2.7/threading.py", line 763, in run
    self.__target(*self.__args, **self.__kwargs)
    File "/home/pikachoo/dumpmon/lib/Site.py", line 102, in monitor
    self.update()
    File "/home/pikachoo/dumpmon/lib/Pastebin.py", line 32, in update
    lambda tag: tag.name == 'td' and tag.a and '/archive/' not in tag.a['href'] and tag.a['href'][1:])
    File "/usr/local/lib/python2.7/dist-packages/bs4/element.py", line 1167, in find_all
    return self._find_all(name, attrs, text, limit, generator, **kwargs)
    File "/usr/local/lib/python2.7/dist-packages/bs4/element.py", line 499, in _find_all
    found = strainer.search(i)
    File "/usr/local/lib/python2.7/dist-packages/bs4/element.py", line 1527, in search
    found = self.search_tag(markup)
    File "/usr/local/lib/python2.7/dist-packages/bs4/element.py", line 1483, in search_tag
    or (markup and self._matches(markup, self.name))
    File "/usr/local/lib/python2.7/dist-packages/bs4/element.py", line 1565, in _matches
    return match_against(markup)
    File "/home/pikachoo/dumpmon/lib/Pastebin.py", line 32, in <lambda>
    lambda tag: tag.name == 'td' and tag.a and '/archive/' not in tag.a['href'] and tag.a['href'][1:])
    File "/usr/local/lib/python2.7/dist-packages/bs4/element.py", line 892, in __getitem__
    return self.attrs[key]
    KeyError: 'href'

  8. Would like to know if you'd like to also detect JUNOS and F5 configurations? I can provide some guidance on what to look for...

    Replies
    1. Hi there!

      I'm always looking for new things to monitor. If you could provide what to look for, I'd be happy to include it.

      Thanks!

  9. In Paste.py isn't this doing the same regex search twice?
    if regex.search(self.text):
        logging.debug('\t[-] ' + regex.search(self.text).group(1))

    In which case, wouldn't it be twice as fast/efficient to do this?
    var = regex.search(self.text)
    if var:
        logging.debug('\t[-] ' + var.group(1))

  10. Thanks for the fantastic work. Is there a way to add the ability to download the raw pastes from these sites once they're identified?

    Replies
    1. If you're using Mongo DB support, the text is saved automatically. Otherwise, you can simply add a line to save the text after it's extracted and identified. One example of this can be found here:

      https://github.com/jordan-wright/dumpmon/blob/master/lib/helper.py#L48

      Let me know if you have any questions!

  11. Great tool came across it while looking on igoogle

    I have had issues configuring for a few reasons... I have a question how long before it starts outputting to twitter?
    when i run sudo python dumpmon.py i get output looks like email addresses but that is all i am getting..

    ive been to twitter and created an app with my tokens and secrets etc....

    I've posted my experience on my blog great work....

    Dave

    Replies
    1. Hi there - thanks for giving dumpmon a try.

      If you start seeing output of email addresses, then the script is likely working properly. It's really difficult to debug issues without more information, but my best guess would be that there is an issue with the Twitter configuration in use. Can you use those same oauth creds to send a test tweet and see if it shows up? If it does, then you can assume it is an issue with the script, and I can look into it further.

      Let me know if you have any questions!

    2. Hi Jordan, no problems did a fresh install on my vm with kali linux followed my previous installations on fedora using apt-get instead of yum... I think its working I now have a post.....

      Many thanks again

      sorry was so long replying I wasnt able to devote enough time to reload over the last couple of days...

      Great program :)

      Dave

  12. This comment has been removed by the author.

  13. Hello There !
    Recently I came across this wonderful tool and immediately installed it on my Linux machine. Everything is perfect, but I don't know how to save the pastes as they are identified (you said earlier to edit the code in helper.py, but I don't know how). Any feedback will be appreciated.
    Thanks
