Saving a Twitter stream to a RavenDB database using C#

In an earlier post I explained how you can use C# to access a Twitter stream. In this post I will show you how to save the tweets from a Twitter stream to RavenDB.

The goal of this post is not to perform a deep dive into NoSQL databases or the Tweetinvi API. Instead it's to get you up and running with the minimum of ceremony so you can start conducting your own experiments.

RavenDB is an open source NoSQL database for .NET which, as my first experience of a NoSQL database, I have found relatively straightforward to start experimenting with.

You can download RavenDB from here. At the time of writing the stable release was 3.5.3 and I chose to use the installer, which installs RavenDB via the familiar wizard process.


Once installed you will have a RavenDB directory containing, among other things, a Start.cmd batch file.

If, like me, you are new to the world of NoSQL databases it is worth working your way through the Fundamentals tutorial. I found it an excellent introduction and highly recommend it.

To start RavenDB, double click the Start.cmd batch file in the root of the RavenDB directory. You should shortly see a new command window and a new tab in your default browser showing what databases you have (which will be empty on first launch).

With RavenDB installed and running we can now start Visual Studio and create a new console application; I've called mine TrendingOnTwitterNoSQL.

Using NuGet, add the following packages:

TweetinviAPI

RavenDB.Client


Navigate to Program.cs and add the following using statements:

using System;
using Raven.Client.Document;
using Tweetinvi;

Within the Main method add the following:

Auth.SetUserCredentials("CONSUMER_KEY", "CONSUMER_SECRET", "ACCESS_TOKEN", "ACCESS_TOKEN_SECRET");

Replace CONSUMER_KEY etc. with your Twitter API credentials. If you don't yet have them, you can obtain them by going here and following the instructions.

Now add the following two lines:

  var stream = Stream.CreateFilteredStream();
  stream.AddTrack("CanadianGP");

The first line creates a filtered Twitter stream. A Twitter stream gives you, the developer, access to live information on Twitter. There are a number of different streams available; in this post we will be using one that returns information about a trending topic. More information about Twitter streams can be found in the Twitter docs and the Tweetinvi docs.

At the time of writing, the Canadian Grand Prix was trending on Twitter, which is why it appears as the track in the second line.
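As an aside, a filtered stream is not limited to a single track. The sketch below is my own addition and assumes the Tweetinvi API of the time (AddTrack plus the two StartStreamMatching* methods): a stream can be given several phrases, and you choose whether a matching tweet must contain all of them or any one of them.

var multiStream = Stream.CreateFilteredStream();
multiStream.AddTrack("CanadianGP");
multiStream.AddTrack("Formula 1");
// StartStreamMatchingAnyCondition() delivers tweets containing either phrase;
// StartStreamMatchingAllConditions() requires every track to be present.
multiStream.StartStreamMatchingAnyCondition();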

The next step is to create a new class which will manage the RavenDB document store. Here is the complete code:


using System;
using Raven.Client;
using Raven.Client.Document;

namespace TrendingOnTwitterNoSQL
{
  class DocumentStoreHolder
  {
    private static readonly Lazy<IDocumentStore> LazyStore =
      new Lazy<IDocumentStore>(() =>
      {
        var store = new DocumentStore
        {
          Url = "http://localhost:8080",
          DefaultDatabase = "CanadianGP"
        };
        return store.Initialize();
      });

    public static IDocumentStore Store => LazyStore.Value;
  }
}

In the context of RavenDB, the Document Store holds the RavenDB URL, the default database and so on. More information about the Document Store can be found in the tutorial.

According to the documentation, a typical application normally needs only one document store, which is why the DocumentStoreHolder class is a singleton.

The important things to note in this class are the database URL and the name of the default database, CanadianGP. This is the name of the database that will store tweets about the CanadianGP.

Returning to Program.cs add the following underneath stream.AddTrack to obtain a new document store:

  var documentStore = DocumentStoreHolder.Store;

The final class that needs to be created is called TwitterModel and is shown below:


namespace TrendingOnTwitterNoSQL
{
  class TwitterModel
  {
    public long Id { get; set; }
    public string Tweet { get; set; }
  }
}

This class will be used to save the tweet information that the program is interested in: the Twitter ID and the tweet text. There is a lot of other information available, but for the sake of brevity this example only captures the ID and the tweet.
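If you do want to capture more, the model can simply grow extra properties. The sketch below is my own illustration rather than part of the original program; CreatedAt, CreatedBy and RetweetCount are assumed property names based on the Tweetinvi tweet object of the time.

namespace TrendingOnTwitterNoSQL
{
  // hypothetical richer model; populate it from theTweet.Tweet inside the
  // MatchingTweetReceived handler, e.g. CreatedAt = theTweet.Tweet.CreatedAt
  class ExtendedTwitterModel
  {
    public long Id { get; set; }
    public string Tweet { get; set; }
    public System.DateTime CreatedAt { get; set; }
    public string CreatedBy { get; set; }
    public int RetweetCount { get; set; }
  }
}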

With the model class created, the final part of the code is shown below:


using (BulkInsertOperation bulkInsert = documentStore.BulkInsert())
{
  stream.MatchingTweetReceived += (sender, theTweet) =>
  {
    Console.WriteLine(theTweet.Tweet.FullText);
    var tm = new TwitterModel
    {
      Id = theTweet.Tweet.Id,
      Tweet = theTweet.Tweet.FullText
    };

    bulkInsert.Store(tm);
  };
  stream.StartStreamMatchingAllConditions();
}

As the tweets will be arriving in clusters, the RavenDB BulkInsert method is used; you can see this in the using statement that opens the block.

Once a matching tweet is received it is output to the console. Next a new TwitterModel object is created and its fields are assigned the tweet ID and the tweet text. This object is then saved to the database.

The complete Program.cs should now look like:


using System;
using Raven.Client.Document;
using Tweetinvi;

namespace TrendingOnTwitterNoSQL
{
  class Program
  {
    static void Main(string[] args)
    {
      Auth.SetUserCredentials("CONSUMER_KEY", "CONSUMER_SECRET", "ACCESS_TOKEN", "ACCESS_TOKEN_SECRET");

      var stream = Stream.CreateFilteredStream();
      stream.AddTrack("CanadianGP");

      var documentStore = DocumentStoreHolder.Store;

      using (BulkInsertOperation bulkInsert = documentStore.BulkInsert())
      {
        stream.MatchingTweetReceived += (sender, theTweet) =>
        {
          Console.WriteLine(theTweet.Tweet.FullText);

          var tm = new TwitterModel
          {
            Id = theTweet.Tweet.Id,
            Tweet = theTweet.Tweet.FullText
          };

          bulkInsert.Store(tm);

        };
        stream.StartStreamMatchingAllConditions();
      }
    }
  }
}

After running this program for a short while you will have a number of tweets saved. To view them, switch back to your browser (if you are not already on the RavenDB page, navigate to http://localhost:8080) and click on the database that you created.


Select the relevant database and you will then see the tweets.
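The saved documents can also be read back in code. Here is a minimal sketch, my own addition rather than part of the original walkthrough, that uses a RavenDB session against the singleton document store to pull back the first few tweets:

using System;
using System.Linq;
using Raven.Client;

namespace TrendingOnTwitterNoSQL
{
  class ReadTweets
  {
    static void ShowRecentTweets()
    {
      // open a short-lived session against the singleton document store
      using (var session = DocumentStoreHolder.Store.OpenSession())
      {
        // query the TwitterModel collection; Take limits the result size
        var tweets = session.Query<TwitterModel>().Take(10).ToList();

        foreach (var tweet in tweets)
        {
          Console.WriteLine($"{tweet.Id}: {tweet.Tweet}");
        }
      }
    }
  }
}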


Summary

In this post I have detailed the steps required to save a Twitter stream of a topic of interest to a RavenDB database.

A complete example is available on GitHub.

Acknowledgements

The genesis of this post came from the generous answers given to my question on StackOverflow.

Boxing and Unboxing in C#

This post is an aide-memoire as I learn more about boxing and unboxing in C# and is based upon this part of the C# docs.

Boxing

Boxing is the process of converting a value type (such as int or bool) to the type Object or to any interface type implemented by the value type. When the CLR boxes a value type, it wraps the value inside a System.Object and stores it on the managed heap.

Boxing is implicit.

int i = 10; 
// this line boxes i
object o = i;

Although it is possible to perform the boxing explicitly it is not required.

int i = 10;
// explicit boxing
object o = (object)i;

Unboxing

Unboxing extracts the value type from the object.

Unboxing is explicit.

int i = 10; 
// boxes i
object o = i;
// unboxes the object to the int value type named j
int j = (int)o;
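One caveat worth adding, also covered in the docs: the cast used to unbox must name the exact value type that was boxed, otherwise the CLR throws an InvalidCastException at run time.

int i = 10;
// boxes i
object o = i;
// double d = (double)o;    // InvalidCastException: o holds a boxed int
double d = (double)(int)o;  // unbox to int first, then convert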

Performance

Both boxing and unboxing are computationally expensive operations.
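To see the cost in practice, here is a minimal sketch of my own (not from the docs) that compares the non-generic ArrayList, which boxes every int added to it, with List<int>, which stores them unboxed:

using System;
using System.Collections;
using System.Collections.Generic;
using System.Diagnostics;

class BoxingCost
{
  static void Main()
  {
    const int count = 10000000;

    var sw = Stopwatch.StartNew();
    var boxed = new ArrayList();
    for (int i = 0; i < count; i++)
    {
      boxed.Add(i); // boxes each int into a new heap object
    }
    Console.WriteLine($"ArrayList (boxing): {sw.ElapsedMilliseconds} ms");

    sw.Restart();
    var unboxed = new List<int>();
    for (int i = 0; i < count; i++)
    {
      unboxed.Add(i); // no boxing; stored as a plain int
    }
    Console.WriteLine($"List<int> (no boxing): {sw.ElapsedMilliseconds} ms");
  }
}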

 

Streaming Twitter with C#

In this article I will walk through the steps required to create a C# console application that prints a Twitter stream to the console using the Tweetinvi library.

The example was built using Visual Studio 2015 Community Edition and .NET Framework 4.6.

Step 1

Start Visual Studio and create a new console application; I've called mine TwitterPublicStream.

Step 2

Right click on the project in the Solution Explorer window (in the example below this is TwitterPublicStream) and select Manage NuGet Packages.


Step 3

Search for tweetinvi and, once found, install it, accepting the various licences if you are happy to do so.

Step 4

In order to use the Twitter APIs you first need to obtain some credentials. To do this, visit the Twitter API home page and follow the instructions.

Step 5

After that four-step ceremony we are now ready to write some code.

using System;
using Tweetinvi;

namespace TwitterPublicStream
{
  class Program
  {
    static void Main(string[] args)
    {
      // add your Twitter API credentials here
      Auth.SetUserCredentials("CONSUMER_KEY", "CONSUMER_SECRET", "ACCESS_TOKEN", "ACCESS_TOKEN_SECRET");

      var stream = Stream.CreateFilteredStream();
      // change LEITOT to something that is currently trending on twitter
      stream.AddTrack("LEITOT");
      stream.MatchingTweetReceived += (sender, theTweet) =>
      {
        Console.WriteLine($"A tweet containing LEITOT has been found; the tweet is {theTweet.Tweet}");
      };
      stream.StartStreamMatchingAllConditions();
    }
  }
}

At the top of the file a using statement is added for the Tweetinvi library.

In the call to Auth.SetUserCredentials you need to add your Twitter API credentials, which you obtained in Step 4.

Next the AddTrack method is called. A track, in the Twitter API context, is a comma-separated list of phrases which will be used to determine what tweets will be delivered on the stream. You can find out more here. Whilst testing this I suggest selecting a trending topic without the #. The one shown in the code was a football game between Leicester and Spurs.

The MatchingTweetReceived event handler outputs the contents of each matching tweet to the console.

Finally, the call to StartStreamMatchingAllConditions starts the streaming.

Step 6

In the final step, compile and run the program. After a few seconds you should start seeing Tweets populate the console window.

 

Summary

In this article I have explained how to use the superb Tweetinvi library to stream tweets of interest from Twitter into a C# console application.

C# Utility to emulate the XPath 3 function path()

I recently needed to examine a number of XML files and print out the element names that contained text greater than X number of characters. I also needed to print the location of each element within the XML document.

For example, given the following document:

<bookshop>
  <book>
    <title>Microsoft Visual C# Step by Step</title>
  </book>
</bookshop>

…if I was interested in book titles that had more than 10 characters I would want to see:

/bookshop/book/title/
Microsoft Visual C# Step by Step

Whilst it is straightforward to return the text node, finding the XPath location proved to be more challenging than I initially thought. The reason is that whilst XPath 3.0 introduced the path() function, which returns the current XPath location, the programming languages that I know (PL/SQL, Python and C#) do not implement XPath 3.0 yet.

As a result I had to build my own utility. I chose to write this in C# as this is a language I have spent the past 18 months learning and I am now looking for real world problems I can solve using it.

The utility can be found on github. The “engine” of the utility is copied from this Stackoverflow answer: http://stackoverflow.com/a/241291/55640 provided by Jon Skeet.
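For a flavour of the idea, here is a minimal sketch of my own, not the utility's actual engine, that builds a simple /a/b/c style path with LINQ to XML by walking an element's ancestors (index predicates such as [2] are omitted, so the path is only unique when elements don't repeat):

using System;
using System.Linq;
using System.Xml.Linq;

class PathSketch
{
  // walk from the root down to the element, joining the element names
  static string GetPath(XElement element)
  {
    var names = element.AncestorsAndSelf()
                       .Reverse()
                       .Select(e => e.Name.LocalName);
    return "/" + string.Join("/", names);
  }

  static void Main()
  {
    var doc = XDocument.Parse(
      "<bookshop><book><title>Microsoft Visual C# Step by Step</title></book></bookshop>");

    // report title elements whose text is longer than 10 characters
    foreach (var title in doc.Descendants("title")
                             .Where(t => t.Value.Length > 10))
    {
      Console.WriteLine(GetPath(title));
      Console.WriteLine(title.Value);
    }
  }
}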

Although far from feature complete I hope it will give someone facing a similar challenge a head start.

Let me know what you think.

Naming Windows .bat files

Whilst setting up an Elixir development environment I ran into an interesting problem.

I created a new .bat file in Windows 10 to start up Elixir’s Interactive Shell. The command I wanted to run was:

iex --werl

So I created a new Windows .bat file called iex.bat which contained this single command.

Unfortunately, when I double clicked on the icon, instead of seeing the Elixir Interactive Shell I saw that the command was continually looping.


Fortunately this Stack Overflow answer helped in identifying the problem: if you create a .bat file with the same name as the command you wish to run, you will end up in an endless loop! When the batch file executes iex, Windows resolves the name to the batch file itself before searching the PATH for the real executable, so the script keeps calling itself.

If this article has helped you, please take a moment to up vote the Stackoverflow answer.

 

You don’t have to suck at Excel

Watching Joel Spolsky’s masterful presentation You Suck at Excel with Joel Spolsky will make you better at using Excel.

You don’t have to watch it all; just watching the first five minutes, on how to correctly paste in Excel, will improve your productivity, reduce your #REF! error stress levels and put you head and shoulders above most of the people using Excel today.

Some other highlights to look out for in the first 20 minutes are:

  1. R1C1 mode
  2. Riding the Range
  3. Rounding errors

I could go on and on, but I don’t want to keep you from watching the video, so suffice to say this is an excellent and generous presentation by a superb technical leader.

Thank you Joel.


Book Review: Introduction to Javascript Object Notation by Lindsay Bassett

Whilst attending the UKOUG Tech16 conference, several of the talks I attended mentioned the use of JSON (or, to give it its full name, JavaScript Object Notation). These talks made me realise how little I actually knew about this data interchange format.

There are many resources for learning JSON, from websites of varying quality to paid-for video courses on Pluralsight or free ones on YouTube. However, my favourite method of learning something new is reading a book and then conducting experiments using what I have learnt. So I chose Introduction to JavaScript Object Notation by Lindsay Bassett. I picked this title as it had good reviews and, at just over 100 pages, was not going to be a door stop that I would never finish.

The book begins with an overview of JSON, its syntax, the available datatypes and validating your JSON documents using JSON Schema, before switching gears and moving on to demonstrate how JSON can be used in client- and server-side frameworks and NoSQL databases.

The book was a pleasure to read: new concepts are concisely introduced and no assumptions about your knowledge are made. Having now read it, I am far more confident in my understanding of this data interchange format.

If you are looking to get up and running with JSON it is easy for me to recommend this book.


Technical Books I have read in 2016

I have always enjoyed reading books about programming, from books that lead you to take your first tentative steps with a new language to ones that take you on a deep dive into a particular feature. I especially enjoy ones that discuss language-agnostic programming concepts such as debugging, estimating etc. Books like Code Complete, The Pragmatic Programmer, The Mythical Man Month and Don’t Make Me Think.

To me technical books are such a bargain. For £20 – £30 you can gain knowledge and insight that can make you so much better at your job, such as taking different approaches to solving the daily problems that we as programmers face. Without a doubt there is a lot of published rubbish out there, but fortunately, in these days of reviews and questions on the numerous Stack Exchange sites, it is a lot easier to avoid the charlatans and their ammo pouches stuffed with silver bullets. Although, as you will see from my own list, one or two may still slip through the net!

Here are the programming related books I have read this year, listed in the order that they were read.

The C# Player’s Guide (2nd Edition)

This is my favourite book that I have read whilst learning C#. Immediately accessible. The large format of the book, along with the lucid and easy to grasp descriptions of object-oriented topics, makes this my recommended book for anyone interested in learning C#.

Django By Example

Unfortunately this book is still on the “bought but not read” pile. That is no reflection on the book; I have been focusing my attention on learning C# this year.

C# 6.0 and the .NET 4.6 Framework

At 1600+ pages this was certainly the biggest technical book I bought this year. For me it is too unwieldy to use on a day-to-day basis so, for the first time, I have abandoned the printed version of a book and have spent the last 8 months using the e-book. Usually the e-book is open on one monitor whilst Visual Studio is open on the other. I am not sure it is such a good book for beginners, but as a reference I can see myself returning to it to look things up.

The Psychology of Computer Programming: Silver Anniversary Edition

I have been wanting to read this book for several years and finally got round to it. It is by a very long way my favourite read this year and it is in the top 5 all-time technical books I have ever read. Although 45 years old, the ideas discussed then are still very relevant today: how we don’t read existing code to see how others have solved problems, the critical importance of having code reviews, egoless programming, estimating and setting expectations around delivery times. I could go on and on. If you haven’t read it, order it today; you will not regret it. It will make you a better programmer or manager!

Learn C# in One Day and Learn It Well

The worst book I read this year. I have already written what I think of it here. Not much more to add, so moving on to the final book…

Working Effectively With Legacy Code

The final book for this year is another classic and I have high expectations for it. Currently I am a third of the way through, but I will have finished it by the end of the year. At this point I think it should be called “Working Effectively with Legacy Object Oriented Code”, because a lot of the ideas in the book are centred around legacy object-oriented code. I will update this once I get to the end of the book.

Summary

This year marks a slight change from previous years’ lists in that I haven’t read any Oracle database or Application Express books. There are two reasons for this. First, I don’t think there have been any unmissable Oracle books published this year (I am interested in Real World SQL and PL/SQL, which was published in September 2016, however I am awaiting reviews or the chance to actually look through it), and secondly most of my spare time has been spent learning C#.

I have taken something from each of these five books this year, yes, even Learn C# in One Day. I know that as a result of reading these books, I will start 2017 a better programmer.

An introduction to Web scraping using Python 3

In this article I will demonstrate how easy it is to perform basic text Web scraping using Python and just a few lines of code.

The example has been developed and tested using Python 3.5.2.

The first step is to see if you have the following third-party libraries already installed: Requests and Beautiful Soup 4. Start IDLE and try typing the following command:


import requests

After you press return, if you see no error messages then requests is installed. If you see an error message that shows requests has not been found, you should install it using pip from the command line as shown below.


pip install requests

Repeat the process to see if you already have the Beautiful Soup library installed; fortunately you don’t have too much to type…


import bs4

Again if Python complains that it can’t find the library, use pip from the command line to install it.


pip install beautifulsoup4

With the libraries installed, here is a program that scrapes this site. It returns the titles from the blog posts that are shown on this page.

To demonstrate how this is achieved with just a few lines of code, here is the program without comments:


import requests, bs4

def getTitlesFromMySite(url):

    res = requests.get(url)
    res.raise_for_status()

    soup = bs4.BeautifulSoup(res.text, 'html.parser')
    elems = soup.select('.entry-title')

    return elems


titles = getTitlesFromMySite('http://www.oraclefrontovik.com')

for title in titles:
    print(title.text)

Now the same code but this time with each section commented…


# import requests (for downloading web pages) and beautiful soup (for parsing html)
import requests, bs4

# create a function that allows a parameter containing a url to be passed into it
def getTitlesFromMySite(url):

    # download the webpage and store it in the res variable
    res = requests.get(url)

    # check for problems - if there are, raise_for_status() raises an exception
    # and the program stops at this point
    res.raise_for_status()

    # running the downloaded webpage through Beautiful Soup returns a
    # Beautiful Soup object which represents the HTML as a nested data structure
    soup = bs4.BeautifulSoup(res.text, 'html.parser')

    # store in a list the items that match this css selector;
    # I explain how I obtained this selector below
    elems = soup.select('.entry-title')

    return elems

# call the function and store the results in titles
titles = getTitlesFromMySite('http://www.oraclefrontovik.com')

# loop through the list printing out each title
for title in titles:
    print(title.text)

Running the example returns the following expected output….


Learn C# in One Day and Learn It Well – Review

Contributing to an Open Source Project

A step by step guide to building a Raspberry Pi Hedgehog camera

Is there more than one reason to use PL/SQL WHERE CURRENT OF ?

Structured Basis Testing

Raspberry Pi connected to WiFi but no internet access

The auditing capabilities of Flashback Data Archive in Oracle 12c.

DBMS_UTILITY.FORMAT_ERROR_BACKTRACE and the perils of the RAISE statement

Using INSERT ALL with related tables

The best lesson I learnt from Steve McConnell

To summarise, the code imports two third-party libraries, Requests and Beautiful Soup 4, that perform the lion’s share of the work. In the example I use the Requests library to download a web page as HTML and then pass it to Beautiful Soup along with a CSS selector to return the information I want from it.

Obtaining the CSS selector

The code example has the following line, which extracts the part of the webpage that we are interested in, the blog post titles:

elems = soup.select('.entry-title')

Using Firefox, I obtained the CSS selector ‘.entry-title’ as follows:

  1. Navigated to the page of interest, in this case oraclefrontovik.com.
  2. Opened the Firefox developer tools (Ctrl + Shift + I).
  3. Highlighted the first title (which at the time of writing was Learn C# in One Day and Learn It Well – Review), right clicked and selected Inspect Element.
  4. In the inspector, right clicked and selected Copy, then chose CSS Selector from the sub-menu.

At the time of writing, I was unable to get the same CSS Selector using the native developer tools from Chrome. If you know of a way please let me know in the comments.

Summary

In this post I have walked through the steps to perform basic text Web scraping using Python 3.

Learn C# in One Day and Learn It Well – Review

I have been learning C# and the .NET framework for a while now and have been working my way through several books: The C# Programming Yellow Book, The C# Player’s Guide (2nd Edition) and C# 6.0 and the .NET 4.6 Framework. All of these books have helped me, to varying degrees, to get comfortable with object-oriented programming, the C# language and the .NET framework.

When learning a new programming language, I always look to improve my knowledge of the fundamentals, so seeing an introduction to C# book that was getting good reviews piqued my interest. That book was Learn C# in One Day and Learn It Well. Although I am very suspicious of Learn X in Y days/hours/minutes titles (see Peter Norvig’s masterly description), I ordered a copy.


At 153 pages the book is slim and can be divided into two parts. Chapters 1 through 11 cover the various building blocks that make up a programming language, such as variables, arrays and conditional statements, as well as briefly touching on object-oriented concepts. The second half of the book, starting on page 128, brings together what you have learnt in a project by building a payroll program.

I think the book is self-published, which is obviously not an issue in itself, however I felt that it could have done with a reviewer or editor to catch the typos and misaligned paragraphs. These are minor irritants though. The real point of this post is: can you use this book to learn C# in a day?

In my opinion, no. The main problem with this book is how briefly the topics are covered. Take for example interfaces, which are discussed on pages 107 – 109. The text compares interfaces with abstract classes, however nowhere in these two pages does it tell you what an interface actually is and why you would want to create one.

In summary, I am not sure who the target audience for this book is. Perhaps someone who just needs to get some coursework or a module “working”. For everyone else it is far too brief and does not go into enough detail, especially in explaining why you would want to use a feature of the language. If you are interested in learning C#, my advice would be to put the £8 towards a better resource.