December 13, 2015

List all the links from an RSS feed using Python

Blogs and news sites use RSS (Rich Site Summary) feeds, and Python can be used to fetch their updates. I have written a simple program that fetches an RSS feed and prints its links.

I have written the same application in both Python 2.7 and Python 3.

In Python 3, urllib2.urlopen() is replaced with urllib.request.urlopen(). The Python 3 code is shown below.
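The original program is not reproduced here. Below is a minimal Python 3 sketch of the same idea. It assumes an RSS 2.0 feed in which every <item> has a <link> child. Atom feeds use a different structure and would need other handling. The function names extract_links and fetch_links are my own, chosen for illustration.

```python
import urllib.request
import xml.etree.ElementTree as ET


def extract_links(rss_xml):
    """Return the <link> text of every <item> in an RSS 2.0 document (str or bytes)."""
    root = ET.fromstring(rss_xml)
    links = []
    # iter() walks the whole tree, so items nested under <channel> are found;
    # the channel's own <link> is skipped because only <item> elements are visited.
    for item in root.iter("item"):
        link = item.findtext("link")
        if link:
            links.append(link.strip())
    return links


def fetch_links(feed_url):
    """Download a feed with urllib.request.urlopen() and return its item links."""
    with urllib.request.urlopen(feed_url) as response:
        return extract_links(response.read())
```

To print the links of a feed, loop over `fetch_links("<feed URL>")` and print each entry. Keeping the parsing separate from the download makes it easy to try on a saved feed file.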

December 02, 2015

Remove all the followers from your Twitter account

You probably use social networks often, and as all of us know, it is really hard to do bulk operations on Facebook and Twitter.

I wanted to remove all the followers from my Twitter account, so I searched on Google and found that there is no direct way to remove followers. The only way is to block them and then unblock them. Be aware that if you also follow that person, your follow will be removed automatically when you block them.

I tried to do this with the "tweepy" Python module. You can modify the program as you need. Please be careful when you run this script, because you are going to lose all your followers. To get the consumer and access token keys, please visit https://apps.twitter.com/. Keep your keys safe and never share them publicly.
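The block-then-unblock loop can be sketched on its own, without tying it to a particular tweepy version. In this hypothetical helper, block and unblock are plain callables that take a user ID. How you wire them to tweepy is an assumption that you should check against your installed version.

```python
def soft_block_all(follower_ids, block, unblock):
    """Remove followers by blocking and then unblocking each one.

    block and unblock are callables that take a user ID (hypothetical wiring,
    e.g. wrappers around a Twitter client). Returns the IDs that went through
    both steps; failures are printed and skipped.
    """
    removed = []
    for user_id in follower_ids:
        try:
            block(user_id)
        except Exception as error:
            print(f"Could not block {user_id}: {error}")
            continue
        try:
            unblock(user_id)
        except Exception as error:
            # The account is still blocked at this point; unblock it manually later.
            print(f"Blocked but could not unblock {user_id}: {error}")
            continue
        removed.append(user_id)
    return removed
```

With tweepy, the callables could wrap `api.create_block(user_id=...)` and `api.destroy_block(user_id=...)`. The IDs could come from `tweepy.Cursor(api.get_follower_ids).items()` on tweepy 4.x, or from `api.followers_ids` on 3.x. Collect the IDs into a list first, so you are not paging through followers while removing them.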

May 21, 2015

How to fix Incompatible clusterIDs in Hadoop?

When you are installing and setting up your Hadoop cluster, you might face an issue like the one below.
FATAL org.apache.hadoop.hdfs.server.datanode.DataNode: Initialization failed for Block pool (Datanode Uuid unassigned) service to master/192.168.1.1:9000. Exiting. 
java.io.IOException: Incompatible clusterIDs in /home/hadoop/hadoop/data: namenode clusterID = CID-68a4c0d2-5524-486e-8bc9-e1fc3c5c2e29; datanode clusterID = CID-c6c3e9e5-be1c-4a3f-a4b2-bb9441a989c5
I have quoted only the first two lines of the error; the full stack trace is longer.

You may not have formatted your NameNode properly. If this is a test environment, you can simply delete the data and NameNode folders and reformat HDFS. To format it, run the command below.

WARNING!!! : IF YOU RUN THE COMMAND BELOW, YOU WILL LOSE ALL YOUR DATA.
hdfs namenode -format
But if you have a lot of data in your Hadoop cluster and cannot simply format it, then this post is for you.

First, stop all running Hadoop processes. Then log in to your NameNode, find the value of the dfs.namenode.name.dir property, and run the command below with your NameNode folder.
cat <dfs.namenode.name.dir>/current/VERSION
You will see content like the following.
#Thu May 21 08:29:01 UTC 2015
namespaceID=1938842004
clusterID=CID-68a4c0d2-5524-486e-8bc9-e1fc3c5c2e29
cTime=0
storageType=NAME_NODE
blockpoolID=BP-2104944316-127.0.1.1-1430820636449
layoutVersion=-60
Copy the clusterID from the NameNode. Then log in to the problematic slave node, find the dfs.datanode.data.dir folder, and run the command below to edit its VERSION file.
vim <dfs.datanode.data.dir>/current/VERSION 
Your DataNode VERSION file will look like the one below. Replace its clusterID with the one you copied from the NameNode.
#Thu May 21 08:31:31 UTC 2015
storageID=DS-b7d3c421-0366-4a66-8d14-78362389ed73
clusterID=CID-c6c3e9e5-be1c-4a3f-a4b2-bb9441a989c5
cTime=0
datanodeUuid=724f8bad-c0ca-4ded-98d6-a860d3165289
storageType=DATA_NODE
layoutVersion=-56
Then start the Hadoop processes again, and everything should be okay!
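If you have several DataNodes to fix, editing the VERSION files by hand gets tedious. Here is a small Python sketch of the same manual steps: read clusterID from one VERSION file, then rewrite only the clusterID line of another. The helper names are hypothetical, and the script keeps a VERSION.bak backup before changing anything. Run it only while Hadoop is stopped, as described above.

```python
import re
from pathlib import Path


def read_cluster_id(version_file):
    """Return the clusterID value from a Hadoop <dir>/current/VERSION file."""
    for line in Path(version_file).read_text().splitlines():
        if line.startswith("clusterID="):
            return line.split("=", 1)[1].strip()
    raise ValueError(f"no clusterID line in {version_file}")


def replace_cluster_id(version_file, new_cluster_id):
    """Replace the clusterID line in place, leaving every other line unchanged.

    A backup copy (VERSION.bak) is written next to the file first.
    """
    path = Path(version_file)
    original = path.read_text()
    path.with_name(path.name + ".bak").write_text(original)
    # A function replacement avoids treating backslashes in the ID as regex escapes.
    updated, count = re.subn(
        r"(?m)^clusterID=.*$", lambda _match: "clusterID=" + new_cluster_id, original
    )
    if count != 1:
        raise ValueError(f"expected exactly one clusterID line, found {count}")
    path.write_text(updated)
```

For the error above, you would run something like `replace_cluster_id("/home/hadoop/hadoop/data/current/VERSION", "CID-68a4c0d2-5524-486e-8bc9-e1fc3c5c2e29")` on the DataNode. The path is taken from the error message and may differ on your cluster.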

Hadoop MultipleInputs Example

Let's assume you are working for ABC Group, which owns ABC America Airline, ABC Mobile, ABC Money, ABC Hotel, blah blah. ABC this and that. So you have multiple data sources with different types and columns, and you can't simply run a single ordinary Hadoop job on all the data.

You have several data files from all these businesses.
(I edited this data file 33 times to get it aligned. ;) Don't tell anyone!)

Your task is to calculate the total amount each person spent across ABC Group. You could run a job for each company and then another job to calculate the sum. But what I'm going to tell you is "NOOOO! You can do this with one job." Your Hadoop administrator will love this idea.

You need to develop a custom InputFormat and a custom RecordReader. I have created both classes inside the custom InputFormat class. A sample InputFormat looks like the one below.


The nextKeyValue() method is where you write the code specific to the format of your data files.

Developing custom InputFormat classes alone is not enough; you also need to change the main class of your job. Your main class should look like the one below.

Lines 26-28 add your custom inputs to the job. You also don't need to set the Mapper class separately; in fact, you can't. If you want, you can develop separate mapper classes for your different file types. I'll write a blog post about that method as well.
To build the JAR from my sample project, you need Maven. Run the command below to build the JAR. Once the build finishes, you will find the JAR file inside the target folder.
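The actual data files and Java classes are not reproduced here. As a stand-in, here is a small, hypothetical Python model of the data flow the job relies on. Each source gets its own parser, playing the role of the custom RecordReader's nextKeyValue(), that emits (person ID, amount) pairs. A single reduce step then sums the amounts per person. The column layouts in PARSERS are invented for illustration. The folder names are borrowed from the HDFS layout used later in this post.

```python
from collections import defaultdict

# Hypothetical per-source parsers: each turns one comma-separated line of its
# own format into (person_id, amount). Column positions are assumptions.
PARSERS = {
    "airline_data": lambda fields: (fields[0], int(fields[2])),
    "book_data": lambda fields: (fields[0], int(fields[3])),
    "mobile_data": lambda fields: (fields[0], int(fields[1])),
}


def map_records(source, lines):
    """Per-format record reading plus mapping: emit (key, value) pairs."""
    parse = PARSERS[source]
    for line in lines:
        line = line.strip()
        if line:  # skip blank lines
            yield parse(line.split(","))


def reduce_totals(pairs):
    """The single reducer: sum amounts per person ID."""
    totals = defaultdict(int)
    for person_id, amount in pairs:
        totals[person_id] += amount
    return dict(totals)


def run_job(inputs):
    """inputs maps a source folder name to its lines, like one job over many folders."""
    pairs = (pair for source, lines in inputs.items() for pair in map_records(source, lines))
    return reduce_totals(pairs)
```

The key design point matches the post: the formats differ only at the reading stage. After each source has been turned into the same kind of (key, value) pair, one reduce step can sum them all.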
mvn clean install
/
|----/user
     |----/hadoop
          |----/airline_data
          |    |----/airline.txt
          |----/book_data
          |    |----/book.txt
          |----/mobile_data
               |----/mobile.txt
With this change, you may also have to change the way you run the job. My file structure looks like the tree above, with a separate folder for each data type. You can run the job with the command below.
hadoop jar /vagrant/muiltiinput-sample-1.0-SNAPSHOT.jar /user/hadoop/airline_data /user/hadoop/book_data /user/hadoop/mobile_data output_result
If you have followed all the steps properly, the job should run to completion.

The job will create a folder called output_result. To see its content, run the command below.
hdfs dfs -cat output_result/part*
I ran my sample project on my sample data set, and my result file looked like this:
12345678 500
23452345 937
34252454 850
43545666 1085
56785678 709
67856783 384
The source code of this project is available on GitHub:
https://github.com/dedunu/hadoop-multiinput-sample

Enjoy Hadoop!

May 18, 2015

IMAP Java Test program and JMeter Script

One of my colleagues wanted to write a JMeter script to test IMAP, but the code failed, so I got involved too. JMeter's BeanShell uses Java in the backend, so I first tried with a Maven project. Finally, I managed to write code that lists the IMAP folders. The Java implementation is shown below.

Then we wrote code for JMeter BeanShell that prints the IMAP folder count. The code is shown below.
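For a quick sanity check outside JMeter, the same folder count can be sketched in Python with the standard imaplib module. This is a Python analogue, not the Java or BeanShell code from this post. The host and credentials are placeholders, and the helper names are hypothetical.

```python
import imaplib


def count_folders(list_response):
    """Count mailboxes in the data list returned by imaplib's IMAP4.list().

    Each mailbox is one entry such as b'(\\HasNoChildren) "/" "INBOX"';
    None (no mailboxes) and empty fragments are skipped.
    """
    return sum(1 for entry in list_response if entry)


def imap_folder_count(host, user, password):
    """Log in over SSL (port 993 by default), list all folders and count them."""
    with imaplib.IMAP4_SSL(host) as conn:
        conn.login(user, password)
        status, data = conn.list()
        if status != "OK":
            raise RuntimeError(f"LIST failed: {status}")
        return count_folders(data)
```

Calling `imap_folder_count("imap.example.com", "user", "secret")` with real values would print nothing by itself, but it returns the count for you to print or assert on.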

The complete Maven project is available on GitHub: https://github.com/dedunu/imapTest

Increase memory and CPUs on Vagrant Virtual Machines

In my last post, I showed how to create multiple nodes in a single Vagrant project. By default, the "ubuntu/trusty64" box comes with 500MB of memory, but some developers need more RAM and more CPUs. In this post I'm going to show how to increase the memory and the number of CPUs in a Vagrant project. Run the commands below.
mkdir testProject1
cd testProject1
vagrant init
Then edit the Vagrantfile as shown below.


The changes above increase the memory to 8GB and add one more CPU core. Run the commands below to start the Vagrant machine and get SSH access.
vagrant up
vagrant ssh
If you have an existing project, you just have to add these lines. When you restart the project (for example with vagrant reload), the memory will be increased.

Multiple nodes on Vagrant

Recently I started working with Vagrant, which is a good tool for development. In this post I'm going to explain how to create multiple nodes in a Vagrant project.
mkdir testProject
cd testProject
vagrant init

Running the commands above creates a Vagrant project for you. Now we have to make changes to the Vagrantfile. Your initial Vagrantfile will look like the one below.

Edit the Vagrantfile and add content like the following.

The sample Vagrantfile above will create three nodes. Now run the command below to start the Vagrant virtual machines.
vagrant up

If you followed the instructions properly, you will get an output like the one below.

If you want to connect to the master node, run the command below.
vagrant ssh master
If you want to connect to the slave1 node, run the command below.
vagrant ssh slave1
Whichever machine you want to connect to, just type vagrant ssh followed by the node name. Hope this helps!

May 14, 2015

Alfresco 5.0.1 Document Preview doesn't work on Ubuntu?

I recently installed Alfresco for testing in a Vagrant instance running an Ubuntu image. But I forgot to install the libraries that must be installed on Ubuntu before installing Alfresco. Fortunately, Alfresco worked without those dependencies.

http://docs.alfresco.com/5.0/concepts/install-lolibfiles.html

The link above lists the libraries you should install before installing Alfresco. Run the command below to install them.
sudo apt-get install libice6 libsm6 libxt6 libxrender1 libfontconfig1 libcups2
But office document previews still didn't work properly: some documents worked, but others didn't. I debugged it with one of my colleagues, and we found errors in our logs.


Then we tried to run the soffice application from a terminal. Look what we got!
/home/vagrant/alfresco-5.0.1/libreoffice/program/oosplash: error while loading shared libraries: libXinerama.so.1: cannot open shared object file: No such file or directory
Then we realised that we had to install that library on Ubuntu. Run the command below on your Ubuntu server to install the missing library.

sudo apt-get install libxinerama1

Make sure you run both commands above!
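If previews still fail, check for other missing shared libraries before guessing package names. ldd lists a binary's shared libraries and marks absent ones with "=> not found". Here is a small Python sketch that runs ldd and collects those names. The helper names are hypothetical, and it assumes ldd is available, as it is on Ubuntu.

```python
import subprocess


def missing_libraries(ldd_output):
    """Return the library names that ldd reports as 'not found'."""
    missing = []
    for line in ldd_output.splitlines():
        # ldd prints missing entries like: "\tlibXinerama.so.1 => not found"
        if "=> not found" in line:
            missing.append(line.split("=>", 1)[0].strip())
    return missing


def check_binary(path):
    """Run ldd on a binary and return its missing shared libraries."""
    result = subprocess.run(["ldd", path], capture_output=True, text=True, check=False)
    return missing_libraries(result.stdout)
```

For the case above, `check_binary("/home/vagrant/alfresco-5.0.1/libreoffice/program/oosplash")` should have listed libXinerama.so.1 before libxinerama1 was installed.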

March 31, 2015

Yosemite Full Screen problem :(

Many people hate Yosemite, but I don't know why. I like Yosemite more than Mavericks. However, Yosemite has a problem with the Maximize (Zoom) button: when you click it, the window goes into full screen mode.

To avoid this, click Zoom while holding the Alt (Option) key.

To maximize Chrome, click Zoom while holding Alt + Shift.

Enjoy Yosemite!

March 30, 2015

Titan DB vs Neo4J

This comparison is outdated, and I think Neo4J has improved a lot over time. I'm still posting it because someone who wants to compare these two technologies can get an idea of the aspects to focus on. If you know something here is outdated, please feel free to point it out in the comments, and I'll update the blog post accordingly.

Feature | Neo4J | Titan
License | GPL/AGPL/Commercial | Apache 2 License
Commercial Support | Available. Advanced: Email 5x10, USD 6,000/yr. Enterprise: Phone 7x24, USD 24,000/yr | Available (prices and availability of support are not published officially)
Graph Type | Property Graph | Property Graph
Storage Backend | Native storage engine | Cassandra, HBase, Berkeley DB. Choose the backend to fit the requirement (e.g. Cassandra for availability and partition tolerance, HBase for consistency and partition tolerance)
ACID Support | Yes; has transactions in the Java API | ACID is supported on the BerkeleyDB storage backend; on Cassandra it is eventually consistent
Scalability | Can't scale out like Titan | Very good scalability; can scale like Cassandra if the storage backend is Cassandra
High Availability | Replication is the only way to get high availability; failover is not smooth | Titan acts like an API layer, so the availability of the storage backend is the availability of the graph database; with Cassandra there is no single point of failure (extremely available)
Query Language | Cypher and Gremlin. Cypher is easy to learn but only suitable for simple queries | Gremlin. Gremlin has good algorithms to retrieve data in an optimal way (and is more generic)
Graph Sharding | Not available, under development | Not available, under development
Supported Languages | Java/.NET/Python/PHP/NodeJS/Scala/Go | Java
Written In | Java | Java
Protocol | HTTP/REST | Can expose REST using Rexster
Use Cases | More than 10 available | 0 use cases exposed officially
Number of Vertices/Edges Supported | 2^35 (~34 billion) nodes (vertices); 2^35 (~34 billion) relationships (edges); 2^36 (~68 billion) properties; 2^15 (~32,000) relationship types | 2^59 vertices; 2^60 (~1 quintillion) edges
Limitations | - | Key index must be created before the key is used; unable to drop key indices; bulk graph operations require Faunus, otherwise storage backends run out of memory; types cannot be changed once created
Web Admin | Available | Not available
Embeddable | Yes | Yes
MapReduce | - | Yes, with Faunus
Lucene Indexing Support | Yes | Yes
Backups | Yes | Yes (+ Titan parallel backup)

This link is also very useful: http://db-engines.com/en/system/Neo4j%3BTitan

March 06, 2015

Alfresco: How to write a simple Java based Alfresco web script?

If you want to develop a new feature for Alfresco, the best way is a web script! Let's start with a simple Alfresco web script. First you need to create an Alfresco AMP Maven project using an archetype. In this example I'll use the latest Alfresco version, 5.0.

First I generated an Alfresco All-in-One AMP project. (Please refer to my blog post on generating AMP projects.)

If you go through the generated file structure, you will find a sample web script. It is a JavaScript-based web script. In this example, I'm going to explain how to write a simple Java-based Hello World web script.

Create the following files in these locations of your Maven project.
  • HelloWorldWebScript.java - repo-amp/src/main/java/org/dedunu/alfresco/HelloWorldWebScript.java
  • helloworld.get.desc.xml - repo-amp/src/main/amp/config/alfresco/extension/templates/webscripts/org/dedunu/alfresco/helloworld.get.desc.xml
  • helloworld.get.html.ftl - repo-amp/src/main/amp/config/alfresco/extension/templates/webscripts/org/dedunu/alfresco/helloworld.get.html.ftl
  • service-context.xml - repo-amp/src/main/amp/config/alfresco/module/repo-amp/context/service-context.xml
Use the command below to run the Maven project.

mvn clean install -Prun 

It may take a while for the project to start. After that, open a browser window and visit the URL below.

http://localhost:8080/alfresco/service/dedunu/helloworld


March 05, 2015

Alfresco: .gitignore for Alfresco Maven Projects

If you are an Alfresco developer, you have to develop projects as Alfresco AMP modules. Alfresco previously used Ant to build projects, but the latest Alfresco SDK uses Apache Maven. AMP Maven projects generate a whole lot of temporary files that you don't want in your version control system.

Nowadays almost everyone is using Git. If I say Git is the most popular version control system today, I hope a lot of people would agree. In Git, you can use a .gitignore file to list the files that should not be added to the repository. Once those patterns are in .gitignore, Git won't commit the unwanted files. For that you need a good .gitignore file. Last year I wrote a blog post that has almost all the file patterns you should omit from a Java project.

Create a file called .gitignore in the root folder of your Git repository, copy the content above into it, and commit that file into your Git repository. Now you don't have to worry about unwanted files.

March 04, 2015

Alfresco: Calculate folder size using Java based WebScript

I was assigned a training task to write a web script that calculates the size of a folder or a file. The catch is that you need to go through all the nodes recursively. If you don't traverse folders recursively, you won't get an accurate folder size.
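The recursion is the heart of the task, so here is a language-neutral sketch of it in Python. A folder's size is its own content size plus the sizes of all its descendants. The Node class is a hypothetical stand-in for a repository node. In the real Java web script, the children would come from Alfresco's repository services, such as the NodeService child associations. That mapping is an assumption here, not the code from the GitHub project.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    """Hypothetical stand-in for a repository node: files have a size, folders have children."""
    name: str
    size: int = 0
    children: List["Node"] = field(default_factory=list)


def total_size(node):
    """Return the node's own content size plus, recursively, that of every descendant."""
    return node.size + sum(total_size(child) for child in node.children)
```

Counting only the direct children would miss everything inside subfolders, which is exactly the inaccuracy mentioned above. For extremely deep trees, an explicit stack would avoid Python's recursion limit, but the recursive form mirrors the idea most directly.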

Requirements:
  • Java Development Kit 1.7 or later
  • Text Editor or IDE (Eclipse/Sublime Text/Atom)
  • Apache Maven 3 or later
  • Web Browser (Chrome/Firefox/Safari)

For this project, I generated an Alfresco 5 All-in-One Maven project. You don't really need the Alfresco Share module in this project, but I included it because you may need to find a NodeRef, which is easier with Share. The source code of this project is available on GitHub.

Create the following files in these locations of your Maven project.
  • size.get.desc.xml - repo-amp/src/main/amp/config/alfresco/extension/templates/webscripts/org/dedunu/alfresco/size.get.desc.xml
  • size.get.html.ftl - repo-amp/src/main/amp/config/alfresco/extension/templates/webscripts/org/dedunu/alfresco/size.get.html.ftl
  • FileSizeWebScript.java - repo-amp/src/main/java/org/dedunu/alfresco/FileSizeWebScript.java
  • service-context.xml - repo-amp/src/main/amp/config/alfresco/module/repo-amp/context/service-context.xml
How to test the web script?
Open a terminal, navigate to the project folder, and type the command below.

mvn clean install -Prun -Dmaven.test.skip

It may take a while to start the Alfresco Repository and Share server. Wait until it has started completely.

Then open a web browser, go to http://localhost:8080/share, log in, and open the Document Library.


Find a folder and click "View Details". Then copy the NodeRef from the browser.


Open a new tab and type the URL below. (Replace <NodeRef> with the NodeRef you copied from the Alfresco Share interface.)


If you have followed the instructions properly, you should see a page showing the calculated size.


If you have any questions regarding this example, please comment!!! Enjoy Alfresco!

January 28, 2015

Do you want Unlimited history in Mac OS X Terminal?

Back in 2013, I wrote a post about making terminal history unlimited. Recently I moved from Linux to Mac OS X, and I wanted unlimited history there too. By default, Mac OS X keeps only 500 entries in history, and new entries replace old ones.

Open a terminal window and type the command below.

open ~/.bash_profile

or 

vim ~/.bash_profile

Most probably you will get an empty file. Add the lines below to that file. If the file is not empty, add them at the end.

export HISTFILESIZE=
export HISTSIZE=

Next, save the file, close the terminal, and open a new terminal window. From now on, your whole history will be stored in the ~/.bash_history file.


January 25, 2015

OpenJDK is not bundled with Ubuntu by default

This is not a technical blog post; it is about a bet. One of my ex-colleagues claimed that OpenJDK is installed on Ubuntu by default. I installed a fresh virtual machine and showed him that it isn't, so I earned pancakes. We went to The Mel's Tea Cafe.


That Cafe on That Day

This was a treat from Jessi (President of BVT). Actually, we earned it by helping her with her coursework. According to her, this was the best place, and we were excited. We planned to go there at 5pm. I was the first to arrive, at around 4pm, and waited for someone else to come. Aliza arrived next, and then we waited for our honorable president.


I like the atmosphere. It was a little bit hard to find That Cafe. You can see Jessi's favorite drink, Ocean Sea Fossil. BVT didn't want to leave the place, and we also decided on the next BVT tour. Wait for the next BVT tour. ;)

Simply Strawberries on 14th Jan

Jessi, Aliza and I went to have strawberry waffles, and all of us wanted them with chocolate sauce. My friend Jessi always wants to take photographs of food, so I got this photo because of her. The waffle was awesome, and I also love the setting there.


This was the beginning of the "Bon Viveur" team, and we decided to go out and try different foods and places more often. Oh my god, I forgot to mention the shop. It is Simply Strawberries. We walked to the place and it was fun!!!

Sunday or Someday on 27th Dec

The three of us wanted to go somewhere. We tried to pick a date, but we couldn't, so finally we just agreed to go out on Sunday. We went to Lavinia Breeze and had fun. We were acting like kids, screaming and laughing. We don't mind what others think. That's us!!!


Then we went to Majestic City Cinema to watch The Hobbit, and we laughed like idiots when we were supposed to be serious. ;) Finally, we went to Elite Indian Restaurant.



Good Best friends!!! :D

The Sizzle on 17th Dec

Recently I started visiting places with my friends and enjoying myself. So last month, I went to The Sizzle with one of my best friends. The receptionist asked, "Table for two?" and I nodded. He brought the two of us to a table for two, which was a little bit embarrassing. But the food was good. This was the second time I visited "The Sizzle".


And this Sizzle visit will be remarkable. ;)

January 23, 2015

How to run Alfresco Share or Repository AMP projects

In the previous post, I explained how to generate an Alfresco AMP project using Maven. When you have an AMP project, you can run it by deploying it to an existing Alfresco Repository or Share. But as a developer, you will not find that an effective way to run Alfresco modules. The other way is to run the AMP project using the Maven plug-in.

In this post, I'm not going to talk about the first method. As I said earlier, we can run an Alfresco instance using Maven. To do that, move to the Alfresco AMP project folder in your terminal and run the command below.

mvn clean package -Pamp-to-war

It may take a while. If you are running this command for the first time, it will download the Alfresco binaries into your local Maven repository. If you run an instance again, your previous changes will still be available on that Alfresco instance.

If you want to discard all the previous data, use below command.

mvn clean package -Ppurge -Pamp-to-war

The command above discards all the changes and data and starts a fresh instance.

Enjoy Alfresco Development!!!

January 20, 2015

How to generate Alfresco 5 AMP project

Recently I have been working as an Alfresco developer. When you are developing Alfresco modules, you need a proper project with the correct directory structure. Since Alfresco uses Maven, you can generate an Alfresco 5 AMP project using an archetype.

First you need Java and Maven installed on your Linux/Mac/Windows computer. Then run the command below to start the project.

mvn archetype:generate -DarchetypeCatalog=http://repo1.maven.org/maven2/archetype-catalog.xml -Dfilter=org.alfresco:

You will then see the following output.

[INFO] Scanning for projects...
[INFO]                                                                         
[INFO] ------------------------------------------------------------------------
[INFO] Building Maven Stub Project (No POM) 1
[INFO] ------------------------------------------------------------------------
[INFO] 
[INFO] >>> maven-archetype-plugin:2.2:generate (default-cli) > generate-sources @ standalone-pom >>>
[INFO] 
[INFO] <<< maven-archetype-plugin:2.2:generate (default-cli) < generate-sources @ standalone-pom <<<
[INFO] 
[INFO] --- maven-archetype-plugin:2.2:generate (default-cli) @ standalone-pom ---
[INFO] Generating project in Interactive mode
[INFO] No archetype defined. Using maven-archetype-quickstart (org.apache.maven.archetypes:maven-archetype-quickstart:1.0)
Choose archetype:
1: http://repo1.maven.org/maven2/archetype-catalog.xml -> org.alfresco.maven.archetype:alfresco-allinone-archetype (Sample multi-module project for All-in-One development on the Alfresco plaftorm. Includes modules for: Repository WAR overlay, Repository AMP, Share WAR overlay, Solr configuration, and embedded Tomcat runner)
2: http://repo1.maven.org/maven2/archetype-catalog.xml -> org.alfresco.maven.archetype:alfresco-amp-archetype (Sample project with full support for lifecycle and rapid development of Repository AMPs (Alfresco Module Packages))
3: http://repo1.maven.org/maven2/archetype-catalog.xml -> org.alfresco.maven.archetype:share-amp-archetype (Share project with full support for lifecycle and rapid development of AMPs (Alfresco Module Packages))

Choose a number or apply filter (format: [groupId:]artifactId, case sensitive contains): :

Now you have three options to choose from.
  1. All-in-One (This includes the Repository module, Share module, Solr configuration and Tomcat runner. It is a one-stop solution for Alfresco development, but I don't recommend it for beginners.)
  2. Alfresco Repository Module (This will generate AMP for Alfresco Repository.)
  3. Alfresco Share Module (This will generate AMP for Alfresco Share.)
Choose a number or apply filter (format: [groupId:]artifactId, case sensitive contains): : 2
Choose org.alfresco.maven.archetype:alfresco-amp-archetype version: 
1: 2.0.0-beta-1
2: 2.0.0-beta-2
3: 2.0.0-beta-3
4: 2.0.0-beta-4
5: 2.0.0
Choose a number: 5: 

In this example I used the Alfresco Repository Module. Then it prompts for the SDK version; by pressing Enter you get the latest (default) SDK version. Then Maven prompts for the groupId and artifactId. Please provide suitable IDs for them.

Define value for property 'groupId': : org.dedunu
Define value for property 'artifactId': : training
[INFO] Using property: version = 1.0-SNAPSHOT
[INFO] Using property: package = (not used)
[INFO] Using property: alfresco_target_groupId = org.alfresco
[INFO] Using property: alfresco_target_version = 5.0.c
Confirm properties configuration:
groupId: org.dedunu
artifactId: training
version: 1.0-SNAPSHOT
package: (not used)
alfresco_target_groupId: org.alfresco

alfresco_target_version: 5.0.c
 Y: : 

Then Maven asks you to confirm the properties, including your target Alfresco version. At the moment the latest Alfresco version is 5.0.c. If you hit Enter, it will continue with the latest version; otherwise you can customize the target Alfresco version. Then it will generate a Maven project for Alfresco.

[INFO] ----------------------------------------------------------------------------
[INFO] Using following parameters for creating project from Archetype: alfresco-amp-archetype:2.0.0
[INFO] ----------------------------------------------------------------------------
[INFO] Parameter: groupId, Value: org.dedunu
[INFO] Parameter: artifactId, Value: training
[INFO] Parameter: version, Value: 1.0-SNAPSHOT
[INFO] Parameter: package, Value: (not used)
[INFO] Parameter: packageInPathFormat, Value: (not used)
[INFO] Parameter: package, Value: (not used)
[INFO] Parameter: version, Value: 1.0-SNAPSHOT
[INFO] Parameter: groupId, Value: org.dedunu
[INFO] Parameter: alfresco_target_version, Value: 5.0.c
[INFO] Parameter: artifactId, Value: training
[INFO] Parameter: alfresco_target_groupId, Value: org.alfresco
[INFO] project created from Archetype in dir: /Users/dedunu/Documents/workspace/training
[INFO] ------------------------------------------------------------------------
[INFO] BUILD SUCCESS
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 08:33 min
[INFO] Finished at: 2015-01-19T23:58:38+05:30
[INFO] Final Memory: 14M/155M

[INFO] ------------------------------------------------------------------------