Tuesday, 28 August 2012

Installing Graphite on Solaris 10 from scratch leveraging OpenCSW

Some install notes on getting Graphite 0.9.10 ( http://graphite.wikidot.com ) set up on a stock Solaris 10 install using CSW. This expands a bit on the documentation here: http://graphite.wikidot.com/installation , hopefully it's useful.
Note: this guide was written in July 2012

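# Install pkgutil / csw
# (the usual OpenCSW bootstrap; check the OpenCSW site in case this has changed)
pkgadd -d http://get.opencsw.org/now
/opt/csw/bin/pkgutil -U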

# Install the required CSW pkgs
/opt/csw/bin/pkgutil -i CSWlibcairo
/opt/csw/bin/pkgutil -i CSWpython
/opt/csw/bin/pkgutil -i CSWpython-dev
/opt/csw/bin/pkgutil -i CSWgit
/opt/csw/bin/pkgutil -i CSWpycairo
/opt/csw/bin/pkgutil -i CSWapache2
/opt/csw/bin/pkgutil -i CSWpysetuptools
/opt/csw/bin/pkgutil -i CSWsqlite
/opt/csw/bin/pkgutil -i CSWpy-django
/opt/csw/bin/pkgutil -i CSWpy-zope-interface
/opt/csw/bin/pkgutil -i CSWpy-twisted
/opt/csw/bin/pkgutil -i ap2_modwsgi

# Optional
/opt/csw/bin/pkgutil -i CSWgcc3 # only necessary if you plan to rebuild components below yourself.
/opt/csw/bin/pkgutil -i CSWpy-ldap # optional

# Install PIP
/opt/csw/bin/easy_install pip

# Install Python modules via PIP
/opt/csw/bin/pip install python-memcached
/opt/csw/bin/pip install django-tagging
/opt/csw/bin/pip install txamqp

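# Now we go and get graphite itself (finally)
mkdir /tmp/graphite
cd /tmp/graphite
# Clone the three repos first (URLs assumed: the graphite-project repos on GitHub)
/opt/csw/bin/git clone https://github.com/graphite-project/graphite-web.git
/opt/csw/bin/git clone https://github.com/graphite-project/carbon.git
/opt/csw/bin/git clone https://github.com/graphite-project/whisper.git
cd /tmp/graphite/graphite-web && /opt/csw/bin/git checkout 0.9.10
cd /tmp/graphite/carbon && /opt/csw/bin/git checkout 0.9.10
cd /tmp/graphite/whisper && /opt/csw/bin/git checkout 0.9.10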

cd /tmp/graphite
pushd whisper
/opt/csw/bin/python setup.py install
popd

cd /tmp/graphite
pushd carbon
/opt/csw/bin/python setup.py install
popd

cd /tmp/graphite
pushd graphite-web
/opt/csw/bin/python setup.py install
popd

# CONFIGURE
# obviously there are lots of ways to do this, below is just a very quick way to get going.
# I strongly suggest you give this some consideration yourself.
# Worth reading through this doc:
#
cd /opt/graphite/conf
cp dashboard.conf.example dashboard.conf

cp carbon.conf.example carbon.conf
cp storage-schemas.conf.example storage-schemas.conf
cp graphite.wsgi.example graphite.wsgi
cp graphTemplates.conf.example graphTemplates.conf

cd /opt/graphite/webapp/graphite 
cp local_settings.py.example local_settings.py
EDIT: local_settings.py
SET: TIME_ZONE = 'Europe/London' (or whatever else is appropriate)

cd /opt/graphite/webapp/graphite
/opt/csw/bin/python manage.py syncdb

EDIT "/opt/csw/apache2/etc/httpd.conf" uncomment "Include etc/extra/httpd-vhosts.conf" , remove default DocumentRoot and Virtual Server for /

cat "/opt/graphite/examples/example-graphite-vhost.conf" >> /opt/csw/apache2/etc/extra/httpd-vhosts.conf
EDIT "/opt/csw/apache2/etc/extra/httpd-vhosts.conf"
CHANGE "LoadModule wsgi_module modules/mod_wsgi.so" to "LoadModule wsgi_module libexec/mod_wsgi.so"
CHANGE the WSGISocketPrefix line to "WSGISocketPrefix /tmp/"
EDIT "/opt/graphite/webapp/graphite/settings.py" change:
- DATABASE_ENGINE = 'django.db.backends.sqlite3' # 'postgresql', 'mysql', 'sqlite3' or 'ado_mssql'.
+ DATABASE_ENGINE = 'sqlite3'     # 'postgresql', 'mysql', 'sqlite3' or 'ado_mssql'.

# at your own risk. There are better ways to permission things
chown nobody:nobody /opt/graphite/storage/graphite.db
chmod 766 /opt/graphite/storage/log/webapp/ 
chown root:nobody /opt/graphite/storage
chmod 775 /opt/graphite/storage
chmod 775 /opt/graphite/storage/*

# enable the webserver
svcadm enable cswapache2

# start carbon
/opt/graphite/bin/carbon-cache.py start
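
A quick way to check that carbon is actually accepting data is to push one datapoint at its plaintext listener. This is only a minimal sketch, assuming carbon-cache is listening on its default line receiver port 2003 and that nc is available (it is not part of a stock Solaris 10 install). The metric name and value are made up for illustration.

```shell
# Format one datapoint for carbon's plaintext listener:
#   "<metric path> <value> <timestamp in epoch seconds>"
carbon_line() {
    # $1 = metric path, $2 = value, $3 = epoch seconds
    printf '%s %s %s\n' "$1" "$2" "$3"
}

# Hypothetical metric with a fixed timestamp, just to show the format.
carbon_line test.solaris.install 42 1346112000

# To really send it (assumption: nc installed, carbon on localhost:2003).
# Solaris 10's /usr/bin/date has no %s, so perl supplies the epoch time:
#   carbon_line test.solaris.install 42 "$(perl -e 'print time')" | nc localhost 2003
```

If the datapoint arrives, a matching .wsp file should appear under /opt/graphite/storage/whisper/ shortly afterwards.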

Monday, 27 August 2012

Screwing up Solaris linker with crle

DISCLAIMER: DO NOT BLINDLY APPLY ANY CONFIGS IN THIS POST BEFORE FULLY UNDERSTANDING THE IMPACT. YOU HAVE BEEN WARNED!!

-- At the risk of ruining the punchline: use LD_NOCONFIG if you mess up your runtime linker path using crle on Solaris --

I was setting up a software package provided to me which wanted to install in /opt/environmentname/packagename/[etc, bin, lib, log] and so forth. It was compiled in a way which required the linker to know to search /opt/environmentname/packagename/lib .

So the basic steps were

  1. install software
  2. setup linker
  3. test
Here's what I did:
  • Install software "pkgadd -d packagename"
  • export LD_LIBRARY_PATH=/opt/environmentname/packagename/lib
  • /opt/environmentname/packagename/bin/packagename
  • SUCCESS
But then I thought: isn't LD_LIBRARY_PATH evil? And isn't this software going to be on the majority of my machines? Let's go ahead and update the system-wide library search path.
  • Did a quick man crle and found the -l option. You see where this is going, right?
  • export LD_LIBRARY_PATH=""
  • sudo crle -l /opt/environmentname/packagename/lib
  • /opt/environmentname/packagename/bin/packagename # Once again SUCCESS!
  • ls however is now giving me "ld.so.1: ls: fatal: libc.so.1: open failed: No such file or directory" , EPIC FAIL!
It quickly becomes apparent that crle -l overwrites the runtime linker search path rather than appending to it.. Let's revert to something sane.

~> sudo crle -l /lib -l /usr/lib -l /usr/sfw/lib
ld.so.1: sudo: fatal: libc.so.1: open failed: No such file or directory

Uh oh... sudo needs a correct environment to do its thing.. That's OK, I'll just update LD_LIBRARY_PATH so sudo can do its thing.

~> export LD_LIBRARY_PATH=/lib:/usr/lib:/usr/sfw/lib 
~> sudo crle -l /lib -l /usr/lib -l /usr/sfw/lib 
ld.so.1: sudo: fatal: libc.so.1: open failed: No such file or directory

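Ahh yes.. sudo is a setuid binary, and for those the runtime linker only honours LD_LIBRARY_PATH entries that are trusted directories (sudo itself also filters most environment variables before running a command).. So how do I fix things now? LD_NOCONFIG is your friend.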

~> export LD_NOCONFIG=1
~> sudo crle -l /lib -l /usr/lib -l /usr/sfw/lib
~> ls 
.    ..

Woohoo, it works again! Now I just need to go back and set the search path I originally wanted to, but this time append (-u together with -l) rather than overwrite.
~> sudo crle -u -l /opt/environmentname/packagename/lib
~> /opt/environmentname/packagename/bin/packagename # SUCCESS!
From the man page:
man ld.so.1 
...      
LD_NOCONFIG, LD_NOCONFIG_32, and LD_NOCONFIG_64 

         By default the runtime linker attempts to open and process a configuration file. When LD_NOCONFIG is set to any non-null value, the runtime linker disables this configuration file processing.
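
One way to avoid repeating the mistake is to refuse any replacement search path that drops the system directories. The sketch below is only an illustration: the function names are made up, it only looks at the 32-bit defaults /lib and /usr/lib, and it prints the crle command for review instead of running it.

```shell
# Succeeds only if the directory list still contains /lib and /usr/lib.
keeps_system_dirs() {
    have_lib=no
    have_usrlib=no
    for d in "$@"; do
        case "$d" in
            /lib)     have_lib=yes ;;
            /usr/lib) have_usrlib=yes ;;
        esac
    done
    [ "$have_lib" = yes ] && [ "$have_usrlib" = yes ]
}

# Print (do not run) a crle command line for the given directories.
crle_command() {
    if keeps_system_dirs "$@"; then
        cmd="crle"
        for d in "$@"; do
            cmd="$cmd -l $d"
        done
        echo "$cmd"
    else
        echo "refusing: list drops /lib or /usr/lib" >&2
        return 1
    fi
}

crle_command /lib /usr/lib /opt/environmentname/packagename/lib
```

Running crle with no arguments first shows the current configuration, which is worth saving somewhere before changing anything.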


Wednesday, 2 November 2011

Automount differences in Linux and Solaris.

Posting this even though it's been stuck in my drafts for nearly 2 years. :)
--
So I'm part of a project team set up to roll out Linux where I work. We've had Linux before but it never really worked well for us, and at this stage we're running a RHEL3 build which is hopelessly out of date. We've been tasked to roll out something a bit more modern and it looks like CentOS 5.x will be what we'll go for...

So we've always been a Solaris house, and one of our tasks with this project is to get our Solaris and Linux environments to work together as closely as reasonably possible.

This post is about some strangeness I found with autofs in Linux vs Solaris 10..

So take this automount map snippet for example.
#auto.mymap
mymount          localhost:/path/to/mymount      serveronsamesubnet:/path/to/mymount        serveronremotesubnet:/path/to/mymount

This sort of map works as it should, in a consistent way, on Solaris 10 / CentOS 5.2... First of all it parses the map and sees that the localhost entry is the closest in proximity to this server, ie: PROXIMITY_LOCAL (since it's the same machine), and prioritizes that for mount. ie: if it is available it is the only mount that can be chosen, regardless of weightings or proximity of the other mounts.

Next we might have something like this where servers are weighted using a (number) next to the server name.
#auto.mymap
mymount          server1onsamesubnet(1):/path/to/mymount        server2onsamesubnet(10):/path/to/mymount

This time automount will look at these weightings and use them as a scaling factor which influences which server is chosen to mount from.. So in this case it will determine the proximity as being PROXIMITY_SUBNET (as both servers are on the same subnet), next it will do some RPC tests to figure out which server is most responsive [1]; this is the cost... Finally the cost is multiplied by (weight given + 1) to give a compound cost. The lowest cost server is chosen for mount.
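
To make the arithmetic concrete, here is a small sketch of the compound cost calculation as described above. The response times are hypothetical numbers picked for illustration, the function name is made up, and this only mirrors the description in this post rather than the actual autofs code.

```shell
# compound cost = measured cost * (weight + 1)
compound_cost() {
    # $1 = measured cost (e.g. response time in ms), $2 = weight from the map
    echo $(( $1 * ($2 + 1) ))
}

# Weights 1 and 10 as in the map; the costs 4 and 3 are invented.
cost1=$(compound_cost 4 1)    # first server, weight 1
cost2=$(compound_cost 3 10)   # second server, weight 10

if [ "$cost1" -le "$cost2" ]; then
    echo "first server wins ($cost1 vs $cost2)"
else
    echo "second server wins ($cost2 vs $cost1)"
fi
```

So even though the second server answered slightly faster in this made-up case, its higher weight pushes its compound cost well above the first server's.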

Thus far Linux / Solaris appear to exhibit the same behaviour.

Now what happens if we have something like this:
#auto.mymap
mymount          localhost:/path/to/mymount        localhost:/path2/to/mymount

This might look a bit strange, but it can make sense in the case where one of those localhost paths is in turn a mount to network attached storage. For example, maybe on most machines you're happy for your app to scratch over NFS to a NetApp which has an index of all its mounts on /netapp... But maybe on some hosts you want faster block-level local storage.

Well anyway, in this case Linux and Solaris behave differently in how they interpret the map. Owing to the way Linux (RHEL 5.3) interprets the map it will pick the first localhost mount from right-to-left, whereas in Solaris land it picks the first from left-to-right.

[1] - RPC tests appear to first of all try an RPC call to the remote NFS server with a 0.1 second timeout, recording which server responds in the most timely fashion. If this fails the RPC test is repeated using a 10 second timeout... Finally, if this test fails, automount attempts to mount the first server on the list.

Remote Tech Support warning in VMware ESXi

Enabling remote tech support in VMware is flagged as a configuration issue and will display as a warning level alert on each node which has it enabled.

To suppress this you can follow this article:
http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=1016205

ie:
for each node you want to do this on go to Configure -> Advanced Settings -> User Vars
Set UserVars.SuppressShellWarning = 1

All sorted.

Wednesday, 22 September 2010

Verizon Business support is really awful.

So we've got a Verizon supplied internet connection.

Given a number to call for BGP related things, so called this number, put it on speaker-phone as I waited, and was stuck on hold for 40 minutes before I gave up, with no opportunity to request a call back... Just apology messages and hold music... /me gets a tad frustrated

I know what I'll do... I'll log a trouble ticket and hopefully that'll poke them into giving me a call back. So the trouble ticket gets logged, and I'm promised a call back within 4 hours. I'm advised that if 4 hours pass, I should give them a call back.

Guess what? Over 4 hours pass. I call back Verizon support, politely inquiring what's happened and why I haven't been contacted... Advised that a tech is working on the case currently, he just hasn't contacted me.. So I request the call is transferred to the tech.. Guess what... He's not available. The person at the other end of the phone says he doesn't get into the office for another 5 hours!! At this I really do have to laugh.

So I request that the case is transferred to another tech, one who is in the office... Support goes OK. I ask support when I can expect a call back... It'll be another 4 hours..

/me sighs... All I'm trying to do is get somebody to flick a switch... Ask support, any chance of a sooner callback? Line gets cut off...

So that's over an hour wasted on the phone to Verizon today, with no sense of what's being done about the ticket, BGP peering still not sorted, and no idea of when I'm getting a callback... And if I decide to make the investment of making another call to support, it takes a minimum of 2 minutes and 12 seconds to listen to all their advertising and mash the keypad enough to get through to an actual person...

Verizon, by quite a large margin, you truly do have the worst support of any vendor I have the pleasure of working with.


Rant over.....

Friday, 27 August 2010

Cisco Archive config.

Setting up a new infrastructure and their switches have an archival config like this, meaning that the config is written to the nominated FTP path every time a user issues the "wr mem" command, and otherwise every 12 hours if there are changes. The "path" uses 2 variables, namely $h == hostname of switch and $t == time of archive upload.

archive
 log config
  logging enable
  notify syslog contenttype plaintext
  hidekeys
 path ftp://windows-ftp-server/$h-$t
 write-memory
 time-period 720


Noticed however that this was not working on 2 out of 3 switches... When I looked into it, it turned out to be an issue only when the $t variable was used.

After a quick pcap of what was going on (Cisco's logs didn't give any info about this) I found the format of the time specified by $t on 2 of the switches was like this "Aug-27-07:37:06.147" while on the other it was in the format "Aug-27-12-01-16.548"... This was preventing the upload from working on 2 switches, as there were colons (:) in the file name, which aren't a valid character in a Windows filename. Given a Windows FTP server was in use, the upload failed. :(
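
The difference between the two formats is easy to check by hand. A minimal sketch, with a made-up function name and a hypothetical switch hostname "switch1", that tests an archive name against the characters Windows does not allow in file names (\ / : * ? " < > |):

```shell
# Succeeds if the name contains none of the characters Windows forbids.
is_windows_safe_name() {
    case "$1" in
        *\\*|*/*|*:*|*\**|*\?*|*\"*|*\<*|*\>*|*\|*) return 1 ;;
        *) return 0 ;;
    esac
}

# The two $t formats seen on the wire, prefixed with a hypothetical hostname.
for stamp in "Aug-27-07:37:06.147" "Aug-27-12-01-16.548"; do
    if is_windows_safe_name "switch1-$stamp"; then
        echo "ok:  switch1-$stamp"
    else
        echo "bad: switch1-$stamp"
    fi
done
```

The colon-separated format is rejected, while the dash-separated one from the 12.2(52) image passes.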

I noticed that this was an issue on :
- c3750-advipservicesk9-mz.122-44.SE.bin
- c3750e-universalk9-mz.122-50.SE3.bin

But not on:
- c3750-ipservicesk9-mz.122-52.SE.bin

Upgraded all switches to 12.2(52) and things are looking better now.

Thursday, 24 June 2010

Manage Cisco ASA over vpn connection

ARGH.. this drove me mad.. A surprisingly simple / common thing people may want to do is
- set up a Cisco ASA device (in this case a 5505) at a satellite office
- establish a VPN over the internet from this device to the main office
- manage device remotely via this VPN link.

In this case all that was required, in addition to the vpn setup was:
ssh 0.0.0.0 0.0.0.0 inside
ssh timeout 30
ssh version 2
management-access inside

The key being "management-access inside". Just thought I'd post about it because it drove me mad, and it's so easy when you know how.

Here's a link:
https://www.cisco.com/en/US/docs/security/pix/pix63/command/reference/mr.html#wp1137951