Thursday, March 06, 2014

 

Improving Performance with Zeos and PostgreSql


Last month I was asked whether one of our client-server applications could run from a remote site over the Internet. Sure, why not? Just open a port in your router, point the connection's IP to the external address, and done.

A couple of days later, the same customer called again, telling me that the application was working, but slow as a turtle, even on a 2Mb link. Ok, I closed Hacker News and started thinking about how to debug this.

Logging PostgreSql queries


The first thing I did was get a backup of that customer's database (well, a very old one, because it's now 0.5Tb) and restore it on my PostgreSql 9.2 instance. Since I wanted to log everything, I needed just one client connected; otherwise I'd get mixed queries from different PCs.

To log everything just open your postgresql.conf and set this:

logging_collector = on 
log_min_duration_statement = 0 
log_statement = all

Save and restart the postgres service; on Debian based systems just do "sudo service postgresql restart". Then start using the application and go to the pg_log directory, by default /var/lib/postgresql/9.2/main/pg_log, where you'll see postgresql-YYYY-MM-DD_######.log files.

Ok, once logging_collector was set up, I opened my application, went to one option that just lists data (a grid), double clicked on a row to open a dialog containing further data, then closed the application. I thought just a couple of queries would be enough for this task, but I was way wrong.

Let me tell you something about this app. It's an old application started by a small team seven years ago. Initially the team was very design-patterns focused (remember the time when Java was everywhere? We used Delphi, but the Java way). After the initial years we kept adding features, the team changed over time, and the quality control disappeared (after the experience I'm writing about here, things will change).

Going back to the main topic: what I found in the log files was that the app was doing not two queries, but forty, yes 40! And that wasn't the only strange thing I found in the logs; there were queries we didn't write, and some warnings I had to get rid of (I'll explain this at the end of this article).

The first step in improving this was to remove the unwanted queries, which looked like this:

SELECT
n.nspname,c.relname,a.attname,a.atttypid,a.attnotnull,a.atttypmod,a.attlen,a.attnum,pg_get_expr(def.adbin,
def.adrelid) as adsrc,dsc.description  FROM pg_catalog.pg_namespace n  JOIN
pg_catalog.pg_class c ON (c.relnamespace = n.oid)  JOIN pg_catalog.pg_attribute
a ON (a.attrelid=c.oid)  LEFT JOIN pg_catalog.pg_attrdef def ON
(a.attrelid=def.adrelid AND a.attnum = def.adnum) LEFT JOIN
pg_catalog.pg_description dsc ON (c.oid=dsc.objoid AND a.attnum = dsc.objsubid)
LEFT JOIN pg_catalog.pg_class dc ON (dc.oid=dsc.classoid AND
dc.relname='pg_class') LEFT JOIN pg_catalog.pg_namespace dn ON
(dc.relnamespace=dn.oid AND dn.nspname='pg_catalog')  WHERE a.attnum > 0 AND
NOT a.attisdropped AND c.relname LIKE E'sysconfig' AND a.attname LIKE E'%'
ORDER BY nspname,relname,attnum


I asked in the Zeos Forum and was told that TZConnection has a UseMetadata property set to True by default, so setting it to False fixed the issue. Now the query count was about 30, roughly 25% fewer.
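In code, the change is a one-liner at connection setup (the component name here is illustrative; UseMetadata is the property reported in the Zeos forum):

```pascal
// Stop Zeos from issuing pg_catalog metadata queries when
// datasets open. ZConnection is a TZConnection instance;
// UseMetadata defaults to True.
ZConnection.UseMetadata := False;
ZConnection.Connect;
```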

The next step was to look for repeating patterns in the log file, and I found many queries similar to these:

select varvalue from config where varname='REFRESH_TIME';
select varvalue from config where varname='MAX_DISPLAY_ROWS';
...


As you can see, we store configuration values in the "config" table and query values on demand. This sounds ok, but is it a smart way of doing it? Do config params change in real time? Surely not; there must be a better way.

Luckily all the configuration was encapsulated in a class, with properties like TConfig.VarValue, so all the changes to be made were there, inside this class.

The solution to this problem was to maintain a cache of variables. We added an internal TCollection that is loaded with all the config when the application starts, then added a LastAccessTime property containing the last time a variable was accessed; if more than 10 minutes have passed, we re-load the collection.

This single change removed more than 15 queries; now there were only 15 queries left to improve.
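A minimal sketch of the caching approach described above (class and unit names are illustrative, not our actual TConfig, and the database load is stubbed out):

```pascal
unit configcache;

{$mode objfpc}{$H+}

interface

uses
  SysUtils, Classes;

type
  { Caches all rows of the "config" table in memory and reloads
    them when the cache is older than 10 minutes. }
  TConfigCache = class
  private
    FValues: TStringList;       // varname=varvalue pairs
    FLastAccessTime: TDateTime; // when the cache was last (re)loaded
    procedure ReloadFromDatabase;
  public
    constructor Create;
    destructor Destroy; override;
    function GetValue(const AVarName: string): string;
  end;

implementation

constructor TConfigCache.Create;
begin
  FValues := TStringList.Create;
  ReloadFromDatabase;
end;

destructor TConfigCache.Destroy;
begin
  FValues.Free;
  inherited Destroy;
end;

procedure TConfigCache.ReloadFromDatabase;
begin
  // In the real class this fills FValues from a TZQuery running
  // "select varname, varvalue from config"; omitted here.
  FLastAccessTime := Now;
end;

function TConfigCache.GetValue(const AVarName: string): string;
begin
  // Re-load everything if the cache is more than 10 minutes old.
  if Now - FLastAccessTime > EncodeTime(0, 10, 0, 0) then
    ReloadFromDatabase;
  Result := FValues.Values[AVarName];
end;

end.
```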

Our application has a very neat feature, per-grid column configuration, which allows an administrator to configure which columns (and their titles, widths, etc.) must be displayed. This information was loaded every time a grid was refreshed: once when the window is opened, then every 30 seconds. Usually each screen has one main grid and two or three subgrids, each with this functionality, all of them refreshed at the same time. We removed the re-loading at refresh time, eliminating three queries (for this screen only); now there were 12 queries left.

Some time ago, we implemented "pagination" in our grids to limit the number of records retrieved from the database; as some queries loaded more than 10k records, this was a must-have for us. We implemented it like this:

1º Given a query, for example "select * from customers", first we needed the total count:

   select count(*) from
      (select * from customers) as foo;


   This allowed us to get the count of any query, very neat at the time.

2º Execute the real query with a Limit and Offset:

   select * from customers limit 100 offset 100;

This way, we used two queries to display "XXXX records of YYYYY". It was a big improvement over what we had initially, but it forced us to use two queries.

Since PostgreSql 8.4, a very neat feature called Window Functions has been available, allowing tasks that were impossible in former versions. One of those tasks is getting the total record count even when a "limit" is imposed, for example:

   select
      count(*) over() as Total_Count,
      c.*
   from customers c
   limit 100 offset 100;


After removing the old behavior, we dropped the count query from each grid. On the screen we used for logging we have three grids, hence 3 queries were removed. Now only 9 queries were made.
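On the client side the total now travels with the data itself, so displaying "XXXX records of YYYYY" takes a single query; roughly like this (component and field names are illustrative):

```pascal
// Every row of the paginated resultset carries the same
// Total_Count value, so read it from the first record.
Query.SQL.Text :=
  'select count(*) over() as Total_Count, c.* ' +
  'from customers c limit :lim offset :off';
Query.ParamByName('lim').AsInteger := 100;
Query.ParamByName('off').AsInteger := 0;
Query.Open;
if Query.IsEmpty then
  TotalRecords := 0
else
  TotalRecords := Query.FieldByName('Total_Count').AsInteger;
StatusBar.SimpleText :=
  Format('%d records of %d', [Query.RecordCount, TotalRecords]);
```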

The last step didn't remove a query, but it allowed us to speed up those involving Blob fields, and let us get rid of some nasty warnings.

Looking at the log files, we found many Warnings similar to this:

2014-02-25 09:53:57 ART WARNING:  nonstandard use of \\ in a string literal at character 42
2014-02-25 09:53:57 ART HINT:  Use the escape string syntax for backslashes, e.g., E'\\'.
2014-02-25 09:53:57 ART STATEMENT:  update ....'


This tells us the driver is escaping everything with backslashes; since our blob fields were encoded, they included lots of backslashes. If we got rid of this escaping, the data sent over the wire would be significantly smaller.

Again, asking in the Zeos Forum, I was told what to do: just add doPreferPrepared to the Options of your TZQuery objects, and done! This only works if you use parametrized queries; luckily we use them in all of our queries.
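The change itself is again tiny; a sketch, assuming your data access goes through TZQuery instances:

```pascal
// Prefer server-side prepared statements; with parametrized
// queries, blob parameters no longer travel as escaped string
// literals, which removes the backslash warnings.
Query.Options := Query.Options + [doPreferPrepared];
```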

Final words


The optimization process didn't finish here, but, to avoid making the customer wait any longer, we branched the app and delivered this quicker version. I must admit we are very happy with the results. Surely we could have created a web version, but as our customer's budget is thin, and they needed to use the application in no more than two weeks, I think we did the right thing.


Monday, September 09, 2013

 

Lazarus + PostgreSql's JSON (your secret weapon)


Hi, today I read the news about the new PostgreSql 9.3 and, as usual, went directly to the Release Announcement. Pretty impressive stuff. The part that caught my attention was the JSON support, so I read a little more here and couldn't resist doing some tests.

So, the first thing was installing the brand new version on my Ubuntu 12.04:


1) Add this line to /etc/apt/sources.list

deb http://apt.postgresql.org/pub/repos/apt/ precise-pgdg main

2) Execute this command:

wget --quiet -O - https://www.postgresql.org/media/keys/ACCC4CF8.asc | \
sudo apt-key add -

3) sudo apt-get update
4) sudo apt-get install postgresql-9.3



Now, let's create a sample database; in this case we'll use the booktown example database (the SQL script is linked below).

Scriptable version:

wget http://www.commandprompt.com/ppbook/booktown.sql
psql -h 127.0.0.1 -U postgres < booktown.sql

Now, please take a look at the database using PgAdmin III or directly with the command line tool psql.

The database has many tables related to books. I'll create a very simple query to get all the books and their authors:

select *
from books b
join authors a on b.author_id=a.id


Now the fun part begins. To get the results as JSON, we'll modify the former query in two steps.

The first step is to replace the "select *" with an explicit field list:

select
  b.id as book_id,
  title,
  last_name,
  first_name
from books b
join authors a on b.author_id=a.id


The second step is to slightly modify the query, wrapping it in the "json_agg" function:

select json_agg(resultset) as json from (
select
  b.id as book_id,
  title,
  last_name,
  first_name
from books b
join authors a on b.author_id=a.id) as resultset


This returns just one string containing our json:

"[{"book_id":41472,"title":"Practical PostgreSQL","last_name":"Worsley","first_name":"John"},
 {"book_id":25908,"title":"Franklin in the Dark","last_name":"Bourgeois","first_name":"Paulette"},
 {"book_id":1234,"title":"The Velveteen Rabbit","last_name":"Bianco","first_name":"Margery Williams"},
 {"book_id":190,"title":"Little Women","last_name":"Alcott","first_name":"Louisa May"},
 {"book_id":7808,"title":"The Shining","last_name":"King","first_name":"Stephen"},
 {"book_id":4513,"title":"Dune","last_name":"Herbert","first_name":"Frank"},
 {"book_id":2038,"title":"Dynamic Anatomy","last_name":"Hogarth","first_name":"Burne"},
 {"book_id":1501,"title":"Goodnight Moon","last_name":"Brown","first_name":"Margaret Wise"},
 {"book_id":156,"title":"The Tell-Tale Heart","last_name":"Poe","first_name":"Edgar Allen"},
 {"book_id":41477,"title":"Learning Python","last_name":"Lutz","first_name":"Mark"},
 {"book_id":41473,"title":"Programming Python","last_name":"Lutz","first_name":"Mark"},
 {"book_id":41478,"title":"Perl Cookbook","last_name":"Christiansen","first_name":"Tom"},
 {"book_id":4267,"title":"2001: A Space Odyssey","last_name":"Clarke","first_name":"Arthur C."},
 {"book_id":1590,"title":"Bartholomew and the Oobleck","last_name":"Geisel","first_name":"Theodor Seuss"},
 {"book_id":1608,"title":"The Cat in the Hat","last_name":"Geisel","first_name":"Theodor Seuss"}]"


Now, from your Lazarus/fpc program you can easily get the JSON string.

One possible usage that comes to my mind is a CGI app that returns a resultset to an ExtJS grid. The CGI just executes the query and returns the string in the TResponse handler with ContentType "application/json".
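A minimal sketch of such a CGI action handler, assuming a WebLaz TFPWebModule action and an already-configured TZQuery (the module, action, and query names are illustrative):

```pascal
procedure TBooksModule.BooksJSONRequest(Sender: TObject; ARequest: TRequest;
  AResponse: TResponse; var Handled: Boolean);
begin
  // Wrap the select shown above in json_agg, so the whole
  // resultset comes back as a single string field.
  qryBooks.SQL.Text :=
    'select json_agg(resultset) as json from (' +
    '  select b.id as book_id, title, last_name, first_name' +
    '  from books b join authors a on b.author_id = a.id) as resultset';
  qryBooks.Open;
  try
    AResponse.ContentType := 'application/json';
    AResponse.Content := qryBooks.FieldByName('json').AsString;
  finally
    qryBooks.Close;
  end;
  Handled := True;
end;
```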

Wednesday, March 06, 2013

 

Quick CGI SpellChecker


Here's a small CGI program that provides spellchecking services based on GNU Aspell. I use it on a local network where Delphi Win32 clients connect to this CGI, hosted on an Apache server running on Linux, to ask for spelling suggestions.

This simple spellchecker has a small API that does the very basics: check a word's spelling, add a word to the dictionary, and delete a word from the dictionary. The client side can be written in any language capable of doing HTTP GET requests and handling JSON responses.

Prerequisites:


Of course, the first prerequisite is to install Aspell and one or more dictionaries. On apt-get based systems, you'll install them like this:

sudo apt-get install aspell
sudo apt-get install aspell-en

The next step is to create a personal dictionary; this is just a plain text file where new words will be added. The file must initially contain just one line:

personal_ws-1.1 en 0 

If the dictionary will contain, for example, Spanish words, replace "personal_ws-1.1 en 0" with "personal_ws-1.1 es 0" (or "personal_ws-1.1 es_AR 0" for Argentina), and install the Spanish dictionary:

sudo apt-get install aspell-es

IMPORTANT: please set read/write permissions on the file for everyone, so the CGI process can update it.
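Both steps can be done from the shell. The file name follows aspell's ~/.aspell.&lt;lang&gt;.pws convention; the language code and home directory here are examples, adjust them to the user your web server runs as:

```shell
# Create the personal dictionary with its required header line.
DICT="$HOME/.aspell.en.pws"
echo "personal_ws-1.1 en 0" > "$DICT"

# Allow the web server process to read and write the file.
chmod 666 "$DICT"
```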

How it works, the API:


All requests are made through these three URLs:

/cgi-bin/cgiaspell/TSpellCheck/WordSpell?word=
/cgi-bin/cgiaspell/TSpellCheck/WordAdd?word=
/cgi-bin/cgiaspell/TSpellCheck/WordDelete?word=

Here, the value of the word parameter is the word to be spell-checked, added to, or deleted from the dictionary.

Spell checking


For example, if you want to check the spelling of the word "houuse", you'll have to do this:

http://myserver/cgi-bin/cgiaspell/TSpellCheck/WordSpell?word=houuse

The result is this JSON string:
{ "replacements" : ["House", "house", "hose", "horse", "hours", "hoarse", 
"hoes", "hues", "Hosea", "housed", "houses", "Hus", "hos", "horsey", "hour's", 
"Ho's", "hows", "huhs", "Horus", "hoarser", "douse", "louse", "mouse", "rouse", 
"souse", "Hausa", "Hesse", "hoe's", "hoers", "how's", "hussy", "Hui's", 
"House's", "house's", "hue's", "hoar's", "hoer's", "Horus's"], "total" : 38 }
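Consuming the response from a Lazarus/fpc client takes only a few lines; a sketch using fphttpclient and fpjson (the host name is an example):

```pascal
program spellclient;

{$mode objfpc}{$H+}

uses
  fphttpclient, fpjson, jsonparser;

var
  lJson: TJSONData;
  lReplacements: TJSONArray;
begin
  // Fetch the suggestions for a misspelled word; replace
  // "myserver" with your own server.
  lJson := GetJSON(TFPHTTPClient.SimpleGet(
    'http://myserver/cgi-bin/cgiaspell/TSpellCheck/WordSpell?word=houuse'));
  try
    WriteLn('Total suggestions: ', TJSONObject(lJson).Integers['total']);
    lReplacements := TJSONObject(lJson).Arrays['replacements'];
    if lReplacements.Count > 0 then
      WriteLn('First suggestion: ', lReplacements.Strings[0]);
  finally
    lJson.Free;
  end;
end.
```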

Adding a word to the personal dictionary


http://myserver/cgi-bin/cgiaspell/TSpellCheck/WordAdd?word=houuse

This will return "Ok." if the word was added correctly.

Deleting a word from the personal dictionary


http://myserver/cgi-bin/cgiaspell/TSpellCheck/WordDelete?word=houuse

This will return "Ok." if the word was removed correctly, or a message saying it wasn't deleted.

The program


In Lazarus, just create a CGI Application (you'll need the WebLaz package), save the project as "cgiaspell.lpi", and rename unit1 to "main".

Now, adapt your main.lfm to this:

object SpellCheck: TSpellCheck
  OnCreate = DataModuleCreate
  OldCreateOrder = False
  Actions = <  
    item
      Name = 'WordSpell'
      Default = False
      OnRequest = WordSpellRequest
      Template.AllowTagParams = False
    end  
    item
      Name = 'WordAdd'
      Default = False
      OnRequest = WordAddRequest
      Template.AllowTagParams = False
    end  
    item
      Name = 'WordDelete'
      Default = False
      OnRequest = WordDeleteRequest
      Template.AllowTagParams = False
    end>
  CreateSession = False
  Height = 150
  HorizontalOffset = 250
  VerticalOffset = 250
  Width = 150
end

Then adapt main.pas the same way:

unit main;

{$mode objfpc}{$H+}

interface

uses
  SysUtils, Classes, httpdefs, fpHTTP, fpWeb,
  process,
  fpjson;

type

  { TSpellCheck }

  TSpellCheck = class(TFPWebModule)
    procedure DataModuleCreate(Sender: TObject);
    procedure WordAddRequest(Sender: TObject; ARequest: TRequest;
      AResponse: TResponse; var Handled: Boolean);
    procedure WordDeleteRequest(Sender: TObject; ARequest: TRequest;
      AResponse: TResponse; var Handled: Boolean);
    procedure WordSpellRequest(Sender: TObject; ARequest: TRequest;
      AResponse: TResponse; var Handled: Boolean);
  private
    function ASpellToJSON(AAspellResult: string): string;
    function SpellWord(AWord: string): string;
  public
    { public declarations }
  end;

var
  SpellCheck: TSpellCheck;

const
  cDictionary = '/home/leonardo/.aspell.es_AR.pws';

implementation

{$R *.lfm}

{ TSpellCheck }

procedure TSpellCheck.DataModuleCreate(Sender: TObject);
begin

end;

procedure TSpellCheck.WordAddRequest(Sender: TObject; ARequest: TRequest;
  AResponse: TResponse; var Handled: Boolean);
var
  lStr: TStringList;
  lWord: string;
begin
  if ARequest.QueryFields.IndexOfName('word') = - 1 then
    raise Exception.Create('word param is not present')
  else
    lWord := ARequest.QueryFields.Values['word'];

  // todo: this should be replaced by something
  // more reliable. I.e.: what happens if cDictionary is blocked
  // by another process, or LoadFromFile can be slow on big dictionaries.
  lStr := TStringList.Create;
  try
    lStr.LoadFromFile(cDictionary);
    if lStr.IndexOf( LowerCase(lWord) ) = -1 then
    begin
      lStr.Add(lWord);
      lStr.SaveToFile(cDictionary);
      AResponse.Content := 'Ok.';
    end
    else
      AResponse.Content := lWord + ' already in dictionary.';
  finally
    lStr.Free;
  end;
  Handled:= True;
end;

procedure TSpellCheck.WordDeleteRequest(Sender: TObject; ARequest: TRequest;
  AResponse: TResponse; var Handled: Boolean);
var
  lStr: TStringList;
  lIdx: Integer;
  lWord: string;

begin
  if ARequest.QueryFields.IndexOfName('word') = - 1 then
    raise Exception.Create('word param is not present')
  else
    lWord := ARequest.QueryFields.Values['word'];

  // todo: this should be replaced by something
  // more reliable. I.e.: what happens if cDictionary is blocked
  // by another process, or LoadFromFile can be slow on big dictionaries.
  lStr := TStringList.Create;
  try
    lStr.LoadFromFile(cDictionary);
    lIdx := lStr.IndexOf( LowerCase(lWord) );
    if lIdx <> -1 then
    begin
      lStr.Delete(lIdx);
      lStr.SaveToFile(cDictionary);
      AResponse.Content := 'Ok.';
    end
    else
      AResponse.Content := lWord + ' not in dictionary.';
  finally
    lStr.Free;
  end;
  Handled:= True;
end;

procedure TSpellCheck.WordSpellRequest(Sender: TObject; ARequest: TRequest;
  AResponse: TResponse; var Handled: Boolean);
var
  lWord: string;
begin
  if ARequest.QueryFields.IndexOfName('word') = - 1 then
    raise Exception.Create('word param is not present')
  else
    lWord := ARequest.QueryFields.Values['word'];

  AResponse.Content := SpellWord(lWord);
  Handled := True;
end;

function TSpellCheck.ASpellToJSON(AAspellResult: string): string;
var
  lStr: TStringList;
  lJSon: TJSONObject;
  lJsonArray: TJSONArray;
  I: Integer;
begin
  Result := '';
  lStr := TStringList.Create;
  lJson := TJSONObject.Create;
  try
    if Pos(':', AAspellResult) > 0 then
      lStr.CommaText:= Copy(AAspellResult, Pos(':', AAspellResult) + 1, Length(AAspellResult));
    lJsonArray := TJSONArray.Create;
    for I := 0 to lStr.Count - 1 do
      lJsonArray.Add(lStr[I]);
    lJSon.Add('replacements', lJsonArray);
    lJson.Add('total', lStr.Count);
    Result := lJSon.AsJSON;
  finally
    lJSon.Free;
    lStr.Free;
  end;
end;

function TSpellCheck.SpellWord(AWord: string): string;
var
  lProcess: TProcess;
  Buffer: array[0..2048] of char;
  ReadCount: Integer;
  ReadSize: Integer;
begin
  lProcess := TProcess.Create(nil);
  try
    lProcess.Options := [poUsePipes, poStderrToOutPut];
    lProcess.CommandLine := '/usr/bin/aspell -a --lang=es_AR -p ' + cDictionary;
    lProcess.Execute;
    // Send the word to aspell's stdin, then close it so aspell
    // sees EOF and terminates.
    lProcess.Input.Write(PAnsiChar(AWord)[0], Length(AWord));
    lProcess.CloseInput;

    while lProcess.Running do
      Sleep(1);

    ReadSize := lProcess.Output.NumBytesAvailable;
    if ReadSize > SizeOf(Buffer) then
      ReadSize := SizeOf(Buffer);
    if ReadSize > 0 then
    begin
      ReadCount := lProcess.Output.Read(Buffer, ReadSize);
      SetString(Result, PChar(@Buffer[0]), ReadCount);
      Result := ASpellToJSON(Result);
    end
    else
      raise Exception.Create(Format('Exit status: %d', [lProcess.ExitStatus]));
  finally
    // Free the process even if reading or parsing raised.
    lProcess.Free;
  end;
end;

initialization
  RegisterHTTPModule('TSpellCheck', TSpellCheck);
end.

Compile, copy to your Apache CGI directory and enjoy!
