XML DB anyone?

hi,

for the application at flash networks, we need an xml store that is roll-backable, searchable and transactional... berkeley-xmldb was proposed as an option. have you used anything similar?
cheers,

Zvika


Adi:
===
I seriously think that you should consider turning to another solution.
For me, XML always comes as a last option (or something very close to it).
The amount of code needed for manipulating XML in a safe way is unacceptable,
the performance is not so good, and the year is 2009.

Udi:
===
Maybe you can use an RDF database?

Zvika:
====
thanks,
don't know much about it, will take a look...


Udi:
===
zvika, what's the nature of the data? what do you mean by hierarchical? is it a tree? directed graph? what do you mean by "no properties"?

what's the size of the data? what's the expected read / update rates?
what's the requirements for search? keyword search? form search?

call if you can...

Adi:
===
What are the sources of the persistent data (who/what are the clients)?
What transport layer is used for communication with the saver?

Ittay:
====
i don't know exactly what you meant by 'auto-versioned' in your last email, but for rdf, you can check http://gvs.hpl.hp.com/application/about-gvs

Adi:
===

There's no need for aggregation
There's no need for semantics
There's no need for RDF

RDF is XML on steroids.

Zvika:
====

To Adi,
Q: What are the sources of the persistent data (who/what are the clients)?
A: the persistent data is the configuration of network elements. this is an NMS. The clients are the network elements themselves, through a translation layer that will know how to translate the generic xml format to each device's format (csv, other xml jargon, jmx properties, files, ...)

Q: What transport layer is used for communication with the saver?
A: Untraditionally for an NMS (or at least I would think so), the EMS server initiates communication with the network elements / applications via jmx, ftp, webservice, and maybe some other protocols. the afore-mentioned(!) translation layer should have the know-how for these protocols.

To Udi,

Q: what's the nature of the data? what do you mean by hierarchical? is it a tree? directed graph? what do you mean by "no properties"?
A: nature of the data is NMS configuration.
data is hierarchical in the sense that property values may be lists or tables. this is why I don't think the data format could be flat name-value properties

Q: what's the size of the data? what's the expected read / update rates?
A: data is read-mostly

Q: what's the requirements for search? keyword search? form search?
A: search should be straightforward, with no ambiguities. more than a search, it can be thought of as a referencing method that allows r/w access to specific configuration data/nodes in a 1:1 fashion. this is why XPath was selected in the previous implementation, and we're considering it now, too

Thanks!

Ittay:
====

maybe use a standard format like CIM (http://en.wikipedia.org/wiki/Common_Information_Model_(computing))? oasis has some standard based on xml but i can't remember what it is right now (and i share Adi's distaste for xml...)
since i think this sort of data tends to have many exceptions, so it is hard to classify into a rigid (read: sql tables) format, i think rdf is a good option.

[regarding transport] why not snmp? (though jmx is very close)

it looks to me like this discussion reveals that it is not clear what the system architecture will look like (specifically, how clients will interact). if this is the case, then maybe it is a good idea to suggest that the client hire a system architecture consultant who knows the standards and pitfalls and can suggest a system architecture model. from there, choosing the java implementation to realize the model will be easier. i can recommend someone if you'd like

ittay

Adi:
===

Yanai, isn't that an expertise of yours?


Ittay:
====
to make sure my suggestion was understood, i was referring to architecture from a sysadmin point of view, especially, protocols being used, no relation to java.

Zvika:
====

the interaction model with the backend apps is well-known, either properties via jmx or http downloads of config files. the interaction model with the front end clients is a little more complicated due to the APP-EMS-CMS chain but is also known.
What I am trying to find out is a: what syntax would support the flexibility that is needed and b: what technology will provide features such as versioning, search and update.
RDF syntax was ruled out due to its complexity and the fact that the configuration should be human readable and editable, so xml is still the leading choice.

Ittay:
====

what kind of search are we looking at? is a mapping to a filename sufficient (e.g., do you receive a node name and provide a config file for it)? if so, git/hg can get you versioning.
see if this: http://en.wikipedia.org/wiki/CMDB kind of thing is suitable for you.

Zvika:
=====

thanks!

the search is, potentially, attribute-based and type-oriented. for example: retrieving all blade centers that are running application X. or: retrieving all parameters of a certain application. or: all applications running on a certain blade. you get the idea. so a filename is not sufficient here.
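
A rough sketch of what such attribute-based lookups could look like with XPath-style paths. It is written in Python with the standard-library ElementTree module (which supports only a subset of XPath) purely for illustration. The <inventory>/<blade>/<app>/<param> layout and all function names are assumptions made up for this example, not the actual configuration schema.

```python
import xml.etree.ElementTree as ET

# Hypothetical inventory layout, invented for this sketch only.
INVENTORY = """
<inventory>
  <blade name="blade-1">
    <app name="proxy"><param name="port">8080</param></app>
    <app name="cache"><param name="size">512</param></app>
  </blade>
  <blade name="blade-2">
    <app name="cache"><param name="size">256</param></app>
  </blade>
</inventory>
"""

# Note: names are interpolated into the path as-is, so this sketch
# assumes they contain no quote characters.

def blades_running(root, app_name):
    """Names of all blades that have an <app> with the given name."""
    return [blade.get("name")
            for blade in root.findall("blade")
            if blade.find(f"app[@name='{app_name}']") is not None]

def apps_on_blade(root, blade_name):
    """Names of all applications configured on one blade."""
    return [app.get("name")
            for app in root.findall(f"blade[@name='{blade_name}']/app")]

def params_of_app(root, blade_name, app_name):
    """All parameters of one application on one blade, as a dict."""
    path = f"blade[@name='{blade_name}']/app[@name='{app_name}']/param"
    return {p.get("name"): p.text for p in root.findall(path)}

root = ET.fromstring(INVENTORY)
cache_blades = blades_running(root, "cache")
blade1_apps = apps_on_blade(root, "blade-1")
proxy_params = params_of_app(root, "blade-1", "proxy")
```

Each of the three example queries maps to one short path expression, which is consistent with XPath being a good enough fit for the read side.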

As far as search is concerned, XPath is a good enough fit. The problem starts when trying to update data that doesn't exist yet.
for example, if I have a list of banned urls stored as:

<!-- .... -->
<app>
  <config>
    <lists>
      <blackList>
        <url>http://www.ratemypoo.com</url>
        <url>http://www.rotten.com</url>
      </blackList>
    </lists>
  </config>
</app>
and now I want to add a url. piece of cake:
select by xpath app/config/lists/blackList, add a <url> element.
but what if the <blackList> element doesn't exist yet? or the <lists> element?


regarding versioning, we got away with a smaller requirement: only to create a snapshot of the entire configuration. So the file system is still a viable option, but without git or similar (hg?), which would require integration work that'll take more time than we have (the 1st phase is due within a month or so and is very very very very very very very very very very very very very tight. I would add some more "very"s but I don't have the time for it now. THAT TIGHT! :)
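
A minimal sketch of the "snapshot of the entire configuration" idea on a plain file system, with no version-control tool involved. The directory layout, the timestamp naming scheme and the function names are assumptions for illustration only. It also assumes the snapshots directory lives outside the configuration directory and that nothing writes to the configuration while a copy is in progress.

```python
import shutil
from datetime import datetime, timezone
from pathlib import Path

def snapshot(config_dir, snapshots_dir):
    """Copy the whole configuration tree into a new timestamped folder."""
    stamp = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%S%fZ")
    target = Path(snapshots_dir) / stamp
    # copytree creates the target (and missing parents) and fails
    # if the target already exists, so snapshots are never overwritten.
    shutil.copytree(config_dir, target)
    return target

def restore(snapshot_dir, config_dir):
    """Roll back by replacing the live configuration with a snapshot."""
    shutil.rmtree(config_dir)
    shutil.copytree(snapshot_dir, config_dir)
```

This only gives coarse whole-tree rollback. It is not transactional: a crash between the rmtree and the copytree in restore would leave no live configuration, so a real implementation would want to copy to a temporary location and rename.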

also,
The wiki link on CMDB looks interesting, but I didn't find any relevant open-source implementations... it would probably be overkill, but it's all good to know.

Adi:
===

Why won't you help me with my dynamic type system?
It can be the first usecase... :)

Ittay:
====

i think it's fairly simple to write a function that recursively uses shorter xpaths to add missing nodes.
but i think that using standards will be better since it will make accepting the software easier in large organizations. see below for a cmdb implementation i found and also search for itil.
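
One way to read the "recursively use shorter xpaths" suggestion, sketched in Python with the standard-library ElementTree module for illustration only. The helper names ensure_path and add_url are made up here, and the sketch handles only plain slash-separated tag names relative to the root element (no predicates, attributes or wildcards).

```python
import xml.etree.ElementTree as ET

def ensure_path(root, path):
    """Return the element at `path` below `root`, creating missing steps.

    If the full path is missing, recurse on the path with its last step
    removed, then append the missing last step to whatever that returns.
    """
    found = root.find(path)
    if found is not None:
        return found
    parent_path, _, tag = path.rpartition("/")
    parent = ensure_path(root, parent_path) if parent_path else root
    return ET.SubElement(parent, tag)

def add_url(app, url):
    """Append a <url> to the black list, creating the list if needed."""
    black_list = ensure_path(app, "config/lists/blackList")
    ET.SubElement(black_list, "url").text = url

# Works whether or not <lists> and <blackList> already exist.
app = ET.fromstring("<app><config/></app>")
add_url(app, "http://www.rotten.com")
```

Calling add_url a second time reuses the existing <blackList> rather than creating a new one, because the full path is found on the first lookup.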

then this whole discussion is futile. create an xml file, write the pseudo database functions around it and maybe the next version will do things right.

a search in google gave this which looks interesting
http://www.onecmdb.org/wiki/index.php/Main_Page

what about using java-xml binding? then you read the file to objects, manipulate the objects and stream them back.

Zvika:
====
definitely an option, although it might raise some issues, as the xml needs to be customized so that it is stable enough not to rock the xpath boat when new attributes or even elements are added to the system. which one would you go for? I am thinking of jaxb2 if we take that path.

oneCMDB looks interesting indeed,
I'm trying to figure out whether it has APIs or it is only a product-project (in which case modifiable but not in our timeframes)

cheers,