The following topic lays out some basic design advice, as with all things not all of it should be applied rigidly, but an understanding of some of the pitfalls behind some approaches is always useful.
Keep it simple, this ones important so again, keep is simple
If you can achieve your goals using basic constructs, then do so. Do not feel just because some aspect of the schema standard exists it should be used.
The further you venture from the basic constructs the more likely you are to find bugs and ambiguity in the parser and validator implementations.
The core constructs will cover the majority of cases, using more advanced features like the Schema Dependencies, id's, AnyOf, AllOf, OneOf, Not, Implied Properties should be undertaken with clear understanding of how they work and what you are trying to achieve. They complicate schemas and make them much more difficult for a user to understand. That's not to say they can't be used, but make sure the additional complexity is worth it.
And keep in mind you are unlikely to be able to capture 100% of the rules relating to your data within your schema regardless of how complex you make it, so it's often better to keep it simple and add more documentation of explain any quirks.
A schema should have a single data type
If the 'type' keyword is not specified it defaults to 'any' allowing it to contain any data type, ("array", "boolean", "integer", "number", "null", "object", "string").
This is rarely desirable, and means that the value in the instance document can be very different from the value that was envisaged.
For example you may create a schema, give the schema some child properties, but leave its type empty (so it defaults to any), the instance object could contain all the values you specified, alternatively it could contain a string (which is unlikely to be what you intended).
You also get readability issues, the 'Any' property in the diagram although obvious only hints at the implications. Furthermore if an object us given both properties and array items, the implication to a user unfamiliar with the Json Schema standard is that the instance object will contain both the items and properties (not one or the other in the instance document).
Finally, the consumer of your schema must eventually produce code that will deal with the data it describes. If they have to be continually checking the type of the data then the code becomes more verbose less readable and more error prone. Much of this kind of validation can be pushed of to a validating Json parser if the schema is better typed.
So to summarize, there are legitimate reasons to produce a schema based on the union of two or more types, but on the whole it adds complexity and is error prone.
Keep references simple
Reusable schema definitions should be placed in the root schemas definitions section and nowhere else.
References should only reference schemas within the definitions section.
The Json schema is pretty lax when it comes to structuring references ($ref), as a result it allows schemas to be referenced from pretty much anywhere. This is not a good thing! It makes reading the schemas complex is also error prone.
Rules regarding the 'id' keyword.
The 'id' keyword opens up a whole can of worms, it's recommended that if it is used it should only be used on the root schema and contain an absolute URI.
If it is used elsewhere then the following recommendations should be followed.
id
in root schemasid
MUST be absolute. It MUST have no, or an empty, fragment part.id
if the rules above are not met.id
in subschemasid
MUST be a fragment only URI. The fragment MUST NOT be empty, and MUST NOT start with a solidus (/
) [this is to avoid conflicts with JSON Pointer].id
MUST NOT be used twice in a same schema.For a full discussion of this have a look at Francis Galiegue's article The "id" conundrum.
Names should (no MUST) be unique
The Json & Json Schema are very vague about what to do with duplicate names within an object. This ambiguity this leads to is potential bugs and security issues, so we have taken the approach of preventing duplicate names and treating them as errors if they are encountered.
You should carry this through into your Json documents.
Their is a lot of discussion threads dedicated to this issue, for more information of this a simple Google search will provide more information.